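The name recognition and usage of the open-source R language in data analysis have grown substantially in recent years. Only the most important parts of the report are excerpted below. For the full text, see: http://r4stats.com/articles/popularity/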
Surveys of Use
One way to estimate the relative popularity of data analysis software is through a survey. Rexer Analytics conducts a survey each year asking about tools used for data mining. The distinction between software for classical data analysis and software for data mining seems like more of a marketing concept than one based on any actual difference in analytic need. Figure 3 shows the results of just one “check all that apply” type question about the tools that respondents reported using in 2009 (the survey was taken in 2010).
Figure 3. Data mining/analytic tools reported in use during 2009 on the Rexer Analytics survey.
We see that R comes out on top, followed by SAS and SPSS. The entire report contained over 40 questions on topics such as algorithms used, fields, challenges, data, impact of the economy on the field, and more. More comprehensive results are available from Rexer Analytics. It’s interesting to note that SPSS and SAS are used more often than their vendors' more expensive products aimed specifically at data mining, IBM SPSS Modeler (formerly Clementine) and SAS Enterprise Miner. This data is two years old now and due to be updated soon.
The results of a similar survey done by the data mining web site KDnuggets in 2012 are shown in Figure 4. This one shows R in first place with 30.7% of users reporting having used it for a real project. Excel is almost as popular. It seems out of place among so many more capable packages, but Excel is a tool that almost everyone has and knows how to use.
It’s interesting to note that four of the top five packages used were open source. While open source packages are clearly playing a major role in analytics, people still reported using more commercial software (1086) than open source (927).
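To put the two totals above on a common scale, the short Python sketch below converts them into shares. It assumes that 1086 and 927 are counts of tool mentions from a "check all that apply" question, which the text does not state explicitly, so the shares describe mentions rather than respondents. The function name `mention_shares` and the dictionary keys are illustrative, not part of the survey.

```python
def mention_shares(counts):
    """Return each category's fraction of the total number of mentions."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must contain at least one mention")
    return {name: n / total for name, n in counts.items()}


# Totals quoted in the text; treating them as mention counts is an assumption.
counts = {"commercial": 1086, "open source": 927}
shares = mention_shares(counts)

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under that assumption, commercial software accounts for roughly 54% of the mentions and open source for roughly 46%, so the gap is real but not large.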
Figure 4. Percent of KDnuggets survey respondents who reported using each software package on an analytics, data mining, or big data project in the 12 months prior to May 2012.