Saturday, July 21, 2012

SPSS is not dead


This blog was published recently showing that the use of R continues to grow in academia. One of the graphs (Figure 1) showed citations (using google scholar) of different statistical packages in academic papers (to which I have added annotations).
Figure 1: Citations of stats packages (from http://blog.revolutionanalytics.com/2012/04/rs-continued-growth-in-...)

At face value, this graph implies a very rapid decline in SPSS use since 2005. I sent a tongue in cheek tweet about this graph, and this perhaps got interpreted that I thought SPSS use was on the decline. So, I thought I’d write this blog. The thing about this graph is it deals with citations in academic papers. The majority of people do not cite the package they use to analyse their data, so this might just reflect a decline in people stating that they used SPSS in papers. Also, it might be that users of software such as R are becomming more inclined to cite the package to encourage others to use it (stats package preference does for some people mimic the kind of religious fervor that causes untold war and misery. Most packages have their pros and cons and some people should get a grip). Also, looking at my annotations on Figure 1 you can see that the decline in SPSS is in no way matched by an upsurge in the use of R/Stata/Systat. This gap implies some mysterious ghost package that everyone is suddenly using but is not included on this graph. Or perhaps people are just ditching SPSS for qualitative analysis or doing it by handJ
If you really want to look at the decline/increase of package use then there are other metrics you could use. This article details lots of them. For example you could look at how much people talk about packages online (Figure 2).
Figure 2: online talk of stats packages (Image from http://r4stats.com/popularity)

Based on this R seems very popular and SPSS less so. However, the trend for SPSS is completely stable between 2005-2010 (the period of decline in the Figure 1). Discussion of R is on the increase though. Again though you can’t really compare R and SPSS here because R is more difficult to use than SPSS (I doubt that this is simply my opinion, I reckon you could demonstrate empirically that the average user prefers the SPSS GUI to R’s command interface if you could be bothered). People are, therefore, more likely to seek help on discussion groups for R than they are for SPSS. It’s perhaps not an index of popularity so much as usability. 
There are various other interesting metrics discussed in the aforementioned article. Perhaps the closest we can get to an answer to package popularity (but not decline in use) is survey data on what tools people use for data mining. Figure 3 shows that people most frequently report R, SPSS and SAS. Of course this is a snapshot and doesn’t tell us about usage change. However, it shows that SPSS is still up there. I’m not sure what types of people were surveyed for this figure, but I suspect it was professional statisticians/business analysts rather than academics (who would probably not describe their main purpose as data mining). This would also explain the popularity of R, which is very popular amongst people who crunch numbers for a living.
Figure 3: Data mining/analytic tools reported in use on Rexer Analytics survey during 2009 (from http://r4stats.com/popularity).
To look at the decline or not of SPSS in academia what we really need is data about campus licenses over the past few years. There were mumblings about Universities switching from SPSS after IBM took over and botched the campus agreement, but I’m not sure how real those rumours were. In any case, the teething problems from the IBM take over seem to be over (at least most people have stopped moaning about them). Of course, we can’t get data on campus licenses because it’s sensitive data that IBM would be silly to put in the public domain. I strongly suspect campus agreements have not declined though. If they have, IBM will be doing all that they can (and they are an enormously successful company) to restore them because campus agreements are a huge part of SPSS’s business.
Also, I doubt campus agreements have declined because they will stop for two main reasons (1) SPSS isn’t used by anyone anymore, (2) the cost become prohibitive. These two reasons are related obviously – the point at which they stop the agreement will be a function of cost and campus usage. In terms of campus usage, If you grew up using SPSS as an undergraduate or postgraduate, you’re unlikely to switch software later in your academic career (unless you’re a geek like me who ‘enjoys’ learning R). So, I suspect the demand is still there. In terms of cost, as I said, I doubt IBM are daft enough to price themselves out of the market.
So, despite my tongue in cheek tweet, I very much doubt that there is a mass exodus from SPSS. Why would there be? Although some people tend to be a bit snooty about SPSS, it's a very good bit of software: A lot of what it does, it does very well. There are things I don’t like about it (graphs, lack of robust methods, their insistence on moving towards automated analysis), but there’s things I don’t like about R too. Nothing is perfect, but SPSS's user-friendly interface allows thousands of people who are terrified of stats to get into it and analyse data and, in my book, that's a very good thing.