"Our article reported linear mixed models showing interactive effects of testosterone level and perceived makeup attractiveness on women’s makeup preferences. These models did not include random slopes for the term perceived makeup attractiveness, and we have now learned that the Type 1 error rate can be inflated when by-subject random slopes are not included (Barr, Levy, Scheepers, & Tily, 2013). Because the interactions were not significant in reanalyses that addressed this issue, we are retracting this article from the journal."
The purpose of this blog post is to explain why I believe (other things being equal) the authors should have published a correction rather than retracting the article. Much of what I think isn't specific to this example; it just happens to have been triggered by it.
To retract ...
I assume that the authors' decision to retract is motivated by a desire to rid the world of false knowledge. By retracting, the original paper is removed from the universe, thus reducing the risk of 'false knowledge' on this topic spreading. A correction would not minimise the risk that the original article was cited or followed up by other researchers unless it was somehow tagged onto the end of the paper. If a correction appears as a separate paper then it may well be overlooked. However, I think this is largely a pragmatic issue for the publishers to sort out: just make it impossible for someone to get the original paper without also getting the correction. Job done.
To not retract ...
If you read the full account of the retraction, the authors fitted a model, published the details of that model in the supplementary information with the paper, and then posted their data on the Open Science Framework for others to use. They have been completely transparent. Someone else re-analysed the data, included the aforementioned random slope, and alerted the authors to the differences in the model (notably in this crucial interaction term). The authors retracted the paper. I would argue that a correction would have been better for the following reasons.
Undeserved reputational damage
One of the things that really bugs me about science these days (especially psychology) is the witch-hunt-y-ness of it (yes, that's not a real word). Scientists happily going about their business with good intentions make a bad decision, and suddenly everyone is after their heads. This is evidenced by the editor feeling the need to make this comment in the retraction: "I would like to add an explicit statement that there is every indication that this retraction is entirely due to an honest mistake on the part of the authors." The editor is attempting damage limitation for the authors.
The trouble is that retractions come with baggage, ranging from 'the scientists don't know what they're doing' (at best) to hints that they have deliberately misled everyone for their own gain. This baggage is unnecessary. Don't get me wrong, I've seen people do terrible things with data (in the vast majority of cases out of ignorance, not deceit) and I'm convinced that the incentive structures in academia are all wrong (quantity is valued over quality), but deep down I still like to believe that scientists care about science/knowledge. Given how open they have been with their analysis and data, these scientists strike me as people who care about science. They are to be applauded for their openness, not burdened with the baggage of retraction. A correction would have better reflected their honesty and integrity.
Retraction implies there is one correct way to model the data
Retracting the paper implies 'we did it wrong'. Did the authors analyse their data incorrectly, though? Here's some food for thought. Raphael Silberzahn and colleagues published a paper in which they gave the same research question and the same data set to 29 research teams and examined how they addressed the question (there is an overview of the paper here, and the paper itself is available here). Essentially they found a lot of variability in the statistical models applied to answer the question, including tobit regression, logistic regression (sometimes multilevel, sometimes not), Poisson regression (sometimes multilevel, sometimes not), Spearman's correlation, OLS regression, WLS regression, and Bayesian logistic regression (sometimes multilevel, sometimes not). You get the gist. The resulting odds ratios for the effect ranged from 0.89 to 2.93 (although all but 2 were > 1), and the widths of the confidence intervals for these odds ratios also varied considerably. The positive thing was that if you look at Figure 1 in the paper, despite the variation in the models applied, there was a fair bit of consensus in the odds ratios and confidence intervals produced (about half of the point estimates/CIs - the ones from team 26 to team 9 - line up pretty well despite the varying models applied). However, it goes to show that if you give a data set and a question to 29 research teams, they will analyse it differently. Is there one correct model? Are 28 teams wrong and 1 team correct? No, data analysis is always about decisions, and although there can be unequivocally wrong decisions, there is rarely only one correct decision.
So, Fisher and colleagues didn't include a random slope; someone else did. This change in model specification affected the model parameters and p-values. Is the inclusion of the random slope any more correct than its exclusion? That's somewhat a matter of opinion. Of course, its exclusion could have led to a Type I error (if you fixate on p-values), but the more interesting questions are why it changes the model, how it changes it, and what the implications are moving forward. The message (for me) from the Silberzahn paper is that if any of us let other scientists loose on our data, they would probably do different things with it that would affect the model parameters and p-values, just as has happened here. The logic of this particular retraction is that every scientist should retract every paper they have ever published on the grounds that there were probably other models that could have been fitted, and if they had been then the parameter estimates in the paper would be different. A correction (rather than a retraction) would have allowed readers and researchers in this field to consider the findings in the light of the difference that the random slope makes to the model.
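To make the issue concrete, here is a minimal sketch (in Python with statsmodels, using simulated data and made-up variable names; it is not the authors' analysis) of how adding a by-subject random slope can change the inference about the crucial interaction term:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate data in which subjects genuinely vary in their 'attract' slope
rng = np.random.default_rng(42)
n_subj, n_trials = 30, 20
subject = np.repeat(np.arange(n_subj), n_trials)
attract = rng.normal(size=n_subj * n_trials)                  # trial-level predictor
testo = np.repeat(rng.normal(size=n_subj), n_trials)          # subject-level predictor
subj_slope = np.repeat(rng.normal(0, 0.8, n_subj), n_trials)  # by-subject slope variation
pref = (0.3 * attract + 0.05 * attract * testo
        + subj_slope * attract + rng.normal(size=n_subj * n_trials))
d = pd.DataFrame({"pref": pref, "attract": attract, "testo": testo, "subject": subject})

# Model 1: random intercepts only (the specification criticised by Barr et al., 2013)
m_int = smf.mixedlm("pref ~ attract * testo", d, groups=d["subject"]).fit(reml=False)

# Model 2: random intercepts plus a by-subject random slope for 'attract'
m_slope = smf.mixedlm("pref ~ attract * testo", d, groups=d["subject"],
                      re_formula="~attract").fit(reml=False)

# Compare the interaction term: estimate, confidence interval and p-value
for label, m in [("intercepts only", m_int), ("with random slope", m_slope)]:
    est = m.params["attract:testo"]
    lo, hi = m.conf_int().loc["attract:testo"]
    print(f"{label}: b = {est:.3f}, 95% CI [{lo:.3f}, {hi:.3f}], "
          f"p = {m.pvalues['attract:testo']:.3f}")

In simulations like this, the intercepts-only model will typically report a narrower confidence interval and a smaller p-value for the interaction than the random-slope model, which is exactly the anticonservative behaviour Barr et al. (2013) warn about. The parameter estimate itself, though, often changes far less than the p-value does, which is part of my point below.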
Retraction devalues the original study
Here's how science works. People generate theories, they transform them into testable hypotheses, they collect data, and they evaluate the evidence for their hypotheses. Then other people get interested in the same theory and collect more data, and this adds to the collective knowledge on the topic. Sometimes the new data contradicts the old data, in which case people update their beliefs. We do not, however, retract all of the old papers because a new one has thrown up different evidence. That would be silly, and yet I think that is all that has happened here with Fisher et al.'s paper. They fitted one model to the data and drew some conclusions, then someone else moved forward with the data and found something different. Retraction implies that the original study was of no value whatsoever and must be hidden away, never to be seen. Regardless of how you analyse the data, if the study was methodologically sound (I don't know whether it was: I can't access it because it's been retracted) then it adds value to the research question irrespective of the significance of an interaction in the model. A retraction removes this knowledge from the world; it becomes a file-drawer paper rather than information that is out in the open. We are deprived of the evidence within the paper (including how that evidence changes depending on what model you fit to the data). A correction allows this evidence to remain public, and better still updates that evidence in the light of new analysis in useful ways ...
Retraction provides the least useful information about the research question
By retracting this study we are none the wiser about the hypothesis. All we know is that a p-value that was below .05 flipped to the other side of that arbitrary threshold when a random slope was included in the model. It could have changed from .049 to .051 for all we know, in which case the associated parameter has most likely not changed much at all. It might have changed from .00000000001 to .99, in which case the impact is more dramatic. A retraction deprives us of this information. In a correction, the authors could present the new model, its parameters and confidence intervals (incidentally, on that topic I recommend Richard Morey's recent paper) and we could see how things have changed as a result of including the random slope. A correction provides us with specific and detailed evidence with which we can update our beliefs from the original paper. A correction allows the reader to determine what they believe. A retraction provides minimal and (I'd argue) unhelpful information about how the model changed, and about how to update our beliefs about the research question. All we are left with is to throw the baby out with the bathwater and pretend, along with the authors, that the study never happened. If the methods were sound, then the study is valuable, and the new analysis is not damning but simply sheds new light on the hypotheses being tested. A retraction tells us little of any use.
Where to go ...
What this example highlights to me is how science needs to change, and how the publication process also needs to change. Science moves forward through debate, through people challenging ideas, and this is a good example. If the paper were in a PLoS-style journal that encourages debate and comment, then a retraction would not have been necessary. Instead, the models and conclusions could have been updated for all to see, the authors could have updated their conclusions based on the new analyses, and knowledge on the topic would have advanced. You'd end up with a healthy debate instead of evidence being buried. One of the challenges of open science and the OSF is to convince scientists that by making their data public they are not going to end up in these situations where they are witch-hunted, or pressured into retractions. Instead, we need to embrace systems that allow us to present different angles on the same data, to debate conclusions, and to strive for truth by looking at the same data from different perspectives ... and for none of that to be perceived as a bad thing. Science will be better for it.
References
Fisher, C. I., Hahn, A. C., DeBruine, L. M., & Jones, B. C. (2015). Women’s preference for attractive makeup tracks changes in their salivary testosterone. Psychological Science, 26, 1958–1964. doi:10.1177/0956797615609900