Friday, November 19, 2010

Rates of Scientific Fraud Retractions

Ivan Oransky, on his Retraction Watch blog, pointed to a paper by R. Grant Steen looking at numbers of retractions and whether they were due to fraud or error. Ivan pointed to a news item on The Great Beyond by Richard Van Noorden looking at one slightly surprising claim in the paper: "American scientists are significantly more prone to engage in data fabrication or falsification than scientists from other countries". Van Noorden looked at the data in a bit more detail and wasn't convinced, but didn't fully run the numbers. So I thought I would.

Here's the relevant data. The numbers of retractions due to error, fraud, or unknown causes are from the original paper (extracted from PubMed for 2000 to 2009, and categorised by Steen). Some of the total publication data is from The Great Beyond; I extracted the missing total publication data (using the same webpage as Van Noorden). I have also combined the "Asia" and "Other" categories, because I wasn't going to go through and get the data for every Asian country.

Country      Error   Fraud   Unknown   Total Publications
USA            169      84         7              1819543
China           60      20         9               185786
Japan           41      18         1               377976
India           27      17         6                95718
UK              36       7         2               350760
S Korea         27       8         3                90052
Germany         22       3         0               294164
Australia       13       3         1               131826
Canada          15       2         0               194777
Italy           11       6         0               201922
Turkey          13       2         0                72615
France          12       1         0               181318
Greece          10       0         2                37094
Iran             9       1         1                19696
Others         235      88        33              2057808
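
For anyone who wants to play along, here is that table typed into R as a data frame (the object and column names are my own, not necessarily those used in the R code linked later in the post):

retract <- data.frame(
  Country = c("USA", "China", "Japan", "India", "UK", "S Korea", "Germany",
              "Australia", "Canada", "Italy", "Turkey", "France", "Greece",
              "Iran", "Others"),
  Error   = c(169, 60, 41, 27, 36, 27, 22, 13, 15, 11, 13, 12, 10, 9, 235),
  Fraud   = c( 84, 20, 18, 17,  7,  8,  3,  3,  2,  6,  2,  1,  0, 1,  88),
  Unknown = c(  7,  9,  1,  6,  2,  3,  0,  1,  0,  0,  0,  0,  2, 1,  33),
  Total   = c(1819543, 185786, 377976, 95718, 350760, 90052, 294164, 131826,
              194777, 201922, 72615, 181318, 37094, 19696, 2057808)
)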


Steen, in the original paper, reported the main country comparisons like this:

The results of this study show unequivocally that scientists in the USA are responsible for more retracted papers than any other country (table 3). These results suggest that American scientists are significantly more prone to engage in data fabrication or falsification than scientists from other countries. There was no evidence to support a contention that papers submitted from China or other Asian nations and indexed in PubMed are more likely to be fraudulent.

We can see that the first sentence is true: the US produced the most retracted papers. But (as Van Noorden noted) it also produces more papers than most countries, so the other claims may not hold. Steen apparently tried to remove this effect by normalising by the number of papers retracted due to error. If scientists produce papers retractable due to error at a constant rate, then this could be a nice correction, as it would (under a few more assumptions) factor out the rate of reporting retractable papers. But there are some big assumptions in there.
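
To make the two normalisations concrete, here they are side by side using the retract data frame above (a sketch of the idea, not Steen's or Van Noorden's actual calculations):

# Steen's implicit normalisation: fraud retractions per error retraction
with(retract, Fraud / Error)
# Van Noorden's normalisation: fraud retractions per published paper
with(retract, Fraud / Total)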

Van Noorden calculated the rate of retraction per paper for the top 7 countries, and came to this conclusion:

But this does not mean that any US scientist is more likely to engage in data fraud than a researcher from another country. Indeed, a check on PubMed publications versus retractions for frauds suggests that s/he may be less likely to do so (though the statistical significance of this finding has not yet been tested).

So, time to answer the question of statistical significance. The statistical analysis is fairly simple (here is the R code, if you want it): the next paragraph gives the gory details, so skip it if you want.

Basically, I assume that each paper has a probability of being retracted, and that this probability is the same for every paper from a given country. Because the probabilities are so small, it is convenient to treat the number of retractions as a count (i.e. Poisson distributed), with a rate proportional to the total number of papers (technically, this means using the log of the number of papers as an offset). I then use a Poisson regression, which models the rate of retraction on the log scale.
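
In R, the model amounts to something like this (a minimal sketch using the retract data frame above; the linked code may differ in the details):

# Poisson regression with log(Total) as an offset, so each country effect
# is the log of that country's retraction rate per published paper.
error.glm <- glm(Error ~ Country - 1, offset = log(Total),
                 family = poisson, data = retract)
fraud.glm <- glm(Fraud ~ Country - 1, offset = log(Total),
                 family = poisson, data = retract)
summary(fraud.glm)  # coefficients are log(retractions per paper), by country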

It's convenient to plot the results in figures. These are the estimates of the log rate of retraction, with standard errors. First, for errors:

[Figure: RetractionErrors.png (estimated log rates of retraction due to error, by country, with standard errors)]
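
A figure along these lines can be drawn straight from the fitted model; this is just one way to do it, not necessarily the original plotting code:

# Point estimates with +/- 1 standard error bars, and a dotted line at the
# mean of the estimated log rates (shown here for the error model).
est <- summary(error.glm)$coefficients
n <- nrow(est)
plot(est[, "Estimate"], 1:n, pch = 16, yaxt = "n", ylab = "",
     xlab = "log(retractions per paper)",
     xlim = range(est[, "Estimate"]) + c(-1, 1))
axis(2, at = 1:n, labels = sub("Country", "", rownames(est)), las = 1)
segments(est[, "Estimate"] - est[, "Std. Error"], 1:n,
         est[, "Estimate"] + est[, "Std. Error"], 1:n)
abline(v = mean(est[, "Estimate"]), lty = 3)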

The dotted line is the mean rate over all countries. We can see that the US has a comparatively low error rate; indeed, the "western" countries (I'm including Japan in this) tend to have lower rates of retraction due to error. The fraud results are different:

[Figure: RetractionFraud.png (estimated log rates of retraction due to fraud, by country, with standard errors)]

The long line for Greece is there because it had no retractions due to fraud (the point estimate is -∞ and the estimated standard error is huge): it can be ignored. We can see that the US has a slightly higher estimated rate of retraction due to fraud, which corresponds to about 30% more fraud retractions per paper than average. But China and India have higher rates of retraction due to fraud than the US (and p-value fans will be happy to know that both are statistically significant, with lots of stars to make you happy). China has about 3 times as many fraud retractions per paper as average, and India 5 times as many.
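
For the p-value fans, one way to get those comparisons from the fitted model (again a sketch; the averaging for the dotted line and the tests could be set up slightly differently in the original code):

# Fraud retraction rate relative to the across-country average, dropping
# Greece, whose zero count gives an effectively -infinite estimate.
log.rate <- coef(fraud.glm)
log.rate <- log.rate[names(log.rate) != "CountryGreece"]
round(exp(log.rate - mean(log.rate)), 1)

# To test China and India against the USA, refit with the USA as the
# reference level: their coefficients are then log rate ratios versus the USA.
retract$Country <- relevel(factor(retract$Country), ref = "USA")
fraud.vs.usa <- glm(Fraud ~ Country, offset = log(Total),
                    family = poisson, data = retract)
summary(fraud.vs.usa)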

What does this mean for fraud and dishonesty? It may not mean that Indian scientists are more dishonest: it may be that they are no more or less honest than anyone else, just that they are caught more often and made to retract. I'll let others debate that: I have weak opinions, but no more data to back them up.

But Richard Van Noorden was right in his conclusions: the US doesn't produce the papers most likely to be retracted because of fraud. More generally, one should normalise by the right thing - and also be careful about what is actually being measured: it may not be what you want to measure (here it is not the rate of fraud, but the rate of retraction because of fraud).

Reference

Steen, R.G. (2010). Retractions in the scientific literature: do authors deliberately commit research fraud? Journal of Medical Ethics.
