More Low-reproduction-rate Science

23andme and other personal genome services make predictions based on research identifying correlations between particular genes and diseases or other traits. However, most published studies in the area can’t (yet) be reproduced:

The review asks (a) whether initial findings have been repeatable and (b) how much we should trust the replication attempts. On the first question, the reviewers found that only 10 of 37 initial findings (about 27%) were repeated when tested a second time. If things were working well, all of the initial findings would have been repeatable. The low replication rate doesn’t mean that the remaining 73% of the initial findings were false. Perhaps the replication attempts were poorly done and all of the initial findings would have held up had the attempts been better done (e.g., with larger samples). Or perhaps the replication attempts were biased toward positive results and none of the initial findings would have held up had the attempts been better done.

The review paper also found that positive replication attempts had much smaller samples (median sample size about 150) than negative replication attempts (median sample size about 380). Since larger samples give a study more statistical power, the better-powered negative replication attempts are more trustworthy than the positive ones, and the true replication rate is probably even lower than the observed 10 in 37.
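A quick power calculation makes the sample-size point concrete. This is an illustrative sketch, not the review's own analysis: the effect size (0.2 standard deviations, typical of small gene-trait associations) and the one-sample z-test framing are assumptions chosen to show how much detection probability differs between the two median sample sizes.

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def approx_power(n, effect=0.2, alpha_z=1.959964):
    """Approximate power of a two-sided z-test at the 5% level
    for a true effect of `effect` standard deviations with n subjects.
    (The negligible lower-tail rejection probability is ignored.)"""
    return norm_cdf(effect * sqrt(n) - alpha_z)

# Median sample sizes from the review: positive vs. negative replications.
for n in (150, 380):
    print(n, round(approx_power(n), 2))
```

For a small effect, the median negative-replication sample (n = 380) has roughly 97% power, while the median positive-replication sample (n = 150) has only about 69%: the larger negative studies were far better positioned to detect a real effect, which is why their failures to replicate carry more weight.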

23andme does a decent job of classifying the strength of evidence, but it probably doesn’t account for the poor replication rate, which may be due to data-mining effects: not fixing hypotheses in advance, or not applying proper multi-hypothesis significance corrections. Failure to do this is little better than fraud, of course. If you lie about the story that led to your publishing a model fit to some data, that’s nearly as bad as completely inventing the data. see also
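The data-mining effect can be shown with a small simulation. This is a minimal sketch, assuming a stylized setup: 1,000 candidate "genes" that are all pure noise, each tested against the same outcome at the conventional 5% level, with and without a Bonferroni correction for the number of hypotheses tested.

```python
import random
from statistics import NormalDist

random.seed(0)
N_GENES = 1000   # candidate genes tested against one trait
N = 100          # subjects per test
ALPHA = 0.05
nd = NormalDist()

def p_value(sample):
    """Two-sided p-value for the mean of an N(0,1) sample being zero."""
    z = (sum(sample) / len(sample)) * len(sample) ** 0.5
    return 2.0 * (1.0 - nd.cdf(abs(z)))

# Every gene is null, so every "discovery" below is a false positive.
pvals = [p_value([random.gauss(0, 1) for _ in range(N)])
         for _ in range(N_GENES)]

raw_hits = sum(p < ALPHA for p in pvals)            # no correction
bonf_hits = sum(p < ALPHA / N_GENES for p in pvals) # Bonferroni correction
print(raw_hits, bonf_hits)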