Federal Reserve economists Andrew Chang and Phillip Li set about researching how many of the results published in top economics journals could be replicated - repeating the study and finding the same results.
They looked at 67 papers in 13 reputable academic journals.
The result is shocking: Without the authors' help, the researchers could independently replicate only a third of the results. Even with the authors' assistance, they could replicate only about half, or 49%.
That leads the researchers to a pretty blunt conclusion: "Because we are able to replicate less than half of the papers in our sample even with help from the authors, we assert that economics research is usually not replicable."
They're not using ancient stuff, either. The papers were all published from 2008 to 2013.
Astonishingly, 49% is actually quite a high rate compared with those of similar studies. One investigation in 2006 looked at more than 100 articles in the Journal of Money, Credit and Banking, and found that only 8% could be replicated.
The replication crisis is already an awkward subject in psychology. In August, a new study suggested that of 100 published results in top psychology journals, only 36% were replicable.
It's a pretty massive issue for economics, especially given the impact that the subject has on public policy. Li and Chang use a well-known paper by Carmen Reinhart and Ken Rogoff as an example. The study showed a significant growth drop-off once a country's national debt reached 90% of gross domestic product, but three years after publication, the study was found to contain a significant Microsoft Excel error that changed the magnitude of the effect.
Major political figures, including the EU's Olli Rehn, cited the research before the error came to light.

Chang and Li said 76% of the economics papers were replicable when the code and data used to find the result were provided to the people replicating the results. Excluding papers for which data were not available, however, cuts the original sample of 67 papers down to just 39.
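To see how those headline rates map onto actual paper counts, here is a rough back-of-the-envelope sketch. Assigning the 76% figure to the 39-paper subsample is an assumption made here for illustration; the paper itself spells out the exact samples behind each rate.

```python
# Back-of-the-envelope check of the rates quoted above. Which percentage
# applies to which sample is an assumption, not stated in this article.
total_papers = 67
with_data = 39  # papers whose data were available to the replicators

replicated_with_help = round(0.49 * total_papers)  # "about half, or 49%"
replicated_with_code = round(0.76 * with_data)     # "76% ... when code and data were provided"

print(replicated_with_help)  # ~33 of 67 papers
print(replicated_with_code)  # ~30 of 39 papers
```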
The investigators even argue that they're being extremely generous with their definition of a successful replication:
If the paper estimates a fiscal multiplier for GDP of 2.0, then any multiplier greater than 1.0 would produce the same qualitative result (i.e., there is a positive multiplier effect and government spending is not merely a transfer or crowding out private investment). We define success using this extremely loose definition to get an upper bound on what the replication success rate could potentially be.
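To make that criterion concrete, here is a minimal Python sketch of the qualitative test the quote describes. The function name and threshold parameter are illustrative, not taken from the paper: a replication "succeeds" if the replicated estimate lands on the same side of the economically meaningful threshold as the original, regardless of magnitude.

```python
def qualitatively_replicates(original: float, replicated: float,
                             threshold: float = 1.0) -> bool:
    """Loose 'success' test in the spirit of Chang and Li's definition.

    A replication counts as successful if the replicated estimate falls
    on the same side of the qualitative threshold as the original,
    whatever its magnitude. For a fiscal multiplier the threshold is 1.0:
    anything above 1.0 implies government spending is not merely a
    transfer or crowding out private investment.
    """
    return (original > threshold) == (replicated > threshold)


# Example from the quote: an original multiplier of 2.0 is "replicated"
# by any estimate above 1.0, even one nearly half its size.
print(qualitatively_replicates(2.0, 1.1))  # True  -> counts as a success
print(qualitatively_replicates(2.0, 0.9))  # False -> replication fails
```

Under this definition, a replication that halves an effect still counts as a success, which is why the authors call their 49% figure an upper bound.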
It's clear that some of the problem is down to bad data storage, disorganisation, and the different varieties of software used in the studies, rather than outright errors, but it's still not a glowing review for the subject.
All in all, it's a pretty dismal result for the dismal science.