A conversation - Where do ecology and evolution stand in the broader ‘reproducibility crisis’ of science?

By Tim Parker | January 31, 2018

[This post was originally published on ecoevotransparency.org]

In this post, I float some ideas that I’ve had about the ‘reproducibility crisis’ as it is emerging in ecology and evolutionary biology, and how this emergence may or may not differ from what is happening in other disciplines, in particular psychology. Two other experts on this topic (Fiona Fidler and David Mellor) respond to my ideas, and propose some different ideas as well. This process has led me to reject some of the ideas I proposed, and has led me to what I think is a better understanding of the similarities (and differences) among disciplines.

Here’s why my co-authors are experts on this topic (more so than I am):

Fiona’s PhD thesis was about explaining disciplinary differences between psychology, ecology and medicine in their responses to criticism of null hypothesis significance testing, and she’s been interacting with researchers from multiple disciplines for 20 years. She often works closely with ecologists, but she has the benefit of an outsider’s perspective.

David has a PhD in behavioral ecology, and now works at the Center for Open Science, interacting on a daily basis with researchers from a wide range of disciplines as journals and other institutions adopt transparency standards.

TP: Several years ago, Shinichi Nakagawa and I wrote a short opinion piece arguing that ecology and evolutionary biology should look to other disciplines for ideas to reduce bias and improve the reliability of our published literature. We had become convinced that bias was common in the literature. Evidence of bias was stacking up in other disciplines as well, and the risk factors in those disciplines seemed widespread in ecology and evolution. People in those other disciplines were responding with action. In psychology, these actions included new editorial policies in journals and major efforts to directly assess reproducibility with large-scale replications of published studies. Shinichi and I were hoping to see something similar happen in ecology and evolutionary biology.

To an important extent ecologists and evolutionary biologists have begun to realize there is a problem, and they have started taking action. Back in 2010, several journals announced they would require authors to publicly archive the data behind their reported results. This wasn’t a direct response to concerns about bias, but it was an important step towards ecologists and evolutionary biologists accepting the importance of transparency. In 2015 representatives from about 30 major journals in ecology and evolutionary biology joined advocates for increased transparency to discuss strategies for reducing bias. From this workshop emerged a consensus that the recently-introduced TOP (Transparency and Openness Promotion) guidelines would be a practical way to help eco-evo journals implement transparency standards. Another outcome was TTEE (Tools for Transparency in Ecology and Evolution), which were designed to help journals in ecology and evolutionary biology implement TOP guidelines. A number of journals published editorials stating their commitment to TOP. Many of these journals have now also updated their editorial policies and instructions to authors to match their stated commitments to transparency. A few pioneering journals, such as Conservation Biology, have instituted more dramatic changes to ensure, to the extent possible, that authors are fully transparent regarding their reporting. A handful of other papers have also been published, reviewing evidence of bias or making recommendations for individual or institutional action.

Despite this long list of steps towards transparency, it seems to me that the groundswell seen in psychology has not yet transpired in ecology and evolution. For instance, only one ecology or evolution journal (BMC Ecology) has yet adopted registered reports (the most rigorous way to reduce bias on the part of both authors and journals), and there has been only one attempt to pursue a major multi-study replication effort and it has not yet gained major funding.

FF: At this point I feel the need to add that what Tim wrote above does describe an incredible amount of action in a short time in the disciplines of ecology and evolution. It might be harder to see the change when you’ve made it yourself 🙂.

TP: I agree that there have been important changes, but it seems to me that many ecologists and evolutionary biologists remain unconvinced or unaware of the types of problems that led Shinichi and me to try to kick-start this movement in the first place. A few months ago the Dynamic Ecology blog conducted an informal survey asking “What kind of scientific crisis is the field of ecology having?” Only about a quarter of those voting were convinced that ecology was having a crisis, and only about 40% of respondents thought a reproducibility crisis was the sort of crisis ecology was having or was most likely to have in the future. So ecologists (at least those who fill out surveys on the Dynamic Ecology blog) aren’t convinced there is a crisis, and even if there is a crisis, they’re not convinced that it’s in the form of the ‘reproducibility crisis’ discussed so much recently in psychology, medicine, economics, and some other disciplines. Of course, not everyone in psychology thinks there’s a crisis either, but my sense is that the notion of a crisis is much more widely accepted there.

So why aren’t ecologists and evolutionary biologists more concerned? We’ve got the risk factors for a reproducibility crisis in abundance. What’s different about perceptions in ecology and evolutionary biology? I don’t claim to know, but I entertain several hypotheses below.

It seems highly plausible to me that many in ecology and evolution have simply not seen or appreciated the evidence needed to convince them that there is a problem. In psychology, one of the catalysts of the ‘crisis’ was the publication of an article in a respected journal claiming to have evidence that people could see into the future. The unintended outcome of this article, the conclusions of which were largely rejected by the field, was that many researchers in psychology realized that false results could emerge from standard research practices, and this was unsettling to many. In ecology and evolution, we haven’t experienced this sort of wake-up call.

DM: I think that was a huge wake-up call: something so unlikely could be presented using the same standard techniques that every study used. In eco/evo, the inherent plausibility of a claim (dare I say, our priors) may be more difficult to judge, so a wild claim presented with flimsy evidence is not as easily spotted as being so wild.

However, I think a major underlying cause is the lack of value given to direct replication studies. Direct replications are the sad workhorse of science: they’re the best way to judge the credibility of a finding but virtually no credit is given for conducting them (and good luck trying to get one funded!). I think that a subset of psychological research was fairly easy to replicate using inexpensive study designs (e.g., undergraduate or online research participants), and so some wild findings were somewhat easy to check with new data collection.

In ecology, there are certainly some datasets that can be fairly easily re-collected, but maybe not as many. Furthermore, I sense that ecologists have an easier time attributing a “failure to replicate” to either 1) as-yet-unknown moderating variables or 2) simple environmental change (in field studies). So skepticism toward published claims may be less sharp.

FF: At the moment, my research group is analysing data from a survey we did of over 400 ecology and evolution researchers, asking what they think about the role of replication in science. So far our results suggest that the vast majority of researchers think replication is very important. We’ve been a bit surprised by the results. We were expecting many more researchers to be dismissive of direct replication in particular, or to argue that it wasn’t possible or applicable in ecology. But in our survey sample, that wasn’t a mainstream view. Of course, it’s hard to reconcile this with the virtual non-existence of direct replication in the literature. We can really only explain the discrepancy by appealing to institutional norms (e.g., editorial and grant policies) and cultural norms (e.g., what we believe gets us promoted). In ecology, neither has been broken to the extent that it has in psychology, despite individual researchers having sound intuitions about the importance of replication.

TP: Another possibility to explain why so many ecologists and evolutionary biologists remain unconvinced that there is a replication crisis is that bias may actually be less widespread in ecology and evolutionary biology than in psychology. Let me be clear. The evidence that bias is a serious problem in ecology and evolutionary biology is compelling. However, this bias may be less intense on average than in psychology, and it may be that bias varies more among sub-disciplines within eco-evo, so there may be some ecologists and evolutionary biologists who can, with good reason, be confident in the conclusions drawn in their subdiscipline.

FF: Hmm, I think it’s more likely that psychologists are simply more accepting that bias is a real thing that’s everywhere, because they are psychologists and many study bias as their day job.

TP: OK, I buy that psychologists may be more open to the existence of bias because it’s one of the things psychologists study. However, I’d like to at least consider some possible differences in bias and some other differences in perception of bias.

For instance, maybe in subdisciplines where researchers begin with strong a priori hypotheses, they are more likely to use their ‘researcher degrees of freedom’ to explore their data until they find patterns consistent with their hypothesis. This is a seriously ironic possibility, but one I’ve warmed to. The relevant flip side to this is that many researchers in ecology and evolution (though I think more often in ecology) often conduct exploratory studies where they have no reason to expect or hope for one result over another, and readily acknowledge the absence of strong a priori hypotheses. This could lead to less bias in reporting, therefore greater reliability of literature, and more of a sense that the literature is reliable. I should point out, though, that bias can still emerge in the absence of a priori hypotheses if researchers are not transparent about the full set of analyses they conduct, and I know this happens at least some of the time.
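One way to see how ‘researcher degrees of freedom’ can generate bias even when there is no real effect is a small simulation. The Python sketch below is illustrative only and rests on assumptions not taken from this conversation: two groups of 20, several independent outcome variables with no true difference between groups, and a z-test that treats the standard deviation as known (1.0). A study ‘finds’ something if any outcome reaches p < 0.05. For k independent tests the expected false positive rate is 1 - (1 - 0.05)^k, and the simulated rates should sit close to that value. All function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def z_test_p(a, b, sd=1.0):
    """Two-sided p-value for a difference in means, assuming a known SD (a simplification)."""
    se = sd * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def flexible_study(rng, n_outcomes, n_per_group=20, alpha=0.05):
    """One study with NO true effect: several independent outcomes are tested,
    and the study 'finds' something if any single test is significant."""
    for _ in range(n_outcomes):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if z_test_p(a, b) < alpha:
            return True  # stop at the first 'significant' outcome and report it
    return False


def false_positive_rate(n_outcomes, n_studies=2000, seed=1):
    """Share of simulated null studies that report a significant result."""
    rng = random.Random(seed)
    hits = sum(flexible_study(rng, n_outcomes) for _ in range(n_studies))
    return hits / n_studies


if __name__ == "__main__":
    for k in (1, 3, 5, 10):
        analytic = 1 - (1 - 0.05) ** k
        print(f"{k:2d} outcomes: simulated {false_positive_rate(k):.3f}, analytic {analytic:.3f}")
```

Under these assumptions, measuring ten outcomes instead of one raises the chance of reporting a spurious effect from about 5% to about 40%. In real analyses the alternative choices (covariates, outlier rules, subsets) are usually correlated, which tends to make the inflation smaller than in this independent-outcome case, though still above the nominal 5%.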

FF: So there are two claims. First, that if you have strong a priori hypotheses you might be more likely to use researcher degrees of freedom. This certainly seems plausible. You really want your hypotheses to be true, so you’re more inclined to make it so. Second, researchers in ecology and evolution are less likely to have strong a priori hypotheses than researchers in psychology. The latter is a disciplinary difference I just don’t see, but it’s an empirical question. It’s a great sociology of science question.

TP: Well, I like empirical questions, and I’d certainly like to know the answer to that one.

Moving on to throw out yet another hypothesis, it is my relatively uninformed perception that there is probably much more heterogeneity in methods across ecology and evolutionary biology than across psychology. If some methods present fewer ‘researcher degrees of freedom’, then bias may be less likely in some cases.

FF: This reminds me of older attempts to demonstrate grand differences between the disciplines. For example, there’s a common perception that the difference between hard and soft sciences is that physics, etc., are more cumulative than psychology and the behavioural sciences. But attempts to pin this down, such as one by Larry Hedges, show there are more similarities than differences. I’m generally pretty skeptical about attributing differences in research practice to inherent properties of what we study. They usually turn out to be explained by more mundane institutional and social factors.

TP: Well, this next idea is subject to the same critique, but I’ll present it anyway. Statistical methods may be much more heterogeneous across sub-disciplines, and even across studies within subdisciplines of ecology and evolution. This could mean that some researchers are conducting analyses in ways that are actually less susceptible to bias. It could also mean that researchers fail to recognize the risks of bias in whatever method they are using because they focus on the differences between their method and other more widespread methods. In other words, many ecologists and evolutionary biologists may believe that they are not at risk of bias, even if they are.

FF: If you look at very particular sub-fields you may well find differences, but my bet is these can be explained by the cultural norms of a small group of individuals (e.g., the practices in particular labs that have a shared academic lineage).

TP: There certainly are some sub-disciplines where a given stats practice has become the norm, such as demographers studying patterns of survival by comparing and averaging candidate models using AIC and the ‘information theoretic’ approach. I’m not prepared to say how common this sort of sub-field standardization is, however.

Again, on to another hypothesis. Some ecologists and evolutionary biologists test hypotheses that are likely to be true, and some test hypotheses that are unlikely to be true. It is not widely recognized, but it is easily shown that testing unlikely hypotheses leads to a much higher proportion of observed relationships being due to chance (when real signal is rare, most patterns are just due to noise). It may be that unlikely hypotheses are more common in psychology, and thus their false positive rate is higher on average than what we experience in ecology and evolutionary biology. I strongly suspect that the likelihood of hypotheses varies a good bit across ecology and evolutionary biology, but certainly if you’re in a subdiscipline that mostly tests likely hypotheses, it would be reasonable to have more confidence in that published literature.
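The ‘easily shown’ claim can be written down directly. If a fraction π of tested hypotheses are true, tests have power 1 - β, and the significance threshold is α, then among statistically significant results the expected share that are false positives is α(1 - π) / [α(1 - π) + (1 - β)π]. A minimal Python sketch follows; the defaults of power = 0.8 and α = 0.05 are conventional assumptions, not values from the conversation, and the function name is hypothetical.

```python
def false_share_of_positives(prior_true, power=0.8, alpha=0.05):
    """Expected fraction of significant results that are false positives.

    prior_true: assumed proportion of tested hypotheses that are actually true.
    power:      assumed probability of detecting a true effect.
    alpha:      significance threshold (the false positive rate per null test).
    """
    true_positives = power * prior_true
    false_positives = alpha * (1 - prior_true)
    return false_positives / (true_positives + false_positives)


for prior in (0.5, 0.1, 0.01):
    print(f"prior {prior:>4}: {false_share_of_positives(prior):.0%} of positives are false")
```

With these assumptions, roughly 6% of positive results are false when half of the tested hypotheses are true, about 36% when one in ten is true, and about 86% when one in a hundred is true, even with no bias at all.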

FF: I don’t really know what to say about this. It could be that better researchers test more hypotheses that are likely. Or maybe not. Maybe crummy researchers do, because they just go for low-hanging fruit. I concede that the a priori likelihood of a hypothesis being true would definitely be correlated with something, but not that it would be a property of a discipline.

TP: Well, I’m not quite done with my ‘property of a discipline’ hypotheses, so here’s another. In some subfields of psychology, conducting a publishable study requires substantially less work than in many subfields of ecology and evolutionary biology. For instance, as David mentioned earlier, papers in psychology are sometimes based on answers to a few hundred surveys administered to undergraduate students (a resource that’s not in short supply in a university). If studies are easy to come by, then opting not to publish (leaving a result in the proverbial file drawer) is much cheaper. In eco/evo, gathering a comparable amount of data might take years and lots of money, so it’s not so easy to just abandon an ‘uninteresting’ result and go out and gather new data instead.

FF: It’s not clear to me how big the file drawer problem is in any discipline. To be clear, I’m not saying publication bias isn’t a problem. We know it is. But are whole studies really left in file drawers, or are they cherry-picked and p-hacked back into the literature? There is a little less publication bias in ecology (~74% of papers publish ‘positive’ results compared to psychology’s ~92%), but ecology probably also has slightly lower statistical power, which by itself would reduce the share of positive results. Tim’s explanation is not implausible, but I doubt we currently have enough evidence to say either way.

TP: As David mentioned briefly above, in ecology and evolutionary biology, dramatic differences among study systems (different species, different ecosystems, even stochastic or directional change over time in the ‘same’ system) make it easy to believe that differences in results among studies are due to meaningful biological differences among these studies. It seems that we do not take the inevitability of sampling error seriously, and thus rarely consider the fact that many reported findings will be wrong (even WITHOUT the bias that we know is there and that should be elevating the rate of incorrect findings).
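The point about sampling error can be illustrated with a Python sketch. It assumes values chosen only for illustration (a true standardized effect of 0.3, 30 individuals per group, a known SD of 1, and a two-sided z-test at α = 0.05) and repeats the identical study on the identical ‘system’, so every disagreement among the simulated studies is sampling error by construction. Function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def expected_power(true_effect=0.3, n_per_group=30, alpha=0.05):
    """Analytic two-sided power of a z-test for a difference in means (SD assumed = 1)."""
    se = (2 / n_per_group) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = true_effect / se
    return (1 - NormalDist().cdf(z_crit - shift)) + NormalDist().cdf(-z_crit - shift)


def run_identical_studies(true_effect=0.3, n_per_group=30, n_studies=20, alpha=0.05, seed=42):
    """Repeat the same study many times on the same 'system'; only sampling error differs.
    Returns a list of (estimated effect, significant?) pairs."""
    rng = random.Random(seed)
    se = (2 / n_per_group) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    results = []
    for _ in range(n_studies):
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1) for _ in range(n_per_group)]
        estimate = fmean(treated) - fmean(control)
        results.append((estimate, abs(estimate / se) > z_crit))
    return results


if __name__ == "__main__":
    results = run_identical_studies()
    estimates = [est for est, _ in results]
    n_sig = sum(sig for _, sig in results)
    print(f"expected power: {expected_power():.2f}")
    print(f"{n_sig} of {len(results)} identical studies significant; "
          f"estimates range from {min(estimates):.2f} to {max(estimates):.2f}")
```

With these assumptions the analytic power is only about 21%, so roughly four in five exact repeats of a study of a real effect would be non-significant, and the individual estimates spread well above and below 0.3. None of that variation reflects biology.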

DM: This is related to the fact that in ecology and evolutionary biology, there’s no culture of direct replication. If most studies are conducted just once, there’s no reliable way to assess their credibility. If a study is replicated, it’s usually couched as a conceptual replication with known differences in the study. That new twist is the intellectual progeny of the author. If the results aren’t the same as the original, chalk it up to whatever those differences were. However, direct replications, where the expectation is for similar results, are the best way to assess credibility empirically.

This lack of direct replication has led to plausible deniability that there is any problem. And since there is no perceived problem, there is no need to empirically look for a problem (only a real troublemaker would do that!).

TP: We are clearly in agreement here, David. Now we just need to figure out how to establish some better institutional incentives for replication.

While we’re planning that, I’ll throw out my last hypothesis, which, if right, would mean that all my other hypotheses were largely unnecessary. Psychology is a much larger discipline than ecology and evolutionary biology. Because of this, it may be that the number of people actively working to promote transparency in psychology is larger in absolute terms, but represents a similar proportion of researchers as in ecology and evolutionary biology.

FF: This seems very likely to me, and also something we should calculate sometime.

What I found in my PhD research on attempts to reform statistical practices through the 1970s-2000s (i.e., to get rid of Null Hypothesis Significance Testing) was that medicine banned it (and it snuck back in), psychology showed some progress, and ecology was behind at that time. But almost all disciplinary differences turn out to be institutional and social/cultural, rather than an inherent property of studying that particular science.

This scientific reform about reproducibility differs from the NHST one because the main players are much more aware of best practices for behaviour change. The NHST reform was led by cranky old men (almost exclusively!) writing cranky articles that often insulted researchers’ intelligence and motives. This new reform has by and large been led by people who know how to motivate change. (There are some early exceptions here.) Psychologists should be ahead of this game, given their core business.

DM: I think psychologists are certainly aware of bias, but ecologists are too. I suspect that a missing element is one of those outstanding claims that deserves to be checked. Results that seem “too good to be true” probably are, and identifying those will likely be the first step to assessing credibility of a field’s body of work through direct replication.

TP: Thanks to Fiona and David for engaging in this discussion. Here are some brief take-homes:

- It may be that psychologists are NOT considerably more concerned about the replication crisis than ecologists and evolutionary biologists are. Instead, the much larger number of psychology researchers may mean that there are more concerned psychologists only in absolute numbers, with similar proportions in each discipline.
- To the extent that psychologists may have greater concern about reproducibility, much of this may be attributable to a single major event in psychology in which a result widely believed to be false was derived through common research practices and published in a respectable journal. It may also be that psychologists tend to be more comfortable with the idea that they have biases that could influence their research.
- Ecologists may recognize the value of replication, but they use replication to assess the validity of earlier conclusions too rarely for them to have observed low rates of replicability.
- Some of the other ideas we discussed above may be worth empirical exploration, but we should be aware that hypotheses rooted in fundamental differences between disciplines have often not been strongly supported in the past.

[The opinions expressed in this blog post are those of the authors and are not necessarily endorsed by SORTEE.]
