Ecologists and evolutionary biologists can and should pre-register their research

By Tim Parker | October 26, 2017

[This post was originally posted on ecoevotransparency.org]

I wrote a draft of this post a few weeks ago, and now seems like a good time for it to see the light of day given the great new pre-print just posted on OSF Preprints by Brian Nosek, David Mellor, and co-authors. They describe the utility of pre-registration across a variety of circumstances. I do something similar here, though I focus on ecology and evolutionary biology and I don’t try to be as thorough as Nosek et al. For greater depth of analysis, check out their paper. On to my post.

Transparency initiatives are gaining traction in ecology and evolutionary biology. Some of these initiatives have become familiar - data archiving is quickly becoming business as usual - though others are still rare and strange to most of us. Pre-registration is squarely in this second category. Although I know a number of ecologists / evolutionary biologists who are starting to pre-register their work (and I’ve participated in a few pre-registrations myself), I would guess that most eco/evo folks don’t even know what pre-registration is, and many who do know probably wonder if it would even be worth doing. My goals here are to explain what pre-registration is, why it’s useful, and why most ecologists and evolutionary biologists could be using it on a regular basis.

What is pre-registration?

At its most thorough, a pre-registration involves archiving a hypothesis and a detailed study design, including a data analysis plan, prior to gathering data. However, as you’ll read below, the data analysis plan is typically the core element of a useful pre-registration, and a pre-registration can happen after data gathering as long as the analysis plan is declared without knowledge of the outcome of the analysis or its alternatives. Pre-registrations are archived in a public registry (the Open Science Framework, OSF, for example) so that they can later be compared to the analysis that is ultimately conducted. Depending on the pre-registration archive, the pre-registration may be embargoed to maintain confidentiality of a research plan until it is completed. Once a pre-registration is filed, it cannot be edited, though it could potentially be updated with further pre-registrations. When a pre-registered study is published, the paper should cite (or better yet, link to) the pre-registration to show the extent to which the plan was followed.

Why is pre-registration a useful component of transparency?

People (including all of us) are worryingly good at filtering available evidence so that they end up seeing the world that they expect to see rather than the world as it actually is. In other circumstances, after noticing a pattern, we readily convince ourselves that we predicted (or would have predicted) that particular outcome. All the while, we fool ourselves into believing we’re being unbiased. Science is all about avoiding these biases and taking honest stock of available evidence, but in the absence of adequate safeguards, there is good evidence that scientists can fall prey to cognitive biases (for a striking example, see van Wilgenburg and Elgar 2013). Pre-registration is one of a number of tools that helps scientists take a clear-eyed view of evidence, and it helps those of us reading scientific papers to identify evidence that is less likely to have been run through a biased filter. When scientists fiddle with analyses and can see how that fiddling impacts results, there is a great temptation to choose the analyses that produce the most desirable outcome. If this biased subset of results gets published and other results go unreported, we get a biased understanding of the world. In my ignorant past I’ve conducted and presented analyses this way, and nearly every other ecologist and evolutionary biologist I’ve talked to about this admits to doing this sort of thing at least once. For this and other reasons (Fidler et al. 2016, Parker et al. 2016), I think this problem is common enough to reduce the average reliability of the published literature. Pre-registration could improve average reliability of this literature and help us identify papers that are less likely to be biased.

Why is pre-registration a viable tool for ecologists and evolutionary biologists?

I have written this section as a series of hypothetical concerns or questions from ecologists or evolutionary biologists, followed by responses to those concerns / questions.

I work in the field and I have to refine my methods, or even my questions, over weeks or months through trial and error?

You can pre-register after your methods are finalized. When starting work in a new system or with a new method, you generally won’t be ready to complete a particularly useful pre-registration until you’ve gotten your hands dirty. You’ll need to figure out what works and what doesn’t work through trial and error. Unless you have excellent guidance from experts in the system / method, you probably want to hold off finalizing your pre-registration until you’ve been in the field and landed on a method that works. It would still be good to think long and hard about the project before heading to the field. Develop as detailed a methodological plan as is reasonable (in many cases, you’ll have done this already at the proposal stage) and talk to a statistician to develop a tentative analysis plan. Once you’ve begun to implement a set of methods you feel good about, then complete your pre-registration.

What if I have to change my methods part way through the project?

Of course, even if you go through the trouble of field testing your methods before finalizing your pre-registration, things still might change. You might come back a second year to find that conditions demand a revised protocol. If you have to scrap your first year’s data because you can’t continue, then you probably want to create an entirely new pre-registration based on your new methods. On the other hand, if your data from last year are still usable and you’ve just had to make modest changes, then you have some choices. You could just wait until you write the manuscript to explain why your data gathering methods changed, or you could file a new pre-registration that acknowledges (and links to) the earlier protocol but also introduces the new methods. The old protocol won’t disappear, but the evolution of your project is now transparent.

I work with existing data (e.g., from long-term projects, from existing citizen science projects, from my own metaphorical file drawer, for meta-analysis, etc.), so I can’t pre-register prior to data gathering?

Pre-registration can be useful at any point before you start to examine your data for biologically relevant patterns, either through examining data plots or through initial statistical analyses. If you haven’t peeked at the data yet, go for it. Pre-register a detailed analysis plan.

What if I see patterns in my data that I want to follow up on with analyses that I didn’t pre-register?

Not a problem. Just distinguish your post hoc analyses from your pre-registered analyses in your paper. Ideally you’d also report all your post hoc exploration and declare that you have done so. If you have too many to report in your paper, present them in supplementary material or even in a data repository.

I focus on discovery. I don’t typically have a priori hypotheses when I start a project?

Pre-registration can still be for you. The primary purpose of pre-registration is to promote transparency. Exploratory work is vital. We just want to know that we are not being shown a biased subset of your exploratory outcomes. Thus if you have a study and analysis plan, you pre-register it, and then present results from the full set of analyses you described in your pre-registration, we know we are not getting a biased subset.

I don’t develop an analysis plan until I have my data so that I can see how they are distributed and how viable different modeling alternatives are with the real data?

There are several options here. You could develop a decision tree that anticipates modeling decisions you will need to make and lays out criteria for making those decisions. Other options include working with some form of your actual data in a trial phase. For instance, you could sacrifice a portion of your data for model exploration, select a set of models to test, pre-register those, and then assess them with your remaining (unexplored) data. Alternatively you could scramble your full data set, or add some sort of noise, refine your analysis plan with these ‘fake’ data, then pre-register and re-run the analysis with the real data.
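The hold-out and scrambling options described above could be sketched roughly as follows. This is only a minimal illustration in Python, not a prescribed workflow: the function names, the 30% exploration fraction, the fixed random seeds, and the nest data are all hypothetical, and the sketch covers scrambling only, not the option of adding noise. Real projects would need to think about how to split or scramble data that are grouped or otherwise non-independent.

```python
import random


def split_for_exploration(rows, explore_fraction=0.3, seed=1):
    """Randomly split rows into an exploration set and a held-back set."""
    rng = random.Random(seed)      # fixed seed so the split can be reproduced
    shuffled = list(rows)          # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_explore = round(len(shuffled) * explore_fraction)
    return shuffled[:n_explore], shuffled[n_explore:]


def scramble_column(rows, column, seed=1):
    """Return copies of rows with one column's values permuted across rows."""
    rng = random.Random(seed)
    values = [row[column] for row in rows]
    rng.shuffle(values)            # breaks any real link to the other columns
    return [{**row, column: value} for row, value in zip(rows, values)]


# Hypothetical data set: 20 nests, two treatments, one response variable.
nests = [
    {"nest_id": i,
     "treatment": "fed" if i % 2 == 0 else "control",
     "clutch_size": 3 + i % 4}
    for i in range(20)
]

# Option 1: sacrifice part of the data for model exploration, pre-register
# the chosen models, then assess them with the unexplored remainder.
explore, confirm = split_for_exploration(nests, explore_fraction=0.3, seed=42)

# Option 2: refine the analysis plan on scrambled data, pre-register,
# then re-run the same analysis on the real data.
fake = scramble_column(nests, "clutch_size", seed=42)

print(len(explore), "rows for exploration;", len(confirm), "rows held back")
```

Recording the seed and the split rule in the pre-registration itself would let a reader confirm which rows were explored before the plan was filed.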

I don’t want to develop a detailed analysis plan. There are too many unforeseen circumstances and I’m bound to ultimately deviate from my plan?

I have two responses to this concern. The first is to see my previous reply: there are ways to pre-register after you have your data and have confirmed that an analysis is likely to be appropriate with your data. My second point is that, just as field methods change in response to circumstances, so do statistical methods. A pre-registration doesn’t prevent us from changing an analysis; it just helps us be transparent about these changes. Among other things, this transparency probably helps us make sure that when we do change our plan, we’re doing so for a good reason.

If I can just pre-register an analysis plan after collecting my data, why should I bother to pre-register the other portions of my study methods?

Although I think it’s much better to pre-register an analysis plan than to not pre-register at all, pre-registering the whole study design is helpful for a variety of reasons. For one, pre-registering prior to completion of data gathering (or better yet, before data gathering) helps make it clear that your pre-registered analysis plan could not have been influenced by any knowledge (conscious or unconscious) about patterns in the data. Early pre-registration also facilitates transparency about the project as a whole. Later, when you publish the results, other researchers can understand the scope of your work and can (hopefully) be shown that you’re not just publishing a subset (potentially a biased subset) of the project. And if you never publish your work, then your pre-registration is evidence that someone at least considered doing this project at some point, and this could be useful information to other researchers down the line. A well-executed pre-registration might also help set expectations for the role of individual collaborators.

Pre-registration is just extra work?

In most cases, pre-registration should not dramatically change workload. If you’ve written a grant proposal, much of the work of pre-registration will already be done. If your grant proposal doesn’t include a detailed analysis plan, presumably the manuscript you write to report your results will include a detailed explanation of your analytic methods, and so a pre-registration just shifts the timing of this writing. Likewise, if this isn’t grant funded research, some other parts of your methods, and presumably parts of your introduction, will be ready and waiting in draft form when you complete your pre-registered study and go to write it up. To the extent that you end up writing more about your analyses in a pre-registration than you would have in a paper that reported only a subset of your analyses, this is the price for doing transparent and reliable science. You should have been reporting all this information somewhere anyway.

If I pre-register, I might be scooped?

You can embargo your pre-registration so that it’s private until you choose to share it. Pre-registrations on the site AsPredicted can remain private indefinitely. On the OSF, embargos are limited to four years.

I’m a student just starting a project and so I don’t know enough about my system to pre-register?

If you’re mentored by someone familiar with this system, then you’ll want to work closely with your mentor to develop your pre-registration. If this isn’t possible, read through my suggestions above. There are various paths forward, from waiting until you’ve worked out the kinks in your methods to various ways of pre-registering after you have data. Think carefully and identify the path that’s best for you.

If you have other concerns or questions about how you could apply pre-registration to your work, I’d love to hear about them. Let’s have a discussion.

Not all work needs to be pre-registered, but most work could be pre-registered. And this is important because pre-registration will help ecologists and evolutionary biologists improve transparency and thus, I expect, reduce bias in a wide array of circumstances.

[The opinions expressed in this blog post are those of the authors and are not necessarily endorsed by SORTEE.]