Replication crisis in psychology

In 2015, a team of psychologists tried to repeat 100 psychological studies and found that only 36% of the original statistically significant results could be replicated.

This study of studies, called the Reproducibility Project, was published in the prestigious journal Science. It made waves in the world of psychology, forcing researchers to question their methods and plunging the entire field into a replication crisis that it's still struggling to emerge from.

Many argued that because psychological studies can’t be reproduced, they must be unreliable.

Recently, another team of researchers decided to test the reproducibility of the Reproducibility Project. As it turned out, even the Reproducibility Project could not be reproduced.

This was bound to happen. Psychology is much more complex than people like to think. After all, it’s an objective study of subjectivity. Quantifying human subjectivity is no piece of cake.

When you’re trying to understand human behaviour, you’re trying to ‘get inside people’s heads’ to see how they think.

In an earlier article titled "Why psychology is a different kind of science", I emphasized that human behaviour is by no means a linear cause-and-effect phenomenon.

Hundreds of variables are involved, and many of them go unaccounted for even when researchers take great pains to create ideal, controlled conditions.
A single behaviour can have multiple causes, and a single stimulus can produce multiple behavioural responses.

On top of this, context plays a major role in understanding human behaviour. In the hard sciences, context can often be safely ignored because phenomena are assumed to follow universal laws that apply in all contexts, so a change in context should not change the results.

It’s this very assumption that places huge importance on replication in science because if something cannot be replicated, it probably isn’t true.

Drawbacks of typical psychological studies

A typical psychological study looks like this:

You collect a sample of human beings that is supposed to represent the general human population and you test your hypothesis on them. If you find that a majority in your sample (say 75%) confirm your hypothesis, then your hypothesis is said to be backed up by strong evidence.

Well, what about the remaining 25%, one might ask?

They’re simply ignored.

It’s not difficult to see what’s wrong with this type of thinking. Anomalies cannot be ignored. They’re as important as the hypothesis that you’re trying to confirm. If 25% did not confirm your hypothesis, why not? Maybe answering that question can provide us with more valuable insights.

Say you have 10 people that you want to conduct an experiment on. You think you have fully ‘controlled’ the conditions that could possibly influence your experiment in any way.

Suppose the experiment is designed to test the effects of positive images, such as flowers, scenery, and good food, on people's moods.

How naïve is it to assume that all your 10 subjects are in a similar, neutral psychological state that you can study and work on? That they’re not already feeling something inside? That they’re not already influenced by something?

Can you afford to ignore the effects their recent life events might be having on their psyche right now?

Say you do ignore all that and go on with your experimentation.

Here’s your result:

7 out of 10 people (70%) reported feeling positive after being exposed to positive images for 5 minutes. Hence, we have strong evidence that exposing yourself to positive images for 5 minutes can improve mood.

But wait.

What about the remaining 30%? Are they from a different planet or what? Why didn't they feel good after being exposed to the positive images?

Let’s call those three people X, Y, and Z. 

Here’s what was going on with them:

X had an argument with his spouse last night and even though he acted cool in front of the researchers, he was feeling slightly upset underneath and so was in no mood to feel positive.

A bunch of friends recently played a prank on Y by gifting her flowers that spurted out pepper powder when she smelled them. When Y was shown the image of flowers, it reminded her of this recent unpleasant event and, as a result, she felt angry instead of positive.

Z hates consumerism and junk food to the core. When he was shown images of burgers and ice-creams, he thought the researchers were promoting consumerism and unhealthy eating, which made him feel hostile.

As for the 7 who did report feeling positive: 3 got a promotion last month, 2 made a successful business deal about a week ago, and 1 got married yesterday. No wonder they were already feeling good. Probably only one of the 7 felt good purely because of the images on the screen.
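As a side note on the numbers in this hypothetical, here's a minimal Python sketch of how much a 7-out-of-10 result can tell us by itself. It rests on an assumption that isn't part of the experiment above: that, if the images had no effect at all, each subject would still have a 50% chance of reporting a positive mood. The function name `binomial_tail` is made up for this illustration.

```python
from math import comb


def binomial_tail(successes: int, n: int, p: float) -> float:
    """Probability of at least `successes` hits in n independent trials,
    where each trial succeeds with probability p."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(successes, n + 1)
    )


# Assumption (not from the experiment): with no effect from the images,
# each subject reports a positive mood with probability 0.5.
p_value = binomial_tail(7, 10, 0.5)
print(f"{p_value:.4f}")  # 0.1719
```

Under that assumed baseline, 7 or more positive reports out of 10 would turn up by chance about 17% of the time, so the raw count alone is far from strong evidence, even before asking what each subject brought into the room.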

This was just a hypothetical example of 10 people. Of course, real sample sizes are rarely this small, and the larger the sample, the more accurate the results are believed to be, since a bigger sample is supposed to iron out these individual discrepancies. Does it really?

More people means more variance in individual psychological make-up. Imagine the degree of complexity that would result when thousands of people participate in a study carrying their own individual emotional baggage.

This is why studies that examine general human behaviour based on evolutionary theory tend to be more accurate than those that examine individual human behaviour.

The general aspects of human behaviour can be tested by empirical data but individual aspects of human behaviour based primarily on past life experiences are hard to test this way.

The central problem (and paradox) in psychology is that every individual is like others and, at the same time, unique. Many studies come up with generalizations that may or may not apply to everyone in every context.

Ignoring the individual psychological make-up of a person is not a good idea because it strongly shapes behaviour.

Does this mean researchers are wasting their time?

Absolutely not. Every study can provide some valuable insight into the human psyche, whether it can be replicated or not. Replication in itself is an important scientific tool but in psychology, we’re dealing with way too many variables.

Statistics is all about generalizations. If researchers want to make generalizations then they’ll need to stick to general human behaviour that is covered by fields like evolutionary psychology, cognitive psychology, neuroscience, and developmental psychology.

At the same time, it’s worth noting that quirks in individual human behaviour are rarely captured by statistical experimental methods.