Does X, given F, Cause Y?

What a fake sleep study can tell us about research design.

by Florina Sutanto and Arjun Kakkar

This is a simple cause-and-effect model.

We often think in these terms to make sense of the patterns in our lives. In school, you probably came to the conclusion that “doing homework causes me to get good grades”. Sometimes, this relationship is less clear-cut—did your ex really break up with you because of that ugly haircut, or was there something else at play? Either way, it’s a useful framework for evaluating whether something happened because something else happened first.

Researchers take this line of thinking several steps further by expanding this model to account for other, more complex dynamics.

For example, the seemingly straightforward relationship between exercise and sleep quality…

could really be this:

You get the idea.

The process of determining cause-and-effect relationships between variables is called causal inference. It involves using statistical methods to isolate the causal effect of one variable on another: basically, quantitatively identifying how a change in X leads to a corresponding change in Y.

Easier said than done, right? The world is tricky! Convoluted! Think about the thousands of factors that could go into something as simple as a night of bad sleep—age, mood, weather, what you had for dinner, when you had dinner, who pissed you off at work yesterday, a random nightmare, etc. So how do you isolate the effect of one variable when there’s so much noise around it?

Let’s go back to this hypothesis.

A simple way to gauge the effect of exercise on your sleep is by tracking various workout durations and recording how you feel the next morning on a scale from 0 (crappiest night ever) to 1 (best sleep yet), every day for a month. You know that coffee could also affect sleep, so you try to control for it by drinking the same amount each day so that it doesn’t muddle up the experiment.

Here’s that data visualized on a chart.

Your Exercise and Sleep Quality (1 month)

Interesting! Exercise does seem to have an effect on your sleep quality here, and you can see that there’s a general peak indicating your ideal exercise duration. Too little and your sleep quality suffers; too much and—well, you can see it yourself above. (This is a cautionary tale about staying too long on the treadmill.)

So…are these results truly conclusive? Not exactly. 28 data points is a pretty small sample, and the overall trend could look different if you were to collect, say, a thousand more data points. Also, maybe you weren’t too strict with your coffee intake—you needed an extra cup one Monday morning, or you skipped because the line was too long that day—and that’s okay. Your data won’t always be perfect. What this does suggest to us, though, is that there’s probably merit in increasing our sample size. Ideally by a lot.

Research Brain Activated

You’ve deduced that there’s some ideal duration of exercise that leads to your best sleep. Now, in true researcher mode, we want to take it a step further and scale this up: is there an ideal exercise duration for people at large?

To collect information about a larger group of people, you put up survey posters all over your neighborhood, asking people to self-report how long they exercised that day and how they felt the next morning. Since coffee intake is on your radar, you also ask them to report how many cups they drank that day.

Your neighbors heeded your call. The results are in. The raw data reveals…not much of a pattern.

But what happens when we control for coffee intake, this time by grouping each person’s response based on how many cups they drank?

Survey Results (1 Night)

Voilà. We’ve surfaced new insights: people who drank less coffee tended to sleep better at every level of exercise, while those who drank more cups tended to sleep worse, even after exercising. From this, we can infer that coffee does seem to affect the observed relationship between exercise and sleep, making it an important covariate (a related variable that we measure and account for). Plus, the data was pretty easy to collect (a printer and some Scotch tape to the rescue), which was a great bonus.
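To make the "group by coffee" step concrete, here is a minimal sketch of the same idea in Python. Everything in it is an assumption for illustration: the responses are generated by a made-up toy model (not the survey data behind the chart), and the function names `simulate_response` and `mean_sleep_by_exercise` are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def simulate_response():
    """Return one made-up survey response: (exercise minutes, cups, sleep score)."""
    exercise = rng.choice([0, 30, 60, 90, 120, 150, 180])
    coffee = rng.choice([0, 1, 2, 3])
    # Assumed toy model: sleep peaks near 60 minutes and drops with each cup.
    sleep = 0.85 - 0.00002 * (exercise - 60) ** 2 - 0.12 * coffee
    sleep += rng.gauss(0, 0.05)  # person-to-person noise
    return exercise, coffee, min(1.0, max(0.0, sleep))

responses = [simulate_response() for _ in range(400)]

# Bucket sleep scores by (cups of coffee, minutes of exercise).
cells = defaultdict(list)
for exercise, coffee, sleep in responses:
    cells[(coffee, exercise)].append(sleep)

def mean_sleep_by_exercise(coffee_cups):
    """Average sleep score at each exercise level, within one coffee group."""
    return {
        ex: round(mean(scores), 2)
        for (cups, ex), scores in sorted(cells.items())
        if cups == coffee_cups
    }

for cups in (0, 3):
    print(cups, "cups:", mean_sleep_by_exercise(cups))
```

Comparing the two printed rows level by level is the code version of reading separate trend lines off the grouped chart: within each coffee group, the exercise pattern is no longer blurred by differences in coffee intake.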

That being said, there’s still a lot missing here. One: there’s a lot of variation in the data; the points aren’t clustered tightly around the trend lines. Two: since we relied on purely observational data (data that we had no hand in manipulating or influencing, unlike the first experiment, where we held coffee intake fixed), we can’t be certain whether other unmeasured variables are confounding the results. It could be that everyone who drank less coffee that day happened to be less stressed, or got more sunlight, or owned really high-quality beds, any of which could very much affect sleep quality…and render our observation moot.

So how do we deal with this problem of hidden variables?

We Randomize 🎲

Your friends are now amused by how committed you are to this scientific bit. After some convincing and a few bribes, you manage to corral 21 of them into participating in your sleep study for a week—any longer and it’d be difficult to get everyone to stay on board. We randomly assign three to each exercise regimen group and ask them to maintain their daily caffeine intake habits for the entire week.

[Figure: friends randomly assigned to exercise regimen groups, labeled by minutes of exercise]

Randomization is key here because we want to make sure that your friends who are heavy drinkers, or work the same stressful finance job, or live in quieter neighborhoods, aren’t grouped together; it’s important that they’re spread out so we can really isolate the effect of exercise on sleep quality.
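The assignment step itself is simple enough to sketch in a few lines of Python. This is only an illustration: the friend labels are placeholders, the helper `assign_groups` is hypothetical, and the seven regimens (0 to 180 minutes in 30-minute steps) are an assumption inferred from 21 participants split three to a group.

```python
import random

def assign_groups(participants, regimens, per_group, seed=None):
    """Shuffle participants, then deal them out evenly across regimens."""
    if len(participants) != len(regimens) * per_group:
        raise ValueError("participants must fill every group exactly")
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # chance, not the researcher, decides who goes where
    return {
        regimen: shuffled[i * per_group:(i + 1) * per_group]
        for i, regimen in enumerate(regimens)
    }

friends = [f"friend_{i:02d}" for i in range(1, 22)]  # 21 placeholder names
regimens = [0, 30, 60, 90, 120, 150, 180]            # minutes per day (assumed)

groups = assign_groups(friends, regimens, per_group=3, seed=42)
for minutes, members in groups.items():
    print(minutes, members)
```

Because the shuffle ignores everything about each friend, traits like caffeine habits or job stress end up spread across groups by chance rather than lined up with any one regimen.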

And…it’s been a week!

Randomized Trial Results (1 Week)

Turns out the overall trend isn’t so different from our original self-study: sleep quality tends to peak between 30 and 90 minutes of exercise for this particular group of people, while under- or over-exercising can harm it. This is a pretty illuminating outcome, suggesting that there is a consistent underlying mechanism in how exercise and sleep interact. As a last step, how can we verify that this effect isn’t just a fluke caused by random chance or variability in the data?

Enter the hypothesis test.

Let’s compare the results of the 0-minute exercise group and the 90-minute one to see if the difference we observed is statistically meaningful.

We start with the null hypothesis: the default assumption that there is no difference between the two groups.

THERE IS NO DIFFERENCE IN SLEEP QUALITY BETWEEN DOING 0 MINUTES OF EXERCISE AND 90 MINUTES OF EXERCISE

First, we calculate the means of the two groups: 0.195 for the 0-minute group and 0.53 for the 90-minute group. We then use a t-test to compare the difference between the two group means with the variability within each group. The result is a t-value of 10.03, which tells us that the observed difference between the two groups’ averages is about 10 times larger than its standard error, the amount of variation we’d typically expect from sampling noise alone.
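For readers who want to see the arithmetic, here is a rough sketch of a two-sample t statistic in Python. Two caveats: the article doesn’t say which t-test variant was used, so this assumes Welch’s version (which doesn’t require equal variances), and the sleep scores below are invented for illustration, so they won’t reproduce the 10.03 above. The function name `welch_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """t = (mean(b) - mean(a)) / standard error of that difference."""
    # variance() is the sample variance (divides by n - 1).
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / standard_error

# Made-up sleep scores on the 0-to-1 scale, NOT the study's data.
no_exercise = [0.15, 0.22, 0.18, 0.25, 0.20, 0.17]
ninety_minutes = [0.50, 0.58, 0.49, 0.55, 0.52, 0.54]

t_value = welch_t(no_exercise, ninety_minutes)
print(f"t = {t_value:.2f}")
```

The numerator is the gap between group averages; the denominator shrinks as the groups get larger or less noisy. A big t-value means the gap is large relative to the noise.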

Converting this to a p-value shows that there is less than a 0.001% chance of observing a difference at least this large if exercise truly had no effect on sleep quality. Since this probability is minuscule, we can confidently reject the null hypothesis and conclude that, in this study, 90 minutes of exercise has a statistically significant positive effect on sleep quality compared with none.
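One way to build intuition for what a p-value measures is a permutation test, sketched below in Python. To be clear, this is a different technique from the t-distribution conversion described above, swapped in because it needs nothing beyond the standard library. The data is made up, the function name `permutation_p_value` is hypothetical, and with samples this small the result can’t get anywhere near the article’s "less than 0.001%".

```python
import random
from statistics import mean

def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Estimate how often shuffled group labels give a gap at least as large."""
    rng = random.Random(seed)
    observed = abs(mean(b) - mean(a))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # pretend the group labels are meaningless
        fake_a, fake_b = pooled[:len(a)], pooled[len(a):]
        # Small tolerance guards against floating-point ties.
        if abs(mean(fake_b) - mean(fake_a)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate from ever being exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Made-up sleep scores on the 0-to-1 scale, NOT the study's data.
no_exercise = [0.15, 0.22, 0.18, 0.25, 0.20, 0.17]
ninety_minutes = [0.50, 0.58, 0.49, 0.55, 0.52, 0.54]

p_value = permutation_p_value(no_exercise, ninety_minutes)
print(f"p is roughly {p_value:.4f}")
```

If exercise made no difference, shuffling who belongs to which group shouldn’t matter, so the observed gap would look ordinary among the shuffled ones. The p-value is the share of shuffles that match or beat it.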

Nice! We had a hypothesis, probed at it from different angles using various research methods, and came away with a deeper understanding than where we started. That’s science, baby.

What Happens in the Real World?

Okay, using simulated data made the whole process easier to explain, but I admit this setup was a liiiittle idealistic (although exercise really does help with sleep IRL). The real world is infinitely more complex, full of variables lurking beneath the surface that are often mathematically difficult to account for. We can dig in pretty deep, but we’ll typically end up hitting worldly limitations sooner or later: reminders that our grasp of truth is partial and provisional, shaped as much by our human perspective as by the data itself.

The methodologies we choose to answer our questions (sometimes as simple as writing down the patterns we notice, other times as complex as designing decades-long studies) are all strategies we’ve developed to keep pushing against those limits. They don’t always make the noise disappear, but they help us separate the signal from the chaos just enough to move knowledge forward.

Sometimes this work leads us nowhere. Sometimes it baffles us even more instead of elucidating the answer. But it’s still a worthwhile endeavor, perhaps the most worthwhile of them all for us: it follows through on the instinct to ask why. We ask if smoking causes lung cancer to help people make more informed choices. We ask how we can make crops more resilient, how we can cure illnesses, how events that happened long ago may help us understand our future. We ask why the world still spins. It’s the question that propels us forward, and so it’s essential that we try to parse out the answer as best as we can.
