On the Persistent Problem of Low Sample Size
As a kid I used to watch the cartoon Popeye the Sailor. If there is one thing you remember about Popeye, it’s his love of spinach. In almost every episode Popeye reached a point where he would be knocked out or near defeat. You would think it was over for Popeye, but then something magical would happen.
Popeye would open a can of spinach, consume its contents, and save the day. If you want to see what I am talking about, there is a five-minute compilation video on YouTube of Popeye’s spinach binges in all their glory.
As a kid, the takeaway from such a cartoon was simple: Eat spinach and you will get muscles. Eat spinach and you will be strong.
I never questioned this logic as a child, but later in life I wondered why the creators of Popeye chose spinach as Popeye’s wonder food. Why not potatoes or chicken or something else? It was only after reading The Half-Life of Facts by Samuel Arbesman that I found my answer:
Back in 1870, Erich von Wolf, a German chemist, examined the amount of iron within spinach, among many other green vegetables. In recording his findings, von Wolf accidentally misplaced a decimal point when transcribing data from his notebook, changing the iron content in spinach by an order of magnitude. While there are actually only 3.5 milligrams of iron in a 100-gram serving of spinach, the accepted fact became 35 milligrams. To put this in perspective, if the calculation were correct each 100-gram serving would be like eating a small piece of a paper clip.
Once this incorrect number was printed, spinach’s nutritional value became legendary. So when Popeye was created, studio executives recommended he eat spinach for his strength, due to its vaunted health properties. Apparently Popeye helped increase American consumption of spinach by a third!
Looking back now it’s funny that a cartoon character changed the spinach consumption of an entire generation because of one mistaken study. One misplaced decimal led to thousands of American children opening up cans of spinach in hopes of getting strong like Popeye.
Though this mistake was probably a net benefit for society, it illustrates the problem of small sample sizes and how they can lead us astray. In statistics, the letter “N” commonly refers to the number of observations used in a study. When it comes to the study that informed Popeye’s love for spinach, N equals one.
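To see just how unreliable an N of one can be, here is a quick Python sketch. The iron measurements are simulated with made-up noise; the "true" mean and noise level below are assumptions for illustration, not real lab data:

```python
import random

random.seed(42)

# Illustrative values only: a "true" iron content for spinach
# (mg per 100g) plus invented measurement noise.
TRUE_MEAN = 3.5
NOISE_SD = 1.0

def estimate_iron(n_samples):
    """Average n noisy measurements of the true iron content."""
    measurements = [random.gauss(TRUE_MEAN, NOISE_SD) for _ in range(n_samples)]
    return sum(measurements) / n_samples

# A single measurement (N = 1) can land far from the truth,
# while larger samples cluster tightly around the true mean.
for n in (1, 10, 1000):
    estimates = [round(estimate_iron(n), 2) for _ in range(5)]
    print(f"N = {n}: {estimates}")
```

Run it a few times with different seeds and the N = 1 estimates will swing wildly while the N = 1,000 estimates barely move. That, in miniature, is why a single observation makes for shaky evidence.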
In this post, I plan on tackling the topic of small sample sizes, how they lead us astray, and why the most important decisions in our lives are biased by them. Let’s begin.
Why None of Our Ancestors Were Statisticians
It’s easy to poke fun at humans for their reliance on small sample sizes when making decisions, but this criticism fails to recognize our ancestral environment. We evolved in a world where making sure you had a sufficient sample size before making a decision was not necessary to pass on your genes. In fact, waiting for a sufficient sample size would most certainly lead to the opposite result—your extinction.
Imagine you visit a new watering hole for a drink and find yourself sick to your stomach a few hours later. Most people would never visit that watering hole again. But, you aren’t most people. You are a statistician. You determine that you don’t have a sufficient sample size to know whether the water at this new watering hole is safe or not, so you decide to give it another shot. Unfortunately, after a few big gulps, you get sick, die, and fail to pass on your genes.
Though this example is a bit simplistic, it illustrates why none of our ancestors were statisticians. Waiting for a sufficient sample size before acting, in many cases, was detrimental to survival and reproduction. This is true because of the asymmetry of costs associated with the errors surrounding this decision. For example, if you decide that the water is not safe when it is, it’s an inconvenience. However, if you decide that the water is safe when it isn’t, it’s a death sentence.
In the statistical sense, this is the difference between a Type I and Type II error. A Type I error is a false positive (i.e. thinking the water is dangerous when it is, in fact, safe) while a Type II error is a false negative (i.e. thinking the water is safe when it is, in fact, dangerous). To remember which error is which, I like using a funny graphic from Unbiased Research.
In all seriousness though, this framework explains why humans evolved to minimize Type II errors. While the cost of a Type I error is typically just embarrassment, the cost of a Type II error can be far more severe.
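The asymmetry is easy to see with a back-of-the-envelope expected-cost calculation. The probabilities and costs below are invented for illustration; the only thing that matters is how lopsided they are:

```python
# Watering-hole decision after one bad experience.
# All numbers are made up to illustrate asymmetric error costs.
P_DANGEROUS = 0.30       # chance the water really is unsafe
COST_FALSE_ALARM = 1     # Type I error: avoid water that was safe (inconvenience)
COST_MISS = 1000         # Type II error: drink water that was unsafe (catastrophe)

# The "statistician" goes back for another sample.
expected_cost_drink_again = P_DANGEROUS * COST_MISS

# The "ancestor" never drinks there again.
expected_cost_avoid = (1 - P_DANGEROUS) * COST_FALSE_ALARM

print(expected_cost_drink_again, expected_cost_avoid)

# Avoidance wins whenever the chance of danger exceeds
# COST_FALSE_ALARM / (COST_FALSE_ALARM + COST_MISS) -- here, about 0.1%.
threshold = COST_FALSE_ALARM / (COST_FALSE_ALARM + COST_MISS)
print(round(threshold, 4))
```

With costs this lopsided, the cautious rule beats the "wait for more data" rule even when the water is probably safe, which is exactly why evolution tilted us toward avoiding Type II errors.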
Understanding this difference also explains why we tend to focus more on negative events than on positive events. For example, psychology researchers at Ohio State University found that participants’ brains reacted more strongly to negative stimuli than to positive stimuli.
Their finding had nothing to do with the number of stimuli shown either. This illustrates that, when the perceived risks are large, we don’t need a sufficient sample size to pay attention. Our ancestors knew this well and behaved accordingly. For them, acting on insufficient information and avoiding a risky outcome was better than being a prudent statistician and ending their genetic legacy.
Cool Story, Bro
Small sample sizes also persist for reasons other than risk reduction. One of these reasons is the human craving for stories and narratives. Even though we have the sum of human knowledge at our fingertips, anecdotes continue to dominate our conversations in place of data-based arguments. For example, if I claimed that “smoking is bad for your health,” someone might respond with, “Well, I had a great-grandfather who smoked and he lived until he was 103.” It’s these kinds of arguments that make me want to say, “So what?”
Does one observation invalidate reams of scientific evidence illustrating the adverse effects of smoking? Of course not. Yet, that’s what so many of these anecdotal arguments come down to. “Well, I have a friend who…” or “In my personal experience…” should come with a warning of “Low sample size incoming!” This doesn’t mean that these kinds of anecdotes aren’t important or useful, just that they may not be representative of a deeper truth.
Of course, few people are actually arguing against the negative impact of smoking, but replace “smoking” with a more widely debated topic and we are off to the races. For example, I recently watched a documentary on Netflix called The Game Changers about high-performance athletes who eat a primarily plant-based (vegan) diet. I have nothing against a plant-based diet, but the way the evidence was laid out in this documentary was just absurd.
In one of the examples where they try to illustrate how a plant-based diet improves performance, they highlight how Conor McGregor, a known steak eater, lost an MMA fight to Nate Diaz, a plant-based fighter. The documentary tries to imply that McGregor lost because of his diet instead of the multitude of other factors that could have affected his performance.
Well, guess what? Usain Bolt won three gold medals during the 2008 Summer Olympics in Beijing while on a diet of 100 McDonald’s Chicken Nuggets a day. Bolt consumed over 47,000 calories of nuggets during his stay in China, but somehow won gold again and again and again. Should we make a documentary called “Nugget Nutrition: How the Golden Arches Lead to Golden Medals”?
You might snicker at my suggestion, but I’m not laughing. Why? Because The Game Changers documentary got a 99% audience score on Rotten Tomatoes! This is what proper statistical thinking is up against. Fallacious anecdotal arguments and small sample sizes plague our discourse.
And don’t just take my word for it either. Peter Attia, a leading physician and expert on nutrition, wrote a scathing critique of The Game Changers for the same reasons I took issue with the film.
Surely, data will save us from this problem, right? Not quite. Even when we have datasets at our disposal, the problem of small sample sizes can persist.
Not Enough Controls?
So, you’ve cast aside anecdotes and have now armed yourself with some data. What could go wrong? Well, not all data is created equal. Just because you have a dataset doesn’t mean it is representative enough to test the hypothesis you are examining. The most common problem I see is the lack of good control samples within datasets.
For example, imagine we are trying to determine whether vegetables increase life expectancy. If we look at the data we will see that those who consume more vegetables live longer than those who consume fewer vegetables. Case closed, right? Not exactly.
Because if we look a little closer at the data we will also see that the people who consume more vegetables are also more likely to engage in many other longevity-enhancing behaviors (i.e. exercise, eating fruit, etc.). So how can we claim that vegetables cause us to live longer if most of those individuals who consume more veggies are also partaking in many other healthy behaviors? We are back to our sample size issue once again.
The error being made in this hypothetical example is known as healthy user bias. As I explained above, the healthy user bias implies that those who exhibit one healthy behavior also exhibit other healthy behaviors, so parsing out the effect of one of these behaviors from the others can be difficult. As I have said before, “Just try and find me a sufficient sample size of ultra-marathoners that also smoke a pack of cigarettes a day and you will understand this plight.”
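Healthy user bias is easy to reproduce in a toy simulation. In the sketch below, vegetables have zero causal effect on lifespan; only exercise matters. But because a hidden “healthy” trait drives both behaviors, vegetable eaters still live longer in the raw data. Every number here is invented:

```python
import random

random.seed(0)

def simulate_person():
    """One made-up person: a hidden 'healthy' trait drives both behaviors."""
    healthy = random.random() < 0.5
    eats_veggies = random.random() < (0.8 if healthy else 0.2)
    exercises = random.random() < (0.8 if healthy else 0.2)
    # Lifespan depends ONLY on exercise (plus noise) -- not on vegetables.
    lifespan = 70 + (10 if exercises else 0) + random.gauss(0, 5)
    return eats_veggies, lifespan

people = [simulate_person() for _ in range(100_000)]
veg = [life for eats, life in people if eats]
no_veg = [life for eats, life in people if not eats]

# Vegetable eaters "live longer" in the raw comparison, even though
# vegetables do nothing in this simulation. The gap is pure confounding.
print(round(sum(veg) / len(veg), 1), round(sum(no_veg) / len(no_veg), 1))
```

A naive comparison here “shows” vegetables adding a few years of life, when the entire gap comes from the correlated exercise habit. Controlling for exercise (comparing veggie eaters to non-eaters within the exercising group only) would make the difference vanish.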
The investment world probably also has a problem similar to healthy user bias that I might call “good investor bias.” Good investor bias implies that successful investors are likely to avoid all of the bad behaviors that get many other investors into trouble (i.e. excessive trading, paying commissions, leverage, etc.). So, if we look at “good” investors we might conclude that it was their asset allocation or some other attribute that made them successful, when, in reality, it was their avoidance of the most common investing pitfalls.
No matter what field you are in, data is not a panacea. You still have to spend time understanding its uses and limitations. As my undergraduate economics advisor used to say, “There is no substitute for thinking.” And in the information age, this idea has never been more important.
How Many Dates Have You Been On?
The same thinking that saved our ancestors and allowed us to build narratives around events can also lead us astray. Though we live in a time of information abundance, many of the most important decisions we make in our lives occur in areas where we have little comparative data to go on. If you are skeptical, consider the following line of questioning:
How many careers have you tried in your life? How many cities have you lived in? How many dates did you go on before choosing your life partner?
Even if you think you had a sufficient sample size when making these decisions, your information was still limited in many respects. For example, even if you tried out multiple careers, did you try them all in the same city? Or maybe you went on 50 first dates, but you never went on 50 seventh dates. You never lived with 50 different partners. You never raised 50 different families. And guess what? You never will. All of us have to make major life decisions without the benefit of having made these decisions many times before.
And what happens if you change in some material way? How much will your future experiences change as well? For example, imagine if you were extremely obese and then you suddenly lost a lot of weight. Think about how that would impact your dating life. All of that historical data you collected about relationships would be near worthless when assessing your current prospects.
I know this well because in many ways I am not the same person that I was 10 years ago. Most of my formative social experiences occurred in a time when I wore acid-washed jeans, black t-shirts, and had hair down past my shoulders. How many of those prior experiences are still shaping my current behavior? I don’t know.
What I do know is that the most important decisions we make in our lives will be under conditions of incomplete information. We won’t have much experience to go off of, but we will have to make a choice nonetheless.
Though I might have made the battle against small sample sizes seem hopeless throughout this post, there is one way to fight back…
How to “Increase” Your Sample Size
If you want to make more informed decisions you have to increase your sample size of experience. Unfortunately, this is easier said than done. We don’t exactly have the ability to live hundreds of different lives…or do we? Is there a way to crowdsource experience and learn from the decisions of others? Yes, it’s called reading!
That’s right. Reading is the most effective way to broaden your experience without actually having to live it. Of course, some things have to be lived to be understood, but this isn’t true for everything. For example, you don’t have to be an investment banker to understand that the job requires grueling hours. You can read about it in books and in online forums. You don’t have to date hundreds of people; you can read about what factors are most important in determining relationship success and find the appropriate partner that way.
And in the modern age, doing this has never been easier. We have so much information today that crowdsourcing experience isn’t a pipe dream. You really don’t have to face the world alone. You can learn from the wise men and women across the ages who made the mistakes that you no longer have to make. This is the real way to defeat small sample sizes.
I know this because that is how I learned to be a “good” investor. I read about it in books and in blogs. I didn’t have to lose fortunes to know what it was like to do so. I didn’t have to try leverage to see its dangers. I just had to read and absorb the lessons of others.
If you are interested in crowdsourcing some more experience, check out my recommended reads at my Amazon store. These books represent my most purchased recommended readings on this site since I started blogging exactly 150 weeks ago. Here’s to the next 150 weeks. Thank you for reading!
This is post 150. Any code I have related to this post can be found here with the same numbering: https://github.com/nmaggiulli/of-dollars-and-data