5 min read · Oct 30, 2022
To this day, this question still haunts people in UX.
In response, people often cite:
“A usability test with 5 users is enough to identify around 85% of problems.”
After all, even Jakob Nielsen said so back in 2000.
But people often forget that this only applies to problems that each user has at least a 31% chance of running into.
So with 5 users you may never discover smaller, hidden issues.
To understand the optimal number of users, let’s focus on each research outcome.
Usability tests
Testing usability is a great way to identify the most obvious problems in the experience.
This type of qualitative research should be iterative. That is, we’re supposed to fix the issues we’ve identified before testing again.
To spot the most obvious issues, 5 users per round are usually more than enough.
However, things are a little different when it comes to uncovering issues that are less likely to occur. In these cases, it’s highly recommended to increase the sample size to 25 or 30 users per round.
So where did we get the idea that 5 users are enough?
The 5-user claim
Jakob Nielsen’s claim was based on a formula he had previously devised alongside his associate Thomas K. Landauer.
In 1993, Nielsen and Landauer attempted to calculate the optimal number of users for the cost of a given research study.
They estimated that a single user was likely to identify 31% of all experience issues in a software application.
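The formula behind this estimate is usually written as 1 − (1 − p)^n, where p is the chance that a single user runs into a given problem and n is the number of users. Here is a minimal sketch of it in Python. The function name `share_of_problems_found` is my own and purely illustrative.

```python
def share_of_problems_found(p: float, n: int) -> float:
    """Expected share of problems seen at least once by n users.

    p is the probability that a single user runs into a given problem
    (0.31 in Nielsen and Landauer's estimate); n is the number of users.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must not be negative")
    # A problem stays hidden only if every one of the n users misses it.
    return 1.0 - (1.0 - p) ** n


for users in (1, 5, 15):
    share = share_of_problems_found(0.31, users)
    print(f"{users:>2} users: {share:.1%}")
```

With p = 0.31, five users come out at roughly 84%, which is where the “around 85%” figure comes from.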
Revisiting this study
More recently, UX researcher Thibault Greenen has noted that Nielsen and Landauer might have been far too optimistic in settling on 31%.
According to Greenen, software applications at the time were simple and scarce.
Things are a little more complex now. The growth in technology has led to a surge in problems that only specific user segments face.
In an optimistic approach, Greenen assumed users would only discover half the issues in current-day software applications.
As such, he revisited the formula by cutting the probability in half, from 31% to 15.5%.
He found that 27–30 users are enough to discover 99% of all problems in the experience.
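To sanity-check numbers like these, the same 1 − (1 − p)^n formula can be solved for n. This is a rough sketch that assumes every problem has the same chance p of being hit by each user, which is a simplification. The function name `users_needed` is hypothetical.

```python
import math


def users_needed(p: float, target: float) -> int:
    """Smallest n for which 1 - (1 - p)**n reaches the target share."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    # Solve (1 - p)**n <= 1 - target for n and round up to a whole user.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


for p in (0.31, 0.155):
    print(f"p = {p:.1%}: {users_needed(p, 0.99)} users to reach 99%")
```

For p = 15.5% this gives 28 users, in line with the 27–30 range above. For the original 31% it gives 13.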
This represents a huge step up from the previous claim, but does it mean we stop relying on 5 users per round?
Not necessarily. Especially if we’re targeting a specific role that isn’t overly segmented and we want to save costs.
If you have more specific criteria, then it’s safe to rely on a larger number of users.
And if your goal is to spot every single thing wrong in the experience, including issues with a likelihood of occurrence of around 1 to 10%, then you’ll need 40 to 80 users.
Ultimately, it’s up to you to decide if that is worth the significant increase in research costs and the total time spent on the study.
Card sorting tests
These tests give you valuable insights into how users mentally structure information.
There’s a common saying in UX that goes like this:
“We’re testing the design, not you.”
But in card sorting, we’re actually testing users.
By understanding users’ mental models, we can restructure the information architecture to make the experience much more intuitive for them.
In these tests, Nielsen recommends at least 15 users for a high enough correlation between the results from the sample and the ultimate results.
Statistical analyst Jeff Sauro also noted that there is little guidance on finding the right sample size for card sorting tests.
He cites a study by Tullis and Wood (2004), in which tests with 20 to 30 users would have produced cluster results similar to those from the full sample of 168 users.
Consider using ≥ 30 users for more accurate results or a minimum of 15 users.
Metrics such as success rate, time spent, or customer satisfaction should be reported with statistical information.
Don’t report metrics obtained from a small sample size because the margin of error is far too large.
This means that claiming “7 out of 10 people were successful” in your qualitative research is very likely to be misleading.
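To see how wide the uncertainty around “7 out of 10” really is, here is a small sketch that computes a Wilson score interval, one common way to put a range around a success rate from a small sample. The choice of Wilson and the 90% confidence level are my assumptions, not something prescribed above, and `wilson_interval` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.90):
    """Wilson score interval for a success rate (two-sided)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # z-score for the chosen two-sided confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


low, high = wilson_interval(7, 10)
print(f"7 out of 10: plausible success rate from {low:.0%} to {high:.0%}")
```

Under these assumptions, the plausible success rate runs from roughly 44% to 87%, which is far too wide to report as “70%”.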
The larger the size, the smaller the error
You’ll often see metrics reported alongside a confidence level, which represents how certain you are of the results. The possible variance is expressed as the margin of error.
Example: At a 50% success rate and a ±15% margin of error, the true score could be anywhere from 35% to 65%.
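For a success rate, the margin of error can be approximated as z · √(p(1 − p)/n). The sketch below uses this normal approximation, which gets rough for very small samples or rates close to 0% or 100%. It assumes a 90% confidence level and a 50% success rate (the worst case). `margin_of_error` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def margin_of_error(n: int, confidence: float = 0.90, p: float = 0.5) -> float:
    """Approximate margin of error for a success rate measured on n users."""
    if n <= 0:
        raise ValueError("n must be positive")
    # z-score for the chosen two-sided confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(p * (1 - p) / n)


for n in (10, 20, 40, 100):
    print(f"{n:>3} users: ±{margin_of_error(n):.1%}")
```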
Your chance of getting skewed results lowers as you increase the sample size. A lower margin of error also gives you a much better estimate.
Therefore, I recommend aiming for a ≥ 90% confidence level and a ±10% margin of error.
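If you want to translate a confidence level and a margin of error into a head count, the normal-approximation formula can be turned around: n = z² · p(1 − p) / e². This is only a sketch under that approximation, assuming the worst-case 50% success rate. Other interval methods will give somewhat different numbers. `sample_size_for_margin` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def sample_size_for_margin(margin: float, confidence: float = 0.90, p: float = 0.5) -> int:
    """Users needed so the approximate margin of error stays within `margin`."""
    if not 0.0 < margin < 1.0:
        raise ValueError("margin must be strictly between 0 and 1")
    # z-score for the chosen two-sided confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z**2 * p * (1 - p) / margin**2)


for margin in (0.15, 0.10, 0.05):
    n = sample_size_for_margin(margin)
    print(f"±{margin:.0%} at 90% confidence: about {n} users")
```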
To determine the best sample size, identify the purpose of your research and what type of metrics you’ll be collecting.
These guidelines aren’t one-size-fits-all.
You should decide carefully, based on your specific use case.
Qualitative research
Usability tests will help you uncover the most common issues in the experience:
- Identifying common issues: 5 users per round.
- Discovering less obvious problems: ≥ 30 users per round.
Card sorting tests allow you to understand the mental model of your users:
- Minimum: 15 users.
- More accurate: ≥ 30 users.
Quantitative research
These guidelines are for simple assessments that don’t include multiple comparisons within a study.
- Potentially misleading: ≤ 21 users.
- More solid results: ≥ 40 users.
Thank you for reading.
How big is your sample size for your usual research?