Holdout Group
What is a Holdout Group?
A holdout group is a set of users deliberately kept out of all experiments and product changes over a defined period—often months or even a year. These users see the “static” version of your product, unaffected by any optimizations, new features, or test variants.
The purpose? To act as a baseline. By comparing their behavior with that of exposed users (those who receive the new features or experiment winners), you can estimate the true cumulative impact of your experimentation program.
Holdout groups are sometimes called universal holdbacks or global holdouts.
How Does It Work?
To implement a holdout group, teams need:
- User identification and tagging: The system must persistently recognize who’s in the holdout group across time and devices.
- Strict exclusion logic: Holdout users must be reliably excluded from every experiment and feature launch.
- Static experience delivery: The original version of the product must remain available and functional for these users.
- Consistent data tracking: You need accurate metrics for the holdout group to compare outcomes.
This only works if your platform can ensure full isolation from experimental changes—often only possible for logged-in users.
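The requirements above can be sketched in code. Below is a minimal illustration of persistent tagging and exclusion logic, assuming a stable user ID: membership is derived from a salted hash so it survives across sessions, and holdout users are always routed to the static experience. The names (`is_in_holdout`, `assign_variant`, the 5% share) are illustrative, not from any specific experimentation SDK.

```python
import hashlib

HOLDOUT_PCT = 5          # percent of users reserved as the global holdout
SALT = "holdout-2024"    # fixed salt so assignment is stable over time

def bucket(user_id: str) -> int:
    """Hash the user ID into one of 100 stable buckets."""
    digest = hashlib.sha256(f"{SALT}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100

def is_in_holdout(user_id: str) -> bool:
    """Holdout users occupy buckets [0, HOLDOUT_PCT)."""
    return bucket(user_id) < HOLDOUT_PCT

def assign_variant(user_id: str, experiment: str) -> str:
    """Exclusion logic: holdout users always get the static experience."""
    if is_in_holdout(user_id):
        return "static"
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 else "control"
```

Because assignment is a pure function of the user ID, the same user lands in the same group on every device where that ID is available, which is why this approach works best for logged-in users.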
Why Use a Holdout Group?
Holdout groups are powerful when you want to:
- Measure cumulative impact of multiple experiments across a time period (e.g., quarterly).
- Benchmark experimentation value: Are your tests and changes actually improving key business metrics?
- Understand long-term effects: For example, are short-term gains in engagement sustained or do they fade over time?
- Measure platform or system-level impact, such as the overhead introduced by your experimentation engine itself.
They give you the full picture: not just what worked in isolation, but what all your efforts combined achieved.
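Measuring that cumulative impact usually comes down to comparing a key metric between the holdout and exposed groups. A hedged sketch, using a standard two-proportion z-test on conversion counts (the numbers are made up for illustration):

```python
import math

def cumulative_lift(holdout_conv, holdout_n, exposed_conv, exposed_n):
    """Relative lift of exposed users over the holdout baseline, plus a z-score."""
    p_h = holdout_conv / holdout_n
    p_e = exposed_conv / exposed_n
    lift = (p_e - p_h) / p_h  # relative lift vs. the static baseline
    # Pooled standard error for the two-proportion z-test
    p_pool = (holdout_conv + exposed_conv) / (holdout_n + exposed_n)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / holdout_n + 1 / exposed_n))
    z = (p_e - p_h) / se
    return lift, z

# Hypothetical quarter: 4.8% conversion in the holdout, 5.5% among exposed users
lift, z = cumulative_lift(holdout_conv=480, holdout_n=10_000,
                          exposed_conv=10_450, exposed_n=190_000)
```

A positive lift with a z-score beyond roughly 1.96 suggests the program as a whole moved the metric, though it says nothing about which individual change was responsible.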
What Are the Drawbacks of a Holdout Group?
Despite the benefits, holdout groups come with real trade-offs:
- Lost revenue: You’re intentionally withholding potentially valuable improvements from real users.
- Statistical power issues: Since holdout groups are typically small (e.g., 5–10% of users), it’s harder to get statistically significant insights.
- Technical complexity: You need strong infrastructure to maintain a static experience, prevent “pollution” from users entering or exiting the group, and handle exclusion logic reliably.
- Bias risk: Identity loss (e.g., due to cookie churn), survivorship bias, or selection bias can make the group unrepresentative over time.
- Low actionability: Even if you see a negative impact, you may not know which change caused it. You only get a cumulative view.
This is not a casual tactic. It’s a strategic move for teams focused on long-term learning.
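The statistical-power drawback can be made concrete. With the usual two-proportion approximation (95% confidence, 80% power), the minimum detectable effect grows as the split becomes more unbalanced, so a small holdout needs a much larger true effect to reach significance. A sketch with hypothetical numbers:

```python
import math

Z_ALPHA, Z_BETA = 1.96, 0.84  # 95% confidence (two-sided), 80% power

def mde(baseline_rate, n_a, n_b):
    """Absolute minimum detectable effect for two proportions of sizes n_a, n_b."""
    var = baseline_rate * (1 - baseline_rate)
    return (Z_ALPHA + Z_BETA) * math.sqrt(var * (1 / n_a + 1 / n_b))

total = 200_000
balanced = mde(0.05, total // 2, total // 2)               # 50/50 split
holdout = mde(0.05, int(total * 0.05), int(total * 0.95))  # 5% holdout
# The 5% holdout needs a noticeably larger effect to reach significance.
```

For the same total traffic, the 5/95 split here requires an effect more than twice as large as a balanced split would, which is one reason holdout analyses often run for months.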
Best Practices for Using a Holdout Group in Experimentation
- Use only when justified: Holdout groups are expensive. Don’t default to them unless you’re aiming to measure long-term cumulative impact or validate platform behavior.
- Target new users for degradation holdbacks, so existing users aren't confused by features they already rely on disappearing.
- Leverage persistent identity to avoid group pollution.
- Be mindful of the timeframe: 6–12 months is common, but longer durations increase the risk of bias.
- Run integrity checks, such as sample ratio mismatch (SRM) detection, to verify that the observed group sizes match the planned split.
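An SRM check is a simple goodness-of-fit test: compare observed group counts against the planned split. A minimal sketch assuming a planned 5% holdout, using a one-degree-of-freedom chi-square test via the standard library's `math.erfc`:

```python
import math

def srm_pvalue(holdout_n, exposed_n, expected_holdout_share=0.05):
    """Chi-square goodness-of-fit p-value for the observed holdout split."""
    total = holdout_n + exposed_n
    expected = [total * expected_holdout_share,
                total * (1 - expected_holdout_share)]
    observed = [holdout_n, exposed_n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For df=1, the chi-square survival function reduces to erfc(sqrt(x/2))
    return math.erfc(math.sqrt(chi2 / 2))

# Counts close to the planned 5/95 split should not trigger an alarm
p = srm_pvalue(holdout_n=5_050, exposed_n=94_950)
suspicious = p < 0.001  # a common SRM alert threshold
```

A very small p-value here signals that users are leaking into or out of the holdout (for example through identity loss), which would undermine any cumulative-impact comparison.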
If a holdout group isn’t feasible, consider meta-analysis or long-term A/B tests as alternatives.
“A holdout group is a segment of users intentionally excluded from all experiments and experiment winners over time (i.e., a year). They serve as a baseline for comparison against the group that sees the optimized website.
The main advantage is that you can measure the true impact of your experimentation program. By comparing the holdout group’s conversions and AOV with regular website visitors, you know the uplift caused by your experimentation efforts.
While this might sound ideal, it has many challenges and major downsides.
First, it is hard to track users for an extended period. This will primarily only work on logged-in users, which means you can only compare them with other logged-in users. Therefore, you will miss a lot of data if not everyone is logged in all the time.
Assuming your experimentation leads to higher revenue, you will also miss extra revenue from the users in your holdout group. As the users in the holdout group can not be part of any experiment, your MDEs increase, slowing down your experimentation program. In summary, while a holdout group sounds ideal for analyzing the true impact of experimentation, it is hard to set up and comes with several major downsides.”
Ruben de Boer, Independent Experimentation & Decision Strategy Leader