Peeking
What is Peeking in A/B Testing?
Peeking, also known as optional stopping, occurs when you check the results of an ongoing A/B test and make decisions based on early data, before the test has properly concluded. It’s one of the most damaging practices in experimentation.
Even a single early peek at your results dramatically increases the risk of making a false decision. Every time you check, you inflate your Type I error rate—the chance you incorrectly believe there’s a winning variant when, in reality, any observed difference is due to random noise. The more you peek, the less trustworthy your p-values, confidence intervals, and experiment outcomes become.
Peeking doesn’t just affect “winning” calls either. It can cause you to falsely conclude a test failed (increasing Type II errors) or to misinterpret the entire customer behavior landscape based on incomplete data.
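The inflation described above is easy to demonstrate with a short simulation. The sketch below (illustrative parameter values, not a production tool) runs many A/A experiments, where there is no true difference, and counts how often a peeker who checks at several interim points sees a "significant" result at least once:

```python
import math
import random

def peeking_false_positive_rate(n_experiments=2000, n_per_arm=1000,
                                peeks=10, alpha=0.05, seed=42):
    """Simulate A/A tests (no true difference between arms) and count
    how often *any* interim z-test crosses significance when results
    are checked `peeks` times before the experiment completes."""
    rng = random.Random(seed)
    checkpoints = [n_per_arm * (i + 1) // peeks for i in range(peeks)]
    z_crit = 1.959964  # two-sided critical value for alpha = 0.05
    false_positives = 0
    for _ in range(n_experiments):
        a = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        b = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        for n in checkpoints:
            mean_a = sum(a[:n]) / n
            mean_b = sum(b[:n]) / n
            # Unit variance is known here, so the z statistic is exact.
            z = (mean_a - mean_b) / math.sqrt(2 / n)
            if abs(z) > z_crit:
                false_positives += 1
                break  # the peeker stops early and "ships" the winner
    return false_positives / n_experiments
```

With a single look at the planned sample size, the false-positive rate stays near the nominal 5%; with ten peeks, it climbs well above it, even though both variants are identical.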
Why Peeking is a Problem
- False Positives: You’re much more likely to conclude a false win and launch a bad variant
- False Negatives: You risk killing good ideas prematurely
- Distorted Statistics: p-values and confidence intervals no longer behave properly after peeking
- Resource Waste: Development and marketing teams act on misleading results
- Erosion of Trust: Teams lose confidence in experimentation when decisions repeatedly fail to deliver expected outcomes
“Peeking refers to checking the interim results of an A/B test with the intent to take action before it completes. It is very common for experiments to look ‘significant’ in the beginning due to noise in the data, novelty effects, etc. This can lead to wrong decisions based on a subset of the sample that can be very costly for the organization.
Peeking can be avoided by building a robust ‘test plan’ where the sample size, significance level, and test duration are pre-determined before starting an experiment. It also helps to invest in educating the broader stakeholder group on why it’s important to wait until the target sample is reached before any decision is made. Another, less common alternative is using sequential tests (instead of the more commonly used ‘fixed sample’ tests), which allow for peeking but at the cost of sacrificing some statistical power.”
Anjali Arora Mehra, AI Strategy and Transformation Leader
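The pre-determined sample size mentioned in the quote above can be computed before launch from the baseline rate and the smallest lift worth detecting. Here is a minimal sketch using the standard normal-approximation formula for a two-sided two-proportion z-test (the baseline and minimum detectable effect values are assumptions you choose up front, not measurements):

```python
import math
from statistics import NormalDist

def sample_size_per_arm(p_baseline, mde_abs, alpha=0.05, power=0.8):
    """Approximate per-arm sample size for a two-sided two-proportion
    z-test. `mde_abs` is the minimum detectable absolute lift."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p2 = p_baseline + mde_abs
    p_bar = (p_baseline + p2) / 2  # pooled proportion under H0
    n = ((z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * math.sqrt(p_baseline * (1 - p_baseline)
                               + p2 * (1 - p2))) ** 2
         / mde_abs ** 2)
    return math.ceil(n)

# e.g. detecting a 1-point lift from a 10% baseline conversion rate:
# sample_size_per_arm(0.10, 0.01) requires roughly 14,800 users per arm
```

Locking in a number like this before launch, and committing to run until it is reached, is the core of the “test plan” defense against peeking.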
How to Prevent Peeking
- Lock your test plan before launch: Define the sample size, significance level, and minimum run time up front—and stick to it
- Educate stakeholders: Make it clear that acting on interim results undermines the entire experiment
- Use Sequential Testing: Sequential tests allow interim data checks while properly adjusting for error inflation, using alpha- and beta-spending functions
- Apply Bayesian Methods: Bayesian frameworks can handle continuous monitoring without traditional peeking biases, though minimum run times still matter
- Run A/A Tests: Demonstrate how even identical variants can appear “significant” when results are peeked at early—making the risk real and visible for your team
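To make the sequential-testing idea concrete, the sketch below calibrates a Pocock-style boundary by Monte Carlo: a single stricter critical z value applied at every interim look, chosen so the overall chance of any false crossing under the null stays at the nominal alpha. This is an illustrative simulation, not a production implementation (real tools use numerical integration and spending functions):

```python
import math
import random

def pocock_boundary(n_looks, alpha=0.05, n_sims=20000, seed=0):
    """Monte Carlo calibration of a Pocock-style boundary: the constant
    critical z value applied at each of `n_looks` equally spaced interim
    looks so that P(any crossing under the null) equals `alpha`."""
    rng = random.Random(n_sims + seed)
    max_abs_z = []
    for _ in range(n_sims):
        # The interim z statistics are correlated; build them from
        # cumulative sums of independent standard normal increments.
        s, zs = 0.0, []
        for k in range(1, n_looks + 1):
            s += rng.gauss(0, 1)
            zs.append(abs(s) / math.sqrt(k))
        max_abs_z.append(max(zs))
    max_abs_z.sort()
    # The boundary is the (1 - alpha) quantile of the max statistic.
    return max_abs_z[int((1 - alpha) * n_sims)]
```

For five looks the boundary lands near 2.41, versus 1.96 for a single fixed-sample test: each peek must clear a stricter bar, which is exactly the power cost of sequential designs mentioned above.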
Best Practices When Under Pressure to Peek
- Restrict dashboard access until the experiment hits pre-specified criteria
- Frame early looks as exploratory only—make no launch decisions based on them
- Communicate risk clearly: “Acting early saves time today but can cost millions in misinformed decisions later”