A/A Testing

Contributor

Annemarie Klaassen, Global Head of Digital Experience Optimization at MediaMarktSaturn

What is A/A Testing?

A/A testing, sometimes called a Null test, is when you run an experiment where the control and treatment groups are shown the exact same experience. There’s no change between the two groups at all.

The goal isn’t to find a winner. It’s to check that your testing platform and setup are working correctly. If two identical versions consistently produce significant differences, something’s wrong.

Teams use A/A tests to catch hidden issues that could corrupt real A/B tests later, like traffic not splitting properly (sample ratio mismatch, or SRM), tracking bugs, flickering, or randomization bias.

It’s one of the best ways to build trust in your experimentation system.

How Does an A/A Test Work?

You split users into two groups, just like a normal A/B test. Both groups see the same page, app experience, or feature.

Then you monitor the results the same way you would with an actual experiment.

What you’re mainly checking:

  • False positives: Confirming that roughly 5% of your metrics show significance at a 95% confidence level, and not noticeably more (the simulation after this section illustrates the expected rate).
  • Variance: Watching how metric fluctuations behave over time.
  • Randomization issues: Ensuring users are truly split randomly without bias.
  • Technical errors: Catching glitches like flicker, unequal load speeds, or tracking issues.
  • Data validation: Confirming that experiment data matches your primary reporting system.

If everything looks good, you can move forward with A/B testing confidently. If not, you need to find and fix the problem.
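
To see what a healthy false-positive rate looks like, you can simulate it. Below is a minimal Python sketch (using numpy and scipy, with assumed visitor counts and an assumed conversion rate) that runs many identical comparisons and counts how often a t-test calls them significant at the 5% level. A well-behaved setup should land close to 5%.

```python
import numpy as np
from scipy import stats

# Hypothetical sketch: simulate many A/A comparisons on a conversion metric and
# count how often a two-sided t-test reports "significance" at alpha = 0.05.
# All numbers here (visitors per group, conversion rate) are assumptions.
rng = np.random.default_rng(42)
n_simulations = 2_000   # number of simulated A/A tests
n_per_group = 10_000    # visitors per group
true_rate = 0.05        # identical conversion rate for control and treatment

false_positives = 0
for _ in range(n_simulations):
    control = rng.binomial(1, true_rate, n_per_group)
    treatment = rng.binomial(1, true_rate, n_per_group)  # same experience, same rate
    _, p_value = stats.ttest_ind(control, treatment)
    if p_value < 0.05:
        false_positives += 1

# A healthy setup lands close to the 5% nominal rate; a much higher number
# points at variance or randomization problems.
print(f"Observed false positive rate: {false_positives / n_simulations:.1%}")
```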

Why Would an A/A Test “Fail”?

Even though both groups see the exact same experience, you’ll sometimes see statistically significant differences. When that happens more often than your significance level predicts, it usually signals a real problem.

Some causes include:

  • Incorrect variance calculations
  • Biased randomization or carry-over effects from past tests
  • Hidden technical issues like flicker or server differences
  • Using a different unit of analysis than randomization (e.g., randomizing by user but analyzing by session)
  • Problems with traffic allocation if your tests aren’t 50/50

When an A/A test “fails” more than expected, it’s a red flag you must investigate before trusting any A/B test results.
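
A quick first step when investigating is checking whether the traffic split itself is broken. One common way to do that is sketched below in Python with scipy (the visitor counts are made-up example numbers): a chi-square goodness-of-fit test against the intended 50/50 allocation, the usual way to flag a sample ratio mismatch.

```python
from scipy import stats

# Hypothetical sketch: sample ratio mismatch (SRM) check for an intended 50/50
# split. The visitor counts below are made-up example numbers.
control_visitors = 50_912
treatment_visitors = 49_088
total = control_visitors + treatment_visitors

observed = [control_visitors, treatment_visitors]
expected = [total * 0.5, total * 0.5]   # adjust if your tests use another ratio

# Chi-square goodness-of-fit test: a tiny p-value (commonly < 0.001) means the
# split itself is broken, so no metric result from the test can be trusted.
chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"SRM check: chi2 = {chi2:.1f}, p-value = {p_value:.2e}")
```

If this check fails, fix the assignment or tracking problem before reading any metric results.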

Best Practices for Running A/A Tests

Running a good A/A test is not complicated, but it does require discipline.

  • Keep the control and treatment experiences truly identical. If your regular tests split traffic 70/30 or some other ratio, mirror that here too.
  • Run A/A tests alongside A/B tests. Don’t wait until after — weave A/A validation into your overall process.
  • Expect some “significant” results. With a 0.05 significance level, around 5% of metrics will still randomly appear significant. What matters is the overall pattern, not one-off flukes.
  • Look at the p-value distribution. Across many A/A tests, p-values should be spread roughly uniformly between 0 and 1 (see the sketch after this list). Big skews hint at bias.
  • Use A/A tests to teach your team. A hidden A/A test, reported daily, can show how “false wins” pop up if you peek at data too soon.
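
One minimal way to inspect that p-value distribution, assuming you have already collected p-values from a series of A/A tests, is sketched below in Python; random uniform numbers stand in for real p-values purely to keep the example runnable.

```python
import numpy as np
from scipy import stats

# Hypothetical sketch: check that p-values collected from many A/A tests look
# roughly uniform between 0 and 1. Random numbers stand in for real platform
# p-values here purely to keep the example runnable.
rng = np.random.default_rng(7)
p_values = rng.uniform(0, 1, size=500)   # replace with your own A/A p-values

# Kolmogorov-Smirnov test against uniform(0, 1): a low p-value here suggests
# the A/A p-values are skewed, hinting at biased randomization or variance bugs.
ks_stat, ks_p = stats.kstest(p_values, "uniform")
print(f"KS statistic: {ks_stat:.3f}, p-value: {ks_p:.3f}")

# A per-decile count is often enough to spot a skew by eye.
counts, _ = np.histogram(p_values, bins=10, range=(0, 1))
print("P-values per decile:", counts.tolist())
```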

“A/A tests aren’t a one-time task—they should be a regular part of your experimentation process. They help ensure your setup works as intended by verifying that traffic is correctly split (e.g., 50/50), all intended visitors are included, and key performance indicators (KPIs) are tracked properly. If an A/A test shows a big difference between the two identical versions, it could mean there’s a problem—like tracking errors, incorrect visitor splits, or other setup issues. However, if the test results look normal (inconclusive), that doesn’t automatically mean everything is perfect. And if you do see a difference, don’t panic! Random chance can sometimes create false positives, depending on the significance level. Instead of focusing on a single result, think of A/A testing as a routine health check for your platform. By incorporating A/A tests into your process, you can catch issues early and ensure your A/B test results lead to the right decisions.”

Annemarie Klaassen, Global Head of Digital Experience Optimization at MediaMarktSaturn

Why A/A Testing Matters

Skipping A/A testing is like flying blind. You could be making big business decisions based on broken experiments and not even know it.

When you take the time to run A/A tests regularly, you’re:

  • Protecting your data integrity
  • Spotting platform or tracking bugs early
  • Building real confidence in your experimentation program

Even teams at Microsoft, LinkedIn, and Bing bake A/A tests into their ongoing processes.

It’s that important.
