Frequentist

Contributor

Silvia Guardingo

What is the Frequentist Approach in A/B Testing?

Frequentist inference is the backbone of most traditional A/B testing practices. It uses probability and statistical modeling to evaluate hypotheses, with the goal of learning about the true nature of user behavior based on a sample of data.

In this approach, you start with a general hypothesis—usually the null hypothesis that assumes no difference between variants. Then you use specific experiment results to decide whether the data provides enough evidence to reject that hypothesis.

Unlike Bayesian methods, Frequentist statistics do not assign a probability to the null hypothesis: it is treated as either true or false. The focus is on the probability of observing the data (or more extreme outcomes) if the null hypothesis were true. This is where p-values and confidence intervals come into play.

Frequentist inference is built around error control. It’s designed to limit how often you reach a wrong conclusion across repeated experiments, provide finite-sample guarantees, and deliver trustworthy conclusions without relying on subjective prior beliefs.

Key Steps in a Frequentist A/B Test

Formulate a clear hypothesis: Define exactly what change you’re testing and what effect you expect.
Identify change and metrics: Choose a primary success metric and guardrail metrics, and make sure they align with your goals.
Randomize users into variants: Ensure random allocation to avoid bias.
Run the experiment for a predetermined duration and sample size: No peeking or adjusting after you start.
Use statistical tests (like a t-test): Analyze whether differences between variants are statistically significant.
Decide based on p-values: If the p-value is lower than your pre-set threshold (usually 0.05), you reject the null hypothesis.

The entire process rests on probabilistic assumptions, such as the sampling distribution of your metric's average being approximately normal (which the Central Limit Theorem often helps ensure with larger samples).
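As a rough illustration of the analysis steps above, here is a minimal Python sketch of a two-sided two-proportion z-test for conversion rates, using only the standard library. The function name, the returned fields, and the conversion counts are made up for this example, and it assumes samples large enough for the normal approximation to be reasonable. It is one common way to run such a test, not the method of any particular platform.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test for a difference in conversion rates (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # Under the null hypothesis both variants share one rate, so pool them.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_null = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_null

    # Two-sided p-value: chance of a |z| at least this large if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # The confidence interval for the difference uses the unpooled standard error.
    se_diff = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)

    return {"diff": diff, "z": z, "p_value": p_value, "ci": ci,
            "reject_null": p_value < alpha}


# Hypothetical counts: 500 of 10,000 users convert on A, 580 of 10,000 on B.
result = two_proportion_z_test(500, 10_000, 580, 10_000)
print(result)
```

With these made-up counts the p-value falls below 0.05 and the confidence interval for the difference excludes zero, so the null hypothesis of no difference would be rejected. The sketch does not guard against degenerate inputs such as zero conversions in both variants.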

How Frequentist Differs from Bayesian

Frequentist and Bayesian methods approach data differently:

Frequentist						Bayesian
Focuses on error rates in repeated samples						Focuses on updating prior beliefs with observed data
Calculates p-values and confidence intervals						Calculates probabilities for hypotheses
Requires a fixed sample size (or a pre-planned sequential design) for valid conclusions						Can handle flexible stopping rules
Separation between data analysis and decision-making						Combines data with external information for decision-making

In short: Frequentists rely on strict rules and error control. Bayesians bring prior knowledge into the mix.

Why Use Frequentist Methods?

Error Control: Designed to limit false positives and false negatives.
Finite-Sample Guarantees: Offers reliable results with the sample you actually have, not theoretical infinite data.
Foundation for Trust: Frequentist analysis of randomized experiments is long established as the standard for scientific causal inference.
Clear, Recognized Outputs: P-values and confidence intervals are widely understood (even if sometimes misused).
Objective Inference: Separates statistical conclusions from business decisions, allowing stakeholders to weigh outcomes based on context.

Platforms like Convert Experiences provide both Frequentist and Bayesian methods for conducting A/B tests and surfacing actionable results.

Risks and Common Pitfalls

P-value Confusion: A p-value is not the probability that your hypothesis is true. It’s the probability of observing your data (or something more extreme) if the null hypothesis is true.
Underpowered Tests: Running tests with too few users increases the risk of missing real effects.
Peeking: Checking results early and stopping as soon as they look significant inflates false positive rates and invalidates your test.
Multiple Testing Problems: Testing multiple variants, metrics, or segments without adjustments makes at least one false positive increasingly likely.
Biased Samples: Poor randomization or tracking errors can break the assumptions Frequentist methods rely on.
Over-Reliance on Significance: Statistical significance doesn’t always equal practical business significance.
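To make the multiple testing problem in the list above concrete, the sketch below computes the chance of at least one false positive across several independent tests and applies the Holm step-down correction, one standard adjustment among several. The function names and the p-values are hypothetical, and the family-wise error formula assumes the tests are independent, which metrics from the same experiment often are not.

```python
def family_wise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** m


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure; returns one reject flag per p-value,
    in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the k-th smallest p-value with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every larger p-value is also not rejected
    return reject


for m in (1, 5, 10, 20):
    print(m, round(family_wise_error_rate(0.05, m), 3))

# Hypothetical p-values for four metrics in one experiment.
p_values = [0.004, 0.030, 0.020, 0.450]
print(holm_reject(p_values))
```

In this example three of the four raw p-values sit below 0.05, but only the smallest survives the Holm adjustment. At a 0.05 threshold, twenty independent tests give a better-than-even chance of at least one false positive.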

Without good experimental hygiene—randomization, adequate sample sizes, and pre-defined hypotheses—Frequentist results can easily mislead.
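The effect of peeking can be checked with a small simulation. The sketch below runs simulated A/A tests, where both arms share the same true conversion rate, so every significant result is a false positive. It compares stopping at any interim look against testing once at the planned sample size. All parameters (number of looks, users per look, the 10% rate, the seed) are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided test at alpha = 0.05


def is_significant(conv_a, conv_b, n):
    """Pooled two-proportion z-test with n users in each arm."""
    p_pool = (conv_a + conv_b) / (2 * n)
    if p_pool in (0.0, 1.0):
        return False  # no variation yet, nothing to test
    se = sqrt(p_pool * (1 - p_pool) * 2 / n)
    return abs((conv_b - conv_a) / n) / se > Z_CRIT


def simulate_aa_tests(n_sims=1000, looks=5, users_per_look=100,
                      rate=0.10, seed=42):
    """Return (false positive rate with peeking, rate at the final look)."""
    rng = random.Random(seed)
    peeking_hits = 0   # significant at ANY look
    fixed_hits = 0     # significant at the final, planned look
    for _ in range(n_sims):
        conv_a = conv_b = n = 0
        any_hit = False
        for _ in range(looks):
            # Both arms draw from the same rate: there is no real effect.
            conv_a += sum(rng.random() < rate for _ in range(users_per_look))
            conv_b += sum(rng.random() < rate for _ in range(users_per_look))
            n += users_per_look
            hit = is_significant(conv_a, conv_b, n)
            any_hit = any_hit or hit
        peeking_hits += any_hit
        fixed_hits += hit  # result of the last look only
    return peeking_hits / n_sims, fixed_hits / n_sims


peek_rate, fixed_rate = simulate_aa_tests()
print(f"false positives with peeking: {peek_rate:.3f}")
print(f"false positives at fixed horizon: {fixed_rate:.3f}")
```

The fixed-horizon rate should land near the nominal 5%, while the any-look rate is typically well above it, and the gap grows with the number of looks.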

Best Practices for Frequentist A/B Testing

Define your hypothesis before starting.
Define the minimum detectable effect (MDE) and calculate the required sample size upfront.
Stick to a fixed test duration and sample size unless using a proper sequential design.
Avoid early stopping unless you’ve planned for it statistically.
Correct for multiple comparisons if you’re testing lots of variants or metrics.
Use A/A tests regularly to validate your platform and randomization.
Rely on experienced analysts to review test design and interpretation.
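For the sample size step in the list above, a common normal-approximation formula for comparing two conversion rates can be sketched as follows. The function name and the example inputs (a 5% baseline and a 1 percentage point absolute MDE) are assumptions for illustration, and dedicated calculators may use slightly different formulas and give somewhat different numbers.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed in EACH variant for a two-sided test.

    baseline_rate: expected conversion rate of the control (e.g. 0.05)
    mde_abs: minimum detectable effect as an absolute difference (e.g. 0.01)
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance level
    z_power = NormalDist().inv_cdf(power)          # statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / mde_abs ** 2)


# Hypothetical inputs: 5% baseline, detect a lift to 6%.
n_needed = sample_size_per_variant(0.05, 0.01)
print(n_needed)

# Halving the MDE roughly quadruples the required sample.
print(sample_size_per_variant(0.05, 0.005))
```

With these inputs the requirement comes out at roughly eight thousand users per variant. Because the sample size scales with the inverse square of the MDE, small target effects get expensive quickly.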

Always contextualize results with business goals; statistical significance is only part of the story.

“When using frequentist methods, the first step is to determine the required sample size. To calculate it, we need to define the statistical power, significance level, and Minimum Detectable Effect (MDE).

The frequentist approach is more rigid than the Bayesian approach. You must wait until the predetermined sample size is reached before calculating the p-value. Based on the p-value, the null hypothesis is either rejected or not. Usually, we reject the null hypothesis if the p-value is equal to or less than 0.05.

The interpretation of the p-value is confusing for many people. It is not the probability that B is better than A (this is Bayesian), but rather the probability of obtaining the observed data or more extreme results if the null hypothesis were true.”

Silvia Guardingo, CRO Technical Lead at Garaje de ideas
