Many people define a hypothesis as an “educated guess”.
To be more precise, a properly constructed hypothesis predicts a possible outcome of an experiment or test in which one variable (the independent variable) is modified and the impact is measured by the change in behavior of another variable (the dependent variable).
A hypothesis should be specific (it should clearly define what is being altered and what the expected impact is), data-driven (the changes being made to the independent variable should be based on historical data or theories that have been proven in the past), and testable (it should be possible to conduct the proposed test in a controlled environment to establish the relationship between the variables involved, and to disprove the hypothesis should it be untrue).
What is the Cost of a Hastily Assembled Hypothesis?
According to an analysis of over 28,000 tests run using the Convert Experiences platform, only 1 in 5 tests proves to be statistically significant.
While more and more debate is opening up around sticking to the concept of 95% statistical significance, it is still a valid rule of thumb for optimizers who do not want to get into the fray with peeking vs. no peeking, and custom stopping rules for experiments.
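To make the 95% rule of thumb concrete, here is a minimal sketch of a two-sided, two-proportion z-test in Python. It assumes a fixed sample size decided in advance (no peeking), and the function names and visitor counts are hypothetical illustrations, not the method used by the Convert Experiences platform.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


def is_significant(p_value: float, confidence: float = 0.95) -> bool:
    """True when the p-value clears the chosen confidence threshold."""
    return p_value < 1 - confidence


# Hypothetical counts: 200 of 5,000 visitors convert on the original,
# 260 of 5,000 convert on the variation.
p = two_proportion_p_value(200, 5000, 260, 5000)
print(f"p-value: {p:.4f}, significant at 95%: {is_significant(p)}")
```

The check is only meaningful if it is run once, at the sample size planned before the test started. Running it repeatedly while data trickles in is exactly the peeking problem mentioned above.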
There might be a multitude of reasons why a test does not reach statistical significance. But framing a tenable hypothesis that already proves itself logistically feasible on paper is a better starting point than a hastily assembled assumption.
Moreover, the aim of an A/B test may be to extract a learning, but some learnings come with heavy costs: a 26% decrease in conversion rates, to be specific.
A robust hypothesis may not be the answer to all testing woes, but it does help prioritise possible solutions and leads testing teams to pick the low-hanging fruit first.
How is an A/B Testing Hypothesis Different?
An A/B test should be treated with the same rigour as tests conducted in laboratories. That is an easy way to guarantee better hypotheses, more relevant experiments, and ultimately more profitable optimization programs.
The focus of an A/B test should be on first extracting a learning, and then monetizing it in the form of increased registration completions, better cart conversions and more revenue.
If that is true, then an A/B test hypothesis is not very different from a regular scientific hypothesis, with a couple of interesting points to note:
Most scientific hypotheses proceed with one independent variable and one dependent variable, for the sake of simplicity. But in A/B tests, changes might be made to several independent variables at the same time. Under such circumstances it is good to explore the relationship between the independent variables to make sure that they do not inadvertently impact one another. For example, changing both the value proposition and the button copy of a landing page to determine improvement in click-through or completion rates is tricky. The visitor's decision to click the button could easily have been influenced by the value proposition (as in a strong hook and heading). So what caused the improvement in the dependent variable? Was it the change to the first element or the second one?
The concept of Operational Definition is non-negotiable in most laboratory experiments, and it comes baked in with the question of ethics or morality. An Operational Definition is the specific process that will be used to quantify the change in the value or behavior of the dependent variable in the test. As an example, if a test wishes to measure the level of frustration that subjects experience when they are exposed to certain stimuli, researchers must be careful to define exactly how they will measure the output, that is, the frustration. Should they allow the test subjects to act out, in which case they may hurt or harm other individuals? Or should they use a non-invasive technique like an fMRI scan to monitor brain activity and collect the needed data? In A/B tests, however, data is collected through relatively impersonal channels like analytics dashboards, so generally little thought is spared for Operational Definition and the impact of A/B testing on the human subjects (site traffic in this case).
The 5 Essential Parts of an A/B Testing Hypothesis
A robust A/B testing hypothesis should be assembled in 5 key parts:
1. OBSERVATION
This includes a clear outline of the problem (the unexplained phenomenon) observed and what it entails. This section should be completely free of conjecture and rely solely on good quality data (qualitative, quantitative, or both) to bring a potential area of improvement to light. It also includes a mention of the way in which the data was collected.
Proper observation ensures a credible hypothesis that is easy to “defend” later down the line.
2. EXECUTION
This is the where, the what, and the who of the A/B test. It specifies the change(s) you will be making to site element(s) in an attempt to solve the problem that has been outlined under “OBSERVATION”. It also clearly defines the segment of site traffic that will be exposed to the experiment.
Proper execution guidelines set the rhythm for the A/B test. They define how easy or difficult it will be to deploy the test and thus aid hypothesis prioritization.
3. OUTCOME
This is where you make your educated guess or informed prediction. Based on a diligently identified OBSERVATION and EXECUTION guidelines that are feasible to deploy, your OUTCOME should clearly mention two things:
The change (increase or decrease) you expect to see to the problem or the symptoms of the problem identified under OBSERVATION.
The Key Performance Indicators (KPIs) you will be monitoring to gauge whether your prediction has panned out, or not.
In general most A/B tests have one primary KPI and a couple of secondary KPIs or ways to measure impact. This is to ensure that external influences do not skew A/B test results and even if the primary KPI is compromised in some way, the secondary KPIs do a good job of indicating that the change is indeed due to the implementation of the EXECUTION guidelines, and not the result of unmonitored external factors.
4. LOGISTICS
An important part of hypothesis formulation, LOGISTICS covers what it will take to collect enough clean data from which a reliable conclusion can be drawn. How many unique visitors need to be tested? What statistical significance is desired? How many conversions are enough? And how long should the A/B test run? Each question on its own merits a blog or a lesson. But for the sake of convenience, Convert has created a Free Sample Size & A/B/N Test Duration Calculator.
Set the right logistical expectations so that you can prioritise your hypotheses for maximum impact and minimum effort.
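For a rough sense of how these logistical questions interact, the sketch below uses a standard normal-approximation formula for comparing two conversion rates. It is not the formula behind Convert's calculator. The 80% statistical power, the baseline rate, the expected lift, and the daily traffic figure are all assumptions made for illustration, as are the function names.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variant(baseline_rate: float, relative_lift: float,
                         confidence: float = 0.95, power: float = 0.80) -> int:
    """Approximate visitors needed in each variant to detect the given lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    # Critical values for a two-sided test and for the desired power.
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def duration_days(per_variant: int, variants: int, daily_visitors: int) -> int:
    """Days needed when daily traffic is split evenly across all variants."""
    return ceil(per_variant * variants / daily_visitors)


# Hypothetical inputs: 4% baseline conversion rate, 20% expected relative lift,
# one original plus one variation, 1,500 tested visitors per day.
needed = visitors_per_variant(baseline_rate=0.04, relative_lift=0.20)
print(needed, "visitors per variant,", duration_days(needed, 2, 1500), "days")
```

Halving the expected lift roughly quadruples the required traffic, which is why a hypothesis that predicts a small change on a low-traffic page can be deprioritised on paper, before any build effort is spent.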
5. INADVERTENT IMPACT
This is a nod in the direction of ethics in A/B testing and marketing, because experiments involve humans and optimizers should be aware of the possible impact on their behavior.
Often a thorough analysis at this stage can modify the way impact is measured or an experiment is conducted. At least, Convert certainly hopes that this will be the case in the future. Here’s why ethics do matter in testing.
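One way to keep all five parts together is to record each hypothesis as a small structured object. The sketch below is an illustration only: the class name, the field names, and the example values are hypothetical and do not reflect a Convert feature or schema.

```python
from dataclasses import dataclass, field


@dataclass
class ABTestHypothesis:
    # 1. OBSERVATION: the problem and how the data was collected
    observation: str
    data_source: str
    # 2. EXECUTION: the change and the traffic segment exposed to it
    execution: str
    segment: str
    # 3. OUTCOME: the expected change and the KPIs used to judge it
    expected_change: str
    primary_kpi: str
    secondary_kpis: list[str] = field(default_factory=list)
    # 4. LOGISTICS: what it takes to collect enough clean data
    required_visitors_per_variant: int = 0
    planned_duration_days: int = 0
    # 5. INADVERTENT IMPACT: possible effects on the people being tested
    inadvertent_impact: str = ""

    def missing_parts(self) -> list[str]:
        """Names of the parts that have not been filled in yet."""
        checks = {
            "OBSERVATION": bool(self.observation.strip() and self.data_source.strip()),
            "EXECUTION": bool(self.execution.strip() and self.segment.strip()),
            "OUTCOME": bool(self.expected_change.strip() and self.primary_kpi.strip()),
            "LOGISTICS": self.required_visitors_per_variant > 0
            and self.planned_duration_days > 0,
            "INADVERTENT IMPACT": bool(self.inadvertent_impact.strip()),
        }
        return [part for part, complete in checks.items() if not complete]


# Hypothetical draft that still lacks LOGISTICS and INADVERTENT IMPACT.
draft = ABTestHypothesis(
    observation="Many mobile visitors abandon the cart at the shipping step",
    data_source="Funnel report from the analytics dashboard",
    execution="Show shipping costs on the product page",
    segment="Mobile visitors",
    expected_change="Decrease in abandonment at the shipping step",
    primary_kpi="Cart conversion rate",
    secondary_kpis=["Revenue per visitor"],
)
print(draft.missing_parts())
```

A check like `missing_parts()` makes it harder for a hastily assembled hypothesis to reach the testing queue with one of the five parts left blank.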
Now Organize, Prioritise & Learn from Your Hypotheses.