The Truth Behind the “Smallest Snippet Size” Claim (And What Convert Does Differently)
2.8 KB. 13 KB. 17 KB. The snippet is lightweight, so you base your A/B testing tool choice on it, assuming a small script won’t affect page performance. Right?
In live experiments, the snippet you install is often just a loader. The real payload arrives later, sometimes running into hundreds of KB after the page renders.
In this article, we unpack how script size is actually measured in production, where common claims fall apart, and why “smallest snippet size” rarely matches real-world performance. Then we’ll tell you how we do it differently.
How We Investigated A/B Testing Script Size in Real Environments
We set out with one key question:
What is the actual payload required to run a real A/B test in production?
The goal of our investigation was to measure the true execution cost of A/B testing scripts across some leading platforms. Specifically:
- How much code is actually delivered to the browser
- How that code is loaded and executed, and
- How much of it is visible in the vendor-reported snippet size claims
Our test subjects were:
- Convert Experiences
- Mida.so
- VWO
- ABlyft
- Webtrends Optimize
- Fibr.ai
- Visually.io
- Amplitude Experiment (included for contrast as a feature flagging system)
The first step was to define what “script size” should really mean. Instead of isolating only the snippet installed on the page, we treated script size as the full execution footprint.
That means every asset required to deliver an experiment to a user was included, namely:
- Initial script payload (inline snippet or SDK)
- Additional scripts loaded at runtime
- Total bytes transferred (gzipped and uncompressed)
- Number of network requests triggered
- Timing of execution relative to page render
- Presence of dynamic loading patterns (e.g., script injection, API fetches)
With that definition in place, we inspected each implementation with a combination of tools: Browser DevTools, direct payload measurement, and code analysis.
Methodology
We maintained the same set of evaluation steps for each tool’s snippet.
Step 1: Direct measurement from production environments
We collected the tracking scripts for all tools from live customer sites, then measured each one directly with curl, capturing both gzipped transfer size and uncompressed payload to get the exact figures delivered to users.
Step 2: Code-level analysis of how scripts execute
Next, we examined what happens after the initial script loads.
We looked for patterns such as progressive injection, where additional scripts are introduced at runtime, and external API calls fetch experiment configurations or variation logic. This way, we could trace the full execution path of each tool.
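A simplified sketch of that pattern search, assuming we already have a vendor’s script source as a string. The patterns and the stub loader below are illustrative; real vendor code is minified, so a manual review still follows any automated scan.

```javascript
// Markers that suggest a script defers part of its payload to runtime.
const INJECTION_PATTERNS = [
  { name: "dynamic script injection", regex: /createElement\(\s*['"]script['"]\s*\)/ },
  { name: "runtime config fetch", regex: /fetch\(|XMLHttpRequest|\.sendBeacon\(/ },
  { name: "async loader insertion", regex: /appendChild\(|insertBefore\(/ },
];

function scanForProgressiveLoading(source) {
  return INJECTION_PATTERNS
    .filter(({ regex }) => regex.test(source))
    .map(({ name }) => name);
}

// Hypothetical stub loader, similar in shape to what several vendors ship.
const stubLoader = `
  var s = document.createElement('script');
  s.src = 'https://cdn.example-vendor.com/config.js';
  document.head.appendChild(s);
`;

const found = scanForProgressiveLoading(stubLoader);
console.log(found);
```

A non-empty result is a signal that the installed snippet is only an entry point, and that more code arrives after it executes.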
Step 3: Measuring runtime dependencies and total payload
Using browser DevTools, we captured the full network waterfall triggered by each testing tool. This included secondary scripts, configuration files, and any dynamically injected resources needed to run experiments.
This measurement gave us the total payload required to execute an experiment.
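The tally itself can be sketched as a small helper over Resource Timing entries. In the browser console you would pass `performance.getEntriesByType('resource')`; the host pattern and mock entries below are hypothetical stand-ins so the function is runnable anywhere.

```javascript
// Sum the transfer size of every request matching a vendor's domains.
function totalVendorPayload(entries, hostPattern) {
  const matched = entries.filter((e) => hostPattern.test(e.name));
  return {
    requests: matched.length,
    transferBytes: matched.reduce((sum, e) => sum + (e.transferSize || 0), 0),
  };
}

// Mock Resource Timing entries for a stub + API-config style tool.
const mockEntries = [
  { name: "https://cdn.vendor.example/stub.js", transferSize: 2800 },
  { name: "https://cdn.vendor.example/library.js", transferSize: 120000 },
  { name: "https://api.vendor.example/config", transferSize: 45000 },
  { name: "https://example.com/app.js", transferSize: 300000 }, // first-party, excluded
];

const result = totalVendorPayload(mockEntries, /vendor\.example/);
console.log(result); // { requests: 3, transferBytes: 167800 }
```

Note how the 2.8KB stub is the only number a snippet-size claim would report, while the total across all three requests is what the browser actually pays for.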
Step 4: Architectural and trade-off analysis
Each platform delivered experiments quite differently, so we categorized them by delivery architecture. Each architecture carries its own trade-offs; more on this in the findings section.
Step 5: Validation against vendor claims and benchmarks
Finally, we lined everything up against what vendors report.
We reviewed official documentation and third-party benchmarks (including the Mida.so benchmark) and compared them against direct measurements from production environments.
Step 6: Individual competitor assessment
As a last step, we assessed each platform independently across the same dimensions:
- Reported script size versus measured payload
- Delivery architecture
- Impact on page load and Core Web Vitals
- Approach to flicker prevention and experiment timing
Findings: What Actually Loads When Experiments Run
1. Reported snippet size rarely reflects the total payload
As you probably guessed, there’s a visible gap between what many vendors claim and what actually runs:
Table 1: Advertised vs measured base SDK
| Tool | Advertised Claim | Measured Base SDK | Key Observation |
|---|---|---|---|
| Convert | ~93KB baseline | ~93KB gzipped baseline | Full payload delivered upfront. No hidden runtime fetches |
| VWO | 2.8KB stub | 14.7KB gzipped minimum (5.2× larger) | Stub excludes dynamically loaded library and campaign code |
| ABlyft | 13KB | ~32KB gzipped SDK, ~168.5KB uncompressed | Claim reflects only initial loader. Full footprint significantly larger |
| Mida.so | 17.2KB | ~19.5KB loader, 30-40KB base SDK (1.7–2.3× larger) | Progressive injection model. Runtime configs not included in claim |
| Webtrends Optimize | No clear size disclosed | ~170KB uncompressed (third-party benchmark) | Limited transparency on actual payload |
| Visually.io | 15.13KB SDK | ~15KB SDK only | Missing experiment configuration footprint |
| Fibr.ai | “Zero performance drop” | Not publicly disclosed | No measurable payload data available |
| Amplitude Experiment | “<1ms evaluation” | ~63KB uncompressed SDK | Refers to cached evaluation. Not comparable to DOM-based testing |
Table 2: Total payload required to run experiments
| Tool | Total Payload (Observed) | How It’s Delivered | Key Implication |
|---|---|---|---|
| Convert | ~95-110KB (3-5 experiments) | Single upfront request | Predictable load, no runtime injection |
| VWO | Up to ~254KB | Distributed across runtime requests | Payload hidden behind stub |
| ABlyft | Up to ~280KB+ | Inline + injected scripts | Large total footprint despite small claim |
| Mida.so | Not fully measurable (runtime configs) | API-driven config loading | Total cost deferred and opaque |
| Webtrends Optimize | ~170KB (third-party benchmark) | Likely distributed | Limited transparency |
| Visually.io | Unknown (configs not included) | Partial disclosure | Incomplete measurement |
| Fibr.ai | Not disclosed | Unknown | Cannot verify total cost |
| Amplitude Experiment | Varies (flag-based) | Lightweight decisions only | Not comparable to DOM testing |
2. Smaller initial scripts often defer the real cost
Smaller script sizes tend to rely on progressive loading. A lightweight script initializes quickly, experiment configurations are fetched from an API, variation code is loaded or injected after initial render, and additional requests appear in the network waterfall.
Payload is spread across multiple requests, so the initial script stays lightweight.
3. Architecture determines when performance impact occurs
Comparing script sizes head-to-head is often misleading, because you’re comparing systems with fundamentally different architectures:
| Architecture Type | How It Works | When Payload Arrives | Key Impact |
|---|---|---|---|
| Embedded Bundle (Convert) | All experiments included in one payload | Upfront | Predictable load, no runtime injection |
| Stub + API Config (VWO, Mida, ABlyft) | Loader fetches configs and variations | Distributed over time | Lower initial size, delayed execution |
| Feature Flagging (Amplitude) | Returns variant decisions only | Minimal upfront | Not comparable for DOM-based testing |
4. The sync vs async trade-off shows up in every tool
Each of the three architectures carries the same underlying trade-off.
Synchronous loading applies experiments before the page renders, which prevents flicker but impacts Core Web Vitals.
Asynchronous loading improves perceived performance but increases the risk of flicker, since variation changes are applied after the page renders. Anti-flicker snippets try to fix this by hiding content temporarily, but that, too, comes at a performance cost.
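One way to see why architecture matters more than snippet size is a deliberately simplified latency model: when requests are chained (loader, then config, then variation code), each step waits on the previous one. The millisecond figures below are hypothetical round-trip estimates, and the model ignores caching, parallelism, and parse time.

```javascript
// Rough model of the earliest moment variation code can apply,
// given a chain of dependent network requests (times in ms).
function earliestApplyTime(requestChainMs) {
  // Chained requests are sequential: each fetch depends on the previous one.
  return requestChainMs.reduce((total, ms) => total + ms, 0);
}

// Embedded bundle: one request delivers everything.
const embedded = earliestApplyTime([120]);

// Stub + API config: loader -> config fetch -> variation code.
const stubChain = earliestApplyTime([40, 90, 70]);

console.log(`embedded bundle: ~${embedded} ms`);
console.log(`stub + config:   ~${stubChain} ms`);
```

Even when each individual request in the chain is small and fast, the sum can land after first paint, which is exactly where flicker comes from.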
How A/B Testing Tools Mislead About Snippet Size
When you hear about snippet size, the story is often incomplete.
In their marketing, most A/B testing tool vendors refer only to the initial loader, the smallest possible representation of their system. As our investigation revealed, the real execution is handed off to subsequent requests, each with its own weight and impact on performance.
That’s why, as a rule, the smaller the advertised snippet, the more of the payload is deferred.
This changes how you compare tools by their snippet size. The “smallest script size” metric just doesn’t work.
Why the “Smallest Script Size” Claim is a Weak Metric
Script size doesn’t look past the entry point for tools with the stub + API config architecture described earlier. It only captures a single moment in the lifecycle of an experiment.
You want to look at:
1. Total payload, not initial payload: Because what matters is the total amount of code required to run your tests. If part of the payload is deferred, the system still pays for it. Just later.
2. Timing of execution: If experiment logic is available early, changes happen before the page stabilizes. If it’s available later, changes can happen after the user has already seen something else. You’ll need powerful flicker prevention to mitigate the risk of skewing your test data. While it removes the visual transition, it delays what users see, making your page feel slower to load and still impacting UX.
Note: Performance metrics like Largest Contentful Paint (LCP) and Cumulative Layout Shift (CLS) in Core Web Vitals are impacted by how much code loads and when the page settles into its final state.
Here’s a more useful way to evaluate scripts:
Instead of asking which script is smallest, ask:
- What is the total payload required to run the experiment?
- When is that payload applied in the render lifecycle?
- Does the user ever see an intermediate state?
These questions are better than comparing kilobytes, and they hold up across architectures rather than favoring one implementation style over another.
How Convert Approaches Script Delivery Differently
Instead of a lightweight loader that fetches the rest later, Convert’s script delivers the full experimentation engine and all active experiences in a single bundle.
The script contains experiment logic, targeting, and variation code. No follow-up requests to assemble the experiment. This removes dependency on network timing during execution.
At its baseline, the Convert snippet is 93KB gzipped. With typical usage (3-5 experiments), this goes up to ~95-110KB. This is available upfront, not loaded later, which also means you’re not relying heavily on flicker prevention and can count on more predictable experiment behavior.
On the flip side, the trade-offs are:
- Larger upfront payload than loader-based setups
- Sync loading can slightly impact Core Web Vitals
- Payload grows with the number of active experiments (you’ll need good housekeeping to keep things lean)
Conclusion
“Smallest script size” claims work because they’re easy to understand and compare. But the number doesn’t tell you where execution actually begins, or how much code follows.
Compare the total payload, when it’s applied, and the risk of flicker instead, and you’ll get a true picture of performance and data reliability.
Written By
Uwemedimo Usa
Edited By
Carmen Apostu

