Confidence-graded A/B testing

Winners are decided by replies.
Not opens.

Most A/B tests call a winner off a blip in open rate, days too early. This one waits for real statistical confidence on what actually matters — genuine positive replies — then promotes the winner automatically.

No guessing. No calling it early because someone's impatient. Just a confident answer, once there's actually enough data to have one.


Most A/B tests are open-rate theater.

A subject line that gets opened more isn't a winner — it's just curious. The only test that matters is whether the email underneath actually gets a real reply.

Called too early is how most A/B tools decide

One subject line edges ahead in opens after 20 sends, and a winner gets crowned on noise.

Curiosity, not interest is what open rate actually measures

A clickbait subject line on a weak offer still gets opened — and still gets no replies.

No real confidence behind most "statistically significant" calls

A binary yes/no significance test on a small sample tells you far less than the label suggests.
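To put a rough number on "crowned on noise": the sketch below assumes two subject lines with exactly the same true open rate (the 40% figure is made up for illustration, and `prob_gap_by_chance` is a hypothetical name, not part of Compass). It computes how often one of them still ends up ahead by a given number of opens through chance alone.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k opens out of n sends at open rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_gap_by_chance(sends, true_rate, gap_opens):
    """Chance that two identical variants differ by at least gap_opens opens."""
    pmf = [binom_pmf(k, sends, true_rate) for k in range(sends + 1)]
    return sum(
        pmf[x] * pmf[y]
        for x in range(sends + 1)
        for y in range(sends + 1)
        if abs(x - y) >= gap_opens
    )


# Assumed 40% open rate for both variants; a 15-point gap is
# 3 opens at 20 sends per side and 30 opens at 200 sends per side.
small_sample = prob_gap_by_chance(sends=20, true_rate=0.4, gap_opens=3)
large_sample = prob_gap_by_chance(sends=200, true_rate=0.4, gap_opens=30)
print(f"20 sends per side:  {small_sample:.3f}")
print(f"200 sends per side: {large_sample:.3f}")
```

At 20 sends per side, a 15-point lead takes only 3 extra opens, so identical variants produce it often. At 200 per side the same lead takes 30 extra opens, which chance alone rarely delivers.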

How the testing engine actually works.

No manual tracking, no spreadsheet, no deciding when to call it. It runs itself, with discipline.

STEP 01

Real positive replies decide the winner — not opens

What: Every reply is classified. Only genuine interest and meeting requests count toward a "win." Opens and clicks never do.

Why it matters: A subject line can get opened by everyone and still sit on a dead offer. Replies are the only signal that someone actually cared.
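As a minimal sketch of that counting rule, assuming replies have already been classified upstream: the label names below are hypothetical stand-ins, since the page only says that genuine interest and meeting requests count.

```python
# Hypothetical label names; only these two kinds of reply count as a win.
POSITIVE_LABELS = {"interested", "meeting_request"}


def count_wins(reply_labels):
    """Count the replies that qualify as a win; every other label is ignored."""
    return sum(1 for label in reply_labels if label in POSITIVE_LABELS)


labels = [
    "interested",
    "out_of_office",
    "not_interested",
    "meeting_request",
    "unsubscribe",
]
print(count_wins(labels))  # 2
```

Opens and clicks never enter the function at all, so they cannot move the result.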

STEP 02

It waits for real volume: at least 100 sends per side and at least 5 days

What: No variant gets crowned off 20 sends and a lucky morning. The test runs until both arms clear a minimum send count and a minimum number of days.

Why it matters: Calling a winner early is the single most common A/B testing mistake. Noise looks like a pattern until you have enough data to tell the difference.
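The gate itself is simple enough to sketch. The two minimums come from this page; the `Arm` type and `ready_to_evaluate` function are illustrative names, not Compass internals.

```python
from dataclasses import dataclass

MIN_SENDS_PER_ARM = 100  # minimum sends per side, as stated on this page
MIN_DAYS = 5             # minimum test duration, as stated on this page


@dataclass
class Arm:
    sends: int
    positive_replies: int


def ready_to_evaluate(a: Arm, b: Arm, days_running: int) -> bool:
    """Both arms must clear the send minimum AND the test must be old enough."""
    enough_sends = min(a.sends, b.sends) >= MIN_SENDS_PER_ARM
    enough_days = days_running >= MIN_DAYS
    return enough_sends and enough_days


print(ready_to_evaluate(Arm(20, 2), Arm(20, 1), days_running=1))    # False
print(ready_to_evaluate(Arm(150, 9), Arm(100, 4), days_running=5))  # True
```

Taking the smaller of the two send counts means a fast arm cannot carry a slow one over the line.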

STEP 03

Confidence, not a coin-flip significance test

What: Instead of a binary yes/no significance check, it tracks a Bayesian confidence read — how sure the data makes it that B genuinely beats A — and only acts once that certainty is high.

Why it matters: A test that's 51% likely to be right isn't a result. One the engine is genuinely confident about is.
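One common way to get that kind of read is a Beta-Binomial model with Monte Carlo sampling. The sketch below is an assumption about the general technique, not a description of the engine's actual model: the uniform Beta(1, 1) prior, the draw count, and the name `prob_b_beats_a` are all illustrative.

```python
import random


def prob_b_beats_a(wins_a, sends_a, wins_b, sends_b, draws=20000, seed=0):
    """Estimate P(true reply rate of B > true reply rate of A).

    Assumes a uniform Beta(1, 1) prior on each arm's positive-reply rate,
    so each posterior is Beta(1 + wins, 1 + sends - wins).
    """
    rng = random.Random(seed)
    b_better = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + wins_a, 1 + sends_a - wins_a)
        rate_b = rng.betavariate(1 + wins_b, 1 + sends_b - wins_b)
        if rate_b > rate_a:
            b_better += 1
    return b_better / draws


# Identical results on both sides: close to a coin flip, not a finding.
print(prob_b_beats_a(10, 200, 10, 200))
# A large gap in positive replies on the same volume: close to certainty.
print(prob_b_beats_a(4, 200, 20, 200))
```

The output is a probability rather than a pass/fail verdict, which is what lets the decision wait until that number is high instead of merely above half.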

STEP 04

The winner gets promoted — and the lesson sticks

What: Once confident, the winning copy is promoted automatically and the losing variant retires. The result is saved as a lesson for future campaigns, and the engine keeps watching the winner afterward in case performance drifts.

Why it matters: A test result that disappears the moment the test ends has no compounding value. This one doesn't disappear.
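A rough sketch of the promote-then-watch logic follows. Every number in it is a hypothetical placeholder (the 0.95 threshold, the 50% drift tolerance, the 100-send minimum for a drift check), since this page does not publish the engine's actual settings.

```python
def decide(p_b_beats_a, threshold=0.95):
    """Turn a confidence read into an action. The threshold is an assumption."""
    if p_b_beats_a >= threshold:
        return "promote_b"
    if p_b_beats_a <= 1 - threshold:
        return "promote_a"
    return "keep_testing"


def has_drifted(baseline_rate, recent_wins, recent_sends,
                tolerance=0.5, min_recent_sends=100):
    """Flag the promoted winner if its recent positive-reply rate falls
    below a fraction (tolerance) of the rate it had when it won."""
    if recent_sends < min_recent_sends:
        return False  # too little recent data to judge either way
    return recent_wins / recent_sends < baseline_rate * tolerance


print(decide(0.97))                                        # promote_b
print(decide(0.60))                                        # keep_testing
print(has_drifted(0.10, recent_wins=3, recent_sends=100))  # True
```

The drift check refuses to fire on a thin recent sample, for the same reason the original test refuses to call a winner early.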

This testing engine vs. typical ESP A/B testing vs. a spreadsheet.

Capability	Typical ESP A/B	Manual / spreadsheet
Winner decided by positive replies, not opens	×	if you track it yourself
Waits for real statistical confidence	sometimes	×
Auto-promotes the winner	×	×
Watches the winner after the call for drift	×	×
Feeds the result into future campaigns	×	×

Every test result connects to the rest of Compass.

Compass Brain

Every confident test result gets written back as a lesson, so the next campaign doesn't have to re-learn what already won.

Fix It with AI

Made a Fix It change and not sure it actually helped? Test it against your current copy and let real replies settle it.

Questions about A/B Testing

How long does a test actually take?

Both minimums have to be met: at least 100 sends per variant and at least 5 days, so the test lasts as long as the slower of the two takes. Below that, a result is noise dressed up as a finding. Higher-volume campaigns clear the bar faster and lower-volume ones take longer; the engine doesn't shortcut the wait just because someone's curious.
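For a quick estimate of that floor, here is a small sketch. `estimated_min_days` is an illustrative helper that assumes a steady daily send rate per variant. It gives the earliest a test could be evaluated, not the day a winner is guaranteed, since the confidence bar still has to be cleared.

```python
import math


def estimated_min_days(daily_sends_per_variant, min_sends=100, min_days=5):
    """Earliest day both minimums are met, assuming a steady daily send rate."""
    if daily_sends_per_variant <= 0:
        raise ValueError("daily_sends_per_variant must be positive")
    days_for_volume = math.ceil(min_sends / daily_sends_per_variant)
    return max(min_days, days_for_volume)


print(estimated_min_days(50))  # 5  (volume is reached in 2 days, the 5-day floor wins)
print(estimated_min_days(10))  # 10 (volume is the slower condition)
```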

What happens to the losing variant?

It's retired in favor of the winner once the engine is confident, but it's not wasted — the result gets recorded as a lesson, so the same losing approach isn't quietly tried again in a future campaign.

Does open rate matter at all, if it's not the win metric?

Yes — it's tracked as a deliverability smoke test (a subject line that nobody opens usually means a spam or inbox-placement problem, not a copy problem). It's just never the metric that decides a winner. That's reserved for real positive replies.

Which parts of my campaign can be tested?

Any step in your sequence — subject lines, body copy, calls to action. Compass surfaces a test only once there's enough lead volume in that sub-campaign to make the result meaningful, rather than testing for the sake of testing.

Concierge beta available

Let real replies decide. Not guesses.

A/B testing comes with every Compass campaign — no separate setup, no extra toggle to find.

Book a free setup call · Get started

Starter $99/mo · Pro $199/mo · AI included · No free plan
