Experimentation, aka A/B testing, is the gold standard for reliably predicting the impact of proposed changes. Getting it right is fiendishly hard; we're here to help.

How to analyze a staged rollout experiment

Recently we argued that confidence intervals are a poor choice for analyzing staged rollout experiments. In this article, we show a better way: a Bayesian approach that gives decision-makers the answers they really need. Check out our interactive Streamlit app first, then read the article for the details.

No, your confidence interval is not a worst-case analysis

Confidence intervals are one of the most misunderstood concepts in statistics. Common sense says the lower bound of a confidence interval is a good estimate of the worst-case outcome, but the definition of the confidence interval doesn't allow us to make this claim. Or does it? Let's take a look.

Announcing ABGlossary, an experiment vocab translator

Defining terms is a key part of experimentation culture. The consequence, however, is that every community has its own experiment jargon, which makes it hard to spot patterns, let alone communicate across groups. We've created a lightweight tool called ABGlossary to help translate experiment vocab.

Choose your experiment platform wisely

To build a culture of fast, reliable, evidence-based innovation, you need an experiment platform. A good platform supports each stage of the experiment process and, when built well, becomes the beating heart of your infrastructure.

Our experimentation roadmap

Experimentation is the gold standard for predicting the impact of a new idea. Simple designs like A/B testing sound easy but are fiendishly hard to get right. Data scientists are uniquely positioned to solve this challenge and help their companies build their experimentation muscle.