Experimentation

Our experimentation roadmap

Experimentation is the gold standard for predicting the impact of a new idea, whether it's a new feature or a change to your user experience, infrastructure, operations, or predictive models. A/B testing, in particular, seems ubiquitous in online-first companies, but surprisingly few companies experiment systematically and even fewer do it well.[1]

We believe good experimentation is rare because it requires buy-in from, and close collaboration among, many teams: engineering, data engineering, data science & analytics, and relevant business units (design, CRM, product, marketing, etc).

Many references miss this. Methodology textbooks pay little attention to making experiments work in practice.[2] Popular experiment platforms' educational material tends to the other extreme, glossing over statistical and engineering details to appeal to non-technical customers.[3][4][5]

Data scientists are often in an ideal position to bridge the gap because our work tends to put us in close communication with lots of other teams anyway. Experimentation can be hard work, and raising the bar for a whole organization is even harder. We hope that by sharing some of our hard-earned knowledge in this area we can make the challenge a little more approachable.

The Experiment Process

[Figure: experiment process schematic]

At a very high level, every experiment has four main phases, all of which rest on a fifth component: the experimentation platform.

  1. Design and review. Assuming you have an idea in mind to test, the first step is to write an experiment design and specification document, which we sometimes call the experiment flight plan. This document defines what we want to learn, how we plan to learn it, and why we care. It is the source of truth for the experiment and a living document that should be shared among all responsible teams. There are often several iterations of planning and review within this stage.

  2. Implementation. The new ideas to be tested need to be implemented, which may involve engineering, design, and whatever business unit (operations, product, marketing, customer support, etc) owns the domain. Data engineering must ensure that the observational units (e.g. users) are assigned to variants as planned, that metrics are instrumented properly, and that all data will make its way to the data warehouse. (A minimal assignment sketch follows this list.)

  3. Execution. The treatments under test are applied to subjects, and metrics are recorded. Both optimization and guardrail metrics (like website loading speed) are monitored closely for signs of trouble, and the experiment may be halted if necessary. (A crude guardrail check is sketched after this list.)

  4. Analysis and decision-making. The recorded data is joined, cleaned, and analyzed according to the experiment flight plan. Ultimately, the purpose of an experiment is to make a go/no-go decision on whether to launch the new idea more broadly, informed by the experiment analysis. This stage, too, often cycles between analysis and decision-making. (A minimal analysis example follows this list.)

  5. The experiment platform. All four phases build on this platform, whether it's a unified off-the-shelf product or a collection of general-purpose tools (e.g. documents for planning, ticketing for engineering requests, notebooks and scripts for analysis, etc). A good platform increases experiment speed and reliability by standardizing and templatizing processes, automating common statistical designs and analyses, and providing a central repository for experiment results.
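
To make the assignment step in phase 2 concrete, a common technique is to hash each unit's ID together with an experiment-specific salt: assignment is then deterministic (a returning user always sees the same variant), approximately uniform, and independent across experiments. The sketch below is a minimal Python version under those assumptions; the function and parameter names are ours, not any particular platform's.

```python
import hashlib

def assign_variant(user_id: str, experiment_name: str, weights: dict[str, float]) -> str:
    """Deterministically assign a user to a variant.

    Hashing user_id together with an experiment-specific salt gives every
    user a stable, uniformly distributed point in [0, 1), which is then
    mapped onto the variants' traffic weights.
    """
    key = f"{experiment_name}:{user_id}".encode()
    # First 8 bytes of the hash as an integer, scaled into [0, 1).
    point = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") / 2**64
    cumulative = 0.0
    for variant, weight in weights.items():
        cumulative += weight
        if point < cumulative:
            return variant
    return variant  # guard against floating-point rounding in the weights

# A 50/50 split: the same user always lands in the same variant.
print(assign_variant("user-123", "checkout-redesign-v1",
                     {"control": 0.5, "treatment": 0.5}))
```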
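
For the monitoring in phase 3, here is a deliberately crude guardrail check on page load times: it flags a breach only when the lower confidence bound on the degradation already exceeds a pre-registered tolerance, so noise alone is unlikely to halt the experiment. Note that repeatedly peeking at a test like this inflates false alarms; a real monitoring rule should use a sequential correction (e.g. alpha-spending), which belongs in the flight plan. All names and thresholds here are illustrative.

```python
import numpy as np

def guardrail_breached(control_ms: np.ndarray, treatment_ms: np.ndarray,
                       max_degradation_ms: float = 200.0) -> bool:
    """Flag a guardrail breach on page load times.

    Breaches only when the lower 95% confidence bound on the increase in
    mean load time (treatment minus control) exceeds the tolerance.
    """
    diff = treatment_ms.mean() - control_ms.mean()
    se = np.sqrt(treatment_ms.var(ddof=1) / len(treatment_ms)
                 + control_ms.var(ddof=1) / len(control_ms))
    return diff - 1.96 * se > max_degradation_ms

# Simulated load times in milliseconds: a ~300 ms regression.
rng = np.random.default_rng(0)
control = rng.normal(1200, 300, size=5000)
treatment = rng.normal(1500, 300, size=5000)
print(guardrail_breached(control, treatment))  # True
```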
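
And for phase 4, assuming a simple two-variant test on a binary conversion metric, the final comparison can be as small as a two-proportion z-test. The counts below are invented for illustration.

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts pulled from the warehouse, per the flight plan.
conversions = np.array([412, 467])      # [control, treatment]
exposures = np.array([10_000, 10_021])  # units assigned to each variant

z_stat, p_value = proportions_ztest(conversions, exposures)
lift = conversions[1] / exposures[1] - conversions[0] / exposures[0]
print(f"absolute lift: {lift:.4f}, z = {z_stat:.2f}, p = {p_value:.3f}")
```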

Our roadmap

To start, we have three pieces in the pipeline:

  • An experimentation Rosetta Stone. One of the first challenges with experimentation is that each community and organization uses different terminology. We are compiling a list to help translate.

  • The Experiment Flight Plan in detail. A checklist for things to think about when designing an experiment.

  • A comparison of experimentation platforms, including Optimizely, VWO, and the pros and cons of the DIY approach.

Experimentation is a deep and complex topic, and we have learned enough to know there's always more to learn. Please reach out if you spot errors or omissions, or if you have suggestions.

Notes and References

  1. Stefan Thomke, 2020. Building a Culture of Experimentation. Harvard Business Review.

  2. Kohavi, Tang, and Xu's Trustworthy Online Controlled Experiments—which we recommended in late February—is a wonderful exception that does balance statistical methodology with organizational and cultural requirements. It is the exception that proves the rule, in our opinion.

  3. VWO. A/B Testing Guide.

  4. Optimizely. Create a basic experiment plan.

  5. Optimizely. Create an advanced experiment plan and QA checklist.