Bangda Sun

Practice makes perfect

AB Testing (Udacity) Learning Notes (2)

Choosing and Characterizing Metrics.

1. Metrics Basics

1.1 Definition

Before defining metrics, first think about how you are going to use them. There are two use cases:

  1. invariant check (sanity check): these are the metrics that should not change between the experiment and control groups. They are used to make sure the experiment is running properly.
  2. evaluation: these are the metrics used to measure the effect of the change being tested.
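One way to picture an invariant check is to compare the number of users assigned to each group. The sketch below is only illustrative: it assumes a 50/50 split and a normal approximation with a 95% interval, and the function name and counts are made up for the example.

```python
import math

def invariant_count_check(n_control, n_experiment, expected_share=0.5, z=1.96):
    """Check whether the control group's share of units is near the expected share.

    Assumes a normal approximation to the binomial; z=1.96 gives a ~95% interval.
    Returns (observed_share, lower_bound, upper_bound, passed).
    """
    total = n_control + n_experiment
    # Standard error of a proportion under the expected split.
    se = math.sqrt(expected_share * (1 - expected_share) / total)
    lower = expected_share - z * se
    upper = expected_share + z * se
    observed = n_control / total
    return observed, lower, upper, lower <= observed <= upper

# Hypothetical counts of users in each group.
observed, lower, upper, passed = invariant_count_check(5020, 4980)
print(f"control share={observed:.4f}, interval=({lower:.4f}, {upper:.4f}), passed={passed}")
```

If the observed share falls outside the interval, the experiment setup is worth investigating before reading any evaluation metric.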

For evaluation, the first thing to think about is the high-level business metrics: how much revenue is made, what the market share is, how many active users there are, etc.

The overall definition can be stated in one sentence: metrics are summaries of the data. But many details need to be clarified before a metric can be calculated.

Single metric or multiple metrics?

  • single metric: can be over-optimized, meaning that focusing on one metric makes it easy to overlook other things that are moving
  • multiple metrics: harder to analyze and draw conclusions from.

1.2 Examples

For the Audacity example, each step in the customer funnel can yield plenty of metrics:

  1. Exploring site
    • number of users who view course list
    • number of users who view course details
  2. Create account
    • number of users who enroll in a course
    • number of users who finish lesson 1, 2, etc.
    • number of users who sign up for coaching
  3. Complete course
    • number of users who enroll in 2nd course
    • number of users who get jobs after completing courses
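Funnel metrics like the ones above are counts of unique users per step. As a minimal sketch, assuming an event log of (user_id, event_name) pairs (the event names and data here are hypothetical), the counts could be computed as:

```python
from collections import defaultdict

# Hypothetical event log: (user_id, event_name).
events = [
    ("u1", "view_course_list"),
    ("u1", "view_course_details"),
    ("u1", "enroll_course"),
    ("u2", "view_course_list"),
    ("u2", "view_course_list"),  # repeat views count once per user
    ("u3", "view_course_list"),
    ("u3", "view_course_details"),
]

# Hypothetical funnel steps, ordered from the top of the funnel down.
funnel_steps = ["view_course_list", "view_course_details", "enroll_course"]

def users_per_step(events, steps):
    """Count the unique users who triggered each funnel step."""
    users = defaultdict(set)
    for user_id, event_name in events:
        users[event_name].add(user_id)
    return {step: len(users[step]) for step in steps}

counts = users_per_step(events, funnel_steps)
for step in funnel_steps:
    print(step, counts[step])
```

Counting unique users rather than raw events is a choice made for this sketch; counting events or sessions would give different metrics.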

2. Collecting Data

The next step is to determine what data to look at to calculate the metrics. More specifically: what events should be counted? What filters should be applied to the data? What time frame should be used (1 day or 1 week)? How should the data be summarized? The summary should be intuitive.

2.1 Filtering and Segmentation

Filters can be based on user demographics (age, gender, etc.), country / region, language, platform, app, etc. The goal of filtering is to de-bias the data (for example, by removing abuse and fraud) and to increase the sensitivity and power of the A/B test.

Before determining whether to apply filters or not, we can calculate the metrics on slices of the data, and compare the results; we can also check day by day or week by week to spot something that looks unusual.
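A rough sketch of this slicing, assuming rows of (date, country, clicks, pageviews) and a click-per-pageview rate as the metric (both the data and the metric are made up for illustration):

```python
from collections import defaultdict

# Hypothetical rows: (date, country, clicks, pageviews).
rows = [
    ("2024-01-01", "US", 30, 300),
    ("2024-01-01", "DE", 12, 100),
    ("2024-01-02", "US", 33, 310),
    ("2024-01-02", "DE", 60, 110),
]

def rate_by(rows, key_index):
    """Compute clicks / pageviews for each value of the column at key_index."""
    totals = defaultdict(lambda: [0, 0])  # key -> [clicks, pageviews]
    for row in rows:
        totals[row[key_index]][0] += row[2]
        totals[row[key_index]][1] += row[3]
    return {key: clicks / views for key, (clicks, views) in totals.items()}

by_day = rate_by(rows, 0)      # day-by-day view
by_country = rate_by(rows, 1)  # slice by country

print(by_day)
print(by_country)
```

In this toy data the DE slice jumps on the second day, which is the kind of unusual pattern that would prompt a closer look before deciding on a filter.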

3. Summary of Metrics

The selection of metrics needs to balance sensitivity and robustness. Sensitivity means the metric picks up the changes we care about. Robustness means the metric does not move much in response to unrelated changes.
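To make the trade-off concrete, here is a small sketch comparing how the mean and the median react to a single extreme value. The load times are invented for the example; the outlier stands in for an unrelated change such as one very slow connection.

```python
import statistics

# Hypothetical page load times in seconds.
load_times = [1.0, 1.2, 1.1, 0.9, 1.3, 1.0, 1.1]
# The same data plus one extreme, unrelated observation.
with_outlier = load_times + [60.0]

mean_shift = statistics.mean(with_outlier) - statistics.mean(load_times)
median_shift = statistics.median(with_outlier) - statistics.median(load_times)

print(f"mean moves by {mean_shift:.2f}s, median moves by {median_shift:.2f}s")
```

Here the mean is not robust, since one observation moves it by several seconds, while the median barely moves. The flip side is that the median can be less sensitive when a real change only affects a small fraction of users.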

3.1 Categories of Metrics

There are 4 types of metrics:

  1. sums and counts
  2. distributional metrics (mean, median, percentiles)
  3. probabilities and rates
  4. ratios
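The four categories can be illustrated on one small dataset. This is only a sketch: the per-user records and the specific metric chosen for each category are assumptions made for the example.

```python
import statistics

# Hypothetical per-user records: (user_id, pageviews, clicks).
users = [("u1", 10, 2), ("u2", 4, 0), ("u3", 7, 1), ("u4", 3, 0), ("u5", 6, 3)]

# 1. Sum / count: total number of clicks.
total_clicks = sum(clicks for _, _, clicks in users)

# 2. Distributional metric: median pageviews per user.
median_pageviews = statistics.median([views for _, views, _ in users])

# 3. Probability: fraction of users who clicked at least once.
click_probability = sum(1 for _, _, clicks in users if clicks > 0) / len(users)

# 4. Ratio: total clicks divided by total pageviews.
clicks_per_pageview = total_clicks / sum(views for _, views, _ in users)

print(total_clicks, median_pageviews, click_probability, clicks_per_pageview)
```

Note that the probability counts each user at most once, while the ratio lets heavy users contribute more, so the two can move differently under the same change.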