A/B TEST SIMULATION IN R


last hacked on Jan 27, 2019

We did this project to learn about A/B testing and to be friends on a Sunday afternoon.

From [Wikipedia](https://en.wikipedia.org/wiki/A/B_testing):

> A/B testing (or split-testing) is a randomized experiment with two variants `A` and `B`. It includes application of statistical hypothesis testing (or two-sample hypothesis testing), as used in the field of statistics. A/B testing is a way to compare two versions of a single variable, typically by testing a subject's response to variant A against variant B, and determining which of the two variants is more effective.

A/B testing can be powerful for determining whether (given a specific success metric) it's worth adding a product feature, implementing a new workflow, or making other changes.

Steps:

1. What are we testing?
2. Implementation

Install and load packages

# install.packages("dplyr")
# install.packages("tibble")
library(dplyr)
library(tibble)

Set seed for experiment replicability

# set seed for experiment replicability
set.seed(100)

Sample size calculation

When we do a sample size calculation with `power.prop.test()`, we input the baseline completion proportion (`p1 = 0.7`), the treatment proportion we hope to detect (`p2 = 0.75`), and the desired statistical power (`power = 0.8`); the significance level defaults to `sig.level = 0.05`.

# calculate our sample size in group_a and group_b
power.prop.test(p1 = 0.7, p2 = 0.75, power = 0.8)
Two-sample comparison of proportions power calculation 

         n = 1250.717
        p1 = 0.7
        p2 = 0.75
 sig.level = 0.05
     power = 0.8
alternative = two.sided

NOTE: n is number in *each* group

From this we learn that each group needs a sample of at least 1251 users (n = 1250.717, rounded up). Our mocked groups of 1,500 users each comfortably clear that bar.
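As a side check (not part of the original analysis), `power.prop.test()` also shows how quickly the required sample size grows as the lift we want to detect shrinks; a small sketch assuming the same 0.7 baseline and 0.8 power:

```r
# Sketch: required per-group n for progressively smaller lifts over the
# p1 = 0.7 baseline, holding power at 0.8 (sig.level defaults to 0.05).
lifts <- c(0.05, 0.03, 0.02)
n_per_group <- sapply(lifts, function(lift) {
  ceiling(power.prop.test(p1 = 0.7, p2 = 0.7 + lift, power = 0.8)$n)
})
data.frame(lift = lifts, n_per_group = n_per_group)
```

Halving the detectable lift roughly quadruples the sample each group needs, which is why small effects are expensive to test.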

Mocking data for control group

# mocking data for group_a: control group
group_a <- tibble(
  user_id = seq(1, 3000, by = 2),
  test_group = rep(c("a"), 1500),
  completed_assessment = sample(
    c("completed", "not_completed"),
    1500,
    replace = TRUE,
    prob = c(0.7, 0.3)
  ),
  overall_score = rnorm(1500, mean = 100, sd = 15)
)

Mocking data for treatment group

# mocking data for group_b: treatment group
group_b <- tibble(
  user_id = seq(2, 3000, by = 2),
  test_group = rep(c("b"), 1500),
  completed_assessment = sample(
    c("completed", "not_completed"),
    1500,
    replace = TRUE,
    prob = c(0.65, 0.35)
  ),
  overall_score = rnorm(1500, mean = 98, sd = 12)
)
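As a quick sanity check (an extra step, not in the original write-up), we can confirm that `sample()` with these `prob` weights actually produces completion rates near the targets; the exact counts depend on the seed and R's sampling implementation:

```r
set.seed(100)
# Draw completion flags with the same probabilities used for the two groups:
# 0.70 "completed" for the control, 0.65 for the treatment.
a_completed <- sample(c("completed", "not_completed"), 1500,
                      replace = TRUE, prob = c(0.70, 0.30))
b_completed <- sample(c("completed", "not_completed"), 1500,
                      replace = TRUE, prob = c(0.65, 0.35))
mean(a_completed == "completed")  # lands near 0.70
mean(b_completed == "completed")  # lands near 0.65
```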

Exploring the control group data

glimpse(group_a)
Observations: 1,500
Variables: 4
$ user_id              <dbl> 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57,...
$ test_group           <chr> "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", ...
$ completed_assessment <chr> "completed", "completed", "completed", "completed", "completed", "completed", "not_completed", "completed", "c...
$ overall_score        <dbl> 99.16123, 94.41932, 107.11450, 99.07443, 68.08466, 84.96644, 92.23293, 133.15315, 104.87477, 80.84787, 131.179...
group_a %>% 
  count(completed_assessment)
# A tibble: 2 x 2
  completed_assessment     n
  <chr>                <int>
1 completed             1024
2 not_completed          476

Exploring the experiment group data

glimpse(group_b)
Observations: 1,500
Variables: 4
$ user_id              <dbl> 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58...
$ test_group           <chr> "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", ...
$ completed_assessment <chr> "not_completed", "completed", "completed", "completed", "completed", "completed", "completed", "completed", "c...
$ overall_score        <dbl> 97.10005, 86.63501, 98.00451, 79.70051, 99.18022, 79.68071, 82.09675, 105.24031, 96.82983, 96.36051, 109.07248...
group_b %>% 
  count(completed_assessment)
# A tibble: 2 x 2
  completed_assessment     n
  <chr>                <int>
1 completed              974
2 not_completed          526

Joining our control and experiment groups

group_a_b <- bind_rows(group_a, group_b)

Arranging data by user_id

arranged_group_a_b <- group_a_b %>% 
  arrange(user_id)
arranged_group_a_b
# A tibble: 3,000 x 4
   user_id test_group completed_assessment overall_score
     <dbl> <chr>      <chr>                        <dbl>
 1       1 a          completed                     99.2
 2       2 b          not_completed                 97.1
 3       3 a          completed                     94.4
 4       4 b          completed                     86.6
 5       5 a          completed                    107. 
 6       6 b          completed                     98.0
 7       7 a          completed                     99.1
 8       8 b          completed                     79.7
 9       9 a          completed                     68.1
10      10 b          completed                     99.2
# ... with 2,990 more rows

Generating table of completions by test group

prop_table <- table(
  arranged_group_a_b$test_group,
  arranged_group_a_b$completed_assessment
)
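Before handing `prop_table` to `prop.test()`, it's worth eyeballing the per-group completion rates with `prop.table()`; a sketch rebuilding the table from the counts seen above (1024/476 for `a`, 974/526 for `b`):

```r
# Rebuild the contingency table from the counts above and convert each row
# to proportions; margin = 1 normalizes across rows (one row per group).
prop_table <- matrix(c(1024, 974, 476, 526), nrow = 2,
                     dimnames = list(test_group = c("a", "b"),
                                     completed = c("completed", "not_completed")))
prop.table(prop_table, margin = 1)
```

The row proportions (0.683 for `a`, 0.649 for `b`) are exactly the "sample estimates" that `prop.test()` reports below.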

Two-sample proportion test

prop.test(prop_table)
    2-sample test for equality of proportions with continuity correction

data:  prop_table
X-squared = 3.5979, df = 1, p-value = 0.05785
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.00106645  0.06773312
sample estimates:
   prop 1    prop 2 
0.6826667 0.6493333 

According to this p-value of 0.05785, we do not have enough evidence to reject the null hypothesis (H0) that the two groups complete the assessment at the same rate. We reject when the p-value < 0.05, and 0.05785 evidently is not below that threshold, so the observed difference in completion rates (68.3% vs. 64.9%) could plausibly be due to chance.
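The mocked data also carries an `overall_score` column that the proportion test never touches. If the success metric were the mean score instead of the completion rate, the analogous comparison would be a two-sample t-test; a sketch regenerating scores with the same parameters used in the mocks (not bit-identical to the tibbles above, since the RNG state differs):

```r
set.seed(100)
# Scores drawn with the parameters used when mocking the two groups
score_a <- rnorm(1500, mean = 100, sd = 15)  # control
score_b <- rnorm(1500, mean = 98, sd = 12)   # treatment
# Welch two-sample t-test: is the difference in mean scores significant?
t.test(score_a, score_b)
```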


script.R (all code):

# install.packages("tidyverse")
library(tibble)
library(dplyr)

# set seed for experiment replicability
set.seed(100)

# calculate our sample size in group_a and group_b
power.prop.test(p1 = 0.7, p2 = 0.75, power = 0.8)

# mocking data for group_a: control group
group_a <- tibble(
  user_id = seq(1, 3000, by = 2),
  test_group = rep(c("a"), 1500),
  completed_assessment = sample(
    c("completed", "not_completed"),
    1500,
    replace = TRUE,
    prob = c(0.7, 0.3)
  ),
  overall_score = rnorm(1500, mean = 100, sd = 15)
)

# mocking data for group_b: treatment group
group_b <- tibble(
  user_id = seq(2, 3000, by = 2),
  test_group = rep(c("b"), 1500),
  completed_assessment = sample(
    c("completed", "not_completed"),
    1500,
    replace = TRUE,
    prob = c(0.65, 0.35)
  ),
  overall_score = rnorm(1500, mean = 98, sd = 12)
)


# exploring group_a: the control
glimpse(group_a)
group_a %>% 
  count(completed_assessment)


# exploring group_b: the experiment
glimpse(group_b)
group_b %>% 
  count(completed_assessment)

# joining our control and experiment groups
group_a_b <- bind_rows(group_a, group_b)

# arranging data by user_id
arranged_group_a_b <- group_a_b %>% 
  arrange(user_id)

# preview of new working dataset
arranged_group_a_b

# generating table of completions by test group
prop_table <- table(
  arranged_group_a_b$test_group,
  arranged_group_a_b$completed_assessment
)

# two-sample proportion test
prop.test(prop_table)

