A while ago, the popular data journalism site 538 posted a challenging probability puzzle:

On the table in front of you are two coins. They look and feel identical, but you know one of them has been doctored. The fair coin comes up heads half the time while the doctored coin comes up heads 60 percent of the time. How many flips — you must flip both coins at once, one with each hand — would you need to give yourself a 95 percent chance of correctly identifying the doctored coin?

The question proved so difficult, in fact, that 538’s talented puzzle master Oliver Roeder gave an incorrect answer. Someone smarter than me noticed this, and then we worked together to verify with r/statistics and notify those who may have cared (but didn’t), including the authors of a paper Oliver cited. My question to Reddit contains R code that traces exactly where they went wrong.

This kind of puzzle is a classic in statistics text books because it uses a trivial problem - flipping coins - as an example of more meaningful questions. The solutions are often given by comparing hypothetical distributions.

But I wanted to think of this less abstractly: what if you really were sitting in front of two coins, knowing that one has a slight bias? What would be the most efficient way to find out and how long would it take?

### Reverend Bayes Confronts the Coins

The first thought I had reading the 538 post was, “this is perfect for sequential Bayesian updating.” The idea undergirding most of Bayesian statistics is to update your prior probabilities with new data, e.g., start with a belief and then change it on every coin toss.

In this case, we can begin with a fairly strong prior. We know that at least one of the two coins is biased, so the odds are 50/50. If we quickly used Bayes’ Theorem after each flip, how long would it take us to choose the biased coin with 95% confidence?

This is not a trivial simulation. In theory, you could go on flipping the same side of both coins in perpetuity. Given that, and the subtle difference in the coins, I opted for a simulation that lasted 2 million games. Each game consisted of a speedy specter of Bayes, who flipped coins and updated probabilities until he was certain which one was fair.

As it turns out, this method took Bayes-bot an average of 74 flips. The cool thing, however, is that it is possible to be 96% confident in as few as eight flips, if the coins land a certain way. The median is 60, and the max in this simulation was 704.

Notice that the correct answer to the original question is 134 flips, which are needed given that the distribution has a long right tail. But it’s neat to see how often in practice Bayes could beat the theoretical number of flips needed. In this case, 88% of the time, Bayes knew before 134 flips.

```
library(tidyverse)
#### Method two: Bayesian updating simulation ####
# H = coin1 = fair
update <- function(prior, coin_randomizer) {
# This function updates our prior probabilities after looking at both coins
# Flip
# The biased coin is randomly selected.
# They both return .4 or .6 because we plug these figures into Bayes' theorem below
# But actual heads/tails probabilities are either 50/50 or 60/40
flip_coin1 <- sample(c(.4,.6), 1, prob = c(coin_randomizer[1], 1 - coin_randomizer[1]))
flip_coin2 <- sample(c(.4,.6), 1, prob = c(coin_randomizer[2], 1 - coin_randomizer[2]))
# Hypothesis given the data
# Bayes Theorum = prob(A|X) =
# Numerator
# P(X|A)*P(A) /
# Denominator
# P(X) = P(X|A)*P(A) + P(X|not A) * P(not A)
# Prob coin1 is fair given flip1
posterior_after_flip1 <- (.50 * prior) / ((.50 * prior) + flip_coin1 * (1 - prior))
# Prob coin1 is fair given flip2 and the priors informed by flip1
posterior_after_flip2 <- (flip_coin2 * posterior_after_flip1) /
((flip_coin2 * posterior_after_flip1) + .50 * (1 - posterior_after_flip1))
return(posterior_after_flip2)
}
simulate <- function(x) {
# We randomly assign the coins 1x per simulation
coin_randomizer <- sample(c(.5, .4), 2)
# Keep track for checking later
is_fair_coin1 <- if_else(coin_randomizer[1] == .5, 1, .1)
# Start here
n_flips <- 0
prior <- .5
while(prior >= .05 & prior <= .95){
prior <- update(prior, coin_randomizer)
n_flips <- n_flips + 1
}
# Just for updating progress
print(x)
# Return how long it took to be 95% sure
return(n_flips)
}
n_sims <- 2e6
sim_results <- 1:n_sims %>%
map(function(x) simulate(x)) %>%
unlist()
summary(sim_results)
hist(sim_results)
sum(sim_results < 134)
# Let's make sure we get the right # of true positives
test_the_sim <- function(x) {
# We randomly assign the coins 1x per simulation
coin_randomizer <- sample(c(.5, .4), 2)
# Keep track for checking later
is_fair_coin1 <- if_else(coin_randomizer[1] == .5, 1, .1)
# Start here
n_flips <- 0
prior <- .5
while(prior >= .05 & prior <= .95){
prior <- update(prior, coin_randomizer)
n_flips <- n_flips + 1
}
# Just for updating progress
print(x)
return(prior * is_fair_coin1)
}
# Don't need as many tests
n_tests <- 20000
test_results <- 1:n_tests %>%
map(function(x) test_the_sim(x)) %>%
unlist()
true_positivies <- sum(test_results > .95)
true_negatives <- sum(test_results < .004)
(true_negatives + true_positivies) / n_tests
# !It works 96% of the time
```