Penney Ante
Exploring more of the mysterious world of probability through games that don't make sense

Extraordinary claims require extraordinary evidence.

Where intuition fails, logic prevails. Penney Ante, or Penney's Game, is one such example where human intuition fails. Introduced in 1969 by the mathematician Walter Penney, it is a betting game, hence the name Penney Ante. Since its introduction, the game has fascinated probability theorists and puzzle enthusiasts alike. And just like all great probability games, it makes no sense to humans; it's only understood by the calculators 😂.

In the following, I will introduce the rules of the game and what players must achieve to win, then simulate the game and showcase the results when players choose certain strategies, and finally walk through some explanations of why some strategies win over others.

1   Penney Ante

Penney Ante is a two-player game. Each player chooses a sequence of heads or tails of length three; this sequence is called a string. A coin is then tossed indefinitely, until one of the players' sequences appears. Whoever's sequence appears first wins.

What does tossing a coin indefinitely until a certain sequence appears mean? Say we are looking for "heads, tails, heads" (abbr. \(HTH\)). We start tossing the coin. If we see an \(H\) followed by a \(T\) followed by another \(H\) right away, we stop there and say we have seen the pattern in the first three tosses. Otherwise, we keep tossing until we see the pattern. In one particular run of tosses we might see \(HHTTHTH\), in which case we have seen the pattern \(HTH\) after 7 tosses. It might take longer or shorter to see the pattern; that is not our concern. We just keep tossing until we see our desired pattern (or one of the desired patterns in our game).
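The toss-until-you-see-it process is easy to sketch in code. Here is a minimal simulation of it (the helper name `tosses_until` is mine, not from the post):

```python
import random

def tosses_until(pattern):
    """Toss a fair coin until `pattern` (e.g. "HTH") shows up; return the toss count."""
    seen = ""
    while not seen.endswith(pattern):
        seen += random.choice("HT")  # one fair coin toss
    return len(seen)

random.seed(0)  # make the run reproducible
n = tosses_until("HTH")  # number of tosses until HTH first appeared in this run
```

Each run gives a different count; the only guarantee is that it is at least the length of the pattern.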

1.1   Why is this interesting?

There are eight possible strings to choose from: \(s_1=HHH\), \(s_2=HHT\), \(s_3=HTH\), \(s_4=HTT\), \(s_5=THH\), \(s_6=THT\), \(s_7=TTH\), and \(s_8=TTT\). If we toss a coin exactly three times, all of these sequences are equally likely to appear: each has a one in eight chance. So, naturally, a human player would expect that it doesn't matter what they choose; the probability of winning against an opponent should be just \(50\%\). Where it gets weird is that this is not true when you keep tossing the coin until one of the two players' sequences appears. Infinite retries change the dynamics of the game. Concretely, there is a strategy for choosing a string such that your chances of winning are always \(\gt 50\%\). As we will see, the second player to choose can always pick a string that is more than \(50\%\) likely to show up before the first player's.

This is easy to see in some cases, for example if player one (we'll call her Alice from now on) chooses \(s_1=HHH\) and player two (say, Bob) chooses \(s_5=THH\). It is not difficult to see that Alice only wins if the first three coin tosses all come up heads; otherwise, a tails has appeared in the first, second, or third toss. From then on, any run of three heads will always be preceded by a tails, and Bob just needs a tails followed by two heads to win. Thus, Alice's chance of winning in this case is one in eight.

But it is not so trivial to see in other scenarios.

2   Simulations and The Best Strategy

Before going further, I really want to convince you that there is always a better-than-\(50\%\) strategy. I made my computer simulate games between Alice and Bob with different strategies, playing each pair of strategies against each other for 1000 games. As expected, \(HHH\) wins roughly one in eight games, or \(1000\div 8\approx 125\) games, against \(THH\). And by symmetry, \(TTT\) wins \(\approx 125\) games against \(HTT\). There are \(8\times 8=64\) pairings in total, summarised in the following table. The way to read the table is as follows: to find out how many games string \(s\) wins over string \(r\), find the column of \(s\) on the x-axis and the row of \(r\) on the y-axis, and read the entry in the cell where they intersect. Remember, the numbers are the number of games String 2 wins over String 1. For example, \(HHH\) wins \(489\) games against \(HHT\); and \(HTH\) wins \(497\) games against \(THT\).
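A single entry of such a tournament can be simulated in a few lines. This is a sketch of my own (the post's actual simulation code isn't shown; the function names `play` and `wins` are mine):

```python
import random

def play(s1, s2, rng):
    """Toss a fair coin until s1 or s2 appears; return the winning string."""
    seen = ""
    while True:
        seen += rng.choice("HT")
        if seen.endswith(s1):
            return s1
        if seen.endswith(s2):
            return s2

def wins(s1, s2, games=1000, seed=0):
    """How many of `games` games string s2 (String 2) wins against s1 (String 1)."""
    rng = random.Random(seed)
    return sum(play(s1, s2, rng) == s2 for _ in range(games))

# e.g. wins("THH", "HHH") should come out near 1000/8 = 125
```

Looping `wins` over all \(8\times 8\) pairings produces the full table.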

What's important to note is that if we know the opponent's strategy (String 1) in advance, we can always select a better strategy: the darkest square in that row (the colors are inverted in dark mode, so think "light" when I say "dark" and vice-versa), i.e. the one that minimizes the number of wins of String 1. This also means that there is no universally best strategy. Paper beats rock, scissors beats paper, and yet rock beats scissors.

Given a string, the best way to find the string that beats it is not to simulate \(64000\) games and then look for the cell with the highest number. Instead, mathematicians like to calculate the exact winning probabilities using methods that can be computed on paper. The best strategy for Alice, it turns out, given that Bob selected string \(s\), is to take the center character of \(s\), flip it, and then append the first two characters of \(s\) to it. So, if \(s=HTH\), Alice first takes the middle character, \(T\), and flips it into an \(H\). She then appends the first two characters of \(s\), which are \(HT\), to it. This gives \(HHT\), which is what Alice should choose to maximize her winning chances against Bob if he chooses \(HTH\). This can be verified from the simulation results: \(HHT\) wins \(662\) out of \(1000\) games played against \(HTH\).

Please note that the pattern above is purely accidental; it fell out of finding the best response to every strategy via probability calculations. The pattern just happens to appear in this particular case; if Alice and Bob were told to choose strings of length five, for example, the same trick wouldn't work. The correct way to go about it is to do the calculations. Once you have done them, it is fine to summarise the results into a pattern like the one above, if you find one.
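For length-three strings specifically, the flip-the-middle-then-prepend recipe can be written down directly (the function name is my own):

```python
def best_response(s):
    """Best length-3 reply to the opponent's string s:
    flip s's middle character and put it in front of s's first two characters."""
    flip = {"H": "T", "T": "H"}
    return flip[s[1]] + s[0] + s[1]

# best_response("HTH") -> "HHT", as in the worked example above
```

Running it on all eight strings reproduces the table that follows.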

The following table summarizes the answer to the question: which string should Alice choose, given the string Bob chose before her, to maximize her chances of winning?

Best strategy for Alice

#   Bob        Alice
1   \(HHH\)    \(THH\)
2   \(HHT\)    \(THH\)
3   \(HTH\)    \(HHT\)
4   \(HTT\)    \(HHT\)
5   \(THH\)    \(TTH\)
6   \(THT\)    \(TTH\)
7   \(TTH\)    \(HTT\)
8   \(TTT\)    \(HTT\)

Now that you know how to win any game, I'll briefly talk about how the math is done, because it is very interesting and gives some intuition for why some strings win while others lose.

3   Maths: The Fun Part
3.1   Expected Waiting Time

Before I outline the method to compute the probability of Alice winning with string \(s_A\) given that Bob chose \(s_B\), I want to talk about a concept called expected waiting time. The expected waiting time of a string \(s\) is the expected number of coin tosses needed to see the string. This is a number, call it \(\mathbb{E}(W(s))\), where \(\mathbb{E}\) signifies expectation and \(W(s)\) the waiting time of string \(s\). The "expected" part means that if we play this game many times and record after how many tosses we see the string \(s\), the average of those numbers will be close to \(\mathbb{E}(W(s))\), and the distance from this number shrinks as the number of games being averaged grows.

To compute it, we will model the state of the world and how it evolves using a state graph. This will help us answer the question, "how much of our desired sequence have we seen so far?" Let's start with an easy example. Say we are playing this game with strings of length one instead of length three, and we want to find the expected waiting time for \(H\). Then we can draw the following state transition diagram.

Our states are the red circles, and the arrows between them are the transitions. The state with no outgoing arrows is the final state; the arrow coming from no other state marks the initial, or start, state. We have either not seen any \(H\)'s, or we have seen one and our search ends. When we see \(T\)'s, from our perspective it's as if we are still at the initial state, so we reuse the start state. The goal is to reduce repetition so that the computations stay easy. More precisely, in the initial state, where we haven't seen any \(H\)'s, we can either see a \(T\) and stay in the same state, or see an \(H\) and move to the second state. The movement from state to state is probabilistic: we take each arrow with probability \(1/2\).

Now we can extend this to strings of length three. Say we are analysing the waiting time of \(s=HHH\). That gives the following diagram.

We either have seen an \(H\), \(HH\), or \(HHH\). Or, we see a \(T\) in between and undo all the work and start from the beginning. It's like playing a game of snakes and ladders 🐍🪜🎲.

The expected waiting time is then the expected time to reach \(HHH\) from the start state. To compute it, notice that from any state \(\alpha\) we take exactly one step to one of two possible states, \(\beta\) or \(\gamma\), each with probability \(1/2\). So half the time, the time to reach \(HHH\) from \(\alpha\) will be \(1+\mathbb{E}(\beta)\), and the other half of the time it will be \(1+\mathbb{E}(\gamma)\), where \(\mathbb{E}(\sigma)\) denotes the expected time to reach \(HHH\) from state \(\sigma\). Averaging, \(\mathbb{E}(\alpha)=1+\mathbb{E}(\beta)/2+\mathbb{E}(\gamma)/2\).

In the expression above, we can substitute different states for \(\alpha, \beta,\) and \(\gamma\). There is one exception: when the state is \(HHH\), the expected time to reach \(HHH\) is just \(0\). To ease the writing, I'll abbreviate the expected times to reach \(HHH\) as \(x=\mathbb{E}(\varnothing)\), \(y=\mathbb{E}(H)\), \(z=\mathbb{E}(HH)\), and \(0=\mathbb{E}(HHH)\). Thus, we need to solve the equations \(x=1+x/2+y/2\), \(y=1+x/2+z/2\), and \(z=1+x/2+0/2\) to get the expected times to reach the final state \(HHH\). In particular we get the value of \(x\), which is the expected waiting time of the string \(HHH\). I will not go through the steps, but the solution is \(x=\mathbb{E}(\varnothing)=\mathbb{E}(W(HHH))=14\). Try tossing a coin and see if this holds by averaging over, say, 10 runs.
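If you'd rather not do the algebra by hand, the same three equations can be solved exactly with a tiny Gauss–Jordan elimination over rational numbers. This is a sketch of mine, not code from the post:

```python
from fractions import Fraction as F

# Unknowns: x = E(from start), y = E(from "H"), z = E(from "HH").
# Rearranged into A·v = b form:
#   x - x/2 - y/2 = 1
#   y - x/2 - z/2 = 1
#   z - x/2       = 1
A = [[F(1, 2), F(-1, 2), F(0)],
     [F(-1, 2), F(1), F(-1, 2)],
     [F(-1, 2), F(0), F(1)]]
b = [F(1), F(1), F(1)]

def solve(A, b):
    """Gauss-Jordan elimination with exact fractions."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for i in range(n):
        p = next(r for r in range(i, n) if M[r][i] != 0)  # pivot row
        M[i], M[p] = M[p], M[i]
        piv = M[i][i]
        M[i] = [m / piv for m in M[i]]            # scale pivot row to 1
        for r in range(n):
            if r != i and M[r][i] != 0:
                f = M[r][i]
                M[r] = [a - f * c for a, c in zip(M[r], M[i])]
    return [row[-1] for row in M]

x, y, z = solve(A, b)  # x == 14, the expected waiting time of HHH
```

The same solver works for any of the waiting-time systems in this post; only `A` and `b` change.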

Let's see what happens when we consider another string, say, \(THH\). We get a different state transition diagram, the one below.

You can see that the snakes don't take you back as far: by making mistakes, you don't undo as much work. In fact, the expected waiting time for this string, computed exactly as above, is \(8\).
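As a sanity check on both numbers, here is a quick Monte Carlo estimate of the two waiting times (a sketch of my own; the helper name is mine):

```python
import random

def waiting_time(pattern, rng):
    """Toss a fair coin until `pattern` appears; return the number of tosses."""
    seen = ""
    while not seen.endswith(pattern):
        seen += rng.choice("HT")
    return len(seen)

rng = random.Random(0)
trials = 20000
avg_thh = sum(waiting_time("THH", rng) for _ in range(trials)) / trials  # near 8
avg_hhh = sum(waiting_time("HHH", rng) for _ in range(trials)) / trials  # near 14
```

With 20000 trials the averages land comfortably close to the exact values 8 and 14.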

Wow! No wonder \(THH\) defeats \(HHH\).

3.2   Alice vs. Bob

Finally, it's time to see what happens when Alice plays against Bob. When two strings compete, the dynamics of the game change: the appearance of parts of one sequence affects the chances of the other, a phenomenon that doesn't arise when waiting for a single sequence to appear. This interaction becomes clear when we look at the state transition diagram of the game.

Continuing with our example, let's say Bob chooses \(s_B=HHH\); based on waiting times, we expect \(s_A=THH\) to be the best string for Alice. Let's verify that it really is the better of the two when Alice and Bob actually play against each other. Looking at it through the lens of adversarial games, we can draw this game's state transition diagram like this.

Notice that it's just the previous two diagrams, with the two chains merged into one. The two chains branch off of the green state, the start state. It is immediately obvious that once you see a tails, you exit the \(HHH\) branch, never to return.

Now, the waiting time of two strings competing against each other is not a meaningful quantity. So instead we will look at the probability of seeing one string before the other. In our state transition diagram, this corresponds to the probability of hitting one of the two final states (\(HHH\) or \(THH\)) starting from the green state. Notice that from any state, we go to one of two states, each with probability \(1/2\). So the probability of hitting \(HHH\), say, from any state has to be the average of the hitting probabilities of its two successors. This is in complete analogy with how we calculated the expected waiting time; the expression now is \(P(\alpha)=P(\beta)/2+P(\gamma)/2\). The only exceptions to this rule are that the probability of hitting \(HHH\) from \(HHH\) is \(1\), and the probability of hitting \(HHH\) from \(THH\) is \(0\).

We will again use some shorthands: \(v=P(\varnothing),\) \(w=P(H),\) \(x=P(HH)\), \(y=P(T),\) \(z=P(TH),\) \(1=P(HHH),\) and \(0=P(THH)\). We then need to solve the following equations: \(v=w/2+y/2\), \(w=x/2+y/2\), \(x=y/2+1/2\), \(y=y/2+z/2\), and \(z=y/2+0/2\). Doing the math shows that \(v=P(\varnothing)=1/8\): Bob's \(HHH\) appears first only one time in eight. That's no surprise; we already saw this in section 1. But the power of this method is that it applies to more complicated cases that are not so straightforward to argue about without state transition diagrams. I invite you to create state transition diagrams for the other cases and compute the probabilities of one sequence winning over another, as listed in the table above. I will give some of the values and invite you to compute the rest.
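The system is small enough to solve by direct substitution. Here is that computation carried out with exact fractions (the variable names follow the shorthands above):

```python
from fractions import Fraction as F

# From y = y/2 + z/2 and z = y/2:  y = y/2 + y/4, i.e. y/4 = 0, so y = 0.
y = F(0)              # P(hit HHH | we have seen a T)
z = y / 2             # P(hit HHH | state TH)  -> 0
x = y / 2 + F(1, 2)   # P(hit HHH | state HH)  -> 1/2
w = x / 2 + y / 2     # P(hit HHH | state H)   -> 1/4
v = w / 2 + y / 2     # P(hit HHH | start)     -> 1/8
```

The intuition is visible in the numbers: any tails (\(y=0\)) kills \(HHH\)'s chances for good, exactly as the diagram suggests.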

Probability that Alice wins if she follows the best strategy

Bob        Alice      \(P\)(Alice wins)
\(HHH\)    \(THH\)    \(7/8\)
\(HHT\)    \(THH\)    \(3/4\)
\(HTH\)    \(HHT\)    \(2/3\)
\(HTT\)    \(HHT\)    \(2/3\)
4   Conclusion

That is all I wanna talk about today. I love how weird probability is until you put pen to paper. Then what's on paper makes sense, yet reality doesn't make any more sense than it did before.