What is the null probability of a distribution of straights in a hand of cards? 1/?
(In which I go insane trying to figure out the answer. Don't mind the gaps.)
(Previously: One, two, three.)
In Max Shinn's post about optimally shuffling cards, he describes looking at the "joint distribution of these frequencies compared to a null distribution." This is slightly ambiguous, but I'm going to interpret this in a similar way as before in previous posts: As the probability mass for the observed frequency of ranks in the hand under the null "uniform without replacement" distribution. Again, in Shinn's post, I'll assume we are drawing a hand of exactly 6 cards. How do we compute the probability mass of the (maximal) rank-clusters/straights that were drawn?
This is trickier compared with the suit and rank distribution probabilities. I'll go through the naive approach first and then explain why it doesn't work.
Let's say we want to calculate the probability of getting say N = 1 straights of threes and K = 3 single cards using a naive approach. This seems pretty straightforward. To draw the straight of three, we need to draw any of the 11 ranks that can start a straight of three — so any of 44 cards — then draw the next rank, then the next, so there are 11*4*4*3! ways of drawing this straight. Two of these are special in a way we'll come back to. But let's say for now we draw a straight starting with a rank 2 card:
The blue box represents the straights of three we might have drawn. Whichever straight of three we drew, it'll have one card in each of the blue-boxed columns. The pink box shows the columns (ranks) we could draw our next single from. In total there are 13 - 5 = 8 possible ranks to draw from, or 32 possible singles we c ould draw.
Suppose we draw a rank 7 card next:
Now we have just 8 - 3 = 5 possible ranks to draw from, or 20 cards. If we draw a 10 next, then we can only draw from Kings or Queens next — that is, 5 - 3 = 2 possible ranks or 8 possible cards.
So, here's the first problem: A staight of \(N\) cards can "use up" or "block off" either \(N + 1\) or \(N + 2\) ranks. If we draw "on the edge of our current space of possibilities," it'll use \(N + 1\) ranks. Otherwise, it'll use \(N + 2\).
(A related, slightly annoying problem is that we have to somehow account for the difference between low cards and high cards. If you draw an ace, there's only one way for that to "expand out" into a straight. But since we don't care about draw order, I think we don't actually have to do this. Intuitively, we should be able to calculate the probability as if we drew the lowest card in the sequence first, ignoring the actual order we draw the cards in. Then we can just multiply to account for not caring about the draw order.)
If we draw a 6 first, then the straight of three from 2, the straight of three again uses only \(N + 1\) possible cards. It's almost like we only need to care about the spacing between consecutive "tiles" and between tiles and the edge. So maybe we don't have to care about the order in which we draw the "tiles..."
So, this problem suggests its own solution: Count the number of possible ways to "lay out the tiles" and then multiply. Specifically, multiply by a factor to account for the number of different ways to "fill in" each "tile" with specific cards from the ranks spanned by that tile. That sounds like it should be pretty simple...
Unfortunately, now we run into the other problem: You can have up to \(4^N\) rank-identical straights of length \(N\). That is: Multiple straights with exactly the same ranks, but slightly different cards.
For example, say we draw an ace of hearts, a two of diamonds, threes of clubs and of spades, a five of hearts, and a seven of clubs. We have two size-three straights:
- (ace of hearts, two of diamonds, three of clubs), and
- (ace of hearts, two of diamonds, three of spades).
These overlap—they share the ace of hearts and the two of diamonds. So, technically this should count as three straights. But this makes it hard to think about the probability calculation.
If we see a hand with two straights of three, there are actually (roughly speaking) two ways to get there. One's easier. First, we can get two straights with disjoint ranks: say (2, 3, 4) and (6, 7, 8). Second, and easier, we can get (2, 3, 4 of hearts) and (2, 3, 4 of clubs). So we have to account for both the easier possibility and the harder possibility in our calculation, and this makes the calculation more annoying to think about.
Making things even worse: It is way easier to draw 2 mingled straights of 3 than 2 disjoint straights of 3. Once we have (2, 3, 4), there are 9 cards we can draw that will give us another straight via mingling, but to draw a disjoint straight we have to get much luckier: We have to draw one of 7 possible start cards, then the next card, then the next card, so that there are only 7 * 3! = 42 ways to draw another disjoint triple, vs. 9*8*6*3! = 288 ways to draw a mingled triple and two singles.
(Another way mingled triples are annoying is that we can't just sum over products (straight size * number of straights of that size) and get the hand size back.)
Any time we have multiple nontrivial straights of the same length, we have to account for both the "disjoint straights" way of drawing all those straights, and the "identical ranks" way of drawing all those. And what's worse is that we might easily have both disjoint and mingled straights.
(This would be forced to happen if we drew say 9 pairs — there's no way to draw 9 pairs as only disjoint or only mingled pairs. Fortunately, with our hand size of 6, we don't have to worry about that.)
I suspect there's a straightforward way to do this using a "tree-based" way of counting, but I suspect it's going to be ugly. It may not lead to a nice closed-form formula that works without special-casing on straight length. But it might work.