Counting more extreme orders
I've been drawing tarot cards without replacement as part of a project to learn the cards. Lately, I've been getting lots of Swords. I'm curious: Can I quantify how early the Swords are in this shuffling of my deck?
Eventually I will have drawn every card in the deck, for a total of 78 cards. So let's suppose I have drawn every card. That defines an ordering over the 78 cards. Can I quantify how heavily the swords are biased to fall at the beginning?
I'm going to reduce this to an easier case. Say I draw 2 apples and 8 oranges from an urn for a total of 10 fruits. I draw the apples not at the beginning but early on. I suspect the apples were near the top of the urn and that's why I drew them first. Can I quantify this? How extremely early did the apples show up in my drawing order?
Suppose I drew the fruits in the order (orange, apple, orange, apple, orange, orange, orange, orange, orange, orange). Call that ordering the draw sequence. I want to quantify how heavily the distribution of apples in my draw sequence is skewed toward the beginning of the draw sequence.
I'll define an apples-first draw score as follows. Let \(d\) be the draw sequence. Given a pair of positions \(i, j\) in \(d\), where \(0 \leq i < j < 10\), the apples-first pair score \(s(i, j; d)\) is 1 if \(i\) is an apple and \(j\) is an orange, 0 if both are apples or both are oranges, and 0 if \(i\) is an apple and \(j\) is an orange. Then the apples-first draw score is \(\sum_{0\leq i < j < 10} s(i, j; d)\).
This score has a few properties. First, it has a clear range. Second, under the assumption that we choose an ordering uniformly at random, all scores are equally likely.
This score has a clear range. It's maximized when we draw the two apples first—then it's \(16 = 8 \cdot 2\). The 2 apples come before every one of the 8 oranges, so we get 8 points for each apple. It's minimized when we draw the two apples last—in which case it's 0.
Not all scores are equally likely, even under the null where all sequences are equally likely. To see this, suppose we draw 4 oranges, 2 apples, and then 4 oranges. We can write this down as a string: ....AA....
. This sequence gets an apples-first draw score of 8 — we get 4 points for each apple. Now suppose we move the first apple earlier and the second apple later to get the draw sequence ...A..A...
. Now we get a score of 5 for the first apple and a score of 3 for the second, totaling 8. So, several meaningfully different sequences get assigned the same score.
This trick generalizes: We can dilate a pair of apples outward and keep the score the same. If we transpose an apple X with the \(i\)th orange preceding it, and another apple Y with the \(i\)th orange following it, our score increases by \(i\) for apple X and decreases by \(i\) for apple Y, so that our total score stays the same. We can also, of course, do this trick in reverse, pinching the apples together while keeping the score the same.
They don't have to dilate toward or away from the draw sequence center. Say we start with A..A.....
. This has a score of 8 + 6 = 14. We can move the two apples forward one and shrink the gap by 1, to get .AA.......
. This gets a score of 7 + 7 = 14. So, this works just as well moving two apples inward.
So, it looks like there are only 2 draw sequences that achieve a score of 14, up to internal reordering of the fruits. I think it's the same on the other side: There are only two sequences that achieve a score of 2.
The number of possible score-preserving reorderings like this seems dependent on the score. Suppose the score is \(S\). Then I suspect there are \(\left\lfloor \min(S, 16 - S) / 2 \right\rfloor + 1\) possible orderings. This formula says there are 5 possible orderings giving a score of 8, and that seems right. Manually counting, I see A........A
, .A......A.
, ..A....A..
, ...A..A...
, and ....AA....
. So, this formula seems plausible.
As a sanity check, does this give the correct number of sequences? There should be \(\binom{10}{2}\) drawing sequences up to same-fruit reorderings. This suggests there are \(5 + 4 + 4 + 4 + 4 + 3 + 3 + 3 + 3 + 2 + 2 + 2 + 2 + 1 + 1 + 1 + 1 = 5 + 4(4 + 3 + 2 + 1) = 5 + 4\cdot 4\cdot 5/2 = 5 + 40 = 45\), which, yes, is \(10\cdot9/2\). So that looks right.
So, assuming this is a correct counting, this gives an easy way to count the sequences more extreme in score. There are two issues left here:
- This doesn't yet give us a nice closed form for the probability of an equal or more extreme score under the null of all draw sequences being equally likely.
- I haven't figured out how the formula changes when we increase the number of apples relative to oranges.
- I haven't figured out how the formula changes when we add more fruits.
Parts (2) and (3) would help us generalize this back to the tarot case, and (1) would help quantify the degree of bias. But those things can wait for other days. This was chunky enough as it is.