Diving into Elo, 1
This is a quick overview of Elo rating systems as described by Wikipedia. I'll outline the general system here and then go through some questions I had, along with the answers I found. I'll hold off on the statistics because I haven't yet found the paper about those.
First, an outline of the general system:
- Everyone gets an initial "Elo score."
- After every "competitive event," the participants' Elo scores are updated according to the pair matchups that they played.
- Each player's Elo score is updated according to the difference between their "predicted" score before the event and their "real" event score, calculated over the pair matchups they played.
- We calculate a player's "real" score based on how many matchups they won, lost, or played to a draw. A win counts as 1 point, a draw counts as 0.5, and a loss counts as 0.
- The "predicted" event score is the sum of the predicted scores for each pair matchup.
- The predicted score for a pair matchup is a function of the difference in Elo scores between the two players.
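To make the bookkeeping in the outline concrete, here is a minimal Python sketch of the "real" and "predicted" event scores. The outline only says the per-matchup prediction is some function of the Elo difference, so the sketch takes that function as a parameter. The names `real_score`, `predicted_event_score`, and `coin_flip` are my own, and `coin_flip` is a deliberately naive placeholder predictor, not anything a real rating system uses.

```python
from typing import Callable, Iterable

# Points per matchup outcome, as in the outline: win 1, draw 0.5, loss 0.
POINTS = {"win": 1.0, "draw": 0.5, "loss": 0.0}


def real_score(outcomes: Iterable[str]) -> float:
    """Sum the points a player earned over the matchups they played."""
    return sum(POINTS[outcome] for outcome in outcomes)


def predicted_event_score(
    player_elo: float,
    opponent_elos: Iterable[float],
    predict: Callable[[float], float],
) -> float:
    """Sum the per-matchup predictions, each a function of the Elo difference."""
    return sum(predict(player_elo - opp) for opp in opponent_elos)


def coin_flip(elo_difference: float) -> float:
    """Placeholder predictor (an assumption): treat every matchup as even."""
    return 0.5


# A player rated 1200 plays four matchups in one event.
real = real_score(["win", "draw", "loss", "win"])
predicted = predicted_event_score(1200.0, [1100.0, 1300.0, 1250.0, 1400.0], coin_flip)
print(real, predicted, real - predicted)  # 2.5 2.0 0.5
```

The last number, real minus predicted, is the quantity that drives the rating update.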
One might then wonder about the details:
- How does a rating organization assign an initial Elo score? Is there a default score for every starting player? Do we assign individually based on assessments? Somewhere in between? Different systems do different things. In Brawlhalla, apparently every player starts with an Elo score of 1200. In theory, one might have new entrants play a few "qualification games" with a member of the tournament organization team, who has a known rank or score, to assign the new player an initial Elo score.
- Do Elo scores have to be positive? They are in most examples, but the math doesn't seem to force scores to be positive. I don't think chess has negative Elo scores, but it looks like some other systems do have negative scores.
- How often/when do we update scores? Or: What counts as a "competitive event"? The example on Wikipedia suggests that a full tournament could count. As I understand it, a rating organization would typically update scores in bulk after the tournament, rather than as the tournament is ongoing. However, it looks like Brawlhalla assigns and uses Elo scores on the basis of one-off games, and I think Lichess does something similar. These one-off games aren't part of a tournament, so Brawlhalla/Lichess would have to update the players' Elo scores after every game. Alternatively, they could collect results over time and update Elo scores on a daily, weekly, or monthly basis.
- If we want to calculate individual player Elo scores for a team game, how do we define the "pair matchups"? How does Brawlhalla's 2v2 mode handle this? We have to count every pair of opposing players in a game as a separate "pair matchup," right? Say Alice and Bob fight Charlie and Daniel in Brawlhalla. Do we treat this like a chess tournament where Alice separately fought Charlie and Daniel in 1v1s, and simply count the outcome of the game (win, loss, or draw) twice when calculating Alice's "real score"?
- How exactly do we update? Wikipedia gives the formula: \[\mathrm{Elo}_{\mathrm{new}} := \mathrm{Elo}_\mathrm{old} + K\cdot \left(\mathrm{RealScore} - \mathrm{ExpectedScore}\right)\] The number \(K\) here is not fixed; it depends on the scoring organization's rules, which generally specify a value of \(K\) based on the specific player's history and circumstances. Wikipedia describes the USCF as having previously used different values of \(K\), or \(K\)-factors, depending on the player's Elo score, and notes they now use a more complicated formula. Wikipedia also describes FIDE as using a table of \(K\)-factors depending on how many games the player has played, how old they are, and whether they've ever achieved an Elo score of 2400 or above.
- How exactly do we calculate the predicted score? What logistic function specifically? The formula is: \[\mathrm{ExpectedScore}_\mathrm{Alice} := \frac{1}{1 + 10^{-\left(\mathrm{Elo}_\mathrm{Alice} - \mathrm{Elo}_\mathrm{Bob}\right)/400}}\] Note that I've slightly rewritten the formula from Wikipedia so that when calculating Alice's expected score her Elo score appears first in the formula instead of Bob's. The two forms give the same number, since \(-\left(\mathrm{Elo}_\mathrm{Alice} - \mathrm{Elo}_\mathrm{Bob}\right) = \mathrm{Elo}_\mathrm{Bob} - \mathrm{Elo}_\mathrm{Alice}\).
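To check my reading of the formulas, here is a small Python sketch of the expected score and the update rule, with \(K\) multiplying the whole difference between real and expected score. The function names are my own, and `K = 32.0` is just an illustrative value I picked; as noted above, real organizations set \(K\) by their own rules.

```python
def expected_score(elo_player: float, elo_opponent: float) -> float:
    """Expected score for one matchup: logistic curve, base 10, scale 400."""
    return 1.0 / (1.0 + 10.0 ** (-(elo_player - elo_opponent) / 400.0))


def updated_elo(
    elo_old: float, opponent_elos: list[float], real_score: float, k: float
) -> float:
    """Apply Elo_new = Elo_old + K * (RealScore - ExpectedScore) for one event."""
    expected = sum(expected_score(elo_old, opp) for opp in opponent_elos)
    return elo_old + k * (real_score - expected)


K = 32.0  # assumed, illustrative K-factor; not any organization's official value

# Alice (1200) beats Bob (1200) in a single game: expected 0.5, real 1.0.
alice_new = updated_elo(1200.0, [1200.0], real_score=1.0, k=K)
print(alice_new)  # 1216.0

# A 400-point favorite is expected to score 10/11, about 0.909, per game.
print(expected_score(1600.0, 1200.0))
```

One sanity check that falls out of the formula: the two players' expected scores in a matchup always sum to 1, so with equal \(K\) whatever one player gains the other loses.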