Is the number of letters in this post a random variable?

Statistical theory concerns random variables, but what is a random variable? We can think of a random variable purely formally as a construction for calculating probabilities. Informally, things are less clear. I think the number of letters in this post could be a random variable in multiple ways. If we want to talk about any as-yet-unknown number as being a random variable, it is worth being clear which way we mean.

Formally, a random variable is a (measurable) function from a sample space, which can be an essentially arbitrary set, to the set of real numbers. This is a nice definition to have for doing math and calculating probabilities. The sample space is mostly uninteresting from a theoretical point of view, though, and it is often simpler to forget about it. Sometimes we abuse the notation slightly and say that, for example, \(X \sim \mathcal{N}(0, 1)\), meaning that \(X\) is a random variable having a standard normal distribution. It's an abuse because the distribution comes from a probability measure, and technically neither the sample space nor the random variable includes one.
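To make the formal definition concrete, here is a minimal Python sketch. The two-coin-flip sample space, the uniform measure, and the names `num_heads` and `distribution` are all my own illustrative assumptions. The point is only that the random variable is a plain function, and that probabilities appear once a separate measure is supplied.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical sample space: the outcomes of two coin flips.
sample_space = ["HH", "HT", "TH", "TT"]

# The probability measure is a separate object from the random variable.
# Here we assume every outcome is equally likely.
measure = {omega: Fraction(1, 4) for omega in sample_space}


def num_heads(omega: str) -> int:
    """A random variable: a plain function from outcomes to numbers."""
    return omega.count("H")


def distribution(rv, measure):
    """Push the measure forward through rv to get P(X = x) for each x."""
    dist = defaultdict(Fraction)  # Fraction() is zero
    for omega, p in measure.items():
        dist[rv(omega)] += p
    return dict(dist)


dist = distribution(num_heads, measure)
for value in sorted(dist):
    print(value, dist[value])
```

Swapping in a different `measure` changes the distribution without touching `num_heads` at all, which is roughly why writing \(X \sim \mathcal{N}(0, 1)\) glosses over something.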

Informally, there are a few commonly used definitions:

  1. Statology defines a random variable as "a variable whose possible values are outcomes of a random process."
  2. Investopedia defines a random variable as (either) "a variable whose value is unknown," or the formal function definition. That page later states that "A random variable is one whose value is unknown a priori, or else is assigned a random variable based on some data generating process or mathematical function."
  3. There is the tickets-in-a-box model, described by user whuber on Cross Validated.

There are two senses in which the number of letters in this post could be a random variable:

  1. We could have a random variable representing the number of letters in this post specifically. Or rather, the number of letters in the first post published on this day of this year, because this post specifically can't have a random number of letters: by the time you read it, it will have a definite number of letters. Most of the uncertainty will be gone; what's left is in the counting and maybe in your mind.
  2. The number of letters in this post could be one possible outcome or realization of a random variable representing, say, the number of letters in a randomly selected post from this blog. Or the number of letters in a randomly selected post from a randomly selected Ghost blog. Or the number of letters in a randomly selected blog post from a randomly selected blog of any sort.
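The contrast between a fixed count and a draw from a generating process can be sketched in a few lines of Python. The `posts` list is made-up placeholder text, not real data from this blog, and `count_letters` and `draw_letter_count` are hypothetical helper names. Uniform selection of a post is also just one assumed generating process among many.

```python
import random

# Hypothetical stand-ins for a blog's posts; not real data.
posts = [
    "Is this a random variable?",
    "Short post.",
    "A somewhat longer post about statistics.",
]


def count_letters(text: str) -> int:
    """Count alphabetic characters, ignoring spaces and punctuation."""
    return sum(ch.isalpha() for ch in text)


# Once a post is written, its letter count is a fixed number.
fixed_count = count_letters(posts[0])


def draw_letter_count(rng: random.Random) -> int:
    """One realization: the letter count of a uniformly selected post."""
    return count_letters(rng.choice(posts))


rng = random.Random(0)  # seeded so the draws are reproducible
draws = [draw_letter_count(rng) for _ in range(5)]
print(fixed_count, draws)
```

Replacing `posts` with posts pooled from many blogs, or choosing a blog first and then a post, would define a different random variable even though any single draw is still just a letter count.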

In other words, the number of letters in this post is not "a" random variable. The details of the generating process matter. If we want to talk about this outcome as a random variable, it is worth being clear which random variable and generating process we are talking about.