"True" parameters

I suspect the idea of a "true" parameter is sort of a lie. A random Cross Validated (Stack Exchange) user agrees with me, and the other answer there is broadly similar, so I don't think this is too outlandish. It is true enough for some standard cases, like population-level adult height statistics. It is less true of other things, like political polling statistics. I suspect "true" parameters are meaningless when interpreted too broadly in other contexts, like comparing natural language processing algorithms.

Consider the case of political polling, where we want to determine, for example, what percentage of Xorpians plan to vote for Vorpnu vs. Glork for president. Xorpians, like many hypothetical creatures, are just humans in funny rubber suits. As with many parameters we might want to learn about, we know this one is not and cannot be stable over time. Everyone* changes their opinion once in a while, and sometimes an event, like a scandal, makes lots of people change their opinion over a short amount of time. It's also dubious to talk about a "true" parameter at some given moment in time – there are lots of things that could affect people's opinions over time, but no polling organizations that I know of coordinate to poll people even close to simultaneously. It's theoretically possible, but sounds expensive, and might be pointless.

I suppose if you polled the entire relevant population literally simultaneously then that would count, but even that seems a bit fishy. If you ask everyone at the same instant, each respondent will answer in their own time, which means the answers are not simultaneous. You can't force respondents to also answer at exactly the same time. (You can try to set an upper bound on response time, beyond which an answer won't be counted. You can't set a lower bound.) To make this make sense we have to assume that their beliefs don't change too heavily on average over the time between question and response. But it seems there can be no fact of the matter about the answers people would give if they all were asked and all answered simultaneously.
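Here is a rough sketch of the timing problem, in Python. Everything in it is made up for illustration: the seven-day polling window, the daily support numbers in `daily_support`, and the sample size are all my assumptions, not data from any real poll. The point is only that a poll spread over a window lands near the average over that window, which need not match support at any single moment.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical daily support for Vorpnu over a 7-day polling window.
# These numbers are invented; imagine a scandal between day 3 and day 4.
daily_support = [0.52, 0.52, 0.51, 0.46, 0.45, 0.45, 0.46]
respondents_per_day = 2000  # also an arbitrary choice


def poll_one_day(p, n):
    """Count how many of n simulated respondents say 'Vorpnu'."""
    return sum(random.random() < p for _ in range(n))


yes = sum(poll_one_day(p, respondents_per_day) for p in daily_support)
total = respondents_per_day * len(daily_support)
pooled_estimate = yes / total

# What the pooled poll is really chasing: an average over the window.
window_average = sum(daily_support) / len(daily_support)

print(f"pooled estimate: {pooled_estimate:.3f}")
print(f"window average:  {window_average:.3f}")
print(f"first day: {daily_support[0]:.2f}, last day: {daily_support[-1]:.2f}")
```

If this toy model is anything like the real thing, the number a poll reports is best read as a summary of the window rather than a snapshot of one instant.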

(Some people may be on the fence, so that there's no fact of the matter about who they will vote for. To the extent that there are not too many Chidi Anagonyes to worry about, though, this shouldn't be a problem. In any case it isn't a show-stopper: We can expect these people to drive up the sampling variance, or uncertainty, of the statistic. That makes the statistic less useful, but not yet meaningless as long as the populations for/against remain roughly stable in size, if not in the specific people they are made up of.)
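A toy version of the on-the-fence effect, again in Python. The big assumption, which is mine and only for illustration, is that each undecided respondent answers by a fair coin flip every time they are asked. To isolate that effect the sketch asks the whole (hypothetical) population each time, so none of the spread comes from who happened to be sampled; in a real poll this would sit on top of ordinary sampling variance. The population sizes and the helper names `census_share` and `spread` are invented.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable


def census_share(committed_vorpnu, committed_glork, undecided):
    """Ask everyone once; return the share answering 'Vorpnu'.

    Assumption: each on-the-fence respondent answers by a fair coin flip.
    """
    coin_flips = sum(random.random() < 0.5 for _ in range(undecided))
    population = committed_vorpnu + committed_glork + undecided
    return (committed_vorpnu + coin_flips) / population


def spread(committed_vorpnu, committed_glork, undecided, repeats=200):
    """Standard deviation of the census result over repeated askings."""
    shares = [
        census_share(committed_vorpnu, committed_glork, undecided)
        for _ in range(repeats)
    ]
    return statistics.pstdev(shares)


# Same population size (10,000) with no, few, and many undecideds.
spread_none = spread(5050, 4950, 0)
spread_few = spread(4800, 4700, 500)
spread_many = spread(3050, 2950, 4000)

print(f"no undecideds:   {spread_none:.5f}")
print(f"500 undecideds:  {spread_few:.5f}")
print(f"4000 undecideds: {spread_many:.5f}")
```

Under these assumptions the result wobbles more as the undecided group grows, while the committed groups contribute nothing to the wobble, which is roughly the "less useful, but not yet meaningless" situation.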

There's also the issue of beliefs (in this sense) being fake. We're avoiding that for now by focusing on "answering polls," which is not about beliefs, and by using that to approximate how people will vote, which is also not about beliefs.

Why do we want to assume such "true" parameters exist? That Cross Validated user, Christopher Hennig, suggests it's because it is useful for "doing theory" and developing methods. It lets us prove things that would hold in theory if there were a true parameter. That theory can motivate the methods. I think theory motivates methods, to the extent that it does, by intuitive argument rather than formal argument. A formal argument may be too technical-yet-confused to be useful.

The better question, I think, is: When and why does this silly idea of "true" parameters work out for us? One answer might be: Forget about true parameters. "True" parameters are kind of a joke. The reason they are a useful joke to tell is that in practice, methods that approximate a "true" parameter also give answers that are, under the right conditions, good enough.

What makes an answer good enough here? If it points in the right direction. How do you know it's right? You go that way and find out. So: You only know in retrospect. It's worked a lot of times before. Might be worth a try this time.

Is this right? I have no idea. It doesn't seem obviously silly, but I have been wrong before.