Fixed vs. random effects
What is a fixed effect fixed with respect to, or fixed by? I find myself wondering about this in relation to a paper I read yesterday. They describe a linear model like:
\begin{align} y_{ij} &= \beta_0 + \beta_1 X_{ij} + u_{0i} + u_{1i} X_{ij} + e_{ij} \\ u_{0i} &\sim \mathcal{N}(0, \sigma^2_{u_0}) \\ u_{1i} &\sim \mathcal{N}(0, \sigma^2_{u_1}) \\ e_{ij} &\sim \mathcal{N}(0, \sigma^2_e) \end{align}

They write that the \(u\) parameters represent random effects and the \(\beta\) terms capture fixed effects. They explain:
Fixed effects are used to model variables that must remain constant in order for the model to preserve its meaning across replication studies; random effects are used to model indicator variables that are assumed to be stochastically sampled from some underlying population and can vary across replications without meaningfully altering the research question. In the context of our Stroop example, we can say that the estimated \(\beta_1\) is a fixed effect, because if we were to run another experiment using a different manipulation (say, a Sternberg memory task) we could no longer reasonably speak of the second experiment being a replication of the first.
This text implies that \(\beta_1\) must be modeling some "variable," and it sounds like that variable is the manipulation used. But how does the estimation "know" what it's supposed to be modeling? I think the correspondence must be enforced by the optimization process together with the variable \(\beta_1\) is attached to. In this case \(\beta_1\) is attached to \(X_{ij}\). We put \(X_{ij}\) in correspondence with the "experimental condition" of subject \(i\)'s \(j\)th trial, and we put \(y_{ij}\) in correspondence with the observed reaction times. When we fit our linear model, \(\beta_1\) is forced to correspond with the difference in means between the conditions, i.e. the Stroop effect. We create a correspondence between the pairs \((X_{ij}, y_{ij})\) and the experiment results, and then the fitting process forces \(\beta_1\) into correspondence with the hypothesized Stroop effect.
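To make that concrete, here is a minimal simulation sketch: generate data from the model above and fit it. The parameter values, sample sizes, and the use of statsmodels' MixedLM are my own illustrative assumptions, not anything from the paper; the point is only that the fitted \(\beta_1\) lands near the condition mean difference.

```python
# A minimal simulation sketch of the model above. The parameter values,
# sample sizes, and the use of statsmodels are illustrative assumptions,
# not anything from the paper.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subj, n_trials = 30, 40
beta0, beta1 = 0.80, 0.10            # "true" intercept and Stroop effect, in seconds
sd_u0, sd_u1, sd_e = 0.10, 0.05, 0.15

subj = np.repeat(np.arange(n_subj), n_trials)         # subject index i for each trial j
x = rng.integers(0, 2, size=n_subj * n_trials)        # X_ij: 0 = congruent, 1 = incongruent
u0 = rng.normal(0, sd_u0, n_subj)                     # random intercepts u_0i
u1 = rng.normal(0, sd_u1, n_subj)                     # random slopes u_1i
e = rng.normal(0, sd_e, n_subj * n_trials)            # trial-level noise e_ij
rt = beta0 + beta1 * x + u0[subj] + u1[subj] * x + e  # y_ij

df = pd.DataFrame({"rt": rt, "cond": x, "subject": subj})
fit = smf.mixedlm("rt ~ cond", df, groups=df["subject"], re_formula="~cond").fit()
print(fit.fe_params)  # the "fixed effects": estimates of beta_0 and beta_1
```

Nothing in the fitting procedure knows what \(X_{ij}\) "means"; the meaning lives entirely in how we constructed the data.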
But how does it make sense to say that "the estimated \(\beta_1\) is a fixed effect"? "If we were to run another experiment using a different manipulation," then sure, that could no longer reasonably be called a replication. I agree at the level of the verbal construct. But how can a number "be" a fixed effect, and what does that have to do with replications?
Maybe one reading is that it's fixed in the sense that if we change what \(X_{ij}\) or \(y_{ij}\) means, the estimate is no longer valid. But that's trivially true, so I don't think it's a useful reading.
I also find it strange to model (as they do later) variables like the site, experimenter, stimulus, task, and instructions as "stochastically sampled from some underlying population." We know that these factors were not sampled uniformly at random from any underlying population. There is no Scientist in the Sky spinning the wheel and assigning replication studies to random combinations of site, experimenter, stimuli, task, and instructions.
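Concretely, I take "modeling these factors as stochastically sampled" to mean adding further random intercepts to the model above, something like the following (my guess at the general form, not the paper's exact specification):

\begin{align} y_{ijk\ell} &= \beta_0 + \beta_1 X_{ijk\ell} + u_{0i} + u_{1i} X_{ijk\ell} + s_{0k} + w_{0\ell} + e_{ijk\ell} \\ s_{0k} &\sim \mathcal{N}(0, \sigma^2_{s_0}) \\ w_{0\ell} &\sim \mathcal{N}(0, \sigma^2_{w_0}) \end{align}

where \(k\) indexes sites and \(\ell\) indexes stimuli, with analogous terms for experimenter, task, and instructions. The distributional assumptions on \(s_{0k}\) and \(w_{0\ell}\) are precisely the "stochastically sampled from some underlying population" claim that bothers me.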
What would the populations even be if we wanted to randomly sample over these "random effects" variables?
- Site: Any room in any building anywhere in the U.S.?
- Experimenter: Any psychologist anywhere, or any who has submitted to (Insert Journal Here), or any who is within five miles of the site (making the population "hierarchical" rather than a product)?
- Stimuli: This one is relatively straightforward, at least. If we have a list of English color words, we can sample congruent pairs easily given a conservative model of, e.g., "colors that are definitely green." Sampling incongruent pairs is fine too: randomly sample a color word, independently sample a random color, and then either have a human filter out any pairs that happen to be congruent or accept that a few accidentally congruent pairs will show up (see the sketch after this list). Although now you have a problem if color words aren't evenly distributed across colors, for example if you have many more words for shades of red (crimson, vermillion, ...) than for shades of green.
- Task: For Stroop I think we can assume this is fixed.
- Instructions: Any English text that conveys the same factual information about how to perform the task? (I suspect there are far too many of those to be practical, and most of them would not make for good instructions. Arguably "the complete works of Shakespeare, followed by 'Now ignore the preceding and read the following instructions: ...'" is such a string. Or replace Shakespeare with equal-length "monkey on a typewriter" text. Or with text sampled from a raw English word-frequency model. Et cetera.)
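Since I claimed the stimulus case is relatively straightforward, here is a quick sketch of that sampling scheme. The word list is an illustrative assumption, and it sidesteps the uneven-distribution caveat:

```python
# A sketch of the stimulus-sampling idea from the "Stimuli" bullet. The word
# list is an illustrative assumption; it also ignores the caveat about color
# words being unevenly distributed across colors.
import random

color_words = ["red", "green", "blue", "yellow", "purple", "orange"]
ink_colors = list(color_words)

def sample_congruent(n, seed=0):
    rng = random.Random(seed)
    words = [rng.choice(color_words) for _ in range(n)]
    return [(word, word) for word in words]            # word shown in its own color

def sample_incongruent(n, seed=0):
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < n:
        word, ink = rng.choice(color_words), rng.choice(ink_colors)
        if word != ink:                                # reject accidentally congruent draws
            pairs.append((word, ink))
    return pairs

print(sample_congruent(3))
print(sample_incongruent(3))
```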
For most of these factors it is totally impractical to actually sample from such a population. The concrete population definitions I can name seem either (1) obviously absurd, as above, or (2) so restricted that generalization once again becomes a problem, because you have to retreat to a practical subset that is much smaller than the population you care about. You just can't randomize all of the relevant nuisance factors away; if you want to generalize, at some point you have to take a leap from a limited model. That's not to claim psychology currently sits in the right place with respect to randomizing over these factors, but I do wonder what Yarkoni thinks about the tradeoff.
The paper makes some good points, but it's unclear what to take away from the bulk of it. Given that these population definitions seem silly, what would it mean to model these factors as "stochastically sampled"? How could that be made reasonable? Maybe modeling them this way is better than leaving those effects out entirely, but I expect the result to still be overconfident: when interpreting the model we still have to pretend to have eliminated sampling bias, so the error estimates come out smaller than they should be, and it seems no rational method can resolve that.
Given that we know even such an adjusted model is wrong, what can we take away from it? It would seem strange to take the numbers it produces at face value, but it also feels a bit deranged to say we must throw them out entirely. What do we do instead?