The algebra of little-o notation?

I had a professor once who used little-o notation to describe error terms in matrix calculations. I had no context for this notation, so I had close to zero understanding of what he meant by any of these cryptic equations. I still wonder about this sometimes. These would look something like (making things up):

\(\mathrm{error} = x + y + z + o(\varepsilon)\)

This is, to me, weird to read, hard to interpret, and hard to think about. I'd like to change that.

One thing I've learned since (from Wikipedia and random PDFs found while searching from time to time) is that there is an algebraic aspect to little-o notation. Or, kind of. It's in the same sense as big-O notation.

If you go and look up little-o notation on Wikipedia, you'll get a definition like this. We say that some function \(f(x)\) is \(o(g(x))\) for some eventually-positive function \(g(x)\) if, for every \(\varepsilon > 0\), there is some \(k\) such that for all \(x > k\) we have \(\left|f(x)\right| \leq \varepsilon g(x)\). This is usually written as \(f(x) = o(g(x))\) or \(f = o(g)\), but that's a bit of an abuse of notation – what we're really saying is that \(f(x)\) has this property, which might be expressed as belonging to a set. We are not asserting that \(f\) is equal to something about \(g\).

(This definition looks at \(f\)'s behavior as things go to infinity. There is also a version of this definition for specific points: instead of looking at "tending toward infinity" behavior via \(x > k\), we say that for every \(\varepsilon > 0\) there is a \(\delta > 0\) such that for all \(x\) satisfying \(0 < |x - x_0| < \delta\) the inequality \(\left|f(x)\right| \leq \varepsilon g(x)\) holds.)
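For the "tending toward infinity" version, here's a quick numerical spot check (with made-up example functions, and obviously no substitute for the definition): if \(f = o(g)\), then for each \(\varepsilon\) there should be some threshold past which \(\left|f(x)\right| \leq \varepsilon g(x)\), and the threshold generally moves out as \(\varepsilon\) shrinks.

```python
# Quick numerical spot check of the "x -> infinity" definition, with made-up
# example functions: f(x) = 5x ought to be o(x^2), but x^2 is not o(x).

def threshold(f, g, eps, xs):
    """Smallest x in xs from which |f(x)| <= eps * g(x) holds for the rest of xs."""
    found = None
    for x in xs:
        if abs(f(x)) <= eps * g(x):
            if found is None:
                found = x
        else:
            found = None  # the inequality failed again, so reset
    return found

xs = [10 ** k for k in range(1, 9)]
f = lambda x: 5 * x
g = lambda x: x ** 2

for eps in (0.1, 0.01, 0.001):
    print(eps, threshold(f, g, eps, xs))  # a finite threshold for every eps
    print(eps, threshold(g, f, eps, xs))  # None: x^2 never gets under eps * 5x here
```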

This definition is good enough for some purposes, but it's not the clearest thing in the context of an equation like the fake error equation from the beginning. If we're adding some real numbers to a set, wouldn't the result be a set and not a real number?

One key thing to know here is that the notation \(o(\varepsilon)\) as used in this equation doesn't actually mean a set. It's another abuse of notation. It actually means "some arbitrary member of this set; we are forgetting about exactly which one here." In other words, it's a function, not a set. It is some function in a set, and we don't care very precisely which function. I think in this case the idea is that we can forget about the exact member if we only want a sort of upper bound.

This is weird to think about. It's not obvious to me that it makes sense, whether we think about this arbitrary function as "a way of computing a value given an input (which input we don't yet have)," a.k.a. "some formal operation depending on a meaningless variable," or whether we think of it as "this thing that has a visualizable shape, some function that you could chart, this thing that has a value at every point in some domain."

In this context of bounding errors, I think there's an extra step involved. This \(o(\varepsilon)\) term feels a little more natural if I think of this error term as "some quantity that depends on another thing" – that is, if we're evaluating the arbitrary function at some particular point, rather than leaving it as a function and adding to it some real numbers (which would have to be like adding constant functions?).

This still feels weird to think about.
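One way to make the fake error equation from the beginning a little less cryptic (this is just a rewording of the convention as I understand it, nothing new) is to name the arbitrary function explicitly: \(\mathrm{error}(\varepsilon) = x + y + z + r(\varepsilon)\), where \(r\) is some particular function satisfying \(r = o(\varepsilon)\), and we simply aren't saying which one.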

It turns out you can also "do math on" or "do algebra on" these things, in a sense. If you have something like \(\varepsilon^2 + o(\varepsilon)\) (thinking of \(\varepsilon \to 0\)), you might be able to absorb the \(\varepsilon^2\) term and get a simple \(o(\varepsilon)\). In the right context, anyway.
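As a quick numerical spot check of that absorption claim (with a made-up concrete stand-in for the \(o(\varepsilon)\) term, say \(r(\varepsilon) = \varepsilon^{3/2}\), which is \(o(\varepsilon)\) as \(\varepsilon \to 0\)), the combined term divided by \(\varepsilon\) still heads toward zero:

```python
# Made-up stand-in for the o(eps) term: r(eps) = eps**1.5 is o(eps) as eps -> 0.
r = lambda eps: eps ** 1.5

# The combined term eps**2 + r(eps), divided by eps, should head toward 0 as
# eps -> 0, which is the "absorbed into o(eps)" claim in ratio form.
for eps in (0.1, 0.01, 0.001, 0.0001):
    print(eps, (eps ** 2 + r(eps)) / eps)
```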

The tricky part is that I'm not sure exactly what the rules are for combining things like this. Looking it up, Wikipedia gives three:

  1. Multiplying by a constant doesn't change the "little-o class." For all \(c \neq 0\), \(f = o(g)\) implies \(c\cdot f = o(g)\).
  2. Multiplying functions \(f = o(F)\) and \(g = o(G)\) pointwise "behaves nicely," in the sense that \(f\cdot g = o(F \cdot G)\).
  3. You have transitivity: If \(f = o(g)\) and \(g = o(h)\) then \(f = o(h)\).

(I would guess analogous properties should hold for the finite case, assuming the "limit points" are identical.)
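As a numerical sanity check of these three rules (again with arbitrary made-up example functions, not a proof), the relevant ratios all head toward zero:

```python
# Numerical sanity check of the three rules, with made-up example functions.
# Here f(x) = x is o(F) for F(x) = x^2, and g(x) = x^2 is o(G) for G(x) = x^3.

f = lambda x: x
F = lambda x: x ** 2
g = lambda x: x ** 2
G = lambda x: x ** 3

def ratios(top, bottom, xs=(10, 100, 1_000, 10_000)):
    """Ratios |top(x)| / bottom(x); for a little-o relationship these head toward 0."""
    return [abs(top(x)) / bottom(x) for x in xs]

# Rule 1: constant multiples. 3*f should still be o(F).
print(ratios(lambda x: 3 * f(x), F))

# Rule 2: products. f*g should be o(F*G).
print(ratios(lambda x: f(x) * g(x), lambda x: F(x) * G(x)))

# Rule 3: transitivity. f = o(g) and g = o(G), so f should be o(G).
print(ratios(f, g))
print(ratios(g, G))
print(ratios(f, G))
```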

Do we have additivity? I'm not sure. It feels like we ought to – from what I recall we do for big-O – but Wikipedia doesn't say, and I feel like that is something the page would mention if it were true. It seems like it would be a more basic fact than little-o being "compatible with" multiplication.

Question: Suppose \(f = o(h)\) and \(g = o(h)\). Do we have \(f + g = o(h)\)?

Answer: Let \(\varepsilon > 0\). There are \(x_1\) and \(x_2\) such that for all \(x > x_1\) we have \(\left|f(x)\right| \leq (\varepsilon/2) h(x)\), and for all \(x > x_2\) we have \(\left|g(x)\right| \leq (\varepsilon/2) h(x)\). But then for all \(x > x_0 = \max(x_1, x_2)\) both inequalities hold, so we can add them to get \(\left|f(x)\right| + \left|g(x)\right| \leq \varepsilon h(x)\), and by the triangle inequality \(|f(x) + g(x)| \leq \left|f(x)\right| + \left|g(x)\right| \leq \varepsilon h(x)\). So, yes, we have \(f + g = o(h)\). \(\square\)

(Here the finite case seems like it should follow easily. When we are looking at the behavior near a finite limit point \(x_0\), we just have to swap the conditions using \(x_1, x_2\) for ones using \(\delta_1, \delta_2\), and take \(\delta_0 = \min(\delta_1, \delta_2)\) instead of a \(\max\).)
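And one last numerical spot check of the additivity fact, with made-up functions: \(f(x) = x\) and \(g(x) = x \log x\) are each \(o(h)\) for \(h(x) = x^2\), and the ratio \(\left|f(x) + g(x)\right| / h(x)\) does head toward zero:

```python
import math

# Spot check of additivity with made-up functions: f(x) = x and g(x) = x*log(x)
# are each o(h) for h(x) = x^2, and their sum should be too.
f = lambda x: x
g = lambda x: x * math.log(x)
h = lambda x: x ** 2

for x in (10, 1_000, 100_000, 10_000_000):
    print(x, abs(f(x) + g(x)) / h(x))  # heads toward 0 as x grows
```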

So, it seems like, despite my doubts, addition does work out nicely for little-o notation. Nice!