Measuring the mutual density of a followership
Visa tweeted once about the density of interconnections in Twitter followerships:
Quoted @visakanv tweet, in case of explosions:
an interesting variable that’s not immediately, obviously easily observable in social graphs (currently): amongst say a dozen celebrities with over 1m followers, whose network is the most dense with interconnected relationships
I wonder, how could we measure this? This is a question about how densely connected a graph is. I bet there's an existing measure for this in terms of graphs.
We can represent the situation as an undirected graph with one vertex per follower. Given two follower accounts A, B, there is an edge connecting A and B if and only if the accounts are "mutuals" — that is, A follows B and B follows A. In what follows, unless otherwise specified all accounts are follower accounts.
(We could model the connections more completely using a directed graph with separate edges for A follows B versus B follows A, but I suspect that makes measuring connection density much more complicated, so let's not.)
From here there are a few ways we could look at connectivity:
- On a scale from "no edges at all" (edgeless graph) to "everyone is mutuals with everyone else" (complete graph), what fraction do we have? If this graph were a complete graph, every account would connect to every other; if our celebrity account had \(n\) followers, we'd have \(\binom{n}{2}\) edges. Relative to \(\binom{n}{2}\), how many edges do we have? Say the number of edges is \(\left|E\right|\); we can measure the connectivity as \(\left|E\right| / \binom{n}{2}\).
- Say we start at the page for an account A and we want to get to some arbitrary other account B's page. Say we get around only by clicking through following/followed by. If we only click on accounts that are mutuals with the account whose page we're looking at is it always possible for us to reach B? If so, the graph is defined to be "connected."
- We can take that notion further: How many follower accounts would we have to delete, at minimum, to make the graph "no longer connected" as we defined "connected"? If we'd have to delete \(k\) accounts, that's how (vertex-)connected the graph is. The higher \(k\) is, the more connected the graph, with the maximum being \(n - 1\).
I looked at a few other notions of connectivity and connectivity-related ideas, but couldn't tell if they make sense in this context:
- Average connectivity (via Alex Li).
- Expander graphs.
- Algebraic connectivity.
- Conductance.
- Efficiency kind of seems similar if not quite the same as connectivity.
- Strength of a graph.
- Graph toughness.
- Small-world networks sound relevant.
Another metric that would add something interesting once you know the connectivity is some measure of clustering — how much do connections in the graph tend to cluster? How "cliquey" is the followerdom? That's another can of worms though.
Another thought: k-vertex-connectivity seems "fragile," in the sense that you can always take a strongly connected graph and make it weakly connected by adding a new component that is connected to the rest of the graph by only one edge. This seems like a weakness of k-vertex-connectivity (as well as k-edge-connectivity!) as a measure of "density of mutuals."
Maybe it's interesting to look at this in terms of a list. That is: Break the graph into a list of cliquey subgraphs in some reasonable way so that the "just added on" subgraph would get ranked first. We can then ask: If we delete the first subgraph, how k-vertex-connected is the resulting graph? What if we delete the second component? And the third? And so on. One would have to compensate somehow for the graph getting smaller as we delete subgraphs. It might work to divide each k in the sequence by the total number of nodes remaining in the graph after deleting the subgraphs up to that point. I suspect the resulting list of numbers might be an interesting way to examine graph connectivity. The hard problem is, how do we select the subgraphs? I suspect finding a min-cut and then deleting the smaller of the two sides of the cut, or failing that the side with fewer edges, or failing an arbitrary one, might work. Or one might have to resolve the edge cases (ha) in some other way. Interesting question.
I have open questions:
- What do the two metrics — "edge fraction" vs. \(k\)-vertex-connectivity — mean in terms of the graph and the social dynamics? What should we expect the graphs to look like at different values of each metric? What impact would different structures have on the way that community operates or feels?
- Are there other metrics besides these two that would make sense here? What could other metrics add to this picture?
- How does clustering play into this?
- Can we construct, given an arbitrary such graph, a reasonable list that interestingly characterizes the graph connectivity and cliquey-ness?
The "edge fraction" together with k-vertex-connectivity and clustering measures seem like a reasonable approach if I had to pick one. But I bet adding other metrics and methods could make for a more complete and interesting picture.