One of the popular models for random graphs is Erdos-Renyi. We generate by taking nodes and between every pair generating an edge with probability . This is nice because it makes computing properties of the graph really easy. The downside is that very few structures actually follow this model – for instance in social networks if and are adjacent and and are adjacent, then and are more likely to be adjacent. This is clearly not the case in . It’s a shame, but at least it’s fun to play with.
A monotone property of a graph is one that is always preserved by adding more edges. For a monotone property then it is a natural question to ask for what values of does have with probability one as goes to infinity? Since the property is monotone there will be some function of at which point almost surely has property . There are in fact sharp thresholds for many properties (as always for a more complete discussion of this , check out ). We’ll focus on connectivity, clearly a monotone property.
We’re going to take a shot at bounding the threshold value from below. Our first approach is an extremely loose bound. Let be the random variable representing the number of edges in (we’ll just call it from now on). If goes to zero then with high probability has no edges at all. This is evident from Markov’s (Hopcroft and Kannan call it the First Moment Method). Since , if drops to zero then the probability there are any edges at all drops to . Since the expected number of edges in the graph is , we see that if where then . So if when , almost surely has no edges and is therefore disconnected with high probability.
Actually we can do a little better than this even with our super-naive approach. We don’t need to drop to zero. A graph is disconnected if it has fewer than edges, so if goes to zero, is almost surely disconnected. Therefore all we need is that . This revision suggests that will do the trick since , and so . For this drops to so is almost surely disconnected.
It should come as no surprise to find that this is still a terrible, terrible bound. There’s a long way between having edges and being connected, since a graph with randomly placed edges is very unlikely to be connected. We can achieve a marginally better bound without doing too much work though. A graph is connected if and only if it has a spanning tree. If a graph almost surely does not have a spanning tree then it is disconnected with high probability. Let be the number of spanning trees in . If drops to then almost surely does not have a spanning tree and so is almost surely disconnected. We know . If then , which goes to . So our new lower bound for is , better by literally an infinitesimal amount.
The actual optimal lower bound for is where . In fact, connectivity experiences what is called a sharp threshold. That is, there exists a such that almost surely is disconnected for where and is almost surely connected for . The proof is a little messy for my tastes, so we’ll just finish the process of bounding from below. The following proof is just taken from Hopcroft and Kannan.
We will now investigate the expected number of isolated vertices (vertices with degree ). Clearly if almost surely has an isolated vertex then is almost surely disconnected. Let be the number of isolated vertices, so . If , then , so . So if then almost surely has no isolated vertices.
Unfortunately this is not actually enough to tell us that for there are almost surely isolated vertices. It could be that most isolated vertices are all in some small set of graphs and the rest all have no isolated vertices. To fix this we need what Hopcroft and Kannan call the second moment method, which is basically using Chebyshev’s to show that if goes to then is almost surely not zero since . An immediate consequence of this by the definition of variance is that if then is almost surely greater than zero.
This is generally a little messier since variance computations are not quite as nice as computing the mean, but in this case it isn’t too bad. We want . Splitting it up into the typical indicator random variables gives us . Since , that part is . There are terms in the second sum and each term is where we avoided double-counting the edge between vertices and . Now we can just compute , substitute in where and send it to infinity.
So almost surely has an isolated vertex is .
As mentioned before, if and , it is possible to show that is almost surely connected (instead of that it almost surely has no isolated vertices). This is shown by Hopcroft and Kannan by considering the “giant component.” In as increases a single component contains most () of the vertices, and the rest are isolated vertices. Once is where the isolated vertices disappear because the giant component has swallowed the whole graph.
Erdos-Renyi is an interesting model to play around with, mostly because it’s the only model I can easily follow most of the math for. For a while I was trying to apply some basic spectral graph theory techniques to it, but I couldn’t make it hold up except in the most basic of facts. For instance the expected Laplacian of is obvious, and if you apply Kirchhoff’s Theorem to it then you get the expected number of spanning trees. What I couldn’t do was figure out the expected second smallest eigenvalue, which is zero if and only if is connected. Perhaps a little more creativity was needed on that front. Hopefully I get a chance to review it.
 Foundations of Data Science by Hopcroft and Kannan. A draft can be found at http://www.cs.cornell.edu/jeh/book112013.pdf
Used for the proof of the sharp threshold for the existence of isolated vertices.