# Loose Bounds on Connectivity in G(n,p)

One of the most popular models for random graphs is the Erdos-Renyi model. We generate ${G(n,p)}$ by taking ${n}$ nodes and, for every pair of nodes, independently generating an edge with probability ${p}$. This independence makes computing properties of the graph really easy. The downside is that very few real structures actually follow this model — for instance, in social networks, if ${u}$ and ${v}$ are adjacent and ${v}$ and ${w}$ are adjacent, then ${u}$ and ${w}$ are more likely to be adjacent, which is clearly not the case in ${G(n,p)}$. It’s a shame, but at least it’s fun to play with.
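
To make the model concrete, here is a minimal sketch of sampling from ${G(n,p)}$ in Python (the helper name `sample_gnp` is my own invention):

```python
import random

def sample_gnp(n, p, seed=None):
    """Sample one draw of G(n, p): each of the C(n, 2) possible
    edges is included independently with probability p."""
    rng = random.Random(seed)
    return {(u, v) for u in range(n) for v in range(u + 1, n)
            if rng.random() < p}

# Every pair gets its own independent coin flip, so knowing that
# (u, v) and (v, w) are edges says nothing about (u, w).
g = sample_gnp(10, 0.5, seed=0)
```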

A monotone property of a graph is one that is preserved by adding edges. For a monotone property ${P}$, it is natural to ask: for what values of ${p}$ does ${G(n,p)}$ have ${P}$ with probability approaching one as ${n}$ goes to infinity? Since the property is monotone, there is some function ${f(n)}$ such that ${G(n,f(n))}$ almost surely has property ${P}$. There are in fact sharp thresholds for many properties (as always, for a more complete discussion of this, check out [1]). We’ll focus on connectivity, which is clearly monotone.

We’re going to take a shot at bounding the threshold value from below. Our first approach is an extremely loose bound. Let ${X}$ be the random variable counting the number of edges in ${G(n,p)}$ (we’ll just call it ${G}$ from now on). If ${E[X]}$ goes to zero then with high probability ${G}$ has no edges at all. This follows from Markov’s inequality (Hopcroft and Kannan call this the First Moment Method): since ${P(X \geq a) \leq \tfrac{E[X]}{a}}$, taking ${a = 1}$ shows that if ${E[X]}$ drops to zero then the probability there are any edges at all drops to ${0}$. Since the expected number of edges in the graph is ${E[X] = \binom{n}{2}p = \tfrac{n(n-1)p}{2}}$, we see that if ${p = \tfrac{c}{n^{2+\epsilon}}}$ where ${\epsilon > 0}$ then ${\lim_{n \rightarrow \infty}{E[X]} = \lim_{n \rightarrow \infty}{\tfrac{n(n-1)}{2}\tfrac{c}{n^{2+\epsilon}}} \leq \lim_{n \rightarrow \infty}{\tfrac{c}{2n^{\epsilon}}} = 0}$. So if ${p = \tfrac{c}{n^{2+\epsilon}}}$ with ${\epsilon > 0}$, ${G}$ almost surely has no edges and is therefore disconnected with high probability.
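
Numerically, the Markov bound dies off quickly. A quick sketch (the function name is my own) evaluating the bound ${Pr(X \geq 1) \leq E[X]}$ at ${p = \tfrac{c}{n^{2+\epsilon}}}$:

```python
def markov_bound_no_edges(n, c=1.0, eps=0.5):
    """Markov bound Pr(X >= 1) <= E[X] = C(n, 2) * p at p = c / n^(2+eps)."""
    p = c / n ** (2 + eps)
    return n * (n - 1) / 2 * p

# The bound on the probability of seeing even one edge shrinks with n.
bounds = [markov_bound_no_edges(n) for n in (10, 100, 1000)]
```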

Actually we can do a little better than this even with our super-naive approach. We don’t need ${E[X]}$ to drop to zero. A connected graph on ${n}$ vertices must have at least ${n-1}$ edges, so if ${Pr(X \geq n-1)}$ goes to zero, ${G}$ is almost surely disconnected. Therefore all we need is ${E[X] \in o(n)}$. This revision suggests that ${p = \tfrac{c}{n^{1+\epsilon}}}$ will do the trick, since ${\lim_{n \rightarrow \infty}{E[X]} = \lim_{n \rightarrow \infty}{\tfrac{n(n-1)p}{2}} \leq \lim_{n \rightarrow \infty}{\tfrac{n^2p}{2}} = \lim_{n \rightarrow \infty}{\tfrac{cn^2}{2n^{1+\epsilon}}} = \lim_{n \rightarrow \infty}{\tfrac{cn^{1-\epsilon}}{2}}}$, and so ${\lim_{n \rightarrow \infty}{\tfrac{E[X]}{n-1}} = \lim_{n \rightarrow \infty}{\tfrac{c}{2}n^{-\epsilon}}}$. For ${\epsilon > 0}$ this drops to ${0}$, so by Markov’s inequality ${G}$ is almost surely disconnected.
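
The same numeric check works for this revised bound (again, the name is mine); note that ${\tfrac{E[X]}{n-1}}$ simplifies to ${\tfrac{np}{2} = \tfrac{c}{2n^{\epsilon}}}$:

```python
def markov_bound_few_edges(n, c=1.0, eps=0.5):
    """Markov bound Pr(X >= n-1) <= E[X] / (n-1) at p = c / n^(1+eps)."""
    p = c / n ** (1 + eps)
    return (n * (n - 1) / 2 * p) / (n - 1)   # simplifies to c / (2 n^eps)

few_bounds = [markov_bound_few_edges(n) for n in (10, 100, 1000)]
```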

It should come as no surprise that this is still a terrible, terrible bound. There’s a long way between having ${n-1}$ edges and being connected, since a graph with ${n-1}$ randomly placed edges is very unlikely to be connected. We can achieve a marginally better bound without doing too much work though. A graph is connected if and only if it has a spanning tree, so if a graph almost surely has no spanning tree then it is disconnected with high probability. Let ${X}$ be the number of spanning trees in ${G}$. If ${E[X]}$ drops to ${0}$ then by Markov’s inequality ${G}$ almost surely has no spanning tree and so is almost surely disconnected. By Cayley’s formula there are ${n^{n-2}}$ labeled trees on ${n}$ vertices, and each appears in ${G}$ with probability ${p^{n-1}}$, so ${E[X] = n^{n-2}p^{n-1}}$. If ${p = \tfrac{c}{n}}$ then ${E[X] = \tfrac{(np)^{n-1}}{n} = \tfrac{c^{n-1}}{n}}$, which goes to ${0}$ for ${c \leq 1}$. So our new lower bound for ${p}$ is ${\tfrac{1}{n}}$, better by literally an infinitesimal amount.
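
To see how delicate this bound is, we can evaluate ${E[X] = n^{n-2}p^{n-1} = \tfrac{c^{n-1}}{n}}$ at ${p = \tfrac{c}{n}}$ on both sides of ${c = 1}$ (a quick sketch; the function name is my own):

```python
def expected_spanning_trees(n, c):
    """E[#spanning trees] = n^(n-2) * p^(n-1) = c^(n-1) / n at p = c / n
    (Cayley's formula gives the n^(n-2) labeled trees on n vertices)."""
    return c ** (n - 1) / n

# Vanishes for c <= 1, so the graph is a.s. disconnected at p = c/n...
small = [expected_spanning_trees(n, 0.9) for n in (10, 50, 100)]
# ...but explodes for c > 1, so this argument cannot push past p ~ 1/n.
big = [expected_spanning_trees(n, 1.1) for n in (10, 50, 100)]
```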

The actual threshold for connectivity is ${p = \tfrac{\ln(n)}{n}}$. In fact, connectivity experiences what is called a sharp threshold: there exists a ${p = f(n)}$ such that ${G}$ is almost surely disconnected for ${p = cf(n)}$ where ${c < 1}$, and almost surely connected for ${c > 1}$. The full proof is a little messy for my tastes, so we’ll just finish the process of bounding ${p}$ from below by showing that ${G}$ is almost surely disconnected when ${p = \tfrac{c\ln(n)}{n}}$ with ${c < 1}$. The following proof is taken from Hopcroft and Kannan.

We will now investigate the expected number of isolated vertices (vertices with degree ${0}$). Clearly if ${G}$ almost surely has an isolated vertex then ${G}$ is almost surely disconnected. Let ${X}$ be the number of isolated vertices, so ${E[X] = n(1-p)^{n-1}}$. If ${p = \tfrac{c \ln(n)}{n}}$, then ${E[X] = n(1-\tfrac{c\ln(n)}{n})^{n-1}}$, so ${\lim_{n \rightarrow \infty}{E[X]} = \lim_{n \rightarrow \infty}{n(1-\tfrac{c \ln(n)}{n})^n} = \lim_{n \rightarrow \infty}{ne^{-c\ln(n)}} = \lim_{n \rightarrow \infty}{n^{1-c}}}$. So if ${c > 1}$ then ${E[X]}$ goes to ${0}$ and ${G}$ almost surely has no isolated vertices.
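
The ${n^{1-c}}$ behavior is easy to check numerically. A quick sketch (names are mine) evaluating ${E[X] = n(1-p)^{n-1}}$ at ${p = \tfrac{c\ln(n)}{n}}$:

```python
import math

def expected_isolated(n, c):
    """E[#isolated vertices] = n * (1 - p)^(n - 1) at p = c * ln(n) / n."""
    p = c * math.log(n) / n
    return n * (1 - p) ** (n - 1)

# For c > 1 the expectation tracks n^(1 - c) and vanishes;
# for c < 1 it grows like n^(1 - c) instead.
vanishing = [expected_isolated(n, 2.0) for n in (10, 100, 1000)]
growing = [expected_isolated(n, 0.5) for n in (10, 100, 1000)]
```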

Unfortunately this is not actually enough to tell us that for ${c < 1}$ there are almost surely isolated vertices: a large expectation could come from a small set of graphs with many isolated vertices while the rest have none. To fix this we need what Hopcroft and Kannan call the Second Moment Method, which uses Chebyshev’s inequality: since ${Pr(X=0) \leq Pr(|X - E[X]| \geq E[X]) \leq \tfrac{Var(X)}{E^2[X]}}$, if ${\tfrac{Var(X)}{E^2[X]}}$ goes to ${0}$ then ${X}$ is almost surely nonzero. An immediate consequence, by the definition of variance, is that if ${E[X^2] \leq E^2[X](1+o(1))}$ then ${X}$ is almost surely greater than zero.

This is generally a little messier, since variance computations are not as nice as computing the mean, but in this case it isn’t too bad. We want ${E[X^2]}$. Writing ${X = x_1 + \cdots + x_n}$, where ${x_i}$ is the indicator that vertex ${i}$ is isolated, gives us ${E[X^2] = E[(x_1 + \cdots + x_n)^2] = \sum_{i=1}^{n}{E[x_i^2]} + 2\sum_{i < j}{E[x_ix_j]}}$. Since ${x_i^2 = x_i}$, the first sum is ${E[X]}$. There are ${n(n-1)/2}$ terms in the second sum, and each is ${E[x_ix_j] = (1-p)^{2n-3}}$: vertices ${i}$ and ${j}$ must each miss their ${n-1}$ potential edges, but the edge between them would otherwise be double-counted. Now we can compute ${\tfrac{E[X^2]}{E^2[X]}}$, substitute in ${p = \tfrac{c\ln(n)}{n}}$ where ${c < 1}$, and send ${n}$ to infinity.

$\begin{array}{ccc} \frac{E[X^2]}{E^2[X]} &=& \frac{E[X] + n(n-1)(1-p)^{2n-3}}{E^2[X]} \\ &=& \frac{1}{E[X]} + \frac{n(n-1)(1-p)^{2n-3}}{n^2(1-p)^{2n-2}}\\ &=& \frac{1}{E[X]} + (1-\frac{1}{n})\frac{1}{1-p}\\ &=& 1+o(1)\\ \end{array}$

The first term vanishes because ${E[X]}$ behaves like ${n^{1-c} \rightarrow \infty}$ for ${c < 1}$, and the second term tends to ${1}$. So ${G}$ almost surely has an isolated vertex if ${c < 1}$.
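
Plugging numbers into the final expression ${\tfrac{1}{E[X]} + (1-\tfrac{1}{n})\tfrac{1}{1-p}}$ confirms the convergence (a quick sketch; names are mine):

```python
import math

def second_moment_ratio(n, c):
    """E[X^2] / E[X]^2 = 1/E[X] + (1 - 1/n) / (1 - p) for X = #isolated
    vertices at p = c * ln(n) / n, from the derivation above."""
    p = c * math.log(n) / n
    ex = n * (1 - p) ** (n - 1)
    return 1 / ex + (1 - 1 / n) / (1 - p)

# For c < 1 the ratio tends to 1, so by Chebyshev X is almost surely
# nonzero: the graph has an isolated vertex.
ratios = [second_moment_ratio(n, 0.5) for n in (10, 1000, 100000)]
```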

As mentioned before, if ${p = \tfrac{c\ln(n)}{n}}$ and ${c > 1}$, it is possible to show that ${G}$ is almost surely connected (not merely that it almost surely has no isolated vertices). Hopcroft and Kannan show this by considering the “giant component”: as ${p}$ increases, a single component emerges that contains all but a vanishing fraction of the vertices, and near the connectivity threshold the vertices outside it are isolated. Once ${p = \tfrac{c\ln(n)}{n}}$ with ${c > 1}$, the isolated vertices disappear because the giant component has swallowed the whole graph.
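
The sharp threshold is visible even at modest ${n}$. The following Monte Carlo sketch (parameters and names are my own choices) samples ${G(n,p)}$ on both sides of ${c = 1}$ and checks connectivity with a graph search:

```python
import math
import random

def gnp_connected(n, p, rng):
    """Sample G(n, p) and report whether it is connected (DFS from vertex 0)."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        for v in adj[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n

rng = random.Random(42)
n, trials = 300, 25
frac = {}
for c in (0.5, 1.5):
    p = c * math.log(n) / n
    frac[c] = sum(gnp_connected(n, p, rng) for _ in range(trials)) / trials
# Below the threshold (c < 1) the samples are essentially never connected;
# above it (c > 1) they almost always are.
```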

Erdos-Renyi is an interesting model to play around with, mostly because it’s the only model I can easily follow most of the math for. For a while I was trying to apply some basic spectral graph theory techniques to it, but I couldn’t make them hold up except for the most basic of facts. For instance the expected Laplacian of ${G(n,p)}$ is obvious, and if you apply Kirchhoff’s Theorem to it then you get the expected number of spanning trees. What I couldn’t do was figure out the expected second smallest eigenvalue of the Laplacian, which is positive if and only if ${G}$ is connected. Perhaps a little more creativity was needed on that front. Hopefully I get a chance to revisit it.
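
For what it’s worth, the Kirchhoff computation does check out: the expected Laplacian has ${(n-1)p}$ on the diagonal and ${-p}$ off it, and the determinant of any ${(n-1) \times (n-1)}$ cofactor recovers ${n^{n-2}p^{n-1}}$, the expected number of spanning trees. A sketch (the helper names are mine), using exact rational arithmetic:

```python
from fractions import Fraction

def kirchhoff_expected_trees(n, p):
    """Apply Kirchhoff's theorem to the expected Laplacian of G(n, p):
    diagonal (n - 1) * p, off-diagonal -p. The determinant of any
    (n-1) x (n-1) cofactor gives the expected number of spanning trees."""
    # build the cofactor by deleting the last row and column
    m = [[(n - 1) * p if i == j else -p for j in range(n - 1)]
         for i in range(n - 1)]
    # Gaussian elimination; pivots stay positive since m is
    # symmetric positive definite, so no row swaps are needed
    det = Fraction(1)
    for col in range(n - 1):
        det *= m[col][col]
        for r in range(col + 1, n - 1):
            factor = m[r][col] / m[col][col]
            for j in range(col, n - 1):
                m[r][j] -= factor * m[col][j]
    return det

trees = kirchhoff_expected_trees(6, Fraction(1, 2))
```

For ${n = 6}$ and ${p = \tfrac{1}{2}}$ this agrees with ${n^{n-2}p^{n-1} = 6^4/2^5}$.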

Sources:
[1] Foundations of Data Science by Hopcroft and Kannan. A draft can be found at http://www.cs.cornell.edu/jeh/book112013.pdf
Used for the proof of the sharp threshold for the existence of isolated vertices.