# Balls and Bins with G(n,m)

It’s been a while since I’ve posted so I figured I’d post something short to get back in the swing of things. I spent a week chasing some results in spectral graph theory without knowing anything about it. I still have a hope to find something there but first I need to fill in some gaps in my knowledge. Anyway, recently I came across this problem set. I really like all the problems but the question about ${G(n,m)}$ caught my attention. I figured I would look into it.

Apparently this is also a standard model, but I hadn’t heard of it. A graph ${G(n,m)}$ is generated by taking ${n}$ vertices and randomly placing ${m}$ edges in the graph. The most notable difference is that vertex pairs are selected with replacement, so there is a possibility of parallel edges. The problem from that assignment allows for self-loops, which is just insanity. I also don’t want to post solutions to other people’s problems. But mostly it is insanity, so we’ll stick to the more standard model.

Actually, some natural questions about this model are well-known results. We can ask for what value of ${m}$ do we get a double edge with high probability; this is just the birthday paradox. We can ask for what value of ${m}$ do we have a subgraph that is complete; this is coupon collector. Another question we can ask is if ${m=n}$, what is a bound on the maximum number of parallel edges?

That last one probably isn’t as common but it was actually a homework problem for me once [1]. It was phrased as follows: If we toss ${n}$ balls into ${n}$ bins by selecting the bin for each ball uniformly at random, show that no bin contains more than ${O(\tfrac{\ln(n)}{\ln(\ln(n))})}$ balls with high probability, more specifically ${1-\tfrac{1}{n}}$.

This an exercise in Chernoff bounds, which are essential in basically every randomized algorithm I’ve ever seen. It places bounds on the probability that a random variable with a binomial distribution is really far from its mean. Actually it can be the sum of negatively associated random variables, but that’s not really important here. The really nice thing is that the bound is exponentially small. There are a bunch of different forms but they’re all derived in a similar manner. The version we’re going to use is best when we want to bound the probability that a random variable is really far from its mean. Moving past this extremely technical terminology, the bound states that ${Pr(X \geq k\mu) \leq (\tfrac{e^{k-1}}{k^k})^{\mu}}$.

The overall plan, as per usual, is to bound the probability that any bin has more than ${O(\tfrac{\ln(n)}{\ln(\ln(n))})}$ and then complete the problem by union bounding. For this to work we want to aim to bound the probability a given bin by ${\tfrac{1}{n^2}}$. We let ${X}$ be the number of balls in a given bin, ${X_i}$ is ${1}$ if the ${i}$th ball is thrown in the bin, ${0}$ otherwise. Clearly ${X = \sum{X_i}}$ and the ${X_i}$ are independent, so we can apply Chernoff. I’m also going to be really lazy with constants, it’s definitely possible to be more precise. We see that ${Pr(X \geq k\mu) \leq (\tfrac{e^{k-1}}{k^k})^{\mu}}$, so if ${\mu = 1}$ and ${k \in O(\tfrac{\ln(n)}{\ln(\ln(n))})}$ then

${Pr(X \geq k) \leq \tfrac{e^{k-1}}{k^k} \leq \tfrac{e^k}{k^k} \approx \tfrac{1}{k^k} \leq \tfrac{1}{n^2}}$

That last step wasn’t obvious at all to me, so here’s a short justification. Let ${k = \tfrac{c\ln(n)}{\ln(\ln(n))}}$ and

${k^{-k} = (\tfrac{c\ln(n)}{\ln(\ln(n))})^{-\tfrac{c\ln(n)}{\ln(\ln(n))}} \leq (\sqrt{\ln(n)})^{-\tfrac{c\ln(n)}{\ln(\ln(n))}} = (\ln(n))^{-\tfrac{c\ln(n)}{2\ln(\ln(n))}} = (\exp(\ln(\ln(n))))^{-\tfrac{c\ln(n)}{2\ln(\ln(n))}} = \exp(-c\tfrac{\ln(n)}{2}) \leq n^{-2}}$

At this point we’re done, since the union bound implies that the probability any bin has more than ${O(\tfrac{\ln(n)}{\ln(\ln(n))})}$ balls is less than ${n\tfrac{1}{n^2} = \tfrac{1}{n}}$ and so the probability that no bin has exceeded ${O(\tfrac{\ln(n)}{\ln(\ln(n))})}$ is ${1-\tfrac{1}{n} \rightarrow 1}$.

That didn’t actually have anything to do with the random graph model specifically, I suppose, but the balls in bins idea has obvious applications elsewhere, like hashing and job assignments. Returning to ${G(n,m)}$, we’re just going to look at the number of isolated vertices in a very brief and incomplete fashion.

A vertex is isolated with probability ${(1 - \tfrac{n-1}{\binom{n}{2}})^m = (1 - \tfrac{2}{n})^m}$ since it’s basically repeated attempts to hit one of the ${n-1}$ edges out of ${\binom{n}{2}}$ possible. Thus the expected number of isolated vertices is ${n(1-\tfrac{2}{n})^m}$. If we let ${m = cn\ln(n)}$ and take the limit we see that

${\displaystyle\lim_{n \rightarrow \infty}{n(1-\tfrac{2}{n})^{c\ln(n)}} = \displaystyle\lim_{n \rightarrow \infty}{n}\displaystyle\lim_{n \rightarrow \infty}{(1-\tfrac{2}{n}}^{n})^{c\ln(n)} = \displaystyle\lim_{n \rightarrow \infty}{ne^{-2c\ln(n)}} = \displaystyle\lim_{n \rightarrow \infty}{n^{1-2c}}}$

and so if ${c > \tfrac{1}{2}}$ there is almost surely no isolated vertex. If this looks familiar it’s because it’s basically the same thing as ${G(n,p)}$. This is kind of obvious anyway. At least from the perspective of isolated vertices the fact that you can have multiple edges makes very little difference. The only change is the probability of getting an edge adjacent to a vertex is now fixed and we’re altering the number of attempts. The variance computation also appears to be identical, so I won’t put it here.

When I was trying to make my own modifications to ${G(n,p)}$ for weighted graphs I was trying all kinds of strange things. I tried generating an edge with probability ${p}$ and then selecting a weight from ${[n]}$ with different distributions. Strange things happened. This is also a pretty nice and easy to understand model. I’ll come back to it once I have learned something about spectral graph theory.

Sources:
[1] I looked around for the website of the class I took the balls and bins problem from, but I can’t seem to find it. The class was Probability and Computing, 15-359, taught by professors Harchol-Balter and Sutner in the spring of 2012.