The Irwin-Hall Distribution

A while ago I came across this problem from Stanford’s Putnam seminar:

Two points are independently selected uniformly at random from a line of length {b}. What is the probability that they are at least a distance {a} apart, where {b > a}? (Putnam 1961 B2)

This problem has a very neat solution which exploits what I find to be a rather clever technique. We put the line on a coordinate system and say it runs from {0} to {b}. Let {X_1} be the coordinate of the first point and {X_2} be the coordinate of the second, where {X_1} and {X_2} are random variables which are clearly uniformly distributed. Now draw the unit square, as in the following diagram [1]:


The shaded region represents the area in which the points are at least {a} apart. It has area {2(\tfrac{1}{2}(b-a)^2) = (b-a)^2}, so the probability the two points are at least {a} apart is simply {\tfrac{(b-a)^2}{b^2}}.

This trick can be applied to a whole host of problems. As far as I can tell the only requirement is that we are sampling from uniform distributions. One somewhat classic problem is to find the probability density function of a random variable {Y = X_1 + X_2} where {X_1, X_2 \sim U(0,1)}. By the same trick we can compute the CDF of {Y}, we just need to split it into cases:


The CDF is therefore {F_Y(s) = \tfrac{s^2}{2}} when {0 < s < 1} and {F_Y(s) = 1 - \tfrac{s^2}{2}} when {1 < s < 2}. Differentiating gives us the triangular distribution {f_Y(s) = s} when {0 < s < 1} and {f_Y(s) = -s} when {1 < s < 2}.

Looking at the last post I thought an interesting idea would be to generalize this to {n} uniform random variables. This is of course a solved problem: the density function is known as the Irwin-Hall distribution. I wanted to take a run at it from our high-dimensional geometry perspective, since we can apply the same idea as above to the unit cube in {n} dimensions to compute the CDF of Irwin-Hall.

Before we attack this we need to know how to compute the volume of the standard {n}-simplex, or the volume of the set {\{\vec{x} | \sum{x_i} < 1\}} where the {x_i}s are nonnegative. We’ll follow the explanation given in [2]. First let’s consider the simplex in {\mathbb{R}^n} defined by {(1,0,\cdots,0), (1,1,0,\cdots,0), \cdots, (1,1,\cdots,1)} and the origin. This is just the set {\{\vec{x} | 1 \geq x_1 \geq x_2 \cdots \geq x_n \geq 0\}}. We can fit exactly {n!} of these in the unit cube since there are {n!} such orderings and each point in the unit cube is in one of these simplices (I’m fudging some issues with the boundaries but it should be fine I think). Thus the original simplex has volume {\tfrac{1}{n!}}. We can map this simplex to the standard simplex by the transformation {y_n = x_n; y_{n-1} = x_n + x_{n-1}; \cdots; y_1 = y_n + y_{n-1} + \cdots + y_1}. This is a transformation with determinant one so the volume remains unchanged, so the volume of the standard simplex is {\tfrac{1}{n!}}.

I drew some pictures of the three dimensional case since some interesting things happen there. We’re keeping track of the volume underneath the plane {x_1 + x_2 + x_3 = s} that’s contained within the unit cube. Similar to the two-dimensional case, our function will be piecewise since it changes when it hits vertices of the cube. First our simplex increases at an obvious rate, maintaining a volume of {\tfrac{s^3}{3!}}. It looks kind of like this:


Then the function changes when {s} passes {1}. The way we compute the new volume is by taking the volume of the red simplex and subtracting out the volume of the blue simplices (outlined in the following figure), which gives us {\tfrac{s^3}{n!} - \binom{3}{1}\tfrac{(s-1)^3}{n!}}.


Finally the function changes for a last time when {s} passes {2}. We could just compute the volume of the missing simplex and subtract that from one, but we’ll try a method that works a little more generally. Note that the blue simplices have overlapped to form new simplices which are outlined in green below. Our method of computation is to take the red simplex, subtract out the blue simplices, and then add back in the green simplices. It’s basically inclusion-exclusion. This gives us {\tfrac{s^3}{n!} - \binom{3}{1}\tfrac{(s-1)^3}{n!} + \binom{3}{2}\tfrac{(s-2)^3}{3!}}.


This appears to generalize in a fairly simple manner. Note that our function changes based on the integer part of {s}. The first piece of our function is naturally just {\tfrac{s^n}{n!}}. When {s} hits {1}, the plane hits exactly {\binom{n}{1}} vertices (more precisely it hits {(1,0,\cdots,0) \cdots (0,\cdots,1)}). Our volume computation then requires that we subtract out the newly formed simplices. These have side length {s-1} and there are exactly {\binom{n}{1}} of them, since one forms at each vertex.

If {\lfloor s \rfloor = i} then we see that we will be computing {\tfrac{1}{n!}\sum_{k=0}^{i}{(-1)^k \binom{n}{k} (s-k)^n}}. Each time {s} crosses an integer {i} the plane hits {\binom{n}{i}} vertices (choose which {i} of the coordinates are {1}) and exactly that many new simplices are formed to be added or subtracted out, depending on parity. Thus our CDF is exactly

{P(X < s) = \tfrac{1}{n!} \displaystyle\sum_{k=0}^{\lfloor s \rfloor}{(-1)^k \binom{n}{k} (s-k)^n}}

And differentiating with respect to {s} gives us our PDF

{f_X(s) = \tfrac{1}{(n-1)!} \displaystyle\sum_{k=0}^{\lfloor s \rfloor}{(-1)^k \binom{n}{k} (s-k)^{n-1}}}

Admittedly there are easier ways to go about proving the expressions for the PDF or CDF are valid, but I like this one since it minimizes the amount of “clairvoyance” involved. As a whole I find this trick with the unit cube to be fairly useful whenever solving problems with uniform distributions. I’m not sure of legitimate practical applications but it seems to come up in contest math from time to time. Hopefully someday I will spot an opportunity to use it in a real-world setting.

[1] I drew the figures but I basically stole and modified the code found here:
[2] Foundations of Data Science by Hopcroft and Kannan.
A draft can be found here:
Used for the computation of volume of a standard simplex


Basic High-dimensional Geometry

I’m not informed about too many of the applications, but apparently high-dimensional geometry is a relevant topic these days. I think it’s pretty fun to think about though, so I’ll talk about some of the basic properties of d-dimensional spheres and some of their counter-intuitive properties.

We’re going to say that a unit hypersphere is the set of points in \mathbb{R}^d, say (x_1, x_2, \cdots, x_d) where \sum_{i=1}^{d}{x_i^2} = 1. This sounds pretty obvious but some strange things happen when the dimension increases, so we’ll be a little careful.

Visualizing a hypersphere is unfortunately beyond my ability, so we’ll have to show a lot of “geometric” results with algebra. We’ll start with the surface area of the unit hypersphere in d dimensions. First we’ll talk about a relevant trick to evaluate a common integral; suppose we wanted to evaluate the following

\begin{array}{ccc}  \displaystyle\int_{-\infty}^{\infty}{e^{-x^2} dx}\\  \end{array}

One way to go about this is to consider the square of the integral and then convert to polar. It’s kind of a strange idea since usually we’re trying to take apart separable integrals, but here it works well.

\begin{array}{lcl}  \int_{-\infty}^{\infty}{e^{-x^2} dx} \int_{-\infty}^{\infty}{e^{-y^2} dy} &=& \int_{-\infty}^{\infty}{\int_{\infty}^{\infty}{e^{-(x^2+y^2)}dx}dy} \\  &=& \int_{0}^{2\pi}{\int_{0}^{\infty}{re^{-r^2} dr d\theta}} \\  &=& \frac{1}{2} \int_{0}^{2\pi}{\int_{0}^{\infty}{e^{-u} du d\theta}} \\  &=& \pi  \end{array}

I skipped a few steps at the end there, but the rest of the math is clear I think. So our original integral is equal to \sqrt{\pi} (positive since e^{-x^2} is always positive).

Now the trick to determining the surface area of a hypersphere is to evaluate a similar integral in two ways, with cartesian and polar coordinates. Evaluating in cartesian coordinates goes like this:

\begin{array}{lcl}  \int_{-\infty}^{\infty}{} \cdots \int_{-\infty}^{\infty}{e^{-(x_1^2 + \cdots + x_d^2)} dx_d \cdots dx_1} &=& (\int_{-\infty}^{\infty}{e^{-x^2}dx})^d \\  &=& \pi^{\tfrac{d}{2}}  \end{array}

Evaluating in polar coordinates is a little bit harder, especially if you’ve forgotten how to apply change of coordinates like me. The important thing to see is that the differential after the change is just r^{d-1}d\Omega dr where d\Omega is the infinitesimal piece of surface area of the solid angle on the unit sphere. I basically try to visualize in three dimensions. Failing at that, I’ll try two dimensions. Failing at that … well, you get the idea.

Anyway, after the change of coordinates we get the following:

\begin{array}{lcl}  \int_{-\infty}^{\infty}{} \cdots \int_{-\infty}^{\infty}{e^{-(x_1^2 + \cdots + x_d^2)} dx_d \cdots dx_1} &=& \int_{S^d}{d\Omega}\int_{0}^{\infty}{r^{d-1}e^{-r^2}dr} \\  &=& A_d \int_{0}^{\infty}{r^{d-1}e^{-r^2}dr} \\  &=& A_d \int_{0}^{\infty}{u^{(d-1)/2}e^{-u} (\tfrac{du}{2\sqrt{u}})} \\  &=& \frac{1}{2} A_d \int_{0}^{\infty}{u^{d/2} e^{-u} du}  \end{array}

where A_d is the surface area of the unit sphere. You might recognize the integral as the gamma function, more specifically \Gamma(d/2) (or in my case, have it pointed out by Bishop in [2]). Either way, we can finally equate the two to see that the surface area of the unit sphere in d dimensions is

\begin{array}{ccc}  A_d = \frac{2 \pi^{d/2}}{\Gamma(d/2)}  \end{array}

Knowing this, finding the volume is easy since V_d = \int_{0}^{1}{A_d r^{d-1} dr}, so V_d = \tfrac{A_d}{d}. Some of the calculus might be sketchy here, but we can at least see that the expressions check out for d=2 and d=3. So the volume of the unit sphere is

\begin{array}{lcl}  V_d = \frac{2 \pi^{d/2}}{(d) (\Gamma(d/2))}  \end{array}

Interestingly enough, this approaches zero as d approaches infinity.

As nice as these formulas are, they don’t actually tell us that much about what the hypersphere “looks” like. There are a couple of properties shown by Hopcroft and Kannan in [1], but the easiest to see is that most of the volume is near the very edge of the sphere. This is clear when we fix \epsilon > 0 and consider the ratio of the volume of a sphere with radius 1-\epsilon to the volume of the unit sphere: \tfrac{(1-\epsilon)^d V_d}{V_d} = (1-\epsilon)^d, which goes to zero as d gets large. Less obvious is that most of the volume is concentrated near the equator, as is the surface area.

We’ll finish up by leaving the realm of all practical application (at least to my knowledge). Consider the Euler Characteristic \chi of a cube. For those who don’t remember (or read about it in The Number Devil and are suspicious of any names you’ve heard there), the Euler Characteristic is (classically) defined as \chi = V-E+F which represent the vertices, edges, and faces of a convex polyhedron. For any such convex polyhedron this always comes out to 2, and the cube is no exception, since 8-12+6=2. How does this extend to higher dimensions?

We’ll keep considering cubes, more specifically the set of points \vec{x} \in \mathbb{R}^n such that 0 \leq x_i \leq 1 for all i. How many vertices does this cube have? Considered coordinate-wise each could be 0 or 1 independently, so there are 2^n vertices. How many edges are there? Well, an edge connects two vertices with exactly one coordinate different (Hamming distance one is another way to think about it, of course). A given vertex therefore has n adjacent vertices and so has n edges. Summing over the entire cube we get \tfrac{n2^n}{2} = n2^{n-1} where we have divided by two because we counted each edge twice, once from each end. Finally, a face can be defined by its vertices which are diagonally opposite. Algebraically this means we can generate all the faces by finding all vertices that differ from a a given vertex in exactly two places and summing over the entire cube. This gives us \tfrac{\binom{n}{2}2^n}{2^2} = \binom{n}{2}2^{n-2}, since each face was generated four times.

Applying Euler’s formula to this does not give us anything interesting in my opinion, but stopping at “faces” in this case seems arbitrary for n-dimensional cubes. We stop at “faces” (or two dimensional cubes) in this particular case because we are working in three dimensions. In the n-dimensional case, wouldn’t it make sense to continue to (n-1)-dimensional objects?

In general we are looking to ask how many i-dimensional cubes does an n-dimensional cube contain on its boundaries. It’s not hard to see from the 3-dimensional reasoning that this is actually just \binom{n}{i}2^{n-i}. If we put these in an alternating series (as Euler does), we’ll get \sum_{i=0}^{n}{(-1)^i\binom{n}{i}2^{n-i}}. Looking at this carefully we recognize that this is just the binomial expansion of (2-1)^n, or 1. In other words, applying a slightly generalized Euler’s formula to an n-dimensional cube will yield 1, regardless of dimension. I’m unsure whether to include the last term since Euler does not in the 3-dimensional case (also it’s always 1, since we are asking “how many n-dimensional cubes does this n-dimensional cube contain”). Leaving that out gives us 1+(-1)^{n-1}, which I suppose is still a nice result but not quite as pretty.

It would be interesting but probably much more difficult to look into a more general case of this. Perhaps regular convex polyhedra wouldn’t be so bad since we probably would be able to exploit similar counting techniques, but in the most general case I have no idea how to go about this. I might try to look into the literature at some point, since I’m sure someone has done it.

If you find any errors or typos, please let me know! I’ll even thank you so that this blog’s vast readership will know of your contribution.

[1] Foundations of Data Science by Hopcroft and Kannan.
A draft can be found here:
Contains a more extensive discussion of hyperspheres.
[2] Neutral Networks for Pattern Recognition by Bishop.
Outlines the proof of the surface area and volume of a hypersphere in Exercises 1.1-1.4.