Complexity of Determining Convexity of Functions

This wasn’t really my week in terms of results, but I decided to post what I was up to anyway. I always try to solve my problems without help from other sources. Sometimes I make up my own problems that I think shouldn’t be too hard and try to solve those by myself. Sometimes that works, but sometimes I am very incorrect about the difficulty of the problem.

I mentioned previously that I’m in CVX 101, an online class on convex optimization. The homework, unsurprisingly, involves using their software (called CVX, based in Matlab) to solve convex optimization problems. I unfortunately did not bother to read the users manual very carefully before attempting some optimization problems. I got a little frustrated because it wouldn’t let me use something like {x^2 + xy + y^2} as an objective function, even though that’s clearly convex. If I had actually read the users guide I would have known that CVX only allows objective functions constructed from a pre-specified set of rules, and functions like the one I was trying didn’t fall into that category. For example in this problem I rewrote the function as {\tfrac{1}{2}(x+y)^2 + \tfrac{1}{2}x^2 + \tfrac{1}{2}y^2}, which is a little inconvenient but CVX accepts since the sum of squares of affine functions is always convex. I asked myself why they would make life so difficult, and I had a guess. I was pretty sure that determining the convexity of a polynomial in {n} variables was NP-hard.

Doing anything with multivariate polynomials is hard in my experience. For example, determining whether a polynomial in {n} variables has at least one real zero is NP-hard. We can reduce from {3}-SAT. One of my professors insisted on stating decision problems in the following manner and I got used to it, so bear with me. Here’s the statement for {3}-SAT:

INSTANCE: A boolean formula in {3}-CNF form with {n} variables
QUESTION: Is the formula satisfiable?

And here’s the statement of the multivariate polynomial problem:

INSTANCE: A polynomial {f: \mathbb{R}^n \rightarrow \mathbb{R}}
QUESTION: Does {f} have a zero on {\mathbb{R}^n?}

For a clause with literals {x_i}, {x_j}, {\neg x_k} map it to a term in a polynomial {[(y_i)(y_j)(1-y_k)]^2}. Map the rest of the clauses appropriately, and let’s call the polynomial we formed {f(Y)}. We see that {f} has a zero if and only if our instance of {3}-SAT is satisfiable. If {3}-SAT has a satisfying assignment and {x_i} is true in that assignment, let {y_i = 0}, otherwise let {y_i = 1}. Clearly a satisfying assignment produces a zero of {f}. If {f} has a zero then all the terms are zero because they’re all nonnegative, and a term is zero if and only if one of the components is zero, so one of the variables is zero or one. Reverse the mapping and we have a satisfying assignment. A last detail is that if we have a variable that isn’t zero or one in the root of {f}, it doesn’t matter in the assignment so we can ignore it.

I’ve never gotten any real verification that thinking like this is a good idea, but when I see a reduction like that I generally think of the problem we reduced to as being “much harder” than the first problem. In this case we could have made all kinds of restrictions to {f}; we only needed a polynomial of degree {6} or less and a very specific type of polynomial. Well, {3}-SAT is already strongly NP-hard so determining zeroes in multivariate polynomials must be really, really hard, right? Of course in this case we know that the multivariate polynomial problem is in NP and so it’s capped in difficulty, so it sort of implies that going from the restricted types of polynomials to the rest of polynomials is actually easy.

Anyway, back to the convexity problem. Here’s a statement:

INSTANCE: A polynomial {f: \mathbb{R}^n \rightarrow \mathbb{R}}
QUESTION: Is {f} convex in {\mathbb{R}^n}?

If {f} is degree {1}, {2}, or {3} this is pretty simple. If it’s degree one the answer is always YES because it’s affine, so convex. If it’s degree three then the answer is always NO. We need only consider the cubic terms. If we find a term of the form {x_i^3} then taking the derivative with respect to {x_i} twice will make it linear in {x_i}, and so a diagonal element of the Hessian can be made negative, which implies that it’s not positive semidefinite by the extended version of Sylvester’s criterion. The same arguments works for terms of the form {x_i^2x_j}. If all the cubic terms are of the form {x_ix_jx_k} then all diagonal elements are zero, and the principal minor using rows and columns {i} and {j} is negative since it’s {0} on the diagonal and {x_k} on the off-diagonal. As for degree two polynomials, we can just write it in its quadratic form {x^TAX} plus some affine stuff. Then {f} is convex iff {A} is positive semidefinite, which we can either test for with Sylvester’s Criterion or old-fashioned high school math, row reduction. I’m not actually sure which is faster but they’re both polynomial time.

At this point I was feeling pretty good. I had knocked out a few cases, and all that remained was showing this for even degree {4} or higher polynomials. In fact if you can show that deciding the convexity of a degree {4} polynomial is hard then you can just tack on a term like {x_{n+1}^{2s}} for some {s > 2} which is of course convex and you’ll show that deciding the convexity of a higher degree polynomial is also NP-hard. I tried a bunch of stuff but nothing I did really got anywhere. After a few days I decided to give in and started googling for some answers.

It turns out that this is was solved somewhat recently. My guess that determining convexity is hard was correct. It’s proven here by Ahmadi, Olshevsky, Parrilo, and Tsitsiklis [1]. The general outline is that determining non-negativity of biquadratic forms, which are expressions of the form {b(x,y) = \sum_{i \leq j, k \leq l}x_ix_jy_ky_l} where {x = (x_1, \cdots, x_n)} and {y = (y_1 \cdots, y_m)} is NP-hard, as shown in [2]. That reduction uses Clique in [3]. In general I find it kind of difficult to make the jump from discrete structures to continuous objects in reductions. The actual paper is clear and shockingly easy to read, so I won’t summarize it here, but I recommend it. The authors in [1] showed the reduction from the biquadratic form problem to convexity, as well as a bunch of other results: the hardness of determining strict and strong convexity and a polynomial time algorithm for determining quasiconvexity.

At any rate, my plans for proving that determining the convexity of quartics is NP-hard were squashed, but I figured relaxing the problem to rational functions was an acceptable concession. After all, you can easily type a rational function into Matlab, so it’s still a concern for the coders of CVX. After a little bit I realized that there was a somewhat dirty way to do this problem as a reduction from the problem of determining whether a polynomial in {n} variables has a zero. As far as Matlab is concerned, a function like {\tfrac{x^2}{x}} can’t be evaluated at {x=0}. Using this you can just take a polynomial in {n} variables and determine whether {\tfrac{f(X)}{f(X)}} is convex, since if it has no zeroes then the polynomial is {1} which is convex and if it has one or more zeroes then it’s not convex according to Matlab because the domain isn’t even convex.

I don’t really like that reduction, so I kept searching. I also relaxed the problem even more, allowing myself to determine the convexity of functions with exponentials and logarithms. Unfortunately I am still on the hunt. If I do manage to find one I’ll edit it in here, but until then that’s all I’ve got.

[1] A.A. Ahmadi, A. Olshevsky, P.A. Parrilo, and J.N. Tsitsiklis. NP-hardness of Deciding Convexity of Quartic Polynomials and Related Problems. Mathematical Programming vol. 137 Issue 1-2, pages 453-476. A copy is available here.
[2] C. Ling, J. Nie, L. Qu, and Y. Ye. Biquadratic optimization over unit spheres and semidefinite programming relaxations. SIAM Journal on Optimization, 20(3):1286-1310, 2009. [3] T.S. Motzkin and E.G. Strauss. Maxima for graphs and a new proof of a theorem of Turan. Canadian Journal of Mathematics, 17:533-540, 1965. A copy is available here.


Basic Facts About Graph Laplacians

Lately I’ve come across a lot of linear algebra. Apparently it’s pretty important. One topic I always thought was pretty fantastic was spectral graph theory, or analyzing graphs from their adjacency or similar matrices. I never really got a chance to look into it at school, but I started trying to understand some very basic results which I think are pretty fascinating. Of course, after I proved these results I looked at more organized approaches to spectral graph theory only to find that I had missed the main point entirely. That’s why I wouldn’t call the content of this post spectral graph theory. It’s more just facts about the Laplacian matrix.

For a simple graph {G} on {n} vertices we’ll define its Laplacian {L(G)} (I will drop the {G} from now on) as the n \times n matrix with entries  as follows:

L_{ij} = \left\{ \begin{array}{ll} \deg(v_i) & \text{if } i=j \\ -1 & \text{if } v_i,v_j \text{ are adjacent} \\ 0 & \text{otherwise}\end{array}\right.

This matrix has a lot of interesting properties. It’s obviously symmetric. It’s also singular, since it maps {\mathbf{1}} to {\mathbf{0}} because the degree is the only positive term and is exactly the number of {-1}s in each row. We can actually say more interesting things about its eigenvalues. First we’ll show that {L} is positive semidefinite, then we’ll show that the multiplicity of the eigenvalue {0} is exactly the number of components in {G}.

The fact that {L} is positive semidefinite is proven in a short and elegant fashion on Wikipedia. Naturally I disregarded it and set out to prove it on my own. This resulted in a proof that I am not really pleased with since it unnecessarily makes use of some pretty high-powered ideas to prove what is probably a very simple result.

At any rate, there are two facts we’re going to have to use that I won’t prove here. The first is Sylvester’s Criterion, which states that a symmetric matrix is positive definite if and only if its leading principal minors are positive. To review in case you forgot (because I did), a minor is the determinant of the matrix you get when you cut out a set of rows and a set of columns. A principal minor is the determinant of the matrix you get when the indices of those rows and columns are the same. A leading principle minor means you cut out the rows and columns from {k+1} to {n} (i.e. the upper left submatrix of size {k}).

The second fact we’re going to use is the extension of this to positive semidefinite matrices. It says that symmetric a matrix is positive semidefinite if and only if all its principal minors are nonnegative. I came across this fact when my friend linked me to his convex optimization homework here [1], where it is stated without proof. Unfortunately there doesn’t seem to be an elegant justification even assuming Sylvester’s criterion. Overall though it is a bit useless. Sylvester’s criterion at least provides a sort of computationally efficient way to test for positive definiteness. The extension clearly does no such thing, since it requires checking {O(2^n)} minors.

With those out of the way here’s how we’re going to proceed. We’re going to only deal with Laplacians of connected matrices for now and the extension will be simple enough. We will encounter matrices which are almost Laplacian but have least one diagonal element that is too large by an integer amount. Phrased another way, this is a matrix of the form {L + D} where {L} is a laplacian and {D} is a nonzero positive semidefinite diagonal matrix with integer entries. I’m going to call this a super-Laplacian. Be warned that this is probably horribly non-standard, although it shouldn’t be.

Our first claim is that a super-Laplacian matrix is positive definite. This can be shown by induction on {n} (the size of the matrix). The case where {n=1} is trivial. The inductive step needs a simple fact. If we take a {k \times k} super-Laplacian (or even a normal Laplacian) and throw out the same subset of rows and columns we’re going to get a super-Laplacian matrix. The way I understand this is throwing out rows and columns is like deleting a set of vertices from the graph, except we’re not reducing the degrees of the vertices that had edges going into the set we just threw out. Futhermore, since we’re only working with connected graphs for the time being, a vertex in the set of vertices we threw out had to be adjacent to some remaining vertex (and so that diagonal entry will be too large), so our modified matrix is super-Laplacian.

Getting back to the proof, we see that all the principal minors of a {k \times k} super-Laplacian are positive by our assumption. All that remains to be shown to satisfy Sylvester’s Criterion is that the determinant of the matrix itself is positive. Any {k \times k} super-Laplacian can be constructed by adding to its diagonal one entry at a time from a Laplacian matrix. When we increase one diagonal entry from the Laplacian, what happens to its determinant? Consider the traditional expansion along a row or column containing that entry. We increase it (increase since we’re on the diagonal so the associated sign is positive) by the minor resulting from throwing out that vertex, i.e. the determinant of a {(k-1) \times (k-1)} super-Laplacian. By our assumption this is positive and so we added a nonzero quantity to the determinant. Further additions do not cause problems since we are similarly increasing by the determinant of a super-Laplacian. So our {k \times k} matrix has positive determinant and by Sylvester’s Criterion it is positive definite.

We’re basically done now by the extended version of Sylvester’s Criterion. A principal minor of a Laplacian is the determinant of a super-Laplacian matrix which is nonnegative, so the Laplacian of a connected graph is positive semidefinite. Extension to graphs with {m>1} components follows by relabeling vertices (or switching rows and columns, depending on how you want to think about it) to get a block matrix of the form

{\left( \begin{array}{ccc} \mathbf{C_1} & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \mathbf{C_m} \end{array} \right)}

where {C_1 \cdots C_m} are positive semidefinite since they’re connected components. The block matrix is positive semidefinite, which can be seen straight from the definition. So all Laplacians are positive semidefinite.

The other result we’re going to show is a relation between the eigenvalues of the Laplacian and the number of components in the graph. What we just did already suggests as much, but we’ll show it a bit more explicitly. Let {\lambda_1, \lambda_2, \cdots, \lambda_n} be the eigenvalues of {L} ordered from least to greatest. Clearly they’re all nonnegative and there are {n} of them (counting multiplicity) since {L} is diagonalizable. We’re going to show that the multiplicity of the eigenvalue {0} is exactly the number of components in {G} (recall that every {L} has eigenvector {\mathbf{1}} and so {\lambda_1} is always {0}).

If {G} has {m} components, then we can find {m} linearly independent vectors in its null space. For ease of visualization we can relabel to form a block diagonal matrix as we did above, and then the independent eigenvectors are just {(\mathbf{1}, \mathbf{0}, \cdots, \mathbf{0}), (\mathbf{0}, \mathbf{1}, \cdots, \mathbf{0}), \cdots, (\mathbf{0}, \mathbf{0}, \cdots, \mathbf{1})} with the appropriate sizes of blocks. So {\lambda_m = 0}.

If {G} has fewer than {m} components, say {m-1} components, then we can find {m-1} columns (and rows) to throw out and we will have a super-Laplacian matrix, since we can throw out one vertex from each component giving us a block diagonal matrix of super-Laplacians. This is full rank, so our original matrix {L} is of rank at least {n-(m-1)}, so the dimension of the null space of {L} is at most {m-1}, and we can conclude that {\lambda_m > 0}.

Like I said, this is far from the heart of spectral graph theory, but it’s a fun thing to play around with since the concepts are really basic and easy to understand. Another thing to note that I haven’t proven: the second smallest eigenvalue of {L} is called the algebraic connectivity of {G}. We saw a little bit of how that might work since {\lambda_2 > 0} if and only if {G} is connected. Apparently a larger {\lambda_2} means it is more connected, although not quite in the sense of {k}-vertex or {k}-edge connectivity. That more or less sums up the basic facts on Wikipedia, but I’m sure there are many more simple observations to be made.

[1] The material for this course helped me a lot in trying to approach this problem, and of course provided the extended Sylvester’s Criterion. I’m in the Stanford Online version and I really enjoy it.

Balls and Bins with G(n,m)

It’s been a while since I’ve posted so I figured I’d post something short to get back in the swing of things. I spent a week chasing some results in spectral graph theory without knowing anything about it. I still have a hope to find something there but first I need to fill in some gaps in my knowledge. Anyway, recently I came across this problem set. I really like all the problems but the question about {G(n,m)} caught my attention. I figured I would look into it.

Apparently this is also a standard model, but I hadn’t heard of it. A graph {G(n,m)} is generated by taking {n} vertices and randomly placing {m} edges in the graph. The most notable difference is that vertex pairs are selected with replacement, so there is a possibility of parallel edges. The problem from that assignment allows for self-loops, which is just insanity. I also don’t want to post solutions to other people’s problems. But mostly it is insanity, so we’ll stick to the more standard model.

Actually, some natural questions about this model are well-known results. We can ask for what value of {m} do we get a double edge with high probability; this is just the birthday paradox. We can ask for what value of {m} do we have a subgraph that is complete; this is coupon collector. Another question we can ask is if {m=n}, what is a bound on the maximum number of parallel edges?

That last one probably isn’t as common but it was actually a homework problem for me once [1]. It was phrased as follows: If we toss {n} balls into {n} bins by selecting the bin for each ball uniformly at random, show that no bin contains more than {O(\tfrac{\ln(n)}{\ln(\ln(n))})} balls with high probability, more specifically {1-\tfrac{1}{n}}.

This an exercise in Chernoff bounds, which are essential in basically every randomized algorithm I’ve ever seen. It places bounds on the probability that a random variable with a binomial distribution is really far from its mean. Actually it can be the sum of negatively associated random variables, but that’s not really important here. The really nice thing is that the bound is exponentially small. There are a bunch of different forms but they’re all derived in a similar manner. The version we’re going to use is best when we want to bound the probability that a random variable is really far from its mean. Moving past this extremely technical terminology, the bound states that {Pr(X \geq k\mu) \leq (\tfrac{e^{k-1}}{k^k})^{\mu}}.

The overall plan, as per usual, is to bound the probability that any bin has more than {O(\tfrac{\ln(n)}{\ln(\ln(n))})} and then complete the problem by union bounding. For this to work we want to aim to bound the probability a given bin by {\tfrac{1}{n^2}}. We let {X} be the number of balls in a given bin, {X_i} is {1} if the {i}th ball is thrown in the bin, {0} otherwise. Clearly {X = \sum{X_i}} and the {X_i} are independent, so we can apply Chernoff. I’m also going to be really lazy with constants, it’s definitely possible to be more precise. We see that {Pr(X \geq k\mu) \leq (\tfrac{e^{k-1}}{k^k})^{\mu}}, so if {\mu = 1} and {k \in O(\tfrac{\ln(n)}{\ln(\ln(n))})} then

{Pr(X \geq k) \leq \tfrac{e^{k-1}}{k^k} \leq \tfrac{e^k}{k^k} \approx \tfrac{1}{k^k} \leq \tfrac{1}{n^2}}

That last step wasn’t obvious at all to me, so here’s a short justification. Let {k = \tfrac{c\ln(n)}{\ln(\ln(n))}} and

{k^{-k} = (\tfrac{c\ln(n)}{\ln(\ln(n))})^{-\tfrac{c\ln(n)}{\ln(\ln(n))}} \leq (\sqrt{\ln(n)})^{-\tfrac{c\ln(n)}{\ln(\ln(n))}} = (\ln(n))^{-\tfrac{c\ln(n)}{2\ln(\ln(n))}} = (\exp(\ln(\ln(n))))^{-\tfrac{c\ln(n)}{2\ln(\ln(n))}} = \exp(-c\tfrac{\ln(n)}{2}) \leq n^{-2}}

At this point we’re done, since the union bound implies that the probability any bin has more than {O(\tfrac{\ln(n)}{\ln(\ln(n))})} balls is less than {n\tfrac{1}{n^2} = \tfrac{1}{n}} and so the probability that no bin has exceeded {O(\tfrac{\ln(n)}{\ln(\ln(n))})} is {1-\tfrac{1}{n} \rightarrow 1}.

That didn’t actually have anything to do with the random graph model specifically, I suppose, but the balls in bins idea has obvious applications elsewhere, like hashing and job assignments. Returning to {G(n,m)}, we’re just going to look at the number of isolated vertices in a very brief and incomplete fashion.

A vertex is isolated with probability {(1 - \tfrac{n-1}{\binom{n}{2}})^m = (1 - \tfrac{2}{n})^m} since it’s basically repeated attempts to hit one of the {n-1} edges out of {\binom{n}{2}} possible. Thus the expected number of isolated vertices is {n(1-\tfrac{2}{n})^m}. If we let {m = cn\ln(n)} and take the limit we see that

{\displaystyle\lim_{n \rightarrow \infty}{n(1-\tfrac{2}{n})^{c\ln(n)}} = \displaystyle\lim_{n \rightarrow \infty}{n}\displaystyle\lim_{n \rightarrow \infty}{(1-\tfrac{2}{n}}^{n})^{c\ln(n)} = \displaystyle\lim_{n \rightarrow \infty}{ne^{-2c\ln(n)}} = \displaystyle\lim_{n \rightarrow \infty}{n^{1-2c}}}

and so if {c > \tfrac{1}{2}} there is almost surely no isolated vertex. If this looks familiar it’s because it’s basically the same thing as {G(n,p)}. This is kind of obvious anyway. At least from the perspective of isolated vertices the fact that you can have multiple edges makes very little difference. The only change is the probability of getting an edge adjacent to a vertex is now fixed and we’re altering the number of attempts. The variance computation also appears to be identical, so I won’t put it here.

When I was trying to make my own modifications to {G(n,p)} for weighted graphs I was trying all kinds of strange things. I tried generating an edge with probability {p} and then selecting a weight from {[n]} with different distributions. Strange things happened. This is also a pretty nice and easy to understand model. I’ll come back to it once I have learned something about spectral graph theory.

[1] I looked around for the website of the class I took the balls and bins problem from, but I can’t seem to find it. The class was Probability and Computing, 15-359, taught by professors Harchol-Balter and Sutner in the spring of 2012.