
Effective results on the size and structure of sumsets

For a set {A \subset \mathbb{Z}^d} we define the {N}-fold sumset of {A} via

\displaystyle NA : = \{a_1 + \ldots + a_N : a_i \in A\}.

We have the following result of Khovanskii from 1992.

Theorem 1 (Polynomial size of iterated sumsets): Let {A \subset \mathbb{Z}^d} be of size {n}. Then there exist {N_A \geq 1} and {c_0 , \ldots , c_d \in \mathbb{Q}} such that for {N \geq N_A}

\displaystyle |NA| = \sum_{j=0}^d c_j N^j. \ \ \ \spadesuit
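Before discussing effective bounds, it may help to see the phenomenon numerically. The following quick python script (my own illustration; the choice {A = \{0,1,4\}} is arbitrary) iterates sumsets directly; one finds {|NA| = 4N - 2} for all {N \geq 4}, in line with Theorem 1.

```python
from itertools import product

def iterated_sumset_sizes(A, N_max):
    """Return [|1A|, |2A|, ..., |N_max A|] for a finite set A of points in Z^d."""
    sizes = []
    NA = {(0,) * len(next(iter(A)))}  # the 0-fold sumset {0}
    for _ in range(N_max):
        # (N+1)A = NA + A
        NA = {tuple(s + a for s, a in zip(x, v)) for x, v in product(NA, A)}
        sizes.append(len(NA))
    return sizes

print(iterated_sumset_sizes({(0,), (1,), (4,)}, 8))  # [3, 6, 10, 14, 18, 22, 26, 30]
```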

In one part of a recent work, Andrew Granville, Aled Walker, and I provided effective bounds for {N_A}. We let

\displaystyle w(A) = \max_{a , b \in A} ||a-b||_{\infty}.

Theorem 2 (Effective Khovanskii): Let {A \subset \mathbb{Z}^d} be of size {n}. Then there exist {c_0 , \ldots , c_d \in \mathbb{Q}} such that for any

\displaystyle N \geq 2^{2n} n^{n^3 + 1} w(A)^{n^3}, \ \ \ \ \ (1)

one has

\displaystyle |NA| = \sum_{j=0}^d c_j N^j. \ \ \ \spadesuit

It is not clear that (1) is best possible. Certainly there has to be some dependence on how {A} lies in space, as can be seen by taking the elements of {A} to be increasingly spread out. Curran and Goldmakher showed that when {|A| = d+2}, one needs {N_A} to be at least of order {w(A)^d}.

We first recall an elegant proof of Theorem 1 due to Nathanson and Ruzsa, which is also the starting point for Theorem 2.

Proof of Theorem 1: We consider {\mathbb{Z}_{\geq 0}^n}, equipped with {\leq}, the lexicographical ordering. Let {A = \{a_1 ,\ldots , a_n\}}. We then have a map

\displaystyle \phi : \mathbb{Z}_{\geq 0}^n \rightarrow \bigcup_{j=0}^{\infty} jA,

via

\displaystyle \phi(x_1 , \ldots , x_n) =x_1 a_1 + \ldots + x_n a_n.

It is worth noting that if {\phi} is injective, we immediately deduce Theorem 1 by stars and bars. Typically {\phi} is not injective, but this turns out not to be a significant problem. We let {U} be the set of elements {x} such that there exists a {y} with {||y||_1 = ||x||_1}, {y < x}, and {\phi(y) = \phi(x)}. We call any element of {U} useless. They are deemed useless as

\displaystyle |NA| = \# \left( \{x \in \mathbb{Z}_{\geq 0}^n : ||x||_1 = N\} \setminus U \right).
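For instance, if {\phi} were injective then {U} would be empty, and stars and bars would give

\displaystyle |NA| = \#\{x \in \mathbb{Z}_{\geq 0}^n : ||x||_1 = N\} = \binom{N+n-1}{n-1},

a polynomial in {N} of degree {n-1}.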

There is nothing intrinsic that makes elements of {U} useless; rather, it is a consequence of our choice of lexicographical ordering. One can check that {U} is closed under translation by elements of {\mathbb{Z}_{\geq 0}^n}.

We need a way to count the elements of {U} and thus propose another definition. We let {\leq_{{\rm unif}}} be the partial ordering where {x \leq_{{\rm unif}} y} if each coordinate of {x} is at most the corresponding coordinate of {y}. Let {U_{\min}} be the set of elements {x \in U} such that there is no {y\in U} with {y <_{{\rm unif}} x}. Dickson’s lemma implies that {U_{\min}} is finite. For a finite set {U'}, we let

\displaystyle B(N, U') = \{x \in \mathbb{Z}_{\geq 0 }^n: ||x||_1 = N , \ x \geq_{{\rm unif}} u \ \text{for all} \ u \in U'\}.

Thus we have, by the inclusion-exclusion principle,

\displaystyle | \{x \in \mathbb{Z}_{\geq 0}^n : ||x||_1 = N\} \setminus U| = \sum_{U' \subset U_{\min}} (-1)^{|U'|}| B(N , U')|.

Thus it is enough to show that for any finite {U'}, {\#B(N,U')} agrees with a polynomial in {N} for {N} large enough. This follows from the same stars and bars argument mentioned above, as long as

\displaystyle N \geq \max_{u \in U'} ||u||_{\infty}, \ \ \ \ \ (2)

and Theorem 1 follows. {\spadesuit}
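For concreteness, writing {u^*} for the coordinate-wise maximum of the elements of {U'}, the substitution {x = u^* + y} with {y \in \mathbb{Z}_{\geq 0}^n} gives

\displaystyle \#B(N, U') = \#\{y \in \mathbb{Z}_{\geq 0}^n : ||y||_1 = N - ||u^*||_1\} = \binom{N - ||u^*||_1 + n - 1}{n-1},

which agrees with a polynomial in {N} once {N} is large enough.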

Note that this proof does not give any effective bound on {N_A}, as we do not have any control over the set {U_{\min}}. In particular, one would like to have a bound on the {\ell^{\infty}} norm of the elements of {U_{\min}}, in light of (2). In general, one cannot make Dickson’s lemma quantitative, but in our case we can use the structure of {U_{\min}} to do so. The point is that {U} is defined by linear relations, so one can appeal to Siegel’s lemma.

Proof (sketch) of Theorem 2: We translate {A} so that {0 \in A}, which does not affect the problem (in particular, {w(A)} remains unchanged). We build upon the proof of Theorem 1. Suppose {x \in U_{\min}}. As {x \in U}, there is a {y \in \mathbb{Z}_{\geq 0}^n} such that {||x||_1 = ||y||_1}, {y < x}, and {\phi(x) = \phi(y)}. Writing {I} and {J} for the supports of {x} and {y}, respectively, we have

\displaystyle \sum_{i \in I} x_i a_i = \sum_{j \in J} y_j a_j.

As {x \in U_{\min}}, one can check that {I\cap J = \emptyset}. We now construct a matrix {M} as follows. The first row has {\#I} 1’s followed by {\#J} {-1}‘s. The remaining {d} rows are given by {(a_i)_{i \in I}} and {(-a_j)_{j \in J}}. Thus

\displaystyle M (x,y)^T = 0. \ \ \ \ \ (3)

One would hope to apply Siegel’s lemma, which asserts that (3) has a nonzero solution whose {\ell^{\infty}} norm is small. The primary issue is that this small solution {x^*} may have nothing to do with {(x,y)^T}. However, one can translate {(x,y)^T} by a multiple of {x^*} to create a solution that is small in a single coordinate. One then “forgets” this coordinate and proceeds by induction. A secondary issue is that {x^* \in \mathbb{Z}^n} may have negative coordinates, but this turns out not to be a critical issue. All of this is carried out in Section 6 of the aforementioned paper with Granville and Walker. {\spadesuit}
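For reference, the classical form of Siegel’s lemma reads: if {M} is an {r \times m} integer matrix with {m > r} and entries bounded in absolute value by {B \geq 1}, then {Mz = 0} has a nonzero solution {z \in \mathbb{Z}^m} with

\displaystyle ||z||_{\infty} \leq (mB)^{r/(m-r)}.

In our setting {r = d+1}, {m \leq n}, and (after translating so that {0 \in A}) the entries of {M} are bounded by {w(A)}, which is roughly how {w(A)} enters the bound (1).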

Entropy and Sumsets: An example

The following post is the result of a discussion with Imre Ruzsa. Motivated by the following easy inclusion in additive combinatorics

\displaystyle A+2 \cdot A \subset A+A+A , \ \ q \cdot A := \{qa : a \in A\},

I asked if the following was true for a finitely valued random variable {X}:

\displaystyle H(X+2 \cdot X) \leq H(X+X+X), \ \ \ H(X) := -\sum_{x} \mathbb{P}(X = x) \log_2 \mathbb{P}(X = x).\ \ \ \ \ (1)

Here all sums are of independent copies of the random variables. The idea is that one might expect {X+X} to be a bit more uniform than {2 \cdot X}.

First, Imre provided a counterexample to the related two-variable question

\displaystyle H(X+ 2\cdot Y) \leq H(X+Y+Y).

I find this example particularly elegant. Let {X} be uniform on {\{0,1\}} and {Y} be uniform on {\{0 , \ldots , n\}}. Then {X+2 \cdot Y} is uniform on {\{0 , \ldots , 2n+1\}}, while the support of {X + Y + Y} is {\{0 , \ldots , 2n+1\}} but is not uniform (there is concentration in the middle thanks to the distribution of {Y+Y}).

We then seriously questioned the validity of (1). After some discussion, Imre eventually said something about higher dimensional concentration that made me think one should check (1) for the “Gaussian.” The reason Gaussian is in quotes is that it is not finitely valued as assumed in (1), so strictly speaking we cannot check it for the Gaussian. To see if there was hope, I looked at the differential entropy of a real valued random variable {G} with density {p}, defined via

\displaystyle H(G) := -\int_{-\infty}^{\infty} p(x) \log p(x) dx.

Let us take {G} to be the Gaussian with mean zero (this is irrelevant for entropy) and variance 1. Recall some basic properties of variance:

\displaystyle {\rm Var}(aG) = a^2 {\rm Var}(G) , \ \ {\rm Var}(G+G) = 2 {\rm Var}(G),

where {a \in \mathbb{R}} and {G+G} is understood to be the sum of two independent copies of {G}. Thus

\displaystyle {\rm Var}(G + 2 \cdot G) = 5 , \ \ {\rm Var}(G + G +G ) = 3.
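Indeed, a Gaussian with variance {\sigma^2} has differential entropy

\displaystyle \frac{1}{2} \log ( 2 \pi e \sigma^2),

which is increasing in {\sigma^2}.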

So we indeed see that (1) is not true for the Gaussian. To construct a finitely valued random variable that does not satisfy (1), we can convolve a Bernoulli random variable with itself until (1) fails. (This assumes that passing from the continuous setting back to the discrete one does not destroy the counterexample, which is not obvious without checking, as {2 \cdot X} has a strange support condition; for instance, the same argument would prove {H(2 \cdot G) \geq H(G+G)}, which is clearly not true for discrete random variables, since {H(2 \cdot X) = H(X) \leq H(X+X)}.) Anyways, I wrote some quick python code to check this and found that for {X = B + B + B}, where {B} is the random variable of a fair coin flip, we have

\displaystyle H(X+2 \cdot X) \approx 2.984 , \ \ H(X+X+X) \approx 2.623.
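Here is a minimal sketch of such a check (a reconstruction, not the original code):

```python
from itertools import product
from math import log2

# distribution of X = B + B + B, where B is a fair coin flip: Binomial(3, 1/2)
X = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

def convolve(p, q):
    """Distribution of U + V for independent U ~ p and V ~ q."""
    r = {}
    for (u, pu), (v, qv) in product(p.items(), q.items()):
        r[u + v] = r.get(u + v, 0.0) + pu * qv
    return r

def dilate(p, k):
    """Distribution of k * U for U ~ p."""
    return {k * u: pu for u, pu in p.items()}

def H(p):
    """Shannon entropy in bits."""
    return -sum(pu * log2(pu) for pu in p.values() if pu > 0)

print(H(convolve(X, dilate(X, 2))))    # H(X + 2.X)
print(H(convolve(convolve(X, X), X)))  # H(X + X + X)
```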

Here {X+ 2 \cdot X} and {X+X+X} are supported on {\{0 , \ldots , 9\}}, and so their entropies are bounded by the entropy of the uniform distribution on 10 elements, which is

\displaystyle \log_2 10 \approx 3.322.

Sometimes entropy analogs of sumset inequalities hold and sometimes they do not (see this paper of Ruzsa or this paper of Tao, or a host of work by Madiman and coauthors).

Convex sumsets and sum-product estimates

Thanks to Junxian Li for carefully reading and providing useful suggestions.

Here we advertise a conjecture that would have consequences for two major problems in additive combinatorics, and we provide a proof of the state-of-the-art progress towards them. In this post, we say {b \gtrsim a} if {a = O(b \log^c |A|)} for some {c >0}. Let {A, B \subset \mathbb{R}} be finite. We define the sumset and product set via

\displaystyle A+B := \{a + b : a \in A , b \in B\} , \ \ \ \ \ \ AB := \{ab : a \in A , b \in B\}.

We have the beautiful sum–product conjecture of Erdős and Szemerédi.

Conjecture (ESconj): Let {\delta < 1}. Then for any finite {A \subset \mathbb{Z}} one has

\displaystyle |A+A| + |A A| \gtrsim |A|^{1 + \delta}. \ \ \spadesuit

We say a finite {A = \{a_1 < \ldots < a_k\} \subset \mathbb{R}} is convex if

\displaystyle a_{i+1} - a_i > a_i - a_{i-1}, \ \ \ 2 \leq i \leq k-1.

The following conjecture is due to Erdős.

Conjecture (Convex): Suppose {A \subset \mathbb{R}} is convex. Then

\displaystyle |A+A| \gtrsim |A|^2. \ \ \spadesuit

The quantity {d^+(A)}, which we define now, has played a crucial role in recent progress on both conjectures. For {A \subset \mathbb{R}} finite, we define

\displaystyle d^{+}(A) := \sup_{\emptyset \neq B \ {\rm finite}}\frac{ E_3^+(A,B) }{|A| |B|^2},

where

\displaystyle E_3^+(A,B) = \sum_x r_{A- B}(x)^3 , \ \ \ \ r_{A-B}(x) = \#\{(a, b ) \in A \times B : x = a - b\}.
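To get a feel for {E_3^+}, here is a quick python computation (an illustration of my own, not from the referenced papers) comparing an arithmetic progression, which is maximally structured, with a convex set:

```python
from collections import Counter

def E3_plus(A, B):
    """E_3^+(A,B) = sum over x of r_{A-B}(x)^3, where r_{A-B}(x) = #{(a,b) : x = a - b}."""
    r = Counter(a - b for a in A for b in B)
    return sum(m ** 3 for m in r.values())

AP = list(range(32))              # arithmetic progression
CON = [n * n for n in range(32)]  # convex set
print(E3_plus(AP, AP), E3_plus(CON, CON))  # the progression's value is far larger
```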

See my recent paper on Conjecture (ESconj) for an introduction to {d^+(A)}. We mention here that {1 \leq d^+(A) \leq |A|}, and intuitively the larger {d^+(A)} is, the more additive structure {A} has. Indeed, {d^+(A)} is a higher order analog of the more common additive energy of a set. An application of Cauchy–Schwarz, as explained in equation (2) of this paper, reveals

\displaystyle |A|^{3/2} \leq |A+A| d^+(A)^{1/2}.\ \ \ \ \ (1)
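One way to see (1) is as follows. Both {\sum_x r_{A+A}(x)^2} and {\sum_x r_{A-A}(x)^2} count the quadruples {(a,b,c,d) \in A^4} with {a + b = c + d}, so two applications of Cauchy–Schwarz give

\displaystyle |A|^4 = \Big( \sum_{x \in A+A} r_{A+A}(x) \Big)^2 \leq |A+A| \sum_x r_{A-A}(x)^2 \leq |A+A| \Big( \sum_x r_{A-A}(x) \Big)^{1/2} \Big( \sum_x r_{A-A}(x)^3 \Big)^{1/2} \leq |A+A| |A| \left( d^+(A) |A|^3 \right)^{1/2},

and (1) follows upon dividing both sides by {|A|^{5/2}}.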

Thus if {d^+(A) \lesssim 1}, we have that {|A+A| \gtrsim |A|^{3/2}}. We conjecture that this can be improved.

Conjecture (dplus): Let {A \subset \mathbb{R}} be finite. Then

\displaystyle |A+A| \gtrsim |A|^2 d^+(A)^{-1}. \ \ \ \ \spadesuit

This would imply Conjecture (Convex) and Conjecture (ESconj) for {\delta = 1/2}. The goal of the rest of this post is to prove the current state-of-the-art progress, due to Shkredov. We remark that the rest of the post is technical in nature. Please see this survey of de Zeeuw and the introduction of this paper of mine for background. Also, one can see this paper of Murphy, Rudnev, Shkredov, and Shteinikov for another perspective (thanks to Sophie Stevens for pointing this reference out to me).

Theorem 1: (Shkredov) Let {A \subset \mathbb{R}} be finite. Then

\displaystyle |A+A| \gtrsim |A|^{58/37} d^+(A)^{-21/37} . \ \ \spadesuit

His idea is to analyze spectral and combinatorial properties of an associated matrix: the spectral properties depend on {|A+A|}, while the combinatorial properties depend on {d^+(A)}. One of the simplest versions of this idea is that the number of closed walks of length {k} in a graph is the sum of the {k^{\rm th}} powers of the eigenvalues of its adjacency matrix, a fact which plays a crucial role in the theory of expander graphs.

For a function {g : \mathbb{R} \rightarrow \mathbb{C}}, we define the following {|A| \times |A|} matrices

\displaystyle T_A^g(x,y) : = 1_A(x)1_A(y) g(x-y), \ \ S_A^g(x,y) : = 1_A(x)1_A(y) g(x+y).

We set

\displaystyle E^+(A , B) := \sum_x r_{A-B}(x)^2 , \ \ \ E^+(A) : = E^+(A,A),

\displaystyle E_3^+(A, B , C) := \sum_x r_{A-A}(x) r_{B-B}(x) r_{C-C}(x) , \ \ E_3^+(A) : = E_3^+(A, A , A) .

Theorem 1 follows from the following two propositions.

Proposition 1: Let

\displaystyle S = \{x \in A+A : r_{A+A}(x) \geq \frac{ |A|^2 }{2|A+A|}\},

be a set of popular sums of {A} and {g = 1_S}. Let {\mu_1 , \ldots , \mu_{|A|}} be the eigenvalues of {S_A^g} with associated orthonormal eigenvectors {f_j : A \rightarrow \mathbb{R}}. Then

\displaystyle \frac{|A|^5}{2^5 |S|} \leq \frac{\mu_1^5}{||g||_2^2 ||g||_{\infty}} \leq \mu_1^2 \langle T_A^{r_{A-A}} f_1 , f_1 \rangle \leq \sum_{j=1}^{|A|} \mu_j^2 \langle T_A^{r_{A-A}} f_j , f_j \rangle \leq d^+(A)^{1/2} |A|^{3/2}E_3^+(A,A,S)^{1/2} . \ \ \ \spadesuit

The second and third inequalities hold for general {g}. The right hand side arises from combinatorial properties of {T_A^{r_{A-A}}}, while the left hand side comes from spectral properties. Before proving Proposition 1, we note that in order to apply it, we need a bound for {E_3^+(A,A,S)}, which is slightly different from the quantity appearing in the definition of {d^+(A)}. This is handled by the following.

Proposition 2: Let {A \subset \mathbb{R}} be finite and

\displaystyle S = \{x \in A+A : r_{A+A}(x) \geq \frac{ |A|^2 }{2|A+A|}\}.

Then

\displaystyle E_3^+(A,A,S) \lesssim d^+(A)^{11/10} |A|^{6/5} |A+A|^{17/10}. \ \ \ \spadesuit

Theorem 1 follows immediately from combining Proposition 1 and Proposition 2. We start with the proof of Proposition 2.

Proof of Proposition 2: The idea is to bound {E_3^+(A,A,S)} by various second moment estimates and then use the fact that {a - a' = b - b'} if and only if {a + b' = b + a'}. By a dyadic decomposition, there is a {q \geq 1} and a set {Q \subset A-A} such that

\displaystyle E_3^+(A,A,S) \gtrsim \sum_{x \in Q} r_{A-A}(x)^2 r_{S-S}(x) , \ \ Q = \{x : q \leq r_{A-A}(x) \leq 2q \}.
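(The dyadic decomposition costs only a logarithm: the values of {r_{A-A}} lie in {[1, |A|]}, so there are {O(\log |A|)} dyadic classes, and the class {Q} with the largest contribution to {E_3^+(A,A,S)} captures all but a logarithmic factor, which is absorbed by {\gtrsim}.)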

We have

\displaystyle E_3^+(A,A,S) \lesssim q E^+(A,S).\ \ \ \ \ (2)

On the other hand,

\displaystyle E_3^+(A,A,S) \lesssim q^2 \sum_x 1_Q(x) \sum_y 1_S(y) 1_S(x+ y)

\displaystyle \lesssim \frac{|A+A|}{|A|^2} q^2 \sum_y r_{A+A}(y)\sum_x 1_Q(x) 1_S(x+y)

\displaystyle = \frac{|A+A|}{|A|^2} q^2 \sum_y r_{A+A}(y)r_{S-Q}(y) \leq \frac{|A+A|}{|A|^2} q^2 E^+(A,S)^{1/2} E^+(A,Q)^{1/2}.\ \ \ \ \ (3)

By the asymmetric version of (1), we have

\displaystyle E^+(A,Q) \leq d^+(A)^{1/2} |A| |Q|^{3/2},

and we also have

\displaystyle |Q| \leq \#\{x : r_{A-A} (x) \geq q\} \leq E_3^+(A) q^{-3} \leq d^+(A) |A|^3 q^{-3}.

Combining these two, we find

\displaystyle E^+(A,Q) \leq \frac{d^+(A)^2 |A|^{11/2}}{q^{9/2}}.

Using (3), we find

\displaystyle E_3^+(A,A,S) \lesssim E^+(A,S)^{1/2} |A+A| d^+(A) |A|^{3/4} q^{-1/4}. \ \ \ \ \ (4)

Combining (2) and (4) yields

\displaystyle E_3^+(A,A,S) \lesssim E^+(A,S)^{1/2} \min \left\{ q E^+(A,S)^{1/2} , \ |A+A| d^+(A) |A|^{3/4} q^{-1/4} \right\}.

The resulting bound is largest when the two terms in the minimum are equal, that is, when

\displaystyle q^{5/4} = \frac{|A+A| d^+(A) |A|^{3/4}}{E^+(A,S)^{1/2}},

and so

\displaystyle E_3^+(A,A,S) \lesssim E^+(A,S)^{3/5} \left( |A+A| d^+(A) |A|^{3/4} \right)^{4/5}.

The result follows from

\displaystyle E^+(A,S) \leq d^+(A)^{1/2} |A| |S|^{3/2}, \ \ |S| \leq |A+A|,

and simplification. {\spadesuit}

Proof of Proposition 1: Note that {T_A^{r_{A-A}}} is a symmetric, positive semidefinite matrix. To see the second point, note

\displaystyle T_A^{r_{A-A}}(x,y) = \langle 1_{A - x} , 1_{A - y} \rangle.

Thus {\langle T_A^{r_{A-A}} h , h \rangle \geq 0} for any {h : A \rightarrow \mathbb{C}}. In particular, each summand {\mu_j^2 \langle T_A^{r_{A-A}} f_j , f_j \rangle} is nonnegative, which proves the third inequality.

Now, we show the first inequality. We have {||1_S||_2^2 = |S|} and by a popularity argument

\displaystyle \langle S_A^{1_S} 1 , 1 \rangle = \sum_{x \in S} r_{A+A}(x) \geq |A|^2 / 2,

and so {\mu_1 \geq |A| / 2}, since

\displaystyle \mu_1 = \sup_{f \neq 0} \frac{\langle S_A^{1_S} f , f \rangle}{||f||_2^2} \geq \frac{\langle S_A^{1_S} 1 , 1 \rangle}{||1||_2^2} \geq \frac{|A|^2 / 2}{|A|}.

We now prove the fourth inequality. Since {f_1 , \ldots , f_{|A|}} form an orthonormal basis of eigenvectors for {S_A^{1_S}},

\displaystyle \sum_{j=1}^{|A|} \mu_j^2 \langle T_A^{r_{A-A}} f_j , f_j \rangle = \sum_{x,y , j } T_A^{r_{A-A}}(x,y) \mu_j f_j(x) \mu_j f_j(y) = \sum_{x,y , j , i} T_A^{r_{A-A}}(x,y) \mu_i f_i(x) \mu_j f_j(y) \sum_z f_j(z) f_i(z)

\displaystyle = \sum_{x,y , z } T_A^{r_{A-A}}(x,y) \sum_i \mu_i f_i(x) f_i(z) \sum_j \mu_j f_j(y) f_j(z) = \sum_{x, y , z} T_A^{r_{A-A}}(x,y) S_A^{1_S} (x,z) S_A^{1_S} (y , z),

where in the final equality, we used that

\displaystyle S_A^{1_S} = \sum_j \mu_j f_j f_j^T,

which follows from the spectral theorem for self-adjoint matrices. Now we have

\displaystyle \sum_{x, y , z} T_A^{r_{A-A}}(x,y) S_A^{1_S} (x,z) S_A^{1_S} (y , z) = \sum_{x, y , z \in A} T_A^{r_{A-A}}(x,y) 1_S(x+z) 1_S(y+z).

Using the change of variables {\alpha = x+z} and {\beta = y+z}, we have this is equal to

\displaystyle \sum_{\alpha , \beta} r_{A-A}(\alpha - \beta ) 1_S(\alpha) 1_S (\beta) \sum_z 1_A (z) 1_A(\alpha - z) 1_A( \beta - z).

By Cauchy–Schwarz and simplification, we get this is

\displaystyle \leq \left( \sum_{\alpha , \beta} r_{A-A} (\alpha - \beta)^2 1_S(\alpha) 1_S(\beta) \right)^{1/2} \left( \sum_{\alpha , \beta} \Big| \sum_z 1_A(z) 1_A(\alpha - z) 1_A( \beta - z) \Big|^2 \right)^{1/2}

\displaystyle = E_3^+(A)^{1/2} E_3^+(A, A , S)^{1/2}.

The fourth inequality then follows from the definition of {d^+(A)} (applied with {B = A}), which gives

\displaystyle E_3^+(A) \leq d^+(A) |A|^3.

We now prove the second inequality. It is enough to show

\displaystyle \mu_1 \leq ||g||_{\infty}\left( \sum_z f_1(z) \right)^2, \ \ \ \left( \sum_z \mu_1 f_1(z) \right)^2 \leq ||g||_2^2 \langle T_A^{r_{A-A}} f_1 , f_1 \rangle .
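Indeed, multiplying the first of these bounds by {\mu_1^2} and inserting the second gives

\displaystyle \mu_1^3 \leq ||g||_{\infty} \left( \sum_z \mu_1 f_1(z) \right)^2 \leq ||g||_{\infty} ||g||_2^2 \langle T_A^{r_{A-A}} f_1 , f_1 \rangle,

which rearranges to the second inequality of the proposition.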

We start with the first inequality. Since {f_1} is a unit vector and {S_A^g f_1 = \mu_1 f_1},

\displaystyle \mu_1 = \sum_x f_1(x) \mu_1 f_1(x) = \sum_{x,y} f_1(x) g(x +y) f_1(y) \leq ||g||_{\infty} \left( \sum_{x} f_1(x) \right)^2.

In the last inequality, we used that we may choose {f_1 \geq 0} entry-wise, by the Perron–Frobenius theorem. We move on to the second inequality. Using the change of variables {w = x+y}, we find

\displaystyle \sum_{x \in A} \mu_1 f_1(x) = \sum_{x,y} g(x+y) f_1(y) 1_A(x) = \sum_{w , x} g(w) f_1(w-x) 1_A(x) .

By Cauchy–Schwarz and a modest computation, we find this is

\displaystyle \leq ||g||_2 \left( \sum_w \left( \sum_x f_1(w-x) 1_A(x) \right)^2 \right)^{1/2} = ||g||_2 \langle T_A^{r_{A-A}} f_1 , f_1 \rangle^{1/2}. \ \ \spadesuit

We end by mentioning that

\displaystyle |A-A| \gtrsim |A|^{8/5} d^+(A)^{-3/5},

can be inferred from this paper of Schoen and Shkredov; this difference version is stronger than the plus version discussed in this blog post (see this post). Any improvement to either bound would be of interest.

Distinct consecutive r-differences

Junxian Li, a fellow graduate student at UIUC, and I just uploaded our paper On distinct consecutive r-differences to the arXiv.

Our goal is to show that for finite A \subset \mathbb{R} with special structure, one has |A+A| \gg |A|^{1 + \delta} for some positive \delta (here we adopt Vinogradov’s notation). Note that |A+A| = 2|A| - 1 for arithmetic progressions, so we really do need to make assumptions about the structure of A.

Motivated by a paper of Solymosi, we introduce the notion of distinct consecutive r-differences. Writing A = \{a_1 < \ldots < a_{|A|}\}, we say A \subset \mathbb{R} has distinct consecutive r-differences if the vectors (a_{i+1} - a_i , \ldots , a_{i+r} - a_{i+r-1}), for 1 \leq i \leq |A| - r, are distinct.
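As a quick sanity check, one can test the condition directly in python (an illustration of my own; the example sets are arbitrary): a convex set such as the squares has distinct consecutive 1-differences, while an arithmetic progression fails for every r.

```python
def has_distinct_consecutive_r_differences(A, r):
    """Check whether the increasing enumeration of A has distinct consecutive r-differences."""
    a = sorted(A)
    vecs = [tuple(a[i + j + 1] - a[i + j] for j in range(r)) for i in range(len(a) - r)]
    return len(vecs) == len(set(vecs))

print(has_distinct_consecutive_r_differences([n * n for n in range(10)], 1))  # True
print(has_distinct_consecutive_r_differences(list(range(10)), 2))             # False
```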

Assume A has distinct consecutive r-differences. We show that for any finite B \subset \mathbb{R}, one has |A+B| \gg |A||B|^{1/(r+1)}, and that this inequality is sharp in this generality. We wonder if one can improve upon this when B = A, and ask for the largest \theta_r such that |A+A| \gg |A|^{1 + \theta_r/(r+1)}. The above result implies \theta_r \geq 1, while in our paper we use de Bruijn sequences to show \theta_r \leq 2.

When A has additive structure, the results from our paper suggest that A should have few distinct consecutive r-differences. We investigate two such cases and show that these A have very few distinct consecutive differences. In the process, we generalize Steinhaus’ three-gap theorem, as well as a result of Slater concerning the return times of an irrational rotation of the torus.