Category Archives: Additive Combinatorics

Sidon-like Sets

What is the largest set avoiding a linear equation? This general question is a central one in additive combinatorics. Indeed the search for extremal structures gives it a combinatorial flavor, while the presence of an additive equation makes such a question ripe for additive techniques. Such questions include well-studied subjects such as sum-free sets, Sidon sets, and sets avoiding arithmetic progressions. In 1993, Ruzsa made an effort to put these examples into a general framework. In what follows, we focus on Sidon sets and their generalizations, as developed by Ruzsa in the same paper.

We fix {k \geq 2} and consider the linear equation

\displaystyle x_1 + \ldots + x_k = y_1 + \ldots + y_k \ \ \ \ \ (1)

Following Ruzsa, we say a solution is trivial if the {x_i} are a permutation of the {y_i}. We set {r(N) = r_k(N)} to be the size of the largest subset of {[N]} with only trivial solutions to (1). We set {R(N) = R_k(N)} to be the size of the largest subset {A} of {[N]} with no solutions to (1) in {A} with {x_1 , \ldots , x_k , y_1 , \ldots , y_k} distinct. It follows from these definitions that

\displaystyle r(N) \leq R(N).

We will show below that {R_k(N) \ll_k r_k(N)}, while it is not known if {R_k(N) \ll r_k(N)}. Indeed one may feel that most of the solutions to (1) where the variables lie in a set {A} come when the variables are distinct, since otherwise we have imposed an additional constraint. This idea does work well for sets with additive structure, but has shortcomings for general sets. Furthermore, it was shown by Timmons that {r_3(N) \neq (1 + o(1)) R_3(N)}.

For a set {A \subset [N]}, we let {n(A)} be the number of nontrivial solutions to (1) in {A} and let {N(A)} be the number of solutions to (1) with distinct variables.

We briefly remark that in Wooley’s efficient congruencing proof of Vinogradov’s mean value theorem, he estimates quantities similar to {N(A)} (where {A} is the set of {k^{\rm th}} powers), as applications of Hensel’s lemma require the variables to be distinct (modulo a certain prime). Indeed there are some overlapping techniques (see Lemma 1 below).

To start, we first consider the case {k=2}. A set with {n(A) = 0} is known in the additive combinatorics literature as a Sidon set; equivalently, {r_{A-A}(x) \leq 1} for {x \neq 0}. There is a nice construction due to Ruzsa (modifying a previous construction of Erdős), which is just one of several known constructions.

Example 1 (Large Sidon Set): Let {p} be a prime and let {g} be a primitive root in {\mathbb{F}_p}. One can check that the set of residues in the cyclic group of order {p(p-1)}, defined via the Chinese remainder theorem by

\displaystyle x = k \ {\rm mod} \ p-1 , \ x = g^k \ {\rm mod} \ p , \ \ \ 0 \leq k \leq p-2,

forms a Sidon set in the cyclic group modulo {p(p-1)} and hence in {\mathbb{Z}}. {\spadesuit}
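Here is a quick sketch of Example 1 in code, constructing the set for a small prime and verifying the Sidon property by brute force (the helper names are mine):

```python
from itertools import combinations

def primitive_root(p):
    """Smallest generator of the multiplicative group mod a prime p."""
    factors, n, d = set(), p - 1, 2
    while d * d <= n:              # prime factors of p - 1
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    for g in range(2, p):
        if all(pow(g, (p - 1) // q, p) != 1 for q in factors):
            return g

def ruzsa_sidon_set(p):
    """Ruzsa's Sidon set in Z/p(p-1)Z: x = k mod p-1 and x = g^k mod p."""
    g, m = primitive_root(p), p * (p - 1)
    A = []
    for k in range(p - 1):
        # brute-force CRT reconstruction: fine for small p
        x = next(x for x in range(m) if x % (p - 1) == k and x % p == pow(g, k, p))
        A.append(x)
    return sorted(A), m

def is_sidon_mod(A, m):
    """Check r_{A-A}(x) <= 1 for x != 0 in Z/mZ."""
    diffs = set()
    for a, b in combinations(A, 2):
        for d in ((a - b) % m, (b - a) % m):
            if d in diffs:
                return False
            diffs.add(d)
    return True

A, m = ruzsa_sidon_set(11)
print(len(A), is_sidon_mod(A, m))  # 10 True
```

For {p = 11} this produces {p - 1 = 10} residues mod {110}, and the brute-force check confirms all nonzero differences are distinct.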

On the other hand, we have the following upper bound.

Theorem 1: Let {A \subset [N]} be a Sidon set. Then

\displaystyle |A| \leq N^{1/2} + N^{1/4} + 1. \ \ \ \spadesuit

Proof: We follow an argument of Green, which will rely on some basic Fourier analysis (we follow notation from Chapter 4 of Tao and Vu’s book on additive combinatorics). Let {1 \leq u \leq N} and consider {A} as a subset of {Z : = \mathbb{Z} / (N+u) \mathbb{Z}}. Let {I = \{1 , \ldots , u\}}. We have by Parseval,

\displaystyle (N+u)^2\sum_{x \in Z} 1_A * 1_{-A}(x) 1_I * 1_{-I}(x) = (N+u)^3 \sum_{\xi} |\widehat{1_A}(\xi)|^2 |\widehat{1_I}(\xi)|^2 \geq \frac{|A|^2 |I|^2}{N+u},\ \ \ \ \ (2)

in the last step isolating the {\xi = 0} term. Since {A} is a Sidon set we have

\displaystyle (N+u)1_{A} * 1_{-A}(x) \leq 1,

for {0 < |x| \leq u} (note this is not true for other values of {x \in Z}). Thus

\displaystyle (N+u)^2\sum_{x \in Z} 1_A * 1_{-A}(x) 1_I * 1_{-I}(x) \leq |A| u + u^2.\ \ \ \ \ (3)

Combining (2) and (3) and making the optimal choice {u = \lfloor N^{3/4} \rfloor} gives the desired result. {\spadesuit}

Green remarks in his paper that the above proof only assumes that {r_{A-A}(x) \leq 1} for {0 < x < N^{3/4}} instead of the full Sidon set property. Combining Example 1 and Theorem 1, we find

\displaystyle N^{1/2} \leq r_2(N) \leq N^{1/2} + N^{1/4} + 1.

It is a famous question of Erdős to improve these bounds (in particular the lower bound), which has been stuck for over fifty years (modulo an improvement by Cilleruelo to {N^{1/2} + N^{1/4} + 1/2}). It seems likely that both the lower and upper bound are not optimal, and that the truth lies somewhere between {N^{1/2} + C} and {N^{1/2} + O(N^{\epsilon})} for any {C\geq 1} and {\epsilon > 0}. One can see this blog post of Gowers and the comments within for interesting discussion of this problem.
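For very small {N}, one can compute {r_2(N)} exactly by exhaustive search and compare it with the bound of Theorem 1; the following sketch (my own naming) does this by depth-first search:

```python
def max_sidon(N):
    """r_2(N): the size of the largest Sidon subset of {1,...,N}, by DFS."""
    best = 0
    def extend(A, diffs, start):
        nonlocal best
        best = max(best, len(A))
        for x in range(start, N + 1):
            # differences created by appending x; a Sidon set needs them fresh
            new = {x - a for a in A}
            if not (new & diffs):
                extend(A + [x], diffs | new, x + 1)
    extend([], set(), 1)
    return best

for N in (4, 7, 12):
    print(N, max_sidon(N), N ** 0.5 + N ** 0.25 + 1)
```

This is feasible only for small {N}, but already shows the max size hugging {N^{1/2}} (e.g. size 5 first occurs at {N = 12}).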

We now consider general {k} and start with a lower bound.

Proposition 1 (Random sets): For {N} large enough,

\displaystyle r(N) \geq \frac{1}{4}8^{-1/(2k-1)} N^{1/(2k-1)} . \ \ \ \spadesuit

Proof: We use the alteration method, which is discussed in Alon and Spencer’s book on the Probabilistic Method. Choose {A \subset [N]} where each element is chosen independently at random with probability

\displaystyle p = 8^{-1/(2k-1)}\frac{N^{1/(2k-1)}}{N}.

The number of solutions to (1) in {[N]} is at most {N^{2k-1}}, so the expected number of solutions to (1) is at most {p^{2k}N^{2k-1}}. By Markov,

\displaystyle \mathbb{P}(n(A) > 2p^{2k}N^{2k-1}) < \frac{1}{2}.

On the other hand, by say Chebyshev,

\displaystyle \mathbb{P}(|A| < \frac{1}{2} p N) < \frac{1}{2},

for {N} large enough. By the union bound, we may choose an {A} such that {|A| \geq pN/2} and {n(A) \leq 2p^{2k} N^{2k-1}}. By our choice of {p}, we have {n(A) \leq |A|/2}. For each nontrivial solution of (1), we delete an element (the so-called alteration) of {A} that renders it no longer a solution and call the resulting set {A'}. Thus {n(A') = 0} and so {r(N) \geq |A'| \geq |A| - n(A) \geq |A|/2 \geq pN/4}, as desired. {\spadesuit}
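The alteration step is easy to simulate for {k=2}; the following is a quick sketch (function names and parameters are my own):

```python
import random
from itertools import combinations_with_replacement

def find_violation(S):
    """Return an element involved in a nontrivial solution of x1+x2 = y1+y2, else None."""
    sums = {}
    for a, b in combinations_with_replacement(sorted(S), 2):
        if a + b in sums:
            return b  # (a, b) and sums[a + b] are distinct pairs with equal sum
        sums[a + b] = (a, b)
    return None

def alteration_sidon(N, seed=0):
    """Random subset of [N] of density ~ (N/8)^{1/3} / N, then delete one
    element from each nontrivial solution (the alteration step, k = 2)."""
    rng = random.Random(seed)
    p = (N / 8) ** (1 / 3) / N
    A = {x for x in range(1, N + 1) if rng.random() < p}
    while True:
        bad = find_violation(A)
        if bad is None:
            return A          # A is now a Sidon set
        A.discard(bad)

A = alteration_sidon(10 ** 4)
print(len(A))
```

The output is a Sidon set of size roughly {N^{1/3}}, far from the {N^{1/2}} of Example 1, illustrating Ruzsa's remark below.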

Ruzsa remarks that the probabilistic construction above probably never gives the correct order of magnitude for {r(N)}, even when (1) is replaced by a suitable linear equation. We already saw this above, as Example 1 gives a Sidon set of size {N^{1/2}} while the random construction gives one of size about {N^{1/3}}. Moreover, with an explicit construction, Bose and Chowla showed the following improvement to Proposition 1:

\displaystyle r(N) \geq (1 + o(1)) N^{1 / k},

(see this survey of Kevin O’ Bryant for more information). Timmons showed that

\displaystyle R(N) \geq (2^{1 - 1/k} + o(1))N^{1/k},

by pasting together two sets from the aforementioned Bose-Chowla construction.

We now turn to upper bounds. We let {T_k(A)} be the number of solutions in {A} to (1). For any set with {n(A) = 0}, we have {T_k(A) \leq k!|A|^k} and so by Cauchy–Schwarz

\displaystyle |A|^{2k} \leq |kA| T_k(A) \leq k N k! |A|^k.\ \ \ \ \ (4)

It follows that

\displaystyle r(N) \leq (k! k )^{1/k} N^{1/k},

and thus

\displaystyle r(N) \asymp_k N^{1/k}.

This argument fails to bound {R(N)}, as was observed by Ruzsa, who provided an alternative argument, which we recall below. The basic idea is to still use (4), and to show that {T_k(A)} is still small for sets with {N(A) = 0}. We will need the following lemma.

Lemma 1 (Hölder): Let {f} be a function on the torus. Then

\displaystyle \int |f|^{2k-2} \leq \left( \int |f|^{2k} \right)^{1 - 1/(k-1)} \left(\int |f|^2 \right)^{1/(k-1)}.

To prove this, apply Hölder’s inequality to {|f|^{2k-2} = |f|^{2k - 2 - 2/(k-1)} \cdot |f|^{2/(k-1)}} with conjugate exponents {(k-1)/(k-2)} and {k-1}. Note that one can always interpolate an upper bound for {\int |f|^p} in terms of {\int |f|^u} and {\int |f|^v} as long as {u < p < v}.

Proposition 2: We have

\displaystyle R(N) \leq (1 + o(1) )k^{2 - 1/k} N^{1/k}. \ \ \ \spadesuit

Proof: Suppose that {N(A) = 0} (though we will not make use of this until the end). By (4), it is enough to show

\displaystyle T_k(A) \leq (1 + o(1))k^{2k-2} |A|^k.

Note that

\displaystyle T_k(A) = \int |f|^{2k} , \ \ \ f(x) = \sum_{a\in A} e^{2 \pi i a x}.

By Lemma 1 and Parseval, it is enough to show

\displaystyle T_k(A) \leq (1 + o(1))k^2 |A| \int |f|^{2k-2}.

The idea is that {N(A) = 0} allows us to eliminate a variable at a cost of only {k^2 |A|}, as opposed to the trivial bound of {|A|^2}.

We let

\displaystyle s(n) = \#\{(x_1 , \ldots , x_k) \in A^k : x_1 + \ldots + x_k = n, \ \ x_i \ \text{distinct} \},

and

\displaystyle \sigma(n) = \#\{(x_1 , \ldots , x_k) \in A^k : x_1 + \ldots + x_k = n\}.

Note we have

\displaystyle T_k(A) = \sum_n \sigma(n)^2.

By the triangle inequality in {\ell^2}, it is enough to show

\displaystyle \sum_n (\sigma(n) - s(n))^2 \leq k^4 \int |f|^{2k-2} , \ \ \ \sum_n s(n)^2 \leq k^2 |A| \int |f|^{2k-2} .\ \ \ \ \ (5)

Note the second term is the larger of the two for {|A|} large.

We dispose of the first inequality. Note {\sigma(n) - s(n)} is the number of solutions to {x_1 + \ldots + x_k = n} with some {x_i = x_j}. There are at most {k^2} choices for the indices and then after relabelling we find

\displaystyle 2x_1 + x_2 + \ldots + x_{k-1} = n.

Thus

\displaystyle \sum_n (\sigma(n) - s(n))^2 \leq k^4 \int |f(2x)|^2 |f(x)|^{2k-4} \leq k^4 \int |f|^{2k-2},

by Hölder’s inequality.

Now we finally use that {N(A) = 0}. Suppose we have a solution to (1) with distinct {x_i} and distinct {y_j}. Since {N(A) = 0}, this implies {x_i = y_j} for some {1 \leq i , j \leq k}. Thus

\displaystyle \sum_n s(n)^2 \leq k^2 |A| \int |f|^{2k-2},

since we have {k^2} choices for {i,j} and {|A|} choices for {x_i}. {\spadesuit}

The best bound for {r(N)} (for {k} large) is due to Green and is of the form

\displaystyle r(N) \leq (1 + o_N(1)) (1 + o_k(1)) \frac{k}{2e} N^{1/k},\ \ \ \ \ (6)

an improvement to the constant in (4). On the other hand, the best bound for {R(N)} is in a recent paper of Schoen and Shkredov:

\displaystyle R(N) \leq 16 k^{3/2} N^{1/k}, \ \ \ N \ \text{large enough}\ \ \ \ \ (7)

adopting ideas from Timmons as well as Ruzsa’s original work. Schoen and Shkredov implicitly mention the following question.

Question 1: Let {A \subset \mathbb{Z}} be such that {N(A) = 0}. Does there exist a large {B \subset A} (say {|B| \geq |A| / 2}) such that {B} has no solutions to

\displaystyle x_1 + \ldots + x_{\ell} = y_1 + \ldots + y_{\ell},\ x_i, y_j \ \text{distinct}, \ \ \ 1 \leq \ell \leq k?\ \ \ \ \ (8)

An affirmative answer would allow one to apply (5) and then (4) to {B} and eventually get improved bounds for {R(N)} comparable to what is known for {r(N)} (i.e. the same as (6) up to the absolute constant). Schoen and Shkredov are not able to solve Question 1, but are able to show that (8) holds for many {\ell} and then apply a suitable version of Erdős-Ko-Rado to obtain (7). Underlying their argument is the simple observation that a solution to (1) with {k = m} can be added to a solution with {k = n} to create a new solution with {k = m + n}.

We illustrate part of the argument in the {k=4} case, which is significantly simpler than the general case and already handled by Timmons.

Proposition 3: One has

\displaystyle R_4(N) \leq (1 + o(1)) 2^{1/8} 4^{5/8} N^{1/4}.

Proof: We let {A \subset [N]} with {N(A) = 0}. By the first part of (5) and Lemma 1, we have

\displaystyle T_4(A) \leq (1 + o(1))\sum_n s(n)^2.

By the second part of (5),

\displaystyle \sum_n s(n)^2 \leq 16 |A| \int |f|^{6} \leq 16 |A| (\int |f|^8 )^{1/2} (\int |f|^4)^{1/2},

and so

\displaystyle T_4(A) \leq 16^2 |A|^2 \int |f|^4.

Suppose first that {A} has no solutions to (8) with {\ell = 2}. Then

\displaystyle \int |f|^4 \leq 4 |A|^2 .

Indeed there are at most {2|A|^2} solutions to {a+b= c+d} with either {a=b} or {c=d} and also {\leq 2|A|^2} trivial solutions. It follows that

\displaystyle T_4(A) \leq 4^5 |A|^4.

Combining this with (4) gives the desired result. Now suppose {A} does have a solution to (8) with {\ell =2}, say

\displaystyle x_1 + x_2 = y_1 + y_2. \ \ \ \ \ (9)

Delete these four elements from {A} to create {A'}. Now we claim {A'} does not have a solution to (8) with {\ell = 2}. Indeed if there were, say

\displaystyle a_1 + a_2 = b_1 + b_2,

then adding this to (9) yields

\displaystyle a_1 + a_2 + x_1 + x_2 = b_1 + b_2 + y_1 + y_2,

which contradicts {N(A) = 0}. Thus we may apply the above argument to {A'} to obtain

\displaystyle T_4(A') \leq (1+o(1)) 4^5 |A'|^4,

and the Proposition follows from (4). {\spadesuit}

Exponential Sums over Small Subgroups

The purpose of this post is to recall a theorem of Bourgain and Konyagin that shows cancellation in exponential sums over multiplicative subgroups of {\mathbb{F}_p}, incorporating the point–plane incidence bound due to Rudnev. It is notoriously hard to find cancellation in short exponential sums in {\mathbb{F}_p}; for instance, improving the Burgess bound is a fundamental open problem in number theory (see this previous blog post for discussion). Bourgain and Konyagin were able to leverage the sum–product phenomenon to show cancellation in certain sums with as few as {p^{\delta}} terms, improving upon the previous threshold of { p^{1/4+\epsilon}} due to Konyagin (incidentally, the Burgess bound relies on a simpler sum–product type bound).

Let {H \leq \mathbb{F}_p^{\times}} be a multiplicative subgroup. We define the Fourier transform of {H} via

\displaystyle \widehat{1_H}(\xi) : = \frac{1}{p} \sum_{h \in H} e_p(-\xi h) , \ \ \ e_p(\theta) : = e^{2 \pi i \theta / p}. \ \ \ \ \ (1)

In (1) and what follows we adopt the notation of Chapter 4 in Tao and Vu (see also my previous blog post for some basic discussion on the discrete Fourier transform).

For any {\xi \in \mathbb{F}_p}, we have the trivial bound

\displaystyle |\widehat{1_H}(\xi)| \leq \mathbb{P}(H) : = |H| / p,

which is obtained at the zero frequency. On the other hand, {1_H} has multiplicative structure and we expect it cannot correlate with an additive character in light of the sum–product phenomenon. This was verified by Bourgain and Konyagin (see also these notes of Green).

Theorem 1 (Bourgain-Glibichuk-Konyagin): Let {H \leq \mathbb{F}_p^{\times}} be of size at least {p^{\delta}}. Then there exists {\epsilon = \epsilon(\delta) > 0} such that

\displaystyle \sup_{\xi \in \mathbb{F}_p^{\times}}|\widehat{1_H}(\xi)| \leq \mathbb{P}(H) p^{-\epsilon}. \ \ \ \spadesuit

Theorem 1 should be compared to the famous Gauss sum estimate (see for instance this previous blog post), but applies to much smaller multiplicative subgroups. The proof relies on three ideas. The first is that if {|\widehat{1_H}(\xi) |} is large for one {\xi \in \mathbb{F}_p^{\times}}, then it is large for many {\xi \in \mathbb{F}_p^{\times}}. Indeed it follows from (1) that

\displaystyle \widehat{1_H}(h \xi) = \widehat{1_H}(\xi) , \ \ h \in H \ \ \ \ \ (2)

We define the spectrum (see Chapter 4 of Tao and Vu for detailed discussion, as well as these notes of Green) via

\displaystyle {\rm Spec}_{\alpha}(H) : = \{\xi \in \mathbb{F}_p : |\widehat{1_H}(\xi) | \geq \alpha \mathbb{P}(H) \}.

By Parseval’s identity of the form

\displaystyle \sum_{\xi \in \mathbb{F}_p} |\widehat{1_H}(\xi)|^2 = \mathbb{P}(H),

and (2), we find

\displaystyle |H| \leq | {\rm Spec}_{\alpha}(H)| \leq \mathbb{P}(H)^{-1} \alpha^{-2}.\ \ \ \ \ (3)

If {|H| = p^{\delta}}, this gives

\displaystyle \alpha \leq p^{1/2 - \delta},

which is only useful for {\delta > 1/2} (for instance this works quite well for Gauss sums). Thus we need new ideas to handle the case {\delta \leq 1/2}. Note this is in alignment with the principle that basic Fourier techniques intrinsically have a “square root barrier.”
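To see the phenomenon numerically, here is a quick sketch (the choices {p = 101}, the primitive root {2}, and a subgroup of order {20} are mine):

```python
import cmath

def subgroup(p, g, d):
    """The multiplicative subgroup of F_p^x of order d (d | p-1, g a primitive root)."""
    h = pow(g, (p - 1) // d, p)
    H, x = [], 1
    while True:
        x = x * h % p
        H.append(x)
        if x == 1:
            return H

def fourier_sup(p, H):
    """sup over nonzero xi of |(1/p) * sum_{h in H} e_p(-xi h)|."""
    return max(
        abs(sum(cmath.exp(-2j * cmath.pi * xi * h / p) for h in H)) / p
        for xi in range(1, p)
    )

p = 101
H = subgroup(p, 2, 20)  # 2 is a primitive root mod 101
print(len(H) / p, fourier_sup(p, H))  # trivial bound ~0.198 vs the actual sup
```

Already for this small example the supremum over nonzero frequencies is well below the trivial bound {\mathbb{P}(H)}.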

The second idea we will use is that {H} has little additive structure in the following form. Recall the additive energy of {A} and {B} is defined via

\displaystyle E^+(A,B) = \#\{(a,a' , b , b' ) \in A^2 \times B^2 : a + b = a' + b' \}.

Note that

\displaystyle E^+(A,B) = \sum_x r_{A- B}(x)^2 , \ \ \ r_{A-B}(x) : = \#\{(a,b) \in A \times B : x = a -b \}.

Proposition 1 (Sum–Product): Let {H \leq \mathbb{F}_p^{\times}} and let {A \subset \mathbb{F}_p} satisfy {|H|^2 |A| \leq p^2}. Then

\displaystyle E^+(H ,A) \ll |H| |A|^{3/2}. \ \ \ \spadesuit

Proposition 1 should be compared to the trivial bound {E^+(H,A) \leq |H| |A|^2}.

Proof: We will use Rudnev’s point–plane incidence bound (Theorem 3 in this paper). To do so, we note that {E^+(H,A)} counts the number of solutions to

\displaystyle h + a = h' + a' , \ \ \ a , a' \in A , \ h, h' \in H.

Since {H} is a multiplicative subgroup, {|H|^2 E^+(H,A)} is the number of solutions to

\displaystyle hh_1 + a = h' h_1' + a' , \ \ \ a , a' \in A , \ h, h' , h_1 , h_1' \in H.

This is precisely the number of incidences between the point set {H \times H \times A} and planes of the form {h x + a = h' y + z}. Thus by Rudnev’s point–plane incidence bound of the form (note there is a condition on the maximum number of collinear planes, which is trivially satisfied in our cartesian product set–up)

\displaystyle I(P , \Pi) \ll n^{3/2} , \ \ \ |P| = |\Pi| = n,

we find

\displaystyle |H|^2 E^+(H,A) \ll |H|^3 |A|^{3/2} . \ \ \ \spadesuit

We now move on to the third idea and general principle, which is that {{\rm Spec}_{\alpha}(H)} is additively structured. The following lemma is due to Bourgain and can be found as Lemma 4.37 in Tao and Vu.

Lemma 1 (Additive Structure in Spectrum): Let {A \subset \mathbb{F}_p} and {0 < \alpha \leq 1}. Then for any {S \subset {\rm Spec}_{\alpha}(A)}, one has

\displaystyle \#\{ (\xi_1 , \xi_2) \in S \times S : \xi_1 - \xi_2 \in {\rm Spec}_{\alpha^2/2}(A) \} \geq \frac{\alpha^2}{2} |S|^2. \ \ \ \spadesuit

Lemma 1 roughly asserts that the spectrum is closed under addition. For example, consider the example {A = \{1 , \ldots , K\}} where {K = o(p)}. Here {{\rm Spec}_{\alpha}(A)} is an interval of length {\asymp \alpha^{-1}} (there are more sophisticated examples, see this paper of Green).

Proof of Lemma 1: We set {f = 1_A}. By assumption we have

\displaystyle \alpha \mathbb{P}(A) |S| \leq \sum_{\xi \in S} |\widehat{f}(\xi)| = \sum_{\xi \in S} c(\xi) \widehat{f}(\xi) = \frac{1}{p} \sum_{\xi \in S} \sum_{a \in A} c(\xi) e_p(\xi a),

for some {c(\xi) \in \mathbb{C}} of modulus 1. By changing the order of summation and Cauchy Schwarz,

\displaystyle \mathbb{P}(A)^2 \alpha^2 |S|^2 \leq \frac{|A|}{p^2} \sum_{a\in A} \sum_{\xi , \xi' \in S} c(\xi) \overline{c(\xi')} e_p((\xi - \xi') a) \leq \mathbb{P}(A) \sum_{\xi , \xi' \in S} |\widehat{f}(\xi - \xi')|.

Lemma 1 now follows by pigeonholing: since {|\widehat{f}| \leq \mathbb{P}(A)} everywhere, the pairs with {\xi - \xi' \notin {\rm Spec}_{\alpha^2/2}(A)} contribute at most {\frac{\alpha^2}{2} \mathbb{P}(A) |S|^2}, so at least {\frac{\alpha^2}{2} |S|^2} pairs remain. {\spadesuit}

Suppose, for the sake of discussion, that for all {\alpha}, any {S \subset {\rm Spec}_{\alpha}(A)} does not have additive structure, in the strong form {r_{S-S}(x) \ll 1} for all {x \neq 0}. Then for any {T} we conclude

\displaystyle \#\{ (\xi_1 , \xi_2) \in S \times S : \xi_1 - \xi_2 \in T \} \ll |T|,\ \ \ \ \ (4)

and so by Lemma 1 we have a significant growth from {{\rm Spec}_{\alpha}(A)} to {{\rm Spec}_{\alpha^2/2}(A)}. But then applying this again to {{\rm Spec}_{\alpha^2/2}(A)} we have significant growth to {{\rm Spec}_{\alpha^4/8}(A)} and repeating this procedure will eventually contradict the trivial bound

\displaystyle |{\rm Spec}_{\alpha}(A)| \leq p.

When {H} is a multiplicative subgroup, we can show a weaker version of (4) using that {{\rm Spec}_{\alpha}(H)} is a union of cosets of {H} via (2) and Proposition 1. We turn to the details.

Proof of Theorem 1: Let {0 < \alpha \leq 1} be small enough so that {S_{\alpha} : = {\rm Spec}_{\alpha}(H)} contains a nonzero element. By Lemma 1,

\displaystyle \#\{ (\xi_1 , \xi_2) \in S_{\alpha} \times S_{\alpha} : \xi_1 - \xi_2 \in {\rm Spec}_{\alpha^2/2}(H) \} \geq \frac{\alpha^2}{2} |S_{\alpha}|^2,

that is

\displaystyle \frac{\alpha^4}{4} |S_{\alpha}|^4\leq \left( \sum_{z \in {\rm Spec}_{\alpha^2/2}(H) } r_{S_{\alpha} - S_{\alpha}}(z) \right)^2 \leq | {\rm Spec}_{\alpha^2/2}(H)| E^+(S_{\alpha} , S_{\alpha}).\ \ \ \ \ (5)

Now we use Proposition 1 to provide an upper bound for {E^+(S_{\alpha} , S_{\alpha})}. By (2), {S'_{\alpha} : = S_{\alpha} \setminus \{0\}} is a union of cosets of {H}, say {S'_{\alpha} = \cup_{x \in C} Hx}. Thus by the triangle inequality in {\ell^2} and Proposition 1,

\displaystyle E^+(S'_{\alpha} )^{1/2} \leq \sum_{x \in C} E^+(S'_{\alpha} , Hx)^{1/2} = \sum_{x \in C} E^+(S'_{\alpha}x^{-1} , H)^{1/2} \ll \frac{|S_{\alpha}|}{|H|}|H|^{1/2} |S'_{\alpha}|^{3/4}.

Combining with (5), we find

\displaystyle \alpha^4 |H| | {\rm Spec}_{\alpha}(H)|^{1/2} \ll | {\rm Spec}_{\alpha^2/2}(H)| .

By (3), we find that {|{\rm Spec}_{\alpha}(H)| > |H| + 1} and so

\displaystyle \alpha^4 p^{\delta / 2} \leq \alpha^4 |H|^{1/2} \ll \frac{| {\rm Spec}_{\alpha^2/2}(H)|}{|{\rm Spec}_{\alpha}(H)|} .\ \ \ \ \ (6)

Now we let {1 > \alpha_1 > \alpha_2 > \ldots > \alpha_{J+1} > 0}, where {\alpha_{i+1} = \alpha_i^2 / 2} and {\alpha_1 = p^{-\epsilon}}. We may assume that {{\rm Spec}_{\alpha_1}(H)} contains a nonzero element, or else we immediately conclude Theorem 1. Thus (6) holds for all {\alpha_i}, since {{\rm Spec}_{\alpha_i}(H)} increases in size as {i} increases. We have

\displaystyle \prod_{i = 1}^{J} \frac{| {\rm Spec}_{\alpha_{i+1}}(H)|}{|{\rm Spec}_{\alpha_i}(H)|} = \frac{| {\rm Spec}_{\alpha_{J+1}}(H)|}{|{\rm Spec}_{\alpha_1}(H)|} \leq p ,

and so there is a {1 \leq j \leq J} such that

\displaystyle \frac{| {\rm Spec}_{\alpha_{j+1}}(H)|}{|{\rm Spec}_{\alpha_j}(H)|} \leq p^{1/J}.

Combining with (6), we find

\displaystyle p^{- 2^J \epsilon + \delta /2} \frac{1}{2^{J}} \ll \alpha_j^4 p^{\delta/2} \ll p^{1/J}.

Choosing {2^J \epsilon \asymp 1}, we find that

\displaystyle p^{\delta / 2} \ll \epsilon^{-1} p^{1 / \log \epsilon^{-1}},

which is a contradiction for {p} large as long as {\delta \gg \log^{-1} \epsilon^{-1}}, that is, for {\epsilon} small enough in terms of {\delta}. {\spadesuit}

As we saw in the proof, the sum–product phenomenon asserts that {H} has little additive structure which is in tension with the general property that {{\rm Spec}_{\alpha}(A)} is additively structured.


Entropy and Sumsets: An example

The following post is a result of a discussion with Imre Ruzsa. Motivated by the following easy inequality in additive combinatorics

\displaystyle A+2 \cdot A \subset A+A+A , \ \ q \cdot A := \{qa : a \in A\},

I asked if the following was true for a finitely valued random variable {X}:

\displaystyle H(X+2 \cdot X) \leq H(X+X+X), \ H(X) := -\sum_{x \in X} \mathbb{P}(X = x) \log_2 \mathbb{P}(X = x).\ \ \ \ \ (1)

Here all sums are of independent copies of the random variables. The idea is that one might expect {X+X} to be a bit more uniform than {2 \cdot X}.

First Imre provided a counterexample to the question

\displaystyle H(X+ 2\cdot Y) \leq H(X+Y+Y).

I find this example particularly elegant. Let {X} be uniform on {\{0,1\}} and {Y} be uniform on {\{0 , \ldots , n\}}. Then {X+2 \cdot Y} is uniform on {\{0 , \ldots , 2n+1\}}, while {X + Y + Y} is supported on {\{0 , \ldots , 2n+1\}} but is not uniform (there is concentration in the middle thanks to the distribution of {Y+Y}).

We then seriously questioned the validity of (1). After some discussion, Imre eventually said something about higher dimensional concentration that made me think one should check (1) for the “Gaussian.” The reason Gaussian is in quotes is that it is not finitely valued as assumed in (1), so strictly speaking we cannot check it for the Gaussian. To see if there was hope, I looked at the differential entropy of a real valued random variable {G} with density {p}, defined via

\displaystyle H(G) := -\int_{-\infty}^{\infty} p(x) \log p(x) dx.

Let us take {G} to be the Gaussian with mean zero (this is irrelevant for entropy) and variance 1. Recall some basic properties of variance:

\displaystyle {\rm Var}(aG) = a^2 {\rm Var}(G) , \ \ {\rm Var}(G+G) = 2 {\rm Var}(G),

where {a \in \mathbb{R}} and {G+G} is understood to be the sum of two independent copies of {G}. Thus

\displaystyle {\rm Var}(G + 2 \cdot G) = 5 , \ \ {\rm Var}(G + G +G ) = 3.

So we indeed see that (1) is not true for the Gaussian. To construct a finitely valued random variable that does not satisfy (1), we can convolve a Bernoulli random variable with itself until (1) fails. (This assumes that passing from continuous to discrete does not destroy the counterexample, which is not obvious, since {2 \cdot X} has a strange support condition; for instance, the same argument would suggest {H(2 \cdot G) \geq H(G+G)}, which is clearly false for discrete random variables.) Anyway, I wrote some quick python code to check this and found that for {X = B + B + B}, where {B} is the random variable of a fair coin flip, we have

\displaystyle H(X+2 \cdot X) \approx 2.984 , \ \ H(X+X+X) \approx 2.630.

Here {X+ 2 \cdot X} and {X+X+X} are supported on {\{0 , \ldots , 9\}} and so their entropies are bounded by the entropy of uniform distribution on 10 elements which is

\displaystyle \log_2 10 \approx 3.322.
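Both the counterexample above and these numbers are easy to reproduce by convolving probability mass functions exactly; here is a sketch along the lines of the quick code mentioned (helper names are mine):

```python
from math import log2

def convolve(p, q):
    """pmf of the sum of independent random variables with pmfs p, q."""
    r = {}
    for a, pa in p.items():
        for b, qb in q.items():
            r[a + b] = r.get(a + b, 0.0) + pa * qb
    return r

def dilate(p, q_factor):
    """pmf of q_factor * X when X has pmf p."""
    return {q_factor * k: v for k, v in p.items()}

def H(p):
    """Shannon entropy in bits."""
    return -sum(v * log2(v) for v in p.values() if v > 0)

# Imre's counterexample: X uniform on {0,1}, Y uniform on {0,...,n}
n = 10
X = {0: 0.5, 1: 0.5}
Y = {k: 1 / (n + 1) for k in range(n + 1)}
h_dil = H(convolve(X, dilate(Y, 2)))    # H(X + 2*Y) = log2(2n + 2)
h_sum = H(convolve(convolve(X, Y), Y))  # H(X + Y + Y) < log2(2n + 2)

# the check for (1): X = B + B + B with B a fair coin flip
B = {0: 0.5, 1: 0.5}
X3 = convolve(convolve(B, B), B)        # Binomial(3, 1/2)
h1 = H(convolve(X3, dilate(X3, 2)))     # H(X + 2*X)
h2 = H(convolve(convolve(X3, X3), X3))  # H(X + X + X)
print(h_dil, h_sum, h1, h2)
```

Both pairs come out the "wrong" way round: {H(X + 2\cdot Y) > H(X+Y+Y)} and {H(X + 2 \cdot X) > H(X+X+X)}.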

Sometimes entropy analogs of sumset inequalities hold and sometimes they do not (see this paper of Ruzsa or this paper of Tao, or a host of work by Madiman and coauthors).

Sum of a set and its dilate

Let {A, B \subset \mathbb{Z}} be finite and nonempty. One of the first things we learn in additive combinatorics is

\displaystyle |A+B| \geq |A| +|B| - 1. \ \ \ \ \ (1)

There are a host of results improving on (1) when {B} has certain structure. For instance, all of the following have a short and elementary proof. Here {q \cdot A := \{qa : a \in A\}}.

  • (1) {|A+2\cdot A| \geq 3 |A| - 2}
  • (2) {|A+A -A| \gg |A|^2}, for {A} convex,
  • (3) {|A+AA| \geq |A|^2 + |A| - 1},
  • (4) {|A + A^{-1}| \geq |A|^2}.

Let’s see one way to prove the first of these. After translation and dilation, we may suppose that {A} contains both even and odd numbers. Let {A_0} be the even numbers in {A} and {A_1} be the odd numbers in {A}. By (1),

\displaystyle |A+2 \cdot A| = |A_0 + 2 \cdot A| + |A_1 + 2 \cdot A| \geq |A_0| + |A| - 1 + |A_1| + |A| - 1 = 3 |A| - 2.

We remark that there is another proof of this in which one picks {3 |A| - 2} elements of {A+2 \cdot A} in increasing order.
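The bound just proved can also be confirmed by exhaustive search over small sets (a quick sketch; the parameters are chosen small so the search is feasible):

```python
from itertools import combinations

def sum_dilate(A, q=2):
    """|A + q.A|."""
    return len({a + q * b for a in A for b in A})

# minimum of |A + 2.A| over all 4-element subsets of {0,...,8}
k, n = 4, 9
m = min(sum_dilate(set(C)) for C in combinations(range(n), k))
print(m, 3 * k - 2)  # the minimum is attained by an AP
```

The minimum matches {3|A| - 2}, attained by an arithmetic progression such as {\{0,1,2,3\}}.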

Antal Balog and I considered the question of improving (1) when {B = q \cdot A }. We proved the following.

Theorem 1: Let {q\geq 1} be an integer and {A \subset \mathbb{Z}} be finite. Then

\displaystyle |A+q \cdot A| \geq (q + 1) |A| - O_q(1). \ \ \ \spadesuit

This is best possible up to the constant, as can be seen by taking {A} to be an arithmetic progression. Below the fold, we will give a proof in the case where {q} is a prime. This case is technically easier, and was first handled by Cilleruelo, Hamidoune, and Serra. Already the case {q=3} is non–trivial, though a proof of {|A+3 \cdot A| \geq 4 |A| - 4} had appeared several times in the literature. We adopt the proof from our paper, which can be modified to handle the general case.

We partition {A} into residue classes modulo {q} as follows:

\displaystyle A = \displaystyle \bigcup_{j = 1}^s A_j , \ \ A_j = a_j + q \cdot A_j^{\prime}, \ \ A_j \neq \emptyset ,\ \ 0 \leq a_j < q.\ \ \ \ \ (2)

We say {A} is fully–distributed mod {q} (FD) if {s=q}, or equivalently if {A} intersects every residue class modulo {q}. The key fact is that not only are the sets {A_i} disjoint, but so are the sets {A_i + q \cdot A}, and we have

\displaystyle A + q \cdot A= \coprod_{i =1}^s A_i + q \cdot A.\ \ \ \ \ (3)

As long as {|A| \geq 2}, we may assume that {s \geq 2}, since otherwise we may replace {A} with {\frac{1}{q} \cdot (A - x)} for a suitable {x}. This is actually a key point: Theorem 1 is translation and dilation invariant. We first show that Theorem 1 is true for FD sets.

Lemma (FD): Suppose {A} is FD. Then

\displaystyle |A+q \cdot A| \geq (q+1)|A| - q. \ \ \ \spadesuit

Proof: By (1) and (3), we have

\displaystyle |A+q \cdot A| = \sum_{j=1}^q |A_j + q \cdot A| \geq \sum_{j=1}^q \left(|A_j |+ |A| - 1 \right) = (q + 1)|A| - q. \ \ \ \spadesuit
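Lemma (FD) is easy to sanity-check by brute force for small parameters (a sketch; the choices {q = 3}, {|A| = 4} are mine):

```python
from itertools import combinations

def min_fd_sum_dilate(q, n, k):
    """min |A + q.A| over k-subsets of {0,...,n-1} meeting every class mod q."""
    best = None
    for C in combinations(range(n), k):
        if len({x % q for x in C}) == q:  # fully distributed mod q
            size = len({a + q * b for a in C for b in C})
            best = size if best is None else min(best, size)
    return best

q, k = 3, 4
print(min_fd_sum_dilate(q, 9, k), (q + 1) * k - q)  # lemma: at least (q+1)|A| - q
```

The minimum over FD sets is exactly {(q+1)|A| - q = 13} here, attained by {\{0,1,2,3\}}.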

Now we are in a position to prove Theorem 1.

Proof of Theorem 1: We prove

\displaystyle |A+q \cdot A| \geq \frac{m}{q} |A| - C_m,\ \ \ \ \ (4)

by induction on {m}. For {m = 2q}, this follows from (1) with {C_m = 1}. We now assume (4) holds for some {m < q(q+1)} and show it for {m+1}.

Suppose that there is some {1 \leq i \leq s} such that {|A_i| \leq q^{-1} |A|}. Then by (1), (4) and {m < q(q+1)}, we have

\displaystyle |A+q \cdot A| \geq |A_i + q \cdot A| + |(A \setminus A_i) + q \cdot (A \setminus A_i)|

\displaystyle \geq |A_i| + |A| - 1 + \frac{m}{q} (|A| - |A_i|) - C_m \geq \frac{m+1}{q} |A| - C_m - 1.

Suppose now that {A_i'}, as defined in (2), is FD for all {1 \leq i \leq s}. By (3) and the translation and dilation invariance of {|X+ q \cdot X|}, we have

\displaystyle |A+q \cdot A| \geq \sum_{j=1}^s |A_j + q \cdot A_j| = \sum_{j=1}^s |A'_j + q \cdot A'_j| \geq (q+1) |A| - q^2.

Thus we may suppose that

\displaystyle |A_i| \geq \frac{|A|}{q} , \ \ 1 \leq i \leq s,\ \ \ \ \ (5)

and there is some {k} such that {A'_k} is not FD. We then apply the following lemma.

Lemma (not FD): Suppose {A_k'} is not FD. Then

\displaystyle |A_k+ q \cdot A| \geq |A_k + q \cdot A_k| + \min_{1 \leq m \leq s} |A_m|. \ \ \ \spadesuit

Let’s use Lemma (not FD) to finish the proof. By Lemma (not FD), (4) twice and (5) we have

\displaystyle |A+q \cdot A| \geq |A_k + q \cdot A| + |(A \setminus A_k) + q \cdot (A \setminus A_k)|

\displaystyle \geq |A_k + q \cdot A_k| + \min_{1 \leq m \leq s} |A_m| + \frac{m}{q} (|A| - |A_k|) - C_m

\displaystyle \geq \frac{m}{q} |A_k| - C_m + \frac{|A|}{q} + \frac{m}{q} (|A| - |A_k|) - C_m

\displaystyle = \frac{m+1}{q} |A| - 2 C_m.

This completes the proof modulo Lemma (not FD).

Proof of Lemma (not FD): Suppose

\displaystyle |A_k + q \cdot A| < |A_k + q \cdot A_k| + \min_{1 \leq m \leq s} |A_m|.

Then for any {1\leq m \leq s}, we have

\displaystyle |A_m^{\prime}| = |A_m| > |(A_k+ q \cdot A_m )\setminus (A_k+ q \cdot A_k) | = |(a_m - a_k + A_k^{\prime} + q \cdot A_m^{\prime}) \setminus (A_k^{\prime} + q \cdot A_k^{\prime})|.

It follows that for every {x \in A_k^{\prime}} there is a {y \in A_m^{\prime}} such that {a_m-a_k+x+qy\in A'_k + q\cdot A'_k}, and so there is an {x'\in A'_k} such that

\displaystyle a_m - a_k + x \equiv x' \pmod q.

We may repeat this argument with {x'} in place of {x} and so on to get that for every {x \in A'_k} and {w \in \mathbb{Z}}, there is {z \in A'_k} such that

\displaystyle w(a_m - a_k) + x \equiv z \pmod q.

Since {q} is prime and {a_m \neq a_k} for some {m} (recall {s \geq 2}), we get that {A'_k} is FD, contrary to assumption. {\spadesuit}

Since this work, I worked out the general case of sum of many dilates. After that, Antal Balog and I worked on the analogous problem in vector spaces. Both papers contain several open problems and one of them appears on my problems page. Further, there is a paper of Breuillard and Green on contraction maps, for which sum of dilates is a special case. Also, feel free to see my short video I made on the topic, which highlights some of the problems.

One natural generalization that remains open is

\displaystyle |A+q \cdot A| \geq (|q| + 2d -1)|A| - O_{d , q}(1),

where {A \subset \mathbb{Z}^d} and not contained in any {d-1} dimensional subspace. The cases {d=2} and {d=3} are known to hold.

Breaking the 6/5 threshold for sums and products modulo a prime

Ilya Shkredov and I just released our preprint, “Breaking the 6/5 threshold for sums and products modulo a prime.”

Since this paper of Roche–Newton, Rudnev, and Shkredov, the best bound for the sum–product problem in {\mathbb{F}_p} was given by the following.

Theorem (Sum–Product): Let {A \subset \mathbb{F}_p} of size at most {p^{5/8}}. Then

\displaystyle |A+A| + |AA| \gg |A|^{6/5}. \ \ \spadesuit

The major breakthrough was a point–plane incidence theorem of Rudnev. Let us recall their proof, which has been simplified since their original argument.

We consider the set of points and lines

\displaystyle P = (A+A) \times (AA), \ \ \ \mathcal{L} = \{(u,v) \in \mathbb{F}_p^2 : v = a(u - c)\}_{a , c \in A}.

Note that each line contains the {|A|} points {(b + c , ab)} for {b \in A}, and so there are at least {|A|^3} incidences. On the other hand, one may apply a point–line incidence bound, for instance the cartesian version found in Theorem 4 of this paper of Stevens and de Zeeuw, to obtain an upper bound for the number of incidences between {P} and {\mathcal{L}}. Combining these two bounds gives the Theorem (Sum–Product) above.
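The incidence count in this construction can be seen concretely in a few lines (the specific {A} and {p} below are arbitrary choices of mine):

```python
p = 101
A = [1, 2, 3, 5, 8]  # an arbitrary small set in F_p
sums = {(a + b) % p for a in A for b in A}
prods = {(a * b) % p for a in A for b in A}
points = [(u, v) for u in sums for v in prods]  # P = (A+A) x (AA)
lines = [(a, c) for a in A for c in A]          # the line v = a(u - c)
# count point-line incidences; each triple (a, b, c) contributes one
I = sum(1 for (u, v) in points for (a, c) in lines
        if v == a * (u - c) % p)
print(I, len(A) ** 3)
```

Every triple {(a,b,c) \in A^3} puts the point {(b+c, ab)} on the line indexed by {(a,c)}, so the count is at least {|A|^3}.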

We are able to make the following improvement.

Theorem 1: Let {A \subset \mathbb{F}_p} be of size at most {p^{3/5}}. Then

\displaystyle |A+A| + |AA| \gg |A|^{6/5 + c}, \ \ \ c = 4/305. \ \ \spadesuit

We improve this application of a Szemerédi–Trotter type bound in the specific case of the sum–product problem. The basic idea is to rework the above proof so as to obtain a bound for the number of {\tau}–rich lines, rather than for incidences. We then obtain a bound for the higher order energy:

\displaystyle d_4^+(A) \lesssim |AA|^2 |A|^{-2} , \ \ d_4^+(A) := \sup_{B} E_4^+(A,B) |A|^{-1} |B|^{-3}.\ \ \ \ \ (1)

A simple application of Cauchy–Schwarz gives

\displaystyle |A|^{4/3} d_4^+(A)^{-1/3} \leq |A+A|. \ \ \ \ \ (2)

Combining (1) and (2) recovers the Sum–Product Theorem above. We are able to make an improvement to (2) and in turn to the sum–product problem.
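For concreteness, here is the bookkeeping: Hölder's inequality gives {|A|^{4/3} d_4^+(A)^{-1/3} \leq |A+A|}, and inserting (1) yields

```latex
\begin{align*}
|A+A| &\geq |A|^{4/3}\, d_4^+(A)^{-1/3}
       \gtrsim |A|^{4/3}\bigl(|AA|^{2}|A|^{-2}\bigr)^{-1/3}
       = |A|^{2}\,|AA|^{-2/3}.
\end{align*}
```

Writing {M = \max(|A+A|, |AA|)}, this gives {M^{5/3} \gtrsim |A|^2}, and hence {M \gg |A|^{6/5}}, which is the {6/5} exponent of the Sum–Product Theorem.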

Covering sumsets and Plünnecke’s inequality

Recall Plünnecke’s inequality, which states the following. We write {mA} for the {m}–fold sumset {A + \cdots + A}.

Theorem (Plun): Let {A} be a finite subset of an abelian group. Let {K = |A+A| |A|^{-1}} be the doubling constant. Then

\displaystyle |mA | \leq K^{m}|A|. \ \ \ \spadesuit

We describe one natural attempt to prove Theorem (Plun). Suppose that

\displaystyle A+A \subset A+X,

for some small {X}. Then

\displaystyle A+A+A \subset A+X+X,\ \ \ \ \ (1)

and so

\displaystyle |A+A+A| \leq |A| |X|^2.

This motivates the following definition.

Definition (Cov): Let {A} be a finite subset of an abelian group. We let

\displaystyle C(A) = \min \{|X| : A+A \subset A+X , \ X \subset A\}. \ \ \ \spadesuit

Clearly {C(A) \leq |A|}, since we may choose {X = A}. Note that

\displaystyle C(A) \geq |A+A| |A|^{-1},\ \ \ \ \ (2)

since if {A+A \subset A+X}, then {|A+A| \leq |A| |X|}. One natural question is to what extent one can reverse (2). By the argument in (1),

\displaystyle |mA| \leq C(A)^{m-1} |A|,

and so this would have implications to Plünnecke’s inequality.
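The covering argument above is easy to test by brute force on small sets; the example set below is an arbitrary choice:

```python
from itertools import combinations

def sumset(A, B):
    return {a + b for a in A for b in B}

def covering_number(A):
    """Brute-force C(A): the least |X| with X a subset of A and A+A contained in A+X."""
    AA = sumset(A, A)
    for size in range(1, len(A) + 1):
        for X in combinations(sorted(A), size):
            if AA <= sumset(A, set(X)):
                return size

A = {0, 1, 2, 4, 9}               # an arbitrary small example
c = covering_number(A)
AA, AAA = sumset(A, A), sumset(sumset(A, A), A)
print(c, len(AA), len(AAA))
assert c >= len(AA) / len(A)      # inequality (2)
assert len(AAA) <= c**2 * len(A)  # |3A| <= C(A)^2 |A|, by the argument in (1)
```

Here {C(A) = 4} while {|A+A|\,|A|^{-1} = 13/5}, so even small sets show slack in (2).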

The truth is one cannot reverse (2). We give an example due to Ruzsa.

Example: We construct a set {Y} in {\mathbb{Z}^3} via

\displaystyle Y := \left( [n] \cdot e_1 + [n] \cdot e_2 + [n] \cdot e_3 \right) \cup [n^2] \cdot e_1 \cup [n^2] \cdot e_2 \cup [n^2] \cdot e_3.


We have the following properties of {Y}:

\displaystyle |Y| \asymp n^3 , \ \ |Y+Y| \asymp n^4 = n |Y| , \ \ C(Y) \asymp n^2 \asymp n |Y+Y| |Y|^{-1}.

Note that {|Y+Y|} comes from the sum of a line of size {n^2} with the box, while {C(Y)} comes from the sum of two perpendicular lines of size {n^2}. The point is that one cannot efficiently cover a two dimensional object with either a one dimensional object or a three dimensional object, even in the presence of additive structure. {\spadesuit}
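For small {n} one can compute the sizes in this example directly; the construction below is a rendering of {Y} with the box placed at {[n]^3}:

```python
from itertools import product

def box_and_lines(n):
    """Ruzsa's example: an [n]^3 box plus three coordinate lines of length n^2."""
    box = set(product(range(1, n + 1), repeat=3))
    lines = set()
    for i in range(3):
        for t in range(1, n * n + 1):
            v = [0, 0, 0]
            v[i] = t
            lines.add(tuple(v))
    return box | lines

def sumset(A, B):
    return {tuple(x + y for x, y in zip(a, b)) for a in A for b in B}

for n in [3, 4]:
    Y = box_and_lines(n)
    YY = sumset(Y, Y)
    # |Y| ~ n^3 while |Y+Y| ~ n^4: a long line plus the box forces a large sumset
    print(n, len(Y), len(YY))
    assert len(Y) <= 4 * n**3 and len(YY) >= n**4
```

Of course such tiny {n} only illustrate the inequalities {|Y| \ll n^3} and {|Y+Y| \gg n^4}, not the full asymptotics.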

We do have {C(A) \lesssim |A+A| |A|^{-1}} statistically in the following sense.

Proposition (Stat): Let

\displaystyle S = \{x \in A+A : r_{A+A}(x) \geq |A|^2 |A+A|^{-1}\}.

Then there is {X \subset A} such that

\displaystyle S \subset A+X, \ \ \ |X| \leq 3|A+A| |A|^{-1} \log |A|. \ \ \spadesuit

This should be compared to this post of Tao, where he talks about an entropy version of Plünnecke’s inequality.

Proof: We choose {X \subset A} uniformly at random of size {\lceil p |A| \rceil}, for a parameter {p} to be chosen. Let {d = |A|^2 |A+A|^{-1}}. Then for {s \in S},

\displaystyle \mathbb{P}(s \notin A+X) = (1-p)^{r_{A+A}(s)} \leq (1-p)^{d} \leq \exp{(-pd)}.

We choose {p = 3d^{-1} \log |A| = 3 |A+A| |A|^{-2} \log |A|}. If {p \geq 1}, the Proposition follows from taking {X=A}, and so we may assume {p < 1}. Let {Z} be the random variable counting the {s \in S} not covered by {A+X}. By the union bound,

\displaystyle \mathbb{E}[Z] \leq |A+A| \exp{(-pd)} \leq |A|^{-1}.

Thus there is a choice of {X} so that {Z = 0}, and hence {S \subset A+X}. {\spadesuit}
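The proof translates directly into a small randomized procedure; the parameters below (the progression {A}, the seed, and the number of trials) are illustrative choices:

```python
import math
import random

def probabilistic_cover(A, trials=100, seed=1):
    """Sketch of the proof above: sample X in A of size ~ p|A| until S is inside A+X."""
    AA = {a + b for a in A for b in A}
    d = len(A) ** 2 / len(AA)
    S = [s for s in AA if sum(1 for a in A if s - a in A) >= d]  # popular sums
    p = 3 * math.log(len(A)) / d
    if p >= 1:
        return set(A), S                  # degenerate case: take X = A
    rng = random.Random(seed)
    size = math.ceil(p * len(A))
    for _ in range(trials):
        X = set(rng.sample(sorted(A), size))
        if all(any(s - x in A for x in X) for s in S):
            return X, S
    return None, S

A = set(range(30))                        # an arithmetic progression, for illustration
X, S = probabilistic_cover(A)
assert X is not None and all(any(s - x in A for x in X) for s in S)
print(len(X), len(S))
```

For this {A}, every popular sum has a long fiber, so in fact any sample of the prescribed size already covers {S}.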

On the other hand, we can bound {C(A)} in terms of the doubling constant if we are allowed to take {X} a subset of the smoother {A+A-A}.

Proposition (Smooth): Let {A} be a finite subset of an abelian group. Then {A+A \subset A+X} where {X \subset A+A-A} and {|X| \ll |A+A-A||A|^{-1} \log |A|}. Thus {A+A} can be covered by {\ll K^3 \log |A|} translates of {A}.

The proof of Proposition (Smooth) is similar to that of Proposition (Stat), so we omit it.

This is related to the following annoying question. We let {m \cdot A} be the dilate of {A} by {m}. We know from Plünnecke’s inequality that

\displaystyle |A+ 2 \cdot A| \leq |A+A+A| \leq K^3 |A| , \ \ K = |A+A| |A|^{-1}.\ \ \ \ \ (3)

This inequality came up in Lemma 5.1 of the almost periodicity paper of Croot and Sisask.

Question (aplus2a): Does there exist a {\delta < 3} such that

\displaystyle |A+2 \cdot A| \leq K^{ \delta} |A|? \ \ \ \spadesuit

This was asked in a paper of Bukh and appears on his website. Note that both of the inequalities in (3) can be essentially tight: for the first, consider an arithmetic progression, and for the second, the example given above.

We now give a construction that shows one cannot take {\delta} too small in Question (aplus2a), even for large {A}, using the tensor power trick.

Example: Consider {A = \{0,1\}}. Then {|A+A| = 3} and {|A+2 \cdot A| = 4}. This example shows that one cannot take {\delta} smaller than

\displaystyle \log (2) / \log (3/2) \approx 1.71.

To obtain arbitrarily large examples, consider {A = \{0,1\}^d} for large {d}; the sumsets factor coordinate by coordinate, so {|A+A| = 3^d} and {|A+2 \cdot A| = 4^d}, giving the same lower bound on {\delta}.
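A quick computation confirms the sumset sizes used in the tensor power trick:

```python
from itertools import product
from math import log

def sumset(A, B):
    return {tuple(x + y for x, y in zip(a, b)) for a in A for b in B}

def dilate(A, m):
    return {tuple(m * x for x in a) for a in A}

# A = {0,1}^d: the sumsets factor coordinate by coordinate
for d in [1, 3, 5]:
    A = set(product({0, 1}, repeat=d))
    assert len(sumset(A, A)) == 3**d             # |A+A| = 3^d, so K = (3/2)^d
    assert len(sumset(A, dilate(A, 2))) == 4**d  # |A+2·A| = 4^d

# K^delta |A| >= |A+2·A| forces delta >= log 2 / log(3/2)
print(log(2) / log(3 / 2))  # ≈ 1.7095
```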

Monochromatic Hilbert cubes and arithmetic progressions

József Balogh, Misha Lavrov, Adam Wagner, and I recently uploaded our paper “Monochromatic Hilbert cubes and arithmetic progressions” to the arXiv.

Van der Waerden’s theorem is a central result in arithmetic Ramsey theory. The goal is to find bounds for {W(k,r)}, the least integer {N} such that any {r}–coloring of {1 , \ldots , N} contains a monochromatic {k}–term arithmetic progression. The current records are of the form

\displaystyle c\cdot r^{k-1} \leq W(k,r) \leq 2^{2^{r^{2^{2^{k+9}}}}}.

The upper bound is due to Gowers, while the lower bound is due to Kozik and Shabanov (see also Blankenship, Cummings, and Taranchuk for the prime case).

A Hilbert cube, or affine {k}–cube, is a subset of the integers of the form

\displaystyle \left\{x_0 + \sum_{b \in B} b : B \subseteq A\right\} =: x_0 + \Sigma^* A,

for some set {A} with {|A|=k} and {x_0 \in \mathbb{Z}}. Hilbert cubes are an important substructure in arithmetic Ramsey theory, coming up in the Gowers uniformity norms, the density Hales–Jewett theorem, and Szemerédi’s original proof of his theorem. We set {h(k,r)} to be the least integer such that any {r}–coloring of {1 , \ldots , h(k,r)} contains a monochromatic affine {k}–cube. The current records are of the form

\displaystyle r^{ck^2} \leq h(k,r) \leq O(r)^k.

The lower bound is due to Conlon, Fox, and Sudakov, while the upper bound was originally due to Szemerédi and is the so–called Szemerédi cube lemma. Note that the gap in bounds for Hilbert cubes is much less than the gap in Van der Waerden’s problem.

Improving the upper bound would be of interest, and there is a plethora of proofs of it. For instance, one nice geometric proof can be found in this paper of Graham and Solymosi, and a standard averaging argument can be found in these notes of Zeeuw.

We instead consider improving the lower bound. A first attempt is to {r}–color the integers {1 ,\ldots , n} uniformly at random and show that with high probability there are no monochromatic affine {k}–cubes. For {r=2}, an affine {k}–cube of the form {x_0 + \Sigma^* A} is monochromatic with probability

\displaystyle \frac{2}{2^{|\Sigma^* A|}}.

It turns out that if every {A} were dissociated, that is, if {|\Sigma^* A| = 2^k}, then we could complete this argument and greatly improve upon the Conlon–Fox–Sudakov bound. The problem is with cubes {x_0 + \Sigma^* A} where {\Sigma^* A} is small. The question is how small. It turns out that {|\Sigma^* A| \geq {k +1 \choose 2}}. To make improvements, we must consider the case {|\Sigma^* A| \leq k^{2 + \delta}}. But then, in light of Littlewood–Offord theory, one might expect {A} to look a lot like an arithmetic progression, and so {\Sigma^* A} should also look like an arithmetic progression. Thus we are led back to Van der Waerden’s problem. Our main tools for making this precise are this paper of Nguyen and Vu and this paper of Szemerédi and Vu.
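The two extremes for {|\Sigma^* A|} are easy to check by computer: a dissociated set (e.g. powers of {2}) attains {2^k}, while {A = \{1, \ldots, k\}} attains {{k+1 \choose 2} + 1}, essentially the minimum (the extra {1} accounts for the empty subset sum):

```python
from math import comb

def subset_sums(A):
    """Sigma^*(A): sums over all subsets B of A, including the empty sum 0."""
    sums = {0}
    for a in A:
        sums |= {s + a for s in sums}
    return sums

k = 6
dissociated = [2**i for i in range(k)]   # binary digits: all 2^k subset sums distinct
assert len(subset_sums(dissociated)) == 2**k
ap = list(range(1, k + 1))               # the extreme case A = {1, ..., k}
assert len(subset_sums(ap)) == comb(k + 1, 2) + 1
print(len(subset_sums(dissociated)), len(subset_sums(ap)))  # → 64 22
```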

Our main theorem is the following.

Theorem 1: Let {k \geq 3} be an integer. Then for every {\epsilon >0}, there is a {c > 0} such that

\displaystyle h(k,4) \ge \min\{W(\lfloor c k^2 \rfloor, 2), 2^{k^{2.5-\epsilon}}\}.

Theorem 1 asserts that either

  • (i) the best lower bound for {W(k,2)} is far from sharp and {h(k,4)} is larger than the Conlon, Fox, Sudakov bound,
  • (ii) the best lower bound for {W(k,2)} is nearly sharp and so the Hilbert cube number is very close to the Van der Waerden number.

Graham conjectured that {W(k,2)} should be closer to the lower bound than to the tower–type upper bound. One dream would be to use Theorem 1 to attack a $1000 conjecture of Graham, which asserts that {W(k,2) \leq 2^{k^2}}, by showing a bound of the form {h(k,4) \leq 2^{k^{2.5 - \delta}}}. Needless to say, this is likely hard.