Category Archives: Number Theory

Quantitative Hilbert irreducibility and almost prime values of polynomial discriminants

Recently AndersonGafniLemke OliverLowry-DudaZhang and myself uploaded ”Quantitative Hilbert irreducibility and almost prime values of polynomial discriminants” to the arXiv. Let {H} be a large integer,

\displaystyle V_n(H) = \{x^n + a_{n-1} x^{n-1} + \cdots + a_0 : a_{n-1} , \ldots , a_0 \in \mathbb{Z} \cap [-H,H]\},


\displaystyle E_n(H) = \#\{f \in V_n(H) : {\rm Gal}(f) \neq S_n\}.

A classical principle going back to at least Hilbert is that a typical polynomial should have Galois group {S_n}, and this would suggest {\#E_n(H)} is significantly smaller than {\#V_n(H)}. Van der Waerden conjectured that {E_n(H) = o(\#V_n(H) /H)}. Van der Waerden’s conjecture was almost proved quite recently by Chow and Dietmann (for {n \neq 7,8,10}), except they are not able to estimate 

\displaystyle E_n(H , A_n) = \#\{f \in V_n(H) : {\rm Gal}(f) = A_n\}.

Gallagher showed, using the large sieve, that 

\displaystyle E_n(H) = O\left(\frac{\#V_n(H) \log H}{H^{1/2}} \right).\ \ \ \ \ (1)

We show the following.

{\ } Theorem 1: Let {H} be a positive integer. then 

\displaystyle \#E_n(H , A_n) = O\left(\frac{\#V_n(H) H^{O(1/n)}}{H^{2/3}} \right). \ \ \ \spadesuit{\ }

We note that Bhargava recently announced a proof of {E_n(H , A_n) = O(\#V_n(H) / H)}, which is stronger than Theorem 1. We remark that our techniques also apply to counting polynomials with almost prime discriminants (see Theorem 1.2 and the related 1.3). 

We briefly mention some of the ideas used to prove Theorem 1. Classical algebraic number theory reveals that if the Galois group of a polynomial {f} is {A_n}, then {\mu_p(f) = (-1)^{n+1}}, where {\mu_p} is the Mobius function for {\mathbb{F}_p[x]}. Thus we can employ a sieve to estimate {\#E_n(H , A_n) }. However, instead of using the large sieve, we use the small sieve (more precisely, the Selberg sieve). Using this naively gives a bound similar to that of Gallagher in (1), as expected. The key advantage of using the small sieve is we can seek improvements if we have good distribution results. After employing Poisson summation our task in then to understand local factors, which turns out to be basically 

\displaystyle \widehat{\mu_p}(\xi_{n-1} , \ldots , \xi_0) =,\ \ \ \ \

\displaystyle = \frac{1}{p^{n}}\sum_{a_0 , \ldots , a_{n-1} \in \mathbb{F}_p} \mu(x^n + a_{n-1} x^{n-1} + \ldots + a_0) \psi_p(\xi_{n-1} a_{n-1} + \cdots + a_0 \xi_0),\ \ \ \ \ (2)

where {\psi_p : \mathbb{F}_p \rightarrow \mathbb{F}_p} is an additive character. This is morally the same problem as estimating  \displaystyle \sum_{n \leq x} \mu(n) e(nx), where {\mu} is the classical Mobius function. Baker and Harman showed that GRH implies that for any {\epsilon > 0}

\displaystyle |\sum_{n \leq x} \mu(n) e(nx)| \ll x^{3/4 + \epsilon}.

As the Riemann hypothesis is known in the function field setting, such a bound should also hold for (2). Indeed Porritt showed this is the case and this allows us to establish Theorem 1.

Exponential Sums along Oblique Lines

Thanks to Changhao Chen, Burak Erdoğan, and Igor Shparlinski for useful discussions surrounding this post.

Let {k} be a positive integer (which we take later to be {\geq 11}). We consider the exponential sum

\displaystyle S(x,y) : = \sum_{n=N}^{2N} e(xn + yn^k),\ \ \ \ \ (1)

where {e(x) : = e^{2\pi i x}}. We are interested in bounds for

\displaystyle \sup_{(x,y) \in \mathcal{L}_z} |S(x,y)|,

where {\mathcal{L}_z \subset [0,1)^2} are a family of sets indexed by some parameter {z\in \mathbb{R}}. For simplicity, we only consider, for fixed {(a,b) \in \mathbb{R}},

\displaystyle \mathcal{L}_z = \{(x,y) \in [0,1)^2 : ax + by = z\}.

We would like to show that for most {z}, (1) is small. This is supported by the usual heuristic that we expect square root cancellation in (1). On the other hand, {S(x,y)} is large for some special values of {(x,y)} (say {(0,0)}), so it certainly matters how the {\mathcal{L}_z} lie in space. Burak Erdoğan and I studied these types of questions (also the topic of this previous blog post) motivated from understanding the fractal dimension of solutions to certain PDE (see the introduction and references within for a complete history). We let

\displaystyle s(k) = k(k-1) + 2 \lfloor \sqrt{2k-2}\rfloor - 2.

{\ } Theorem 1 (Chen-Shparlinski): Let {\epsilon > 0} and

\displaystyle \alpha > 1 - \frac{1}{1 + s(k)}.

Then for a.e. {z \in \mathbb{R}}, with respect to the Lebesgue measure,

\displaystyle \sup_{(x,y) \in \mathcal{L}_z} S(x,y) \ll_z N^{\alpha + \epsilon}, \ \ \ \spadesuit

{\ }

Let us make some remarks before beginning the proof. It is worth noting that for very small {k} we know the best value of {\alpha}. Indeed, Brandes, Parsell, Poulias, Vaughan, and I showed {\alpha = 3/4} is admissible and cannot be improved. In the aforementioned paper of Erdoğan, we mentioned that one could obtain a variant of Theorem 1 by invoking Vinogradov’s mean value theorem, which is a best possible bound for

\displaystyle \int_{[0,1]^k} |\sum_{n=N}^{2N} e(x_1 n + x_2 n^2 + \cdots + x_k n^k)|^{p}.\ \ \ \ \ (2)

The guiding principle is that if an exponential sum is large at a single point, then one can create many other points where the exponential sum is large. On the other hand, there cannot be too many points where this occur as (2) is small. This is a somewhat unsatisfactory approach, as it is not clear that the {k} variable mean value in (2) is the right tool to analyze the two variable {S(x,y)}. Recently, Chen and Shparlinski instead utilized the following two variable mean value type theorem of Wooley, which turns out to improve the bounds a bit and simplify the proof.

{\ } Theorem 2 (Wooley): Suppose {k \geq 11} is an integer. Then for any {\sigma \geq s(k)}

\displaystyle \int_0^1 \int_0^1 |S(x,y)|^{\sigma} dx dy \leq N^{\sigma - k - 1 + o(1)}. \ \ \spadesuit

{\ }

Note that Theorem 2 is best possible, in a certain sense. By considering a small {N^{-1} \times N^{-k}} rectangle near {(0,0)}, we see

\displaystyle \int_0^1 \int_0^1 |S(x,y)|^{\sigma} dx dy \gg N^{\sigma - k - 1}.

Thus Theorem 2 cannot be improved much, for the values of {\sigma} for which it applies. It is not clear that the range of {\sigma} is best possible. A natural conjecture is that Theorem 2 holds for

\displaystyle \sigma > 2(k+1) .

Such an improvement would improve Theorem 1.

Proof of Theorem 1: We only prove for {N=2^j}, for simplicity. The reader may consult Chen and Shparlinski’s paper for the general case, where the additional idea is to employ the completion technique.

Let {\epsilon > 0} and {0 < \alpha < 1}. We partition {[0,1)^2} into a grid of {O(N^{2 \alpha - k - 3 - 2 \epsilon})} small rectangles of size approximately

\displaystyle N^{\alpha - 2 - \epsilon} \times N^{\alpha - k - 1 - \epsilon}.

We label these rectangles by

\displaystyle \cup_{R \in \mathcal{R}} R.

The point is that (1) is does not change much on such rectangles. Indeed it is easy to check, using {e(x) = 1 + O(x)}, that (for {N} large enough) if

\displaystyle |S(x,y)| \geq N^{\alpha},

for some {(x,y) \in R}, then

\displaystyle |S(x',y')| \geq N^{\alpha}/2,

for any {(x',y') \in R}. We let {\mathcal{R}_{\alpha} \subset \mathcal{R}} consist of the rectangles {R} such that there is a {(x,y) \in \mathcal{R}} with {|S(x,y)| \geq N^{\alpha}}. Combining this with the mean value estimate in Theorem 2, we see that {\#\mathcal{R}_{\alpha}} cannot be too large.

Indeed, Markov’s inequality and Theorem 2, we see that for {\sigma \geq s(k)},

\displaystyle N^{2 \alpha -2 -k - 1 - 2 \epsilon} N^{\alpha \sigma} \#\mathcal{R}_{\alpha} \leq \int_0^1 \int_0^1 |S(x,y)|^{\sigma} dx dy \leq N^{\sigma - k - 1 + o(1)}.


\displaystyle \#\mathcal{R}_{\alpha} \leq N^{(1-\alpha)\sigma -2\alpha+2 + 2 \epsilon + o(1)}.\ \ \ \ \ (3)

We now consider the image of these rectangles under the map

\displaystyle (x,y) \mapsto ax + by.

We have

\displaystyle \{z\in \mathbb{R} : |S(x,y)| \geq N^{\alpha}, \ \ \text{for some} \ (x,y) \in \mathcal{L}_z\} \subset f\left(\bigcup_{R \in \mathcal{R}_{\alpha}}R\right) = \bigcup_{R \in \mathcal{R}_{\alpha}}f(R),


\displaystyle f(x,y) = ax+by.

Note that {f} does not distort rectangles too much, so that

\displaystyle \lambda(f(R)) \ll_{a,b} N^{\alpha - 2 + \epsilon}.

where {\lambda} is the Lebesgue measure. Thus by subadditivity of the Lebesgue measure and (3)

\displaystyle \lambda(\{z\in \mathbb{R} : |S(x,y)| \geq N^{\alpha}, \ \ \text{for some} \ (x,y) \in \mathcal{L}_z\} )\ll_{a,b} N^{\alpha - 2 +\epsilon} N^{(1-\alpha)\sigma -2\alpha+2 + 2 \epsilon + o(1)}.

Note here that {N} is fixed. What we actually care about is what happens for a fixed {z} and {N \geq N(z)} for some large {N(z)}. There is a standard trick from probability (or analysis) to apply the Borel-Cantelli lemma. We first apply the above result with {N = 2^j} to find

\displaystyle \lambda(\{z\in \mathbb{R} : |S(x,y)| \geq 2^{j\alpha}, \ \ \text{for some} \ (x,y) \in \mathcal{L}_z\} ) \leq 2^{j((1-\alpha)\sigma - \alpha + 2 \epsilon )}.\ \ \ \ \ (4)

By the Borel-Cantelli lemma, if

\displaystyle \sum_{j=1}^{\infty} 2^{j((1-\alpha)\sigma - \alpha + 2 \epsilon )}< \infty,

then the set of {z} such that (4) holds for infinitely many {j} has measure zero. This is implied by

\displaystyle (1-\alpha)\sigma - \alpha < 0,

as long as {\epsilon} is sufficiently small. This, in turn, is implied by

\displaystyle \alpha > \frac{\sigma}{\sigma + 1} = 1 - \frac{1}{\sigma + 1} . \ \ \ \spadesuit

On generating functions in additive number theory, II: Lower-order terms and applications to PDEs

Julia BrandesScott ParsellKonstantinos PouliasBob Vaughan, and I recently upload our paper “On generating functions in additive number theory, II: lower-order terms and applications to PDEs,” to the arXiv. This is somewhat of a sequel to earlier work of Vaughan as well as a recent paper of Burak Erdogan and myself (see also this previous blog post). This work began during the Heilbronn Decoupling and Efficient Congruencing workshop hosted by Kevin Hughes and Trevor Wooley.

In what follows, we adopt Vinogradov’s notation and thus write {a \ll b} when {a = O(b)}. Let {t} and {x} be real numbers. We are interested in the exponential sum 

\displaystyle S(t,x) := \sum_{1 \leq n \leq N} e(tn^3 + nx) , \ \ \ e(\theta) : = e^{2 \pi i \theta}.\ \ \ \ \ (1)

We could replace the exponent 3 by any other integer exponent, but we choose 3 for simplicity, as well as being the case of most interest. Weyl obtained, using what is now known as Weyl differencing, estimates for (1), showing that for almost all {t}, and every {\epsilon >0}

\displaystyle \sum_{1 \leq n \leq N} e(tn^2 + nx) \ll N^{1/2+\epsilon} , \ \ \ \sum_{1 \leq n \leq N} e(tn^3 + nx) \ll N^{3/4 + \epsilon}.\ \ \ \ \ (2)

The quadratic estimate is best possible, as is seen from Parseval. We expect a matching estimate for the cubic exponential sum in (2), but (2) is still the best that is known today. The current project originally started in an attempt to understand better three different proofs of (2). We suppose 

\displaystyle t = a/q + \beta_t , \ \ \ |\beta_t| \leq \frac{1}{q^2}, \ \ \ (a,q) = 1,\ \ \ \ \ (3)

and then choose {j} so that 

\displaystyle |x - j/q| \leq 1/(2q).\ \ \ \ \ (4)

By uncertainty principle heuristics, we expect {S}, as defined in (1), to be constant on scales 

\displaystyle |t-a/q| \leq \frac{1}{N^3} , \ \ \ |x-j/q| \leq \frac{1}{N}.

On the other hand {S(a/q , j/q)} is easy to calculate: 

\displaystyle S(a/q , j/q) =\frac{N}{q} \sum_{j=1}^q e(a/q j^3 + nj/q) + O(q).

In the circle method, one needs precise versions of this heuristic. For instance, it is known (see Theorem 7.2 in Vaughan’s book) that 

\displaystyle S(t,x) = S(a/q , j/q)\frac{ I(\beta_t , \beta_x)}{N} + \Delta,\ \ \ \ \ (5)


\displaystyle I(\beta_t , \beta_x) : = \int_{1}^N e(\beta_t u^3 + \beta_x u)du.

and the error term satisfies 

\displaystyle \Delta \ll q(1 + |\beta_x| N + |\beta_t |N^3),\ \ \ \ \ (6)

Note that the estimate, (6), becomes nontrivial only if {\beta_x N} and {\beta_t N^3} are much less than 1 as predicted by the uncertainty principle. A proof of (6) follows relatively quickly from summation by parts. One should think of the main term as being of size roughly {N/\sqrt{q}}, in light of Gauss sum estimates. 

Vaughan, Theorem 4.1, asserts the improvement, in the special case {\beta_x = 0}

\displaystyle \Delta \ll q^{1/2 + \epsilon} (1 + N^3 |\beta_t|)^{1/2}.\ \ \ \ \ (7)

It is worth noting that (7) allows for a proof of Weyl’s inequality, (1). Unfortunately the appearance of the {N^3} (as opposed to a {N^2}) seems to prevent one from going beyond Weyl’s inequality (perhaps because one is using a major arc technique to prove a minor arc result). One consequence of our main result below allows for a proof of Weyl inequality in a similar manner with the presence of a nontrivial linear phase. 

Brüdern and Robert improved (6) to 

\displaystyle \Delta \ll q^{2/3 + \epsilon} (1 + |\beta_x| N + |\beta_t |N^3)^{1/2}.\ \ \ \ \ (8)

In the present paper we show that the estimate (7) still holds in the case when {\beta_x \neq 0}, provided one takes into account addition main terms, thus improving (7).

{\ } Theorem 1: Suppose {t} and {x} are as above, as in (3) and (4). Then 

\displaystyle S(t,x) =\sideset{}{^\dag} \sum_{\substack{d | q \\ ([[j/d]] , q/d) = 1}} S(d[[j /d]] / q , a/q) \frac{I(\beta_t, x - d [[j/d]] )}{N} + \Delta,


\displaystyle \Delta \ll q^{1/2 + \epsilon} (1 +\beta_t N^3)^{1/2}.

Here {[[z]]} is the nearest integer to {z} and {\sideset{}{^\dag}\sum} indicates the sum is over distinct values of {d [[j/d]]}. {\spadesuit} {\ }

The key point is that there are multiple main terms in Theorem 1, given in the sum, which allows an improvement of the error term in (7)

Theorem 1 also indicates that one may be able to perform a major arc only analysis of the system of equations 

\displaystyle \sum_{j=1}^4 x_i^k - y_i^k, \ \ \ k = 1,3,

though we do not pursue this in the current paper. We are also able to use Theorem 1 to estimate the following.

{\ } Theorem 2: Let {k = 2} or {k = 3}. For a.e. {c \in \mathbb{R}} and every {\epsilon >0}

\displaystyle \sup_{N \geq 1} \frac{1}{N^{3/4+\epsilon}} \sup_{t} | \sum_{1 \leq n \leq N} e(t n^k + (t+c)n| \ll 1.\ \ \ \ \ (9)

Furthermore (9) is best possible, in the sense that one cannot replace the exponent {3/4} with anything smaller. {\spadesuit} {\ }

One can interpret the condition in (9) as the supremum of {S(t,x)} such that {(t,x)} lies on the diagonal line (on the torus), {u+v = -c}. Naively one would expect square root cancellation in (9), though it turns out this is not the case. The point is that square root cancellation is a heuristic that only applies to minor arcs, while the supremum in (9) is obtained on a major arc. Thus it is natural to utilize estimates that can handle both minor and major arcs simultaneously, such as the one in Theorem 1. Combining Theorem 2 with previous joint work with Burak Erdo\u gan, we can provide bound for the fractal dimension of the real and imaginary parts of the solution to Airy’s equation, 

\displaystyle u_t + u_{xxx} = 0 , \ \ \ u(0,x) = g(x).\ \ \ \ \ (10)

{\ } Corollary 1 (Fractal Dimension of Airy): Let {u(t,x)} be the solution to (10) with {g} a non-constant characteristic function of an interval. Then the maximum fractal dimension of the real and imaginary parts of {u(t,x)} restricted to 

\displaystyle \{(t,x) \in \mathbb{T}^2 : t+x = -c\},

is at most {17/8} for a.e. {c}. {\spadesuit} {\ }

The introduction of the aforementioned joint work contains a detailed motivation and history to this problem. It was also explored a bit in two undergraduate projects, and one can find some pictures they generated in a 2018 project and another in a 2019 project. We finish by noting the bound of {17/8} in Corollary 1 does not match the known (and expected) lower bound of {3/4}.

Spacing of Quadratic Fractions

The following is a joint blog post with Kyle Pratt, a fellow graduate student at UIUC.

Recall a theorem of Fermat, which asserts that a prime {p} is {\equiv 1 \pmod{4}} if and only if it is the sum of two squares. From Dirichlet’s theorem on primes in arithmetic progressions we conclude there are infinitely primes that can be written as the sum of two squares:

\displaystyle p = m^2+\ell^2 \ \ \ \ \ (1)

Fouvry and Iwaniec showed this can be strengthened in a strong way, by showing there are infinitely many primes of the form in (1) with {\ell} prime.

Kyle and I, mostly to understand Fouvry and Iwaniec’s proof better, considered (a couple of years ago) the analogous question with

\displaystyle \ell^2 - N m^2 , \ \ \ N > 0 , \ \ \ \ \ (2)

replacing {\ell^2 + m^2}. We were lead to a rather interesting spacing result of fractions of the form

\displaystyle \frac{v}{d}, \ \ \ \ \ v^2 - N \equiv 0 \, (\text{mod } d), \ \ \ \ D \leq d \leq 2D,

which we explain below in the fold, using some basic theory of quadratic forms as well as a Brook’s theorem from graph theory.

We (very) briefly recall the structure of the Fouvry-Iwaniec argument. They look to asymptotically estimate

\displaystyle \sum_{\ell^2 + m^2 \leq x} \Lambda(\ell) \Lambda(\ell^2 + m^2) ,\ \ \ \ \ (3)

where {\Lambda} is the Von Mongoldt function. They use Vaughan’s identity on {\Lambda(\ell^2 + m^2)} to decompose (3) into Type I and Type II sums for the sequence

\displaystyle a_n = \sum_{\ell^2 + m^2 =n} \Lambda(\ell).

Great complications from the type II sums ensue, but the fundamental idea is to utilize the quadratic form appearing in (3) to factor

\displaystyle a_{mn} = \sum_{a^2 + b^2 = m} \sum_{c^2 + d^2 = n} b_{m,n}

where {b_{m,n}} encodes a rather strange looking condition that is eventually managable. In essence, this follows from the Brahmagupta-Fibonacci identity of the form

\displaystyle (a^2+b^2)(c^2 + d^2) = (ad-bc)^2 + (ac+bd)^2.

Properties of Gaussian integers underly their analysis.

In this post we are more concerned with the Type I estimates. We look for a bound of the form

\displaystyle \sum_{d \leq x^{\theta}} \Bigg| \sum_{\substack{n \leq x \\ d | n}} a_n - \frac{1}{d} \sum_{n\leq x} a_n \Bigg| \ll x^{1 - \epsilon}. \ \ \ \ \ (4)

The largest {\theta} for which (4) holds is the so-called level of distribution. In the analogous Bombieri-Vinogradov theorem, one can take any {\theta<1/2} (thought only saving an arbitrary power of log), while it turns out here we can take the much stronger {\theta <1} here. This is of the same strength as the unproven Elliot-Halberstam conjecture, as we are allowed to take advantage of the structure of the quadratic form {\ell^2 + m^2}. In the end, the Bombieri-Vinogradov theorem relies on the spacing of distinct Farey fractions

\displaystyle \left|\frac{a}{q} - \frac{a'}{q'}\right| \geq \frac{1}{qq'}, \ \ \ (a,q) = (a',q') = 1.

In a similar manner, a spacing result for the fractions

\displaystyle \{\frac{v}{d} : D \leq d \leq 2D , \ v^2 + 1 \equiv 0 \ {\rm mod} \ d\},

for a fixed large {D}, can be used to provide a bound for (4).

Thus in our search for primes of the form in (2), we are lead to study the spacing of fractions of the form

\displaystyle \Big\{\frac{v}{d} : D \leq d \leq 2D , \ v^2 - N \equiv 0 \, (\text{mod }d)\Big\}, \ \ \ \ \ (5)

where {N} and {D} are fixed ({D} large). For simplicity we take {N} to be a discriminant of a quadratic number field, though this is not a significant assumption. The main spacing result looks as follows.

{ \ }

Proposition 1: The fractions in (5) can be partitioned into sets {F_1 , \ldots , F_t} such that {t = O_N(1)}, and for any distinct {x,y \in F_j} we have

\displaystyle \| x - y \| \gg \frac{1}{D}. \ \ \ \spadesuit

{ \ }

We remark this estimate is best possible up to logarithms. Proposition 1 combined with some reductions and the large sieve can be used to establish (4).

{ \ }

Proof of Proposition 1: Suppose {v^2 - N \equiv 0 \ (\text{mod }d)}. Then we have

\displaystyle d = ar^2 + 2brs + cs^2,\ \ \ \ \ (6)

for some {a,b,c,r,s\in \mathbb{Z}}. Note we can choose the tuples {(a,b,c)} to come from the {h(N)} inequivalent quadratic forms of discriminant {N}. One can check that after multiplication by units in the ring of integers of {\mathbb{Q}(\sqrt{N})}, we may assume {r \asymp_N s}. It follows that {r , s \asymp_N D^{1/2}}

We have, as in this paper of Hooley (page 106), that

\displaystyle \frac{v}{d} = \frac{\overline{r}}{s} - \frac{ar+bs}{sd}, \ \ \ r\overline{r} \equiv 1 \, (\text{mod }s).\ \ \ \ \ (7)


\displaystyle \Big|\frac{\overline{r}}{s} - \frac{\overline{r'}}{s'}\Big| =\Big| \frac{r s' - r' s}{ss'}\Big| \gg \frac{1}{D}, \ \ \ r s' - r' s \neq 0. \ \ \ \ \ (8)

Thus we want to show {\frac{ar+bs}{sd}} is much smaller than {1/D}. We have the trivial bound

\displaystyle \frac{ar+bs}{sd} \ll_N 1/D.\ \ \ \ \ (9)

This is not good enough to ensure a spacing of {1/D} as the error term can be bigger than the main term by a multiplicative constant. We cannot even ensure such a spacing for all the fractions in (5), but what we are able to do is partition the fractions into sets

\displaystyle F_1, \ldots , F_t , \ \ \ t = O_N(1),

such that we can improve (8) for the fractions that lie inside each set. Indeed for each {a,b,c, r,s} there is a unique {v} and {d} that satisfy (6) and (7). Thus there are at most {O_N(1)} choices of {v,d} for a fixed {r,s}.

{ \ }

Lemma 1: Suppose {(x,y) = 1} and {x,y \asymp_N D^{1/2}}. Then there are at most {O(k)} choices of coprime {x',y' \asymp D^{1/2}} so that {xy' - x'y = k}. {\spadesuit}

{ \ }

Proof of Lemma 1: Note that, by coprimality, {k=0} is impossible. We have {x' = x_0 + xt}, {y' = y_0 + yt}. By size considerations, there are {O(1)} choices for {t} and thus {x'} and {y'}. {\spadesuit}

{ \ }

We choose {K = K(N)} to be a sufficiently large constant. By Lemma 1, for a fixed fraction {v/d} from (5), there are at most {O_N(1)} choices {v',d'} so that

\displaystyle \Big|\frac{v}{d} - \frac{v'}{d'}\Big| \leq \frac{1}{D}.

We create a graph {G} with vertex set from (5) and {v/d \sim v'/d'} if {|r s' - r' s| \leq K}. Thus {G} has maximum degree {O_N(1)}. By Brooks’ theorem, we may find a coloring of {G} with {O_N(1)} color classes {F_1, \ldots , F_t}.

Inside a color class, we improve the lower bound in (8) to {K/D} while (9) remains the same and so {\|v/d - v'/d'\| \gg_N 1/D}, as desired. {\spadesuit}

Breaking the Square root Barrier via the Burgess Bound

Thanks to Kyle Pratt for enlightening discussions leading to this post.

Recall the Burgess bound which states for any interval {I \subset \mathbb{F}_p} of size at least {p^{1/4 + \epsilon}}, one has

\displaystyle \left| \sum_{x \in I} \chi(x) \right| \ll \frac{|I|}{p^{\delta}}, \ \ \ \ |I| \geq p^{1/4+ \epsilon}\ \ \ \ \ (1)

where {\chi} is the Legendre symbol modulo {p}.

Conjecture 1: For any {\alpha > 0}, (1) holds for intervals, {I}, of size {p^{\alpha}}. { \ \ \ \spadesuit}

Conjecture 1 is one way quantify that {\chi} behaves like a random sign pattern. Indeed almost surely, (1) holds for a random {\pm 1}–valued function from {\mathbb{F}_p}.

Character sums are central objects of study in analytic number theory and such a result would have wide applications. For instance, it would solve the Vinogradov least square problem. It would also have implications to sub–convexity bounds of {L}–fucntions, using

\displaystyle L(1/2 , \chi) = \sum_{n \leq \sqrt{p}} \chi( n) / \sqrt{n} + O(\log p).

For instance, summation by parts and (1) gives the sub–convexity bound

\displaystyle L(1/2 , \chi) \ll p^{1/4 - \delta} .

The key obstacle is the famous square root barrier. In {N} coin flips, one can change {\sqrt{N}} of the outcomes without affecting the global statistics, for instance in the central limit theorem. Thus when applying global statistics to problems in short intervals, one naturally arise at a square root barrier for the size of the interval.

To this end, it is no surprise that the Polya–Vinogradov inequality, that is conjecture 1 with {\alpha = 1/2 + \epsilon}, was proved about forty years before the Burgess bound. The Polya–Vinogradov theorem can be proved using only the global statistics of {\chi}, for instance for all {\xi \in \mathbb{F}_p},

\displaystyle \left| \sum_{x \in \mathbb{F}_p} \chi(x) e_p(\xi x) \right| \leq \sqrt{p} , \ \ \ e_p(\theta) : = e^{2 \pi i \theta / p }.

On the other hand, the Burgess bound breaks the square root barrier, so one needs additional ideas, which we now discuss. We first use {I+x \approx I} for small {x}. Then we use the multiplicativity of {\chi} to amplify an interval of size {p^{1/4 + \epsilon}} to a set of size {p^{1/2 + \epsilon}}. Finally, we use completion and apply a global statistic of {\chi} (Hasse-Weil bound). Without the amplification step, this method would not work. Let us sketch the proof.

By the near translation invariance of {I}, one has

\displaystyle \sum_{x \in I} \chi(x) =\sum_{x \in I} \chi(x + yt) + O(p^{1/4 + \epsilon / 2}),

for all {y \in J = \{1 , \ldots , \lfloor p^{1/4} \rfloor \}} and {t \in T = \{1 , \ldots ,\lfloor p^{\epsilon / 2} \rfloor \}}. But by the multiplicativity of {\chi},

\displaystyle \left| \sum_{y \in J ,t \in T} \sum_{x \in I} \chi(x + yt) \right| \leq \sum_{x \in I , y \in J} \left| \sum_{t \in T} \chi(y^{-1} x + t) \right| .\ \ \ \ \ (2)

Now the number of ways to write an integer as {xy^{-1}} is roughly one on average on a set of size {p^{1/2 + \epsilon}}, so (2) is approximately

\displaystyle \sum_{u \in S} \left| \sum_{t \in T} \chi(u+ t) \right| ,

for some {S} of size {p^{1/2 + \epsilon}}. Since {S} is so large, we can now efficiently complete the sum after an application of Hölder’s inequality to arrive at

\displaystyle \sum_{u \in \mathbb{F}_p} \left| \sum_{t \in T} \chi(u+ t) \right|^{2r} .

This may be bounded by the following global statistic. We remark in our case, there is a short proof using the polynomial method (the proof in the note is only for elliptic curves, but is easily generalized to our situation).

Lemma 1: Let {f} be a polynomial which is not a multiple of a perfect square. Then

\displaystyle \sum_{x \in \mathbb{F}_p} \chi (f(x)) = O_{{\rm deg}(f)} (\sqrt{p}).

In particular,

\displaystyle \sum_{u \in \mathbb{F}_p} \left| \sum_{t \in T} \chi(u+ t) \right|^{2r} \ll_r p^{1/2} |T|^{2r} + p |T|^r. \ \ \spadesuit

Putting this altogether, as done beautifully in Chang’s survey one arrives at the Burgess bound of the form (1), for {\delta = \epsilon^2 / 4}.

Another successful instance of breaking the square root barrier is in this paper of Hanson. One result he proved was for any {\delta > 0},

\displaystyle \sum_{a \in A , b \in B , c \in C} \chi(a + b + c) = o_{\delta} (|A||B||C|),

for arbitrary {A, B , C} of size at least {\delta \sqrt{p}}. He takes advantage of the additive structure of the problem as was pioneered in this paper of Chang using Freiman’s theorem from additive combinatorics. This has the same flavor as the Burgess bound in (1) as one replaces the very additively structured interval {I} with a sumset. The main problem along these lines is the following

Conjecture 2: For any {A, B} of size at least {p^{\alpha}}, there is a {\delta >0} such that

\displaystyle \sum_{a \in A , b \in B} \chi (a + b) \ll |A| |B| p^{-\delta}. \ \ \spadesuit.

This would have an application to sparse signal recovery. One can see Bourgain’s work and the references within for the dual problem, where multiplicative character is replaced by additive character and sumset is replaced by product set.

Granville and Soundararajan (Theorem 2) building upon work of Montgomery and Vaughan showed that by assuming GRH, one has

\displaystyle \sum_{n \leq x} \chi (n) \sim \sum_{n \leq x} \chi(n) \Psi_y(n) , \ \ \ y \geq \log^5 q ,

where {\Psi_y} is the characteristic function of {y}-smooth numbers. Thus the sum in (1) (with {I = [1 , \ldots , x]}) is determined solely by the values of {\chi} at rather small primes. Using smooth number estimates, this implies Conjecture 1 in the case {I = [1, \ldots , x]}.


Fractal solutions of dispersive partial differential equations on the torus

Burak Erdoğan and I just put “Fractal solutions of dispersive partial differential equations on the torus” on the arXiv. We study cancellation in exponential sums and how this leads to bounds for the fractal dimension of solutions to certain PDE, the ultimate “square root cancellation” implying exact knowledge of the dimension.

In this post, we consider the case of Schrödinger’s equation with initial data {\chi := \chi_{[0 , \pi]}} for simplicity. Here,

\displaystyle if_t + f_{xx} = 0 , \ \ t \in \mathbb{R}, \ \ x \in \mathbb{R} / 2 \pi \mathbb{Z} \ \ , \ \ f(0,x) = \chi(x) ,

and the solution is given by

\displaystyle f(t,x) = \sum_{n \in \mathbb{Z}} \widehat{\chi}(n) e^{i t n^2 + i n x}.

Note the solution is periodic in both {x} and {t}.

For a line {\mathcal{L} \subset \left( \mathbb{R} / 2 \pi \mathbb{Z} \right)^2}, we are interested in the fractal dimension of the real and imaginary parts of the graph of {f |_{\mathcal{L}} (t,x)}. This dimension must lie in the interval {[1,2]}.

An important case to consider is {t = a/q}, which leads to so–called quantization and is related to the Talbot effect. In this case, it was shown by Berry and Klein that

\displaystyle f(t,x) = \sum_{j=0}^{q-1} c_j \chi_{[0 , 1/q] + j/q} (x), \ \ \ \ \ (1)

for some {c_j \in \mathbb{C}} that are Gauss sums. Thus {f} is a linear combination of at most {q} intervals and has fractal dimension 1. See page 4 of this paper of Chen and Olver for some pictures of this phenomenon.

The story is entirely different for irrational {t}. To see why, observe for a sequence {\frac{a_n}{q_n} \rightarrow t}, the functions {f(\frac{a_n}{q_n} , x)} as given in (1) increase in complexity as {q_n} increases.

To make this precise, we use the theory of Besov spaces and show that bounds for the {L^p} norm of

\displaystyle H_N(t,x) : = \sum_{N \leq n < 2N} e^{itn^2 + i nx}, \ \ (t,x) \in \mathcal{L},

imply fractal dimension bounds for the real and imaginary parts of {f_{\mathcal{L}}(t,x)}.

Remark: The starting point of the theory of of Besov spaces that we use can be illustrated by the following basic facts from analysis:

  • The graph of a real valued {C^{\gamma}} function has fractal dimension {\leq 2 - \gamma},
  • For {g \in C^{\gamma}}, one has {\widehat{g}(n) \ll n^{-\gamma}}.

Thus we see that both fractal dimension and Fourier decay are related to the smoothness of {g}. In our situation, we have more information than that of the conclusion of the second bullet, that is we know that

\displaystyle \sum_{N \leq n < 2N} \widehat{g}(n),

exhibits cancellation and so we are able to use more involved theory to roughly reverse the implication in the second bullet as well as provide lower bounds. ♠

Let us first consider the case of horizontal lines in space–time, that is {t} fixed. Then by orthogonality, Weyl’s inequality, and the divisor bound, one has for almost every {t},

\displaystyle ||H_N||_{L_x^2} = N^{1/2} , \ ||H_N||_{L_x^{\infty}} \ll N^{1/2 + \epsilon}, \ ||H_N||_{L_x^4} \ll N^{1/2 + \epsilon}.

By a previous blog post, this implies

\displaystyle ||H_N||_{L_x^1} \gg N^{1/2 - \epsilon}.

One can use this to show that the fractal dimension of {f_{\mathcal{L}}(t,x)} is equal to {3/2} for almost every horizontal line, recovering a theorem of Rodnianski. In one part of our paper, we adapt the above strategy to show the following.

Theorem (oblique): The fractal dimension of the real and imaginary parts of the solution to (1) restricted to almost every

\displaystyle \mathcal{L} = \{(t,x) : a t + bx = c\},

is in the interval

\displaystyle [7/4 , 19/10]. ♠

Note that the lower bound 7/4 is bigger than the 3/2 for horizontal lines. To show the upper bound in Theorem (oblique), we study exponential sums such as

\displaystyle \sup_x |\sum_{n \leq N} e^{i(c+x)n^2 + i nx}| , \ \ c \in \mathbb{R} / 2 \pi \mathbb{Z}.

In this case, square root cancellation would imply the fractal dimension is exactly {7/4}.

In our paper, we study a variety of dispersive PDE, for instance the Airy, KDV, non–linear Schrödinger, free water boundary, Boussinesq, and gravity–capillary wave equation. To handle non–linear equations, we use smoothing estimates, which roughly state that the solution to a nonlinear PDE is the linear part plus a smoother perturbation.

Gaps in prime-like sets

This post was inspired by a question of Trevor Wooley after a talk of James Maynard at MSRI. He asked what was known for lower bounds of large gaps between integers that have a representation as the sum of two squares. We assume some familiarity with the large gaps between primes problem.

A theorem of Fermat asserts that an integer, n is represented by the sum of two squares if and only if all the prime factors of n equivalent to 3 modulo 4 appear with an even power.

For technical simplicity, we consider the set of integers that simply have no prime factors equivalent to 3 modulo 4, but one may extend the considerations below to the original question.

The size of such a set is proportional to x / \log^{1/2} x. Thus the average gap is \asymp \log^{1/2} x and so the average gap is at least this big.

Surprisingly, there is an easy argument that yields a gap of size \gg \log x. We sketch it here. For each prime p \leq \log x equivalent to 3 modulo 4, choose a residue class a_p \pmod p. Using a greedy algorithm to choose the a_p, we may sieve out the integers in \{1 , \ldots , y\} for y \ll \log x. By the Chinese remainder theorem, there is a t \leq x such that t = - a_p \pmod p for all the relevant p. Thus t+1 , \ldots , t+y all contain a prime factor equivalent to 3 modulo 4, as desired.

What should one expect? Adopting the Cramer model, we expect that there should be gaps of size at least \asymp \log^{3/2 - \epsilon} x. We give a rough sketch as to why one should expect this. Let S = \{ n \leq x : p|n \implies p \neq 3 \pmod 4\}. We suppose that an integer is chosen random with probability q = 1 / \log^{1/2} x. Then the probability that n+1 , \ldots , n_h \notin S is (1- q)^h which is well approximated by e^{-qh}. If h \asymp (\log x) / q, then this is around the number of choices for n. One can see the Cramer model worked out more carefully here.

Fix q and a coprime to q. Now we consider the set S = \{n \leq x : p|n \implies p \equiv a \mod q\}. The Cramer model, as above, suggests that there should be gaps between elements of S that are \leq x of size \log^{1 + 1/\phi (q) - \epsilon } x. Nevertheless we may utilize the trivial bound above to obtain gaps of size \log x. Thus, for these S, we can come arbitrarily close (i.e. an arbitrarily small power of log) to the lower bounds associated to Cramer’s conjecture.

Note for primes, we are nowhere near Cramer’s conjecture. Ford, Green, Konyagin, Maynard, and Tao have the best lower bound. Things look worse for upper bounds, as no one knows how to improve upon a power of x for the primes or any of the sets mentioned in this post (even if one assumes the Riemann hypothesis).

An introduction to Vinogradov’s mean value theorem

VMT-Intro: pdf file containing the post

A link to a talk I gave on the topic. The talk is much more elementary than the blog post.


Vinogradov’s mean value theorem (recently proved by Bourgain, Demeter and Guth) is a beautiful problem in the intersection of number theory and harmonic analysis.

As a problem in number theory, the mean value theorem requests to show most of the solutions to a certain system of Diophantine equations of the form f_j(\vec{x}) = f_j(\vec{y}) for some integral polynomials f_1 , \ldots , f_k are of the form \vec{x} = \vec{y}. As a problem in harmonic analysis, the mean value theorem requests for an upper bound for the L^p norm of a certain function. These two problems turn out to be equivalent, as explained in the pdf linked at the top of the post, thanks to Fourier analytic identities.

One goal in the above pdf is to understand the nature of the so called “critical exponent.” Interpolation reveals that Vinogradov’s mean value theorem follows from an L^{k(k+1)} bound of a certain function. While a first step to understanding this critical exponent is interpolation, consideration of the major arcs gives proper insight into why k(k+1) appears.

In the final section, I attempt to explain how the mean value theorem can be interpreted as a stronger form of orthogonality of certain complex exponentials. For a vector n \in \mathbb{Z}^k, we define e_n : \mathbb{R}^k / \mathbb{Z}^k \to \mathbb{C} via e_n(\alpha) : = e^{2 \pi i n \cdot \alpha}. Then Vinogradov’s mean value theorem can be interpreted as showing \{e_{(j , \ldots , j^k)}\}_{1 \leq j \leq N} are stronger than orthogonal (k \geq 2, not true for k =1). We make this somewhat precise in the above pdf, adopting a probabilistic perspective.

I’d like to thank Chris Gartland, a fellow graduate student, for helping me formulate the ideas in the last section. For instance, it was his idea to utilize equation 5.

I’d also like to thank Junxian Li, a fellow graduate student in number theory, for very useful discussions regarding the major arcs.

Lastly, anyone at UIUC interested in hearing more about Vinogradov’s mean value theorem (for instance Bourgain, Demeter and Guth’s recent result or classical number theoretic methods), please get in touch with me. My email can be found here, or you can visit my office in Altgeld 169.