Category Archives: Number Theory

Spacing of Quadratic Fractions

The following is a joint blog post with Kyle Pratt, a fellow graduate student at UIUC.

Recall a theorem of Fermat, which asserts that a prime {p} is {\equiv 1 \pmod{4}} if and only if it is the sum of two squares. From Dirichlet’s theorem on primes in arithmetic progressions we conclude there are infinitely primes that can be written as the sum of two squares:

\displaystyle p = m^2+\ell^2 \ \ \ \ \ (1)

Fouvry and Iwaniec showed this can be strengthened in a strong way, by showing there are infinitely many primes of the form in (1) with {\ell} prime.

Kyle and I, mostly to understand Fouvry and Iwaniec’s proof better, considered (a couple of years ago) the analogous question with

\displaystyle \ell^2 - N m^2 , \ \ \ N > 0 , \ \ \ \ \ (2)

replacing {\ell^2 + m^2}. We were lead to a rather interesting spacing result of fractions of the form

\displaystyle \frac{v}{d}, \ \ \ \ \ v^2 - N \equiv 0 \, (\text{mod } d), \ \ \ \ D \leq d \leq 2D,

which we explain below in the fold, using some basic theory of quadratic forms as well as a Brook’s theorem from graph theory.

We (very) briefly recall the structure of the Fouvry-Iwaniec argument. They look to asymptotically estimate

\displaystyle \sum_{\ell^2 + m^2 \leq x} \Lambda(\ell) \Lambda(\ell^2 + m^2) ,\ \ \ \ \ (3)

where {\Lambda} is the Von Mongoldt function. They use Vaughan’s identity on {\Lambda(\ell^2 + m^2)} to decompose (3) into Type I and Type II sums for the sequence

\displaystyle a_n = \sum_{\ell^2 + m^2 =n} \Lambda(\ell).

Great complications from the type II sums ensue, but the fundamental idea is to utilize the quadratic form appearing in (3) to factor

\displaystyle a_{mn} = \sum_{a^2 + b^2 = m} \sum_{c^2 + d^2 = n} b_{m,n}

where {b_{m,n}} encodes a rather strange looking condition that is eventually managable. In essence, this follows from the Brahmagupta-Fibonacci identity of the form

\displaystyle (a^2+b^2)(c^2 + d^2) = (ad-bc)^2 + (ac+bd)^2.

Properties of Gaussian integers underly their analysis.

In this post we are more concerned with the Type I estimates. We look for a bound of the form

\displaystyle \sum_{d \leq x^{\theta}} \Bigg| \sum_{\substack{n \leq x \\ d | n}} a_n - \frac{1}{d} \sum_{n\leq x} a_n \Bigg| \ll x^{1 - \epsilon}. \ \ \ \ \ (4)

The largest {\theta} for which (4) holds is the so-called level of distribution. In the analogous Bombieri-Vinogradov theorem, one can take any {\theta<1/2} (thought only saving an arbitrary power of log), while it turns out here we can take the much stronger {\theta <1} here. This is of the same strength as the unproven Elliot-Halberstam conjecture, as we are allowed to take advantage of the structure of the quadratic form {\ell^2 + m^2}. In the end, the Bombieri-Vinogradov theorem relies on the spacing of distinct Farey fractions

\displaystyle \left|\frac{a}{q} - \frac{a'}{q'}\right| \geq \frac{1}{qq'}, \ \ \ (a,q) = (a',q') = 1.

In a similar manner, a spacing result for the fractions

\displaystyle \{\frac{v}{d} : D \leq d \leq 2D , \ v^2 + 1 \equiv 0 \ {\rm mod} \ d\},

for a fixed large {D}, can be used to provide a bound for (4).

Thus in our search for primes of the form in (2), we are lead to study the spacing of fractions of the form

\displaystyle \Big\{\frac{v}{d} : D \leq d \leq 2D , \ v^2 - N \equiv 0 \, (\text{mod }d)\Big\}, \ \ \ \ \ (5)

where {N} and {D} are fixed ({D} large). For simplicity we take {N} to be a discriminant of a quadratic number field, though this is not a significant assumption. The main spacing result looks as follows.

{ \ }

Proposition 1: The fractions in (5) can be partitioned into sets {F_1 , \ldots , F_t} such that {t = O_N(1)}, and for any distinct {x,y \in F_j} we have

\displaystyle \| x - y \| \gg \frac{1}{D}. \ \ \ \spadesuit

{ \ }

We remark this estimate is best possible up to logarithms. Proposition 1 combined with some reductions and the large sieve can be used to establish (4).

{ \ }

Proof of Proposition 1: Suppose {v^2 - N \equiv 0 \ (\text{mod }d)}. Then we have

\displaystyle d = ar^2 + 2brs + cs^2,\ \ \ \ \ (6)

for some {a,b,c,r,s\in \mathbb{Z}}. Note we can choose the tuples {(a,b,c)} to come from the {h(N)} inequivalent quadratic forms of discriminant {N}. One can check that after multiplication by units in the ring of integers of {\mathbb{Q}(\sqrt{N})}, we may assume {r \asymp_N s}. It follows that {r , s \asymp_N D^{1/2}}

We have, as in this paper of Hooley (page 106), that

\displaystyle \frac{v}{d} = \frac{\overline{r}}{s} - \frac{ar+bs}{sd}, \ \ \ r\overline{r} \equiv 1 \, (\text{mod }s).\ \ \ \ \ (7)


\displaystyle \Big|\frac{\overline{r}}{s} - \frac{\overline{r'}}{s'}\Big| =\Big| \frac{r s' - r' s}{ss'}\Big| \gg \frac{1}{D}, \ \ \ r s' - r' s \neq 0. \ \ \ \ \ (8)

Thus we want to show {\frac{ar+bs}{sd}} is much smaller than {1/D}. We have the trivial bound

\displaystyle \frac{ar+bs}{sd} \ll_N 1/D.\ \ \ \ \ (9)

This is not good enough to ensure a spacing of {1/D} as the error term can be bigger than the main term by a multiplicative constant. We cannot even ensure such a spacing for all the fractions in (5), but what we are able to do is partition the fractions into sets

\displaystyle F_1, \ldots , F_t , \ \ \ t = O_N(1),

such that we can improve (8) for the fractions that lie inside each set. Indeed for each {a,b,c, r,s} there is a unique {v} and {d} that satisfy (6) and (7). Thus there are at most {O_N(1)} choices of {v,d} for a fixed {r,s}.

{ \ }

Lemma 1: Suppose {(x,y) = 1} and {x,y \asymp_N D^{1/2}}. Then there are at most {O(k)} choices of coprime {x',y' \asymp D^{1/2}} so that {xy' - x'y = k}. {\spadesuit}

{ \ }

Proof of Lemma 1: Note that, by coprimality, {k=0} is impossible. We have {x' = x_0 + xt}, {y' = y_0 + yt}. By size considerations, there are {O(1)} choices for {t} and thus {x'} and {y'}. {\spadesuit}

{ \ }

We choose {K = K(N)} to be a sufficiently large constant. By Lemma 1, for a fixed fraction {v/d} from (5), there are at most {O_N(1)} choices {v',d'} so that

\displaystyle \Big|\frac{v}{d} - \frac{v'}{d'}\Big| \leq \frac{1}{D}.

We create a graph {G} with vertex set from (5) and {v/d \sim v'/d'} if {|r s' - r' s| \leq K}. Thus {G} has maximum degree {O_N(1)}. By Brooks’ theorem, we may find a coloring of {G} with {O_N(1)} color classes {F_1, \ldots , F_t}.

Inside a color class, we improve the lower bound in (8) to {K/D} while (9) remains the same and so {\|v/d - v'/d'\| \gg_N 1/D}, as desired. {\spadesuit}

Breaking the Square root Barrier via the Burgess Bound

Thanks to Kyle Pratt for enlightening discussions leading to this post.

Recall the Burgess bound which states for any interval {I \subset \mathbb{F}_p} of size at least {p^{1/4 + \epsilon}}, one has

\displaystyle \left| \sum_{x \in I} \chi(x) \right| \ll \frac{|I|}{p^{\delta}}, \ \ \ \ |I| \geq p^{1/4+ \epsilon}\ \ \ \ \ (1)

where {\chi} is the Legendre symbol modulo {p}.

Conjecture 1: For any {\alpha > 0}, (1) holds for intervals, {I}, of size {p^{\alpha}}. { \ \ \ \spadesuit}

Conjecture 1 is one way quantify that {\chi} behaves like a random sign pattern. Indeed almost surely, (1) holds for a random {\pm 1}–valued function from {\mathbb{F}_p}.

Character sums are central objects of study in analytic number theory and such a result would have wide applications. For instance, it would solve the Vinogradov least square problem. It would also have implications to sub–convexity bounds of {L}–fucntions, using

\displaystyle L(1/2 , \chi) = \sum_{n \leq \sqrt{p}} \chi( n) / \sqrt{n} + O(\log p).

For instance, summation by parts and (1) gives the sub–convexity bound

\displaystyle L(1/2 , \chi) \ll p^{1/4 - \delta} .

The key obstacle is the famous square root barrier. In {N} coin flips, one can change {\sqrt{N}} of the outcomes without affecting the global statistics, for instance in the central limit theorem. Thus when applying global statistics to problems in short intervals, one naturally arise at a square root barrier for the size of the interval.

To this end, it is no surprise that the Polya–Vinogradov inequality, that is conjecture 1 with {\alpha = 1/2 + \epsilon}, was proved about forty years before the Burgess bound. The Polya–Vinogradov theorem can be proved using only the global statistics of {\chi}, for instance for all {\xi \in \mathbb{F}_p},

\displaystyle \left| \sum_{x \in \mathbb{F}_p} \chi(x) e_p(\xi x) \right| \leq \sqrt{p} , \ \ \ e_p(\theta) : = e^{2 \pi i \theta / p }.

On the other hand, the Burgess bound breaks the square root barrier, so one needs additional ideas, which we now discuss. We first use {I+x \approx I} for small {x}. Then we use the multiplicativity of {\chi} to amplify an interval of size {p^{1/4 + \epsilon}} to a set of size {p^{1/2 + \epsilon}}. Finally, we use completion and apply a global statistic of {\chi} (Hasse-Weil bound). Without the amplification step, this method would not work. Let us sketch the proof.

By the near translation invariance of {I}, one has

\displaystyle \sum_{x \in I} \chi(x) =\sum_{x \in I} \chi(x + yt) + O(p^{1/4 + \epsilon / 2}),

for all {y \in J = \{1 , \ldots , \lfloor p^{1/4} \rfloor \}} and {t \in T = \{1 , \ldots ,\lfloor p^{\epsilon / 2} \rfloor \}}. But by the multiplicativity of {\chi},

\displaystyle \left| \sum_{y \in J ,t \in T} \sum_{x \in I} \chi(x + yt) \right| \leq \sum_{x \in I , y \in J} \left| \sum_{t \in T} \chi(y^{-1} x + t) \right| .\ \ \ \ \ (2)

Now the number of ways to write an integer as {xy^{-1}} is roughly one on average on a set of size {p^{1/2 + \epsilon}}, so (2) is approximately

\displaystyle \sum_{u \in S} \left| \sum_{t \in T} \chi(u+ t) \right| ,

for some {S} of size {p^{1/2 + \epsilon}}. Since {S} is so large, we can now efficiently complete the sum after an application of H\”{o}lder’s inequality to arrive at

\displaystyle \sum_{u \in \mathbb{F}_p} \left| \sum_{t \in T} \chi(u+ t) \right|^{2r} .

This may be bounded by the following global statistic. We remark in our case, there is a short proof using the polynomial method (the proof in the note is only for elliptic curves, but is easily generalized to our situation).

Lemma 1: Let {f} be a polynomial which is not a multiple of a perfect square. Then

\displaystyle \sum_{x \in \mathbb{F}_p} \chi (f(x)) = O_{{\rm deg}(f)} (\sqrt{p}).

In particular,

\displaystyle \sum_{u \in \mathbb{F}_p} \left| \sum_{t \in T} \chi(u+ t) \right|^{2r} \ll_r p^{1/2} |T|^{2r} + p |T|^r. \ \ \spadesuit

Putting this altogether, as done beautifully in Chang’s survey one arrives at the Burgess bound of the form (1), for {\delta = \epsilon^2 / 4}.

Another successful instance of breaking the square root barrier is in this paper of Hanson. One result he proved was for any {\delta > 0},

\displaystyle \sum_{a \in A , b \in B , c \in C} \chi(a + b + c) = o_{\delta} (|A||B||C|),

for arbitrary {A, B , C} of size at least {\delta \sqrt{p}}. He takes advantage of the additive structure of the problem as was pioneered in this paper of Chang using Freiman’s theorem from additive combinatorics. This has the same flavor as the Burgess bound in (1) as one replaces the very additively structured interval {I} with a sumset. The main problem along these lines is the following

Conjecture 2: For any {A, B} of size at least {p^{\alpha}}, there is a {\delta >0} such that

\displaystyle \sum_{a \in A , b \in B} \chi (a + b) \ll |A| |B| p^{-\delta}. \ \ \spadesuit.

This would have an application to sparse signal recovery. One can see Bourgain’s work and the references within for the dual problem, where multiplicative character is replaced by additive character and sumset is replaced by product set.

Granville and Soundararajan (Theorem 2) building upon work of Montgomery and Vaughan showed that by assuming GRH, one has

\displaystyle \sum_{n \leq x} \chi (n) \sim \sum_{n \leq x} \chi(n) \Psi_y(n) , \ \ \ y \geq \log^5 q ,

where {\Psi_y} is the characteristic function of {y}-smooth numbers. Thus the sum in (1) (with {I = [1 , \ldots , x]}) is determined solely by the values of {\chi} at rather small primes. Using smooth number estimates, this implies Conjecture 1 in the case {I = [1, \ldots , x]}.


Fractal solutions of dispersive partial differential equations on the torus

Burak Erdoğan and I just put “Fractal solutions of dispersive partial differential equations on the torus” on the arXiv. We study cancellation in exponential sums and how this leads to bounds for the fractal dimension of solutions to certain PDE, the ultimate “square root cancellation” implying exact knowledge of the dimension.

In this post, we consider the case of Schrödinger’s equation with initial data {\chi := \chi_{[0 , \pi]}} for simplicity. Here,

\displaystyle if_t + f_{xx} = 0 , \ \ t \in \mathbb{R}, \ \ x \in \mathbb{R} / 2 \pi \mathbb{Z} \ \ , \ \ f(0,x) = \chi(x) ,

and the solution is given by

\displaystyle f(t,x) = \sum_{n \in \mathbb{Z}} \widehat{\chi}(n) e^{i t n^2 + i n x}.

Note the solution is periodic in both {x} and {t}.

For a line {\mathcal{L} \subset \left( \mathbb{R} / 2 \pi \mathbb{Z} \right)^2}, we are interested in the fractal dimension of the real and imaginary parts of the graph of {f |_{\mathcal{L}} (t,x)}. This dimension must lie in the interval {[1,2]}.

An important case to consider is {t = a/q}, which leads to so–called quantization and is related to the Talbot effect. In this case, it was shown by Berry and Klein that

\displaystyle f(t,x) = \sum_{j=0}^{q-1} c_j \chi_{[0 , 1/q] + j/q} (x), \ \ \ \ \ (1)

for some {c_j \in \mathbb{C}} that are Gauss sums. Thus {f} is a linear combination of at most {q} intervals and has fractal dimension 1. See page 4 of this paper of Chen and Olver for some pictures of this phenomenon.

The story is entirely different for irrational {t}. To see why, observe for a sequence {\frac{a_n}{q_n} \rightarrow t}, the functions {f(\frac{a_n}{q_n} , x)} as given in (1) increase in complexity as {q_n} increases.

To make this precise, we use the theory of Besov spaces and show that bounds for the {L^p} norm of

\displaystyle H_N(t,x) : = \sum_{N \leq n < 2N} e^{itn^2 + i nx}, \ \ (t,x) \in \mathcal{L},

imply fractal dimension bounds for the real and imaginary parts of {f_{\mathcal{L}}(t,x)}.

Remark: The starting point of the theory of of Besov spaces that we use can be illustrated by the following basic facts from analysis:

  • The graph of a real valued {C^{\gamma}} function has fractal dimension {\leq 2 - \gamma},
  • For {g \in C^{\gamma}}, one has {\widehat{g}(n) \ll n^{-\gamma}}.

Thus we see that both fractal dimension and Fourier decay are related to the smoothness of {g}. In our situation, we have more information than that of the conclusion of the second bullet, that is we know that

\displaystyle \sum_{N \leq n < 2N} \widehat{g}(n),

exhibits cancellation and so we are able to use more involved theory to roughly reverse the implication in the second bullet as well as provide lower bounds. ♠

Let us first consider the case of horizontal lines in space–time, that is {t} fixed. Then by orthogonality, Weyl’s inequality, and the divisor bound, one has for almost every {t},

\displaystyle ||H_N||_{L_x^2} = N^{1/2} , \ ||H_N||_{L_x^{\infty}} \ll N^{1/2 + \epsilon}, \ ||H_N||_{L_x^4} \ll N^{1/2 + \epsilon}.

By a previous blog post, this implies

\displaystyle ||H_N||_{L_x^1} \gg N^{1/2 - \epsilon}.

One can use this to show that the fractal dimension of {f_{\mathcal{L}}(t,x)} is equal to {3/2} for almost every horizontal line, recovering a theorem of Rodnianski. In one part of our paper, we adapt the above strategy to show the following.

Theorem (oblique): The fractal dimension of the real and imaginary parts of the solution to (1) restricted to almost every

\displaystyle \mathcal{L} = \{(t,x) : a t + bx = c\},

is in the interval

\displaystyle [7/4 , 19/10]. ♠

Note that the lower bound 7/4 is bigger than the 3/2 for horizontal lines. To show the upper bound in Theorem (oblique), we study exponential sums such as

\displaystyle \sup_x |\sum_{n \leq N} e^{i(c+x)n^2 + i nx}| , \ \ c \in \mathbb{R} / 2 \pi \mathbb{Z}.

In this case, square root cancellation would imply the fractal dimension is exactly {7/4}.

In our paper, we study a variety of dispersive PDE, for instance the Airy, KDV, non–linear Schrödinger, free water boundary, Boussinesq, and gravity–capillary wave equation. To handle non–linear equations, we use smoothing estimates, which roughly state that the solution to a nonlinear PDE is the linear part plus a smoother perturbation.

Gaps in prime-like sets

This post was inspired by a question of Trevor Wooley after a talk of James Maynard at MSRI. He asked what was known for lower bounds of large gaps between integers that have a representation as the sum of two squares. We assume some familiarity with the large gaps between primes problem.

A theorem of Fermat asserts that an integer, n is represented by the sum of two squares if and only if all the prime factors of n equivalent to 3 modulo 4 appear with an even power.

For technical simplicity, we consider the set of integers that simply have no prime factors equivalent to 3 modulo 4, but one may extend the considerations below to the original question.

The size of such a set is proportional to x / \log^{1/2} x. Thus the average gap is \asymp \log^{1/2} x and so the average gap is at least this big.

Surprisingly, there is an easy argument that yields a gap of size \gg \log x. We sketch it here. For each prime p \leq \log x equivalent to 3 modulo 4, choose a residue class a_p \pmod p. Using a greedy algorithm to choose the a_p, we may sieve out the integers in \{1 , \ldots , y\} for y \ll \log x. By the Chinese remainder theorem, there is a t \leq x such that t = - a_p \pmod p for all the relevant p. Thus t+1 , \ldots , t+y all contain a prime factor equivalent to 3 modulo 4, as desired.

What should one expect? Adopting the Cramer model, we expect that there should be gaps of size at least \asymp \log^{3/2 - \epsilon} x. We give a rough sketch as to why one should expect this. Let S = \{ n \leq x : p|n \implies p \neq 3 \pmod 4\}. We suppose that an integer is chosen random with probability q = 1 / \log^{1/2} x. Then the probability that n+1 , \ldots , n_h \notin S is (1- q)^h which is well approximated by e^{-qh}. If h \asymp (\log x) / q, then this is around the number of choices for n. One can see the Cramer model worked out more carefully here.

Fix q and a coprime to q. Now we consider the set S = \{n \leq x : p|n \implies p \equiv a \mod q\}. The Cramer model, as above, suggests that there should be gaps between elements of S that are \leq x of size \log^{1 + 1/\phi (q) - \epsilon } x. Nevertheless we may utilize the trivial bound above to obtain gaps of size \log x. Thus, for these S, we can come arbitrarily close (i.e. an arbitrarily small power of log) to the lower bounds associated to Cramer’s conjecture.

Note for primes, we are nowhere near Cramer’s conjecture. Ford, Green, Konyagin, Maynard, and Tao have the best lower bound. Things look worse for upper bounds, as no one knows how to improve upon a power of x for the primes or any of the sets mentioned in this post (even if one assumes the Riemann hypothesis).

An introduction to Vinogradov’s mean value theorem

VMT-Intro: pdf file containing the post

A link to a talk I gave on the topic. The talk is much more elementary than the blog post.


Vinogradov’s mean value theorem (recently proved by Bourgain, Demeter and Guth) is a beautiful problem in the intersection of number theory and harmonic analysis.

As a problem in number theory, the mean value theorem requests to show most of the solutions to a certain system of Diophantine equations of the form f_j(\vec{x}) = f_j(\vec{y}) for some integral polynomials f_1 , \ldots , f_k are of the form \vec{x} = \vec{y}. As a problem in harmonic analysis, the mean value theorem requests for an upper bound for the L^p norm of a certain function. These two problems turn out to be equivalent, as explained in the pdf linked at the top of the post, thanks to Fourier analytic identities.

One goal in the above pdf is to understand the nature of the so called “critical exponent.” Interpolation reveals that Vinogradov’s mean value theorem follows from an L^{k(k+1)} bound of a certain function. While a first step to understanding this critical exponent is interpolation, consideration of the major arcs gives proper insight into why k(k+1) appears.

In the final section, I attempt to explain how the mean value theorem can be interpreted as a stronger form of orthogonality of certain complex exponentials. For a vector n \in \mathbb{Z}^k, we define e_n : \mathbb{R}^k / \mathbb{Z}^k \to \mathbb{C} via e_n(\alpha) : = e^{2 \pi i n \cdot \alpha}. Then Vinogradov’s mean value theorem can be interpreted as showing \{e_{(j , \ldots , j^k)}\}_{1 \leq j \leq N} are stronger than orthogonal (k \geq 2, not true for k =1). We make this somewhat precise in the above pdf, adopting a probabilistic perspective.

I’d like to thank Chris Gartland, a fellow graduate student, for helping me formulate the ideas in the last section. For instance, it was his idea to utilize equation 5.

I’d also like to thank Junxian Li, a fellow graduate student in number theory, for very useful discussions regarding the major arcs.

Lastly, anyone at UIUC interested in hearing more about Vinogradov’s mean value theorem (for instance Bourgain, Demeter and Guth’s recent result or classical number theoretic methods), please get in touch with me. My email can be found here, or you can visit my office in Altgeld 169.