I started a YouTube Channel

My First Video

I am happy to announce that I just posted my first video to a new YouTube Channel. The result is an interview with Xavier Garcia about how Chat GPT works.

What’s Next?

I’ll continue to post videos surrounding Machine Learning, Data Science, and perhaps elsewhere. I plan to do more interviews, individual videos about relevant topics, as well as some educational material.

The Process

I was already meeting with Xavier regularly to discuss Machine Learning topics, and so I thought to myself that other people might benefit from our discussions.

Our discussion for the video was perhaps a bit different from usual, as we were a bit less likely to say things “off the cuff.” I’ll be looking to make this easier going forward, as I think a lot of valuable insight lies here.

There was/is a learning curve in making videos, and I’ll definitely be looking to improve the quality as time goes forward. From zero to published video took a couple months (of course not doing this full time). Also, I got lucky and found a very helpful video editor which made things a lot easier. Things like sound quality, video quality, and lighting are not things I’ve had to worry about too much in the past. I didn’t invest too much in equipment, just trying to make use of what I had.

Can Chat-GPT Do Math?

Chat-GPT is an impressive new AI chatbot released by OpenAI. Impressive applications of it can be found all over the internet. But can it do math?

By math, we do not mean simply perform computations. Its own design ensures that there will be computational problems it will be unable to solve. What I am more interested in is if it can solve problems that require some mathematical reasoning.

To choose our problems, we use the MMLU dataset. Galactica, a large language model recently released by Meta AI, has achieved some good results on this dataset. Their findings are in section 5.3 of their paper.

In what follows, we’ll pick a few problems and see how Chat-GPT handles them. If it fails, we will try to reword the question in hopes of achieving some partial results.

We note that the MMLU dataset has multiple choice questions, but we just present the questions to chat-GPT. We only choose questions from the abstract algebra section of the MMLU. Afterwards, we’ll prod chat-GPT for intuition and a proof. We’ll end by suggesting a few applications of chat-GPT for mathematicians.

Question 1

Me: The inverse of -i in the multiplicative group, {1, -1, i , -i } is

Chat-GPT: The inverse of -i in the multiplicative group {1, -1, i, -i} is i. The inverse of an element x in a group is an element y such that xy=yx=1, where 1 is the identity element of the group. In this group, 1 is the identity element, so the inverse of an element x is an element y such that x*y=1.

For example, the inverse of i is -i, because i*(-i)=1. Similarly, the inverse of -i is i, because (-i)*i=1.

Remark: Chat-GPT answers this question perfectly and gives an excellent explanation. To be fair, this is an easier question.
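As a quick sanity check of the answer, here is a minimal Python sketch that models the group with Python's built-in complex numbers. The helper name `inverse` is my own choice for illustration.

```python
# The group {1, -1, i, -i} under multiplication, using Python complex numbers.
group = [1, -1, 1j, -1j]

def inverse(x, elements):
    """Return the y in `elements` with x * y == 1 (the group identity)."""
    for y in elements:
        if x * y == 1:
            return y
    raise ValueError("no inverse found in the given elements")

inverse_of_minus_i = inverse(-1j, group)
```

Here `inverse_of_minus_i` comes out as `1j`, matching the answer above.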

Question 2

Me: How many homomorphisms are there of Z into Z_2? Please write in latex format.

Chat-GPT: There are two homomorphisms of \mathbb{Z} into \mathbb{Z}_2. They are defined as follows:

f_1:\mathbb{Z}\rightarrow\mathbb{Z}_2 such that f_1(n)=n\pmod{2} for all n\in\mathbb{Z}

f_2:\mathbb{Z}\rightarrow\mathbb{Z}_2 such that f_2(n)=0 for all n\in\mathbb{Z}

Remark: Again, Chat-GPT is spot on. However, I asked this question a few times and it once said 1, and another time said two but gave a false justification:

There are only two homomorphisms of Z (the integers) into Z2 (the integers modulo 2).

One homomorphism sends every integer to 0, and the other homomorphism sends every integer to 1.
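To see why the second justification is false, one can test the candidate maps directly. The sketch below checks the homomorphism property f(a+b) = f(a) + f(b) mod 2 on a finite range of integers only, so it is a sanity check rather than a proof. The names `is_homomorphism` and `candidates` are assumptions made for this example.

```python
def is_homomorphism(f, test_range=range(-10, 11)):
    """Check f(a + b) == f(a) + f(b) (mod 2) for all a, b in a finite test range."""
    return all(
        f(a + b) == (f(a) + f(b)) % 2
        for a in test_range
        for b in test_range
    )

candidates = {
    "n mod 2": lambda n: n % 2,      # reduction mod 2
    "constant 0": lambda n: 0,       # the trivial map
    "constant 1": lambda n: 1,       # the map from the false justification
}

results = {name: is_homomorphism(f) for name, f in candidates.items()}
```

The constant map to 1 fails already at a = b = 0, since it would need 1 = 1 + 1 = 0 in Z_2.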

Question 3

Me: Let p = (1, 2, 5, 4)(2, 3) in S_5 . Find the index of <p> in S_5. Please write your answer in latex.

Chat-GPT:

The index of <p> in S_5 is the number of elements in S_5 that come before <p> in lexicographic order.

To find the index of <p>, we must first determine the cycle type of p. The cycle type of a permutation is a list of the lengths of its disjoint cycles, listed in decreasing order.

The cycle type of p is [2,2], since p consists of a 2-cycle and a 2-cycle.

Now we can use the cycle index of S_5 to determine the number of permutations in S_5 with cycle type [2,2].

The cycle index of S_5 is given by:

Z(S_5) = \frac{1}{5!} \sum_{i=1}^{5} a_i p_i^{5}

where a_i is the number of permutations in S_5 with i cycles, and p_i is a variable representing a cycle of length i.

Substituting the cycle type [2,2] into the cycle index gives:

\frac{1}{5!} (p_2^2 + p_2^2)

This simplifies to:

\frac{1}{120} (p_2^2 + p_2^2) = \frac{1}{60} (2p_2^2) = \frac{1}{30} p_2^2

The coefficient of p_2^2 in this expression is \frac{1}{30}, so the number of permutations in S_5 with cycle type [2,2] is \frac{1}{30}.

Since <p> is the only permutation in S_5 with cycle type [2,2], it is the \frac{1}{30}-th permutation in S_5.

Therefore, the index of <p> in S_5 is \boxed{\frac{1}{30}}.

Remark: Woah! This is very wrong! The very first sentence is completely wrong. Also, its final conclusion, 1/30, is not even an integer, which is certainly not right. Indeed, the index of a subgroup counts something and must be a positive integer.

We are able to salvage something though. Suppose we computed that the order of <p> is 5, using our algebra skills. Then we can prompt chat-GPT with the following.

Me: Now suppose H \leq S_5 is a subgroup of order 5. What is the index of H in S_5?

Chat-GPT: The index of a subgroup H in a group G is the number of left cosets of H in G. This is also equal to the number of right cosets of H in G.

Since the order of a subgroup divides the order of the group, the order of H must divide the order of S_5, which is 120. The only divisors of 120 that are less than 120 are 1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 20, 24, 30, 40, and 120. Therefore, the only possible orders for H are 1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 20, 24, 30, 40, and 120.

Since the order of H is 5, the index of H in S_5 is \frac{120}{5} = \boxed{24}.

Remark: Here chat-GPT reaches the correct answer with a good explanation, although its list of divisors of 120 is off (it omits 60 and includes 120 itself). Perhaps it would be nice to mention that the main fact being used is Lagrange’s theorem, though this can easily be obtained by asking it “What theorem did you use above?”
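The computation we did by hand (the order of p is 5, so the index is 120/5 = 24) can be double checked with a few lines of Python. This is only a sketch: permutations are stored as dictionaries, and the helper names `cycle_to_map`, `compose`, and `order` are my own. For this particular p, composing the two cycles in either order gives a 5-cycle.

```python
from math import factorial

def cycle_to_map(cycle, n):
    """Turn a cycle such as (1, 2, 5, 4) into a dict on {1, ..., n}."""
    mapping = {k: k for k in range(1, n + 1)}
    for idx, k in enumerate(cycle):
        mapping[k] = cycle[(idx + 1) % len(cycle)]
    return mapping

def compose(f, g):
    """The permutation x -> f(g(x)), i.e. apply g first, then f."""
    return {x: f[g[x]] for x in g}

def order(perm):
    """Smallest k >= 1 such that perm^k is the identity."""
    identity = {x: x for x in perm}
    power, k = dict(perm), 1
    while power != identity:
        power = compose(perm, power)
        k += 1
    return k

n = 5
p = compose(cycle_to_map((1, 2, 5, 4), n), cycle_to_map((2, 3), n))
index = factorial(n) // order(p)  # Lagrange: |S_5| / |<p>|
```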

But what’s the intuition?

What makes a research mathematician great is a certain “intuition” for problems. Let’s see if chat-GPT can explain any of that.

Me: Can you explain the intuition behind the Cauchy-Schwarz inequality? Please write any math formula in latex.

Chat-GPT: Sure! The Cauchy-Schwarz inequality is a useful inequality that arises in various areas of mathematics, such as vector algebra, inner product spaces, and probability theory. It states that for any two vectors x and y, the following inequality holds:

\left(\sum_{i=1}^n x_i y_i\right)^2 \le \left(\sum_{i=1}^n x_i^2\right)\left(\sum_{i=1}^n y_i^2\right)

Intuitively, the Cauchy-Schwarz inequality states that the inner product of two vectors x and y is “controlled” by the lengths of the vectors x and y. To see this, consider the special case where x and y are unit vectors (i.e., vectors of length 1). In this case, the inequality becomes:

\left(\sum_{i=1}^n x_i y_i\right)^2 \le \left(\sum_{i=1}^n x_i^2\right)\left(\sum_{i=1}^n y_i^2\right) = 1 \cdot 1 = 1

which means that the inner product of x and y is always less than or equal to 1. This is consistent with our intuition that the inner product of two unit vectors should be “small” if they are “orthogonal” (i.e., perpendicular) and “large” if they are “parallel” (i.e., pointing in the same direction).

I hope this helps! Let me know if you have any questions.

Remark: This is not too bad, but could certainly be improved. For instance, explaining when equality holds in Cauchy-Schwarz (when one vector is a multiple of the other) helps give a more precise meaning to the last paragraph. One can see the first few paragraphs of this blog post of Terence Tao for an explanation from one of the world’s leading mathematicians. Also, here is a math stack exchange on the topic.
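To make the equality case concrete, here is a small numerical sketch. It uses integer vectors so that all arithmetic is exact. The function name `cs_gap` is an assumption for this example: it returns the right-hand side of Cauchy-Schwarz minus the left-hand side.

```python
import random

def cs_gap(x, y):
    """(sum x_i^2)(sum y_i^2) - (sum x_i y_i)^2, which Cauchy-Schwarz says is >= 0."""
    dot = sum(a * b for a, b in zip(x, y))
    return sum(a * a for a in x) * sum(b * b for b in y) - dot * dot

random.seed(0)
gaps = [
    cs_gap(
        [random.randint(-9, 9) for _ in range(4)],
        [random.randint(-9, 9) for _ in range(4)],
    )
    for _ in range(1000)
]

# Equality case: y is a multiple of x, so the gap vanishes.
x = [1, -2, 3, 4]
parallel_gap = cs_gap(x, [3 * a for a in x])
```

The gap is never negative on the random samples, and it is exactly zero when one vector is a multiple of the other.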

Prove it.

Me: Can you prove it?

Chat-GPT: Sure! Here is a proof of the Cauchy-Schwarz inequality:

First, we can rewrite the inequality as:

\left(\sum_{i=1}^n x_i y_i\right)^2 - \left(\sum_{i=1}^n x_i^2\right)\left(\sum_{i=1}^n y_i^2\right) \le 0

Next, we can expand the left-hand side as follows:

\sum_{i=1}^n x_i^2 y_i^2 + \sum_{i \ne j} x_i x_j y_i y_j - \sum_{i=1}^n x_i^2 y_i^2 - \sum_{i \ne j} x_i^2 y_j^2 \le 0

This simplifies to:

\sum_{i \ne j} (x_i y_i)(x_j y_j) - \sum_{i \ne j} (x_i y_j)^2 \le 0

Now, we can apply the arithmetic-geometric mean inequality to each term on the right-hand side:

\sum_{i \ne j} (x_i y_i)(x_j y_j) - \sum_{i \ne j} [(x_i y_j)^2]^{\frac{1}{2}}[(x_i y_j)^2]^{\frac{1}{2}} \le 0

This simplifies to:

\sum_{i \ne j} (x_i y_i)(x_j y_j) - \left(\sum_{i \ne j} [(x_i y_j)^2]^{\frac{1}{2}}\right)^2 \le 0

Finally, we can apply the arithmetic-geometric mean inequality again to the term on the right-hand side:

\sum_{i \ne j} (x_i y_i)(x_j y_j) - \left[\prod_{i \ne j} [(x_i y_j)^2]^{\frac{1}{2}}\right]^{\frac{2}{n-1}} \le 0

This simplifies to:

\sum_{i \ne j} (x_i y_i)(x_j y_j) - \prod_{i \ne j} (x_i y_j) \le 0

And we are done! This completes the proof of the Cauchy-Schwarz inequality.

I hope this helps! Let me know if you have any questions.

Remark: Unfortunately, this proof is not quite correct. Both the proof method, as well as the details (i.e. its application of AM-GM) are severely flawed.
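One concrete way to see that something went wrong: the last displayed inequality of the "proof" is not true in general. The sketch below evaluates its left-hand side at x = y = (1, 1). The function name `final_line_lhs` is hypothetical.

```python
from math import prod

def final_line_lhs(x, y):
    """Left-hand side of the last displayed inequality in the attempted proof:
    sum_{i != j} (x_i y_i)(x_j y_j) - prod_{i != j} (x_i y_j)."""
    n = len(x)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    total = sum(x[i] * y[i] * x[j] * y[j] for i, j in pairs)
    product = prod(x[i] * y[j] for i, j in pairs)
    return total - product

value = final_line_lhs([1, 1], [1, 1])
```

Here the sum is 2 and the product is 1, so `value` is 1, which is positive, while the "proof" claims the expression is at most 0.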

Conclusion

To be honest, Chat-GPT is quite impressive for a model that is not tailored for mathematics in any way. I think it could already, in its current format, help a mathematician to prepare lecture notes or even easier parts of a paper. Perhaps its greatest utility would come in automating certain parts of grants and job applications.

My experience with Computer Proofs

Image generated by Dalle

To acclimate myself to Computer Proofs, I aimed to write down some basic theorems from my area of research in Lean. I chose Lean primarily because of the very active and helpful community. In fact, everything I did in Lean was done alongside Yael Dillies, whom I met through Zulip. In what follows, I will discuss how I went about this and what I took away from the experience.

First of all, what is Lean? It is an interactive theorem prover (such as Isabelle, Coq, and others). For our purposes, it is a way to input mathematical statements and their corresponding proofs. The power of Lean is that it verifies that the proof you enter is indeed correct. The user can write a mathematical statement she wishes to prove. The interactive window then notifies us what goals remain to prove and we can begin working to establish them. Correct proofs are rewarded by a “Proof Complete” (and a festive 🎉). Incomplete proofs give unaccomplished goals or errors. To get a feel for how this works, one can load up the natural number game.

An interactive theorem prover offers interesting possibilities. The most obvious is that we can be sure that a proof is correct. Of course this is not completely true, as one has to make sure that previous statements and definitions are indeed correctly stated! One can consult this Quanta article about the impressive computer formalization of some of Peter Scholze’s work.

An even more interesting goal is the hope to automate parts of the mathematical research. One idea, independent of theorem provers, is an advanced autocomplete to be used when writing mathematical papers in latex. The technology to build this already exists and has been applied successfully to programming and is still evolving. For instance, Google’s Minerva has already had some success in using language models to perform mathematical reasoning.

Another goal would be to automate the process of checking correctness of proofs. An effective system of this type would save mathematicians countless hours just in refereeing. This can be viewed loosely as a neural machine translation problem. There has been lots of success in this domain with natural language (say something like English to German) both in research and in practice.

Finally, a third, and even more ambitious, goal is for an AI to be able to come up with its own proofs. Needless to say, we are currently at the infancy of this program. The interested reader can consult OpenAI’s somewhat recent paper on the subject.

For further reading on these sorts of things, I recommend a promising path towards autoformalization and general artificial intelligence by Christian Szegedy. Here Szegedy outlines his thoughts on so-called autoformalization as well as how such a program would fit into the general development of AI.

My first step in getting familiar with lean was to work through the natural number game. This is basically a guided tour for proving some basic statements in lean, avoiding some technical difficulties in the process (for instance, it is browser based). Here one has a goal to prove, and must use the given hypotheses to establish the goal. After each new line of code is written, one can analyze how this changes the hypotheses/goals.

After this, I got Lean on my local machine and worked through a tutorial, which is again a guided tour, but a bit more in depth than the natural number game (here I was joined for some of this by Tsutomu Okano). I worked through various problems one might see in an undergraduate analysis course relating to sequences, limits, and other topics.

The next leap is to begin to contribute to mathlib. Mathlib is a library of mathematics in lean that hundreds (in my estimation) of people have worked on. This is not an easy step, and to proceed here it helps that lean has a very active community through Zulip. The active community was my main reason for choosing Lean over other interactive theorem provers. For instance, knowledge of the incredibly useful Github is helpful for figuring out what is going on.

After asking around, I was lucky that Bhavik Mehta and Yael Dillies were both knowledgeable with Lean and interested in my main area of research. Moreover, there was already some work in mathlib on additive combinatorics and finite sets, which was very useful to build off of (see this work of Bloom and Mehta). I started off by trying to prove Ruzsa’s triangle inequality.

Theorem 1: Let A , B , C \subset \mathbb{Z} be finite sets. Then

|A-C| |B| \leq |A-B||B-C|. \ \ \ \ \ \spadesuit

I thought this was a good starting point as it is fundamental to the area, whilst having a short proof (see Lemma 2.6 in Tao and Vu). This turned out to be a bit harder than I expected. The argument can be formulated as a double counting argument or as showing that a certain function is an injection. It turns out that the former formulation is a bit easier to input into Lean due to what is already in mathlib, borrowing some of Bhavik’s work on the sum-product problem (however, I was shown that with enough perseverance one can use the latter proof). Moreover, some very simple things, like point-wise set subtraction, had to be added to mathlib. This turns out to be quite a non-trivial task and was completed by Yael. This is perhaps a result of additive combinatorics being a relatively new addition to mathlib.
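For readers who want to see the injection formulation in action, here is a minimal Python sketch (not the Lean proof, and not what is in mathlib). For each d in A-C we fix one representation d = a_d - c_d, and send (d, b) to (a_d - b, b - c_d) in (A-B) x (B-C). Adding the two coordinates recovers d, and then b, so the map is injective. The function names are my own.

```python
def diff_set(X, Y):
    """The difference set X - Y = {x - y : x in X, y in Y}."""
    return {x - y for x in X for y in Y}

def ruzsa_injection(A, B, C):
    """Map (d, b) in (A - C) x B to (a_d - b, b - c_d), where d = a_d - c_d is a
    fixed representation of d."""
    representation = {}
    for a in A:
        for c in C:
            representation.setdefault(a - c, (a, c))
    images = {}
    for d, (a, c) in representation.items():
        for b in B:
            images[(d, b)] = (a - b, b - c)
    return images

def check_ruzsa(A, B, C):
    """Verify on a concrete example that the map is an injection into (A-B) x (B-C)."""
    images = ruzsa_injection(A, B, C)
    AB, BC, AC = diff_set(A, B), diff_set(B, C), diff_set(A, C)
    injective = len(set(images.values())) == len(images)
    lands_in_target = all(u in AB and v in BC for (u, v) in images.values())
    right_size = len(images) == len(AC) * len(B)
    return injective and lands_in_target and right_size

example_ok = check_ruzsa({0, 1, 3}, {0, 2}, {1, 5, 6})
```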

Next, Yael and I input Plunnecke’s inequality into mathlib. Plunnecke’s inequality is another fundamental result that has a relatively short proof (thanks to Petridis). Here, Yael took over the Lean part. Their workflow was way more efficient than mine, as they are a veteran of Lean. They have developed a plethora of tricks for finding results in mathlib and even figuring out what to use in mathlib. While they cannot write in Lean as quickly as I can in LaTeX, they are not ten times slower.

We eventually made a pull request to mathlib containing Ruzsa’s triangle inequality and Plunnecke’s inequality. Mathematically, I did not gain too much working on this. I believe this is in part because I already had a thorough understanding of the proofs. Nevertheless, thinking of how to actually convert theorems to Lean did force me to revisit these classical theorems from multiple perspectives and I could see this being productive as one scales.

Overall, I enjoyed the experience of working in Lean, largely due to the vibrant community. Transferring mathematics to theorem provers requires a unique combination of mathematics and software engineering (both to contribute as well as under the hood). We already have some talented people (both young and old) working on this and I certainly look forward to seeing what will be accomplished along these lines.

On the largest sum-free subset problem in the integers

I recently uploaded “On the largest sum-free subset problem in the integers,” to the arXiv.

Let A \subset \mathbb{Z} be a finite subset of the integers. We say A is sum-free if there are no solutions to

a + b = c,

with a,b,c \in A. We define S(A) to be the size of the largest sum-free subset of A. We seek lower bounds for S(A). It is conjectured that

S(A) \geq (n+C)/3. \ \ \ \ \ (1)

for any C > 0, where n = |A|. Erdős established C=1 is admissible and Bourgain later improved this to C=2. By a construction, Eberhard, Green, and Manners showed that C = o(|A|).

I was originally drawn to this problem for two reasons. The first is that the aforementioned result of Erdős is the first additive combinatorics result in Tao and Vu’s additive combinatorics book. The second is that Bourgain’s original proof seemed to have a stigma that it was quite difficult.

We now sketch that C=1 is admissible, as shown by Erdős. The first idea is that the set [1/3,2/3) \subset \mathbb{R}/\mathbb{Z} is sum-free. Thus any subset of this set is also sum-free. Note this set has measure 1/3, which is the same as the multiplicative constant in (1).

The second idea is to randomly map A into [1/3,2/3). Indeed choosing

\theta \in \mathbb{R} / \mathbb{Z}

at random, we consider

\theta \cdot A \cap [1/3,2/3) \subset \mathbb{R}/\mathbb{Z}.

One can check that this set on average has size |A|/3 and as mentioned before, is sum-free.
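Here is a small Python sketch of this argument. Instead of a truly random \theta, it averages over \theta = k/q for a prime q larger than every element of A, which is an assumption I make so that the arithmetic is exact (via `fractions.Fraction`). The set A and the names `middle_third` and `is_sum_free` are chosen purely for illustration.

```python
from fractions import Fraction

def middle_third(A, theta):
    """Elements a of A for which theta * a (mod 1) lies in [1/3, 2/3)."""
    return {a for a in A if Fraction(1, 3) <= (theta * a) % 1 < Fraction(2, 3)}

def is_sum_free(S):
    """True if there are no a, b, c in S with a + b = c."""
    return all(a + b not in S for a in S for b in S)

A = {1, 2, 3, 5, 8, 13, 21, 34, 55, 89}
q = 101  # a prime exceeding every element of A
slices = [middle_third(A, Fraction(k, q)) for k in range(q)]

all_sum_free = all(is_sum_free(S) for S in slices)
average_size = Fraction(sum(len(S) for S in slices), q)
best = max(slices, key=len)
```

Every slice is sum-free, and since the average size is about |A|/3, the largest slice has at least that many elements.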

Bourgain’s work and also our work involve more careful choices of \theta. Underpinning the work is the idea of viewing f = 1_{[1/3,2/3)} - 1/3, the indicator function of [1/3,2/3) minus its mean, as a function, f: \mathbb{R}/\mathbb{Z} \to \mathbb{C}, on the torus and applying a combination of Fourier techniques and combinatorial techniques.

For a set S, we let

f_S(x) = \sum_{s \in S} f(sx).

Then the Erdős argument above may be restated as \int f_A = 0. Furthermore, (1) would follow from establishing there is an x\in \mathbb{R}/\mathbb{Z} satisfying

f_A(x) \geq C/3.

One new idea in our work is to partition A into A_0 and A_1, where A_1 is the set of elements in A that are divisible by 3. It turns out that this decomposition is useful as

f_{A_1}(x) = f_{A_1}(x+1/3) = f_{A_1}(x+2/3),

while

f_{A_0}(x) + f_{A_0}(x+1/3) + f_{A_0}(x+2/3) = 0.

Thus, for instance, a short argument reveals that if one can establish f_{A_1}(x) \geq C/3 for some x, then (1) follows for A.
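The two identities for A_0 and A_1 are easy to check by computer. Below is a minimal sketch with exact rational arithmetic. The example set A and the sample points x = k/97 are arbitrary choices of mine, and `f` and `f_S` mirror the notation above.

```python
from fractions import Fraction

def f(t):
    """Indicator of [1/3, 2/3) on R/Z, minus its mean 1/3."""
    t = t % 1
    inside = Fraction(1, 3) <= t < Fraction(2, 3)
    return (1 if inside else 0) - Fraction(1, 3)

def f_S(S, x):
    """f_S(x) = sum over s in S of f(s x)."""
    return sum(f(s * x) for s in S)

A = {1, 2, 3, 4, 6, 9, 10}
A1 = {a for a in A if a % 3 == 0}   # elements divisible by 3
A0 = A - A1                         # the rest

third = Fraction(1, 3)
points = [Fraction(k, 97) for k in range(97)]

# f_{A_1} is invariant under shifts by 1/3.
a1_periodic = all(f_S(A1, x) == f_S(A1, x + third) == f_S(A1, x + 2 * third) for x in points)
# f_{A_0} sums to zero over the three shifts.
a0_cancels = all(f_S(A0, x) + f_S(A0, x + third) + f_S(A0, x + 2 * third) == 0 for x in points)
```

The reason behind the second identity is that for s not divisible by 3, the points sx, sx + s/3, sx + 2s/3 are spaced by thirds on the torus, so exactly one of them lands in [1/3, 2/3).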

Quantitative Hilbert irreducibility and almost prime values of polynomial discriminants

Recently Anderson, Gafni, Lemke Oliver, Lowry-Duda, Zhang, and I uploaded “Quantitative Hilbert irreducibility and almost prime values of polynomial discriminants” to the arXiv. Let {H} be a large integer,

\displaystyle V_n(H) = \{x^n + a_{n-1} x^{n-1} + \cdots + a_0 : a_{n-1} , \ldots , a_0 \in \mathbb{Z} \cap [-H,H]\},

and

\displaystyle E_n(H) = \#\{f \in V_n(H) : {\rm Gal}(f) \neq S_n\}.

A classical principle going back to at least Hilbert is that a typical polynomial should have Galois group {S_n}, and this would suggest {E_n(H)} is significantly smaller than {\#V_n(H)}. Van der Waerden conjectured that {E_n(H) = O(\#V_n(H) /H)}. Van der Waerden’s conjecture was almost proved quite recently by Chow and Dietmann (for {n \neq 7,8,10}), except they are not able to estimate 

\displaystyle E_n(H , A_n) = \#\{f \in V_n(H) : {\rm Gal}(f) = A_n\}.

Gallagher showed, using the large sieve, that 

\displaystyle E_n(H) = O\left(\frac{\#V_n(H) \log H}{H^{1/2}} \right).\ \ \ \ \ (1)

We show the following.

{\ } Theorem 1: Let {H} be a positive integer. Then 

\displaystyle E_n(H , A_n) = O\left(\frac{\#V_n(H) H^{O(1/n)}}{H^{2/3}} \right). \ \ \ \spadesuit{\ }

We note that Bhargava recently announced a proof of {E_n(H , A_n) = O(\#V_n(H) / H)}, which is stronger than Theorem 1. We remark that our techniques also apply to counting polynomials with almost prime discriminants (see Theorem 1.2 and the related 1.3). 

We briefly mention some of the ideas used to prove Theorem 1. Classical algebraic number theory reveals that if the Galois group of a polynomial {f} is {A_n}, then {\mu_p(f) = (-1)^{n+1}}, where {\mu_p} is the Mobius function for {\mathbb{F}_p[x]}. Thus we can employ a sieve to estimate {E_n(H , A_n) }. However, instead of using the large sieve, we use the small sieve (more precisely, the Selberg sieve). Using this naively gives a bound similar to that of Gallagher in (1), as expected. The key advantage of using the small sieve is we can seek improvements if we have good distribution results. After employing Poisson summation our task is then to understand local factors, which turn out to be basically 

\displaystyle \widehat{\mu_p}(\xi_{n-1} , \ldots , \xi_0) = \frac{1}{p^{n}}\sum_{a_0 , \ldots , a_{n-1} \in \mathbb{F}_p} \mu_p(x^n + a_{n-1} x^{n-1} + \ldots + a_0) \psi_p(\xi_{n-1} a_{n-1} + \cdots + \xi_0 a_0),\ \ \ \ \ (2)

where {\psi_p : \mathbb{F}_p \rightarrow \mathbb{C}} is an additive character. This is morally the same problem as estimating  \displaystyle \sum_{n \leq x} \mu(n) e(n\alpha), where {\mu} is the classical Mobius function. Baker and Harman showed that GRH implies that for any {\epsilon > 0}

\displaystyle |\sum_{n \leq x} \mu(n) e(n\alpha)| \ll x^{3/4 + \epsilon}.

As the Riemann hypothesis is known in the function field setting, such a bound should also hold for (2). Indeed Porritt showed this is the case and this allows us to establish Theorem 1.

Effective results on the size and structure of sumsets

For a set {A \subset \mathbb{Z}^d} we define the {N}-fold sumset of {A} via

\displaystyle NA : = \{a_1 + \ldots + a_N : a_i \in A\}.

We have the following result of Khovanskii from 1992.

Theorem 1 (Polynomial size of iterated sumsets): Let {A \subset \mathbb{Z}^d} be of size {n}. Then there exists an {N_A \geq 1} and {c_0 , \ldots , c_d \in \mathbb{Q}} such that for {N \geq N_A}

\displaystyle |NA| = \sum_{j=0}^d c_j N^j. \ \ \ \spadesuit
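To get a feel for Theorem 1, one can compute {|NA|} by brute force for small examples. The sketch below represents elements of {\mathbb{Z}^d} as tuples. The function name `sumset_size` and the two example sets are my own choices.

```python
def sumset_size(A, N):
    """|NA| for a finite set A of integer tuples in Z^d, computed by brute force."""
    d = len(next(iter(A)))
    current = {(0,) * d}  # the 0-fold sumset is {0}
    for _ in range(N):
        current = {tuple(u + v for u, v in zip(s, a)) for s in current for a in A}
    return len(current)

# d = 1: A = {0, 1, 3}. Here NA is {0, ..., 3N} with 3N - 1 removed, so |NA| = 3N.
A = {(0,), (1,), (3,)}
sizes_1d = [sumset_size(A, N) for N in range(1, 9)]

# d = 2: the vertices of a triangle. Here NA is the set of lattice points (i, j)
# with i, j >= 0 and i + j <= N, so |NA| = (N + 1)(N + 2) / 2.
B = {(0, 0), (1, 0), (0, 1)}
sizes_2d = [sumset_size(B, N) for N in range(1, 7)]
```

In both examples the polynomial formula already holds from N = 1, but in general one has to wait until N is at least {N_A}.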

In one part of a recent work, Andrew Granville, Aled Walker, and I were able to provide effective bounds for {N_A}. We let

\displaystyle w(A) = \max_{a , b \in A} ||a-b||_{\infty}.

Theorem 2 (Effective Khovanskii): Let {A \subset \mathbb{Z}^d} be of size {n}. Then there exists {c_0 , \ldots , c_d \in \mathbb{Q}} such that for any

\displaystyle N \geq 2^{2n} n^{n^3 + 1} w(A)^{n^3}.\ \ \ \ \ (1)

one has

\displaystyle |NA| = \sum_{j=0}^d c_j N^j. \ \ \ \spadesuit

It is not clear that (1) is best possible. Certainly there has to be some dependence on how {A} lies in space, as can be seen by taking the elements of {A} to be increasingly spread out. Curran and Goldmakher showed that when {|A| = d+2}, one needs {N_A} at least the order of { w(A)^d }.

We first recall an elegant proof of Theorem 1 due to Nathanson and Ruzsa, which is also the starting point for Theorem 2.

Proof of Theorem 1: We consider {\mathbb{Z}_{\geq 0}^n}, equipped with {\leq}, the lexicographical ordering. Let {A = \{a_1 ,\ldots , a_n\}}. We then have a map

\displaystyle \phi : \mathbb{Z}_{\geq 0}^n \rightarrow \bigcup_{j=0}^{\infty} jA,

via

\displaystyle \phi(x_1 , \ldots , x_n) =x_1 a_1 + \ldots + x_n a_n.

It is worth noting that if {\phi} is injective, we immediately deduce Theorem 1 by the stars and bars. Typically {\phi} is not injective, but it turns out to not be a significant problem. We let {U} be the set of elements {x} such that there exists a {y} with {||y||_1 = ||x||_1}, {y < x}, and {\phi(y) = \phi(x)}. We call any element of {U} useless. They are deemed useless as

\displaystyle |NA| = |\{x \in \mathbb{Z}_{\geq 0}^n : ||x||_1 = N\} \setminus U|.

There is nothing intrinsic that makes elements of {U} useless, rather it is a consequence of our choice of lexicographical ordering. One can check that {U} is closed under translations from elements of {\mathbb{Z}_{\geq 0}^n}.
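The useless elements can be listed explicitly in small cases. The following sketch takes {d = 1} for simplicity (so the {a_i} are plain integers, an assumption made only to keep the code short) and relies on Python's tuple comparison, which is lexicographic. The names `compositions`, `phi`, and `useless` are my own.

```python
from math import comb

def compositions(n_parts, total):
    """All tuples of n_parts non-negative integers with sum equal to total."""
    if n_parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(n_parts - 1, total - first):
            yield (first,) + rest

def phi(x, A):
    """phi(x) = x_1 a_1 + ... + x_n a_n."""
    return sum(xi * ai for xi, ai in zip(x, A))

def useless(A, N):
    """The x with ||x||_1 = N such that some lexicographically smaller y with
    ||y||_1 = N has phi(y) = phi(x)."""
    smallest = {}
    for x in compositions(len(A), N):
        value = phi(x, A)
        if value not in smallest or x < smallest[value]:
            smallest[value] = x
    return {x for x in compositions(len(A), N) if smallest[phi(x, A)] != x}

A = [0, 1, 3]
n = len(A)
# |NA| = (number of x with ||x||_1 = N) - (number of useless x on that level)
sizes = [comb(N + n - 1, n - 1) - len(useless(A, N)) for N in range(1, 7)]

def shift(x, i):
    """Translate x by the i-th standard basis vector."""
    return tuple(xj + (1 if j == i else 0) for j, xj in enumerate(x))

# Translating a useless element by a basis vector gives a useless element.
closed_under_translation = all(
    shift(x, i) in useless(A, 4) for x in useless(A, 3) for i in range(n)
)
```

For A = {0, 1, 3} and N = 3, the only useless element is (2, 0, 1): it has the same image as (0, 3, 0), which is lexicographically smaller.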

We need a way to count the elements of {U} and thus propose another definition. We let {\leq_{{\rm unif}}} be the partial ordering where {x \leq_{{\rm unif}} y} if {x} is smaller than {y} coordinate-wise. Let {U_{\min}} be the elements {x \in U} such that there is no {y\in U} with {y <_{{\rm unif}} x}. Dickson’s lemma implies that {U_{\min}} is finite. For a set {U'}, we let

\displaystyle B(N, U') = \{x \in \mathbb{Z}_{\geq 0 }^n: ||x||_1 = N , x \geq_{{\rm unif}} u, \ \text{for all} \ u \in U'\}.

Thus we have, by the inclusion-exclusion principle,

\displaystyle | \{x \in \mathbb{Z}_{\geq 0}^n : ||x||_1 = N\} \setminus U| = \sum_{U' \subset U_{\min}} (-1)^{|U'|}| B(N , U')|.

Thus it is enough to show for any finite {U'}, {\#B(N,U')} is polynomial in {N}, for {N} large enough. This follows from the same stars and bars argument mentioned above, as long as

\displaystyle N \geq \max_{u \in U'} ||u||_{\infty}, \ \ \ \ \ (2)

and Theorem 1 follows. {\spadesuit}

Note that this proof does not give any effective bound on {N_A}, as we do not have any control over the set {U_{\min}}. In particular, one would like to have a bound on the {\ell^{\infty}} norm of the elements of {U_{\min}}, in light of (2). In general, one cannot make Dickson’s lemma quantitative, but in our case we can use the structure of {U_{\min}} to do so. The point is that {U} is defined by linear relations, so one can appeal to Siegel’s lemma.

Proof (sketch) of Theorem 2: We translate {A} so that {0 \in A}, which does not affect the problem (in particular, {w(A)} remains unchanged). We build upon the proof of Theorem 1. Suppose {x \in U_{\min}}. As {x \in U}, there is a {y \in \mathbb{Z}_{\geq 0}^n} such that {||x||_1 = ||y||_1}, {y < x}, and {\phi(x) = \phi(y)}. Thus

\displaystyle \sum_{i \in I} x_i a_i = \sum_{j \in J} y_j a_j.

Here {I} and {J} denote the supports of {x} and {y}, respectively. As {x \in U_{\min}}, one can check that {I\cap J = \emptyset}. We now construct a matrix {M} as follows. The first row has {\#I} 1’s followed by {\#J} {-1}’s. The remaining {d} rows are given by {(a_i)_{i \in I}} and {(-a_j)_{j \in J}}. Thus

\displaystyle M (x,y)^T = 0 \ \ \ \ \ (3)

One would hope to apply Siegel’s lemma, which asserts that (3) has a solution such that {||(x,y)||_{\infty}} is small. The primary issue is that this small solution, {x^*}, may have nothing to do with {(x,y)^T}. However, one can translate {(x,y)^T} by a multiple of {x^*} to create a solution that is small in a single coordinate. Then one “forgets” this coordinate and proceeds by induction. A secondary issue is that {x^* \in \mathbb{Z}^n} may have negative coordinates, but this turns out to not be a critical issue. All of this is carried out in section 6 of the aforementioned paper with Granville and Walker. {\spadesuit}

Kuratowski Closure and Topological Spaces

This post was the result of conversations with Chris Gartland (from several years ago). In what follows we outline an alternate definition of a topological space due to Kuratowski.

We recall Kuratowski’s closure axioms for a set {X}. These are a set of axioms which can be used to define a topology on a set. Here, one replaces the notion of open (or closed) sets with a closure operation. The closure operation has the advantage of being defined on all subsets of {X}. For {A \subset X}, one may think of the closure of {A}, denoted {\overline{A}}, to be points that are “close to” {A}.

First we recall what is meant by a topological space. We let {\mathcal{P}(X)} be the power set of {X}, that is the set of all subsets of {X}.

{\ } Definition 1: Let {X} be a set and {\mathcal{T} \subset \mathcal{P}(X)}. Then {(X,\mathcal{T})} is a topological space with open sets {\mathcal{T}} if

  • (a) {\emptyset , X \in \mathcal{T}},
  • (b) if {A, B \in \mathcal{T}}, then {A \cap B \in \mathcal{T}},
  • (c) if {A_i \in \mathcal{T}} for all {i \in \mathcal{I}}, then {\cup_{i\in \mathcal{I}} A_i \in \mathcal{T}}.

Furthermore, we say

\displaystyle \mathcal{C} := \{X \setminus A : A \in \mathcal{T}\},

are the closed sets of {X}. {\spadesuit} {\ }

Note we could have defined (a), (b), (c) in terms of closed sets, utilizing DeMorgan’s law. We make use of this implicitly below. We now present the definition of Kuratowski closure.

{\ } Definition 2: Let {X} be a set and {\overline{\cdot} : \mathcal{P}(X) \rightarrow \mathcal{P}(X)}. We say {\overline{\cdot}} is a closure operation if

  • (i) {\overline{\emptyset} = \emptyset},
  • (ii) if {A \subset X}, then {A \subset \overline{A}},
  • (iii) if {D = \overline{A}}, then {\overline{D} = D},
  • (iv) if {A,B \subset X}, then {\overline{A\cup B} = \overline{A} \cup \overline{B}}. {\spadesuit}

{\ }

As mentioned above, one can informally think of {\overline{A}} as the set of points in {X} which are “close” to {A}. We now explain that these two definitions are equivalent. To define a closure operation from Definition 1, we set

\displaystyle \overline{A} := \bigcap_{A \subset C \ \text{closed}} C.\ \ \ \ \ (1)

Note that {\overline{A}} is closed by (c). Given a closure operation, we may define the closed sets to be

\displaystyle \mathcal{C} : = \{\overline{A} : A \subset X\}.\ \ \ \ \ (2)

{\ } Proposition 1: Let {(X, \mathcal{T})} be a topological space as in Definition 1. Then the operation defined in (1) is a closure operation in the sense of Definition 2. {\spadesuit} {\ }

Proof: Let

\displaystyle \mathcal{C} = \{X \setminus A : A \in \mathcal{T}\},

be the set of closed sets in {X}. Let {A , B \subset X}. By (1), we have {A \subset \overline{A}} and thus (ii) holds. As {X} is open, {\emptyset \in \mathcal{C}} and so (i) is satisfied. Note for any closed set, {C \in \mathcal{C}}, we have {\overline{C} = C} by (ii) and that {C} itself appears on the right hand side of (1). By (c), we have that

\displaystyle D = \bigcap_{A \subset C \ \text{closed}} C \in \mathcal{C},

and so {\overline{D} = D}. This proves (iii) and so it remains to show (iv). Note that {\overline{A} \cup \overline{B}} is a closed set containing {A \cup B}, by (b). Thus

\displaystyle \overline{A \cup B} \subset \overline{A} \cup \overline{B}.\ \ \ \ \ (3)

Now let {C \in \mathcal{C}} be a closed set that contains {A \cup B}. Then {A, B \subset C}, so {\overline{A}, \overline{B} \subset C} by (1), and thus

\displaystyle \overline{A} , \overline{B} \subset \ \bigcap_{(A\cup B) \subset C \ \text{closed}} C.

This proves the reverse inclusion of (3) and thus of Proposition 1. {\spadesuit}
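On a finite set, Proposition 1 can be checked by brute force. The sketch below builds the closure operation (1) from a list of open sets and tests axioms (i) through (iv) on every pair of subsets. It also recovers the closed sets via (2). The three-point topology and all function names are assumptions made for illustration.

```python
from itertools import combinations

def powerset(X):
    """All subsets of X, as frozensets."""
    items = list(X)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def closure_from_topology(X, open_sets):
    """The closure operation (1): intersect all closed sets containing A."""
    closed = [frozenset(X) - U for U in open_sets]
    def closure(A):
        result = frozenset(X)
        for C in closed:
            if A <= C:
                result &= C
        return result
    return closure

def satisfies_kuratowski(X, closure):
    """Check axioms (i)-(iv) of Definition 2 on all subsets of a finite set X."""
    subsets = powerset(X)
    return (
        closure(frozenset()) == frozenset()                                  # (i)
        and all(A <= closure(A) for A in subsets)                            # (ii)
        and all(closure(closure(A)) == closure(A) for A in subsets)          # (iii)
        and all(closure(A | B) == closure(A) | closure(B)
                for A in subsets for B in subsets)                           # (iv)
    )

X = {1, 2, 3}
T = [frozenset(), frozenset({1}), frozenset({1, 2}), frozenset(X)]
cl = closure_from_topology(X, T)

axioms_hold = satisfies_kuratowski(X, cl)
# The closed sets recovered from the closure operation, as in (2).
recovered_closed = {cl(A) for A in powerset(X)}
```

In this example the closure of {2} is {2, 3}, and `recovered_closed` agrees with the complements of the open sets.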

We now show that a closure operation can be used to define a topology.

{\ } Proposition 2: Let {\overline{\cdot} : \mathcal{P}(X) \rightarrow \mathcal{P}(X)} be a closure operation as in Definition 2. Then the sets defined in (2) form the closed sets of a topology in the sense of Definition 1. {\spadesuit} {\ }

Proof: We implicitly make use of DeMorgan’s law to transition between closed and open sets. By {(i)}, we have {\emptyset \in \mathcal{C}} and by (ii), we have {X \in \mathcal{C}}. Thus {(a)} is established. Let {A, B \in \mathcal{C}}. By (iii), (iv) and (2), we have

\displaystyle A \cup B = \overline{A \cup B} \in \mathcal{C},

which establishes (b). Let {C_i \in \mathcal{C}} for all {i \in \mathcal{I}} some index set. Then we have

\displaystyle \overline{\cap_{i \in \mathcal{I}} C_i}= \cap_{i \in \mathcal{I}} C_i.

Indeed, the backwards inclusion follows from (ii). To see the forward direction, it is enough to show for any {i \in \mathcal{I}},

\displaystyle \overline{\cap_{i \in \mathcal{I}} C_i} \subset \overline{C_i} = C_i.

The second equality is (iii). The first subset inequality follows from (iv) as if {A \subset B} then

\displaystyle \overline{A} \subset \overline{A} \cup \overline{B} = \overline{A \cup B} = \overline{B}.

We apply this with {A = \cap_{i\in \mathcal{I}} C_i} and {B = C_i}. {\spadesuit}
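To illustrate Proposition 2, the following sketch starts from a closure operator rather than a topology. It assumes that (i)-(iv) are the Kuratowski axioms and that (2) defines the closed sets as those {C} with {\overline{C} = C}. The operator `cl` below is a hypothetical example, not one taken from the discussion above.

```python
from itertools import combinations

X = frozenset({0, 1, 2})

def powerset(X):
    """All subsets of X, as frozensets."""
    elems = list(X)
    return [frozenset(c) for r in range(len(elems) + 1)
            for c in combinations(elems, r)]

def cl(A):
    """Hypothetical closure operator: every nonempty set picks up the point 2."""
    return A | {2} if A else A

def satisfies_closure_axioms(cl, X):
    """Check the (assumed) axioms (i)-(iv) by brute force."""
    subsets = powerset(X)
    return (cl(frozenset()) == frozenset()
            and all(A <= cl(A) for A in subsets)
            and all(cl(cl(A)) == cl(A) for A in subsets)
            and all(cl(A | B) == cl(A) | cl(B) for A in subsets for B in subsets))

def is_closed_family(X, closed_sets):
    """On a finite set, pairwise unions and intersections suffice."""
    fam = set(closed_sets)
    has_trivial = frozenset() in fam and X in fam
    unions = all(A | B in fam for A in fam for B in fam)
    intersections = all(A & B in fam for A in fam for B in fam)
    return has_trivial and unions and intersections

# Closed sets as in (2): the fixed points of the closure operator.
closed_sets = [C for C in powerset(X) if cl(C) == C]

print(satisfies_closure_axioms(cl, X), is_closed_family(X, closed_sets))
```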

Recall if {(X , \mathcal{T}_X)} and {(Y , \mathcal{T}_Y)} are topological spaces, then

\displaystyle f : X \rightarrow Y,

is said to be continuous if the pre-image of every open set in {Y} is open in {X}, that is

\displaystyle f^{-1}(B) \in \mathcal{T}_X,\ \ \ \ \ (4)

for all open {B \subset Y}. As {f^{-1}(Y \setminus B) = X \setminus f^{-1}(B)}, (4) is equivalent to the analogous definition for closed sets. It turns out that we may equivalently define a map to be continuous if

\displaystyle f(\overline{A}) \subset \overline{f(A)}.\ \ \ \ \ (5)

This can be informally interpreted as points that are close to {A} are mapped to points that are close to {f(A)}. We now prove this.

{\ } Proposition 3: Let {f : X \rightarrow Y} be a function. Then {f} is continuous in the sense of (4) if and only if it is continuous in the sense of (5). {\spadesuit} {\ }

Proof: We start with the forward direction. Let {A \subset X}. Then by (4), {f^{-1}(\overline{f(A)})} is closed. As {f(A) \subset \overline{f(A)}}, we have

\displaystyle A \subset f^{-1}(\overline{f(A)}).

Since the right hand side is closed, by (ii) and (iii), we have

\displaystyle \overline{A} \subset f^{-1}(\overline{f(A)}).

Applying {f} to both sides establishes (5).

Now we show the reverse implication. Let {B} be a closed set. Then by (5),

\displaystyle f(\overline{f^{-1}(B)}) \subset \overline{f(f^{-1}(B))} \subset \overline{B} = B .

As any element which is mapped to {B} by {f} must lie in {f^{-1}(B)}, we have

\displaystyle \overline{f^{-1}(B)} \subset f^{-1}(B).

By (ii), equality holds and thus {f^{-1}(B)} is closed. {\spadesuit}
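Proposition 3 can also be tested by brute force on small finite spaces. The sketch below enumerates every map between two hypothetical finite topological spaces and compares definition (4) with definition (5). The spaces and all function names are assumptions chosen for illustration.

```python
from itertools import combinations, product

def powerset(X):
    """All subsets of X, as frozensets."""
    elems = list(X)
    return [frozenset(c) for r in range(len(elems) + 1)
            for c in combinations(elems, r)]

def closure(A, X, open_sets):
    """Intersection of all closed sets containing A."""
    result = frozenset(X)
    for U in open_sets:
        C = frozenset(X) - U
        if A <= C:
            result &= C
    return result

def preimage(f, B, X):
    return frozenset(x for x in X if f[x] in B)

def image(f, A):
    return frozenset(f[x] for x in A)

def continuous_open(f, X, TX, TY):
    """Definition (4): preimages of open sets are open."""
    return all(preimage(f, B, X) in TX for B in TY)

def continuous_closure(f, X, TX, Y, TY):
    """Definition (5): f(closure of A) lies in the closure of f(A)."""
    return all(image(f, closure(A, X, TX)) <= closure(image(f, A), Y, TY)
               for A in powerset(X))

# Hypothetical spaces: a nested topology on three points and the Sierpinski space.
X = frozenset({0, 1, 2})
TX = [frozenset(), frozenset({0}), frozenset({0, 1}), X]
Y = frozenset({0, 1})
TY = [frozenset(), frozenset({0}), Y]

results = []
for values in product(sorted(Y), repeat=len(X)):
    f = dict(zip(sorted(X), values))      # f as a lookup table
    results.append((continuous_open(f, X, TX, TY),
                    continuous_closure(f, X, TX, Y, TY)))

agree = all(a == b for a, b in results)
print(agree, sum(a for a, _ in results))
```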

Exponential Sums along Oblique Lines

Thanks to Changhao Chen, Burak Erdoğan, and Igor Shparlinski for useful discussions surrounding this post.

Let {k} be a positive integer (which we take later to be {\geq 11}). We consider the exponential sum

\displaystyle S(x,y) : = \sum_{n=N}^{2N} e(xn + yn^k),\ \ \ \ \ (1)

where {e(x) : = e^{2\pi i x}}. We are interested in bounds for

\displaystyle \sup_{(x,y) \in \mathcal{L}_z} |S(x,y)|,

where {\mathcal{L}_z \subset [0,1)^2} is a family of sets indexed by some parameter {z\in \mathbb{R}}. For simplicity, we only consider, for fixed {(a,b) \in \mathbb{R}^2},

\displaystyle \mathcal{L}_z = \{(x,y) \in [0,1)^2 : ax + by = z\}.

We would like to show that for most {z}, (1) is small. This is supported by the usual heuristic that we expect square root cancellation in (1). On the other hand, {S(x,y)} is large for some special values of {(x,y)} (say {(0,0)}), so it certainly matters how the {\mathcal{L}_z} lie in space. Burak Erdoğan and I studied these types of questions (also the topic of this previous blog post), motivated by understanding the fractal dimension of solutions to certain PDE (see the introduction and references within for a complete history). We let
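To get a feel for these quantities, here is a minimal Python sketch that evaluates the sum (1) and approximates the supremum over {\mathcal{L}_z} by sampling. The sampling scheme, the function names, and the small values of {N} and {k} are assumptions for illustration. Sampling only gives a lower bound for the true supremum, and floating point phases are only reliable when {k} and {N} are small.

```python
import cmath

def e(t):
    """e(t) = exp(2 pi i t)."""
    return cmath.exp(2j * cmath.pi * t)

def S(x, y, N, k):
    """The sum (1): sum over n = N, ..., 2N of e(x n + y n^k)."""
    return sum(e(x * n + y * n**k) for n in range(N, 2 * N + 1))

def sampled_sup_on_line(a, b, z, N, k, samples=2000):
    """Crude proxy for the sup over L_z: sample x in [0,1) and solve a x + b y = z for y.

    Assumes b != 0. Points with y outside [0,1) are skipped.
    """
    best = 0.0
    for i in range(samples):
        x = i / samples
        y = (z - a * x) / b
        if 0 <= y < 1:
            best = max(best, abs(S(x, y, N, k)))
    return best

if __name__ == "__main__":
    N, k = 50, 3
    print(abs(S(0, 0, N, k)))                      # the trivial bound N + 1
    print(sampled_sup_on_line(1, 1, 0.5, N, k))    # a line avoiding the origin
```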

\displaystyle s(k) = k(k-1) + 2 \lfloor \sqrt{2k-2}\rfloor - 2.

{\ } Theorem 1 (Chen-Shparlinski): Let {\epsilon > 0} and

\displaystyle \alpha > 1 - \frac{1}{1 + s(k)}.

Then for a.e. {z \in \mathbb{R}}, with respect to the Lebesgue measure,

\displaystyle \sup_{(x,y) \in \mathcal{L}_z} |S(x,y)| \ll_z N^{\alpha + \epsilon}. \ \ \ \spadesuit

{\ }
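To see how close to {1} the exponent in Theorem 1 is, the following short sketch evaluates {s(k)} and the threshold {1 - 1/(1+s(k))} exactly. The function names are hypothetical.

```python
from fractions import Fraction
from math import isqrt

def s(k):
    """s(k) = k(k-1) + 2*floor(sqrt(2k-2)) - 2, as defined above."""
    return k * (k - 1) + 2 * isqrt(2 * k - 2) - 2

def alpha_threshold(k):
    """Theorem 1 requires alpha > 1 - 1/(1 + s(k))."""
    return 1 - Fraction(1, 1 + s(k))

if __name__ == "__main__":
    for k in (11, 12, 20):
        print(k, s(k), alpha_threshold(k), float(alpha_threshold(k)))
```

Since {s(k)} grows like {k^2}, the saving over the trivial bound {N} is a power of {N} of size roughly {1/k^2}.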

Let us make some remarks before beginning the proof. It is worth noting that for very small {k} we know the best value of {\alpha}. Indeed, Brandes, Parsell, Poulias, Vaughan, and I showed {\alpha = 3/4} is admissible and cannot be improved. In the aforementioned paper with Erdoğan, we mentioned that one could obtain a variant of Theorem 1 by invoking Vinogradov’s mean value theorem, which is a best possible bound for

\displaystyle \int_{[0,1]^k} |\sum_{n=N}^{2N} e(x_1 n + x_2 n^2 + \cdots + x_k n^k)|^{p}.\ \ \ \ \ (2)

The guiding principle is that if an exponential sum is large at a single point, then one can create many other points where the exponential sum is large. On the other hand, there cannot be too many points where this occurs, as (2) is small. This is a somewhat unsatisfactory approach, as it is not clear that the {k}-variable mean value in (2) is the right tool to analyze the two-variable {S(x,y)}. Recently, Chen and Shparlinski instead utilized the following two-variable mean value type theorem of Wooley, which turns out to improve the bounds a bit and simplify the proof.

{\ } Theorem 2 (Wooley): Suppose {k \geq 11} is an integer. Then for any {\sigma \geq s(k)}

\displaystyle \int_0^1 \int_0^1 |S(x,y)|^{\sigma} dx dy \leq N^{\sigma - k - 1 + o(1)}. \ \ \spadesuit

{\ }

Note that Theorem 2 is best possible, in a certain sense. By considering a small {N^{-1} \times N^{-k}} rectangle near {(0,0)}, we see

\displaystyle \int_0^1 \int_0^1 |S(x,y)|^{\sigma} dx dy \gg N^{\sigma - k - 1}.

Thus Theorem 2 cannot be improved much, for the values of {\sigma} for which it applies. It is not clear that the range of {\sigma} is best possible. A natural conjecture is that Theorem 2 holds for

\displaystyle \sigma > 2(k+1) .

Such an improvement would improve Theorem 1.
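The lower bound coming from the small rectangle near {(0,0)} can be checked numerically. In the sketch below, the constants {1/16} in the side lengths are hypothetical choices that keep every phase in (1) below {3/16}, so that each term has real part at least {\cos(3\pi/8) > 0.38}. The function names are assumptions as well.

```python
import cmath

def S(x, y, N, k):
    """The sum (1): sum over n = N, ..., 2N of e(x n + y n^k)."""
    return sum(cmath.exp(2j * cmath.pi * (x * n + y * n**k))
               for n in range(N, 2 * N + 1))

def min_abs_on_small_rectangle(N, k, grid=10):
    """Minimum of |S| over a grid in [0, 1/(16N)] x [0, 1/(16 (2N)^k)]."""
    x_max = 1 / (16 * N)
    y_max = 1 / (16 * (2 * N) ** k)
    return min(abs(S(x_max * i / grid, y_max * j / grid, N, k))
               for i in range(grid + 1) for j in range(grid + 1))

if __name__ == "__main__":
    N, k = 50, 11
    print(min_abs_on_small_rectangle(N, k), 0.38 * (N + 1))
```

On such a rectangle {|S(x,y)| \gg N}, and its area is {\gg_k N^{-1-k}}, which is the source of the lower bound {N^{\sigma - k - 1}}.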

Proof of Theorem 1: We only prove the theorem for {N=2^j}, for simplicity. The reader may consult Chen and Shparlinski’s paper for the general case, where the additional idea is to employ the completion technique.

Let {\epsilon > 0} and {0 < \alpha < 1}. We partition {[0,1)^2} into a grid of {O(N^{k + 3 - 2 \alpha + 2 \epsilon})} small rectangles of size approximately

\displaystyle N^{\alpha - 2 - \epsilon} \times N^{\alpha - k - 1 - \epsilon}.

We denote the collection of these rectangles by {\mathcal{R}}, so that

\displaystyle [0,1)^2 = \cup_{R \in \mathcal{R}} R.

The point is that (1) does not change much on such rectangles. Indeed it is easy to check, using {e(x) = 1 + O(x)}, that (for {N} large enough) if

\displaystyle |S(x,y)| \geq N^{\alpha},

for some {(x,y) \in R}, then

\displaystyle |S(x',y')| \geq N^{\alpha}/2,

for any {(x',y') \in R}. We let {\mathcal{R}_{\alpha} \subset \mathcal{R}} consist of the rectangles {R} such that there is an {(x,y) \in R} with {|S(x,y)| \geq N^{\alpha}}. Combining this with the mean value estimate in Theorem 2, we see that {\#\mathcal{R}_{\alpha}} cannot be too large.

Indeed, by Markov’s inequality and Theorem 2, we see that for {\sigma \geq s(k)},

\displaystyle N^{2 \alpha -2 -k - 1 - 2 \epsilon} N^{\alpha \sigma} \#\mathcal{R}_{\alpha} \leq \int_0^1 \int_0^1 |S(x,y)|^{\sigma} dx dy \leq N^{\sigma - k - 1 + o(1)}.

Thus

\displaystyle \#\mathcal{R}_{\alpha} \leq N^{(1-\alpha)\sigma -2\alpha+2 + 2 \epsilon + o(1)}.\ \ \ \ \ (3)
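The exponent bookkeeping leading to (3) is easy to get wrong, so here is a small sketch that checks it with exact rational arithmetic. It ignores the {o(1)} term and the constant {2^{-\sigma}}. The function names and the sample parameters are hypothetical.

```python
from fractions import Fraction

def count_exponent_from_markov(alpha, sigma, k, eps):
    """Exponent of N bounding #R_alpha, read off from the Markov step."""
    area_exp = 2 * alpha - k - 3 - 2 * eps   # log_N of the area of one rectangle
    integral_exp = sigma - k - 1             # Theorem 2, ignoring the o(1)
    # area * N^(alpha * sigma) * count <= integral
    return integral_exp - area_exp - alpha * sigma

def count_exponent_in_3(alpha, sigma, eps):
    """Exponent of N stated in (3), ignoring the o(1)."""
    return (1 - alpha) * sigma - 2 * alpha + 2 + 2 * eps

if __name__ == "__main__":
    alpha, sigma, k, eps = Fraction(9, 10), 116, 11, Fraction(1, 100)
    print(count_exponent_from_markov(alpha, sigma, k, eps),
          count_exponent_in_3(alpha, sigma, eps))
```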

We now consider the image of these rectangles under the map

\displaystyle (x,y) \mapsto ax + by.

We have

\displaystyle \{z\in \mathbb{R} : |S(x,y)| \geq N^{\alpha}, \ \ \text{for some} \ (x,y) \in \mathcal{L}_z\} \subset f\left(\bigcup_{R \in \mathcal{R}_{\alpha}}R\right) = \bigcup_{R \in \mathcal{R}_{\alpha}}f(R),

where

\displaystyle f(x,y) = ax+by.

Note that {f} does not distort rectangles too much, so that

\displaystyle \lambda(f(R)) \ll_{a,b} N^{\alpha - 2 - \epsilon},

where {\lambda} is the Lebesgue measure. Thus by subadditivity of the Lebesgue measure and (3)

\displaystyle \lambda(\{z\in \mathbb{R} : |S(x,y)| \geq N^{\alpha}, \ \ \text{for some} \ (x,y) \in \mathcal{L}_z\} )\ll_{a,b} N^{\alpha - 2 -\epsilon} N^{(1-\alpha)\sigma -2\alpha+2 + 2 \epsilon + o(1)}.

Note here that {N} is fixed. What we actually care about is what happens for a fixed {z} and {N \geq N(z)} for some large {N(z)}. A standard trick from probability (or analysis) is to apply the Borel-Cantelli lemma. We first apply the above result with {N = 2^j} to find, for {j} sufficiently large,

\displaystyle \lambda(\{z\in \mathbb{R} : |S(x,y)| \geq 2^{j\alpha}, \ \ \text{for some} \ (x,y) \in \mathcal{L}_z\} ) \leq 2^{j((1-\alpha)\sigma - \alpha + 2 \epsilon )}.\ \ \ \ \ (4)

By the Borel-Cantelli lemma, if

\displaystyle \sum_{j=1}^{\infty} 2^{j((1-\alpha)\sigma - \alpha + 2 \epsilon )}< \infty,

then the set of {z} that belong to the set on the left hand side of (4) for infinitely many {j} has measure zero. This is implied by

\displaystyle (1-\alpha)\sigma - \alpha < 0,

as long as {\epsilon} is sufficiently small. This, in turn, is implied by

\displaystyle \alpha > \frac{\sigma}{\sigma + 1} = 1 - \frac{1}{\sigma + 1} . \ \ \ \spadesuit
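The final step can be sketched numerically. The code below computes the exponent appearing in (4), the room available for {\epsilon}, and the resulting geometric series in the Borel-Cantelli condition. All names and the sample value {\alpha = 117/118} are hypothetical.

```python
from fractions import Fraction

def exponent(alpha, sigma, eps):
    """The exponent (1 - alpha) * sigma - alpha + 2 * eps from (4)."""
    return (1 - alpha) * sigma - alpha + 2 * eps

def max_eps(alpha, sigma):
    """The exponent is negative exactly when eps is below this value."""
    return (alpha - (1 - alpha) * sigma) / 2

def series_sum(c):
    """Sum over j >= 1 of 2^(j c), a geometric series, valid for c < 0."""
    r = 2.0 ** float(c)
    return r / (1 - r)

if __name__ == "__main__":
    sigma = 116                        # s(11)
    alpha = Fraction(117, 118)         # slightly above sigma / (sigma + 1)
    eps = max_eps(alpha, sigma) / 2
    c = exponent(alpha, sigma, eps)
    print(c, series_sum(c))
```

At the endpoint {\alpha = \sigma/(\sigma+1)} the exponent vanishes when {\epsilon = 0}, so the series diverges and the argument gives nothing.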

A large gap in a dilate of a set

I recently uploaded “a large gap in a dilate of a set,” to the arXiv. We prove the following.

{\ } Theorem 1: Let {A \subset \mathbb{F}_p} be a set with at least two elements. Suppose {N \leq 2p/|A| -2}. Then there is an {x \in \mathbb{F}_p} and a {d \in \mathbb{F}_p^{\times}} such that

\displaystyle (d\cdot A + x ) \cap \{1 , \ldots , N\} = \emptyset.  \ \ \ \spadesuit

{\ }

As the note is only 3 pages, we do not remark on the proof (which uses the polynomial method) but elaborate on some related topics. Note that by the pigeonhole principle, Theorem 1 is true for every {d} if we only insist {N \leq p/|A| - 1}. Thus we have effectively doubled the bound coming from the pigeonhole principle. Note that without dilation, Theorem 1 is false, as one can take {|A|} equally spaced elements.
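For very small primes, Theorem 1 can be confirmed by exhaustive search. The following sketch is only a brute force check and has nothing to do with the polynomial method used in the note. The function names are hypothetical, and {N} is taken to be the largest integer with {N \leq 2p/|A| - 2}.

```python
from itertools import combinations

def has_large_gap(A, p, N, dilations=None):
    """Search for d and x with (d*A + x) disjoint from {1, ..., N} modulo p.

    Returns a pair (d, x) if one exists and None otherwise. By default every
    nonzero d is tried; pass dilations=[1] to forbid dilation.
    """
    target = set(range(1, N + 1))
    for d in (dilations if dilations is not None else range(1, p)):
        for x in range(p):
            if all((d * a + x) % p not in target for a in A):
                return (d, x)
    return None

def counterexample_to_theorem(p):
    """Return a set A violating Theorem 1 modulo p, or None if there is none."""
    for size in range(2, p + 1):
        N = (2 * p) // size - 2          # largest integer N <= 2p/|A| - 2
        if N <= 0:
            continue                     # nothing to avoid
        for A in combinations(range(p), size):
            if has_large_gap(A, p, N) is None:
                return A
    return None

if __name__ == "__main__":
    print([counterexample_to_theorem(p) for p in (5, 7, 11)])
    # Nearly equally spaced points: dilation is needed to open up a gap of length 3.
    print(has_large_gap((0, 3, 6, 9), 11, 3, dilations=[1]))
    print(has_large_gap((0, 3, 6, 9), 11, 3))
```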

One can ask what happens for Theorem 1 if one does not allow translation by {x}. It turns out that one cannot hope to go beyond {N \geq 2p/|A|}, as was shown in this paper of Chen, Shparlinski and Winterhof, using the example

\displaystyle A = \{1 , \ldots , p/N\} \cup -\{1 , \ldots , p/N\} .
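This example can be tested directly. The sketch below builds the symmetric set, reading {p/N} as {\lfloor p/N \rfloor} (an assumption), and lists the dilations {d} for which {d \cdot A} misses {\{1, \ldots, N\}}. The function names and the choices of {p} and {N} are hypothetical.

```python
def symmetric_interval(p, N):
    """A = {1, ..., M} together with its negatives mod p, where M = floor(p / N)."""
    M = p // N
    return {a % p for a in range(1, M + 1)} | {(-a) % p for a in range(1, M + 1)}

def dilates_avoiding(A, p, N):
    """All nonzero d such that d*A is disjoint from {1, ..., N} modulo p."""
    target = set(range(1, N + 1))
    return [d for d in range(1, p)
            if not any((d * a) % p in target for a in A)]

if __name__ == "__main__":
    p, N = 101, 10
    A = symmetric_interval(p, N)
    print(len(A), dilates_avoiding(A, p, N))
```

The empty list is what Dirichlet's approximation theorem predicts: for every {d} there is some {1 \leq a \leq \lfloor p/N \rfloor} with {da} or {-da} congruent to an element of {\{1, \ldots, N\}}.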

It is an interesting question to decide to what extent Theorem 1 is true with translation by {x}. We remark this is in a similar spirit to the lonely runner conjecture.

It would be nice if Theorem 1 were true for {N \leq C p/|A|} for all {C}, even in the special case when {|A| \sim \sqrt{p}}. Certainly this is true for a random set, without the need for dilation.

In particular, this would give us hope in answering an old question of Erdős, which we recall. A Sidon set in an abelian group is a set {A} such that {a+b = c+d} with {a,b,c,d \in A} implies {\{a,b\} = \{c,d\}}. Let {r_2(N)} be the size of the largest Sidon set contained in {\{1 , \ldots , N\}}. Erdős asked if

\displaystyle r_2(N) = N^{1/2} + O(1).
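For small {N}, the quantity {r_2(N)} can be computed by exhaustive search. The sketch below is a naive implementation, exponential in {N}, with hypothetical function names. It is only meant to make the definition concrete.

```python
from itertools import combinations, combinations_with_replacement

def is_sidon(A):
    """True if all sums a + b with a <= b in A are distinct."""
    sums = set()
    for a, b in combinations_with_replacement(sorted(A), 2):
        if a + b in sums:
            return False
        sums.add(a + b)
    return True

def r2(N):
    """Size of the largest Sidon set inside {1, ..., N}, by brute force."""
    for size in range(N, 0, -1):
        if any(is_sidon(A) for A in combinations(range(1, N + 1), size)):
            return size
    return 0

if __name__ == "__main__":
    print([(N, r2(N)) for N in range(1, 13)])
```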

There are constructions of Sidon sets of size {N^{1/2}} (for some {N}) coming from { \mathbb{Z} /N \mathbb{Z}} for well-chosen {N}. The hope would be to dilate the set in {\mathbb{Z}/N \mathbb{Z}} so there is a large gap of size {g}, thus finding a Sidon set inside {\{1 , \ldots , N-g\}} in place of {\{1 , \ldots , N\}}. It is actually not known if we can take {N } to be a prime in the modular construction, which may be found in this nice survey of O’Bryant. This is certainly a question of interest.

On the other hand, one can hope to improve Theorem 1 for some of these constructions. It turns out one can easily check that Ruzsa’s construction, which is a set {A \subset \mathbb{Z}/ (p(p-1)) \mathbb{Z}}, does not admit large gaps. Indeed the set has size {\sim p} but no dilate of {A} contains a gap significantly longer than {2p}. This also shows Theorem 1 is false for general cyclic groups. The point is that in his construction the natural projection to {\mathbb{Z}/(p-1)\mathbb{Z}} maps {A} surjectively.
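Here is a minimal sketch for experimenting with this. It assumes the standard description of Ruzsa's set as the residues {s} modulo {p(p-1)} with {g^s \equiv s} modulo {p}, where {g} is a primitive root modulo {p}. This description is not spelled out above, so treat it as an assumption, and the function names as hypothetical. Only dilations by units of {\mathbb{Z}/(p(p-1))\mathbb{Z}} are considered.

```python
from itertools import combinations_with_replacement
from math import gcd

def ruzsa_set(p, g):
    """Residues s mod p(p-1) with g^s = s (mod p); g is assumed to be a primitive root."""
    m = p * (p - 1)
    return sorted(s for s in range(m) if pow(g, s, p) == s % p)

def is_sidon_mod(A, m):
    """True if all sums a + b (a <= b in A) are distinct modulo m."""
    sums = set()
    for a, b in combinations_with_replacement(A, 2):
        t = (a + b) % m
        if t in sums:
            return False
        sums.add(t)
    return True

def max_gap(A, m):
    """Longest run of consecutive residues modulo m that misses A."""
    pts = sorted(set(A))
    return max((pts[(i + 1) % len(pts)] - pts[i] - 1) % m
               for i in range(len(pts)))

def max_gap_over_dilates(A, m):
    """Largest gap among the dilates d*A with d a unit modulo m."""
    return max(max_gap([(d * a) % m for a in A], m)
               for d in range(1, m) if gcd(d, m) == 1)

if __name__ == "__main__":
    p, g = 7, 3                      # 3 is a primitive root modulo 7
    m = p * (p - 1)
    A = ruzsa_set(p, g)
    print(A, is_sidon_mod(A, m))
    print({a % (p - 1) for a in A})  # projection to Z/(p-1)Z
    print(max_gap_over_dilates(A, m), 2 * p)
```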

This seems to be a bit of a red herring in the application to Sidon sets. On the other hand, for the well-known construction of Bose-Chowla (contained in the aforementioned survey), the analog of Theorem 1 is true and there is no reason to suspect that it cannot be improved. In fact, in this case a proof also proceeds by the polynomial method.