Back in 2005, I rewrote Szemerédi’s original proof in order to understand it better; however, my rewrite ended up being about the same length as the original argument and was probably only usable to myself. In 2012, after Szemerédi was awarded the Abel prize, I revisited this argument with the intention of writing up a more readable version of the proof, but ended up just presenting some ingredients of the argument in a blog post, rather than trying to rewrite the whole thing. In that post, I suspected that the cleanest way to write up the argument would be through the language of nonstandard analysis (perhaps in an iterated hyperextension that could handle various hierarchies of infinitesimals), but was unable to actually achieve any substantial simplifications by passing to the nonstandard world.

A few weeks ago, I participated in a week-long workshop at the American Institute of Mathematics on “Nonstandard methods in combinatorial number theory”, and spent some time in a working group with Shabnam Akhtari, Irfam Alam, Renling Jin, Steven Leth, Karl Mahlburg, Paul Potgieter, and Henry Towsner to try to obtain a manageable nonstandard version of Szemerédi’s original proof. We didn’t end up being able to do so – in fact there are now signs that perhaps nonstandard analysis is not the optimal framework in which to place this argument – but we did at least clarify the existing standard argument, to the point that I was able to go back to my original rewrite of the proof and present it in a more civilised form, which I am now uploading here as an unpublished preprint. There are now a number of simplifications to the proof. Firstly, one no longer needs the full strength of the regularity lemma; only the simpler “weak” regularity lemma of Frieze and Kannan is required. Secondly, the proof has been “factored” into a number of stand-alone propositions of independent interest, in particular involving just (families of) one-dimensional arithmetic progressions rather than the complicated-looking multidimensional arithmetic progressions that occur so frequently in the original argument of Szemerédi. Finally, the delicate manipulations of densities and epsilons via double counting arguments in Szemerédi’s original paper have been abstracted into a certain key property of families of arithmetic progressions that I call the “double counting property”.

The factoring mentioned above is particularly simple in the case of proving Roth’s theorem, which is now presented separately in the above writeup. Roth’s theorem seeks to locate a length three progression in which all three elements lie in a single set. This will be deduced from an easier variant of the theorem in which one locates (a family of) length three progressions in which just the first two elements of the progression lie in a good set (and some other properties of the family are also required). This is in turn derived from an even easier variant in which now just the first element of the progression is required to be in the good set.
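To make the object of Roth’s theorem concrete, here is a minimal Python sketch (not taken from the writeup) that lists the non-trivial length three progressions contained in a finite set of integers. The function name `three_term_progressions` is hypothetical and chosen only for illustration.

```python
from itertools import combinations

def three_term_progressions(s):
    """Return all progressions (a, a+r, a+2r) with r > 0 whose three terms lie in s."""
    elems = set(s)
    found = []
    for a, c in combinations(sorted(elems), 2):
        # a < c are the endpoints; the midpoint must be an integer lying in the set
        if (a + c) % 2 == 0 and (a + c) // 2 in elems:
            found.append((a, (a + c) // 2, c))
    return found
```

For example, `three_term_progressions({1, 2, 4, 8})` returns an empty list, which is consistent with sets of this doubling type being sparse: Roth’s theorem only forces a progression once the set has positive upper density.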

More specifically, Roth’s theorem is now deduced from

Theorem 1.5. Let be a natural number, and let be a set of integers of upper density at least . Then, whenever is partitioned into finitely many colour classes, there exists a colour class and a family of 3-term arithmetic progressions with the following properties:

- For each , and lie in .
- For each , lie in .
- The for are in arithmetic progression.

The situation in this theorem is depicted by the following diagram, in which elements of are in blue and elements of are in grey:

Theorem 1.5 is deduced in turn from the following easier variant:

Theorem 1.6. Let be a natural number, and let be a set of integers of upper density at least . Then, whenever is partitioned into finitely many colour classes, there exists a colour class and a family of 3-term arithmetic progressions with the following properties:

- For each , lie in .
- For each , and lie in .
- The for are in arithmetic progression.

The situation here is described by the figure below.

Theorem 1.6 is easy to prove. To derive Theorem 1.5 from Theorem 1.6, or to derive Roth’s theorem from Theorem 1.5, one uses double counting arguments, van der Waerden’s theorem, and the weak regularity lemma, largely as described in this previous blog post; see the writeup for the full details. (I would be interested in seeing a shorter proof of Theorem 1.5 though that did not go through these arguments, and did not use the more powerful theorems of Roth or Szemerédi.)
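Of the tools listed above, van der Waerden’s theorem is the easiest to experiment with. As a small illustration (a brute-force sketch, not the argument of the writeup), the following Python code checks the classical smallest case: every 2-colouring of $\{1,\dots,9\}$ contains a monochromatic 3-term progression, while $\{1,\dots,8\}$ admits a colouring with none. The function names are hypothetical.

```python
from itertools import product

def has_mono_3ap(colouring):
    """colouring[i] is the colour of the integer i + 1; look for a monochromatic 3-term AP."""
    n = len(colouring)
    for a in range(n):
        # r ranges over the steps for which a + 2r is still a valid index
        for r in range(1, (n - 1 - a) // 2 + 1):
            if colouring[a] == colouring[a + r] == colouring[a + 2 * r]:
                return True
    return False

def every_colouring_has_mono_3ap(n, colours=2):
    """Exhaustively test all colourings of {1, ..., n}; only feasible for small n."""
    return all(has_mono_3ap(c) for c in product(range(colours), repeat=n))
```

The search is exponential in `n`, so this is only a sanity check for very small cases; the van der Waerden numbers needed in the actual argument are far beyond brute force.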


Suppose we take a subset $A \subseteq \mathbb{N}$ of the natural numbers and look at the sum of the reciprocals of its elements:

$$\sum_{a \in A} \frac{1}{a}.$$

One question we can ask here is: for which subsets $A$ does this sum converge? Though there are many routes one can take to try to answer this question, we look towards density.

The *natural density* of $A$ is defined to be

$$d(A) = \lim_{n \to \infty} \frac{|A \cap \{1, \dots, n\}|}{n}.$$

We think of this essentially as the probability of choosing an element of $A$ from the natural numbers. Let’s do some examples!

- We certainly have
- If then
- Let There are elements of less than so

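- Let $S = \{1, 4, 9, 16, \dots\}$ be the set of perfect squares. Then we need to know how many squares there are less than $n$. If $k^2 < n$, then $k < \sqrt{n}$, so there are about $\sqrt{n}$ squares less than $n$. Therefore the natural density of $S$ is $d(S) = \lim_{n \to \infty} \sqrt{n}/n = 0.$

Notice that $\sum_{n \in \mathbb{N}} 1/n$ (being the harmonic series) diverges and $d(\mathbb{N}) = 1.$ Also, $\sum_{n \in \mathbb{N}} 1/n^2$ (being the Riemann zeta function evaluated at $2$) converges to $\pi^2/6$ and we just showed $d(S) = 0.$ Perhaps this is our connection!

**(Wishful-Thinking “Theorem”)** For any $A \subseteq \mathbb{N}$, the sum $\sum_{a \in A} 1/a$ converges if and only if $d(A) = 0.$

As the name of the above “theorem” hints at, this statement is not true. Our counterexample is a prevalent character: the primes. Recall the prime number theorem, which states that the number of primes less than some number $n$ is asymptotically $n/\ln n.$ Then the natural density of the primes must be

$$\lim_{n \to \infty} \frac{n/\ln n}{n} = \lim_{n \to \infty} \frac{1}{\ln n} = 0.$$

But in 1737, Euler proved that $\sum_{p \text{ prime}} 1/p$ diverges! So our wishful-thinking theorem is not true. We call a set *bad* if it doesn’t satisfy our wishful-thinking theorem.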
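The contrast between the squares and the primes can be seen numerically. The following is a rough Python sketch (the helper names `primes_up_to`, `partial_density` and `partial_reciprocal_sum` are hypothetical) that computes the proportion of elements up to a cutoff $n$ and the partial sum of reciprocals up to $n$. A finite computation can only suggest a limit, of course, and the divergence of the prime sum is far too slow to observe directly.

```python
from math import isqrt

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, isqrt(n) + 1):
        if sieve[p]:
            # cross out the multiples of p starting from p*p
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if sieve[i]]

def partial_density(elements, n):
    """Proportion of {1, ..., n} occupied by the given elements."""
    return sum(1 for a in elements if a <= n) / n

def partial_reciprocal_sum(elements, n):
    """Sum of 1/a over the given elements a <= n."""
    return sum(1 / a for a in elements if a <= n)

n = 10**5
primes = primes_up_to(n)
squares = [k * k for k in range(1, isqrt(n) + 1)]
print(partial_density(primes, n), partial_reciprocal_sum(primes, n))
print(partial_density(squares, n), partial_reciprocal_sum(squares, n))
```

Both proportions are small at this cutoff, yet the reciprocal sum over the squares stays below $\pi^2/6$ while the one over the primes keeps growing (roughly like $\ln \ln n$), which is exactly the behaviour that makes the primes a bad set.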

To me, this failure means that the natural numbers are too “large” to detect all sets whose sum of reciprocals diverge. So what if we shrank it? For let’s define the *relative density* of in to be

Like the natural density, we think of the relative density of in to be the probability that we choose an element of out of In fact, notice that

As such, we get similar properties as the natural density, with some interesting twists!

- If then
- If then
- If , then
- If and , then
- Most generally, given any we have

As all other properties follow from 5, we will only prove that fact.

**(Proof of 5) ** This is essentially just a reordering of multiplication:

We shift all denominators to the right (cycling at the end) and find

Our main purpose in studying relative density is the hope that it will lead to an indicator set. We call a set an *indicator set* if for all we have

So is an indicator set if our wishful-thinking “theorem” is true when is replaced by We call it this because while is “too large”, is “small enough” to indicate the convergence of the sum of reciprocals of elements of There are a lot of interesting properties that indicator sets must have if they exist, but it turns out we can prove their nonexistence with two simple propositions.

**(Proposition 1)** If is an indicator set, then

**(Proof) **If we must have implying But so by the definition of , we must have This is a contradiction, so

The second proposition is not even a statement on indicator sets, but on relative density in general.

**(Proposition 2) **Given any , we have

**(Proof) ** We have

We swap the denominators to find

We can now prove our main theorem!

**(Main Theorem)** There does not exist an indicator set. That is, there does not exist a set such that for all , we have if and only if

**(Proof)** Suppose there exists an indicator set Let be any bad set (like the primes, for example). Then proposition 1 and 2 together tell us that Since is bad, we have and But since we also have Together, these give

which is clearly a contradiction. Therefore, cannot exist.

Note that a “left-indicator set” (as opposed to the “right-indicator set” discussed above) can quickly be shown to not exist as well. Suppose was a set such that for all

Then property 1 tells us that for any subset we have which would imply that for all subsets which certainly cannot be true.

While the proof of the nonexistence of these indicator sets ended up being fairly short, there are still interesting things going on here. There has been a lot of work relating natural density, sum of reciprocals, and the containment of arbitrarily long arithmetic progressions (ALAPs). These relate as follows.

If the text inside a box is green, it indicates that the row implies the column (with the text referencing the proof). If it is red, then it implies the row does not imply the column (with the text giving a counterexample). Blue text indicates that the implication is currently an open question.

- X – I have had trouble finding this in the literature, here is a stackexchange post with a proof.
- ST – Szemerédi’s theorem,
- P – Primes,
- E – Erdős’ conjecture,
- GT – Green-Tao theorem,
- Y – Take the set . This set contains arithmetic progressions of arbitrary length but its sum of reciprocals converges, since

Clearly there’s a lot to look at here, so hopefully this is not the last time you will read about relative densities on this blog!

Thanks for reading,

Jonathan Gerhard

For any natural number $N$, define $r_4(N)$ to be the largest cardinality of a subset $A$ of $[N] = \{1,\dots,N\}$ which does not contain any non-trivial arithmetic progressions $a, a+r, a+2r, a+3r$ of length four (where “non-trivial” means that $r$ is non-zero). Trivially we have $r_4(N) \leq N$. In 1969, Szemerédi showed that $r_4(N) = o(N)$. However, the decay rate that could be theoretically extracted from this argument (and from several subsequent proofs of this bound, including one by Roth) was quite poor. The first significant quantitative bound on this quantity was by Gowers, who showed that $r_4(N) \ll N (\log \log N)^{-c}$ for some absolute constant $c > 0$. In the second paper in the above-mentioned series, we managed to improve this bound to $r_4(N) \ll N \exp(-c \sqrt{\log \log N})$. In this paper, we improve the bound further to $r_4(N) \ll N (\log N)^{-c}$, which seems to be the limit of the methods. (We remark that if we could take $c$ to be larger than one, this would imply the length four case of a well known conjecture of Erdős that any set of natural numbers whose sum of reciprocals diverges would contain arbitrarily long arithmetic progressions. Thanks to the work of Sanders and of Bloom, the corresponding case of the conjecture for length three progressions is nearly settled, as it is known that for the analogous bound on $r_3(N)$ one can take any $c$ less than one.)
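For very small values of the parameter, the quantity defined above (the largest size of a subset of $\{1,\dots,N\}$ with no non-trivial length four progression) can be computed by exhaustive search. The following Python sketch is purely illustrative and has nothing to do with the methods of the paper; the names `has_4ap` and `r4` are hypothetical, and the running time is exponential in `n`.

```python
from itertools import combinations

def has_4ap(subset):
    """True if the set contains a, a+r, a+2r, a+3r for some r > 0."""
    s = set(subset)
    m = max(s)
    for a in s:
        r = 1
        while a + 3 * r <= m:
            if a + r in s and a + 2 * r in s and a + 3 * r in s:
                return True
            r += 1
    return False

def r4(n):
    """Largest size of a subset of {1, ..., n} with no non-trivial 4-term progression."""
    for size in range(n, 0, -1):
        for subset in combinations(range(1, n + 1), size):
            if not has_4ap(subset):
                return size
    return 0
```

For instance `r4(7)` is `5`: deleting the single element 4 destroys every progression of step one in $\{1,\dots,7\}$ but leaves $1, 3, 5, 7$, so six elements are never enough.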

Most of the previous work on bounding $r_4(N)$ relied in some form or another on the *density increment argument* introduced by Roth back in 1953; roughly speaking, the idea is that if a dense subset $A$ of $[N]$ fails to contain arithmetic progressions of length four, one then seeks to locate a long subprogression of $[N]$ in which $A$ has increased density. This was the basic method for instance underlying our previous bound $r_4(N) \ll N \exp(-c \sqrt{\log \log N})$, as well as a finite field analogue of the bound $r_4(N) \ll N (\log N)^{-c}$; however we encountered significant technical difficulties for several years in extending this argument to obtain the result of the current paper. Our method is instead based on “energy increment arguments”, and more specifically on establishing a quantitative version of a Khintchine-type recurrence theorem, similar to the qualitative recurrence theorems established (in the ergodic theory context) by Bergelson-Host-Kra, and (in the current combinatorial context) by Ben Green and myself.

One way to phrase the latter recurrence theorem is as follows. Suppose that $A \subset [N]$ has density $\delta$. Then one would expect a “randomly” selected arithmetic progression ${\bf r}, {\bf r}+{\bf h}, {\bf r}+2{\bf h}, {\bf r}+3{\bf h}$ in $[N]$ (using the convention that random variables will be in boldface) to be contained in $A$ with probability about $\delta^4$. This is not true in general; however, it was shown by Ben and myself that for any $\eta > 0$, there was a set of shifts $h \in [-N, N]$ of cardinality $\gg_{\delta,\eta} N$, such that for any such $h$ one had

$${\bf P}( {\bf r}, {\bf r}+h, {\bf r}+2h, {\bf r}+3h \in A ) \geq \delta^4 - \eta$$

if ${\bf r}$ was chosen uniformly at random from $[N]$. This easily implies that $r_4(N) = o(N)$, but does not give a particularly good bound on the decay rate, because the implied constant in the cardinality lower bound $\gg_{\delta,\eta} N$ is quite poor (in fact of tower-exponential type, due to the use of regularity lemmas!), and so one has to take $N$ to be extremely large compared to $\delta, \eta$ to avoid the possibility that the set of shifts in the above theorem consists only of the trivial shift $h = 0$.

We do not know how to improve the lower bound on the set of shifts to the point where it can give bounds that are competitive with those in this paper. However, we can obtain better quantitative results if we permit ourselves to *couple* together the two parameters ${\bf r}$ and ${\bf h}$ of the length four progression. Namely, with $A$, $\delta$, $\eta$ as above, we are able to show that there exist random variables ${\bf r}, {\bf h}$, not necessarily independent, such that

$${\bf P}( {\bf r}, {\bf r}+{\bf h}, {\bf r}+2{\bf h}, {\bf r}+3{\bf h} \in A ) \geq \delta^4 - \eta \ \ \ \ \ (1)$$

and such that we have the non-degeneracy bound

$${\bf P}( {\bf h} = 0 ) \ll \exp( \eta^{-O(1)} ) / N.$$

This then easily implies the main theorem.

The energy increment method is then deployed to locate a good pair $({\bf r}, {\bf h})$ of random variables that will obey the above bounds. One can get some intuition on how to proceed here by considering some model cases. Firstly one can consider a “globally quadratically structured” case in which the indicator function $1_A$ “behaves like” a globally quadratic function such as $F(\alpha n^2)$, for some irrational $\alpha$ and some smooth periodic function $F: {\bf R}/{\bf Z} \to {\bf R}$ of mean $\delta$. If one then takes ${\bf r}, {\bf h}$ to be uniformly distributed in $[N]$ and $[-\varepsilon N, \varepsilon N]$ respectively for some small $\varepsilon > 0$, with no coupling between the two variables, then the left-hand side of (1) is approximately of the form

$$\int_{(x,y,z,w) \in ({\bf R}/{\bf Z})^4: x - 3y + 3z - w = 0} F(x) F(y) F(z) F(w) \ \ \ \ \ (2)$$

where the integral is with respect to the probability Haar measure, and the constraint $x - 3y + 3z - w = 0$ ultimately arises from the algebraic constraint

$$\alpha r^2 - 3 \alpha (r+h)^2 + 3 \alpha (r+2h)^2 - \alpha (r+3h)^2 = 0.$$

However, an application of the Cauchy-Schwarz inequality and Fubini’s theorem shows that the integral in (2) is at least $(\int_{{\bf R}/{\bf Z}} F)^4$, which (morally at least) gives (1) in this case.
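The algebraic constraint behind this computation is, as far as I can tell, the standard fact that the alternating sum $q(r) - 3q(r+h) + 3q(r+2h) - q(r+3h)$ (a third finite difference) vanishes for every polynomial $q$ of degree at most two. The Python sketch below checks that identity in exact rational arithmetic on random inputs; the function names are hypothetical, and a random test is of course only a sanity check, not a proof.

```python
from fractions import Fraction
import random

def third_difference(q, r, h):
    """q(r) - 3 q(r+h) + 3 q(r+2h) - q(r+3h): alternating sum along a 4-term progression."""
    return q(r) - 3 * q(r + h) + 3 * q(r + 2 * h) - q(r + 3 * h)

def quadratics_are_annihilated(trials=1000, seed=0):
    """Check the identity exactly on random quadratics a n^2 + b n + c and random r, h."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (Fraction(rng.randint(-50, 50), rng.randint(1, 50)) for _ in range(3))
        r, h = rng.randint(-10**6, 10**6), rng.randint(-10**6, 10**6)
        if third_difference(lambda n: a * n * n + b * n + c, r, h) != 0:
            return False
    return True
```

By contrast, `third_difference(lambda n: n**3, 0, 1)` is non-zero, which reflects why quadratic (rather than cubic) structure is the relevant obstruction for progressions of length four.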

Due to the nature of the energy increment argument, it also becomes necessary to consider “locally quadratically structured” cases, in which $[N]$ is partitioned into some number of structured pieces $B_c$ (think of these as arithmetic progressions, or as “Bohr sets”), and on each piece $B_c$, $1_A$ behaves like a locally quadratic function such as $F_c(\alpha_c n^2)$, where $\alpha_c$ now varies with $c$, and the mean of $F_c$ will be approximately $\delta$ on the average after averaging in $c$ (weighted by the size of the pieces $B_c$). Now one should select ${\bf r}$ and ${\bf h}$ in the following coupled manner: first one chooses ${\bf r}$ uniformly from $[N]$, then one defines ${\bf c}$ to be the label $c$ such that ${\bf r} \in B_c$, and then selects ${\bf h}$ uniformly from a set $B_{c,\varepsilon}$ which is related to $B_c$ in much the same way that $[-\varepsilon N, \varepsilon N]$ is related to $[N]$. If one does this correctly, the analogue of (2) becomes

and one can again use Cauchy-Schwarz and Fubini’s theorem to conclude.

The general case proceeds, very roughly, by an iterative argument. At each stage of the iteration, one has some sort of quadratic model of $1_A$ which involves a decomposition of $[N]$ into structured pieces $B_c$, and a quadratic approximation to $1_A$ on each piece. If this approximation is accurate enough (or more precisely, if a certain (averaged) local Gowers uniformity norm $U^3$ of the error is small enough) to model the count in (1) (for random variables ${\bf r}, {\bf h}$ determined by the above partition of $[N]$ into pieces $B_c$), and if the frequencies (such as $\alpha_c$) involved in the quadratic approximation are “high rank” or “linearly independent over the rationals” in a suitably quantitative sense, then some version of the above arguments can be made to work. If there are some unwanted linear dependencies in the frequencies, we can do some linear algebra to eliminate one of the frequencies (using some geometry of numbers to keep the quantitative bounds under control) and continue the iteration. If instead the approximation is too inaccurate, then the error will be large in a certain averaged local Gowers uniformity norm $U^3$. A significant fraction of the paper is then devoted to establishing a quantitative *inverse theorem* for that norm that concludes (with good bounds) that the error must then locally correlate with locally quadratic phases, which can be used to refine the quadratic approximation to $1_A$ in a manner that significantly increases its “energy” (basically an $L^2$ norm). Such energy increments cannot continue indefinitely, and when they terminate we obtain the desired claim.

There are existing inverse theorems for $U^3$ type norms in the literature, going back to the pioneering work of Gowers mentioned previously, and relying on arithmetic combinatorics tools such as Freiman’s theorem and the Balog-Szemerédi-Gowers lemma, which are good for analysing the “$1\%$-structured homomorphisms” that arise in Gowers’ argument. However, when we applied these methods to the local Gowers norms we obtained inferior quantitative results that were not strong enough for our application. Instead, we use arguments from a different paper of Gowers in which he tackled Szemerédi’s theorem for arbitrary length progressions. This method produces “$99\%$-structured homomorphisms” associated to any function with large Gowers uniformity norm; however the catch is that such homomorphisms are initially supported only on a sparse unstructured set, rather than a structured set such as a Bohr set. To proceed further, one first has to locate inside the sparse unstructured set a sparse *pseudorandom* subset of a Bohr set, and then use “error-correction” type methods (such as “majority-vote” based algorithms) to locally upgrade this $99\%$-structured homomorphism on pseudorandom subsets of Bohr sets to a $100\%$-structured homomorphism on the entirety of a Bohr set. It is then possible to use some “approximate cohomology” tools to “integrate” these homomorphisms (and discern a key “local symmetry” property of these homomorphisms) to locate the desired local quadratic structure (in much the same fashion that a $1$-form on ${\bf R}^n$ that varies linearly with the coordinates can be integrated to be the derivative of a quadratic function if we know that the $1$-form is closed). These portions of the paper are unfortunately rather technical, but broadly follow the methods already used in previous literature.

The conference opened with a talk by Yoav Segev on his construction, with Eliahu Rips and Katrin Tent, of infinite non-split sharply 2-transitive groups.

A permutation group is sharply 2-transitive if any pair of distinct elements of the domain can be mapped to any other such pair by a unique element of the group. All such finite groups are known, and indeed it is easy to prove that such a finite group is “split”, that is, has a transitive abelian normal subgroup (and hence is the one-dimensional affine group over a finite nearfield – all finite nearfields were determined by Zassenhaus in the 1930s). It is a long-standing open problem whether there exist non-split infinite sharply 2-transitive groups.
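The split examples mentioned here are easy to check by machine in the smallest cases. As an illustration only (the function names are hypothetical), the following Python sketch verifies by brute force that the one-dimensional affine group $x \mapsto ax + b$ ($a \neq 0$) over the field of integers modulo a prime $p$ is sharply 2-transitive on its $p$ points.

```python
from itertools import permutations

def affine_maps(p):
    """All pairs (a, b) with a != 0, standing for the map x -> a*x + b (mod p)."""
    return [(a, b) for a in range(1, p) for b in range(p)]

def is_sharply_2_transitive(p):
    """Check that each ordered pair of distinct points maps to each such pair by exactly one map."""
    maps = affine_maps(p)
    pairs = list(permutations(range(p), 2))
    for x1, x2 in pairs:
        for y1, y2 in pairs:
            hits = sum(
                1
                for a, b in maps
                if (a * x1 + b) % p == y1 and (a * x2 + b) % p == y2
            )
            if hits != 1:
                return False
    return True
```

The check succeeds for `p = 5` and `p = 7`, and fails for the modulus 4, where the integers mod 4 do not form a field and some of the maps are not even permutations.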

Sharply 2-transitive groups give rise to a special low-rank class of independence algebras, a topic that Csaba Szabó and I worked on during my first visit to Budapest more than 22 years ago (at a time when the third McDonald’s had only just opened in Budapest). So it was very interesting to me to hear about this new construction.

The construction is remarkably easy. Yoav took us through the entire thing, apart from some calculations with normal forms in HNN extensions and free products. But the examples seem to be very diverse. In fact they show that any group whatever can be embedded as a subgroup in a sharply 2-transitive group.

In a sharply 2-transitive group, any two points are interchanged by a unique element, which is an involution; all involutions are conjugate, and so all fix the same number of points, which is either 0 or 1. Their construction deals with the former case (which they call “characteristic 2”). It can be formulated in group-theoretic terms. The final result *G* is a group with a subgroup *A* having the properties that any two conjugates of *A* intersect in the identity, that there are only two *A*–*A* double cosets in *G*, and that *A* contains no involutions. (*A* is the point stabiliser.) So start with any pair *G*_{0}, *A*_{0} having the first and third properties; if there are more than two double cosets, add a new generator to unify two of them. This may create new double cosets, but “in the limit” (repeating the construction enough times) the tortoise catches up with the hare and all the double cosets outside *A* are pulled into one. Now to get the announced result, we can take *G*_{0} to be any group and *A*_{0} to be the trivial group.

Balázs Szegedy gave us a very interesting talk on how nilpotent groups force themselves into additive combinatorics (e.g. Szemerédi’s theorem on arithmetic progressions in dense sets of integers) whether we like it or not. Roth proved the theorem for 3-term arithmetic progressions using Fourier analysis; a similar proof of Szemerédi’s theorem involves “higher order Fourier analysis”. Balázs has axiomatised the appropriate objects (which he calls *nilspaces*) in terms of “cubes”. The axioms are simple enough but the objects captured include nilmanifolds. I cannot really do justice to the talk in a single paragraph; but the claim is that shadows of these objects already appear in Szemerédi’s proof (although it appears to be just complicated combinatorics).

The day ended with Evgeny Vdovin talking about the conjecture that, if a transitive finite permutation group with no soluble normal subgroup has the property that the point stabiliser is soluble, then the group has a base of bounded size (that is, there is a set of points of bounded size whose pointwise stabiliser is the identity). The bound has been variously conjectured to be 7 or 5. This is almost certain to be true. Evgeny’s method shows the importance of choosing the right induction hypothesis. He works down a specially chosen composition series for the group, but the induction hypothesis “there is a base of size *k*” cannot be made to work (there are counterexamples). Instead, he has to take the hypothesis “there are at least 5 regular orbits on *k*-tuples”. Why 5? I don’t know, but it seems to work!

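Theorem 1 (Szemerédi’s theorem) Let $k$ be a positive integer, and let $f: {\bf Z}/N{\bf Z} \to [0,1]$ be a function with ${\bf E}_{x \in {\bf Z}/N{\bf Z}} f(x) \geq \delta$ for some $\delta > 0$, where we use the averaging notation ${\bf E}_{x \in A} f(x) := \frac{1}{|A|} \sum_{x \in A} f(x)$, ${\bf E}_{x,r \in A} f(x,r) := \frac{1}{|A|^2} \sum_{x,r \in A} f(x,r)$, etc. Then for $k \geq 3$ we have

$${\bf E}_{x,r \in {\bf Z}/N{\bf Z}} f(x) f(x+r) \dots f(x+(k-1)r) \geq c(k,\delta)$$

for some $c(k,\delta) > 0$ depending only on $k, \delta$.

The equivalence is basically thanks to an averaging argument of Varnavides; see for instance Chapter 11 of my book with Van Vu or this previous blog post for a discussion. We have removed the cases $k = 1, 2$ as they are trivial and somewhat degenerate.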
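For readers who like to see such averages computed, here is a small Python sketch of the finitary quantity in Theorem 1, assuming it is the usual normalised count of $k$-term progressions $x, x+r, \dots, x+(k-1)r$ in a cyclic group, weighted by a function on that group. The name `ap_average` and the representation of the function as a list are choices made only for this illustration.

```python
def ap_average(f, N, k):
    """Average over x, r in Z/NZ of f(x) f(x+r) ... f(x+(k-1)r); f is a list of length N."""
    total = 0.0
    for x in range(N):
        for r in range(N):
            term = 1.0
            for j in range(k):
                term *= f[(x + j * r) % N]
            total += term
    return total / (N * N)
```

Note that the degenerate step $r = 0$ is included in the average, so for the indicator function of a single point the average is $1/N^2$ rather than zero; the content of the theorem is that the lower bound does not decay with the size of the group.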

There are now many proofs of this theorem. Some time ago, I took an ergodic-theoretic proof of Furstenberg and converted it to a purely finitary proof of the theorem. The argument used some simplifying innovations that had been developed since the original work of Furstenberg (in particular, deployment of the Gowers uniformity norms, as well as a “dual” norm that I called the uniformly almost periodic norm, and an emphasis on van der Waerden’s theorem for handling the “compact extension” component of the argument). But the proof was still quite messy. However, as discussed in this previous blog post, messy finitary proofs can often be cleaned up using nonstandard analysis. Thus, there should be a nonstandard version of the Furstenberg ergodic theory argument that is relatively clean. I decided (after some encouragement from Ben Green and Isaac Goldbring) to write down most of the details of this argument in this blog post, though for sake of brevity I will skim rather quickly over arguments that were already discussed at length in other blog posts. In particular, I will presume familiarity with nonstandard analysis (in particular, the notion of a standard part of a bounded real number, and the Loeb measure construction), see for instance this previous blog post for a discussion.

By routine “compactness and contradiction” arguments (as discussed in this previous post), Theorem 1 can be deduced from the following nonstandard variant:

Theorem 2 Let be a nonstandard positive integer, let be the nonstandard cyclic group , and let be an internal function with . Then for any standard , Here of course the averaging notation is interpreted internally.

Indeed, if Theorem 1 failed, one could create a sequence of functions of density at least for some fixed , and a fixed such that

taking ultralimits one can then soon obtain a counterexample to Theorem 2.

It remains to prove Theorem 2. Henceforth is a fixed nonstandard positive integer, and . By the Loeb measure construction (discussed in this previous blog post), one can give the structure of a probability space (the *Loeb space* of ), such that every internal subset of is (Loeb) measurable with

which implies that any bounded internal function has standard part which is (Loeb) measurable with

Conversely, a countable saturation argument shows that any function in is equal almost everywhere to the standard part of a bounded internal function.

From Hölder’s inequality we see that the -linear form

vanishes if one of the has standard part vanishing almost everywhere. As such, we can (by abuse of notation) extend this -linear form to functions that are elements of , rather than bounded internal functions. With this convention, we see that Theorem 2 is equivalent to the following assertion.

Theorem 3 For any non-negative with , one has for any standard ,

The next step is to introduce the *Gowers-Host-Kra uniformity seminorms* , defined for by the formula

where is any bounded internal function whose standard part equals almost everywhere. From Hölder’s inequality one can see that the exact choice of does not matter, so that this seminorm is well-defined. (It is indeed a seminorm, but we will not need this fact here.)
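In the finitary setting, the analogous uniformity norm of a real-valued function on a cyclic group is usually defined by averaging the product of the function over the vertices $x + \omega_1 h_1 + \dots + \omega_k h_k$ of all $k$-dimensional “cubes” and then taking a $2^k$-th root. The Python sketch below implements that finite analogue under this assumption; it is restricted to real-valued functions (so no complex conjugations are needed), the name `gowers_norm` is hypothetical, and the running time grows like $N^{k+1} 2^k$.

```python
from itertools import product

def gowers_norm(f, N, k):
    """Finite U^k norm of a real-valued f on Z/NZ, given as a list of length N."""
    total = 0.0
    for x in range(N):
        for hs in product(range(N), repeat=k):
            term = 1.0
            # multiply f over the 2^k vertices x + omega . h of the cube
            for omega in product((0, 1), repeat=k):
                shift = sum(w * h for w, h in zip(omega, hs))
                term *= f[(x + shift) % N]
            total += term
    avg = total / N ** (k + 1)
    # the average is non-negative in exact arithmetic; guard against rounding error
    return max(avg, 0.0) ** (1.0 / 2 ** k)
```

For $k = 1$ this reduces to the absolute value of the mean of the function, which is one way to see why only a seminorm is obtained at the lowest level.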

We have the following application of the van der Corput inequality:

Theorem 4 (Generalised von Neumann theorem) Let be standard. For any with for some , one has

This estimate is proven in numerous places in the literature (e.g. Lemma 11.4 of my book with Van Vu, or Exercise 23 of this blog post) and will not be repeated here. In particular, from multilinearity we see that

Dual to the Gowers norms are the uniformly almost periodic norms . Let us first define the internal version of these norms. We define to be the space of constant internal functions , with internal norm . Once is defined for some , we define to be the internal normed vector space of internal functions for which there exists a nonstandard real number , an internally finite non-empty set , an internal family of internal functions bounded in magnitude by one for each , and an internal family of internal functions in the unit ball of such that one had the representation

for all , where is the shift of by . The internal infimum of all such is then the norm of . This gives each of the the structure of an internal shift-invariant Banach algebra; see Section 5 of . The norms also controlled the supremum norm:

In particular, if we write for the space of standard parts of internal functions of bounded norm in , then is an (external) Banach algebra contained (as a real vector space) in . For , we can then define a factor of to be the probability space , where is the subalgebra of consisting of those sets such that lies in the closure of . This is easily seen to be a shift-invariant -algebra, and so is a factor.

We have the following key *characteristic factor* relationship:

Theorem 5 Let with . Then .

In fact the converse implication is true also (making the *universal characteristic factor* for the seminorm), but we will not need this direction of the implication.

*Proof:* Suppose for contradiction that ; we can normalise . Writing for some bounded internal , we then see that has a non-zero inner product with , where the *dual function* for is the bounded internal function

From the easily verified identity

and a routine induction, we see that lies in the unit ball of , and so is measurable with respect to . By hypothesis this implies that is orthogonal to , a contradiction.

In view of the above theorem and (1), we may replace by without affecting the average in Theorem 3. Thus that theorem is equivalent to the following.

Theorem 6 Let and be standard. Then for any non-negative with , one has

We only apply this theorem in the case and , but for inductive purposes it is convenient to decouple the two parameters.

We prove Theorem 6 by induction on (allowing to be arbitrary). When , the claim is obvious for any because all functions in are essentially constant. Now suppose that and that the claim has already been proven for .

Let be a nonnegative function whose mean is positive; we may normalise to take values in . Let be standard, and let be a sufficiently small standard quantity depending on to be chosen later (one could for instance take , but we will not attempt to optimise in ). As is -measurable, one can find an internal function with and bounded norm such that . (Note though that while the norm of is bounded, this bound could be extremely large compared to , , .)

Set . We define the relative inner product for by the formula

and the relative norm

This gives the structure of a (pre-)Hilbert module over , as discussed in this previous blog post.

A crucial point is that the function is *relatively almost periodic* over the previous characteristic factor , in the following sense.

Proposition 7 (Relative almost periodicity) There exists a standard natural number and functions in the unit ball of with the following “relative total boundedness” property: for any , there exists a -measurable function such that almost everywhere (where is short-hand for ).

*Proof:* This will be a relative version of the standard analysis fact that integral operators on finite measure spaces with bounded kernel are in the Hilbert-Schmidt class, and thus compact.

By construction, there exists an internally finite non-empty set , an internal collection of internal functions that are uniformly bounded in , and an internal collection of internal functions that are uniformly bounded in , such that

for all . Note in particular that the all lie in a bounded subset of , and the all lie in a bounded subset of .

We give the -algebra generated from the standard parts of bounded internal functions such that the standard parts of all lie in a bounded subset of ; this gives a probability space that extends the product measure of and . We define an operator as follows. If , then is the standard part of some bounded internal function . We then define by the formula

This can easily be seen to not depend on the choice of , and defines a -linear operator (embedding into both and in the obvious fashion). Note that lies in the range of applied to a function in the unit ball of .

Now we claim that this operator is *relatively Hilbert-Schmidt* over , in the sense that there exists a finite bound such that

for all finite collections of functions that are relatively orthonormal over in the sense that

and

for all and . Indeed, the left-hand side of (4) may be expanded first as

for some sequence in with , and then as

where we use Loeb measure on and is the function , and are lifted up to in the obvious fashion. By Cauchy-Schwarz and the boundedness of , we can bound this by

But the are relatively orthonormal over (this reflects the relative orthogonality of and over ), so that

and the claim follows from the hypotheses on .

Using the relative spectral theorem for relative Hilbert-Schmidt operators (see Corollary 17 of this blog post), we may thus find relatively orthonormal systems in and respectively over and a non-increasing sequence of non-negative coefficients (the relative singular values) with almost everywhere, such that we have the spectral decomposition

with the sum converging in . (If were standard Borel spaces, one could deduce this theorem from the usual spectral theorem for Hilbert-Schmidt operators using disintegration. Loeb spaces are certainly not standard Borel, but as discussed in the linked blog post above, one can adapt the *proof* of the spectral theorem to the relative setting without using the device of disintegration.)

Since and the are decreasing, one can find an such that almost everywhere for all . For in the unit ball of , this lets one approximate by the finite rank operator to within almost everywhere in norm. If one rounds to the nearest multiple of for each , and lets be the collection of linear combinations of the form with a multiple of , we obtain the claim.

We return to the proof of (2). Since and , we have

if is small enough. In particular there is a -measurable set of measure at least such that on . Since

we see from Markov’s inequality (for small enough ) that there is a -measurable subset of of measure at least such that

for the relative norm. In particular we have

Let be a sufficiently large standard natural number (depending on and the quantity from Proposition 7; in fact it will essentially be a van der Waerden number of these inputs) to be chosen later. Applying the induction hypothesis, we have

In particular, there is a standard , such that for in a subset of of measure at least , we have

or equivalently that the set

has measure at least .

Let be as above, and let be the functions from Proposition 7. Then for , we can find a measurable function such that

almost everywhere on , hence by (5) we have

almost everywhere on . From this and the relative Hölder inequality, we see that

a.e. on whenever .

Now, for large enough, we see from van der Waerden’s theorem that there exist measurable such that

almost everywhere in , and hence in (this can be seen by partitioning into finitely many pieces, with each of the constant on each of these pieces). For that choice of we have

and

and thus

almost everywhere on . But from (6) one has

a.e. on , so from Hölder’s inequality we have (for sufficiently small) that

From non-negativity of , this implies that

which on integrating in gives

Averaging in , we conclude that

Shifting by , we conclude that

Dilating by (and noting that the map is at most -to-one on ), we conclude that

and (2) follows.


Theorem 1 (Szemerédi’s theorem in the primes) Let be a subset of the primes of positive relative density, thus . Then contains arbitrarily long arithmetic progressions.

This result was based in part on an earlier paper of Green that handled the case of progressions of length three. With the primes replaced by the integers, this is of course the famous theorem of Szemerédi.

Szemerédi’s theorem has now been generalised in many different directions. One of these is the multidimensional Szemerédi theorem of Furstenberg and Katznelson, who used ergodic-theoretic techniques to show that any dense subset of necessarily contained infinitely many constellations of any prescribed shape. Our main result is to relativise that theorem to the primes as well:

Theorem 2 (Multidimensional Szemerédi theorem in the primes) Let , and let be a subset of the Cartesian power of the primes of positive relative density, thus . Then for any , contains infinitely many “constellations” of the form with and a positive integer.

In the case when is itself a Cartesian product of one-dimensional sets (in particular, if is all of ), this result already follows from Theorem 1, but there does not seem to be a similarly easy argument to deduce the general case of Theorem 2 from previous results. Simultaneously with this paper, an independent proof of Theorem 2 using a somewhat different method has been established by Cook, Magyar, and Titichetrakun.

The result is reminiscent of an earlier result of mine on finding constellations in the Gaussian primes (or dense subsets thereof). That paper followed closely the arguments of my original paper with Ben Green, namely it first enclosed (a W-tricked version of) the primes or Gaussian primes (in a sieve-theoretic sense) by a slightly larger set (or more precisely, a weight function ) of *almost primes* or *almost Gaussian primes*, which one could then verify (using methods closely related to the sieve-theoretic methods in the ongoing Polymath8 project) to obey certain pseudorandomness conditions, known as the *linear forms condition* and the *correlation condition*. Very roughly speaking, these conditions assert statements of the following form: if is a randomly selected integer, then the events of simultaneously being an almost prime (or almost Gaussian prime) are approximately independent for most choices of . Once these conditions are satisfied, one can then run a *transference argument* (initially based on ergodic-theory methods, but nowadays there are simpler transference results based on the Hahn-Banach theorem, due to Gowers and Reingold-Trevisan-Tulsiani-Vadhan) to obtain relative Szemerédi-type theorems from their absolute counterparts.

However, when one tries to adapt these arguments to sets such as , a new difficulty occurs: the natural analogue of the almost primes would be the Cartesian square of the almost primes – pairs whose entries are both almost primes. (Actually, for technical reasons, one does not work directly with a set of almost primes, but would instead work with a weight function such as that is concentrated on a set such as , but let me ignore this distinction for now.) However, this set does *not* enjoy as many pseudorandomness conditions as one would need for a direct application of the transference strategy to work. More specifically, given any fixed , and random , the four events

do *not* behave independently (as they would if were replaced for instance by the Gaussian almost primes), because any three of these events imply the fourth. This blocks the transference strategy for constellations which contain some right-angles to them (e.g. constellations of the form ) as such constellations soon turn into rectangles such as the one above after applying Cauchy-Schwarz a few times. (But a few years ago, Cook and Magyar showed that if one restricted attention to constellations which were in general position in the sense that any coordinate hyperplane contained at most one element in the constellation, then this obstruction does not occur and one can establish Theorem 2 in this case through the transference argument.) It’s worth noting that very recently, Conlon, Fox, and Zhao have succeeded in removing one of the pseudorandomness conditions (namely the correlation condition) from the transference principle, leaving only the linear forms condition as the remaining pseudorandomness condition to be verified, but unfortunately this does not completely solve the above problem because the linear forms condition also fails for (or for weights concentrated on ) when applied to rectangular patterns.

There are now two ways known to get around this problem and establish Theorem 2 in full generality. The approach of Cook, Magyar, and Titichetrakun proceeds by starting with one of the known proofs of the multidimensional Szemerédi theorem – namely, the proof that proceeds through hypergraph regularity and hypergraph removal – and attaching pseudorandom weights directly within the proof itself, rather than trying to add the weights to the *result* of that proof through a transference argument. (A key technical issue is that weights have to be added to all the levels of the hypergraph – not just the vertices and top-order edges – in order to circumvent the failure of naive pseudorandomness.) As one has to modify the entire proof of the multidimensional Szemerédi theorem, rather than use that theorem as a black box, the Cook-Magyar-Titichetrakun argument is lengthier than ours; on the other hand, it is more general and does not rely on some difficult theorems about primes that are used in our paper.

In our approach, we continue to use the multidimensional Szemerédi theorem (or more precisely, the equivalent theorem of Furstenberg and Katznelson concerning multiple recurrence for commuting shifts) as a black box. The difference is that instead of using a transference principle to connect the relative multidimensional Szemerédi theorem we need to the multiple recurrence theorem, we instead proceed by a version of the Furstenberg correspondence principle, similar to the one that connects the absolute multidimensional Szemerédi theorem to the multiple recurrence theorem. I had discovered this approach many years ago in an unpublished note, but had abandoned it because it required an *infinite* number of linear forms conditions (in contrast to the transference technique, which only needed a finite number of linear forms conditions and (until the recent work of Conlon-Fox-Zhao) a correlation condition). The reason for this infinite number of conditions is that the correspondence principle has to build a probability measure on an entire -algebra; for this, it is not enough to specify the measure of a single set such as , but one also has to specify the measure of “cylinder sets” such as where could be arbitrarily large. The larger gets, the more linear forms conditions one needs to keep the correspondence under control.

With the sieve weights we were using at the time, standard sieve theory methods could indeed provide a finite number of linear forms conditions, but not an infinite number, so my idea was abandoned. However, with my later work with Green and Ziegler on linear equations in primes (and related work on the Möbius-nilsequences conjecture and the inverse conjecture on the Gowers norm), Tamar and I realised that the primes themselves obey an infinite number of linear forms conditions, so one can basically use the primes (or a proxy for the primes, such as the von Mangoldt function ) as the enveloping sieve weight, rather than a classical sieve. Thus my old idea of using the Furstenberg correspondence principle to transfer Szemerédi-type theorems to the primes could actually be realised. In the one-dimensional case, this simply produces a much more complicated proof of Theorem 1 than the existing one; but it turns out that the argument works as well in higher dimensions and yields Theorem 2 relatively painlessly, except for the fact that it needs the results on linear equations in primes, the known proofs of which are extremely lengthy (and also require some of the transference machinery mentioned earlier). The problem of correlations in rectangles is avoided in the correspondence principle approach because one can compensate for such correlations by performing a suitable weighted limit to compute the measure of cylinder sets, with each requiring a different weighted correction. (This may be related to the Cook-Magyar-Titichetrakun strategy of weighting all of the facets of the hypergraph in order to recover pseudorandomness, although our contexts are rather different.)

and the more complicated “expensive” argument gave the improvement

for some constant depending only on .

Unfortunately, while the cheap argument is correct, we discovered a subtle but serious gap in our expensive argument in the original paper. Roughly speaking, the strategy in that argument is to employ the *density increment method*: one begins with a large subset of that has no arithmetic progressions of length , and seeks to locate a subspace on which has a significantly increased density. Then, by using a “Koopman-von Neumann theorem”, ultimately based on an iteration of the inverse theorem of Ben and myself (and also independently by Samorodnitsky), one approximates by a “quadratically structured” function , which is (locally) a combination of a bounded number of quadratic phase functions, which one can prepare to be in a certain “locally equidistributed” or “locally high rank” form. (It is this reduction to the high rank case that distinguishes the “expensive” argument from the “cheap” one.) Because has no progressions of length , the count of progressions of length weighted by will also be small; by combining this with the theory of equidistribution of quadratic phase functions, one can then conclude that there will be a subspace on which has increased density.

The error in the paper was to conclude from this that the original function also had increased density on the same subspace; it turns out that the manner in which approximates is not strong enough to deduce this latter conclusion from the former. (One can strengthen the nature of approximation until one restores such a conclusion, but only at the price of deteriorating the quantitative bounds on one gets at the end of the day to be worse than the cheap argument.)

After trying unsuccessfully to repair this error, we eventually found an alternate argument, based on earlier papers of ourselves and of Bergelson-Host-Kra, that avoided the density increment method entirely and ended up giving a simpler proof of a stronger result than (1), and also gives the explicit value of for the exponent in (1). In fact, it gives the following stronger result:

Theorem 1 Let be a subset of of density at least , and let . Then there is a subspace of of codimension such that the number of (possibly degenerate) progressions in is at least .

The bound (1) is an easy consequence of this theorem after choosing and removing the degenerate progressions from the conclusion of the theorem.

The main new idea is to work with a *local* Koopman-von Neumann theorem rather than a global one, trading a relatively weak global approximation to with a significantly stronger local approximation to on a subspace . This is somewhat analogous to how sometimes in graph theory it is more efficient (from the point of view of quantitative estimates) to work with a local version of the Szemerédi regularity lemma which gives just a single regular pair of cells, rather than attempting to regularise almost all of the cells. This local approach is well adapted to the inverse theorem we use (which also has this local aspect), and also makes the reduction to the high rank case much cleaner. At the end of the day, one ends up with a fairly large subspace on which is quite dense (of density ) and which can be well approximated by a “pure quadratic” object, namely a function of a small number of quadratic phases obeying a high rank condition. One can then exploit a special positivity property of the count of length four progressions weighted by pure quadratic objects, essentially due to Bergelson-Host-Kra, which then gives the required lower bound.

As I was on the Abel prize committee this year, I won’t comment further on the prize, but will instead focus on what is arguably Endre’s most well known result, namely Szemerédi’s theorem on arithmetic progressions:

Theorem 1 (Szemerédi’s theorem) Let be a set of integers of positive upper density, thus , where . Then contains an arithmetic progression of length for any .

Szemerédi’s original proof of this theorem is a remarkably intricate piece of combinatorial reasoning. Most proofs of theorems in mathematics – even long and difficult ones – generally come with a reasonably compact “high-level” overview, in which the proof is (conceptually, at least) broken down into simpler pieces. There may well be technical difficulties in formulating and then proving each of the component pieces, and then in fitting the pieces together, but usually the “big picture” is reasonably clear. To give just one example, the overall strategy of Perelman’s proof of the Poincaré conjecture can be briefly summarised as follows: to show that a simply connected three-dimensional manifold is homeomorphic to a sphere, place a Riemannian metric on it and perform Ricci flow, excising any singularities that arise by surgery, until the entire manifold becomes extinct. By reversing the flow and analysing the surgeries performed, obtain enough control on the topology of the original manifold to establish that it is a topological sphere.

In contrast, the pieces of Szemerédi’s proof are highly interlocking, particularly with regard to all the epsilon-type parameters involved; it takes quite a bit of notational setup and foundational lemmas before the key steps of the proof can even be stated, let alone proved. Szemerédi’s original paper contains a logical diagram of the proof (reproduced in Gowers’ recent talk) which already gives a fair indication of this interlocking structure. (Many years ago I tried to present the proof, but I was unable to find much of a simplification, and my exposition is probably not that much clearer than the original text.) Even the use of nonstandard analysis, which is often helpful in cleaning up armies of epsilons, turns out to be a bit tricky to apply here. (In typical applications of nonstandard analysis, one can get by with a single nonstandard universe, constructed as an ultrapower of the standard universe; but to correctly model all the epsilons occurring in Szemerédi’s argument, one needs to repeatedly perform the ultrapower construction to obtain a (finite) sequence of increasingly nonstandard (and increasingly saturated) universes, each one containing unbounded quantities that are far larger than any quantity that appears in the preceding universe, as discussed at the end of this previous blog post. This sequence of universes does end up concealing all the epsilons, but it is not so clear that this is a net gain in clarity for the proof; I may return to the nonstandard presentation of Szemerédi’s argument at some future juncture.)

Instead of trying to describe the entire argument here, I thought I would instead show some key components of it, with only the slightest hint as to how to assemble the components together to form the whole proof. In particular, I would like to show how two particular ingredients in the proof – namely van der Waerden’s theorem and the Szemerédi regularity lemma – become useful. For reasons that will hopefully become clearer later, it is convenient not only to work with ordinary progressions , but also progressions of progressions , progressions of progressions of progressions, and so forth. (In additive combinatorics, these objects are known as *generalised arithmetic progressions* of rank one, two, three, etc., and play a central role in the subject, although the way they are used in Szemerédi’s proof is somewhat different from the way that they are normally used in additive combinatorics.) Very roughly speaking, Szemerédi’s proof begins by building an enormous generalised arithmetic progression of high rank containing many elements of the set (arranged in a “near-maximal-density” configuration), and then steadily prunes this progression to improve the combinatorial properties of the configuration, until one ends up with a single rank one progression of length that consists entirely of elements of .

To illustrate some of the basic ideas, let us first consider a situation in which we have located a progression of progressions of length , with each progression , being quite long, and containing a near-maximal amount of elements of , thus

where is the “maximal density” of along arithmetic progressions. (There are a lot of subtleties in the argument about exactly how good the error terms are in various approximations, but we will ignore these issues for the sake of this discussion and just use the imprecise symbols such as instead.) By hypothesis, is positive. The objective is then to locate a progression in , with each in for . It may help to view the progression of progressions as a tall thin rectangle .

If we write for , then the problem is equivalent to finding a (possibly degenerate) arithmetic progression , with each in .
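To make the reformulated problem concrete, here is a minimal brute-force sketch in Python that counts (possibly degenerate) progressions whose successive terms lie in successive sets. The function name `count_progressions` and the small example sets are illustrative assumptions and are not taken from Szemerédi’s argument; the sketch only shows what is being counted.

```python
def count_progressions(sets):
    """Count progressions a_1, ..., a_k with a_i in sets[i-1].

    The common difference may be zero (degenerate) or negative.
    Assumes at least one set is supplied.
    """
    sets = [set(s) for s in sets]
    if len(sets) == 1:
        return len(sets[0])
    count = 0
    # The first two terms determine the whole progression.
    for first in sets[0]:
        for second in sets[1]:
            step = second - first
            if all(first + i * step in sets[i] for i in range(2, len(sets))):
                count += 1
    return count


if __name__ == "__main__":
    # Hypothetical toy input: three small sets of integers.
    print(count_progressions([{0, 1, 2}, {1, 2, 3}, {2, 3, 4}]))
```

The search is quadratic in the size of the sets, which is fine for experiments but says nothing about the density estimates discussed in the text.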

By hypothesis, we know already that each set has density about in :

Let us now make a “weakly mixing” assumption on the , which roughly speaking asserts that

for “most” subsets of of density of a certain form to be specified shortly. This is a plausible type of assumption if one believes to behave like a random set, and if the sets are constructed “independently” of the in some sense. Of course, we do not expect such an assumption to be valid all of the time, but we will postpone consideration of this point until later. Let us now see how this sort of weakly mixing hypothesis could help one count progressions of the desired form.

We will inductively consider the following (nonrigorously defined) sequence of claims for each :

- : For most choices of , there are arithmetic progressions in with the specified choice of , such that for all .

(Actually, to avoid boundary issues one should restrict to lie in the middle third of , rather than near the edges, but let us ignore this minor technical detail.) The quantity is natural here, given that there are arithmetic progressions in that pass through in the position, and that each one ought to have a probability of or so that the events simultaneously hold. If one has the claim , then by selecting a typical in , we obtain a progression with for all , as required. (In fact, we obtain about such progressions by this method.)

We can heuristically justify the claims by induction on . For , the claims are clear just from direct counting of progressions (as long as we keep away from the edges of ). Now suppose that , and the claims have already been proven. For any and for most , we have from hypothesis that there are progressions in through with . Let be the set of all the values of attained by these progressions; then . Invoking the weak mixing hypothesis, we (heuristically, at least) conclude that for most choices of , we have

which then gives the desired claim .

The observant reader will note that we only needed the claim in the case for the above argument, but for technical reasons, the full proof requires one to work with more general values of (also the claim needs to be replaced by a more complicated version of itself, but let’s ignore this for sake of discussion).

We now return to the question of how to justify the weak mixing hypothesis (2). For a single block of , one can easily concoct a scenario in which this hypothesis fails, by choosing to overlap with too strongly, or to be too disjoint from . However, one can do better if one can select from a long progression of blocks. The starting point is the following simple double counting observation that gives the right upper bound:

Proposition 2 (Single upper bound) Let be a progression of progressions for some large . Suppose that for each , the set has density in (i.e. (1) holds). Let be a subset of of density . Then (if is large enough) one can find an such that

*Proof:* The key is the double counting identity

Because has maximal density and is large, we have

for each , and thus

The claim then follows from the pigeonhole principle.
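The averaging step behind a proof of this shape can be checked mechanically. The following Python sketch verifies the generic double counting identity (the sum over blocks of the overlap with a fixed set equals the sum over elements of that set of the number of blocks containing them) and then applies the pigeonhole principle to pick a block with at most average overlap. The names `blocks` and `target` and the sample data are assumptions made for illustration; the sketch does not reproduce the specific identity or the density bound used in the proof.

```python
def double_count(blocks, target):
    """Return both sides of: sum_i |A_i & E| == sum_{a in E} #{i : a in A_i}."""
    target = set(target)
    by_block = sum(len(set(block) & target) for block in blocks)
    by_element = sum(sum(1 for block in blocks if a in block) for a in target)
    return by_block, by_element


def block_with_at_most_average_overlap(blocks, target):
    """Pigeonhole: some block overlaps the target no more than the average does."""
    target = set(target)
    overlaps = [len(set(block) & target) for block in blocks]
    # The minimum overlap is always at most the average overlap.
    return min(range(len(blocks)), key=overlaps.__getitem__)


if __name__ == "__main__":
    # Hypothetical toy data.
    blocks = [{0, 1, 2, 3}, {2, 3, 4}, {5, 6}]
    target = {0, 2, 4}
    print(double_count(blocks, target))
    print(block_with_at_most_average_overlap(blocks, target))
```

In the proposition the role of the average is played by the upper bound coming from maximal density; here it is simply the arithmetic mean of the overlaps.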

Now suppose we want to obtain weak mixing not just for a single set , but for a small number of such sets, i.e. we wish to find an for which

for all , where is the density of in . The above proposition gives, for each , a choice of for which (3) holds, but it could be a different for each , and so it is not immediately obvious how to use Proposition 2 to find an for which (3) holds *simultaneously* for all . However, it turns out that the van der Waerden theorem is the perfect tool for this amplification:

Proposition 3 (Multiple upper bound) Let be a progression of progressions for some large . Suppose that for each , the set has density in (i.e. (1) holds). For each , let be a subset of of density . Then (if is large enough depending on ) one can find an such that simultaneously for all .

*Proof:* Suppose that the claim failed (for some suitably large ). Then, for each , there exists such that

This can be viewed as a colouring of the interval by colours. If we take large compared to , van der Waerden’s theorem allows us to then find a long subprogression of which is monochromatic, so that is constant on this progression. But then this will furnish a counterexample to Proposition 2.
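The use of van der Waerden’s theorem here is purely existential, but for tiny parameters the statement can be tested by exhaustive search. The Python sketch below looks for a monochromatic arithmetic progression in a given colouring of an interval, and checks whether every colouring of an interval of a given length contains one. The function names are illustrative assumptions, and the search is only feasible for very small inputs, far from the sizes needed in the proof.

```python
from itertools import product


def monochromatic_progression(colouring, length):
    """Return (start, step) of a monochromatic progression, or None.

    colouring[i] is the colour of position i; step is always positive.
    """
    n = len(colouring)
    if length < 1 or n == 0:
        return None
    if length == 1:
        return (0, 1)
    for step in range(1, (n - 1) // (length - 1) + 1):
        for start in range(n - (length - 1) * step):
            colour = colouring[start]
            if all(colouring[start + i * step] == colour for i in range(1, length)):
                return (start, step)
    return None


def every_colouring_has_progression(n, colours, length):
    """Check all colourings of {0, ..., n-1}; exponential in n."""
    return all(
        monochromatic_progression(colouring, length) is not None
        for colouring in product(range(colours), repeat=n)
    )


if __name__ == "__main__":
    print(every_colouring_has_progression(9, 2, 3))
    print(monochromatic_progression([0, 0, 1, 1, 0, 0, 1, 1], 3))
```

With two colours and progressions of length three, the search confirms the classical value 9 of the corresponding van der Waerden number: every colouring of nine consecutive integers contains a monochromatic progression, while the colouring of eight integers shown above does not.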

One nice thing about this proposition is that the upper bounds can be automatically upgraded to an asymptotic:

Proposition 4 (Multiple mixing) Let be a progression of progressions for some large . Suppose that for each , the set has density in (i.e. (1) holds). For each , let be a subset of of density . Then (if is large enough depending on ) one can find an such that simultaneously for all .

*Proof:* By applying the previous proposition to the collection of sets and their complements (thus replacing with ), one can find an for which

and

which gives the claim.

However, this improvement of Proposition 2 turns out to not be strong enough for applications. The reason is that the number of sets for which mixing is established is too small compared with the length of the progression one has to use in order to obtain that mixing. However, thanks to the magic of the Szemerédi regularity lemma, one can amplify the above proposition even further, to allow for a huge number of to be mixed (at the cost of excluding a small fraction of exceptions):

Proposition 5 (Really multiple mixing) Let be a progression of progressions for some large . Suppose that for each , the set has density in (i.e. (1) holds). For each in some (large) finite set , let be a subset of of density . Then (if is large enough, but *not* dependent on the size of ) one can find an such that simultaneously for almost all .

*Proof:* We build a bipartite graph connecting the progression to the finite set by placing an edge between an element and an element whenever . The number can then be interpreted as the degree of in this graph, while the number is the number of neighbours of that land in .

We now apply the regularity lemma to this graph . Roughly speaking, what this lemma does is to partition and into almost equally sized cells and such that for most pairs of cells, the graph resembles a random bipartite graph of some density between these two cells. The key point is that the number of cells here is bounded uniformly in the size of and . As a consequence of this lemma, one can show that for most vertices in a typical cell , the number is approximately equal to

and the number is approximately equal to

The point here is that the different statistics are now controlled by a mere statistics (this is not unlike the use of principal component analysis in statistics, incidentally, but that is another story). Now, we invoke Proposition 4 to find an for which

simultaneously for all , and the claim follows.

This proposition now suggests a way forward to establish the type of mixing properties (2) needed for the preceding attempt at proving Szemerédi’s theorem to actually work. Whereas in that attempt, we were working with a single progression of progressions of progressions containing a near-maximal density of elements of , we will now have to work with a *family* of such progression of progressions, where ranges over some suitably large parameter set. Furthermore, in order to invoke Proposition 5, this family must be “well-arranged” in some arithmetic sense; in particular, for a given , it should be possible to find many reasonably large subfamilies of this family for which the terms of the progression of progressions in this subfamily are themselves in arithmetic progression. (Also, for technical reasons having to do with the fact that the sets in Proposition 5 are not allowed to depend on , one also needs the progressions for any given to be “similar” in the sense that they intersect in the same fashion (thus the sets as varies need to be translates of each other).) If one has this sort of family, then Proposition 5 allows us to “spend” some of the degrees of freedom of the parameter set in order to gain good mixing properties for at least one of the sets in the progression of progressions.

Of course, we still have to figure out how to get such large families of well-arranged progressions of progressions. Szemerédi’s solution was to begin by working with generalised progressions of a much larger rank than the rank progressions considered here; roughly speaking, to prove Szemerédi’s theorem for length progressions, one has to consider generalised progressions of rank as high as . It is possible by a reasonably straightforward (though somewhat delicate) “density increment argument” to locate a huge generalised progression of this rank which is “saturated” by in a certain rather technical sense (related to the concept of “near maximal density” used previously). Then, by another reasonably elementary argument, it is possible to locate inside a suitable large generalised progression of some rank , a family of large generalised progressions of rank which inherit many of the good properties of the original generalised progression, and which have the arithmetic structure needed for Proposition 5 to be applicable, at least for one value of . (But getting this sort of property for *all* values of simultaneously is tricky, and requires many careful iterations of the above scheme; there is also the problem that by obtaining good behaviour for one index , one may lose good behaviour at previous indices, leading to a sort of “Tower of Hanoi” situation which may help explain the exponential factor in the rank that is ultimately needed. It is an extremely delicate argument; all the parameters and definitions have to be set very precisely in order for the argument to work at all, and it is really quite remarkable that Endre was able to see it through to the end.)

By **Jacob Aron**, New Scientist

Imagine I present you with a line of cards labelled *1* through to *n*, where *n* is some incredibly large number. I ask you to remove a certain number of cards – which ones you choose is up to you, inevitably leaving ugly random gaps in my carefully ordered sequence. It might seem as if all order must now be lost, but in fact no matter which cards you pick, I can always identify a surprisingly ordered pattern in the numbers that remain.

As a magic trick it might not equal sawing a woman in half, but mathematically proving that it is always possible to find a pattern in such a scenario is one of the feats that today garnered Endre Szemerédi mathematics’ prestigious Abel prize.

The Norwegian Academy of Science and Letters in Oslo awarded Szemerédi the one million dollar prize today for “fundamental contributions to discrete mathematics and theoretical computer science”. His specialty was combinatorics, a field that deals with the different ways of counting and rearranging discrete objects, whether they be numbers or playing cards.

The trick described above is a direct result of what is known as Szemerédi’s theorem, a piece of mathematics that answered a question first posed by the mathematicians Paul Erdős and Pál Turán in 1936 and that had remained unsolved for nearly 40 years.

**Irregular mind**

The theorem reveals how patterns can be found in large sets of consecutive numbers with many of their members missing. The patterns in question are arithmetic sequences – strings of numbers with a common difference such as 3, 7, 11, 15, 19.

Such problems are often fairly easy for mathematicians to pose, but fiendishly difficult to solve. The book An Irregular Mind, published in honour of Szemerédi’s 70th birthday in 2010, stated that “his brain is wired differently than for most mathematicians”.

“He’s more likely than most to come up with an idea from left field,” agrees mathematician Timothy Gowers of the University of Cambridge, who gave a presentation in Oslo on Szemerédi’s work following the prize announcement.

Szemerédi actually came late to mathematics, initially studying at medical school for a year and then working in a factory before switching to become a mathematician. His talent was discovered by Erdős, who was famous for working with hundreds of mathematicians in his lifetime.

**Modest winner**

When Szemerédi proved his theorem in 1975 he also provided mathematicians with a tool known as the Szemerédi regularity lemma, which gives a deeper understanding of large graphs – mathematical objects often used to model networked structures such as the internet.

The lemma has also helped computer scientists better understand a technique in artificial intelligence known as “probably approximately correct learning”. Szemerédi also worked on another important computing problem related to sorting lists, demonstrating a theoretical limit for sorting using parallel processors, which are found in modern computers.

Speaking on the phone to Gowers after receiving his award, Szemerédi said he was “very happy” but suggested that there were other mathematicians more deserving than himself. Gowers told our sister site *New Scientist* that Szemerédi was “very modest”, adding that “he is a worthy winner and a lot of people think this sort of recognition is long overdue in his case”.

@ Electronicsweekly

It was announced yesterday that Endre Szemerédi is the winner of the 2012 Abel Prize. As we mentioned a few years ago, the Abel Prize is a fairly new award in math. Unlike the Fields Medal (which famously is for people under 40), the Abel Prize is meant to recognize long, illustrious careers in mathematics. It has quickly become one of the most prestigious awards in math.

It was awarded for:

for his fundamental contributions to discrete mathematics and theoretical computer science, and in recognition of the profound and lasting impact of these contributions on additive number theory and ergodic theory.

— Abel Prize Citation

Fellow math blogger, Tim Gowers, was in charge of giving a talk for non-mathematicians (i.e. journalists and such) about Dr. Szemerédi’s research. A tough challenge which Dr. Gowers adroitly pulls off. You can read the text on his blog here.

Dr. Szemerédi’s area of research is combinatorics. This is an area (like number theory) which is famous for having many easy to state but extremely difficult to answer questions. We wanted to mention two topics in one of Dr. Szemerédi’s areas of research: extremal combinatorics.

Very roughly, extremal combinatorics is the study of how when structures get very large, order becomes unavoidable. What do we mean? Well, our first example is Ramsey Theory.

First, recall that a graph in math is a collection of vertices (or nodes) which are connected by edges*. For example, the graph with 5 vertices and with edges between every pair of vertices is called the complete graph on 5 vertices. It looks like this:

Now imagine you color the edges of the graph with two colors (let’s say crimson and cream :-) ). The question is: Is it possible to color the edges with two colors in a way to **avoid** ending up with a triangle which is all one color?**

It’s not too hard to sit down with the complete graph on 5 vertices and create a coloring with crimson and cream which has no triangles with all three edges of the same color. Surprisingly, if you color the complete graph on 6 vertices:

then a monochromatic triangle is **unavoidable**!

You should try coloring the graph yourself and see if you can avoid a monochromatic triangle. But here’s the proof: Let’s look at the vertex on the far left of the picture of the complete graph on 6 vertices drawn above. There are 5 edges which leave that vertex. Since there are only two colors, one of the colors must be used 3 or more times. Let’s say crimson was used 3+ times, and pick 3 vertices at the other ends of those crimson edges. Now let’s look at the edges between those 3 vertices. If any one of them is crimson, then that makes a crimson triangle with our original vertex. If they are all cream, then those 3 vertices form the corners of a cream triangle! A monochromatic triangle is unavoidable!
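For readers who prefer to let a computer do the coloring, here is a minimal brute-force sketch of the claim. The function names are ours, purely for illustration; colors are encoded as 0 and 1 rather than crimson and cream.

```python
from itertools import combinations, product

def has_mono_triangle(n, colouring):
    """colouring maps each edge (i, j) with i < j to colour 0 or 1."""
    return any(
        colouring[(a, b)] == colouring[(a, c)] == colouring[(b, c)]
        for a, b, c in combinations(range(n), 3)
    )

def every_colouring_has_mono_triangle(n):
    """Try all 2-colourings of the complete graph on n vertices."""
    edges = list(combinations(range(n), 2))
    for colours in product((0, 1), repeat=len(edges)):
        if not has_mono_triangle(n, dict(zip(edges, colours))):
            return False  # found a colouring that avoids it
    return True

print(every_colouring_has_mono_triangle(5))  # False
print(every_colouring_has_mono_triangle(6))  # True
```

The search over 6 vertices checks all 2^15 colorings, which is still small; the same exhaustive approach becomes hopeless very quickly for larger graphs, which is why the pigeonhole proof is the better tool.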

Ramsey’s theorem is the following amazing generalization of this result: If you use k colors and are interested in looking for a monochromatic complete graph on m vertices, then if you pick n large enough, then the complete graph on n vertices will **always** have the monochromatic graph you’re looking for.

In our example above, we used k=2 colors and are looking for a complete graph on m=3 vertices (aka a triangle). What we proved is that if n=6, then you always have the monochromatic triangle we’re looking for (notice that any n>6 will also work!). Ramsey’s theorem greatly generalizes this result to more colors and larger monochromatic subgraphs.

We also need to mention Szemerédi’s theorem. It is in the same spirit as Ramsey’s Theorem. We are now looking for arithmetic progressions in the integers. Remember, an arithmetic progression is a sequence of numbers where you go from one to the next by adding some fixed constant. So, for example, 2,7,12,17 is an arithmetic progression of 4 numbers with a step size of 5.
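To make the definition concrete, here is a small sketch that searches a finite list of numbers for an arithmetic progression of a given length. The helper name `find_progression` is hypothetical and the search is deliberately naive.

```python
def find_progression(numbers, k):
    """Return a k-term arithmetic progression inside `numbers`, or None."""
    pool = set(numbers)
    ordered = sorted(pool)
    for i, start in enumerate(ordered):
        for second in ordered[i + 1:]:
            step = second - start          # positive because the list is sorted
            terms = [start + j * step for j in range(k)]
            if all(t in pool for t in terms):
                return terms
    return None

print(find_progression([2, 3, 7, 12, 17, 20], 4))  # [2, 7, 12, 17]
print(find_progression([1, 2, 4, 8, 16], 3))       # None
```

The second call shows that some sets, such as the powers of 2, avoid even 3-term progressions; Szemerédi’s theorem says this can only happen when the set is sparse.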

More generally, say you want to find an arithmetic progression of 4 numbers but you don’t care about the step size. Let’s pick a very small percentage, say 0.000000001%. Then Szemerédi’s theorem says there is an N so that whenever you pick 0.000000001% of the numbers 1,2,3,…,N, then you will always be able to find an arithmetic sequence of 4 numbers!

Once again, in a large enough mathematical object, patterns are unavoidable!

Here’s what Szemerédi’s theorem says: Say you are looking for an arithmetic progression of k numbers (with any step size). Pick a percentage, P%. Now, no matter how small your percentage is, there is a number N so that any subset of 1,2,3, …, N which has more than P% of the N possible elements, **must** contain an arithmetic progression of k numbers.
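For very small N one can compute by brute force how large a subset of 1,2,3,…,N can be while still avoiding every 3-term progression. The sketch below is only an illustration (the function names are ours, and the search is exponential), but it lets you watch the largest "progression-free" fraction for a few values of N.

```python
from itertools import combinations

def has_three_term_ap(subset):
    members = set(subset)
    # a < c with an integer midpoint b = (a + c) / 2 form the progression a, b, c
    return any((a + c) % 2 == 0 and (a + c) // 2 in members
               for a, c in combinations(sorted(members), 2))

def largest_ap_free_size(n):
    """Size of the largest subset of {1, ..., n} with no 3-term progression."""
    for size in range(n, 0, -1):
        for subset in combinations(range(1, n + 1), size):
            if not has_three_term_ap(subset):
                return size
    return 0

for n in (5, 9):
    size = largest_ap_free_size(n)
    print(n, size, f"{size / n:.2f}")
```

Any subset of 1,2,3,…,N with more elements than this maximum is forced to contain a 3-term progression, which is the k=3 case of the statement above for that particular N.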

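Of course Szemerédi’s theorem doesn’t promise that N will be small. In fact, you can imagine that it actually needs to be very, very, very, very big. If we write $N(k,\delta)$ for the N for k and $\delta$ (the percentage written as a fraction), then there is a constant $C>1$ such that:

$$C^{\log(1/\delta)^{k-1}} \le N(k,\delta) \le 2^{2^{\delta^{-2^{2^{k+9}}}}}$$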

See wikipedia’s article for further details.

Besides being amazing in its own right, Szemerédi’s theorem launched a huge amount of new mathematics. Perhaps most famously, the Green-Tao Theorem builds on Szemerédi’s theorem. It proves that the set of prime numbers contains arbitrarily long arithmetic progressions.

* You can imagine that graph theory is super useful for studying networks. For example, the vertices could be the computers and routers at OU and the edges could be the cables connecting them.

** This is sometimes called the Party Problem. If the vertices are individuals and you color an edge between them crimson if they are friends and cream if they are not friends, then we’ve proven that if you invite 6 people to a party, there will be three people who are all friends or all strangers.

In this first of two posts, we prove Szemerédi’s regularity lemma. The second post will give some applications of this lemma: the triangle removal lemma and Roth’s theorem. Some of the content overlaps with the Ergodic Ramsey Theory posts, which the interested reader may find here: ERT0, ERT1, ERT2, ERT3, ERT4, ERT5, ERT6, ERT7, ERT8, ERT9, ERT10, ERT11, ERT12, ERT13, ERT14, ERT15, ERT16.

**1. Additive combinatorics **

“*Additive combinatorics is the theory of counting additive structures in sets.*”

T. Tao and V. Vu.

This theory has seen exciting developments and dramatic changes in direction in recent years, thanks to its connections with areas such as number theory, ergodic theory and graph theory. This section gives a brief historic introduction on the main results.

Van der Waerden’s theorem (see ERT6 for a topological dynamical proof), one of Khinchin’s *Three Pearls of Number Theory*, states that whenever the natural numbers are finitely partitioned (or, as it is customary to say, finitely colored), one of the cells of the partition contains arbitrarily long arithmetic progressions. In other words, the structure of the natural numbers can not be destroyed by partitions: arbitrarily large parts of $\mathbb{N}$ persist inside some component of the partition. This result was first proved in 1927 and represents the first great result on additive combinatorics. Afterwards, in the mid-thirties, Erdős and Turán conjectured a density version of van der Waerden’s theorem. To present it, let us define the notion of density in the natural numbers.

Definition 1 Given a set $A\subset\mathbb{N}$, the *upper density* of $A$ is

$$\overline{d}(A)=\limsup_{n\to\infty}\frac{|A\cap\{1,\dots,n\}|}{n}.$$

If the limit exists, we say that $A$ has density, and denote it by $d(A)$. As pointed out by Erdős and Turán, having positive upper density is a notion of largeness and it is natural to ask if sets with this property have arbitrarily long arithmetic progressions. This quite recalcitrant question was only settled in 1975 by Szemerédi in the paper *On sets of integers containing no $k$ elements in arithmetic progression*. Meanwhile, the first partial result was obtained by Roth in 1953.
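As a numerical illustration of the notion of density, the following sketch computes the finite ratios $|A\cap\{1,\dots,n\}|/n$ whose limit (or limsup) defines the density. The two example sets, the multiples of 3 and the perfect squares, are our own choice and do not appear in the text above.

```python
from math import isqrt

def partial_density(indicator, n):
    """|A ∩ {1, ..., n}| / n for the set A = {m : indicator(m)}."""
    return sum(1 for m in range(1, n + 1) if indicator(m)) / n

def is_multiple_of_three(m):
    return m % 3 == 0

def is_square(m):
    return isqrt(m) ** 2 == m

for n in (10, 1000, 100000):
    print(n, partial_density(is_multiple_of_three, n), partial_density(is_square, n))
```

The first column of ratios settles near 1/3, so the multiples of 3 have positive density, while the ratios for the squares shrink towards 0: the squares are too sparse for a density statement to say anything about them.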

Theorem 2 (Roth) If $A\subset\mathbb{N}$ has positive upper density, then it contains an arithmetic progression of length $3$.

His proof relied on a Fourier-analytic argument of energy increment for functions: one decomposes a function $f$ as $f=g+b$, where $g$ is good and $b$ is bad in a specific sense (this follows the same philosophy of Calderón-Zygmund’s theory on harmonic analysis). If the effect of $b$ is large, it is possible to break it into good and bad parts again and so on. In each step, the “energy” of $g$ increases a fixed amount. Being bounded, it must stop after a finite number of steps. At the end, $g$ controls the behavior of $f$ and for it the result is straightforward. See *The remarkable effectiveness of ergodic theory in number theory* for further details.

Sixteen years later, in the paper *On sets of integers containing no four elements in arithmetic progression*, Szemerédi extended Roth’s theorem to

Theorem 3 (Szemerédi) If $A\subset\mathbb{N}$ has positive upper density, then it contains an arithmetic progression of length $4$.

Finally, in 1975, Szemerédi settled the conjecture in its full generality.

Theorem 4 (Szemerédi) If $A\subset\mathbb{N}$ has positive upper density, then it contains arbitrarily long arithmetic progressions.

His proof required a complicated combinatorial argument and relied on a graph-theoretical result, known as **Szemerédi’s regularity lemma**, which turned out to be an important result in graph theory. It asserts, roughly speaking, that any graph can be decomposed into a relatively small number of disjoint subgraphs, most of which behave pseudo-randomly. This is the main topic of this post.

It is worth mentioning that Erdős also conjectured that if $A\subset\mathbb{N}$ satisfies

$$\sum_{n\in A}\frac{1}{n}=\infty$$

then it contains arbitrarily long arithmetic progressions. This question is wide open: nobody knows even if $A$ contains arithmetic progressions of length $3$. On the other hand, a remarkable result of Green and Tao states the conjecture for the particular case of the prime numbers.

Theorem 5 (Green and Tao)The prime numbers contain arbitrarily long arithmetic progressions.

**2. Setting notation **

$G=(V,E)$ is a graph, where $V$ is a finite set of *vertices* and $E$ is the set of *edges*, each of them joining two distinct elements of $V$. For disjoint $A,B\subset V$, $e(A,B)$ is the number of edges between $A$ and $B$ and

$$d(A,B)=\frac{e(A,B)}{|A||B|}$$

is the *density* of the pair $(A,B)$.

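Definition 6 For $\varepsilon>0$ and disjoint subsets $A,B\subset V$, the pair $(A,B)$ is *$\varepsilon$-regular* if, for every $X\subset A$ and $Y\subset B$ satisfying

$$|X|\ge\varepsilon|A|\quad\text{and}\quad|Y|\ge\varepsilon|B|,$$

we have

$$|d(X,Y)-d(A,B)|<\varepsilon.$$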
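To get a feeling for the definition, here is a brute-force sketch that computes the density $d(A,B)=e(A,B)/(|A||B|)$ and tests $\varepsilon$-regularity, read as: every $X\subset A$, $Y\subset B$ with $|X|\ge\varepsilon|A|$ and $|Y|\ge\varepsilon|B|$ satisfies $|d(X,Y)-d(A,B)|<\varepsilon$. The function names and the two tiny example graphs are our own, and the check is exponential in the size of the sets, so it is only meant for toy cases.

```python
from itertools import combinations
from math import ceil

def density(edges, A, B):
    """d(A, B) = e(A, B) / (|A| |B|) for disjoint vertex sets A and B."""
    e = sum(1 for a in A for b in B if frozenset((a, b)) in edges)
    return e / (len(A) * len(B))

def subsets_of_size_at_least(S, m):
    S = sorted(S)
    for size in range(max(m, 1), len(S) + 1):
        yield from combinations(S, size)

def is_regular(edges, A, B, eps):
    """Brute-force check of epsilon-regularity; tiny sets only."""
    d = density(edges, A, B)
    for X in subsets_of_size_at_least(A, ceil(eps * len(A))):
        for Y in subsets_of_size_at_least(B, ceil(eps * len(B))):
            if abs(density(edges, X, Y) - d) >= eps:
                return False
    return True

A, B = {0, 1, 2, 3}, {4, 5, 6, 7}
complete = {frozenset((a, b)) for a in A for b in B}
lopsided = {frozenset((a, b)) for a in (0, 1) for b in (4, 5)}
print(is_regular(complete, A, B, 0.5))  # True
print(is_regular(lopsided, A, B, 0.5))  # False
```

In the second graph all edges are concentrated between $\{0,1\}$ and $\{4,5\}$, so that sub-pair has density 1 while the whole pair has density 1/4; this is exactly the kind of uneven distribution that regularity rules out.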

A partition $\mathcal{P}=\{V_0,V_1,\dots,V_k\}$ of $V$ into pairwise disjoint sets in which $V_0$ is called the *exceptional set* is an *equipartition* if $|V_1|=\cdots=|V_k|$. We view the exceptional set as $|V_0|$ distinct parts, each consisting of a single vertex, and its role is purely technical: to make all other classes have exactly the same cardinality.

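Definition 7 An equipartition $\mathcal{P}=\{V_0,V_1,\dots,V_k\}$ is *$\varepsilon$-regular* if

- $|V_0|\le\varepsilon|V|$,
- all but at most $\varepsilon k^2$ of the pairs $(V_i,V_j)$, $1\le i<j\le k$, are $\varepsilon$-regular.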

The classes $V_i$ are called *clusters* or *groups*. Given two partitions $\mathcal{P},\mathcal{P}'$ of $V$, we say $\mathcal{P}'$ *refines* $\mathcal{P}$ if every cluster of $\mathcal{P}$ is equal to the union of some clusters of $\mathcal{P}'$.

**3. Szemerédi’s regularity lemma **

Szemerédi’s regularity lemma says that every graph with many vertices can be partitioned into a small number of clusters with the same cardinality, most of the pairs being $\varepsilon$-regular, and a few leftover edges. From my point of view, this result allows the decomposition of every graph with a sufficiently large number of vertices into many components uniformly (every component has the same number of vertices) in such a way that the relation between the clusters is at the same time

**uniform:** the densities do not vary too much, and

**randomic:** even controlling the density, nothing can be said about the distribution of the edges.

As a toy model, let $p\in(0,1)$ and consider the random graph $G$ with $n$ vertices in which every edge belongs to $G$ with probability $p$, independently. If $A,B$ are disjoint subsets of $V$, the expected value of $d(A,B)$ is $p$, and the same happens for subsets $X\subset A$, $Y\subset B$. Szemerédi’s regularity lemma says that, approximately, this is indeed the universal behavior.
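The toy model is easy to simulate. The sketch below samples only the edges between two disjoint vertex sets, keeping each one independently with probability $p$, and compares the density of the whole pair with the density of a randomly chosen sub-pair. All names, the seed and the sizes are illustrative assumptions; note also that regularity is a statement about every large sub-pair, whereas this experiment looks at a single random one.

```python
import random

def random_bipartite_edges(A, B, p, rng):
    """Keep each pair (a, b) independently with probability p."""
    return {(a, b) for a in A for b in B if rng.random() < p}

def density(edges, X, Y):
    return sum(1 for x in X for y in Y if (x, y) in edges) / (len(X) * len(Y))

rng = random.Random(0)           # fixed seed so the run is reproducible
p = 0.3
A, B = list(range(200)), list(range(200, 400))
edges = random_bipartite_edges(A, B, p, rng)

X, Y = rng.sample(A, 100), rng.sample(B, 100)
print(round(density(edges, A, B), 3), round(density(edges, X, Y), 3))
```

Both printed densities should land close to $p$, which is the behaviour the regularity lemma extracts, approximately, from arbitrary large graphs.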

Theorem 8 (Szemerédi’s regularity lemma) For every $\varepsilon>0$ and every integer $m\ge1$, there exist integers $M$ and $N$ for which every graph with at least $N$ vertices has an $\varepsilon$-regular equipartition $\{V_0,V_1,\dots,V_k\}$, where $m\le k\le M$.

Note the importance of having an upper bound for the number of clusters. Otherwise, we could just take each of them to be a singleton.

The idea in the proof is similar to Roth’s approach. Start with an arbitrary partition of $V$ into $m$ disjoint classes of equal sizes. Proceed by showing that, as long as the partition is not $\varepsilon$-regular, it can be refined in a way to distribute the density deviation. This is done by introducing a bounded *energy function* that increases a fixed amount every time the refinement is made. After a finite number of steps, the resulting partition is $\varepsilon$-regular.

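We now discuss what should be the energy function. The natural way of looking for it is to identify the obstruction for a pair $(A,B)$ to be $\varepsilon$-regular. This means there are subsets $X\subset A$ and $Y\subset B$ such that $|X|\ge\varepsilon|A|$, $|Y|\ge\varepsilon|B|$ and

$$|d(X,Y)-d(A,B)|\ge\varepsilon.$$

Consider the partitions $\mathcal{A}=\{X,A\setminus X\}$ and $\mathcal{B}=\{Y,B\setminus Y\}$. The above inequality has the following probabilistic interpretation. Consider the random variable $Z$ defined on the product $A\times B$ by: let $a$ be a uniformly random element of $A$ and $b$ a uniformly random element of $B$, let $A'\in\mathcal{A}$ and $B'\in\mathcal{B}$ be those members of the respective partitions for which $a\in A'$ and $b\in B'$, and take

$$Z=d(A',B').$$

The expectation of $Z$ is equal to

$$\mathbb{E}[Z]=\sum_{A'\in\mathcal{A},\,B'\in\mathcal{B}}\frac{|A'||B'|}{|A||B|}\,d(A',B')=\sum_{A'\in\mathcal{A},\,B'\in\mathcal{B}}\frac{e(A',B')}{|A||B|}=d(A,B).$$

By assumption, $Z$ deviates from $\mathbb{E}[Z]$ at least $\varepsilon$ whenever $a\in X$ and $b\in Y$, and this event has probability

$$\frac{|X||Y|}{|A||B|}\ge\varepsilon^2.$$

Then $\mathrm{Var}(Z)\ge\varepsilon^4$. Noting that the expectation of $Z^2$ is

$$\mathbb{E}[Z^2]=\sum_{A'\in\mathcal{A},\,B'\in\mathcal{B}}\frac{|A'||B'|}{|A||B|}\,d(A',B')^2=\sum_{A'\in\mathcal{A},\,B'\in\mathcal{B}}\frac{e(A',B')^2}{|A||B||A'||B'|},$$

we conclude that

$$\sum_{A'\in\mathcal{A},\,B'\in\mathcal{B}}\frac{e(A',B')^2}{|A'||B'|}\ge\frac{e(A,B)^2}{|A||B|}+\varepsilon^4|A||B|.\qquad(3)$$

The fractions containing $e(\cdot,\cdot)^2$ above represent the energy function we are looking for: given two disjoint subsets $A,B\subset V$, define

$$q(A,B)=\frac{|A||B|}{|V|^2}\,d(A,B)^2=\frac{e(A,B)^2}{|A||B||V|^2}.$$

For partitions $\mathcal{A}$ of $A$ and $\mathcal{B}$ of $B$, let

$$q(\mathcal{A},\mathcal{B})=\sum_{A'\in\mathcal{A},\,B'\in\mathcal{B}}q(A',B').$$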

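Definition 9 Given a partition $\mathcal{P}$ of $V$ with exceptional set $V_0$, the *index* of $\mathcal{P}$ is

$$q(\mathcal{P})=\sum q(A,B),$$

where the sum ranges over all unordered pairs $\{A,B\}$ of distinct parts of $\mathcal{P}$, with each vertex of $V_0$ forming a singleton part of its own.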
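The index is straightforward to compute for small examples. The sketch below assumes the energy of a pair has the standard form $q(A,B)=e(A,B)^2/(|A||B||V|^2)$ and sums it over unordered pairs of parts, with every exceptional vertex treated as its own singleton part. The helper names and the example graph are ours.

```python
from fractions import Fraction
from itertools import combinations

def q_pair(edges, A, B, n):
    """q(A, B) = e(A, B)^2 / (|A| |B| n^2), the assumed form of the energy."""
    e = sum(1 for a in A for b in B if frozenset((a, b)) in edges)
    return Fraction(e * e, len(A) * len(B) * n * n)

def index(edges, parts, exceptional=()):
    """Sum q over unordered pairs of parts; exceptional vertices are singletons."""
    all_parts = [set(P) for P in parts] + [{v} for v in exceptional]
    n = sum(len(P) for P in all_parts)
    return sum(q_pair(edges, A, B, n) for A, B in combinations(all_parts, 2))

# All edges sit between {0, 1} and {4, 5} inside an 8-vertex graph.
edges = {frozenset((a, b)) for a in (0, 1) for b in (4, 5)}
coarse = [{0, 1, 2, 3}, {4, 5, 6, 7}]
fine = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]
print(index(edges, coarse), index(edges, fine))  # 1/64 1/16
```

Refining the partition so that it "sees" where the edges are concentrated raises the index from 1/64 to 1/16, while both values stay below 1/2, in line with the boundedness and monotonicity properties.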

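Note that $q(\mathcal{P})$ is a sum of terms of the form $\frac{|A||B|}{|V|^2}\,d(A,B)^2$. The first good property it must have is boundedness.

**Property 1.** $q(\mathcal{P})\le\frac{1}{2}$.

In fact, as $d(A,B)\le1$,

$$q(\mathcal{P})\le\frac{1}{|V|^2}\sum|A||B|\le\frac{1}{2},$$

because the sum of $|A||B|$ over unordered pairs of distinct parts is at most $\frac{1}{2}\big(\sum|A|\big)^2=\frac{1}{2}|V|^2$.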

It is also monotone increasing with respect to refinements. This is the content of the next two properties.

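**Property 2.** If $A,B$ are disjoint subsets of $V$ and $\mathcal{A},\mathcal{B}$ are partitions of $A,B$, respectively, then

$$q(\mathcal{A},\mathcal{B})\ge q(A,B).$$

This property follows easily from Cauchy-Schwarz inequality (the interested reader may check it in the survey *Szemerédi’s regularity lemma and its applications in graph theory*), but this analytical argument is not so clear. A soft way of proving it is to consider the probabilistic point of view, with the aid of the random variable $Z$. According to the above calculations,

$$\mathbb{E}[Z]^2=d(A,B)^2=\frac{|V|^2}{|A||B|}\,q(A,B)\quad\text{and}\quad\mathbb{E}[Z^2]=\frac{|V|^2}{|A||B|}\,q(\mathcal{A},\mathcal{B}),$$

and so, by Jensen’s inequality (which in this case is just Cauchy-Schwarz inequality),

$$q(\mathcal{A},\mathcal{B})=\frac{|A||B|}{|V|^2}\,\mathbb{E}[Z^2]\ge\frac{|A||B|}{|V|^2}\,\mathbb{E}[Z]^2=q(A,B).$$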

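**Property 3.** If $\mathcal{P}'$ refines $\mathcal{P}$, then

$$q(\mathcal{P}')\ge q(\mathcal{P}).$$

This is a direct consequence of Property 2 by breaking $\mathcal{P}'$ according to $\mathcal{P}$: writing $\mathcal{P}'_A$ for the set of parts of $\mathcal{P}'$ contained in a part $A$ of $\mathcal{P}$, and discarding the nonnegative terms coming from pairs inside the same part of $\mathcal{P}$,

$$q(\mathcal{P}')\ge\sum_{\{A,B\}}q(\mathcal{P}'_A,\mathcal{P}'_B)\ge\sum_{\{A,B\}}q(A,B)=q(\mathcal{P}),$$

where both sums range over unordered pairs of distinct parts of $\mathcal{P}$.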

The next property shows that the index grows for partitions that are not $\varepsilon$-regular, and reflects the right choice of the energy function. In a few words, it says that

**“The lack of uniformity implies energy increment”**

and this idea permeates many results in recent developments in combinatorics, harmonic analysis, ergodic theory and other areas. Actually, all known proofs of Szemerédi’s theorem use this principle at some stage. To mention some of them:

- the original proof of Roth considers good and bad parts of functions.
- Furstenberg’s approach: every non-compact system has a weak mixing factor.
- the Fourier-analytic proof of Gowers identifies arithmetic progressions via the nowadays called *Gowers norms*.
- the construction of characteristic factors for multiple ergodic averages uses the *Gowers-Host-Kra seminorms*.

These two last results are still being developed to generate what is being called *higher-order Fourier analysis*. See this post of Terence Tao for a discussion about this topic. Going back to what matters, let’s prove the

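Proposition 10 (Lack of uniformity implies energy increment 1) Suppose $A$ and $B$ are disjoint nonempty subsets of $V$ and the pair $(A,B)$ is not $\varepsilon$-regular. Then there are partitions $\mathcal{A}=\{A_1,A_2\}$ of $A$ and $\mathcal{B}=\{B_1,B_2\}$ of $B$ such that

$$q(\mathcal{A},\mathcal{B})\ge q(A,B)+\varepsilon^4\,\frac{|A||B|}{|V|^2}.$$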

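*Proof:* The reader must convince himself that this is exactly relation (3). For those still not convinced, let’s do it again. Assume $X\subset A$ and $Y\subset B$ are such that $|X|\ge\varepsilon|A|$, $|Y|\ge\varepsilon|B|$ and

$$|d(X,Y)-d(A,B)|\ge\varepsilon.$$

Consider $\mathcal{A}=\{X,A\setminus X\}$ and $\mathcal{B}=\{Y,B\setminus Y\}$. The evaluation of the variance $\mathrm{Var}(Z)$ will prove the proposition. On one hand, by the calculations in Property 2,

$$\mathrm{Var}(Z)=\mathbb{E}[Z^2]-\mathbb{E}[Z]^2=\frac{|V|^2}{|A||B|}\big(q(\mathcal{A},\mathcal{B})-q(A,B)\big).$$

On the other, $Z$ deviates from $\mathbb{E}[Z]$ at least $\varepsilon$ whenever $a\in X$ and $b\in Y$, and this event has probability

$$\frac{|X||Y|}{|A||B|}\ge\varepsilon^2.$$

Then $\mathrm{Var}(Z)\ge\varepsilon^4$ which, together with the identity for $\mathrm{Var}(Z)$ just displayed, gives that

$$q(\mathcal{A},\mathcal{B})\ge q(A,B)+\varepsilon^4\,\frac{|A||B|}{|V|^2}.$$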
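The variance argument can be checked by hand on a small example. In the sketch below, $Z$ is the density $d(A',B')$ of the cell of the partition containing a uniformly random pair $(a,b)\in A\times B$, so each cell is weighted by $|A'||B'|/(|A||B|)$. The sets, the value $\varepsilon=1/2$ and the function names are illustrative assumptions of ours.

```python
from fractions import Fraction

def edge_count(edges, A, B):
    return sum(1 for a in A for b in B if frozenset((a, b)) in edges)

def density(edges, A, B):
    return Fraction(edge_count(edges, A, B), len(A) * len(B))

def z_moments(edges, A, B, parts_A, parts_B):
    """Mean and variance of Z = d(A', B') for a uniform random pair in A x B."""
    mean = second = Fraction(0)
    for A1 in parts_A:
        for B1 in parts_B:
            weight = Fraction(len(A1) * len(B1), len(A) * len(B))
            d = density(edges, A1, B1)
            mean += weight * d
            second += weight * d * d
    return mean, second - mean * mean

A, B = {0, 1, 2, 3}, {4, 5, 6, 7}
X, Y = {0, 1}, {4, 5}
edges = {frozenset((a, b)) for a in X for b in Y}    # all edges sit inside X x Y
eps = Fraction(1, 2)

mean, variance = z_moments(edges, A, B, [X, A - X], [Y, B - Y])
print(mean == density(edges, A, B))    # True: E[Z] = d(A, B)
print(variance, variance >= eps ** 4)  # 3/16 True
```

Here $|X|\ge\varepsilon|A|$, $|Y|\ge\varepsilon|B|$ and $|d(X,Y)-d(A,B)|=3/4\ge\varepsilon$, so the pair is not $\varepsilon$-regular, and the variance $3/16$ indeed exceeds $\varepsilon^4=1/16$.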

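Proposition 11 (Lack of uniformity implies energy increment 2) Suppose $0<\varepsilon\le\frac{1}{4}$ and let $\mathcal{P}=\{V_0,V_1,\dots,V_k\}$ be a non $\varepsilon$-regular equipartition of $V$, where $V_0$ is the exceptional set. Then there exists a refinement $\mathcal{P}'=\{V_0',V_1',\dots,V_l'\}$ of $\mathcal{P}$ with the following properties:

- $\mathcal{P}'$ is an equipartition of $V$,
- $k\le l\le k4^k$,
- $|V_0'|\le|V_0|+\frac{|V|}{2^k}$ and
- $q(\mathcal{P}')\ge q(\mathcal{P})+\frac{\varepsilon^5}{2}$.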

*Proof:* The idea is to apply the previous proposition to every non-regular pair. As there are at least of them, the index will increase the fixed amount. Let be the cardinality of every , . Saying that is not -regular means that, for at least pairs , , is not -regular. For each of these, let , be the partitions of , respectively, given by Proposition 10 and consider the smallest partition that refines and all , . By Proposition 10,

as . This proves that (and any of its refinements) satisfies (iv). The problem is that is not necessarily an equipartition. We adjust this by defining , splitting every part of arbitrarily into disjoint sets of size and throwing the remaining vertices of each part, if any, to the exceptional set. This new partition satisfies (i), (ii) and (iii), as we’ll verify below.

(i) is an equipartition by definition.

(ii) To get , every cluster of is divided in at most parts. After, every element of is divided in at most non-exceptional parts. This implies that

(iii) Each cluster of contributes with at most vertices to and so

Finally, we are able to prove the regularity lemma.

*Proof:* First, note that if the result is true for and , , then the result is also true for the pair . This allows us to assume that and is arbitrarily large.

Begin with an arbitrary partition of such that and . Apply Proposition 11 at most times to obtain an equipartition . Let be the largest number obtained by iterating the map at most times, starting from . Then has at most clusters. In addition, the cardinality of its exceptional set is bounded by

which is smaller than if is large. This concludes the proof.

There is a large literature about Szemerédi’s regularity lemma. We refer the reader to four references: my lecture notes available at my homepage, the book *The probabilistic method* of Alon and Spencer, the survey of Komlós and M. Simonovits and Tao’s perspective via random partitions. Merry Christmas!!

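Theorem 1 (Furstenberg multiple recurrence) Let $(X,\mathcal{X},\mu,T)$ be a measure-preserving system, thus $(X,\mathcal{X},\mu)$ is a probability space and $T:X\to X$ is a measure-preserving bijection such that $T$ and $T^{-1}$ are both measurable. Let $E$ be a measurable subset of $X$ of positive measure $\mu(E)>0$. Then for any $k\ge1$, there exists $n>0$ such that

$$\mu(E\cap T^{-n}E\cap\dots\cap T^{-(k-1)n}E)>0.$$

Equivalently, there exists $n>0$ and $x\in X$ such that

$$x,\,T^nx,\,\dots,\,T^{(k-1)n}x\in E.$$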

As is well known, the Furstenberg multiple recurrence theorem is equivalent to Szemerédi’s theorem, thanks to the Furstenberg correspondence principle; see for instance these lecture notes of mine.

The multiple recurrence theorem is proven, roughly speaking, by an induction on the “complexity” of the system $(X,\mathcal{X},\mu,T)$. Indeed, for very simple systems, such as periodic systems (in which $T^n$ is the identity for some $n>0$, which is for instance the case for the circle shift $X=\mathbb{R}/\mathbb{Z}$, $Tx:=x+\alpha$ with a rational shift $\alpha$), the theorem is trivial; at a slightly more advanced level, for *almost periodic* (or *compact*) systems (in which $\{T^nf:n\in\mathbb{Z}\}$ is a precompact subset of $L^2(X)$ for every $f\in L^2(X)$, which is for instance the case for irrational circle shifts), the theorem is also quite easy. One then shows that the multiple recurrence property is preserved under various *extension* operations (specifically, compact extensions, weakly mixing extensions, and limits of chains of extensions), which then gives the multiple recurrence theorem as a consequence of the *Furstenberg-Zimmer structure theorem* for measure-preserving systems. See these lecture notes for further discussion.

From a high-level perspective, this is still one of the most conceptual proofs known of Szemerédi’s theorem. However, the individual components of the proof are still somewhat intricate. Perhaps the most difficult step is the demonstration that the multiple recurrence property is preserved under *compact extensions*; see for instance these lecture notes, which are devoted entirely to this step. This step requires quite a bit of measure-theoretic and/or functional analytic machinery, such as the theory of disintegrations, relatively almost periodic functions, or Hilbert modules.

However, I recently realised that there is a special case of the compact extension step – namely that of *finite* extensions – which avoids almost all of these technical issues while still capturing the essence of the argument (and in particular, the key idea of using van der Waerden’s theorem). As such, this may serve as a pedagogical device for motivating this step of the proof of the multiple recurrence theorem.

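Let us first explain what a finite extension is. Given a measure-preserving system $X=(X,\mathcal{X},\mu,T)$, a finite set $Y$, and a measurable map $\rho:X\to\mathrm{Sym}(Y)$ from $X$ to the permutation group of $Y$, one can form the *finite extension*

$$X\ltimes_\rho Y=(X\times Y,\mathcal{X}\times\mathcal{Y},\mu\times\nu,S),$$

which as a probability space is the product of $(X,\mathcal{X},\mu)$ with the finite probability space $Y=(Y,\mathcal{Y},\nu)$ (with the discrete $\sigma$-algebra and uniform probability measure), and with shift map

$$S(x,y):=(Tx,\rho(x)y).\qquad(1)$$

One easily verifies that this is indeed a measure-preserving system. We refer to $\rho$ as the *cocycle* of the system.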

An example of finite extensions comes from group theory. Suppose we have a short exact sequence

of finite groups. Let be a group element of , and let be its projection in . Then the shift map on (with the discrete -algebra and uniform probability measure) can be viewed as a finite extension of the shift map on (again with the discrete -algebra and uniform probability measure), by arbitrarily selecting a section that inverts the projection map, identifying with by identifying with for , and using the cocycle

Thus, for instance, the unit shift $x\mapsto x+1$ on $\mathbb{Z}/N\mathbb{Z}$ can be thought of as a finite extension of the unit shift $x\mapsto x+1$ on $\mathbb{Z}/M\mathbb{Z}$ whenever $N$ is a multiple of $M$.
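The cyclic-group example can be verified mechanically. The sketch below takes $M=3$ and $N=6$, uses the fibre $Y=\{0,1\}$, and assumes one particular identification of $\mathbb{Z}/N\mathbb{Z}$ with $\mathbb{Z}/M\mathbb{Z}\times Y$, namely $(x,y)\leftrightarrow x+My$; with that choice the cocycle is the identity permutation except when the base point wraps around. The identification, the names `rho`, `S`, `identify` and the concrete numbers are our own illustration of a shift map of the form $S(x,y)=(Tx,\rho(x)y)$.

```python
M, N = 3, 6                      # N is a multiple of M
fibre = range(N // M)            # the finite set Y = {0, 1}

def T(x):
    return (x + 1) % M           # unit shift on Z/MZ

def rho(x):
    """Cocycle: a permutation of the fibre, stored as a tuple."""
    if x == M - 1:               # wrapping around the base bumps the fibre
        return tuple((j + 1) % (N // M) for j in fibre)
    return tuple(fibre)          # identity permutation otherwise

def S(x, y):
    return T(x), rho(x)[y]       # S(x, y) = (T x, rho(x) y)

def identify(x, y):
    return x + M * y             # (x, y) <-> x + M*y in Z/NZ

points = [(x, y) for x in range(M) for y in fibre]
assert sorted(S(x, y) for x, y in points) == points          # S is a bijection
assert all(identify(*S(x, y)) == (identify(x, y) + 1) % N for x, y in points)
print("unit shift on Z/6Z realised as a finite extension of Z/3Z")
```

Since $S$ permutes the finite set of points, it automatically preserves the uniform measure, which is the measure-preservation claim in this discrete setting.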

Another example comes from Riemannian geometry. If $M'$ is a Riemannian manifold that is a finite cover of another Riemannian manifold $M$ (with the metric on $M'$ being the pullback of that on $M$), then (unit time) geodesic flow on the cosphere bundle of $M'$ is a finite extension of the corresponding flow on $M$.

Here, then, is the finite extension special case of the compact extension step in the proof of the multiple recurrence theorem:

Proposition 2 (Finite extensions) Let $X\ltimes_\rho Y$ be a finite extension of a measure-preserving system $X$. If $X$ obeys the conclusion of the Furstenberg multiple recurrence theorem, then so does $X\ltimes_\rho Y$.

Before we prove this proposition, let us first give the combinatorial analogue.

Lemma 3 Let $A$ be a subset of the integers that contains arbitrarily long arithmetic progressions, and let $c:A\to\{1,\dots,M\}$ be a colouring of $A$ by $M$ colours (or equivalently, a partition of $A$ into $M$ colour classes $A_1,\dots,A_M$). Then at least one of the $A_i$ contains arbitrarily long arithmetic progressions.

*Proof:* By the infinite pigeonhole principle, it suffices to show that for each $k\ge1$, one of the colour classes $A_i$ contains an arithmetic progression of length $k$.

Let $N$ be a large integer (depending on $k$ and $M$) to be chosen later. Then $A$ contains an arithmetic progression of length $N$, which may be identified with $\{0,\dots,N-1\}$. The colouring of $A$ then induces a colouring on $\{0,\dots,N-1\}$ into $M$ colour classes. Applying (the finitary form of) van der Waerden’s theorem, we conclude that if $N$ is sufficiently large depending on $M$ and $k$, then one of these colouring classes contains an arithmetic progression of length $k$; undoing the identification, we conclude that one of the $A_i$ contains an arithmetic progression of length $k$, as desired.
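The finitary form of van der Waerden’s theorem used in this proof can be tested exhaustively in its smallest interesting case, two colours and progressions of length three. The sketch below is a brute-force illustration with function names of our own choosing; it is not how one would attack larger parameters.

```python
from itertools import product

def has_mono_progression(colouring, k):
    """colouring[i] is the colour of i; look for a monochromatic k-term AP."""
    n = len(colouring)
    for start in range(n):
        for step in range(1, n):
            if start + (k - 1) * step >= n:
                break                      # progression would leave {0, ..., n-1}
            if len({colouring[start + j * step] for j in range(k)}) == 1:
                return True
    return False

def forced(n, colours, k):
    """True if every colouring of {0, ..., n-1} has a monochromatic k-term AP."""
    return all(has_mono_progression(c, k)
               for c in product(range(colours), repeat=n))

print(forced(8, 2, 3))  # False
print(forced(9, 2, 3))  # True
```

So for two colours and length three, any block of 9 consecutive integers is already "sufficiently large" in the sense required by the proof, while 8 is not.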

Of course, by specialising to the case $A=\mathbb{Z}$, we see that the above Lemma is in fact equivalent to van der Waerden’s theorem.

Now we prove Proposition 2.

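*Proof:* Fix $k\ge1$. Let $E$ be a positive measure subset of $X\ltimes_\rho Y$. By Fubini’s theorem, we have

$$\mu\times\nu(E)=\int_Xf(x)\,d\mu(x),$$

where $f(x):=\nu(E_x)$ and $E_x:=\{y\in Y:(x,y)\in E\}$ is the fibre of $E$ at $x$. Since $\mu\times\nu(E)$ is positive, we conclude that the set

$$F:=\{x\in X:f(x)>0\}$$

is a positive measure subset of $X$. Note for each $x\in F$, we can find an element $g(x)\in Y$ such that $(x,g(x))\in E$. While not strictly necessary for this argument, one can ensure if one wishes that the function $g$ is measurable by totally ordering $Y$, and then letting $g(x)$ be the minimal element of $Y$ for which $(x,g(x))\in E$.

Let $N$ be a large integer (which will depend on $k$ and the cardinality $M$ of $Y$) to be chosen later. Because $X$ obeys the multiple recurrence theorem, we can find a positive integer $n$ and $x\in X$ such that

$$x,\,T^nx,\,T^{2n}x,\,\dots,\,T^{(N-1)n}x\in F.$$

Now consider the sequence of points

$$S^{-mn}\big(T^{mn}x,\,g(T^{mn}x)\big)$$

for $m=0,\dots,N-1$. From (1), we see that

$$S^{-mn}\big(T^{mn}x,\,g(T^{mn}x)\big)=(x,c(m))\qquad(2)$$

for some sequence $c(0),\dots,c(N-1)\in Y$. This can be viewed as a colouring of $\{0,\dots,N-1\}$ by $M$ colours, where $M$ is the cardinality of $Y$. Applying van der Waerden’s theorem, we conclude (if $N$ is sufficiently large depending on $k$ and $M$) that there is an arithmetic progression $a,a+r,\dots,a+(k-1)r$ in $\{0,\dots,N-1\}$ with $r>0$ such that

$$c(a)=c(a+r)=\dots=c(a+(k-1)r)=c$$

for some $c\in Y$. If we then let $z:=S^{an}(x,c)$, we see from (2) that

$$S^{irn}z=\big(T^{(a+ir)n}x,\,g(T^{(a+ir)n}x)\big)\in E$$

for all $i=0,\dots,k-1$, and the claim follows.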

Remark 1The precise connection between Lemma 3 and Proposition 2 arises from the following observation: with as in the proof of Proposition 2, and , the setcan be partitioned into the classes

where is the graph of . The multiple recurrence property for ensures that contains arbitrarily long arithmetic progressions, and so therefore one of the must also, which gives the multiple recurrence property for .


Remark 2 Compact extensions can be viewed as a generalisation of finite extensions, in which the fibres are no longer finite sets, but are themselves measure spaces obeying an additional property, which roughly speaking asserts that for many functions $f$ on the extension, the shifts $T^nf$ of $f$ behave in an almost periodic fashion on most fibres, so that the orbits $\{T^nf\}$ become totally bounded on each fibre. This total boundedness allows one to obtain an analogue of the above colouring map to which van der Waerden’s theorem can be applied.

** — 1. Introduction: notions of hypercyclicity — **

First of all, I will review some basic notions from linear dynamics that will be quite central throughout the exposition. I refer the reader to the excellent book of Bayart and Matheron (Bayart and Matheron, 2009) where most of this material is drawn from anyways. We will state several classical results here omitting the proof. If no other reference is given, this means the proof can be found in (Bayart and Matheron, 2009).

** — 1.1. Hypercyclic operators — **

We will work on *a separable Banach space* $X$ over $\mathbb{R}$ or $\mathbb{C}$. We will always use the symbol $T$ to denote a *bounded linear operator* acting on $X$. In what follows I will just write $X$, $T$, without any further comment, assuming always that these symbols have the meaning described above.

The most central notion in linear dynamics is that of hypercyclicity.

Definition 1 The *orbit* of a vector $x\in X$ under $T$ (or the $T$-orbit) is the set

$$O(x,T):=\{T^nx:\ n\ge0\}.$$

The operator $T$ is said to be *hypercyclic* if there is some vector $x\in X$ such that the set $O(x,T)$ is dense in $X$. Such a vector will be called a *hypercyclic vector for* $T$ (or a $T$-hypercyclic vector).

Some remarks are in order. First of all let us point out that these definitions only make sense if the space $X$ is *separable*. On the other hand, hypercyclicity is an infinite dimensional phenomenon; there are no hypercyclic operators on a finite-dimensional space. To see this quickly, think of a square matrix in its Jordan normal form.

An easy consequence of these definitions is that whenever an operator $T$ is hypercyclic, we must have $\|T\|>1$. Moreover, whenever $T$ is an invertible operator, $T$ is hypercyclic if and only if $T^{-1}$ is hypercyclic. These facts will be used in the discussion below.

The definition of hypercyclicity does not require any linear structure. It makes sense for an arbitrary *continuous* map $T:X\to X$ acting on a topological space $X$.

The most general setup for *linear dynamics* is that of an arbitrary separable topological vector space $X$. We will stick however to the case of a Banach space to simplify the exposition, the generalizations being mostly of a technical nature.

The notion of hypercyclicity is strictly stronger than (though related to) that of *cyclicity*. Recall from classical operator theory that an operator $T$ is called *cyclic* if there exists a vector $x\in X$ (a *cyclic vector for* $T$) such that the linear span of

$$O(x,T)=\{T^nx:\ n\ge0\}$$

is dense in $X$. This notion is related to the *invariant subspace problem*; the operator $T$ lacks (non-trivial) invariant closed subspaces if and only if every non-zero vector is cyclic for $T$.

Likewise, the notion of hypercyclicity is closely related to the *invariant subset problem*. It is an easy observation that an operator $T$ lacks non-trivial invariant closed subsets if and only if every non-zero vector is hypercyclic for $T$. P. Enflo first answered the invariant subspace question in the negative by constructing a rather peculiar Banach space. After that, C.J. Read proved that there is an operator on $\ell^1$ for which every non-zero vector is hypercyclic. So the invariant subspace problem has a negative solution on $\ell^1$. However the problem remains open in the case of Hilbert spaces.

** — 1.2. Universal sequences of operators — **

We will be interested in the following generalization of hypercyclicity to *families* of continuous linear operators $(T_i)_{i\in I}$, where each $T_i:X\to Y$ and $X,Y$ are two topological spaces.

Definition 2 The family $(T_i)_{i\in I}$ is called *universal* if there exists a $x\in X$ such that the set $\{T_ix:\ i\in I\}$ is dense in $Y$.

Of course hypercyclicity is a special case of universality, where the family of operators is defined as the *iterates* $(T^n)_{n\ge0}$ of a fixed operator $T$ and $X=Y$ is a topological vector space.

** — 1.3. Cesàro Hypercyclicity — **

In (León-Saavedra, 2002), F. León-Saavedra introduced the notion of *Cesàro hypercyclicity*.

Definition 3 An operator $T$ is called *Cesàro hypercyclic* if there is a vector $x\in X$ whose *Cesàro orbit*, that is the set

$$\Big\{\frac{1}{n}\sum_{j=0}^{n-1}T^jx:\ n\ge1\Big\},$$

is dense in $X$. Such a vector will be called *Cesàro hypercyclic* for $T$.

León-Saavedra showed in (León-Saavedra, 2002) that $T$ is Cesàro hypercyclic if and only if there is a vector $x\in X$ such that the set

$$\Big\{\frac{T^nx}{n}:\ n\ge1\Big\}$$

is dense in $X$. Observe that this means that the family of operators $(T^n/n)_{n\ge1}$ is universal. We stress here that, in general, the notions of hypercyclicity and Cesàro hypercyclicity are not ‘ordered’; hypercyclicity does not imply Cesàro hypercyclicity and vice versa.

** — 1.4. How to prove that an operator is hypercyclic — **

This first characterization of hypercyclicity comes from topological dynamics and is often referred to as `Birkhoff’s transitivity theorem’.

Theorem 4 (Birkhoff’s transitivity theorem) Let $T$ be a continuous linear operator on a separable Banach space $X$. Then $T$ is hypercyclic if and only if it is *topologically transitive*; that is, for every pair of nonempty open sets $U,V\subset X$, there exists $n\ge0$ such that $T^n(U)\cap V\neq\emptyset$.

A byproduct of the proof of Theorem 4 is that the set of $T$-hypercyclic vectors, $HC(T)$, is a dense $G_\delta$ subset of $X$.

Actually Birkhoff’s theorem is true in a much more general context but I won’t pursue that here. It is important however that no linearity is necessary in Theorem 4. As a result, when one adds linearity, the following handy criterion becomes available.

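Definition 5 (Hypercyclicity criterion) Let $X$ be a separable Banach space and $T$ a bounded linear operator on $X$. We say that $T$ satisfies the *hypercyclicity criterion* if there exists an increasing sequence of positive integers $(n_k)$, two dense sets $D_1,D_2\subset X$ and a sequence of maps $S_{n_k}:D_2\to X$ such that:

(i) $T^{n_k}x\to0$ for any $x\in D_1$,

(ii) $S_{n_k}y\to0$ for any $y\in D_2$,

(iii) $T^{n_k}S_{n_k}y\to y$ for any $y\in D_2$.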

Using Theorem 4 one can prove the following:

Theorem 6 Let $T$ be a continuous linear operator on a separable Banach space $X$. Suppose that $T$ satisfies the hypercyclicity criterion 5. Then $T$ is hypercyclic.

Definition 5 and Theorem 6 are originally due to Kitai (Kitai, 1982), in the case that and . The criterion was then evolved by R.Gethner and J. H. Shapiro in (Gethner and Shapiro, 1987) and J. Bès (Bès, 1998).

It was a long-standing question whether *every* hypercyclic operator satisfies the hypercyclicity criterion. It is not hard to show (and it was known) that $T$ satisfying the hypercyclicity criterion is equivalent to the operator $T\oplus T$ being hypercyclic. In topological dynamics this property is referred to as being *weakly mixing*. The question was resolved in the negative by M. De La Rosa and C.J. Read in (de la Rosa and Read, 2009) and later in (Bayart and Matheron, 2007) for all classical Banach spaces.

A consequence of the hypercyclicity criterion 5 and Theorem 6 is the following result, which highlights the connection between linear dynamics and spectral theory. Roughly speaking, the following *Godefroy-Shapiro criterion* states that an operator which has a `large supply’ of eigenvectors is hypercyclic. See (Godefroy and Shapiro, 1991).

Theorem 7 (Godefroy-Shapiro criterion) Let be a continuous linear operator on a separable Banach space . Suppose that and both span a dense subspace of . Then is hypercyclic.

** — 1.5. Examples of hypercyclic operators — **

We will now use the previous hypercyclicity criteria to show that some very natural operators are hypercyclic. We will also take the chance to define some classes of operators which I want to discuss later on, in relation to our main theorem.

Example 1 Let denote the space of all entire functions on endowed with the topology of uniform convergence on compact sets. Now is not a Banach space but it is a separable Fréchet space, so all the notions and theorems discussed above go through. We consider the derivative operator , which is hypercyclic. To see this, apply the hypercyclicity criterion with and . Now the operator in the hypercyclicity criterion needs to be defined as a sort of (asymptotic) right inverse of the derivative operator, so it is natural to define and . Then we have that as for every monomial so that takes care of (i) in the hypercyclicity criterion. Condition (iii) is trivial to verify since on . Finally, in order to check the validity of condition (ii) in the hypercyclicity criterion we need to see that as for every positive integer . However, we readily see that from which we easily conclude that uniformly on compact subsets of .
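The three conditions can be checked by hand on monomials, and the following minimal Python sketch does exactly that. It assumes that the dense sets are the polynomials and that the right inverse is the antiderivative normalised to vanish at the origin; the function names and the coefficient-list representation are illustrative choices, not notation from the text.

```python
from fractions import Fraction

def derivative(p):
    """Differentiate a polynomial given as a coefficient list [a0, a1, ...]."""
    return [Fraction(k) * a for k, a in enumerate(p)][1:] or [Fraction(0)]

def antiderivative(p):
    """Antiderivative vanishing at 0, i.e. z^k -> z^(k+1)/(k+1)."""
    return [Fraction(0)] + [Fraction(a) / (k + 1) for k, a in enumerate(p)]

def iterate(op, p, n):
    """Apply the operator `op` to the polynomial `p` a total of n times."""
    for _ in range(n):
        p = op(p)
    return p

def sup_on_disc(p, radius):
    """Upper bound for sup |p(z)| over |z| <= radius: sum of |a_k| * radius^k."""
    return sum(abs(a) * Fraction(radius) ** k for k, a in enumerate(p))
```

Iterating `derivative` more than k times kills the monomial z^k, `derivative(antiderivative(p))` returns `p`, and `sup_on_disc(iterate(antiderivative, p, n), R)` shrinks like k!/(k+n)! times a power of R, which is the factorial decay behind uniform convergence on compact sets.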

Example 2 Let us now consider the Hilbert space . The backward shift operator is defined by . Observe that this operator can never be hypercyclic since so the orbit of any vector under stays inside the unit ball. However, the operator is hypercyclic for every with . Again it is an easy exercise to check the validity of the hypercyclicity criterion with and , where is the space of all finitely supported sequences. Again where is the natural candidate, the right inverse of which in this case is the forward shift operator defined as .
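As a rough illustration of this exercise, here is a small Python sketch on finitely supported sequences stored as lists. It assumes the usual conventions that the backward shift drops the first coordinate, the forward shift prepends a zero, the operator is `lam` times the backward shift with `|lam| > 1`, and the candidate right inverse is the forward shift divided by `lam`; all names are illustrative.

```python
import math

def backward_shift(x):
    """(x1, x2, x3, ...) -> (x2, x3, ...)."""
    return list(x[1:])

def forward_shift(x):
    """(x1, x2, ...) -> (0, x1, x2, ...)."""
    return [0.0] + list(x)

def norm(x):
    """Euclidean norm of a finitely supported sequence."""
    return math.sqrt(sum(abs(t) ** 2 for t in x))

def T(x, lam):
    """The scaled backward shift lam * B."""
    return [lam * t for t in backward_shift(x)]

def S(x, lam):
    """The scaled forward shift (1/lam) * F, a right inverse of T."""
    return [t / lam for t in forward_shift(x)]

def power(op, x, lam, n):
    """Apply `op` with parameter `lam` to `x` a total of n times."""
    for _ in range(n):
        x = op(x, lam)
    return x
```

On a sequence of length L, `power(T, x, lam, n)` is the zero sequence once n is at least L, `T(S(x, lam), lam)` returns `x`, and `norm(power(S, x, lam, n))` equals `norm(x) / abs(lam) ** n`, which tends to zero precisely because `|lam| > 1`.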

Our last example on the one hand illustrates the Godefroy-Shapiro criterion and on the other hand introduces a class of operators I would like to consider later on in the discussion.

Example 3 Here we consider a Hilbert space of analytic functions , where is the open unit disk of the complex plane. The space is pretty general but we require the following two conditions:

- , and
- for every , the point evaluation functionals are bounded.
The second condition ensures that convergence in implies pointwise convergence on . By the boundedness of holomorphic functions on compact sets and the uniform boundedness principle the second condition amounts to requiring that convergence in implies uniform convergence on compact subsets of . The reader is thus encouraged to think of the Hardy space or the Bergman space in the place of , keeping in mind however that interesting phenomena occur outside these two particular cases.

A feature of that we will use is the existence of a reproducing kernel. In particular, for each , the boundedness of the point evaluation functionals and the Riesz representation theorem provide a unique function , the reproducing kernel of at , such that

Recall that a function is called a multiplier of if for every . Such a defines a multiplication operator in terms of the formula

By the boundedness of point evaluation functionals and the closed graph theorem it follows that is a bounded linear operator on . Moreover, every multiplier is a bounded holomorphic function, that is,

Observe that for every and every we have that

Remembering that there is at least one which is not identically we conclude that . Thus every multiplier is a bounded holomorphic function with . The converse is not always true under our assumptions, as can be seen by considering for example the Dirichlet space of holomorphic functions on , that is the space of all functions such that

Here denotes area measure. In the Dirichlet space not every bounded holomorphic function is a multiplier.

In general it is not difficult to see that a multiplication operator is never hypercyclic. The situation is quite different for the adjoints of multiplication operators. In order to make the statement of the following theorem clearer we require the extra assumption that every bounded holomorphic function is a multiplier of such that . This extra assumption is automatically satisfied in the case of the Hardy space or the Bergman space but not in the Dirichlet space. The following theorem is from (Godefroy and Shapiro, 1991).

Theorem 8 (Godefroy, Shapiro) Assume that is a Hilbert space of holomorphic functions as above. Furthermore assume that every bounded holomorphic function is a multiplier of such that . Then the adjoint multiplication operator is hypercyclic if and only if is non-constant and .

*Proof:* We first prove that if then is hypercyclic. For we consider the reproducing kernel . Since for every , we conclude that for every . That is, for every , is an eigenvector of with corresponding eigenvalue . Now let and . Since is non-constant and we have that both are non-empty open sets (by the open-mapping theorem for analytic functions is an open set). By the Godefroy-Shapiro criterion, in order to show that is hypercyclic it suffices to show that and both span a dense subspace of . Indeed, assume that there exists a function which is orthogonal to all either for all or for all . In either case vanishes on a non-empty open set and thus is identically zero.

In order to prove the other direction first observe that whenever is hypercyclic, is non-constant. Moreover we have that is connected so it either lies entirely inside, or entirely outside the unit disk. In the first case we have that , thus cannot be hypercyclic. In the complementary case, the function is a bounded holomorphic function and . By the first case, is not hypercyclic, and since , neither is .

Example 4 We finish this short list of examples by giving another typical class of hypercyclic operators, namely unilateral and bilateral weighted shifts. Let be the Hilbert space of square summable sequences . Consider the canonical basis of and let be a (bounded) sequence of positive numbers. The operator is a unilateral (backward) weighted shift with weight sequence if for every and .

Let be the Hilbert space of square summable sequences endowed with the usual norm. That is, if . Let be a (bounded) sequence of positive numbers. The operator is a bilateral (backward) weighted shift with weight sequence if for every . Here is the canonical basis of .

Theorem 9 Let be defined as above, with weight sequences respectively.

(i) is hypercyclic if and only if

(ii) is hypercyclic if and only if, for any

and

** — 2. Recurrence, multiple recurrence and hypercyclicity — **

Let us consider a bounded linear operator on a separable Banach space . We have already seen that saying that an operator is *hypercyclic* is equivalent to saying that an operator is topologically transitive, that is that for every pair of open sets , there is some positive integer such that . In what follows I will introduce some notions that come from topological dynamical systems.

** — 2.1. Recurrence and Multiple recurrence — **

A somewhat weaker notion in topological dynamics is that of *recurrence*.

Definition 10 The operator is called recurrent if for every open set there is a such that .

Clearly every hypercyclic operator is recurrent. Unlike hypercyclicity which is a purely infinite dimensional phenomenon, there are recurrent operators in finite dimensions (consider for example a rotation on the plane).

A recurrent operator has many points whose orbit under asymptotically `returns’ to the point. To make this more precise, let us call a vector a *recurrent vector for* if there exists an increasing sequence of positive integers such that as . It turns out that a recurrent operator has a dense set of recurrent vectors.

Proposition 11 An operator is recurrent if and only if the set of recurrent vectors for is dense in . In this case the set of recurrent vectors for is a $G_\delta$ subset of .

*Proof:* Let us first prove the easy implication. That is, we assume that has a dense set of recurrent points and let be an open set in . Since the recurrent points of are dense, there is a which is recurrent for . Take such that . Since is recurrent, there is a such that . Thus . That is, we have that .

Let us now assume that is recurrent. We fix an open ball for some and . We need to show that there is a recurrent vector in . Since is recurrent there exists a positive integer such that , for some . That is, we have that and . Since is continuous, there exists such that and . Now since is recurrent, there is a such that for some . By continuity again there is an such that and . Continuing inductively we construct a sequence , a strictly increasing sequence of positive integers and a sequence of positive real numbers , such that

Since is complete we conclude by Cantor’s theorem that

for some . We also have that , for all . Thus we have that for every , which means that in . That is, is a recurrent point in the original ball .

Finally, let us write for the set of -recurrent vectors. Observe that

which shows that the set of -recurrent vectors is a $G_\delta$-set.

After (simple) recurrence, let’s now consider multiple recurrence. An operator is called *topologically multiply recurrent* if for every non-empty open set and every there is a such that

Of course a hypercyclic operator is always recurrent. However, there is no reason why a hypercyclic operator should be topologically multiply recurrent in general. This is illustrated in the following proposition.

Proposition 12 (Costakis and Parissis, 2010) There exists a hypercyclic bilateral weighted shift on which is not topologically multiply recurrent.

** — 2.2. Frequent hypercyclicity and Szemerédi’s theorem — **

Recently, Bayart and Grivaux introduced in (Bayart and Grivaux, 2005) and (Bayart and Grivaux, 2006) a notion that examines how frequently the orbit of a hypercyclic operator visits a non-empty open set.

Definition 13 An operator is called frequently hypercyclic if there exists a vector such that, for every non-empty open set , the set has positive lower density.

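This is the strongest form of this definition, using the `weakest’ density. There are variations where the lower density is replaced for example by the upper density. Recall that the lower density of a set $A \subseteq \mathbb{N}$ is defined as

$\displaystyle \underline{d}(A) = \liminf_{N \to \infty} \frac{|A \cap \{1, \ldots, N\}|}{N},$

while the upper density of $A$ is

$\displaystyle \overline{d}(A) = \limsup_{N \to \infty} \frac{|A \cap \{1, \ldots, N\}|}{N}.$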
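The two densities can genuinely differ. As a hedged illustration (the set below is a standard example and does not come from the text), take the union of the dyadic-type blocks [4^k, 2·4^k) for k = 0, 1, 2, ...; the counting ratio |A ∩ {1,...,N}| / N oscillates between roughly 1/3 and 2/3, so the lower density is 1/3 while the upper density is 2/3. The sketch only evaluates the ratio at finitely many N, which illustrates the liminf and limsup without computing them.

```python
from fractions import Fraction

def in_A(n):
    """Membership in A = union over k >= 0 of the blocks [4^k, 2 * 4^k)."""
    block = 1
    while block <= n:
        if block <= n < 2 * block:
            return True
        block *= 4
    return False

def counting_ratio(N):
    """The exact ratio |A ∩ {1, ..., N}| / N."""
    return Fraction(sum(in_A(n) for n in range(1, N + 1)), N)
```

Evaluating `counting_ratio` just before a block starts (N = 4^(k+1) - 1) gives exactly 1/3, while evaluating it at the end of a block (N = 2·4^k - 1) gives a value slightly above 2/3 that decreases to 2/3 as k grows.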

In (Bayart and Grivaux, 2006) a `frequent hypercyclicity criterion’ was established. We won’t describe this here but point out one of its applications. Going back to adjoints of multiplication operators, an application of the Bayart-Grivaux frequent hypercyclicity criterion yields the following result:

Example 5 Recall that is a non-trivial Hilbert space of holomorphic functions with bounded point evaluation functionals. We consider multiplier operators with symbol . We have the following result, which is a corollary of the Bayart-Grivaux criterion.

Proposition 14 (Bayart, Grivaux) Assume that is a Hilbert space of holomorphic functions as above. Furthermore assume that every bounded holomorphic function is a multiplier of such that . The following are equivalent:

(i) The adjoint multiplication operator is hypercyclic.

(ii) The adjoint multiplication operator is frequently hypercyclic.

(iii) The function is non-constant and .

The notion of frequent hypercyclicity seems to be the right one in relation to topological multiple recurrence. In order to illustrate this connection we need Szemerédi’s theorem on arithmetic progressions.

Theorem 15 (Szemerédi) Let be a subset of with positive upper density. Then contains arbitrarily long arithmetic progressions.

The following proposition is just an easy application of Szemerédi’s theorem:

Proposition 16 Let be a frequently hypercyclic operator. Then is topologically multiply recurrent.

*Proof:* Let be an open set and let . Since is frequently hypercyclic, there exists a such that the set

has positive lower density. By Szemerédi’s theorem, contains an arithmetic progression of length , that is we have that

This means that

that is, is topologically multiply recurrent.
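The only combinatorial input in this proof is that the set of return times contains an arithmetic progression. As a purely illustrative sketch (a brute-force search on a finite set, which of course is not how Szemerédi’s theorem is proved), the following hypothetical helper locates such a progression; if a, a+r, ..., a+mr are all return times of a vector to the open set, then the point reached after a steps returns to the open set under the r-th, 2r-th, ..., mr-th powers of the operator.

```python
def find_progression(hits, length):
    """Return (a, r) with a, a+r, ..., a+(length-1)*r all in `hits`, or None.

    `hits` is a finite collection of positive integers and `length` >= 2.
    """
    hits = set(hits)
    top = max(hits, default=0)
    for a in sorted(hits):
        # The common difference r cannot exceed (top - a) / (length - 1).
        for r in range(1, (top - a) // max(length - 1, 1) + 1):
            if all(a + j * r in hits for j in range(length)):
                return a, r
    return None
```

For instance, the set {2, 5, 6, 8, 11, 14, 20} contains the progression 2, 5, 8, 11, whereas the powers of two {1, 2, 4, 8, 16} contain no progression of length three.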

** — 2.3. Frequently Cesàro hypercyclic operators — **

As we have seen earlier, an operator is Cesàro hypercyclic if and only if there exists a such that the set

is dense in . In analogy with frequent hypercyclicity, Costakis and Ruzsa introduced in (Costakis and Ruzsa, 2010) the notion of a *frequently Cesàro hypercyclic* operator in the obvious way.

Definition 17 An operator is called frequently Cesàro hypercyclic if there is a vector such that, for every open set , the set has positive lower density.

In contrast with Cesàro hypercyclic operators, frequently Cesàro hypercyclic operators are always hypercyclic:

Theorem 18 (Costakis and Ruzsa, 2010) Let be a frequently Cesàro hypercyclic operator. Then is hypercyclic.

As in the case of frequently hypercyclic operators, frequently Cesàro hypercyclic operators are always topologically multiply recurrent. However, this is not so obvious any more.

Theorem 19 (Costakis and Parissis, 2010) Let be a frequently Cesàro hypercyclic operator. Then is topologically multiply recurrent.

The hypothesis of the previous theorem is optimal in the sense that a Cesàro hypercyclic operator is not in general topologically multiply recurrent.

Proposition 20 (Costakis and Parissis, 2010) There exists a Cesàro hypercyclic bilateral weighted shift on which is not recurrent, and hence not topologically multiply recurrent.

Before giving the actual proof of Theorem 19, let us try to repeat the simple argument used in the proof of Proposition 16. We begin by fixing a positive integer and an open set . We will assume that is a ball, say . We need to show that there exists some vector with

or, in other words, that there is a such that

By the hypothesis and Szemerédi’s theorem there is a vector and an arithmetic progression of length

such that

In this case it is not obvious which is the natural candidate for the vector but let’s take . We then have for

where we know that all the ‘s are in . We can then naively estimate

There are two problems here. The first is that we cannot control the factor . The second is that even if we could, say we had , this estimate would give us that which is one too large. The second problem is easy to deal with. We just start with a smaller ball inside our original set and carry out this reasoning for the smaller ball. In the proof given below we will consider two cases. In the first we will just assume that is small. In the complementary case, we will appropriately use the information that is large!

*Proof of Theorem 19:* Let be any non-empty open set in . We fix a non-zero vector and take a positive number such that . Without loss of generality we may assume that . Consider the ball with

Observe that . Since is a frequently Cesàro hypercyclic operator there exists such that the set

has positive lower density. By Szemerédi’s theorem the set contains an arithmetic progression of length , i.e. there exist positive integers such that

Therefore the vectors

belong to .

As promised, we will consider two cases depending on the values of the ratio of the step over the first term of the arithmetic progression provided by Szemerédi’s theorem:

**Case 1. .**

We define the vector as

Then we have

for every . Since

we conclude that

and therefore

as we wanted to show.

**Case 2. .**

Here we first need to specify a number such that

for every . Indeed, solving the above equation for we get

We now define the vector as

Then we have

that is . On the other hand,

for every . The last equality and the above estimates imply

for every . Let . Since

we conclude that

Therefore

This completes the proof of the theorem.

** — 3. Back to adjoints of multiplication operators. — **

We can now give a full characterization of frequent hypercyclicity and multiple recurrence in the case of adjoints of multiplication operators on a non-trivial Hilbert space of holomorphic functions. It turns out that the weaker property of being recurrent is equivalent to frequent hypercyclicity and thus to every other property we have discussed here.

Proposition 21 (Costakis and Parissis, 2010) Assume that is a Hilbert space of holomorphic functions as above. Furthermore assume that every bounded holomorphic function is a multiplier of such that . The following are equivalent:

(i) is recurrent.

(ii) The adjoint multiplication operator is hypercyclic.

(iii) The adjoint multiplication operator is frequently hypercyclic.

(iv) The adjoint multiplication operator is topologically multiply recurrent.

(v) The function is non-constant and .

*Proof:* We have already seen in Theorem 8 and Proposition 14 that conditions *(ii), (iii)* and *(v)* are equivalent. Also, by Proposition 16, *(iii)* implies *(iv)* and obviously *(iv)* implies *(i)*. So the proof will be complete if we show for example that *(i)* implies *(v)*.

Indeed, assume that is recurrent. Suppose, for the sake of contradiction, that . Since is connected, so is ; thus, we either have that or .

**Case 1. .**

Then we have . We will consider two complementary cases. Assume that there exist and a recurrent vector for such that

The above inequality and the fact that imply that for every positive integer

On the other hand for some strictly increasing sequence of positive integers we have . Using the last inequality we arrive at , a contradiction. In the complementary case we must have for every vector which is recurrent for . Since the set of recurrent vectors for is dense in we get that for every . Hence for every . Take now and consider the reproducing kernel of . We have already seen in the proof of Theorem 8 that where is the reproducing kernel at . We conclude that

However, this is clearly impossible since is an isometry.

**Case 2. .**

Here is a bounded holomorphic function satisfying ; therefore, is invertible. It is easy to see that if an operator is invertible, then is recurrent if and only if is recurrent. Thus the operator is recurrent and the proof follows by Case 1.

Remark 22 It is easy to see that under the hypotheses of Proposition 21, is never recurrent. On the other hand, suppose that is a constant function with for some and every . Then we have that (or equivalently ) is recurrent if and only if is topologically multiply recurrent if and only if . In order to prove this it is enough to notice that for every non-zero complex number , with , and every positive integer , there exists an increasing sequence of positive integers such that
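A numerical sanity check of this kind of simultaneous return is easy to set up. The sketch below rests on an assumption about the statement being used, namely that for a unimodular number `lam` and a positive integer `m` the powers `lam**n, lam**(2*n), ..., lam**(m*n)` can all be brought close to 1 at once; the function name and the search limit are illustrative.

```python
import cmath

def simultaneous_return_time(lam, m, eps, limit=10000):
    """Smallest n <= limit with |lam**(j*n) - 1| < eps for j = 1, ..., m, else None."""
    for n in range(1, limit + 1):
        if all(abs(lam ** (j * n) - 1) < eps for j in range(1, m + 1)):
            return n
    return None
```

Running the search with smaller and smaller tolerances `eps` produces larger and larger return times when `lam` is not a root of unity, which is how an increasing sequence of such integers would be extracted.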

** — 4. Some open questions — **

I will close this post by suggesting a couple of open problems. For more information you can check the actual paper.

** — 4.1. Multipliers on the Dirichlet space. — **

First of all, let me come back to the adjoints of multiplication operators. Recall that the Dirichlet space is defined as the space of holomorphic functions such that

The reader might have noticed that throughout the discussion here, I have assumed that the multipliers of the Hilbert space are exactly the bounded holomorphic functions and that . Although this is actually the case on the Hardy space or the Bergman space , things are quite different on the Dirichlet space defined before. On the Dirichlet space, not all bounded holomorphic functions are multipliers. In fact the characterization of multipliers on the Dirichlet space is a bit more technical and is due to Stegenga (Stegenga, 1980):

Theorem 23 (Stegenga) The function is a multiplier for the Dirichlet space if and only if and the measure is a Carleson measure for the Dirichlet space .

Of course this theorem doesn’t tell us much if we can’t understand which are the Carleson measures for the Dirichlet space. Here I will just give the definition as the characterization of these measures is completely beyond the scope of this post.

Definition 24 A positive Borel measure on is a Carleson measure for the Dirichlet space if for some positive constant for every .

Due to the more involved characterization of the multipliers on the Dirichlet space, characterizing when adjoints of multiplication operators on are hypercyclic is an open question. It is however known that the condition is no longer necessary, though it is sufficient. An example is provided by the function on . On the other hand it is known that is necessary. For this, see for example the PhD thesis of Irina Seceleanu.

** — 4.2. Frequently universal sequences of operators. — **

Remember that a family of operators on is called *universal* if there exists a such that the set

is dense in . The following definition is the natural extension of frequent hypercyclicity to universal families.

Definition 25 The family of operators is called frequently universal if there exists a such that for every open set the set has positive lower density.

Thus saying that an operator is frequently Cesàro hypercyclic amounts to saying that the family is frequently universal. Theorem 19 says that if the family is frequently universal then is topologically multiply recurrent. However, there is nothing too special about the sequence . One can consider the family of operators where is an appropriate sequence of complex numbers.

Under what conditions on the sequence of complex numbers can one conclude that is topologically multiply recurrent from the hypothesis that the family is frequently universal?

** — 5. Bibliography — **

Bayart, Frédéric and Sophie Grivaux. 2005. *Hypercyclicity and unimodular point spectrum*, J. Funct. Anal. 226, no. 2, 281–300. MR2159459 (2006i:47014).

Bayart, Frédéric and Sophie Grivaux. 2006. *Frequently hypercyclic operators*, Trans. Amer. Math. Soc. 358, no. 11, 5083–5117 (electronic). MR2231886 (2007e:47013).

Bayart, Frédéric and Étienne Matheron. 2009. *Dynamics of linear operators*, Cambridge Tracts in Mathematics, vol. 179, Cambridge University Press, Cambridge. MR2533318.

Bayart, Frédéric and Étienne Matheron. 2007. *Hypercyclic operators failing the hypercyclicity criterion on classical Banach spaces*, J. Funct. Anal. 250, no. 2, 426–441. MR2352487 (2008k:47016).

Bès, Juan P. 1998. *Three problems on hypercyclic operators*, Ph.D. Thesis.

Costakis, George and Ioannis Parissis. 2010. *Szemerédi’s theorem, frequent hypercyclicity and multiple recurrence*, available at http://arxiv.org/abs/1008.4017.

Costakis, George and Imre Z. Ruzsa. 2010. *Frequently Cesàro hypercyclic operators are hypercyclic*, preprint.

De la Rosa, Manuel and Charles Read. 2009. *A hypercyclic operator whose direct sum $T \oplus T$ is not hypercyclic*, J. Operator Theory 61, no. 2, 369–380. MR2501011 (2010e:47023).

Gethner, Robert M. and Joel H. Shapiro. 1987. *Universal vectors for operators on spaces of holomorphic functions*, Proc. Amer. Math. Soc. 100, no. 2, 281–288. MR884467 (88g:47060).

Godefroy, Gilles and Joel H. Shapiro. 1991. *Operators with dense, invariant, cyclic vector manifolds*, J. Funct. Anal. 98, no. 2, 229–269. MR1111569 (92d:47029).

Kitai, Carol. 1982. *Invariant closed sets for linear operators*, ProQuest LLC, Ann Arbor, MI. Thesis (Ph.D.)–University of Toronto (Canada). MR2632793.

León-Saavedra, Fernando. 2002. *Operators with hypercyclic Cesàro means*, Studia Math. 152, no. 3, 201–215. MR1916224 (2003f:47012).

Stegenga, David A. 1980. *Multipliers of the Dirichlet space*, Illinois J. Math. 24, no. 1, 113–139. MR550655 (81a:30027).

Such norms can be defined on any finite additive group (and also on some other types of domains, though we will not discuss this point here). In particular, they can be defined on the finite-dimensional vector spaces over a finite field .

In this case, the Gowers norms are closely tied to the space of polynomials of degree at most . Indeed, as noted in Exercise 20 of Notes 4, a function of norm has norm equal to if and only if for some ; thus polynomials solve the “ inverse problem” for the trivial inequality . They are also a crucial component of the solution to the “ inverse problem” and “ inverse problem”. For the former, we will soon show:

Proposition 1 ( inverse theorem for ) Let be such that and for some . Then there exists such that , where is a constant depending only on .

Thus, for the Gowers norm to be almost completely saturated, one must be very close to a polynomial. The converse assertion is easily established:

Exercise 1 (Converse to inverse theorem for ) If and for some , then , where is a constant depending only on .

In the world, one no longer expects to be close to a polynomial. Instead, one expects to *correlate* with a polynomial. Indeed, one has

Lemma 2 (Converse to the inverse theorem for ) If and are such that , where , then .

*Proof:* From the definition of the norm (equation (18) from Notes 3), the monotonicity of the Gowers norms (Exercise 19 of Notes 3), and the polynomial phase modulation invariance of the Gowers norms (Exercise 21 of Notes 3), one has

and the claim follows.

In the high characteristic case at least, this can be reversed:

Theorem 3 ( inverse theorem for ) Suppose that . If is such that and , then there exists such that .

This result is sometimes referred to as the *inverse conjecture for the Gowers norm* (in high, but bounded, characteristic). For small , the claim is easy:

Exercise 2 Verify the cases of this theorem. (Hint: to verify the case, use the Fourier-analytic identities and , where is the space of all homomorphisms from to , and are the Fourier coefficients of .)
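The identities in the hint can be checked numerically before attempting the exercise. The sketch below assumes they are the standard ones, namely that the fourth power of the $U^2$ norm equals the sum of the fourth powers of the moduli of the Fourier coefficients, and that the square of the $L^2$ norm equals the sum of their squares (Plancherel). For simplicity it works over the field with two elements, with vectors encoded as bitmasks and averages normalised as expectations; this specialisation and all names are assumptions made for illustration only.

```python
from itertools import product

def character(xi, x):
    """The character (-1)^(xi . x) on F_2^n, with vectors encoded as bitmasks."""
    return -1 if bin(xi & x).count("1") % 2 else 1

def fourier(f, n):
    """Fourier coefficients: the average of f(x) * character(xi, x) over x."""
    size = 1 << n
    return [sum(f[x] * character(xi, x) for x in range(size)) / size
            for xi in range(size)]

def u2_fourth_power(f, n):
    """Average of f(x) conj f(x+h) conj f(x+k) f(x+h+k) over all x, h, k."""
    size = 1 << n
    total = 0
    for x, h, k in product(range(size), repeat=3):
        total += (f[x] * f[x ^ h].conjugate()
                  * f[x ^ k].conjugate() * f[x ^ h ^ k])
    return (total / size ** 3).real

def l2_square(f, n):
    """Average of |f(x)|^2 over x."""
    return sum(abs(v) ** 2 for v in f) / (1 << n)
```

Comparing `u2_fourth_power(f, n)` with `sum(abs(c) ** 4 for c in fourier(f, n))` on a few random functions is a quick way to catch normalisation mistakes.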

This conjecture for larger values of is more difficult to establish. The case of the theorem was established by Ben Green and myself in the high characteristic case ; the low characteristic case was independently and simultaneously established by Samorodnitsky. The cases in the high characteristic case were established in two stages, firstly using a modification of the Furstenberg correspondence principle, due to Ziegler and myself, to convert the problem to an ergodic theory counterpart, and then using a modification of the methods of Host-Kra and Ziegler to solve that counterpart, as done in this paper of Bergelson, Ziegler, and myself.

The situation with the low characteristic case in general is still unclear. In the high characteristic case, we saw from Notes 4 that one could replace the space of non-classical polynomials in the above conjecture with the essentially equivalent space of classical polynomials . However, as we shall see below, this turns out not to be the case in certain low characteristic cases (a fact first observed by Lovett, Meshulam, and Samorodnitsky, and independently by Ben Green and myself), for instance if and ; this is ultimately due to the existence in those cases of non-classical polynomials which exhibit no significant correlation with classical polynomials of equal or lesser degree. This distinction between classical and non-classical polynomials appears to be a rather non-trivial obstruction to understanding the low characteristic setting; it may be necessary to obtain a more complete theory of non-classical polynomials in order to fully settle this issue.

The inverse conjecture has a number of consequences. For instance, it can be used to establish the analogue of Szemerédi’s theorem in this setting:

Theorem 4 (Szemerédi’s theorem for finite fields) Let be a finite field, let , and let be such that . If is sufficiently large depending on , then contains an (affine) line for some with .

Exercise 3 Use Theorem 4 to establish the following generalisation: with the notation as above, if and is sufficiently large depending on , then contains an affine -dimensional subspace.

We will prove this theorem in two different ways, one using a density increment method, and the other using an energy increment method. We discuss some other applications below the fold.

** — 1. The inverse theorem — **

We now prove Proposition 1. Results of this type for general appear in this paper of Alon, Kaufman, Krivelevich, Litsyn, and Ron (see also this paper of Sudan, Trevisan, and Vadhan for a precursor result); the case was treated previously by Blum, Luby, and Rubinfeld. The argument here is due to Tamar Ziegler and myself. The argument has a certain “cohomological” flavour (comparing cocycles with coboundaries, determining when a closed form is exact, etc.). Indeed, the inverse theory can be viewed as a sort of “additive combinatorics cohomology”.

Let be as in the theorem. We let all implied constants depend on . We use the symbol to denote various positive constants depending only on . We may assume is sufficiently small depending on , as the claim is trivial otherwise.

The case is easy, so we assume inductively that and that the claim has been already proven for .

The first thing to do is to make unit magnitude. One easily verifies the crude bound

and thus

Since pointwise, we conclude that

As such, differs from a function of unit magnitude by in norm. By replacing with and using the triangle inequality for the Gowers norm (changing and worsening the constant in Proposition 1 if necessary), we may assume without loss of generality that throughout, thus for some .

Since

we see from Markov’s inequality that

for all in a subset of of density . Applying the inductive hypothesis, we see that for each such , we can find a polynomial such that

Now let . Using the cocycle identity

where is the shift operator , we see using Hölder’s inequality that

On the other hand, is a polynomial of order . Also, since is so dense, every element of has at least one representation of the form for some (indeed, out of all possible representations , or can fall outside of for at most of these representations). We conclude that for every there exists a polynomial such that

The new polynomial supersedes the old one ; to reflect this, we abuse notation and write for . Applying the cocycle equation again, we see that

for all . Applying the rigidity of polynomials (Exercise 14 from Notes 4), we conclude that

for some constant . From (2) we in fact have for all .

The expression is known as a *-coboundary* (see this blog post for more discussion). To eliminate it, we use the finite characteristic to discretise the problem as follows. First, we use the cocycle identity

where is the characteristic of the field. Using (1), we conclude that

On the other hand, takes values in some coset of a finite subgroup of (depending only on ), by Lemma 1 of Notes 4. We conclude that this coset must be a shift of by . Since itself takes values in some coset of a finite subgroup, we conclude that there is a finite subgroup (depending only on ) such that each takes values in a shift of by .

Next, we note that we have the freedom to shift each by (adjusting accordingly) without significantly affecting any of the properties already established. Doing so, we can thus ensure that all the take values in itself, which forces to do so also. But since , we conclude that for all , thus is a perfect cocycle:

We may thus integrate and write , where . Thus is a polynomial of degree for each , thus itself is a polynomial of degree . From (1) one has

for all ; averaging in we conclude that

and thus

and Proposition 1 follows.

One consequence of Proposition 1 is that the property of being a classical polynomial of a fixed degree is *locally testable*, which is a notion of interest in theoretical computer science. More precisely, suppose one is given a large finite vector space and two functions . One is told that one of the functions is a classical polynomial of degree at most , while the other is quite far from being such a classical polynomial, in the sense that every polynomial of degree at most will differ from that function on at least of the values in . The task is then to decide with a high degree of confidence which of the functions is a polynomial and which one is not, without inspecting too many of the values of or .

This can be done as follows. Pick at random, and test whether the identities

and

hold; note that one only has to inspect at values in for this. If one of these identities fails, then that function must not be polynomial, and so one has successfully decided which of the functions is the polynomial. We claim that the probability that the identity fails for the non-polynomial function is at least for some , and so if one iterates this test times, one will be able to successfully solve the problem with probability arbitrarily close to . To verify the claim, suppose for contradiction that the identity only failed at most of the time for the non-polynomial (say it is ); then , and thus by Proposition 1, is very close in norm to a polynomial; rounding that polynomial to a root of unity we thus see that agrees with high accuracy with a classical polynomial, which leads to a contradiction if is chosen suitably.
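A minimal sketch of such a test, specialised to the field with two elements, might look as follows. It uses the additive form of the identity, which for classical polynomials over this field should be equivalent to the phase form: a function with values in {0, 1} has degree at most d exactly when its sum over every cube x + {0, h_1} + ... + {0, h_{d+1}} vanishes mod 2. The bitmask encoding, the function names and the trial count are all illustrative assumptions.

```python
import random
from itertools import combinations

def derivative_sum(f, x, hs):
    """Sum mod 2 of f over the points x + (sum of a subset of hs), bitmask vectors."""
    total = 0
    for r in range(len(hs) + 1):
        for subset in combinations(hs, r):
            point = x
            for h in subset:
                point ^= h
            total ^= f(point)
    return total

def passes_degree_test(f, n, d, trials, rng):
    """True if `trials` random (d+1)-fold derivative checks on F_2^n all vanish."""
    for _ in range(trials):
        x = rng.randrange(1 << n)
        hs = [rng.randrange(1 << n) for _ in range(d + 1)]
        if derivative_sum(f, x, hs):
            return False
    return True
```

Each trial inspects only 2^(d+1) values of the function, independently of the dimension n. A genuine polynomial of degree at most d never fails, while a function far from all such polynomials fails each trial with probability bounded below, so repeating the trial drives the error probability down geometrically.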

** — 2. A partial counterexample in low characteristic — **

We now show a distinction between classical polynomials and non-classical polynomials that causes the inverse conjecture to fail in low characteristic if one insists on using classical polynomials. For simplicity we restrict attention to the characteristic two case . We will use an argument of Alon and Beigel, reproduced in this paper of Green and myself. A different argument (with stronger bounds) appears in this paper of Lovett, Meshulam, and Samorodnitsky.

We work in a standard vector space , with standard basis and coordinates . Among all the classical polynomials on this space are the *symmetric polynomials*

which play a special role.

Exercise 4 Let be the digit summation function . Show that . Establish Lucas’ theorem

where , is the binary expansion of . Show that is the binary coefficient of , and conclude that is a function of . (Note: These results are closely related to the well-known fact that Pascal’s triangle modulo takes the form of an infinite Sierpinski gasket.)
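Before attempting the exercise, it can be reassuring to check the claims numerically in small dimension. The sketch below assumes the standard definition of the degree-`m` symmetric polynomial (the sum of all products of `m` distinct coordinates, reduced modulo two); the function names are hypothetical. It checks that this polynomial equals the binomial coefficient of the digit sum modulo two, and that for `m` a power of two it reads off the corresponding binary digit of the digit sum.

```python
from itertools import combinations, product
from math import comb

def symmetric_poly_mod2(x, m):
    """Sum over all m-element index sets of the product of those bits, mod 2."""
    return sum(all(x[i] for i in idx)
               for idx in combinations(range(len(x)), m)) % 2

def check_exercise(n):
    """Brute-force check of the two identities on all of {0,1}^n."""
    for x in product((0, 1), repeat=n):
        weight = sum(x)                      # the digit sum of x
        for m in range(n + 1):
            if symmetric_poly_mod2(x, m) != comb(weight, m) % 2:
                return False
        j = 0
        while (1 << j) <= n:
            # For m = 2^j the polynomial is the j-th binary digit of the weight.
            if symmetric_poly_mod2(x, 1 << j) != (weight >> j) & 1:
                return False
            j += 1
    return True
```

Of course this is a sanity check rather than a proof; the exercise asks for an argument valid in all dimensions.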

We define an *affine coordinate subspace* to be a translate of a subspace of generated by some subset of the standard basis vectors . To put it another way, an affine coordinate subspace is created by freezing some of the coordinates, but letting some other coordinates be arbitrary.
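The "freeze some coordinates" description translates directly into code. The following small sketch (with the hypothetical name `affine_coordinate_subspace`, and working over the field of two elements as in the rest of this section) enumerates the points of such a subspace from a dictionary of frozen coordinates.

```python
from itertools import product

def affine_coordinate_subspace(n, frozen):
    """Yield the points of F_2^n whose coordinates listed in `frozen`
    (a dict: index -> 0 or 1) are fixed, all other coordinates being free."""
    free = [i for i in range(n) if i not in frozen]
    for values in product((0, 1), repeat=len(free)):
        point = [0] * n
        for i, v in frozen.items():
            point[i] = v
        for i, v in zip(free, values):
            point[i] = v
        yield tuple(point)
```

Varying the frozen values over all possibilities, for a fixed set of frozen indices, partitions the whole space into translates of one coordinate subspace, which is the kind of foliation used in the proof of Lemma 5.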

Of course, not all classical polynomials come from symmetric polynomials. However, thanks to an application of Ramsey’s theorem observed by Alon and Beigel, this is true on coordinate subspaces:

Lemma 5 (Ramsey’s theorem for polynomials) Let be a polynomial of degree at most . Then one can partition into affine coordinate subspaces of dimension at least , where as for fixed , such that on each such subspace , is equal to a linear combination of the symmetric polynomials .

*Proof:* We induct on . The claim is trivial for , so suppose that and the claim has already been proven for smaller . The degree term of can be written as

where is a -uniform hypergraph on , i.e. a collection of -element subsets of . Applying Ramsey’s theorem (for hypergraphs), one can find a subcollection of indices with such that either has no edges in , or else contains all the edges in . We then foliate into the affine subspaces formed by translating the coordinate subspace generated by . By construction, we see that on each such subspace, is equal to either or plus a polynomial of degree . The claim then follows by applying the induction hypothesis (and noting that the linear span of on an affine coordinate subspace is equivariant with respect to translation of that subspace).

Because of this, if one wants to concoct a function which is almost orthogonal to all polynomials of degree at most , it will suffice to build a function which is almost orthogonal to the symmetric polynomials on all affine coordinate subspaces of moderately large size. Pursuing this idea, we are led to

Proposition 6 (Counterexample to classical inverse conjecture) Let , and let be the function , where is as in Exercise 4. Then is a non-classical polynomial of degree at most , and so ; but one has uniformly for all classical polynomials of degree less than , where is bounded in magnitude by a quantity that goes to zero as for each fixed .

*Proof:* We first prove the polynomiality of . Let be the obvious map from to , thus

By linearity, it will suffice to show that each function is a polynomial of degree at most . But one easily verifies that for any , is equal to zero when and equal to when . Iterating this observation times, we obtain the claim.
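The iteration in the last step can be checked by brute force in small cases. The sketch below rests on an assumption about the elided formula: I take the phase of the function to be the digit sum reduced modulo `2**d` (equivalently, the digit sum divided by `2**d`, modulo one). The names `iterated_derivative` and `vanishing_order` are hypothetical. The code finds the smallest number of additive derivatives that annihilates this phase on all of a small space.

```python
from itertools import product

def iterated_derivative(P, x, hs, modulus):
    """Additive derivative of P at x in the directions hs, valued in Z/modulus."""
    if not hs:
        return P(x) % modulus
    h, rest = hs[0], hs[1:]
    shifted = tuple(a ^ b for a, b in zip(x, h))   # x + h in F_2^n
    return (iterated_derivative(P, shifted, rest, modulus)
            - iterated_derivative(P, x, rest, modulus)) % modulus

def digit_sum(x):
    """Number of coordinates of x equal to one, as an ordinary integer."""
    return sum(x)

def vanishing_order(n, d):
    """Smallest k <= d+1 such that all k-fold derivatives of the digit sum
    mod 2^d vanish on F_2^n, or None if there is no such k."""
    points = list(product((0, 1), repeat=n))
    for k in range(1, d + 2):
        if all(iterated_derivative(digit_sum, x, hs, 2 ** d) == 0
               for x in points for hs in product(points, repeat=k)):
            return k
    return None
```

In the cases I tried, the answer is `d + 1`, consistent with the phase being a polynomial of degree at most `d` in the non-classical sense: each derivative in a coordinate direction contributes a further factor of two, and after `d` of them the result is divisible by `2**d` only once one more derivative is taken.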

Now let be a classical polynomial of degree less than . By Lemma 5, we can partition into affine coordinate subspaces of dimension at least such that is a linear combination of on each such subspace. By the pigeonhole principle, we thus can find such a such that

On the other hand, from Exercise 4, the function on depends only on . Now, as , the function (which is essentially the distribution function of a simple random walk of length on ) becomes equidistributed; in particular, for any , the function will take the values and with asymptotically equal frequency on , whilst remains unchanged. As such we see that as , and thus as , and the claim follows.
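The equidistribution claim used above is easy to observe numerically, since the digit sum of a uniformly random point is binomially distributed. Here is a minimal sketch (the name `digit_sum_distribution` is hypothetical) that computes the exact distribution of the digit sum modulo a given modulus.

```python
from math import comb

def digit_sum_distribution(n, modulus):
    """Exact distribution of the digit sum mod `modulus` for a uniformly
    random point of F_2^n; returns a list of `modulus` probabilities."""
    counts = [0] * modulus
    for k in range(n + 1):
        counts[k % modulus] += comb(n, k)   # number of points of weight k
    total = 2 ** n
    return [c / total for c in counts]

def max_deviation(n, modulus):
    """Largest deviation of the distribution from the uniform one."""
    return max(abs(p - 1 / modulus) for p in digit_sum_distribution(n, modulus))
```

For a fixed modulus the deviation from uniformity decays exponentially in the dimension, whereas in small dimension the distribution is visibly non-uniform.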

Exercise 5 With the same setup as the previous proposition, show that , but that for all classical polynomials of degree less than .

** — 3. The inverse theorem: sketches of a proof — **

The proof of Theorem 3 is rather difficult once ; even the case is not particularly easy. However, the arguments still have the same cohomological flavour encountered in the theory. We will not give full proofs of this theorem here, but indicate some of the main ideas.

We begin by discussing (quite non-rigorously) the significantly simpler (but still non-trivial) case, established by Ben Green and myself. Unsurprisingly, we will take advantage of the case of the theorem as an induction hypothesis.

Let for some field of characteristic greater than , and be a function with and . We would like to show that correlates with a quadratic phase function (due to the characteristic hypothesis, we may take to be classical), in the sense that .

We expand as . By the pigeonhole principle, we conclude that

for “many” , where by “many” we mean “a proportion of ”. Applying the inverse theorem, we conclude that for many , there exists a linear polynomial (which we may as well take to be classical) such that

This should be compared with the theory. There, we were able to force close to for most ; here, we only have the weaker statement that *correlates* with for *many* (not *most*) . Still, we will keep going. In the theory, we were able to assume had magnitude , which made the cocycle equation available; this then forced an approximate cocycle equation for most (indeed, we were able to use this trick to upgrade “most” to “all”).

This doesn’t quite work in the case. Firstly, need not have magnitude exactly equal to . This is not a terribly serious problem, but the more important difficulty is that correlation, unlike the property of being close, is not transitive or multiplicative: just because correlates with , and correlates with , one cannot then conclude that correlates with ; and even if one had this, and if correlated with , one could not conclude that correlated with .

Despite all these obstacles, it is still possible to extract something resembling a cocycle equation for the , by means of the Cauchy-Schwarz inequality. Indeed, we have the following remarkable observation of Gowers:

Lemma 7 (Gowers’ Cauchy-Schwarz argument) Let be a finite additive group, and let be a function, bounded by . Let be a subset with , and suppose that for each , we have a function bounded by , such that uniformly in . Then there exist quadruples with such that

uniformly among the quadruples.

We shall refer to quadruples obeying the relation as *additive quadruples*.

*Proof:* We extend to be zero when lies outside of . Then we have

We expand the left-hand side as

setting , this becomes

From the pigeonhole principle, we conclude that for many values of , one has

Performing Cauchy-Schwarz once in and once in to eliminate the factors, and then re-averaging in , we conclude that

Setting to be the additive quadruple we obtain

Performing the averages we obtain

and the claim follows (note that for the quadruples obeying the stated lower bound, must lie in ).

Applying this lemma to our current situation, we find many additive quadruples for which

In particular, by the equidistribution theory in Notes 4, the polynomial is low rank.

The above discussion is valid for any value of , but is particularly simple when , as the are now linear, and so is now *constant*. Writing for some using the standard dot product on , and some (irrelevant) constant term , we conclude that

for many additive quadruples .

We now have to solve an additive combinatorics problem, namely to classify the functions from to which are “ affine linear” in the sense that the property (3) holds for many additive quadruples; equivalently, the graph in has high “additive energy”, defined as the number of additive quadruples that it contains. An obvious example of a function with this property is an affine-linear function , where is a linear transformation and . As it turns out, this is essentially the only example:

Proposition 8 (Balog-Szemerédi-Gowers-Freiman theorem for vector spaces) Let , and let be a map from to such that (3) holds for additive quadruples in . Then there exists an affine function such that for values of in .
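To make the notion of additive energy concrete, here is a brute-force sketch that counts the additive quadruples respected by a map on a small vector space over a prime field. The names `additive_quadruple_count`, `affine_map` and `nonlinear_map`, and the choice of a two-dimensional space over the field of three elements, are illustrative assumptions only.

```python
from itertools import product

def additive_quadruple_count(xi, p, n):
    """Number of (h1, h2, h3, h4) in (F_p^n)^4 with h1 + h2 = h3 + h4
    and xi(h1) + xi(h2) = xi(h3) + xi(h4)."""
    points = list(product(range(p), repeat=n))

    def add(a, b):
        return tuple((s + t) % p for s, t in zip(a, b))

    # Group pairs (h1, h2) by the pair of sums; quadruples are pairs of
    # pairs landing in the same group.
    groups = {}
    for h1 in points:
        for h2 in points:
            key = (add(h1, h2), add(xi(h1), xi(h2)))
            groups[key] = groups.get(key, 0) + 1
    return sum(c * c for c in groups.values())

# Hypothetical examples on F_3^2.
affine_map = lambda h: ((h[0] + 2 * h[1] + 1) % 3, h[1] % 3)
nonlinear_map = lambda h: ((h[0] * h[0]) % 3, (h[0] * h[1]) % 3)
```

An affine map respects every additive quadruple, so its count is the cube of the size of the space; any map that is not affine falls strictly short of this. The proposition asserts a robust converse: a count within a constant factor of the maximum already forces agreement with an affine map on a positive proportion of the space.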

This proposition is a consequence of standard results in additive combinatorics, in particular the Balog-Szemerédi-Gowers lemma and Freiman’s theorem for vector spaces; see Section 11.3 of my book with Van for further discussion. The proof is elementary but a little lengthy and would take us too far afield, so we simply assume this proposition for now and keep going. We conclude that

The most difficult term to deal with here is the quadratic term . To deal with this term, suppose temporarily that is symmetric, thus . Then (since we are in odd characteristic) we can *integrate* as

and thus

for many . Taking norms in , we conclude that the inner product between two copies of and two copies of . Applying the Cauchy-Schwarz-Gowers inequality, followed by the inverse theorem, we conclude that correlates with for some linear phase, and thus itself correlates with for some quadratic phase.

This argument also works (with minor modification) when is *virtually symmetric*, in the sense that there exists a bounded index subspace of such that the restriction of the form to is symmetric, by foliating into cosets of that subspace; we omit the details. On the other hand, if is not virtually symmetric, there is no obvious way to “integrate” the phase to eliminate it as above. (Indeed, in order for to be “exact” in the sense that it is the “derivative” of something (modulo lower order terms), e.g. for some , it must first be “closed” in the sense that in some sense, since we have ; thus we again see the emergence of cohomological concepts in the background.)
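The role of symmetry in the integration step can be seen in a small computation. The sketch below assumes the natural candidate antiderivative, namely half of the quadratic form attached to the matrix (which requires an odd prime so that two is invertible), and checks whether its additive derivative in a direction recovers the bilinear term up to a term depending on the direction alone. The function name and the example matrices are hypothetical.

```python
from itertools import product

def integrates_bilinear_form(M, p):
    """Check, for all x and h in F_p^n, the identity
    Q(x + h) - Q(x) = (M h) . x + Q(h)  (mod p),  where Q(x) = (1/2) x . (M x).
    Requires p to be an odd prime."""
    n = len(M)
    half = pow(2, -1, p)                     # inverse of 2 modulo p

    def apply(v):
        return [sum(M[i][j] * v[j] for j in range(n)) % p for i in range(n)]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v)) % p

    def Q(x):
        return (half * dot(x, apply(x))) % p

    for x in product(range(p), repeat=n):
        for h in product(range(p), repeat=n):
            shifted = [(a + b) % p for a, b in zip(x, h)]
            lhs = (Q(shifted) - Q(x)) % p
            rhs = (dot(apply(h), x) + Q(h)) % p
            if lhs != rhs:
                return False
    return True

symmetric_example = [[1, 2], [2, 0]]
nonsymmetric_example = [[0, 1], [0, 0]]
```

For a symmetric matrix the identity holds, which is the "exactness" used above; for a non-symmetric matrix the derivative of the quadratic form only sees the symmetrised matrix, so the identity fails.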

To establish the required symmetry on , we return to Gowers’ Cauchy-Schwarz argument from Lemma 7, and tweak it slightly. We start with (4) and rewrite it as

where . We square-average this in to obtain

Now we make the somewhat unusual substitution to obtain

Thus there exists such that

We collect all terms that depend only on (and ) or only on (and ) to obtain

for some bounded functions . Eliminating these functions by two applications of Cauchy-Schwarz, we obtain

or, on making the change of variables ,

Using equidistribution theory, this means that the quadratic form is low rank, which easily implies that is virtually symmetric.

Now we turn to the general case. In principle, the above argument should still work, say for . The main sticking point is that instead of dealing with a vector-valued function that is approximately linear in the sense that (3) holds for many additive quadruples, in the case one is now faced with a matrix-valued function that is approximately linear only up to an error term, in the sense that

for many additive quadruples , where the matrix has bounded rank. With our current level of additive combinatorics technology, we are not able to deal properly with this bounded rank error (the main difficulty being that the set of low rank matrices has no good “doubling” properties). Because of this obstruction, no generalisation of the above arguments to higher has been found.

There is however another approach, based ultimately on the ergodic theory work of Host-Kra and of Ziegler, that can handle the general case, which was worked out in two papers, one by myself and Ziegler, and one by Bergelson, Ziegler, and myself. It turns out that it is convenient to phrase these arguments in the language of ergodic theory. However, in order not to have to introduce too much additional material, I will try to describe the arguments here in the case without explicitly using ergodic theory notation. To do this, though, I will have to sacrifice a lot of rigour and only work with some illustrative special cases rather than the general case, and also use somewhat vague terminology (e.g. “general position” or “low rank”).

To simplify things further, we will establish the inverse theorem only for a special type of function, namely a quartic phase , where is a classical polynomial of degree . (A good example to keep in mind is the symmetric polynomial phase , though one has to take some care with this example due to the low characteristic.) The claim to show then is that if , then correlates with a cubic phase. In the high characteristic case , this result can be handled by equidistribution theory. Indeed, since

that theory tells us that the quartic polynomial is low rank. On the other hand, in high characteristic one has the Taylor expansion

for some cubic function (as can be seen for instance by decomposing into monomials). From this we easily conclude that itself has low rank (i.e. it is a function of boundedly many cubic (or lower degree) polynomials), at which point it is easy to see from Fourier analysis that will correlate with the exponential of a polynomial of degree at most .

Now we present a different argument that relies slightly less on the quartic nature of ; it is a substantially more difficult argument, and we will skip some steps here to simplify the exposition, but the argument happens to extend to more general situations. As , we have for many , thus by the inverse theorem, correlates with a quadratic phase. Using equidistribution theory, we conclude that the cubic polynomial is low rank.

At present, the low rank property for is only true for many . But from the cocycle identity

we see that if and are both low rank, then so is ; thus the property of being low rank is in some sense preserved by addition. Using this and a bit of additive combinatorics, one can conclude that is low rank for all in a bounded index subspace of ; restricting to that subspace, we will now assume that is low rank for *all* . Thus we have

where is some bounded collection of quadratic polynomials for each , and is some function. To simplify the discussion, let us pretend that in fact consists of just a single quadratic , plus some linear polynomials , thus

There are two extreme cases to consider, depending on how depends on . Consider first a “core” case when is independent of . Thus

If is low rank, then we can absorb it into the factors, so suppose instead that is high rank, and thus equidistributed even after fixing the values of .

The function is cubic, and is a high rank quadratic. Because of this, the function must be at most linear in the variable; this can be established by another application of equidistribution theory, see Section 8 of this paper of Ben and myself; thus one can factorise

for some functions . In fact, as is cubic, must be linear, while is cubic.

By comparing the coefficients in the cocycle equation (5), we see that the function is itself a cocycle:

As a consequence, we have for some function . Since is linear, is quadratic; thus we have

With a high characteristic assumption , one can ensure is classical. We will assume that is high rank, as this is the most difficult case.

Suppose first that . In high characteristic, one can then integrate by expressing this as plus lower order terms, thus is an order function in the sense that it is a function of a bounded number of linear functions. In particular, has a large norm for all , which implies that has a large norm, and thus correlates with a quadratic phase. Since can be decomposed by Fourier analysis into a linear combination of quadratic phases, we conclude that correlates with a quadratic phase and one is thus done in this case.

Now consider the other extreme, in which and lie in general position. Then, if we differentiate (8) in , one has

and then anti-symmetrising in one has

If and are unrelated, then the linear forms will typically be in general position with respect to each other and with , and similarly will be in general position with respect to each other and with . From this, one can show that the above equation is not satisfiable generically, because the mixed terms cannot be cancelled by the simpler terms in the above expression.

An interpolation of the above two arguments can handle the case in which does not depend on . Now we consider the other extreme, in which varies in , so that and are in general position for generic , and similarly for and , or for and . (Note though that we cannot simultaneously assume that are in general position; indeed, might vary linearly in , and indeed we expect this to be the basic behaviour of here, as was observed in the preceding argument.)

To analyse this situation, we return to the cocycle equation (5), which currently reads

Because any two of can be assumed to be in general position, one can show using equidistribution theory that the above equation can only be satisfied when the are linear in the variable, thus

much as before. Furthermore, the coefficients must now be (essentially) constant in in order to obtain (9). Absorbing this constant into the definition of , we now have

We will once again pretend that is just a single linear form . Again we consider two extremes. If is independent of , then by passing to a bounded index subspace (the level set of ) we now see that is quadratic, hence is cubic, and we are done. Now suppose instead that varies in , so that are in general position for generic . We look at the cocycle equation again, which now tells us that obeys the *quasicocycle* condition

where is a quadratic polynomial. With any two of in general position, one can then conclude (using equidistribution theory) that are quadratic polynomials. Thus is quadratic, and is cubic as before. This completes the heuristic discussion of various extreme model cases; the general case is handled by a rather complicated combination of all of these special case methods, and is best performed in the framework of ergodic theory (in particular, the idea of extracting out the coefficient of a key polynomial, such as the coefficient of , is best captured by the ergodic theory concept of *vertical differentiation*); see this paper of Bergelson, Ziegler, and myself.

** — 4. Consequences of the Gowers inverse conjecture — **

We now discuss briefly some of the consequences of the Gowers inverse conjecture, beginning with Szemerédi’s theorem in vector fields (Theorem 4). We will use the density increment method (an energy increment argument is also possible, but is more complicated; see this paper). Let be a set of density at least containing no lines. This implies that the -linear form

has size . On the other hand, as this pattern has complexity , one has from Notes 3 the bound

whenever are bounded in magnitude by . Splitting , we conclude that

and thus (for large enough)

Applying Theorem 3, we find that there exists a polynomial of degree at most such that

To proceed we need the following analogue of Proposition 5 of Notes 2:

Exercise 6 (Fragmenting a polynomial into subspaces) Let be a classical polynomial of degree . Show that one can partition into affine subspaces of dimension at least , where as for fixed , such that is constant on each . (Hint: Induct on , and use Exercise 6 of Notes 4 repeatedly to find a good initial partition into subspaces on which has degree at most .)

Exercise 7 Use the previous exercise to complete the proof of Theorem 4. (Hint: mimic the density increment argument from Notes 2.)

By using the inverse theorem in place of the Fourier-analytic analogue in Lemma 7 of Notes 2, one obtains the following regularity lemma, analogous to Theorem 10 of Notes 2:

Theorem 9 (Strong arithmetic regularity lemma) Suppose that . Let , let , and let be an arbitrary function. Then we can decompose and find such that

- (Nonnegativity) take values in , and have mean zero;
- (Structure) is a function of classical polynomials of degree at most ;
- (Smallness) has an norm of at most ; and
- (Pseudorandomness) One has for all .

For a proof, see this paper of mine. The argument is similar to that appearing in Theorem 10 of Notes 2, but the discrete nature of polynomials in bounded characteristic allows one to avoid a number of technical issues regarding measurability.

This theorem can then be used for a variety of applications in additive combinatorics. For instance, it gives the following variant of a result of Bergelson, Host, and Kra:

Proposition 10 (Bergelson-Host-Kra type result) Let , let , and let with , and let . Then for values of , one has

Roughly speaking, the idea is to apply the regularity lemma to , discard the contribution of the and errors, and then control the structured component using the equidistribution theory from Notes 4. A proof of this result can be found in this paper of Ben Green; see also this paper of Ben and myself for an analogous result in . Curiously, the claim fails when is replaced by any larger number; this is essentially an observation of Ruzsa that appears in the appendix of the paper of Bergelson, Host, and Kra.

The above regularity lemma (or more precisely, a close relative of this lemma) was also used by Gowers and Wolf to determine the true complexity of a linear system:

Theorem 11 (Gowers-Wolf theorem) Let be a collection of linear forms with integer coefficients, with no two forms being linearly dependent. Let have sufficiently large characteristic, and suppose that are functions bounded in magnitude by such that where was the form defined in Notes 3. Then for each there exists a classical polynomial of degree at most such that

where is the true complexity of the system defined in Notes 3. This is best possible.