Theorem 1 (Szemerédi’s theorem in the primes). Let $A$ be a subset of the primes $\mathcal{P}$ of positive relative density, thus $\limsup_{N \to \infty} |A \cap [N]| / |\mathcal{P} \cap [N]| > 0$. Then $A$ contains arbitrarily long arithmetic progressions.

This result was based in part on an earlier paper of Green that handled the case of progressions of length three. With the primes replaced by the integers, this is of course the famous theorem of Szemerédi.

Szemerédi’s theorem has now been generalised in many different directions. One of these is the multidimensional Szemerédi theorem of Furstenberg and Katznelson, who used ergodic-theoretic techniques to show that any dense subset of $\mathbb{Z}^d$ necessarily contains infinitely many constellations of any prescribed shape. Our main result is to relativise that theorem to the primes as well:

Theorem 2 (Multidimensional Szemerédi theorem in the primes). Let $d \geq 1$, and let $A$ be a subset of the Cartesian power $\mathcal{P}^d$ of the primes of positive relative density, thus $\limsup_{N \to \infty} |A \cap [N]^d| / |\mathcal{P}^d \cap [N]^d| > 0$. Then for any $v_1, \ldots, v_k \in \mathbb{Z}^d$, $A$ contains infinitely many “constellations” of the form $x + r v_1, \ldots, x + r v_k$ with $x \in \mathbb{Z}^d$ and $r$ a positive integer.

In the case when $A$ is itself a Cartesian product of one-dimensional sets (in particular, if $A$ is all of $\mathcal{P}^d$), this result already follows from Theorem 1, but there does not seem to be a similarly easy argument to deduce the general case of Theorem 2 from previous results. Simultaneously with this paper, an independent proof of Theorem 2 using a somewhat different method has been established by Cook, Magyar, and Titichetrakun.

The result is reminiscent of an earlier result of mine on finding constellations in the Gaussian primes (or dense subsets thereof). That paper followed closely the arguments of my original paper with Ben Green, namely it first enclosed (a W-tricked version of) the primes or Gaussian primes (in a sieve-theoretic sense) in a slightly larger set (or more precisely, a weight function $\nu$) of *almost primes* or *almost Gaussian primes*, which one could then verify (using methods closely related to the sieve-theoretic methods in the ongoing Polymath8 project) to obey certain pseudorandomness conditions, known as the *linear forms condition* and the *correlation condition*. Very roughly speaking, these conditions assert statements of the following form: if $n$ is a randomly selected integer, then the events of $n + h_1, \ldots, n + h_k$ simultaneously being almost primes (or almost Gaussian primes) are approximately independent for most choices of the shifts $h_1, \ldots, h_k$. Once these conditions are satisfied, one can then run a *transference argument* (initially based on ergodic-theory methods, but nowadays there are simpler transference results based on the Hahn-Banach theorem, due to Gowers and Reingold-Trevisan-Tulsiani-Vadhan) to obtain relative Szemerédi-type theorems from their absolute counterparts.

However, when one tries to adapt these arguments to sets such as $\mathcal{P}^2$, a new difficulty occurs: the natural analogue of the almost primes would be the Cartesian square $\mathcal{A}^2$ of the set $\mathcal{A}$ of almost primes – pairs whose entries are both almost primes. (Actually, for technical reasons, one does not work directly with a set of almost primes, but would instead work with a weight function such as $\nu(a) \nu(b)$ that is concentrated on a set such as $\mathcal{A}^2$, but let me ignore this distinction for now.) However, this set does *not* enjoy as many pseudorandomness conditions as one would need for a direct application of the transference strategy to work. More specifically, given any fixed $a_1, a_2$, and random $b_1, b_2$, the four events

$$(a_1, b_1) \in \mathcal{A}^2; \quad (a_1, b_2) \in \mathcal{A}^2; \quad (a_2, b_1) \in \mathcal{A}^2; \quad (a_2, b_2) \in \mathcal{A}^2$$

do *not* behave independently (as they would if $\mathcal{A}^2$ were replaced for instance by the Gaussian almost primes), because any three of these events imply the fourth. This blocks the transference strategy for constellations which contain some right angles to them (e.g. constellations of the form $(x, y), (x + r, y), (x, y + r)$) as such constellations soon turn into rectangles such as the one above after applying Cauchy-Schwarz a few times. (But a few years ago, Cook and Magyar showed that if one restricted attention to constellations which were in general position in the sense that any coordinate hyperplane contained at most one element in the constellation, then this obstruction does not occur and one can establish Theorem 2 in this case through the transference argument.) It’s worth noting that very recently, Conlon, Fox, and Zhao have succeeded in removing one of the pseudorandomness conditions (namely the correlation condition) from the transference principle, leaving only the linear forms condition as the remaining pseudorandomness condition to be verified, but unfortunately this does not completely solve the above problem because the linear forms condition also fails for $\mathcal{A}^2$ (or for weights concentrated on $\mathcal{A}^2$) when applied to rectangular patterns.

There are now two ways known to get around this problem and establish Theorem 2 in full generality. The approach of Cook, Magyar, and Titichetrakun proceeds by starting with one of the known proofs of the multidimensional Szemerédi theorem – namely, the proof that proceeds through hypergraph regularity and hypergraph removal – and attaching pseudorandom weights directly within the proof itself, rather than trying to add the weights to the *result* of that proof through a transference argument. (A key technical issue is that weights have to be added to all the levels of the hypergraph – not just the vertices and top-order edges – in order to circumvent the failure of naive pseudorandomness.) As one has to modify the entire proof of the multidimensional Szemerédi theorem, rather than use that theorem as a black box, the Cook-Magyar-Titichetrakun argument is lengthier than ours; on the other hand, it is more general and does not rely on some difficult theorems about primes that are used in our paper.

In our approach, we continue to use the multidimensional Szemerédi theorem (or more precisely, the equivalent theorem of Furstenberg and Katznelson concerning multiple recurrence for commuting shifts) as a black box. The difference is that instead of using a transference principle to connect the relative multidimensional Szemerédi theorem we need to the multiple recurrence theorem, we proceed by a version of the Furstenberg correspondence principle, similar to the one that connects the absolute multidimensional Szemerédi theorem to the multiple recurrence theorem. I had discovered this approach many years ago in an unpublished note, but had abandoned it because it required an *infinite* number of linear forms conditions (in contrast to the transference technique, which only needed a finite number of linear forms conditions and (until the recent work of Conlon-Fox-Zhao) a correlation condition). The reason for this infinite number of conditions is that the correspondence principle has to build a probability measure on an entire $\sigma$-algebra; for this, it is not enough to specify the measure of a single set, but one also has to specify the measure of “cylinder sets” formed by intersecting $k$ shifted copies of such sets, where $k$ could be arbitrarily large. The larger $k$ gets, the more linear forms conditions one needs to keep the correspondence under control.

With the sieve weights we were using at the time, standard sieve theory methods could indeed provide a finite number of linear forms conditions, but not an infinite number, so my idea was abandoned. However, with my later work with Green and Ziegler on linear equations in primes (and related work on the Möbius-nilsequences conjecture and the inverse conjecture on the Gowers norm), Tamar and I realised that the primes themselves obey an infinite number of linear forms conditions, so one can basically use the primes (or a proxy for the primes, such as the von Mangoldt function $\Lambda$) as the enveloping sieve weight, rather than a classical sieve. Thus my old idea of using the Furstenberg correspondence principle to transfer Szemerédi-type theorems to the primes could actually be realised. In the one-dimensional case, this simply produces a much more complicated proof of Theorem 1 than the existing one; but it turns out that the argument works as well in higher dimensions and yields Theorem 2 relatively painlessly, except for the fact that it needs the results on linear equations in primes, the known proofs of which are extremely lengthy (and also require some of the transference machinery mentioned earlier). The problem of correlations in rectangles is avoided in the correspondence principle approach because one can compensate for such correlations by performing a suitable weighted limit to compute the measure of cylinder sets, with each cylinder set requiring a different weighted correction. (This may be related to the Cook-Magyar-Titichetrakun strategy of weighting all of the facets of the hypergraph in order to recover pseudorandomness, although our contexts are rather different.)

and the more complicated “expensive” argument gave the improvement

for some constant depending only on .

Unfortunately, while the cheap argument is correct, we discovered a subtle but serious gap in our expensive argument in the original paper. Roughly speaking, the strategy in that argument is to employ the *density increment method*: one begins with a large subset $A$ that has no arithmetic progressions of length four, and seeks to locate a subspace on which $A$ has a significantly increased density. Then, by using a “Koopman-von Neumann theorem”, ultimately based on an iteration of the inverse $U^3$ theorem of Ben and myself (and also independently of Samorodnitsky), one approximates the indicator function of $A$ by a “quadratically structured” function $f$, which is (locally) a combination of a bounded number of quadratic phase functions, which one can prepare to be in a certain “locally equidistributed” or “locally high rank” form. (It is this reduction to the high rank case that distinguishes the “expensive” argument from the “cheap” one.) Because $A$ has no progressions of length four, the count of progressions of length four weighted by $f$ will also be small; by combining this with the theory of equidistribution of quadratic phase functions, one can then conclude that there will be a subspace on which $f$ has increased density.

The error in the paper was to conclude from this that the original function also had increased density on the same subspace; it turns out that the manner in which the structured function approximates the original one is not strong enough to deduce this latter conclusion from the former. (One can strengthen the nature of the approximation until one restores such a conclusion, but only at the price of deteriorating the quantitative bounds one gets at the end of the day, to the point where they are worse than those of the cheap argument.)

After trying unsuccessfully to repair this error, we eventually found an alternate argument, based on earlier papers of ours and of Bergelson-Host-Kra, that avoided the density increment method entirely and ended up giving a simpler proof of a stronger result than (1), and which also gives an explicit value for the exponent in (1). In fact, it gives the following stronger result:

Theorem 1. Let be a subset of of density at least , and let . Then there is a subspace of of codimension such that the number of (possibly degenerate) progressions in is at least .

The bound (1) is an easy consequence of this theorem after choosing and removing the degenerate progressions from the conclusion of the theorem.

The main new idea is to work with a *local* Koopman-von Neumann theorem rather than a global one, trading a relatively weak global approximation for a significantly stronger local approximation on a subspace. This is somewhat analogous to how in graph theory it is sometimes more efficient (from the point of view of quantitative estimates) to work with a local version of the Szemerédi regularity lemma which gives just a single regular pair of cells, rather than attempting to regularise almost all of the cells. This local approach is well adapted to the inverse theorem we use (which also has this local aspect), and also makes the reduction to the high rank case much cleaner. At the end of the day, one ends up with a fairly large subspace on which $A$ is quite dense and which can be well approximated by a “pure quadratic” object, namely a function of a small number of quadratic phases obeying a high rank condition. One can then exploit a special positivity property of the count of length four progressions weighted by pure quadratic objects, essentially due to Bergelson-Host-Kra, which then gives the required lower bound.

As I was on the Abel prize committee this year, I won’t comment further on the prize, but will instead focus on what is arguably Endre’s most well known result, namely Szemerédi’s theorem on arithmetic progressions:

Theorem 1 (Szemerédi’s theorem). Let $A$ be a set of integers of positive upper density, thus $\limsup_{N \to \infty} |A \cap [-N, N]| / |[-N, N]| > 0$, where $[-N, N] := \{-N, \ldots, N\}$. Then $A$ contains an arithmetic progression of length $k$ for any $k$.

Szemerédi’s original proof of this theorem is a remarkably intricate piece of combinatorial reasoning. Most proofs of theorems in mathematics – even long and difficult ones – generally come with a reasonably compact “high-level” overview, in which the proof is (conceptually, at least) broken down into simpler pieces. There may well be technical difficulties in formulating and then proving each of the component pieces, and then in fitting the pieces together, but usually the “big picture” is reasonably clear. To give just one example, the overall strategy of Perelman’s proof of the Poincaré conjecture can be briefly summarised as follows: to show that a simply connected three-dimensional manifold is homeomorphic to a sphere, place a Riemannian metric on it and perform Ricci flow, excising any singularities that arise by surgery, until the entire manifold becomes extinct. By reversing the flow and analysing the surgeries performed, obtain enough control on the topology of the original manifold to establish that it is a topological sphere.

In contrast, the pieces of Szemerédi’s proof are highly interlocking, particularly with regard to all the epsilon-type parameters involved; it takes quite a bit of notational setup and foundational lemmas before the key steps of the proof can even be stated, let alone proved. Szemerédi’s original paper contains a logical diagram of the proof (reproduced in Gowers’ recent talk) which already gives a fair indication of this interlocking structure. (Many years ago I tried to present the proof, but I was unable to find much of a simplification, and my exposition is probably not that much clearer than the original text.) Even the use of nonstandard analysis, which is often helpful in cleaning up armies of epsilons, turns out to be a bit tricky to apply here. (In typical applications of nonstandard analysis, one can get by with a single nonstandard universe, constructed as an ultrapower of the standard universe; but to correctly model all the epsilons occurring in Szemerédi’s argument, one needs to repeatedly perform the ultrapower construction to obtain a (finite) sequence of increasingly nonstandard (and increasingly saturated) universes, each one containing unbounded quantities that are far larger than any quantity that appears in the preceding universe, as discussed at the end of this previous blog post. This sequence of universes does end up concealing all the epsilons, but it is not so clear that this is a net gain in clarity for the proof; I may return to the nonstandard presentation of Szemerédi’s argument at some future juncture.)

Instead of trying to describe the entire argument here, I thought I would instead show some key components of it, with only the slightest hint as to how to assemble the components together to form the whole proof. In particular, I would like to show how two particular ingredients in the proof – namely van der Waerden’s theorem and the Szemerédi regularity lemma – become useful. For reasons that will hopefully become clearer later, it is convenient not only to work with ordinary progressions , but also progressions of progressions , progressions of progressions of progressions, and so forth. (In additive combinatorics, these objects are known as *generalised arithmetic progressions* of rank one, two, three, etc., and play a central role in the subject, although the way they are used in Szemerédi’s proof is somewhat different from the way that they are normally used in additive combinatorics.) Very roughly speaking, Szemerédi’s proof begins by building an enormous generalised arithmetic progression of high rank containing many elements of the set (arranged in a “near-maximal-density” configuration), and then steadily prunes this progression to improve the combinatorial properties of the configuration, until one ends up with a single rank one progression of length that consists entirely of elements of .

To illustrate some of the basic ideas, let us first consider a situation in which we have located a progression of progressions of length , with each progression , being quite long, and containing a near-maximal amount of elements of , thus

where is the “maximal density” of along arithmetic progressions. (There are a lot of subtleties in the argument about exactly how good the error terms are in various approximations, but we will ignore these issues for the sake of this discussion and just use imprecise symbols such as $\approx$ instead.) By hypothesis, is positive. The objective is then to locate a progression in , with each in for . It may help to view the progression of progressions as a tall thin rectangle.

If we write for , then the problem is equivalent to finding a (possibly degenerate) arithmetic progression , with each in .

By hypothesis, we know already that each set has density about in :

Let us now make a “weakly mixing” assumption on the , which roughly speaking asserts that

for “most” subsets of of density of a certain form to be specified shortly. This is a plausible type of assumption if one believes to behave like a random set, and if the sets are constructed “independently” of the in some sense. Of course, we do not expect such an assumption to be valid all of the time, but we will postpone consideration of this point until later. Let us now see how this sort of weakly mixing hypothesis could help one count progressions of the desired form.

We will inductively consider the following (nonrigorously defined) sequence of claims for each :

- : For most choices of , there are arithmetic progressions in with the specified choice of , such that for all .

(Actually, to avoid boundary issues one should restrict to lie in the middle third of , rather than near the edges, but let us ignore this minor technical detail.) The quantity is natural here, given that there are arithmetic progressions in that pass through in the position, and that each one ought to have a probability of or so that the events simultaneously hold. If one has the claim , then by selecting a typical in , we obtain a progression with for all , as required. (In fact, we obtain about such progressions by this method.)

We can heuristically justify the claims by induction on . For , the claims are clear just from direct counting of progressions (as long as we keep away from the edges of ). Now suppose that , and the claims have already been proven. For any and for most , we have from hypothesis that there are progressions in through with . Let be the set of all the values of attained by these progressions, then . Invoking the weak mixing hypothesis, we (heuristically, at least) conclude that for most choices of , we have

which then gives the desired claim .
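In symbols, and under one plausible reading of the suppressed notation (blocks of length $n$, maximal density $\delta$, sets $A_i$, and a set $E$ of attained values as above; these names are assumptions, not fixed by the text), the inductive step can be summarised by the heuristic

```latex
% Hedged sketch; \delta, n, A_i, E are assumed names for the suppressed symbols.
\[
  \#\{\text{progressions with } x_i \in A_i \text{ for all } i \le j+1\}
    \;\approx\; \delta \cdot
  \#\{\text{progressions with } x_i \in A_i \text{ for all } i \le j\},
\]
```

with the factor of $\delta$ supplied by the weak mixing hypothesis applied to the set $E$ of attained values; iterating this down to the base case then leaves roughly a $\delta^j$ proportion of all progressions surviving, as claimed.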

The observant reader will note that we only needed the claim in the case for the above argument, but for technical reasons, the full proof requires one to work with more general values of (also the claim needs to be replaced by a more complicated version of itself, but let’s ignore this for sake of discussion).

We now return to the question of how to justify the weak mixing hypothesis (2). For a single block of , one can easily concoct a scenario in which this hypothesis fails, by choosing to overlap with too strongly, or to be too disjoint from . However, one can do better if one can select from a long progression of blocks. The starting point is the following simple double counting observation that gives the right upper bound:

Proposition 2 (Single upper bound). Let be a progression of progressions for some large . Suppose that for each , the set has density in (i.e. (1) holds). Let be a subset of of density . Then (if is large enough) one can find an such that

*Proof:* The key is the double counting identity

Because has maximal density and is large, we have

for each , and thus

The claim then follows from the pigeonhole principle.
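As a hedged reconstruction of the double counting (assuming $n$ sets $A_1, \ldots, A_n$, each of density roughly $\delta$ in a common ground set of size $N$, and a set $E$ of density $\sigma$; these symbols are assumed names, not taken from the statement above), the proof can be sketched as

```latex
% Hedged sketch; A_i, E, n, N, \delta, \sigma are assumed names.
\[
  \sum_{i=1}^{n} |E \cap A_i|
    \;=\; \sum_{x \in E} \#\{1 \le i \le n : x \in A_i\}
    \;\lessapprox\; \sum_{x \in E} \delta n
    \;=\; \delta n \, |E|
    \;=\; \sigma \delta n N.
\]
```

Here the middle inequality uses the maximal density hypothesis: for fixed $x$, the indices $i$ with $x \in A_i$ trace out membership of the original set along an arithmetic progression of length $n$, whose density is at most $\delta$. The pigeonhole principle then yields a single index $i$ with $|E \cap A_i| \lessapprox \sigma \delta N$.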

Now suppose we want to obtain weak mixing not just for a single set , but for a small number of such sets, i.e. we wish to find an for which

for all , where is the density of in . The above proposition gives, for each , a choice of for which (3) holds, but it could be a different for each , and so it is not immediately obvious how to use Proposition 2 to find an for which (3) holds *simultaneously* for all . However, it turns out that the van der Waerden theorem is the perfect tool for this amplification:

Proposition 3 (Multiple upper bound). Let be a progression of progressions for some large . Suppose that for each , the set has density in (i.e. (1) holds). For each , let be a subset of of density . Then (if is large enough depending on ) one can find an such that simultaneously for all .

*Proof:* Suppose that the claim failed (for some suitably large ). Then, for each , there exists such that

This can be viewed as a colouring of the interval by colours. If we take large compared to , van der Waerden’s theorem then allows us to find a long subprogression of which is monochromatic, so that is constant on this progression. But then this will furnish a counterexample to Proposition 2.
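As a small illustration of the monochromatic-progression guarantee that van der Waerden’s theorem provides, here is a minimal brute-force check (not part of the proof; the specific numbers are standard: the van der Waerden number $W(2,3)$ equals $9$):

```python
from itertools import product

def has_mono_ap(coloring, k=3):
    """Return True if the 2-coloring of {1,...,n} (a tuple of 0s and 1s,
    position a holding the colour of a) contains a monochromatic
    k-term arithmetic progression."""
    n = len(coloring)
    for start in range(1, n + 1):
        for step in range(1, n + 1):
            ap = [start + i * step for i in range(k)]
            if ap[-1] > n:
                break  # larger steps only overshoot further
            if len({coloring[a - 1] for a in ap}) == 1:
                return True
    return False

# Every 2-coloring of {1,...,9} contains a monochromatic 3-term progression,
# while {1,...,8} admits a coloring (e.g. 00110011) that avoids one.
assert all(has_mono_ap(c) for c in product([0, 1], repeat=9))
assert not has_mono_ap((0, 0, 1, 1, 0, 0, 1, 1))
```

In the proposition above, the same mechanism is applied with the colours recording which of the finitely many sets exhibits the overly large intersection, and the monochromatic subprogression then contradicts Proposition 2.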

One nice thing about this proposition is that the upper bounds can be automatically upgraded to an asymptotic:

Proposition 4 (Multiple mixing). Let be a progression of progressions for some large . Suppose that for each , the set has density in (i.e. (1) holds). For each , let be a subset of of density . Then (if is large enough depending on ) one can find an such that simultaneously for all .

*Proof:* By applying the previous proposition to the collection of sets and their complements (thus replacing each set with its complement), one can find an for which

and

which gives the claim.

However, this improvement of Proposition 2 turns out to not be strong enough for applications. The reason is that the number of sets for which mixing is established is too small compared with the length of the progression one has to use in order to obtain that mixing. However, thanks to the magic of the Szemerédi regularity lemma, one can amplify the above proposition even further, to allow for a huge number of to be mixed (at the cost of excluding a small fraction of exceptions):

Proposition 5 (Really multiple mixing). Let be a progression of progressions for some large . Suppose that for each , the set has density in (i.e. (1) holds). For each in some (large) finite set , let be a subset of of density . Then (if is large enough, but *not* dependent on the size of ) one can find an such that simultaneously for almost all .

*Proof:* We build a bipartite graph connecting the progression to the finite set by placing an edge between an element and an element whenever . The number can then be interpreted as the degree of in this graph, while the number is the number of neighbours of that land in .

We now apply the regularity lemma to this graph . Roughly speaking, what this lemma does is to partition and into almost equally sized cells and such that for most pairs of cells, the graph resembles a random bipartite graph of some density between these two cells. The key point is that the number of cells here is bounded uniformly in the size of and . As a consequence of this lemma, one can show that for most vertices in a typical cell , the number is approximately equal to

and the number is approximately equal to

The point here is that the many different statistics involved are now controlled by a bounded number of statistics (this is not unlike the use of principal component analysis in statistics, incidentally, but that is another story). Now, we invoke Proposition 4 to find an for which

simultaneously for all , and the claim follows.

This proposition now suggests a way forward to establish the type of mixing properties (2) needed for the preceding attempt at proving Szemerédi’s theorem to actually work. Whereas in that attempt, we were working with a single progression of progressions of progressions containing a near-maximal density of elements of , we will now have to work with a *family* of such progressions of progressions, where ranges over some suitably large parameter set. Furthermore, in order to invoke Proposition 5, this family must be “well-arranged” in some arithmetic sense; in particular, for a given , it should be possible to find many reasonably large subfamilies of this family for which the terms of the progression of progressions in this subfamily are themselves in arithmetic progression. (Also, for technical reasons having to do with the fact that the sets in Proposition 5 are not allowed to depend on , one also needs the progressions for any given to be “similar” in the sense that they intersect in the same fashion (thus the sets as varies need to be translates of each other).) If one has this sort of family, then Proposition 5 allows us to “spend” some of the degrees of freedom of the parameter set in order to gain good mixing properties for at least one of the sets in the progression of progressions.

Of course, we still have to figure out how to get such large families of well-arranged progressions of progressions. Szemerédi’s solution was to begin by working with generalised progressions of a much larger rank than the rank progressions considered here; roughly speaking, to prove Szemerédi’s theorem for length progressions, one has to consider generalised progressions of rank as high as . It is possible by a reasonably straightforward (though somewhat delicate) “density increment argument” to locate a huge generalised progression of this rank which is “saturated” by in a certain rather technical sense (related to the concept of “near maximal density” used previously). Then, by another reasonably elementary argument, it is possible to locate inside a suitable large generalised progression of some rank , a family of large generalised progressions of rank which inherit many of the good properties of the original generalised progression, and which have the arithmetic structure needed for Proposition 5 to be applicable, at least for one value of . (But getting this sort of property for *all* values of simultaneously is tricky, and requires many careful iterations of the above scheme; there is also the problem that by obtaining good behaviour for one index , one may lose good behaviour at previous indices, leading to a sort of “Tower of Hanoi” situation which may help explain the exponential factor in the rank that is ultimately needed. It is an extremely delicate argument; all the parameters and definitions have to be set very precisely in order for the argument to work at all, and it is really quite remarkable that Endre was able to see it through to the end.)

By **Jacob Aron**, New Scientist

Imagine I present you with a line of cards labelled *1* through to *n*, where *n* is some incredibly large number. I ask you to remove a certain number of cards – which ones you choose is up to you, inevitably leaving ugly random gaps in my carefully ordered sequence. It might seem as if all order must now be lost, but in fact no matter which cards you pick, I can always identify a surprisingly ordered pattern in the numbers that remain.

As a magic trick it might not equal sawing a woman in half, but mathematically proving that it is always possible to find a pattern in such a scenario is one of the feats that today garnered Endre Szemerédi mathematics’ prestigious Abel prize.

The Norwegian Academy of Science and Letters in Oslo awarded Szemerédi the one million dollar prize today for “fundamental contributions to discrete mathematics and theoretical computer science”. His specialty was combinatorics, a field that deals with the different ways of counting and rearranging discrete objects, whether they be numbers or playing cards.

The trick described above is a direct result of what is known as Szemerédi’s theorem, a piece of mathematics that answered a question first posed by the mathematicians Paul Erdős and Pál Turán in 1936 and that had remained unsolved for nearly 40 years.

**Irregular mind**

The theorem reveals how patterns can be found in large sets of consecutive numbers with many of their members missing. The patterns in question are arithmetic sequences – strings of numbers with a common difference such as 3, 7, 11, 15, 19.

Such problems are often fairly easy for mathematicians to pose, but fiendishly difficult to solve. The book An Irregular Mind, published in honour of Szemerédi’s 70th birthday in 2010, stated that “his brain is wired differently than for most mathematicians”.

“He’s more likely than most to come up with an idea from left field,” agrees mathematician Timothy Gowers of the University of Cambridge, who gave a presentation in Oslo on Szemerédi’s work following the prize announcement.

Szemerédi actually came late to mathematics, initially studying at medical school for a year and then working in a factory before switching to become a mathematician. His talent was discovered by Erdős, who was famous for working with hundreds of mathematicians in his lifetime.

**Modest winner**

When Szemerédi proved his theorem in 1975 he also provided mathematicians with a tool known as the Szemerédi regularity lemma, which gives a deeper understanding of large graphs – mathematical objects often used to model networked structures such as the internet.

The lemma has also helped computer scientists better understand a technique in artificial intelligence known as “probably approximately correct learning”. Szemerédi also worked on another important computing problem related to sorting lists, demonstrating a theoretical limit for sorting using parallel processors, which are found in modern computers.

Speaking on the phone to Gowers after receiving his award, Szemerédi said he was “very happy” but suggested that there were other mathematicians more deserving than himself. Gowers told our sister site *New Scientist* that Szemerédi was “very modest”, adding that “he is a worthy winner and a lot of people think this sort of recognition is long overdue in his case”.

@ Electronicsweekly

It was announced yesterday that Endre Szemerédi is the winner of the 2012 Abel Prize. As we mentioned a few years ago, the Abel Prize is a fairly new award in math. Unlike the Fields Medal (which famously is for people under 40), the Abel Prize is meant to recognize long, illustrious careers in mathematics. It has quickly become one of the most prestigious awards in math.

It was awarded:

for his fundamental contributions to discrete mathematics and theoretical computer science, and in recognition of the profound and lasting impact of these contributions on additive number theory and ergodic theory.

– Abel Prize Citation

Fellow math blogger, Tim Gowers, was in charge of giving a talk for non-mathematicians (i.e. journalists and such) about Dr. Szemerédi’s research. A tough challenge which Dr. Gowers adroitly pulled off. You can read the text on his blog here.

Dr. Szemerédi’s area of research is combinatorics. This is an area (like number theory) which is famous for having many easy to state but extremely difficult to answer questions. We wanted to mention two topics in one of Dr. Szemerédi’s areas of research: extremal combinatorics.

Very roughly, extremal combinatorics is the study of how when structures get very large, order becomes unavoidable. What do we mean? Well, our first example is Ramsey Theory.

First, recall that a graph in math is a collection of vertices (or nodes) which are connected by edges*. For example, the graph with 5 vertices and with edges between every pair of vertices is called the complete graph on 5 vertices. It looks like this:

Now imagine you color the edges of the graph with two colors (let’s say crimson and cream :-) ). The question is: Is it possible to color the edges with two colors in a way that **avoids** ending up with a triangle which is all one color?**

It’s not too hard to sit down with the complete graph on 5 vertices and create a coloring with crimson and cream which has no triangles with all three edges of the same color. Surprisingly, if you color the complete graph on 6 vertices:

then a monochromatic triangle is **unavoidable**!

You should try coloring the graph yourself and see if you can avoid a monochromatic triangle. But here’s the proof: Let’s look at the vertex on the far left of the picture of the complete graph on 6 vertices drawn above. There are 5 edges which leave that vertex. Since there are only two colors, one of the colors must be used 3 or more times. Let’s say crimson was used 3+ times. Now let’s look at the edges between those 3+ vertices. If any one of them is crimson, then that makes a crimson triangle with our original vertex. If they are all cream, then those 3 vertices form the corners of a cream triangle! A monochromatic triangle is unavoidable!
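The six-vertex claim is small enough to verify exhaustively by computer. Here is a minimal brute-force sketch (the function names are mine, not from the text): it enumerates all 2-colourings of the edges of a complete graph and checks for a one-colour triangle.

```python
from itertools import combinations, product

def has_mono_triangle(n, colouring):
    """colouring maps each edge (i, j) with i < j of K_n to one of two colours."""
    return any(
        colouring[(a, b)] == colouring[(a, c)] == colouring[(b, c)]
        for a, b, c in combinations(range(n), 3)
    )

def every_colouring_forced(n):
    """True if every 2-colouring of the edges of K_n has a monochromatic triangle."""
    edges = list(combinations(range(n), 2))
    return all(
        has_mono_triangle(n, dict(zip(edges, colours)))
        for colours in product((0, 1), repeat=len(edges))
    )

print(every_colouring_forced(5), every_colouring_forced(6))  # False True
```

On 5 vertices some colouring escapes (as you can find by hand); on 6 vertices all 2^15 = 32768 colourings are forced, exactly as the pigeonhole proof above predicts.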

Ramsey’s theorem is the following amazing generalization of this result: If you use k colors and are interested in looking for a monochromatic complete graph on m vertices, then if you pick n large enough, the complete graph on n vertices will **always** have the monochromatic subgraph you’re looking for.

In our example above, we used k=2 colors and were looking for a complete graph on m=3 vertices (aka a triangle). What we proved is that if n=6, then you always have the monochromatic triangle we’re looking for (notice that any n>6 will also work!). Ramsey’s theorem greatly generalizes this result to more colors and larger monochromatic subgraphs.

We also need to mention Szemerédi’s theorem. It is in the same spirit as Ramsey’s Theorem. We are now looking for arithmetic progressions in the integers. Remember, an arithmetic progression is a sequence of numbers where you go from one to the next by adding some fixed constant. So, for example, 2,7,12,17 is an arithmetic progression of 4 numbers with a step size of 5.

More generally, say you want to find an arithmetic progression of 4 numbers but you don’t care about the step size. Let’s pick a very small percentage, say 0.000000001%. Then Szemerédi’s theorem says there is an N so that whenever you pick 0.000000001% of the numbers 1,2,3,…,N, you will always be able to find an arithmetic progression of 4 numbers!

Once again, in a large enough mathematical object, patterns are unavoidable!

Here’s what Szemerédi’s theorem says: Say you are looking for an arithmetic progression of k numbers (with any step size). Pick a percentage, P%. Now, no matter how small your percentage is, there is a number N so that any subset of 1,2,3, …, N which has more than P% of the N possible elements **must** contain an arithmetic progression of k numbers.
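For tiny N one can watch density forcing progressions directly, by computing the largest subset of {1, …, N} with no 3-term progression. A brute-force sketch (helper names are mine):

```python
from itertools import combinations

def has_3ap(s):
    """True if s contains a 3-term arithmetic progression a, a+d, a+2d with d > 0."""
    s = sorted(set(s))
    members = set(s)
    return any(2 * b - a in members for a, b in combinations(s, 2))

def max_3ap_free(N):
    """Size of the largest subset of {1, ..., N} without a 3-term AP (brute force)."""
    best = 0
    for mask in range(1 << N):
        subset = [i + 1 for i in range(N) if (mask >> i) & 1]
        if len(subset) > best and not has_3ap(subset):
            best = len(subset)
    return best
```

For example, {1, 2, 4, 8, 9} is progression-free inside {1, …, 9}, but no 6-element subset of {1, …, 9} is: any denser set must contain a progression, which is the qualitative content of the theorem.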

Of course Szemerédi’s theorem doesn’t promise that N will be small. In fact, you can imagine that it actually needs to be very, very, very, very big. If we write $N(k,\delta)$ for the N corresponding to progression length k and density $\delta$, then Gowers proved a tower-type bound of the form $N(k,\delta) \le 2^{2^{\delta^{-2^{2^{k+9}}}}}$.

See wikipedia’s article for further details.

Besides being amazing in its own right, Szemerédi’s theorem launched a huge amount of new mathematics. Perhaps most famously, the Green-Tao Theorem builds on Szemerédi’s theorem. It proves that the set of prime numbers contains arbitrarily long arithmetic progressions.
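For small ranges one can find such progressions of primes directly. A brute-force sketch (function names are mine) that, for instance, turns up the 5-term progression 5, 11, 17, 23, 29 among the primes below 30:

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: the set of primes below or equal to n."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return {i for i, is_p in enumerate(sieve) if is_p}

def longest_prime_ap(limit):
    """Longest arithmetic progression contained in the primes below `limit`."""
    ps = primes_up_to(limit)
    best = []
    for a in sorted(ps):
        for d in range(1, limit):
            length = 1
            while a + length * d in ps:
                length += 1
            if length > len(best):
                best = [a + i * d for i in range(length)]
            if a + d >= limit:
                break
    return best
```

The Green-Tao theorem guarantees that, by raising `limit`, progressions of every length eventually appear, though the known proofs give no efficient way to locate them.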

* You can imagine that graph theory is super useful for studying networks. For example, the vertices could be the computers and routers at OU and the edges could be the cables connecting them.

** This is sometimes called the Party Problem. If the vertices are individuals and you color an edge between them crimson if they are friends and cream if they are not friends, then we’ve proven that if you invite 6 people to a party, there will be three people who are all friends or all strangers.

In this first of two posts, we prove Szemerédi’s regularity lemma. The second post will give some applications of this lemma: the triangle removal lemma and Roth’s theorem. Some of the content has intersection with the Ergodic Ramsey Theory posts, which the interested reader may check here: ERT0, ERT1, ERT2, ERT3, ERT4, ERT5, ERT6, ERT7, ERT8, ERT9, ERT10, ERT11, ERT12, ERT13, ERT14, ERT15, ERT16.

**1. Additive combinatorics **

“*Additive combinatorics is the theory of counting additive structures in sets.*”

– T. Tao and V. Vu

This theory has seen exciting developments and dramatic changes in direction in recent years, thanks to its connections with areas such as number theory, ergodic theory and graph theory. This section gives a brief historic introduction on the main results.

Van der Waerden’s theorem (see ERT6 for a topological dynamical proof), one of Khinchin’s *Three Pearls of Number Theory*, states that whenever the natural numbers are finitely partitioned (or, as it is customary to say, finitely colored), one of the cells of the partition contains arbitrarily long arithmetic progressions. In other words, the structure of the natural numbers cannot be destroyed by partitions: arbitrarily long arithmetic progressions persist inside some component of the partition. This result was first proved in 1927 and represents the first great result of additive combinatorics. Afterwards, in the mid-thirties, Erdös and Turán conjectured a density version of van der Waerden’s theorem. To present it, let us define the notion of density in the natural numbers.
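For progressions of length 3 and two colours, the finitary threshold in van der Waerden’s theorem is small enough to check exhaustively: the van der Waerden number $W(3; 2)$ equals $9$. A brute-force sketch (helper names are mine):

```python
from itertools import product

def has_mono_3ap(colouring):
    """colouring is a tuple of 0/1 colours for the integers 1..n."""
    n = len(colouring)
    for a in range(1, n + 1):
        for d in range(1, (n - a) // 2 + 1):
            if colouring[a - 1] == colouring[a + d - 1] == colouring[a + 2 * d - 1]:
                return True
    return False

def vdw_forced(n):
    """True if every 2-colouring of {1, ..., n} has a monochromatic 3-term AP."""
    return all(has_mono_3ap(c) for c in product((0, 1), repeat=n))
```

The colouring RRBBRRBB of {1, …, 8} avoids monochromatic 3-term progressions, but all 512 colourings of {1, …, 9} are forced.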

Definition 1 Given a set $A \subseteq \mathbb{N}$, the *upper density* of $A$ is

$\bar{d}(A) := \limsup_{n \to \infty} \frac{|A \cap \{1, 2, \ldots, n\}|}{n}.$

If the limit exists, we say that $A$ has *density* $d(A)$. As pointed out by Erdös and Turán, having positive upper density is a notion of largeness and it is natural to ask if sets with this property have arbitrarily long arithmetic progressions. This quite recalcitrant question was only settled in 1975 by Szemerédi in the paper *On sets of integers containing no k elements in arithmetic progressions*. Meanwhile, the first partial result was obtained by Roth in 1953.

Theorem 2 (Roth) If $A \subseteq \mathbb{N}$ has positive upper density, then it contains an arithmetic progression of length $3$.

His proof relied on a Fourier-analytic argument of energy increment for functions: one decomposes a function $f$ as $f = g + b$, where $g$ is good and $b$ is bad in a specific sense (this follows the same philosophy as Calderón-Zygmund theory in harmonic analysis). If the effect of $b$ is large, it is possible to break it into good and bad parts again and so on. In each step, the “energy” of $g$ increases by a fixed amount. Being bounded, the process must stop after a finite number of steps. At the end, $g$ controls the behavior of $f$, and for $g$ the result is straightforward. See *The remarkable effectiveness of ergodic theory in number theory* for further details.

Sixteen years later, in the paper *On sets of integers containing no four elements in arithmetic progression*, Szemerédi extended Roth’s theorem to

Theorem 3 (Szemerédi) If $A \subseteq \mathbb{N}$ has positive upper density, then it contains an arithmetic progression of length $4$.

Finally, in 1975, Szemerédi settled the conjecture in its full generality.

Theorem 4 (Szemerédi) If $A \subseteq \mathbb{N}$ has positive upper density, then it contains arbitrarily long arithmetic progressions.

His proof required a complicated combinatorial argument and relied on a graph-theoretical result, known as **Szemerédi’s regularity lemma**, which turned out to be an important result in graph theory. It asserts, roughly speaking, that any graph can be decomposed into a relatively small number of disjoint subgraphs, most of which behave pseudo-randomly. This is the main topic of this post.

It is worth mentioning that Erdös also conjectured that if $A = \{a_1 < a_2 < \cdots\} \subseteq \mathbb{N}$ satisfies

$\sum_{n=1}^{\infty} \frac{1}{a_n} = \infty,$

then it contains arbitrarily long arithmetic progressions. This question is wide open: nobody knows even if such a set contains arithmetic progressions of length $3$. On the other hand, a remarkable result of Green and Tao settles the conjecture in the particular case of the prime numbers (whose sum of reciprocals indeed diverges).

Theorem 5 (Green and Tao)The prime numbers contain arbitrarily long arithmetic progressions.

**2. Setting notation **

$G = (V, E)$ is a graph, where $V$ is a finite set of *vertices* and $E$ is the set of *edges*, each of them joining two distinct elements of $V$. For disjoint $A, B \subseteq V$, $e(A, B)$ is the number of edges between $A$ and $B$ and

$d(A, B) := \frac{e(A, B)}{|A||B|}$

is the *density* of the pair $(A, B)$.

Definition 6 For $\varepsilon > 0$ and disjoint subsets $A, B \subseteq V$, the pair $(A, B)$ is *$\varepsilon$-regular* if, for every $X \subseteq A$ and $Y \subseteq B$ satisfying

$|X| \ge \varepsilon |A| \quad \text{and} \quad |Y| \ge \varepsilon |B|,$

we have

$|d(X, Y) - d(A, B)| < \varepsilon.$

A partition $V = V_0 \cup V_1 \cup \cdots \cup V_k$ of $V$ into pairwise disjoint sets, in which $V_0$ is called the *exceptional set*, is an *equipartition* if $|V_1| = |V_2| = \cdots = |V_k|$. We view the exceptional set as $|V_0|$ distinct parts, each consisting of a single vertex, and its role is purely technical: to make all other classes have exactly the same cardinality.

Definition 7 An equipartition $V = V_0 \cup V_1 \cup \cdots \cup V_k$ is *$\varepsilon$-regular* if

- $|V_0| \le \varepsilon |V|$,
- all but at most $\varepsilon k^2$ of the pairs $(V_i, V_j)$, $1 \le i < j \le k$, are $\varepsilon$-regular.

The classes $V_1, \ldots, V_k$ are called *clusters* or *groups*. Given two partitions $P, P'$ of $V$, we say $P'$ *refines* $P$ if every cluster of $P$ is equal to the union of some clusters of $P'$.

**3. Szemerédi’s regularity lemma **

Szemerédi’s regularity lemma says that every graph with many vertices can be partitioned into a small number of clusters with the same cardinality, most of the pairs being $\varepsilon$-regular, and a few leftover edges. From my point of view, this result allows the decomposition of every graph with a sufficiently large number of vertices into components in a uniform way (every component has the same number of vertices) such that the relation between the clusters is at the same time

**uniform:** the densities inside most pairs of clusters do not vary too much, and

**random-like:** even controlling the density, nothing more can be said about the distribution of the edges.

As a toy model, let $0 < p < 1$ and consider the random graph with vertex set $V$, $|V| = n$, in which every possible edge belongs to $E$ independently with probability $p$. If $A, B$ are disjoint subsets of $V$, the expected value of $d(A, B)$ is $p$, and the same happens for subsets $X \subseteq A$, $Y \subseteq B$. Szemerédi’s regularity lemma says that, approximately, this is indeed the universal behavior.
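A quick simulation of this toy model, using only the standard library (the helper names are mine): sample $G(n, 1/2)$ and observe that densities between random subsets, and between sub-subsets, all concentrate near $p = 1/2$.

```python
import random

random.seed(0)

# sample the random graph G(n, p): each pair of vertices is an edge with probability p
n, p = 600, 0.5
edges = {(i, j) for i in range(n) for j in range(i + 1, n) if random.random() < p}

def density(A, B):
    """Edge density d(A, B) = e(A, B) / (|A| |B|) for disjoint vertex sets A, B."""
    e = sum(1 for a in A for b in B if (min(a, b), max(a, b)) in edges)
    return e / (len(A) * len(B))

# disjoint random subsets A, B and smaller X inside A, Y inside B:
# all densities hover near p, which is exactly epsilon-regularity
sample = random.sample(range(n), 300)
A, B = sample[:150], sample[150:]
X, Y = A[:50], B[:50]
print(density(A, B), density(X, Y))  # both close to 0.5
```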

Theorem 8 (Szemerédi’s regularity lemma) For every $\varepsilon > 0$ and every integer $m \ge 1$, there exist integers $M = M(\varepsilon, m)$ and $N = N(\varepsilon, m)$ for which every graph with at least $N$ vertices has an $\varepsilon$-regular equipartition $V = V_0 \cup V_1 \cup \cdots \cup V_k$, where $m \le k \le M$.

Note the importance of having an upper bound for the number of clusters. Otherwise, we could just take each of them to be a singleton.

The idea in the proof is similar to Roth’s approach. Start with an arbitrary partition of $V$ into disjoint classes of equal sizes. Proceed by showing that, as long as the partition is not $\varepsilon$-regular, it can be refined in a way that redistributes the density deviation. This is done by introducing a bounded *energy function* that increases by a fixed amount every time the refinement is made. After a finite number of steps, the resulting partition is $\varepsilon$-regular.

We now discuss what the energy function should be. The natural way of looking for it is to identify the obstruction for a pair $(A, B)$ to be $\varepsilon$-regular. This means there are subsets $X \subseteq A$ and $Y \subseteq B$ such that $|X| \ge \varepsilon|A|$, $|Y| \ge \varepsilon|B|$ and

$|d(X, Y) - d(A, B)| \ge \varepsilon.$

Consider the partitions $\mathcal{A} = \{X, A \setminus X\}$ and $\mathcal{B} = \{Y, B \setminus Y\}$. The above inequality has the following probabilistic interpretation. Consider the random variable $Z$ defined on the product $A \times B$ by: let $a$ be a uniformly random element of $A$ and $b$ a uniformly random element of $B$, let $C$ and $D$ be those members of the respective partitions for which $a \in C$ and $b \in D$, and take

$Z := d(C, D).$

The expectation of $Z$ is equal to

$\mathbb{E}[Z] = \sum_{C, D} \frac{|C||D|}{|A||B|}\, d(C, D) = d(A, B).$

By assumption, $Z$ deviates from $d(A, B)$ by at least $\varepsilon$ whenever $(a, b) \in X \times Y$, and this event has probability

$\frac{|X||Y|}{|A||B|} \ge \varepsilon^2.$

Then $\mathrm{Var}(Z) \ge \varepsilon^2 \cdot \varepsilon^2 = \varepsilon^4$. Noting that the expectation of $Z^2$ is

$\mathbb{E}[Z^2] = \sum_{C, D} \frac{|C||D|}{|A||B|}\, d(C, D)^2,$

we conclude that

$\mathbb{E}[Z^2] = \mathbb{E}[Z]^2 + \mathrm{Var}(Z) \ge d(A, B)^2 + \varepsilon^4.$

The fractions containing $d(C, D)^2$ above represent the energy function we are looking for: given two disjoint subsets $C, D \subseteq V$, define

$q(C, D) := \frac{|C||D|}{n^2}\, d(C, D)^2,$

where $n = |V|$. For partitions $\mathcal{C}$ of $C$ and $\mathcal{D}$ of $D$, let

$q(\mathcal{C}, \mathcal{D}) := \sum_{C' \in \mathcal{C}} \sum_{D' \in \mathcal{D}} q(C', D').$

Definition 9 Given a partition $P = \{V_0, V_1, \ldots, V_k\}$ of $V$ with exceptional set $V_0$, the *index* of $P$ is

$\mathrm{ind}(P) := \sum q(V_i, V_j),$

where the sum ranges over all unordered pairs of distinct parts of $P$, with each vertex of $V_0$ forming a singleton part on its own.

Note that $\mathrm{ind}(P)$ is a sum of terms of the form $q(C, D)$. The first good property it must have is boundedness.

**Property 1.** $\mathrm{ind}(P) \le 1$.

In fact, as $d(C, D) \le 1$,

$\mathrm{ind}(P) \le \sum \frac{|C||D|}{n^2} \le \frac{1}{n^2} \Big( \sum_{C} |C| \Big)^2 = 1.$

It is also monotone increasing with respect to refinements. This is the content of the next two properties.

**Property 2.** If $C, D$ are disjoint subsets of $V$ and $\mathcal{C}, \mathcal{D}$ are partitions of $C, D$, respectively, then

$q(\mathcal{C}, \mathcal{D}) \ge q(C, D).$

This property follows easily from the Cauchy-Schwarz inequality (the interested reader may check it in the survey *Szemerédi’s regularity lemma and its applications in graph theory*), but this analytical argument is not so transparent. A soft way of proving it is to consider the probabilistic point of view, with the aid of the random variable $Z$ associated to the partitions $\mathcal{C}$ and $\mathcal{D}$. According to the above calculations,

$q(\mathcal{C}, \mathcal{D}) = \frac{|C||D|}{n^2}\, \mathbb{E}[Z^2],$

and so, by Jensen’s inequality (which in this case is just the Cauchy-Schwarz inequality),

$q(\mathcal{C}, \mathcal{D}) \ge \frac{|C||D|}{n^2}\, \mathbb{E}[Z]^2 = \frac{|C||D|}{n^2}\, d(C, D)^2 = q(C, D).$
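Property 2 can be checked numerically on a small random graph. The sketch below (helper names are mine) uses the normalization $q(C, D) = \frac{|C||D|}{n^2} d(C, D)^2$ defined above:

```python
import random

random.seed(1)

n = 40
edges = {(i, j) for i in range(n) for j in range(i + 1, n) if random.random() < 0.3}

def d(C, D):
    """Edge density between disjoint vertex sets C and D."""
    e = sum(1 for c in C for u in D if (min(c, u), max(c, u)) in edges)
    return e / (len(C) * len(D))

def q(C, D):
    """Energy of the pair (C, D), normalized by n^2."""
    return len(C) * len(D) / n**2 * d(C, D) ** 2

# refine a pair (C, D) by arbitrary partitions: the energy can only go up
C, D = list(range(0, 15)), list(range(15, 40))
parts_C = [C[:7], C[7:]]
parts_D = [D[:10], D[10:]]
refined = sum(q(Cp, Dp) for Cp in parts_C for Dp in parts_D)
assert refined + 1e-12 >= q(C, D)  # Property 2, a consequence of Cauchy-Schwarz
```

Any choice of partitions works here; the inequality is a theorem, not an artifact of the sample.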

**Property 3.** If $P'$ refines $P$, then $\mathrm{ind}(P') \ge \mathrm{ind}(P)$.

This is a direct consequence of Property 2, obtained by breaking each pair of parts of $P$ according to the parts of $P'$ contained in them.

The next property increases the index of non-$\varepsilon$-regular partitions and reflects the right choice of the energy function. In a few words, it says that

**“The lack of uniformity implies energy increment”**

and this idea permeates many results in recent developments in combinatorics, harmonic analysis, ergodic theory and other areas. Actually, all known proofs of Szemerédi’s theorem use this principle at some stage. To mention some of them:

- the original proof of Roth considers good and bad parts of functions.
- Furstenberg’s approach: every system which is not weakly mixing has a nontrivial compact factor.
- the Fourier-analytic proof of Gowers identifies arithmetic progressions via the nowadays-called *Gowers norms*.
- the construction of characteristic factors for multiple ergodic averages uses the *Gowers-Host-Kra seminorms*.

The last two results are still being developed to generate what is being called *higher-order Fourier analysis*. See this post of Terence Tao for a discussion of this topic. Going back to what matters, let’s prove the

Proposition 10 (Lack of uniformity implies energy increment 1) Suppose $\varepsilon > 0$ and $A$ and $B$ are disjoint nonempty subsets of $V$ such that the pair $(A, B)$ is not $\varepsilon$-regular. Then there are partitions $\mathcal{A} = \{X, A \setminus X\}$ of $A$ and $\mathcal{B} = \{Y, B \setminus Y\}$ of $B$ such that

$q(\mathcal{A}, \mathcal{B}) \ge q(A, B) + \varepsilon^4\, \frac{|A||B|}{n^2}.$

*Proof:* The reader must convince himself that this is exactly the calculation carried out before the definition of the index. For those still not convinced, let’s do it again. Assume $X \subseteq A$ and $Y \subseteq B$ are such that $|X| \ge \varepsilon|A|$, $|Y| \ge \varepsilon|B|$ and

$|d(X, Y) - d(A, B)| \ge \varepsilon.$

Consider $\mathcal{A} = \{X, A \setminus X\}$ and $\mathcal{B} = \{Y, B \setminus Y\}$. The evaluation of the variance of $Z$ will prove the proposition. On one hand, by the calculations in Property 2,

$q(\mathcal{A}, \mathcal{B}) = \frac{|A||B|}{n^2}\, \mathbb{E}[Z^2].$

On the other, $Z$ deviates from $\mathbb{E}[Z] = d(A, B)$ by at least $\varepsilon$ whenever $(a, b) \in X \times Y$, and this event has probability

$\frac{|X||Y|}{|A||B|} \ge \varepsilon^2.$

Then $\mathrm{Var}(Z) \ge \varepsilon^4$, which, together with $\mathbb{E}[Z] = d(A, B)$, gives that

$q(\mathcal{A}, \mathcal{B}) = \frac{|A||B|}{n^2}\left( \mathbb{E}[Z]^2 + \mathrm{Var}(Z) \right) \ge q(A, B) + \varepsilon^4\, \frac{|A||B|}{n^2}.$

Proposition 11 (Lack of uniformity implies energy increment 2) Suppose $0 < \varepsilon \le 1/4$ and let $P = \{V_0, V_1, \ldots, V_k\}$ be a non-$\varepsilon$-regular equipartition of $V$, where $V_0$ is the exceptional set. Then there exists a refinement $P' = \{V_0', V_1', \ldots, V_l'\}$ of $P$ with the following properties:

- (i) $P'$ is an equipartition of $V$,
- (ii) $l \le k 4^k$,
- (iii) $|V_0'| \le |V_0| + \frac{n}{2^k}$, and
- (iv) $\mathrm{ind}(P') \ge \mathrm{ind}(P) + \frac{\varepsilon^5}{2}$.

*Proof:* The idea is to apply the previous proposition to every non-regular pair. As there are at least $\varepsilon k^2$ of them, the index will increase by the fixed amount. Let $m$ be the cardinality of every $V_i$, $1 \le i \le k$. Saying that $P$ is not $\varepsilon$-regular means that, for at least $\varepsilon k^2$ pairs $(V_i, V_j)$, $i < j$, the pair $(V_i, V_j)$ is not $\varepsilon$-regular. For each of these, let $\mathcal{V}_{ij}$, $\mathcal{V}_{ji}$ be the partitions of $V_i$, $V_j$, respectively, given by Proposition 10, and consider the smallest partition $Q$ that refines $P$ and all the $\mathcal{V}_{ij}$, $\mathcal{V}_{ji}$. By Proposition 10,

$\mathrm{ind}(Q) \ge \mathrm{ind}(P) + \varepsilon k^2 \cdot \varepsilon^4\, \frac{m^2}{n^2} \ge \mathrm{ind}(P) + \frac{\varepsilon^5}{2},$

as $km \ge (1 - \varepsilon) n \ge n/\sqrt{2}$ for $\varepsilon \le 1/4$. This proves that $Q$ (and any of its refinements) satisfies (iv). The problem is that $Q$ is not necessarily an equipartition. We adjust this by defining $P'$, splitting every part of $Q$ arbitrarily into disjoint sets of size $\lfloor m/4^k \rfloor$ and throwing the remaining vertices of each part, if any, to the exceptional set. This new partition satisfies (i), (ii) and (iii), as we’ll verify below.

(i) $P'$ is an equipartition by definition.

(ii) To get $Q$, every cluster of $P$ is divided into at most $2^{k-1}$ parts. After that, every cluster of $P$ is divided into at most $4^k$ non-exceptional parts of $P'$. This implies that $l \le k 4^k$.

(iii) Each cluster of $Q$ contributes with at most $\lfloor m/4^k \rfloor$ vertices to $V_0'$ and so

$|V_0'| \le |V_0| + k\, 2^{k-1} \cdot \frac{m}{4^k} \le |V_0| + \frac{n}{2^k}.$

Finally, we are able to prove the regularity lemma.

*Proof:* First, note that if the result is true for a pair $(\varepsilon', m)$ with $\varepsilon' \le \varepsilon$, then the result is also true for the pair $(\varepsilon, m)$. This allows us to assume that $\varepsilon \le 1/4$ and that $m$ is arbitrarily large.

Begin with an arbitrary equipartition $P_0$ of $V$ with $m$ clusters and exceptional set of cardinality smaller than $m$. Apply Proposition 11 at most $2\varepsilon^{-5}$ times to obtain an $\varepsilon$-regular equipartition $P$ (the process must stop, since by Property 1 the index never exceeds $1$). Let $M$ be the largest number obtained by iterating the map $k \mapsto k 4^k$ at most $2\varepsilon^{-5}$ times, starting from $m$. Then $P$ has at most $M$ clusters. In addition, the cardinality of its exceptional set is bounded by

$m + 2\varepsilon^{-5} \cdot \frac{n}{2^m},$

which is smaller than $\varepsilon n$ if $n$ and $m$ are large. This concludes the proof.

There is a large literature about Szemerédi’s regularity lemma. We refer the reader to four references: my lecture notes available at my homepage, the book *The probabilistic method* of Alon and Spencer, the survey of Komlós and M. Simonovits and Tao’s perspective via random partitions. Merry Christmas!!

Theorem 1 (Furstenberg multiple recurrence) Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system, thus $(X, \mathcal{X}, \mu)$ is a probability space and $T: X \to X$ is a measure-preserving bijection such that $T$ and $T^{-1}$ are both measurable. Let $E$ be a measurable subset of $X$ of positive measure $\mu(E) > 0$. Then for any $k \ge 1$, there exists $n \ge 1$ such that $\mu(E \cap T^{-n} E \cap \cdots \cap T^{-(k-1)n} E) > 0.$ Equivalently, there exists $n \ge 1$ and $x \in X$ such that $x, T^n x, T^{2n} x, \ldots, T^{(k-1)n} x \in E.$

As is well known, the Furstenberg multiple recurrence theorem is equivalent to Szemerédi’s theorem, thanks to the Furstenberg correspondence principle; see for instance these lecture notes of mine.

The multiple recurrence theorem is proven, roughly speaking, by an induction on the “complexity” of the system $X = (X, \mathcal{X}, \mu, T)$. Indeed, for very simple systems, such as periodic systems (in which $T^n$ is the identity for some $n \ge 1$, which is for instance the case for the circle shift $x \mapsto x + \alpha$ on $\mathbb{R}/\mathbb{Z}$ with a rational shift $\alpha$), the theorem is trivial; at a slightly more advanced level, *almost periodic* (or *compact*) systems (in which $\{T^n f : n \in \mathbb{Z}\}$ is a precompact subset of $L^2(X)$ for every $f \in L^2(X)$, which is for instance the case for irrational circle shifts) are also quite easy to handle. One then shows that the multiple recurrence property is preserved under various *extension* operations (specifically, compact extensions, weakly mixing extensions, and limits of chains of extensions), which then gives the multiple recurrence theorem as a consequence of the *Furstenberg-Zimmer structure theorem* for measure-preserving systems. See these lecture notes for further discussion.

From a high-level perspective, this is still one of the most conceptual proofs known of Szemerédi’s theorem. However, the individual components of the proof are still somewhat intricate. Perhaps the most difficult step is the demonstration that the multiple recurrence property is preserved under *compact extensions*; see for instance these lecture notes, which are devoted entirely to this step. This step requires quite a bit of measure-theoretic and/or functional analytic machinery, such as the theory of disintegrations, relatively almost periodic functions, or Hilbert modules.

However, I recently realised that there is a special case of the compact extension step – namely that of *finite* extensions – which avoids almost all of these technical issues while still capturing the essence of the argument (and in particular, the key idea of using van der Waerden’s theorem). As such, this may serve as a pedagogical device for motivating this step of the proof of the multiple recurrence theorem.

Let us first explain what a finite extension is. Given a measure-preserving system $X = (X, \mathcal{X}, \mu, T)$, a finite set $Y$, and a measurable map $\rho: X \to \mathrm{Sym}(Y)$ from $X$ to the permutation group of $Y$, one can form the *finite extension*

$X \times_\rho Y = (X \times Y, \mathcal{X} \times 2^Y, \mu \times \nu, S),$

which as a probability space is the product of $(X, \mathcal{X}, \mu)$ with the finite probability space $(Y, 2^Y, \nu)$ (with the discrete $\sigma$-algebra $2^Y$ and uniform probability measure $\nu$), and with shift map

$S(x, y) := (Tx, \rho(x) y).$

One easily verifies that this is indeed a measure-preserving system. We refer to $\rho$ as the *cocycle* of the system.

An example of finite extensions comes from group theory. Suppose we have a short exact sequence

$0 \to K \to G \to H \to 0$

of finite groups. Let $g$ be a group element of $G$, and let $h$ be its projection in $H$. Then the shift map $x \mapsto gx$ on $G$ (with the discrete $\sigma$-algebra and uniform probability measure) can be viewed as a finite extension of the shift map $y \mapsto hy$ on $H$ (again with the discrete $\sigma$-algebra and uniform probability measure), by arbitrarily selecting a section $\phi: H \to G$ that inverts the projection map, identifying $G$ with $H \times K$ by identifying $\phi(y) k$ with $(y, k)$ for $y \in H$, $k \in K$, and using the cocycle

$\rho(y) := \phi(hy)^{-1} g \phi(y).$

Thus, for instance, the unit shift $x \mapsto x + 1$ on $\mathbb{Z}/N\mathbb{Z}$ can be thought of as a finite extension of the unit shift on $\mathbb{Z}/M\mathbb{Z}$ whenever $N$ is a multiple of $M$.
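To make the cocycle concrete, here is a small sketch (the names `to_coords`, `rho`, `S` are mine) of the simplest instance of the last remark: the unit shift on $\mathbb{Z}/6\mathbb{Z}$ viewed as a finite extension of the unit shift on $\mathbb{Z}/3\mathbb{Z}$ with fibre $\mathbb{Z}/2\mathbb{Z}$, using the obvious section $s(y) = y$.

```python
def to_coords(z):
    """Identify z in Z/6 with (base point y in Z/3, fibre point k in Z/2): z = s(y) + 3k."""
    return (z % 3, z // 3)

def rho(y):
    """Cocycle: the fibre is permuted (here: flipped) exactly when the base wraps around."""
    return (lambda k: (k + 1) % 2) if y == 2 else (lambda k: k)

def S(point):
    """Shift of the extension: S(y, k) = (y + 1 mod 3, rho(y) k)."""
    y, k = point
    return ((y + 1) % 3, rho(y)(k))

# the extension shift reproduces the unit shift z -> z + 1 on Z/6
assert all(S(to_coords(z)) == to_coords((z + 1) % 6) for z in range(6))
```

The orbit of $(0, 0)$ visits all six points of the product before returning, exactly as the orbit of $0$ in $\mathbb{Z}/6\mathbb{Z}$ does.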

Another example comes from Riemannian geometry. If is a Riemannian manifold that is a finite cover of another Riemannian manifold (with the metric on being the pullback of that on ), then (unit time) geodesic flow on the cosphere bundle of is a finite extension of the corresponding flow on .

Here, then, is the finite extension special case of the compact extension step in the proof of the multiple recurrence theorem:

Proposition 2 (Finite extensions) Let $X \times_\rho Y$ be a finite extension of a measure-preserving system $X$. If $X$ obeys the conclusion of the Furstenberg multiple recurrence theorem, then so does $X \times_\rho Y$.

Before we prove this proposition, let us first give the combinatorial analogue.

Lemma 3 Let $A$ be a subset of the integers that contains arbitrarily long arithmetic progressions, and let $c: A \to \{1, \ldots, M\}$ be a colouring of $A$ by $M$ colours (or equivalently, a partition of $A$ into $M$ colour classes $A_1, \ldots, A_M$). Then at least one of the $A_m$ contains arbitrarily long arithmetic progressions.

*Proof:* By the infinite pigeonhole principle, it suffices to show that for each $k$, one of the colour classes $A_1, \ldots, A_M$ contains an arithmetic progression of length $k$.

Let $N$ be a large integer (depending on $k$ and $M$) to be chosen later. Then $A$ contains an arithmetic progression of length $N$, which may be identified with $\{1, \ldots, N\}$. The colouring of $A$ then induces a colouring on $\{1, \ldots, N\}$ into $M$ colour classes. Applying (the finitary form of) van der Waerden’s theorem, we conclude that if $N$ is sufficiently large depending on $k$ and $M$, then one of these colour classes contains an arithmetic progression of length $k$; undoing the identification, we conclude that one of the $A_m$ contains an arithmetic progression of length $k$, as desired.

Of course, by specialising to the case $A = \mathbb{Z}$, we see that the above Lemma is in fact equivalent to van der Waerden’s theorem.

Now we prove Proposition 2.

*Proof:* Fix $k$. Let $E$ be a positive measure subset of $X \times_\rho Y$. By Fubini’s theorem, we have

$(\mu \times \nu)(E) = \int_X \nu(E_x)\, d\mu(x),$

where $\nu$ is the uniform probability measure on $Y$ and $E_x := \{ y \in Y : (x, y) \in E \}$ is the fibre of $E$ at $x$. Since $(\mu \times \nu)(E)$ is positive, we conclude that the set

$F := \{ x \in X : E_x \neq \emptyset \}$

is a positive measure subset of $X$. Note that for each $x \in F$, we can find an element $f(x) \in Y$ such that $(x, f(x)) \in E$. While not strictly necessary for this argument, one can ensure if one wishes that the function $f$ is measurable by totally ordering $Y$, and then letting $f(x)$ be the minimal element of $Y$ for which $(x, f(x)) \in E$.

Let $N$ be a large integer (which will depend on $k$ and the cardinality of $Y$) to be chosen later. Because $X$ obeys the multiple recurrence theorem, we can find a positive integer $n$ and $x \in X$ such that

$x, T^n x, T^{2n} x, \ldots, T^{(N-1)n} x \in F. \qquad (1)$

Now, for each $j = 0, \ldots, N-1$, consider the set

$Y_j := \{ y \in Y : S^{jn}(x, y) \in E \}.$

From (1), each $Y_j$ is non-empty, since $S^{jn}(x, \cdot)$ is a bijection from $Y$ onto the fibre of $X \times Y$ over $T^{jn} x$, and this fibre meets $E$ because $T^{jn} x \in F$. The assignment $j \mapsto Y_j$ can be viewed as a colouring of $\{0, \ldots, N-1\}$ by at most $2^M$ colours, where $M$ is the cardinality of $Y$. Applying van der Waerden’s theorem, we conclude (if $N$ is sufficiently large depending on $k$ and $M$) that there is an arithmetic progression $j, j+a, \ldots, j+(k-1)a$ in $\{0, \ldots, N-1\}$ with $a$ a positive integer such that

$Y_j = Y_{j+a} = \cdots = Y_{j+(k-1)a} = R$

for some non-empty $R \subseteq Y$. If we then pick $y_0 \in R$ and let $(x', y') := S^{jn}(x, y_0)$, we see from the definition of the $Y_{j+ia}$ that

$S^{ian}(x', y') = S^{(j+ia)n}(x, y_0) \in E$

for all $i = 0, 1, \ldots, k-1$, and the claim follows.

Remark 1 The precise connection between Lemma 3 and Proposition 2 arises from the following observation: with $E$, $F$ and $f$ as in the proof of Proposition 2, and $x \in F$, the set

$A := \{ n \in \mathbb{N} : T^n x \in F \}$

can be partitioned into the classes

$A_y := \{ n \in A : S^n(x, f(x)) \in X \times \{y\} \}, \qquad y \in Y,$

where the orbit starts at the point $(x, f(x))$ on the graph $\Gamma := \{ (x', f(x')) : x' \in F \}$ of $f$. The multiple recurrence property for $X$ ensures that $A$ contains arbitrarily long arithmetic progressions, and so therefore one of the $A_y$ must also, which gives the multiple recurrence property for $X \times_\rho Y$.


Remark 2 Compact extensions can be viewed as a generalisation of finite extensions, in which the fibres are no longer finite sets, but are themselves measure spaces obeying an additional property, which roughly speaking asserts that for many functions $f$ on the extension, the shifts $T^n f$ of $f$ behave in an almost periodic fashion on most fibres, so that the orbits become totally bounded on each fibre. This total boundedness allows one to obtain an analogue of the above colouring map to which van der Waerden’s theorem can be applied.

** — 1. Introduction: notions of hypercyclicity — **

First of all, I will review some basic notions from linear dynamics that will be quite central throughout the exposition. I refer the reader to the excellent book of Bayart and Matheron (Bayart and Matheron, 2009), where most of this material is drawn from anyway. We will state several classical results here, omitting the proofs. If no other reference is given, this means the proof can be found in (Bayart and Matheron, 2009).

** — 1.1. Hypercyclic operators — **

We will work on *a separable Banach space* $X$ over $\mathbb{R}$ or $\mathbb{C}$. We will always use the symbol $T$ to denote a *bounded linear operator* acting on $X$. In what follows I will just write $X$, $T$, without any further comment, assuming always that these symbols have the meaning described above.

The most central notion in linear dynamics is that of hypercyclicity.

Definition 1 The *orbit* of a vector $x \in X$ under $T$ (or the $T$-orbit of $x$) is the set

$O(x, T) := \{ T^n x : n = 0, 1, 2, \ldots \}.$

The operator $T$ is said to be *hypercyclic* if there is some vector $x \in X$ such that the set $O(x, T)$ is dense in $X$. Such a vector will be called a *hypercyclic vector for $T$* (or a $T$-hypercyclic vector).

Some remarks are in order. First of all let us point out that these definitions only make sense if the space $X$ is *separable*. On the other hand, hypercyclicity is an infinite-dimensional phenomenon; there are no hypercyclic operators on a finite-dimensional space. To see this quickly, think of a square matrix in its Jordan normal form.

An easy consequence of these definitions is that whenever an operator $T$ is hypercyclic, we must have $\|T\| > 1$. Moreover, whenever $T$ is an invertible operator, $T$ is hypercyclic if and only if $T^{-1}$ is hypercyclic. These facts will be used in the discussion below.

The definition of hypercyclicity does not require any linear structure. It makes sense for an arbitrary *continuous* map $T: X \to X$ acting on a topological space $X$.

The most general setup for *linear dynamics* is that of an arbitrary separable topological vector space $X$. We will stick however to the case of a Banach space to simplify the exposition, the generalizations being mostly of a technical nature.

The notion of hypercyclicity is strictly stronger (though related) than that of *cyclicity*. Recall from classical operator theory that an operator $T$ is called *cyclic* if there exists a vector $x \in X$ (a *cyclic vector for $T$*) such that the linear span of

$\{ x, Tx, T^2 x, \ldots \}$

is dense in $X$. This notion is related to the *invariant subspace problem*; the operator $T$ lacks (non-trivial) invariant closed subspaces if and only if every non-zero vector is cyclic for $T$.

Likewise, the notion of hypercyclicity is closely related to the *invariant subset problem*. It is an easy observation that an operator $T$ lacks non-trivial invariant closed subsets if and only if every non-zero vector is hypercyclic for $T$. P. Enflo first answered the invariant subspace question in the negative by constructing a rather peculiar Banach space. After that, C. J. Read proved that there is an operator on $\ell^1$ for which every non-zero vector is hypercyclic. So the invariant subspace problem has a negative solution on $\ell^1$. However the problem remains open in the case of Hilbert spaces.

** — 1.2. Universal sequences of operators — **

We will be interested in the following generalization of hypercyclicity to *families* of continuous maps $(T_n)_{n \in \mathbb{N}}$, where each $T_n : X \to Y$ and $X$, $Y$ are two topological spaces.

Definition 2 The family $(T_n)_{n \in \mathbb{N}}$ is called *universal* if there exists an $x \in X$ such that the set $\{ T_n x : n \in \mathbb{N} \}$ is dense in $Y$.

Of course hypercyclicity is a special case of universality, where the family of operators is defined as the *iterates* $T_n := T^n$ of a fixed operator $T$ and $Y = X$ is a topological vector space.

** — 1.3. Cesàro Hypercyclicity — **

In (León-Saavedra, 2002), F. León-Saavedra introduced the notion of *Cesàro hypercyclicity*.

Definition 3 An operator $T$ is called *Cesàro hypercyclic* if its *Cesàro orbit*, that is the set

$\left\{ \frac{x + Tx + \cdots + T^n x}{n+1} : n = 0, 1, 2, \ldots \right\},$

is dense in $X$ for some vector $x \in X$. Such a vector will be called *Cesàro hypercyclic* for $T$.

León-Saavedra showed in (León-Saavedra, 2002) that $T$ is Cesàro hypercyclic if and only if there is a vector $x \in X$ such that the set

$\left\{ \frac{T^n x}{n} : n = 1, 2, \ldots \right\}$

is dense in $X$. Observe that this means that the family of operators $(T^n / n)_{n \in \mathbb{N}}$ is universal. We stress here that, in general, the notions of hypercyclicity and Cesàro hypercyclicity are not `ordered'; hypercyclicity does not imply Cesàro hypercyclicity and vice versa.

** — 1.4. How to prove that an operator is hypercyclic — **

This first characterization of hypercyclicity comes from topological dynamics and is often referred to as `Birkhoff’s transitivity theorem’.

Theorem 4 (Birkhoff’s transitivity theorem) Let $T$ be a continuous linear operator on a separable Banach space $X$. Then $T$ is hypercyclic if and only if it is *topologically transitive*; that is, for every pair of non-empty open sets $U, V \subseteq X$, there exists $n \in \mathbb{N}$ such that $T^n(U) \cap V \neq \emptyset$.

A byproduct of the proof of Theorem 4 is that the set of $T$-hypercyclic vectors, $HC(T)$, is a dense subset of $X$.

Actually Birkhoff’s theorem is true in a much more general context but I won’t pursue that here. It is important however that no linearity is necessary in Theorem 4. As a result, when one adds linearity, the following handy criterion becomes available.

Definition 5 (Hypercyclicity criterion) Let $X$ be a separable Banach space and $T$ a bounded linear operator. We say that $T$ satisfies the *hypercyclicity criterion* if there exists an increasing sequence of positive integers $(n_k)$, two dense sets $D_1, D_2 \subseteq X$ and a sequence of maps $S_{n_k} : D_2 \to X$ such that:

(i) $T^{n_k} x \to 0$ for any $x \in D_1$,

(ii) $S_{n_k} y \to 0$ for any $y \in D_2$,

(iii) $T^{n_k} S_{n_k} y \to y$ for any $y \in D_2$.

Using Theorem 4 one can prove the following:

Theorem 6 Let $T$ be a continuous linear operator on a separable Banach space $X$. Suppose that $T$ satisfies the hypercyclicity criterion 5. Then $T$ is hypercyclic.

Definition 5 and Theorem 6 are originally due to Kitai (Kitai, 1982), in the case that $n_k = k$ and $D_1 = D_2$. The criterion was then refined by R. Gethner and J. H. Shapiro in (Gethner and Shapiro, 1987) and J. Bès (Bès, 1998).

It was a long-standing question whether *every* hypercyclic operator satisfies the hypercyclicity criterion. It is not hard to show (and it was known) that the hypercyclicity criterion is equivalent to the operator $T \oplus T$ being hypercyclic; in topological dynamics this property is referred to as being *weakly mixing*. The problem was resolved in the negative by M. De La Rosa and C. J. Read in (de la Rosa and Read, 2009), and later in (Bayart and Matheron, 2007) for all classical Banach spaces.

A consequence of the hypercyclicity criterion 5 and Theorem 6 is the following result, which highlights the connection between linear dynamics and spectral theory. Roughly speaking, the following *Godefroy-Shapiro criterion* states that an operator which has a `large supply’ of eigenvectors is hypercyclic. See (Godefroy and Shapiro, 1991).

Theorem 7 (Godefroy-Shapiro criterion) Let $T$ be a continuous linear operator on a separable Banach space $X$. Suppose that the eigenvector sets $\bigcup_{|\lambda| > 1} \ker(T - \lambda)$ and $\bigcup_{|\lambda| < 1} \ker(T - \lambda)$ both span a dense subspace of $X$. Then $T$ is hypercyclic.

** — 1.5. Examples of hypercyclic operators — **

We will now use the previous hypercyclicity criteria to show that some very natural operators are hypercyclic. We will also take the chance to define some classes of operators which I want to discuss later on, in relation to our main theorem.

Example 1 Let $H(\mathbb{C})$ denote the space of all entire functions on $\mathbb{C}$ endowed with the topology of uniform convergence on compact sets. Now $H(\mathbb{C})$ is not a Banach space but it is a separable Fréchet space so all the notions and theorems discussed above go through. We consider the *derivative operator* $Df := f'$, which is hypercyclic. To see this, apply the hypercyclicity criterion with $n_k := k$ and

$D_1 = D_2 := \{ \text{polynomials} \}.$

Now the operator $S$ in the hypercyclicity criterion needs to be defined as a sort of (asymptotic) right inverse of the derivative operator so it is natural to define $S z^n := \frac{z^{n+1}}{n+1}$ and $S_k := S^k$. Then we have that $D^k p = 0$ for $k$ large enough, for every polynomial $p$, so that takes care of

(i) in the hypercyclicity criterion. Condition (iii) is trivial to verify since $DS = I$ on $D_2$. Finally, in order to check the validity of condition (ii) in the hypercyclicity criterion we need to see that $S^k z^m \to 0$ as $k \to \infty$ for every positive integer $m$. However, we readily see that

$S^k z^m = \frac{m!}{(m+k)!}\, z^{m+k},$

from which we easily conclude that $S^k z^m \to 0$ uniformly on compact subsets of $\mathbb{C}$.
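The three conditions can be checked numerically on coefficient lists. The sketch below (the helper names `D`, `S`, `iterate` are mine) represents a polynomial $a_0 + a_1 z + a_2 z^2 + \cdots$ by the list of its coefficients:

```python
def D(p):
    """Derivative of the polynomial with coefficient list p = [a0, a1, a2, ...]."""
    return [k * a for k, a in enumerate(p)][1:] or [0]

def S(p):
    """Antiderivative with zero constant term: S(z^n) = z^(n+1) / (n+1)."""
    return [0.0] + [a / (k + 1) for k, a in enumerate(p)]

def iterate(f, p, k):
    for _ in range(k):
        p = f(p)
    return p

p = [1.0, 2.0, 3.0]                              # 1 + 2z + 3z^2
assert iterate(D, p, 3) == [0]                   # (i): D^k p is eventually 0
assert max(map(abs, iterate(S, p, 10))) < 1e-5   # (ii): coefficients of S^k p decay factorially
assert D(S(p)) == p                              # (iii): D S = I on polynomials
```

The factorial decay in (ii) is what forces $S^k z^m \to 0$ uniformly on compact sets.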

Example 2 Let us now consider the Hilbert space $\ell^2(\mathbb{N})$. The *backward shift operator* $B$ is defined by $B(x_1, x_2, x_3, \ldots) := (x_2, x_3, \ldots)$. Observe that this operator can never be hypercyclic since $\|Bx\| \le \|x\|$, so the orbit of any vector $x$ under $B$ stays inside the ball of radius $\|x\|$. However, the operator $\lambda B$ is hypercyclic for every $\lambda \in \mathbb{C}$ with $|\lambda| > 1$. Again it is an easy exercise to check the validity of the hypercyclicity criterion with $n_k := k$ and $D_1 = D_2 := c_{00}$, where $c_{00}$ is the space of all finitely supported sequences. Again $S_k = S^k$, where $S := \lambda^{-1} F$ is the natural candidate, a right inverse of $\lambda B$, built from the *forward shift* operator defined as $F(x_1, x_2, \ldots) := (0, x_1, x_2, \ldots)$.
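On finitely supported sequences, represented as truncated lists, the criterion identities for the concrete case $\lambda = 2$ can be verified directly; `B`, `F`, `T = 2B` and `S = F/2` follow the text, and the rest of the sketch is mine:

```python
def B(x):
    """Backward shift: (x1, x2, x3, ...) -> (x2, x3, ...)."""
    return x[1:]

def F(x):
    """Forward shift: (x1, x2, ...) -> (0, x1, x2, ...)."""
    return [0.0] + x

def T(x):
    """T = 2B."""
    return [2.0 * v for v in B(x)]

def S(x):
    """S = F / 2, a right inverse of T."""
    return [v / 2.0 for v in F(x)]

def iterate(f, x, k):
    for _ in range(k):
        x = f(x)
    return x

y = [1.0, 2.0, 3.0]
assert iterate(T, y, 3) == []                               # (i): T^k x = 0 eventually on c00
assert max(map(abs, iterate(S, y, 5))) == max(y) / 2 ** 5   # (ii): S^k y -> 0 geometrically
assert iterate(T, iterate(S, y, 5), 5) == y                 # (iii): T^k S^k y = y exactly
```

The same computation with `T = B` (no factor 2) makes (ii) fail to help, which is why the backward shift itself is not hypercyclic.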

Our last example on the one hand illustrates the Godefroy-Shapiro criterion and on the other hand gives an introduction to a class of operators I would like to consider later on in the discussion.

Example 3 Here we consider a Hilbert space of analytic functions , where is the open unit disk of the complex plane. The space is quite general, but we require the following two conditions:

- , and
- for every , the point evaluation functionals are bounded.
The second condition ensures that convergence in implies pointwise convergence on . By the boundedness of holomorphic functions on compact sets and the uniform boundedness principle, the second condition amounts to requiring that convergence in implies uniform convergence on compact subsets of . The reader is thus encouraged to think of the Hardy space or the Bergman space in the place of , keeping in mind however that interesting phenomena occur outside these two particular cases.

A feature of that we will use is the existence of a *reproducing kernel*. In particular, for each , the boundedness of the point evaluation functionals and the Riesz representation theorem provide a unique function , the *reproducing kernel* of at , such that

Recall that a function is called a *multiplier* of if for every . Such a defines a *multiplication operator* in terms of the formula

By the boundedness of point evaluation functionals and the closed graph theorem it follows that is a bounded linear operator on . Moreover, every multiplier is a bounded holomorphic function; that is,

Observe that for every and every we have that

Remembering that there is at least one which is not identically , we conclude that . Thus every multiplier is a bounded holomorphic function with . The converse is not always true under our assumptions, as can be seen by considering for example the Dirichlet space of holomorphic functions on , that is, the space of all functions such that

Here denotes area measure. In the Dirichlet space not every bounded holomorphic function is a multiplier.

In general it is not difficult to see that a multiplication operator is *never* hypercyclic. The situation is quite different for the *adjoints of multiplication operators*. In order to make the statement of the following theorem clearer we require the extra assumption that *every* bounded holomorphic function is a multiplier of such that . This extra assumption is automatically satisfied in the case of the Hardy space or the Bergman space, but not in the Dirichlet space. The following theorem is from (Godefroy and Shapiro, 1991).

Theorem 8 (Godefroy, Shapiro) Assume that is a Hilbert space of holomorphic functions as above. Furthermore assume that every bounded holomorphic function is a multiplier of such that . Then the adjoint multiplication operator is hypercyclic if and only if is non-constant and .

*Proof:* We first prove that if then is hypercyclic. For we consider the reproducing kernel . Since for every , we conclude that for every . That is, for every , is an eigenvector of with corresponding eigenvalue . Now let and . Since is non-constant and , we have that both are non-empty open sets (by the open mapping theorem for analytic functions, is an open set). By the Godefroy-Shapiro criterion, in order to show that is hypercyclic it suffices to show that and both span a dense subset of . Indeed, assume that there exists a function which is orthogonal to all , either for all or for all . In either case vanishes on a non-empty open set and thus is identically zero.

In order to prove the other direction first observe that whenever is hypercyclic, is non-constant. Moreover we have that is connected so it either lies entirely inside, or entirely outside the unit disk. In the first case we have that , thus cannot be hypercyclic. In the complementary case, the function is a bounded holomorphic function and . By the first case, is not hypercyclic, and since , neither is .

Example 4 We finish this short list of examples by giving another typical class of hypercyclic operators, namely unilateral and bilateral weighted shifts. Let be the Hilbert space of square summable sequences . Consider the canonical basis of and let be a (bounded) sequence of positive numbers. The operator is a *unilateral (backward) weighted shift* with weight sequence if for every and . Similarly, let be the Hilbert space of square summable sequences endowed with the usual norm, that is, if . Let be a (bounded) sequence of positive numbers. The operator is a *bilateral (backward) weighted shift* with weight sequence if for every . Here is the canonical basis of .

Theorem 9 Let be defined as above, with weight sequences respectively.

(i) is hypercyclic if and only if

(ii) is hypercyclic if and only if, for any

and
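The displayed conditions in Theorem 9 did not survive formatting. For orientation, here is the shape of these characterizations as I recall them from Salas' work on hypercyclic weighted shifts; this is a reconstruction and should be checked against the original statement:

```latex
% Unilateral backward weighted shift B_w on \ell^2(\mathbb{N}):
B_w \ \text{is hypercyclic} \iff \sup_{n \ge 1} \prod_{i=1}^{n} w_i = \infty .
% Bilateral backward weighted shift T on \ell^2(\mathbb{Z}): hypercyclic iff
% for every q \ge 0 there is an increasing sequence (n_k) of positive integers with
\prod_{i=1}^{n_k} w_{q+i} \to \infty
\qquad \text{and} \qquad
\prod_{i=0}^{n_k - 1} w_{q-i} \to 0 .
```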

** — 2. Recurrence, multiple recurrence and hypercyclicity — **

Let us consider a bounded linear operator on a separable Banach space . We have already seen that saying that an operator is *hypercyclic* is equivalent to saying that it is topologically transitive, that is, for every pair of non-empty open sets , there is some positive integer such that . In what follows I will introduce some notions that come from topological dynamical systems.

** — 2.1. Recurrence and Multiple recurrence — **

A somewhat weaker notion in topological dynamics is that of *recurrence*.

Definition 10 The operator is called *recurrent* if for every open set there is a such that .

Clearly every hypercyclic operator is recurrent. Unlike hypercyclicity which is a purely infinite dimensional phenomenon, there are recurrent operators in finite dimensions (consider for example a rotation on the plane).
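The finite-dimensional example can be made concrete. A minimal numerical sketch (my own illustration, not from the text): under a plane rotation every orbit keeps returning near its starting point, so the rotation is recurrent even though it is certainly not hypercyclic.

```python
import math

def rotate(p, theta):
    """Rotate a point of the plane by angle theta about the origin."""
    x, y = p
    return (math.cos(theta) * x - math.sin(theta) * y,
            math.sin(theta) * x + math.cos(theta) * y)

start = (1.0, 0.0)

# Rational rotation: the orbit is periodic, so every point is recurrent.
p = start
for _ in range(5):
    p = rotate(p, 2 * math.pi / 5)
period_gap = math.hypot(p[0] - start[0], p[1] - start[1])
assert period_gap < 1e-12

# Irrational rotation: the orbit is never periodic, but by pigeonhole it
# still returns arbitrarily close to the start (here within 100 steps).
p = start
closest = float("inf")
for _ in range(100):
    p = rotate(p, 1.0)  # angle 1 radian, incommensurable with 2*pi
    closest = min(closest, math.hypot(p[0] - start[0], p[1] - start[1]))
assert closest < 0.1
```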

A recurrent operator has many points whose orbit under asymptotically `returns’ to the point. To make this more precise, let us call a vector *recurrent vector for * if there exists an increasing sequence of positive integers such that as . It turns out that a recurrent operator has a dense set of recurrent vectors.

Proposition 11 An operator is recurrent if and only if the set of recurrent vectors for is dense in . In this case the set of recurrent vectors for is a subset of .

*Proof:* Let us first prove the easy implication. That is, we assume that has a dense set of recurrent points, and let be an open set in . Since the recurrent points of are dense, there is a which is recurrent for . Take such that . Since is recurrent, there is a such that . Thus . That is, we have that . Let us now assume that is recurrent. We fix an open ball for some and . We need to show that there is a recurrent vector in . Since is recurrent there exists a positive integer such that , for some . That is, we have that and . Since is continuous, there exists such that and . Now since is recurrent, there is a such that for some . By continuity again there is an such that and . Continuing inductively we construct a sequence , a strictly increasing sequence of positive integers and a sequence of positive real numbers , such that

Since is complete we conclude by Cantor’s theorem that

for some . We also have that , for all . Thus we have that for every , which means that in . That is, is a recurrent point in the original ball .

Finally, let us write for the set of -recurrent vectors. Observe that

which shows that the set of -recurrent vectors is a -set.
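The displayed identity was lost in formatting; presumably it is the standard one (my reconstruction, to be checked against the original):

```latex
\operatorname{Rec}(T)
  \;=\; \bigcap_{j \ge 1} \, \bigcup_{n \ge 1}
        \bigl\{\, x \in X : \|T^{n} x - x\| < \tfrac{1}{j} \,\bigr\}.
```

Each set in the inner union is open by continuity, so the set of recurrent vectors is a countable intersection of open sets, i.e. a G-delta set.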

After (simple) recurrence, let’s now consider multiple recurrence. An operator is called *topologically multiply recurrent* if for every non-empty open set and every there is a such that

Of course a hypercyclic operator is always recurrent. However, there is no reason why a hypercyclic operator should be topologically multiply recurrent in general. This is illustrated in the following proposition.

Proposition 12 (Costakis and Parissis, 2010) There exists a hypercyclic bilateral weighted shift on which is not topologically multiply recurrent.

** — 2.2. Frequent hypercyclicity and Szemerédi’s theorem — **

Recently, Bayart and Grivaux introduced in (Bayart and Grivaux, 2005) and (Bayart and Grivaux, 2006) a notion that examines how frequently the orbit of a hypercyclic operator visits a non-empty open set.

Definition 13 An operator is called *frequently hypercyclic* if there exists a vector such that, for every non-empty open set , the set has positive lower density.

This is the strongest form of this definition, using the `weakest’ density. There are variations where the lower density is replaced for example by the upper density. Recall that the lower density of a set is defined as

while the upper density of is
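These densities are easy to experiment with numerically. A small sketch (my own illustration; the function names are mine): compute the truncated ratios whose liminf and limsup give the lower and upper density.

```python
def counting(A, N):
    """Number of elements of A that are <= N (A given as a predicate)."""
    return sum(1 for n in range(1, N + 1) if A(n))

def density_along(A, Ns):
    """Values |A n [1, N]| / N for N in Ns; the lower and upper density
    are the liminf and limsup of this sequence as N -> infinity."""
    return [counting(A, N) / N for N in Ns]

# The even numbers have density 1/2 (lower density = upper density).
evens = lambda n: n % 2 == 0
vals = density_along(evens, [10, 100, 1000])
assert all(abs(v - 0.5) < 0.05 for v in vals)

# The perfect squares have density 0: the ratios tend to 0, so such a
# sparse set cannot witness frequent hypercyclicity.
squares = lambda n: int(n ** 0.5) ** 2 == n
vals = density_along(squares, [100, 10000])
assert vals[1] < vals[0] <= 0.11
```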

In (Bayart and Grivaux, 2006) a `frequent hypercyclicity criterion’ was established. We won’t describe this here but point out one of its applications. Going back to adjoints of multiplication operators, an application of the Bayart-Grivaux frequent hypercyclicity criterion yields the following result:

Example 5 Recall that is a non-trivial Hilbert space of holomorphic functions with bounded point evaluation functionals. We consider multiplication operators with symbol . We have the following result, which is a corollary of the Bayart-Grivaux criterion:

Proposition 14 (Bayart, Grivaux) Assume that is a Hilbert space of holomorphic functions as above. Furthermore assume that every bounded holomorphic function is a multiplier of such that . The following are equivalent:

(i) The adjoint multiplication operator is hypercyclic.

(ii) The adjoint multiplication operator is frequently hypercyclic.

(iii) The function is non-constant and .

The notion of frequent hypercyclicity seems to be the right one in relation to topological multiple recurrence. In order to illustrate this connection we need Szemerédi's theorem on arithmetic progressions.

Theorem 15 (Szemerédi) Let be a subset of with positive upper density. Then contains arbitrarily long arithmetic progressions.

The following proposition is just an easy application of Szemerédi’s theorem:

Proposition 16 Let be a frequently hypercyclic operator. Then is topologically multiply recurrent.

*Proof:* Let be an open set and let . Since is frequently hypercyclic, there exists a such that the set

has positive lower density. By Szemerédi's theorem, contains an arithmetic progression of length , that is, we have that

This means that

that is, is topologically multiply recurrent.
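The mechanism of this proof can be mimicked on a toy example (my own illustration, with hypothetical helper names). Given a set of positive density, a brute-force search finds the arithmetic progression that Szemerédi's theorem guarantees; in the proof, applying the operator along such a progression inside the return-time set produces the multiple returns to the open set.

```python
def find_ap(S, k, max_start=1000, max_step=100):
    """Brute-force search for an arithmetic progression of length k in S:
    returns (n, m) with n, n+m, ..., n+(k-1)m all in S, or None."""
    members = set(S)
    for m in range(1, max_step + 1):
        for n in range(1, max_start + 1):
            if all(n + i * m in members for i in range(k)):
                return n, m
    return None

# A stand-in for the return-time set N(x, U): positive density guarantees
# (by Szemerédi) progressions of every length; here we find one directly.
S = [n for n in range(1, 2000) if n % 3 == 0]  # density 1/3
n, m = find_ap(S, 7)
assert m > 0 and all(n + i * m in set(S) for i in range(7))
```

Of course Szemerédi's theorem is needed precisely because the return-time set is only known to have positive lower density, not any arithmetic structure; the brute-force search above only illustrates what the theorem delivers.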

** — 2.3. Frequently Cesàro hypercyclic operators — **

As we have seen earlier, an operator is Cesàro hypercyclic if and only if there exists a such that the set

is dense in . By analogy with frequent hypercyclicity, Costakis and Ruzsa introduced in (Costakis and Ruzsa, 2010) the notion of a *frequently Cesàro hypercyclic* operator in the obvious way.

Definition 17 An operator is called *frequently Cesàro hypercyclic* if there is a vector such that, for every open set , the set has positive lower density.

In contrast with Cesàro hypercyclic operators, frequently Cesàro hypercyclic operators are always hypercyclic:

Theorem 18 (Costakis and Ruzsa, 2010) Let be a frequently Cesàro hypercyclic operator. Then is hypercyclic.

As in the case of frequently hypercyclic operators, frequently Cesàro hypercyclic operators are always topologically multiply recurrent. However, this is not so obvious any more.

Theorem 19 (Costakis and Parissis, 2010) Let be a frequently Cesàro hypercyclic operator. Then is topologically multiply recurrent.

The hypothesis of the previous theorem is optimal in the sense that a Cesàro hypercyclic operator is not in general topologically multiply recurrent.

Proposition 20 (Costakis and Parissis, 2010) There exists a Cesàro hypercyclic bilateral weighted shift on which is not recurrent, and hence not topologically multiply recurrent.

Before giving the actual proof of Theorem 19, let us try to repeat the simple argument used in the proof of Proposition 16. We begin by fixing a positive integer and an open set . We will assume that is a ball, say . We need to show that there exists some vector with

or, in other words, that there is a such that

By the hypothesis and Szemerédi’s theorem there is a vector and an arithmetic progression of length

such that

In this case it is not obvious what the natural candidate for the vector is, but let's take . We then have for

where we know that all the ‘s are in . We can then naively estimate

There are two problems here. The first is that we cannot control the factor . The second is that even if we could, say we had , this estimate would give us that which is one too large. The second problem is easy to deal with. We just start with a smaller ball inside our original set and carry out this reasoning for the smaller ball. In the proof given below we will consider two cases. In the first we will just assume that is small. In the complementary case, we will appropriately use the information that is large!

*Proof of Theorem 19:* Let be any non-empty open set in . We fix a non-zero vector and take a positive number such that . Without loss of generality we may assume that . Consider the ball with

Observe that . Since is a frequently Cesàro hypercyclic operator there exists such that the set

has positive lower density. By Szemerédi’s theorem the set contains an arithmetic progression of length , i.e. there exist positive integers such that

Therefore the vectors

belong to .

As promised, we will consider two cases depending on the values of the ratio of the step over the first term of the arithmetic progression provided by Szemerédi’s theorem:

**Case 1. .**

We define the vector as

Then we have

for every . Since

we conclude that

and therefore

as we wanted to show.

**Case 2. .**

Here we first need to specify a number such that

for every . Indeed, solving the above equation for we get

We now define the vector as

Then we have

that is . On the other hand,

for every . The last equality and the above estimates imply

for every . Let . Since

we conclude that

Therefore

This completes the proof of the theorem.

** — 3. Back to adjoints of multiplication operators. — **

We can now give a full characterization of frequent hypercyclicity and multiple recurrence in the case of adjoints of multiplication operators on a non-trivial Hilbert space of holomorphic functions. It turns out that the weaker property of being recurrent is equivalent to frequent hypercyclicity and thus to every other property we have discussed here.

Proposition 21 (Costakis and Parissis, 2010) Assume that is a Hilbert space of holomorphic functions as above. Furthermore assume that every bounded holomorphic function is a multiplier of such that . The following are equivalent:

(i) is recurrent.

(ii) The adjoint multiplication operator is hypercyclic.

(iii) The adjoint multiplication operator is frequently hypercyclic.

(iv) The adjoint multiplication operator is topologically multiply recurrent.

(v) The function is non-constant and .

*Proof:* We have already seen in Theorem 8 and Proposition 14 that conditions *(ii), (iii)* and *(v)* are equivalent. Also, by Proposition 16, *(iii)* implies *(iv)* and obviously *(iv)* implies *(i)*. So the proof will be complete if we show for example that *(i)* implies *(v)*.

Indeed, assume that is recurrent. Suppose, for the sake of contradiction, that . Since is connected, so is ; thus, we either have that or .

**Case 1. .**

Then we have . We will consider two complementary cases. Assume that there exist and a recurrent vector for such that

The above inequality and the fact that imply that for every positive integer

On the other hand for some strictly increasing sequence of positive integers we have . Using the last inequality we arrive at , a contradiction. In the complementary case we must have for every vector which is recurrent for . Since the set of recurrent vectors for is dense in we get that for every . Hence for every . Take now and consider the reproducing kernel of . We have already seen in the proof of Theorem 8 that where is the reproducing kernel at . We conclude that

However, this is clearly impossible since is an isometry.

**Case 2. .**

Here is a bounded holomorphic function satisfying ; therefore, is invertible. It is easy to see that if an operator is invertible, then is recurrent if and only if is recurrent. Thus the operator is recurrent and the proof follows from Case 1.

Remark 22 It is easy to see that under the hypotheses of Proposition 21, is never recurrent. On the other hand, suppose that is a constant function with for some and every . Then we have that (or equivalently ) is recurrent if and only if is topologically multiply recurrent if and only if . In order to prove this it is enough to notice that for every non-zero complex number with , and every positive integer , there exists an increasing sequence of positive integers such that

** — 4. Some open questions — **

I will close this post by suggesting a couple of open problems. For more information you can check the actual paper.

** — 4.1. Multipliers on the Dirichlet space. — **

First of all, let me come back to the adjoints of multiplication operators. Recall that the Dirichlet space is defined as the space of holomorphic functions such that

The reader might have noticed that throughout the discussion here, I have assumed that the multipliers of the Hilbert space are exactly the bounded holomorphic functions and that . Although this is actually the case on the Hardy space or the Bergman space , things are quite different on the Dirichlet space defined before. On the Dirichlet space, not all bounded holomorphic functions are multipliers. In fact the characterization of multipliers on the Dirichlet space is a bit more technical and is due to Stegenga (Stegenga 1980):

Theorem 23 (Stegenga) The function is a multiplier for the Dirichlet space if and only if and the measure is a Carleson measure for the Dirichlet space .

Of course this theorem doesn't tell us much if we don't know which measures are Carleson measures for the Dirichlet space. Here I will just give the definition, as the characterization of these measures is completely beyond the scope of this post.

Definition 24 A positive Borel measure on is a Carleson measure for the Dirichlet space if, for some positive constant , for every .

Due to the more involved characterization of the multipliers on the Dirichlet space, characterizing when adjoints of multiplication operators on are hypercyclic is an open question. It is however known that the condition is no longer necessary, though it is sufficient. An example is provided by the function on . On the other hand it is known that is necessary. For this, see for example the PhD thesis of Irina Seceleanu.

** — 4.2. Frequently universal sequences of operators. — **

Remember that a family of operators on is called *universal* if there exists a such that the set

is dense in . The following definition is the natural extension of frequent hypercyclicity to universal families:

Definition 25 The family of operators is called *frequently universal* if there exists a such that for every open set the set has positive lower density.

Thus saying that an operator is frequently Cesàro hypercyclic amounts to saying that the family is frequently universal. Theorem 19 says that if the family is frequently universal then is topologically multiply recurrent. However, there is nothing too special about the sequence . One can consider the family of operators where is an appropriate sequence of complex numbers.

Under what conditions on the sequence of complex numbers may one conclude that is topologically multiply recurrent from the hypothesis that the family is frequently universal?

** — 5. Bibliography — **

Bayart, Frédéric and Sophie Grivaux. 2005. *Hypercyclicity and unimodular point spectrum*, J. Funct. Anal. 226, no. 2, 281–300. MR2159459 (2006i:47014).

Bayart, Frédéric and Sophie Grivaux. 2006. *Frequently hypercyclic operators*, Trans. Amer. Math. Soc. 358, no. 11, 5083–5117 (electronic). MR2231886 (2007e:47013) .

Bayart, Frédéric and Étienne Matheron. 2009. *Dynamics of linear operators, Cambridge Tracts in Mathematics*, vol. 179, Cambridge University Press, Cambridge. MR2533318.

Bayart, Frédéric and Étienne Matheron. 2007. *Hypercyclic operators failing the hypercyclicity criterion on classical Banach spaces*, J. Funct. Anal. 250, no. 2, 426–441. MR2352487 (2008k:47016).

Bès, Juan P. 1998. *Three problems on hypercyclic operators*, Ph.D. thesis.

Costakis, George and Ioannis Parissis. 2010. *Szemerédi's theorem, frequent hypercyclicity and multiple recurrence*, available at http://arxiv.org/abs/1008.4017.

Costakis, George and Imre Z. Ruzsa. 2010. *Frequently Cesàro hypercyclic operators are hypercyclic*, preprint.

De la Rosa, Manuel and Charles Read. 2009. *A hypercyclic operator whose direct sum T ⊕ T is not hypercyclic*, J. Operator Theory 61, no. 2, 369–380. MR2501011 (2010e:47023).

Gethner, Robert M. and Joel H. Shapiro. 1987. *Universal vectors for operators on spaces of holomorphic functions*, Proc. Amer. Math. Soc. 100, no. 2, 281–288. MR884467 (88g:47060).

Godefroy, Gilles and Joel H. Shapiro. 1991. *Operators with dense, invariant, cyclic vector manifolds*, J. Funct. Anal. 98, no. 2, 229–269. MR1111569 (92d:47029).

Kitai, Carol. 1982. *Invariant closed sets for linear operators*, ProQuest LLC, Ann Arbor, MI. Thesis (Ph.D.)–University of Toronto (Canada). MR2632793.

León-Saavedra, Fernando. 2002. *Operators with hypercyclic Cesàro means*, Studia Math. 152, no. 3, 201–215. MR1916224 (2003f:47012).

Stegenga, David A. 1980. *Multipliers of the Dirichlet space*, Illinois J. Math. 24, no. 1, 113–139. MR550655 (81a:30027).

Such norms can be defined on any finite additive group (and also on some other types of domains, though we will not discuss this point here). In particular, they can be defined on the finite-dimensional vector spaces over a finite field .

In this case, the Gowers norms are closely tied to the space of polynomials of degree at most . Indeed, as noted in Exercise 20 of Notes 4, a function of norm has norm equal to if and only if for some ; thus polynomials solve the “ inverse problem” for the trivial inequality . They are also a crucial component of the solution to the “ inverse problem” and “ inverse problem”. For the former, we will soon show:

Proposition 1 ( inverse theorem for ) Let be such that and for some . Then there exists such that , where is a constant depending only on .

Thus, for the Gowers norm to be almost completely saturated, one must be very close to a polynomial. The converse assertion is easily established:

Exercise 1 (Converse to inverse theorem for ) If and for some , then , where is a constant depending only on .

In the world, one no longer expects to be close to a polynomial. Instead, one expects to *correlate* with a polynomial. Indeed, one has

Lemma 2 (Converse to the inverse theorem for ) If and are such that , where , then .

*Proof:* From the definition of the norm (equation (18) from Notes 3), the monotonicity of the Gowers norms (Exercise 19 of Notes 3), and the polynomial phase modulation invariance of the Gowers norms (Exercise 21 of Notes 3), one has

and the claim follows.

In the high characteristic case at least, this can be reversed:

Theorem 3 ( inverse theorem for ) Suppose that . If is such that and , then there exists such that .

This result is sometimes referred to as the *inverse conjecture for the Gowers norm* (in high, but bounded, characteristic). For small , the claim is easy:

Exercise 2 Verify the cases of this theorem. (Hint: to verify the case, use the Fourier-analytic identities and , where is the space of all homomorphisms from to , and are the Fourier coefficients of .)

This conjecture for larger values of is more difficult to establish. The case of the theorem was established by Ben Green and myself in the high characteristic case ; the low characteristic case was independently and simultaneously established by Samorodnitsky. The cases in the high characteristic case were established in two stages: first using a modification of the Furstenberg correspondence principle, due to Ziegler and myself, to convert the problem to an ergodic theory counterpart, and then using a modification of the methods of Host-Kra and Ziegler to solve that counterpart, as done in this paper of Bergelson, Ziegler, and myself.

The situation with the low characteristic case in general is still unclear. In the high characteristic case, we saw from Notes 4 that one could replace the space of non-classical polynomials in the above conjecture with the essentially equivalent space of classical polynomials . However, as we shall see below, this turns out not to be the case in certain low characteristic cases (a fact first observed by Lovett, Meshulam, and Samorodnitsky, and independently by Ben Green and myself), for instance if and ; this is ultimately due to the existence in those cases of non-classical polynomials which exhibit no significant correlation with classical polynomials of equal or lesser degree. This distinction between classical and non-classical polynomials appears to be a rather non-trivial obstruction to understanding the low characteristic setting; it may be necessary to obtain a more complete theory of non-classical polynomials in order to fully settle this issue.

The inverse conjecture has a number of consequences. For instance, it can be used to establish the analogue of Szemerédi’s theorem in this setting:

Theorem 4 (Szemerédi's theorem for finite fields) Let be a finite field, let , and let be such that . If is sufficiently large depending on , then contains an (affine) line for some with .

Exercise 3 Use Theorem 4 to establish the following generalisation: with the notation as above, if and is sufficiently large depending on , then contains an affine -dimensional subspace.

We will prove this theorem in two different ways, one using a density increment method, and the other using an energy increment method. We discuss some other applications below the fold.

** — 1. The inverse theorem — **

We now prove Proposition 1. Results of this type for general appear in this paper of Alon, Kaufman, Krivelevich, Litsyn, and Ron (see also this paper of Sudan, Trevisan, and Vadhan for a precursor result); the case was treated previously by Blum, Luby, and Rubinfeld. The argument here is due to Tamar Ziegler and myself. The argument has a certain “cohomological” flavour (comparing cocycles with coboundaries, determining when a closed form is exact, etc.). Indeed, the inverse theory can be viewed as a sort of “additive combinatorics cohomology”.

Let be as in the theorem. We let all implied constants depend on . We use the symbol to denote various positive constants depending only on . We may assume is sufficiently small depending on , as the claim is trivial otherwise.

The case is easy, so we assume inductively that and that the claim has been already proven for .

The first thing to do is to make unit magnitude. One easily verifies the crude bound

and thus

Since pointwise, we conclude that

As such, differs from a function of unit magnitude by in norm. By replacing with and using the triangle inequality for the Gowers norm (changing and worsening the constant in Proposition 1 if necessary), we may assume without loss of generality that throughout, thus for some .

Since

we see from Markov’s inequality that

for all in a subset of of density . Applying the inductive hypothesis, we see that for each such , we can find a polynomial such that

Now let . Using the cocycle identity

where is the shift operator , we see using Hölder’s inequality that

On the other hand, is a polynomial of order . Also, since is so dense, every element of has at least one representation of the form for some (indeed, out of all possible representations , or can fall outside of for at most of these representations). We conclude that for every there exists a polynomial such that

The new polynomial supersedes the old one ; to reflect this, we abuse notation and write for . Applying the cocycle equation again, we see that

for all . Applying the rigidity of polynomials (Exercise 14 from Notes 4), we conclude that

for some constant . From (2) we in fact have for all .

The expression is known as a *-coboundary* (see this blog post for more discussion). To eliminate it, we use the finite characteristic to discretise the problem as follows. First, we use the cocycle identity

where is the characteristic of the field. Using (1), we conclude that

On the other hand, takes values in some coset of a finite subgroup of (depending only on ), by Lemma 1 of Notes 4. We conclude that this coset must be a shift of by . Since itself takes values in some coset of a finite subgroup, we conclude that there is a finite subgroup (depending only on ) such that each takes values in a shift of by .

Next, we note that we have the freedom to shift each by (adjusting accordingly) without significantly affecting any of the properties already established. Doing so, we can thus ensure that all the take values in itself, which forces to do so also. But since , we conclude that for all , thus is a perfect cocycle:

We may thus integrate and write , where . Thus is a polynomial of degree for each , thus itself is a polynomial of degree . From (1) one has

for all ; averaging in we conclude that

and thus

and Proposition 1 follows.

One consequence of Proposition 1 is that the property of being a classical polynomial of a fixed degree is *locally testable*, which is a notion of interest in theoretical computer science. More precisely, suppose one is given a large finite vector space and two functions . One is told that one of the functions is a classical polynomial of degree at most , while the other is quite far from being such a classical polynomial, in the sense that every polynomial of degree at most will differ from that polynomial on at least of the values in . The task is then to decide with a high degree of confidence which of the functions is a polynomial and which one is not, without inspecting too many of the values of or .

This can be done as follows. Pick at random, and test whether the identities

and

hold; note that one only has to inspect at values in for this. If one of these identities fails, then that function must not be a polynomial, and so one has successfully decided which of the functions is the polynomial. We claim that the probability that the identity fails for the non-polynomial function is at least for some , and so if one iterates this test times, one will be able to successfully solve the problem with probability arbitrarily close to . To verify the claim, suppose for contradiction that the identity failed at most of the time for the non-polynomial function (say it is ); then , and thus by Proposition 1, is very close in norm to a polynomial; rounding that polynomial to a root of unity we thus see that agrees with high accuracy with a classical polynomial, which leads to a contradiction if is chosen suitably.
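The degree-1 instance of this test is the classical Blum-Luby-Rubinfeld linearity test. The following sketch is my own illustration over F_2^n (the helper names and the specific functions are mine, not from the text): a random pair test for the identity f(x+y) = f(x) + f(y) never rejects a linear function, but frequently rejects a function that is far from linear.

```python
import random

n = 8  # dimension of F_2^n; vectors are n-bit integers, addition is XOR

def linear(x):
    """A linear (classical degree-1) polynomial F_2^n -> F_2: a parity."""
    return bin(x & 0b10110001).count("1") % 2

def far_from_linear(x):
    """A majority-type function, far from every linear map."""
    return 1 if bin(x).count("1") > n // 2 else 0

def blr_rejects(f, trials=2000, seed=0):
    """Fraction of random pairs (x, y) violating f(x+y) = f(x) + f(y)."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        x = rng.randrange(1 << n)
        y = rng.randrange(1 << n)
        if f(x ^ y) != (f(x) + f(y)) % 2:
            bad += 1
    return bad / trials

# The linear function never fails the test; the far function fails often,
# so iterating the test separates the two with high confidence.
assert blr_rejects(linear) == 0.0
assert blr_rejects(far_from_linear) > 0.1
```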

** — 2. A partial counterexample in low characteristic — **

We now show a distinction between classical polynomials and non-classical polynomials that causes the inverse conjecture to fail in low characteristic if one insists on using classical polynomials. For simplicity we restrict attention to the characteristic two case . We will use an argument of Alon and Beigel, reproduced in this paper of Green and myself. A different argument (with stronger bounds) appears in this paper of Lovett, Meshulam, and Samorodnitsky.

We work in a standard vector space , with standard basis and coordinates . Among all the classical polynomials on this space are the *symmetric polynomials*

which play a special role.

Exercise 4 Let be the digit summation function . Show that

Establish Lucas' theorem

where , is the binary expansion of . Show that is the binary coefficient of , and conclude that is a function of .

(Note: These results are closely related to the well-known fact that Pascal's triangle modulo takes the form of an infinite Sierpinski gasket.)
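Lucas' theorem mod 2 is easy to verify by machine. A quick sketch (my own illustration): the binomial coefficient C(n, m) is odd exactly when every binary digit of m is dominated by the corresponding digit of n, i.e. when `m & ~n == 0`, and the odd entries of Pascal's triangle trace out the Sierpinski gasket.

```python
from math import comb

def binom_mod2(n, m):
    """Lucas' theorem mod 2: C(n, m) is odd iff the binary digits of m
    are dominated by those of n, i.e. m & ~n == 0."""
    return 1 if (m & ~n) == 0 else 0

# Agreement with the actual binomial coefficients.
for n in range(64):
    for m in range(n + 1):
        assert comb(n, m) % 2 == binom_mod2(n, m)

# Rows of Pascal's triangle mod 2 trace out a Sierpinski gasket.
for n in range(8):
    print("".join("#" if binom_mod2(n, m) else "." for m in range(n + 1)))
```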

We define an *affine coordinate subspace* to be a translate of a subspace of generated by some subset of the standard basis vectors . To put it another way, an affine coordinate subspace is created by freezing some of the coordinates, but letting the other coordinates be arbitrary.

Of course, not all classical polynomials come from symmetric polynomials. However, thanks to an application of Ramsey’s theorem observed by Alon and Beigel, this is true on coordinate subspaces:

Lemma 5 (Ramsey’s theorem for polynomials)Let be a polynomial of degree at most . Then one can partition into affine coordinate subspaces of dimension at least , where as for fixed , such that on each such subspace , is equal to a linear combination of the symmetric polynomials .

*Proof:* We induct on . The claim is trivial for , so suppose that and the claim has already been proven for smaller . The degree term of can be written as

where is a -uniform hypergraph on , i.e. a collection of -element subsets of . Applying Ramsey’s theorem (for hypergraphs), one can find a subcollection of indices with such that either has no edges in , or else contains all the edges in . We then foliate into the affine subspaces formed by translating the coordinate subspace generated by . By construction, we see that on each such subspace, is equal to either or plus a polynomial of degree . The claim then follows by applying the induction hypothesis (and noting that the linear span of on an affine coordinate subspace is equivariant with respect to translation of that subspace).

Because of this, if one wants to concoct a function which is almost orthogonal to all polynomials of degree at most , it will suffice to build a function which is almost orthogonal to the symmetric polynomials on all affine coordinate subspaces of moderately large size. Pursuing this idea, we are led to

Proposition 6 (Counterexample to classical inverse conjecture)Let , and let be the function , where is as in Exercise 4. Then is a non-classical polynomial of degree at most , and so ; but one hasuniformly for all classical polynomials of degree less than , where is bounded in magnitude by a quantity that goes to zero as for each fixed .

*Proof:* We first prove the polynomiality of . Let be the obvious map from to , thus

By linearity, it will suffice to show that each function is a polynomial of degree at most . But one easily verifies that for any , is equal to zero when and equal to when . Iterating this observation times, we obtain the claim.

Now let be a classical polynomial of degree less than . By Lemma 5, we can partition into affine coordinate subspaces of dimension at least such that is a linear combination of on each such subspace. By the pigeonhole principle, we thus can find such a such that

On the other hand, from Exercise 4, the function on depends only on . Now, as , the function (which is essentially the distribution function of a simple random walk of length on ) becomes equidistributed; in particular, for any , the function will take the values and with asymptotically equal frequency on , whilst remains unchanged. As such we see that as , and thus as , and the claim follows.
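One can sanity-check the proposition numerically in small dimension, assuming the counterexample function is F(x) = i^{x_1 + ... + x_n} (this specific formula is an assumption of the sketch, since the displayed formulas above were lost; it is consistent with the surrounding discussion of fourth roots of unity). Each correlation of F with a classical linear phase turns out to have magnitude exactly 2^{-n/2}:

```python
from itertools import product

# Brute-force check: the non-classical quadratic phase F(x) = i^{x_1+...+x_n}
# has correlation of magnitude exactly 2^{-n/2} with EVERY classical linear
# phase (-1)^{xi . x}, so it is asymptotically orthogonal to all classical
# polynomials of degree < 2.  (n = 8 is an arbitrary small choice.)
n = 8
points = list(product((0, 1), repeat=n))

def correlation(xi):
    total = sum((1j ** sum(x)) * (-1) ** sum(a * b for a, b in zip(xi, x))
                for x in points)
    return abs(total) / len(points)

worst = max(correlation(xi) for xi in product((0, 1), repeat=n))
print(worst, 2 ** (-n / 2))  # both equal 0.0625 for n = 8
```

The exact value 2^{-n/2} can also be seen directly: the correlation factorises over coordinates into terms (1 + i(-1)^{xi_j})/2, each of magnitude 1/sqrt(2).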

Exercise 5With the same setup as the previous proposition, show that , but that for all classical polynomials of degree less than .

** — 3. The inverse theorem: sketches of a proof — **

The proof of Theorem 3 is rather difficult once ; even the case is not particularly easy. However, the arguments still have the same cohomological flavour encountered in the theory. We will not give full proofs of this theorem here, but indicate some of the main ideas.

We begin by discussing (quite non-rigorously) the significantly simpler (but still non-trivial) case, established by Ben Green and myself. Unsurprisingly, we will take advantage of the case of the theorem as an induction hypothesis.

Let for some field of characteristic greater than , and be a function with and . We would like to show that correlates with a quadratic phase function (due to the characteristic hypothesis, we may take to be classical), in the sense that .

We expand as . By the pigeonhole principle, we conclude that

for “many” , where by “many” we mean “a proportion of ”. Applying the inverse theorem, we conclude that for many there exists a linear polynomial (which we may as well take to be classical) such that

This should be compared with the theory. There, we were able to force close to for most ; here, we only have the weaker statement that *correlates* with for *many* (not *most*) . Still, we will keep going. In the theory, we were able to assume had magnitude , which made the cocycle equation available; this then forced an approximate cocycle equation for most (indeed, we were able to use this trick to upgrade “most” to “all”).

This doesn’t quite work in the case. Firstly, need not have magnitude exactly equal to . This is not a terribly serious problem, but the more important difficulty is that correlation, unlike the property of being close, is not transitive or multiplicative: just because correlates with , and correlates with , one cannot then conclude that correlates with ; and even if one had this, and if correlated with , one could not conclude that correlated with .

Despite all these obstacles, it is still possible to extract something resembling a cocycle equation for the , by means of the Cauchy-Schwarz inequality. Indeed, we have the following remarkable observation of Gowers:

Lemma 7 (Gowers’ Cauchy-Schwarz argument)Let be a finite additive group, and let be a function, bounded by . Let be a subset with , and suppose that for each we have a function bounded by , such that uniformly in . Then there exist quadruples with such that

uniformly among the quadruples.

We shall refer to quadruples obeying the relation as *additive quadruples*.

*Proof:* We extend to be zero when lies outside of . Then we have

We expand the left-hand side as

setting , this becomes

From the pigeonhole principle, we conclude that for many values of , one has

Performing Cauchy-Schwarz once in and once in to eliminate the factors, and then re-averaging in , we conclude that

Setting to be the additive quadruple we obtain

Performing the averages we obtain

and the claim follows (note that for the quadruples obeying the stated lower bound, must lie in ).

Applying this lemma to our current situation, we find many additive quadruples for which

In particular, by the equidistribution theory in Notes 4, the polynomial is low rank.

The above discussion is valid for any value of , but is particularly simple when , as the are now linear, and so is now *constant*. Writing for some using the standard dot product on , and some (irrelevant) constant term , we conclude that

for many additive quadruples .

We now have to solve an additive combinatorics problem, namely to classify the functions from to which are “affine-linear” in the sense that property (3) holds for many additive quadruples; equivalently, the graph in has high “additive energy”, defined as the number of additive quadruples that it contains. An obvious example of a function with this property is an affine-linear function , where is a linear transformation and . As it turns out, this is essentially the only example:
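The contrast between affine-linear maps and generic maps is easy to see in a toy computation. The sketch below (the small prime p and the comparison map h -> h^2 are our own illustrative choices) counts additive quadruples for a map from Z_p to Z_p: an affine map attains the maximal count p^3, while the squaring map falls far short:

```python
# Count "additive quadruples" for a map xi: Z_p -> Z_p, i.e. quadruples
# (h1, h2, h3, h4) with h1 + h2 = h3 + h4 and xi(h1) + xi(h2) = xi(h3) + xi(h4).
p = 13

def additive_quadruples(xi):
    count = 0
    for h1 in range(p):
        for h2 in range(p):
            for h3 in range(p):
                h4 = (h1 + h2 - h3) % p  # forced by h1 + h2 = h3 + h4
                if (xi[h1] + xi[h2] - xi[h3] - xi[h4]) % p == 0:
                    count += 1
    return count

affine = [(3 * h + 7) % p for h in range(p)]  # affine map: maximal energy p^3
square = [(h * h) % p for h in range(p)]      # squaring map: far fewer

print(additive_quadruples(affine), additive_quadruples(square))  # 2197 vs 325
```

For the squaring map the defect h1^2 + h2^2 - h3^2 - h4^2 equals -2(h1 - h3)(h2 - h3) on the constraint surface, so only the degenerate quadruples with h1 = h3 or h2 = h3 survive, giving 2p^2 - p = 325 of them.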

Proposition 8 (Balog-Szemerédi-Gowers-Freiman theorem for vector spaces)Let , and let be a map from to such that (3) holds for additive quadruples in . Then there exists an affine function such that for values of in .

This proposition is a consequence of standard results in additive combinatorics, in particular the Balog-Szemerédi-Gowers lemma and Freiman’s theorem for vector spaces; see Section 11.3 of my book with Van for further discussion. The proof is elementary but a little lengthy and would take us too far afield, so we simply assume this proposition for now and keep going. We conclude that

The most difficult term to deal with here is the quadratic term . To deal with this term, suppose temporarily that is symmetric, thus . Then (since we are in odd characteristic) we can *integrate* as

and thus

for many . Taking norms in , we conclude that the inner product between two copies of and two copies of is large. Applying the Cauchy-Schwarz-Gowers inequality, followed by the inverse theorem, we conclude that correlates with for some linear phase, and thus itself correlates with for some quadratic phase.

This argument also works (with minor modification) when is *virtually symmetric*, in the sense that there exists a bounded index subspace of such that the restriction of the form to is symmetric, by foliating into cosets of that subspace; we omit the details. On the other hand, if is not virtually symmetric, there is no obvious way to “integrate” the phase to eliminate it as above. (Indeed, in order for to be “exact” in the sense that it is the “derivative” of something (modulo lower order terms), e.g. for some , it must first be “closed” in the sense that in some sense, since we have ; thus we again see the emergence of cohomological concepts in the background.)

To establish the required symmetry on , we return to Gowers’ Cauchy-Schwarz argument from Lemma 7, and tweak it slightly. We start with (4) and rewrite it as

where . We square-average this in to obtain

Now we make the somewhat unusual substitution to obtain

Thus there exists such that

We collect all terms that depend only on (and ) or only on (and ) to obtain

for some bounded functions . Eliminating these functions by two applications of Cauchy-Schwarz, we obtain

or, on making the change of variables ,

Using equidistribution theory, this means that the quadratic form is low rank, which easily implies that is virtually symmetric.

Now we turn to the general case. In principle, the above argument should still work, say for . The main sticking point is that instead of dealing with a vector-valued function that is approximately linear in the sense that (3) holds for many additive quadruples, in the case one is now faced with an approximate linearity relation of the form

for many additive quadruples , where the matrix has bounded rank. With our current level of additive combinatorics technology, we are not able to deal properly with this bounded rank error (the main difficulty being that the set of low rank matrices has no good “doubling” properties). Because of this obstruction, no generalisation of the above arguments to higher has been found.

There is however another approach, based ultimately on the ergodic theory work of Host-Kra and of Ziegler, that can handle the general case, which was worked out in two papers, one by myself and Ziegler, and one by Bergelson, Ziegler, and myself. It turns out that it is convenient to phrase these arguments in the language of ergodic theory. However, in order not to have to introduce too much additional material, I will try to describe the arguments here in the case without explicitly using ergodic theory notation. To do this, though, I will have to sacrifice a lot of rigour and only work with some illustrative special cases rather than the general case, and also use somewhat vague terminology (e.g. “general position” or “low rank”).

To simplify things further, we will establish the inverse theorem only for a special type of function, namely a quartic phase , where is a classical polynomial of degree . (A good example to keep in mind is the symmetric polynomial phase , though one has to take some care with this example due to the low characteristic.) The claim to show then is that if , then correlates with a cubic phase. In the high characteristic case , this result can be handled by equidistribution theory. Indeed, since

that theory tells us that the quartic polynomial is low rank. On the other hand, in high characteristic one has the Taylor expansion

for some cubic function (as can be seen for instance by decomposing into monomials). From this we easily conclude that itself has low rank (i.e. it is a function of boundedly many cubic (or lower degree) polynomials), at which point it is easy to see from Fourier analysis that will correlate with the exponential of a polynomial of degree at most .

Now we present a different argument that relies slightly less on the quartic nature of ; it is a substantially more difficult argument, and we will skip some steps here to simplify the exposition, but the argument happens to extend to more general situations. As , we have for many , thus by the inverse theorem, correlates with a quadratic phase. Using equidistribution theory, we conclude that the cubic polynomial is low rank.

At present, the low rank property for is only true for many . But from the cocycle identity

we see that if and are both low rank, then so is ; thus the property of being low rank is in some sense preserved by addition. Using this and a bit of additive combinatorics, one can conclude that is low rank for all in a bounded index subspace of ; restricting to that subspace, we will now assume that is low rank for *all* . Thus we have

where is some bounded collection of quadratic polynomials for each , and is some function. To simplify the discussion, let us pretend that in fact consists of just a single quadratic , plus some linear polynomials , thus

There are two extreme cases to consider, depending on how depends on . Consider first a “core” case when is independent of . Thus

If is low rank, then we can absorb it into the factors, so suppose instead that is high rank, and thus equidistributed even after fixing the values of .

The function is cubic, and is a high rank quadratic. Because of this, the function must be at most linear in the variable; this can be established by another application of equidistribution theory, see Section 8 of this paper of Ben and myself; thus one can factorise

for some functions . In fact, as is cubic, must be linear, while is cubic.

By comparing the coefficients in the cocycle equation (5), we see that the function is itself a cocycle:

As a consequence, we have for some function . Since is linear, is quadratic; thus we have

With a high characteristic assumption , one can ensure is classical. We will assume that is high rank, as this is the most difficult case.

Suppose first that . In high characteristic, one can then integrate by expressing this as plus lower order terms, thus is an order function in the sense that it is a function of a bounded number of linear functions. In particular, has a large norm for all , which implies that has a large norm, and thus correlates with a quadratic phase. Since can be decomposed by Fourier analysis into a linear combination of quadratic phases, we conclude that correlates with a quadratic phase and one is thus done in this case.

Now consider the other extreme, in which and lie in general position. Then, differentiating (8) in , we obtain

and then anti-symmetrising in one has

If and are unrelated, then the linear forms will typically be in general position with respect to each other and with , and similarly will be in general position with respect to each other and with . From this, one can show that the above equation is not satisfiable generically, because the mixed terms cannot be cancelled by the simpler terms in the above expression.

An interpolation of the above two arguments can handle the case in which does not depend on . Now we consider the other extreme, in which varies in , so that and are in general position for generic , and similarly for and , or for and . (Note though that we cannot simultaneously assume that are in general position; might vary linearly in , and indeed we expect this to be the basic behaviour of here, as observed in the preceding argument.)

To analyse this situation, we return to the cocycle equation (5), which currently reads

Because any two of can be assumed to be in general position, one can show using equidistribution theory that the above equation can only be satisfied when the are linear in the variable, thus

much as before. Furthermore, the coefficients must now be (essentially) constant in in order to obtain (9). Absorbing this constant into the definition of , we now have

We will once again pretend that is just a single linear form . Again we consider two extremes. If is independent of , then by passing to a bounded index subspace (the level set of ) we now see that is quadratic, hence is cubic, and we are done. Now suppose instead that varies in , so that are in general position for generic . We look at the cocycle equation again, which now tells us that obeys the *quasicocycle* condition

where is a quadratic polynomial. With any two of in general position, one can then conclude (using equidistribution theory) that are quadratic polynomials. Thus is quadratic, and is cubic as before. This completes the heuristic discussion of various extreme model cases; the general case is handled by a rather complicated combination of all of these special case methods, and is best performed in the framework of ergodic theory (in particular, the idea of extracting out the coefficient of a key polynomial, such as the coefficient of , is best captured by the ergodic theory concept of *vertical differentiation*); see this paper of Bergelson, Ziegler, and myself.

** — 4. Consequences of the Gowers inverse conjecture — **

We now discuss briefly some of the consequences of the Gowers inverse conjecture, beginning with Szemerédi’s theorem in vector spaces (Theorem 4). We will use the density increment method (an energy increment argument is also possible, but is more complicated; see this paper). Let be a set of density at least containing no lines. This implies that the -linear form

has size . On the other hand, as this pattern has complexity , one has from Notes 3 the bound

whenever are bounded in magnitude by . Splitting , we conclude that

and thus (for large enough)

Applying Theorem 3, we find that there exists a polynomial of degree at most such that

To proceed we need the following analogue of Proposition 5 of Notes 2:

Exercise 6 (Fragmenting a polynomial into subspaces)Let be a classical polynomial of degree . Show that one can partition into affine subspaces of dimension at least , where as for fixed , such that is constant on each . (Hint:Induct on , and use Exercise 6 of Notes 4 repeatedly to find a good initial partition into subspaces on which has degree at most .)

Exercise 7Use the previous exercise to complete the proof of Theorem 4. (Hint:mimic the density increment argument from Notes 2.)

By using the inverse theorem in place of the Fourier-analytic analogue in Lemma 7 of Notes 2, one obtains the following regularity lemma, analogous to Theorem 10 of Notes 2:

Theorem 9 (Strong arithmetic regularity lemma)Suppose that . Let , let , and let be an arbitrary function. Then we can decompose and find such that

- (Nonnegativity) take values in , and have mean zero;
- (Structure) is a function of classical polynomials of degree at most ;
- (Smallness) has an norm of at most ; and
- (Pseudorandomness) One has for all .

For a proof, see this paper of mine. The argument is similar to that appearing in Theorem 10 of Notes 2, but the discrete nature of polynomials in bounded characteristic allows one to avoid a number of technical issues regarding measurability.
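To get a feel for the decomposition, here is a toy version of its degree-1 (Fourier) case, the one essentially due to Green: the structured part collects the few large Fourier coefficients (so it is a function of boundedly many linear phases), the uniform part has all its Fourier coefficients below the threshold, and the small-norm part is simply taken to be zero in this toy. The dimension, test function, and threshold are illustrative choices of ours, not the actual construction in the paper:

```python
from itertools import product

# Toy Fourier-based regularity decomposition over F_2^n: split f into a
# structured part (large Fourier coefficients) plus a uniform part (all
# Fourier coefficients at most eps in magnitude).
n = 6
cube = list(product((0, 1), repeat=n))
N = len(cube)

def chi(xi, x):  # linear character (-1)^{xi . x}
    return (-1) ** sum(a * b for a, b in zip(xi, x))

f = {x: 1.0 if sum(x) % 3 == 0 else 0.0 for x in cube}  # a sample 0/1 function
fhat = {xi: sum(f[x] * chi(xi, x) for x in cube) / N for xi in cube}

eps = 0.05
f_str = {x: sum(c * chi(xi, x) for xi, c in fhat.items() if abs(c) > eps)
         for x in cube}
f_unf = {x: f[x] - f_str[x] for x in cube}

# The uniform part has all Fourier coefficients at most eps in magnitude.
unf_hat_max = max(abs(sum(f_unf[x] * chi(xi, x) for x in cube) / N)
                  for xi in cube)
print(unf_hat_max <= eps + 1e-12)  # True
```

For higher degree one must replace linear phases by classical polynomials of degree up to s, which is where the inverse theorem enters; no such short computation is available there.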

This theorem can then be used for a variety of applications in additive combinatorics. For instance, it gives the following variant of a result of Bergelson, Host, and Kra:

Proposition 10 (Bergelson-Host-Kra type result)Let , let , and let with , and let . Then for values of , one has

Roughly speaking, the idea is to apply the regularity lemma to , discard the contribution of the and errors, and then control the structured component using the equidistribution theory from Notes 4. A proof of this result can be found in this paper of Ben Green; see also this paper of Ben and myself for an analogous result in . Curiously, the claim fails when is replaced by any larger number; this is essentially an observation of Ruzsa that appears in the appendix of the paper of Bergelson, Host, and Kra.

The above regularity lemma (or more precisely, a close relative of this lemma) was also used by Gowers and Wolf to determine the true complexity of a linear system:


Theorem 11 (Gowers-Wolf theorem)Let be a collection of linear forms with integer coefficients, with no two forms being linearly dependent. Let have sufficiently large characteristic, and suppose that are functions bounded in magnitude by such that where was the form defined in Notes 3. Then for each there exists a classical polynomial of degree at most such that

where is the true complexity of the system defined in Notes 3. This is best possible.

The key point is that one can show (by an elementary argument relying primarily on an induction-on-dimension argument and the Weyl recurrence theorem, i.e. that given any real and any integer , the expression gets arbitrarily close to an integer) that given a (polynomial) nilsequence , one can subdivide any long arithmetic progression (such as ) into a number of medium-sized progressions, on each of which the nilsequence is nearly constant. As a consequence of this and the inverse conjecture for the Gowers norm, if a set has no arithmetic progressions, then it must have an elevated density on a subprogression; iterating this observation as per the usual density-increment argument introduced long ago by Roth, one obtains the claim. (This is very close to the scheme of Gowers’ proof.)

Technically, one might call this the shortest proof of Szemerédi’s theorem in the literature (and would be something like the sixteenth such genuinely distinct proof, by our count), but that would be cheating quite a bit, primarily due to the fact that it assumes the inverse conjecture for the Gowers norm, our current proof of which is checking in at about 100 pages…

The regularity lemma is a manifestation of the “dichotomy between structure and randomness”, as discussed for instance in my ICM article or FOCS article. In the degree case , this result is essentially due to Green. It is powered by the *inverse conjecture for the Gowers norms*, which we and Tamar Ziegler have recently established (paper to be forthcoming shortly; the case of our argument is discussed here). The counting lemma is established through the quantitative equidistribution theory of nilmanifolds, which Ben and I set out in this paper.

The regularity and counting lemmas are designed to be used together, and in the paper we give three applications of this combination. Firstly, we give a new proof of Szemerédi’s theorem, which proceeds via an energy increment argument rather than a density increment one. Secondly, we establish a conjecture of Bergelson, Host, and Kra, namely that if has density , and , then there exist shifts for which contains at least arithmetic progressions of length of spacing . (The case of this conjecture was established earlier by Green; the case is false, as was shown by Ruzsa in an appendix to the Bergelson-Host-Kra paper.) Thirdly, we establish a variant of a recent result of Gowers-Wolf, showing that the true complexity of a system of linear forms over indeed matches the conjectured value predicted in their first paper.

In all three applications, the scheme of proof can be described as follows:

- Apply the arithmetic regularity lemma, and decompose a relevant function into three pieces, .
- The uniform part is so tiny in the Gowers uniformity norm that its contribution can be easily dealt with by an appropriate “generalised von Neumann theorem”.
- The contribution of the (virtual, irrational) nilsequence can be controlled using the arithmetic counting lemma.
- Finally, one needs to check that the contribution of the small error does not overwhelm the main term . This is the trickiest bit; one often needs to use the counting lemma again to show that one can find a set of arithmetic patterns for that is sufficiently “equidistributed” that it is not impacted by the small error.

To illustrate the last point, let us give the following example. Suppose we have a set of some positive density (say ) and we have managed to prove that contains a reasonable number of arithmetic progressions of length (say), e.g. it contains at least such progressions. Now we perturb by deleting a small number, say , elements from to create a new set . Can we still conclude that the new set contains any arithmetic progressions of length ?

Unfortunately, the answer could be no; conceivably, all of the arithmetic progressions in could be wiped out by the elements removed from , since each such element of could be associated with up to (or even ) arithmetic progressions in .

But suppose we knew that the arithmetic progressions in were *equidistributed*, in the sense that each element in belonged to the same number of such arithmetic progressions, namely . Then each element deleted from only removes at most progressions, and so one can safely remove elements from and still retain some arithmetic progressions. The same argument works if the arithmetic progressions are only *approximately* equidistributed, in the sense that the number of progressions that a given element belongs to concentrates sharply around its mean (for instance, by having a small variance), provided that the equidistribution is sufficiently strong. Fortunately, the arithmetic regularity and counting lemmas are designed to give precisely such a strong equidistribution result.
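The perfect-equidistribution scenario described above can be checked directly in the model case of three-term progressions in Z_p with the full group as the ambient set: every element lies in exactly the same number of progressions, so deleting m elements destroys at most m times that count. (The prime p below is an arbitrary small choice of ours.)

```python
# In Z_p, the 3-term progressions (a, a+d, a+2d) with d != 0 are perfectly
# equidistributed: every element lies in exactly 3(p-1) of them.  Deleting m
# elements therefore destroys at most 3(p-1)*m of the p(p-1) progressions.
p = 13
progressions = [{a % p, (a + d) % p, (a + 2 * d) % p}
                for a in range(p) for d in range(1, p)]

counts = [sum(1 for P in progressions if x in P) for x in range(p)]
print(set(counts))  # every element lies in exactly 3*(p-1) = 36 progressions
```

The regularity and counting lemmas are what supply the analogous (approximate) equidistribution statement for the structured patterns arising in the actual applications.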

A succinct (but slightly inaccurate) summary of the regularity+counting lemma strategy would be that, in order to solve a problem in additive combinatorics, it “suffices to check it for nilsequences”. But this should come with a caveat, due to the issue of the small error above: in addition to checking the problem for nilsequences, the answer in the nilsequence case must be sufficiently “dispersed” in a suitable sense, so that it can survive the addition of a small (but not completely negligible) perturbation.

One last “production note”. Like our previous paper with Emmanuel Breuillard, we used Subversion to write this paper, which turned out to be a significant efficiency boost as we could work on different parts of the paper simultaneously (this was particularly important this time round as the paper was somewhat lengthy and complicated, and there was a submission deadline). When doing so, we found it convenient to split the paper into a dozen or so pieces (one for each section of the paper, basically) in order to avoid conflicts, and to help coordinate the writing process. I’m also looking into git (a more advanced version control system), and am planning to use it for another of my joint projects; I hope to be able to comment on the relative strengths of these systems (and with plain old email) in the future.

Formulated as it is, this question follows simply by multiple applications of Poincaré’s Recurrence Theorem (see ERT1): there exists such that . Letting , there exists such that , which is the same as

Repeating the argument times, we obtain positive integers such that

where . This is much more than we wanted. In fact, applying the argument infinitely many times, we construct a sequence of positive integers such that

for every finite family of subsets of . Unfortunately, we have no control on the ‘s, so combinatorial applications are harder. It would be useful to have some regularity in them: for example, can they form an arithmetic progression? The answer is YES, and this constitutes one of the pillars of Ergodic Ramsey Theory.

Theorem 1(Furstenberg) If is a mps, such that and is a positive integer, then there exists a positive integer such that

Obviously, the existence of such is equivalent to the existence of such that

Taking , the characteristic function of ,

where , so that (2) is equivalent to
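A finite toy model may help fix ideas (this is our own illustration, not part of the original argument): take the rotation T(x) = x + 1 on Z_N and an interval A of positive density, and search for an n >= 1 witnessing the multiple recurrence of Theorem 1.

```python
# Toy multiple recurrence: for the rotation T(x) = x + 1 on Z_N and a set A
# of positive density, find n >= 1 with A ∩ T^{-n}A ∩ ... ∩ T^{-(k-1)n}A != ∅.
N, k = 30, 4
A = set(range(10))  # an interval of density 1/3

def preimage(S, m):  # T^{-m} S = {x : x + m in S} for T(x) = x + 1 mod N
    return {(s - m) % N for s in S}

def good_n():
    for n in range(1, N + 1):
        inter = A
        for j in range(1, k):
            inter = inter & preimage(A, j * n)
        if inter:
            return n, inter
    return None

n, inter = good_n()
print(n, sorted(inter))
```

For an interval, even n = 1 works; the content of Furstenberg's theorem is that some such n exists for *every* measure-preserving system and every set of positive measure.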

This leads to the analysis of the averages

Because of its lack of symmetry, instead of (3) we consider commuting transformations , all of them preserving , and the averages

from now on called **multiple ergodic averages**. Clearly, (3) is a special case of (4) obtained by taking , . Although the goal is the full generality of (4), it is natural to investigate (3) first. Four questions deserve attention:

- If and , does (4) have positive ?
- -norm convergence.
- Pointwise convergence.
- What about convergence of multiple polynomial ergodic averages?

The first one was solved affirmatively by H. Furstenberg in the 1977 seminal paper **Ergodic behavior of diagonal measures and a theorem of Szemerédi on arithmetic progressions**.

Theorem 2(Furstenberg) Let be a mps and be non-negative and satisfy . Then, for any ,

As we’ll see in forthcoming posts, this actually proves Szemerédi’s Theorem, via a **correspondence principle** between sets of integers of positive density and measure-preserving systems. One year later, motivated by a topological analogue due to B. Weiss, Furstenberg and Y. Katznelson established in **An ergodic Szemerédi theorem for commuting transformations** an extension to the commuting case.

Theorem 3(Furstenberg and Katznelson) Let be a probability measure space and commuting transformations, all of them preserving . Then, for any non-negative such that ,

This result, in addition to extending Furstenberg’s Theorem, implies a purely combinatorial multidimensional version of Szemerédi’s Theorem.

Theorem 4(Multidimensional Szemerédi’s Theorem) Let be a subset with positive upper-Banach density and be any finite configuration. Then there are an integer and a vector such that .

An interesting feature is that there was no combinatorial proof of this result until 2007, when hypergraph versions of Szemerédi’s Regularity Lemma were developed by T. Gowers.

After establishing positivity, we discuss -norm convergence, solved in **Nonconventional ergodic averages and nilmanifolds** by B. Host and B. Kra.

Theorem 5(Host and Kra) Let be a mps and be bounded measurable functions on . Then

exists in .

Observe that we no longer have only one function, but . Three years later, in 2008, T. Tao extended it to the commuting setup in the work **Norm convergence of multiple ergodic averages for commuting transformations**.

Theorem 6(Tao) Let be a probability measure space, measure-preserving commuting transformations and be bounded measurable functions on . Then

exists in .

It is worth mentioning that this year T. Austin gave a new proof of it using classical ergodic theory (**On the norm convergence of nonconventional ergodic averages**). Little is known about pointwise convergence; it is only known that

converges almost surely, for any and . This was obtained by J. Bourgain in **Double recurrence and almost sure convergence**.

Now consider polynomials with integer coefficients and the limits

What is known? In terms of combinatorial applications, does it at least have positive whenever is a non-negative bounded function such that ? Yes… due to V. Bergelson and A. Leibman in the work **Polynomial extensions of van der Waerden’s and Szemerédi’s theorems.**

Theorem 7(Bergelson and Leibman) Let be a probability measure space, measure-preserving commuting transformations, polynomials with integer coefficients and a bounded measurable function on such that . Then

In fact, they proved a more general result.

Theorem 8(Bergelson and Leibman) Let be a probability measure space, measure-preserving commuting transformations, polynomials with integer coefficients and a bounded measurable function on such that . Then

I could not find any result about -norm convergence. Two works, **Convergence of polynomial ergodic averages** by Host and Kra and **Pointwise convergence of ergodic averages for polynomial sequences of rotations of a nilmanifold** by Leibman, both in 2005, settled the case of multiple polynomial ergodic averages along a single transformation.

Theorem 9(Host, Kra and Leibman) Let be a mps, polynomials with integer coefficients and bounded measurable functions on . Then

exists in .

**Latest news:** a few hours ago this paper was posted on arXiv by Q. Chu, N. Frantzikinakis and B. Host, establishing many cases of the -norm convergence of multiple polynomial ergodic averages for commuting transformations.

Theorem 10 (Chu, Frantzikinakis and Host). Let be a probability measure space, invertible measure-preserving commuting transformations, polynomials with distinct degrees and bounded measurable functions on . Then

exists in .

As with every fresh result, it first needs to be checked in full detail.

We begin with a question: what conditions must a set of positive integers satisfy in order to possess arbitrarily long arithmetic progressions? If the set is very sparse (such as the powers of $2$), there is no chance of such a thing. On the other hand, a set with arbitrarily large intervals trivially satisfies it. Although the precise condition is not known, there is one of great interest which is sufficient. Define the **density** of $A \subseteq \mathbb{N}$ as

$$d(A) := \lim_{n \to \infty} \frac{|A \cap \{1,\dots,n\}|}{n}.$$

(Here, $|B|$ stands for the cardinality of the set $B$.) Such a limit does not always exist, so it is more convenient to consider the **upper density** of $A$:

$$\overline{d}(A) := \limsup_{n \to \infty} \frac{|A \cap \{1,\dots,n\}|}{n}.$$

This is a well-defined number between $0$ and $1$. In 1936, Erdős and Turán conjectured that if $\overline{d}(A) > 0$, then $A$ has arbitrarily long arithmetic progressions. The conjecture remained wide open until 1953, when Roth proved that such sets contain progressions of length three. Later, Szemerédi proved, in 1969, that they also contain progressions of length four and, finally, in 1975 he settled the full conjecture.
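As a quick computational sanity check of these definitions (the code and its helper names below are mine, purely for illustration), one can tabulate the finite ratios $|A \cap \{1,\dots,n\}|/n$ for a few familiar sets:

```python
# Finite truncations of the density definition: |A ∩ {1,...,n}| / n.
# The helper names below are ad hoc, purely for illustration.

def partial_density(indicator, n):
    """Return |A ∩ {1,...,n}| / n for the set A described by `indicator`."""
    return sum(1 for k in range(1, n + 1) if indicator(k)) / n

evens = lambda k: k % 2 == 0                       # density 1/2
squares = lambda k: int(k**0.5 + 0.5) ** 2 == k    # density 0
powers_of_2 = lambda k: (k & (k - 1)) == 0         # density 0, very sparse

for n in (100, 10_000):
    print(n, partial_density(evens, n),
          partial_density(squares, n),
          partial_density(powers_of_2, n))
```

The evens stabilise at $1/2$, while the squares and the powers of $2$ decay to $0$, which is exactly the dichotomy discussed above.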

**Theorem (Szemerédi, 1975).** If has positive upper density, then it contains arbitrarily long arithmetic progressions.

His proof is a very hard combinatorial argument and relies on *Szemerédi’s Regularity Lemma* (which we intend to discuss in the future).

**Breakthrough and the birth of a new area.**

Two years later, Hillel Furstenberg gave another proof of Szemerédi’s Theorem, based on a deep analysis of the structure of general measure-preserving systems, known as *Furstenberg’s Structural Theorem* (see this lecture of Terence Tao for a discussion of this result in the case of distal systems). This gave birth to a new area, called **Ergodic Ramsey Theory**. As the name suggests, Ergodic Ramsey Theory uses machinery from Ergodic Theory (and related areas, such as topological dynamics) to solve problems from Ramsey Theory (and related combinatorial areas).

In the next posts, we plan to discuss this interaction. Here is a sketch:

1. Poincaré’s Recurrence Theorem.

2. Classical Von Neumann’s Theorem.

3. Polynomial Von Neumann’s Theorem.

4. Multiple Poincaré’s Recurrence Theorem.

5. Furstenberg’s Correspondence Principle.

6. Topological Dynamics and Van der Waerden’s Theorem.

7. Two simple models of measure-preserving systems: compact and weak mixing systems.

8. Compact and weak-mixing extensions.

9. A glance at Furstenberg’s Structural Theorem and the proof of Multiple Poincaré’s Recurrence Theorem.

10. Generalized ergodic averages: and a.e. convergence.

11. Green-Tao’s Theorem on the existence of arbitrarily long arithmetic progressions of primes.

The posts will be tagged by **ERT+(number of the lecture).**

How does behave? We do not really know. Will it help talking about it? Can we somehow look beyond the horizon and try to guess what the truth is?

Update 1: to start the discussion, here are a few more specific questions that we can wonder about and discuss.

1) Is the robustness of Behrend’s bound an indication that the truth is in this neighborhood?

2) Why aren’t there analogues of the Salem-Spencer and Behrend constructions for the cap set problem?

3) What type of growth functions can we expect at all as the answer to such problems?

4) Where is the deadlock in improving the upper bounds for AP-free sets?

5) What new kinds of examples should one try in order to improve the lower bounds? Are there some directions that were already extensively explored?

6) Can you offer some analogies? Other problems that might be related?

Roth proved that . Szemerédi and Heath-Brown improved it to for some (Szemerédi’s argument gave .) Jean Bourgain improved the bound in 1999 to and recently to (up to lower order terms).

Erdős and Turán, who posed the problem in 1936, described a set not containing an arithmetic progression of size . Salem and Spencer improved this bound to . Behrend’s bound from 1946 is of the form . A small improvement was achieved recently by Elkin and is discussed here. (Look also at the remarks following that post.)

A closely related problem can be asked in . It is called the cap set problem. A subset of is called a cap set if it contains no arithmetic progression of size three or, alternatively, no three distinct vectors that sum up to 0 (modulo 3). If is a cap set of maximum size, we can ask how the function behaves. Roy Meshulam proved, using Roth’s argument, that . Edel found an example of a cap set of size . So . Again the gap is exponential. What is the truth?
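In the smallest dimensions, the cap set question can be settled by sheer brute force; the sketch below (all names are my own ad hoc choices, not from the discussion) simply tries all subsets, largest first, using the "no three distinct vectors summing to 0 mod 3" formulation:

```python
# Brute-force maximum cap set size in F_3^n: the largest subset containing
# no three distinct vectors that sum to 0 mod 3 (equivalently, no 3-term AP).
# Only feasible for tiny n; names here are ad hoc.
from itertools import product, combinations

def max_cap(n):
    points = list(product(range(3), repeat=n))

    def is_cap(s):
        # A triple is forbidden iff every coordinate sums to 0 mod 3.
        return all(any((x + y + z) % 3 for x, y, z in zip(*triple))
                   for triple in combinations(s, 3))

    for r in range(len(points), 0, -1):
        if any(is_cap(sub) for sub in combinations(points, r)):
            return r
    return 0

print(max_cap(2))  # maximum cap in F_3^2
```

This reproduces the first values of the (known) extremal sequence: 2 in dimension one and 4 in dimension two; the exponential gap discussed above only shows up far beyond what brute force can reach.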

These are problems that attracted people’s interest for decades. The gaps between the lower and upper bounds are very large.

Will it help talking about it? Can we somehow look beyond the horizon and try to guess what the truth is? Is there a meaningful way to have a discussion of these problems? To give some heuristic non-rigorous arguments? To bring some useful analogies? To assign probabilities to the different possibilities? (We talked a little about assigning probabilities in cases of uncertainty in this post.)

Anyway, as a little spin-off of the polymath1 project, if you have any thoughts about where the truth is for these problems, or about how to discuss them meaningfully, or about the more general issue of trying to look “beyond the horizon” in mathematics in a meaningful way, you are most welcome to contribute.

For polymath1 background look especially here and here. Look also at this post on our blog. The updated version contains a discussion of the sense in which polymath1 was a massive collaboration.

And everybody is invited to participate in the following polls – one about 3-term arithmetic progressions and one about cap sets.

Update 2: another poll, on the expected answer for density Hales-Jewett, has been added.

The question is: what is the maximum size of a subset of without a combinatorial line? The recent proofs appear to lead to . A sort of hyper-optimistic conjecture that was proposed during the project asserts that the maximum is attained by a union of slices, where a slice means all vectors with a prescribed number of 0’s, 1’s and 2’s.
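In the very smallest dimensions this maximum can be computed exhaustively; here is a small sketch (my own ad hoc code, not part of the project) that enumerates the combinatorial lines of [3]^n and searches for the largest line-free subset:

```python
# Exhaustive computation of the density Hales-Jewett quantity in tiny
# dimensions: the maximum size of a subset of [3]^n with no combinatorial
# line.  A line comes from a template over {1,2,3,'*'} with at least one
# wildcard; all wildcards take the same value along the line.
from itertools import product, combinations

def combinatorial_lines(n):
    lines = []
    for tmpl in product((1, 2, 3, '*'), repeat=n):
        if '*' in tmpl:
            lines.append({tuple(v if c == '*' else c for c in tmpl)
                          for v in (1, 2, 3)})
    return lines

def max_line_free(n):
    points = list(product((1, 2, 3), repeat=n))
    lines = combinatorial_lines(n)
    # Try subsets from largest to smallest; return the first line-free size.
    for r in range(len(points), 0, -1):
        for subset in combinations(points, r):
            s = set(subset)
            if not any(line <= s for line in lines):
                return r
    return 0

print(max_line_free(2))
```

For n = 1 the answer is 2 and for n = 2 it is 6 (remove a diagonal transversal from the 3×3 grid), consistent with the small cases recorded during the project.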

In polls, the choice of answers is important. For our choices of answers, look also at Scott Aaronson’s favorite growth rates.

Polls (even exit polls) can also be wrong…


Erdős and Turán asked in 1936: what is the largest subset of {1,2,…,n} without a 3-term arithmetic progression?

In 1946 Behrend found an example with

Now, sixty years later, Michael Elkin has pushed the factor from the denominator to the numerator, and found a set with !

Here is a description of Behrend’s construction and its improvement, as told by Michael himself:

“The construction of Behrend employs the observation that a sphere in any dimension is convexly independent, and thus cannot contain three vectors such that one of them is the arithmetic average of the two other. The new construction replaces the sphere by a thin annulus. Intuitively, one can produce larger progression-free sets because an annulus of non-zero width contains more integer points than a sphere of the same radius does. However, unlike in a sphere, the set of integer points in an annulus is not necessarily convexly independent. To counter this difficulty I show that as long as the annulus is sufficiently thin, the set U of its integer points contains a convexly independent subset W whose size is at least a constant fraction of the size of U. The subset W is, in fact, the exterior set Ext(U) of the set U.

The set U above is the set of integer points of the intersection of a very thin annulus with a cube. The (minimum) dimension k of the space in which this body has non-zero volume is not constant, but rather it tends to infinity logarithmically with the radius of the annulus. Consequently, it becomes non-trivial to estimate the volume of this body, let alone the number of integer points that it contains. In addition, most known estimates for the discrepancy between the number of integer points and the volume assume that the dimension is fixed, and thus these estimates are inapplicable in this case. Moreover, since the annulus is very thin, its volume is not much larger than its surface area, and thus crude estimates of the discrepancy between the number of integer points and the volume do not suffice. Showing more precise estimates involves a rather delicate analysis.”
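For concreteness, here is a small computational sketch of the sphere idea in Behrend’s original construction (the parameters k and d are arbitrary small choices of mine, and this is the un-optimised sphere version, not Elkin’s annulus refinement): digits below d in base 2d−1 prevent carries, so a 3-term AP of integers would force a 3-term AP of lattice points on a sphere, which convexity forbids.

```python
# Sketch of Behrend's construction: integers whose base-(2d-1) digits lie in
# {0,...,d-1} are grouped by the sum of squares of their digits (a "sphere").
# Each sphere class is free of 3-term APs: the digit bound prevents carries,
# and a sphere contains no three collinear lattice points.
from itertools import product
from collections import defaultdict

def behrend_set(k, d):
    base = 2 * d - 1
    spheres = defaultdict(list)
    for digits in product(range(d), repeat=k):
        n = sum(c * base**i for i, c in enumerate(digits))
        spheres[sum(c * c for c in digits)].append(n)
    return max(spheres.values(), key=len)   # the most popular radius

def has_3ap(s):
    s = set(s)
    return any(x != y and 2 * y - x in s for x in s for y in s)

A = behrend_set(k=3, d=4)   # small illustrative parameters
assert not has_3ap(A)
print(len(A), "elements, no 3-term AP")
```

By pigeonhole the most popular radius class has size at least (number of digit strings)/(number of radii), which after optimising the parameters gives Behrend’s bound quoted above.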

In 1975, Szemerédi established the following theorem, which had been conjectured in 1936 by Erdős and Turán:

Theorem 1 (Szemerédi’s theorem). Let be an integer, and let A be a set of integers of positive upper density, thus . Then A contains a non-trivial arithmetic progression of length k. (By “non-trivial” we mean that .) [More succinctly: every set of integers of positive upper density contains arbitrarily long arithmetic progressions.]

This theorem is trivial for k=1 and k=2. The first non-trivial case is k=3, which was proven by Roth in 1953 and will be discussed in a later lecture. The k=4 case was also established by Szemerédi in 1969.

In 1977, Furstenberg gave another proof of Szemerédi’s theorem, by establishing the following equivalent statement:

Theorem 2 (Furstenberg multiple recurrence theorem). Let be an integer, let be a measure-preserving system, and let E be a set of positive measure. Then there exists such that is non-empty.

**Remark 1.** The negative signs here can be easily removed because T is invertible, but I have placed them here for consistency with some later results involving non-invertible transformations, in which the negative sign becomes important.

**Exercise 1.** Prove that Theorem 2 is equivalent to the apparently stronger theorem in which “is non-empty” is replaced by “has positive measure”, and “there exists ” is replaced by “there exist infinitely many “.

Note that the k=1 case of Theorem 2 is trivial, while the k=2 case follows from the Poincaré recurrence theorem (Theorem 1 from Lecture 8). We will prove the higher k cases of this theorem in later lectures. In this one, we will explain why, for any fixed k, Theorem 1 and Theorem 2 are equivalent.

Let us first give the easy implication that Theorem 1 implies Theorem 2. This follows immediately from

Lemma 1. Let be a measure-preserving system, and let E be a set of positive measure. Then there exists a point x in X such that the recurrence set has positive upper density.

Indeed, from Lemma 1 and Theorem 1, we obtain a point x for which the set contains an arithmetic progression of length k and some step r, which implies that is non-empty.

**Proof of Lemma 1.** Observe (from the shift-invariance of ) that

. (1)

On the other hand, the integrand is at most 1. We conclude that for each N, the set must have measure at least . This implies that the function is not absolutely integrable even after excluding an arbitrary set of measure up to , which implies that is not finite a.e., and the claim follows (cf. the proof of the Borel-Cantelli lemma).

Now we show how Theorem 2 implies Theorem 1. If we could pretend that “upper density” was a probability measure on the integers, then this implication would be immediate by applying Theorem 2 to the dynamical system . Of course, we know that the integers do not admit a shift-invariant probability measure (and upper density is not even additive, let alone a probability measure). So this does not work directly. Instead, we need to first lift from the integers to a more abstract universal space and use a standard “compactness and contradiction” argument in order to be able to build the desired probability measure properly.

More precisely, let A be as in Theorem 1. Consider the topological boolean Bernoulli dynamical system with the product topology and the shift . The set A can be viewed as a point in this system, and the orbit closure of that point becomes a subsystem of that Bernoulli system, with the relative topology.

Suppose for contradiction that A contains no non-trivial progressions of length k, thus for all . Then, if we define the cylinder set to be the collection of all points in X which (viewed as sets of integers) contain 0, we see (after unpacking all the definitions) that for all .

In order to apply Theorem 2 and obtain the desired contradiction, we need to find a shift-invariant Borel probability measure on X which assigns a positive measure to E.

For each integer N, consider the measure which assigns a mass of to the points in X for , and no mass to the rest of X. Then we see that . Thus, since A has positive upper density, there exists some sequence going to infinity such that . On the other hand, by vague sequential compactness (Lemma 1 of Lecture 7) we know that some subsequence of converges in the vague topology to a probability measure , which then assigns a positive measure to the (clopen) set E. As the are asymptotically shift invariant, we see that is invariant also (as in the proof of Corollary 1 of Lecture 7). As now has all the required properties, we have completed the deduction of Theorem 1 from Theorem 2.
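The construction of the measures lends itself to a finite sketch (the window and the shift convention T^n A = A − n below are my own choices for the illustration): for the cylinder set E, the orbit average is simply the density of A in a window, and shift-invariance holds up to a boundary term that vanishes as the window grows.

```python
# Finite sketch of the correspondence-principle measures: mu_N puts mass 1/N
# on each orbit point T^n A for 0 <= n < N (window choice is an assumption
# of this sketch).  With E = {x : 0 in x}, T^n A lies in E iff n lies in A,
# so mu_N(E) is the density of A in the window.

def mu_N(A, event, N):
    """Average of event(T^n A) over 0 <= n < N, where T^n A = A - n."""
    return sum(event({a - n for a in A}) for n in range(N)) / N

in_E = lambda x: 0 in x                      # the cylinder set E
in_TE = lambda x: 1 in x                     # T^{-1}E: points x with Tx in E

A = {n * n for n in range(60)}               # the squares, inside [0, 3600)
N = 1000
print(abs(mu_N(A, in_E, N) - mu_N(A, in_TE, N)))  # at most 2/N
```

The two averages count A over two windows that overlap except at the endpoints, so the difference is at most 2/N; in the vague limit this boundary error disappears, which is exactly why the limit measure is shift-invariant.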

**Exercise 2. **Show that Theorem 2 in fact implies a seemingly stronger version of Theorem 1, in which the conclusion becomes the assertion that the set has positive upper density for infinitely many r.

**Exercise 3.** Show that Theorem 1 in fact implies a seemingly stronger version of Theorem 2: if are sets in a probability space with uniformly positive measure (i.e. ), then for any k there exist positive integers n, r such that .

– Varnavides type theorems –

A similar “compactness and contradiction” argument (combined with a preliminary averaging-over-dilations trick of Varnavides) allows us to deduce from Theorem 2 the following apparently stronger statement (observed by Bergelson, Host, McCutcheon, and Parreau):

Theorem 3 (Uniform Furstenberg multiple recurrence theorem). Let be an integer and . Then for any measure-preserving system and any measurable set E with we have (2)

for all , where is a positive quantity which depends only on k and (i.e. it is uniform over all choices of system and of the set E with measure at least ).

**Exercise 4. **Assuming Theorem 3, show that if N is sufficiently large depending on k and , then any subset of with cardinality at least will contain at least non-trivial arithmetic progressions of length k, for some . (This result for k=3 was first established by Varnavides via an averaging argument from Roth’s theorem.) Conclude in particular that Theorem 3 implies Theorem 1.

It is clear that Theorem 3 implies Theorem 2; let us now establish the converse. We first use an averaging argument of Varnavides to reduce Theorem 3 to a weaker statement, in which the conclusion (2) is not asserted to hold for all N, but instead one asserts that

(2′)

is true for some depending only on k and (note that the r=0 term in (2′) has been dropped, otherwise the claim is trivial). To see why one can recover (2) from (2′), observe that by replacing the shift T with a power we can amplify (2′) to

(2”)

for all a. Averaging (2”) over we easily conclude (2).

It remains to prove that (2”) holds under the hypotheses of Theorem 3. Our next reduction is to observe that it suffices to perform this task for the boolean Bernoulli system with the cylinder set as before. To see this, recall from Example 5 of Lecture 2 that there is a morphism from any measure-preserving system with a distinguished set E to the system with the product -algebra , the usual shift , and the set , and with the push-forward measure . Specifically, sends any point x in X to its recurrence set . Using this morphism it is not difficult to show that the claim (2”) for and E would follow from the same claim for and .

We still need to prove (2”) for the boolean system. The point is that by lifting to this universal setting, the dynamical system and the set E have been canonically fixed; the only remaining parameter is the probability measure . But now we can exploit vague sequential compactness again as follows.

Suppose for contradiction that Theorem 3 failed for the boolean system. Then by carefully negating all the quantifiers, we can find such that for any there is a sequence of shift-invariant probability measures on X with ,

(3)

as . Note that if (3) holds for one value of , then it also holds for all smaller values of . A standard diagonalisation argument then allows us to build a sequence as above, but which obeys (3) for *all* .

Now we are finally in a good position to apply vague sequential compactness. By passing to a subsequence if necessary, we may assume that converges vaguely to a limit , which is a shift-invariant probability measure. In particular we have , while from (3) we see that

(4)

for all ; thus the sets all have zero measure for . But this contradicts Theorem 2 (and Exercise 1). This completes the deduction of Theorem 3 from Theorem 2.

– Other recurrence theorems and their combinatorial counterparts –

The Furstenberg correspondence principle can be extended to relate several other recurrence theorems to their combinatorial analogues. We give some representative examples here (without proofs). Firstly, there is a multidimensional version of Szemerédi’s theorem (compare with Exercise 7 from Lecture 4):

Theorem 4 (Multidimensional Szemerédi theorem). Let , let , and let be a set of positive upper Banach density (which means that , where ). Then A contains a pattern of the form for some and .

Note that Theorem 1 corresponds to the special case when and .

This theorem was first proven by Furstenberg and Katznelson, who deduced it via the correspondence principle from the following generalisation of Theorem 2:

Theorem 5 (Recurrence for multiple commuting shifts). Let be an integer, let be a probability space, let be measure-preserving bimeasurable maps which commute with each other, and let E be a set of positive measure. Then there exists such that is non-empty.

**Exercise 5. **Show that Theorem 4 and Theorem 5 are equivalent.

**Exercise 6. **State an analogue of Theorem 3 for multiple commuting shifts, and prove that it is equivalent to Theorem 5.

There is also a polynomial version of these theorems (cf. Theorem 1 from Lecture 5), which we will also state in general dimension:

Theorem 6 (Multidimensional polynomial Szemerédi theorem). Let , let be polynomials with , and let be a set of positive upper Banach density. Then A contains a pattern of the form for some and .

This theorem was established by Bergelson and Leibman, who deduced it from

Theorem 7 (Polynomial recurrence for multiple commuting shifts). Let k, , , E be as in Theorem 5, and let be as in Theorem 6. Then there exists such that is non-empty, where we adopt the convention (thus we are making the action of on X explicit).

**Exercise 7.** Show that Theorem 6 and Theorem 7 are equivalent.

**Exercise 8. **State an analogue of Theorem 3 for polynomial recurrence for multiple commuting shifts, and prove that it is equivalent to Theorem 7. (Hint: first establish this in the case that each of the are monomials, in which case there is enough dilation symmetry to use the Varnavides averaging trick. Interestingly, if one only restricts attention to one-dimensional systems k=1, it does not seem possible to deduce the uniform polynomial recurrence theorem from the non-uniform polynomial recurrence theorem, thus indicating that the averaging trick is less universal in its applicability than the correspondence principle.)

In the above theorems, the underlying action was given by either the integer group or the lattice group . It is not too difficult to generalise these results to the semigroups and (thus dropping the assumption that the shift maps are invertible), by using a trick similar to that used in Exercise 9 of Lecture 4, or by using the correspondence principle back and forth a few times. A bit more surprisingly, it is possible to extend these results to even weaker objects than semigroups. To describe this we need some more notation.

Define a *partial semigroup* to be a set G together with a partially defined multiplication operation for some subset , which is associative in the sense that whenever is defined, then is defined and equal to , and vice versa. A good example of a partial semigroup is the finite subsets of a fixed set S, where the multiplication operation is disjoint union, or more precisely when A and B are disjoint, and is undefined otherwise.
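This example is easy to model directly; the following sketch (names are my own ad hoc choices) implements the partial product on finite sets and checks the associativity axiom, in the "both sides defined and equal, or both undefined" form, on a few small sets:

```python
# A minimal model of the partial semigroup of finite sets under disjoint
# union: a * b is defined only when a and b are disjoint.

def pmul(a, b):
    """Partial product: disjoint union, or None when undefined."""
    return a | b if a.isdisjoint(b) else None

def assoc_holds(a, b, c):
    """(a*b)*c and a*(b*c) are both defined and equal, or both undefined."""
    ab = pmul(a, b)
    bc = pmul(b, c)
    left = pmul(ab, c) if ab is not None else None
    right = pmul(a, bc) if bc is not None else None
    return left == right

sets = [frozenset(), frozenset({1}), frozenset({2}),
        frozenset({1, 2}), frozenset({3})]
assert all(assoc_holds(a, b, c) for a in sets for b in sets for c in sets)
```

Both sides are defined exactly when a, b, c are pairwise disjoint, in which case both equal the three-fold union; this is the associativity required of a partial semigroup.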

**Remark 2.** One can extend a partial semigroup to be a genuine semigroup by adjoining a new element to G, and redefining multiplication to equal if it was previously undefined (or if one of a or b was already equal to ). However, we will avoid using this trick here, as it tends to complicate the notation a little.

One can take Cartesian products of partial semigroups in the obvious manner to obtain more partial semigroups. In particular, we have the partial semigroup for any , defined as the collection of d-tuples of finite sets of natural numbers (not necessarily disjoint), with the partial semigroup law whenever and are disjoint for each .

If is a probability space and is a partial semigroup, we define a *measure-preserving action* of G on X to be an assignment of a measure-preserving transformation (not necessarily invertible) to each , such that whenever is defined.

An action T of on X is known as an *IP system* on X; it is generated by a countable number of commuting measure-preserving transformations, with . (Admittedly, it is possible that the action of the empty set is not necessarily the identity, but this turns out to have a negligible impact on matters.) An action T of is then a collection of d simultaneously commuting IP systems.

Furstenberg and Katznelson showed the following generalisation of Theorem 5:

Theorem 8 (IP multiple recurrence theorem). Let T be an action of on a probability space . Then there exists a non-empty set such that is non-empty, where is the group element which equals A in the position and is the empty set otherwise.

It has a number of combinatorial consequences, such as the following strengthening of Szemerédi’s theorem:

Theorem 9 (IP Szemerédi theorem). Let A be a set of integers of positive upper density, let , and let be infinite. Then A contains an arithmetic progression of length k in which r lies in FS(B), the set of finite sums of B (cf. Hindman’s theorem from Lecture 5).

(There is also a multidimensional version of this theorem, but it requires a fair amount of notation to state properly.)

**Exercise 9. **Deduce Theorem 9 from Theorem 8.

**Exercise 10.** Using Theorem 9, show that for any k, and any set of integers A of positive upper density, the set of steps r which occur in the arithmetic progressions in A of length k is syndetic.

**Exercise 11.** Show, using Theorem 8, that if is a finite field, and is the canonical vector space over spanned (in the algebraic sense) by a countably infinite number of basis vectors, then any subset A of of positive upper Banach density (which means that ) contains affine subspaces of arbitrarily high dimension.

The IP recurrence theorem is already very powerful, but even stronger theorems are known. For instance, Furstenberg and Katznelson established the following deep strengthening of the Hales-Jewett theorem (Theorem 8 from Lecture 5), as well as of Exercise 11 above:

Theorem 10 (Density Hales-Jewett theorem). Let A be a finite alphabet. If E is a subset of of positive upper Banach density, then E contains a combinatorial line.

This theorem was deduced (via an advanced form of the correspondence principle) from a somewhat complicated recurrence theorem which we will not state here; rather than the action of a group, semigroup, or partial semigroup, one instead works with an ensemble of sets (as in Exercise 3), and furthermore one regularises the system of the probability space and set ensemble (which can collectively be viewed as a random process) to be what Furstenberg and Katznelson call a *strongly stationary process*, which (very) roughly means that the statistics of this process look “the same” when restricted to any combinatorial subspace of a fixed dimension.

**Remark 3.** Similar correspondence principles can be established connecting property testing results for graphs and hypergraphs to the measure theory of exchangeable measures: see this paper of myself, and of myself and Austin, for details. There is also a correspondence principle connecting ergodic convergence theorems with a (rather complicated looking) finitary analogue; see the papers of Avigad-Gerhardy-Towsner and of myself on this subject. Finally, we have implicitly been using a similar correspondence principle between topological dynamics and colouring Ramsey theorems in our previous lectures (in particular Lecture 3, Lecture 4, and Lecture 5).

**Remark 4**. The Furstenberg correspondence principle also comes tantalisingly close to deducing my theorem with Ben that the primes contain arbitrarily long arithmetic progressions from Szemerédi’s theorem. More precisely, the correspondence principle shows that any subset A of a *genuinely* random set B of integers with logarithmic-type density, with A having positive *relative* upper density with respect to B, contains arbitrarily long arithmetic progressions; see this unpublished note of myself. Unfortunately, the almost primes are not known to obey quite enough “correlation conditions” to behave sufficiently pseudorandomly for these arguments to apply to the primes, though perhaps there is still a “softer” way to prove our theorem than the way we did it (there is for instance some recent work by Trevisan, Tulsiani, and Vadhan in this direction).

Scholarpedia seems to be an interesting experiment, trying to blend the collaborative and dynamic strengths of the wiki system with the traditional and static strengths of the peer-review system. At any rate, any feedback on my article with Ben, either at the Scholarpedia page or here, would be welcome.

[*Update*, July 9: the article has been reviewed, modified, and accepted in just three days - a blindingly fast speed as far as peer review goes!]

- *Combinatorial number theory*, which seeks to find patterns in unstructured dense sets (or colourings) of integers;
- *Ergodic theory* (or more specifically, multiple recurrence theory), which seeks to find patterns in positive-measure sets under the action of a discrete dynamical system on probability spaces (or more specifically, measure-preserving actions of the integers );
- *Graph theory*, or more specifically the portion of this theory concerned with finding patterns in large unstructured dense graphs; and
- *Ergodic graph theory*, which is a very new and undeveloped subject, which roughly speaking seems to be concerned with the patterns within a measure-preserving action of the infinite permutation group , which is one of several models we have available to study infinite “limits” of graphs.

The two “discrete” (or “finitary”, or “quantitative”) fields of combinatorial number theory and graph theory happen to be related to each other, basically by using the Cayley graph construction; I will give an example of this shortly. The two “continuous” (or “infinitary”, or “qualitative”) fields of ergodic theory and ergodic graph theory are at present only related on the level of analogy and informal intuition, but hopefully some more systematic connections between them will appear soon.

On the other hand, we have some very rigorous connections between combinatorial number theory and ergodic theory, and also (more recently) between graph theory and ergodic graph theory, basically by the procedure of viewing the infinitary continuous setting as a limit of the finitary discrete setting. These two connections go by the names of the *Furstenberg correspondence principle* and the *graph correspondence principle* respectively. These principles allow one to tap the power of the infinitary world (for instance, the ability to take limits and perform completions or closures of objects) in order to establish results in the finitary world, or at least to take the *intuition* gained in the infinitary world and transfer it to a finitary setting. Conversely, the finitary world provides an excellent model setting to refine one’s understanding of infinitary objects, for instance by establishing quantitative analogues of “soft” results obtained in an infinitary manner. I will remark here that this best-of-both-worlds approach, borrowing from both the finitary and infinitary traditions of mathematics, was absolutely necessary for Ben Green and me in order to establish our result on long arithmetic progressions in the primes. In particular, the infinitary setting is excellent for being able to rigorously define and study concepts (such as structure or randomness) which are much “fuzzier” and harder to pin down exactly in the finitary world.

Let me first discuss the connection between combinatorial number theory and graph theory. We can illustrate this connection with two classical results from the former and latter field respectively:

- Schur’s theorem: If the positive integers are coloured using finitely many colours, then one can find positive integers x, y such that x, y, x+y all have the same colour.
- Ramsey’s theorem: If an infinite complete graph is edge-coloured using finitely many colours, then one can find a triangle all of whose edges have the same colour.

(In fact, both of these theorems can be generalised to say much stronger statements, but we will content ourselves with just these special cases). It is in fact easy to see that Schur’s theorem is deducible from Ramsey’s theorem. Indeed, given a colouring of the positive integers, one can create an infinite coloured complete graph (the *Cayley graph *associated to that colouring) whose vertex set is the integers , and such that an edge {a,b} with a < b is coloured using the colour assigned to b-a. Applying Ramsey’s theorem, together with the elementary identity (c-a) = (b-a) + (c-b), we then quickly deduce Schur’s theorem.
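This deduction is entirely finitary and can be carried out mechanically; the sketch below (my own illustration, with ad hoc names) colours the edges of the Cayley graph by differences, hunts for a monochromatic triangle, and reads off a Schur triple. Since R(3,3) = 6, six vertices already suffice for two colours:

```python
# Schur triples from monochromatic triangles in the Cayley graph: colour
# the edge {a,b} (a < b) by the colour of b - a; a monochromatic triangle
# a < b < c yields x = b-a, y = c-b with x, y, x+y all the same colour,
# via the identity (c-a) = (b-a) + (c-b).
from itertools import combinations, product

def schur_triple(colouring):
    """Given colouring[k] for 1 <= k <= n, return (x, y) with
    colour(x) == colour(y) == colour(x+y), or None if there is none."""
    n = len(colouring)
    for a, b, c in combinations(range(n + 1), 3):   # vertices 0, 1, ..., n
        if colouring[b - a] == colouring[c - b] == colouring[c - a]:
            return b - a, c - b
    return None

# For 2 colours, Schur's theorem already bites at n = 5: every 2-colouring
# of {1,...,5} admits a monochromatic triple x, y, x+y (here by Ramsey's
# theorem, since R(3,3) = 6 forces a monochromatic triangle on 6 vertices).
for bits in product((0, 1), repeat=5):
    colouring = {k + 1: bits[k] for k in range(5)}
    assert schur_triple(colouring) is not None
```

Note that n = 5 is sharp: the 2-colouring with classes {1,4} and {2,3} of {1,…,4} has no monochromatic triple, which matches the Schur number S(2) = 4.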

Let us now turn to ergodic theory. The basic object of study here is a *measure-preserving system* (or *probability-preserving system*), which is a probability space (i.e. a set X equipped with a sigma-algebra of measurable sets and a probability measure on that sigma-algebra), together with a shift map , which for simplicity we shall take to be invertible and bi-measurable (so its inverse is also measurable); in particular we have iterated shift maps for any integer n, giving rise to an action of the integers . The important property we need is that the shift map is measure-preserving, thus for all measurable sets E.

In the last lecture we saw that sets of integers could be divided (rather informally) into structured sets, pseudorandom sets, and hybrids between the two. The same is true in ergodic theory – and this time, one can in fact make these notions extremely precise. Let us first start with some examples:

- The *circle shift*, in which $X = \mathbb{R}/\mathbb{Z}$ is the standard unit circle with normalised Haar measure, and $Tx := x + \alpha$ for some fixed real number $\alpha$. If we identify $X$ with the unit circle in the complex plane via the standard identification $x \mapsto e^{2\pi i x}$, then the shift corresponds to an anti-clockwise rotation by $2\pi\alpha$. This is a very structured system, and corresponds in combinatorial number theory to *Bohr sets* such as $\{ n : \|n\alpha\| \leq \varepsilon \}$, which implicitly made an appearance in the previous lecture.
- The *two-point shift*, in which $X := \{0,1\}$ with uniform probability measure, and $T$ simply interchanges 0 and 1. This very structured system corresponds to the set $A$ of odd numbers (or of even numbers) mentioned in the previous lecture. More generally, any permutation on a finite set gives rise to a simple measure-preserving system.
- The *skew shift*, in which $X = (\mathbb{R}/\mathbb{Z})^2$ is the 2-torus with normalised Haar measure, and $T(x,y) := (x + \alpha, y + x)$ for some fixed real number $\alpha$. If we just look at the behaviour of the $x$-component of this torus we see that the skew shift contains the circle shift as a *factor*, or equivalently that the skew shift is an *extension* of the circle shift (in this particular case, since the fibres are circles and the action on the fibres is rotation, we call this a *circle extension* of the circle shift). This system is also structured (but in a more complicated way than the previous two shifts), and corresponds to quadratically structured sets such as the *quadratic Bohr set* $\{ n : \|n^2 \alpha\| \leq \varepsilon \}$, which made an appearance in the previous lecture.
- The *Bernoulli shift*, in which $X = \{0,1\}^{\mathbb{Z}}$ is the space of infinite 0-1 sequences (or equivalently, the space $2^{\mathbb{Z}}$ of all sets of integers), equipped with the uniform product probability measure, and $T$ is the left shift $T(a_n)_{n \in \mathbb{Z}} := (a_{n+1})_{n \in \mathbb{Z}}$. This is a very random system, corresponding to the random sets $B$ discussed in the previous lecture.
- Hybrid systems, e.g. products of a circle shift and a Bernoulli shift, or extensions of a circle shift by a Bernoulli system, a doubly skew shift (a circle extension of a circle extension of a circle shift), etc.
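The structured examples above can be probed numerically. The following sketch (parameters are my own choices for illustration, not from the lecture) iterates the circle shift and the skew shift and checks that their orbits spend about half their time in the lower half of the circle, as equidistribution (Weyl's theorem) predicts:

```python
import math

# Numerical illustration: orbits of the circle shift x -> x + alpha (mod 1)
# equidistribute for irrational alpha, and the skew shift
# (x, y) -> (x + alpha, y + x) equidistributes its y-coordinate along a
# quadratic phase, as predicted by Weyl's theorem.
alpha = math.sqrt(2) - 1      # an irrational rotation number
N = 100000

# Circle shift: the orbit of 0 should spend about half its time in [0, 1/2).
x, hits = 0.0, 0
for _ in range(N):
    if x < 0.5:
        hits += 1
    x = (x + alpha) % 1.0
circle_freq = hits / N

# Skew shift: iterate (x, y) -> (x + alpha, y + x) and watch the y-orbit,
# which follows the quadratic sequence n(n-1)*alpha/2 (mod 1).
x, y, hits = 0.0, 0.0, 0
for _ in range(N):
    if y < 0.5:
        hits += 1
    x, y = (x + alpha) % 1.0, (y + x) % 1.0
skew_freq = hits / N
```

Both frequencies come out close to $1/2$; the skew-shift case is the quadratic equidistribution that underlies its correspondence with quadratic Bohr sets.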

One can classify these systems in precise terms according to how the shift action moves sets E around. On the one hand, we have some well-defined notions which represent structure:

- *Trivial* systems are such that $T^n E = E$ for all $E$ and all $n$.
- *Periodic* systems are such that for every $E$, there exists a positive $n$ such that $T^n E = E$. The two-point shift is an example, as is the circle shift when $\alpha$ is rational.
- *Almost periodic* or *compact* systems are such that for every $E$ and every $\varepsilon > 0$, there exists a positive $n$ such that $T^n E$ and $E$ differ by a set of measure at most $\varepsilon$. The circle shift is a good example of this (thanks to the equidistribution theorem). The term “compact” is used because there is an equivalent characterisation of compact systems, namely that the orbits $\{ T^n f : n \in \mathbb{Z} \}$ of the shift in $L^2(X)$ are always precompact in the strong topology.

On the other hand, we have some well-defined terms which represent pseudorandomness:

- *Strongly mixing* systems are such that for every $E, F$, we have $\mu(E \cap T^n F) \to \mu(E) \mu(F)$ as $n$ tends to infinity; the Bernoulli shift is a good example. Informally, this is saying that shifted sets become asymptotically independent of unshifted sets.
- *Weakly mixing* systems are such that for every $E, F$, we have $\mu(E \cap T^n F) \to \mu(E) \mu(F)$ as $n$ tends to infinity after excluding a set of exceptional values of $n$ of asymptotic density zero. For technical reasons, weak mixing is a better notion to use in the structure-randomness dichotomy than strong mixing (for much the same reason that one always wants to allow negligible sets of measure zero in measure theory).
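The mixing condition can also be seen numerically. The sketch below (my own illustration, using the doubling map rather than the two-sided Bernoulli shift of the text; the doubling map acts as a one-sided Bernoulli shift on binary digits) estimates $\mu(E \cap T^{-n}F)$ by Monte Carlo for the cylinder-type events $E = F = [0, 1/2)$:

```python
import random

# Monte Carlo sketch: the doubling map x -> 2x (mod 1) shifts the binary
# digits of x, i.e. it is a one-sided Bernoulli shift.  With
# E = F = [0, 1/2), strong mixing predicts
#     mu(E ∩ T^{-n} F) -> mu(E) mu(F) = 1/4.
random.seed(1)
trials = 200000
n = 10                              # a fixed shift distance
count = 0
for _ in range(trials):
    x = random.random()             # a mu-random point of the circle
    Tnx = ((2 ** n) * x) % 1.0      # apply the doubling map n times
    if x < 0.5 and Tnx < 0.5:
        count += 1
estimate = count / trials           # empirical mu(E ∩ T^{-n} F)
```

The estimate lands near $1/4 = \mu(E)\mu(F)$: the event $\{x < 1/2\}$ depends only on the first binary digit, while its $n$-fold shift depends on the $(n+1)$-th, and these are independent.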

There are also more complicated (but well-defined) hybrid notions of structure and randomness which we will not give here. We will however briefly discuss the situation for the skew shift. This shift is not almost periodic: most sets $A$ will become increasingly “skewed” as they get shifted, and will never return to resemble themselves again. However, if one restricts attention to the underlying circle shift factor (i.e. restricting attention only to those sets which are unions of vertical fibres), then one recovers almost periodicity. Furthermore, the skew shift is almost periodic *relative* to the underlying circle shift, in the sense that while the shifts of a given set $A$ do not return to resemble $A$ globally, they do return to resemble $A$ when restricted to any fixed vertical fibre (this can be shown using the method of Weyl sums from Fourier analysis). Because of this, we say that the skew shift is a *compact extension* of a compact system.

As discussed in the above examples, every dynamical system is capable of generating some interesting sets of integers, specifically *recurrence sets* $\{ n \in \mathbb{Z} : T^n x \in E \}$, where $E$ is a set in $X$ and $x$ is a point in $X$. This set actually captures much of the dynamics of $E$ in the system (especially if $X$ is ergodic and $x$ is “generic”). The *Furstenberg correspondence principle* reverses this procedure, starting with a set of integers $A$ and using that to generate a dynamical system which “models” that set in a certain way. Modulo some minor technicalities, it works as follows.

- As with the Bernoulli shift, we work in the space $X = 2^{\mathbb{Z}}$ of all sets of integers, with the product sigma-algebra and the left shift; but we leave the probability measure $\mu$ (which can be interpreted as the distribution of a certain random subset of the integers) undefined for now. The original set $A$ can now be interpreted as a single point inside $X$.
- Now pick a large number $N$, and shift $A$ backwards and forwards up to $N$ times, giving rise to $2N+1$ sets $T^{-N} A, \ldots, T^N A$, which can be thought of as $2N+1$ points inside $X$. We consider the uniform distribution on these points, i.e. we shift $A$ by a random amount between $-N$ and $N$. This gives rise to a discrete probability measure $\mu_N$ on $X$ (which is only supported on $2N+1$ points inside $X$). Each of these measures is approximately invariant under the shift $T$.
- We now let $N$ go to infinity. We apply the (sequential form of the) Banach-Alaoglu theorem, which among other things shows that the space of Borel probability measures on a compact Hausdorff space (which $X$ is) is sequentially compact in the weak-* topology. (This particular version of Banach-Alaoglu can in fact be established by a diagonalisation argument which completely avoids the axiom of choice.) Thus we can find a subsequence of the measures $\mu_{N_j}$ which converge in the weak-* topology to a limit $\mu$ (this subsequence and limit may not be unique, but this will not concern us). Since the $\mu_N$ are approximately invariant under $T$, with the degree of approximation improving with $N$, one can easily show that the limit measure $\mu$ is shift-invariant.
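The first two steps of this recipe can be made concrete in a finitary way. The sketch below (an illustration with $A$ taken to be the squares, my own choice) evaluates the measures $\mu_N$ on basic cylinder events and checks the approximate shift invariance:

```python
# A finitary sketch of the correspondence principle: evaluate mu_N on the
# basic cylinder event E_k = {B : k in B}.  Averaging over the 2N+1 shifts
# of A, mu_N(E_k) is just the density of A in a window around k, and mu_N
# is shift-invariant up to an O(1/N) error.
A = {m * m for m in range(1000)}       # e.g. A = the set of squares

def mu_N(k, N):
    # mu_N of {B : k in B}: the fraction of the shifts A + j, |j| <= N,
    # that contain k, i.e. the density of {j : k - j in A} in [-N, N].
    return sum(1 for j in range(-N, N + 1) if (k - j) in A) / (2 * N + 1)

N = 500
# The left shift pulls the event {0 in B} back to {1 in B}, so approximate
# invariance means mu_N(E_0) and mu_N(E_1) agree up to O(1/N).
invariance_error = abs(mu_N(0, N) - mu_N(1, N))
```

Only the two boundary shifts $j = \pm N$ can contribute to the discrepancy, so the error is at most $2/(2N+1)$, which vanishes as $N \to \infty$; this is exactly why the weak-* limit $\mu$ is genuinely shift-invariant.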

By using this recipe to construct a measure-preserving system from a set of integers, it is possible to deduce theorems in combinatorial number theory from those in ergodic theory (similarly to how the Cayley graph construction allowed one to deduce theorems in combinatorial number theory from those in graph theory). The most famous example of this concerns the following two deep theorems:

- Szemerédi’s theorem: If A is a set of integers of positive upper density, and k is a positive integer, then A contains infinitely many arithmetic progressions of length k. (Note that the case k=2 is trivial.)
- Furstenberg’s recurrence theorem: If $E$ is a set of positive measure in a measure-preserving system, and $k$ is a positive integer, then there are infinitely many integers $n$ for which $\mu( E \cap T^n E \cap T^{2n} E \cap \cdots \cap T^{(k-1)n} E ) > 0$. (Note that the case k=2 is the more classical Poincaré recurrence theorem).

Using the above correspondence principle (or a slight variation thereof), it is not difficult to show that the two theorems are in fact equivalent; see for instance Furstenberg’s book on the subject. The power of these two theorems derives from the fact that the former works for *arbitrary* sets of positive density, and the latter works for *arbitrary* measure-preserving systems – there are essentially no structural assumptions on the basic object of study in either, and it is therefore quite remarkable that one can still conclude such a non-trivial result.

The story of Szemerédi’s theorem is quite a long one, which I have discussed in many previous places, and will not do so again here, though I will note that all the proofs of this theorem exploit the dichotomy between structure and randomness (and there are some good reasons for this – the underlying cause of arithmetic progressions is totally different in the structured and pseudorandom cases). I will however briefly describe how Furstenberg’s recurrence theorem is proven (following the approach of Furstenberg, Katznelson, and Ornstein; there are a couple of other ergodic theoretic proofs, including of course Furstenberg’s original proof). The first major step is to establish the *Furstenberg structure theorem*, which takes an arbitrary measure-preserving system and describes it as a suitable hybrid of a compact system and a weakly mixing system (or more precisely, a weakly mixing extension of a transfinite tower of compact extensions). This theorem relies on Zorn’s lemma, although it is possible to give a proof of the recurrence theorem without recourse to the axiom of choice. The proof requires various tools from infinitary analysis (e.g. the compactness of integral operators) but is relatively straightforward. Next, one makes the rather simple observation that the Furstenberg recurrence theorem is easy to show both for compact systems and for weakly mixing systems. In the former case, the almost periodicity shows that there are lots of integers $n$ for which $T^n A$ is almost identical with $A$ (in the sense that they differ by a set of small measure) – which, after shifting by $n$ again, implies that $T^{2n} A$ is almost identical with $T^n A$, and so forth – which soon makes it easy to arrange matters so that $A \cap T^n A \cap \cdots \cap T^{(k-1)n} A$ is non-empty. In the latter case, the weak mixing shows that for most $n$, the sets (or “events”) $A$ and $T^n A$ are almost uncorrelated (or “independent”); similarly, for any fixed $m$, we have $A$ and $T^{mn} A$ almost uncorrelated for $n$ large enough.
By using the Cauchy-Schwarz inequality (in the form of a useful lemma of van der Corput) repeatedly, we can eventually show that $A, T^n A, \ldots, T^{(k-1)n} A$ are almost *jointly* independent (as opposed to being merely almost pairwise independent) for many $n$, at which point the recurrence theorem is easy to show. It is somewhat more tricky to show that one can also combine these arguments with each other to show that the recurrence property also holds for the transfinite combinations of compact and weakly mixing systems that come out of the Furstenberg structure theorem, but it can be done with a certain amount of effort, and this concludes the proof of the recurrence theorem. This same method of proof turns out, with several additional technical twists, to establish many further varieties of recurrence theorems, which in turn (via the correspondence principle) gives several powerful results in combinatorial number theory, several of which continue to have no non-ergodic proof even today.
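For reference, one standard Hilbert-space formulation of the van der Corput lemma invoked here is the following (a standard statement of the lemma, not quoted verbatim from the lecture; the proof is essentially two applications of Cauchy-Schwarz):

```latex
% van der Corput lemma (Hilbert space form): for any bounded sequence
% (u_n) of vectors in a Hilbert space,
\limsup_{N \to \infty} \Bigl\| \frac{1}{N} \sum_{n=1}^{N} u_n \Bigr\|^2
\;\leq\; \limsup_{H \to \infty} \frac{1}{H} \sum_{h=1}^{H}
\limsup_{N \to \infty} \Bigl| \frac{1}{N} \sum_{n=1}^{N}
\langle u_{n+h}, u_n \rangle \Bigr|
```

Applying this with $u_n$ built from shifts of (the indicator of) $A$ is what upgrades pairwise near-independence to the joint near-independence used above.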

(There has also been a significant amount of progress more recently by several ergodic theorists in understanding the “structured” side of the Furstenberg structure theorem, in which dynamical notions of structure, such as compactness, have been converted into algebraic and topological notions of structure, in particular into the actions of nilpotent Lie groups on their homogeneous spaces. This is an important development, and is closely related to the polynomial and generalised polynomial sequences appearing in the previous talk, but it would be beyond the scope of this talk to discuss it here.)

Let us now leave ergodic theory and return to graph theory. Given the power of the Furstenberg correspondence principle, it is natural to look for something similar in graph theory, which would connect up results in finitary graph theory with some infinitary variant. A typical candidate for a finitary graph theory result that one would hope to do this for is the triangle removal lemma, which was discussed in a recent blog post here. That lemma is in fact closely connected with Szemerédi’s theorem, indeed it implies the k=3 case of that theorem (i.e. Roth’s theorem) in much the same way that Ramsey’s theorem implies Schur’s theorem. It does turn out that this is possible, although the infinitary analogues of things like the triangle removal lemma are a little strange-looking (one such analogue can be found in this paper; another can be found in this one). But it is easier to describe the concept of a graph limit. There are several equivalent formulations of this limit, including the notion of a “graphon” introduced by Lovász and Szegedy, the flag algebra construction introduced by Razborov, and the notion of a permutation-invariant measure space introduced by myself. I will discuss my own construction here, which is closely modelled on the Furstenberg correspondence principle. It starts with a sequence of graphs $G_n$ (which one should think of as getting increasingly large, while remaining dense) and extracts a limit object, which is a probability space together with an action of the group $S_\infty$ of permutations of the integers, as follows.

- We let $X$ be the space of all graphs on the integers, with the standard product (i.e. weak) topology, and hence product sigma-algebra. This space has an obvious action of the permutation group $S_\infty$, formed by permuting the vertices.
- Each graph $G_n$ generates a random graph on the integers – or equivalently, a probability measure $\mu_n$ on $X$ – as follows. We randomly and independently sample the vertices of the graph infinitely often, creating a sequence $x_1, x_2, x_3, \ldots$ of vertices in the graph $G_n$. (Of course, many of these vertices will collide, but this will not be important for us.) This then creates a random graph on the integers, with any two integers $i$ and $j$ connected by an edge if their associated vertices $x_i, x_j$ are distinct and are connected by an edge in $G_n$. By construction, the probability measure $\mu_n$ associated to this graph is already $S_\infty$-invariant.
- We then let n go to infinity, and extract a weak limit just as with the Furstenberg correspondence principle.
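The sampling step can be simulated directly. The following sketch (a fixed ten-vertex example graph of my own choosing) samples vertices independently and checks the advertised exchangeability: every pair of integer labels sees an edge with the same probability, namely the edge density of the graph:

```python
import random

# Sketch of the sampling step: sample the vertices of a fixed finite graph
# independently and uniformly, and read off the induced random graph on
# integer labels.  The law of this random graph is invariant under permuting
# the labels; in particular every pair of labels (i, j) sees an edge with
# the same probability, namely the edge density 2|E|/m^2 of the graph.
random.seed(2)
m = 10
vertices = range(m)
edges = {(u, v) for u in range(m) for v in range(u + 1, m)
         if (u + v) % 3 != 0}          # a fixed example graph (30 edges)

def adjacent(u, v):
    return u != v and (min(u, v), max(u, v)) in edges

trials = 50000
freq = {(0, 1): 0, (3, 7): 0}          # two different pairs of labels
for _ in range(trials):
    sample = [random.choice(vertices) for _ in range(8)]
    for (i, j) in freq:
        if adjacent(sample[i], sample[j]):
            freq[(i, j)] += 1
density_01 = freq[(0, 1)] / trials
density_37 = freq[(3, 7)] / trials     # both approach 2|E|/m^2 = 0.6
```

Repeating this for each graph in the sequence and passing to a weak-* limit, exactly as in the Furstenberg correspondence principle, yields the $S_\infty$-invariant limit measure.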

It is then possible to prove results somewhat analogous to the Furstenberg structure theorem and Furstenberg recurrence theorem in this setting, and use this to prove several results in graph theory (as well as its more complicated generalisation, hypergraph theory). I myself am optimistic that by transferring more ideas from traditional ergodic theory into this new setting of “ergodic graph theory”, one could obtain a new tool for systematically establishing a number of other qualitative results in graph theory, particularly those which are traditionally reliant on the Szemerédi regularity lemma (which is almost a qualitative result itself, given how poor the bounds are). This is however still a work in progress.

Perhaps my favourite open question is the problem on the maximal size of a *cap set* – a subset of $\mathbb{F}_3^n$ ($\mathbb{F}_3$ being the finite field of three elements) which contains no lines, or equivalently no non-trivial arithmetic progressions of length three. As an upper bound, one can easily modify the proof of Roth’s theorem to show that cap sets must have size $O(3^n/n)$ (see e.g. this paper of Meshulam). This of course is better than the trivial bound of $3^n$ once n is large. In the converse direction, the trivial example $\{0,1\}^n$ shows that cap sets can be as large as $2^n$; the current world record of about $(2.2174)^n$ is held by Edel. The gap between these two bounds is rather enormous; I would be very interested in either an improvement of the upper bound to $o(3^n/n)$, or an improvement of the lower bound to $(3-o(1))^n$. (I believe both improvements are true, though a good friend of mine disagrees about the improvement to the lower bound.)
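The cap set condition is easy to check by brute force in small dimensions. The sketch below (an illustration in $\mathbb{F}_3^2$, my own example) uses the fact that three distinct points form a line iff they sum to zero coordinate-wise mod 3:

```python
from itertools import product, combinations

# Brute-force sketch: in F_3^n, three distinct points x, y, z form a line
# iff x + y + z = 0 coordinate-wise (mod 3), which is the same as being a
# nontrivial three-term arithmetic progression.
def is_cap_set(points, n):
    pts = set(tuple(p) for p in points)
    for x, y in combinations(sorted(pts), 2):
        z = tuple((-x[i] - y[i]) % 3 for i in range(n))  # third point of line
        if z != x and z != y and z in pts:
            return False
    return True

# The set {0,1}^2 is a cap of size 2^2, matching the trivial lower bound:
cap = [(0, 0), (0, 1), (1, 0), (1, 1)]
ok = is_cap_set(cap, 2)                     # True

# Adding (2, 2) completes the line through (0, 0) and (1, 1):
still_cap = is_cap_set(cap + [(2, 2)], 2)   # False

# Exhaustive search over all subsets: the maximum cap size in F_3^2 is 4.
all_pts = list(product(range(3), repeat=2))
max_cap = max(r for r in range(1, 10)
              if any(is_cap_set(S, 2) for S in combinations(all_pts, r)))
```

(This exhaustive approach is of course hopeless for large $n$; the whole difficulty of the problem is the exponential gap between the constructions and the counting bound.)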

One reason why I find this question important is that it serves as an excellent model for the analogous question of finding large sets without progressions of length three in the interval $\{1, \ldots, N\}$. Here, the best upper bound of $O( N (\log\log N / \log N)^{1/2} )$ is due to Bourgain (he also has a recent, not yet published, improvement to roughly $O( N (\log\log N / \log N)^{2/3} )$), while the best lower bound of $N \exp( - C \sqrt{\log N} )$ is an ancient result of Behrend. Using the finite field heuristic that $\mathbb{F}_3^n$ “behaves like” $\{1, \ldots, 3^n\}$, we see that the Bourgain bound should be improvable to $O(N / \log N)$, whereas the Edel bound should be improvable to something like $3^n \exp( - C \sqrt{n} )$. However, neither argument extends easily to the other setting. Note that a (still open) conjecture of Erdős-Turán is essentially equivalent (for progressions of length three, up to log log factors) to the problem of improving the Bourgain bound to $O(N/\log N)$.
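The integer analogue of the $\{0,1\}^n$ cap is the set of integers whose base-3 digits are all 0 or 1, which is 3AP-free for the same carry-free reason (far weaker than Behrend's construction, but easy to verify). A sketch (illustration only, with my own choice of range):

```python
# The integers in [0, 3^6) whose base-3 expansion uses only digits 0 and 1
# contain no nontrivial three-term progression: x + z = 2y with carry-free
# digit sums forces x = z digit by digit.  This gives ~ N^{log 2 / log 3}
# elements of [0, N), far below Behrend's bound, but easy to check.
def base3_digits(m):
    digits = []
    while m:
        digits.append(m % 3)
        m //= 3
    return digits

N = 3 ** 6
S = [m for m in range(N) if all(d in (0, 1) for d in base3_digits(m))]
Sset = set(S)

# Look for x < y with the third term z = 2y - x also in the set.
has_3ap = any((2 * y - x) in Sset
              for i, x in enumerate(S) for y in S[i + 1:])
```

The search finds no progression, and $|S| = 2^6$, illustrating why the naive digit construction only reaches $N^{\log_3 2}$ and why Behrend-type (and conjecturally better) constructions are needed.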

The Roth bound of $O(3^n/n)$ appears to be the natural limit of the purely Fourier-analytic approach of Roth, and so any breakthrough would be extremely interesting, as it almost certainly would need a radically new idea. The lower bound might be improvable by some sort of algebraic geometry construction, though it is not clear at all how to achieve this.

(Update, Feb 25: After some feedback and advice, and moving the entire blog to another site, I have finally gotten the math formulae to work out nicely. Thanks for all the help!)

(*Update*, Feb 27: As pointed out in the comments, one can interpret this problem in terms of the wonderful game Set, in which case the problem is to find the largest number of cards one can put on the table for which nobody has a valid move. As far as I know, the best bounds on the cap set problem in small dimensions are the ones cited in the Edel paper mentioned above.)

(*Update*, Mar 5: After discussions with Jordan Ellenberg, we realised that there is a variant formulation of the problem which may be a little bit more tractable. Given any $\delta > 0$, the fewest number of lines in a subset of $\mathbb{F}_3^n$ of density at least $\delta$ is known to be $3^{(c(\delta)+o(1))n}$ for some exponent $c(\delta)$; this is essentially a result of Croot. The reformulated question is then to get as strong a bound on $c(\delta)$ as one can. For instance, explicit constructions give an upper bound on $c(\delta)$, while the Roth-Meshulam argument gives a lower bound.)