This post is therefore the final post of the polymath5 project. I refer you to Terry’s posts for the mathematics. I will just make a few comments about what all this says about polymath projects in general.

After the success of the first polymath project, which found a purely combinatorial proof of the density Hales-Jewett theorem, there was an appetite to try something similar. However, the subsequent experience made it look as though the first project had been rather lucky, and not necessarily a good indication of what the polymath approach will typically achieve. I started polymath2, about a Banach-space problem, which never really got off the ground. Gil Kalai started polymath3, on the polynomial Hirsch conjecture, but the problem was not solved. Terence Tao started polymath4, about finding a deterministic algorithm to output a prime between $n$ and $2n$, which did not find such an algorithm but did prove some partial results that were interesting enough to publish in an AMS journal called Mathematics of Computation. I started polymath5, with the aim of solving the Erdős discrepancy problem (after this problem was chosen by a vote from a shortlist that I drew up), and although we had some interesting ideas, we did not solve the problem. The most obviously successful polymath project was polymath8, which aimed to bring down the size of the gap in Zhang’s prime-gaps result, but it could be argued that success for that project was guaranteed in advance: it was obvious that the gap could be reduced, and the only question was how far.

Actually, that last argument is not very convincing, since a lot more came out of polymath8 than just a tightening up of the individual steps of Zhang’s argument. But I want to concentrate on polymath5. I have always felt that that project, despite not solving the problem, was a distinct success, because by the end of it I, and I was not alone, understood the problem far better and in a very different way. So when I discussed the polymath approach with people, I described its virtues as follows: a polymath discussion tends to go at lightning speed through all the preliminary stages of solving a difficult problem — trying out ideas, reformulating, asking interesting variants of the question, finding potentially useful reductions, and so on. With some problems, once you’ve done all that, the problem is softened up and you can go on and solve it. With others, the difficulties that remain are still substantial, but at least you understand far better what they are.

In the light of what has now happened, the second case seems like a very accurate description of the polymath5 project, since Terence Tao used ideas from that project in an essential way, but also recent breakthroughs in number theory by Kaisa Matomäki and Maksim Radziwiłł that led on to work by those authors and Terry himself that led on to the averaged form of the Elliott conjecture that Terry has just proved. Thus, if the proof of the Erdős discrepancy problem in some sense requires these ideas, then there was no way we could possibly have hoped to solve the problem back in 2010, when polymath5 was running, but what we did achieve was to create a sort of penumbra around the problem, which had the effect that when these remarkable results in number theory became available, the application to the Erdős discrepancy problem was significantly easier to spot, at least for Terence Tao …

I’ll remark here that the approach to the problem that excited me most when we were thinking about it was a use of duality to reduce the problem to an existential statement: you “just” have to find a function with certain properties and you are done. Unfortunately, finding such a function proved to be extremely hard. Terry’s work proves abstractly that such a function exists, but doesn’t tell us how to construct it. So I’m left feeling that perhaps I was a bit too wedded to that duality approach, though I also think that it would still be very nice if someone managed to make it work.

There are a couple of other questions that are interesting to think about. The first is whether polymath5 really did play a significant role in the discovery of the solution. Terry refers to the work of polymath5, but one of the key polymath5 steps he uses was contributed by him, so perhaps he could have just done the whole thing on his own.

At the very least I would say that polymath5 got him interested in the problem, and took him quickly through the stage I talked about above of looking at it from many different angles. Also, the Fourier reduction argument that Terry found was a sort of response to observations and speculations that had taken place in the earlier discussion, so it seems likely that in some sense polymath5 played a role in provoking Terry to have the thoughts he did. My own experience of polymath projects is that they often provoke me to have thoughts I wouldn’t have had otherwise, even if the relationship between those thoughts and what other people have written is very hard to pin down — it can be a bit like those moments where someone says A, and then you think of B, which appears to have nothing to do with A, but then you manage to reconstruct your daydreamy thought processes to see that A made you think of C, which made you think of D, which made you think of B.

Another question is what should happen to polymath projects that don’t result in a solution of the problem that they are trying to solve, but do have useful ideas. Shouldn’t there come a time when the project “closes” and the participants (and others) are free to think about the problem individually? I feel strongly that there should, since otherwise there is a danger that a polymath project could actually delay progress on a problem by discouraging research on it. With polymath5 I tried to signal such a “closure” by writing a survey article that was partly about the work of polymath5. And Terry has now written up his work as an individual author, but been careful to say which ingredients of his proof were part of the polymath5 discussion and which were new. That seems to me to be exactly how things should work, but perhaps the lesson for the future is that the closing of a polymath project should be done more explicitly: up to now several of them have just quietly died. I had at one time intended to do rather more than what I did in the survey article, and write up, on behalf of polymath5 and published under the polymath name, a proper paper that would contain the main ideas discovered by polymath5 with full proofs. That would have been a better way of closing the project and would have led to a cleaner situation: Terry could have referred to that paper just as anyone refers to a mathematical paper. But while I regret not getting round to that, I don’t regret it too much, because I also quite like the idea that polymath5’s ideas are freely available on the internet but not in the form of a traditional journal article. (I still think that on balance it would have been better to write up the ideas though.)

Another lesson for the future is that it would be great to have some more polymath projects. We now know that Polymath5 has accelerated the solution of a famous open problem. I think we should be encouraged by this and try to do the same for several other famous open problems, but this time with the idea that as soon as the discussion stalls, the project will be declared to be finished. Gil Kalai has said on his blog that he plans to start a new project: I hope it will happen soon. And at some point when I feel slightly less busy, I would like to start one too, on another notorious problem with an elementary statement. It would be interesting to see whether a large group of people thinking together could find anything new to say about, for example, Frankl’s union-closed conjecture, or the asymptotics of Ramsey numbers, or the cap-set problem, or …

- The logarithmically averaged Chowla and Elliott conjectures for two-point correlations, submitted to Forum of Mathematics, Pi; and
- The Erdős discrepancy problem, submitted to the new arXiv overlay journal, Discrete Analysis (see this recent announcement on Tim Gowers’ blog).

This pair of papers is an outgrowth of these two recent blog posts and the ensuing discussion. In the first paper, we establish the following logarithmically averaged version of the Chowla conjecture (in the case of two-point correlations (or “pair correlations”)):

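Theorem 1 (Logarithmically averaged Chowla conjecture) Let $a_1, a_2$ be natural numbers, and let $b_1, b_2$ be integers such that $a_1 b_2 - a_2 b_1 \neq 0$. Let $1 \leq \omega(x) \leq x$ be a quantity depending on $x$ that goes to infinity as $x \to \infty$. Let $\lambda$ denote the Liouville function. Then one has

$$\sum_{x/\omega(x) < n \leq x} \frac{\lambda(a_1 n + b_1) \lambda(a_2 n + b_2)}{n} = o(\log \omega(x)) \qquad (1)$$

as $x \to \infty$. Thus for instance one has

$$\sum_{n \leq x} \frac{\lambda(n) \lambda(n+1)}{n} = o(\log x). \qquad (2)$$

For comparison, the non-averaged Chowla conjecture would imply that

$$\sum_{n \leq x} \lambda(n) \lambda(n+1) = o(x), \qquad (3)$$

which is a strictly stronger estimate than (2), and remains open.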
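For readers who want to experiment, here is a minimal numerical sketch of the two kinds of average being compared. It assumes that the model case is the pair correlation $\lambda(n)\lambda(n+1)$, weighted by $1/n$ in the logarithmic average and unweighted in the ordinary one; the function names are purely illustrative and not taken from the papers. Numerics at this scale of course say nothing about the $o(\cdot)$ statements themselves.

```python
import math


def liouville_up_to(limit):
    """Return lam with lam[n] = (-1)^Omega(n) for 1 <= n <= limit (lam[0] unused)."""
    spf = list(range(limit + 1))  # spf[n] will hold the smallest prime factor of n
    for p in range(2, math.isqrt(limit) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (limit + 1)
    lam[1] = 1
    for n in range(2, limit + 1):
        # Complete multiplicativity: lambda(n) = -lambda(n / p) for any prime p | n.
        lam[n] = -lam[n // spf[n]]
    return lam


def log_averaged_pair_sum(lam, x):
    """sum_{n <= x} lambda(n) lambda(n+1) / n  (assumed form of the model case)."""
    return sum(lam[n] * lam[n + 1] / n for n in range(1, x + 1))


def plain_pair_sum(lam, x):
    """sum_{n <= x} lambda(n) lambda(n+1)  (the non-averaged analogue)."""
    return sum(lam[n] * lam[n + 1] for n in range(1, x + 1))


if __name__ == "__main__":
    lam = liouville_up_to(100_001)
    for x in (10**3, 10**4, 10**5):
        print(x,
              log_averaged_pair_sum(lam, x) / math.log(x),  # compare with log x
              plain_pair_sum(lam, x) / x)                    # compare with x
```

The two printed ratios are the quantities that the averaged and non-averaged statements respectively assert tend to zero.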

The arguments also extend to completely multiplicative functions other than the Liouville function. In particular, one obtains a slightly averaged version of the non-asymptotic Elliott conjecture that was shown in the previous blog post to imply a positive solution to the Erdős discrepancy problem. The averaged version of the conjecture established in this paper is slightly weaker than the one assumed in the previous blog post, but it turns out that the arguments there can be modified without much difficulty to accept this averaged Elliott conjecture as input. In particular, we obtain an unconditional solution to the Erdős discrepancy problem as a consequence; this is detailed in the second paper listed above. In fact we can also handle the vector-valued version of the Erdős discrepancy problem, in which the sequence takes values in the unit sphere of an arbitrary Hilbert space, rather than in $\{-1,+1\}$.

Estimates such as (2) or (3) are known to be subject to the “parity problem” (discussed numerous times previously on this blog), which roughly speaking means that they cannot be proven solely using “linear” estimates on functions such as the von Mangoldt function. However, it is known that the parity problem can be circumvented using “bilinear” estimates, and this is basically what is done here.

We now describe in informal terms the proof of Theorem 1, focusing on the model case (2) for simplicity. Suppose for contradiction that the left-hand side of (2) was large and (say) positive. Using the multiplicativity , we conclude that

is also large and positive for all primes that are not too large; note here how the logarithmic averaging allows us to leave the constraint unchanged. Summing in , we conclude that

is large and positive for any given set of medium-sized primes. By a standard averaging argument, this implies that

is large for many choices of , where is a medium-sized parameter at our disposal to choose, and we take to be some set of primes that are somewhat smaller than . (A similar approach was taken in this recent paper of Matomaki, Radziwill, and myself to study sign patterns of the Möbius function.) To obtain the required contradiction, one thus wants to demonstrate significant cancellation in the expression (4). As in that paper, we view as a random variable, in which case (4) is essentially a bilinear sum of the random sequence along a random graph on , in which two vertices are connected if they differ by a prime in that divides . A key difficulty in controlling this sum is that for randomly chosen , the sequence and the graph need not be independent. To get around this obstacle we introduce a new argument which we call the “entropy decrement argument” (in analogy with the “density increment argument” and “energy increment argument” that appear in the literature surrounding Szemerédi’s theorem on arithmetic progressions, and also reminiscent of the “entropy compression argument” of Moser and Tardos, discussed in this previous post). This argument, which is a simple consequence of the Shannon entropy inequalities, can be viewed as a quantitative version of the standard subadditivity argument that establishes the existence of Kolmogorov-Sinai entropy in topological dynamical systems; it allows one to select a scale parameter (in some suitable range ) for which the sequence and the graph exhibit some weak independence properties (or more precisely, the mutual information between the two random variables is small).

Informally, the entropy decrement argument goes like this: if the sequence has significant mutual information with , then the entropy of the sequence for will grow a little slower than linearly, due to the fact that the graph has zero entropy (knowledge of more or less completely determines the shifts of the graph); this can be formalised using the classical Shannon inequalities for entropy (and specifically, the non-negativity of conditional mutual information). But the entropy cannot drop below zero, so by increasing as necessary, at some point one must reach a metastable region (cf. the finite convergence principle discussed in this previous blog post), within which very little mutual information can be shared between the sequence and the graph . Curiously, for the application it is not enough to have a purely qualitative version of this argument; one needs a quantitative bound (which gains a factor of a bit more than on the trivial bound for mutual information), and this is surprisingly delicate (it ultimately comes down to the fact that the series diverges, which is only barely true).
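The Shannon inequalities invoked here can be illustrated on a toy example. The sketch below is not the entropy decrement argument itself: it merely takes empirical sign patterns of the Liouville function, treats a block of length $k$ and the following block of length $m$ as two random variables $X$ and $Y$ (with a uniformly random starting point), and computes $H(X)$, $H(Y)$, $H(X,Y)$ and the mutual information $I(X:Y) = H(X) + H(Y) - H(X,Y)$. The block lengths and function names are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def liouville_up_to(limit):
    """Return lam with lam[n] = (-1)^Omega(n) for 1 <= n <= limit (lam[0] unused)."""
    spf = list(range(limit + 1))
    for p in range(2, math.isqrt(limit) + 1):
        if spf[p] == p:
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (limit + 1)
    lam[1] = 1
    for n in range(2, limit + 1):
        lam[n] = -lam[n // spf[n]]
    return lam


def entropy_bits(counts):
    """Shannon entropy (in bits) of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def block_statistics(seq, k, m):
    """Return (H(X), H(Y), H(X,Y), I(X:Y)) for adjacent blocks of lengths k and m."""
    joint = Counter()
    for start in range(len(seq) - (k + m) + 1):
        x = tuple(seq[start:start + k])
        y = tuple(seq[start + k:start + k + m])
        joint[(x, y)] += 1
    # Marginals are computed from the same joint sample, so that the
    # Shannon inequalities hold exactly (up to rounding).
    left, right = Counter(), Counter()
    for (x, y), c in joint.items():
        left[x] += c
        right[y] += c
    h_x, h_y, h_xy = entropy_bits(left), entropy_bits(right), entropy_bits(joint)
    return h_x, h_y, h_xy, h_x + h_y - h_xy


if __name__ == "__main__":
    signs = liouville_up_to(50_000)[1:]
    for k in (2, 4, 6):
        print(k, block_statistics(signs, k, k))
```

Subadditivity $H(X,Y) \leq H(X) + H(Y)$ is the same statement as non-negativity of the mutual information, and it is the quantitative slack in this inequality that the argument above tracks as the scale grows.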

Once one locates a scale with the low mutual information property, one can use standard concentration of measure results such as the Hoeffding inequality to approximate (4) by the significantly simpler expression

The important thing here is that Hoeffding’s inequality gives exponentially strong bounds on the failure probability, which is needed to counteract the logarithms that are inevitably present whenever trying to use entropy inequalities. The expression (5) can then be controlled in turn by an application of the Hardy-Littlewood circle method and a non-trivial estimate

for averaged short sums of a modulated Liouville function established in another recent paper by Matomäki, Radziwiłł and myself.
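To see why exponentially strong tail bounds matter, it may help to look at Hoeffding's inequality in the simplest setting: a sum $S_n$ of $n$ independent uniform random signs, for which the inequality reads $\mathbb{P}(|S_n| \geq t) \leq 2\exp(-t^2/2n)$. The sketch below compares this with the exact tail computed from binomial coefficients. This is only a toy model; the random variables in the actual argument are not independent signs, and the function names are illustrative.

```python
import math


def exact_tail(n, t):
    """P(|S_n| >= t) for S_n a sum of n independent uniform +-1 signs."""
    # S_n = 2B - n where B counts the +1 signs and is Binomial(n, 1/2).
    favourable = sum(math.comb(n, b) for b in range(n + 1) if abs(2 * b - n) >= t)
    return favourable / 2**n


def hoeffding_bound(n, t):
    """Hoeffding's bound 2 exp(-2 t^2 / sum (b_i - a_i)^2) with each b_i - a_i = 2."""
    return 2 * math.exp(-t * t / (2 * n))


if __name__ == "__main__":
    n = 400
    for t in (20, 40, 60, 80, 100):
        print(t, exact_tail(n, t), hoeffding_bound(n, t))
```

The failure probability decays like $\exp(-ct^2/n)$, which is what allows it to absorb losses that are merely logarithmic.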

When one uses this method to study more general sums such as

one ends up having to consider expressions such as

where is the coefficient . When attacking this sum with the circle method, one soon finds oneself in the situation of wanting to locate the large Fourier coefficients of the exponential sum

In many cases (such as in the application to the Erdős discrepancy problem), the coefficient is identically , and one can understand this sum satisfactorily using the classical results of Vinogradov: basically, is large when lies in a “major arc” and is small when it lies in a “minor arc”. For more general functions , the coefficients are more or less arbitrary; the large values of are no longer confined to the major arc case. Fortunately, even in this general situation one can use a restriction theorem for the primes established some time ago by Ben Green and myself to show that there are still only a bounded number of possible locations (up to the uncertainty mandated by the Heisenberg uncertainty principle) where is large, and we can still conclude by using (6). (Actually, as recently pointed out to me by Ben, one does not need the full strength of our result; one only needs the restriction theorem for the primes, which can be proven fairly directly using Plancherel’s theorem and some sieve theory.)
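The major arc versus minor arc dichotomy is easy to observe numerically in the unweighted case. The following sketch assumes the exponential sum has the shape $S(\alpha) = \sum_{p \leq N} e(\alpha p)$ with all coefficients equal to $1$ and $e(\theta) = e^{2\pi i \theta}$; the cutoff $N$ and the sample frequencies are hypothetical choices for illustration only.

```python
import cmath
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes returning the list of primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(limit + 1) if sieve[n]]


def prime_exponential_sum(primes, alpha):
    """S(alpha) = sum over the given primes p of e(alpha * p)."""
    return sum(cmath.exp(2j * math.pi * alpha * p) for p in primes)


if __name__ == "__main__":
    primes = primes_up_to(10_000)
    golden = (math.sqrt(5) - 1) / 2  # a badly approximable frequency
    for name, alpha in (("0", 0.0), ("1/2", 0.5), ("1/3", 1 / 3), ("golden", golden)):
        print(name, abs(prime_exponential_sum(primes, alpha)))
```

At rationals with small denominator the sum is a sizeable fraction of the number of primes, whereas at a frequency such as the golden ratio one expects it to be much smaller.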

It is tempting to also use the method to attack higher order cases of the (logarithmically) averaged Chowla conjecture, for instance one could try to prove the estimate

The above arguments reduce matters to obtaining some non-trivial cancellation for sums of the form

A little bit of “higher order Fourier analysis” (as was done for very similar sums in the ergodic theory context by Frantzikinakis-Host-Kra and Wooley-Ziegler) lets one control this sort of sum if one can establish a bound of the form

where goes to infinity and is a very slowly growing function of . This looks very similar to (6), but the fact that the supremum is now inside the integral makes the problem much more difficult. However it looks worth attacking (7) further, as this estimate looks like it should have many nice applications (beyond just the case of the logarithmically averaged Chowla or Elliott conjectures, which is already interesting).

For higher than , the same line of analysis requires one to replace the linear phase by more complicated phases, such as quadratic phases or even -step nilsequences. Given that (7) is already beyond the reach of current literature, these even more complicated expressions are also unavailable at present, but one can imagine that they will eventually become tractable, in which case we would obtain an averaged form of the Chowla conjecture for all , which would have a number of consequences (such as a logarithmically averaged version of Sarnak’s conjecture, as per this blog post).

It would of course be very nice to remove the logarithmic averaging, and be able to establish bounds such as (3). I did attempt to do so, but I do not see a way to use the entropy decrement argument in a manner that does not require some sort of averaging of logarithmic type, as it requires one to pick a scale that one cannot specify in advance, which is not a problem for logarithmic averages (which are quite stable with respect to dilations) but is problematic for ordinary averages. But perhaps the problem can be circumvented by some clever modification of the argument. One possible approach would be to start exploiting multiplicativity at products of primes, and not just individual primes, to try to keep the scale fixed, but this makes the concentration of measure part of the argument much more complicated as one loses some independence properties (coming from the Chinese remainder theorem) which allowed one to conclude just from the Hoeffding inequality.

as for any fixed natural number . This conjecture remains open, though there are a number of partial results (e.g. these two previous results of Matomäki, Radziwiłł, and myself).

A natural generalisation of Chowla’s conjecture was proposed by Elliott. For simplicity we will only consider Elliott’s conjecture for the pair correlations

For such correlations, the conjecture was that one had

as for any natural number , as long as was a completely multiplicative function with magnitude bounded by , and such that

for any Dirichlet character and any real number . In the language of “pretentious number theory”, as developed by Granville and Soundararajan, the hypothesis (2) asserts that the completely multiplicative function does not “pretend” to be like the completely multiplicative function for any character and real number . A condition of this form is necessary; for instance, if is precisely equal to and has period , then is equal to as and (1) clearly fails. The prime number theorem in arithmetic progressions implies that the Liouville function obeys (2), and so the Elliott conjecture contains the Chowla conjecture as a special case.

As it turns out, Elliott’s conjecture is false as stated, with the counterexample having the property that “pretends” *locally* to be the function for in various intervals , where and go to infinity in a certain prescribed sense. See this paper of Matomaki, Radziwill, and myself for details. However, we view this as a technicality, and continue to believe that certain “repaired” versions of Elliott’s conjecture still hold. For instance, our counterexample does not apply when is restricted to be real-valued rather than complex, and we believe that Elliott’s conjecture is valid in this setting. Returning to the complex-valued case, we still expect the asymptotic (1) provided that the condition (2) is replaced by the stronger condition

as for all fixed Dirichlet characters . In our paper we supported this claim by establishing a certain “averaged” version of this conjecture; see that paper for further details. (See also this recent paper of Frantzikinakis and Host which establishes a different averaged version of this conjecture.)

One can make a stronger “non-asymptotic” version of this corrected Elliott conjecture, in which the parameter does not go to infinity, or equivalently that the function is permitted to depend on :

Conjecture 1 (Non-asymptotic Elliott conjecture) Let , let be sufficiently large depending on , and let be sufficiently large depending on . Suppose that is a completely multiplicative function with magnitude bounded by , such that

for all Dirichlet characters of period at most . Then one has

for all natural numbers .

The -dependent factor in the constraint is necessary, as can be seen by considering the completely multiplicative function (for instance). Again, the results in my previous paper with Matomäki and Radziwiłł can be viewed as establishing an averaged version of this conjecture.

Meanwhile, we have the following conjecture that is the focus of the Polymath5 project:

Conjecture 2 (Erdős discrepancy conjecture) For any function $f: \mathbb{N} \to \{-1,+1\}$, the discrepancy

$$\sup_{n,d \in \mathbb{N}} \left| \sum_{j=1}^n f(jd) \right|$$

is infinite.

It is instructive to compute some near-counterexamples to Conjecture 2 that illustrate the difficulty of the Erdős discrepancy problem. The first near-counterexample is that of a non-principal Dirichlet character $\chi$ that takes values in $\{-1,0,+1\}$ rather than $\{-1,+1\}$. For this function, one has from the complete multiplicativity of $\chi$ that

If denotes the period of , then has mean zero on every interval of length , and thus

Thus has bounded discrepancy.
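This bounded discrepancy can be checked directly in the smallest case. The sketch below uses the non-principal character of period $3$ as a concrete instance and computes the largest value of $|\sum_{j \leq n} f(jd)|$ over all $n, d$ with $nd$ up to a cutoff; the cutoff and the helper names are hypothetical, and a finite search can of course only illustrate, not prove, boundedness.

```python
def chi3(n):
    """The non-principal Dirichlet character of period 3."""
    r = n % 3
    if r == 1:
        return 1
    if r == 2:
        return -1
    return 0


def truncated_discrepancy(f, limit):
    """Max of |f(d) + f(2d) + ... + f(nd)| over all n, d >= 1 with n*d <= limit."""
    best = 0
    for d in range(1, limit + 1):
        partial = 0
        for j in range(1, limit // d + 1):
            partial += f(j * d)
            best = max(best, abs(partial))
    return best


if __name__ == "__main__":
    print(truncated_discrepancy(chi3, 1000))          # stays at 1
    print(truncated_discrepancy(lambda n: 1, 1000))   # grows with the cutoff
```

For the character, each sum along multiples of $d$ is just $\chi(d)$ times a partial sum of $\chi$, which is why nothing larger than $1$ appears.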

Of course, this is not a true counterexample to Conjecture 2 because can take the value . Let us now consider the following variant example, which is the simplest member of a family of examples studied by Borwein, Choi, and Coons. Let be the non-principal Dirichlet character of period (thus equals when , when , and when ), and define the completely multiplicative function by setting when and . This is about the simplest modification one can make to the previous near-counterexample to eliminate the zeroes. Now consider the sum

with for some large . Writing with coprime to and at most , we can write this sum as

Now observe that . The function has mean zero on every interval of length three, and is equal to mod , and thus

for every , and thus

Thus also has unbounded discrepancy, but only barely so (it grows logarithmically in ). These examples suggest that the main “enemy” to proving Conjecture 2 comes from completely multiplicative functions that somehow “pretend” to be like a Dirichlet character but do not vanish at the zeroes of that character. (Indeed, the special case of Conjecture 2 when is completely multiplicative is already open, and appears to be an important subcase.)
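The logarithmic growth can be seen concretely. The sketch below assumes that the modified function agrees with the character of period $3$ at every prime other than $3$ and takes the value $+1$ at $3$ (the value at $3$ is an assumption on my part; the function names are illustrative). Under that assumption the decomposition described above shows that the partial sum up to $n$ equals the number of digits equal to $1$ in the base $3$ expansion of $n$, so it equals $k$ at $n = 1 + 3 + \dots + 3^{k-1}$.

```python
def chi3_tilde(n):
    """Completely multiplicative: chi_3 away from 3, and (assumed) +1 at the prime 3."""
    while n % 3 == 0:
        n //= 3  # factors of 3 contribute +1 each, so they can simply be removed
    return 1 if n % 3 == 1 else -1


def partial_sums(limit):
    """Return sums with sums[n] = chi3_tilde(1) + ... + chi3_tilde(n)."""
    sums = [0]
    for j in range(1, limit + 1):
        sums.append(sums[-1] + chi3_tilde(j))
    return sums


def count_ones_base3(n):
    """Number of digits equal to 1 in the base 3 expansion of n."""
    count = 0
    while n:
        count += int(n % 3 == 1)
        n //= 3
    return count


if __name__ == "__main__":
    sums = partial_sums(3**8)
    for k in range(1, 9):
        n = (3**k - 1) // 2  # base 3 expansion consists of k ones
        print(k, n, sums[n])
```

The partial sums thus reach size about $\log_3 n$ but no more, which is the sense in which the discrepancy is only barely unbounded.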

All of these conjectures remain open. However, I would like to record in this blog post the following striking connection, illustrating the power of the Elliott conjecture (particularly in its nonasymptotic formulation):

Theorem 3 (Elliott conjecture implies unbounded discrepancy) Conjecture 1 implies Conjecture 2.

The argument relies heavily on two observations that were previously made in connection with the Polymath5 project. The first is a Fourier-analytic reduction that replaces the Erdős discrepancy problem with an averaged version for completely multiplicative functions . An application of Cauchy-Schwarz then shows that any counterexample to that version will violate the conclusion of Conjecture 1, so if one assumes that conjecture then must pretend to be like a function of the form . One then uses (a generalisation of) a second argument from Polymath5 to rule out this case, basically by reducing matters to a more complicated version of the Borwein-Choi-Coons analysis. Details are provided below the fold.

There is some hope that the Chowla and Elliott conjectures can be attacked, as the parity barrier which is so impervious to attack for the twin prime conjecture seems to be more permeable in this setting. (For instance, in my previous post I raised a possible approach, based on establishing expander properties of a certain random graph, which seems to get around the parity problem, in principle at least.)

(Update, Sep 25: fixed some treatment of error terms, following a suggestion of Andrew Granville.)

** — 1. Fourier reduction — **

We will prove Theorem 3 by contradiction, assuming that there is a function with bounded discrepancy and then concluding a violation of the Elliott conjecture.

The function need not have any multiplicativity properties, but by using an argument from Polymath5 we can extract a *random* completely multiplicative function which also has good discrepancy properties (albeit in a probabilistic sense only):

Proposition 4 (Fourier reduction) Suppose that is a function such that

Then there exists a random completely multiplicative function of magnitude such that

uniformly for all natural numbers (we allow implied constants to depend on ).

*Proof:* For the reader’s convenience, we reproduce the Polymath5 argument.

The space of completely multiplicative functions of magnitude can be identified with the infinite product since is determined by its values at the primes. In particular, this space is compact metrisable in the product topology. The functions are continuous in this topology for all . By vague compactness of probability measures on compact metrisable spaces (Prokhorov’s theorem), it thus suffices to construct, for each , a random completely multiplicative function of magnitude such that

for all , where the implied constant is uniform in and .

for all (the implied constant can depend on but is otherwise absolute). Let , and let be the primes up to . Let be a natural number that we assume to be sufficiently large depending on . Define a function by the formula

for . We also define the function by setting whenever is in (this is well defined for ). Applying (4) for and of the form with , we conclude that

for all and all but of the elements of . For the exceptional elements, we have the trivial bound

Square-summing in , we conclude (if is sufficiently large depending on ) that

By Fourier expansion, we can write

where , , and

A little Fourier-analytic calculation then allows us to write the left-hand side of (5) as

On the other hand, from the Plancherel identity we have

and so we can interpret as the probability distribution of a random frequency . The estimate (5) now takes the form

for all . If we then define the completely multiplicative function by setting for , and for all other primes, we obtain

for all , as desired.
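The Plancherel step, which is what lets the squared Fourier coefficients be read as a probability distribution, can be checked on a small example. The sketch below assumes the auxiliary function is defined on a finite group $(\mathbb{Z}/M\mathbb{Z})^r$ by $F(a_1,\dots,a_r) = f(p_1^{a_1} \cdots p_r^{a_r})$; the particular sign function `f`, the primes and the value of `M` are hypothetical stand-ins chosen only to make the example run.

```python
import cmath
import itertools
import math

PRIMES = (2, 3)  # hypothetical choice of the primes p_1, ..., p_r
M = 4            # hypothetical choice of the modulus


def f(n):
    """A hypothetical +-1 valued function (Thue-Morse signs) standing in for f."""
    return -1 if bin(n).count("1") % 2 else 1


def F(a):
    """F(a_1, ..., a_r) = f(p_1^a_1 ... p_r^a_r) for exponents 0 <= a_i < M."""
    n = 1
    for p, e in zip(PRIMES, a):
        n *= p ** e
    return f(n)


def fourier_coefficients(func, modulus, rank):
    """Return {xi: average over x of func(x) e(-x.xi/modulus)} on (Z/modulus Z)^rank."""
    points = list(itertools.product(range(modulus), repeat=rank))
    coeffs = {}
    for xi in points:
        total = 0
        for x in points:
            dot = sum(s * t for s, t in zip(x, xi))
            total += func(x) * cmath.exp(-2j * math.pi * dot / modulus)
        coeffs[xi] = total / len(points)
    return coeffs


if __name__ == "__main__":
    coeffs = fourier_coefficients(F, M, len(PRIMES))
    weights = {xi: abs(c) ** 2 for xi, c in coeffs.items()}
    print(sum(weights.values()))  # Plancherel: the weights sum to 1 since |F| = 1
```

Since $|F| = 1$ everywhere, the squared magnitudes of the Fourier coefficients are non-negative and sum to $1$, so one may sample a random frequency with these weights.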

Remark 5 A similar reduction applies if the original function took values in the unit sphere of a complex Hilbert space, rather than in . Conversely, the random constructed above can be viewed as an element of a unit sphere of a suitable Hilbert space, so the conclusion of Proposition 4 is in fact logically equivalent to failure of the Hilbert space version of the Erdős discrepancy conjecture.

Remark 6 From linearity of expectation, we see from Proposition 4 that for any natural number , we have

and hence for each we conclude that there exists a *deterministic* completely multiplicative function of unit magnitude such that

This was the original formulation of the Fourier reduction in Polymath5; however, the fact that varied with made this formulation inconvenient for our argument.

** — 2. Applying the Elliott conjecture — **

Suppose for contradiction that Conjecture 1 holds but that there exists a function of bounded discrepancy in the sense of (3). By Proposition 4, we may thus find a random completely multiplicative function of magnitude such that

We now use Elliott’s conjecture as a sort of “inverse theorem” (in the spirit of the inverse sumset theorem of Freiman, and the inverse theorems for the Gowers uniformity norms) to force to pretend to behave like a modulated character quite often.

Proposition 7 Let the hypotheses and notation be as above. Let , and suppose that is sufficiently large depending on . Then with probability , one can find a Dirichlet character of period and a real number such that

*Proof:* We use the van der Corput trick. Let be a moderately large natural number depending on to be chosen later, and suppose that is sufficiently large depending on . From (6) and the triangle inequality we have

so from Markov’s inequality we see with probability that

Let us condition to this event. Shifting by we conclude (for large enough) that

and hence by the triangle inequality

which we rewrite as

We can square the left-hand side out as

The diagonal term contributes to this expression. Thus, for sufficiently large depending on , we can apply the triangle inequality and pigeonhole principle to find *distinct* such that

By symmetry we can take . Setting , we conclude (for large enough) that

Applying Conjecture 1 in the contrapositive, we obtain the claim.

The conclusion (8) asserts that in some sense, “pretends” to be like the function ; as it has magnitude one, it should resemble the function discussed in the introduction. The remaining task is to find some generalisation of the argument that shows that had (logarithmically) large discrepancy to show that likewise fails to obey (6).

** — 3. Ruling out correlation with modulated characters — **

We now use (a generalisation of) this Polymath5 argument. Let be the random completely multiplicative function provided by Proposition 4. We will need the following parameters:

- A sufficiently small quantity .
- A natural number that is sufficiently large depending on .
- A quantity that is sufficiently small depending on .
- A natural number that is sufficiently large depending on .
- A real number that is sufficiently large depending on .

By Proposition 7, we see with probability that there exists a Dirichlet character of period and a real number such that

By reducing if necessary we may assume that is primitive.

It will be convenient to cut down the size of .

*Proof:* By Proposition 7 with replaced by , we see that with probability , one can find a Dirichlet character of period and a real number such that

We may assume that , since we are done otherwise. Applying the pretentious triangle inequality (see Lemma 3.1 of this paper of Granville and Soundararajan), we conclude that

However, from the Vinogradov-Korobov zero-free region for (see this previous blog post) it is not difficult to show that

if is sufficiently large depending on , a contradiction. The claim follows.

Let us now condition to the probability event that , exist obeying (8) and the bound (9).

The bound (8) asserts that “pretends” to be like the completely multiplicative function . We can formalise this by making the factorisation

where is the completely multiplicative function of magnitude defined by setting for and for , and is the completely multiplicative function of magnitude defined by setting for , and for . The function should be compared with the function of the same name studied in the introduction.

The bound (8) then becomes

We now perform some manipulations to remove the and factors from and isolate the factor, which is more tractable to compute with; then we will perform more computations to arrive at an expression just involving which we will be able to evaluate fairly easily.

From (6) and the triangle inequality we have

for all (even after conditioning to the event). The averaging will not be used until much later in the argument, and the reader may wish to ignore it for now.

By (10), the above estimate can be written as

For we can use (9) to conclude that . The contribution of the error term is negligible, thus

for all . We can factor out the to obtain

For we can crudely bound the left-hand side by . If is sufficiently small, we can then sum weighted by and conclude that

(The zeta function type weight of will be convenient later in the argument when one has to perform some multiplicative number theory, as the relevant sums can be computed quite directly and easily using Euler products.) Thus, with probability , one has

We condition to this event. We have successfully eliminated the role of ; we now work to eliminate . Call a residue class *bad* if is divisible by for some and , and *good* otherwise. We restrict to good residue classes, thus

By Cauchy-Schwarz, we conclude that

Now we claim that for a in a good residue class , the quantity does not depend on . Indeed, by hypothesis, is not divisible by for any and is thus a factor of , and is coprime to . We then factor

where in the last line we use the periodicity of . Thus we have , and so

Shifting by we see that

Now, we perform some multiplicative number theory to understand the innermost sum. From taking Euler products we have

for ; from (11) and Mertens’ theorem one can easily verify that

More generally, for any Dirichlet character we have

Since

we have

which after using (11), Cauchy-Schwarz (using ) and Mertens’ theorem gives

for any non-principal character of period dividing ; for a principal character of period dividing we have

since and hence for all , where we have also used (13). By expansion into Dirichlet characters we conclude that

for all and primitive residue classes . For non-primitive residue classes , we write and . The previous arguments then give

which since gives (again using (13))

for all (not necessarily primitive). Inserting this back into (12) we see that

and thus by (13) we conclude (for large enough) that

We have now eliminated both and . The remaining task is to establish some lower bound on the discrepancy of that will contradict (14). As mentioned above, this will be a more complicated variant of the Borwein-Choi-Coons analysis in the introduction. The square in (14) will be helpful in dealing with the fact that we don’t have good control on the for (as we shall see, the squaring introduces two terms of this type that end up cancelling each other).

We expand (14) to obtain

Write and , thus and for we have

We thus have

We reinstate the bad . The number of such is at most , so their total contribution here is which is negligible, thus

For or , the inner sum is , which by the divisor bound gives a negligible contribution. Thus we may restrict to . Note that as is already restricted to numbers coprime to , and divide , we may replace the constraints with for .

We consider the contribution of an off-diagonal term for a fixed choice of . To handle these terms we expand the non-principal character as a linear combination of for with Fourier coefficients . Thus we can expand out

as a linear combination of expressions of the form

with and coefficients of size .

The constraints are either inconsistent, or constrain to a single residue class . Writing , we have

for some phase that can depend on but is independent of . If , then at least one of the two quantities and is divisible by a prime that does not divide the other quantity. Therefore cannot be divisible by , and thus by . We can then sum the geometric series in (or ) and conclude that

and so by the divisor bound the off-diagonal terms contribute at most to (15). For large, this is negligible, and thus we only need to consider the diagonal contribution . Here the terms helpfully cancel, and we obtain

We have now eliminated , leaving only the Dirichlet character which is much easier to work with. We gather terms and write the left-hand side as

The summand in is now non-negative. We can thus throw away all the except of the form with , to conclude that

It is now that we finally take advantage of the averaging to simplify the summation. Observe from the triangle inequality that for any and one has

summing over we conclude that

In particular, by the pigeonhole principle there exists such that

Shifting by and discarding some terms, we conclude that

Observe that for a fixed there is exactly one in the inner sum, and . Thus we have

Making the change of variables , we thus have

But is periodic of period with mean , thus

and thus

which leads to a contradiction for large enough (note the logarithmic growth in here, matching the logarithmic growth in the Borwein-Choi-Coons analysis). The claim follows.

The problem is to show that if is an infinite sequence of s, then for every there exist and such that has modulus at least . This result is straightforward to prove by an exhaustive search when . One thing that the Polymath project did was to discover several sequences of length 1124 such that no sum has modulus greater than 2, and despite some effort nobody managed to find a longer one. That was enough to convince me that 1124 was the correct bound.
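The exhaustive search for the smallest case can be sketched in a few lines of Python. This is a minimal illustration, not the code used by the Polymath project: the function names `hap_discrepancy` and `longest_with_discrepancy` are made up for this sketch, and it assumes the sequence takes values in {+1, -1} and that sums are taken along homogeneous arithmetic progressions d, 2d, ..., kd.

```python
def hap_discrepancy(seq):
    """Largest |x_d + x_2d + ... + x_kd| over all d, k with k*d <= len(seq)."""
    n = len(seq)
    worst = 0
    for d in range(1, n + 1):
        partial = 0
        for m in range(d, n + 1, d):
            partial += seq[m - 1]  # seq is 0-indexed, the sequence is 1-indexed
            worst = max(worst, abs(partial))
    return worst


def longest_with_discrepancy(bound):
    """Depth-first search for a longest +-1 sequence whose HAP sums stay within bound."""
    best = []

    def extend(seq):
        nonlocal best
        if len(seq) > len(best):
            best = list(seq)
        for value in (1, -1):
            seq.append(value)
            if hap_discrepancy(seq) <= bound:
                extend(seq)
            seq.pop()

    extend([])
    return best


best = longest_with_discrepancy(1)
print(len(best), best)
```

With `bound = 1` the search terminates almost instantly and stops at length 11. The same naive search is hopeless for `bound = 2`, which is why the length-1160 result needed SAT solvers.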

However, the new result shows the danger of this kind of empirical evidence. The authors used state of the art SAT solvers to find a sequence of length 1160 with no sum having modulus greater than 2, and also showed that this bound is best possible. Of this second statement, they write the following: “The negative witness, that is, the DRUP unsatisfiability certificate, is probably one of longest proofs of a non-trivial mathematical result ever produced. Its gigantic size is comparable, for example, with the size of the whole Wikipedia, so one may have doubts about to which degree this can be accepted as a proof of a mathematical statement.”

I personally am relaxed about huge computer proofs like this. It is conceivable that the authors made a mistake somewhere, but that is true of conventional proofs as well. The paper is by Boris Konev and Alexei Lisitsa and appears here.

A very obvious candidate for a discrepancy theorem that we could try to modularize is Roth’s theorem, which asserts that for any -valued function on there exists an arithmetic progression such that . That gives rise to the following problem.

**Problem.** *Let be a prime. What is the smallest such that for every function that never takes the value 0, every can be expressed as for some arithmetic progression ?*
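For very small primes the problem can be explored by brute force. The sketch below is only an illustration under stated assumptions: it takes the function to be defined on {1, ..., n} with values in the non-zero residues mod p, it counts singletons as arithmetic progressions, and it does not allow progressions to wrap around. The names `arithmetic_progressions`, `covers_all_residues` and `smallest_n` are invented for this example.

```python
from itertools import product


def arithmetic_progressions(n):
    """All non-empty arithmetic progressions inside {1, ..., n}, as tuples."""
    aps = [(a,) for a in range(1, n + 1)]  # singletons (assumed to be allowed)
    for a in range(1, n + 1):
        for d in range(1, n):
            ap = [a]
            x = a + d
            while x <= n:
                ap.append(x)
                aps.append(tuple(ap))
                x += d
    return aps


def covers_all_residues(f, p, aps):
    """True if the sums of f along the given APs hit every residue mod p."""
    seen = {sum(f[x - 1] for x in ap) % p for ap in aps}
    return len(seen) == p


def smallest_n(p, max_n=12):
    """Smallest n (up to max_n) that works for every non-vanishing f, else None."""
    for n in range(1, max_n + 1):
        aps = arithmetic_progressions(n)
        if all(covers_all_residues(f, p, aps)
               for f in product(range(1, p), repeat=n)):
            return n
    return None


print(smallest_n(2), smallest_n(3))
```

The number of functions grows like (p-1)^n, so this is only practical for the first few primes; it is meant as a way of generating small data, not as evidence about the true growth rate.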

In this post I shall collect together a few simple observations about this question.

1. We can at least prove that such an exists. Indeed, by Szemerédi’s theorem, if is large enough then there is an arithmetic progression of length on which is constant. Since takes non-zero values and is prime, the sums along initial segments of run through all the numbers mod .

So the question is really asking whether the result can be proved with a reasonable bound for .

2. Roth’s discrepancy result tells us that if and takes only the values , then takes all values mod . That is because there is a progression such that in , and by what one might call the discrete intermediate value theorem, it therefore takes all values in between 0 and or between and . This is reasonably compelling evidence that the conjecture is true with a decent bound, but of course the intermediate-value-theorem argument breaks down completely when can take arbitrary non-zero values mod .

The known lower bounds for Roth’s discrepancy theorem for APs (which are equal to the upper bounds up to a constant) show that the best possible bound for the modular question is at least .

3. To make the problem more symmetrical, and therefore potentially easier to handle, it might be a good idea to begin by tackling the following variant.

**Problem.** *Let be a prime. What is the smallest such that for every function that never takes the value 0, every can be expressed as for some arithmetic progression in ?*

The big difference here is that arithmetic progressions are allowed to “wrap around”. That means that for every arithmetic progression and every coprime to (it might be convenient to insist that is prime), the sets and are also progressions. I think the bounds in the integer case with functions show that the best bound one could hope for in this modified problem is , but I need to check that.

3. One natural approach to proving that a function takes all values mod would be to attempt to show that it takes all values with approximately the same frequency. This would potentially allow us to bring in analytic tools. However, it is not in general true. For example, let be the function that is up to and from to . Then a small calculation shows that if is a mod- AP of common difference , then is never more than . I think can probably be taken to be 2, or maybe even 1. (By I mean the smallest modulus of any number congruent to mod .) Thus, for a positive proportion of common differences , the sums along APs of common difference never exceed , say, in modulus. If is large, this implies that the values of the sums are very far from uniformly distributed, since they are concentrated around 0. On the other hand, is obviously not a counterexample to the conjecture (by which I mean the assertion that the modular statement holds with a good bound).

4. That observation does not rule out a proof that goes via showing that the values of sums along APs are approximately uniformly distributed, but it does demonstrate that in order to prove that, we would have to put some conditions on . That is, we would need a two-step argument along the following lines (reminiscent of proofs of Szemerédi’s theorem, though I would hope that this problem is easier).

(i) If is quasirandom, or at least “contains substantial quasirandomness”, then the values of are approximately uniformly distributed mod .

(ii) If is not quasirandom, or better still “does not contain substantial quasirandomness”, then we can deduce in some other way (such as finding a long AP on which is constant, but that may be too much to ask) that the sums take all values.

5. The weaker the property we can get away with in (i), the better. However, in order to get started it would be good to find *any* quasirandomness property that implies that the values of are approximately uniformly distributed.

6. One thing that needs deciding here is what we mean by a random progression . The simplest would be to fix some and pick random mod- APs of length . These don’t work for the modular conjecture in general, since the constant function is a counterexample, but they might work for functions with a suitable quasirandomness property.

7. The fact that we can’t fix the length of the progressions shows that the modular conjecture differs in an important way from Roth’s discrepancy theorem, where fixing the length is not a problem (especially in the mod- version). This dampens any hopes one might start off with for adapting the proof of Roth’s discrepancy theorem to cope with the modular version. But that might in the end be a good thing, since the point of the modular conjecture is to introduce new techniques that can then be brought to bear on EDP.

8. When we are looking for a suitable quasirandomness property, it is tempting to turn to Fourier analysis. However, the following (modification of a standard) example is a bit troubling. Let be defined as follows. For each we randomly decide whether to set equal to or mod (where is the residue between 0 and ). If we now choose a random mod- arithmetic progression of length 4, there is an absolute constant such that with probability at least does not wrap around, and , and . When this happens . Therefore, the value 0 occurs much too often, but according to the natural Fourier definitions of quasirandomness (which we could get by looking at the function ) this function is highly quasirandom.

9. Of course, we are looking at much longer arithmetic progressions. However, it is difficult to think of proofs that work for long arithmetic progressions and fail for short ones. (It is mildly encouraging that at least the above source of examples seems to break down when the progressions become long — though that should be checked carefully.)

10. Hmm … just after writing that I thought of the following modified example. Instead of choosing *randomly* whether to set equal to or , let’s do it in the following highly deterministic way: if mod 4 we set equal to , if mod 4 we set it equal to , if mod 4 we set it equal to , and if mod 4 we set it equal to . In addition, let’s suppose that is a multiple of , since I seem to need that to make this example work. (Even if we can’t get rid of this restriction somehow, it will still show that a proof that worked for prime values of would have to make strong use of the fact that was not a multiple of — and there would be many other examples of a similar kind that would lead to similar requirements — which would be quite a big restriction on what it could look like.)

Then if is a multiple of 4 and is a random mod- AP of length , there is a probability 1/16 that and mod 4. (Note that because this makes sense independently of which residue we pick.) If that is the case, we can partition into APs of length 4 such that the sum of along each is zero. This shows that with probability at least 1/16 the sum is 0 mod .

Is this function quasirandom? According to many definitions, yes, since although we split into cases in a highly structured way, the behaviour of each case of is highly quasirandom.

I mentioned above that there are many similar examples. To give just one illustration, we could take as a multiple of and let take turns taking values , , , and . Essentially the same argument would work, but now the probability would be . (Actually, I now see that the probabilities can be doubled in both cases, since works just as well as .)

I’ll leave it there. I’ve mentioned some proof strategies, and also discussed some difficulties with them. Does anyone have other proof strategies to suggest? I haven’t mentioned using the polynomial method, but that is an obvious thing to try to do: that is, write down a polynomial that vanishes identically only if the modular conjecture for APs is true, and try to prove that it vanishes identically. It is certainly worth looking into that, though it is a little discouraging that the conjecture is false for APs of any fixed length, since that would necessarily make the polynomials less symmetrical. If the above strategy seems the most promising (or should that be least unpromising?) then does anyone have any thoughts about quasirandomness properties that might do the job? (Gil mentioned something in a recent comment — it would be good to have that thought in more detail.)

I’ll end by saying that even if proving the modular conjecture for mod- APs ended up having nothing much to do with EDP, it is a very nice problem on its own, and could make an excellent spin-off Polymath project.

In this post I want to mention three strengthenings of EDP. One of them I find interesting but not promising as a way of proving EDP, for reasons that I will explain. The other two look to me also very interesting and much more promising. All of them have been mentioned already, but the point of this post is to collect them in one convenient place.

**Restricting the allowable common differences.**

I know of no -sequence that has bounded discrepancy on all HAPs with common difference either a prime or a power of 2. The sequence has bounded discrepancy on all HAPs of prime common difference, but if you allow powers of 2 as well, then no periodic construction can work, since if the period is for some odd integer , then a HAP of common difference will have a bias of at least 1 in each interval of length , for parity reasons, and this bias will gradually accumulate.

One can defeat powers of 2 by using the Morse sequence (if you haven’t seen it, then it’s a nice exercise to see how it is generated), but that has unbounded discrepancy on HAPs of odd prime length. (I can’t remember exactly why.)

Why do I not think that this potential strengthening of EDP is likely to be useful for attacking the problem? Well, one promising feature of EDP is that it seems to generalize to sequences taking other kinds of values, such as complex numbers of modulus 1 or even unit vectors in an arbitrary Hilbert space. However, something that I’ve only just noticed (though others may have spotted it ages ago) is that if one restricts to, say, prime-power common differences, then even the complex version of the problem becomes false. A very simple counterexample is the sequence , where (that is, a primitive sixth root of 1). Then the sum along any HAP with common difference that isn’t a multiple of 6 will be a sum of a GP with common ratio , and will therefore be (uniformly) bounded. In particular, this is true of HAPs with prime-power common difference.

This simple example also shows that a certain real generalization of EDP is false when you restrict the common differences in this way: you cannot prove unbounded discrepancy for sequences that take values in the set . The counterexample is simply twice the real part of the example above: the sequence repeated over and over again. So if (and I think it’s a big if) EDP is true for HAPs with common differences that are either primes or powers of 2, then any proof must make pretty strong use of the fact that the sequence is a -valued sequence. This rules out a lot of promising techniques, so it appears to make the problem harder rather than easier.
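The two counterexamples above are easy to check numerically. The following sketch assumes the complex sequence is x_n = ω^n with ω = exp(iπ/3), which is how I read the description of a primitive sixth root of 1; the helper names `max_hap_sum` and `real_sequence` are invented for the illustration.

```python
import cmath

OMEGA = cmath.exp(1j * cmath.pi / 3)  # a primitive sixth root of unity


def max_hap_sum(d, terms):
    """Largest |x_d + x_2d + ... + x_kd| for k <= terms, where x_n = OMEGA**n."""
    partial = 0
    worst = 0.0
    for k in range(1, terms + 1):
        partial += OMEGA ** ((d * k) % 6)  # reduce the exponent mod 6 to limit rounding error
        worst = max(worst, abs(partial))
    return worst


def real_sequence(n):
    """Twice the real part of OMEGA**n, rounded to the nearest integer."""
    return round(2 * (OMEGA ** (n % 6)).real)


prime_powers = [2, 3, 4, 5, 7, 8, 9, 11, 13, 16]
print([round(max_hap_sum(d, 600), 6) for d in prime_powers])
print(max_hap_sum(6, 600))  # grows linearly: every term equals 1
print([real_sequence(n) for n in range(1, 7)])
```

For common differences that are not multiples of 6 the partial sums are sums of a geometric progression whose ratio is at distance at least 1 from 1, so they never exceed 2 in modulus, whereas for common difference 6 the sum of the first k terms is exactly k.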

**A discrepancy question about matrices.**

In my earlier post I mentioned what I called the non-symmetric vector-valued EDP. I have subsequently realized (though perhaps a better word is “remembered” since I must have been sort of aware of this at some point) that it is equivalent to another discrepancy statement that I now find more appealing. The statement is the following.

**Conjecture.** Let be a function from to and suppose that for every . Then for every real number there exist HAPs and such that .

If we take a function and define , then for every the above conjecture, if true, gives us HAPs and such that , which proves that the discrepancy of is at least . So the above conjecture implies EDP. But the class of functions of the form with is a very small subset of the class of all functions that take the value 1 along the diagonal, so this conjecture is very much stronger than EDP.
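The deduction rests on the fact that a rank-one function factorizes over a product of HAPs. Written out, with the symbols ε for the ±1 sequence and C for the threshold in the conjecture chosen here for illustration (the original notation did not survive extraction), it reads as follows.

```latex
\[
\Bigl|\sum_{x\in P}\sum_{y\in Q}\epsilon_x\epsilon_y\Bigr|
 = \Bigl|\sum_{x\in P}\epsilon_x\Bigr|\cdot\Bigl|\sum_{y\in Q}\epsilon_y\Bigr| \ge C
 \quad\Longrightarrow\quad
 \max\Bigl\{\Bigl|\sum_{x\in P}\epsilon_x\Bigr|,\ \Bigl|\sum_{y\in Q}\epsilon_y\Bigr|\Bigr\} \ge \sqrt{C}.
\]
```

So under this reading, a bound of C for the matrix question gives a HAP along which the sequence has a sum of modulus at least the square root of C.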

It is also equivalent to the statement that for every there exists a diagonal matrix of trace at least 1 that can be written as a linear combination , where for each and are characteristic functions of HAPs and and . (This is the statement that I discussed at length in my previous post.) In one direction this is easy: if a decomposition of that kind exists, then equals the trace of and is therefore at least 1, but it also equals , which is at most . It follows that there is some such that .

In the other direction, if no such decomposition of a diagonal matrix exists, then the Hahn-Banach theorem gives us a linear functional that separates the class of diagonal matrices with trace at least 1 from the class of linear combinations of HAP products defined above. It is easy to check that this functional must be a diagonal matrix with a constant diagonal such that the value along the diagonal is at least 1 and such that for any two HAPs and .

The conjecture is so much stronger than EDP that I think it would be a mistake just to assume that it is true. My guess is that it *is* true, but I would be very interested if there was a counterexample (even if, like the complex sequence above, it is disappointingly simple — in fact, if there is a counterexample, then it seems quite likely that it will be fairly simple). And if there isn’t a counterexample, then the fact that it is so much stronger a conjecture than EDP does this time make me think that the result might be easier to prove than EDP itself.

Until very recently, I had been mainly interested in the dual version of the question (that is, the question about decomposing diagonal matrices), but now it seems to me that the matrix discrepancy question is worth thinking about directly. It is a clean question, and it has the big advantage over the original EDP question that it does not restrict values to the set , so a number of methods can be used that cannot be used directly for EDP. For instance, linear programming can be used to get experimental results: Sasha Nikolov may be going to look into this.

Given any matrix , one can find vectors and such that , so this matrix question is actually trivially equivalent to the non-symmetric vector-valued question I had formulated earlier. But expressing it in terms of vectors makes it harder to think about rather than easier, and that is what had previously put me off thinking about the question directly.

I think there is one class of matrices that is particularly worth mentioning. If EDP is true but this matrix question is false, then the best candidates for counterexamples to the matrix problem are probably matrices of high rank, and an obvious class of matrices that tend to have high rank is matrices that are constant on diagonals (that is, Toeplitz matrices).

Suppose, then, that our matrix is defined to be for some function such that . Then , where by I mean the number of ways of writing as with and . So we have a class of functions that I’ll call *HAP convolutions*, and we’d like to show that if is any function with and is any constant, then there exists a HAP convolution such that . This is true if and only if there is an efficient way of writing the function that is 1 at 0 and 0 everywhere else as a linear combination of HAP convolutions. This is another question that could be investigated using linear programming, and perhaps since it concerns functions of just one variable we could get more extensive results than we could for the general matrix problem.
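Here is a small sketch of what a HAP convolution looks like in practice. It assumes the Toeplitz matrix is given by A(x, y) = g(x - y) and that the convolution counts the ways of writing a number as x - y with x in P and y in Q; the helper names `hap` and `hap_convolution`, and the test function `g`, are hypothetical and chosen only for this illustration.

```python
from collections import Counter


def hap(d, k):
    """The homogeneous arithmetic progression {d, 2d, ..., kd}."""
    return [d * j for j in range(1, k + 1)]


def hap_convolution(P, Q):
    """Map each difference t to the number of pairs (x, y) in P x Q with x - y == t."""
    return Counter(x - y for x in P for y in Q)


def g(t):
    """An arbitrary test function with g(0) == 1 (illustrative only)."""
    return 1 if t == 0 else (-1) ** t * 0.5


P, Q = hap(1, 3), hap(2, 2)
conv = hap_convolution(P, Q)

# Summing the Toeplitz matrix over P x Q is the same as pairing g with the convolution.
matrix_sum = sum(g(x - y) for x in P for y in Q)
paired_sum = sum(g(t) * count for t, count in conv.items())
print(conv, matrix_sum, paired_sum)
```

The point of the identity checked at the end is that the matrix question for Toeplitz matrices becomes a question about a function of one variable, which is what makes a linear-programming experiment cheaper here.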

**The modular conjecture.**

In his most recent guest post, Gil Kalai reformulates EDP as a question about sums mod . EDP is trivially equivalent to the following assertion.

**Conjecture.** For every there exists such that for every sequence of length and every there exists a HAP such that mod .

At first that doesn’t look like a very interesting reformulation since it is too obviously equivalent to EDP. But what makes it interesting is that it has a very natural generalization that doesn’t have any obvious counterexamples.

**Stronger Conjecture.** For every there exists such that for every sequence of length of non-zero numbers mod and every there exists a HAP such that mod .

In other words, we replace the condition that the sequence takes values by the much weaker condition that it is never zero. Gil calls this the *modular conjecture*. (He also presented it in a comment on a much earlier EDP post.)

As Gil points out in his post, one can write down a polynomial that is identically zero (mod ) if and only if the modular conjecture is true for . It is tempting to try to prove that it is zero by analysing its coefficients. More generally, this approach to the problem appears to open the door to a number of algebraic methods.

If you want to prove EDP this way, you have to solve the conjecture for some non-zero . (Since is prime, if you can show it for one then you’ve shown it for all.) However, the problem with is interesting in its own right, and in particular it seems to be interestingly different from the problem with non-zero . At the time of writing, I don’t see any way of modifying the EDP examples to obtain exponentially long sequences mod that avoid zero sums — it seems to me that the upper bound on the length could be significantly smaller. That would be interesting, as it would place constraints on what a proof could look like for non-zero .
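For tiny primes one can search directly for the longest sequence of non-zero residues whose HAP sums all avoid a given value. The sketch below is an illustration only: `longest_avoiding` and its `max_len` safety cap are names and choices made for this example, and the search is exponential, so it is not a way of testing the conjecture for any interesting prime.

```python
def longest_avoiding(p, r, max_len=30):
    """Longest sequence (up to max_len) of non-zero residues mod p with no HAP sum == r mod p."""
    best = []

    def newest_haps_ok(seq):
        # Only HAPs that end at the newest index n can create a new sum,
        # and those are exactly d, 2d, ..., n for the divisors d of n.
        n = len(seq)
        for d in range(1, n + 1):
            if n % d == 0:
                total = sum(seq[m - 1] for m in range(d, n + 1, d))
                if total % p == r % p:
                    return False
        return True

    def extend(seq):
        nonlocal best
        if len(seq) > len(best):
            best = list(seq)
        if len(seq) >= max_len:
            return
        for value in range(1, p):
            seq.append(value)
            if newest_haps_ok(seq):
                extend(seq)
            seq.pop()

    extend([])
    return best


print(longest_avoiding(3, 0))
print(longest_avoiding(3, 1))
```

For p = 3 the choices are forced at every step: avoiding 0 is possible up to length 5, while avoiding 1 fails already at length 2, because every singleton is a HAP and so every term must equal 2. Even at this toy scale the zero and non-zero cases behave differently.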

**A fourth strengthening.**

This isn’t really meant as part of the body of the post, but more of an afterthought. A strengthening of EDP that has been considered since very early on in the project is where you replace a sequence by a sequence of unit vectors in a Hilbert space. To be precise, one looks at the following statement.

**Conjecture.** Let be a sequence of unit vectors in a Hilbert space and let be a real number. Then there exists a HAP such that .

There is something slightly curious about this conjecture, which is that it is very hard to see how having infinitely many dimensions to play with could help one find a counterexample. In some sense, if you use too many dimensions, then it ought to be the case that there will be a HAP such that the vectors with are pointing in lots of different directions and therefore not cancelling. That makes me wonder whether one can prove that if there is a counterexample to the above conjecture, then there must be a counterexample for some finite-dimensional Hilbert space. Or if that is too much to ask, perhaps there might have to be a counterexample where all the live in some compact set. I think it would be very interesting to think about whether the vague intuition I have just expressed can be made precise. As it stands, it is of course not even close to a proper argument. (I should stress one aspect of what I am saying. If there is any vector sequence of bounded discrepancy, then one can arbitrarily modify for each prime and the sequence will still have bounded discrepancy. So I’m not suggesting that every bounded-discrepancy sequence lives, or almost lives, in finite dimensions, but that it can be used to construct one that does.)

As I write that, another question occurs to me. For this one I don’t even have a vague intuitive argument. Let’s suppose that we can find a counterexample to the matrix discrepancy question earlier. Suppose also that takes the form for some (not necessarily unit) vectors and in a Hilbert space. Must there be an example where the and lie in a finite-dimensional Hilbert space, or at least in a compact subset of a Hilbert space?

**A combined generalization.**

A second afterthought is that the matrix question has a modular version that might be of some interest. Let be a prime and let be a function from to with for every . Must the sums of on products take all possible values mod ? What if we merely ask for to take non-zero values on the diagonal?

If the strong modular conjecture is false, then we can turn a counterexample into a diagonal matrix in the obvious way and we get a counterexample to the second question. Indeed, suppose that for every and when . Then , which avoids some value, since is a HAP. But with matrices there are many more ways of trying to avoid particular values, so the strong modular matrix conjecture looks like a much stronger statement. Again there are two cases — avoiding 0 and avoiding a non-zero value — of which only the second obviously implies EDP.

The polynomial method is another basic combinatorial technique that occasionally works. One way to describe the method is as a way to translate a combinatorial statement into the vanishing of a certain polynomial modulo .

**Theorem:** Every graph (or hypergraph) *G* with *n* vertices and *2n+1* edges contains a nontrivial subgraph *H* with all vertex-degrees divisible by 3.

(This is a theorem of Noga Alon, Shmuel Friedland, and me from 1984.)

**Before the proof**: If we want to get a subgraph with all vertex degrees even then we need *n* edges (or *n+1* edges for hypergraphs). This has a simple linear algebra proof which also gives an efficient algorithm.

**From-scratch proof sketch:** Associate with every edge *e* of the graph a variable . Consider the two polynomials

*P=* , and

Q=

If the theorem is false then *P-Q=0*, as polynomials over the field with three elements. This is impossible since *P* is a polynomial of degree *4n* while *Q* is a polynomial which has a monomial of degree *4n+2* with nonzero coefficient.
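The displayed formulas for the two polynomials did not survive in this copy of the post. One choice that is consistent with the stated degrees, offered here as a reconstruction rather than as the authors' exact formulas, uses a variable x_e for each edge e and works over the field with three elements, where the square of a non-zero element is 1.

```latex
\[
P \;=\; \prod_{v\in V(G)}\Bigl(1-\Bigl(\sum_{e\ni v}x_e^{2}\Bigr)^{2}\Bigr),
\qquad
Q \;=\; \prod_{e\in E(G)}\bigl(1-x_e^{2}\bigr).
\]
```

With this choice, Q equals 1 at the zero vector and 0 elsewhere, while P equals 1 exactly when the edges with non-zero x_e form a subgraph with every degree divisible by 3. P has degree 4n, and Q contains the monomial given by the product of all the squares x_e^2, which has degree 2(2n+1) = 4n+2 and coefficient ±1.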

The theorem follows more directly from a theorem of Chevalley-Warning and even more directly from a theorem of Olson, but the above proof best serves our purpose.
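The statement of the theorem is easy to test by brute force on a small example. The sketch below is not an efficient algorithm (none is known, as discussed in the remarks that follow); it simply tries all edge subsets. The function name `mod3_subgraph` and the choice of test graph, the first 13 edges of the complete graph on 6 vertices, are assumptions made for this illustration.

```python
from itertools import combinations


def mod3_subgraph(num_vertices, edges):
    """Return a non-empty list of edges with every vertex degree divisible by 3, or None."""
    for size in range(1, len(edges) + 1):
        for subset in combinations(edges, size):
            degree = [0] * num_vertices
            for u, v in subset:
                degree[u] += 1
                degree[v] += 1
            if all(deg % 3 == 0 for deg in degree):
                return list(subset)
    return None


# K_6 has 15 edges; keep 13 of them so that the edge count is 2n + 1 with n = 6.
vertices = 6
edges = list(combinations(range(vertices), 2))[: 2 * vertices + 1]
witness = mod3_subgraph(vertices, edges)
print(witness)
```

Because the search goes through subsets in order of size, the witness it returns is a smallest subgraph of the required kind.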

1) The polynomial method has many applications but only in specific cases. It is not nearly as widely applicable as, say, the probabilistic method.

2) Good basic references: A. Blokhuis, Polynomials in finite geometries and combinatorics. In Keith Walker, editor, *Surveys in Combinatorics, 1993*, pages 35-52. Cambridge University Press, 1993.

Noga Alon, Combinatorial Nullstellensatz, Combinatorics, Probability and Computing 8 (1999), 7-29.

3) The polynomial method is related to the “linear algebra method” in combinatorics. Often, however, direct linear algebraic proofs lead to efficient algorithms while this is not known for applications of the polynomial method. For example, no polynomial-time algorithm to find the graph *H* in the above theorem is known, and there is a related complexity class introduced by Christos Papadimitriou. The polynomial method is closely related to arguments coming from the theory of error-correcting codes, and to arguments in TCS related to interactive proofs and PCP.

The following is an equivalent way to formulate the Erdős 1932 conjecture that the discrepancy for EDP is unbounded.

1) Consider the sequence as a sequence with modulo , where is a prime that we can choose as large as we want.

2) Then every number modulo can be expressed as a sum of the sequence along a HAP modulo .

Translating EDP (in this form) into a statement about polynomials modulo is cumbersome. But one thing we may have going for us is that it suggests a natural extension of EDP where the supposed-to-vanish polynomial is simpler.

**Modular EDP Conjecture:** Consider a sequence of non-zero numbers modulo *p*. If *n* is sufficiently large with respect to *p*, then every number can be expressed as a sum of the sequence along a HAP modulo *p*.

As in the original EDP we can consider general sequences or just multiplicative sequences.

Here is the polynomial identity in *n* variables we need to prove over when grows to infinity with as slowly as we wish. For every , ,

(*)

These polynomials are not familiar but they are related to generating functions which arise in permutation statistics. In particular, when we look at the product

and expand it to monomials, the coefficients have a combinatorial meaning in terms of permutations and inversions.

Given a permutation , and an integer we can ask how many inversions there are between and a smaller integer. This is a number between 1 and .

The coefficient of in the above product is the number of permutations where there are integers contributing j inversions. The proposed identity (*) may be expressed in terms of modular properties of such permutation statistics.

Challenge: Prove the modular EDP using the polynomial method.

It is especially easy to apply the large deviation heuristic to the modular version of EDP. Suppose we want to compute the probability that all HAP-sums miss the outcome .

Given , the probability that is not is . So we are interested in the value of with . (Restricting our attention to multiplicative sequences will divide the exponents on both sides by .) Solving this equation gives us . The LDH heuristic comes with a firm prediction and a weak prediction. In this case the LDH gives

a) (Firm prediction) There are sequences violating the modular EDP when .

b) (Weak prediction) There are no such sequences when .

The firm prediction is correct by the log *n* discrepancy constructions for EDP and as a matter of fact the LDH itself gives an even stronger prediction of for -sequences. By restricting our attention to sequences we see that the weak prediction is incorrect and LDH for the modular EDP is blind to the special substructure of sequences. Note that the firm conjecture is far from being known when we extend the modular EDP and replace all integers by a random subset of integers, or by square-free integers, or by SCJ-systems of integers etc.

The approach to the Erdős discrepancy problem that, rightly or wrongly, I found most promising when we were last working on it was to prove a certain statement about matrices that can be shown quite easily to imply a positive solution to the problem. In this post, I’m going to treat that matrix statement as *the* problem, and think about how one might go about trying to prove it. I’ll give the brief explanation of why it implies EDP, but not of what the possible advantages of the approach are (discussions of which can be found in some of the earlier material).

First I’ll need to recall a couple of definitions and some notation. A *homogeneous arithmetic progression*, or HAP, is a set of the form for some pair of positive integers and . Let me say that is a *HAP-function* if it is the characteristic function of a HAP. Finally, if and are functions defined on , let denote the function defined on that takes the value at , which can be thought of as an infinite matrix.

Here, then, is the statement that it would be great to prove. What is nice (but also quite daunting) about it is that it is a pure existence statement.

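**Problem.** Prove that for every $C$ there exists a real matrix decomposition $D=\sum_i\lambda_iP_i\otimes Q_i$ with the following properties.

(i) $D$ is a diagonal matrix and $\mathrm{tr}(D)\geq C$.

(ii) Each $P_i$ and $Q_i$ is a HAP-function.

(iii) $\sum_i|\lambda_i|\leq 1$.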

I have stated the problem in this form because (for a reason I will outline below) I am confident that such a decomposition exists. However, it seems to be quite hard to find.

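Why would the existence of an efficient decomposition of a diagonal matrix into products of HAP-functions prove EDP? Well, let $x$ be a $\pm 1$-sequence and consider the quantity $\langle x,Dx\rangle$. On the one hand it equals

$\sum_n D_{nn}x_n^2=\mathrm{tr}(D)\geq C$,

and on the other hand it equals

$\sum_i\lambda_i\langle x,P_i\rangle\langle x,Q_i\rangle$.

Since $\sum_i|\lambda_i|\leq 1$, there must exist $i$ such that $|\langle x,P_i\rangle\langle x,Q_i\rangle|\geq C$, from which it follows that there exists a HAP $P$ such that $|\langle x,P\rangle|\geq C^{1/2}$. This shows that $x$ must have HAP-discrepancy at least $C^{1/2}$ for every $C$, and we are done.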
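The double-counting step can be checked numerically. What follows is only a minimal sketch: the helper names `hap` and `inner`, and the particular HAPs and coefficients, are hypothetical choices made for illustration, and the resulting matrix is not diagonal. It illustrates just two things: that the quadratic form of a combination $\sum_i\lambda_iP_i\otimes Q_i$ equals $\sum_i\lambda_i\langle x,P_i\rangle\langle x,Q_i\rangle$, and that the averaging step then forces one product $\langle x,P_i\rangle\langle x,Q_i\rangle$ to be large.

```python
import random

N = 30  # work on {1, ..., N}; index 0 is unused padding

def hap(d, m):
    """Characteristic function of the HAP {d, 2d, ..., md} as a 0/1 list."""
    f = [0] * (N + 1)
    for k in range(1, m + 1):
        f[k * d] = 1
    return f

def inner(x, f):
    """Inner product <x, f> over {1, ..., N}."""
    return sum(x[n] * f[n] for n in range(1, N + 1))

# Hypothetical combination sum_i lam_i * (P_i tensor Q_i), chosen arbitrarily.
terms = [
    (0.5, hap(1, 6), hap(2, 5)),
    (-0.3, hap(3, 4), hap(3, 7)),
    (0.2, hap(5, 2), hap(1, 9)),
]

random.seed(0)
x = [0] + [random.choice((-1, 1)) for _ in range(N)]

# First way: build M(m, n) = sum_i lam_i P_i(m) Q_i(n) and evaluate <x, Mx>.
M = [[sum(lam * P[m] * Q[n] for lam, P, Q in terms) for n in range(N + 1)]
     for m in range(N + 1)]
quad_form = sum(x[m] * M[m][n] * x[n] for m in range(N + 1) for n in range(N + 1))

# Second way: sum_i lam_i <x, P_i> <x, Q_i>.
by_terms = sum(lam * inner(x, P) * inner(x, Q) for lam, P, Q in terms)

# Averaging: some |<x, P_i><x, Q_i>| is at least |total| / sum_i |lam_i|.
total_weight = sum(abs(lam) for lam, _, _ in terms)
best = max(abs(inner(x, P) * inner(x, Q)) for _, P, Q in terms)
```

In a genuine decomposition the first computation would collapse to the trace of the diagonal matrix, which is what turns the averaging step into a lower bound on discrepancy.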

Of course, one can deduce anything from a false premise, so one needs at least some reason to believe that the decomposition exists. Quite a good reason is that the existence of the decomposition can be shown (without too much difficulty) to be equivalent to the following generalization of EDP.

**Problem.** Show that if $(u_n)$ and $(v_n)$ are any two sequences of vectors in a Hilbert space such that $\langle u_n,v_n\rangle=1$ for every $n$, then for every constant $C$ there exist HAPs $P$ and $Q$ such that $|\langle\sum_{m\in P}u_m,\sum_{n\in Q}v_n\rangle|\geq C$.

This is a strengthening of EDP, since it clearly implies EDP when the Hilbert space is $\mathbb{R}$. However, it doesn’t seem like such a huge strengthening that it might be false even if EDP is true. At any rate, it seems just as hard to find a counterexample to this as it does to EDP itself. (You might ask why I believe that EDP is true. It’s partly blind faith, but the heuristics presented by Gil Kalai in the previous post offer some support, as does, very weakly, the experimental evidence.)

**How does one ever solve existence problems?**

Let’s begin by thinking very generally about the situation we’re in: we have to come up with a mathematical object that has certain properties . (Of course, we could combine those into one property, but it is often more natural to think of the object as having several different properties simultaneously.)

I’m going to list strategies until I can’t (or at least can’t immediately) think of any more that seem to have any chance of success. (An example of a strategy that I think has no chance of success for this problem is to search for a just-do-it proof, the problem being that the constraints appear to be quite delicate, whereas a just-do-it proof works better when one has a number of independent constraints that are reasonably easy to satisfy simultaneously.)

1. Use an object you already know, possibly with some small modifications.

2. Start with examples that work for a certain constant , and use them as building blocks for larger and more complicated examples that work with a larger constant .

3. Use some kind of duality to prove that if an example does not exist, then some other object *does* exist, and show that in fact the second object does not exist.

4. Use a computer to search for small examples, try to discern a pattern in those examples, use that pattern to guess how to construct larger examples, and check that the guess works.

5. Strengthen the conditions on the object until there is a unique (or almost unique) object that satisfies those conditions.

6. Weaken the conditions on the object (e.g. by forgetting about some of the properties ), try to describe all objects that satisfy the weaker conditions, and then reformulate the problem as that of finding an object of that description that satisfies the stronger conditions. [As a very simple example, if you were required to find a positive solution of some quadratic equation, you would solve the equation without the positivity condition and then pick out a positive root.]

7. Assume that you have an object with the required properties, deduce some interesting further properties, find a natural object with the further properties, and hope that it has the original properties as well. [This is a variant of 6.]

**How do the general strategies fare when it comes to EDP?**

I will discuss them briefly at first, before concentrating in more detail on a few of them.

1. This doesn’t appear to be the kind of problem where there is already some matrix decomposition (or related mathematical object) waiting in the wings to be modified so that it becomes the example we want. Or if there is, the only way we are likely to discover it is by pinning down more precisely the properties we are interested in so that someone somewhere says, “Hang on, that sounds like X.” But I think it is very unlikely that we will discover an example that way.

2. I am definitely interested in the idea of starting with small examples and using them to build bigger examples. However, an approach like this needs plenty of thought: non-trivial small examples seem hard to come by, and the only obvious method for building bigger ones is taking linear combinations of smaller ones. (A trivial small example is to take a linear combination of the form where each is the characteristic function of a singleton. This gives us . To get anything interesting, we need the to be HAP-functions for longer HAPs, but then it is very difficult to get the off-diagonal terms to cancel.)

3. We are trying to write as an efficient linear combination of objects of a certain form. That is a classic situation for applying the Hahn-Banach theorem: if there is no such linear combination, then there must be a separating functional. What can we say about that functional?

This approach seems promising at first, but is in fact of no use at all, since if you pursue it, the conclusion you come to is that if there is no such linear combination, then there are sequences of unit vectors and in a Hilbert space and a constant such that for every pair of HAPs and . The reason duality doesn’t help is that we used duality to reformulate EDP as a problem about decomposing diagonal matrices. If we use duality again, we get back to (a generalization of) EDP.

4. We have tried this. Moses Charikar and others have written programs to search for the most efficient decompositions (for this and for a related problem). The results are useful in that they give one some feel for what the diagonal matrix needs to be like, but it does not seem to be possible to obtain a large enough matrix to use it to go as far as guessing a formula. For that, more theoretical methods will be necessary (but knowing roughly what the answer needs to look like should make finding those methods quite a bit quicker).

5. One difficulty is that we are not looking for a unique object: it would be nice if the properties we were trying to obtain determined the matrix and its decomposition uniquely, since then we could hope to *calculate* them. An obvious property to add is that the object is in some sense extremal (though what that sense is is not completely obvious). Another thing we can do is insist on various symmetries — something that has already been tried. Yet another is to restrict the class of HAPs that may be used: as far as we can tell at the moment, we can get away with HAPs with common differences that are all either prime or a power of 2. (I say that simply because it seems to be hard to find a sequence that has bounded discrepancy on all such HAPs. Gil’s heuristics suggest that the discrepancy should grow like for sequences of length .)

6. What weaker properties might we go for? One idea is simply to try to find a linear combination of products of HAP-functions that leads to a considerable amount of cancellation off the diagonal, even if there are still off-diagonal terms left. This would be interesting because if one naively tries to find a decomposition that works, it seems to be very hard to get a substantial amount of cancellation (except if very small HAPs are used, in which case the resulting decomposition is not very informative). Another (somewhat related) idea is to look for interesting decompositions but with not particularly large values of the constant $C$. Yet another might be to allow a wider class of functions than HAP-functions.

7. I don’t have much idea of how to apply this strategy to the entire decomposition, but there is plenty one can say about the diagonal matrix that one wishes to decompose, and that information, if it could be expressed neatly, would surely be very useful: it seems much easier to try to decompose a specific diagonal matrix than to try to decompose a diagonal matrix that you have to choose first, when the vast majority of diagonal matrices *cannot* be efficiently decomposed.

The proof that the diagonal decomposition implies EDP actually proves a much stronger result. It gives us a function such that if is any sequence of real numbers (not necessarily $\pm 1$-valued) with , then has unbounded HAP-discrepancy. To put that loosely, the sequence cannot correlate with the square of any sequence of bounded discrepancy. That puts very strong conditions on , but so far we have not fully understood what those conditions are. It seems that we should be able to make progress on this, perhaps even adding reasonable conditions that would allow us to determine the function uniquely.

**Creating new examples out of old ones.**

Suppose that and . Clearly if we take a linear combination of and , we get another decomposition of a diagonal matrix, and if the original decompositions were into products of HAPs then so is the new one.

Can we do anything more interesting? Another possibility is to look for some kind of product. Now Dirichlet convolutions are natural objects to look at in the context of EDP, since the expression can be thought of as follows. You start with a function defined on , then you define the dilate to be the function that takes the value at and 0 at non-multiples of , and finally you take a linear combination of those dilates. If is identically 1, then this is taking a linear combination of HAPs.

We can’t take a Dirichlet product of two matrices, but we can do something similar: given functions and of two integer variables we can take the function . If and are diagonal (meaning that they are zero if ), then the terms of the sum are zero unless and , so in this case , considered as a function of only, is the Dirichlet convolution of and , also considered as functions of one variable.

This operation also respects products in the following sense. If and , then

,

which equals , where is the Dirichlet convolution of and and is the Dirichlet convolution of and .
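Here is a small numerical check of that product rule. It is a sketch under an assumption: I take the operation on matrices to be $(A\ast B)(x,y)=\sum_{ab=x,\ cd=y}A(a,c)B(b,d)$, which is my reading of the construction described above. The function names and the test functions `f`, `g`, `u`, `v` are hypothetical and chosen arbitrarily.

```python
N = 24  # truncate every function to {1, ..., N}; index 0 is unused

def dirichlet(f, g):
    """Dirichlet convolution (f*g)(n) = sum over ab = n of f(a) g(b), for n <= N."""
    h = [0] * (N + 1)
    for a in range(1, N + 1):
        for b in range(1, N // a + 1):
            h[a * b] += f[a] * g[b]
    return h

def tensor(f, g):
    """The matrix (f tensor g)(x, y) = f(x) g(y)."""
    return [[f[x] * g[y] for y in range(N + 1)] for x in range(N + 1)]

def diag(f):
    """Diagonal matrix with f on the diagonal."""
    return [[f[x] if x == y else 0 for y in range(N + 1)] for x in range(N + 1)]

def matrix_product(A, B):
    """Assumed operation: (A*B)(x, y) = sum over ab = x, cd = y of A(a, c) B(b, d)."""
    C = [[0] * (N + 1) for _ in range(N + 1)]
    for a in range(1, N + 1):
        for b in range(1, N // a + 1):
            for c in range(1, N + 1):
                for d in range(1, N // c + 1):
                    C[a * b][c * d] += A[a][c] * B[b][d]
    return C

# Arbitrary integer-valued test functions on {1, ..., N}.
f = [0] + [(n * n) % 5 - 2 for n in range(1, N + 1)]
g = [0] + [(3 * n) % 7 - 3 for n in range(1, N + 1)]
u = [0] + [n % 3 - 1 for n in range(1, N + 1)]
v = [0] + [(n + 1) % 4 for n in range(1, N + 1)]

# Products of tensors are tensors of Dirichlet convolutions.
lhs = matrix_product(tensor(f, g), tensor(u, v))
rhs = tensor(dirichlet(f, u), dirichlet(g, v))

# Products of diagonal matrices are diagonal, with the Dirichlet convolution on the diagonal.
diag_prod = matrix_product(diag(f), diag(g))
```

Because every factorization of a number up to `N` uses factors up to `N`, the truncation loses nothing on the entries being compared.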

One further obvious fact is that the operation is bilinear in the two matrices that it is applied to.

So far everything is working swimmingly: out of two decompositions of diagonal matrices into linear combinations of products we build another one that has some number-theoretic significance. However, there is a problem: a Dirichlet convolution of two HAP-functions is not a HAP-function except in trivial cases.

How much does that matter? Without doing some detailed calculations I don’t know. Here are a couple of simple observations. Let and be two HAP-functions. Then their Dirichlet convolution is a sum of dilates of , one for each point in the HAP of which is the characteristic function. So we can at least decompose the Dirichlet convolution of two HAP-functions as a sum of HAP-functions. The sum of the coefficients is the number of points in the HAP corresponding to — though since the situation is symmetrical, we can add dilates of instead, so it is better to say that the sum of the coefficients can be taken to be the length of the smaller of the two HAPs.
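The observation about dilates is easy to test on a small case. The sketch below uses two arbitrarily chosen HAPs (the helper names are hypothetical) and checks that the Dirichlet convolution of their characteristic functions equals a sum of HAP-functions, one for each point of the first HAP, and that by symmetry the same holds with the roles swapped.

```python
from collections import Counter

def hap(d, m):
    """The HAP {d, 2d, ..., md} as a set."""
    return {d * j for j in range(1, m + 1)}

def dirichlet_of_sets(P, Q):
    """Dirichlet convolution of the indicator functions of P and Q, as a Counter."""
    return Counter(a * b for a in P for b in Q)

def sum_of_dilates(P, e, k):
    """Sum over a in P of the indicator of the HAP {ae, 2ae, ..., kae}."""
    total = Counter()
    for a in P:
        total.update(hap(a * e, k))  # the dilate of {e, ..., ke} by a is again a HAP
    return total

P = hap(2, 5)   # common difference 2, length 5
Q = hap(3, 7)   # common difference 3, length 7

conv = dirichlet_of_sets(P, Q)
via_P = sum_of_dilates(P, 3, 7)   # 5 HAPs, one per point of P
via_Q = sum_of_dilates(Q, 2, 5)   # 7 HAPs, one per point of Q
```

Here `conv` takes values greater than 1 (for instance at 12, which is both 2 times 6 and 4 times 3), so it is not itself a HAP-function, while the cheaper of the two representations uses 5 HAPs, the length of the shorter progression.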

That doesn’t sound very efficient, but we must also remember that when we take the Dirichlet convolution of two diagonal matrices, we obtain a diagonal matrix that sums to the product of the sums of the original two matrices. So there may be some gain there to compensate for the loss of efficiency. That is why I say that more detailed calculations are necessary, a point I will return to at some stage.

**What can we say about the diagonal matrix ?**

As I have already mentioned, if a diagonal decomposition of the required kind exists, then we can place strong constraints on the diagonal matrix itself. Writing $b_n$ for $D_{nn}$, it must have the following property: if $x$ is any real-valued function defined on the integers such that $\sum_nb_nx_n^2\geq C$, then the HAP-discrepancy of $x$ must be at least $C^{1/2}$. (This follows from the proof that the existence of the decomposition for a diagonal matrix with $\mathrm{tr}(D)\geq C$ implies that every $\pm 1$-sequence has HAP-discrepancy at least $C^{1/2}$.)

Turning that round, if we have any real-valued sequence $x$ of discrepancy at most $C$, then $\sum_nb_nx_n^2$ cannot be more than $C^2$.

A simple example of the kind of conclusion one can draw from that is that the sum of $b_n$ over all $n$s that are not multiples of 3 is at most 1. This follows from the fact that the sequence $1,-1,0,1,-1,0,\dots$ has HAP-discrepancy 1. More generally, if $\sum_nb_n$ is large, then for all small $m$ most of the largeness must occur on multiples of $m$.
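The fact being used here, that the periodic sequence $1,-1,0,1,-1,0,\dots$ has HAP-discrepancy 1, can be confirmed by brute force on an initial segment. This is a minimal sketch; `hap_discrepancy` is a hypothetical helper name, and the check covers only the first 300 terms.

```python
def hap_discrepancy(x):
    """Largest |x_d + x_{2d} + ... + x_{md}| over all d, m with md <= len(x).

    The list x holds x_1, ..., x_N, so x[0] is the value at 1.
    """
    N = len(x)
    worst = 0
    for d in range(1, N + 1):
        partial = 0
        for n in range(d, N + 1, d):
            partial += x[n - 1]
            worst = max(worst, abs(partial))
    return worst

N = 300
x = [(1, -1, 0)[(n - 1) % 3] for n in range(1, N + 1)]  # 1, -1, 0, 1, -1, 0, ...
support = [n for n in range(1, N + 1) if x[n - 1] != 0]
disc = hap_discrepancy(x)
```

Since the squares of the terms are 1 exactly on the non-multiples of 3, a weighted sum of those squares is just the total weight on `support`, which is why the bounded discrepancy of this sequence caps the weight a diagonal can place away from the multiples of 3.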

In qualitative terms, this suggests that ought to be concentrated at numbers with many factors, and the experimental evidence backs this up, though not quite as cleanly as one might ideally like. (However, “edge effects” could well account for some of the peculiar features that are observed experimentally, so I still think it is worth trying to find a very aesthetically satisfying diagonal matrix .)

Here is another reason to expect that should be larger for numbers with many factors: they are contained in lots of HAPs. We are trying to find a measure on (at least if we insist that takes non-negative values, which seems a reasonable extra condition to try to impose) such that any function with a large -norm with respect to that measure must have a large HAP-discrepancy. What might make it difficult to find a function with large -norm and small HAP-discrepancy? It would be that we had lots of constraints on the values taken by the functions. And the natural way that we can place lots of constraints on one value is if has many factors and is therefore contained in many HAPs.

As an example of the opposite phenomenon, suppose we take an arbitrary $\pm 1$-sequence and alter its values however we like at all primes. What will happen to its HAP-discrepancy? Well, each HAP with common difference greater than 1 contains at most one prime, so we cannot have changed the discrepancy along those HAPs by more than 2. (The HAPs with common difference 1 are the exception, since they contain every prime up to their largest element.) This illustrates fairly dramatically how little it matters what a sequence does at numbers with few factors.
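A quick experiment makes the point concrete. The sketch below is restricted to HAPs with common difference $d\geq 2$, since the HAP $\{1,2,\dots,m\}$ contains many primes; the sequences are random $\pm 1$-sequences and the function names are hypothetical.

```python
import random

def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    return n >= 2 and all(n % p for p in range(2, int(n ** 0.5) + 1))

def discrepancy_excluding_d1(x):
    """Max |partial sum| over HAPs {d, 2d, ..., md} with d >= 2; x[n-1] holds x_n."""
    N = len(x)
    worst = 0
    for d in range(2, N + 1):
        partial = 0
        for n in range(d, N + 1, d):
            partial += x[n - 1]
            worst = max(worst, abs(partial))
    return worst

random.seed(1)
N = 500
x = [random.choice((-1, 1)) for _ in range(N)]
# Re-randomize the values at primes and leave everything else alone.
y = [random.choice((-1, 1)) if is_prime(n) else x[n - 1] for n in range(1, N + 1)]

before = discrepancy_excluding_d1(x)
after = discrepancy_excluding_d1(y)

# Every HAP with d >= 2 meets the primes in at most one point (namely d itself).
primes_per_hap = [sum(is_prime(n) for n in range(d, N + 1, d)) for d in range(2, N + 1)]
```

Each partial sum along such a HAP changes by at most 2, so `before` and `after` can differ by at most 2 whatever values are placed at the primes.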

**A first wild guess.**

It is tempting to jump straight from this kind of observation to a guess of a function that might work. For example, what if we took some primes , a collection of subsets of and let be the characteristic measure of the set of all numbers that can be written as for some . That is, takes the value on all such numbers and zero on all other numbers.

Does a function like that have any chance of working? Because we have ensured that it sums to 1, we would now be looking for a decomposition with absolute values of coefficients summing to at most for a small constant . But before we even think about doing that, we should try to find a sequence that is large on many of the products of primes but that has bounded discrepancy. Only if we find it hard to do that is it worth looking for a decomposition.

Let me be more precise about what we are looking for. If we can find a decomposition of (the diagonal matrix with entries given by ) as where the and are HAP-functions and , then for every sequence , we have two ways of calculating . One of them gives us , while the other gives us . So if , then there must be some HAP on which has discrepancy at least .

Therefore, in order to show that this particular will not do, we need to find a sequence such that averages at least 1 on the numbers but has bounded discrepancy.

Let us make our task easier to think about by looking for a sequence that takes values on the numbers . Can we choose the values elsewhere in such a way that we cancel out any discrepancy that we might accidentally pick up with those values?

To answer this question, let us think about how HAPs intersect the set, which, since I’m mentioning it quite a bit, I’d better give a name to: I’ll write for the set of numbers with . Let me also write as shorthand for . Any HAP that intersects must have a common difference of the form for some set . The resulting HAP will contain every for which , which is rather pleasantly combinatorial.

Let’s suppose that we’ve chosen the values everywhere on . What we are trying to do now is choose values off (not necessarily ) in such a way that any HAP-discrepancy contributed by values in is cancelled out by a roughly opposite discrepancy in the values outside .

An obvious way to do that is to be fairly greedy about it. That is, we look at HAPs that have a substantial intersection with and simply choose a few further values on those HAPs, hoping that our choices won’t mess up too many discrepancies elsewhere. Let’s see if we can get something like that to work.

Consider, then, the (infinite) HAP with common difference . As already mentioned, this intersects in every such that . So as we trundle along the multiples , we find that the partial sums of the s that we encounter and have already chosen go up and down, and we would like to choose some further values to ensure that they remain bounded.

At which values of are we still at liberty to choose the value ? It must be a multiple of , and two ways of ensuring that that multiple does not lie in are (i) to make sure that is also a multiple of some with and (ii) to make sure that is a multiple of for some .

The question now is whether if we do that we will find that the value we choose has an effect on lots of other HAPs at the same time. And it looks rather as though it will. Suppose is quite a large set. Then any multiple that does not belong to is also a multiple of for every , so there is indeed a danger that by choosing a value of we are messing up quite a lot of HAPs and not just the one we were interested in. We could of course try to sort out the HAPs with large first and later correct any damage we had done to smaller sets — but a large set has a lot of subsets.

This is another situation where the devil is in the detail. Maybe there is enough flexibility that some kind of careful greedy approach would work — perhaps we could even choose to be 1 everywhere on — but so far it seems at least possible that there exists an efficient decomposition of a diagonal matrix with entries that are concentrated on square-free numbers with many prime factors.

It is sometimes convenient to describe arithmetic progressions of the form $\{rd,(r+1)d,\dots,sd\}$ as HAPs. Let me do that now, and let me define two HAPs to be *adjacent* if they are disjoint and their union is also a HAP (in this generalized sense). In other words, two HAPs are adjacent if one is simply a continuation of the other.

If $f$ and $g$ are HAP-functions coming from adjacent HAPs $P$ and $Q$ (in which case I’ll say that they are adjacent HAP-functions), then $(f-g)\otimes(f-g)$ is 1 on the diagonal at all points in the HAP $P\cup Q$, while off the diagonal it is 1 on $P\times P\cup Q\times Q$ and $-1$ on $P\times Q\cup Q\times P$. If we take lots of products of this form, we can hope to get lots of cancellation off the diagonal, while getting none at all on the diagonal.
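To see the sign pattern explicitly, here is a minimal sketch for one pair of adjacent HAPs, taking the product in question to be $(f-g)\otimes(f-g)$ (that expression is my assumption about what is intended). The progressions and the helper `gen_hap` are hypothetical illustrative choices.

```python
def gen_hap(d, r, s):
    """Generalized HAP {rd, (r+1)d, ..., sd} as a set."""
    return {d * j for j in range(r, s + 1)}

d = 3
P = gen_hap(d, 1, 4)   # {3, 6, 9, 12}
Q = gen_hap(d, 5, 8)   # {15, 18, 21, 24}, the continuation of P
N = 24

def h(n):
    """f - g, where f and g are the indicator functions of P and Q."""
    return (n in P) - (n in Q)

# The matrix (f - g) tensor (f - g) on {1, ..., N}^2.
M = {(m, n): h(m) * h(n) for m in range(1, N + 1) for n in range(1, N + 1)}

diagonal_ok = all(M[n, n] == (1 if n in P | Q else 0) for n in range(1, N + 1))
plus = {(m, n) for (m, n), val in M.items() if m != n and val == 1}
minus = {(m, n) for (m, n), val in M.items() if val == -1}
total = sum(M.values())
```

When the two progressions have the same length the entries sum to zero, so the off-diagonal entries sum to minus the number of diagonal points: the off-diagonal mass is negative on balance, which is the kind of bias one would hope to cancel across different scales.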

Let’s try to pursue this idea in more detail. Suppose we have some kind of probability distribution on functions of this form. That is, we pick, according to some distribution, a random set and then a random pair of adjacent HAPs and . Now let and be their characteristic functions and let and be two positive integers. What is the expected value of ?

A necessary condition for this quantity to be non-zero is that should divide both and . So let us condition on this event. If we are clever about the way we choose our probability distribution, we should be able to organize it so that if you choose a random , then typically plenty of its subsets have a similar probability of being chosen. The reason that is potentially important is that if we fix a common difference and pick random pairs of adjacent HAPs and of some given length, then pairs that are close to the diagonal will tend to get positive values (because normally if one of them is in then so is the other, and similarly for ) while pairs that are a little further away tend to get negative values. But if we have many different common differences in operation then we can hope that some of these biases will cancel out: a difference that tends to result in a positive value at one scale might tend to result in a negative value at another scale.

While writing this I have just remembered a useful trick that we were well aware of during the previous attempt at EDP. It is straightforward to show that EDP is equivalent to the same question for the rationals. That is, one wishes to show that for every function defined on the positive rationals and every constant there is a HAP (the definition is obvious) on which the sum of the function has absolute value at least . The nice thing about this formulation is a much greater symmetry: for every rational , the map is an isomorphism from (considered as a group under addition) to itself. This makes it unnecessary to think about numbers with lots of factors. (It is not hard to get into this nice situation using just integers — for example, you can just multiply everything by for a very large and you can divide your numbers by whatever you like, within reason — but it is more natural to use rationals.)

The hope was that if we thought about functions defined on the rationals, then the rather peculiar properties of the diagonal matrix one wishes to decompose might become less peculiar. However, one would still need to think carefully about *ratios* of numbers.

I’m tempted to continue thinking aloud about these possibilities, but I’ll save that up for the comments.

**A second wild guess.**

I’m now going to ignore what I’ve just written about working with the rationals and go back to the question of trying to find a natural function that is biased towards positive integers with many factors. What I’d like to do is build a sequence of non-negative functions , each better than the last, in the hope of eventually finding one that has (or at least doesn’t obviously not have) the property that every sequence such that has large HAP-discrepancy. (To make this non-trivial, I also need .)

I’ll start with a function I know doesn’t work: let for every and 0 for every . That fails because the sequence has HAP-discrepancy 1 but (at least if is a multiple of 3). That doesn’t satisfy the condition I said, but it does if we multiply it by .

The problem there was that has a large sum on non-multiples of 3. To correct that and similar problems, it feels natural to give greater weight to numbers with more factors. But how? Well, let’s do it in a rather naive way and simply weight each number by how many factors it has (and therefore how many HAPs it belongs to). This at least has the advantage of being a natural number-theoretic function: it is (up to a constant) the Dirichlet convolution of the constant function 1 with itself, at least until you get beyond . That is, .

Roughly how big is ? This sum is the number of pairs of positive integers such that (since is the number of pairs of positive integers such that ). Counting the number of pairs with we can crudely estimate this as , which is roughly . On the other hand, the Dirichlet convolution with itself of the function that is up to and 0 thereafter sums to (since each pair of positive integers with contributes 1 to the sum). Since the two functions are equal up to , we find that the vast majority of the second function lies beyond the point where it equals .

But let’s not worry about that. On average, grows like , which is not too fast a rate of growth, so let’s simply take the function up to . We now want to prove that every sequence such that has HAP-discrepancy at least (for some that tends to infinity with ). Alternatively, we want to find a counterexample, which will induce us to try to improve the function.

An obvious example to try is the usual one, namely the sequence . To see whether this works, we need to estimate the sum of over all non-multiples of up to . That equals the number of pairs such that neither nor is a multiple of 3 and . And that is roughly times what you get without the divisibility condition. In other words, it is roughly . So to get to equal we need to multiply the sequence by , which gives it a HAP-discrepancy of , a slight improvement on the that we obtained with a constant function.
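The two counts in this estimate can be computed exactly for moderate sizes. In the sketch below, `N` plays the role of the cutoff and the function names are hypothetical. The limiting proportion of pairs with neither entry divisible by 3 is $(2/3)^2=4/9$, but lower-order terms make the convergence slow, so the code only records the ratio rather than asserting that it is close to $4/9$.

```python
import math

def pairs(N):
    """Number of pairs (a, b) of positive integers with ab <= N."""
    return sum(N // a for a in range(1, N + 1))

def pairs_coprime_to_3(N):
    """The same count, but with neither a nor b a multiple of 3."""
    total = 0
    for a in range(1, N + 1):
        if a % 3:
            limit = N // a
            total += limit - limit // 3  # values of b <= limit not divisible by 3
    return total

N = 20000
all_pairs = pairs(N)                 # equals the sum of the divisor function up to N
restricted = pairs_coprime_to_3(N)
ratio = restricted / all_pairs       # tends to 4/9 as N grows, slowly
crude = N * math.log(N)              # the crude estimate for all_pairs
```

Comparing `all_pairs` with `crude` shows how good the $N\log N$ approximation already is at this size, while `ratio` is still visibly above its limit.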

This doesn’t prove much — maybe there are better examples — but it suggests in a weak way that taking Dirichlet convolutions results in improved functions. To be clear what I mean by “improved”, I am now looking for a non-negative function defined on such that for large , or perhaps even all , if is any sequence such that , then has HAP-discrepancy at least . The bigger I can get to be, the better I consider the function to be.

If you take repeated Dirichlet convolutions of the constant function 1, then you get functions , where is the number of -tuples of positive integers such that . (The function is equal to .) Suppose is large, is very large, and is a sequence such that . Does it follow that has large HAP-discrepancy? More ambitiously, does it have HAP-discrepancy that grows exponentially with , provided that is large enough?
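For experimenting with these iterated convolutions, the following sketch computes them directly. I write $d_k(n)$ for the number of $k$-tuples of positive integers with product $n$ (the notation and function names are mine), so that $d_1$ is the constant function 1 and $d_2$ is the usual divisor function.

```python
def dirichlet_with_one(f, N):
    """Dirichlet convolution of f with the constant function 1, truncated to n <= N."""
    h = [0] * (N + 1)
    for a in range(1, N + 1):
        for n in range(a, N + 1, a):   # n runs over the multiples of a
            h[n] += f[a]
    return h

def d_k(k, N):
    """List whose n-th entry is d_k(n) for 1 <= n <= N (entry 0 is unused)."""
    f = [0] + [1] * N                  # d_1 is the constant function 1
    for _ in range(k - 1):
        f = dirichlet_with_one(f, N)
    return f

N = 60
d2 = d_k(2, N)
d3 = d_k(3, N)
weights_d3 = [d3[n] for n in range(1, 13)]  # weights given to 1, ..., 12
```

Numbers with many factors are weighted more and more heavily as $k$ increases: for example 12 gets weight 6 under $d_2$ and 18 under $d_3$, while a prime gets only 2 and 3.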

**Back to the rationals.**

The following idea is one that feels familiar — I think something similar has come up already. But there’s nothing like a break from a problem to make an old stale idea seem new and fresh, and sometimes the apparent staleness was an illusion. So here it is again.

I’ll begin by creating a function on the diagonal that seems to me to have a sporting chance of being efficiently decomposable into HAPs. Let (for generating set) be the set and let be the -fold multiplicative convolution of the characteristic function of with itself. That is, is the number of ways of writing as with and all the and being integers between 1 and .

Before I think about how one might go about decomposing the diagonal “matrix” (that is, a function defined on that is zero at unless ) into products of HAP-functions, I want to see whether it looks hard to find a function such that but is of bounded HAP-discrepancy.

To have any chance of that, we need to understand a bit about . We can get that understanding by representing every rational as a product of (possibly negative) powers of distinct primes. For the rationals where is non-zero, the primes are all at most , so we can think of as a function defined on a lattice of dimension (the number of primes less than or equal to ). Then is the -fold convolution of the characteristic function of regarded as a subset of this lattice. If is large enough, then this convolution will look like a large and spread-out Gaussian, which will have the potentially useful property that it is approximately invariant if you multiply (or divide) by an integer between 1 and .

If has that property, does it rule out some kind of variant of the example? A key feature of that example is that the function in question is zero on multiples of 3, but in there is no such thing as a multiple of 3 (or rather, everything is a multiple of 3 so the concept ceases to make sense).

But that argument is far from conclusive. How about defining a function as follows: given a rational in its lowest terms, let be 0 if either or is divisible by 3, and otherwise 1 if mod 3 and if mod 3?

Let us think what the values of are at , , etc. If is a multiple of 3, then all those values are 0 (since is not a multiple of 3). If is a multiple of but not , then except if is a multiple of . If and are congruent mod 3, then the values of go . If they are not congruent mod 3, then they go . That last argument worked even if , so we have covered all cases.
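This case analysis can be checked mechanically. The sketch below assumes my reading of the definition: for a rational $a/b$ in lowest terms the function is 0 if 3 divides $a$ or $b$, is 1 if $a\equiv b$ mod 3, and is $-1$ otherwise. It then computes partial sums along $q,2q,3q,\dots$ for a range of rationals $q$. The names `f` and `max_partial_sum` and the sample range are hypothetical.

```python
from fractions import Fraction

def f(q):
    """0 if numerator or denominator (in lowest terms) is divisible by 3, else +1 or -1."""
    a, b = q.numerator, q.denominator   # Fraction always stores q in lowest terms
    if a % 3 == 0 or b % 3 == 0:
        return 0
    return 1 if (a - b) % 3 == 0 else -1

def max_partial_sum(q, length):
    """Largest |f(q) + f(2q) + ... + f(mq)| for m <= length."""
    partial, worst = 0, 0
    for r in range(1, length + 1):
        partial += f(r * q)
        worst = max(worst, abs(partial))
    return worst

samples = [Fraction(a, b) for a in range(1, 20) for b in range(1, 28)]
worst_seen = max(max_partial_sum(q, 200) for q in samples)

# Numerator and denominator congruent mod 3: the values should cycle 1, -1, 0.
values_along_5_over_2 = [f(r * Fraction(5, 2)) for r in range(1, 7)]
# Denominator divisible by 9: only every ninth multiple can be non-zero.
values_along_1_over_9 = [f(r * Fraction(1, 9)) for r in range(1, 19)]
```

On these samples the partial sums never exceed 1 in absolute value, which is consistent with the claim that this function has bounded discrepancy on every HAP of rationals.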

What is the density of rationals such that neither numerator nor denominator is a multiple of 3? I haven’t made the question precise, and that allows me to give two different answers. The first answer is that if we pick the numerator and denominator randomly, then the probability that neither is a multiple of 3 is . Also, the probability that they are coprime is , and the probability that they are both coprime and not multiples of 3 can easily be shown by similar methods to be at least an absolute constant. So it seems that our function is supported on a set of rationals of positive density.

But another way of thinking about it suggests that the support of the function has zero density: every rational can be written as a product of powers of primes, and the probability that the power of 3 that we choose is 0 tends to 0 as we choose more and more rationals.

The second answer is, I think, the correct one for us, though the question is a bit misleading. The -density of rationals for which the “3-coordinate” is zero does indeed tend to zero, so if we regard as our measure, rather than something more additive, then we do appear to have ruled out examples that are similar to the example.

**Can we use information about the original problem to help us with the dual problem?**

We know that for the integers there is not an efficient HAP-decomposition of a diagonal matrix that is 1 for the first terms and zero afterwards. The proof we have is that if such a decomposition existed, then the sequence would have to have large HAP-discrepancy, which it doesn’t. But can we give a “direct” proof? I’m not quite sure what I mean by that, except that it should not involve the use of a separating functional. Ideally what I’d like to do is use that example and others like it to work out some rather general constraint that would end up saying a precise version of, “If an efficiently decomposable matrix equals 1 on the diagonal, then it must have quite a lot of stuff off the diagonal,” and ideally say a fair amount about that “stuff”.

The hope would be to find a clear obstacle *on the dual side* to a direct approach to decomposing the identity matrix, and then to show that when we move to the measure on the rationals defined in the previous section, that obstacle goes away and allows us to do what we wished we could do over the integers. The thing that would make life easier would be the ability to divide anything by small integers.

Actually, maybe it isn’t so important to go for the dual side. Let’s just understand what the existing proof is telling us goes wrong when we try to find an efficient HAP-decomposition. For the decomposition to be efficient, the HAPs we use have to be quite long. And for that to be the case, they must contain roughly as many numbers congruent to 1 mod 3 as they do numbers congruent to 2 mod 3. (They might contain none of either.) Therefore, any HAP product has roughly as many points congruent to or mod 3 as points congruent to or mod 3. Let us call these *white* points and *black* points, respectively. By roughly I mean that these two numbers differ by at most a constant. In fact, I think they differ by at most 1. But no points on the diagonal are congruent to or to , so if a product of HAPs includes points on the diagonal that are congruent to or , then it must include at least more black points than white points off the diagonal.
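The guess that the two counts differ by at most 1 can be tested by brute force. In the sketch below a point of a product of two HAPs is *white* if its coordinates are both 1 or both 2 mod 3, and *black* if one is 1 and the other is 2 mod 3, which is my reading of the definition above. The function names and the search ranges are hypothetical.

```python
def residue_counts(d, m):
    """Numbers of elements of {d, 2d, ..., md} congruent to 1 and to 2 mod 3."""
    ones = sum(1 for j in range(1, m + 1) if (j * d) % 3 == 1)
    twos = sum(1 for j in range(1, m + 1) if (j * d) % 3 == 2)
    return ones, twos

def white_minus_black(d1, m1, d2, m2):
    """(# white points) minus (# black points) in the product of two HAPs."""
    p1, p2 = residue_counts(d1, m1)
    q1, q2 = residue_counts(d2, m2)
    white = p1 * q1 + p2 * q2
    black = p1 * q2 + p2 * q1
    return white - black           # equals (p1 - p2) * (q1 - q2)

largest_gap = max(
    abs(white_minus_black(d1, m1, d2, m2))
    for d1 in range(1, 8) for m1 in range(1, 15)
    for d2 in range(1, 8) for m2 in range(1, 15)
)
```

The reason is visible in the last line of `white_minus_black`: the difference factorizes, and within a single HAP the numbers of elements that are 1 and 2 mod 3 differ by at most 1, so the product of the two differences is at most 1 in absolute value.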

So, roughly speaking, we can’t build up the non-multiples of 3 on the diagonal without building up a bias towards black points off the diagonal.

Does that matter? Can’t we *correct* that unwanted bias by adding or subtracting some other HAP products? No we can’t, unless we also cancel out much of the sum on the diagonal.

What rescues us if we deal with rationals instead (or more precisely a portion of the rationals weighted by the function )? Also, how easy is it to find an efficient decomposition so that the weight off the diagonal is comparable to the weight on it? I suspect the answer to the second question is that it is not easy, since we have used just the fact that the sequence has bounded discrepancy, but there are many other examples.

**A variant of the HAP problem.**

I stopped the previous section prematurely because I was feeling a bit stuck. Instead I would like to think about a modified problem that may well be easier. EDP asks for a high discrepancy inside some HAP. The set of HAPs has the nice symmetry property that if you dilate a HAP then you get another one. This is particularly good over the rationals, when you can also contract.

Now let me ask a slightly vague question and then attempt to make it precise. Can we prove EDP for a different “base collection” of sets? For EDP I’m regarding the base collection as the collection of all sets $\{1,2,\dots,n\}$. If you look at all dilations of those, you get all HAPs. Since I’m having trouble proving EDP for HAPs, I’d like to try to prove it for a different base collection. Of course, the smaller the collection, and the more closely related it is to the set of HAPs, the better, but for now my priority is to be able to prove *something*.

I actually have a collection of sets in mind. Let us think of the rationals multiplicatively as an infinite-dimensional lattice, or rather the portion of that lattice that consists of points with only finitely many non-zero coordinates. That is, the point corresponds to the rational , the being integers. I’m not sure exactly what I want to do next, so let me be slightly less precise. If we look at the rationals multiplicatively, as we are doing here, then dilation becomes simply translation. What I’d really like is something like an orthonormal basis, but I’d like it to consist largely of functions that are translates of one another. So far that won’t work, but it looks a lot more hopeful if we allow some kind of “spreading out” (which would be the analogue of taking longer and longer intervals). The most natural type of spreading out is, I would suggest, to take repeated convolutions. So a possible question — not necessarily the exact one I’ll end up asking — is this. Suppose we start with the set that we considered earlier. Is it true that every function that has large norm relative to the measure defined by (multiplicatively) convolving times has a large inner product with some (multiplicative) translate of some iterated convolution of ?

This problem should be not that hard since it is purely multiplicative, so we can take logs and make it purely additive. If the answer turned out to be yes, then for any -function we would get a large inner product with some iterated convolution of . If we could nicely decompose that iterated convolution into HAPs, we would then be done, though that seems like quite a lot to ask.

Hmm … I’ve almost instantly run into trouble. It seems that my “multiplicative version” of EDP is just plain trivially false. Consider the function that is 1 for products of an even number of primes and -1 for products of odd numbers of primes. (This is the completely multiplicative function , with which we are already very familiar.) From a multiplicative point of view, where we think of as an infinite-dimensional lattice, this is putting 1 at “white points” and -1 at “black points”. If we now take any “cuboid”, by which I mean a set of the form for every , then the sum of on that cuboid is either 0, 1 or -1. I was about to say that I couldn’t see an elegant proof of this, but here’s a fairly quick one. If we remove two adjacent slices from the cuboid, we remove an equal number of white and black squares. “Removing two adjacent slices” means reducing by 2, provided . If some is even, then by repeating this operation for that we can remove all points, which shows that the numbers of white and black points were originally the same. If all are odd, then we end up reducing the cuboid to a set with just one point, in which case we get .
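The parity argument above is easy to check numerically. Here is a minimal sketch, assuming that a cuboid means a finite box of exponent vectors $(x_1,\dots,x_k)$ with $a_i \le x_i < a_i + r_i$, and that the function takes the value $(-1)^{x_1+\dots+x_k}$ at such a point. The helper name `cuboid_sum` is made up for this illustration.

```python
from itertools import product

def cuboid_sum(starts, sides):
    """Sum of (-1)**(x_1 + ... + x_k) over the box starts[i] <= x_i < starts[i] + sides[i]."""
    ranges = [range(a, a + r) for a, r in zip(starts, sides)]
    # sum(point) % 2 is 0 at "white" points and 1 at "black" points,
    # also when some exponents are negative.
    return sum((-1) ** (sum(point) % 2) for point in product(*ranges))

# A box with an even side is balanced; a box with all sides odd leaves one point over.
print(cuboid_sum((0, 0), (4, 3)))
print(cuboid_sum((0, 0, 0), (3, 5, 7)))
```

The sum factorizes as a product of one-dimensional alternating sums, each of which is 0 or $\pm 1$, which is another way to see the claim.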

If we multiplicatively convolve any two cuboids, we obtain a function that can be expressed as a linear combination of translates of a cuboid, so its inner product with will also be small, as will those of all *its* translates. (What’s more, those small inner products will themselves alternate in sign as you multiply or divide by primes.) So will have small inner product with everything you build out of products of GPs using multiplicative convolutions.

It could be fruitful to think about why HAPs have a better chance of giving rise to discrepancy than the kinds of multiplicative sets I’ve just considered. In preparation for that, here is a question to which I know the answer. Let be a nested collection of subsets of such that for every and . Is it possible for there to be a function and a constant such that for every and every ? In the case that for every this is EDP. But for a general nested collection, it is easy to see that bounded discrepancy is possible for that collection and all its dilates. You just take a completely multiplicative function such as and choose in such a way that for every (and such that the are distinct and every integer is equal to for some ). Then the sum is always either -1 or 0, and multiplicativity ensures that the sum on dilates of the is always either 1, -1 or 0.

The partial sums of grow (it is believed) like , so if we choose our sets greedily — that is, by taking to be the smallest positive integer we have not yet chosen such that takes the value — then will contain the first integers, where is around (at most), and look a bit like a random selection of integers within around of for the rest of the set. In other words, will be pretty close to the set , so the system of sets will be pretty close to the set of all HAPs. This suggests that proving EDP is going to be really quite delicate.
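The greedy construction can be tried out directly. The sketch below assumes the multiplicative function is the Liouville function $\lambda$ (1 on products of an even number of primes, -1 otherwise) and that the $n$th chosen integer $a_n$ is the smallest unused integer with $\lambda(a_n) = (-1)^n$; the function names and the sieve bound are choices made only for this illustration.

```python
def liouville_sieve(limit):
    """lam[n] = (-1)**(number of prime factors of n, with multiplicity) for 1 <= n <= limit."""
    spf = list(range(limit + 1))  # smallest prime factor of each integer
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:
            for q in range(p * p, limit + 1, p):
                if spf[q] == q:
                    spf[q] = p
    lam = [0] * (limit + 1)
    lam[1] = 1
    for n in range(2, limit + 1):
        lam[n] = -lam[n // spf[n]]
    return lam

def greedy_nested_sets(count, lam):
    """Return a_1, ..., a_count; the nested sets are the initial segments of this list."""
    used = [False] * len(lam)
    chosen = []
    for n in range(1, count + 1):
        want = -1 if n % 2 else 1
        a = next(m for m in range(1, len(lam)) if not used[m] and lam[m] == want)
        used[a] = True
        chosen.append(a)
    return chosen

lam = liouville_sieve(10_000)
a = greedy_nested_sets(200, lam)
print(a[:10])
```

By construction the sum of $\lambda$ over each initial segment is -1 or 0, and the sum over a dilate by $d$ is that value multiplied by $\lambda(d)$.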

In fact, it will be more delicate still, since we can take a better multiplicative function than . If we take the unique completely multiplicative function that takes to 1 if is congruent to 1 mod 3, to if is congruent to 2 mod 3, and 3 to -1, its partial sums grow logarithmically, so we can say something similar to the above but with around rather than around .

We can go slightly further with this function. Because it is 1 on the infinite arithmetic progression and on the infinite arithmetic progression , we can choose our sets to be intervals up to roughly plus logarithmically short (non-homogeneous) arithmetic progressions of common difference 3.

Since these sets are cooked up so as to ensure that the function has bounded discrepancy, they might seem not that interesting. But I think they are important, because if we want to find an efficient decomposition of a diagonal matrix into HAP products, we probably need to understand what it is that makes HAPs so much better than HAPs with highly structured tiny logarithmic perturbations at the end. Otherwise we are in danger of taking seriously arguments that would work just as well for statements that we know to be false.

As I write, it occurs to me that we might be able to prove a result that would be quite a bit easier than EDP but nevertheless interesting (and in particular interesting enough to be publishable, unless the answer turns out to be disappointingly easy): that there *exists* a permutation of such that the discrepancy of every sequence on the sets of the form is unbounded. We know it is not true for all permutations — even ones that in some sense don’t permute by very much — and we suspect that it is true for the identity permutation but find that very hard to prove. What about if is in some sense “fairly random”? For example, what if we chose a large and randomly permuted the first integers. That would give us sets and all their dilates. Could we prove that the discrepancy on that set system is at least ? What if we drop the condition that the union of the should be all of ? What if instead we ask that for some strictly increasing sequence ?

I rather like those questions, which makes this a good place to stop the post — for the length of which I apologize.

Here is a very general probabilistic heuristic that seems to give good predictions for questions related to EDP. I will refer to this heuristic as “LDH”. (In my polymath5 comments I referred to it as PH, for “probabilistic heuristic”.) I am thankful to Noga Alon and to Yuval Peres for their help.

Here is an example: Suppose we want to study the following extremal problem.

What is the largest number of edges in a graph on n vertices with no triangle?

If we use the probabilistic method we can ask what is the probability that a random graph in contains no triangle. As long as this probability is positive we know that a triangle-free graph with n vertices and m edges exists. (Being a little careful we can consider instead of where . Looking at random graphs gives us a perfectly correct proof of the assertion that there are triangle-free graphs with vertices and edges for every .

**LDH**:

1) Estimate naively the probability that a random graph in G(n,m) contains no triangle.

2) Choose m so that this estimated probability behaves like 1 over the number of graphs with n vertices and m edges.

So let’s implement this plan. The probability that a random graph in does not contain a specific triangle is . Naively assuming that these probabilities are independent we estimate the probability of not having any triangle as . We want to find so that this probability is roughly . This is the case when roughly .

**LDH prediction for Mantel-Turán problem:** The maximum number of edges in a triangle-free graph behaves like .
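The two steps of the LDH recipe can be carried out numerically. The sketch below is one way to do it, under stated assumptions: the edge density is taken to be $p = m/\binom{n}{2}$, the naive no-triangle estimate is $(1-p^3)^{\binom{n}{3}}$, and the number of graphs with $n$ vertices and $m$ edges is $\binom{\binom{n}{2}}{m}$. The function names are hypothetical, and bisection is used only as a convenient way to locate where the product of the two quantities drops below 1.

```python
import math

def log_binom(n, k):
    """Natural logarithm of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def ldh_triangle_edges(n):
    """An m at which C(N, m) * (1 - p**3)**C(n, 3) crosses 1, with N = C(n, 2), p = m / N."""
    pairs = n * (n - 1) // 2
    triples = n * (n - 1) * (n - 2) // 6

    def log_expected(m):
        p = m / pairs
        return log_binom(pairs, m) + triples * math.log1p(-p ** 3)

    # For moderately large n the expression is positive at m = 1 and negative at m = N - 1.
    lo, hi = 1, pairs - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if log_expected(mid) >= 0:
            lo = mid
        else:
            hi = mid
    return lo

for n in (100, 1000, 10000):
    print(n, ldh_triangle_edges(n), round(n ** 1.5))
```

Comparing the output with $n^{3/2}$ shows that the heuristic lands far below the $n^2/4$ edges of a complete bipartite graph, which is the failure of the weak prediction discussed below.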

In other words, the LDH gives two predictions that we will refer to as the “firm” prediction and the “weak” prediction.

A) (The firm prediction.) There exist triangle-free graphs with vertices and edges

and

B) (The weak prediction.) There are no (substantially) larger triangle-free graphs.

Prediction A is correct. There are indeed triangle-free graphs with vertices and edges. (But the LDH does not prove their existence.) Prediction B is miserably false: Actually there are graphs with edges without a triangle. The LDH heuristic ignores the fact that not containing one triangle is not independent of not containing another, and is completely blind to large bipartite graphs.

**Exercise:** What is the LDH prediction for the question: How large can a subset of the integers {1,2,…,n} be if it contains no 3-term arithmetic progression?

We would like to propose the following points regarding the large deviation heuristic:

- LDH predictions about the existence of some combinatorial objects are quite often true. (We refer to such predictions as the *firm predictions*.)
- LDH weak predictions are blind to various structured examples. Sometimes, if we understand the relevant structures we can update the large deviation predictions. (I will come back to this at the end of the post.)
- LDH predictions appear to be quite good for the Erdős Discrepancy Problem (EDP) and for variations of EDP. This is what we are going to discuss now.

Since LDH is based on a heuristic method to compute probabilities it is quite possible that different heuristics will give different answers but, overall, we did not encounter this.

**Problem 1**: Find natural examples where the LDH firm prediction, namely the prediction for the existence of certain combinatorial objects, fails.

**Problem 2:** Find ways to improve the LDH when it fails. (Especially when the answer is known by other methods).

We consider the multiplicative version of the Erdős Discrepancy Problem. The number of multiplicative $\pm 1$-sequences of length $n$ is close to $2^{n/\log n}$. (These sequences are determined by their values on prime indices, and the Prime Number Theorem tells us that the number of primes smaller than $n$ behaves like $n/\log n$.) We expect that the LDH computations for the general question will give the same answer.

What we need to compute is:

What is the value of $K$ so that the probability $p$ that all partial sums of a random $\pm 1$ sequence of length $n$ belong to the interval $[-K,K]$ satisfies $p = 2^{-n/\log n}$?

This question can be formulated in terms of a simple random walk of length $n$ on the line. We start at the origin and at each step we go a unit length to the left or to the right with probability 1/2 for each direction. We want to know the probability that the random walk will be confined to the interval $[-K,K]$ and the value of $K$ for which this probability is $2^{-n/\log n}$.

**Problem 3 (the answer is known):** What is this value of $K$?

Towards answering problem 2, I asked over Mathoverflow “What is the probability that a random walk of length n will be confined to the interval $[-K,K]$?” and Douglas Zare provided a very detailed answer. Yet, I did not complete the work needed to answer Problem 2. So a few weeks ago I asked Yuval Peres

What is the probability that the simple random walk of n steps will be confined to the interval $[-K,K]$, and what is the value of $K$ for which this probability is $2^{-n/\log n}$?

And a few hours later I received the following reply from Yuval:

Gil, the confinement probability in decays up to a constant like where is known: it is . This is classical and you can find it e.g. in Feller volume 2 or in Spitzer’s book. This holds for all . So the answer to your query is that for a suitable .

So, we get:

The LDH prediction for the EDP is that the maximum discrepancy of a multiplicative sequence of length $n$ behaves like $\sqrt{\log n}$. (The same prediction applies to general sequences.)
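The confinement probability is easy to compute exactly for small parameters, which gives a sanity check on the heuristic. The sketch below uses plain dynamic programming over the positions $-K,\dots,K$; the function names are made up here, and the comparison value $\cos(\pi/(2K+2))$ is the standard per-step decay rate (the top eigenvalue of the walk killed on leaving the interval), stated as background rather than taken from the email above.

```python
import math

def confinement_probability(n, K):
    """P(all partial sums of n fair +-1 steps stay in [-K, K])."""
    width = 2 * K + 1
    dist = [0.0] * width
    dist[K] = 1.0  # index K represents the origin
    for _ in range(n):
        new = [0.0] * width
        for i, mass in enumerate(dist):
            if mass:
                if i > 0:
                    new[i - 1] += mass / 2
                if i < width - 1:
                    new[i + 1] += mass / 2
        dist = new  # mass that stepped outside the interval is dropped
    return sum(dist)

def ldh_threshold(n):
    """Smallest K whose confinement probability is at least 2**(-n / log n)."""
    target = 2.0 ** (-n / math.log(n))
    K = 1
    while confinement_probability(n, K) < target:
        K += 1
    return K

ratio = confinement_probability(202, 3) / confinement_probability(200, 3)
print(ratio, math.cos(math.pi / 8) ** 2)
print(ldh_threshold(1000))
```

Floating point limits this to moderate $n$, since the probabilities for very small $K$ underflow once $n$ is in the thousands.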

If we consider general sequences we get the same answer. We need to compute the probability that for a random sequence of length , all HAP are confined to . These HAP are random sequences of lengths , , . The probability that the initial sums of a random sequence of length is confined to is . If we assume that these probabilities are independent we get an estimate of which behaves like when . We get the same outcome. Of course the confinement of different HAPs to are not independent. In fact they seem positively correlated and this strengthens the case for the firm prediction. But I don’t know how such positive correlation can be used to prove that the firm prediction is correct. (Of course, the feeling that small discrepancy on different HAP are positively correlated is rather tentative. We know that for every individual HAP we have with positive probability discrepancy bounded by 1. Yet the probability that this happens for all HAPs is zero.)

**Problem 4:** Fix two positive integers . For a function consider the two events

1) For a maximal HAP of gap the initial sums of the function are confined to .

2) For a maximal HAP of gap the initial sums of the function are confined to .

Are these two events positively correlated when is large enough?

At an earlier time, I had a version of the LDH based on gaps between vanishing partial sums. Let me discuss it here. We start with the following question: What is a probability that a sequence of of length will not have a vanishing partial sum where ? Another way to ask this question is:

What is the probability that a simple random walk of length t will not reach zero in the interval ?

For our purposes all we need to know is that this probability tends to some constant strictly between 0 and 1. The precise value is related to a classic question and let me cite another email by Yuval Peres about it:

The probability that a simple random walk will not meet 0 in the time interval [s,t], where s=xt, tends as $t \to \infty$ to $\frac{2}{\pi}\arcsin\sqrt{x}$. This is one of the two classical arcsine laws for random walks that you can find in many sources, including e.g. Durrett’s book or proposition 5.7 page 137 in My Brownian book. There you will see this law applies to all random walks with increments of mean zero and finite variance. More combinatorial arguments for the special case of SRW can be found in Feller vol I, as well as in these slides.
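For the simple random walk the limit can be compared with an exact finite computation. The sketch below relies on the discrete arcsine law from Feller vol I: for a walk of $2n$ steps, the probability that the last visit to 0 is at time $2j$ equals $u_{2j}u_{2n-2j}$, where $u_{2j} = \binom{2j}{j}4^{-j}$. The function name is a choice made for this illustration.

```python
import math

def prob_no_zero_after(k, n):
    """P(a simple random walk of 2n steps has no zero at any time in (2k, 2n])."""
    # u[j] = P(S_{2j} = 0) = C(2j, j) / 4**j, built up via the ratio (2j - 1) / (2j).
    u = [1.0]
    for j in range(1, n + 1):
        u.append(u[-1] * (2 * j - 1) / (2 * j))
    # No zero after time 2k means the last zero is at some time 2j with j <= k.
    return sum(u[j] * u[n - j] for j in range(k + 1))

for x in (0.1, 0.25, 0.5, 0.9):
    n = 1000
    exact = prob_no_zero_after(int(x * n), n)
    limit = 2 / math.pi * math.asin(math.sqrt(x))
    print(x, round(exact, 4), round(limit, 4))
```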

Given a random walk on the line we will try to estimate the probability that we can find sequence of indices for which the random walk reaches the origin so that the differences between consecutive indices is between and and compute the value of for which this probability is . Since the probability for a random walk of steps to reach the origin between the th and th steps is a certain constant we have the following LDH predictions:

(Firm prediction) There exists a multiplicative sequence of length such that the gap between two consecutive vanishing partial sums is (up to a multiplicative constant) at most .

(Weak prediction) This is best possible.

Some additional gymnastics allow us to move from the prediction regarding multiplicative sequences of length where all the gaps between consecutive zeros behave like to sequences where, in addition, the maximum discrepancy behaves like . The probability that between every two consecutive zeros the maximum discrepancy will be smaller than is indeed small but if is large this too behaves like so the indirect applications of the LDH based on gaps between consecutive zeroes gives the same prediction as the direct prediction above.

Here too the computation for general sequences gives the same outcome and indicates positive correlation between events which in the heuristic are pretended to be independent.

(Let me remark that over the polymath5 threads there were several remarks in the direction of trying to show that there are no sequences of length where the differences between consecutive vanishing partial sums are bounded for all HAP. This is weaker than what is required to show that the discrepancy is unbounded.)

Let me mention briefly several variations of EDP and what LDH says about them. The LDH is responsible for the guesses we propose in the previous post for variations **E1-E8** of EDP.

The weak LDH predictions will not change if we consider sequences where the zero entries occupy the indices divisible by 3. But the prediction fails in this case (the discrepancy can be bounded).

If we consider only HAPs with prime power differences the LDH prediction is the square root of . I would guess that this is the true behavior. Note that if we consider HAPs with prime differences, we have the same LDH prediction, but since we can have bounded discrepancy the weak LDH prediction is false.

The firm prediction of the LDH predicts a polylog(n) discrepancy (in fact even discrepancy) when we restrict our attention to square free integers and to random subsets of integers.

Let be the hypergraph in which we consider precisely one edge which contains every element with probability . Let be the hypergraph obtained from by adding as edges all initial segments of edges in . The LDH predicts that the discrepancy of $\mathcal{H}$ is bounded. When we move from to then we come back to the prediction. Such probabilistic versions take away the number-theoretic aspect from EDP. Still the probabilities of low discrepancy for different edges are not independent and the problem still looks hard.

Of course, it is of interest to understand the LDH predictions (both for and ) for the general case when we have probabilities and the edges are random subsets based on these probabilities. The most famous case is when we consider all s to be 1/2 and . Joel Spencer’s Six Standard Deviation Theorem asserts that the discrepancy for random subsets of is at most . More generally it was proved that the discrepancy of a random hypergraph with edges behaves like when and like for .

Recall that Roth’s theorem is about the discrepancy of the hypergraph whose edges correspond to all arithmetic progressions in {1,2,…,n}.

The LDH predicts the correct answer, at least roughly. The probability that a maximal AP of gap r will be confined to [-K,K] is . When we have to multiply the th power of , for , where is small and this will give us for . The contribution coming from APs of larger gaps will be of a smaller order of magnitude.

**Problem 6:** Are the methods of proving upper bounds for the discrepancy problem for APs relevant for proving better upper bounds for EDP, its extensions, and its variations?

We started this post by trying to answer a classical problem in extremal graph theory using a probabilistic heuristic which is based on large deviation estimates. It would not be irresponsible to say that the heuristic estimates proposed a very poor prediction to the extremal problem we considered.

Let be a fixed graph and let be the maximum number of edges for a graph on vertices that does not contain as a subgraph. Turán’s 1941 theorem determined the value of when is a complete graph with vertices. Turán’s theorem is the starting point of the wide and deep area of extremal graph theory. The case of triangle-free graphs goes back to Mantel in 1907.

One of the important discoveries in graph theory which is related both to additive number theory and to probability theory is the notion of limits of graphs. This notion is connected to the famous Szemeredi lemma. The theory of limits of graphs sheds new light on extremal graph theory; in a sense it tells us what the relevant structures for are when is not bipartite.

A recent paper entitled *The large deviation principle for the Erdős-Rényi random graph* by Sourav Chatterjee and S. R. S. Varadhan revealed a connection between large deviation for properties of Erdős-Renyi graphs and graphons – limits of graphs. (It complements a large body of related results obtained by various other methods.) Here is the abstract.

Abstract:What does an Erdős-Renyi graph look like when a rare event happens? This paper answers this question when p is fixed and n tends to infinity by establishing a large deviation principle under an appropriate topology. The formulation and proof of the main result uses the recent development of the theory of graph limits by Lovasz and coauthors and Szemeredi’s regularity lemma from graph theory. As a basic application of the general principle, we work out large deviations for the number of triangles in G(n,p). Surprisingly, even this simple example yields an interesting double phase transition.

So we can “morally” understand why the weak prediction of LDH fails for the property of “including a triangle”. To obtain a better prediction we also need to condition on various relevant limit structures of the random graph.

Consider the following graph process. We start with the empty graph on vertices and add random edges one after the other conditioned on not forming a triangle. Bohman proved that such a process for will lead with substantial probability to a triangle-free graph not containing an independent set of size . This proved a conjecture of Joel Spencer and gave a new proof to a famous result by Jeong Han Kim (with a remarkable history that I won’t describe here). Can we apply the LDH to the triangle-free processes to give a heuristic argument why triangle-free graphs on vertices with edges exist?

**Problem 7:** Let be a fixed **bipartite** graph. Does the LDH give good predictions for ?

It turns out that for extremal problems on graphs the LDH gives quite similar predictions to those obtained by the rigorous well known edge-deletion method based on the following proposition:

**Proposition:** If, for a random graph with n vertices and 2m edges, the expected number of copies of is smaller than half the number of edges, then .

For Turán-type problems, the LDH’s predictions are quite similar to those obtained by the edge-deletion method. (I am not aware of a similar trick for discrepancy problems.) The LDH predicts the existence of -free graphs with roughly times more edges than what the edge-deletion method gives. Achieving an improvement of a similar kind is a difficult and well-known problem in extremal combinatorics. See the paper: T. Bohman and P. Keevash. The early evolution of the H-free process. Inventiones Mathematicae, 181, 291–336, 2010.

The LDH predictions are still far from the correct answer. Erdős conjectured that for any bipartite with degeneracy the Turán number is at most , and that the Turán number is iff the graph is bipartite and 2-degenerate. For more on that see, e.g., N. Alon, M. Krivelevich and B. Sudakov, Turán numbers of bipartite graphs and related Ramsey-type questions, Combinatorics, Probability and Computing 12 (2003), 477-494.

Let $\mathcal{H}$ be a hypergraph, i.e., a collection of subsets of a ground set $X$. The **discrepancy** of $\mathcal{H}$, denoted by $\mathrm{disc}(\mathcal{H})$, is the minimum over all functions $f: X \to \{-1,1\}$ of the maximum over all $S \in \mathcal{H}$ of

$\left|\sum_{s \in S} f(s)\right|$.

We will mention one additional definition, that of **hereditary discrepancy**. When $\mathcal{H}$ is a hypergraph and $Y \subset X$, the restriction of $\mathcal{H}$ to $Y$ is the hypergraph with vertex set $Y$ whose edges are all sets of the form $S \cap Y$ for edges $S$ of $\mathcal{H}$. The hereditary discrepancy of $\mathcal{H}$ is the maximum over all $Y \subset X$ of the discrepancy of $\mathcal{H}$ restricted to $Y$.

Here is a link for a recent post discussing discrepancy and the famous Beck-Fiala theorem. The Beck-Fiala theorem asserts that if every element in $X$ is included in at most $t$ sets in $\mathcal{H}$ then $\mathrm{disc}(\mathcal{H}) < 2t$. (Of course, the theorem applies also to the hereditary discrepancy.)

**Erdős Discrepancy Problem (EDP).** *Is it possible to find a $\pm 1$-valued sequence $x_1, x_2, \dots$ and a constant $C$ such that $\left|\sum_{i=1}^{n} x_{id}\right| \leq C$ for every $n$ and every $d$?*

A HAP (Homogeneous Arithmetic Progression) is an arithmetic progression of the form {k,2k,…,rk}. EDP asks about the sum of a sequence on HAPs.

Given a sequence define to be the maximum of sums of subsequences over HAPs which are subsets of {1,2,3,…,n}. EDP asks if we can find a sequence for which is uniformly bounded and we will be interested in finding sequences where grows slowly.
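To make the quantity concrete, here is a minimal sketch that computes the largest HAP sum of a finite $\pm 1$ sequence and, by brute force, the smallest value achievable at a given length. The function names are illustrative only. The brute-force search reproduces the well-known fact that discrepancy 1 is possible up to length 11 but not at length 12.

```python
from itertools import product

def hap_discrepancy(seq):
    """Max of |x_d + x_{2d} + ... + x_{rd}| over all HAPs inside {1, ..., len(seq)}.

    The value x_i is stored at seq[i - 1].
    """
    n = len(seq)
    worst = 0
    for d in range(1, n + 1):
        partial = 0
        for i in range(d, n + 1, d):
            partial += seq[i - 1]
            worst = max(worst, abs(partial))
    return worst

def min_discrepancy(n):
    """Brute-force minimum of hap_discrepancy over all +-1 sequences of length n."""
    return min(hap_discrepancy(s) for s in product((1, -1), repeat=n))

print(min_discrepancy(11), min_discrepancy(12))
```

The exhaustive search is exponential in the length, so it is only useful for very short sequences.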

EDP was extensively discussed and studied in polymath5. Here is the link of the first post. Here are links to all polymath5 posts. Here is the link to polymath5 wiki.

A sequence $x_1, x_2, \dots$ is called *completely multiplicative* if $x_{mn} = x_m x_n$ for every $m$ and $n$. EDP is of great interest even if we restrict our attention to completely multiplicative sequences. For those we have only to consider partial sums $x_1 + x_2 + \dots + x_n$.

The function that takes n to 1 if the last non-zero digit of n in its ternary representation is 1 and -1 if the last non-zero digit is 2 is completely multiplicative and the partial sum up to n is easily shown to be at most $\log_3 n + 1$. Therefore, the rate at which the worst discrepancy grows, as a function of the length of the homogeneous progression, can be as slow as logarithmic.
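This example is simple enough to verify by machine. In the sketch below (function names are mine), the partial sum up to $n$ is compared with the number of digits equal to 1 in the ternary expansion of $n$: the non-multiples of 3 cancel in pairs, and the multiples of 3 reproduce the sum up to $\lfloor n/3 \rfloor$, which gives the logarithmic bound.

```python
def ternary_sign(n):
    """+1 if the last non-zero base-3 digit of n is 1, -1 if it is 2."""
    while n % 3 == 0:
        n //= 3
    return 1 if n % 3 == 1 else -1

def ones_in_ternary(n):
    """Number of digits equal to 1 in the base-3 expansion of n."""
    count = 0
    while n:
        if n % 3 == 1:
            count += 1
        n //= 3
    return count

partial = 0
largest = 0
for n in range(1, 3 ** 8):
    partial += ternary_sign(n)
    assert partial == ones_in_ternary(n)
    largest = max(largest, partial)
print(largest)
```

Because the function is completely multiplicative, the sum along the HAP $d, 2d, \dots, rd$ is the partial sum up to $r$ multiplied by the value at $d$, so the same bound holds for every HAP.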

A random sequence (or a random completely multiplicative sequence) gives us discrepancy close to . (There is apparently an additional factor but I am not sure of the precise asymptotic behavior.)

From now on we write instead of .

**Greedy algorithm 1** (for multiplicative functions): Assign the value as to minimize the maximum discrepancy in every partial sum whose terms are now determined.

**Greedy algorithm 1** (for general functions): Assign the value so as to minimize the maximum discrepancy in every HAP whose terms are now determined.
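For general sequences the greedy rule can be sketched as follows. This is one possible reading, not necessarily the procedure that was run in the polymath5 experiments: indices are processed in the order 1, 2, 3, …, the only HAP sums affected by the value at $m$ are those with common difference dividing $m$, and ties are broken in favour of +1. The tie rule and the function name are assumptions of this sketch.

```python
def greedy_sequence(n):
    """Greedy +-1 sequence of length n together with its largest |HAP partial sum|.

    sums[d] is the running sum along the HAP d, 2d, 3d, ...; when index m is
    assigned, only the HAPs with d dividing m gain a term.
    """
    sums = [0] * (n + 1)
    seq = []
    worst = 0
    for m in range(1, n + 1):
        divisors = [d for d in range(1, m + 1) if m % d == 0]
        # Pick the sign that keeps the largest affected |partial sum| smallest;
        # min() returns the first minimizer, so ties go to +1.
        value = min((1, -1), key=lambda v: max(abs(sums[d] + v) for d in divisors))
        for d in divisors:
            sums[d] += value
            worst = max(worst, abs(sums[d]))
        seq.append(value)
    return seq, worst

seq, worst = greedy_sequence(300)
print(seq[:12], worst)
```

The divisor enumeration makes this quadratic in $n$, which is fine for small experiments; a sieve-style update would be needed for lengths like those mentioned below.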

**Problem 1:** How does Greedy algorithm 1 perform?

**Empirical observation:** (This was claimed in some polymath 5 remarks but I am not sure if there was definite evidence. I would appreciate clarifications.) The discrepancy for sequences based on the greedy algorithm 1 (for multiplicative functions and for general functions) is .

**Interpretation:** Greedy algorithm 1 optimizes an “irrelevant task”.

We would like to suggest here

**Greedy algorithm 2** (for multiplicative sequences): Assign the value so as to minimize the maximum discrepancy in every partial sum where unassigned entries get the value zero.

**Greedy algorithm 2** (for general sequences): Assign the value f(n) so as to minimize the maximum discrepancy in every partial sum in every HAP where unassigned entries get the value zero.

**Problem 2:** How does Greedy algorithm 2 perform?

Omri Floman ran Greedy 2 on inputs such as N=10000 and got a discrepancy of around 20 (there is a bit of randomness involved in cases of ties). For N=100000 he got about 45. It is unclear what the behavior is.

**Problem 3:** Can we find an optimal compromise between Greedy 1 and Greedy 2?

**Problem 4:** Can randomization help?

Of course, since we know that a sequence of length n and discrepancy log n exists, if we draw a random sequence there is a probability of at least $2^{-n}$ of reaching such a low-discrepancy sequence. What we want to ask is if randomization can lead to a method of getting a low-discrepancy sequence with larger probability, or even better with provably larger probability. (Or even better yet, to a sequence of provably low discrepancy via an effective algorithm.)

**Problem 5:** How do Greedy algorithms 1 and 2 perform if we apply them for a random ordering of {1,2,…,n}?

Here are some variations of the Erdős’ Discrepancy Problem along with some guesses for the answers. I will explain where these guesses came from next time.

**E0:** Erdős’ Discrepancy Problem

Guess:

**E1**: Allow f(n) to attain values which are complex numbers of norm 1.

Guess, same answer

(Perhaps we can even consider norm-1 vectors in some Euclidean space or Banach space. If this is too permissive () we may go down to a constant.)

**E2**: The EDP for square-free integers

Here we simply consider sequences where if is not square-free. For this variation the multiplicative version of the problem is not a special case of the general question. (Here multiplicative means that if are coprime.)

Guess:for both versions, same answer .

**E3:** Instead of square-free integers consider a random dense subset of integers and assume that the sequence vanishes for indices not in the subset.

**Guess:** same answer .

Here we consider multiplicative functions which are nonzero only on a random dense set of primes.

**E4**: The EDP for a random dense subset of primes.

Guess:same answer

**Problem 6:** Find a sequence for problems E2, E3, and E4 with discrepancy , or better or better . **Update**: see below. It can be shown that sequences with discrepancy exist for all these variations.

**E5**: The EDP for HAP with prime power differences.

Guess:.

Beurling primes were defined by Arne Beurling in 1937 and he also proved a prime number theorem for them. The most general definition is very simple: Consider a sequence of real numbers regarded as “primes” and consider the (ordered) sequence of their products (multiplicities allowed) as the “integers”. (We will assume that all products are distinct although for the original purpose of defining a zeta function multiplicities may play a role.) Beurling primes played a role in the polymath4 discussions.

**E6**: The EDP for Beurling primes and integers

CarelessGuess:at most .

One way to think about Beurling primes is to identify with and to reorder the integers according to the ordering of the s. Actually, given the ordering we can recover uniquely the Beurling primes. A much more general notion of “pseudointegers” was suggested over polymath5 by Sune Kristian Jakobsen. See also the overview over Polymath5’s wiki.

An ordering of the natural numbers is a SKJ-ordering if it fulfills the following two conditions:

1) If and then .

and

2) for any the set is finite.

**Remark:** 1) Sune Kristian considered orderings on sequences of integers (the exponents in the prime factorization). This is equivalent to (but perhaps less provocative than) the formulation here. 2) We can expect that Beurling-orderings are a tiny tiny subset of SKJ-orderings.

Given an SKJ-ordering of the natural numbers we can ask the EDP for that ordering.

**E7**: The EDP for Sune Kristian Jakobsen systems of integers.

Careless guess: at most polylog n

**Problem 7**: How does Greedy algorithm 2 perform for variations E2-E7?

The answers for E2, E3 and E4 are especially interesting because these are examples where the best upper bounds I know behave like and are obtained from a random assignment. (See problem 5.)

**Problem 8:** Prove the assertion of EDP (namely that the discrepancy is unbounded) for a system of SKJ-pseudointegers of **your** choice.

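Recall that for a hypergraph $\mathcal{H}$ defined on a set $X$ the discrepancy of $\mathcal{H}$, denoted by $\mathrm{disc}(\mathcal{H})$, is the minimum over all functions $f: X \to \{-1,1\}$ of the maximum over all $S \in \mathcal{H}$ of

$\left|\sum_{s \in S} f(s)\right|$.

When $\mathcal{H}$ is a hypergraph and $Y \subset X$, the restriction of $\mathcal{H}$ to $Y$ is the hypergraph with vertex set $Y$ whose edges are all sets of the form $S \cap Y$ for edges $S$ of $\mathcal{H}$. The hereditary discrepancy of $\mathcal{H}$ is the maximum over all $Y \subset X$ of the discrepancy of $\mathcal{H}$ restricted to $Y$.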

Let be a hypergraph on a vertex set and assume that is ordered, e.g., . Consider the hypergraph obtained from by adding for every set all its initial subsets w.r.t. the ordering. We will consider the operation of moving from to . Note that for EDP and all our variations E1-E8 the underlying hypergraph is obtained by this construction (from a certain natural hypergraph). I further guess that for all the variations E1-E8 if we consider the hypergraph (before taking initial segments) then the discrepancy is bounded.
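To make the operation of adding initial segments concrete, here is a small sketch for the usual ordering on {1,2,…,n}. The helper name `with_initial_segments` is mine, and I am assuming that the "natural hypergraph" for EDP is the one whose edges are the sets of all multiples of d up to n; under that assumption the construction returns exactly the HAPs.

```python
def with_initial_segments(edges):
    """Return the set of all non-empty initial segments (in increasing
    order of the elements) of every edge, as frozensets."""
    closure = set()
    for edge in edges:
        ordered = sorted(edge)
        for k in range(1, len(ordered) + 1):
            closure.add(frozenset(ordered[:k]))
    return closure


n = 6
# Assumed starting hypergraph: for each d, the set of all multiples of d up to n.
multiples = [frozenset(range(d, n + 1, d)) for d in range(1, n + 1)]
haps = with_initial_segments(multiples)

print(sorted(sorted(e) for e in haps))
```

For n = 6 this produces the sets {d, 2d, …, kd} with kd at most 6, and nothing else.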

Let be a ground set and let be a set of reals. First consider the discrepancy problem for a random hypergraph with edges, whose ith edge is a random set so that every element of belongs to with probability (and all these events are statistically independent). Next we can move from to as described above.
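A sampler for this random hypergraph is easy to write down. This is a sketch under my own naming: `random_hypergraph`, `probabilities` and `seed` are not from the discussion above, and the choice of probabilities 1/i in the usage lines is an assumption made purely for illustration.

```python
import random


def random_hypergraph(n, probabilities, seed=None):
    """Sample one edge per probability p: every element of {1, ..., n}
    joins that edge independently with probability p."""
    rng = random.Random(seed)
    ground = range(1, n + 1)
    return [
        frozenset(v for v in ground if rng.random() < p)
        for p in probabilities
    ]


# Illustrative choice (an assumption): the i-th edge has density 1/i.
edges = random_hypergraph(20, [1 / i for i in range(1, 11)], seed=0)
for i, edge in enumerate(edges, start=1):
    print(i, sorted(edge))
```

Fixing the seed makes experiments repeatable, which matters if one wants to compare the behaviour of different greedy algorithms on the same sample.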

The case of taking for the probabilities can be seen as a probabilistic analog of EDP. My guess for the discrepancy of this case is also . I also guess that the discrepancy of itself is bounded.

**Problem 9:** How do Greedy algorithms 1 and 2 perform for the random hypergraph .

For EDP the ground set is all natural numbers, or just the set {1,2,…,n}, and the hypergraph is the collection of all HAPs (homogeneous arithmetic progressions). Roth considered the hypergraph of **all** arithmetic progressions and proved that the discrepancy in this case is at least $\Omega(n^{1/4})$. The existence of a sequence with discrepancy $n^{1/4}$ times a polylogarithmic factor was proved by Beck, and the polylogarithmic factor was removed by Matousek and Spencer. These works by Beck, Matousek and Spencer may be very relevant for proving the existence of low-discrepancy sequences for EDP and some of its extensions.

Here are some results proved in collaboration with Noga Alon which are based on the Beck-Fiala theorem and a general argument on moving from to . (It is likely that some of them are known and we will be happy to know.)

**Proposition 1:** For the discrepancy of described above is at most .

This follows from the following general proposition:

**Proposition 2:** Let be a hypergraph on an element ordered set and let be the maximum degree of a point of , then .

**Proof**: Let be a partition of into two nearly equal intervals, a partition of into 2 nearly equal intervals and similarly , etc ( levels). Now define a new hypergraph obtained from by replacing each set in by the following sets (possibly some empty):

, , , (for all ), .

Note that the maximum degree of a point in is , hence the discrepancy of is at most by Beck-Fiala’s theorem. Also, each initial segment of is a union of at most pairwise disjoint members of . This gives Proposition 2, and since for the case described in Proposition 1 the degree of vertices for behaves like we obtain Proposition 1 as well.
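The key step, that an initial segment is a union of few pairwise disjoint blocks from the repeated halving, can be illustrated as follows. This is a sketch under a simplifying assumption: the blocks are aligned dyadic intervals, as one gets when the ground set has size a power of 2, whereas the proof only asks for nearly equal halves. The function name `dyadic_pieces` is mine.

```python
def dyadic_pieces(k):
    """Split {1, ..., k} into disjoint aligned dyadic blocks, greedily
    taking the largest block that starts where the previous one ended."""
    pieces = []
    start = 1
    while start <= k:
        size = 1
        # Double the block while it stays aligned and inside {1, ..., k}.
        while (start - 1) % (2 * size) == 0 and start + 2 * size - 1 <= k:
            size *= 2
        pieces.append((start, start + size - 1))
        start += size
    return pieces


print(dyadic_pieces(13))
```

The number of blocks used for {1,…,k} equals the number of ones in the binary expansion of k, so it is at most about log k.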

**Proposition 3:** For versions **E1-E8** there are examples where the discrepancy is .

**Proof:** In all these examples we consider a hypergraph on the ground set A={1,2,…,n}, or on a subset of A with positive density (or for E7 and E8 on another unconventionally ordered set of integers). Then move to the hypergraph of initial subsets of edges of . To apply Proposition 2 and Beck-Fiala’s theorem we need to find an upper bound on the degree of a vertex in the hypergraph $\cal H$. Consider the case of EDP. (The other cases are similar.) The maximum degree is obtained by an integer that is the product of the first k distinct primes. In this case the degree is smaller than . Note that the proof applies even to hereditary discrepancy.
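For a feeling of how the degrees behave in the EDP case, here is a small sketch. I am assuming that the hypergraph before taking initial segments has one edge for each d, namely the set of multiples of d up to n, so that the degree of the vertex m is its number of divisors. The name `divisor_counts` is mine.

```python
def divisor_counts(n):
    """counts[m] = number of d in {1, ..., n} dividing m, which is the
    degree of m when the edges are the sets of multiples of d."""
    counts = [0] * (n + 1)
    for d in range(1, n + 1):
        for m in range(d, n + 1, d):
            counts[m] += 1
    return counts


counts = divisor_counts(100)
largest = max(counts[1:])
print(largest, [m for m in range(1, 101) if counts[m] == largest])
```

Running this for larger n shows how slowly the maximum degree grows compared with n, which is what makes the Beck-Fiala bound useful here.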

**Problem 10:** Use similar ideas to prove better (even polylog (n)) constructions for EDP and its variations **E1**–**E8**.

I extend all my earlier guesses when we move from discrepancy to hereditary discrepancy.

**Proposition 4:** The hereditary discrepancy for the hypergraph of HAP on {1,2,…,n} is .

The proof is obtained as follows: Consider the first primes, and a hypergraph on {1,2,…,m} of discrepancy . (For example, a Hadamard matrix of order describes such a hypergraph.) Next, for every edge consider the integer that corresponds to the product of the primes whose indices are in . Then restrict your attention only to these integers.

Propositions 3 and 4 show that the hereditary discrepancy of the hypergraphs of HAPs in {1,2,…,n} is between and .

Before our quick review of polymath5, let me mention a major difficulty which seems relevant to all approaches:

If we allow zero entries in our sequence, even a small density of zero entries, then the discrepancy can be bounded.

For example consider the sequence 1, -1, 0, 1, -1, 0, 1, -1, 0, …
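This is easy to confirm numerically. The sketch below computes the largest absolute partial sum along homogeneous arithmetic progressions for a finite sequence (the function name `hap_discrepancy` is mine) and applies it to the periodic sequence above, and for contrast to the alternating sequence 1, -1, 1, -1, …

```python
def hap_discrepancy(x):
    """Largest |x_d + x_2d + ... + x_kd| over all d and k with kd <= len(x).
    The list x holds x_1, x_2, ..., so x_i is stored at index i - 1."""
    n = len(x)
    worst = 0
    for d in range(1, n + 1):
        partial = 0
        for m in range(d, n + 1, d):
            partial += x[m - 1]
            worst = max(worst, abs(partial))
    return worst


period = [1, -1, 0]
with_zeros = [period[(i - 1) % 3] for i in range(1, 301)]
alternating = [1, -1] * 10

print(hap_discrepancy(with_zeros))
print(hap_discrepancy(alternating))
```

The periodic sequence stays at 1 however long we make it, because it is multiplicative and its partial sums along the progression of difference 1 are bounded. The alternating sequence already fails along the even numbers.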

**Experimentations:** Computer experiments played a large role in polymath5. One of the most striking discoveries was a sequence of length 1124 of discrepancy 2. Later a second sequence of length 1124 and discrepancy 2 was found but not a larger sequence. Several people made great contributions with computer experiments.
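The searches that produced the length-1124 examples were far more sophisticated than anything that fits here, but the basic idea can be sketched with a plain depth-first search. The code below is my own toy version (the names `longest_sequence` and `cap` are not from the project): it extends a ±1 sequence one term at a time and backtracks as soon as some HAP sum leaves the interval [-C, C]. It is only practical for C = 1, where the longest sequence is known to have length 11.

```python
def longest_sequence(C, cap=50):
    """Length of the longest +-1 sequence all of whose HAP partial sums
    lie in [-C, C], found by backtracking (the search stops at `cap`)."""
    x = [0]  # x[0] is a placeholder so that x[i] is the i-th term
    best = 0

    def new_sums_ok(n):
        # Only HAPs ending exactly at n can have become bad.
        for d in range(1, n + 1):
            if n % d == 0:
                if abs(sum(x[m] for m in range(d, n + 1, d))) > C:
                    return False
        return True

    def extend(n):
        nonlocal best
        best = max(best, n)
        if n >= cap:
            return
        for value in (1, -1):
            x.append(value)
            if new_sums_ok(n + 1):
                extend(n + 1)
            x.pop()

    extend(0)
    return best


print(longest_sequence(1))
```

For C = 2 this naive search is hopeless, which is one reason the actual experiments were such an achievement.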

**Additive Fourier analysis ideas:** Applying Fourier analysis to our sequence and somehow reaching a contradiction from the assumption that the discrepancy is bounded was a suggestion that dominated the first few threads and that we came back to from time to time.

**Terry Tao’s reduction:** Terry Tao found a reduction from the general question to the variation of multiplicative functions with complex norm-1 values (**E1**). The beautiful proof relies on “multiplicative” Fourier analysis and it is striking how “little” the proof uses.

**Problem 11** (Sune Kristian Jakobsen): Does Tao’s reduction apply to arbitrary SKJ-pseudointegers?

**Semi-definite programming:** A major turning point was a suggestion by Moses Charikar to use a natural semi-definite relaxation for the problem. This promising avenue was explored over several threads and here too computer experimentation was done. One reason to regard Charikar’s approach (and the related linear programming approach described next) as hopeful is precisely because it offers a convincing way to get around the difficulty demonstrated by the sequence 1, -1, 0, 1, -1, 0, …

**A generalization and linear programming:** The last few threads were centered around a different related relaxation proposed by Tim Gowers. The problem was generalized from a single-sequence problem to a pair-of-sequences problem. (This is a common motif in extremal combinatorics although here the motivation came from functional analysis.) Then relaxation of the problem led to a very appealing linear programming question.

**Mathoverflow questions:** Mathoverflow was used to ask several questions related to the project.

**Participation:** Polymath 5 attracted much interest and wide participation. Overall, it did not attract researchers with major prior interest in discrepancy theory.

Discrepancy theory is a huge and exciting area. Let me just give references to some books.

Jozsef Beck, William W. L. Chen: *Irregularities of Distribution*, Cambridge Tracts in Mathematics, Volume 89, Cambridge University Press, 1987.

Bernard Chazelle: *The Discrepancy Method: Randomness and Complexity*

Jiri Matousek: *Geometric Discrepancy: An Illustrated Guide (Algorithms and Combinatorics)*

J. Beck: *Combinatorial Games: Tic-Tac-Toe Theory*, Cambridge University Press, 2008.

J. Beck: *A forthcoming book.*

Reading these books will prepare you better to deal with EDP and will enrich your life tremendously.

If you want to look at some of the earlier posts, they are collected together in the polymath5 category on this blog.

One tool that we have at our disposal is duality, which is what we used to convert the problem to an existential one in the first place. Now obviously we don’t want to apply duality twice and end up with the original problem, but, perhaps surprisingly, there are ways that applying duality twice could be useful.

Here are two such ways. The first is that you prove that a certain kind of decomposition would be sufficient to prove EDP. Then you argue that if such a decomposition exists, then a more restricted kind of decomposition must also exist. Dualizing again, one ends up with a new discrepancy problem that is different from the original one (though it will imply it). The second way is this: if it is not easy to write down a decomposition that works, then one wants to narrow down the search space. And one way of doing that is to prove rigorously that certain kinds of decompositions do not exist. And an efficient way of doing that is to use duality: that is, one finds a function with low discrepancy on the class of sets that one was hoping to use for the decomposition. Since this class is restricted, solving the discrepancy problem is easier than solving EDP (but this time it doesn’t imply EDP).

We have already had an example of the first use of dualizing twice. In this post I want to give in detail an example of the second.

**General results about duality.**

Recall that we are trying to find a linear combination with the following properties, and with as small as possible.

- Each is a “square” in the sense that it is the characteristic function of a set of the form where itself is an interval of positive integers.
- If and are distinct and coprime, then

Recall also that since an equivalent formulation is to replace conditions 2 and 4 by the condition that

Now the set of functions that satisfy conditions 2 and 3 is convex — indeed, it is an affine subspace And the set of linear combinations such that is also convex. If we *cannot* find such a decomposition, then these two convex sets must be disjoint, which implies that there is a linear functional that separates them. By rescaling, we can ask that should take the value 1 everywhere on and be less than 1 on every function that belongs to

The first of these two conditions on implies that is constant on all rays (that is, intersections of with lines through the origin), since otherwise we would be able to find points in such that the inner product with was whatever we wanted. The second condition implies that for every and is also implied by it. Therefore, if no decomposition of the desired form exists then there is a function that is 1 on the main diagonal, constant on rays, and that has discrepancy at most on all squares. The converse is almost trivial.

Now let us see what happens if we modify the requirements of the decomposition as follows. We shall no longer ask for the sets to be squares (so that we can allow, for instance, rectangles or fly traps). We shall insist that every set we use belongs to some set system We shall also assume that has disjoint subsets and such that if then the coefficient of is required to be non-negative, and if then the coefficient of is required to be non-positive.

What happens if we dualize the problem with these new restrictions? This time we replace the set of all combinations such that by the set of all linear combinations with that property and the additional property that if and if This tells us that is at most if at least if and at most in modulus for all other Again, the converse is easy.

**Ruling out simple decompositions.**

Now let us see how we can exploit this. I would like to obtain a lower bound for the best one can obtain if consists of 2-by-2 and 3-by-3 squares, and fly traps, and if the coefficient of every fly trap is non-positive.

To do this, let us define a function as follows. If then we take to be 1, as we must. If and one of or is a multiple of 3, then we take and otherwise we take (So the only way that can be -1/2 is if mod 3.) If and one of or is a multiple of 6, then we take (this is forced since we have already decided that ) and otherwise

That determines on all points such that and on all multiples of all such points. It is not hard to check that the sum of on any 2-by-2 or 3-by-3 square lies between 1 and 2. So it remains to look at the fly traps.

Since we are assuming that the coefficients of fly traps are all negative, our aim is to choose the remaining values of in such a way that the sum on any fly trap is never less than -2. There is a trivially best strategy for doing that, which is to choose to be infinity for all remaining points. (This is overkill, but it works.)

If we do that, then what could go wrong? We would have to have a fly trap on which the sum was less than -2, which means we would have some pair of positive integers such that the values of (or the same but with minuses) are all chosen before we put in any infinite values, and at least five of them are -1/2.

The condition that the values are all chosen is equivalent to the condition that every is a factor of Indeed, the value of is chosen only if or where is the highest common factor of and If then and if then so

Let us see what we can deduce from this. We know that or we certainly cannot have at least five s. Setting we deduce that Setting we deduce that And setting we deduce that Therefore, It follows that And from that it follows that so Since is coprime to both 2 and 3, (as is a multiple of 6). Therefore, from which it follows that so

There is a sort of race going on here. It is possible that but otherwise all must be 0 up to So now it follows that In particular, so But this implies that since now is a multiple of 36. So we’re sort of back to square one.

Once we know that what property must have if there is to be any hope that ? If then we need not to be a multiple of 3, so we need to be a multiple of 9. And if is a factor of but not of then we need not to be a multiple of 6, which it will not be since it is odd. So that is still a possibility, but it requires to be a multiple of 8.

Going back to our example, we now know that we can restrict attention to multiples of 8 or 9. But by the time we get to five of those, we have passed 16, which tells us that and that now needs to be a multiple of 16. And before we know it, we hit 27. And so on.

I won’t try to make that last bit rigorous, but I’m pretty sure that it’s possible to do so, and to prove that the race can never be won. What I mean by that is that for every fly trap we will be able to put in an infinity before we reach five -1/2s. If that is correct, then it proves that we cannot improve on a bound if we use a linear combination of 2-by-2 and 3-by-3 squares and try to cancel it out with some fly traps, all of which have the same sign.

I’ve just spotted that I made a mistake above, which is that a fly trap has two wings, so to speak, which means that for the sum to reach less than -2, we need only four -1/2s. (The diagonal starts you off at 1, and then each has the effect of subtracting 1, so we can afford up to three of them.) But I think the conclusions I drew are valid even for this tighter race.

What happens if in addition to fly traps we look at rectangles of width 1? We’ve dealt with ones that intersect the diagonal, but what about a rectangle such as the one that consists of all the points from to for some ?

Now we need five -1/2s. What can we say about the highest power of 2 that divides if all the points in divide ? Let be the largest power of 2 that divides any number in and let be the largest power of 3. Now let belong to that interval. How can fail to be 0?

If then a necessary condition is that should not be a multiple of 3, since otherwise But that implies that If but is not a factor of then must be a factor of since So the total number of -1/2s is at most the number of multiples of that belong to the interval plus the number of multiples of that belong to the interval. But since any three consecutive multiples of include a multiple of and any two consecutive multiples of include a multiple of the total number of -1/2s in a rectangle of width 1 that contains no infinity is at most 3.

This is actually a rigorous treatment of the fly trap case as well. It proves that we can’t obtain a decomposition with if all we allow ourselves to use is 2-by-2 squares, 3-by-3 squares, and fly traps and rectangles of width 1 with negative coefficients.

If we allow ourselves symmetric rectangle pairs, then the above function has discrepancy 3, so the argument no longer works. However, we can improve on that bound by changing the coefficient -1/2 to something slightly closer to zero. If we choose it to be then the worst discrepancy on a 3-by-3 square turns out to be and the worst on a rectangle pair is Equating the two, we can take and that gives us a discrepancy of which tells us that we can’t make any better than even with this extra flexibility.

**A fatter example?**

Suppose we try to extend the above result to include 4-by-4 squares as well. Now we have to give sensible values to points of the form The simplest way we might try to do that without messing anything up would be to define to be 0 if is not a multiple of 3 (in which case and are coprime) and otherwise. This means that except if mod 9, in which case it is

This does not add any s to the fly traps, since the only s we have put in the lines are ones that were already there. However, it does change some infinities to zeros. What difference does that make?

Well, the condition now for not to be infinite is that So if we have an interval on which is finite, and if is the highest power of 3 that divides any of the integers in that interval, then the highest power of 3 that divides is at least (whereas earlier it was ). Let’s suppose that it is exactly (since this is the most difficult case). If but does not divide then so is not divisible by 3, which means that So we must have This takes us back to the previous case, except that the highest power of 3 that divides is smaller. So now from the condition that should not be a multiple of 3, all we can deduce is that Unfortunately, if is the highest power of that goes into any number in an interval, can go into up to eight numbers in that interval. So it looks as though this method of fattening the example fails completely. This may be worth looking into in case it suggests a decomposition (though that is asking a lot, since there is no particular reason to suppose that the construction I have just attempted has to be exactly as it was in my attempt).

**What next?**

At the moment I am trying to solve the problem by means of the following time-honoured technique. First I try to prove the result. Then, when I run into difficulties, I try to disprove it. Then, when I run into difficulties there, I try to prove it again, or perhaps prove merely that the way that I was trying to disprove it cannot work. And so on. The more iterations of this process, the smaller the bearing any success will have on the problem, but if the problem is hard enough then it may be best to start small.

How does that relate to EDP? Well, we would like a decomposition. So we try finding a decomposition with stronger properties. Then, having failed to do so, we see whether we can prove that no decomposition with the stronger properties exists. Fortunately, duality gives us a technique for doing this: we try to find a function with small discrepancy on all the relevant sets. On failing to find such a function, we try to get some sense of why *that* is difficult, in the hope that it will lead us to a yet more refined class of possible decompositions. And so on.

So the next thing I feel like trying is to generalize the above argument to strips of width 3, 4, 5, etc. about the main diagonal. At some point difficulties will emerge (or if they don’t then we’ll have proved something very interesting). What I hope is that those difficulties will lead us to a construction that improves on I feel more and more that this would be a crucial breakthrough: if we can do that in a theoretical way (rather than just finding a pattern on a computer that we are unable to interpret, though even that would be very interesting) then we will have much more of a chance of generalizing it to arbitrary positive

- where and is the number of points in the interval that defines or, more relevantly, the number of points in the intersection of with the main diagonal of
- Let Then for any pair of coprime positive integers we have

The second condition tells us that the off-diagonal elements of the matrix you get when you convert the decomposition into a matrix indexed by are all zero, and the first condition tells us that we have an efficient decomposition in the sense that we care about. In my previous post I showed why obtaining a collection of squares for a constant implies that the discrepancy of an arbitrary sequence is at least In this post I want to discuss some ideas for constructing such a system of squares and coefficients. I’ll look partly at ideas that don’t work, so that we can get a sense of what constraints are operating, and partly at ideas that might have a chance of working. I do not guarantee that the latter class of ideas will withstand even five minutes of serious thought: I have already found many approaches promising, only to dismiss them for almost trivial reasons. [Added later: the attempt to write up even the half promising ideas seems to have killed them off. So in the end this post consists entirely of half-baked ideas that I’m pretty sure don’t work. I hope this will lead either to some new and better ideas or to a convincing argument that the approach I am trying to use to create a decomposition cannot work.]

**Using squares and fly traps.**

A general idea that I have not managed to rule out is to build a decomposition out of “squares and fly traps”. I’ve already said what a square is. If you take the two squares and then their difference is the set of all points or such that It has the shape of two adjacent edges of a square. It is this kind of shape that I am calling a *fly trap*.

The idea then is to take a collection of fly traps with negative coefficients and a collection of squares with positive coefficients. In order for the second condition to hold, we need the following to hold: as you go along any line from the origin other than the main diagonal if you sum up the coefficients associated with the squares you visit, then the result should be cancelled out by the sum of the coefficients associated with the fly traps. In particular, if all squares have coefficients equal to 1 and all fly traps have coefficients equal to -1, then the number of times the line hits a square should be the same as the number of times it hits a fly trap. (I think of the squares as sending out “flies” that are then caught by the fly traps, which have some nasty sticky substance at their points.)
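Here is a toy check of the cancellation condition. It is a sketch under my own conventions: a square is the set of points (x, y) with both coordinates in an interval [r, s], a decomposition is a list of triples (coefficient, r, s), and `ray_sums` (my name) adds up the coefficients along each line through the origin, indexed by the reduced pair (x/g, y/g) where g is the highest common factor of x and y. The example uses three squares of width 2 and one fly trap at height 6, written as a difference of two squares.

```python
from collections import defaultdict
from math import gcd


def ray_sums(decomposition):
    """Total coefficient on each line through the origin. The key (a, b)
    is the point of the line with coprime coordinates."""
    totals = defaultdict(int)
    for coeff, r, s in decomposition:
        for x in range(r, s + 1):
            for y in range(r, s + 1):
                g = gcd(x, y)
                totals[(x // g, y // g)] += coeff
    return dict(totals)


decomposition = [
    (1, 1, 2),   # sends flies along the directions (1,2) and (2,1)
    (1, 2, 3),   # flies along (2,3) and (3,2)
    (1, 5, 6),   # flies along (5,6) and (6,5)
    (-1, 3, 6),  # fly trap = [3,6]^2 minus [3,5]^2, with coefficient -1:
    (1, 3, 5),   # it catches the flies at (3,6), (4,6), (5,6) and mirror images
]

sums = ray_sums(decomposition)
off_diagonal = {ray: total for ray, total in sums.items() if ray != (1, 1)}
print(sums[(1, 1)], sum(abs(c) for c, _, _ in decomposition))
print(all(total == 0 for total in off_diagonal.values()))
```

In this example every line other than the main diagonal has total 0, the diagonal has total 5, and the absolute values of the coefficients also sum to 5, so it is a legal but not an efficient decomposition.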

It’s not really necessary for the fly traps all to point in the same direction, and there are other small adjustments one can make, but the basic square/fly-trap idea seems to be what the computer is telling us works best in very small cases. (It is far from clear that this is a good guide to what happens in much larger cases, but it seems sensible at least to consider the possibility.)

For a nice illustration of a square/fly-trap construction, see this picture that Alec produced. Alec also has a general construction that gives us arbitrarily close to 2. Rather than repeat it here, let me give a link to the relevant comment of Alec’s (if you think of the bar as squaring the interval, it will be consistent with what I am saying in this post), and a link to a similar comment of Christian’s.

This example (or rather family of examples) uses a single fly trap of width and squares of width 2 (unlike the example in Alec’s picture, which I therefore find more interesting, despite the fact that it gives a worse constant). It is instructive to see why this gives us a bound of If the fly trap has width then it has off-diagonal points. So we need flies. Each square of width 2 contributes two flies, so we need such squares. This means that (since the fly trap needs two squares to make it) and that The ratio of these two numbers tends to 2 as tends to infinity.

It is not hard to see that if we could use squares of width 3 instead, then we would be able to get a constant arbitrarily close to 3. However, significant difficulties arise almost immediately. However again, this could be good news, because if we can find some way of getting beyond 2, we may by that stage have found a pattern that can be generalized. And I think there is some hope of pushing beyond 2, as I shall now try to explain.

**One fly trap is not enough.**

First, let us see why there is absolutely no hope of achieving this with just one fly trap. The argument is simple. Let be the flytrap If all the off-diagonal points in the square are caught by this fly trap, what can we say about ? One necessary condition is that both and are factors of But this implies that is at least which in turn implies that is at least Since we need almost all points in the fly trap to catch flies, we need at least distinct flies, which is more than So, roughly speaking, we need a constant fraction of the numbers of order of magnitude to be such that both and are factors of This just isn’t going to happen.

Note that if we make smaller, to give numbers near a better chance of dividing then we are forced to increase (or else the flies miss the fly trap). And that makes things even worse — we now have fewer possible flies and a bigger fly trap.

I’m sure it would be easy to state and prove something rigorous here, but for now I’d prefer to leave those thoughts as a convincing enough demonstration that a single fly trap will not do the job. But if that’s the case, what can we do? Well, the obvious next thing to try is several fly traps.

**Pure randomness as a way of catching flies.**

How can we make use of multiple fly traps? A first thought is that if we take the square and send out some flies, we could create two traps, one to catch flies with maximum coordinate and the other to catch flies with maximum coordinate But the trouble with this is that it seems to be far too tailored to one particular square: it is hard to believe that such a trap could catch the flies from several different squares. We would be asking for two integers and such that there are many integers such that one of and divides and the other divides

Actually, on writing that I realize that I have given no thought to it at all, so perhaps it is worth trying to show that it cannot be done (just in case, contrary to expectations, it *can* be done).

Since and are always coprime, there seems no point in giving and any common factors, so let’s take and to be highly smooth numbers that are coprime to each other. And let’s try to find 3-by-3 squares that send out flies that are caught by one of the fly traps or (I’m assuming that and are roughly the same size. If it is convenient to take different s then I don’t mind doing it, but I don’t expect it to help.)

If is fixed and and are large, then … I think we’re completely dead. We have fly trap points below the diagonal to get rid of and three flies below the diagonal per 3-by-3 square, so we need about squares. If is one of those squares, then for the fly at not to miss the fly traps, we need to be at least where is the rough size of and So we need pairs such that we can find an integer with and But that more or less fixes the ratio of to and anyway is bigger than

From this I am happy to conclude that we need to change our attitude and go for many fly traps. The idea would be that the reason a fly hits a fly trap is not that the fly starts out at a very carefully chosen point (roughly speaking, a “factor” of the fly trap) but that there are enough fly traps for it to be almost inevitable that the fly will hit at least one of them.

What I mean by “pure randomness” is that we use the following mechanism for ensuring that the fly at hits a trap. If then we simply make sure that there are at least traps, fairly randomly placed, or rather traps for some large constant Then the probability that the fly misses all traps is small: roughly speaking, the expected number of traps it hits is and if we can get enough independence then we can hope that all flies will hit roughly of the traps. (This model turns out to be much too crude, since the probability of a fly hitting a trap depends very much on divisibility properties of the coordinates of the fly and the trap. But let us work with it for now.)

**Some back-of-envelope calculations.**

Let us try to check the feasibility of this idea. An initial observation is that most fly traps are useless for our purposes. If you choose a random large integer then the fraction of integers coprime to will be around (If is the other integer, then the probability that divides both and is so the probability that they are coprime is roughly ) But if is coprime to then the point cannot catch any flies. If we have a large set of such points, then we are in trouble.
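The heuristic about coprimality can be checked directly on small ranges. The sketch below (the function name `coprime_fraction` is mine) counts the pairs in {1,…,n}² with highest common factor 1 and compares the proportion with the product over primes of (1 - 1/p²), which equals 6/π².

```python
from math import gcd, pi


def coprime_fraction(n):
    """Proportion of pairs (a, b) with 1 <= a, b <= n and gcd(a, b) = 1."""
    coprime = sum(
        1
        for a in range(1, n + 1)
        for b in range(1, n + 1)
        if gcd(a, b) == 1
    )
    return coprime / (n * n)


print(coprime_fraction(100), 6 / pi ** 2)
```

So roughly three fifths of the points at a typical height can never catch a fly, which is the problem the next paragraph tries to get round.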

To deal with this, it seems that the only option we have is to insist that our fly traps occur at highly composite values of so that almost all other integers have quite high common factors with and therefore give rise to points that can catch many flies. It will be convenient to call and the *height* and *width* of the fly trap In that language, we want fly traps with highly composite heights. (Note that the height refers to the altitude of where the fly trap is placed, whereas the width measures the size of the trap itself. Indeed, “altitude” is probably a better word than “height” here, but I prefer an Anglo-Saxon word if there is one.)

Now let us suppose that we have 3-by-3 squares, and a reception committee of fly traps with highly composite heights between and If the widths of the fly traps are all (or perhaps all between and or something like that), then we’ll need fly traps if we want a one-to-one correspondence between flies and trap points, and a bit more than that if we want each fly to hit traps. Let us take fly traps.

Now consider a fly at say. If its chances of hitting a given trap are then we’ll also need there to be about fly traps. That is, we’ll want to be about And for that fly not to miss the traps altogether (because its angle from the main diagonal is too large), we’ll need to be at most So we’ll need to be bigger than That looks pretty problematic, because we now need a very large number of fly traps, and it will not be possible to put them all at highly smooth heights between and : there just aren’t that many highly smooth numbers.

Just to make that more conceptual, the problem we have is that there are two conflicting pressures on the flies. If they are not high enough, then the angle they make with the main diagonal is forced to be large and they therefore miss all the fly traps. But if they are too high, then they are very unlikely to hit any given fly trap, which forces the fly traps to be extremely numerous, which forces there to be several fly traps at non-smooth heights, and therefore several points in the traps that cannot catch flies.

**Smooth traps and smooth squares.**

Is there anything we can do to get round this problem? I think there may be. There was one questionable assumption in the discussion above, which was that the probability of a fly hitting any given fly trap was about The condition we need for this fly to hit the trap is that should divide and that should be at least Now if we choose randomly, then of course the probability that is But if we choose it as a random number with lots of small prime factors, and if also has quite a lot of small prime factors, then we hugely increase the chances that For instance, if is a random multiple of 6, and also happens to be divisible by 6, then the chances that divides are now
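The gain from smoothness in the multiple-of-6 example can be seen in an exact count. This is a sketch with made-up names and numbers: `divisibility_chance`, the divisor 30 and the range up to 3000 are illustrative choices of mine, not values from the argument.

```python
from fractions import Fraction


def divisibility_chance(a, limit, step=1):
    """Exact proportion of the multiples of `step` in {1, ..., limit}
    that are divisible by a."""
    candidates = range(step, limit + 1, step)
    hits = sum(1 for n in candidates if n % a == 0)
    return Fraction(hits, len(candidates))


a = 30  # divisible by 6
print(divisibility_chance(a, 3000))          # random integer
print(divisibility_chance(a, 3000, step=6))  # random multiple of 6
```

Restricting to multiples of 6 raises the chance from 1/30 to 6/30, a gain of a factor of 6, at the price of having only one sixth as many numbers to choose from.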

Let us now go back to the attempt above. Again let us suppose that we have 3-by-3 squares. Again, if we are taking fly traps with between and and if we want each fly to hit traps, then we will need about or so traps. But now let us suppose that all the traps have very smooth heights. More precisely, let us suppose that all the heights are such that all but a small proportion of integers have a fairly high common factor with Simplifying absurdly, let us suppose that this gains us an extra factor of when we think about the probability that a fly or is caught by a given fly trap: now the probability is more like rather than What does that do for us?

It means that now, if we want each fly to hit traps, we’ll need not traps (where is the height of the fly) but more like traps. We already know we need about traps, so equating the two we find that needs to be about And if we want a fly at that kind of height not to be too far from the diagonal to hit the traps, we need to be at most which tells us that should have approximate size which is rather better than the earlier estimate of (up to a constant).

But at this point we have an important question: are there enough highly smooth numbers between and ?

To answer that, we need to think about what the typical probability gain is for a given number. Suppose, for instance, that is divisible by where is some set of small integers. For what can we say that a random integer has a good chance of having a highest common factor with of at least ?

The expected number of such that is and we can expect this to be reasonably concentrated if the expectation is not too small. Writing for and assuming that is a fairly dense set of primes (something like a random set of of the first primes, say) then the expectation will be around so the value we get for assuming (not quite correctly) that the primes that divide are fairly evenly distributed, ought to be around or around (We could get there by saying that the typical size of a is fairly close to and we are choosing of these primes.)

This is fairly worrying, because in order to gain a factor we have to make the set of we are allowed to choose very much sparser. It seems as though we lose a lot more than we gain by doing this.

The “smooth squares” in the title of this section refer to the possibility that we might try to choose so that both and have quite a lot of small prime factors. But such numbers are hard to come by, so again it seems that any gain one might obtain is more than compensated for by the loss in their frequency.

**Special configurations of squares and fly traps.**

Can we achieve what we want by making very careful selections of our s? It’s clear that something that helps us is to have pairs of heights such that is, when written in its lowest terms, of the form or

It’s quite easy to find such that all of are of this form: just make extremely smooth and make all the differences small. Then the differences will divide and we are done. But what if we try to ensure that and are of the required form? Then we need large positive integers and such that is of the form for a positive integer That is, we need the reciprocal of to be an integer. Rearranging, we want to be an integer. It’s moderately reassuring to observe that this can be done: for instance, if and we get But how about if and are very large? Or perhaps they don’t have to be *very* large, just as long as we can find a set such that many of the ratios are integers.

Let’s think about this slightly further. Suppose we have such a collection of integers. Then choose with enough factors for all the numbers I mention to be integers, and for each let What we want is for to divide That is, we want to divide So we need to be an integer. (This doesn’t look very symmetric, but it is true if and only if is an integer.)

Suddenly this looks a bit easier. It looks as though we’ll be OK if we make the all fairly smooth and make their differences small. Hmm … except that that doesn’t look easy after all, since if is smooth and is small, then will not be all that smooth.

I won’t think about this for the time being, but I think it may be possible to construct, in a not quite trivial way, an arbitrarily long sequence of integers such that is always an integer.

Let’s suppose we managed that. Would it help? What we could do is this. We let be some huge factorial so that it’s divisible by whatever we need it to be divisible by. We then define the numbers as above: that is, Since whenever we have of the form for some positive integer we can find an integer such that and

Therefore, potentially at least, we could consider using the square to knock out some points in the fly traps at heights and

However, for this to have a chance of working, we want to be big, since otherwise our flies will be out wide again, which will force the traps to be big and we’ll get into all sorts of problems.

But it’s problematic either way. If we want traps of width at most some fixed then we need to be at least For that we need the integers to be of size at least (since ), and more than that unless they are close together.

But we also need the to divide so we can’t just choose the and then make huge. Rather, what we seem to want is a number that has so many factors of size somewhat bigger than that we can find interesting clusters of them such that many of the numbers are integers.

I should think a bit more about how many of these numbers actually need to be integers. Perhaps we don’t need them all to be integers — if not, then we would have a much greater chance of success.

If the fly traps have width then we have points below the diagonal that need to be hit. Each good pair leads to three flies that can do the hitting. So it looks as though needs to be bigger than

I think I must have made a mistake here, since there are basically only two chances to hit the point : we must do so either at or at So we need an extraordinary miracle to occur: it must be possible to partition (almost all of) the numbers and into pairs of consecutive integers. This does not feel possible to me.

I’m going to stop at this point. I’ll end with the obvious question: is it possible to create an example out of squares and fly traps? Part of me thinks that the square/fly-trap idea is almost forced, since we need the bigger points to avoid coprime pairs. I think also that I have not devoted enough space to discussing bigger fly traps — ones where the width is proportional to the height, say. This requires bigger squares, but it may be possible to do something. In fact, I’ll think about that (not for the first time) and if anything interesting comes out of it then I’ll extend this post.

**Representing diagonal matrices**

First, let me briefly look again at how the ROD (representation of diagonal) approach works. If and are HAPs, I shall write for the matrix such that if and 0 otherwise. The main thing we need to know about is that for every
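To make the key property of HAP products concrete, here is a minimal Python sketch. It assumes the property in question is the usual one: the quadratic form of the product matrix at a function equals the sum of the function over the first HAP times its sum over the second. The helper names (`hap`, `product_matrix`, `quadratic_form`) and the sample sequence are illustrative choices, not part of the original discussion.

```python
def hap(d, length):
    """Return the HAP {d, 2d, ..., length*d} as a set of positive integers."""
    return {d * k for k in range(1, length + 1)}


def product_matrix(P, Q, n):
    """n-by-n 0/1 matrix whose (x, y) entry (1-indexed) is 1 iff x is in P and y is in Q."""
    return [[int(x in P and y in Q) for y in range(1, n + 1)]
            for x in range(1, n + 1)]


def quadratic_form(A, f):
    """Compute sum over x, y of f(x) * A(x, y) * f(y); f is a list with f[x-1] = f(x)."""
    n = len(f)
    return sum(f[x] * A[x][y] * f[y] for x in range(n) for y in range(n))


n = 12
f = [1, -1, -1, 1, -1, 1, 1, -1, -1, 1, 1, -1]  # an arbitrary +-1 sequence
P = hap(2, 5)  # {2, 4, 6, 8, 10}
Q = hap(3, 4)  # {3, 6, 9, 12}

lhs = quadratic_form(product_matrix(P, Q, n), f)
rhs = sum(f[x - 1] for x in P) * sum(f[y - 1] for y in Q)
assert lhs == rhs
```

The point of the check is that a decomposition of a diagonal matrix into such products turns a statement about the diagonal entries into a statement about products of HAP sums.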

Suppose now that is a diagonal matrix with diagonal entries and that we can write it as where each and each is a HAP. Then

If and for every then it follows that there exists such that

and from that it follows that there is a HAP such that So if we can make arbitrarily small, then EDP is proved.

The advantage of this approach (and similar approaches related to semidefinite programming) is that it replaces a universal question — every sequence has unbounded discrepancy — with an existential one — there exists a diagonal matrix that can be decomposed in a certain way. However, it doesn’t solve the problem just like that, because it is far from clear how to find such a decomposition of a diagonal matrix.

**Representing the identity over the rationals**

A more recent idea that appears to simplify the problem considerably is to work over the rationals, which allows one to take the diagonal matrix to be the identity. Let me give a heuristic argument for this, which I hope will make it clear what I mean, and then follow it up with a more rigorous version of the argument.

By “work over the rationals” I mean that we shall consider infinite matrices where the indices are positive rational numbers. The identity matrix is simply the matrix that is 1 if and otherwise. Matrix multiplication is defined in the usual way: We shall consider matrices with only finitely many non-zero entries in each row and column, so we do not have to worry about the sums being infinite.

Now let’s suppose, back in the normal integers-up-to- case that we have a decomposition

Given any matrix and any rational number define the –*dilation* of to be the matrix defined by That is, does to what does to Now let us take the matrix and call it the *smearing* of (The rough idea is that we’ve spread all over the place.)

If is a diagonal matrix, what is the smearing of ? Well if then for all positive rationals so And

That is, is the identity matrix multiplied by the trace of

Now each dilation of a matrix gives us another matrix of the same form — that is, another product of HAPs. (Moreover, if and have the same common difference, then that is true for the dilations as well.) So from a decomposition of a diagonal matrix into HAP products we can get a decomposition of the *identity* over into HAP products.

Where this gets a bit handwavy is the following statement: because the ratio of the sum of the absolute values of the coefficients to the trace of the diagonal matrix is small, the “average coefficient” in the infinite case is small too. Therefore, we can prove the same bound for functions defined on that we could prove for functions defined on It is this last step that I would like to make rigorous.

**A finite approximation of the rationals**

I shall use a technique that is standard in additive combinatorics (I’m referring here to regular Bohr sets), though the basic idea presumably goes back well beyond that. The particular way I’ll use it is essentially the same as what Terry Tao did in his Fourier reduction argument. I want to take a big finite subset of that is almost closed under multiplication by any number I could conceivably want to multiply by. By that I mean that if has size and is some rational that I might want to multiply by, then is almost the same size as To visualize this, imagine taking a huge sphere and intersecting it with Call the resulting set If we take a random point in and add a non-huge integer point to it, then unless we are extremely unlucky and is right near the edge of we will find that also belongs to So is almost closed under adding small integer vectors. I want to do the same but for multiplication, which is pretty similar to vector addition by prime factorization.

Suppose, then, that we have an matrix Then the non-zero entries of occur at pairs of the form where and are integers between and Therefore, the non-zero entries of the dilation are at pairs of the form where and are integers between and And from this it follows that the non-zero entries of the smearing of occur only at pairs of the form such that is a rational with numerator and denominator between and Let be the set of all such rationals.

Let us therefore construct a set of rationals such that for almost every (Here, stands for the obvious thing: the set ) We can do this in any number of ways. Perhaps the simplest is to let be the set of all primes up to to let be some large positive integer, and to take to be the set of all numbers of the form such that each lies between and Note that the product of any element of by an element of is an element of Note also that has cardinality So if we want of the points to have the property that then all we have to do is choose so large that which we can certainly do.
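A small experiment may help to see why this construction gives an almost multiplicatively closed set. The sketch below assumes that the exponents range over the symmetric interval from -M to M (the exact range was lost from the text above) and uses the tiny case where the multipliers are rationals with numerator and denominator at most 3. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def build_X(primes, M):
    """All rationals prod p_i**a_i with every exponent a_i in [-M, M] (assumed range)."""
    X = set()
    for exps in product(range(-M, M + 1), repeat=len(primes)):
        x = Fraction(1)
        for p, a in zip(primes, exps):
            x *= Fraction(p) ** a
        X.add(x)
    return X


def build_R(n):
    """All rationals with numerator and denominator between 1 and n."""
    return {Fraction(a, b) for a in range(1, n + 1) for b in range(1, n + 1)}


def stable_fraction(X, R):
    """Proportion of x in X such that x*r stays in X for every r in R."""
    good = sum(1 for x in X if all(x * r in X for r in R))
    return Fraction(good, len(X))


n, M = 3, 10
primes = [2, 3]           # all primes up to n
X = build_X(primes, M)    # (2M + 1)**2 = 441 elements
R = build_R(n)
frac = stable_fraction(X, R)
print(len(X), frac)       # the proportion tends to 1 as M grows
```

In this toy case only the points with an exponent on the boundary of the box fail, which is the multiplicative analogue of a point lying near the edge of the sphere.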

From now on, all we shall use about is this property: the details of its construction are not important. We shall also consider functions defined on which we are thinking of as a finite approximation to One remark about is that we can dilate it so that it becomes a set of integers, so the fact that I have been talking about rationals is just a convenient infinitary way of thinking about the problem rather than a fundamental difference of approach. (In particular, I am not using any clever notions of limit.)

**Running the argument inside X**

Now let’s suppose that we have an diagonal matrix that is decomposed as where each and is a HAP. (Incidentally, by “HAP” I mean here the slightly generalized HAPs of the form .) This time, instead of looking at the sum of *all* dilations of let us look instead at the sum of all dilations such that This gives us a diagonal matrix What can we say about it?

Well, For each the entry will be included in this sum if In particular, it will be included if Therefore, equals the trace of for at least elements For all it is at most the sum of the absolute values of the entries of and for it is zero.

This tells us, in a way that is easy to make precise, that the entries of are given by a function that is approximately the trace of times the characteristic function of (Because the functions are bounded and the support of the difference is much smaller than this approximation can be arranged to be good in any norm with )

But since any dilation of a HAP product is another such product, this means that we have a decomposition into HAP products of a diagonal matrix that is very close to the identity, where the indexing set is now rather than (Actually, the way I said it above, the matrix is supported not on but on But that can easily be fixed by summing over dilations corresponding not to all but to all such that So I won’t worry too much about this technical detail.)

The key point here is that the sum of the absolute values of the coefficients in the decomposition is roughly and the trace of the diagonal matrix we get, which is very close to the identity, is roughly times the trace of So we obtain roughly the same as before but now with the “identity matrix on X”.

That is a precise and rigorously proved version of the statement that if we can find a diagonal decomposition with a small then we can find a representation of the identity on with the same small

The next thing to think about is what happens if we take a HAP product and sum over all its dilates. We are making the simplifying assumption that and have the same common difference. (It may turn out that we cannot achieve this. Then we simply switch to searching for more general decompositions where the common differences are allowed to be different.) Let us therefore assume that and What is the sum of all dilates of at a point such that and ?

Well, we need there to be some rational and integers and such that and Given such we know that Conversely, if can be written as with and then we can set We know then that (since ), and we have and Thus, the value of the smearing of at is the number of ways of writing with and That is, it is the number of integer points in the intersection of the rectangle with the line of gradient

Two things follow from this. First, if we are going to look at smearings, then we need only look at products where the common differences of and are 1. That is, WLOG and The second thing is that if we have a linear combination of functions where each and each is a subinterval of then its smearing will be diagonal if and only if for every pair of distinct coprime integers we have where denotes the number of multiples of that lie in the rectangle
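The diagonality criterion is easy to test by machine. The following Python sketch counts the multiples of a coprime pair that lie in a rectangle and then lists the off-diagonal directions along which a given weighted system of rectangles fails to cancel. It assumes that intervals are given as inclusive pairs of endpoints and that the condition is that the weighted counts sum to zero; the names `count_multiples` and `off_diagonal_sums` are mine, not from the discussion.

```python
from math import gcd


def count_multiples(a, b, I, J):
    """Number of positive integers m with m*a in I and m*b in J.

    I = (r, s) and J = (t, u) are inclusive intervals of positive integers.
    """
    (r, s), (t, u) = I, J
    lo = max(-(-r // a), -(-t // b), 1)  # ceil(r/a), ceil(t/b), and m >= 1
    hi = min(s // a, u // b)
    return max(0, hi - lo + 1)


def off_diagonal_sums(rects, coeffs, n):
    """Map each distinct coprime pair (a, b) with a, b <= n to its weighted count,
    keeping only the pairs where that weighted count is non-zero."""
    bad = {}
    for a in range(1, n + 1):
        for b in range(1, n + 1):
            if a != b and gcd(a, b) == 1:
                total = sum(c * count_multiples(a, b, I, J)
                            for c, (I, J) in zip(coeffs, rects))
                if total != 0:
                    bad[(a, b)] = total
    return bad


# The single point {1} x {1} smears to something diagonal ...
print(off_diagonal_sums([((1, 1), (1, 1))], [1], 5))
# ... but the square {1, 2} x {1, 2} picks up the directions (1, 2) and (2, 1).
print(off_diagonal_sums([((1, 2), (1, 2))], [1], 3))
```

An empty dictionary means that the smearing of that system is diagonal, at least for coprime pairs up to the chosen bound.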

If we put all these observations together, together with the observation made in this comment, we obtain the following conclusion: the representation-of-diagonals approach to EDP works if and only if for every there exists a system of rectangles (each being an interval of positive integers between 1 and ) and coefficients such that

- for every pair of coprime positive integers between 1 and we have

I have explained the “only if” part of this. To see that a decomposition of this kind works, suppose that we have a function defined on For convenience we shall assume that takes values in though more general assumptions are possible. For convenience let us normalize so that We then take the smearing of the matrix This equals 1 on the diagonal (except at the boundary of — I’m referring to the smearing where we add all dilates for ) and 0 off it (again, perhaps with a few exceptions at the boundary). In other words, it gives us a very good approximation of the identity on

Let us write this decomposition as Here the HAPs run over all dilations of the HAPs by numbers Therefore, Since and the trace of the identity on is this tells us that our decomposition of the identity has been achieved with the same constant

Let us now calculate in two ways. The first is to think of it as (where is the identity indexed by ). The second is to use our decomposition and to think of it as

Since the two are the same (almost, at least), there must be some such that In particular, if for every then there must be some such that which proves EDP (since for sufficiently large the interval contains a dilate of , and HAPs remain HAPs after dilation).

**EDP and EDP for multiplicative functions**

Here is a thought that I haven’t followed through properly, but I’m pretty sure something can be got out of it. Suppose one tries to prove the existence of a decomposition of the kind we want, not by going ahead and finding one but by showing *abstractly* that such a thing exists. One can return to the formulation of the previous post, where we are trying to write as a linear combination of functions of the form which, like Alec, I prefer to write as on the understanding that this represents not the characteristic function of the set of all with and but the number of ways of representing a number as with and . By the Hahn-Banach theorem, this is impossible with a given upper bound for the sum of the absolute values of coefficients if and only if there is a certain functional that separates from all the functions By that I mean that and for every Now if is such a functional, and is a dilate of then I claim that is another such functional. Indeed,

(I haven’t checked that that is correct — it might be that the second term should be )

I have a slight technical difficulty here, which is that I don’t know about Let me assume that it is and try to justify that assumption in a moment. I then want to readjust what I said above by changing the new functional to if That ensures that we still send to 1. I think that to justify that assumption I might want to do something like assuming that that is optimal. But I didn’t promise a rigorous proof here so will come back to this unless someone else manages to fill in the details.

What it is supposed to be leading to is that we can assume that the function is a completely multiplicative -valued function. The rough idea would be that we could replace by bigger and bigger averages of the form and that if wasn’t completely multiplicative in the first place then these averages would eventually become small, which would contradict the fact that they separated from all functions of the form

If that is correct (and at the moment we have a rather dodgy step so this certainly cannot be assumed) then we seem to have something like a proof that if the representation-of-diagonals approach to EDP fails, then there is a completely multiplicative -valued function with bounded discrepancy. I’m suspicious of that statement, because it seems to suggest that the vector-valued formulation of the problem is equivalent to the -valued version. More likely, the effort to rescue the above argument from its technical problems would result in a modified statement that would be precisely what Terry proved in his Fourier reduction argument.

**The moral of the previous section**

I’m interested in trying to get the ideas of the previous section to work for their own sake, but there’s also a point I want to make about how one might hope to prove EDP by finding a rectangle decomposition of the kind discussed above. What the previous section suggests is that if we try to prove this *using an abstract existence argument* then we will have applied duality twice and will be back to the original hard problem (albeit in its completely multiplicative form, but even that seems to be hard). So we must resist the temptation to try to find an argument that says something like, “These functions are sufficiently numerous and spread about that we must be able to find a decomposition of the desired form,” for the simple reason that *if* there is some function of bounded discrepancy then it shows that they are *not* sufficiently numerous and spread about.

Instead, we must try to find a constructive proof that such a decomposition exists. Alec has got the process started for us by finding a general construction that gets arbitrarily close to 1/2. I think we will learn a lot if we can beat that barrier, even if initially we do so by finding a fairly formless example by means of a brute-force search.

A brief word also on why I am posting again on EDP despite the fact that we are nowhere near 100 comments on the previous post. The main reason is that, now that the rate of commenting has slowed to a trickle, it is far from clear that the same rules should apply. I think the 100-comment rule was a good sufficient condition for a new post, but now I think I want to add a couple more: if there is something to say and quite a long time has elapsed since the previous post, or if there is something to say that takes a while to explain and is not a direct continuation of the current discussion, then it seems a good idea to have a new post. And both these conditions apply.

[Added later: this is a rather strange post written over a few weeks during which my thoughts on the problem were constantly changing. So everything that I say, particularly early on, should be taken with a pinch of salt as I may contradict it later. One approach to reading the post might be to skim it, read the very end a bit more carefully, and then refer back to the earlier parts if you want to know where various ideas came from.]

**A simple question about functions on .**

Let me cut to the chase and ask a question that I find quite nice and that can be understood completely independently of all the discussion of EDP so far. I will then try to motivate the question and sketch a proof that a positive answer to this question would imply EDP.

The question, in isolation, is this. Does there exist, for every constant a sequence with the following properties?

1.

2. for every

3.

Without the third condition, the answer is easily seen to be yes, since one can take and all the other to be zero.

After writing that question, I had to stop for a day or so, during which I realized that the answer was a trivial no. Indeed, if and then so

which rules out condition 3.

How could I have asked such a dull question? Well, the question may be dull but I think there remains some interest in the motivation for it, especially as it leads to a related question that may be harder to answer but have equal potential to be useful. If you want to cut to the new chase, then skip to the bottom of this post, where I shall ask a different question that again involves finding a function that satisfies various conditions (though this time the function is of two variables).

**Where the question came from.**

But if you would prefer some motivation, then here is how the first question arose. We would like to find a decomposition of the identity on the rationals. (By this I mean the matrix where and range over ) We want this decomposition to take the form where each and each is the characteristic function of a HAP, and the sum of the is small. (What precisely this smallness condition is I’ll discuss later.)

Now let me consider briefly a very general question related to the facetious title of this post: how is it that mathematicians often manage to solve difficult problems with an NP flavour? That is, how is it that when they are faced with searching for an X that does Y, and most Xs clearly don’t do Y, they nevertheless often succeed in finding an X that *does* do Y? Obviously the answer is not a brute-force search, but what other kinds of searches are there?

One very useful method indeed is to narrow down the search space. If I am looking for an X that does Y, then one line of attack is to look for an X that does Y and Z. If I am lucky, there will exist Xs that do both Y and Z, and the extra condition Z will narrow down the search space enough to make it feasible to search.

One of the best kinds of properties Z to add is symmetry properties, since if we insist that X is symmetric in certain ways then we have fewer independent parameters. For example, if we want to find a solution to a PDE, then it may be that by imposing an appropriate symmetry condition one can actually ensure that there is a *unique* solution, and finding something that is unique is often easier than finding something that is far from unique (because somehow your moves are more forced).

Let us write for the characteristic function of the HAP Then we are looking for a decomposition as a linear combination of functions of the form which we have been writing as What symmetry conditions can we impose?

An obvious one to start with is that we could insist that and That is, we could attempt to find a decomposition of the form (Note that ranges over the rationals and over the positive integers.) Actually, I think that’s more of a simplicity condition than a symmetry condition, but the same cannot be said of the next condition, which is to insist that should be independent of This tells us that the decomposition “looks the same everywhere” and takes the form

Now what is the smallness property we need of the coefficients ? I won’t fully justify this here, because I have already discussed how to deal with the fact that the rationals form an infinite set. Let me simply say that if the for which are bounded, then we are done if can be made arbitrarily small. (Roughly speaking, we are interested in the ratio of the average diagonal entry to the average per Since the average diagonal entry of the identity is 1 and depends only on we are interested in )

Let be the function Then we would like to express the identity as a linear combination of the with coefficients with a small absolute sum. A good start might be to calculate It is the number of such that and for some positive integers If and are minimal integers such that then the only that “divide” and are numbers of the form where and We then get and so we need to be at most and also at most But the maximum of and is so we need to be at most Since any positive integer satisfying this condition works, we find that

One nice thing to observe about this is that it depends only on and not on and themselves. This turns our question into a one-dimensional one (but unfortunately it is the question above that has a boring answer).
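The count just described can be checked directly. The Python sketch below assumes the following reading of the calculation: for positive rationals x and y and a length bound d, we count the rationals z such that x and y are both among the first d multiples of z, and the claimed value is the integer part of d divided by the larger of the numerator and denominator of x/y in lowest terms. The function names are illustrative only.

```python
from fractions import Fraction


def count_bruteforce(d, x, y):
    """Count rationals z with x = i*z and y = j*z for integers 1 <= i, j <= d."""
    count = 0
    for i in range(1, d + 1):
        z = x / i          # z is determined by i
        j = y / z          # y must be an integer multiple of z ...
        if j.denominator == 1 and 1 <= j <= d:  # ... and among the first d multiples
            count += 1
    return count


def count_formula(d, x, y):
    """floor(d / max(a, b)) where a/b is x/y in lowest terms."""
    ratio = x / y  # Fraction normalises to lowest terms automatically
    return d // max(ratio.numerator, ratio.denominator)


x, y = Fraction(6, 5), Fraction(9, 5)   # x/y = 2/3
for d in range(1, 13):
    assert count_bruteforce(d, x, y) == count_formula(d, x, y)

# Rescaling x and y by a common factor changes nothing: only the ratio matters.
assert count_formula(12, 7 * x, 7 * y) == count_formula(12, x, y)
```

The second assertion is the observation that the answer depends only on the ratio, which is what collapses the problem to a single variable.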

But let’s see why that is. We are now in a position where we want to write the identity as a linear combination of the functions But now we know that if we write what we are trying to do is write the function as a linear combination of the (since we want if and 0 otherwise).

How are we to deal with the functions given that they have unpleasant integer parts involved? There turns out to be a nice answer to this: look at the differences We see that if is a multiple of and 0 otherwise. Now where So the condition translates to the condition (interpreting to be 0). And which we want to be 1 when and 0 otherwise, which is where conditions 1 and 2 come from.
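Here is a short Python sketch of the summation by parts used in this step. It assumes that the one-dimensional functions are integer parts of d divided by m (with m the larger of the two coprime integers), that the coefficients have finite support, and that the new coefficients are tail sums of the old ones. The names `lam`, `tail_sums`, `direct` and `via_differences` are hypothetical.

```python
def tail_sums(lam):
    """Given coefficients lam[d] (finite support), return mu[d] = sum of lam[e] for e >= d."""
    mu = {}
    running = 0
    for d in range(max(lam), 0, -1):
        running += lam.get(d, 0)
        mu[d] = running
    return mu


def direct(lam, m):
    """Sum over d of lam[d] * floor(d / m)."""
    return sum(c * (d // m) for d, c in lam.items())


def via_differences(lam, m):
    """Sum of the tail sums mu[d] over the multiples d of m."""
    return sum(v for d, v in tail_sums(lam).items() if d % m == 0)


# floor(d/m) - floor((d-1)/m) is 1 exactly when m divides d.
for m in range(1, 8):
    for d in range(1, 30):
        assert d // m - (d - 1) // m == (1 if d % m == 0 else 0)

lam = {1: 2, 3: -1, 4: 5, 7: -3}  # arbitrary sample coefficients
for m in range(1, 9):
    assert direct(lam, m) == via_differences(lam, m)
```

So the integer parts disappear: the value of the linear combination at m is just the sum of the tail sums over multiples of m.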

If we could get conditions 1-3 then I’m pretty sure we’d have a solution to EDP. But we can’t, so where does that leave us? In particular, are we forced to give up the very appealing idea of having a decomposition that is “the same everywhere”?

**Rectangular products.**

Actually no. It shows that we cannot combine the following two features of a decomposition: being the same everywhere, and using only “square” products — that is, products of the form rather than more general “rectangular” products of the form What happens if we allow ourselves a bit of extra generality? We would like to do this in a minimal way, so for the time being let us assume that and have the same common difference.

At first it may seem as though we gain nothing, because there is a sort of polarization identity operating. Observe first that if we decompose as a sum of matrices then we can replace each by its symmetrized form and we will have a decomposition of into symmetric parts. So without loss of generality we can assume that the coefficient of in any decomposition is the same as the coefficient of If is longer than then we can now use the fact that

to replace the rectangular products by a comparably small combination of square products. However, there is an important difference, which is that we are now allowing ourselves (square) products of HAPs of the form Perhaps surprisingly, this gives us a large amount of extra flexibility.
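For readers who want to see a polarization identity of this kind in action, the sketch below checks one standard version numerically. I am assuming that the shorter HAP is an initial segment of the longer one, so that their set difference is a "shifted" HAP starting part of the way along; the exact identity displayed in the post may have been written differently. All names here are illustrative.

```python
def indicator_product(A, B, n):
    """n-by-n 0/1 matrix with a 1 at (x, y) iff x is in A and y is in B."""
    return [[int(x in A and y in B) for y in range(1, n + 1)]
            for x in range(1, n + 1)]


def combine(terms, n):
    """Entrywise linear combination of (coefficient, matrix) pairs."""
    return [[sum(c * M[x][y] for c, M in terms) for y in range(n)]
            for x in range(n)]


d, r, s, n = 2, 3, 5, 12
P = {d * k for k in range(1, s + 1)}   # longer HAP {d, 2d, ..., sd}
Q = {d * k for k in range(1, r + 1)}   # shorter HAP {d, 2d, ..., rd}
T = P - Q                              # shifted HAP {(r+1)d, ..., sd}

# Symmetrized rectangular product ...
lhs = combine([(1, indicator_product(P, Q, n)),
               (1, indicator_product(Q, P, n))], n)
# ... written as a combination of three square products.
rhs = combine([(1, indicator_product(P, P, n)),
               (1, indicator_product(Q, Q, n)),
               (-1, indicator_product(T, T, n))], n)
assert lhs == rhs
```

The price of the rewriting is the third term, a square product of a HAP that does not start at its common difference.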

To demonstrate this, I need to go through calculations similar to the ones above, but slightly more complicated (though with a “magic” simplification later that makes things nice again). First, it turns out to be convenient to define to be the characteristic function of the HAP (One indication of this is that it has length or to be more accurate a piece of pedantry that turns out to be unexpectedly important later.) Now let us define to be We can calculate in a similar way to the way we calculated earlier. Let be minimal such that and let be the largest rational that divides both and which tells us once again that and For to contribute to the sum, we again need to be of the form for some positive integer but now we need and to lie in the interval Without loss of generality Then our requirements are that and Since is an integer, this is equivalent to requiring that and

The number of that satisfy this is obviously Again, the max with zero is extremely important to us, and not just because it makes the expression correct. So now we have our formula:

As in the previous one-dimensional case, we have ugly integer parts to contend with, not to mention the max with zero. But once again we can simplify matters by looking at telescoping sums and the “derivative” of our function, except that this time we want two-dimensional analogues. What we shall do is this (I think it is fairly obviously the natural thing to do so I won’t try to justify it in detail). Define a function by

We would now like to express an infinite linear combination as a linear combination of the (Note that is identically zero if Hence the range of summation.) Since we don’t actually know what the are, let us just imagine that we have some coefficients that satisfy

Then

But turns out to be something we can understand very well, because the functions are unexpectedly simple. Recall that

where and are minimal such that So in principle we can calculate This looks unpleasant until we make a couple of simple observations. First of all, note that is constant on rectangles of sidelengths and (Here I am taking and fixed and varying and ) More precisely, there is a tiling of into such rectangles, and is constant on each of these rectangles. As you go horizontally from one rectangle to another to its right, increases by 1, and as you go vertically upwards it decreases by 1. If you put all these facts together, you find that on all multiples of and is zero everywhere else.

What does this tell us about the coefficients ? Well, we want to be 1 if and otherwise. It follows that whenever and are coprime, with then we want to be 1 if and otherwise. This is very similar to our one-dimensional criterion, but with the huge difference that the sets along which we are summing the function are disjoint. (They are lines through the origin, or rather intersections of those lines with )

We also have a smallness criterion. Recall that in the one-dimensional case we needed to be arbitrarily small (interpreting as 0). Here we need the very similar property that

is arbitrarily small. Let me briefly remark that there is nothing to stop us defining however we like for values of with and it will be convenient to exploit this.

So there is a rather simple looking question: can we find a function such that the “-norm of the mixed partial derivatives” is arbitrarily small, the sum along the diagonal is 1, and the sum over all other lines of positive rational gradient is 0?

Somehow this question doesn’t feel as though it should be that hard, since the constraints don’t seem to interact all that much. So it would be natural to think that either there is a simple reason for no such existing, just as there was in the one-dimensional case, or there is a fairly simple construction of such an example.

**Solving the two-dimensional question.**

If there is a construction, how might one expect it to work? There are two reasonable possibilities: either one writes something quite clever straight down (perhaps involving trigonometric functions somehow to get the sums to be zero along the lines of gradient not equal to 1), or one goes for a just-do-it proof. My feeling (with the benefit of some hindsight) is that there are few enough constraints for a just-do-it approach to be appropriate, and in any case attempting a just-do-it proof can hardly fail to increase one’s understanding of the task, whether or not the attempt succeeds.

I should interrupt what I’m writing to say that I’m almost certain that a just-do-it approach works. By that I mean that I’m sitting here about to write down what I think is a correct proof, but that proof has been sitting in my head for the last day or so and has not been written down. And I’m pretty sure that my arguments that this would solve EDP can be tightened up and turned into a rigorous proof too, though that will need to be checked very carefully. So right now I don’t see why we don’t have a complete proof of EDP.

I should also at this point make a general remark about Polymath. I’ve been spending quite a lot of time thinking about the problem on my own, which is contrary to the spirit of Polymath and therefore demands an apology — but also an explanation. The explanation is that I have been trying to write this post for a long time — I started it nearly a week ago — but my efforts have been interrupted by plane journeys and ICM activities, including blogging the ICM. However, plane journeys, long waits for opening ceremonies, lonely moments in my hotel room, etc., are very conducive to mathematical thought, and since most of the argument I am writing here is rather simple (it was just a question of finding the right idea to get it started), I have found that the length of what I need to put in the post has been increasing faster than the length of the post itself. And one other mitigating factor is that there has been very little reaction to my recent comments on the previous post, where I first set out some of this approach (see in particular the two most recent comments), so I feel more entitled to go it alone for a while than I would have if the project had been roaring ahead at the sort of speed it was going at earlier in the year. Needless to say, the proof, if it turns out to work, is still very much a joint effort — this most recent argument depends crucially on the insights that have emerged from the more active discussion phase — and will be another paper of D. H. J. Polymath.

I must repeat that this post is being written over the course of many days. Several hours have passed since I finished the last paragraph, and during them I thought a bit harder about the step I was most worried about, which was dealing with the fact that the rationals are infinite and that at some point one must do some kind of truncation. I now think that that step will be more problematic than I had thought, and it seems to me to be about 50-50 whether it will be possible to get it to work. Having said that, I feel as though something non-trivial has emerged from the argument I shall complete in a moment, and I will be quite surprised (not to mention disappointed) if it leads nowhere. What I’m really saying is that I have an argument that does exactly what seems to be needed, except that the sums involved are a bit too formal. Now it could be that establishing convergence in some appropriate sense is where the real difficulty of the problem lies. In that case, this approach will be much less useful than I thought. But I am hoping that it is more of a technical problem. No time to think about it right now as I am going to a performance of A Disappearing Number, a play about which I shall undoubtedly have more to write later.

**WELL OVER A WEEK LATER**

The first thing to say is that EDP definitely isn’t yet solved. I’ve also reformulated the problem again, in a way that I find clearer. I could give a long description of how the new perspective evolved from the one above, but I think instead I’ll just sketch what happened.

**Solving the two-dimensional question, continued**

The first thing to observe is that if we take the characteristic function of a rectangle, then the sum of the is 4, since the only contributions come from the four corners.
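
As a quick sanity check of the four-corners remark, here is a minimal sketch. It assumes that the quantity being summed is the absolute value of the discrete mixed difference f(x,y) - f(x-1,y) - f(x,y-1) + f(x-1,y-1); that formula and the function name are assumptions made for illustration, not taken from the text.

```python
def mixed_difference_abs_sum(xs, ys):
    """Sum of |f(x,y) - f(x-1,y) - f(x,y-1) + f(x-1,y-1)| over a window
    containing the rectangle xs x ys, where f is its indicator function.

    xs and ys are ranges of consecutive integers (assumed non-empty).
    """
    def f(x, y):
        return 1 if (x in xs and y in ys) else 0

    total = 0
    # The window extends one step beyond the rectangle on every side,
    # which is enough to see every non-zero mixed difference.
    for x in range(min(xs) - 1, max(xs) + 3):
        for y in range(min(ys) - 1, max(ys) + 3):
            total += abs(f(x, y) - f(x - 1, y) - f(x, y - 1) + f(x - 1, y - 1))
    return total


# Only the four corners contribute, whatever the size of the rectangle.
print(mixed_difference_abs_sum(range(3, 8), range(2, 5)))
print(mixed_difference_abs_sum(range(5, 6), range(5, 6)))
```

Under that assumption the interior and the edges cancel, and each corner contributes exactly 1 in absolute value.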

Therefore, all we have to do is this. We shall build up an infinite matrix as a linear combination of rectangles — that is, functions of the form where and are arithmetic progressions with common difference 1. We start by choosing a large integer We then put entries at every point That ensures that the sum along the main diagonal is 1, provided that we do not do anything else to that diagonal. However, it also causes the sums along several other lines to be non-zero, so there is work to do. Note that the sum of coefficients so far is since there is just one rectangle and its coefficient is

To do the rest of the work, let us enumerate all rationals greater than 1 as and deal with each gradient in turn. What “deal with” means is that we shall add in some very small multiple of a rectangle in such a way that the sum along the line of the gradient we are dealing with is now zero, and we will agree that from then on we will not change any entry in that line. Can we do this? Well, let’s suppose that we are dealing with the gradient So far, the sum of the entries of the matrix along the line of gradient is say. So we must find a rectangle that contains many integer points along the line of gradient . And we also need it not to intersect any line of gradient with But if we go far enough up the line with gradient we will find ourselves at a very great distance from the lines we are trying to avoid, so we can find a very large rectangle that is disjoint from those lines and contains many integer points of the form If it contains such points, then we make the coefficient of that rectangle equal to Since we can get to be as big as we like, we can get this coefficient to be as small as we like. So we have “dealt with” the gradient and the problem is solved.

**Why doesn’t that solve EDP?**

Let me start by saying why I thought it might solve EDP. I thought that if the sum of coefficients associated with each was arbitrarily small, then that would be saying that in a certain sense the average weight of the coefficients was arbitrarily small. Since the average size of the diagonal entries is 1, this ought somehow to give us what we want.

To understand why this doesn’t work, and probably can’t be made to work, I find it helpful to reformulate the problem again. To do so, let us think about the operators once again. (These first came in fairly near the beginning of the section entitled “Rectangular products.”) In particular, let us think what one of these operators does to the function that is 1 at and 0 everywhere else.

In fact, let us be slightly more general, as it will be useful to us later. What is the value of at For it to be non-zero we need and and then it is 1. So if we sum over all then we get the number of with that property. Now each with that property is of the form for some so when we sum over we are counting the number of ways of writing as for some and some

That is, if we define to be the number of ways of writing as with and then That is another way of saying that where is a *multiplicative convolution*.

If you don’t like following calculations but want to know what the moral of all that is, then here you are. Let us define the multiplicative convolution of two functions and to be the function given by the formula

Then the operator takes a function to the multiplicative convolution of with the function which is itself the multiplicative convolution of the characteristic functions of the sets and (That is, as I’ve already said, is the number of ways of writing as a fraction with numerator in and denominator in )
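
To make the definition concrete, here is a minimal sketch of multiplicative convolution for finitely supported functions on the positive rationals, together with a direct count of the ways of writing a rational as a fraction with numerator in one set and denominator in another. The names `mult_convolution` and `ratio_count` are hypothetical, and the sketch assumes that the second characteristic function is that of the set of reciprocals of the denominators.

```python
from collections import Counter
from fractions import Fraction


def mult_convolution(f, g):
    """(f * g)(x) = sum of f(y) g(z) over pairs with y z = x.

    f and g are dicts mapping Fractions to values (finite support).
    """
    out = Counter()
    for y, fy in f.items():
        for z, gz in g.items():
            out[y * z] += fy * gz
    return out


def ratio_count(P, Q):
    """Number of ways of writing each rational as a/b with a in P, b in Q."""
    out = Counter()
    for a in P:
        for b in Q:
            out[Fraction(a, b)] += 1
    return out


P, Q = range(4, 9), range(2, 7)
indicator_P = {Fraction(a): 1 for a in P}
indicator_Q_inverse = {Fraction(1, b): 1 for b in Q}  # reciprocals of Q

conv = mult_convolution(indicator_P, indicator_Q_inverse)
counts = ratio_count(P, Q)
print(conv == counts)        # the two descriptions agree
print(counts[Fraction(2)])   # 2 = 4/2 = 6/3 = 8/4
```

Exact rational arithmetic is used so that equal fractions are identified automatically.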

Now we’re trying to express the identity as a small linear combination of these convolution operators, so we can turn the problem into a one-dimensional problem: to write the function (which is the identity for the multiplicative convolution) as a small linear combination of functions

I am about 99% sure that if we could do that with a finite linear combination then we would have a proof of EDP. I used to think that “whatever works for finite should also work for infinite but with absolutely summable coefficients,” but it seems that this is wrong because even if the coefficients can be made very small, the functions themselves may be getting quite large in the norms we care about, so the convergence may be much too weak.

That does indeed seem to be the case here. If you convert the just-do-it argument above into an argument to express in terms of the functions then you get pointwise (absolute) convergence, but that is not strong enough.

Let me just elaborate on the remark that this problem can be solved directly by means of a just-do-it argument. I won’t give the full argument, but I’ll just point out that for each rational you have the option of doing the following. You choose a very large integer and a very much larger integer You then set and Since is much larger than is very close to for every and Moreover, the number of pairs such that is (assuming that and are coprime, that is). So if we want to get rid of a value at the point then we can subtract the function which has a small coefficient and is supported on only a very small neighbourhood of

What is not so good about it is that the norm of is roughly which is not small at all.

Thinking slightly further about this, I see that what I said above about pointwise convergence was a bit too pessimistic: for example, we can easily organize for the convergence to be uniform. And indeed, if we think of as being, in its broad shape, roughly like a set of size close … actually, what is it like?

Rational numbers sort of repel each other, so if we’ve devised and such that all points in are close to then all points that are not actually equal to will tend to have rather large denominators, and therefore not to occur very frequently. So let’s hazard a guess that for the purposes of thinking about norms, we can think of as having a spike at where the value is and other values that are all around 1. That would mean that the pth power of the norm was around Thinking of and as constants, it looks as though we might expect some kind of good behaviour to kick in when passes 2. Remembering that we’re actually interested in times the norm, we get something like But if the value we were trying to get rid of was small, we can multiply that by a small factor Ah, but the problem is that there are an awful lot of those s to worry about, and they add up in an -ish way.

What that boils down to is that this kind of greedy keep-away-from-everything-else approach doesn’t give us any kind of useful convergence, which we need if we are to get the averaging argument to work. I realize that I haven’t been too clear about what kind of convergence *would* work, but there is one kind that is definitely OK, and that is if we have a finite linear combination. So let me now ask a completely precise question.

**A sufficient (I’m almost sure) condition for EDP**

Let and be subsets of We define to be the function whose value at is the number of ways of writing as with and I claim that EDP follows if for every there exists a way of writing the function (which is defined on and takes the value 1 at 1 and 0 everywhere else) as a finite linear combination where the and are intervals of positive integers and

**Summary.**

In this post I have been looking at the representation-of-diagonals approach over the rationals with the extra symmetry constraint that the HAP products are all of the form where and that the coefficient of is independent of This symmetry condition forces the diagonal map we are trying to represent to be the identity. (As I write this it occurs to me that we might be able to take an *arbitrary* efficient representation of a diagonal matrix and average it over all multiples in order to obtain an efficient representation of the identity. So perhaps this “new” approach is equivalent to the old one — which would be quite encouraging as it would increase the chances that the new one can be made to work.)

Thinking through what this means, we find that we can reformulate the problem and obtain the question asked in the previous section.

**Where next for EDP?**

To finish, I’d like to ask if anyone has views about how they would like the EDP project to proceed. We are in uncharted waters at the moment: we have a project that started with a burst of activity — in fact it was sustained enough to count as more than just a burst — but now it has slowed right down and I’m not sure it will ever regain anything like its earlier intensity. And yet I myself am still enthusiastic about it, and perhaps I’m not alone in that.

My question is what should happen if it turns out that there are only two or three people, or even just one, who feel like working on the problem. In that case, there are various options. We could simply continue to post comments on this blog and not worry if there are very few participants or if the rate of posting is small. Or we could move the project to a different phase, more like a normal collaboration, with longer periods of private thought, but from time to time write comments or posts to make sure that any important developments are kept public (so that, in particular, anyone who wanted to join in would be free to do so). Or we could turn the project into a completely conventional collaboration, always with the understanding that the author of the eventual paper would be D. H. J. Polymath. And that’s not quite the most conventional it could go. The final possibility would be to close the project altogether, write a paper summarizing all the partial results, and then “free up” the problem for anyone who wants to think about it privately — if they used any of Polymath’s ideas they could simply refer to the paper that contained those ideas.

I think I favour keeping the project going as a Polymath project but with longer periods of private thought, but I don’t favour this strongly enough that I couldn’t be persuaded to change my mind if others feel differently. If that is the way we go, then my final question is, are there people out there who would like to be part of a small group of reasonably dedicated participants, or am I on my own now?

**Representing the identity.**

Let me now discuss an approach that doesn’t work. (If you have been keeping up with the discussion, then this will be familiar material explained in a slightly different way.) Let be a large integer, and if and are two HAPs contained in then write for the matrix that is 1 at if and and 0 otherwise. In other words, it’s the characteristic function of Note that if then Let us write as etc.

Suppose that we could find for every pair of HAPs contained in a coefficient in such a way that and Then for every real sequence we have

It follows by averaging that there exist and such that

In particular, if each then there must exist and such that from which it follows that either or is at least So if we can get to tend to zero as tends to infinity then we are done.

Unfortunately, it is impossible to get to tend to zero. The reason is that the above argument would imply that if when mod 3, then for every there exists a HAP such that But that is not true: it can never be greater than 1.
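
The obstruction can be checked numerically. The sketch below assumes that the sequence in question is the familiar one that takes the values 1, -1, 0 according to whether n is 1, 2 or 0 mod 3; the formula is not legible in the text above, so this is an assumption, as are the function names.

```python
def x(n):
    """Assumed example sequence: 1, -1, 0 for n = 1, 2, 0 mod 3."""
    return (0, 1, -1)[n % 3]


def max_hap_sum(N):
    """Largest |x(d) + x(2d) + ... + x(kd)| over all HAPs inside {1, ..., N}."""
    best = 0
    for d in range(1, N + 1):
        partial = 0
        for n in range(d, N + 1, d):
            partial += x(n)
            best = max(best, abs(partial))
    return best


print(max_hap_sum(300))
```

The sequence is completely multiplicative, so the partial sums along the multiples of d are x(d) times the partial sums along the integers, and those only ever take the values 1 and 0.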

Because of this example, we have been trying a different approach, which is to look for a more general diagonal matrix and write that as a linear combination of matrices of the form If one generalizes the approach in this way, then it is no longer clear that it cannot work — indeed, it seems likely that it can. However, it also seems to be hard to find a suitable diagonal matrix, and hard to think how one might decompose it once it is found.

**Working over the rationals instead.**

The single main point of this post is to suggest a way of overcoming this last difficulty. And that is to resurrect an idea that was first raised right near the beginning of this project, which is to look at the problem for functions defined on the positive rationals rather than the positive integers. (It is a straightforward exercise to show that the two problems are equivalent. For details go to the Polymath5 wiki and look at the section on the first page with simple observations about EDP.)

The point is that the counterexamples that show that the approach cannot work for the integers all make crucial use of the fact that some numbers are more divisible by small factors than others. But over the positive rationals all numbers are equally divisible. Or to put it another way, multiplying by a positive rational is an automorphism of This suggests that perhaps over the rationals it would be possible to use the identity matrix.

**Dealing with infinite sets.**

Now a problem arises if we try to do this, which is that the rationals are infinite. So what are we supposed to say about the sum of coefficients when we decompose the identity into a linear combination ?

Let me answer this question in two stages. First I’ll say what happens if we decompose the identity when the ground set is which shows a way of dealing with infinite sets, and then I’ll move on to where some additional problems arise.

Suppose, then, that the infinite identity matrix (that is, the function where and range over ) has been expressed as a linear combination Now let be a sequence. We’d like to show that it has unbounded discrepancy: that is, we’d like to show that for every there exists a HAP such that

Our problem is that is going to be infinite. We’d somehow like to show that it has “density” at most where is an arbitrary positive constant, or perhaps show that it has density zero. One way we might do this is as follows. For each positive integer let be the sum of the over all such that both and have non-empty intersection with Then define the upper density of the coefficients to be If this is zero we can say that the coefficients have density zero. And if it is at most then we can say that they have density at most (In fact, even the would be OK — we just want the density to be small infinitely often.)

Let’s suppose that is at most Then if we truncate the sequence at , by changing all values after to zero, we find that

where I have written for the set of all HAPs that have non-empty intersection with Since the same averaging argument as before gives us a HAP (either or — WLOG ) such that

I fully admit that this is not very infinitary, but it is simple, and I’m not sure it matters too much that it is not infinitary. I’ll just briefly mention that one can use it to express the proof of Roth’s theorem (about AP discrepancy rather than HAP discrepancy). One expresses the infinite identity matrix as the following integral:

where One then expresses each function as a linear combination of HAPs of length (an arbitrary positive integer) and common difference at most One then obtains some cancellation in the coefficients, and proves that the density of the coefficients is at most (up to a constant factor). For details of how this calculation works, see this write-up, and in particular the third section.

The importance of the restriction on the length and common difference is that the edge effects (that is, APs that intersect without being contained in ) are negligible for large It is this feature that is slightly trickier to obtain for the rationals, to which I now turn.

**Transferring to HAPs and rationals.**

One useful feature of the set of APs of length and common difference at most is that each number greater than or equal to is contained in precisely such APs. A first question to ask ourselves is whether we can find a set of HAPs that covers the rationals in a similarly nice way. To start with, observe that if is a rational, then we can easily describe every HAP of length that contains Indeed, for every between and we have the HAP consisting of the first multiples of and that is all of them (since must be in the th place of the HAP for some and that and the length determine the HAP). So we have the extremely undeep result that every is contained in precisely HAPs of length (Note, however, how untrue this is if we work in the positive integers rather than the positive rationals.)
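
The counting claim is easy to test. In this sketch a HAP of length m in the positive rationals is represented as the tuple (d, 2d, ..., md); the function names are hypothetical, and the integer version is included only to illustrate the contrast mentioned in the parenthesis.

```python
from fractions import Fraction


def haps_containing(x, m):
    """All HAPs (d, 2d, ..., md) of positive rationals that contain x.

    x sits in the k-th place exactly when d = x/k, for k = 1, ..., m.
    """
    return [tuple(j * (x / k) for j in range(1, m + 1)) for k in range(1, m + 1)]


def integer_haps_containing(x, m):
    """Common differences d of integer HAPs (d, 2d, ..., md) that contain x."""
    return [d for d in range(1, x + 1) if x % d == 0 and x // d <= m]


x, m = Fraction(3, 7), 5
haps = haps_containing(x, m)
print(len(set(haps)), all(x in hap for hap in haps))

# Over the integers the count depends heavily on the divisors of x.
print(len(integer_haps_containing(6, 5)), len(integer_haps_containing(7, 5)))
```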

This looks promising, but we now need an analogue of the “increasing system of neighbourhoods” that was provided for us by the sets (It might have been more natural to work in and take the sets ) What is a sensible collection of finite sets with union equal to ?

One way of thinking about the sets is as follows. Using our system of APs, we can define a graph: we join to if there is an AP of length and common difference at most that contains both and The sets are quite close to increasing neighbourhoods in this graph: start with the number 1 and then take all points of distance at most from it. If we work with rather than then this graph is a Cayley graph, and after a while the neighbourhoods grow linearly with which is why the boundary effects are small.

What happens if we define a similar graph using HAPs in ? Now we are joining to if there exists and such that That is precisely the condition that there exists a HAP of length that contains both and In other words, we take the multiplicative group of and inside it we take the Cayley graph with generators all numbers with

This feels like the right graph to take, but it has the slight drawback that it is not connected: it is impossible to get from to if is a rational such that in its lowest terms either its numerator or denominator is divisible by a prime greater than The connected component of 1 in the graph is the set of all rationals where both and are products of primes less than or equal to But this is not really a problem: we’ll just work in that component of the graph.
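
Here is a minimal sketch of this Cayley graph, under the assumption that the generators are the fractions a/b with numerator and denominator between 1 and m. The function names are hypothetical. It builds the set of points at distance at most r from 1 and checks that only primes up to m ever appear.

```python
from fractions import Fraction


def ball(m, r):
    """Points at distance at most r from 1, with generators a/b, 1 <= a, b <= m."""
    gens = {Fraction(a, b) for a in range(1, m + 1) for b in range(1, m + 1)}
    current = {Fraction(1)}
    for _ in range(r):
        # 1 is a generator, so each step keeps the points already reached.
        current = {x * g for x in current for g in gens}
    return current


def is_smooth(n, m):
    """True if every prime factor of the positive integer n is at most m."""
    for p in range(2, m + 1):
        while n % p == 0:
            n //= p
    return n == 1


for r in range(1, 5):
    b = ball(3, r)
    smooth = all(is_smooth(x.numerator, 3) and is_smooth(x.denominator, 3) for x in b)
    print(r, len(b), smooth)
```

Printing the sizes for a few radii gives a feel for how slowly the neighbourhoods grow compared with the number of walks of length r.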

Let’s write for the component of we are working in, and for the set of all points at distance at most from 1 in the graph. Now we can say how it would in principle be possible to prove EDP. We would like to find a way of writing the identity (this time thought of as the function where and range over ) in the form

where and are HAPs of length at most that are contained in For each let be the set of all such HAPs that have non-empty intersection with the neighbourhood Then we can define to be and we can define the *upper density* of the coefficients to be

Now let me show that if is at most for some sufficiently large then the HAP-discrepancy of every function on is at least This is by almost exactly the same argument that worked in The first step is to consider the restriction of to Then we know that

Now any HAP of length at most that intersects is contained in from which it follows that any HAP of length at most that intersects but is not contained in must intersect Since is a finitely generated Abelian group, the sets grow polynomially, which implies that the ratio tends to zero with

I’ll now be very slightly sketchy. We are supposing that is at most It follows that either or is noticeably smaller than In the second case we can change to and start again — we won’t be able to do this too many times so eventually we’ll reach the first case, where

In that case we have that

and that

After that, the argument really is the same as before (give or take the small approximations). [Remark: I have not checked the details of the above sketch, but I’m confident that something along these lines can be done if this doesn’t quite work. It’s slightly more difficult than in because it isn’t obvious that the intersection of a HAP with is a HAP, whereas the intersection of an AP with is an AP.]

**Characters on the rationals.**

Now we must think about how to express the identity as a linear combination of HAP products with coefficients of density at most where is some arbitrary positive constant. Taking our cue from the case, it would be natural to express the identity on in terms of characters, and then to decompose the characters. So a preliminary task is to work out what the characters are.

This is (not surprisingly) a piece of known mathematics. I shall discuss it in a completely bare-hands way, but readers who don’t like that kind of discussion may prefer to look at the comment by BCnrd to this Mathoverflow question.

First I’ll work out what the characters on are, and then I’ll look at Recall that a character is a function from to the unit circle in such that for every That is, it is a homomorphism from to

Suppose we know the value of at a rational That tells us what is for every integer However, it does not tell us what is, say. All we know is that which gives us two possibilities for So in order to specify we need to specify its values at enough rationals that we can write every other rational as a multiple of one of them. And the choices we make at those rationals have to be compatible with each other.

A simple way of doing that is to choose the value of at for every positive integer making sure that That is, we choose to be any point in and then for each we have choices for given our choices up to that point.

An equivalent way of thinking about this is that we choose a sequence of real numbers satisfying the conditions that and is a multiple of Then the corresponding character is defined at to be for any Note that this is well-defined, since if and are both at least then so

Now because we chose an element of and then made a sequence of finite choices, it is easy to put a probability measure on the set of characters. We can therefore make sense of the expression

and prove that it is Let me briefly sketch this. Suppose that Then for every so we get 1. If then let when written in its lowest terms. If is the (random) sequence that determines then each is uniformly distributed in the interval and But is uniformly distributed in the interval so this expectation is zero (as ).
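
The construction can be simulated. The sketch below is an illustration under stated assumptions: phases are measured in turns rather than radians, the first phase is drawn from a fine grid instead of the whole circle so that the arithmetic stays exact, and the class and function names are hypothetical. It checks the homomorphism property exactly and estimates the expectation by sampling.

```python
import cmath
import math
import random
from fractions import Fraction


class RandomCharacter:
    """A random character on (Q, +), stored via the phases of chi(1/n!).

    theta[n] is the phase of chi(1/n!) in turns, so that
    n * theta[n] equals theta[n - 1] modulo 1.
    """

    def __init__(self, rng, resolution=10**6):
        self.rng = rng
        # Grid approximation (an assumption) to a uniform point of [0, 1).
        self.theta = {1: Fraction(rng.randrange(resolution), resolution)}

    def _theta(self, n):
        top = max(self.theta)
        while top < n:
            top += 1
            j = self.rng.randrange(top)  # one of `top` compatible choices
            self.theta[top] = (self.theta[top - 1] + j) / top
        return self.theta[n]

    def phase(self, x):
        """Phase of chi(x) in turns, as an exact Fraction in [0, 1)."""
        x = Fraction(x)
        q = x.denominator
        # p/q is p * (q!/q) copies of 1/q!.
        return (x.numerator * (math.factorial(q) // q) * self._theta(q)) % 1

    def __call__(self, x):
        return cmath.exp(2j * cmath.pi * float(self.phase(x)))


def average_character_value(x, samples=2000, seed=0):
    """Monte Carlo estimate of the expectation of chi(x) over random chi."""
    rng = random.Random(seed)
    return sum(RandomCharacter(rng)(x) for _ in range(samples)) / samples


print(abs(average_character_value(0)))                # exactly 1
print(abs(average_character_value(Fraction(5, 12))))  # small, up to sampling noise
```

The lazily extended dictionary mirrors the sequence of finite choices in the text: each new phase is one of finitely many values compatible with the previous one.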

We have therefore shown that

where is the identity indexed by

What do we do if we want to modify this to work for ? Well, an initial complication that (I hope) turns out not to be a serious complication is that is not an additive group: it contains 1 and it does not contain any prime greater than However, it generates a subgroup of which consists of all rationals with denominators that are products of primes less than or equal to When I refer to “characters on ” I will really mean characters on

To describe these, we no longer need a sequence of reciprocals such that *every* rational is a multiple of one of them: we just want to capture all rationals in But that is straightforward: instead of taking the sequence we could take the sequence or we could replace by the product of all the primes up to There are any number of things that we could do.

**Decomposing a character into “non-local” HAPs.**

The thing that seems to me to make this approach very promising is that for any character on it is possible to partition into long HAPs on each of which is approximately constant. As this result suggests, it is possible to decompose in an efficient way as a linear combination of HAPs, which is very much the kind of thing we need to do in order to imitate the Roth proof.

I should warn in advance that it is not quite good enough for our purposes, because the HAPs we use are not “local” enough: they are sets of the form such that is small, but we do not also know that and are small. Without that, each number is contained in infinitely many HAPs, so we no longer have the condition that enabled us to define the “density” of a set of coefficients. Later I shall present a different idea that does use “local” HAPs, but fails for a different reason. My gut instinct is that these difficulties are not fundamental to the approach, but whether that is mathematical intuition or wishful thinking is hard to say.

Before I go any further, here is an easy lemma.

**Lemma.** Let be a character on and let Then there exists such that for every there exists a positive integer such that

**Proof.** Without loss of generality Let and let be the lowest common multiple of the numbers from 1 to [Thanks to David Speyer for pointing out at Mathoverflow that the l.c.m. is around a non-negligible improvement over .] Then by the usual pigeonhole argument we can find a positive integer such that so we can take
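
For readers who want to see the "usual pigeonhole argument" in action, here is a minimal sketch of the standard version: given a phase theta (in turns) and a positive integer K, it finds a positive integer d of size at most K such that d times theta is within 1/K of an integer. This illustrates only the pigeonhole step, not the exact quantifiers of the lemma, and the function names are hypothetical.

```python
from fractions import Fraction


def pigeonhole_multiple(theta, K):
    """Return d with 1 <= d <= K and d * theta within 1/K of an integer.

    The K + 1 fractional parts of 0, theta, ..., K * theta fall into
    K buckets of width 1/K, so two of them share a bucket.
    """
    seen = {}
    for j in range(K + 1):
        bucket = int((j * theta % 1) * K)
        if bucket in seen:
            return j - seen[bucket]
        seen[bucket] = j
    raise AssertionError("unreachable by the pigeonhole principle")


def dist_to_integer(t):
    """Distance from t to the nearest integer."""
    f = t % 1
    return min(f, 1 - f)


theta = Fraction(355, 1130)
d = pigeonhole_multiple(theta, 20)
print(d, dist_to_integer(d * theta) < Fraction(1, 20))
```

If the character takes the value exp(2*pi*i*theta) at x, then its value at d times x is within about 2*pi/K of 1.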

For the next result we define a HAP to be a set of the form

**Corollary.** Let be a character on let and let be a positive integer. Then we can partition into HAPs of lengths between and on each of which varies by at most

**Proof.** We begin by covering the integers. Find a positive integer such that Then on any HAP with common difference and length at most varies by at most We can partition the multiples of into HAPs of length and they will cover all the integers. (Indeed, they will cover all the multiples of )

We now want to fill in some gaps. Let us write and let us pick an integer a multiple of and greater than such that Between any two multiples of there are at least multiples of forming a HAP. This HAP can be partitioned into HAPs of common difference and lengths between and If we continue this process and make sure that every positive integer divides at least one (which is easy to do), then we are done.

Now a character restricted to a HAP is just a trigonometric function. If the character varies very slowly as you progress along the HAP, then convolving it with an interval (as in the Roth argument, but now we are talking about a chunk with the same common difference as the HAP) we obtain a multiple very close to 1 of the character itself. With the help of this observation, we can actually decompose the character efficiently as a linear combination of HAPs. Since this does not obviously help us, I will leave the details as an exercise.

**What can we do with local HAPs?**

Let us fix a positive integer and define a HAP to be *local* if it is of the form where (Thus, the definition of “local” depends on ) What happens if we have a character and try to decompose its restriction to as a linear combination of local HAPs (of reasonable length) on each of which varies very little?

The short answer is that it is easy to *cover* with HAPs of this kind, but it doesn’t seem to be easy to *partition* into them. In order to achieve the partition into non-local HAPs in we helped ourselves to smaller and smaller common differences, and correspondingly larger and larger values of and Another problem with that method was that what we are really searching for is a very nice and uniform way of decomposing characters, such as we had in the Roth proof. There the niceness was absolutely essential to getting enough cancellation for the proof to work, but it wasn’t essential to represent the identity — we could allow ourselves a bit extra as long as that extra was positive semidefinite.

So let’s not even try to partition Instead, we could simply take our character and use it as a guide to the coefficients that we will give to our HAPs.

The rough idea would be something like this. Given a character and a HAP we choose a coefficient in some nice way that makes it large when is small. It could for example be for some suitable And then we could use convolution to represent times the restriction of to as a linear combination of sub-HAPs of of length smaller than but not too much smaller, and common difference

We would know from the lemma above that every would be contained in at least some HAPs with large coefficients, so the restriction of every would in some sense be catered for. I think there would be some that were catered for too much (the precise relationship between those and remains to be worked out, but I think this will be straightforward), but I hope that the whole decomposition can be defined in a nice enough way for the function that results to be the pointwise product of the original character with a “nice” non-negative real function that’s bounded away from zero. More speculatively, one can hope that the coefficients have small density and that it is possible to subtract a not too small multiple of the identity from the matrix and still be left with something that’s non-negative definite.

**An attempt to be more precise.**

In this final section I want simply to guess at a matrix decomposition that might potentially prove EDP. As I write this sentence I do not know what the result will be, but the two most likely outcomes are that it will fail for some easily identifiable reason or that the calculations will be such that I cannot tell whether it fails or succeeds.

Actually, to make the guess just a little bit more systematic, let’s suppose that for each HAP in the class of HAPs we are considering and for each character we have a coefficient That is, corresponding to we are taking the function Given two HAPs and what will the coefficient of be in the decomposition ?

Expanding this out gives us

If we reverse the order of expectation and summation then we see that the coefficient of is

Now let us think what we are trying to achieve with the HAPs and Given a HAP of the form we want to use it to contribute to the representation of characters for which is small. If is such a character, then we know that the numbers vary only slowly. Therefore, if we take short subprogressions of the form then on each one will be roughly constant. If we fix a length (which may have to be logarithmic in or something like that), then we can represent the restriction of to as a linear combination of the HAPs of length give or take some problems at the two ends.

Roughly speaking, the coefficient of will be That is, if we write for the HAP then it is approximately true to say that the restriction of to is

Now we want to do this only if is close to 1. So the coefficient of in the decomposition of will in general be where is some nice function (which, if the Roth proof is anything to go by, means that it has non-negative and nicely summable Fourier coefficients) that is zero except at points that are close to 1.

What would we expect to be like? Is there any chance that it is close to a multiple of ?

Since is roughly constant on whenever is non-zero, the value of this function at is roughly the same as it is if we take just singletons — that is, if we set and define to be So the function we get should have a value at that is close to

In other words, we get something that has the same argument as but that has big modulus at a rational if it so happens that is close to 1 for unexpectedly many positive integers Now this can happen, but I would think that for nearly all and if is not too small, then would be about its expected value of So I am hoping that we will have some kind of rough proportionality statement.

Going back to the coefficient of in the decomposition, let us take the two HAPs and We have decided to define to be so is equal to

It is a significant problem that we are forced to consider the case where in the Roth proof, we did not have to look at products of APs with different common differences. But if we are to have any chance of decomposing into “local” HAPs, then necessarily we must use completely different common differences to deal with rational numbers that are a long way apart in the graph. This is such a significant difference from the Roth proof that it may be a reason for this approach not working at all. However, it does look as though there is plenty of cancellation.

I think the expression we would be trying to bound is

which is perhaps better thought of as

And our main strength would be that we would be free to choose as it suited us. There would be two challenges: to obtain a good bound on the above expression, and to prove that we could subtract a reasonable-sized multiple of the identity and still be left with a positive semi-definite matrix.

I think the above counts as a calculation for which I cannot tell whether it fails or succeeds. But I hope it may be enough to provoke a few thoughts.

The first step of this programme was an obvious one: obtain a clean and fully detailed proof in the APs case. That has now been completed, and a write-up can be found here. For the benefit of anyone who is interested in thinking about the next stage but doesn’t feel like reading a piece of formal mathematics, let me give a sketch of the argument here. That way, this post will be self-contained. Once I’ve given the sketch, I’ll say what I can about where we might go from here. It is just possible that we are in sight of the finishing line, but that is unlikely as it would depend on various guesses being correct, various potential technical problems not being actual, various calculations giving strong enough bounds, and so on. Thus, the final part of this post will be somewhat speculative, but I will make it as precise as I can in the hope that it will give rise to a fruitful line of enquiry.

**Roth’s theorem on AP-discrepancy.**

Recall that Roth proved that for every function there is an arithmetic progression such that , and that this result was shown to be sharp by Matoušek and Spencer (who removed a log factor from an upper bound of Beck). Roth actually proved a more precise result: the discrepancy is attained on a progression of length at most . In this post I shall sketch an argument that shows that for any we can find an AP of length at most and common difference at most such that . (I haven’t actually checked, but I’m fairly confident that this slightly stronger result would also follow from Roth’s proof, and also from a proof due to Lovász that uses the semidefinite programming method that we’ve discussed at length in the past.)

Recall first some notation. If and are subsets of then I shall also write and for their characteristic functions. In general, given two functions and defined on will stand for the rank-1 matrix

Suppose now that we can find a diagonal matrix , arithmetic progressions and and coefficients such that Let the diagonal entries of be real numbers Then for any function defined on we have

Let be the ratio It follows from the above identity that there exists such that

and hence that there exists an arithmetic progression such that

That is the representation-of-diagonal method in its purest form. However, as Moses pointed out (I haven’t actually found the precise comment), it is enough to obtain an identity of the form where is positive semidefinite, and everything else is as before. That is because so including does not make things worse for us. And in fact, it turns out to introduce a useful extra flexibility.
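The averaging step behind this method can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the helper names (`ap`, `quadratic_form`), the randomly chosen progressions and coefficients, and the use of the plain unnormalised sum as inner product are all hypothetical choices, not taken from the write-up. It checks the general fact that if $M=\sum_i\lambda_i P_i\otimes Q_i$ then $\langle f,Mf\rangle=\sum_i\lambda_i\bigl(\sum_{x\in P_i}f(x)\bigr)\bigl(\sum_{y\in Q_i}f(y)\bigr)$, which is at most $\sum_i|\lambda_i|$ times the square of the largest discrepancy among the progressions used.

```python
import random

def ap(start, diff, length):
    """Arithmetic progression {start, start+diff, ...} as a set of integers."""
    return {start + j * diff for j in range(length)}

def quadratic_form(terms, f):
    """<f, M f> for M = sum of lam * (1_P tensor 1_Q), computed entry by entry."""
    total = 0.0
    for lam, P, Q in terms:
        for x in P:
            for y in Q:
                total += lam * f[x] * f[y]
    return total

random.seed(0)
n = 30
f = {x: random.choice((-1, 1)) for x in range(1, n + 1)}  # an arbitrary +-1 function

# Hypothetical decomposition: random coefficients and random progressions in {1..n}.
terms = []
for _ in range(20):
    lam = random.uniform(-1, 1)
    P = ap(random.randint(1, 5), random.randint(1, 4), random.randint(1, 6))
    Q = ap(random.randint(1, 5), random.randint(1, 4), random.randint(1, 6))
    terms.append((lam, P, Q))

# The same quadratic form, written as a combination of products of sums over progressions.
via_sums = sum(lam * sum(f[x] for x in P) * sum(f[y] for y in Q)
               for lam, P, Q in terms)
max_disc = max(abs(sum(f[x] for x in S)) for _, P, Q in terms for S in (P, Q))
weight = sum(abs(lam) for lam, _, _ in terms)
print(via_sums, weight * max_disc ** 2)
```

Read in the other direction, this is the whole point of the method: if the matrix on the left is a diagonal matrix plus something positive semidefinite, then the quadratic form is at least the sum of the diagonal entries for a $\pm 1$ function, so some progression must have discrepancy at least the square root of that sum divided by $\sum_i|\lambda_i|$.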

**The decomposition.**

We start with the following simple lemma. Up to now, the inner product has been the obvious one But from now on it is more convenient to take the inner product (where is, as usual, shorthand for ).

**Lemma 1.** Let be an orthonormal basis of Then

**Proof.** Let be the unitary matrix with rows The entry at of is which is the inner product of the th and th columns of Since the columns of a unitary matrix also form an orthonormal basis, this gives us as required.

**Corollary 2.** For each let be the function defined by Then

**Proof.** The functions form an orthonormal basis.
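As a sanity check, here is a short Python sketch of the corollary for a small modulus. It assumes that the functions in question are the characters $x\mapsto e^{2\pi i rx/N}$ on $\mathbb{Z}_N$ and that the inner product is the normalised one, $\mathbb{E}_x f(x)\overline{g(x)}$; the names `omega` and `inner` are hypothetical. With that normalisation the sum of the rank-1 matrices comes out as $N$ times the identity; the constant in front of the identity depends on the normalisation convention one adopts.

```python
import cmath

N = 8

def omega(r, x):
    """Character e^{2 pi i r x / N} on Z_N."""
    return cmath.exp(2j * cmath.pi * r * x / N)

def inner(f, g):
    """Normalised inner product E_x f(x) * conj(g(x)) (an assumed convention)."""
    return sum(f(x) * g(x).conjugate() for x in range(N)) / N

# Gram matrix of the characters: should be the identity (orthonormality).
gram = [[inner(lambda x, r=r: omega(r, x), lambda x, s=s: omega(s, x))
         for s in range(N)] for r in range(N)]

# Entry (x, y) of sum_r omega_r tensor conj(omega_r): should be N if x == y, else 0.
total = [[sum(omega(r, x) * omega(r, y).conjugate() for r in range(N))
          for y in range(N)] for x in range(N)]
```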

The plan now is to decompose each trigonometric function into a linear combination of characteristic functions of suitably chosen APs. This automatically gives us a decomposition of each matrix into a linear combination of products We then find that each individual is used for several different s, and when we add up the coefficients there is a considerable amount of cancellation. It is that cancellation that gives us a good upper bound on and leads to the results claimed above.

At this point, let us fix We would like to decompose into a linear combination of arithmetic progressions mod with common difference at most It will turn out that their cardinalities will be at most as well. Provided that is at most this will mean that each AP wraps around (that is, passes zero) at most once, so we can split it into two genuine APs.

Given how do we choose a suitable common difference? Well, in an earlier and less successful version of this argument we chose for each a such that and Here stands for the distance from to the nearest integer. This estimate implies that and that (together with the triangle inequality) implies that and more generally that In more qualitative terms, adding a small multiple of to does not change the value of very much.
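The choice of common difference can be illustrated by brute force. The sketch below assumes the standard Dirichlet pigeonhole statement (for every $r$ and every $m$ there is some $1\le d\le m$ with $\|rd/N\|\le 1/(m+1)$) rather than the exact inequalities used in the argument, and the names `norm`, `best_difference` and `omega` are hypothetical. It also checks the qualitative remark about slow variation in the form $|\omega_r(x+jd)-\omega_r(x)|\le 2\pi j\|rd/N\|$.

```python
import cmath
import math

def norm(r, d, N):
    """||r d / N||: distance from r d / N to the nearest integer."""
    k = (r * d) % N
    return min(k, N - k) / N

def best_difference(r, N, m):
    """The d in 1..m that minimises ||r d / N||, found by brute force."""
    return min(range(1, m + 1), key=lambda d: norm(r, d, N))

def omega(r, x, N):
    """Character e^{2 pi i r x / N}."""
    return cmath.exp(2j * math.pi * r * x / N)

N, m = 101, 10
choice = {r: best_difference(r, N, m) for r in range(N)}
worst_norm = max(norm(r, d, N) for r, d in choice.items())

# Slow variation of omega_r along the chosen common difference.
r, x, j = 37, 5, 3
d = choice[r]
change = abs(omega(r, x + j * d, N) - omega(r, x, N))
bound = 2 * math.pi * j * norm(r, d, N)
```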

Let be the interval inside where for some smallish absolute constant Let and let be the *characteristic measure* of — that is, the function that takes the value on and 0 outside From the observation just made it is straightforward to prove the following lemma. The fact that is real follows from the symmetry of

**Lemma 3.** Let . Suppose that and Then there is a real constant between 1/2 and 1 such that

The next lemma is also easy to prove. Note that we have dropped the hypothesis that is small and weakened the conclusion accordingly.

**Lemma 4.** Let and suppose that Then there is a real constant such that

It will be problematic for us if is negative, but a simple trick makes sure that that never happens: we simply convolve with twice. By the two lemmas above, we may conclude that where is always non-negative, and is at least 1/4 when (That is because )
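The mechanism behind the two lemmas and the squaring trick is that a character is an eigenfunction of convolution with a symmetric progression, with a real eigenvalue. Here is a minimal Python sketch, assuming the progression is $\{-md,\dots,md\}$ mod $N$ and the measure is the uniform probability measure on it; the function names and the particular values of $N$, $r$, $d$, $m$ are hypothetical choices for illustration.

```python
import cmath
import math

def eigenvalue(r, d, m, N):
    """c such that omega_r * pi = c * omega_r, where pi is the uniform
    probability measure on the symmetric progression {-m d, ..., m d} mod N."""
    return sum(cmath.exp(-2j * math.pi * r * j * d / N)
               for j in range(-m, m + 1)) / (2 * m + 1)

def convolve_with_pi(g, d, m, N):
    """(g * pi)(x): the average of g(x - j d) over j in -m..m."""
    return [sum(g[(x - j * d) % N] for j in range(-m, m + 1)) / (2 * m + 1)
            for x in range(N)]

N, d, m = 101, 10, 5

# "Good" frequency: r d = 100 = -1 mod 101, so ||r d / N|| = 1/101 is small.
r_good = 10
c_good = eigenvalue(r_good, d, m, N)
omega = [cmath.exp(2j * math.pi * r_good * x / N) for x in range(N)]
smoothed = convolve_with_pi(omega, d, m, N)

# "Bad" frequency: r d = 50, so ||r d / N|| is close to 1/2 and c is negative.
c_bad = eigenvalue(5, d, m, N)
c_bad_twice = c_bad ** 2  # eigenvalue after convolving twice: never negative
```

In this example `c_good` is real and close to 1, while `c_bad` is real and slightly negative; convolving twice replaces each eigenvalue by its square, which removes the sign problem.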

A discrepancy bound of can be obtained if for each we choose such that (which is possible by the usual pigeonhole principle argument), and replace by which can be seen to be a linear combination of functions of the form

The main ingredient that improves the bound from to is instead to choose rather carefully some coefficients and to replace by For each the earlier proof took to be 1 for precisely one and 0 otherwise. What we do instead is approximate this in such a way that for each with the function is "sufficiently smooth".

The next lemma, which I shall state without proof (the proof is straightforward), gives a good idea of what we mean by "sufficiently smooth". Given a function we define its Fourier transform by the formula

**Lemma 5.** Let and define a function by whenever Then is non-negative and real for every and

From this and easy Fourier manipulations, we know that the function also has these two properties: that is, for every and

We now take to be an integer approximately equal to and we define to be (the dependence on is via the definition of the function ).

We note here that for a typical there will be values of such that and that the usual pigeonhole argument gives us at least one. (There are some for which there are more values of with small, but this does not matter.) On the other hand, since is nothing other than we know that for each fixed the function has non-negative Fourier coefficients that sum to 1. These properties are what are needed for the rest of the calculations to work.

We then take the matrix

Since for each there is at least one with the remark following Lemmas 3 and 4 tells us that this matrix has the form with each at least 1/8. It follows that is positive semidefinite, since it equals

It remains to bound from above the sum of the absolute values of the coefficients of the various products of progressions used in the above decomposition. I won’t reproduce those here, but the Fourier estimates for the functions turn out to come in and allow us to calculate the sum exactly. It equals Since is proportional to this is proportional to a gain of a factor proportional to on the trace of It follows that the discrepancy bound we get is proportional to When this gives us

**From APs to HAPs.**

Is there any possibility of adapting the above proof to give a proof of EDP? I think there is, and in this final section I shall try to explain why.

To begin with, let us set to be the diagonal matrix with down the diagonal. (By I mean the function discussed in the previous post and set of comments, defined by the properties and ) Now let us suppose that we could find an efficient HAP-decomposition of a matrix of the form

with every The inner matrix is of the form with positive semidefinite, so this would give us an efficient decomposition of where is also positive semidefinite. Now the matrix we would like to decompose can also be written as

It is therefore tempting to look for a proof that is similar to the proof of Roth’s discrepancy theorem above: we try to decompose each function (the value of which at is ) in an efficient and unified way as a linear combination of characteristic functions of *homogeneous* arithmetic progressions rather than just arithmetic progressions. And then we hope that there will be plenty of cancellation in the coefficients.

Now the decomposition of into APs very definitely used non-homogeneous ones. So why is there any hope of decomposing into HAPs? Well, the kind of thing one might try is this. Choose a small and for each choose some such that Let be (as before) the interval and let be (as before) the characteristic measure of Then which tells us that can be written as a (very nice) linear combination of translates of The trouble is that most of these translates are non-homogeneous.

However, if we choose to grow to infinity so slowly that it can be thought of as a constant (this is a standard trick — treat as a large constant, and as long as for sufficiently large we can get a discrepancy lower bound that tends to infinity with we are done), then the fraction of translates of that are non-homogeneous is “only” Furthermore, and much more importantly, is on average very small on the complement of the HAP So perhaps we can afford to ignore the decomposition outside this HAP.

How would we decompose the part *inside* the HAP? Well, let’s suppose we were trying to decompose rather than (We could do this, and the result would be a decomposition of a matrix with positive semidefinite. I don’t know whether changing the coefficients from to would be a problem.) Since is naturally expressed as a Dirichlet convolution (except at 1) we have a natural decomposition of into HAPs. I think we should also have a natural decomposition into HAPs of the restriction of to So if we use that, and then take the coefficients from the Roth argument (or perhaps something else tailored to the new situation), then … just perhaps … we will have an efficient decomposition that gives a good approximation to And if we want more than just a good approximation, then we could decompose the part of that lies outside into functions supported on singletons. This would be inefficient, but (after cancellation of the contributions from different ) the total weight of the inefficiency should be small.

But they are not quite the same: as an individual researcher I often give up on problems with the strong intention of returning to them, and I often find that if I do return to them then the break has had an invigorating effect. For instance, it can cause me to forget various ideas that weren’t really working, while remembering the important progress. I imagine these are common experiences. But do they apply to Polymath? Is it possible that the EDP project could regain some of its old vigour and that we could push it forward and make further progress? Is it conceivable that it could go into a different mode, where people contributed to it only occasionally? (A problem with that is that one of the appeals of a polymath project is checking in to see whether there are new ideas, or whether people have reacted to your comments. It is not clear to me that this appeal works if it is substantially slowed down.)

Anyhow, as Terry Tao might put it, this situation can be regarded as an opportunity to add a new datapoint to the general polymath experiment. Recently the following conjunction of circumstances occurred: I found myself on a plane, my laptop battery lasts a fraction of the time it should, and all the films on offer were either unappealing or too appealing (meaning that I’d been looking forward to seeing them but didn’t want to waste them by watching them in aeroplane conditions). I therefore found myself with several hours to think about mathematics. It was just like the old days when I didn’t have a laptop and there would only be a couple of films, both terrible. So I thought about EDP. Now I would like to avail myself of the opportunity to obtain a new datapoint by writing my thoughts, such as they were, down in a post and seeing whether anyone feels like joining or rejoining the discussion. I have plenty of questions, some of which may be fairly easy: I hope that will be a good way of tempting people.

**A brief reminder of some of the ideas we have been thinking about.**

Since one of the aims of this post is possibly to attract newcomers to the project, let me briefly recap one or two things, including a statement of the problem. We define a *homogeneous arithmetic progression*, or HAP for short, to be a set of the form $\{d, 2d, 3d, \dots, md\}$ for some pair of positive integers $m, d$. Given any sequence $x = (x_1, \dots, x_n)$, we define its *HAP-discrepancy*, or discrepancy for short (since in this context we will almost always be talking about HAP-discrepancy and not any other kind of discrepancy), to be the maximum of $\left|\sum_{i \in P} x_i\right|$ over all HAPs $P \subset \{1, 2, \dots, n\}$. Erdős’s discrepancy problem is the annoyingly simple-sounding question of whether one can find arbitrarily long $\pm 1$ sequences with bounded discrepancy. That is, does there exist a constant $C$ such that for every $n$ there is a $\pm 1$ sequence of length $n$ and discrepancy at most $C$?

When we started work on the problem, my perception was that a major difficulty was that the problem relied heavily on the sequence taking values in $\{-1, 1\}$. There is a very simple example that appears to demonstrate this: the sequence $1, -1, 0, 1, -1, 0, \dots$. It is easy to check that the discrepancy of this sequence (even the infinite sequence) is 1, and yet it takes values of modulus 1 two thirds of the time. This is a pity, because the usual analytic methods tend to give results that can be generalized from purely combinatorial statements (typically involving sequences that take just two values) to more general analytic ones (typically involving sequences that satisfy some analytic condition such as a norm bound).
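For newcomers who like to experiment, here is a minimal Python sketch of HAP-discrepancy, applied to the periodic sequence 1, -1, 0, 1, -1, 0, … The function name `hap_discrepancy` and the choice of length 300 are assumptions made for illustration.

```python
def hap_discrepancy(x):
    """Maximum of |x_d + x_{2d} + ... + x_{md}| over all HAPs inside {1, ..., n}.
    The list x is 0-indexed, so x[0] holds x_1."""
    n = len(x)
    best = 0
    for d in range(1, n + 1):
        partial = 0
        for k in range(d, n + 1, d):      # k runs over d, 2d, 3d, ...
            partial += x[k - 1]
            best = max(best, abs(partial))
    return best

n = 300
x = [(1, -1, 0)[(i - 1) % 3] for i in range(1, n + 1)]   # 1, -1, 0, 1, -1, 0, ...

disc = hap_discrepancy(x)
nonzero_fraction = sum(v * v for v in x) / n
print(disc, nonzero_fraction)
```

The discrepancy stays at 1 however long the sequence is, even though two thirds of its terms have modulus 1. By contrast, `hap_discrepancy([1] * 10)` is 10.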

**A more general question.**

One of the main pieces of progress to come out of the original discussion was Moses Charikar’s observation that it *is* possible to attack the problem analytically after all. The trick is to use a weighted norm that tends to give more weight to numbers when they have more factors. For instance, a multiple of the example mentioned above shows that there is no hope of proving that the discrepancy is unbounded for every sequence such that . However, there appears to be no reason in principle that one could not prove that the discrepancy was always at least for a sequence such that is unbounded. If one could do that, then it would imply EDP, since for sequences we have for every .

There has been quite a bit of discussion about how one might be able to find such a set of weights and prove the result, but so far we have not succeeded. Rather than repeating what the possible methods of proof might be, I would like to suggest focusing on the problem of choosing the weights, because it seems to me that there is plenty of opportunity here to get further insights into the problem.

To repeat: in order to prove EDP we need to find a system of weights such that is unbounded above and such that every sequence has discrepancy at least . The problem I would like to focus on for a while is this: find a good candidate system of weights. That is, find weights that do not obviously fail to work. What I hope is that if we look at several systems of weights and manage to show that they do not work (by constructing increasingly clever sequences ) then we will eventually settle on a system of weights that does work. If we can do that, then we will have split the problem into two and solved one of the two parts. And we have ideas about how to do the other part. (Roughly speaking, we want to find a way of decomposing a diagonal matrix: the two parts of the problem are to choose the diagonal matrix and to find the decomposition.)

**Which systems of weights are definitely bad?**

For the remainder of this post I want to make a start (or rather, a restart, since much of what I say can be found in the earlier discussions) by thinking about what is wrong with certain systems of weights. It will be convenient to rephrase the problem slightly: let us ask for a system of weights such that the discrepancy of an arbitrary real sequence is always at least (where tends to infinity as tends to infinity). That way we care about the weights only up to a scalar multiple, so we do not have to normalize them carefully in advance.

The sequence 1, -1, 0, 1, -1, 0, … shows that we cannot take for every . (I have already said this above, in a very slightly different way.) Indeed, its discrepancy is 1, but would be around 2/3 if we took for every . And the ratio of these two is not unbounded.

The same sequence disposes of many other systems of weights. For example, suppose we were to choose randomly to be 1 with probability and otherwise. Then with very high probability would be close to and would be close to . Therefore, once again we would have a quantity that is close to 2/3 for the ratio, which does not tend to zero.

Indeed, any system of weights that is not concentrated in the multiples of 3 will have the same defect, for precisely the same reason. To put that more precisely, suppose that Then with as above, we have while the discrepancy of the sequence is 1. Thus, for the weights to yield a proof of EDP we need

This simple observation can be generalized in many directions. For instance, it is easy to prove in a similar way that we need It is also easy to prove in a similar way that we need A general message from this argument is this: the weights must be strongly concentrated on numbers with many factors. And when one thinks about this, it is not too surprising: the numbers with plenty of factors are precisely the numbers that belong to many HAPs and therefore the numbers for which it should be in some sense difficult to incorporate a large real number into a sequence with low discrepancy (because there are likely to be many constraints on ).

But let us stay for the moment with elementary observations rather than thinking about what sequences there are that are concentrated on numbers with many factors. We know that for every the proportion of the total weight of the on values of such that but is . One way of achieving that is to and to choose a set that contains, for each , exactly one element that is divisible by but not by . One could then let if and only if

If is the set then it is easy to find an appropriate sequence : just let if for even , if for odd , and 0 otherwise. The intersection of any HAP with is either empty or takes the form so this sequence has a discrepancy of 1, while
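Here is a quick Python check of this example, under the assumption that the set in question is the set of powers of 3 and that the sequence takes the value $(-1)^k$ at $3^k$ and 0 elsewhere. The helper `hap_discrepancy` is a hypothetical brute-force routine, not part of the original discussion.

```python
def hap_discrepancy(x):
    """Maximum of |x_d + x_{2d} + ... + x_{md}| over all HAPs inside {1, ..., n}.
    The list x is 0-indexed, so x[0] holds x_1."""
    n = len(x)
    best = 0
    for d in range(1, n + 1):
        partial = 0
        for k in range(d, n + 1, d):
            partial += x[k - 1]
            best = max(best, abs(partial))
    return best

n = 3 ** 6
x = [0] * n
k, power = 0, 1
while power <= n:
    x[power - 1] = 1 if k % 2 == 0 else -1   # +1 at even powers of 3, -1 at odd powers
    k += 1
    power *= 3

disc = hap_discrepancy(x)
support_size = sum(v * v for v in x)
```

A HAP with common difference $d$ meets the powers of 3 only if $d$ is itself a power of 3, and then it picks up a run of consecutive powers, so the alternating signs keep every partial sum in $\{-1, 0, 1\}$.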

However, if consists of an arbitrary number that is equal to 1 mod 3, an arbitrary number that is equal to 3 mod 9, and so on, then many more HAPs can intersect and I have not come up with a method of constructing bounded-discrepancy sequences with a substantial restriction to Let me ask this as a formal question. It doesn’t seem to be trivial, but I cannot believe that it is especially hard either.

**Question.** Let be a subset of such that for every there is no element of that is congruent to mod and at most one element that is congruent to mod Must there be a sequence of bounded discrepancy that takes values everywhere on ?

It is not in fact necessary for the sequence to take values everywhere on in order to show that the characteristic function of is unsuitable. However, since I expect such a sequence to exist, I have stated the question in this stronger form. Probably one can also ask for the sequence to take values or 0 outside

I should briefly explain why I have not asked the more general question of whether it is possible to prove that *no* small set can work (as opposed to a set that is forced to be small by the divisibility conditions above). That is because cardinality on its own is not enough: for example, one could take to be a HAP of length , and then to find a sequence of bounded discrepancy that takes values on would be equivalent to finding a counterexample to EDP. The idea here is to show that for certain sets one can exploit the freedom of a sequence outside to make the discrepancy small.

Nevertheless, there might be a condition of the following kind: perhaps if has the property that it cannot intersect a HAP of length in more than elements, then it cannot serve as a system of weights for EDP. And I don’t have any particular reason to think that is the right function here — perhaps one could prove a similar result for significantly faster-growing functions.

**A candidate for a good system of weights.**

I’d now like to resurrect an idea that I mentioned at some point earlier in the discussion. It’s a suggestion for a sequence of weights that seems to me to have some chance of working. And if it doesn’t work for some obvious reason, then it feels like the kind of idea that could be modified in various ways to get round preliminary objections.

Let me start with two ideas that don’t work. The first is to take for every . We have seen that this doesn’t work — indeed, the proportion of weight it attaches to multiples of 3 is not

The second is a feeble attempt to get round this obvious problem with the first. We want to give extra weight to numbers with more factors, so how about defining to be , the number of factors of ?

To see that this does not work is slightly harder, but only slightly. Again we shall show that this system of weights does not concentrate on multiples of 3. (Of course, we care about multiples of other numbers too, but if it doesn’t work for 3 then we’re already in serious trouble.) To see this, let us first consider how to estimate This we can rewrite as since each contributes 1 to the sum for each multiple of that is at most And this we can approximate by which is roughly (I won’t bother to analyse how good an approximation that is.)

Now let’s consider what happens if we estimate in a similar way. This time, we can approximate the sum as since if is a multiple of 3 we get no contribution, and if is not a multiple of 3, we get a contribution of 1 for each multiple of that is not a multiple of 3, and there are about of those. So the sum works out to be around Now this is 4/9 of the entire sum, so the usual sequence shows that setting does not work.

Nevertheless, we do seem to have achieved *something*: with this choice of we have given less than average weight to non-multiples of 3. The problem is that we have not gone far enough. Now one way of thinking of is that it is . A way that one might think of attaching yet more weight to numbers with plenty of factors would be to define to be This can be rewritten as That is, it counts pairs such that and A fairly simple calculation similar to the one done above shows that this function again attaches too much weight to non-multiples of 3, but it is an improvement on the previous function. The constant function 1 attaches weight 2/3 to the non-multiples of 3, the function attaches weight 4/9, and the function attaches weight 8/27 (or 4/9 of what it should if it were unbiased).
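The three proportions can be explored numerically. The sketch below is an illustration only: the names `dirichlet_with_one` and `share_off_multiples_of_3` are hypothetical, and the cutoff $N=3000$ is arbitrary. It computes the share of the total weight that falls on non-multiples of 3 for the constant function 1, for $1*1$ (the number of divisors) and for $1*1*1$.

```python
def dirichlet_with_one(w, N):
    """(1 * w)(n) = sum of w(k) over the divisors k of n, for 1 <= n <= N."""
    out = [0] * (N + 1)
    for k in range(1, N + 1):
        for m in range(k, N + 1, k):
            out[m] += w[k]
    return out

def share_off_multiples_of_3(w, N):
    """Fraction of the total weight w(1) + ... + w(N) carried by non-multiples of 3."""
    total = sum(w[1:N + 1])
    off = sum(w[n] for n in range(1, N + 1) if n % 3 != 0)
    return off / total

N = 3000
ones = [0] + [1] * N                  # index 0 is unused
d2 = dirichlet_with_one(ones, N)      # 1 * 1: number of divisors
d3 = dirichlet_with_one(d2, N)        # 1 * 1 * 1
shares = [share_off_multiples_of_3(w, N) for w in (ones, d2, d3)]
print(shares)
```

The limiting values 2/3, 4/9 and 8/27 are approached only slowly for the two divisor-type weights (the relative error appears to decay like $1/\log N$), so at this cutoff the computed shares sit noticeably above 4/9 and 8/27, but they do decrease in the expected order.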

This suggests, correctly, that we can obtain the desired weight if we iterate this process an unbounded number of times. This leads to two suggestions for a system of weights. The first is to choose some slow-growing function and take the function (which is defined inductively by the formula Another is to iterate until a fixed point is reached, though for this the definition must be modified somewhat. We obviously cannot find a positive function such that but we can find a function such that Indeed, if we set then the first few values are Let me now just write the sequence: it goes

This sequence can be found on Sloane’s database. (It’s sufficiently natural that I’d be perturbed if it couldn’t.) I haven’t yet understood everything it says there about it, but there appear to be some facts there that might conceivably be of use.

So here is an obvious question: is there some reason that it cannot work in a proof of EDP? Let me say once again what this means. For this function to work in a proof of EDP we would need to be able to prove an inequality that said that for *every* sequence the HAP-discrepancy is always at least So to prove that it does not work it is sufficient to find a sequence (I’ve been assuming it will be real, but even a vector-valued sequence would do) of bounded discrepancy such that for some constant that is independent of

One possible reason for its not working might be that it grows too fast, or perhaps has a subsequence that grows too fast. Let me try to explain what might conceivably go wrong.

To do this, we need a more explicit definition of . It can be defined as follows: is the number of sequences such that and for each we have . Here we include the empty sequence as a sequence. (An equivalent definition would be to ask for sequences that start with 1.) It is easy to check that this function satisfies the relation and the initial condition . And to see how it works in a concrete case, let's take . The possible sequences are (), (2), (3), (4), (6), (2,4), (2,6) and (3,6).
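The two descriptions can be compared directly in a few lines of Python. This is a sketch under the assumption that the relation is $f(n)=\sum_{d\mid n,\ d<n}f(d)$ with $f(1)=1$, which is consistent with the count of 8 for $n=12$; the names `f` and `chains` are hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def f(n):
    """Assumed recursion: f(1) = 1 and f(n) = sum of f(d) over proper divisors d of n."""
    if n == 1:
        return 1
    return sum(f(d) for d in range(1, n) if n % d == 0)

def chains(n):
    """All sequences (a_1, ..., a_k) with 1 < a_1 < ... < a_k < n in which each
    term divides the next and the last term divides n (empty sequence included)."""
    result = []

    def extend(prefix, last):
        result.append(tuple(prefix))
        for a in range(last + 1, n):
            if a % last == 0 and n % a == 0:
                extend(prefix + [a], a)

    extend([], 1)
    return result

first_values = [f(n) for n in range(1, 13)]
print(first_values)        # [1, 1, 1, 2, 1, 3, 1, 4, 2, 3, 1, 8]
print(sorted(chains(12)))
```

For $n=12$ the enumeration returns exactly the eight sequences listed above.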

In general, working out is slightly complicated. If for some prime , then we can take any subset of in increasing order, so If is a product of distinct primes, then for each there is a one-to-one correspondence between the sequences of length that we can take and the set of surjections from to . Indeed, given a surjection , define to be the product of all such that and take the sequence . Conversely, from the sequence we can recover the and hence the function , which has to be a surjection since the sequence is strictly increasing.

Unfortunately, there is no nice formula for the number of surjections from a set of size to a set of size , but we can think of as being a function that behaves fairly like .
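Both special cases are easy to test by machine. The Python sketch below again assumes the recursion $f(1)=1$, $f(n)=\sum_{d\mid n,\ d<n}f(d)$, and counts surjections by the usual inclusion-exclusion sum, which is exact but is an alternating sum rather than a closed form. The names `f` and `surjections` are hypothetical.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def f(n):
    """Assumed recursion: f(1) = 1 and f(n) = sum of f(d) over proper divisors d of n."""
    if n == 1:
        return 1
    return sum(f(d) for d in range(1, n) if n % d == 0)

def surjections(k, m):
    """Number of surjections from a k-element set onto an m-element set."""
    return sum((-1) ** i * comb(m, i) * (m - i) ** k for i in range(m + 1))

# Prime powers: f(2^k) against 2^(k-1).
prime_power_values = [(f(2 ** k), 2 ** (k - 1)) for k in range(1, 11)]

# Products of k distinct primes: f against the total number of surjections
# from a k-element set onto sets of size 1, ..., k.
squarefree_values = [(f(n), sum(surjections(k, m) for m in range(1, k + 1)))
                     for k, n in ((1, 2), (2, 6), (3, 30), (4, 210))]
print(squarefree_values)   # [(1, 1), (3, 3), (13, 13), (75, 75)]
```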

So how big can be if ? The biggest it can be for a prime power is , since the best possible case is . As for products of distinct primes, the best we can do is if we take the first primes. The product of these we can sloppily estimate to be about , which is around , which in turn is around . By contrast, is around , so in this case appears to be much smaller than . However, it is also a lot bigger than 1, so it would appear that the weight attached to a big power of 2 does not by any means dominate the sum.

Given that evidence, I find it not quite clear whether we should be happy with the function as already defined, or whether we should consider weighting it in some way so as to make it roughly the same size throughout the range. At this stage it is perhaps enough to say that we have the option of attempting the latter. (I am mindful of the experimental evidence discovered by Moses and Huy, which suggests that the optimal function has a tail that shows some kind of exponential decay. I don't have any kind of theoretical argument for that.)

**How might one prove the result we want?**

In my previous post on EDP, I suggested various techniques for finding delicate examples. One of them was recursion, which I rather dismissed. However, I am interested in it again now, not least because the function above has a natural recursive definition.

How might that lead to a proof? Well, here’s a suggestion. Suppose we find a non-trivial way of writing the identity matrix as a linear combination , where the and are HAPs and is the rank-1 matrix with xy-entry equal to 1 if and and 0 otherwise. We know that this will not be a useful decomposition of the identity. However, the formula can be rewritten where if and if . It follows that the diagonal matrix that has down the diagonal can be expressed as a sum , where is the diagonal matrix that takes the value at all multiples of apart from itself, and 0 everywhere else. (To see this, note that )

If we have a way of decomposing the identity (for all ) then we can subtract the matrix that’s 1 at (1,1) and 0 everywhere else, to obtain the matrix And then we can use the same method to obtain decompositions of all and finally we can take an appropriate linear combination to obtain the matrix So far, this achieves nothing much, but note that our decompositions involve negative as well as positive coefficients. If we can do it in a way that involves many HAPs with many common differences, then perhaps there is some hope that there will be significant cancellation so that the decomposition of is more efficient than the decompositions of the individual matrices

**Further remarks.**

I’ll start by briefly pointing out that if the above approach is to work, then it will have to use HAPs that are not “full”. (By a full HAP I mean one of the form ) That is because the sequence that is 1 up to and thereafter has bounded discrepancy on all full HAPs. More generally, for a decomposition of a diagonal matrix to work, it must lead to a discrepancy proof for the class of HAPs used in the decomposition, so one should not try to build the matrix out of some restricted class of HAP products unless there is a chance that the theorem is still true for that restricted class.

A second remark is essentially the same as this recent comment of Gil’s. It’s that we already know that the extremal examples — by which I mean long sequences with low discrepancy — have to have some kind of multiplicative structure. This information ought to help us make our guesses. At the time of writing I don’t have a clear idea of *how* it would narrow down the search, so in order to pursue this thought, the first thing to do would be to consider exactly that question: if you have information about what the worst sequences are like, how does that affect the possible matrix decompositions?

**Could the weights be multiplicative?**

Since it’s relevant to that last question, let me mention a system of weights I thought of that doesn’t seem to work, but that might perhaps be salvageable. The idea is to define weights such that for every and

An instant problem with that idea is that it seems to be hard to concentrate the weights on multiples of 3. Indeed, either tends to infinity, or the sum of the first weights tends to zero as a proportion of the sum of the first weights. That is because by multiplicativity.

The second condition essentially means that the sequence should grow faster than any power of . But if we set to be the same for every prime , as seems quite natural, then we find that the sequence grows at most polynomially (since the biggest we can get is at powers of 2, and would be which is a power of ). So we seem to be forced to take to be a “constant” that tends to infinity. But then, if some back-of-envelope calculations are correct (but I’m far from sure about this), the weight of the sequence is too concentrated on large powers of 2. In fact, I think it may even be concentrated at just the single largest power of 2 less than , which is obviously disastrous.

So it looks as though multiplicative sequences are out, but this claim needs careful checking as it could be rubbish. (One way it could be rubbish is if it is possible to salvage something from the idea by having different values at different primes, but at the moment I’m not very convinced by that idea.)

**A rather vague general question.**

The following question is important, even if hard to make precise: how much does the choice of the weights matter?

Let me at least try to pin the question down a little bit. We know that there are certain simple constraints that the have to satisfy, and also that these constraints narrow down the choice quite a bit. However, they do not seem to determine the weights anything like uniquely. It is quite possible that with a bit more thought we could come up with some more constraints. In fact, that is clearly something we should try to do, so let me interrupt the discussion to ask that more formally.

**Question.** Are there some other kinds of constraints that the must satisfy?

In relation to this question, it is worth looking at this comment of Sune’s, and the surrounding discussion.

Even if the answer is yes, it is clear that the constraints will not completely determine the sequence . (Why? Well, given one matrix decomposition that works, one can build other ones out of it in artificial ways. I leave that as an easy exercise.) But do they come close? For example, is there some low-dimensional space such that every system of weights that works is a small perturbation of a sequence in that space?

I don’t really care about that as a mathematical question so much as a strategic question. From the point of view of solving EDP, can we afford to be reasonably relaxed about the choice of the and focus our attention on the decomposition, or will we find that the desired decomposition does not exist unless we choose the extremely carefully?

**A summary of what I have said above.**

I hope very much that it will be possible to revive the EDP discussion, even if the pace is slower than before. It seems to me that an excellent topic to concentrate on for a while would be the question of how to find a system of weights such that the square of the HAP-discrepancy of every sequence is always at least Amongst the subquestions of this question are whether the sequence I suggested above has a chance of working, “how large” the set of sequences that could work is, and whether there are constraints of a completely different kind from the ones considered so far.

I have considered real sequences, but if a sequence of weights is to work, we also need the inequality to generalize to vector-valued sequences. That is, we need the square of the discrepancy of to be at least A further question of interest is whether generalizing to vectors introduces interesting constraints that are not felt if one just looks at scalar sequences.

The difficulty we face seems to be this. (I do not have full confidence in what I am about to write. It could be that I will change my mind, and it could be that others already have a different and better analysis.) I’ll take for the purposes of illustration the problem of writing a diagonal matrix with unbounded trace as a sum $\sum_i\lambda_iP_i\otimes Q_i$, where the $P_i$ and $Q_i$ are HAPs and $\sum_i|\lambda_i|\leq 1$. After trying a few things, I have started to feel as though it isn’t going to be easy to find a completely explicit construction for this. That is partly because no pattern has jumped out at me from the output from Moses’s experiments, though I have not looked as hard as I might. So the best I can think of to do at the moment is something like this (but almost certainly not precisely this).

1. By some kind of wild guess, choose some HAPs and some positive constants .

2. Form the matrix , which has trace .

3. Make those choices in such a way that is significantly larger than .

4. Prove that it is possible to cancel out the off-diagonal part of this matrix and at most half the diagonal part by subtracting a sum of the form , with also significantly smaller than .

The problem with this is twofold. I don’t know how to make the wild guess in the first place (though I suppose we should be trying to favour numbers with many factors) and I don’t see how to do the subtracting-off part. In other words, I don’t see how to make *any* of it work. And we also know that nothing *too* neat can work, or we’d end up proving things that we know to be false.

Now if we had a good idea about how to produce the matrix that we then wanted to make diagonal, then we could go ahead and think about how to make it diagonal, and if we had a good idea about what the subtracting-off procedure was, then perhaps we could come up with a matrix that was particularly well suited to that procedure. But to have to guess both at once is rather more problematic.

On the plus side, we have not been thinking about it for very long and have not tried all that many approaches. So I still feel optimistic about this method of proof.

Let me briefly mention two open problems that have come up recently and are related to EDP.

**Problem 1.** Does there exist a matrix with 1s down the diagonal such that for every set of the form we have ?

We can also write as . So this is another discrepancy problem. Moses, who came up with this problem, has done some computer experiments that suggest that such a matrix may exist. If it does, then it proves that there exist sequences of vectors and such that for every but for no set of the above form is it the case that , which would disprove a strengthening of EDP.

**Problem 2.** For which sets of HAPs is EDP true? Specifically, if you have a sequence of length for sufficiently large , must there be a HAP with common difference a prime or a power of 2 on which the discrepancy is at least ? And what about a HAP with common difference 1 or else a HAP of the form ?

In both cases, these are minimal reasonably natural sets of HAPs that rule out certain counterexamples. In the first case, the powers of 2 are there to stop the counterexample 1,1,-1,-1,1,1,-1,-1,… and others like them. In the second case, allowing all lengths for HAPs of common difference 1 is to stop you taking the first terms to be 1 and the rest to be -1.
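As a quick sanity check of the first counterexample, here is a minimal sketch that measures the discrepancy of the period-4 sequence 1,1,-1,-1,… along HAPs with a given common difference. The helper name `hap_discrepancy` and the cutoff `N` are mine, chosen only for illustration.

```python
def hap_discrepancy(x, d):
    """Largest |x_d + x_{2d} + ... + x_{kd}| over all k with kd <= len(x).

    x is a 1-indexed sequence stored in a 0-indexed list, so x_n is x[n - 1].
    """
    best = 0
    partial = 0
    for n in range(d, len(x) + 1, d):
        partial += x[n - 1]
        best = max(best, abs(partial))
    return best


# The period-4 sequence 1, 1, -1, -1, 1, 1, -1, -1, ...
N = 4000  # illustrative length
x = [1 if n % 4 in (1, 2) else -1 for n in range(1, N + 1)]

# Along odd prime common differences the partial sums stay small.
bounded = {d: hap_discrepancy(x, d) for d in (3, 5, 7, 11, 13)}

# Along common difference 4 every term x_{4k} equals -1, so the sum grows linearly.
unbounded = hap_discrepancy(x, 4)

print(bounded)
print(unbounded)
```

So the prime differences alone never see this sequence, and it is the power of 2 (here 4) that catches it.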

Also, a “morally necessary” condition if you want unbounded discrepancy is that the HAPs should form a spanning set. If they don’t, then you can find a sequence (though not necessarily a $\pm 1$ sequence) that has *zero* discrepancy on any of them. Even if that doesn’t quite disprove EDP for that collection of HAPs, it certainly rules out the kinds of methods we are trying to use to prove it, and it also rules out vector-valued strengthenings of EDP.

To finish, let me mention that Klas Markstrom seems to be getting close to the first rigorous (and very computer-assisted) proof that there is no infinite sequence of discrepancy 2. See this sequence of comments to get an idea of where he has got to. I think we can forgive Mathias for the admission in his paper that “Repeated efforts to improve the proposition to the case have failed.”

**Lewis’s theorem**

Before I explain the new result, I’d like to discuss a result from the geometry of Banach spaces. (I don’t need to do this, but it is a very nice result and this is a good excuse to write about it. If you just skim this section and take note of the definition of “well spread about” below, that will be enough to follow the rest of the post.) The result is a special case of an important theorem of Dan Lewis, though the proof that I shall sketch is a little different from Lewis’s.

Let be a symmetric convex body in . The *minimal volume ellipsoid* of is, as its name suggests, the ellipsoid of minimal volume that contains . Now if is this ellipsoid, then obviously the boundary of will touch the boundary of , or else we could just shrink by a small amount and have a new ellipsoid with smaller volume that still contains . (For what it’s worth, this argument relies on compactness, which guarantees that if the boundaries of and are disjoint, then there is a positive distance between them.)

However, a moment’s thought suggests that we ought to be able to say more. Suppose, for instance, that is a sphere and that its boundary touches the boundary of only at two antipodal points. It feels as though we ought to be able to expand in the subspace orthogonal to those points and get a larger affine copy of inside . Note that the problem of finding the minimal volume ellipsoid that contains is equivalent to the problem of finding the maximum volume linear copy of contained in the unit sphere, where by “linear copy” I mean the image of under a linear map.

More generally, it ought to be the case that if we take the maximum volume linear copy of inside the unit sphere , then the *contact points* (that is, the points where the boundary of touches the boundary of ) should be “well spread about”. The intuition would be that if the contact points are not well spread about, then we could find some big area of the sphere that is nowhere in contact with and expand into that area (perhaps very slightly contracting in other places in such a way as to increase the volume in total).

To turn this intuition into a formal proof, we need to start with a formal definition. When do we count some unit vectors $u_1,\dots,u_m$ (the contact points) as being “well spread about”? There turns out to be a very nice answer to this: define the unit vectors $u_1,\dots,u_m$ to be well spread about if there are positive scalars $c_1,\dots,c_m$ such that the identity matrix $I_n$ equals $\sum_ic_iu_i\otimes u_i$. Here, I write $u\otimes v$ for the matrix with $jk$-entry $u(j)v(k)$. One could alternatively think of the $u_i$ as column vectors and write $\sum_ic_iu_iu_i^T$ for the sum instead. Either way, if $x$ is any vector, then $(u\otimes u)x=\langle x,u\rangle u$. Another small remark is that saying that $I_n$ can be written as $\sum_ic_iu_i\otimes u_i$ with positive $c_i$ is equivalent to saying that $n^{-1}I_n$ belongs to the convex hull of the $u_i\otimes u_i$, since the trace of $u_i\otimes u_i$ is $1$, and this implies that the $c_i$ have to add up to $n$. A more geometrical way of expressing this condition is as follows. A well-known property of orthonormal bases $e_1,\dots,e_n$ is that each vector is equal to the sum of its projections on to the 1-dimensional subspaces generated by the basis vectors: that is, $x=\sum_i\langle x,e_i\rangle e_i$ for every $x$. The unit vectors $u_i$ are well spread about if we can find weights such that every $x$ is equal to the *weighted* sum of the “coordinate projections”: that is, we need positive constants $c_i$ such that every $x$ is equal to $\sum_ic_i\langle x,u_i\rangle u_i$.
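To make the definition concrete, here is a small sketch of the simplest example I can think of that is not an orthonormal basis: three unit vectors in the plane at angles of 120 degrees to each other, each with weight 2/3. The choice of vectors and weights is mine, purely for illustration; the code just checks that the weighted sum of the rank-1 projections is the 2-by-2 identity.

```python
import math


def outer(u, v):
    """The matrix with jk-entry u[j] * v[k]."""
    return [[uj * vk for vk in v] for uj in u]


def weighted_sum_of_projections(weights, vectors):
    """Return sum_i c_i * (u_i tensor u_i) as a list of rows."""
    n = len(vectors[0])
    total = [[0.0] * n for _ in range(n)]
    for c, u in zip(weights, vectors):
        m = outer(u, u)
        for j in range(n):
            for k in range(n):
                total[j][k] += c * m[j][k]
    return total


# Three unit vectors in the plane, 120 degrees apart (an illustrative choice).
angles = [math.pi / 2 + 2 * math.pi * i / 3 for i in range(3)]
vectors = [[math.cos(t), math.sin(t)] for t in angles]
weights = [2 / 3] * 3  # the weights add up to the dimension, 2

M = weighted_sum_of_projections(weights, vectors)
for row in M:
    print([round(entry, 12) for entry in row])
```

Note that the weights add up to 2, the dimension, as the trace argument says they must.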

Lewis’s theorem is the statement that the contact points of and the maximal volume copy of inside are well spread about in this sense. (Or rather, Lewis’s theorem is a more general statement of which this is a special case.) Before I sketch the proof, let me point out that it implies Fritz John’s famous result that for any symmetric convex body there is an ellipsoid such that . To show this, let be the Euclidean norm and let be the norm with unit ball . Since we know that for every . If we can write as , where each is a contact point, then since each is a unit vector in both norms, we know that . By Cauchy-Schwarz, the square of this is at most the product of and . But the latter expression is equal to . Therefore, , so , which is equivalent to saying that . And this obviously implies John’s theorem.

And now for the proof of the assertion about the contact points. Let us suppose that is *not* in the convex hull of the matrices with a contact point. Then by the Hahn-Banach separation theorem there exists a linear functional that separates from all such matrices. We can phrase this in terms of the matrix inner product , which is the trace of . Then the linear functional can be thought of as a matrix with the property that , which is times the trace of , is 1, whereas there is some such that for every contact point . Now is equal to .

Now if is any matrix, then the determinant of is . It follows that if the trace of is , then the determinant of is .
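The determinant fact used here is the standard first-order expansion: for a fixed matrix and a small parameter, the determinant of the identity plus the small parameter times the matrix is 1 plus the parameter times the trace, up to a second-order error. The sketch below checks this numerically; the matrix `A` and the names `det` and `identity_plus` are hypothetical and chosen only for the illustration.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def identity_plus(delta, a):
    """Return the matrix I + delta * a."""
    n = len(a)
    return [[(1.0 if i == j else 0.0) + delta * a[i][j] for j in range(n)]
            for i in range(n)]


# An arbitrary 3x3 matrix, chosen only for illustration.
A = [[2.0, -1.0, 0.5],
     [0.3, -1.0, 4.0],
     [1.5, 2.0, 3.0]]
trace_A = sum(A[i][i] for i in range(3))

for delta in (1e-2, 1e-3, 1e-4):
    exact = det(identity_plus(delta, A))
    first_order = 1.0 + delta * trace_A
    # The gap shrinks roughly like delta squared.
    print(delta, exact - first_order)
```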

Next, we use the fact that for every contact point . If is another unit vector, then

which is at most If , then this is at most . Thus, there is a positive constant , independent of , such that if is a contact point and then . From this it follows that

By compactness, there is some positive distance between the inner symmetric convex body and the set of all points on the surface of the sphere that are not within of a contact point. Therefore, if we perturb by a sufficiently small amount, we will not cause any point not within of a contact point to go inside . But the volume of is equal to the volume of up to and every point within of a contact point maps to a point of norm at most . Therefore, for sufficiently small , the body has volume greater than that of but still lies inside the unit sphere. The proof is complete.

**What has this to do with EDP?**

The Erdős discrepancy problem asks us to prove that if is any sequence of length , then there is some HAP with characteristic function such that . Now one way one might imagine trying to show this would be by showing that these functions are so spread about that *every* sequence of norm (that is, every sequence such that ) has inner product at least with one of them. And that one could prove if one had a representation with , since

Actually, there is an obvious problem with this idea, which is that if all the are positive, then unless all the are singletons then there will be positive elements off the diagonal and therefore no hope of representing the identity. However, one can rescue the situation by considering slightly more general vectors such as convex combinations of the set of all s and their negatives. If is such a combination and , then must have discrepancy at least in one of the HAPs that makes up . And the advantage of these vectors is that they are allowed to have negative coordinates.

I had this thought a few years ago when attempting to solve EDP and dismissed it once I spotted that the troublesome 1,-1,0,1,-1,0,… example kills the idea. It simply isn’t true that if $x$ is a sequence of length $n$ and $\|x\|_2=\sqrt{n}$, then $x$ has unbounded discrepancy in some HAP, as the troublesome example (multiplied by $\sqrt{3/2}$) demonstrates.

However, an important lesson from the SDP approach is that this is a less serious problem than it at first appears, because we are not forced to use the norm as a lower bound for discrepancy. Instead, we can use a *weighted* norm. In the context of this proof, this suggestion becomes the idea that instead of trying to represent the identity, we could go for a diagonal matrix with large trace instead.

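To spell this out, let us define a *test vector* to be any vector of the form $\sum_i\lambda_iP_i$, where $\sum_i|\lambda_i|\leq 1$ and the $P_i$ are characteristic functions of HAPs. Suppose that $D$ is a diagonal matrix with entries $b_1,\dots,b_n$ down the diagonal, and that we can write it as a convex combination $\sum_j\mu_jv_j\otimes v_j$, where the $v_j$ are test vectors. Then if $x$ is any $\pm 1$ sequence, we know that $\langle x,Dx\rangle=\sum_kb_kx_k^2=\sum_kb_k$. But from the representation of $D$ we also know that

$$\langle x,Dx\rangle=\sum_j\mu_j\langle x,v_j\rangle^2.$$

It follows that there exists $j$ such that $\langle x,v_j\rangle^2\geq\sum_kb_k$, and hence, since $v_j$ is a test vector, that the discrepancy of $x$ in some HAP is at least $(\sum_kb_k)^{1/2}$ as well.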

Thus, we can prove EDP if we can find a convex combination , where the are test vectors, that equals a diagonal matrix with unbounded trace.

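Moses Charikar points out a modification of this requirement that is somewhat simpler. If $v=\sum_i\lambda_iP_i$ with $\sum_i|\lambda_i|\leq 1$, then $v\otimes v=\sum_{i,j}\lambda_i\lambda_jP_i\otimes P_j$, which is a convex combination of matrices of the form $\pm P\otimes Q$, where both $P$ and $Q$ are characteristic functions of HAPs. Conversely, if we write a diagonal matrix $D$ as $\sum_i\lambda_iP_i\otimes Q_i$, where the $P_i$ and $Q_i$ are HAPs and $\sum_i|\lambda_i|\leq 1$, then, for a $\pm 1$ sequence $x$,

$$\mathrm{tr}(D)=\langle x,Dx\rangle=\sum_i\lambda_i\langle x,P_i\rangle\langle x,Q_i\rangle\leq\max_i|\langle x,P_i\rangle||\langle x,Q_i\rangle|.$$

From this it follows, as before, that if $x$ is a $\pm 1$ sequence then there is some HAP on which the discrepancy of $x$ is at least $\mathrm{tr}(D)^{1/2}$.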

**The non-symmetric vector-valued EDP.**

Why should we believe that good representations of diagonal matrices exist? The best answer I can give to this is that it turns out to be equivalent to a strengthening of EDP that has (I think) no obvious counterexample. Let me give the strengthening and then prove the equivalence.

We have already met the vector-valued EDP: given any sequence of unit vectors in a Euclidean space and any constant there exists some HAP such that has norm at least . The problem that we shall now take is the “non-symmetric” vector-valued EDP: given any *two* sequences and of vectors in a Euclidean space satisfying the condition for every , there are HAPs and such that . If we insist that , then this reduces to the usual vector-valued EDP (for ).

The reason this looks as though it is probably true (or rather, that it and EDP are “equi-true”) is that if you try to make one of the sequences have small discrepancy by making it small on some HAP, then you have to make the other one large on that HAP. (This argument applies rather better to the real-valued case.) Also, the condition means that the directional bias of the is somehow similar to that of the , which helps the discrepancies of the two sequences to align with each other.

These are fairly feeble arguments, which reflects the fact that just at the moment I very much want to believe this statement and have not made a big effort to disprove it.

How about the equivalence? This again follows from the Hahn-Banach theorem. Let’s suppose that we cannot represent any diagonal matrix with trace greater than as a convex combination of s. Then the Hahn-Banach theorem implies that there is a separating functional, in this case a matrix such that whenever is a diagonal matrix with trace at least , and whenever and are characteristic functions of HAPs. Moreover, this is an equivalence: if such a matrix exists, then we trivially cannot represent as a convex combination of s.

The first condition (that whenever is a diagonal matrix with trace at least ) is easily seen to fail if is not constant on the diagonal, and if it is constant then we see that its value on the diagonal must be at least .

If such a matrix exists, then choose vectors and such that for every and . (For instance, could be the rows of and could be the standard basis of .) Then Therefore, the condition that is always at most 1 is telling us that is always at most 1. If we now rescale by multiplying the by then we have a counterexample to the non-symmetric vector-valued EDP.

Conversely, if we have such a counterexample, then we can set and we have a separating functional that proves that no diagonal matrix with trace greater than can be expressed as a convex combination of s.

**What are the difficulties in carrying out this programme?**

The main problem we now face is this. If we want to write a matrix with unbounded trace as a convex combination of $P\otimes Q$s, then we need to use fairly large HAPs. Indeed, the trace of $P\otimes Q$ is $|P\cap Q|$ (if I may mix up sets and characteristic functions), so if our convex combination is $\sum_i\lambda_iP_i\otimes Q_i$ then we need $\sum_i\lambda_i|P_i\cap Q_i|$ to be unbounded. This means more than merely the statement that the weighted average size of $P_i\cap Q_i$ is unbounded, since some of the $\lambda_i$ are of necessity negative.

In particular, we need the sizes of the $P_i$ and $Q_i$ to be unbounded. But if that is the case, then we are trying to write a diagonal matrix with a large trace as a convex combination of matrices that are very far from diagonal, which forces us to rely on clever cancellations.

Having said that, I would also like to point out that the task is by no means hopeless. For example, if $n=2^k$ and $w_1,\dots,w_n$ are the Walsh functions (with respect to some bijection between $\{1,\dots,n\}$ and $\{0,1\}^k$), then $n^{-1}\sum_iw_i\otimes w_i$ equals the identity, even though each individual $w_i\otimes w_i$ is maximally far from being diagonal. Thus, there is nothing in principle to stop cancellations of the kind we want occurring. It is just a technical challenge to find them in the particular case of interest.
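The Walsh example is easy to verify by machine. The sketch below assumes the usual indexing of Walsh functions by bit masks, with the value at a point equal to minus one to the power of the number of common bits, and assumes the normalization in which the outer products are averaged (that is, divided by their number). Both the indexing and the function names are my choices for the illustration.

```python
def walsh_functions(k):
    """All 2**k Walsh functions on {0, ..., 2**k - 1}, indexed by bit masks s.

    The function indexed by s takes the value (-1)**popcount(s & x) at x.
    """
    n = 1 << k
    return [[(-1) ** bin(s & x).count("1") for x in range(n)] for s in range(n)]


def average_outer_product(vectors):
    """Average of the rank-1 matrices w tensor w over the given vectors."""
    n = len(vectors[0])
    total = [[0] * n for _ in range(n)]
    for w in vectors:
        for i in range(n):
            for j in range(n):
                total[i][j] += w[i] * w[j]
    return [[entry / len(vectors) for entry in row] for row in total]


k = 3
W = walsh_functions(k)
M = average_outer_product(W)

for row in M:
    print(row)
```

Every entry of every individual outer product is 1 or -1, so none of them looks remotely diagonal, yet all the off-diagonal entries cancel exactly in the average.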

Note that if we are looking for clever cancellations, then we are more likely to be able to find them if we deal as much as possible in points that belong to several HAPs. This suggests, and the suggestion is borne out by the experimental evidence we have so far, that the diagonal matrix we produce will probably be bigger at points with many factors. But one of the nice aspects of this approach is that one can concentrate on getting rid of the off-diagonal parts and not worry too much about which exact diagonal matrix results in the end. Making sure that its trace is big is a far easier task than guessing what it should be and then trying to produce it.

I will end this post by reiterating what I said after Moses first suggested using SDP: it seems to me that we have now “softened up” EDP. That is, it no longer feels like a super-hard problem with major obstacles in the way of a solution. Obviously I can’t be sure, but I feel optimistic that we are going to get there in the end.

So far, the data has had some expected features (such as the sequence values tending to be larger when is smooth) and some puzzling ones (such as the fact that is much much larger than ). An initial hope for the method was that the experimental data would give rise to a very precise conjecture, but so far that has not happened. There are, however, various avenues that have not been fully explored, and I still have some hope that we will suddenly stumble on some data that we can fully understand.

One way of doing this is to introduce a smoothing. One idea I had turned out to go too far: in the search for a pretty formula I ended up with a version of EDP that was false. For the record, here is that version. One can think of the discrepancy of a sequence as its largest inner product with the characteristic function of any HAP. Those characteristic functions have sharp cutoffs (in that they go up to for some and then suddenly stop), so one might hope to make the problem nicer by smoothing the cutoffs. One natural way of doing this is to look at functions such as , which take the value at and 0 at non-multiples of . For this to be a sensible idea, we need a “smoothed EDP” to be true. That is, we need it to be the case that for every sequence there exist and such that has absolute value at least . However, this is false for the character-like function . The multiplicativity of implies that it will be enough to show this when , so let us fix some and calculate the sum .

We do this in the usual way. First, let us look at non-multiples of 3. That gives us the sum

which equals . More generally, the sum over multiples of that are not multiples of works out as . One can check that the function is increasing in the interval , so if we sum this over all then the alternating series test tells us that the total is at most 1/3, whatever the value of .

A similar argument works if we use a weight of instead, but the calculations are easier. By the alternating series test, we know that is positive and at most 1. Call this value . Then , which is at most .
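Here is a numerical sanity check of the first of these calculations, under assumptions that I should state clearly because the formulas are not reproduced here. I take the character-like function to be the completely multiplicative function that is 1 at numbers congruent to 1 mod 3, -1 at numbers congruent to 2 mod 3, and -1 at 3 (called `mu3` in the code), and I take the smooth weight at the integer n to be a geometric one, `rho**n` with `rho` strictly between 0 and 1. Under those assumptions the alternating series argument gives a total between 0 and 1/3 for every `rho`.

```python
def mu3(n):
    """Completely multiplicative: +1 at n = 1 mod 3, -1 at n = 2 mod 3, and mu3(3) = -1."""
    sign = 1
    while n % 3 == 0:
        n //= 3
        sign = -sign
    return sign if n % 3 == 1 else -sign


def smoothed_sum(rho, terms):
    """Partial sum of mu3(n) * rho**n for n = 1, ..., terms."""
    total = 0.0
    power = 1.0
    for n in range(1, terms + 1):
        power *= rho
        total += mu3(n) * power
    return total


# 60000 terms is enough for the tail to be negligible even when rho = 0.999.
for rho in (0.5, 0.9, 0.99, 0.999):
    print(rho, smoothed_sum(rho, 60000))
```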

At one point I observed that if you have a sequence of bounded discrepancy then the Dirichlet series is uniformly bounded for all positive real , and wondered if we could deduce anything useful from this. The fact that has this property shows that we certainly cannot derive a contradiction, though it leaves open the possibility of deducing that the sequence is forced to have character-like behaviour.

In the light of these observations, what hope is there for obtaining nicer formulae by smoothing? The answer is that even if we cannot smooth the HAPs, we can at least smooth the interval of integers we are thinking about as a whole. That is, instead of thinking of discrepancy as a function of (the length of the sequence ), we could take an infinite sequence , associate with a weight , and think of the discrepancy as a function of the decay rate (or equivalently the half-life, which is proportional to ). That is, we would like to prove that for each there must exist such that has absolute value at least , where as .

If we try to prove this using Moses’s SDP approach, then, as Moses points out, there is a nice formulation of the problem. We would like to find non-negative coefficients that sum to 1, and coefficients such that , such that

for every sequence . This is similar to what we were looking at before, but now we have attached some slowly decaying weights to the .

Note that we could think of the previous version of the problem as doing exactly the same, but with weights that equal 1 up to and thereafter. I am hoping that with these smoother weights, we will get nicer numbers coming out.

I would like to end this post by drawing attention to Gil’s polynomial method. This is a completely different approach to EDP, where one constructs a polynomial over a finite field that is identically zero if and only if EDP is true. Rather than explain the idea in detail, let me link to two comments in which Gil himself gives a nice explanation. He introduced the idea in this comment and an interesting variant of the idea in this one. It would be good to add this method too to the list of possible proof techniques on the wiki.

Since I posted EDP9, there has been a development that has radically changed my perception of the problem, and I imagine that of anyone else who is following closely what is going on. It began with this comment of Moses Charikar.

Moses’s idea, which I shall partially explain in a moment (for about the fifth time) is based on the theory of semi-definite programming. The reason I find it so promising is that it offers a way round the difficulty that the sequence 1, -1, 0, 1, -1, 0, 1, -1, 0, … has bounded discrepancy. Recall that this fact, though extremely obvious, is also a serious problem when one is trying to prove EDP, since it rules out any approach that is just based on the size of the sequence (as measured by, say, the average of the squares of its terms). It seemed to be forcing us to classify sequences into ones that had some kind of periodicity and ones that did not, and treat the two cases differently. I do not rule out that such an approach might exist, but it looks likely to be hard.
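For anyone who wants to see the obstacle in numbers, the following sketch computes the HAP discrepancy of the sequence 1, -1, 0, 1, -1, 0, … over every common difference and every length up to a cutoff, and compares it with the average of the squares of the terms. The function name and the cutoff `N` are mine.

```python
def max_hap_discrepancy(x):
    """Maximum of |x_d + x_{2d} + ... + x_{kd}| over all d and k with kd <= len(x)."""
    best = 0
    for d in range(1, len(x) + 1):
        partial = 0
        for n in range(d, len(x) + 1, d):
            partial += x[n - 1]  # x is 1-indexed in the text, 0-indexed in the list
            best = max(best, abs(partial))
    return best


N = 3000  # illustrative cutoff
pattern = (1, -1, 0)
x = [pattern[(n - 1) % 3] for n in range(1, N + 1)]

discrepancy = max_hap_discrepancy(x)
mean_square = sum(t * t for t in x) / N

print(discrepancy)  # stays at 1 however large N is
print(mean_square)  # two thirds of the terms are non-zero
```

So the sequence is large in the mean-square sense but has discrepancy 1, which is exactly why no argument based on size alone can work.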

Moses proposes (if you’ll excuse the accidental rhyme) the following method of proving that every $\pm 1$ sequence has unbounded discrepancy. I’ll state it in infinitary terms, but one can give finitary versions too. Suppose you can find non-negative coefficients $c_{k,d}$ (one for each pair of natural numbers $k,d$) that sum to 1, and non-negative coefficients $b_i$ summing to infinity, such that the quadratic form

$$\sum_{k,d}c_{k,d}(x_d+x_{2d}+\dots+x_{kd})^2-\sum_ib_ix_i^2$$

is positive semi-definite. Then you are done. Why? Because if $(x_n)$ were a sequence of $\pm 1$s with discrepancy at most $C$, then the first term in the above sum would be at most $C^2$, while the second would be $-\sum_ib_i=-\infty$, which contradicts the positivity in a rather big way.

Why does this deal with the troublesome sequences? Because it is perfectly possible (and necessary, if this method is to work) for the sum of the $b_i$ over all $i$ that are not multiples of 3 to be finite. So this method, unlike many previous proof ideas, would not accidentally be trying to prove something false.
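A toy finite version makes this point visible. In the sketch below I write `c[(k, d)]` for the coefficient attached to the HAP of length k and common difference d, and `b[i - 1]` for the coefficient attached to the i-th term; the uniform choice of `c` and the two choices of `b` are hypothetical and made only for the illustration. The code evaluates the quadratic form at the troublesome sequence. It does not test positive semi-definiteness, which would require all sequences; it only shows which choices of `b` this one sequence rules out.

```python
def hap_sum(x, k, d):
    """x_d + x_{2d} + ... + x_{kd}, with x stored 0-indexed."""
    return sum(x[j * d - 1] for j in range(1, k + 1))


def form_value(x, c, b):
    """Weighted sum of squared HAP sums minus the weighted sum of squares of the terms."""
    first = sum(w * hap_sum(x, k, d) ** 2 for (k, d), w in c.items())
    second = sum(bi * xi * xi for bi, xi in zip(b, x))
    return first - second


N = 30
pattern = (1, -1, 0)
x = [pattern[(n - 1) % 3] for n in range(1, N + 1)]

# Hypothetical choice: uniform weights on all HAPs that fit inside {1, ..., N}.
pairs = [(k, d) for d in range(1, N + 1) for k in range(1, N // d + 1)]
c = {pair: 1 / len(pairs) for pair in pairs}

b_on_multiples_of_3 = [1.0 if n % 3 == 0 else 0.0 for n in range(1, N + 1)]
b_everywhere = [1.0] * N

value_multiples = form_value(x, c, b_on_multiples_of_3)
value_everywhere = form_value(x, c, b_everywhere)

print(value_multiples)   # non-negative: the sequence vanishes where b is supported
print(value_everywhere)  # negative: too much weight on non-multiples of 3
```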

Note that to prove that the quadratic form is positive semi-definite, it is sufficient to write it as a sum of squares. So EDP is reduced to an existence problem: can we find appropriate coefficients and a way of writing the resulting form as a sum of squares?

Now this idea, though very nice, would not be much use if there were absolutely no hope of finding such an identity. But there is a very clear programme for finding one, which Moses and Huy Nguyen have started. The idea is to begin by using semidefinite programming to find the optimal set of coefficients for large (that is, for a finite truncation of the infinite problem), which can be done on a computer, and which they have already done (see the comments following the one I linked to above for more details). Next, one stares very hard at the data and tries to guess a pattern. It is not necessary to use the very best possible set of coefficients, so at this point there may be a trade-off between how good the coefficients are and how easy they are to analyse. (This flexibility is another very nice aspect of the idea.) However, looking at very good sets of coefficients is likely to give one some idea about which choices have a chance of working and which don’t. Having made a choice, one then tries to prove the positive semidefiniteness.

As Moses points out, if such coefficients can be found, then they automatically solve the vector-valued problem as well, since we can look at the expression

$$\sum_{k,d}c_{k,d}\|v_d+v_{2d}+\dots+v_{kd}\|^2-\sum_ib_i\|v_i\|^2$$

instead, and the positivity will carry over. As he also points out, if you modify our low-discrepancy multiplicative examples such as $\mu_3$ by multiplying $\mu_3(n)$ by the unit vector $e_k$, where $3^k$ is the largest power of 3 that divides $n$, then you get a sequence of discrepancy that grows like $\sqrt{\log n}$, which shows that this method cannot hope to do better than a $\sqrt{\log n}$ bound. But I’d settle for that!
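This vector-valued example can be checked directly. The sketch assumes that the multiplicative example in question is the completely multiplicative function that is 1 at numbers congruent to 1 mod 3, -1 at numbers congruent to 2 mod 3, and -1 at 3; that identification, and all the names in the code, are my assumptions. The n-th vector is that sign times the k-th standard basis vector, where 3 to the power k is the largest power of 3 dividing n, so a partial sum along a HAP can be stored as one integer per basis vector.

```python
def sign_and_level(n):
    """Return (sign, k): the value of the multiplicative function at n, and the
    exponent k of the largest power of 3 dividing n."""
    k = 0
    while n % 3 == 0:
        n //= 3
        k += 1
    sign = 1 if n % 3 == 1 else -1
    return (sign if k % 2 == 0 else -sign), k


def max_squared_discrepancy(N):
    """Largest squared Euclidean norm of a HAP partial sum of the vector sequence."""
    levels = 1
    while 3 ** levels <= N:
        levels += 1  # coordinates 0, ..., floor(log_3 N)
    best = 0
    for d in range(1, N + 1):
        partial = [0] * levels
        for n in range(d, N + 1, d):
            sign, k = sign_and_level(n)
            partial[k] += sign
            best = max(best, sum(t * t for t in partial))
    return best


print(max_squared_discrepancy(2000))  # at most floor(log_3 2000) + 1 = 7
```

Each coordinate of a partial sum is always 0, 1 or -1, so the squared norm is at most the number of coordinates in play, which is logarithmic in the length.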

Finally, I want to draw attention to another comment of Moses, in which he introduces a further idea for getting a handle on the problem. I won’t explain in detail what the idea is because I haven’t fully digested it myself. However, it gives rise to some Taylor coefficients that take values that are all of the form for some integer . It is clear that they have a great deal of structure, but we have not yet got to the bottom of what that structure is. If we do, then it may lead to a concrete proposal for a matrix of coefficients that should be a good one.

My optimism may fade in due course, but at the time of writing it feels as though these new ideas have changed the problem from one that felt very hard into one that feels approachable.

Another question that has been investigated, mostly by Sune, is the question about what happens if one takes another structure (consisting of “pseudointegers”) for which EDP makes sense. The motivation for this is either to find a more general statement that seems to be true or to find a more general statement that seems to be false. In the first case, one would see that certain features of $\mathbb{N}$ were not crucial to the problem, which would decrease the size of the “proof space” in which one was searching (since now one would try to find proofs that did not use these incidental features of $\mathbb{N}$). In the second case, one would see that certain features of $\mathbb{N}$ *were* crucial to the problem (since without them the answer would be negative), which would again decrease the size of the proof space. Perhaps the least satisfactory outcome of these investigations would be an example of a system that was very similar to $\mathbb{N}$ where it was possible to prove EDP. For example, perhaps one could find a system of real numbers that was closed under multiplication and had a counting function very similar to that of $\mathbb{N}$, but that was very far from closed under addition. That might mean that there were no troublesome additive examples, and one might even be able to prove a more general result (that applied, e.g., to -valued functions). This would be interesting, but the proof, if it worked, would be succeeding by getting rid of the difficulties rather than dealing with them. However, even this would have some bearing on EDP itself, I think, as it would be a strong indication that it was indeed necessary to prove EDP by showing that counterexamples had to have certain properties (such as additive periodicity) and then pressing on from there to a contradiction.

A question I have become interested in is understanding the behaviour of the quadratic form with matrix . The derivation of this matrix (as something to be interested in in connection with EDP) starts with this comment and is completed in this comment. I wondered what the positive eigenvector would look like, and Ian Martin obliged with some very nice plots of it. Here is a link to where these plots start. It seems to be a function with a number-theoretic formula (that is, with a value at that strongly depends on the prime factorization of — as one would of course expect), but we have not yet managed to guess what that formula is.

I now want to try to understand this quadratic form in Fourier space. That is, for any pair of real numbers I want to calculate , and I would then like to try to understand the shape of the kernel . Now looking back at this comment, one can see that

Since the bilinear form is determined by the quadratic form