Has DeepMind mechanized mathematical intuition?
What is human-level mathematics? Part 2: correlation and gradient saliency
The awareness that they are about to make the continuum of history explode is characteristic of the revolutionary classes at the moment of their action.
(Walter Benjamin, Theses on the Philosophy of History, XV)
The continuum of history and the continuum of personal time unfold in a common universe, but their structures are so distinct that the single word “time” seems inadequate to account for both.1 Nothing exploded when, for two hours of my personal time on the morning of December 2, 2021, I put off brushing my teeth in the agitation of reading the articles that appeared simultaneously in the press to announce the news that, as one headline put it, “Maths researchers hail breakthrough in applications of artificial intelligence.” The headlines themselves did announce an explosion in the continuum of history, but that’s what headlines are meant to do; this is one way to delineate the “awareness” of the media.
Part I of this essay ended in 1892, with Klara 1.0 doing her best to answer Poincaré’s questions as he struggled to decide whether
two surfaces with the same Betti numbers always have the same group?
(Poincaré used the word “surface” for what we now call a manifold.) We now know the answer to this question — in fact, Poincaré answered the question one page later, in the negative — and a “strong semantic search engine” of the kind Christian Szegedy of Google is hoping to develop would know how to find the answer in the collected works of all the mathematicians who have ever lived, stored as a massive database. Just now Siri failed to report the answer when I asked Poincaré’s question, although she did provide the results of a Google search that proved that she recognized the relevant keywords — no better than what I would have found myself if I didn’t already know the answer, but pretty impressive nonetheless.
A few weeks ago I saw a “snapshot” document listing current programs at nine mathematical institutes in the US, Canada, and England. The word “data” appears in seven titles, the word “number” exactly once. Poincaré’s question was not inspired by vacuuming up all the data generated by the history of mathematics but by a single data point:
We know that two closed two-dimensional manifolds with the same Betti numbers are homeomorphic.
He reminds the reader why we know this — the answer is identical to what we learn in contemporary graduate courses in Riemann surfaces, though elsewhere he recalls a different answer, the one we learn in introductory courses in algebraic topology — and sharing this information would constitute the first stage of Klara 1.0’s “supervised learning.” But it wouldn’t be nearly enough for the first of our Klaras to come up with Poincaré’s initial question, not to mention the definitive version of what we know as the Poincaré Conjecture. So we must move on to the next model.
Klara 2.0 is the most personable of the Klara series. I imagine her on the model of the “unthreatening intermediate stage” computer in the fictional dialogue from Tim Gowers’s 1999 article “Rough Structure and Classification,” an ideal post-doc, brimming with suggestions, who through no fault of her own just happens to be a robot. Or, as Gowers writes, “an unthreatening intermediate stage between what we have now, computers that act as slaves doing unbelievably boring calculations for us” — that’s Klara 1.0, who answers Poincaré’s questions but formulates none of her own — “and full automation of mathematics.” Here are a few slightly edited excerpts from the dialogue Gowers wrote to illustrate a computer that has “guided the mathematician to an important insight which is the basis for Roth’s proof of his theorem on arithmetic progressions”:
…
Mathematician: Well random methods often give the best answer for problems like this. Let’s try to prove that […] is best possible.
Klara 2.0: [Pauses for 0.001 seconds] Actually it isn’t. Behrend found a much better bound in 1946 [Downloads paper].
M. Oh dear, I'm out of ideas then. Could you give me a suggestion by any chance?
K. We have a set A. We want to prove that a subset of a certain form exists. The best way of proving existence is often to count.
M. [Intrigued] Yes, but what would that mean for a problem like this?
K. Here we wish to count the number of solutions (x, y, z) of the single linear equation 2 y=x+z. A standard tool in such situations is Fourier analysis, exponential sums, the circle method, whatever you like to call it.
…
M. We are trying to show that that expression cannot be zero unless […]
K. [Trained to humour mathematicians] That is indeed almost equivalent to our original problem. Thank you. Notice that … is exactly the number we had before when A was chosen randomly. This number is large and positive — a good sign.
M. So we would like the rest of the sum to be small enough […]
K. Yes. I am trying a few obvious ideas such as […]
M. Please show me your calculations.
K. Here they are. [Displays them.] They do enable me, or rather us, to prove a partial result. […]
This dialogue, if you squint at it, could almost be lifted directly from Ishiguro’s novel, with Professor Gowers replacing the teenage Josie as Klara’s human friend. Gowers goes on to speculate on how a “menu-driven” approach to guide the choice of “standard tricks,” like the ones mentioned in the dialogues, could help to orient Klara 2.0 in her probabilistic wandering across the infinitely forking paths through the mathematical landscape.
Williamson on the one hand, and Juhász and Lackenby on the other, also settled DeepMind’s incarnation of Klara 2.0 firmly in their familiar grooves of representation theory and knot theory, respectively, while Klara 3.0, in the spirit of Poincaré, is expected to carve her own grooves. But this year’s model has a very different engine from the one Gowers devised in the far-off 20th century, long before deep learning techniques had ended the intervening AI winter by achieving disconcerting success in “machine translation, speech recognition, and visual object recognition.”2 The authors of the Nature article call it “powerful pattern recognition and interpretation methods from machine learning.” More technically, the process proceeds in three steps, illustrated by the three blue boxes in the flowchart copied below:
The first box, outlined in gray, is still the mathematician’s job at this stage: stating the problem assigned to the Artificial Mathematician. Klara 1.0 never gets past providing the calculations indicated in the first blue box. The second and third blue steps represent the value added by the deep learning techniques that differentiate the first two Klaras. Step 2, supervised learning, relates the two features X(z) and Y(z) of the mathematical problem, or more precisely “train[s] a function that predicts Y(z), using only X(z) as input.” Here “train” is a term of art that I will have to explain in a future book review, but it ultimately comes down to the statistical term “correlation” that I borrowed for today’s subtitle. Step 3 goes by the unwieldy name of attribution techniques, and in this case consists of “gradient saliency … calculating the derivative of outputs” of the function “with respect to the inputs.” I take it this is a standard technique for directing the search for patterns, as quickly as possible, in the most promising direction.
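To make “gradient saliency” a little more concrete: the idea is to rank the components of the input X(z) by how strongly the trained function’s output responds to each of them. Here is a minimal sketch in Python, with an invented linear stand-in for the trained predictor; the names f_hat, the toy weights, and the finite-difference shortcut are all assumptions for illustration (the actual pipeline differentiates a trained neural network).

```python
# Hypothetical sketch of gradient saliency: given a trained function
# f_hat predicting Y(z) from a feature vector X(z), rank the input
# features by the magnitude of the partial derivative of the output
# with respect to each feature.

def f_hat(x):
    # Stand-in for a trained predictor: a fixed linear model whose
    # weights encode how strongly each feature influences the output.
    weights = [0.05, 2.0, -0.3]
    return sum(w * xi for w, xi in zip(weights, x))

def saliency(f, x, eps=1e-6):
    """Approximate |df/dx_i| at x by central finite differences."""
    grads = []
    for i in range(len(x)):
        up = list(x); up[i] += eps
        down = list(x); down[i] -= eps
        grads.append(abs(f(up) - f(down)) / (2 * eps))
    return grads

x = [1.0, 1.0, 1.0]
scores = saliency(f_hat, x)
# The feature with the largest score is the most "salient" one:
# the aspect of X(z) the model leans on hardest when predicting Y(z).
most_salient = max(range(len(scores)), key=lambda i: scores[i])
print(most_salient)  # prints 1, since the weight 2.0 dominates
```

The output directs the human mathematician’s attention to the features worth thinking about, which is exactly the role the flowchart assigns to attribution.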
The initiative returns to the human mathematicians to “identify and prioritize aspects of the problem that are most likely to be relevant” once the patterns have been generated. They can then choose to iterate the process on the basis of this new information (the dotted arrows in the figure) or to try to turn the conjecture created by human-machine interaction into a theorem.
The flowchart is so helpful in providing intuition about the stages of the process that I find myself wondering whether a History of Science AI could generate it on its own, without human intervention. The real historical Poincaré had no Klaras at all in his closet and had to fill in all the boxes himself. We’ve seen where Klara 1.0 and Klara 2.0 intervene in the generation of conjectures, and now it’s easy to guess that Klara 3.0 — to whom we will return in a few weeks — will be responsible not only for the blue boxes but for the gray boxes as well — except possibly for the box labelled “Prove theorem,” which for the time being is the responsibility of a different Google department.
One advantage Klara 2.0 shares with Poincaré is the ability to learn from mistakes, like this first erroneous version of the Poincaré conjecture:
Each polyhedron which has all its Betti numbers equal to 1 and all its tables T_q orientable is simply connected, i.e. homeomorphic to a hypersphere.3
Klara 2.0 has no choice in the matter, because machine learning, deep or otherwise, is precisely the automated process of learning from mistakes, with the goal of maximizing some “reward function” or other and/or minimizing a corresponding “loss function”; this explains why “gradient saliency” is a preferred method, as indeed it has been since the days of Newton.
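The loop of mistake and correction can be written out in a few lines: each pass computes the derivative of a loss function and nudges the parameter downhill, which is the whole of “learning from mistakes” in the machine-learning sense. The data, the squared-error loss, and the learning rate below are invented for illustration.

```python
# A toy instance of minimizing a loss function by gradient descent.
# Toy data: y is roughly 3*x, and the model is y_hat = w * x.
data = [(1.0, 3.1), (2.0, 5.9), (3.0, 9.2)]

w = 0.0    # initial (wrong) guess
lr = 0.02  # learning rate
for _ in range(200):
    # Squared-error loss L(w) = sum over the data of (w*x - y)^2;
    # its derivative in w says which way to move to shrink the loss.
    grad = sum(2 * (w * x - y) * x for x, y in data)
    w -= lr * grad

print(round(w, 2))  # prints 3.04: each "mistake" has pushed w toward the best fit
```

Every step is driven by the error the current model makes on the data; there is no other signal, which is the point of the paragraph above.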
For the sake of argument we will assume that the first and erroneous conjecture arises through Klara 2.0’s interaction with a mathematician, whom we may as well call Poincaré, in a dialogue like the one Gowers imagined above, but following the framework in the flowchart from the Nature article. So we can take Y(z) to be the class of the 3-manifold z up to homeomorphism, while X(z) is its homology. The hypothesized function f̂ that Klara 2.0 fits to the data from the first blue box, using the procedures in the next two boxes, ends up predicting that Y(z) is the class of the 3-sphere whenever X(z) is trivial in degrees 1 and 2.
This doesn’t strike me as an especially plausible strategy. For that matter, I don’t really believe that DeepMind’s Deep Learning Klara 2.0 is ready to use modifiers like “actually” or “obvious” or even “often” correctly, much less to “humour mathematicians,” as in the Gowers dialogue. But I’ll put off to the next installment consideration of how these attributes might be modeled by an actual computer.
Would any of these Klaras’ algorithms have chanced upon the datum consisting of Poincaré’s counterexample to his first version of the Poincaré conjecture? Here is the capsule Wikipedia description:
The Poincaré homology sphere (also known as Poincaré dodecahedral space) is a particular example of a homology sphere, first constructed by Henri Poincaré. Being a spherical 3-manifold, it is the only homology 3-sphere (besides the 3-sphere itself) with a finite fundamental group. Its fundamental group is known as the binary icosahedral group and has order 120. Since the fundamental group of the 3-sphere is trivial, this shows that there exist 3-manifolds with the same homology groups as the 3-sphere that are not homeomorphic to it.
The first homology group of the Poincaré homology sphere is trivial precisely because the binary icosahedral group is perfect and therefore has trivial abelianization.
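That claim can be checked by brute force, assuming the standard identification of the binary icosahedral group with SL(2, F_5); the identification is imported here, not stated above. The sketch enumerates the 120 matrices, forms all commutators, and closes under multiplication.

```python
# Brute-force check that the binary icosahedral group is perfect,
# i.e. equals its own commutator subgroup, so its abelianization
# (the first homology of the Poincaré sphere) is trivial.
# We use the identification with SL(2, F_5): 2x2 matrices over the
# field with 5 elements, determinant 1, stored as tuples (a, b, c, d).
from itertools import product

F = 5

def mul(a, b):
    # 2x2 matrix product mod 5.
    return ((a[0]*b[0] + a[1]*b[2]) % F, (a[0]*b[1] + a[1]*b[3]) % F,
            (a[2]*b[0] + a[3]*b[2]) % F, (a[2]*b[1] + a[3]*b[3]) % F)

def inv(m):
    # Determinant is 1, so the inverse is the adjugate mod 5.
    return (m[3] % F, -m[1] % F, -m[2] % F, m[0] % F)

# Enumerate SL(2, F_5): all 2x2 matrices over F_5 with determinant 1.
G = [m for m in product(range(F), repeat=4)
     if (m[0]*m[3] - m[1]*m[2]) % F == 1]
assert len(G) == 120  # the order of the binary icosahedral group

# All commutators [g, h] = g h g^-1 h^-1, then close under products.
comms = {mul(mul(g, h), mul(inv(g), inv(h))) for g in G for h in G}
closure = set(comms)
frontier = set(comms)
while frontier:
    new = {mul(a, b) for a in frontier for b in closure} - closure
    closure |= new
    frontier = new

# The commutator subgroup exhausts the group: it is perfect.
print(len(closure) == len(G))  # prints True
```

Klara 1.0 could run this check; the point of the passage, of course, is that no such check suggests looking at the binary icosahedral group in the first place.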
It’s interesting to compare Poincaré’s counterexample with the famous counterexamples constructed by (my Paris colleague) Marie-France Vignéras to the speculation that two Riemannian manifolds with the same Laplace spectrum are isometric — Mark Kac’s question whether “one can hear the shape of a drum.” Vignéras’s counterexamples used hyperbolic geometry, rather than spherical geometry as in Poincaré’s case, and moreover her construction was informed by number theory, which was not at all explicit in Kac’s original formulation.
“How did Poincaré arrive at producing his example?” Claudio Bartocci asks, in his article “Analogy and Invention”.4
This is, of course, an unanswerable question: we can only hazard some guesses. We can safely assume that Poincaré was not unaware of Felix Klein’s book Vorlesungen über das Ikosaeder und die Auflösung der Gleichungen vom fünften Grade (Teubner, Leipzig 1884), where the icosahedral group was described in detail. However, there is evidence that Poincaré had no inkling of what today is called the Hurewicz theorem (namely, the fact that the abelianization of the fundamental group is the first homology group with integer coefficients): in fact, he did not exploit the property of the icosahedral group of being a perfect group, and instead explicitly computed both the “homology equivalences” and the “homotopy equivalences” … Moreover, it seems certain that he believed that his example was only one among many possible others …while we know that he stumbled over the only possible example. In conclusion, we can plausibly suppose that Poincaré essentially proceeded by trial and error, perhaps keeping in his mind (or in the back of his mind) that the icosahedral group could have been a workable algebraic object.
Poincaré’s mistake was simultaneously “loss” and “reward”: the falsification of his initial conjecture by means of a counterexample, the insight on the role of the fundamental group provided by this counterexample, and the speculation — extremely audacious, in retrospect — that this new insight was all he needed to formulate what turned out to be the correct conjecture.
Thurston’s article “Proof and Progress in Mathematics” famously identified “understanding” as the goal — the “reward” — mathematicians seek:
what [mathematicians] really want is usually not some collection of “answers”—what they want is understanding.
Would Poincaré have used the word “understanding,” or “knowledge,” or “intuition,” to characterize his reward? The urgent need to react to the “explosion” provoked by the Nature article does not allow me the time needed to find the answer in Poincaré’s writings on philosophy of science; all I can say is that all three words occur frequently in Poincaré’s Science et méthode. Nor have I had time to determine how Poincaré felt about the fruitfulness of mistakes. Like nearly all mathematicians, living or dead, I haven’t made as many mistakes as Poincaré during his short life, and that’s nothing to be proud of. Will Klara 3.0 be able to appreciate that?
The primary reward for the mathematicians who co-authored the Nature article is the same as Poincaré’s and Thurston’s: I’ll call it understanding, but you can read this as knowledge or intuition, as you prefer. For some of the mathematicians working on mechanization, the understanding that can be acquired from working with computers is itself part of the reward; Geordie Williamson has told me that this is one of his motivations. It takes more than understanding, though, to make the continuum of history explode. Walter Benjamin wrote
no fact that is a cause is for that very reason historical. It became historical posthumously, as it were, though events that may be separated from it by thousands of years.
Nature’s editors and the authors of the press release that preceded the publication date by 4-5 days don’t have the leisure of waiting thousands of years. It’s too soon to know whether or not this interaction between DeepMind and mathematicians will be seen as a turning point in the history of mathematical intuition. It’s striking, though, that the reward for the journalists who are covering this event, and similar events, is aligned with the reward sought by the AI industry: namely the sense that they are in the business of exploding the continuum of history, either by generating turning points or by reporting on them.
I don’t mean to suggest that there is a causal connection between the crude economic interests of Springer Nature and those of the tech industry. Certainly we can expect the journalists to have a sense of what counts as “news,” as I already wrote at the beginning of this essay, just as we can expect the industry to know what’s best for its bottom line. Perhaps the two are both aligned with a deeper historical structure. My aim is not to pursue this kind of speculation, but rather to remind readers that it is not forbidden to their own understanding to resist the unsubtle hints built into titles that start with “Advancing mathematics…,” or headlines that include the word “breakthrough.”
Comments are welcome at the Mathematics without Apologies blog.
This observation is so natural and banal that I expected Google to find for me a sentence very similar to the one you just read, but signed by a respected philosopher. So I temporarily forgot that Google is not yet able to answer questions like “Which philosophers thematized the distinction between personal and historical time?” The answers were inconclusive, and I failed to find a pithy sentence that stressed the role of the media in forging historical memory. The quotation from Reinhard Koselleck’s Futures Past that opened the News Flash two weeks ago is not bad. Wilhelm Dilthey’s philosophy also thematized the distinction but emphasized the irreducibility of lived time to the time of the natural sciences, whereas historical time is a separate Zeitschicht, to use Koselleck’s term. In the end I decided to stick with Benjamin because his word “explode” is appropriate to the media reaction to the DeepMind announcement — though nothing Google does is in any way “characteristic of the revolutionary classes”!
These are “three of the most important subfields in AI,” according to Stuart Russell’s Human Compatible. I am tempted to add automatic journalism to the list, not because it’s especially important but rather because I strongly suspect that several of the unsigned articles that appeared in various languages (Spanish, Portuguese, Dutch) the day after the publication of the Nature article were generated by machines on the basis of press releases.
In more modern language: the 3-sphere is the only closed oriented 3-manifold with trivial first and second homology groups. Poincaré formulated this version of his conjecture in the Second Supplement to Analysis situs, and replaced it by the one that turned out to be correct only in the Fifth and final Supplement.
Claudio Bartocci, “Analogy and invention: Some remarks on Poincaré’s Analysis situs papers,” in The Philosophers and Mathematics: Festschrift for Roshdi Rashed, ed. H. Tahiri, Springer (2017).