Mechanical understanding of proof?

Rough notes from a seminar talk at the Laboratoire SPHere in Paris

Feb 29, 2024

From the home page of the laboratoire SPHERE, or SPHere, or SΦHERE, in Paris

One of the problematic effects of AI on society is its epistemic influence—the way it renders intelligence as machine intelligence and implicitly fosters knowledge as procedural knowledge. (Matteo Pasquinelli, The Eye of the Master, p. 253)

This week’s title is the title of the talk I gave in Paris on February 23 at the invitation of the formidable Karine Chemla. As the date approached I felt increasingly convinced she had made a mistake in inviting me; on January 27 I wrote

Heureusement que j'ai encore un mois pour préparer l'exposé, je ne sais toujours pas ce que je pense sur la question.

This feeling of dread persisted even after I had written in the last installment that the Logic Day panel discussion had “provided enough material to fill an hour.” You have to understand that the laboratoire SPHere is one of the wonders of the world1 for anyone interested in Science-Philosophy-History and the relations among them. There is a widespread misconception among mathematicians that, because the language, the methods, and the concerns of mathematics are so impenetrable to outsiders — which is definitely true — that only mathematicians are qualified to write the history of mathematics. This is a misconception because history has its own methods and concerns that are quite different from those of mathematicians; few mathematicians are familiar with the specialized training required for the rigorous application of these methods.

I know I cannot be a competent historian, but I am at least aware of the methods of historical practice, and for this I have to thank my interaction with French members of the research group REHSEIS, which was absorbed into SPHere in 20092 . Nor am I a philosopher. And my hosts at SPHere were well aware of that when they invited me. Nevertheless I felt the need to apologize for this — not once, but twice — in the notes I prepared for my talk:

The purpose of my intervention today is to think more carefully about the issues that arise when we try to understand the differences, if any, between human and machine understanding of proof, so that I will be able to give a coherent presentation at some point in the near future. I am doing this as a mathematician. I am neither a philosopher nor a historian, but I can read what mathematicians write and try to understand what they have in mind.

By a “coherent presentation” I meant one built around the defense of a particular thesis and leading to a specific conclusion, because I figured that this is the kind of talk a Historian or a Philosopher would be expected to do. A mathematician’s talk is also supposed to end with a conclusion, so I had (at least) two reasons to apologize. Anyone thinking clearly about my predicament would realize, however, that a mathematician speaking to an audience of Historians and Philosophers of Mathematics is serving as a live specimen, so that my role as speaker was to provide raw empirical data. Certainly my hosts understood this. So why was I so apologetic?

My talk began with the account of Kevin Buzzard’s Formalizing Fermat project that readers have already seen, and moved quickly to Buzzard’s positive vision —

One could envisage a computer-formalised version of the services such as Math Reviews which summarise modern mathematical research papers for humans, or databases of results in algebraic and arithmetic geometry which can be mined by AI researchers.

— as a possible way of interpreting what Buzzard meant by “understanding.” This vision (as I wrote in my notes) is familiar from proposals to formalize mathematics. It's very common for a mathematician to need a specific result that may or may not have been treated in the literature. To find it one might, in chronological order: (a) keep looking in the library; (b) ask an expert; (c) perform a Google search; (d) ask the question on MathOverflow. You might want to add (e) ask ChatGPT but this turns out to be a bad idea; on one attempt ChatGPT told me I had already answered the question in a paper I hadn't known I had written.

It sounds like Buzzard hopes to add a new option (f) to create a specialized database that would know all the answers (unlike mere mortal humans) and would interpret the question correctly (unlike ChatGPT). This is how Buzzard seems to define "understanding": he wants to be able to consult the database in the same way one would consult a human colleague. And it is different from the definition familiar from Bourbaki's Architecture of Mathematics:

Indeed every mathematician knows that a proof has not been "understood" if one has done nothing more than verify step by step the correctness of the deductions of which it is composed and has not tried to gain a clear insight into the ideas which have led to the construction of this particular chain of deductions in preference to every other one.

This definition should settle once and for all, in the negative, the question of whether Lean “understands” its formal proofs. Building on Buzzard’s proposal, we can ask an

OPEN QUESTION: Can a "semantic search" of the literature, as Buzzard hopes to make possible, constitute understanding of a mathematical question, in the absence of understanding of the proofs involved in obtaining the results of the search?

Before I continue (say my notes), I want to draw attention to a curious observation. Just over a year ago I was reading texts by mathematicians and computer scientists about mechanical proofs and I noticed that the former were using the word "understanding" while the latter were using the word "reasoning." I did a word frequency count of the texts and found that, indeed, the most common meaningful word cluster in the mathematical texts centered around "understanding," while the most common word cluster for the computer scientists centered around "reasoning."3 The insistence on "understanding" on the part of mathematicians is no surprise; for example, one often reads this quotation by Thurston4:

Mathematics is not about numbers, equations, computations, or algorithms: it is about understanding.

and there are many others of the same nature. But when I turned to the page on "understanding" in the (English translation of) the Dictionnaire des intraduisibles (de Barbara Cassin) I was stunned to find that (here I showed them the page from the book that readers can find in this post).

OPEN QUESTION: So when mathematicians talk about understanding — my title only refers specifically to the "understanding of proofs" because I was under the impression that "proof" is the theme of this seminar5 — are they keeping alive an obsolete 18th-century belief in a hypothetical faculty?6

(Here I turned to comments by AI researchers Geoffrey Hinton, Yann LeCun, and Andrew Ng on what LLMs do or don’t understand, as in this post. And I naturally also digressed on the computational theory of mind, which would see talk of understanding as a mystification of neural processes which, proponents say, can be adequately modeled by digital neural networks.)

Text within this block will maintain its original spacing when published

           Several mathematicians try to characterize mathematical understanding

The experience associated with the hypothetical faculty of understanding is in many ways what characterizes the life of the mathematician. My visit to Bogotá last month coincided with the visit of several logicians, so I was asked to speak at a conference organized for them. Instead of giving a talk I suggested a few questions that might shed light on the role of understanding in mathematics and how it differs from what happens when a machine interprets a proof, say by verifying it formally, or by translating it into another mathematical language, or by using it as the basis for creating a new proof.

Colleagues' responses were very diverse, and raise several new OPEN QUESTIONS, which I paraphrase here:

1. (The imitation game)7 Is there a kind of Turing test for determining understanding of a proof? Would we then say that the machine understands the proof if it imitates a mathematician's understanding successfully? Then can we design the test without first pinning down what we mean by understanding? Or is it purely intuitive?

2. Actual mathematical proofs are rarely read in detail by anyone but the referee. In practice the work of the mathematician typically is the reconstruction of a proof from the statement of the result, with minimal guidance from the text of the proof. The expert knows the problem and the methods and looks for the key new ideas that distinguish the world as it was before the proof from its current situation. These are analogous to turning points in a fictional narrative.

Here I observe that a formal proof has no turning points. Can one assign a degree of importance or novelty to a sentence in a formal language"? I return to this below.

3. Following Wittgenstein,8 one could say that a proof must be surveyable (übersichtlich). This is typically taken to mean what Ian Hacking calls "cartesian," understandable instantly, in the sense of an "Aha!" experience. Felix Mühlhölzer, however, argues that Wittgenstein uses words like “übersichtlich” in “a purely formal sense that has nothing to do with understanding (at least not in the sense of ‘mathematical understanding”) and identifies this with Hilbert's aim when he writes "the fundamental idea of my proof theory is none other than to describe the activity of our understanding, to make a protocol of the rules according to which our thinking actually proceeds." So is Wittgenstein a "leibnizian" rather than a "cartesian," an ally rather than an adversary of the computational theory of mind? This wouldn't make him a friend of mathematicians.

4. Continuing in the vein of Wittgenstein, Xavier Caicedo said one understands a proof when you can draw a picture of it (see the discussion of Wiles's picture below). Is this the same thing as building a model, in Ng's sense? But he also said that understanding is accompanied by pleasure. That this is no small matter is illustrated by the following quotations:

For almost a year and a half, I struggled to make any progress. I understood the individual components, but putting them together seemed like an impossible task. Then, on one day that I vividly remember, everything fell into place. I was so ecstatic that I couldn’t stop working out the details – even during dinner at one of my favorite restaurants! In just a few hours, the problem I had worked on for so long was solved. I felt euphoria that was incomparable to anything I had felt before; it validated the arduous struggle that came before it. (from an application to Columbia's PhD program).

Doing mathematics is like taking a drug. Once you have experienced the buzz of cracking an unsolved problem or discovering a new mathematical concept, you spend your life trying to repeat that feeling. (Marcus du Sautoy)

… it happens suddenly: one direction becomes more dense, or more luminous. To experience this intense moment is the reason why I became a mathematician. (Marie-France Vignéras)

Compare:

Eine falsche Note zu spielen ist unbedeutend, aber ohne Leidenschaft zu spielen, ist unverzeihlich. (quotation from Beethoven, found on the page of a promising young number theorist)

This is the starting point for a political or ethical response to the question of whether machine understanding can reproduce or substitute for human understanding of mathematics, but this is not my assignment today.

5. It's significant that the metaphors for understanding — "I see" or "I grasp" — would be purely artificial and conventional if used by machines without bodies. This is especially visible in the etymology of the French entendre or comprendre.

6. Alex Cruz suggested that a machine's understanding of a mathematical statement or proof could be gauged by its capacity to formulate new conjectures. I can supplement this: if it can apply the methods to propose and solve new problems. This leaves unclear how the machine would distinguish what is general in the methods from the specifics. Is this form of abstraction from an example — presumably a process learned by training, reinforced or not, on (billions? trillions?) of examples, analogous to "inventing the concept of a cat"? Can the method applied once be conceived as an instance of a general concept?

Text within this block will maintain its original spacing when published

         What do you understand when you understand a mathematics lecture?

You can say you have reached a degree of mathematical maturity when you take a lecture as an opportunity to realize how much you don’t understand. But this is not why students attend their teachers’ lectures. One says the purpose of teaching is to impart knowledge but this usually entails understanding, and this is what exams aim to measure. (Here there was a digression on whether AlphaGeometry’s high scores on Olympiad problems means the system understands geometry; the reader who understands the question can fill in the details.) Listeners don’t get graded for attending seminar talks, but they also aim at a kind of phase transition from a state of less to one of more understanding. How is this possible?

I described two kinds of seminar talks. A typical colloquium-style talk is organized around, or culminate in, the presentation of a few results without proof; at the end one hopes the audience understands the context of these results and why they are interesting. A slightly more ambitious colloquium-style talk might give a very broad sketch of a proof of the main result, similar to my outline of a proof of Fermat’s Last Theorem in a Quanta article whose aim was to argue that a project to Formalize Fermat would addd nothing to the understanding of Wiles’s theorem.

A talk for specialists, in contrast, aims to convince experts that you have actually proved what you claim,9 which is only possible if they understand the outline of the proof at the level appropriate to experts. After apologizing a second time and asking what standard my SPHere talk was supposed to meet, I suggested that I am in the middle of a process of generating human understanding10 of a proof, specifically for a talk at the Séminaire Bourbaki next November; by observing this process closely I might be able to shed light on what can or cannot be reproduced by a machine understanding. The proof I need to explain is contained in a paper by S. Y. Chen that establishes a case of a famous conjecture of Deligne that was historically very important. The introduction to such a paper typically includes a section called something like "the outline of the proof" and Chen's paper is no exception:

For Theorem 1.2, the idea of the proof is to separate the dependence on Σ and Π of the algebraicity of the critical values for L(s, Σ x Π).

This is an idea I had developed in my own papers on other cases of Deligne's conjecture, so this is quite familiar to me. The "outline" then reformulates the sentence as an explicit formula (1.1), and proceeds to list a series of constructions Chen uses to establish the formula. From (1.1) Chen derives a new formula called Conjecture 1.1. By working with Conjecture 1.1 in a well-chosen special case he is then able to prove Deligne's conjecture by an inductive argument.

Did you notice that I just conveyed to you an understanding of the proof — both a confirmation of my own understanding, at some level, and an invitation to you to begin the process of understanding? In this form it is barely more than a picture of the proof, as Caicedo called it. At the same time, it's perfectly plausible that ChatGPT could have generated a similar reading of Chen's "outline." Would that qualify as understanding, in Bourbaki's sense?

I suggested that the different kind of seminar talks worked at different “degrees of magnification” (while apologizing for the metaphor) and indeed had prepared a more detailed outline of the steps of Chen’s proof, which I will share in a future post.

Text within this block will maintain its original spacing when published

                                              Understanding as dialogue

It is certain, that when an inference is drawn immediately from an object, without any intermediate cause or effect, the conviction is much stronger, and the persuasion more lively, than when the imagination is carryed through a long chain of connected arguments, however infallible the connexion of each link may be esteemed. It is from the original impression, that the vivacity of all the ideas is derived, by means of the customary transition of the imagination; and it is evident this vivacity must gradually decay in proportion to the distance, and must lose somewhat in each transition. Sometimes this distance has a greater influence than even contrary experiments would have; and a man may receive a more lively conviction from a probable reasoning, which is close and immediate, than from a long chain of consequences, though just and conclusive in each part. Nay it is seldom such reasonings produce any conviction; and one must have a very strong and firm imagination to preserve the evidence to the end, where it passes through so many, stages.

(Hume, Treatise of Human Nature, Sect. XIII, my emphasis)

Hume is pointing to a familiar experience that is not specific to mathematicians but that has driven some of the latter to have given up on understanding, choosing formalization (in Lean or whatever) as an acceptable alternative.11

A few weeks ago I went through a different kind of exercise with a visiting colleague: we both wanted to understand a new paper that derived a conclusion about a question of interest to both of us, but by using completely unfamiliar methods. The introduction gave an outline at the level of my outline of Wiles’s proof of FLT, much too imprecise for us to be able to claim understanding. Without going into any details about the process, which was only moderately successful, the experience strongly suggests that "understanding" is best understood as an aspect of a dialogue.12

Image from vecteezy.com, found online somewhere.

The underlying epistemological issue is to what extent the machines, or the dinosaurs, would have to resemble us in order for us to be able to understand each other.

The underlying ethical issue is what "understanding" of a proof adds to the bare fact of the existence of a proof. Buzzard's initial motivation was to remedy a situation where the existence of the proof of FLT depended on the continued existence of a mortal human being. The existence of this human being is not on its own the kind of knowledge that is valued by mathematicians like Buzzard, although those who take for granted the correctness of the proof they don't understand can use it in their own work. Many mathematicians point to the hypothetical scenario in which a machine produces a proof of the Riemann Hypothesis in a way that no human can understand. This is a result that many mathematicians would be tempted to use, but the existence of such a proof does not settle the question according to the historical norms of mathematics. That this is the case can be seen in the controversy over Mochizuki's supposed solution of the ABC conjecture; even before Scholze and Stix claimed to have found a specific mistake, mathematicians were unwilling to admit it as established mathematics while understanding was apparently limited to a small and suspect group.

Here is my attempt at a conclusion, not because I am especially convinced by what I have to say but because I feel a seminar talk requires a conclusion. What mathematicians call "understanding" is inseparable from the possibility of dialogue, in the sense in which Mazur refers to mathematics as "a long conversation." In other words, "understanding" in the human sense is a diachronic process that cannot be settled by calculation (or by functional MRI). If machines develop something their creators want to call "understanding" of mathematics, it will be incumbent on them (the creators, or perhaps the machines) to explain how this is a surrogate for human understanding.

Silicon Reckoner

Discussion about this post