In the fifth century, Augustine famously incorporates Plato’s realism into the Christian tradition by positioning mathematical objects (and other Forms) as ideas in the mind of God. They are included somehow in the Logos, the eternal Word who, according to John’s gospel, was “in the beginning,” and was both “God” and also “with God” (1:1). Thus mathematical objects remain eternal, necessary, and mind-independent (at least as far as human minds are concerned), but they are now directly connected to God himself and can even be equated with the divine essence.… Augustine’s Christianized reading of Plato becomes the standard mathematical ontology for most of Christian history—for most, but not quite for all.1
Pavia’s Renaissance cathedral is much too grand for its 73,000 inhabitants. But its university is one of the oldest in Europe, and centuries before the university was founded, when Pavia was the capital of the Lombard kingdom, the city was important enough to be chosen as the (final) final resting place for St. Augustine’s remains, preserved in a niche at the bottom of the marble Ark pictured above.2
Nowadays Pavia is overshadowed by its huge neighbor Milan, but it retains influential institutions of higher education, including the Istituto Universitario di Studi Superiori, the scene of the 7th international meeting of the Association for the Philosophy of Mathematical Practice, which took place in June of this year.3 This branch of philosophy, which mainly focuses on aspects of mathematics ignored by the so-called Standard View,4 has grown enormously over the last 20 years. This time I was one of a handful of mathematicians to take part, but I expect more will attend the 8th meeting.
My slide presentation is reproduced below, more or less verbatim.
Disclaimer
This is a talk by a mathematician. It is only suitable for a meeting of philosophers as a source of questions a mathematician might pose (and in fact is posing) about the nature of mathematical understanding.
Such questions acquire a certain urgency in connection with a scenario often discussed when imagining a future AI mathematics. In this fantasy, the promises of Silicon Valley to “solve mathematics” lead to a situation in which a machine has proved an important conjecture – in this scenario it’s usually the Riemann Hypothesis – but the proof consists of a million pages of computer code.
Is it possible to talk of “understanding” in such a situation?
I was pleased to read the word “understanding” in the titles of several lectures and in the abstracts of many more, but also alarmed to be reminded that the subject of my remarks is already an established theme in philosophy of mathematical practice and in philosophy more generally, with its own extensive literature that I have not read.
It occurs to me that, as I wrote in my newsletter,
Principle: If you try to define “understanding” unambiguously you will probably bump up against the paradox that you have set yourself the task of making the word comprehensible to zombies [i. e., to entities devoid of understanding].
Understanding vs. Reasoning
In the AI industry, “understanding” is conflated with “reasoning”; the latter lends itself to measurement by way of computation.
When mathematicians talk about understanding – when Thurston, for example, says that
Mathematics is not about numbers, equations, computations, or algorithms: it is about understanding.
– they are usually talking about an immediate subjective experience. Nevertheless, this is paired with the belief that this experience is a reflection of some objective change of state, so that one can refer objectively to a moment in which understanding was achieved.
Both aspects are reflected in the responses I obtained when I recently asked mathematicians to explain what they mean by mathematical understanding, followed by comments.
What understanding is, according to mathematicians.
The Imitation Game: Is there a kind of Turing test for determining understanding of a proof? Would we then say that the machine understands the proof if it imitates a mathematician’s understanding successfully? Can we design the test without first pinning down what we mean by understanding?
In this connection, the following observation of NYU computer scientist and AI specialist Ernie Davis (private communication) may be relevant:
The best chess-playing programs cannot recognize and name concepts like a “fork” or a piece being “pinned” in chess, and cannot read and discuss an article in a chess journal, but nonetheless they play very good chess.
Interactive understanding: Actual mathematical proofs are rarely read in detail by anyone but the referee5. Typically the mathematician reconstructs a proof from the statement of the result, with minimal guidance from the proof text. The expert, who knows the problem and the methods, looks for the key new ideas that distinguish the world as it was before the proof from the world as it is now. These are analogous to turning points in a fictional narrative.
I add that a formal proof has no turning points. Can one assign an importance or novelty score to individual lines in a 50 million line formal proof of the Riemann Hypothesis?
Wittgensteinian surveyability and representability by a picture: Following Wittgenstein, one could say that a proof must be surveyable (übersichtlich). This is typically taken to mean what Ian Hacking calls “cartesian,” understandable instantly, in the sense of an “Aha!” experience, which is taken to be a necessary (but not sufficient) objective index of understanding.
Continuing in the vein of Wittgenstein, the logician Xavier Caicedo said one understands a proof when one can draw a picture of it.
Pleasure: Caicedo also said that understanding is accompanied by pleasure. That this is no small matter is illustrated by the following quotation from an application to Columbia’s PhD program:
For almost a year and a half, I struggled to make any progress. I understood the individual components, but putting them together seemed like an impossible task. Then, on one day that I vividly remember, everything fell into place. I was so ecstatic that I couldn’t stop working out the details... I felt euphoria that was incomparable to anything I had felt before....
If pleasure is a necessary aspect of understanding then either the mechanization of mathematics will have to be reengineered or else one has to assume that understanding is not a requirement for mechanization.
Understanding as metaphor: It’s significant that the metaphors for understanding – “I see” or “I grasp” – would be purely artificial and conventional if used by machines without bodies. This is especially visible in the etymology of the Italian capire or comprendere.
(Some might want to argue that understanding is a misleading name for information processing, and that machines are an improvement on humans in that they can process information without relying on metaphors.)
Functional test of understanding: A machine’s understanding of a mathematical statement or proof could be gauged by its capacity to formulate new conjectures.
I can supplement this: if it can apply the methods to propose and solve new problems.
This leaves unclear how the machine would distinguish what is general in the methods from the specifics. Is this form of abstraction from an example analogous to Google “inventing the concept of a cat” by statistical pattern recognition? Can the method, applied once, be conceived as an instance of a general concept?
Indices of understanding
Retrospective understanding
Generalizability is highly underdetermined. Understanding becomes retrospective when, for example, the Langlands program exhibits unified understanding of previously disparate observations in number theory.
In this sense the history of mathematics shows that understanding is treated as a process that is never complete. A proof always presents the question not only of its own generalization but of why the proof was possible in the first place.
Some indices of understanding
The use of temporary names and notation
The use of conventional names and notation
Division of articles into sections and subsections
Reference in the course of a proof to a research program
None of these was mentioned by the mathematicians quoted above.
Communication and understanding
The lecture room
No one seems to have made anything of the observation that mathematical communication, in the seminar or lecture room, or in front of the blackboard, proceeds by principles quite different from those formalizers consider legitimate. Nevertheless, something is being transmitted from speaker to listener in these interactions.
Participants in this process are likely to refer to “understanding.” Is this philosophically meaningless, or purely psychological and therefore out of bounds for philosophers?
Strategies to convey understanding
I briefly analyze a few examples of (my own) strategies to make a proof understandable.
From my talk last week
(This is the main theorem of work in progress with Harald Grobner, Jie Lin, and A. Raghuram. The next slide indicates one step in the proof.6)
The philosophical lesson of this excerpt from a lecture I gave in Germany in early June will be drawn momentarily, following a discussion of the proof of Fermat’s Last Theorem, in six steps, excerpted from a colloquium talk I gave several times in April and May.
Fermat’s Last Theorem
Understanding is multiple
Much of what we call proving is calculation. The step illustrated from my talk last week is simple arithmetic, the least esoteric form of calculation. The kind of “understanding” involved in recognizing the correctness of a calculation (according to relevant rules) is very different from that required to introduce an auxiliary construction, like the Frey curve (or the step preceding the proof of Chen’s formula).
Machine understanding
So looking for understanding in machines may be misguided. Machines can perform simple (and complicated) arithmetic and can presumably be programmed to try billions of algebraic manipulations per second in order to find one that extracts an expression for the quantity of interest. On the other hand, recognizing that choices can be made to simplify the other three terms, and even more so hitting on the Frey curve and recognizing its special properties (in the steps omitted), requires what mathematicians would call a deep understanding of the field and what Polanyi would call tacit knowledge.
Accounting for this kind of understanding requires a philosophy of practice far removed from logic. Can machine learning substitute for embeddedness in a disciplinary practice?
Incomprehensible proofs
Algorithms to generate explanations
It is difficult to (mathematically) define interpretability. A (non-mathematical) definition of interpretability that I like, by Miller (2017), is: Interpretability is the degree to which a human can understand the cause of a decision. Another one is: Interpretability is the degree to which a human can consistently predict the model’s result. (Molnar)
We want to explain the predictions of a machine learning model. To achieve this, we rely on some explanation method, which is an algorithm that generates explanations. An explanation usually relates the feature values of an instance to its model prediction in a humanly understandable way. (Molnar, my emphasis)
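Molnar’s phrase “relates the feature values of an instance to its model prediction” can be made concrete in the simplest possible case, a linear model, where each feature’s contribution to a single prediction is just its weight times its value. This is my own minimal sketch; the feature names and numbers are invented for illustration.

```python
# A toy "explanation method" for a linear model: the explanation of one
# prediction is the per-feature contribution weight * feature value.
# Feature names and values below are purely illustrative.
weights = {"sqft": 0.003, "rooms": 1.2}
bias = 0.5
instance = {"sqft": 900.0, "rooms": 3.0}

# The model's prediction for this instance.
prediction = bias + sum(weights[f] * v for f, v in instance.items())

# The "explanation": each feature value related to the prediction
# through its additive contribution.
contributions = {f: weights[f] * v for f, v in instance.items()}

print(prediction)     # approximately 6.8 = 0.5 + 2.7 + 3.6
print(contributions)  # per-feature contributions, e.g. sqft -> 2.7
```

For models that are not linear, the same idea survives only approximately: methods such as LIME or SHAP attribute a prediction to feature values, but the attribution is itself the output of an algorithm, which is what Molnar means by an explanation method.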
Formalization of the Philosophy of Mathematical Practice
Now let’s assume the fantasy of mechanization of mathematical proof is realized, in the strongest possible form. A machine learning model has been trained so well on human proofs that it produces formal proofs of unlimited length that solve 10% of a random collection of unsolved problems. Christian Szegedy (xAI) predicts that this will happen by 2029. However, these proofs are humanly incomprehensible.
Following the lead of the program to develop methods for interpreting the means by which a machine learning algorithm produces its output – and this is a very active research program that has attracted a good deal of funding – the Philosophy of Mathematical Practice could attempt to mechanize the interpretability of such proofs.
Automatic translation of formal proofs
Patrick Massot, a leading member of the Lean community, has developed software called Blueprint that translates formal proofs in Lean automatically.
Each ⊕ is an invitation to expand the justification, while each oval indicates the state of progress toward the proof’s goal.
However the formal proof was generated, the illustration makes clear that the formalizers see a proof structured by a collection of nested subgoals. Is this structure implicit in what we mean by “proof”?7
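To make “nested subgoals” concrete, here is a toy Lean proof of my own (not taken from Massot’s Blueprint), in which each `have` introduces a subgoal that must be discharged before the main goal can be closed:

```lean
-- Each `have` opens a subgoal; the final `rw` assembles them
-- into a proof of the main goal.
example (a b : Nat) : a + b + 0 = b + a := by
  have h1 : a + b + 0 = a + b := Nat.add_zero (a + b)
  have h2 : a + b = b + a := Nat.add_comm a b
  rw [h1, h2]
```

In a Blueprint rendering, each such subgoal would correspond to a node that can be expanded or collapsed, which is what gives the formalizers’ picture of a proof its nested structure.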
Kevin Buzzard and Terence Tao have both confirmed that, as Tao wrote,
The arrow of causality currently is
(traditional informal human-written proof) → (semi-formal blueprint) → (Lean formalization).
The reverse arrow is not yet operational in any serious sense, and those working in the area seem to envisage a group project, consisting of humans, if a human translation of a formal proof is needed.
The content of a proof
This raises interesting logical issues about the equality, as Buzzard would put it, between the informal and formal proofs; Massot wrote that “the content is the same,” but this clearly begs all the philosophical questions.
The ability to imagine a machine that can create incomprehensible proofs of interesting theorems does not entail that such a machine is theoretically possible. What I mean is that the training process may incorporate, or be modeled on, the process of understanding. AI theoreticians Yoshua Bengio and Nikolay Malkin refer to subgoals in a recent text entitled
“Machine Learning and Information Theory Concepts towards an AI mathematician.”
Does the AI mathematician imitate the human? Or vice versa?
Formalization, and computer science more generally, are not entirely abstract but are grounded in historical mathematical practice. Does this make the million-page proof a chimera?
The philosopher of the future faces the paradox: has technology reproduced the pre-existing process of mathematical understanding or has mathematical understanding been modified by the adaptation to technology? In other words, is mathematical understanding itself fundamentally dependent on practice?
A formal proof is necessarily linear or sequential. The same is assumed to be true of human proofs; one wants to avoid circularity. But is this actually the case?
Equality, and what one actually means
[Mathematics] doesn’t bear on human experience, human progress, ordinary human language. . . . It’s a branch of zoology. The great ape branch. That’s why men are teaching apes to communicate. So we can discuss mathematics with them.
(from The Names, by Don DeLillo)
(Session talks lasted 30 minutes, with 10 minutes for questions, so I left some material for the end in case I had time. What follows is essentially a continuation of this earlier post, based on an expansion by Kevin Buzzard of the talk on which the earlier post was based.)
When one is forced to write down what one actually means and cannot hide behind such ill-defined words as “canonical,” or claims that unequal things are equal, one sometimes finds that one has to do extra work, or even rethink how certain ideas should be presented. ... I am certainly not arguing that the literature is incorrect, but I am arguing that many arguments in the literature are often not strictly speaking complete from a formalist point of view. (Emphasis added.)
Buzzard is concerned that two objects that are not strictly equal according to the axioms of set theory are treated as such by mathematicians, who use the word “canonical” to describe the existence of a mental operation confirming their identity.
There is a notion of “what one actually means” that is not analyzed.
Nevertheless, it is presumed that it can only be communicated to other humans if it can be communicated to machines, i.e. formalized.
In order to make mathematics suitable for formalization, it is “dangerous” (Buzzard’s word) to leave any ambiguity in any of the definitions or proofs. To quote Buzzard again:
This theorem cannot be applied without reading the proof: this is in fact a sure fire sign that it is not actually a theorem. From a formalist point of view, theorems are mathematical objects which are completely encapsulated by their statements.
(I stopped at this point, leaving Buzzard’s statement as a comment on the “formalist point of view.”)
Steven D. Boyer and Walter B. Huddell III, “Mathematical Knowledge and Divine Mystery: Augustine and his Contemporary Challengers,” Christian Scholar’s Review, April 15, 2015.
Boethius has a tomb just below Augustine’s, but, although this is mentioned in Dante’s Paradiso, I was told the later philosopher is not actually in his tomb.
You are probably wondering how the philosophers felt about spending a week in a country governed by the “post-fascist” Fratelli d’Italia. In fact, the issue didn’t arise even once… which is probably more alarming than anything else I might have reported.
…known in this newsletter as the Central Dogma, the “standard view” emphasizes the supposed logical underpinnings of mathematics and encourages the belief that human mathematicians can (or inevitably will) be replaced by machines.
and even then…
More precisely (in case specialists are reading this essay), this is an analogue of a recent theorem that Chen uses to prove Deligne’s Conjecture on special values of the L-functions of symmetric powers of modular forms. We expect to be able to prove this but for now it has to be treated as work in progress.
My talk in Pavia was preceded by one by Maximilian van Remmen and Benjamin Zayton entitled Mathematical proof and inquiry. While listening to this talk I was startled to see that the “zetetic paradigm” they propose as an alternative to the vision of proof encapsulated by the Central Dogma included two features, very roughly designed to protect proofs from local and global challenges, that correspond precisely to the ⊕’s and ovals of Massot’s Blueprint. I mentioned this in my own talk and was told afterwards that they had been following the Lean community and had also seen the analogies.