PhilMath Intersem is an annual seminar in philosophy of mathematics, a joint initiative of the University of Notre Dame and the laboratory SPHERE at the Université de Paris (formerly Paris 7, formerly Paris-Diderot), that for years was led by Notre Dame professor Michael Detlefsen. The June 2019 edition, devoted to "the nature and place of rigor in mathematics," was Detlefsen's last before his death later that year. When I was invited to speak at the seminar I naturally asked to share the platform with Kevin Buzzard, a number theorist at Imperial College, London, whom I first met when he was Richard Taylor's student at Cambridge. Over the past few years Kevin and I have had a lively, mostly private, exchange about the program to formalize mathematics, whose recent successes were reported in a previous newsletter. Kevin invited his collaborator Patrick Massot, a topologist at Orsay. The title for our joint presentation was "Mechanization of rigor." Here is the abstract:
Most mathematicians have been generally satisfied in recent decades with the state of rigor, as enforced by the peer review process as well as by informal means. Increasingly, however, some proofs have become so long or complex, or both, that they cannot be checked for errors by human experts. In response, a small but growing community of mathematicians, collaborating with computer scientists, has designed systems that allow proofs to be verified by machine. Do the development of automated proof checkers and the promise of effective automatic theorem provers represent a turning point in mathematical research? If so, how will future human mathematicians adapt — or will there still be a place for human mathematicians?
A month before our appearance at the seminar I proposed the following plan to Kevin and Patrick:
I was thinking I would start with a 15-minute presentation of some of the historical and philosophical background to automated proof verification. Then the two of you can spend 30-60 minutes explaining your motivation and the nature of the system you use, and perhaps some of the alternative systems, bearing in mind that the audience is not used to formal systems (at least I'm not). My guess is that Kevin will have a lot to say about motivations, so Patrick might want to focus on the technicalities; but you can do this any way you like. I can then speak for 15-30 minutes about some of the philosophical issues this raises (I attach my text from 2007, which I assume Patrick hasn't seen). After that the three of us can "discuss whatever comes to our minds in reaction," and the rest of the time would be open to the audience. This is looking like a 3 hour session.
On June 20, 2019, I met Kevin at Gare du Nord. After lunch at one of the South Indian restaurants near La Chapelle we headed past a wall inscription that could serve as a Turing test and a racist Historic Monument, before arriving at the Rive Gauche campus that was not yet called Université de Paris. There we were greeted by philosopher Andrew Arana and the other participants in the seminar; Detlefsen was already too ill to travel and attended from Indiana through a somewhat unreliable video link. The meeting didn't work out according to plan, however; Patrick focused so enthusiastically "on the technicalities" that the clock ran out and there remained essentially no time for the philosophical discussion that was the seminar's primary motivation.
My initial plan for this week was to review Christian Szegedy's June 2020 report on progress in "Autoformalization and General Artificial Intelligence." After going through my notes, I decided it would make more sense to share what I was thinking on the subject around the time I first heard of Szegedy's work, which was in a message Kevin sent me while we were planning our Paris presentation. The text that follows is a version of my notes for PhilMath Intersem, edited for continuity. Some of the material — the quotations from Atiyah and Huizenga, or the reference to Zamyatin's We — has already appeared on this newsletter and will appear again; my apologies, as before, for redundancy. Bear in mind that this was a presentation primarily intended for philosophers.
As always, discussion is welcome on the MWA blog.
Debates on the role of computers in mathematics are usually framed by stark oppositions: desirable or undesirable, possible or impossible, or by the question: is it really mathematics? Rather than trying to come to conclusions on these rather sterile alternatives, I'm interested in shifting the terms of the debate to questions that are more promising for philosophical consideration.
Mechanical proof assistants — for the purpose of proof verification as well as for proving theorems — have been around as long as computers. Mathematicians, however, have only been paying serious attention over the last 20 years or so. Several highly respected mathematicians independently came to the conclusion that the form of peer review that was institutionalized over the last few generations has reached a kind of crisis, in which it can no longer be seen as a reliable warrant for the correctness of mathematical proofs — either because the proofs are too long or complex to be scanned by a human reader, or because peer review fails to detect fatal mistakes. Kevin Buzzard has expressed different concerns — much mathematical knowledge is implicit (as many philosophers have recognized) but often so localized (even within the mind of a single mortal mathematician) that it cannot be considered reproducible. More positive reasons to welcome mechanical proof assistants have also been expressed — to help with the tedious parts of the referee process, or of the proofs themselves, or for heuristic interaction at various stages in seeking proofs (Gowers). Patrick Massot has written to me that "Having formalized definitions, statements and proofs is also very useful to learn, communicate, and teach mathematics" and I hope he'll elaborate.
Before the experts speak up, I would like to frame the questions I see raised by two aspects of mechanization of rigor — automated proof verification and formalization of proof, on the one hand, which I see as linked, and automated theorem proving on the other. The unreliability of human perceptions is one of the oldest themes in philosophy (the subject of a 14,000-word entry in the Stanford Encyclopedia of Philosophy). Special properties of deductive reasoning are supposed to confer immunity on mathematics from this unreliability, but since most actual proofs contain mistakes, we see that this is insufficient. Several alternative paradigms have been proposed to account for this discrepancy.
a. (Formalist) Actually existing proofs are interpreted as approximations to the ideal genuine proof, which is written in a formal and unambiguous axiomatic system. Most philosophers of mathematics take this seriously, some mathematicians resort to this claim when backed into a corner, others find it completely irrelevant (Reuben Hersh's Front and Back), and most just ignore or pretend to ignore the question.
b. (Conventional) I guess this comes closest to Wittgenstein's position: that proper procedure is based on the process of learning behavior within the form of life of the society of mathematicians. (This version can't account for the complexity of the kinds of behavior that are learned, nor for the complexity of interactions among mathematicians faced with a putative proof.)
c. (Dialectical) Reliability as well as meaning are established gradually through a process of proofs and refutations, as in Lakatos.
1. Automated proof verification, as I see it, is an attempt to realize the first paradigm. Its prospects for success can be measured both technically (does it meet the expectations of the program?) and epistemically (to what extent does the mathematical community adhere to the formalist paradigm?). But it also raises a serious question: why are we ready to trust machines more than we trust human beings? The underlying question is that of the different natures of machines, including possibly existing machines, and of human beings. One view is of machines as embodiments of human reason, without the imperfections of the latter. Another is of machines as improvements on human beings in many or all respects. Machines don't get tired, machines don't have prejudices, machines don't lie. A footnote from my book Mathematics without Apologies illustrates the ideological mindset of some proponents:
On pp. 326-327 (of his 2001 book Mechanizing Proof), Donald MacKenzie quotes a computer scientist who argued that mathematicians' failure to perform proofs formally, "by manipulating uninterpreted formulae accordingly [sic] to explicitly stated rules" proves that "mathematics today is still a discipline with a sizeable pre-scientific component, in which the spirit of the Middle Ages is allowed to linger on" [Dijkstra 1988], a philosopher (Peter Nidditch) who claimed that "in the whole literature of mathematics there is not a single valid proof in the logical sense," and another philosopher who doubted "that mathematics is an essentially human activity" [Teller 1980]. Already in 1829, Thomas Carlyle could complain of "the intellectual bias of our time," that "what cannot be investigated and understood mechanically, cannot be investigated and understood at all" and that "Intellect, the power man has of knowing and believing, is now nearly synonymous with Logic, or the mere power of arranging and communicating." [Carlyle 1829].
2. The need for reliability has several aspects. First, errors propagate. Here's an anecdote that involves me personally. Much of the theory of automorphic forms was built on the basis of the twisted version of Arthur's Trace formula, worked out by Clozel, Labesse, and Langlands over the course of the year 1983-84 in the so-called Morning Seminar at the IAS. This was recorded in mimeographed notes that were never published but were cited extensively. More than 25 years later, Clozel and Labesse, and then just Labesse, decided it was time to put the finishing touches on the notes. All the chapters except the last one were polished and ready to go, but Labesse was unable to complete one of the arguments in the final chapter, by Langlands. After six months he asked Langlands, who said he didn't remember the missing steps. Finally he asked Waldspurger, who took a look and concluded that the arguments Langlands suggests are probably "true" but seem impossible to establish with current knowledge.
Between July and November 2011 the whole theory, including the book Labesse and I had edited with Clozel and Ngô and just published, as well as all my work of the past 15 years with Taylor, and much else besides, was therefore missing a crucial foundation. Finally,
As expected, Waldspurger has cracked the problem. He has just sent me a 50-page TeX write-up to fill the gap. …
This sort of thing is a danger for the stability of the whole undertaking.
Then there is the argument that we have a moral duty to guarantee the correctness of our proofs. Our rare privilege of doing what interests us has to be earned; we are rewarded in proportion to the difficulty of what we are supposed to have achieved, and therefore cheaters and free-riders need to be weeded out. This ties in with the political point to be made below.
3. A third question is whether machine mathematics and human mathematics are qualitatively different.
Quotation from Stephanie Dick: "In developing theorem-proving software, practitioners endowed mathematical objects with computational properties - they were translated into dynamic, algorithmic, discrete things."
Question: is that what they were already?
"These tools endowed existing mathematical systems and ideas with new and different meaning."
Question: Did the meaning change?
Donald MacKenzie, interestingly, sees rigor and formalism as contrasting values:
A formal proof is a finite sequence of ‘well-formed’ (that is, to put it loosely, syntactically correct) formulae leading to the theorem, in which each formula is an axiom of the formal system being used or is derived from previous formulae by application of the system’s rules of logical inference. [...] Rigorous arguments, in contrast, are those arguments that are accepted by mathematicians (or other relevant specialists) as constituting mathematical proofs, but that are not formal proofs in the above sense.
(MacKenzie, “Computing and the Cultures of Proving”)
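MacKenzie's contrast can be made concrete with a proof assistant. The sketch below is in Lean, the system Buzzard and Massot work with; the example is my own illustration, not anything presented at the seminar. Every step of a formal proof must be justified by an axiom or a named rule or lemma of the system.

```lean
-- A formal proof in MacKenzie's sense: the claim is discharged by
-- citing an explicit lemma of the system (Nat.add_comm), and the
-- checker verifies each inference mechanically.
theorem sum_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A "rigorous argument" in his contrasting sense would simply say
-- "obvious by symmetry" — an appeal the checker does not accept.
```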
4. Are human flaws perhaps indispensable for mathematics?
The sociologist of science Harry Collins has argued that because machines are not social beings, much of human knowledge will be inaccessible to them as knowledge is inherently social in character. According to Collins, machines will only ever be capable of participating in knowledge-making in domains where human activity in them has been mechanized and rule bound already in advance. Game playing and mathematics were such domains and that is what made them viable choices for Artificial Intelligence research. (Quotation from Stephanie Dick, 2015 Harvard dissertation.)
My starting point in thinking about proof assistants is to ask whether Collins's image of mathematics is accurate. Is it an area of knowledge that is not inherently social? Has it been mechanized in advance? This ties in with my longstanding concern that this view seems to dominate in philosophy of mathematics, and that the sociology of mathematics (as well as the literary disciplines) engages with this view, rather than with the views expressed by mathematicians themselves, which definitely do not present mathematics as an already mechanized activity.
5. Finally, and most importantly, is the question of the purpose of mathematics. This is both an ethical and a political question. Proof assistants have been developed within computer science to verify that programs will function as intended. This clearly has important social as well as commercial implications, and no one can argue with its relevance.
The problem is the attempt to extend this model to mathematical research. If we subscribe to the political thesis that the purpose of human activity is to produce stuff (for the market, including the "free market of ideas"), then the position of mathematicians is only tenable if what we produce is free of flaws. But the implication, which (according to Kevin) is shared by scientists at Google (e.g., Christian Szegedy), is that eventually machines will be better at producing mathematical "stuff" than human beings. Given how Facebook has reformed the practice of friendship, I think we should be reluctant to give Google the authority to determine the purpose of mathematics.
An alternative view appears in Homo Ludens, where Johan Huizinga quotes a Dutch proverb: "It is not the marbles that matter, it's the game." To those of my colleagues who predict that computers will soon replace human mathematicians by virtue of their superior skill and reliability in proving theorems, I am inclined to respond that the goal of mathematics is to convert rigorous proofs to heuristics. The latter are, in turn, used to produce new rigorous proofs, a necessary input (but not the only one) for new heuristics.
(from Mathematics without Apologies)
Deep learning has transformed machine perception in the past five years. However, recognizing patterns is a crucial feature of intelligence in general. Here we give a short overview of how deep learning can be utilized for formal reasoning, especially for reasoning in large mathematical theories. The fact that pattern recognition capabilities are essential for these tasks has wider implications for other tasks like software synthesis and long term planning in complicated environments. Here I will give a short overview of some methods that leverage deep learning for such tasks.
Christian Szegedy is a research scientist at Google, working on deep learning for computer vision, including image recognition, object detection, and video analysis.
(Deep Learning for Formal Reasoning, from Re-Work website, 2015)
The notion that thinking is or can be a mechanical process is much older.
the human body is, manifestly, a machine disposed to the perpetuation of contemplation. (Leibniz, letter to Gackenholtz, 1701)
Nowhere does Leibniz say that the Chinese are themselves automata, though he does seem to share in the widespread view that in Chinese history conscious knowledge of the truths of reason has been lost, even though the souls of Chinese people remain rational simply to the extent that they are human. The result is a culture that expresses rationality in what it does, without reasoning about the process or the causes of its manifestations of reason. To this extent its mechanical reason is comparable to that of an adding machine or of the more complex logic-processing machines Leibniz hoped would be developed someday soon.
(Justin E. H. Smith, to appear).
DISCUSSION
The lie of the ideal so far has been the curse on reality; on account of it, mankind itself has become mendacious and false down to its most fundamental instincts—to the point of worshipping the opposite values of those which alone would guarantee its health, its future, the lofty right to its future.
(Nietzsche, Ecce Homo).
My main concern, once again, is to defend mathematics from an ideal that can be assimilated to the thesis that the purpose of human activity is to produce stuff, ultimately for the profit of those identified as decision-makers. Most discussions of mechanization of mathematics focus on the technical aspects: what is possible (see AlphaGo), what methods are the most effective or promising, and of course which can be implemented at minimal cost (of whatever kind). This kind of discussion is only legitimate on the basis of a shared understanding of the purpose of mathematics; in particular, a clear distinction between the mechanical verification of proofs that software works as promised and the mechanical verification of proofs in pure mathematics.
From the perspective of software, a proof is effectively a logical analysis of a physical object, or at least of an object (the program) that has the function of manipulation of physical objects (possibly electrons). Thus the proof really is nothing more than the series of instructions and any intuitions connected to the construction of the proof are incidental to its function.
The proof in mathematics, on the other hand, is much more than a list of instructions connected by logical operations. As Atiyah wrote,
I may think that I understand, but the proof is the check that I have understood, that’s all. It is the last stage in the operation—an ultimate check—but it isn’t the primary thing at all.
Here Atiyah agrees with a great many mathematicians who have expressed themselves on the topic, in subordinating the proof to understanding. Insofar as a mechanically verified proof adds nothing to the understanding — and even more a mechanical proof that cannot be understood in a human way — this is either a distraction from the human purpose of mathematics or a signal that mathematics is henceforth no longer to be considered a human activity. Here are some passages from Mathematics without Apologies, with the footnotes removed:
Voevodsky worries about the glue. Bombieri's "intuition" or Deligne's "understanding" do not offer the "absolute certainty" at which Descartes was aiming. In developing his "conceptual notation" [Begriffsschrift] for mathematics, Frege worked hard "to prevent anything intuitive [Anschauliches] from penetrating ... unnoticed." For "the working mathematician" understanding and intuition are real. The project of mechanization, which can be traced through Frege back to Leibniz, proposes to substitute a model of the real closer to Frege's, one that lacks the defects of subjectivity. "I never would have guessed," wrote Mike Shulman in 2011, "that the computerization of mathematics would be best carried out ... by an enrichment of mathematics" [my emphasis] along the lines of Voevodsky's univalent foundations project, to which Shulman is an active contributor.
I italicized the definite article in the Shulman quotation to emphasize how mechanization of mathematics, with or without human participation in the long term, is viewed in some circles as inevitable. To my mind, the fixation on mechanical proof checking is less interesting as a reminder that standards of proof evolve over time, which is how it's usually treated by philosophers and sociologists, than as a chapter in the increasing qualification of machines as sources of validation, to the detriment of human rivals. In Yevgeny Zamyatin's novel We, whose protagonist is a mathematician named D-503, the scourge of human subjectivity, what virtual reality pioneer Jaron Lanier calls "the unfathomable penumbra of meaning that distinguishes a word in natural language from a command in a computer program," is ultimately eliminated by an X-ray operation that turns people into machines.
Who finds machines more appealing than humans, and for what ends? Most mathematicians have long since recovered from the exhausting journey of the Foundations Crisis, but certification of mathematical knowledge is still a preoccupation among the "mainstream" philosophers of Mathematics, for whom participation by human mathematicians is optional. Convinced, like Hume, that no ought can follow from an is, philosophers of Mathematics skip straight ahead to the ought, disdaining the is as unfit for philosophical consideration. When I insist, like philosophers of mathematics (from my prephilosophical internal vantage point), on getting the is right, the "what" question, I will in fact be addressing an ought that overlaps with the ought of the philosophers of Mathematics but is by no means identical, nor is the overlap necessarily very substantial.
In an article entitled “Estimating the Cost of a Standard Library for a Mathematical Proof Checker,” Freek Wiedijk refers to a QED utopia, which from my perspective looks uncomfortably like Zamyatin's. So what about that "unfathomable penumbra," and does mathematics really need it? Anything "unfathomable" is probably difficult for philosophers to judge, but as a substitute, my article "Do Androids Prove Theorems in Their Sleep?" argued at length that human proofs have a narrative structure. They are built with turning points, key steps, acknowledgment and overcoming of obstacles, and so on. A proof tells a kind of story. (Dana Scott's preface to Wiedijk's collection illustrates another kind of proof, which is purely visual, in this case of the infinite descent proof of irrationality of the square root of 2. In that case a legend is needed to read the proof into the picture.)
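For readers without Scott's picture at hand, the descent argument it encodes can be summarized in a few lines; this is my own reconstruction of the standard proof, not a transcription of the figure.

```latex
% Infinite descent for the irrationality of $\sqrt{2}$:
% any solution in positive integers yields a strictly smaller one.
Suppose $\sqrt{2} = p/q$ with $p, q$ positive integers. Then
\[
  p^2 = 2q^2 ,
\]
so $p$ is even, say $p = 2r$; substituting gives $q^2 = 2r^2$.
Thus $(q, r)$ is another solution with $q < p$, and repeating the
step produces an infinite strictly decreasing sequence of positive
integers, which is impossible.
```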
References as in MacKenzie’s book