…when it is a question of … forms in which the proletarians carry out before our eyes their organization as a class, some are seized with real fear and others display a transcendental disdain.1
This week’s mildly apocalyptic title was deliberately chosen to divert readers’ attention, briefly, from the unrelenting and genuinely apocalyptic news that has been bombarding us with increasing urgency. The quotation marks are there to signal that I do not expect the events I am about to report to mark the end of human mathematics; rather, they are events that some naive readers might take as signs that the vision of the replacement of human mathematicians by machines, promoted by some leading general scientific publications, is on the verge of realization.
Solving the peer-review crisis: a modest proposal?
From a recent article in Inside Higher Ed:
academic publishing has long been a delicate system that operates—tenuously—on goodwill, in the form of comprehensive, unpaid article analyses from expert volunteers. But the pandemic has pushed this system to breaking, or close to it. With academics’ professional and personal lives disrupted in so many ways for years now, this kind of labor is increasingly harder to source: journal editors across fields say scholars are significantly less likely to accept article-review requests, if they respond at all, and (to a lesser degree but concerningly nonetheless) they are more likely to return reviews that are late or even rushed.2
“I just don’t know what we’re going to do. The model has to change,” wrote an editor of a prominent psychology journal.
The fact that “the vast majority of the academic workforce is now employed off the tenure track” is “at the very root of the peer-review crisis.” The article quotes a tweet by Ryan Cordell, associate professor of information sciences and English at the University of Illinois at Urbana-Champaign:
there’s no secret, massive reserve of refreshed & energized peer reviewers who could sweep in & fix things—it’s burnt out folks all the way down … the real answers are, like they are across labor sectors in 2022: hire more people, give them fairer contracts, reduce exploitative workloads.
Cordell and the article’s author missed an alternative “real answer”: toss the burnt-out adjunct proletarians on the street and replace them with machines. It remains to be seen how this would work in psychology, but mathematics may be on the verge of creating a new model.
A Bloomsday conversation
“We are still far from having an automated mathematical journal referee system,” Tom Hales wrote in 2014, “but close enough to propose this as a realistic research program.” For Dan Grayson, writing in 2017, replacing human referees by machines would come as a relief.
Grayson’s anticipatory expression of relief, already quoted here, referred specifically to proposals on mechanical verification based on Voevodsky’s Univalent Foundations. Five years later colleagues are more likely to predict compulsory automated verification of proofs written in Lean or another proof assistant. And I predict that you’ll soon be reading in the likes of Nature or Quanta about arithmetic combinatorialist Thomas Bloom’s remarkable recent exploit, the first hint that Grayson’s wish may someday be fulfilled.
It never hurts to talk about mathematics in an essay devoted to mathematics, and it’s a special pleasure to report on another of Bloom’s accomplishments, more conventionally mathematical: his proof of the conjecture of Erdős and Graham that, if A is any subset of the natural numbers (positive integers) with strictly positive density, then there is a finite subset of (distinct) elements a(1), … , a(n) in A such that
1/a(1) + 1/a(2) + … + 1/a(n) = 1
Euclid could have understood the conjecture and could also have proved that no such sum exists if A is the set of prime numbers — and so can you, entirely without words and with nothing more than a mental picture — and this already shows that the number of primes in the closed interval [1,x] grows more slowly than any positive multiple of its length as x goes to infinity.3 The colleagues who were discussing Bloom’s theorem, in a Paris café after a seminar on June 16, 2022,4 were all algebraic rather than analytic number theorists, and therefore not conversant with Bloom’s methods, but agreed that the result is not only striking in its simplicity but also would be difficult to believe if we hadn’t all heard that it had been proved.
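For readers who prefer symbols to mental pictures, here is one way the argument can be spelled out. I stress that this is a sketch in my own notation, not anything taken from Euclid or from Bloom’s paper:

```latex
% Sketch: no reciprocals of distinct primes can sum to 1.
% Suppose they did, and clear denominators by multiplying through
% by the product P of all the primes involved:
\[
  \frac{1}{p_1} + \cdots + \frac{1}{p_n} = 1
  \;\Longrightarrow\;
  \frac{P}{p_1} + \cdots + \frac{P}{p_n} = P,
  \qquad P = p_1 p_2 \cdots p_n .
\]
% Reduce modulo p_1: every term P/p_i with i > 1 is divisible by p_1,
% and so is P itself, forcing p_1 to divide P/p_1 = p_2 \cdots p_n --
% impossible, since the p_i are distinct primes. The contrapositive
% of Bloom's theorem then gives the primes density zero.
```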
That it had in fact been proved was announced six months ago. It was the manner of the proof’s confirmation that revived it as a topic of conversation: Bloom, working with Bhavik Mehta, had gone on to formalize the proof in Lean — in effect, refereeing his own theorem by machine.
Grayson is a prophet. Oblivious and smugly indifferent to the fact that yours truly, who was sitting at the same table, had over the past year self-published some 75,000 words on the meaning and ramifications of mechanization of mathematics, one colleague concluded from this news, and from the success of AlphaGo in overcoming his deeply rooted skepticism about the possibility of automating Go, that our generation would likely be the last not elbowed aside by mechanical competitors. Acknowledging the truism that it’s not enough for a machine to be able to prove true theorems — that the theorems also have to be interesting — he proceeded to share his conviction that interestingness could also be quantified and built into the robots5 destined to sit at our desks in the CNRS-funded labs of the future. To my objection that different communities of mathematicians find different things interesting, he expressed his transcendental disdain with a Gallic shrug.
Here’s where a kibitzer would butt in, if this text were a conversation in a Paris café, to remind everyone of the
Important distinction: Formal verification (FV) is not (necessarily) the same thing as Automated Proof Verification (APV, possibly obsolete), a (human) hands-off version of FV which in turn is very different from Automated Theorem Proving (ATP).
When this happens to you, I suggest you remind the kibitzer of the
Important empirical correlation: Conversations about FV often take an apocalyptic turn centering on the prospect of replacing humans by ATP.
And so it was in that Paris café on Bloomsday 2022.
First they come for the proletarians…
A diligent referee is supposed to verify that every symbol is in its place and that a sequence of deductions like this one — the largest block of Bloom’s manuscript not interrupted by natural language text — is fully justified. It’s fair to ask: why should our junior, often untenured colleagues ruin their eyes in the hunt for typos as well as unwarranted assertions rather than spending their time generating their own strings of formulas in the hope of contributing as authors rather than referees to the peer-review process?
In a very real sense mathematical peer review is performative: it contributes to bringing into being the institutions in whose names it is supposedly acting. Sociologist Christian Greiffenhagen, whom I quoted in the last installment, captures this well in his interviews with mathematicians:
despite acknowledging the problems of peer review, almost all of my interviewees were keen to keep it as “some sort of quality assurance” [91] or “hygienic practice” [67] without which “math will be a jungle”.6
And the process effectively creates the operative standards for correctness:
[A 2010 survey] found that only half of the editors expected referees should check the correctness of all the details of a proof. … Editors [interviewed by Greiffenhagen] wanted referees “to convince themselves, it’s most likely to be true” [9], so that they “are pretty confident that is really right” [19]. In that sense, referees were doing a “plausibility check” [84] “with the hope to catch mistakes” [27].7
So if in a published paper you see a list of calculations like the one copied above — or if you read the five pages of almost uninterrupted calculations that form the climax of one of my own (co-authored) papers now undergoing peer review — you may want to check them yourself.
“no one pretends that papers certified as correct are correct” [61] … “being published in a journal is not a certificate for correctness” [59].8
Greiffenhagen goes on to conclude that “it is ultimately the community as a whole that validates the correctness of results. As my interviewees put it: ‘But because of the way mathematics works, the community as a whole has also some responsibility of assessing mathematics because all mathematics is being assessed by others.’”
It should be stressed that the “crisis” in peer review described above refers to the difficulty of finding referees and not to their failure to sustain disciplinary standards. This is no less true in mathematics than in psychology; none of the editors interviewed by Greiffenhagen worried that the propensity of referees to let the occasional mistake slip through undermined the profession. Automated Reasoning specialist John Harrison, however, finds “[t]he fallibility of such a ‘social process,’” as described by Greiffenhagen’s informants, to be “well-known” and “thoroughly lamentable,” evidence of the “inadequacy of traditional peer review.”9 Harrison is speaking for what appears to be a rather broad cross-section of the computer science community, which (as I previously reported) has been kvetching consistently about the various inadequacies of mathematics since the dawn of the computer age. He also speaks for the philosophers who subscribe to the Central Dogma, as well as a small but growing minority of mathematicians, including but not limited to participants in the Xena Project. Reading comments by the latter, I find it difficult to disentangle their sense that peer review is in crisis from their manifest excitement at participating in a novel and explosively expanding research program that appeals to whatever in their personal experience led them to choose10 mathematics in the first place. The narrative arc of a plausible future history of this program will surely mark Bloom’s initiative as a turning point.
Practically everything that has been written about current interest in formalization has failed to untangle the cluster of factors that have contributed to its growth. In Part 2 I will attempt to hack through the confusion, paying due attention to the industrial stakes that I won’t have time to mention today. The rest of today’s essay will be devoted to explaining my conviction that Bloom’s self-refereeing exercise with Bhavik Mehta has unleashed an antinomy from which the Central Dogmatists will have difficulty extricating themselves.
Indeterminacy of translation cannot be finessed
Aficionados will know that computer science offers more than one route to formalization. Hales’s 2014 Bourbaki report Developments in formal proofs lists Mizar, Coq, Isabelle, and HOL as alternative systems; more recently Hales has switched to Lean, the Xena Project’s proof assistant of choice. Although these systems are based on very different logical foundations, there are, I’m told, not only rules of thumb but dictionaries and even theorems that tell you how to translate between proof assistants.
Thus Anthony Bordg was pleased to announce, quite recently, that he and his collaborators had succeeded in formalizing schemes in Isabelle/HOL, without dependent types. Moreover, his slides affirm that there are
In Isabelle none of the problems mentioned by Buzzard et al. with identity types in Lean.
This may or may not solve the equality issue raised by Buzzard and described in the last post. I can assure you that this is one race in which I do not have and never will have a pony of my own. But I am willing to believe that specialists will be able to teach their competing systems to talk to each other, or at least to determine the boundaries of intercommunicability.11
The robustness of human mathematics, meanwhile, is demonstrated by the ease with which it is translated from one language to another. There may be issues in choosing among alternative modern translations of Euclid or al-Khayyam, not to mention ancient Babylonian or Egyptian texts. But I have never heard complaints that the mathematical content of Gauss’s Disquisitiones Arithmeticae is lost in translation from its original Latin to living languages.12 Gerd Faltings was proud to publish his landmark paper on the Mordell Conjecture in German, but nowadays few mathematicians learn German unless it is their native language, and no one questions the reliability of the standard source for the proof, the English translation in the book Arithmetic Geometry edited by Cornell and Silverman.13
Mathematicians take this for granted but it’s worth taking a moment to ponder how remarkable it is that our material is so resilient. Compare the first verse of William Blake’s The Mental Traveler:
I travelled through a land of men,
A land of men and women too,
And heard and saw such dreadful things
As cold earth wanderers never knew.
with its rendition in Antonia Lloyd-Jones’s retranslation of the Polish version that appears in Olga Tokarczuk’s Drive Your Plow Over the Bones of the Dead:
As I wandered human countries,
Men’s and women’s shared domain,
Dreadful things did I encounter,
Horrors none on Earth had seen.
Whatever else can be said about the relation of the two texts I think you’ll agree that it hardly qualifies as equality, as discussed here two weeks ago in connection with Kevin Buzzard’s talk about Grothendieck.
Since the mental travel of a mathematical text from one language to the next is so unproblematic, it’s natural to assume, as everyone seems to, that when a proof is formalized, as Bloom and Mehta did with Bloom’s theorem on unit fractions, the original human text and its translation into the language of a proof assistant are equal and not merely equivalent for all intents and purposes. But a moment’s thought should convince you that this can’t possibly be right; at least it can’t be right if you agree with Buzzard that “Axiomatically modelling Grothendieck’s concept of equality seems to me to be an unsolved problem.”
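To see concretely what such a translation involves, here is a rough sketch of how the statement might be rendered in Lean. I stress that this is my own illustrative approximation and not the Bloom–Mehta code: the names `upperDensity` and `unitFractions_sketch` are hypothetical, the density definition is one plausible choice among several, and the actual development differs in its every detail.

```lean
import Mathlib

open scoped Classical

-- A hypothetical upper-density function, written only for illustration;
-- it is not the definition used in the actual Bloom–Mehta development.
noncomputable def upperDensity (A : Set ℕ) : ℝ :=
  Filter.limsup
    (fun N => (((Finset.range N).filter (· ∈ A)).card : ℝ) / N)
    Filter.atTop

-- One plausible phrasing of the Erdős–Graham statement: any set of
-- positive upper density contains finitely many elements (a Finset's
-- elements are distinct by construction) whose reciprocals sum to 1.
-- Recall Lean's convention that 1/0 = 0, so a stray 0 contributes nothing.
theorem unitFractions_sketch (A : Set ℕ) (hA : 0 < upperDensity A) :
    ∃ S : Finset ℕ, ↑S ⊆ A ∧ ∑ n ∈ S, (1 : ℚ) / n = 1 := by
  sorry -- placeholder: the substance of the real formalization goes here
```

Even if every choice in such a sketch were made correctly, nothing inside the proof assistant certifies that this string of symbols is the Erdős–Graham conjecture; that identification is a human act of reading, which is precisely the point at issue below.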
It’s a big and very old can of philosophical worms that we open when we try to pin down what is preserved in any kind of translation. Translators of poetry worry about balancing concern for the meter and rhyme scheme (if there is one) as well as mood, tone, and especially meaning, whatever that is. Only the last item on the list is relevant in most exercises in mathematical translation. That’s precisely the item that is most difficult for philosophers to delineate definitively, and yet it practically always works. A typical dodge — though not one I’ve seen invoked in connection with translation — is to fall back on the assumption of the Central Dogma that the human mathematical text is an approximation to an ideal formal text. But since it’s the human text that is mentally traveling from German to English or Latin to French or whatever, and not a formal version, which until recently was considered no less inaccessible than Plato’s forms, this is simply begging the question.
Again, I’m not denying that it works, or even that a procedure can be devised to transform human text systematically into Lean. But I do emphatically deny that this in any way resolves the epistemological conundrums that ostensibly motivate some of the most vocal of the formalizers, those who are most clearly attached to the Central Dogma. It literally makes no sense to claim that Bloom’s original proof and its formalization in Lean are the same thing, because it is impossible to define a procedure for verifying that a sequence of formulas is a formalization of a human proof. Even attempting to prove the validity of such a procedure would lead to paradox. Formalization would be superfluous if that weren’t the case.
Thus formalization inevitably requires a human decision to read a proposed formalization as a translation of the human proof: in other words an act of faith, like the acts of faith that sustain human mathematics in this first half of the 21st century — in the refereeing process, in the published documents, and in the seminar room. Formalization can only eliminate the epistemological antinomies if humans are excluded from the process entirely, and the machines are left to sort it out among themselves.
Meanwhile, now that Bloom and Mehta have produced a formal proof of Bloom’s theorem in Lean, what philosophically coherent reason could any journal give to justify publishing his devalued human proof?
What has been gained?
There’s a revealing passage at the end of Hales’s 2008 article Formal Proof:
…we may hope for advances in our understanding that will permit us someday to convert a printed page of textbook mathematics into machine-verified formal text in a matter of hours, rather than after a full week’s labor. As long as transcription from traditional proof into formal proof14 is based on human labor rather than automation, formalization remains an art rather than a science. Until that day of automation, we fall short of a practical understanding of the foundations of mathematics.
Even more revealing is Hales’s choice of epigraph from Francis Bacon:
…the mind itself [should] be from the very outset not left to take its own course, but guided at every step; and the business be done as if by machinery.
It’s understandable that entities that spend more time communicating in programming code with machines than in natural language with human beings should lament the latter alternative’s inadequacy. I wouldn’t dream of ruling out a future convergence of the community of mathematicians with a community of robots. When that happens, it will of course be the robots who determine what does and doesn’t qualify as “practical understanding.” Until then, though, I don’t see how the narrow, highly contestable, and largely outdated image of science still promoted by the dominant strands of academic philosophy of mathematics is preferable to art;15 “practical understanding” in the mathematical enterprise will remain a matter of human communication, with all its inadequacy.
In the meantime, then, while humans are still running the mathematical game, it’s fair to ask: what is gained by successful formalization?
Philosophically? The locus of the antinomies has just been displaced. But we know that the point of philosophy is not to find the right answers but rather to find the right questions. I think the robust survival of mathematical content under translation to robotic as well as human language is astonishing enough to merit philosophical reflection, on a par with Jody Azzouni’s question16
how the mathematician’s apparent attention and focus on the surface features of ordinary mathematical proof enables such widespread and nearly universal compliance17 with the formalization norm.
and is much more interesting than futile attempts to pin down the meaning of mathematics once and for all like a dead butterfly in a display case.
Karmically? I can appreciate that some practitioners feel a pressing need for emotional reassurance that their vocation is not a karmic dead end. But why should their personal fixations be a matter of general interest?
Mathematically? In spite of media attention to the formalization program, the case has yet to be made that it is of substantial value to mathematics as a whole, and not just to the (recognized) subdiscipline of automated proof verification.
The potential value of automated theorem proving or conjecture formation by machine learning is another matter; but here I yield the floor to the kibitzer.
So why does it matter?
To avoid misunderstandings, I want to make it clear that Thomas Bloom’s result on sums of unit fractions, confirming a conjecture of Erdős and Graham, is a genuine landmark, independently of its status as a possible harbinger of a radical transformation of the refereeing process.
Karl Marx, The Poverty of Philosophy, Chapter Two.
Colleen Flaherty, “The Peer-Review Crisis,” Inside Higher Ed, June 13, 2022.
This is an immediate consequence of the Prime Number Theorem. I don’t know whether or not the Prime Number Theorem is used at any stage in Bloom’s proof or in the results on which the proof relies. Update: Bloom has written to tell me that the Prime Number Theorem is not used in the proof, and also that it has not (yet?) been formalized in Lean.
Bloomsday, and on the 100th anniversary of the publication in Paris of Joyce’s Ulysses, no less.
It sounds like he was talking about top-down GOFAI rather than a machine-learning paradigm, but the talk around the café table was not at that level of detail. This may be several decades out of date, but it does pre-empt the inevitable and salutary debate on whether an AI on the model of AlphaGo — that just learns the rules of human mathematics, and is told that the goal is to prove theorems — will evolve an aesthetics radically different from ours.
C. Greiffenhagen, “Refereeing for Correctness: Degrees of Certainty in Mathematics,” manuscript 2022.
Ibid.
Ibid.
J. Harrison, “Formal proof—theory and practice,” Notices of the AMS, 55(11) (2008) 1395–1406.
Or that led mathematics to choose them, as Yuri I. Manin memorably put it.
In private communication Bordg has expressed some confidence in the possibility of automatic translation from Lean to Isabelle.
Though see note 6 at the bottom of p. 133 of The Shaping of Arithmetic after C.F. Gauss’s Disquisitiones Arithmeticae, edited by Goldstein, Schappacher, and Schwermer.
Spanish is spoken in many countries that do not all have the same conventions for translating mathematical terms from other languages. The word for “field,” as used in algebra, can be either campo or cuerpo, depending on the country. The word “smooth” can be translated as lisa or suave, depending on the mathematical context, but when I was writing notes for a course a few years ago it seemed that no one had chosen the appropriate translation for “smooth representation.” After consulting with colleagues from Spain, Peru, and Argentina, we settled on what may or may not be established as the official translation.
It’s telling that no one seems to be devoting much energy to translation in the other direction.
Whether mathematics is or should be an art or a science or both has been a matter of debate for well over a millennium, even as the meanings of the words themselves — and their translations! — were repeatedly altered beyond recognition. Now that some of this history has been traced in Chapters 8 and 10 of Mathematics without Apologies, I would like to believe that mathematicians will approach the subject with a bit more scholarly self-awareness (and fewer Gallic shrugs, now that my book has been translated into French).
In “Why Do Informal Proofs Conform to Formal Norms?,” Foundations of Science, 14 (2009) 9–26.
What Azzouni calls “compliance” is something observed a posteriori and is not to be attributed to any intention on the authors’ part.