Formal proof and epistemic value
A guest post by (philosopher) Jim Weatherall and (mathematician) Jesse Wolfson
Just under a year ago Jesse Wolfson unexpectedly sent me a preprint that he had written with James Owen Weatherall entitled
Correctness, Artificial Intelligence, and the Epistemic Value of Mathematical Proof
That same day I wrote back to invite the two authors to contribute a guest post of roughly 3000 words, summarizing their paper (which is currently available at the University of Pittsburgh philosophy of science archive). Both authors are at the University of California, Irvine: Weatherall is in Logic and Philosophy of Science and Wolfson is a mathematician. I was actually familiar with some of Wolfson’s papers but had no idea he was also thinking seriously about philosophy.
The Weatherall-Wolfson paper is a rare example of a sustained argument that not only disagrees with the Central Dogma of Mathematical Formalism but actually presents a radical alternative. For this reason, and because it is also a rare example of a collaboration between a mathematician active in research and a philosopher,1 I believe their paper is of historic importance. I am grateful to the authors for taking the time to make a key part of their argument available to readers of Silicon Reckoner. Those readers who think they are too busy to read even this edited excerpt are encouraged to ponder the following sentence, which appears near the beginning of their text.
Formal correctness tracks mathematical correctness because formal correctness was designed to do precisely that.
After a few minutes of pondering, any reader who is seriously interested in philosophy of mathematics will want to read the rest of what Weatherall and Wolfson have to say about formal correctness.
The post below is an edited excerpt from a longer piece, which Michael generously invited us to contribute here given the Substack's long-running interest in the "Central Dogma" and its role in formalization and mechanization projects around mathematics.
That a mathematical proof be correct is neither sufficient nor necessary for it to have epistemic value. By “correct,” here, for the sake of the argument, we mean that the proof can be formalized, i.e., translated into a formal proof system. (Below, we will call this sense of correctness “formal correctness” and distinguish it from “mathematical correctness”.) By having “epistemic value”, we mean being well-suited to play the various epistemic roles that proof does, and should, play in mathematical practice. In other words, epistemic value is what makes proofs “work”, as proofs, for the purposes that mathematicians seek to attain. We will say more below about what these roles are, and about how we see the relationship between formal and informal proofs.
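To make this concrete, here is a minimal illustration of what a formal proof looks like in practice: two tiny theorems rendered in the Lean proof assistant (one of the systems mentioned in the comments below). In this setting, formal correctness is simply the fact that a proof term passes the kernel's mechanical check against the rules of the underlying type theory; the snippet is purely illustrative, not drawn from the paper.

```lean
-- Purely illustrative: two very small formal proofs in Lean 4.
-- "Formal correctness" here just means the kernel accepts each proof term.

-- A numerical identity, checked by computation (`rfl` says both sides
-- reduce to the same value).
theorem two_plus_two : 2 + 2 = 4 := rfl

-- Commutativity of addition on the natural numbers, discharged by a lemma
-- already present in Lean's core library.
theorem add_comm_example (a b : Nat) : a + b = b + a := Nat.add_comm a b
```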
There are two related strands of literature that form the context for our arguments, both of which will be familiar to Michael’s readers. One is the philosophical literature on the Standard View of mathematical proof and rigor – what Michael has called “the Central Dogma” – which holds that ordinary “informal” proofs, as generated by working mathematicians and published in high-quality mathematics journals, should be seen as describing or encoding a formal proof that could, at least in principle, be generated from the informal proof. The other strand of literature – or at least, conversation – has appeared mostly in talks and online forums, including this one, with some cross-over to mathematics journals. It concerns the speculative possibility of AI systems that could generate correct proofs of theorems beyond the ken of human mathematicians, much as AI systems have exceeded human abilities in games like Go and chess.
Our main claim may give you pause. Of course mathematical proofs should be correct! And if a proof cannot be formalized, surely that is a signal that something odd or surprising is going on. We grant this—as, we think, would most discussants of the Standard View. But we want to offer what we believe is a deeper critique. On our view, while it seems to be generally true that mathematically correct informal proofs can be made formally correct, this fact is incidental and secondary to the epistemic value of informal proof. Proofs are formally correct, when they are, not because correctness is the normative ideal governing mathematicians' activities, nor because correctness underwrites the epistemic value of those activities. Instead, informal proofs are formally correct because logicians have developed a rich and successful mathematical theory of mathematical practice intended to capture various aspects of mathematical argument. Formal correctness tracks mathematical correctness because formal correctness was designed to do precisely that. In other words, the formal systems in which formal correctness is established have been invented and tuned with the goal of making existing informal proof methods formalizable. Formalizability speaks to the success of this activity in mathematical logic—but it is irrelevant to the question of what makes the informal proofs epistemically valuable.
Although our motivations for criticizing the Standard View are similar to those of previous authors, as this substack has documented, the issue has become increasingly urgent. We are in a period of rapid advances in computer-assisted theorem proving and proof verification. Those working in these areas often invoke the ideology of the Standard View to explain and justify their goals, and this has led to some striking soothsaying to the effect that, in the future, one should expect AI mathematicians to outpace human mathematicians at proving theorems, in much the same way that AI game players have outpaced their human competitors in Go, chess, Jeopardy, and other games. Many groups have already devoted substantial resources to realizing this vision of future mathematics; and some very prominent mathematicians have become boosters for AI-driven mathematics.
Who are we to gainsay the future value of AI for mathematics? We, too, are impressed by the rate of progress in recent years, and we suspect that computer assistance will change many aspects of mathematical life—just as Mathematica and Matlab changed the lives of undergraduate calculus students and many applied mathematicians a generation ago. But if one accepts our main views, then the mere generation of correct proofs is not, in itself, a contribution to mathematics, at least not in the absence of several other necessary conditions. The broader mathematical community needs to be clear about what these sorts of activities can and cannot contribute, and what risks are posed to more traditional mathematical activities by reorienting towards machine-driven methods.
Fruitful Errors
The argument that formal correctness is not sufficient for a proof to have epistemic value is familiar: not every formal proof is interesting. Virtually everyone writing on this topic would agree on that. But deeper reflection on what makes a proof interesting also reveals why formal correctness cannot be necessary, either.
Ultimately, what makes a proof interesting is the way it fits into a broader network of mathematical and scientific activities. An interesting proof may be one that establishes a surprising proposition, where what makes that proposition surprising has to do with what others have expected to be true in some mathematical domain. Or it may be that the proposition is unsurprising, but it has turned out to be surprisingly difficult to prove, and so it will be interesting to see how the nut was finally cracked. Very often the most interesting proofs involve essentially new methods or ideas, which may spark new conjectures or make other proofs possible.
Perhaps most importantly of all, interesting proofs contribute to what Fields Medalist William Thurston famously described as mathematical understanding in his essay “On Proof and Progress in Mathematics.”
We suggest that this sort of understanding is ultimately what valuable proofs contribute to mathematics. This, far more than their roles in justification, is what makes proofs epistemically valuable. And formal correctness is not necessary for a proof attempt to make this sort of contribution.
To see this, consider how fruitful erroneous proof-attempts can be. (We give other examples in the paper.) In 1895, Poincaré released the first installment of his monumental paper Analysis Situs. His stated goal in this paper was to launch a new field of mathematics, now known as “algebraic topology”. To say that he succeeded is to do him scant justice: algebraic topology grew throughout the twentieth century and into the present to become one of the major fields of mathematical research, with influences and cross-pollination across mathematics, and across the sciences more generally. Today it finds significant applications in condensed matter physics, quantum chemistry, and contemporary data science, to name just a few. Analysis Situs is also riddled with errors, which led Poincaré to produce five additional “supplements.” Even these did not suffice to fully clear up the issues, either to today's sensibilities or to those of his contemporaries.
It is hard to square the success and impact of these papers with the formalist view of mathematics. Standard formalist responses have alternated between treating them as anathema and ignoring them altogether. Rather than rehash this discussion, we want to focus on the fecundity of Poincaré's mistakes, most notably his false assertion that every homology 3-sphere is homeomorphic to the 3-sphere, which led to his discovery of the dodecahedral space and the formulation of the “Poincaré conjecture”. This has been one of the most fertile mathematical conjectures of the last century and a quarter, leading to Smale's work on h-cobordism (and the proof of the conjecture in dimension ≥ 5), Freedman's work on the disc-embedding theorem and the classification of simply connected (topological) 4-manifolds, and Thurston's geometrization conjecture/Perelman's theorem classifying the (piecewise) uniform geometries on closed 3-manifolds.
For those keeping score at home, these results alone account for three Fields Medals awarded, one Fields Medal declined, and one ($1,000,000) Millennium Prize declined (the only one that could have been awarded to date). Notably, Poincaré's errors are only the first in a series of major errors which have given rise to large swaths of contemporary algebraic topology and algebraic geometry, and which continue to be the site of major open problems and advances.
We mention only Lefschetz's work on the topology of algebraic varieties, which led Hodge to initiate what we now call “Hodge theory” and to formulate the “Hodge Conjecture” (an open Millennium Prize problem). A classic quip, repeated by Griffiths in his biography of Lefschetz (p. 289), asserts that
“Lefschetz never stated a false theorem nor gave a correct proof.”
The Relationship Between Logic and Mathematics
If we are right that formal correctness is neither necessary nor sufficient for epistemic value in mathematics, this raises the question of whether formal correctness bears any relationship to the epistemic value of mathematical proof at all. We suggest that the answer is “yes”, but the situation is subtle. Our answer goes via a different notion of correctness, which might be called “mathematical correctness”. This is the standard of correctness for mathematical proof accepted by the mathematical community. Advocates for the Standard View would maintain that formal correctness is equivalent to mathematical correctness, or at least necessary and sufficient for mathematical correctness. In doing so, they extend a long tradition of taking logic to provide a normative metaphysical and perhaps epistemic foundation for mathematics—or, in its most totalizing form, for all human knowledge. This picture is one on which the rules for manipulating formal syntax are taken to be fundamental laws of reason, and logical axioms are necessary truths prior to other forms of knowledge.
We endorse a different view of the relationship between logic and mathematics. (We think this view has a folk status in philosophy of mathematics, but some authors, such as Penelope Maddy, have endorsed similar ideas explicitly.) On our view, mathematical logic is best seen as a mathematical theory, or family of mathematical theories, of mathematical practice. It bears a relationship to ordinary mathematics analogous to the relationship that a mathematical theory in science bears to some real-world phenomenon. It captures some aspects of mathematics in precise mathematical terms. Among other things, it provides a mathematical framework for thinking about how mathematicians reason, and it offers a precise account of mathematical correctness—namely, formal correctness.
From this point of view, mathematical logic is a valuable tool for thinking about mathematics. But to take formal correctness as the standard against which mathematical correctness is evaluated, or worse as equivalent to mathematical correctness, is to put the cart before the horse. It is analogous to saying that the reason the earth follows its orbit around the sun is that our best theory of gravitation says it ought to. Or perhaps better, to say that a proof is mathematically incorrect because it is not formalizable would be like saying that Mercury has erred in some way because its perihelion advances a few dozen arcseconds each century relative to its Newtonian orbit.
Why hold the view we propose? The first reason is that mathematical logic clearly is a mathematical theory of mathematical practice. Classical first-order logic captures many aspects of mathematical reasoning. Second-order logic captures common reasoning in fields like point-set topology that typically use higher-order quantification. And so on. All of these are mathematical theories. So they are mathematical theories of mathematical practice.
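A small example (purely illustrative, not from the paper) of how different formal systems capture the same piece of practice with different degrees of fidelity: the informal principle of mathematical induction is a single axiom in second-order logic, whereas first-order Peano arithmetic can only approximate it by an infinite schema, one instance for each formula.

```latex
% Second-order induction: one axiom, quantifying over all properties P.
\[ \forall P \bigl[\, P(0) \wedge \forall n\,\bigl(P(n) \rightarrow P(n+1)\bigr) \rightarrow \forall n\, P(n) \,\bigr] \]

% First-order induction: an axiom schema, one instance per formula \varphi(x).
\[ \varphi(0) \wedge \forall n\,\bigl(\varphi(n) \rightarrow \varphi(n+1)\bigr) \rightarrow \forall n\, \varphi(n) \]
```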
So far so good. But the important question is whether mathematical logic, and by extension formal correctness, is also more than this. Maybe it is mathematical reasoning with certain failure modes removed. Or maybe it offers a higher, purer, or more ideal standard of correctness than that demanded by working mathematicians. But we would reject this. Yes, mathematical logic describes a certain kind of ideal of reasoning, but that is because it’s a highly idealized theory of mathematical reasoning: various bits of mathematical logic describe more or less simplified versions of mathematical reasoning, with domains of application in which their descriptions are more or less accurate, and others where they break down – just like other mathematical theories of things we care about.
We conclude with a final argument, which is somewhat speculative but which seems to us to accurately reflect the situation. Perhaps it is a Rorschach test of sorts. Suppose there were a result that could not be formalized in any known logical system, but which the mathematical community nonetheless came to accept as mathematically correct. This may seem implausible, but suppose it happened. What should we make of that situation? Advocates of the Standard View would have to say that the mathematical community had gone off the rails: that it had accepted an incorrect result, where incorrectness is determined by failure of formalizability. But that is not what we would expect mathematicians to actually do. Instead, we think such a situation would spark a great deal of excitement among logicians, and important new work trying to find a new formal system in which the result could be formalized. In other words, if logic failed to capture mathematical correctness, mathematicians would develop new logic, not change mathematics.
The Automation of Mathematics Revisited
As we said above, one motivation for our arguments is to reflect on what automated theorem provers can contribute to mathematics—and to push back against the idea that automating mathematics is certain, or even likely, to significantly advance mathematics.
If one thinks that formal correctness is the guiding virtue of mathematical practice – if one believes it is sufficient, or even just necessary, for epistemic value – then it would follow that a machine that could reliably generate correct proofs would generate epistemic value. If, on the other hand, correctness is neither necessary nor sufficient for a proof to have epistemic value, it is much less clear what reliably generating correct proofs can accomplish. It would seem that mathematicians need more than this—and also, that providing correct proofs is not enough.
Where does that leave us?
First, there is a sharp distinction to draw between computational proof generation and proof assistance. Using computers to help prove theorems that have been stated by mathematicians, which assert things that mathematicians have prior interest in establishing, is surely valuable.
None of this is problematic, as long as the crucial role that human mathematicians play in posing questions and extracting new understanding from answering them remains in focus. For that to be the case, the inputs – the conjectures stated, the definitions used, etc. – must come from mathematicians who are seeking to extend the body of existing mathematical knowledge, fully integrated with the rest of that body of knowledge; and the outputs, namely proofs of theorems, must be such that they can be read and understood by human mathematicians, and such that they provide the kind of insight and understanding that other, non-assisted work does.
More troubling, though, is the prospect that generative AI could simulate the sorts of inputs and outputs just described. One could even imagine that such a system could learn to accurately predict what sorts of definitions, conjectures, and theorems would be judged to be interesting by other mathematicians. In such cases, one might think that there is no loss in full automation, since the automated system would be generating “interesting” results in a form that is comprehensible to humans. But we think that even in this case, something crucial would be missing. The sort of scenario we are imagining, if truly devoid of human feedback, would be static, whereas mathematics is dynamically evolving in response to new ideas, techniques, applications, and results. We would lose a process where humans discover structures and relationships, leading to new ideas about how to investigate those structures and what significance they have for other things of prior interest, leading to yet more ideas about those structures and new ones. This is an essentially creative activity, and a social one that stems from engagement with other mathematicians' work.

One can entertain more speculative possibilities, which perhaps could play more constructive roles. For instance, consider a customized AI system, trained on both the body of mathematical knowledge and a particular mathematician’s style, previous work, ways of thinking, and preferred questions. That system, engaged in a dynamic conversation, could play the role of a close collaborator or graduate student, proposing ideas, translating others' work, and drawing connections that the mathematician can then respond to. Or consider a situation where an AI system develops a body of new theorems, but then also generates pedagogical materials – textbooks, video lectures, question and answer sessions, and the like – to teach the new work to humans, much as a human mathematician would. (Thank you to Tim Gowers for pushing us on this sort of possibility.) It is easy to see how in this sort of scenario, an AI assistant could be helpful, just as a good teacher or student could be helpful. Many mathematicians engage with others' work, at least at some points in their career, through proxies of just the sort that an AI system could presumably simulate.
Again, though, we do not dispute that AI or other automated systems can be useful to mathematicians. Of course they can be. Our point is that no degree of simulation of the processes of mathematics via generation of the text and symbols that mathematicians produce to communicate with one another about mathematical ideas can play the cognitive role that reasoning about mathematics can play. This is true even if the simulated material is indistinguishable from the sorts of text that mathematicians themselves would produce. At best, what one can hope for is text that generates, in the mind of a mathematician who reads it, new interpretive and creative processes analogous to those that reading work by another mathematician would produce—thereby stimulating new ideas. But before one declares that this is enough, note that again the contribution to mathematics is what the mathematician produces when attempting to engage with the AI-generated text, and not the text itself, which is essentially sterile.
…who is also a physicist and mathematician.

Re the idea that logical formalisms are in the end the servants of mathematical truth rather than its masters, and that to the extent we figure out correct things that our logical formalisms don't apply to, that's a problem for the formalisms, not for the mathematics:
I think a nice test case is notions of complexity. Here the actual notion we're interested in is not "correctness" but "hardness." And we have this nice, well-worked-out theory of hardness (P, NP, #P, etc., etc.) that descends from Turing machines. But then over time we realize that lots of problems that are hard in this sense, we routinely solve anyway, like traveling salesman! I.e., if in practice we get solutions we think of as "with very high probability very close to optimal," we are fine. And actually working out a formalism that well captures this notion of "hardness" is, I think, an ongoing project in CS theory where there's no consensus answer yet! But it very much fits your narrative — we are out here actually doing math, and the formalism tries to keep up and remain sufficiently descriptive/supportive of the practice.
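To make the point concrete, here is a purely illustrative sketch (with made-up coordinates) of the kind of "good enough in practice" shortcut the comment alludes to: the travelling salesman problem is NP-hard in the formal sense, yet a simple greedy heuristic often returns a perfectly serviceable tour, with no optimality guarantee.

```python
import math

def nearest_neighbour_tour(points):
    """Greedy nearest-neighbour heuristic for Euclidean TSP.

    TSP is NP-hard in the formal theory of hardness, yet simple heuristics
    like this one routinely produce tours that are good enough in practice.
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    unvisited = set(range(1, len(points)))
    tour = [0]  # start (arbitrarily) at the first city
    while unvisited:
        last = tour[-1]
        nearest = min(unvisited, key=lambda j: dist(points[last], points[j]))
        unvisited.remove(nearest)
        tour.append(nearest)
    return tour

# A made-up five-city instance; the result is heuristic, not provably optimal.
cities = [(0, 0), (1, 5), (5, 2), (6, 6), (8, 3)]
print(nearest_neighbour_tour(cities))  # -> [0, 1, 2, 4, 3] for this instance
```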
1) A formal proof is both a mathematical object and a physical object (for example, living in the memory of my computer when I run a proof assistant). To relate this to an example from another comment, there is clearly a much better correspondence here than between a Euclidean rectangle and a physical rectangle: physical object and mathematical object are actually the same in this case!
2) The Central Dogma is actually a Central Thesis, similar to the Church-Turing thesis: Every rigorous mathematical proof is formalisable. It is a valuable contribution of this blog and this post to clearly express this thesis.
I am very much a die-hard Platonist, because I think only Platonism is consistent with both 1) and 2). And of course I think that the existence of a mathematical proof is the ultimate arbiter, not the existence of a formal proof: If I have a mathematical argument that I believe to be correct, then turning it into a formal proof helps me because now I have turned the mathematical argument into a mathematical object. But the "correctness" of this mathematical object of course also relies on an underlying mathematical argument, that explicitly spells out what "correctness of this mathematical object" means.
But a formal proof is also a mathematical proof. I would say this is the real "Standard View". I don't think it is disqualified from counting as such just because it may be too big for me to understand: there are many mathematical proofs I don't understand, because I don't have (or am not willing to make) the time to understand them.
"Formalisable" itself is also a notion that evolves, as pointed out in this post. I don't think any serious practitioner of formal proof would deny that. Theorem provers such as Isabelle or Lean have been developed because they represent better notions of "formalisable" than for example first-order logic or second-order logic. I am developing abstraction logic (http://abstractionlogic.com), because I think it provides a better notion of "formalisable" than first-order logic, second-order logic, or dependent type theory.
A machine is not a device for restricting freedom. A washing machine actually expands my freedom, at least if I cannot delegate the task of doing my dirty laundry to somebody else. But of course we must make sure that washing machines are just that, and not, for example, an appliance for surveillance.
So the question is not if formal proof will become essential to mathematics, or if it should become essential to mathematics. Of course it will, and of course it should. It IS mathematics, after all! The real question is how we can ensure we stay free as creative human beings as this process unfolds.