Dear Alberto Naimo and Walter Dean, a few weeks ago I asked (through your emails) if I am allowed to reprint a version of this contribution for the journal of the German Math. Association. I am still very much interested in a "yes" from your side. Thank you, Gudrun
Dear Gudrun, thanks for your kind reminder. We replied to you privately by email. Best, Alberto
As a logician, I've long been trying to convince people who think that superintelligent AIs will be magic that inherent computational complexity exists. No, AIs won't have godlike capabilities relative to us.
But I don't really understand how you get from that fact to the idea that they won't massively change how proofs are found by being substantially faster at proving things than us (or at least able to bring more computational resources to bear). I mean, sure, we are more efficient than brute-force proof search via unification/resolution, but at some level, when we search for an idea, we are engaged in a type of brute-force search, and there is no particular reason to think that AI won't eventually match that.
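For concreteness, here is a toy sketch of what brute-force resolution-style proof search looks like at the propositional level (the clause encoding, function names, and example are just my own illustration, not anyone's actual prover):

```python
from itertools import combinations

def resolve(c1, c2):
    """Return all resolvents of two clauses (clauses are frozensets of signed int literals)."""
    resolvents = set()
    for lit in c1:
        if -lit in c2:
            resolvents.add(frozenset((c1 - {lit}) | (c2 - {-lit})))
    return resolvents

def refutes(clauses):
    """Brute-force resolution: saturate the clause set and look for the empty clause."""
    clauses = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:            # empty clause derived: the set is unsatisfiable
                    return True
                if r not in clauses:
                    new.add(r)
        if not new:                  # saturated without the empty clause: satisfiable
            return False
        clauses |= new

# Example: {p}, {~p or q}, {~q} is unsatisfiable (literals encoded as +/- integers).
print(refutes([frozenset({1}), frozenset({-1, 2}), frozenset({-2})]))  # True
```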
Sadly, imo, that's kinda the most depressing outcome. It will still take lots of effort to solve hard problems but we will all be in the situation where we can't even pretend that we are adding much value by trying to prove things ourselves.
I mean, if I felt that, in computability theory, I'd get the theorems I'm interested in proved faster by doing someone else's laundry so they could take the time to solve them for me, I wouldn't be a mathematician. Whether or not you want to say the machine's proof counts as mathematical understanding, it counts as far as my reasons for doing math are concerned. I know some people would do math even if it was all already known, just to rediscover it themselves. I wouldn't; indeed, I'm not interested unless I think I'm at least sort of advancing the frontier of knowledge in a meaningful way.
Obviously, other mathematicians may be brighter than me, but I need to believe that I can at least learn a sliver of the mathematical universe well enough that my contributions advance knowledge there more than interacting with me slows it down -- and I worry we will hit that point with AI, not in the next few years but maybe in 20.
To people who see doing math like playing music, as a kind of purely expressive/artistic activity, maybe that will be fine. But to those of us who do math exactly because we don't like that kind of thing, it will be sad. Though maybe VR will come along and give us something else fun to do.
Thanks, Peter, for your comment which also touches on many interesting points. But as we explain in the full version of our paper (now published here: https://academic.oup.com/philmat/advance-article/doi/10.1093/philmat/nkaf005/8182666?searchresult=1), our aim is not to argue that mathematics will -- or should -- remain an exclusively human endeavor. On the contrary, our goal is to explore whether certain problems can be solved by automated systems, and more specifically, whether the use of machines is necessary for solving them. We believe that logical tools -- and indeed ones from computability theory -- can play a crucial role in guiding this type of analysis. In particular, we are interested in whether the logical form of problems (e.g. as measured relative to the arithmetical hierarchy) is relevant to such an investigation, and whether such structure bears on the proof techniques which may be applicable. In this sense, our work can be seen as one of categorization and theoretical analysis: we seek to understand whether it is possible to delineate formal and conceptual boundaries between different types of problems and proofs -- and, ultimately, to identify where human-specific contributions lie.
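To give a sense of what "logical form as measured relative to the arithmetical hierarchy" amounts to, here are two standard textbook classifications (illustrative examples only, not ones singled out in the paper):

```latex
% Goldbach's conjecture is \Pi^0_1: one unbounded universal quantifier over a
% decidable (bounded-quantifier) matrix.
\forall n\,\bigl(\,2 \mid n \wedge n > 2 \;\rightarrow\; \exists p, q \le n\,(\mathrm{Prime}(p) \wedge \mathrm{Prime}(q) \wedge p + q = n)\,\bigr)

% The twin prime conjecture is \Pi^0_2: a universal quantifier followed by an
% unbounded existential quantifier.
\forall n\,\exists p\,\bigl(\,p > n \wedge \mathrm{Prime}(p) \wedge \mathrm{Prime}(p + 2)\,\bigr)
```

The relevance to proof techniques is that a false \Pi^0_1 statement can be refuted by exhibiting and checking a single finite counterexample, whereas nothing analogous is available one level up at \Pi^0_2.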
Thanks for the reply. I understood you were saying that, but note that you still need it to be true that having a human involved in proving results is reasonably useful. If a machine on its own can simply do better (even at producing the kind of proofs we find informative), that is a very different sort of situation. Of course it's not going to be able to just do everything in a flash, but if it's true that an AI can do as well as the best human mathematicians could if they specialized in that area/problem, and it can do this at a reasonably priced computational cost, why would you bother to have the person involved at all? Basically, the question is whether you could answer mathematical questions more efficiently by getting a different job and using the money to buy computation, rather than by trying to help prove them yourself.
I think whether that will happen is still a genuinely open question. Sure, modern LLMs are pretty amazing, but the training stages are hugely computationally expensive. Depending on how the benefits of a bigger working memory, greater recall, etc. weigh against particularized experience (the way you learn about specific features of a problem as you work on it over weeks or months), it's not totally clear to me how that will turn out. In my more pessimistic moods I worry that, if you want to know the answer to a given mathematical hypothesis, it will make more sense to go get some other job (people will still want people to do things for them, even in a world full of AI) and use the money to buy computer time for an AI to address the problem on its own.
---
Just regarding computability, I'm not sure how practically relevant that is going to be. It's great for theoretical results, but in practice we may need to look elsewhere, or even develop a whole new branch of complexity theory.
I have not yet completely read the article by the philosophers, but I want to stress two points:
-- The idea of referring to complexity arguments is already in Manin's article ''Kolmogorov's complexity and ....'' see ARXiv ...
-- I doubt a mixture of logic and complexity arguments could answer what was called by Michael (thanks to him) ''Kantor's problem'' -- see an older blog post.
Best wishes
Jean-Michel KANTOR
I would like to be convinced by this argument but am not. We mathematicians prove in a world where proving is hard, even — as the authors point out — in decidable theories like Presburger arithmetic. However, on a daily basis we (try to) find some passage through formally undecidable terrain. A central question seems to be: is there some formal reason why we humans can navigate this terrain and computers cannot?
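As background for the claim that proving is hard even in decidable theories (a standard result, not something asserted in the comment itself): Fischer and Rabin showed a doubly exponential worst-case lower bound for deciding Presburger arithmetic,

```latex
% Fischer--Rabin (1974): for some fixed c > 0, every decision procedure for
% Presburger arithmetic requires time at least
T(n) \;\ge\; 2^{\,2^{\,c n}} \quad \text{for infinitely many input lengths } n .
```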
The authors seem to suggest that progress has so far either been on a) situations where brute force is possible or b) situations which have better formal properties (e.g. in their discussion of AlphaGeometry). AlphaProof solved questions in number theory at IMO level. I agree this isn’t exactly what we mathematicians do, but can one convincingly say that IMO number theory problems are qualitatively different to what Michael and I do for our daily bread?
I agree that much of the press coverage is full of hype. I agree that so far no convincing example of AI alone doing research level maths has occurred. I would like there to be a formal reason why this isn’t possible, but I just don’t see it. To quote someone mentioned often on this blog: “I see no obstruction.”
Thanks, Geordie, for your comment which touches on many interesting points. Let me clarify, however, that our paper does not aim to answer the question: “Is there a formal reason why humans can navigate this terrain while computers cannot?” Rather, our goal is more modest. We ask whether formal methods can help us to analyze what AI systems are capable of in mathematics. In particular, we are interested in identifying whether there are any common features among current (and foreseeable) applications of automated systems to resolve recognized open questions. This inquiry leads to broader questions, not limited to AI but relevant to mathematical practice as a whole:
1. What makes an open problem “significant” to mathematicians? Is this also a question which can be analyzed, or at least approached, by formal means?
2. What kinds of proofs do we expect from automated systems? Should they produce informal proofs (like those found in human-written articles or textbooks), or formal ones (given that AIs are a kind of formal system themselves)?
My co-author spoke to 1. in his reply. But let me also say that another interesting question -- related to the issue of external realignment we mentioned in the post -- is this: will we reach a point where we begin to defer to automated systems in deciding which conjectures are worth working on? Or is this something which will always require human intervention?
In regard to 2., we focused in the paper on cases in which automated systems both successfully resolved a recognized open question and delivered formal (and formally verifiable) proofs. But in the case of more recent results for math competition problems (to which we alluded briefly), do we expect the same standards of proof? For instance, if only a numerical answer is required -- as in some competition questions and current benchmarks -- are we simply asking for an informal explanation of why an answer is correct? If so, will this change the standards of justification which have traditionally been employed in mathematics? (This is a question which will naturally interest philosophers.) How do ongoing attempts at auto-formalization bear on this? And might these themselves give rise to new obstructions, formal or otherwise?
The possibility of an obstruction to artificial intelligence doing research-level mathematics on its own is indeed a central question we hoped to bring into better focus. But in order to address this systematically, we still think it is important to ask whether "doing research-level maths" is being understood as resolving individual open questions or as the general capacity to resolve an open-ended class of conjectures which arise in the ongoing practice of mathematics.
If it's the former, we agree there is "no obstruction". For instance, an automated system might simply 'get lucky' while performing something akin to a brute force search (in fact this is what we think has largely happened to date). So if we really want to discuss the possibility of *formal obstructions* - as you seem to suggest - then it seems that we need to understand "doing research-level maths" in the latter sense as a general problem.
But if we do this, we must then address the further issue of whether we are going to understand this problem simply as that of deciding whether an arbitrary statement is derivable from axioms - i.e. deciding membership in Prov(\Gamma) - or as a restriction of this problem to "research-level questions we care about in practice (or would care about if they happened to be formulated)" - i.e. what we called *significant questions*.
Now, in the first case, there *are* formal obstructions in the form of traditional limitative results. It thus seems to us that the challenge you're pointing to only becomes interesting once we *both* understand "doing research-level maths" as a general problem and also consider the restriction to significant open questions. In this case, there is also precedent for thinking that the subproblem might be easier - e.g. in the way that 2SAT is easier than SAT, or deciding whether a Mersenne number is prime is easier than deciding primality for an arbitrary number. In fact, this is what Gowers seems to predict in his Manifesto linked here:
https://gowers.wordpress.com/2022/04/28/announcing-an-automatic-theorem-proving-project/
In particular, he simultaneously acknowledges the general difficulty of deciding Prov(\Gamma) - which he calls "boring mathematics" (B) - while pointing towards a strategy for developing systems capable of resolving significant open questions - which he describes as "the class of statements ... we find interesting and approachable'' (M \subset B). And it is also presumably part of what the Frontier Math benchmarks are intended to assess (albeit currently with a class of 300 questions, rather than a truly open-ended one).
But whether the relevant subproblem is actually easier remains an untested hypothesis. In the face of this, we wish to call attention to the possibility that it will remain difficult for automated systems. Or in concrete terms: either current Millennium-style problems, or others that human mathematicians will continue to generate of their own accord, will remain unresolved by automated systems. Of course even if this turns out to be true, it would not necessarily provide evidence that "we humans can navigate this terrain and computers cannot". Rather, it might attest to the fact that even when the boundaries of significance are defined relative to our own (human) standards, part of the relevant terrain is *inherently unnavigable* by technology we can construct and apply in practice (albeit for reasons for which we currently lack a formal explanation). Or at least it seems to us that this is amongst the genuinely interesting questions raised by current developments.
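As a concrete gloss on the Mersenne example above (a toy illustration of our own, not part of the paper's argument): the Lucas-Lehmer test decides primality for numbers of the special form 2^p - 1 via a short recurrence, assuming p itself is prime, in a way that has no analogue for arbitrary numbers of comparable size.

```python
def lucas_lehmer(p: int) -> bool:
    """Decide whether the Mersenne number 2^p - 1 is prime (p itself must be prime)."""
    if p == 2:
        return True          # 2^2 - 1 = 3 is prime
    m = (1 << p) - 1         # the Mersenne number 2^p - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m  # Lucas-Lehmer recurrence, reduced mod 2^p - 1
    return s == 0

# 2^13 - 1 = 8191 is prime; 2^11 - 1 = 2047 = 23 * 89 is not.
print(lucas_lehmer(13), lucas_lehmer(11))  # True False
```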
Thanks Alberto and Walter for your reply. It made me realise that I'd somewhat misunderstood what you were arguing in the original post. I appreciate the additional comments and clarification.
I would like to respond briefly to: "For instance, an automated system might simply 'get lucky' while performing something akin to a brute force search (in fact this is what we think has largely happened to date)."
This reminds me of the dismissive attitude that is often taken towards brute force. However, I would argue that what I do as a research mathematician and what AlphaGo does as a Go-playing program differ by degree, but not qualitatively. We both explore possibilities, guided by some idea of what looks promising. I explore hundreds of possibilities, and AlphaGo explores billions.
When I reflect on theorems I've proved in the past, I've had some good ideas, but most of the work is following up what seems like every possibility in a big branching tree!
Thanks, Geordie, for taking note. We wouldn't want to contest the basic point you're drawing here. But we also think that reflecting on the specifics brings into focus a couple of points about the nature of brute force which are worth highlighting.
First, as our perspective is closer to that of computational complexity theory, we are not intending to be dismissive of brute force ourselves. Rather, we hope to further clarify its status, both in light of the traditional attitude of mathematicians to which you allude (e.g. that brute force doesn't provide the appropriate kind of understanding within a proof) and in light of the details of contemporary developments.
Within this frame, we also think that a further distinction should be made between the form of brute force which is employed by logic-based techniques -- such as SAT-solvers and traditional automated theorem proving -- and that employed within systems based in part or whole on machine learning (such as AlphaGo). For the most part, it's the former which have been successfully employed to date in resolving open problems. But a lot of attention -- as well as extensive resources -- are currently being devoted to attempts to demonstrate the value of the latter to mathematical research.
An illustration of some of the methodological issues which arise here can be gleaned by comparing the details of how automation was applied to resolve the Boolean Pythagorean Triples conjecture (Heule et al. 2016) and to improve a lower bound for cap sets (Romera-Paredes et al. 2024), as we did in our paper (https://doi.org/10.1093/philmat/nkaf005).
Distilling a couple of morals:
1) In the BPT case, a SAT-solver was used to demonstrate the *unsatisfiability* of a propositional formula with 3745 variables and 14672 clauses. Doing this via the paradigmatically "brute" method of truth tables would have required an exhaustive search through 2^{3745} rows; instead it was accomplished in less than 40,000 CPU hours using a sophisticated (but provably correct) variant of the Davis-Putnam algorithm. While this is still "brute" in some sense, the authors also claim that it is likely to be the *only* way the relevant statement can be demonstrated.
2) In the cap set case, a large language model (trained on general, non-mathematics-specific data) was used in conjunction with a greedy search strategy and techniques from genetic algorithms to find a specific finite object (a so-called admissible set), which was then employed in a traditional inductive proof that improved a standing lower bound. But this just involved finding an individual witness to a decidable property -- i.e. the result did *not* require an exhaustive search through the relevant space of possibilities, nor is it clear that the grab-bag of techniques which were applied even could have accomplished this (let alone provably so), despite the fact that this is what would have been required for an exact result. The authors still claim that the LLM played an essential role. But the approach is still "brute" in the sense that it required "many-shot prompting" -- for "many" on the order of two million iterations. (A toy sketch of this witness-finding pattern follows below.)
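To make the "find one witness to a decidable property" pattern in 2) concrete, here is a self-contained toy sketch. The verification step is the standard cap set condition; the search strategy here is plain randomized greedy sampling, a stand-in for the LLM-guided evolutionary search actually used, and all function names are our own:

```python
import random
from itertools import combinations, product

def is_cap_set(points):
    """Decidable property: no three distinct points of F_3^n lie on an affine line
    (equivalently, no three distinct points sum to zero coordinatewise)."""
    pts = set(points)
    for a, b in combinations(pts, 2):
        # The unique third point completing the line through a and b.
        c = tuple((-x - y) % 3 for x, y in zip(a, b))
        if c in pts and c != a and c != b:
            return False
    return True

def random_greedy_cap_set(n, seed=0):
    """Toy witness search: scan the points of F_3^n in random order and greedily
    keep each one that preserves the cap set property."""
    rng = random.Random(seed)
    points = list(product(range(3), repeat=n))
    rng.shuffle(points)
    cap = []
    for p in points:
        if is_cap_set(cap + [p]):
            cap.append(p)
    return cap

cap = random_greedy_cap_set(4)
print(len(cap), is_cap_set(cap))  # a valid (not necessarily maximal) cap set in F_3^4
```

The point of the sketch is that correctness rests entirely on the checking step; the search component only has to get lucky once, and nothing about it certifies that a larger witness does not exist.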
Of course these examples probably don't rise to the level of "significance" as discussed in our post. But we still acknowledged both there and in our paper that systematic conjecture exploration of the sort you seem to describe -- "explor[ing] possibilities, guided by some idea of what looks promising" -- is an instance in which automation is already affecting mathematical practice. (A widely publicized example is Davies et al. 2021, "Advancing mathematics by guiding human intuition with AI".)
But the suggestion that AlphaGo in particular is another sort of paradigm for brute force seems to raise some further questions:
i) Are you aware of any successful applications to resolve open questions which involve the combination of techniques employed by AlphaGo or similar systems originally developed for board games? (We ask in virtue of having thought through the comparison ourselves and noting a number of what we take to be disanalogies between proof search and game play.)
ii) One of the characteristic features of AlphaGo is its use of a Monte Carlo tree search algorithm in addition to pretrained neural networks for evaluating moves. While the latter can be understood as a heuristic means of avoiding brute force, the former continues to employ random sampling of "playouts" (and this is something which distinguishes AlphaGo from, say, the Davis-Putnam algorithm). So if we take your analogy seriously, would you agree that a degree of "luck" would indeed be involved if a novel mathematical result were obtained via the application of such a system?
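To make the contrast in ii) concrete, here is a toy sketch of random-playout evaluation (the Monte Carlo component) for the game of Nim; it is our own illustration and bears no relation to AlphaGo's actual code:

```python
import random

def random_playout_value(heaps, rng):
    """Play Nim to the end with uniformly random legal moves.
    Returns 1 if the player to move at the start wins (normal play), else 0."""
    heaps = list(heaps)
    player = 0
    while any(heaps):
        i = rng.choice([k for k, h in enumerate(heaps) if h > 0])
        heaps[i] -= rng.randint(1, heaps[i])
        player ^= 1
    # The player about to move at the empty position has lost (the other player
    # took the last object), so the starting player wins iff player == 1 here.
    return 1 if player == 1 else 0

def monte_carlo_estimate(heaps, n_playouts=10_000, seed=0):
    """Estimate the starting player's win probability by random sampling,
    rather than by exhaustively searching the game tree."""
    rng = random.Random(seed)
    return sum(random_playout_value(heaps, rng) for _ in range(n_playouts)) / n_playouts

# (1, 2, 3) has nim-sum 0, so it is a loss for the player to move under optimal
# play; random playouts only estimate its value statistically.
print(monte_carlo_estimate((1, 2, 3)))
```

Unlike an exhaustive or deductively complete procedure, the value returned here is a statistical estimate, which is the sense in which "luck" enters.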
Here are a few papers that do something like what you're talking about (i.e. using neural networks to point the search in promising directions, in a similar way to AlphaGo -- though admittedly the architecture is typically much less sophisticated):
https://arxiv.org/abs/2104.14516
https://arxiv.org/abs/2411.00566
https://arxiv.org/abs/2408.15332
https://arxiv.org/abs/2304.09304
https://arxiv.org/abs/2311.03583
https://arxiv.org/abs/2502.05199
There are surely many more. Note that each paper finds some new mathematical objects which are significant in their domains. (E.g. in the PatternBoost paper we are able to refute a conjecture of Niali Graham in combinatorics which had been open for 30 years.) The work on the AC conjecture and the work on ribbon graphs are equally or more remarkable.
I appreciate your point about the conceptual difference between "getting lucky" (with a neural network) and exhaustively showing that no solution exists (with a SAT solver). I would say that the former is closer to my experience as a mathematician, which is why I'm drawn to it. If I find an argument, I typically feel that luck played some role. No-go theorems do exist, but it is rarer for mathematicians to encounter a statement along the lines of "it is impossible to prove Z using only tools X and Y".
I read the following article when it came out: “At Secret Math Meeting, Researchers Struggle to Outsmart AI”, Lyndie Chiou, Scientific American, June 6, 2025. And IT IS WILD, almost to the point that 1) I wonder if Scientific American is still a reliable source of scientific info, 2) I wonder if this is one of those heavily editorialized/edited click-bait articles that keep the lights on, and lastly 3) I'm pretty sure the net information I got from that article was negative ...
Scientific American has been unreliable for some time now. It became clear a few years ago that the editorial board had been captured by certain fashionable ideological factions, owing to the willfully incomplete and slanted coverage of topics bearing on certain lightning-rod cultural issues.