I have not yet completely read the article by the philosophers, but I want to stress two points:
-- The idea of appealing to complexity arguments is already in Manin's article
''Kolmogorov complexity and ...'' -- see arXiv ...
-- I doubt a mixture of logic and complexity arguments could answer what Michael called (thanks to him) ''Kantor's problem'' -- see an older blog post.
Best wishes
Jean-Michel KANTOR
I would like to be convinced by this argument but am not. We mathematicians prove in a world where proving is hard, even — as the authors point out — in decidable theories like Presburger arithmetic. However, on a daily basis we (try to) find some passage through formally undecidable terrain. A central question seems to be: is there some formal reason why we humans can navigate this terrain and computers cannot?
The authors seem to suggest that progress has so far either been on a) situations where brute force is possible or b) situations which have better formal properties (e.g. in their discussion of AlphaGeometry). AlphaProof solved questions in number theory at IMO level. I agree this isn’t exactly what we mathematicians do, but can one convincingly say that IMO number theory problems are qualitatively different to what Michael and I do for our daily bread?
I agree that much of the press coverage is full of hype. I agree that so far no convincing example of AI alone doing research-level maths has occurred. I would like there to be a formal reason why this isn't possible, but I just don't see it. To quote someone mentioned often on this blog: "I see no obstruction."
Thanks, Geordie, for your comment which touches on many interesting points. Let me clarify, however, that our paper does not aim to answer the question: “Is there a formal reason why humans can navigate this terrain while computers cannot?” Rather, our goal is more modest. We ask whether formal methods can help us to analyze what AI systems are capable of in mathematics. In particular, we are interested in identifying if there are any common features to current (and foreseeable) applications of automated systems to resolve recognized open questions. This inquiry leads to broader questions, not limited to AI but relevant to mathematical practice as a whole:
1. What makes an open problem “significant” to mathematicians? Is this also a question which can be analyzed, or at least, approached, by formal means?
2. What kinds of proofs do we expect from automated systems? Should they produce informal proofs (like those found in human-written articles or textbooks), or formal ones (given that AIs are a kind of formal system themselves)?
My co-author spoke to 1. in his reply. But let me also say that another interesting question -- related to the issue of external realignment we mentioned in the post -- is whether we will reach a point where we begin to defer to automated systems in deciding which conjectures are worth working on, or whether this is something which will always require human intervention.
In regard to 2., we focused in the paper on cases in which automated systems both successfully resolved a recognized open question and delivered formal (and formally verifiable) proofs. But in the case of more recent results for math competition problems (to which we alluded briefly), do we expect the same standards of proof? For instance, if only a numerical answer is required -- as in some competition questions and current benchmarks -- are we simply asking for an informal explanation of why an answer is correct? If so, will this change the standards of justification which have been traditionally employed in mathematics? (This is a question which will naturally interest philosophers.) How do ongoing attempts at auto-formalization bear on this? And might these themselves give rise to new obstructions, formal or otherwise?
The possibility of an obstruction to artificial intelligence doing research-level mathematics on its own is indeed a central question we hoped to bring into better focus. But in order to address this systematically, we still think it is important to ask whether "doing research-level maths" is being understood as resolving individual open questions or as the general capacity to resolve an open-ended class of conjectures which arise in the ongoing practice of mathematics.
If it's the former, we agree there is "no obstruction". For instance, an automated system might simply 'get lucky' while performing something akin to a brute force search (in fact this is what we think has largely happened to date). So if we really want to discuss the possibility of *formal obstructions* - as you seem to suggest - then it seems that we need to understand "doing research-level maths" in the latter sense as a general problem.
But if we do this, we must then address the further issue of whether we are going to understand this problem simply as that of deciding whether an arbitrary statement is derivable from axioms - i.e. deciding membership in Prov(\Gamma) - or as a restriction of this problem to "research-level questions we care about in practice (or would care about if they happened to be formulated)" - i.e. what we called *significant questions*.
This time, in the first case, there *are* formal obstructions in the form of traditional limitative results. It thus seems to us that the challenge you're pointing to only becomes interesting once we *both* understand "doing research-level maths" as a general problem and also consider the restriction to significant open questions. In this case, there is also precedent for thinking that the subproblem might be easier - e.g. in the way that 2SAT is easier than SAT or deciding if a Mersenne number is prime is easier than an arbitrary one. In fact, this is what Gowers seems to predict in his Manifesto linked here:
https://gowers.wordpress.com/2022/04/28/announcing-an-automatic-theorem-proving-project/
In particular, he simultaneously acknowledges the general difficulty of deciding Prov(\Gamma) - which he calls "boring mathematics" (B) - while pointing towards a strategy for developing systems capable of resolving significant open questions - which he describes as "the class of statements ... we find interesting and approachable" (M \subset B). And it is also presumably part of what the Frontier Math benchmarks are intended to assess (albeit currently with a class of 300 questions, rather than a truly open-ended one).
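As an aside (this sketch is ours, not drawn from the paper or from Gowers' manifesto), the Mersenne precedent can be made concrete: the classical Lucas-Lehmer test decides whether M_p = 2^p - 1 is prime using only p - 2 modular squarings, far cheaper in practice than general-purpose primality testing of numbers of comparable size, illustrating how a restricted subproblem can admit methods the general problem lacks.

```python
def is_mersenne_prime(p: int) -> bool:
    """Lucas-Lehmer test: for an odd prime p, M_p = 2**p - 1 is prime
    iff s == 0 after p - 2 iterations of s -> s**2 - 2 (mod M_p),
    starting from s = 4. (Assumes the input p is itself prime.)"""
    if p == 2:
        return True  # M_2 = 3 is prime
    m = 2**p - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

# e.g. M_7 = 127 is prime, while M_11 = 2047 = 23 * 89 is not
```

The point is not that this settles anything about significant questions, only that there is precedent for restrictions of a hard general problem being tractable for structural reasons.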
But whether the relevant subproblem is actually easier remains an untested hypothesis. In the face of this, we wish to call attention to the possibility that it will remain difficult for automated systems. Or in concrete terms: either current Millennium-style problems, or others that human mathematicians will continue to generate of their own accord, will remain unresolved by automated systems. Of course even if this turns out to be true, it would not necessarily provide evidence that "we humans can navigate this terrain and computers cannot". Rather, it might attest to the fact that even when the boundaries of significance are defined relative to our own (human) standards, part of the relevant terrain is *inherently unnavigable* by technology we can construct and apply in practice (albeit for reasons for which we currently lack a formal explanation). Or at least it seems to us that this is amongst the genuinely interesting questions raised by current developments.
I read the following article when it came out: "At Secret Math Meeting, Researchers Struggle to Outsmart AI", Lyndie Chiou, Scientific American, June 6, 2025. And IT IS WILD, almost to the point that 1) I wonder if Scientific American is still a reliable source of scientific information, 2) I wonder if this is one of those heavily editorialized/edited click-bait articles that keep the lights on, and lastly 3) I'm pretty sure the net information I got from that article was negative ...
Scientific American has been unreliable for some time now. It became clear a few years ago that the editorial board had been captured by certain fashionable ideological factions, owing to the willfully incomplete and slanted coverage of topics bearing on certain lightning-rod cultural issues.