Too much news
Publishers interrupt my vacation by signing deals to license their data for training
Disappearance!
Rather than enjoying an uninterrupted month of relaxation1 in the depths of the French countryside, I have been shocked out of my state of repose by a series of headlines that remind me that machines — and this is one way in which they are superior to humans — never sleep and never take vacations.2 First, this headline in the July 19 New York Times
suggested that the exponential explosion in generative AI was about to come crashing to a halt. “[M]any of the most important web sources used for training A.I. models have restricted the use of their data,” according to an MIT study quoted in the article, and this is because of “tensions with the owners of that data — many of whom have misgivings about being used as A.I. training fodder, or at least want to be paid for it.”
“We’re seeing a rapid decline in consent to use data across the web that will have ramifications not just for A.I. companies, but for researchers, academics and noncommercial entities,” said Shayne Longpre, the study’s lead author, in an interview.
The article continues:
As the backlash has grown, some publishers have set up paywalls or changed their terms of service to limit the use of their data for A.I. training.
Outrage!
But other publishers haven’t got the news:
That’s from an article in the Chronicle of Higher Education, which opens with these ominous words:
Two major academic publishers, Wiley and Taylor & Francis, recently announced partnerships that will give tech companies access to academic content and other data in order to train artificial-intelligence models, a move some academics see as just the latest way their work is being exploited.
“Microsoft,” the article continues, “paid Informa, the parent company of Taylor & Francis, an initial fee of $10 million to make use of its content ‘to help improve relevance and performance of AI systems…’” Have I ever published with Taylor & Francis? Yes! Does this mean Microsoft will be exploiting my work? “Taylor & Francis has not publicized the deal on its website,” according to the article, and I have been unable to determine whether Experimental Mathematics is part of the deal. However, a “spokesperson for the company … said content usage is ‘strictly controlled’ and rights holders will receive royalties as outlined in their contracts.”
Are my mathematical colleagues as outraged about what Justine Bateman called the largest theft in the [history of the] United States, period as Nathan Kalman-Lamb, an assistant professor of sociology at the University of New Brunswick? As he complained to the Chronicle, “This is a prototypical example of capitalism and how capitalism is predicated on exploitation…”
“The researcher who does the work of researching, writing, and publishing; the editors and peer reviewers, who are an integral part of the process, are performing a tremendous amount of highly skilled labor, and that labor is simply not compensated. The only ones receiving compensation are companies like Taylor & Francis and Routledge.”
If there is outrage among mathematicians it’s not apparent. Editors of Cambridge University Press series, including some mathematicians, have been informed about the Press’s recent “limited licensing arrangement with an LLM.” These moves aren’t driven by ideological commitment. While some colleagues do seem to be under the delusion that giving Silicon Valley license to scrape and remix the results of their unpaid labor will somehow enhance the collective practice of mathematics — as opposed to making the obscenely wealthy even more so — the direct impetus for such agreements is the cold calculation that if publishers don’t sign them the LLMs will just grab their data, as they have been doing up to now; so it’s best to sign and collect the licensing fees while the offer still stands.
Since this perception is directly contradicted by the July 19 New York Times article, one has to wonder why our colleagues haven’t considered joining forces with publishers in other areas to challenge ongoing copyright infringement. In the meantime, it’s reassuring to know that the Cambridge agreement only concerns books, not journals; and — crucially — that a Cambridge “author who has any concerns can opt-out.”3
Should we expect mathematics publishers like Springer-Verlag and Elsevier to follow the lead of Wiley and T&F?4 What about the London Mathematical Society, the American Mathematical Society, the European Mathematical Society? Will authors be promised the right to opt out of any future arrangement, whether or not their texts have been published under a “no commercial use” copyright? Will they be motivated to exercise this option? And would such a condition make the deal less appealing to LLMs?
Gibberish! Collapse!
That’s a headline from an article by Emily Wenger in a recent issue of Nature that accompanies a Nature article entitled AI models collapse when trained on recursively generated data. Meanwhile, the Florida-based hedge fund Elliott Management has warned investors that, according to an article in the Financial Times,
Many of AI’s supposed uses are “never going to be cost-efficient, are never going to actually work right, will take up too much energy, or will prove to be untrustworthy”.
Taken together, these two headlines ought at least temporarily to discourage speculation about AI creating a “state of affairs where mathematicians are basically left with nothing to do.”5 The first headline should be discouraging because, as I’ve mentioned repeatedly, specialists estimate that the existing corpus of research mathematics would have to be expanded by a factor of 100,000 with AI-generated data in order to train an AI mathematician, and a diet of 99.999% AI-generated data definitely looks like “too much.” The second headline should be discouraging because, even if Elliott is mistaken, promoting this way of thinking in places like the Financial Times could lead investors to flee the technology, in which case no one will know whether or not it would have worked.
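For what it’s worth, the two figures in that sentence are consistent with each other. If the existing human-written corpus has size $C$, then expansion by a factor of 100,000 yields a training set of size $10^{5}C$, of which all but the original $C$ is synthetic:

```latex
\frac{10^{5}C - C}{10^{5}C} \;=\; 1 - 10^{-5} \;=\; 99.999\%
```

So the 99.999% is simply the complement of the one part in $10^{5}$ that humans would contribute, and it is precisely the sort of recursively generated training diet the Nature article warns about.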
Recently Gary Marcus has written:
An entire industry has been built - and will collapse - because many people haven’t yet gotten it.
Marcus is known as a skeptic of purely data-driven approaches to AI and as a proponent of neuro-symbolic AI, which combines the currently fashionable neural network models with the “good old-fashioned” methods of symbolic AI. And it was in this role that he was able to temper his predictions of collapse with a description of one of the few headlines to “excite” him:
Get out the gong!
All is not lost! Before July ended the Times published yet another headline:
Last week the DeepMind researchers got out the gong6 again to celebrate what Alex Davies, a lead of Google DeepMind’s mathematics initiative, described as a “massive breakthrough” in mathematical reasoning by an A.I. system.
The new system is named AlphaProof, to rhyme with AlphaGo, AlphaZero, AlphaFold, and more recently AlphaGeometry; and the gong was banged to celebrate the new system’s obtaining a silver medal in the 2024 International Mathematical Olympiad. Since many of my most redoubtable colleagues were IMO medalists in their teens, such an announcement, with or without sound effects, has to be acknowledged and scrutinized. Even Ernie Davis, who coauthored the book Rebooting AI with Marcus and who has been skeptical of previous claims by DeepMind, had mildly positive things to say about their latest effort on his Facebook page.
Davis and Marcus both recognize that this “massive breakthrough” has at least one “serious shortcoming and one that is not unfamiliar”:
[AlphaGeometry and AlphaProof] both rely on a kind of cheat on the input: trained human coders translate all the Olympiad’s input sentences into mathematical form. Needless to say, we don’t really have an autonomous AI if we still require human coders in the loop.
Marcus doesn’t mention whether or not AlphaProof, like AlphaGeometry, “cheats by building heavy-handed hints… directly into the training data.” My challenge to AlphaGeometry — to rediscover the Euler characteristic — still stands after six months, time enough for hundreds of startups to arise and collapse; so I see no need to add a new challenge in view of the latest information.
Nonsense!
Intel, as I write this, has announced the reduction of its workforce by 15%, and the world’s stock markets are feeling what may be the first tremors of what history will remember as the Great AI Depression. Rather than contemplate the dire implications of this eventuality for the coming US Presidential election, it is comforting to learn that three publishers have each issued new English translations of Wittgenstein’s Tractatus Logico-Philosophicus, joining the two translations that have been on offer since before I read the book.
Here’s a fun thought-experiment: Routledge, the official publisher of the 1922 and 1961 translations, is one of the Taylor & Francis brands. So far, then, only the older translations are being offered up to the LLMs’ voracious appetites. What lesson will a future AGI trained on Wittgenstein draw from reading the penultimate Proposition 6.54 of the Tractatus in three versions?
My propositions are elucidatory in this way: he who understands me finally recognizes them as senseless, when he has climbed out through them, on them, over them. (He must so to speak throw away the ladder, after he has climbed up on it.)
(from the 1922 Ogden translation)
My propositions serve as elucidations in the following way: anyone who understands me eventually recognizes them as nonsensical, when he has used them—as steps—to climb up beyond them. (He must, so to speak, throw away the ladder after he has climbed up it.)
(from the 1961 Pears/McGuinness translation)
Meine Sätze erläutern dadurch, dass sie der, welcher mich versteht, am Ende als unsinnig erkennt, wenn er durch sie—auf ihnen—über sie hinausgestiegen ist. (Er muss sozusagen die Leiter wegwerfen, nachdem er auf ihr hinaufgestiegen ist.)
(Wittgenstein’s original German)
Moore glosses the paradox inherent in this Proposition in his review of the new translations.
Wittgenstein… does take himself to be conveying inexpressible practical insights. These include insights into how to recognise the nonsense that he is using to convey these very insights for the sheer nonsense that it is. But they include more besides.7
I spent much time reading Wittgenstein, and not only the Tractatus, at an impressionable age. So, although I could not disagree more strongly with his Proposition 6.21 —
A proposition of mathematics does not express a thought.
— and the related claim in Proposition 6.1251 that “there can never be surprises in logic,” I once found the following sentence scribbled in one of my old notebooks:
A proof is a picture of an argument.
In its form this could hardly be more Wittgensteinian, although I’m pretty sure I didn’t realize it at the time — compare Proposition 4.06
Nur dadurch kann der Satz wahr oder falsch sein, indem er ein Bild [picture] der Wirklichkeit ist.
And my sentence qualifies as the kind of “sheer nonsense” to which Proposition 6.54 refers. But the “more besides” to which I was pointing is the implication that something about “an argument” cannot be captured in any formal system. This justifies including my scribble in the present newsletter, but it’s clear in retrospect that in writing it I was influenced by the perspective developed more extensively in the later Wittgenstein, and to which he hinted already in the final sentence of the Tractatus, which I probably don’t need to reproduce here.
Instead, since I have some extra space, I can include the notes of a talk I gave during my last semester at Brandeis, in 1995, when I was invited to explain Wiles’s recent proof of Fermat’s Last Theorem to a group of Brandeis scientists at the Volterra Center for Science Studies.
I mentioned in my abstract that it has been said in certain quarters that Wiles’s proof is a “splendid anachronism,” that in the future deductive proof in mathematics will be largely replaced by computer-assisted proofs and probabilistic arguments. This was the subject most notably of an article in Scientific American in October 1993,8 an article that upset most pure mathematicians, as you might imagine. One version of the argument is that, in the future, it will be possible to translate mathematical assertions into formulas that can be verified by computers in large numbers of cases, and that one can assign a probability value to the truth of the assertion that increases as it is verified in more cases. On the other hand, this argument continues, if the assertion is decidable, its provability could be determined by analyzing the formula for logical truth, and this could also be carried out by computers. Since the provability test would be much more expensive than the verification of truth to a high degree of probability, the latter approach to “proof” will eventually win out on economic grounds alone.
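The probabilistic notion of “proof” described in that argument does have one concrete, well-understood instance: randomized primality testing. A minimal sketch in Python of the Miller–Rabin test (the 1/4-per-round error bound invoked in the comments is a standard theorem about the test, not something established here):

```python
import random

def miller_rabin(n: int, rounds: int = 20) -> bool:
    """Probabilistic primality test.

    A single round wrongly reports a composite n as "probably prime"
    with probability at most 1/4, so after k independent rounds the
    chance of error is at most 4**-k: the "probability value that
    increases as it is verified in more cases" described in the text.
    """
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 = 2**s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)  # modular exponentiation
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True  # composite with probability less than 4**-rounds
```

The contrast with deductive proof is exactly the one drawn below: a run of this test tells you, with quantifiable confidence, that a number is prime, but it illuminates nothing about why.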
Already at the time I saw three problems with this vision of mathematics:
Confusion of two notions of proof. In theoretical computer science, a proof is used to verify that an algorithm performs a certain function, or obtains a solution after a certain number of steps, or something similar. For the person who wants to use the algorithm, knowing to a high degree of probability that it works as expected may be just as useful as having a formal proof, and since the proof and the probabilistic test both may involve considerable machine computation, an economic argument may suffice to choose one over the other. But a deductive proof in pure mathematics does more than just demonstrate the theorem (or less, depending on how you understand “truth”): it reveals something new about the concepts that inhabit the theorem. In other words, the proof is at least as important as the theorem itself.
This is related to a confusion about the basic unit of mathematics, which is not the theorem but the concept. An unmotivated proof, or a proof whose ideas are not clear, is not considered a satisfactory solution to a mathematical problem, and indeed, mathematics generally proceeds by illuminating or reinterpreting mysterious points in existing proofs (or arguments from physics). So the replacement of deductive proofs by probabilistic or mechanical proofs should be compared, not to the introduction of a new technology for producing shoes, but rather to the attempt to replace shoes by the sales receipts, or perhaps the cash profits, of the shoe factory. Returning to Wiles’s proof, the big surprise was his demonstration that existing techniques could be combined to show that the “modularity” of an elliptic curve F is already detected if you know [the value modulo 3 of the number of points of F modulo p for all primes p]. The concept illuminated was that of “modularity.” But the work of clarifying the mysterious points in Wiles’s proof has only just begun.
The argument over deductive proofs has a predictive side and a prescriptive side, but in practice the lines tend to be blurred. One might predict that future mathematicians will spend less time reasoning and more time doing computer experiments, just as one might predict that our descendants will spend less time reading and more time playing video games (cf. hypertext). To which the correct answer is “Let’s wait and see.” But those who make such predictions usually add something like, “and the sooner, the better.” This prescription is really philosophical and not merely mathematical. Deductive proof is essentially a sustained form of reasoning in a context in which a consensus exists as to the identity of the objects being reasoned about, and the prescription claims that reasoning, as a form of organization of thought, yields no significant knowledge about the world. This is an old conundrum, but it seems obvious to me that mathematics proves the contrary. But when the prescription is couched in politically charged rhetoric about modernity vs. tradition then it can acquire predictive force insofar as it leads to shifts of power and resources from the reasoners to the experimenters. In the real world, this argument plays itself out at the level of granting agencies and their overseers.
Although I gave this talk in the chill of the second extended AI winter, readers of this newsletter will see that most of my current themes were already present. But I clearly didn’t anticipate how the distortions I was highlighting would be aggravated by the growth of Silicon Valley’s trillion dollar companies.
This was, however, written one month ago, so some of the news is already out of date.
Nor do they strike, or take bathroom breaks…
Prospects for immediate developments along these lines may well be compromised, as you’ll read in the next section.
I am coeditor of a CUP book, but I haven’t been contacted, so I have to assume the LLMs don’t yet care about Shimura varieties.
Meanwhile, publishers are unhappy about a federal directive requiring “authors who use grant funding to produce research … to deposit their work into agency-designated public-access repositories as soon as it’s published.”
…publishers will first have to decide if they want to continue publishing research funded by the federal government, which finances nearly 55 percent of academic research and development, according to the National Science Board. “[For] most publishers, the answer would be absolutely not, because they’d have nothing to publish”…
But that’s quite another story.
Tim Gowers, quoted in the New York Times.
“At the headquarters of Google DeepMind… researchers have a longstanding ritual for announcing momentous results: They bang a big ceremonial gong.” Siobhan Roberts, New York Times, July 25, 2024.
A. W. Moore, “A Tove on the Table: Versions of Wittgenstein,” London Review of Books, Vol. 46, No. 15, 1 August 2024. Chapter 7 of Mathematics without Apologies ends with a quotation of Proposition 6.54, in the 1961 translation, but points out its resonance with a Buddhist metaphor:
Buddha is said to have remarked that Sunyata is to be treated like a ladder for mounting up to the roof of prajña [wisdom, understanding]. Once the roof is reached, the ladder should be discarded.
The ladder image appears frequently in internet accounts of Buddhism (see this one, for example). But it now occurs to me that the author who added the sentence about discarding the ladder may well have had the Tractatus in mind.
Rawls's splendid egalitarian max-min rule (enunciated in "A Theory of Justice" and beyond) has a tantalizing spectral/variational interpretation.
As a mathematician I'm not outraged in the slightest. If anything, I'm outraged that they need to pay anything to get that data. In an ideal world every published piece of mathematics would be available free for everyone -- human or machine -- to learn from.
And calling it theft is utterly ridiculous! From a fundamental rights standpoint, or some kind of background intuition about property, it is intellectual property protections that are themselves violations of people's property interests (certainly on the Lockean conception). Those are laws which prevent me from using my own property in certain ways merely because my use mimics what you did with yours.
And yes, IP makes sense insofar as it is necessary to incentivize innovation. But beyond that it is theft from our common cultural heritage. And there is no plausible argument that preventing people from training on mathematics papers is a necessary incentive for it to be produced.
Ultimately, Rawls has a very good point that inequalities in wealth are unfair and are only justified by the degree to which they incentivize innovation. And demanding that companies get permission to train AI on academic papers of all things -- like the paid gatekeeping that for-profit scientific publishers put up in the first place -- definitely isn't necessary to incentivize innovation.
Besides, it really shouldn't matter whether it's a machine or a person learning from your work. Especially for mathematicians, who are paid by governments and universities (themselves largely government-supported) to release work for the benefit of society at large.