Two quick updates on AI licensing and copyright
Cambridge University Press announcement, and UK plans
Cambridge University Press explains its approach to generative AI licensing
Cambridge University Press has created a page, illustrated with brightly colored but completely pointless images, under the implausible title “Protecting authors and research in the age of AI,” that explains its motivations and priorities. The motivations are more or less those I published here a few weeks ago:
We believe it is important to engage in generative AI licensing partnerships because it:
Enables the creation of legal frameworks and formal licensing arrangements to govern use and better protect authors’ rights.
Has the potential to increase the impact and discoverability of research.
Makes high-quality information available to generative AI tools so that it can help improve accuracy and support users more effectively.
I am entirely unconvinced by the second of the three stated motivations ("increase the impact and discoverability of research"), having been informed by more than one generative AI that I have written papers on practically every subject imaginable. But I have to grant that CUP is making an effort to establish some kind of principle where the commercial publishers abandoned principle long ago.
It surprises me to read on the CUP page that
we have been pleased to see that many authors, after considering the current context and our approach, have decided to opt in. Indeed, by the end of January 2025, a majority of the 17,000+ authors contacted have opted in to licensing agreements, and only a very small proportion have actively declined to participate.
What about the current context has convinced the majority of CUP authors? Those who do participate can't be looking forward to a huge payoff… Maybe they are selflessly signing the contract in the hope that the new revenue stream for CUP will protect CUP's operations through the impending upheavals and thus help to preserve the scholarly community as a whole?
Out of curiosity I asked Google about CUP's finances and received this answer:
Revenues at Cambridge University Press & Assessment (CUPA) for the year to end July 2023 hit £1bn for the first time, the group has revealed in its annual report, up from £868m the prior year. Profit for the year stood at £144m, up nearly 32% from £106m the prior year.
I have appreciated the opportunity to work with CUP, as editor, then editor-in-chief, of the Journal of the Institute of Mathematics of Jussieu, and as co-editor of the book mentioned in the earlier post. I am deeply attached to the survival of non-commercial university publishers in general, and of CUP in particular; and whether or not I am convinced by their arguments, I have to be grateful for their transparency, in sharp contrast to the attitude of some of the commercial publishers reported here.
Graham Lovelace on “Rampant copyright denialism”
Anyone looking for hard-hitting prose about the ongoing struggle over copyright and generative AI need look no further than Graham Lovelace’s latest Substack post pugnaciously entitled
COMMENT: Enough of the lies, the UK government needs to abandon its deeply flawed approach to AI and copyright
Here’s the link. The post goes through Lovelace’s answers to questions in a UK government consultation on modifying copyright laws to facilitate generative AI. All of Lovelace’s answers are worth reading. I highlight a few that are particularly relevant to the future of mathematics:
Question 40. Do you agree that generative AI outputs should be labelled as AI generated? If so, what is a proportionate approach, and is regulation required?
Yes, and this would benefit the AI developers too so they can discriminate between AI slop now polluting the web and human-made content. Content ‘transparency labels’ such as those offered by Credtent already exist showing the degree to which content is entirely human-made, AI-assisted or fully AI-created. These need to become mandatory, so yes, regulation is required.
And, once again, for those who imagine that artificial mathematicians can be trained by synthetically expanding the mathematical corpus in a ratio of 100,000:1
Question 46. What are the implications of the use of synthetic data to train AI models and how could this develop over time, and how should the government respond?
Research suggests that repeatedly training generative models on the synthetic outputs of generative models leads to atrophy and ultimate model collapse. The extent to which this is both a business issue and an area for regulatory intervention is unclear. AI developers should certainly be liable for inaccurate and biased outputs, and outputs that cause harm.
The link is to an article published last July in Nature with the self-explanatory title
AI models fed AI-generated data quickly spew nonsense
and subtitle
Researchers gave successive versions of a large language model information produced by previous generations of the AI — and observed rapid collapse.
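The mechanism is easy to see even in a toy setting. The sketch below is my own illustration, not the Nature paper's experiment: it repeatedly fits a one-dimensional Gaussian to samples drawn from the previous generation's fitted Gaussian, so each "model" is trained only on the synthetic output of its predecessor. The sample size, the number of generations, and the Gaussian setup are all invented for the demonstration.

```python
# Toy sketch of "model collapse": each generation fits a Gaussian to
# the previous generation's synthetic samples, then samples from the
# fit. My illustration only; not the Nature paper's LLM experiment.
import numpy as np

rng = np.random.default_rng(0)

n = 20                                 # small samples make the effect visible
data = rng.normal(0.0, 1.0, size=n)    # generation 0: the "human" data

for gen in range(51):
    mu, sigma = data.mean(), data.std()
    if gen % 10 == 0:
        print(f"generation {gen:2d}: mean={mu:+.3f}, std={sigma:.3f}")
    # Train the next generation only on synthetic output of this one.
    data = rng.normal(mu, sigma, size=n)
```

Run it and the printed standard deviation typically shrinks toward zero over the generations: each model inherits only what the previous one managed to capture, and the tails of the original distribution are lost first, a miniature version of the atrophy Lovelace describes.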
From an earlier post:
Conjecture 3: The current rate-limiting step for AI mathematics is: producing orders of magnitude more lines of (high-quality) formalized mathematics.
(Alex Kontorovich at the June 2023 NASEM workshop.)
An earlier speaker at the workshop estimated that the existing corpus of formalized mathematics would have to be increased, in fact, by five orders of magnitude. The proposed solution is to use AI to generate this formalized mathematics.
So human mathematics would make up 1/100,000 of the training set.
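To spell the arithmetic out (my gloss, using round numbers): increasing the corpus by five orders of magnitude means multiplying it by $10^5$, so the human-written share of the resulting training set is

$$\frac{1}{10^{5}} = 10^{-5} = 0.001\%.$$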