16 Comments

It's always an exciting sign when a writer references Quine's "gavagai" story with its "undetached rabbit parts." The issue of whether the world can be "cut at its joints" was also central to medieval philosophical debates about the nature of language and representation. Good to see we are still debating this 800 years later. I hope you'll write more about this question vis-a-vis the 'naturalness' of the object of mathematics.


1. Google IS, and for a long time yet will remain, ABOVE the law. Nobody may flat-out copy entire books against all copyright provisions. Google Books does just that; nobody cares (?), nobody acts (!). That is the main problem here.

Same now with arXiv etc. - which is public anyway, though, so up for grabs (legally!) by anybody, incl. Google. There IS, though, a license attached to arXiv material, and it requires citing and/or non-modifying etc. Will they bother?

2. Garbage in - garbage out. AI up to now (in particular ChatGPT) mixes input snippets into output oeuvres. Nice enough, but always within the bounds of the received input material. Nothing genuinely new. It is difficult to tell, though, where the output comes from, or whether it is just copy&paste. The same might happen to math articles, where already today some 99% of us do not enter into the details of stuff too distant from our own tiny circle of competence. In the future, we will see Sokal-style "Fashionable Nonsense" in math as well.


I couldn't disagree more with these objections to the use of mathematics to train models. Ultimately, we all make use of the intellectual work of others and use ideas (both cited and uncited because they can't be traced to a specific source) when we do our work.

Copyright doesn't reflect any natural right to stop others from using your work. It's merely a kludge to ensure the incentive exists to make valuable work. There is no reason to believe harvesting arXiv undermines those incentives, so I don't see a moral problem. Indeed, far from Google behaving like Elsevier, it's those who wish to keep this precious info locked down using copyright who are more like Elsevier in this situation.

Regarding the legality, Google is probably in a pretty decent place. Copyright governs only the copying of data and doesn't prevent you from being inspired by it or getting an idea from it, so the ultimate output of the ML process is probably no more a violation than it would be to read the paper yourself and have an idea. The initial training may involve copying, but it's probably covered under fair use, just as creating an index for web search is.


More than the philosophical and political arguments (like those in your companion text "Mathematics and the Undead"), I think the absence of autoformalization of definitions and the ridiculous identification of correlation with intuition cast serious doubt on the future of this kind of "takeover" of mathematics by Google.
