agnosticmantis 5 days ago
“… we have a verbal agreement that these materials will not be used in model training”
Ha ha ha. Even written agreements are routinely violated as long as the potential upside > downside, and all you have is verbal agreement? And you didn’t disclose this?
At the time o3 was released I wrote “this is so impressive that it brings out the pessimist in me”[0], thinking perhaps they were routing API calls to human workers.
Now we see in reality I should’ve been more cynical, as they had access to the benchmark data but verbally agreed (wink wink) not to train on it.
I met so many interesting people and had so many interesting conversations at the Joint Mathematics Meetings earlier this month in Seattle that I am at risk of forgetting most of what I learned before I have a chance to write it down here. Fortunately the widely reported “manipulative and disgraceful” scandal[1] over OpenAI’s secret financing was there to remind me of two of my conversations. The one with FrontierMath lead author Elliot Glazer was mentioned briefly in my last post, along with a promise to design my own “benchmark” in the near future. More immediately relevant was a conversation with the multi-talented Carina Hong, who, in addition to simultaneously pursuing a PhD in mathematics and a law degree at Stanford, found time to publicize the scandal on the platform formerly known as Twitter:
Six mathematicians who significantly contributed to the FrontierMath benchmark confirmed this is true - that they are unaware that OpenAI will have exclusive access to this benchmark (and others won’t). Most express they are not sure they would have contributed had they known.
What’s worse,
1. OAI binds Epoch to an NDA until eve of o3 performance claim, preventing Epoch to disclose OAI is the donor and that OAI has exclusive data access.
2. Mathematicians then sign NDA on the problem & solution they create. They were led to believe it is to prevent data contamination.
The FrontierMath article on arXiv includes a section entitled “Interviews with mathematicians,” in which three Fields medalists (Terry Tao, Tim Gowers, and Richard Borcherds) and International Mathematical Olympiad coach Evan Chen vouch for the difficulty of the benchmark questions and speculate on “the timeline for AI progress on FrontierMath-level problems.” Gowers, at least, has indicated what he thinks of the scandal by reposting a tweet by Mikhail Samin, which I reproduce verbatim:
Remember o3’s 25% performance on the FrontierMath benchmark?
It turns out that OpenAI funded FrontierMath and has had access to most of the dataset.
Mathematicians who’ve created the problems and solutions for the benchmark were not told OpenAI funded the work and will have access.
That is:
- we don’t know if OpenAI trained o3 on the benchmark, and it’s unclear if their results can be trusted
- mathematicians, some of whom distrust OpenAI and would not want to contribute to general AI capabilities due to existential risk concerns, were misled: most didn’t suspect a frontier AI company funded it.
From Epoch AI: “Our contract specifically prevented us from disclosing information about the funding source and the fact that OpenAI has data access to much but not all of the dataset.”
There was a “verbal agreement” with OpenAI—as if anyone trusts OpenAI’s word at this point: “We acknowledge that OpenAI does have access to a large fraction of FrontierMath problems and solutions, with the exception of a unseen-by-OpenAI hold-out set that enables us to independently verify model capabilities. However, we have a verbal agreement that these materials will not be used in model training.”
On some boards OpenAI’s reputation has taken a hit as a result of this scandal; the agnosticmantis comment above is typical. Is this merely schadenfreude on the part of techies who remain upset about OpenAI’s failure to live up to the “moral high ground” promised in a widely quoted 2018 message from (cofounder) Greg Brockman to Elon Musk?
Our biggest tool is the moral high ground. To retain this, we must:
Try our best to remain a non-profit. AI is going to shake up the fabric of society, and our fiduciary duty should be to humanity.…
I don’t have time to explore this question, but interested readers can easily find enough information to help them draw their own conclusions about why it might be in OpenAI’s interest to cheat on this and other benchmarks.
Several days after the scandal broke, and no doubt independently, Sam Altman announced that
"Watching @potus more carefully recently has really changed my perspective on him (i wish i had done more of my own thinking and definitely fell in the npc trap). i'm not going to agree with him on everything, but i think he will be incredible for the country in many ways!"
This story, taken together with the images of the world’s three richest human beings at this week’s presidential inauguration, merely confirms what I said in my JMM presentation:
It’s almost as if people like Bezos, Musk, Thiel, Altman, and the rest are competing to show whose evil is least banal... (or most banal?)
In more innocent times, just over one month ago, François Chollet[2] could celebrate an OpenAI “breakthrough of 75.7% on the Semi-Private Evaluation set at our stated public leaderboard $10k compute limit,” a feat which the website CIO interpreted as “having reached AGI”; and the news that o3 scored 25% on FrontierMath inspired Kevin Buzzard to muse on the topic of “Can AI do maths yet?”
But now Altman, his attention taken up by his weird love/hate triangle with Trump and Musk, has left it up to FrontierMath to deal with the scandal.
In conversation, Carina Hong tried to draw positive lessons from the debacle.
There needs to be a better method of collaboration between AI labs and mathematicians going forward. There should be more transparency and disclosure of industry participation and funding sources and most importantly because some mathematicians are unaware of how ML research is carried out, they may not be able to ask the right questions.
Example: Even though OpenAI may agree not to use the data as training data, they can use it to generate synthetic data that can then be used as training data.
[1] The first link that comes up if you type “Frontier Math scandal” into a search engine is likely to be the January 21 article in Fortune. That’s where I found the “manipulative and disgraceful” title; the body of the article is behind a paywall.
[2] Creator of the “‘Abstract and Reasoning Corpus for Artificial General Intelligence’ (ARC-AGI) benchmark to measure the efficiency of AI skill-acquisition on unknown tasks.” He was apparently at the JMM as well, though I didn’t see him.
I just learned that Epoch AI, the institute that created FrontierMath, had released a statement "clarifying" its relation with OpenAI, two days before I published this post.
https://epoch.ai/blog/openai-and-frontiermath
After reading it I am not inspired to change anything I wrote.
I also just found an article by Amanda Zhang on the scandal that was published on January 19.
https://www.ctol.digital/news/openai-hidden-involvement-frontiermath-ai-transparency/
For some reason I had not been able to find it all last week. It's the best published account I've seen so far. (I can't vouch for its accuracy, because I have no independent source of information, but it's consistent with the other reports I've read.)
Seen on reddit, to be saved for future reference:
"Qyeuebs, 1h ago (edited 1h ago):
To those mathematicians who are upset about contributing to the dataset without being informed about the exclusive content-sharing deal with OpenAI: this is the kind of thing you should expect when dealing with a Silicon Valley company! If you don't like it, then don't give any of these companies your help!"
Apologies to those on this same reddit thread
https://www.reddit.com/r/math/comments/1iadcqw/the_frontiermath_scandal/
who complain that the above post is "unreadable." I was informed about this story a week ago, but my "day job" at the beginning of the semester occupies more than 12 hours a day, leaving no time to edit my own report on the scandal. It seemed important to me to share the information with mathematicians, given that FrontierMath derives its respectability from its association with several well-known mathematicians.