News

Move over, mathematicians, here comes AlphaProof


At the headquarters of Google DeepMind, an artificial intelligence laboratory in London, researchers have a long-standing ritual for announcing important results: they ring a large ceremonial gong.

In 2016, the gong sounded for AlphaGo, an AI system that excelled in the game Go. In 2017, the gong rang when AlphaZero conquered chess. On each occasion, the algorithm defeated human world champions.

Last week, DeepMind researchers struck the gong again to celebrate what Alex Davies, leader of the Google DeepMind mathematics initiative, described as a “major advance” in mathematical reasoning by an AI system. Two Google DeepMind models tried their luck with the problem set at the 2024 International Mathematical Olympiad, or IMO, held July 11-22, about 100 miles west of London at the University of Bath. The event is considered the premier mathematics competition for “the world’s brightest mathematicians,” according to a promotional post on social media.

The human problem solvers – 609 high school students from 108 countries – won 58 gold, 123 silver and 145 bronze medals. The AI performed at the level of a silver medalist, solving four of the six problems for a total of 28 points. It was the first time an AI has achieved a medal-worthy performance on Olympiad problems.

“It’s not perfect, we haven’t solved everything,” said Pushmeet Kohli, vice president of research at Google DeepMind, in an interview. “We want to be perfect.”

However, Kohli described the result as a “phase transition” – a transformative change – “in the use of AI in mathematics and in the ability of AI systems to do mathematics”.

The lab asked two independent experts to evaluate the AI’s performance: Timothy Gowers, a mathematician at the University of Cambridge in England and a Fields Medalist, who has been interested in math-AI interaction for 25 years; and Joseph Myers, a software developer in Cambridge. Both are former IMO gold medalists. Myers chaired this year’s problem selection committee and, at previous Olympiads, served as a coordinator, judging human solutions. “I strove to evaluate the AI attempts in a manner consistent with how human attempts were judged this year,” he said.

Gowers added via email: “I was definitely impressed.” The lab had discussed its Olympiad ambitions with him a few weeks earlier, so “my expectations were quite high,” he said. “But the program matched them and, in one or two cases, significantly surpassed them.” The program found the “magic keys” that solved the problems, he said.

Hitting the gong

After months of rigorous training, the students took two exams, three problems per day – in algebra, combinatorics, geometry and number theory.

The AI counterpart worked through the problems almost in tandem at the laboratory in London. (The students didn’t know that Google DeepMind was competing, in part because the researchers didn’t want to steal the spotlight.) The researchers moved the gong into the room where they had gathered to watch the system work. “Every time the system solved a problem, we rang the gong to celebrate,” said David Silver, a research scientist.

Haojia Shi, a student from China, ranked first and was the only competitor to earn a perfect score – 42 points across six problems; each problem is worth seven points for a complete solution. The United States team placed first with 192 points; China came second with 190.

Google’s system earned 28 points for completely solving four problems – two in algebra, one in geometry and one in number theory. (It failed to solve the two combinatorics problems.) The system had unlimited time; for some problems it took up to three days. Students were given only 4.5 hours per exam.

For the Google DeepMind team, speed is secondary to overall success, as “it’s just a matter of how much computing power you’re prepared to put into these things,” Silver said.

“The fact that we have reached this threshold, where it is even possible to solve these problems, is what represents a radical change in the history of mathematics,” he added. “And I hope it’s not just a radical change in the IMO, but also represents the point where we move from computers being able to prove only very, very simple things to computers being able to prove things that humans can’t.”

Algorithmic ingredients

Applying AI to mathematics has been part of DeepMind’s mission for several years, often in collaboration with world-class research mathematicians.

“Math requires this interesting combination of abstract, precise and creative reasoning,” Davies said. In part, he noted, this repertoire of capabilities is what makes mathematics a good litmus test for the ultimate goal: achieving so-called artificial general intelligence, or AGI, a system with capabilities ranging from emerging to competent to virtuoso to superhuman. Companies like OpenAI, Meta AI and xAI are pursuing similar goals.

Olympiad math problems have come to be regarded as a benchmark.

In January, a Google DeepMind system called AlphaGeometry solved a sample of Olympiad geometry problems at nearly the level of a human gold medalist. “AlphaGeometry 2 has already surpassed IMO gold medalists in problem solving,” Thang Luong, the principal investigator, said in an email.

Building on this momentum, Google DeepMind intensified its multidisciplinary effort toward the Olympiad, with two teams: one led by Thomas Hubert, a research engineer in London, and another led by Luong and Quoc Le in Mountain View, each with about 20 researchers. For his “superhuman thinking team,” Luong said he recruited a dozen IMO medalists – “by far the largest concentration of IMO medalists at Google!”

The lab’s entry at this year’s Olympiad deployed the improved version of AlphaGeometry. Not surprisingly, the model performed very well on the geometry problem, solving it in 19 seconds.

Hubert’s team developed a new model that is comparable but more generalized. Called AlphaProof, it is designed to handle a wide range of mathematical subjects. Between them, AlphaGeometry and AlphaProof made use of several different AI technologies.

One approach was an informal reasoning system, expressed in natural language. This system leveraged Gemini, Google’s large language model. It used an English-language corpus of published problems, proofs and the like as training data.

The informal system excels at identifying patterns and suggesting what comes next; it is creative and talks about ideas in an understandable way. Of course, large language models are prone to making things up – which may (or may not) be useful for poetry but is definitely not for mathematics. But in this context, the LLM appears to have shown restraint; it was not immune to hallucination, but the frequency was reduced.

Another approach was a formal reasoning system, based on logic and expressed in code. It used the theorem-proving and proof assistant software Lean, which ensures that if the system says a proof is correct, then it is indeed correct. “We can check exactly whether the proof is correct or not,” Hubert said. “Every step is guaranteed to be logically correct.”
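
To give a flavor of what that guarantee means, here is a minimal, purely illustrative Lean statement – a toy theorem, not one of the Olympiad problems and not DeepMind’s code – that the proof assistant accepts only because every step follows from its rules:

    -- Toy example: Lean checks each step mechanically; if the proof
    -- did not hold, the checker would reject the theorem outright.
    theorem add_comm_toy (a b : Nat) : a + b = b + a := by
      exact Nat.add_comm a b

Because a formal checker either accepts or rejects a proof, there is no ambiguity about whether a solution written this way is correct.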

Another crucial component was a reinforcement learning algorithm in the AlphaGo and AlphaZero lineage. This type of AI learns on its own and can scale indefinitely, said Silver, who is vice president of reinforcement learning at Google DeepMind. Because the algorithm doesn’t require a human teacher, it can “learn and keep learning and keep learning until it can finally solve the hardest problems humans can solve,” he said. “And then maybe one day it will go beyond that.”

Hubert added: “The system can rediscover knowledge for itself.” That’s what happened with AlphaZero: It started with zero knowledge, Hubert said, “and just by playing and seeing who wins and who loses, it could rediscover all the knowledge of chess. It took us less than a day to rediscover all the chess knowledge and about a week to rediscover all the Go knowledge. Then we thought, let’s apply this to math.”
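
As a rough, hypothetical sketch of that self-play idea – not DeepMind’s code, and applied to a toy game rather than chess, Go or mathematics – the loop below starts with zero knowledge and improves purely from observing which side wins each game it plays against itself:

    # Hypothetical illustration of AlphaZero-style self-play on a toy game (Nim):
    # two copies of the same policy play each other, and values are updated
    # only from who won and who lost - no human examples involved.
    import random
    from collections import defaultdict

    PILE, MOVES = 10, (1, 2, 3)      # take 1-3 stones; whoever takes the last stone wins
    EPSILON, ALPHA = 0.1, 0.5        # exploration rate and learning rate
    q = defaultdict(float)           # learned value of each (stones_left, move) pair

    def choose(stones):
        legal = [m for m in MOVES if m <= stones]
        if random.random() < EPSILON:                      # occasionally explore
            return random.choice(legal)
        return max(legal, key=lambda m: q[(stones, m)])    # otherwise exploit current knowledge

    for _ in range(50_000):                                # self-play games
        stones, history, player = PILE, [], 0
        while stones > 0:
            move = choose(stones)
            history.append((player, stones, move))
            stones -= move
            player = 1 - player
        winner = 1 - player                                # the side that took the last stone
        for p, s, m in history:                            # learn from the outcome alone
            reward = 1.0 if p == winner else -1.0
            q[(s, m)] += ALPHA * (reward - q[(s, m)])

    # With enough games, the greedy policy tends to rediscover the classic
    # winning strategy for this game: leave the opponent a multiple of 4.
    print({s: max((m for m in MOVES if m <= s), key=lambda m: q[(s, m)]) for s in range(1, PILE + 1)})

The point of the toy is the shape of the loop, not the game: nothing in it encodes expert knowledge, which is what lets this family of methods, at vastly larger scale, rediscover strategies on its own.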

Gowers doesn’t worry – much – about the long-term consequences. “It is possible to imagine a situation in which mathematicians are basically left with nothing to do,” he said. “That would be the case if computers became better and much faster at everything mathematicians do today.”

“There still appears to be a long way to go before computers are capable of research-level math,” he added. “It’s a pretty safe bet that if Google DeepMind can solve at least a few hard IMO problems, then a useful research tool can’t be that far away.”

A truly capable tool could make mathematics accessible to more people, speed up the research process, and nudge mathematicians outside the box. Eventually, it might even come up with new ideas that resonate.

c.2024 The New York Times Company


