Tech

Scientists develop new algorithm to detect AI ‘hallucinations’



A persistent problem with current generative artificial intelligence (AI) tools like ChatGPT is that they often confidently assert false information. Computer scientists call this behavior “hallucination,” and it is a fundamental barrier to the usefulness of AI.

Hallucinations have led to some embarrassing public slip-ups. In February, Air Canada was forced by a court to honor a discount that its customer support chatbot had mistakenly offered to a passenger. In May, Google was forced to make changes to its new “AI Overviews” search feature after the bot told some users it was safe to eat rocks. And last June, two lawyers were fined $5,000 by a US judge after one of them admitted to using ChatGPT to help draft a court filing. The lawyer was exposed because the chatbot had added fake citations to the submission, pointing to cases that never existed.

But in good news for lazy lawyers, search giants and errant airlines, at least some types of AI hallucinations may soon be a thing of the past. New research, published Wednesday in the peer-reviewed scientific journal Nature, describes a method for detecting when an AI tool is likely to be hallucinating. The method is able to discern between correct and incorrect AI-generated answers approximately 79% of the time, roughly 10 percentage points higher than other leading methods. Although the method addresses just one of several causes of AI hallucinations and requires approximately 10 times more computing power than a standard chatbot conversation, the results could pave the way for more reliable AI systems in the near future.

“My hope is that this opens up avenues for large language models to be deployed where they cannot currently be deployed – where a little more reliability is needed than is currently available,” says study author Sebastian Farquhar, a senior researcher in the computer science department at the University of Oxford, where the research was carried out, and also a researcher on Google DeepMind’s safety team. Of the lawyer who was fined for relying on a ChatGPT hallucination, Farquhar says: “It would have saved him.”

Hallucination has become a common term in the AI world, but it is also controversial. For one thing, it implies that models have some kind of subjective experience of the world, which most computer scientists agree they do not. It also suggests that hallucinations are a solvable quirk rather than a fundamental and perhaps ineradicable problem of large language models (different camps of AI researchers disagree on this question). Above all, the term is imprecise, describing several different categories of errors.

Read more: The A to Z of Artificial Intelligence

Farquhar’s team decided to focus on a specific category of hallucinations, which they call “confabulations.” This is when an AI model spits out inconsistent wrong answers to a factual question, as opposed to the same consistent wrong answer, which is more likely to result from problems with a model’s training data, a model lying in pursuit of a reward, or structural flaws in a model’s logic or reasoning. It’s difficult to quantify what percentage of all AI hallucinations are confabulations, says Farquhar, but it’s likely a large one. “The fact that our method, which only detects confabulations, makes such a big difference to overall correctness suggests that a large number of incorrect answers come from these confabulations,” he says.


The methodology

The method used in the study to detect whether a model is likely to be confabulating is relatively simple. First, researchers ask a chatbot to spit out a few answers (usually between five and 10) to the same prompt. Then they use a different language model to group these responses based on their meanings. For example, “Paris is the capital of France” and “The capital of France is Paris” would be assigned to the same group because they mean the same thing, even though the text of each sentence is different. “The capital of France is Rome” would be assigned to a different group.
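As a rough sketch of that grouping step, the toy Python below clusters sampled answers by meaning. The article says a second language model makes the “same meaning” judgment; the means_the_same function here is a hypothetical stand-in for that judgment, hard-coded to recognize only the example sentences above.

```python
def means_the_same(a: str, b: str) -> bool:
    """Hypothetical stand-in for the language-model judgment of whether two answers mean the same thing."""
    known_paraphrases = {
        frozenset({"Paris is the capital of France.",
                   "The capital of France is Paris."}),
    }
    return a == b or frozenset({a, b}) in known_paraphrases

def cluster_by_meaning(answers: list[str]) -> list[list[str]]:
    """Greedily assign each answer to the first existing cluster it matches in meaning."""
    clusters: list[list[str]] = []
    for answer in answers:
        for cluster in clusters:
            if means_the_same(answer, cluster[0]):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    return clusters

samples = [
    "Paris is the capital of France.",
    "The capital of France is Paris.",
    "The capital of France is Rome.",
]
print(cluster_by_meaning(samples))
# [['Paris is the capital of France.', 'The capital of France is Paris.'],
#  ['The capital of France is Rome.']]
```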

Researchers then calculate a number they call “semantic entropy” – in other words, a measure of how similar or different the meanings of each answer are. If the model’s responses all have different meanings, the semantic entropy score will be high, indicating that the model is confabulating. If the model’s answers all have identical or similar meanings, the semantic entropy score will be low, indicating that the model is giving a consistent answer – and is therefore unlikely to be confabulating. (The answer could still be consistently wrong, but that would be a different form of hallucination, for example one caused by problematic training data.)
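Once the answers are grouped, the entropy calculation itself is short. The sketch below is a simplified, count-based version: it treats the share of sampled answers falling into each meaning cluster as that cluster’s probability, an assumption made here for illustration rather than the paper’s exact weighting.

```python
import math

def semantic_entropy(cluster_sizes: list[int]) -> float:
    """Shannon entropy (in bits) over the share of sampled answers in each meaning cluster."""
    total = sum(cluster_sizes)
    probs = [size / total for size in cluster_sizes]
    # Terms with probability 0 or 1 contribute nothing, so they are skipped.
    return sum((-p * math.log2(p) for p in probs if 0 < p < 1), 0.0)

# Five sampled answers that all share one meaning: zero entropy, likely not confabulating.
print(semantic_entropy([5]))              # 0.0
# Five sampled answers with five different meanings: maximal entropy, likely confabulating.
print(semantic_entropy([1, 1, 1, 1, 1]))  # ~2.32
```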

The researchers said the semantic entropy detection method outperformed several other approaches for detecting AI hallucinations. These included “naive entropy,” which only checks whether the wording of the answers differs, not their meaning; a method called “P(True),” which asks the model to evaluate the veracity of its own answers; and an approach called “embedding regression,” in which an AI is fine-tuned on correct answers to specific questions. Embedding regression is effective at ensuring that AIs accurately answer questions on particular subjects, but it fails when questions of a different type are asked. A significant difference between the method described in the paper and embedding regression is that the new method does not require domain-specific training data – for example, it does not require training a model to be good at science in order to detect possible hallucinations in answers to science-related questions. This means it works to similar effect across different subject areas, according to the paper.
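A toy comparison makes the difference with naive entropy concrete: three differently worded answers that all mean the same thing look maximally uncertain to naive entropy but fully consistent under semantic entropy. The single-cluster assignment below is assumed for illustration.

```python
from collections import Counter
import math

def entropy_bits(counts: list[int]) -> float:
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return sum((-c / total * math.log2(c / total) for c in counts if 0 < c < total), 0.0)

answers = [
    "Paris is the capital of France.",
    "The capital of France is Paris.",
    "Paris.",
]

# Naive entropy: every distinct string is treated as its own outcome, so three
# paraphrases look maximally uncertain.
naive = entropy_bits(list(Counter(answers).values()))

# Semantic entropy: all three strings are assumed to fall into one meaning
# cluster, so the model looks consistent and is unlikely to be confabulating.
semantic = entropy_bits([len(answers)])

print(f"naive entropy: {naive:.2f} bits, semantic entropy: {semantic:.2f} bits")
# naive entropy: 1.58 bits, semantic entropy: 0.00 bits
```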

Farquhar has some ideas about how semantic entropy could begin to reduce hallucinations in mainstream chatbots. He says this could, in theory, allow OpenAI to add a button to ChatGPT where a user could click on an answer and get a certainty score that would allow them to feel more confident about whether a result is accurate. He says the method can also be integrated with other tools that use AI in high-risk environments, where it is more desirable to trade speed and cost for accuracy.

Although Farquhar is optimistic about his method’s potential to improve the reliability of AI systems, some experts warn against overestimating its immediate impact. Arvind Narayanan, professor of computer science at Princeton University, recognizes the value of the research but emphasizes the challenges of integrating it into real-world applications. “I think it’s good research… [but] it’s important not to get too excited about the potential of research like this,” he says. “The extent to which this can be integrated into a deployed chatbot is unclear.”

Read more: Arvind Narayanan is on the TIME100 AI

Narayanan notes that with the release of better models, rates of hallucinations (not just confabulations) have decreased. But he’s skeptical the problem will go away anytime soon. “In the short and medium term, I think it is unlikely that hallucinations will be eliminated. It is, I think, to some extent intrinsic to the way LLMs work,” he says. He points out that as AI models become more capable, people will try to use them for increasingly difficult tasks where failure may be more likely. “There will always be a line between what people want to use them for and what they can reliably work on,” he says. “This is as much a sociological problem as it is a technical problem. And I don’t think there’s a clean technical solution.”



This story originally appeared on Time.com.

