Why RAG Won’t Solve Generative AI’s Hallucination Problem | TechCrunch

Hallucinations – essentially the lies that generative AI models spread – are a major problem for companies looking to integrate the technology into their operations.

Because models have no real intelligence and merely predict words, images, speech, music, and other data according to a private schema, they are sometimes wrong. Very wrong. In a recent article in The Wall Street Journal, a source describes a case in which Microsoft’s generative AI invented meeting attendees and implied that conference calls covered subjects that were never actually discussed on the call.

As I wrote some time ago, hallucinations may be an intractable problem with today’s transformer-based model architectures. But a number of generative AI providers suggest they can be more or less eliminated through a technical approach called Retrieval Augmented Generation (RAG).

Here’s how one provider, Squirro, pitches it:

The focus of the offering is the concept of Retrieval Augmented LLMs or Retrieval Augmented Generation (RAG) embedded in the solution… [our generative AI] is unique in its promise not to cause hallucinations. Every piece of information it generates can be traced back to a source, which ensures credibility.

Here’s a similar pitch from SiftHub:

Using RAG technology and fine-tuned large language models with industry-specific knowledge training, SiftHub enables companies to generate personalized answers without hallucinations. This guarantees increased transparency and reduced risk, instilling absolute confidence in using AI for all their needs.

RAG was created by data scientist Patrick Lewis, a researcher at Meta and University College London and lead author of the 2020 paper that coined the term. Applied to a model, RAG retrieves documents that are potentially relevant to a question – for example, a Wikipedia page about the Super Bowl – essentially using a keyword search, and then asks the model to generate an answer based on that additional context.
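To make that pipeline concrete, here is a minimal sketch in Python: a toy keyword-overlap retriever plus a prompt that asks the model to answer only from the retrieved context. The function names, scoring and tiny corpus are illustrative assumptions rather than Lewis’s system or any vendor’s product; a real deployment would use a proper search index and send the prompt to an actual LLM where the `print` stands in.

```python
# Toy RAG pipeline: retrieve documents that share keywords with the question,
# then fold them into the prompt as extra context for the model.

def keyword_score(question: str, document: str) -> int:
    """Count how many of the question's words also appear in the document."""
    return len(set(question.lower().split()) & set(document.lower().split()))

def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k documents with the most keyword overlap."""
    return sorted(corpus, key=lambda doc: keyword_score(question, doc), reverse=True)[:k]

def build_prompt(question: str, documents: list[str]) -> str:
    """Instruct the model to answer from the retrieved context only."""
    context = "\n\n".join(documents)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

corpus = [
    "The Kansas City Chiefs won the Super Bowl in February 2024.",
    "Retrieval Augmented Generation was introduced in a 2020 paper.",
    "The Allen Institute for AI is a nonprofit research organization.",
]

question = "Who won the Super Bowl last year?"
prompt = build_prompt(question, retrieve(question, corpus))
print(prompt)  # this augmented prompt is what would actually be sent to the model
```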

“When you interact with a generative AI model like ChatGPT or Llama and ask a question, the model responds by default from its ‘parametric memory’ – that is, from the knowledge stored in its parameters as a result of training on huge amounts of data from the Internet,” said David Wadden, a research scientist at AI2, the AI-focused research arm of the nonprofit Allen Institute. “But just as you’re likely to give more accurate answers if you have a reference [like a book or a file], the same goes for models in some cases.”

RAG is undeniably useful – it allows things a model generates to be mapped back to retrieved documents to verify their factuality (and, as an added benefit, to avoid potentially copyright-infringing regurgitation). RAG also allows companies that don’t want their documents used to train a model – for example, companies in highly regulated industries like healthcare and law – to let models draw on those documents in a more secure and temporary way.

But RAG certainly can’t prevent a model from hallucinating. And it has limitations that many providers gloss over.

Wadden says RAG is most effective in “knowledge-intensive” scenarios where a user wants to use a model to meet an “information need” — for example, finding out who won the Super Bowl last year. In these scenarios, the document answering the question likely contains many of the same keywords as the question (e.g., “Super Bowl,” “last year”), making it relatively easy to find using keyword search.

Things get trickier for “reasoning-intensive” tasks like coding and math, where it’s harder to specify in a keyword-based search query the concepts needed to answer a query – let alone figure out which documents might be relevant.
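A toy illustration of that gap, reusing the same kind of naive keyword matching sketched above (the query and document text are invented for the example):

```python
# A reasoning-intensive question and the document that would actually help it
# share almost no vocabulary, so a keyword retriever scores the match near zero.
query = "Show that the sum of the first n odd numbers equals n squared"
helpful_doc = "Proof by induction: verify a base case, then prove the inductive step."

overlap = set(query.lower().split()) & set(helpful_doc.lower().split())
print(overlap)  # {'the'} -- nothing that signals the document is relevant
```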

Even for simple questions, models can be “distracted” by irrelevant content in documents, especially long documents where the answer is not obvious. Or, for reasons still unknown, they may simply ignore the contents of the retrieved documents and rely instead on their parametric memory.

RAG is also expensive in terms of the hardware needed to apply it at scale.

Retrieved documents, whether from the Internet, an internal database or elsewhere, have to be stored – at least temporarily – in memory so the model can consult them. Another cost is the compute needed to process the expanded context before the model generates its response. For a technology already notorious for the amount of compute and electricity it requires even for basic operations, this is a serious consideration.

That doesn’t mean RAG can’t be improved. Wadden pointed to many ongoing efforts to train models to make better use of the documents retrieved from RAG.

Some of these efforts include models that can “decide” when to use the documents, or models that can choose not to perform the retrieval at all if they deem it unnecessary. Others focus on ways to index large document datasets more efficiently and on improving search through better representations of documents—representations that go beyond keywords.

“We’re pretty good at retrieving documents based on keywords, but not so good at retrieving documents based on more abstract concepts like a proof technique needed to solve a math problem,” Wadden said. “Research is needed to develop document representations and search techniques that can identify relevant documents for more abstract generation tasks. I think that’s largely an open question at this point.”
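Representations that go beyond keywords usually mean dense embeddings: encode the query and each document as vectors and rank by similarity instead of shared terms. Here is a rough sketch of that idea, assuming the open-source sentence-transformers library and the all-MiniLM-L6-v2 model, neither of which the article mentions:

```python
# Dense retrieval sketch: rank documents by embedding similarity rather than
# keyword overlap, so conceptually related text can match without shared terms.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [
    "Proof by induction: verify a base case, then prove the inductive step.",
    "The Kansas City Chiefs won the Super Bowl in February 2024.",
]
query = "What technique can I use to prove a statement about all integers n?"

# Encode everything into the same vector space; with normalized vectors,
# the dot product equals cosine similarity.
doc_vecs = model.encode(corpus, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)

scores = doc_vecs @ query_vec
print(corpus[int(np.argmax(scores))])  # the induction document should rank first
```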

So RAG can help reduce a model’s hallucinations – but it’s not the answer to all of AI’s hallucinatory problems. Beware of providers who claim otherwise.
