Science & Innovation

Can AI Stop Making Things Up?: Dr. Zheng Yuan on AI Hallucinations

Published

12 February 2025

Dr. Zheng Yuan, KCL researcher who works on mitigating AI hallucinations. Image provided by Dr. Zheng Yuan.

Staff Writer Ella Adam sits down with King’s College London (KCL) Researcher Dr. Zheng Yuan to chat about her work mitigating AI hallucinations.

Large Language Models (LLMs) like ChatGPT can design travel itineraries, write code or translate languages. However, their output sometimes makes no sense, a phenomenon known as ‘hallucination’.

Dr. Zheng Yuan, a researcher at King’s Department of Informatics, recently signed a gift agreement with NetMind AI to advance her research on mitigating such hallucinations. In practice, she tries to eliminate irrelevant or non-factual output from LLMs. But how does one determine what is irrelevant or non-factual?

Roar: Last time I used ChatGPT, I asked it to shorten a paragraph to exactly 500 characters. It seems that Large Language Models can pass a bar exam but cannot count characters. Is that a hallucination according to your definition?

Zheng: In your case, ChatGPT did generate a summary, but it did not meet your requirements. What I refer to as hallucinations in Large Language Models is more about the content of their output, that is, when it is non-factual or irrelevant to us. Let’s say you are really interested in quantum computing and ask ChatGPT to suggest some books on that topic. You might get a very authentic-looking list of books with title, author and so on, but when you google some of them, you realise they don’t exist. That would be a non-factual hallucination. Then there are irrelevant hallucinations. Let’s go back to our book list. This time, ChatGPT’s list is not about quantum computing, but about mechanical engineering or chemistry. These books do exist, but they are irrelevant to your request.

R: So hallucinations are non-factual or irrelevant output of LLMs. How does this happen?

Z: That comes down to how LLMs are trained and how their output is generated. Let’s take ChatGPT as an example: it was trained on a very large amount of text. Based on statistical inference, ChatGPT tries to predict the word that follows another word in a certain context, step by step. That is what we call next-word prediction. Sometimes the prediction might be right, but the output is still somehow wrong.
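To make next-word prediction concrete, here is a minimal sketch in Python. It is a toy bigram counter, not how ChatGPT actually works: real LLMs use neural networks over far longer contexts. The corpus and the function names (`train_bigrams`, `predict_next`, `generate`) are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def generate(counts, start, max_words=5):
    """Extend `start` one predicted word at a time (step-by-step generation)."""
    output = [start]
    for _ in range(max_words):
        next_word = predict_next(counts, output[-1])
        if next_word is None:
            break
        output.append(next_word)
    return " ".join(output)

# Hypothetical toy training text.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigrams(corpus)
print(predict_next(model, "the"))
print(generate(model, "the", max_words=1))
```

The model only knows which words tend to follow which. Nothing in it checks whether the resulting sentence is true, which is one intuition for why fluent output can still be wrong.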

R: What can the LLM user do to prevent hallucinations?

Z: If you provide more context in your prompt, the LLM will be more likely to output a response that is more helpful or relevant to you. Here is a simple example: If I say ‘I am going to the supermarket to buy some …’, you could finish that sentence with basically anything, right? However, if I say: ‘I am running out of sugar. I am going to buy some…’, it would make sense to finish my sentence with ‘sugar’. That shows how the wider context helps.

R: You recently signed a gift agreement with NetMind AI to develop a system that filters out hallucinations. How would such a system work?

Z: Our approach is a matrix with two dimensions, which gives four quadrants: the x-axis ranges from irrelevant to relevant and the y-axis goes from non-factual to factual. With these dimensions, we classify different cases. What we really do not want is irrelevant and non-factual output. To detect and filter such output, we are working on combining the LLM with an external knowledge base for fact-checking.

For irrelevant output, we are focussing on the context that has been provided to the LLM. We try to come up with ways to calculate the similarity of the output to the context. If the similarity is low, we make the assumption that the output is less likely to be relevant. If the similarity score is high, we assume the output is relevant.
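The relevance check described above can be sketched as follows. This is only an illustration, not Dr. Yuan’s system: it assumes a simple bag-of-words cosine similarity, and the threshold of 0.2 and all function names are arbitrary choices made for this example. A real system would more likely compare learned text embeddings.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lower-case the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(context, output):
    """Cosine similarity between the word-count vectors of two texts (0.0 to 1.0)."""
    a, b = bag_of_words(context), bag_of_words(output)
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

def is_relevant(context, output, threshold=0.2):
    """Assume the output is relevant when similarity reaches the (arbitrary) threshold."""
    return cosine_similarity(context, output) >= threshold

def quadrant(relevant, factual):
    """Place an output in one of the four cells of the relevance/factuality matrix."""
    x = "relevant" if relevant else "irrelevant"
    y = "factual" if factual else "non-factual"
    return f"{x} and {y}"

context = "Suggest some books on quantum computing"
on_topic = "Here are some introductory books on quantum computing"
off_topic = "Here are textbooks about mechanical engineering and chemistry"

print(is_relevant(context, on_topic), is_relevant(context, off_topic))
```

Word overlap is a crude proxy: an answer can reuse the words of the prompt and still miss the point, so the score is a heuristic rather than a guarantee.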

R: How do you determine what is irrelevant or non-factual?

Z: It is easier to determine the relevance, since that simply depends on the context and the similarity score. Regarding factuality, we are facing a big challenge in determining the truthfulness of the output because we need reliable sources for that. So far, we are using Wikipedia and another external knowledge base, but everyone knows that Wikipedia is not really reliable, because anyone can edit it. A second issue is time-sensitive information, that is, information that is only true for a certain period. For example, let’s say that in a ChatGPT conversation there is a claim that the current US president is Barack Obama. This used to be true, but it is not anymore. It is tricky to update that kind of information.

R: How do you go about these cases?

Z: It is tricky because we do not have our own reliable knowledge base, so we rely on what we can get. Our approach is to rely on evidence from different sources. If we can confirm something from multiple sources, we have high confidence that the information is likely to be true. But if the evidence is conflicting, we look for additional sources and decide each case individually.
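One way to picture this multi-source approach is the sketch below. It is an assumption-laden illustration, not the project’s actual method: the source names, the `min_sources` parameter and the verdict labels are all hypothetical, and each source’s verdict is simply given as input rather than retrieved.

```python
from collections import Counter

def assess_claim(verdicts, min_sources=2):
    """Combine per-source verdicts on one claim.

    `verdicts` maps a source name to True (supports the claim),
    False (contradicts it) or None (says nothing about it).
    """
    votes = Counter(v for v in verdicts.values() if v is not None)
    supports, contradicts = votes[True], votes[False]
    if supports and contradicts:
        # Sources disagree: flag the claim for a closer, case-by-case look.
        return "conflicting, needs review"
    if supports >= min_sources:
        return "likely true"
    if contradicts >= min_sources:
        return "likely false"
    return "insufficient evidence"

# Hypothetical verdicts for the claim "The current US president is Barack Obama".
print(assess_claim({"wikipedia": False, "knowledge_base_b": False}))
print(assess_claim({"wikipedia": True, "knowledge_base_b": False}))
```

Requiring agreement from at least two sources reflects the idea that no single source, Wikipedia included, is trusted on its own.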

R: Do you think people are aware enough of hallucinations?

Z: I am not sure. People need to have prior knowledge in order to judge whether an LLM’s output is relevant and factual. Here is an example that came up during one of my teaching sessions about backpropagation: I asked ChatGPT to explain backpropagation and deep learning, and it returned a paragraph that looked very good at first glance. But there were some errors in the details, which I only found because I knew the topic. If I hadn’t, I would never have realised. That is something people should be aware of.

R: When a startup like NetMind AI gifts a researcher money to pursue their project, are there any strings attached? Do they get rights to use your findings, for example?

Z: In my case, the gift was a research donation – that means it is unrestricted. We are aiming to make our research output as accessible as possible to benefit the wider community. Of course, if NetMind AI is interested in our output, I would be very happy to collaborate with them in commercialising it.

R: How long do you plan this research project to take?

Z: The donation is intended to finance the project for three years, starting this coming June. However, Large Language Models evolve every day. Currently, hallucinations are a really big issue, but we do not know when they will be solved or whether other issues will emerge. As a researcher, I believe this field has great potential for societal impact. Although this donation is meant to last for three years, I will definitely continue to research in this domain after that.

R: Do you have any other ideas for future research?

Z: Currently, I am also working on a few other projects. One of them is on AI creativity. Up until recently, some people claimed that the feature that sets humans apart from AI and other machines is creativity. However, there have been really interesting findings that creative collaboration between human artists and AI can have amazing results. I investigate how such human and AI collaboration can be fostered.

Another project is more focussed on educational applications. Many students already use ChatGPT for their uni work. Understandably, there are lots of ethical concerns about this. I am looking at how we can build better LLMs for academic and professional settings.

I am also very interested in making LLMs equally good for languages other than English, particularly for bilingual speakers who mix two languages in their everyday life.

About the author

Ella Adam

In this article: AI, AI Hallucinations, Researchers

Roar News