Large Language models (LLMs) like ChatGPT, based on advanced AI technologies, can understand and generate text – but incorrect outputs may lead to serious consequences. Vasilios Danos and Thora Markert from TÜVIT explain how companies can ensure the safe and responsible use of LLMs.
Vasilios Danos, Head of AI Security and Trustworthiness, and Thora Markert, Head of AI Research and Governance, both at TÜVIT, explain in this interview how companies can use LLMs safely and responsibly.
Which industries are turning to you?
Vasilios Danos We’re getting enquiries from very different fields. Start-ups and SMEs are using GPT models, for example, to support their customers or in the allocation of appointments. Others are using them in HR or for internal processes.
What are the main problems you see with Large Language Models (LLMs) in terms of their reliability and the quality of the information output?
VD One of the biggest problems lies in what we call hallucinations. The models often give wrong answers, but with an extremely high level of plausibility. The dangerous thing about this is that they almost always give an answer, even if in reality they have no idea. There’s a particularly significant example from the US, in which a lawyer asked ChatGPT for a precedent. The AI then presented him with a completely invented case. This was only noticed in court – a serious mistake that got the lawyer into huge amounts of trouble.
Thora Markert That's right, the appearance of the models can be deceptive. They give the impression of having specialist knowledge that in many cases isn’t even remotely correct. This is especially critical when people make decisions based on this false information. This isn’t limited to case law. Think of medical diagnoses or psychotherapeutic advice. If a model makes incorrect recommendations, human lives can be put at stake or significant financial damage incurred.
Can AI also fall for cyberattacks?
VD Of course. Here’s an example: A car dealer in the USA implemented a chatbot for customer contact purposes which fell victim to a cyberattack. The attackers bypassed the chatbot’s security barriers by claiming that they were the CEO of OpenAI. They persuaded the chatbot to sell them a car for just one dollar. Manipulations like this show how vulnerable the systems can be.
What methods does TÜVIT use to identify the vulnerabilities of LLMs?
VD Our testing methods are based on real attacks and methods from security research that have been successful in the past. We analyse these attacks and use them to develop a toolbox to test the models for their vulnerabilities in a targeted and automated way.
The biggest challenge is the black-box nature of AI.
Thora Markert
Head of AI Research and Governance at TÜVIT
What kind of role are regulatory requirements like the EU’s AI Act playing in the development of your test procedures?
VD The EU’s AI Act is going to be a game changer. Until now, tests have often been optional. But with the law that came into force in August 2024, they will soon become mandatory. The standards for this are currently being set by the EU Commission, in working groups in which we’re also involved.
TM The aim is to ensure that the systems don’t give out false information, are trustworthy and don’t discriminate. The challenge is to translate this into clear testing methodologies.
What types of attacks are particularly problematic for LLMs and how can they be mitigated?
VD Especially problematic are what are known in the trade as jailbreaks and prompt injections. These are malicious prompts to the model. When criminals try to get around the models’ protective barriers with manipulative questioning techniques, they sometimes even manage to extract private data such as credit card information or other personal data from the models.
TM Another risk is data poisoning. In this case, manipulated information is infiltrated into public forums from which the models then go on to learn. They take this false information and pass it off later as fact.
VD We document vulnerabilities in the AI application and inform the development teams of customers and manufacturers. The latter are responsible for fixing the problem and must implement the solution that fits their circumstances. The possible approaches to solving the problems are wide-ranging and vary depending on the manufacturing company. For example, models can match their answers with reputable internet sources or access knowledge bases to minimise the likelihood of misinformation. In the case of certain vulnerabilities, the system prompt can also be adjusted. This contains all the instructions on how the model should react, what content it may deliver and what it should avoid.
What are the biggest challenges when it comes to implementing a comprehensive security check for these models?
TM The biggest challenge is the black-box nature of AI. The aim is to find out why a model can be manipulated in regard to certain questions and not others. It takes a lot of research and testing to understand these mechanisms.
What measures do you think are necessary to counteract the societal impact of bias in LLMs?
VD Language models are a reflection of society. They learn from the data they find on the internet, often adopting stereotypes or toxic behaviour in the process. A decisive factor here is the availability of “clean” training data. Our job is to check how pronounced these tendencies are and how they can be minimised.
With the EU AI Act, we expect a significant increase in demand for our services.
Vasilios Danos
Head of AI Security and Trustworthiness bei TÜVIT
How do you see the cooperation between testing bodies like TÜVIT and the development teams for AI systems panning out in the future?
VD Among other things, the regulation provides for a third-party audit. This means that, as a testing body, we point out where systems have vulnerabilities. The models are becoming more and more powerful and multimodal, for example through the combination of image and video data and through the integration of speech generation. It therefore follows that the requirements will increase significantly over the next few years.
Are there any new technologies or approaches that you want to use in the future to test AI systems?
VD One promising approach is to use specialised language models to test other language models. This “agent-based approach” uses the collaboration of several models to identify vulnerabilities in the test model through increasingly sophisticated questioning techniques. At the same time, the use of agent-based approaches like these in place of a single model could significantly enhance security.
How could the role of testing bodies like TÜVIT develop if AI systems become more widespread and more powerful?
VD We see ourselves as a potential market leader in Germany for the testing of such systems. Since many companies only act when required to do so by law, we expect the EU’s AI Act to lead to a significant increase in demand for our services.