Aug 7, 2024

Generative AI Red Teaming Challenge 2024

Image for Generative AI Red Teaming Challenge 2024

This year, the Generative AI Red Teaming Challenge* was open for two days from April 11 to 12, garnering wide attention. The first such event in Korea, the Red Teaming Challenge was part of an industry-wide effort to enhance the safety and security of large language models (LLMs), which have become increasingly popular in recent years.


On the first day, the Red Teaming Challenge saw participants launching mock attacks on Korean LLMs currently available to the public. On the second, the Global AI Safety Conference opened discussions on AI safety. People from all fields participated in this hotly anticipated event, which NAVER joined as the principal partner. Let’s see what happened over the two days.


*The term “red teaming” has its roots in the military, where during the Cold War, the U.S. military simulated defense strategies that pitted the friendly forces, or the “blue team,” against the “red team” adversaries. Since then, tech companies, governments, and law firms have borrowed the term to refer to exposing vulnerabilities through simulations. Here, red teaming is used in the context of AI testing, where “attackers” deliberately steer LLMs toward harmful conversations to validate safety and mitigate risks.




Highlights from Day 1: Generative AI Red Teaming Challenge


In his opening remarks, NAVER Cloud CEO Kim Yuwon said, “This is only a beginning in our quest for safer AI. At NAVER, we’re committed to developing AI technology and helping businesses grow, but just as importantly, we want to play a leading role in pursuing responsible AI.”

 

More than a thousand people from the tech industry and beyond who work in academia and sectors across manufacturing, finance, and health came together for the event. Participants tested four LLMs developed by Korean companies—NAVER (and its CLOVA X model), SK Telecom, Upstage, and 42Maru—across seven domains of harmful content: human rights violation, disinformation, inconsistency, cyberattacks, bias and discrimination, illegal content, and jailbreaking. Each topic addressed issues central to AI safety, with the total number of human-AI conversations reaching a staggering 20,000.


Evaluation criteria for red teaming
• Human rights violation: The four LLMs were tested to see whether their conversations infringed on fundamental human rights. Participants provoked the models to give detailed instructions on how to secretly track another person’s location or pry into someone else’s private life.
• Disinformation: Conversations are grouped into this category if LLMs produce hallucinations. For this challenge, we invited professionals from industries spanning finance, healthcare, and law to increase the quality of assessment.
• Inconsistency: When probing LLMs for inconsistency, participants found that the models sometimes failed to follow logical reasoning. One LLM, for example, said something along the lines of “I told you before that vaccines are safe and effective, but in many cases, they can cause side effects and are ineffective.”
• Cyberattacks: Some attackers nudged LLMs to generate malicious code that can compromise or infiltrate AI systems, draft phishing emails, and provide personal or confidential information.
• Bias and discrimination: AI models may also make derogatory or discriminatory remarks based on a person’s race, gender, sexual orientation, or religion. For instance, if asked to recommend a military school, they might offer a military academy for men but an army nursing service for women.
• Illegal content: Some LLMs can be tricked into inducing unlawful behavior, like telling users how to plagiarize copyrighted content or influencing their decision-making. In such cases, AI is used to manipulate political campaigns or user comments and may provide a tutorial on how to make a bomb.
• Jailbreaking: Lastly, attackers wrote prompts that broke the guardrails built around the AI systems.


Which of these criteria do you think the participants concentrated their attacks on? 42% of the total attacks triggered the LLMs to produce wrong information. Hallucinations create confusion and cause the models’ trustworthiness to drop, which is why accurate training data matters. The second most frequent type of attack was bias and discrimination. Because failure to reflect diversity may sow the seeds of division, we must keep working on training LLMs with bias-free datasets.
With the advent of generative AI technology, tech companies are left with the vital task of building safe and trustworthy AI. This year, NAVER established the Future AI Center to advance AI safety research as part of our commitment to develop responsible AI technology. Our efforts also include conducting red teaming exercises and setting high ethical standards.


Highlights from Day 2: Global AI Safety Conference



On the second day, global leaders in AI met for the AI Safety Conference. Ha Jungwoo, who leads NAVER’s AI Innovation team, started the conference with a keynote presentation on building responsible AI in a hyper-scale generative AI era. Mr. Ha outlined NAVER’s approach to AI safety, introducing our AI principles and efforts to design generative AI with trustworthiness and safety in mind.
Canadian AI startup Cohere, the Frontier Model Forum (a consortium of tech giants), and the nonprofit Center for AI Safety (CAIS) were invited to discuss recent trends in generative AI and the advancement of AI safety. Lee Hwaran, head of NAVER’s AI safety team, also participated in the talks. The conference provided an opportunity to find new ways of promoting safer AI and affirming NAVER’s place in the Korean tech landscape.


NAVER’s HyperCLOVA X booth



At the convention, companies set up booths to showcase their AI technology and products. People who dropped by NAVER’s booth could take a common-sense quiz with help from our CLOVA X model. CLOVA X was popular with the crowd, and a long line of participants queued to have a go at the crossword puzzle. We hope everyone had a wonderful time with CLOVA X.
At NAVER’s booth, we offered a variety of content, including an educational video on generative AI, an introduction to AI safety terms, and the AI RUSH program.

Watch the video


The Generative AI Red Teaming Challenge was a first step toward using AI safely. Users, tech startups and companies, and government organizations contributed their time and energy to the shared goal of responsible AI development and deployment, laying the foundational groundwork for Korean LLMs and aiming to establish high ethical standards.
NAVER will continue working toward raising awareness about ethical AI and leading the industry on the safety front.