HyperCLOVA X, NAVER’s flagship AI models
In 2023, NAVER released HyperCLOVA X, which builds on the earlier HyperCLOVA model with stronger Korean capabilities and the ability to answer more complex questions. HyperCLOVA X leads several Korean benchmarks, making it an example of sovereign AI and a strong choice if you’re looking for exceptional Korean abilities. With a deep understanding of the Korean language, our models outperform many others across benchmarks in a wide range of areas, including writing, reasoning, and translation.
LLM optimized for the Korean language
As Hallyu, the wave of Korean music, movies, and cuisine, finds a global audience, the Korean language is also gaining popularity. However, large language models (LLMs) that specialize in Korean are rare. LLMs are trained on large datasets to understand and generate language, and for most leading AI models, those datasets are predominantly in English and focused on North American culture. For this reason, we need AI models familiar with Korean norms and values, its cultural context and geographical situation, and most of all, its unique language. HyperCLOVA X is a step toward AI sovereignty, that is, building AI technology with local data, infrastructure, and workforce, which is crucial to ensuring a nation’s data sovereignty.
Exceptional Korean capabilities
HyperCLOVA X is built from the ground up to be proficient in Korean: we vastly increased the share of Korean in the training data. Korean, multilingual, and code data each make up about a third of the training data, and the raw data are processed before being used for training. To process text, the model converts it into tokens, where a single token can be a character, a word, or a phrase. We use a tokenizer optimized for the Korean language, which lets our models encode the same text in fewer tokens than English-centric models do. As a result, the models can handle longer contexts at lower inference cost. All else being equal, HyperCLOVA X runs faster and gives better answers.
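To make the link between token count and context length concrete, here is a minimal Python sketch. It does not use the HyperCLOVA X tokenizer. It compares two toy schemes, one token per UTF-8 byte and one token per character, and both are illustrative stand-ins. The function names and the 4,096-token context window are assumptions chosen for the example. The only fact it relies on is that a precomposed Hangul syllable occupies three bytes in UTF-8.

```python
"""Toy comparison of token counts for Korean text under two simple schemes.

Neither scheme is the HyperCLOVA X tokenizer; both are illustrative stand-ins.
"""
import unicodedata


def byte_level_tokens(text: str) -> list[int]:
    """One token per UTF-8 byte (a precomposed Hangul syllable is 3 bytes)."""
    return list(text.encode("utf-8"))


def char_level_tokens(text: str) -> list[str]:
    """One token per Unicode code point (one per precomposed Hangul syllable)."""
    return list(text)


def compression_ratio(text: str) -> float:
    """How many byte-level tokens are needed per character-level token."""
    return len(byte_level_tokens(text)) / len(char_level_tokens(text))


def max_chars_in_context(context_tokens: int, tokens_per_char: float) -> int:
    """Characters that fit in a context window at a given tokens-per-character rate."""
    return int(context_tokens // tokens_per_char)


# NFC normalization guarantees precomposed syllables regardless of input method.
SAMPLE = unicodedata.normalize("NFC", "안녕하세요")
HYPOTHETICAL_CONTEXT = 4096  # assumed window size, for illustration only

if __name__ == "__main__":
    ratio = compression_ratio(SAMPLE)
    print("byte-level tokens:", len(byte_level_tokens(SAMPLE)))
    print("char-level tokens:", len(char_level_tokens(SAMPLE)))
    print("chars per window (byte-level):", max_chars_in_context(HYPOTHETICAL_CONTEXT, ratio))
    print("chars per window (char-level):", max_chars_in_context(HYPOTHETICAL_CONTEXT, 1.0))
```

In this toy setting, the scheme that spends fewer tokens per syllable fits roughly three times as much Korean text into the same window. Since the cost of running a model generally grows with the number of tokens processed, a more compact encoding also tends to lower cost for the same text. Real subword tokenizers fall somewhere between these two extremes.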
Outstanding multilingual capabilities
HyperCLOVA X excels at multilingual reasoning and machine translation. This is possible because its capabilities in Korean and English, the two largest languages in the training data, carry over to other languages. HyperCLOVA X scored highly on multilingual reasoning in languages that are relatively underrepresented in its training data, especially Asian languages such as Arabic, Hindi, Thai, Urdu, Vietnamese, and Chinese. It also shows its multilingual proficiency in machine translation, especially between Korean, English, Japanese, and Chinese. When translating from one language to another, our models accurately convey both the content and the intent of a document.
Foundation models for AI
Today, leading companies no longer build a separate solution for each problem. Instead, they create one general-purpose model, or foundation model, that can answer all sorts of questions and then fine-tune it for specific purposes. NAVER also uses foundation models across our services, including CLOVA X and Studio API. With CLOVA X, we bring our hyper-scale AI to enterprise customers to boost productivity, offering HyperCLOVA X through an easy-to-use conversational interface. HyperCLOVA X is also available through Studio API, which users can tailor to suit their needs.
Our plans for the future
We plan to expand HyperCLOVA X to support different modalities. Multimodal AI models can understand different types of data at once, including text, images, audio, and video. In the future, HyperCLOVA X will be able to understand and generate visual information, so that conversing with it feels much like the way people interact with the world. Soon, you could send a picture of an ingredient at a grocery store and ask HyperCLOVA X for recipes, or ask it to explain the characters or plot of the movie you’re watching through a live video chat.
Learn more
For more information, read the HyperCLOVA X technical report.
Explore CLOVA X