CLOVA Speech

Experience NEST, NAVER CLOVA’s new speech recognition engine
The current NEST demo supports Korean only.
News sample: This is a transcript of YTN news speech produced with the NEST engine.
The result may contain minor recognition errors.
Check out the speech recognition technology of the NEST engine
Please contact us with a partnership proposal if you are a business or organization that wants to use CLOVA Speech.
Meeting notes you can hear and see

CLOVA Note beta

Experience CLOVA Speech’s speech recognition technology in daily life with CLOVA Note.

NAVER CLOVA’s speech recognition technology,
CLOVA Speech

CLOVA has the world’s finest Korean and Japanese speech recognition technology and provides a highly accurate speech recognition engine for a wide range of speech-based services on NAVER and LINE.
  • Voice command recognition: The CLOVA engine, trained on a vast amount of data from NAVER and LINE, accurately recognizes new voice commands.
  • Dictation (NEST): The NEST engine is trained end to end and accurately recognizes long, unstructured, and complex sentences.
  • Speaker recognition: CLOVA’s speaker recognition quickly and accurately identifies a registered speaker even from a short utterance, so it can be used to authenticate and identify speakers.

Voice command recognition technology
for voice assistants

  • Accurate wake-up word recognition: CLOVA responds with high accuracy to wake-up words such as “Hey CLOVA” and “Hi NAVER.”
  • Noise-cancelling model: CLOVA applies acoustic echo cancellation (AEC) to remove the assistant’s own playback and surrounding noise from the microphone signal, so the assistant’s voice is not fed back in as input.
  • Advance training for specific domains: CLOVA can be trained on a new voice command set in advance, raising the recognition rate to the level a service needs before it launches.
  • Multilingual speech recognition: CLOVA provides the world’s best Korean and Japanese speech recognition models as well as commercial-grade English, Chinese, French, and Spanish models.
  • Real-time streaming protocol: CLOVA supports a streaming protocol for building interactive applications with minimal response time.
  • Recognition post-processing: Recognition results are post-processed and refined to reduce errors.
Service examples
  • CLOVA AI speaker: CLOVA accurately responds to a wide range of voice commands and keeps expanding the domains it covers.
  • NAVER App: CLOVA accurately recognizes voice searches and commands specialized for the mobile environment.
  • NAVER Map: CLOVA provides a speech recognition model specialized for map services, including road-name addresses and restaurant names.
  • AiCall: Embedded in AiCall, a voice assistant that makes reservations, CLOVA accurately recognizes speech even on low-quality calls.

NEST, Speech Recognition
for unstructured speech

The subtitles of the following video were automatically generated with NEST.
  • Accurate dictation of long sentences: CLOVA provides an end-to-end speech recognition engine specialized for long, unstructured sentences.
  • High accuracy without training: The out-of-the-box model delivers accurate results without any prior training on a domain’s speech data.
  • Media-optimized model: Turns conversational speech in video and audio clips into text and automatically generates subtitles.
  • Streaming and batch processing: CLOVA can process pre-recorded files, and real-time transcription over a streaming protocol is in development.
  • Domain-specific support: CLOVA is highly accurate without training, and training on domain data can further improve its recognition rate for a particular domain.
  • Timestamp support: CLOVA provides timestamps marking the start time of each spoken sentence; word-level timestamps are in development.
Service examples
  • NAVER News subtitles: CLOVA is currently used to automatically generate subtitles for NAVER’s broadcast news.
  • Call center: Convert call-center speech data into text to manage customer data easily.
  • Audio/video archiving: Transcribe speech in audio and video data to archive and analyze it.
  • Automatic subtitling: Generate subtitles easily with the timestamp feature.

Speaker recognition
that accurately recognizes and separates speech

  • Speaker recognition: CLOVA extracts a registered speaker’s voice characteristics from a wake-up word as short as one second and identifies the speaker.
  • Speaker diarization: We are researching and developing technology that separates the speakers in a multi-party conversation and produces a transcript attributed to each speaker.
  • AVSD/AVSE: We are researching and developing AVSD/AVSE (audio-visual speaker diarization and audio-visual speech enhancement) technologies that identify and separate people speaking in videos through video analysis.
Service examples
  • CLOVA AI speaker: A personalized feature that recognizes each registered user’s voice from the wake-up word “CLOVA” alone. *This feature is currently available only on the Japanese CLOVA speaker.
  • Transcription: Convert saved speech files into text with each speaker’s lines separated.
  • Teleconference meeting minutes: CLOVA can separate speakers in a teleconference and provide more accurate speech recognition.

Research Area

  • Audio Event Detection: We are researching and developing technology that detects learned sounds, such as a baby crying, with high accuracy.
  • Sound Source Separation: We are researching and developing deep-learning technology that separates multiple speakers in the presence of background noise.
  • SLU (Spoken Language Understanding): We are researching and developing technology that infers what a speaker intends directly from speech, without a separate natural language processing step.
  • AVSD & AVSE (Audio-Visual Speaker Diarization & Audio-Visual Speech Enhancement): We are researching and developing technologies that identify and separate people speaking in videos through video analysis.

With NAVER Cloud

Join the NAVER CLOVA platform, powered by world-class AI technology.