Glossary

AEC: Acoustic Echo Cancellation, or echo cancellation, refers to removing the echo signal from the input signal. The echo signal is generated by a sound played through the speaker of the device then captured by the microphone.
AFE: Audio Front End, refers to a combination of modules for preprocessing raw audio signals. It’s usually performed to improve the quality of speech signal before the voice interaction, including several speech enhancement algorithms.
AGC: Automatic Gain Control, an algorithm that dynamically controls the gain of a signal and automatically adjust the amplitude to maintain an optimal signal strength.
ASR: Automatic Speech Recognition, or Speech-to-Text, refers to recognition of spoken language from audio into text. It can be used to build voice-user interface to enable spoken human interaction with AI devices.
BF: BeamForming, refers to a spatial filter designed for a microphone array to enhance the signal from a specific direction and attenuate signals from other directions.
KWS: Keyword Spotting, or wakeup word detection, refers to identifying specific keywords from audio. It is usually the first step in a voice interaction system. The device will enter the state of waiting voice commands after detecting the keyword.
NN: Neural Network, is a machine learning model used for various task in artificial intelligence. Neural networks rely on training data to learn and improve their accuracy.
NS: Noise Suppression, or noise reduction, refers to suppressing ambient noises in the signal to enhance the speech signal, especially stationary noises.
RES: Residual Echo Suppression, refers to suppressing the remained echo signal after AEC processing. It is a postfilter for AEC.
SSL: Sound Source Localization, or direction of arrival (DOA), refers to estimating the spatial location of a sound source using a microphone array.
TTS: Text-To-Speech, or speech synthesis, is a technology that converts text into spoken audio. It can be used in any speech-enabled application that requires converting text to speech imitating human voice.
VAD: Voice Activity Detection, or speech activity detection, is a binary classifier to detect the presence or absence of human speech. It is widely used in speech enhancement, ASR system etc, and can also be used to deactivate some processes during non-speech section of an audio session, saving on computation or bandwidth.
