KWS (Keyword Spotting)

Introduction

KWS is the module that detects specific wake-up words in audio. It is usually the first step in a voice interaction system: after detecting the keyword, the device enters a state in which it waits for voice commands.

AIVoice provides two KWS solutions: a fixed keyword solution and a user-defined keyword solution. The former can achieve optimal performance on low-resource devices, while the latter allows flexible customization of keywords.

Solution	Training data	Available keywords	Feature
Fixed keyword	Specific keywords	Same keywords as in the training data	Better performance, smaller model
User-defined keyword	Common data	Any keyword in the same language as the training data	More flexible

Currently, the SDK provides a fixed keyword model library and a user-defined keyword model.

Fixed Keyword Model

Supports the Chinese keyword xiao-qiang-xiao-qiang or ni-hao-xiao-qiang.
Other keywords or performance optimizations can be provided through customized services.

User-defined Keyword Model

Language Support: Chinese only
Number of Keywords: Supports up to 5 keywords simultaneously.
Word Length: Each keyword must contain 3 to 6 Chinese characters; keywords outside this range are invalid.
Keyword Selection Guidelines:
- Avoid characters with zero initials (e.g., yīn, yī).
- Avoid common daily phrases (e.g., put on clothes, eat breakfast).
- Ensure high phonetic distinction between adjacent syllables.
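The count and length limits above can be checked before a keyword list is handed to the model. The following is a minimal sketch, assuming keywords are written as hyphen-separated pinyin with one syllable per Chinese character (the format of the xiao-qiang-xiao-qiang example). All function and macro names are hypothetical and not part of the SDK.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_KEYWORDS  5  /* up to 5 keywords simultaneously */
#define MIN_SYLLABLES 3  /* assumption: one pinyin syllable per Chinese character */
#define MAX_SYLLABLES 6

/* Count syllables in a hyphen-separated pinyin string.
 * Returns 0 for NULL, empty strings, or empty syllables ("a--b", "-a", "a-"). */
size_t count_syllables(const char *keyword)
{
    if (keyword == NULL || *keyword == '\0') {
        return 0;
    }
    size_t count = 0;
    size_t len = 0; /* length of the syllable currently being scanned */
    for (const char *p = keyword; ; ++p) {
        if (*p == '-' || *p == '\0') {
            if (len == 0) {
                return 0; /* malformed: empty syllable */
            }
            ++count;
            len = 0;
            if (*p == '\0') {
                break;
            }
        } else {
            ++len;
        }
    }
    return count;
}

/* True if the keyword has 3 to 6 syllables. */
bool keyword_length_ok(const char *keyword)
{
    size_t n = count_syllables(keyword);
    return n >= MIN_SYLLABLES && n <= MAX_SYLLABLES;
}

/* True if the list holds 1 to 5 keywords and each has a valid length. */
bool keyword_list_ok(const char *const *keywords, size_t num)
{
    if (keywords == NULL || num == 0 || num > MAX_KEYWORDS) {
        return false;
    }
    for (size_t i = 0; i < num; ++i) {
        if (!keyword_length_ok(keywords[i])) {
            return false;
        }
    }
    return true;
}
```

This check covers only the numeric limits. The selection guidelines (zero initials, common phrases, phonetic distinction) still need human judgment.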

KWS Mode

Two KWS modes are provided for different use cases. Single-channel mode takes single-channel audio as input, while multi-channel mode takes multi-channel audio as input. Multi-channel mode improves KWS and ASR accuracy compared to single-channel mode, but it also increases computational resource consumption and memory usage.

KWS mode	Function	Description
Single-channel mode	void rtk_aivoice_set_single_kws_mode(void)	Lower computational resource consumption and memory usage
Multi-channel mode	void rtk_aivoice_set_multi_kws_mode(void)	Better KWS and ASR accuracy

Attention

The KWS mode MUST be set before creating an instance in these flows:

aivoice_iface_full_flow_v1
aivoice_iface_afe_kws_v1
aivoice_iface_afe_kws_vad_v1
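One way to enforce this ordering in application code is a small guard that refuses mode changes once an instance exists. The sketch below is an assumption-laden illustration, not SDK code: the guard type, its functions, and the demo stand-ins are hypothetical. Only the `void f(void)` shape of the mode setters comes from the table above, so `rtk_aivoice_set_single_kws_mode` or `rtk_aivoice_set_multi_kws_mode` could be passed as the setter. The instance factory is a placeholder for whatever create call the chosen flow provides.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same shape as the documented mode functions: void f(void). */
typedef void (*kws_mode_setter)(void);
/* Hypothetical wrapper around the create call of the chosen flow. */
typedef void *(*instance_factory)(void);

/* Hypothetical guard that remembers whether an instance exists. */
struct kws_setup {
    bool instance_created;
};

/* Apply a KWS mode. Refuses once an instance has been created. */
bool kws_setup_set_mode(struct kws_setup *s, kws_mode_setter set_mode)
{
    if (s == NULL || set_mode == NULL || s->instance_created) {
        return false;
    }
    set_mode();
    return true;
}

/* Create the instance and lock the mode from then on. */
void *kws_setup_create(struct kws_setup *s, instance_factory create)
{
    if (s == NULL || create == NULL) {
        return NULL;
    }
    void *handle = create();
    if (handle != NULL) {
        s->instance_created = true;
    }
    return handle;
}

/* Demo stand-ins so the sketch compiles without the SDK. */
int demo_mode_calls = 0;
int demo_instance;

void demo_set_mode(void)
{
    demo_mode_calls++;
}

void *demo_create(void)
{
    return &demo_instance;
}
```

In a real application the demo stand-ins would be replaced by the SDK mode function and a wrapper around the flow's instance creation.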

Algorithm Flow

Single-channel Mode

../../../rst_ai/aivoice/aivoice_kws/figures/kws_flow_single_channel.svg

Multi-channel Mode

../../../rst_ai/aivoice/aivoice_kws/figures/kws_flow_multi_channel.svg

Configurations

KWS configurable parameters:

keywords: Keywords for wake-up. The available keywords depend on the KWS model. With the fixed keyword solution, keywords can only be chosen from the trained words. With the user-defined keyword solution, keywords can be any combination of units of the same language (such as pinyin for Chinese). Example: xiao-qiang-xiao-qiang.
thresholds: Threshold for wake-up, in the range [0, 1]. A higher threshold gives fewer false alarms but makes the device harder to wake up. Set to 0 to use sensitivity with its predefined thresholds.
sensitivity: Three sensitivity levels are provided, each with predefined thresholds. A higher level makes the device easier to wake up but also causes more false alarms. ONLY takes effect when thresholds is set to 0.

Refer to ${aivoice_lib_dir}/include/aivoice_kws_config.h for details.
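The interaction between thresholds and sensitivity can be summarized as a small decision function. This is a sketch of the rule described above, with hypothetical names; the actual configuration fields are defined in aivoice_kws_config.h and may differ.

```c
/* Which setting decides wake-up for a given thresholds value (hypothetical enum). */
enum kws_trigger_source {
    KWS_INVALID,          /* value outside [0, 1] */
    KWS_USE_SENSITIVITY,  /* thresholds == 0: predefined thresholds apply */
    KWS_USE_THRESHOLD     /* thresholds in (0, 1]: the explicit value applies */
};

enum kws_trigger_source kws_trigger_source_for(float threshold)
{
    /* The negated range test also rejects NaN. */
    if (!(threshold >= 0.0f && threshold <= 1.0f)) {
        return KWS_INVALID;
    }
    return (threshold == 0.0f) ? KWS_USE_SENSITIVITY : KWS_USE_THRESHOLD;
}
```

For example, a thresholds value of 0 yields `KWS_USE_SENSITIVITY`, while 0.5 yields `KWS_USE_THRESHOLD` and the sensitivity level is then ignored.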

Threshold Adjustment Suggestions

As the threshold increases, both the wake-up rate and the false wake-up rate decrease (i.e., sensitivity shifts from high to low). Users should select a threshold based on their actual needs.
For the fixed keyword model, three sensitivity levels are provided: High, Medium, and Low, corresponding to approximately 1 false trigger per 12 h, 24 h, and 48 h, respectively. For finer adjustment, users can set the thresholds parameter to suit their usage scenario, with a step size of 0.02.
For the user-defined keyword model, thresholds are typically lower than for the fixed keyword model, with a suggested adjustment step size of 0.005.

../../../rst_ai/aivoice/aivoice_kws/figures/kws_roc.svg
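When tuning by hand, the trade-off above suggests a simple selection rule: among the thresholds tested, take the lowest one whose false wake-up rate fits the budget, since lower thresholds keep the wake-up rate higher. The sketch below assumes the user has already measured each candidate threshold on their own recordings. The struct and function are hypothetical, and no measurement values are implied by the SDK.

```c
#include <stddef.h>

/* One test run at a given threshold (all fields measured by the user). */
struct kws_measurement {
    double threshold;       /* thresholds value used for the run */
    double wake_rate;       /* fraction of keyword utterances detected, 0..1 */
    double false_per_hour;  /* false wake-ups per hour of non-keyword audio */
};

/* Return the index of the lowest threshold whose false wake-up rate is within
 * the budget, or -1 if no run meets it. */
int kws_pick_threshold(const struct kws_measurement *runs, size_t n,
                       double max_false_per_hour)
{
    int best = -1;
    if (runs == NULL) {
        return best;
    }
    for (size_t i = 0; i < n; ++i) {
        if (runs[i].false_per_hour > max_false_per_hour) {
            continue; /* too many false wake-ups at this threshold */
        }
        if (best < 0 || runs[i].threshold < runs[best].threshold) {
            best = (int)i;
        }
    }
    return best;
}
```

A budget of about 1 false trigger per 12 h corresponds to `max_false_per_hour` of roughly 0.083. Candidate thresholds would be spaced by the suggested step size for the model in use.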
