KWS (Keyword Spotting)
Introduction
KWS is the module that detects specific wake-up words in audio. It is usually the first step in a voice interaction system: after detecting the keyword, the device enters a state in which it waits for voice commands.
AIVoice provides two KWS solutions: a fixed keyword solution and a user-defined keyword solution. The former can achieve optimal performance on low-resource devices, while the latter allows flexible customization of keywords.
Solution | Training data | Available keywords | Feature |
---|---|---|---|
Fixed keyword | Specific keywords | Same keywords as in the training data | Better performance, smaller model |
User-defined keyword | Common data | Any keyword in the same language as the training data | More flexible |
Currently, the SDK provides a fixed keyword model library and a user-defined keyword model.
Fixed Keyword Model
Supported Chinese keywords: xiao-qiang-xiao-qiang or ni-hao-xiao-qiang.
Other keywords or performance optimizations can be provided through customized services.
User-defined Keyword Model
Language Support: Chinese only
Number of Keywords: Supports up to 5 keywords simultaneously.
Word Length: Each keyword must contain 3 to 6 Chinese characters; words outside this range are invalid.
Keyword Selection Guidelines
- Avoid characters with zero initials (e.g., yīn, yī).
- Avoid common daily phrases (e.g., put on clothes, eat breakfast).
- Ensure high phonetic distinction between adjacent syllables.
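The limits and guidelines above can be checked before a keyword is passed to the engine. The following is a minimal sketch, not part of the SDK: all `kws_*` names are hypothetical, and it assumes keywords are written as lowercase, toneless pinyin with one hyphen-separated syllable per Chinese character (as in xiao-qiang-xiao-qiang). The zero-initial test is a spelling heuristic (syllables that start with a, o, e, y, or w), not a phonetic analysis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Limits taken from the text above. */
#define KWS_MAX_KEYWORDS  5
#define KWS_MIN_SYLLABLES 3
#define KWS_MAX_SYLLABLES 6

typedef enum {
    KWS_KEYWORD_OK,
    KWS_KEYWORD_BAD_FORMAT,   /* empty keyword or empty syllable ("a--b") */
    KWS_KEYWORD_BAD_LENGTH,   /* fewer than 3 or more than 6 syllables */
    KWS_KEYWORD_ZERO_INITIAL  /* valid length, but has a zero-initial syllable */
} kws_keyword_check_t;

/* Assumption: lowercase, toneless pinyin, one hyphen-separated syllable
 * per Chinese character. Hypothetical helper, not an SDK function. */
static kws_keyword_check_t kws_check_keyword(const char *keyword)
{
    assert(keyword != NULL);

    size_t syllables = 0;
    bool zero_initial = false;
    const char *start = keyword;

    for (;;) {
        const char *end = strchr(start, '-');
        size_t len = end ? (size_t)(end - start) : strlen(start);
        if (len == 0) {
            return KWS_KEYWORD_BAD_FORMAT;
        }
        ++syllables;
        /* Heuristic: zero-initial syllables are spelled with a leading
         * vowel (a, o, e) or with y/w, e.g. "yin", "yi", "an", "wu". */
        if (strchr("aoeyw", start[0]) != NULL) {
            zero_initial = true;
        }
        if (end == NULL) {
            break;
        }
        start = end + 1;
    }

    if (syllables < KWS_MIN_SYLLABLES || syllables > KWS_MAX_SYLLABLES) {
        return KWS_KEYWORD_BAD_LENGTH;
    }
    return zero_initial ? KWS_KEYWORD_ZERO_INITIAL : KWS_KEYWORD_OK;
}

/* A keyword set is usable when it has 1..5 entries and every entry has a
 * valid format and length. Zero initials are only a guideline, so they
 * do not reject the set here. */
static bool kws_keyword_set_is_usable(const char *const *keywords, size_t count)
{
    if (keywords == NULL || count == 0 || count > KWS_MAX_KEYWORDS) {
        return false;
    }
    for (size_t i = 0; i < count; ++i) {
        kws_keyword_check_t r = kws_check_keyword(keywords[i]);
        if (r == KWS_KEYWORD_BAD_FORMAT || r == KWS_KEYWORD_BAD_LENGTH) {
            return false;
        }
    }
    return true;
}
```

Treating a zero initial as a warning rather than an error mirrors the text: word length is a hard limit, while the selection guidelines are recommendations. The remaining guidelines (daily phrases, phonetic distinction) need linguistic judgment and are not covered by this sketch.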
KWS Mode
Two KWS modes are provided for different use cases. Single-channel mode takes single-channel audio as input, while multi-channel mode takes multi-channel audio as input. Multi-channel mode improves KWS and ASR accuracy compared to single-channel mode, but it also increases computational resource consumption and memory usage.
KWS mode | Function | Description |
---|---|---|
Single-channel mode | void rtk_aivoice_set_single_kws_mode(void) | Lower computational resource consumption and memory usage |
Multi-channel mode | void rtk_aivoice_set_multi_kws_mode(void) | Better KWS and ASR accuracy |
Attention
The KWS mode MUST be set before creating an instance in these flows:
aivoice_iface_full_flow_v1
aivoice_iface_afe_kws_v1
aivoice_iface_afe_kws_vad_v1
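The ordering requirement can be sketched as follows. This is an illustration only: the two `rtk_aivoice_set_*_kws_mode` functions are stubbed so the snippet compiles without the SDK, and `demo_instance_t`, `demo_create_instance`, and `demo_start_kws` are hypothetical placeholders for whatever instance-creation call the chosen flow actually exposes. The stub models the assumption that an instance picks up the mode that is active at creation time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles on its own. In a real build, remove
 * these two stubs and use the functions declared by the AIVoice headers. */
static bool g_multi_kws_mode = false;
static void rtk_aivoice_set_single_kws_mode(void) { g_multi_kws_mode = false; }
static void rtk_aivoice_set_multi_kws_mode(void)  { g_multi_kws_mode = true; }

/* Hypothetical placeholder for an instance of one of the flows above. */
typedef struct {
    bool multi_channel;
} demo_instance_t;

/* Hypothetical placeholder for the flow's instance-creation call.
 * Assumption: the mode in effect at this point is fixed for the instance. */
static void demo_create_instance(demo_instance_t *out)
{
    assert(out != NULL);
    out->multi_channel = g_multi_kws_mode;
}

/* Select the KWS mode first, then create the instance. */
static void demo_start_kws(bool want_multi_channel, demo_instance_t *out)
{
    if (want_multi_channel) {
        rtk_aivoice_set_multi_kws_mode();
    } else {
        rtk_aivoice_set_single_kws_mode();
    }
    demo_create_instance(out);
}
```

Wrapping both steps in one helper keeps the mode selection and the instance creation together, so the order cannot be reversed by accident at a call site.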
Algorithm Flow
Single-channel Mode
Multi-channel Mode
Configurations
KWS configurable parameters:
- keywords: Keywords for wake-up. The available keywords depend on the KWS model. With a fixed keyword solution, keywords can only be chosen from the trained words. With the customized (user-defined) solution, keywords can be any combination of units of the same language (such as pinyin for Chinese). Example: xiao-qiang-xiao-qiang.
- thresholds: Threshold for wake-up, range [0, 1]. A higher threshold produces fewer false alarms but makes wake-up harder. Set to 0 to use sensitivity with its predefined thresholds.
- sensitivity: Three sensitivity levels are provided with predefined thresholds. A higher level makes wake-up easier but also produces more false alarms. It ONLY takes effect when thresholds is set to 0.
Refer to ${aivoice_lib_dir}/include/aivoice_kws_config.h for details.
Threshold Adjustment Suggestions
As the threshold increases, the wake-up rate gradually decreases and false wake-ups become less frequent (i.e., sensitivity shifts from high to low). Users should select an appropriate threshold based on actual needs.
For the fixed keyword model, three sensitivity levels are provided: High, Medium, and Low, corresponding to roughly 1 false trigger per 12 h, 24 h, and 48 h, respectively. For finer adjustments, users can configure the thresholds parameter to adapt to their usage scenario, with a step size of 0.02.
For the user-defined keyword model, the thresholds are typically lower than for the fixed keyword model, with a suggested adjustment step size of 0.005.
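The step-wise tuning described above can be sketched as a small helper. The names `kws_model_t`, `kws_threshold_step`, and `kws_adjust_threshold` are hypothetical and not part of the SDK; only the step sizes (0.02 and 0.005) and the [0, 1] range come from the text. The lower clamp of one step is a design choice of this sketch, so that tuning never lands on 0, which is reserved for the sensitivity presets.

```c
#include <assert.h>

/* Hypothetical model selector, not an SDK type. */
typedef enum {
    KWS_MODEL_FIXED,
    KWS_MODEL_USER_DEFINED
} kws_model_t;

/* Suggested adjustment step per model type, as stated in the text. */
static float kws_threshold_step(kws_model_t model)
{
    return (model == KWS_MODEL_FIXED) ? 0.02f : 0.005f;
}

/* Move the threshold by a whole number of steps.
 * steps > 0 raises it (fewer false wake-ups, harder to wake up);
 * steps < 0 lowers it (easier to wake up, more false wake-ups).
 * The result is clamped to [step, 1] so it never reaches 0, because 0
 * means "use the predefined sensitivity thresholds". */
static float kws_adjust_threshold(float threshold, kws_model_t model, int steps)
{
    assert(threshold >= 0.0f && threshold <= 1.0f);

    const float step = kws_threshold_step(model);
    float adjusted = threshold + (float)steps * step;

    if (adjusted > 1.0f) {
        adjusted = 1.0f;
    }
    if (adjusted < step) {
        adjusted = step;
    }
    return adjusted;
}
```

For example, if a fixed keyword model triggers too often at 0.50, `kws_adjust_threshold(0.50f, KWS_MODEL_FIXED, 1)` moves it one step up to about 0.52; the wake-up rate should then be measured again before stepping further.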