AIVoice Interfaces

Flow and Module Interfaces

| Interface                    | Flow/Module |
|------------------------------|-------------|
| aivoice_iface_full_flow_v1   | AFE+KWS+ASR |
| aivoice_iface_afe_kws_v1     | AFE+KWS     |
| aivoice_iface_afe_kws_vad_v1 | AFE+KWS+VAD |
| aivoice_iface_afe_v1         | AFE         |
| aivoice_iface_vad_v1         | VAD         |
| aivoice_iface_kws_v1         | KWS         |
| aivoice_iface_asr_v1         | ASR         |

All interfaces support the following functions:

  • create()

  • destroy()

  • reset()

  • feed()

Please refer to ${aivoice_lib_dir}/include/aivoice_interface.h for details.
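Because every interface supports the same four functions, calling code can be written once and pointed at whichever flow or module is needed. The sketch below illustrates that create/feed/reset/destroy cycle against a mock module. The `demo_iface` type, its function signatures, and the mock itself are hypothetical stand-ins written for this example; the real declarations are the ones in aivoice_interface.h and may differ.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an interface table. The real type and the
 * exact signatures are defined in aivoice_interface.h. */
struct demo_iface {
    void *(*create)(const void *config);
    int   (*destroy)(void *handle);
    int   (*reset)(void *handle);
    int   (*feed)(void *handle, const char *audio, int len);
};

/* Mock module state: it only counts the bytes it has been fed. */
struct demo_state {
    long bytes_fed;
};

static void *demo_create(const void *config)
{
    (void)config; /* the mock ignores its configuration */
    return calloc(1, sizeof(struct demo_state));
}

static int demo_destroy(void *handle)
{
    free(handle);
    return 0;
}

static int demo_reset(void *handle)
{
    ((struct demo_state *)handle)->bytes_fed = 0;
    return 0;
}

static int demo_feed(void *handle, const char *audio, int len)
{
    (void)audio; /* a real module would process the samples here */
    ((struct demo_state *)handle)->bytes_fed += len;
    return 0;
}

static const struct demo_iface demo_iface_mock = {
    demo_create, demo_destroy, demo_reset, demo_feed
};

/* Run one full cycle over frame_count frames of frame_samples samples.
 * Returns the number of bytes fed, or -1 if create() fails. */
long demo_run_once(const struct demo_iface *iface, const short *frames,
                   int frame_samples, int frame_count)
{
    void *handle = iface->create(NULL);
    if (handle == NULL) {
        return -1;
    }

    long fed = 0;
    for (int i = 0; i < frame_count; i++) {
        const short *frame = frames + (size_t)i * (size_t)frame_samples;
        int bytes = frame_samples * (int)sizeof(short);
        if (iface->feed(handle, (const char *)frame, bytes) != 0) {
            break; /* stop on the first feed error */
        }
        fed += bytes;
    }

    iface->reset(handle);   /* return the module to its initial state */
    iface->destroy(handle); /* release everything create() allocated */
    return fed;
}
```

Passing the interface as a pointer keeps the driving loop independent of the flow: in this sketch, swapping `demo_iface_mock` for another table with the same shape requires no change to `demo_run_once`.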

Event and Callback Message

| aivoice_out_event_type        | Event trigger time                                         | Callback message |
|-------------------------------|------------------------------------------------------------|------------------|
| AIVOICE_EVOUT_VAD             | When VAD detects the start or end point of a speech segment | Struct containing the VAD status and offset. |
| AIVOICE_EVOUT_WAKEUP          | When KWS detects a keyword                                 | JSON string containing the ID, keyword, and score. Example: {"id":2,"keyword":"ni-hao-xiao-qiang","score":0.9} |
| AIVOICE_EVOUT_ASR_RESULT      | When ASR detects a command word                            | JSON string containing the FST type, commands, and ID. Example: {"type":0,"commands":[{"rec":"play music","id":14}]} |
| AIVOICE_EVOUT_AFE             | Every frame in which AFE receives input                    | Struct containing the AFE output data, channel number, etc. |
| AIVOICE_EVOUT_ASR_REC_TIMEOUT | When ASR/VAD exceeds the timeout duration                  | NULL |
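The two JSON callback messages can be unpacked with very little code when they follow the layout of the examples above. The sketch below is a minimal illustration using `sscanf`; the struct and function names (`wakeup_info`, `asr_info`, `parse_wakeup_msg`, `parse_asr_msg`) are hypothetical, and it assumes the field order and compact spacing of the documented examples. It reads only the first entry of `commands`.

```c
#include <stdio.h>

/* Fields of an AIVOICE_EVOUT_WAKEUP message (hypothetical helper type). */
struct wakeup_info {
    int   id;
    char  keyword[64];
    float score;
};

/* Fields of the first command in an AIVOICE_EVOUT_ASR_RESULT message
 * (hypothetical helper type). */
struct asr_info {
    int  type;
    char rec[64];
    int  id;
};

/* Parse a wakeup message such as
 *   {"id":2,"keyword":"ni-hao-xiao-qiang","score":0.9}
 * Returns 0 on success, -1 if the text does not match that layout. */
int parse_wakeup_msg(const char *msg, struct wakeup_info *out)
{
    int n = sscanf(msg,
                   "{\"id\":%d,\"keyword\":\"%63[^\"]\",\"score\":%f}",
                   &out->id, out->keyword, &out->score);
    return n == 3 ? 0 : -1;
}

/* Parse an ASR result message such as
 *   {"type":0,"commands":[{"rec":"play music","id":14}]}
 * Only the first element of "commands" is read.
 * Returns 0 on success, -1 if the text does not match that layout. */
int parse_asr_msg(const char *msg, struct asr_info *out)
{
    int n = sscanf(msg,
                   "{\"type\":%d,\"commands\":[{\"rec\":\"%63[^\"]\",\"id\":%d}",
                   &out->type, out->rec, &out->id);
    return n == 3 ? 0 : -1;
}
```

A fixed `sscanf` pattern breaks as soon as the key order or whitespace changes, so a real JSON parser is the safer choice in production code; the pattern is used here only to keep the example dependency-free.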

AFE Event Definition

struct aivoice_evout_afe {
    int     ch_num;                       /* channel number of output audio signal, default: 1 */
    short*  data;                         /* enhanced audio signal samples */
    char*   out_others_json;              /* reserved for other output data, like flags, key: value */
};
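A common way to consume this event is to copy the enhanced samples into an application buffer as soon as the event arrives. The sketch below shows one such copy. The struct is repeated from the definition above only so the example compiles on its own. The helper `afe_copy_frame` is hypothetical, and because the struct carries no frame length, the number of samples per channel is passed in as a parameter that the caller must know from the SDK configuration (an assumption of this example).

```c
#include <stddef.h>
#include <string.h>

/* Repeated from the definition above so the sketch is self-contained;
 * real code should use the SDK header instead. */
struct aivoice_evout_afe {
    int     ch_num;
    short  *data;
    char   *out_others_json;
};

/* Copy one AFE output frame into dst.
 * samples_per_channel is an assumed, caller-supplied frame length.
 * Returns the number of samples copied, or 0 if the event is empty or
 * dst (capacity counted in samples) is too small. */
size_t afe_copy_frame(const struct aivoice_evout_afe *ev,
                      size_t samples_per_channel,
                      short *dst, size_t dst_capacity)
{
    if (ev == NULL || ev->data == NULL || ev->ch_num <= 0) {
        return 0;
    }

    size_t total = (size_t)ev->ch_num * samples_per_channel;
    if (total > dst_capacity) {
        return 0; /* refuse to truncate a frame */
    }

    memcpy(dst, ev->data, total * sizeof(short));
    return total;
}
```

Copying the whole block at once works without knowing how multi-channel samples are arranged inside `data`; any per-channel processing would need that layout from the SDK header.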

VAD Event Definition

struct aivoice_evout_vad {
    int status;                     /*  0: vad is changed from speech to silence,
                                           indicating the end point of a speech segment
                                        1: vad is changed from silence to speech,
                                           indicating the start point of a speech segment */
    unsigned int offset_ms;         /* time offset relative to reset point. */
};
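Since `status` marks the start (1) and end (0) of a speech segment and `offset_ms` is measured from the reset point, the length of a segment is the difference between the two offsets. The sketch below tracks that with a small helper; `vad_tracker` and `vad_tracker_update` are hypothetical names for this example, and the struct is repeated from the definition above only so the code compiles on its own.

```c
/* Repeated from the definition above so the sketch is self-contained;
 * real code should use the SDK header instead. */
struct aivoice_evout_vad {
    int status;
    unsigned int offset_ms;
};

/* Hypothetical helper state: remembers where the open segment began. */
struct vad_tracker {
    int in_speech;
    unsigned int start_ms;
};

/* Feed one VAD event to the tracker.
 * Returns the segment length in milliseconds when a segment ends,
 * and -1 for a start event or an end event with no matching start. */
long vad_tracker_update(struct vad_tracker *t,
                        const struct aivoice_evout_vad *ev)
{
    if (ev->status == 1) {
        /* silence -> speech: open a new segment */
        t->in_speech = 1;
        t->start_ms = ev->offset_ms;
        return -1;
    }

    if (ev->status == 0 && t->in_speech) {
        /* speech -> silence: close the open segment */
        t->in_speech = 0;
        if (ev->offset_ms < t->start_ms) {
            return -1; /* offsets went backwards, e.g. after a reset */
        }
        return (long)(ev->offset_ms - t->start_ms);
    }

    return -1;
}
```

For example, a start event at 1200 ms followed by an end event at 2750 ms yields a 1550 ms segment.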

Common Configurations

AIVoice configurable parameters:

no_cmd_timeout:

In the full flow, ASR exits if no command word is detected within this duration. In the AFE+KWS+VAD flow, VAD works only within this duration after a keyword is detected.

memory_alloc_mode:

Default mode uses the SDK default heap. SRAM mode also uses the SDK default heap and additionally allocates space from SRAM for memory-critical data. SRAM mode is currently available ONLY on the RTL8713E and RTL8726E DSP.

Refer to ${aivoice_lib_dir}/include/aivoice_sdk_config.h for details.