Voice processor & dev kit for integrated far-field voice capture

June 13, 2017 // By Graham Prophet
Fabless chip maker XMOS (Bristol, UK) has introduced two voice interface processors and a smart-speaker development kit. The company has created “VocalFusion” as its branding for the voice space, and looks to a future where voice is the primary HMI input to many domestic and other systems.

The XVF3000 devices and the VocalFusion Speaker development kit enable far-field (at a range of metres) voice capture. XMOS’ fundamental expertise is in I/O-centric microcontrollers, and it has been building a specialism in audio processing. The task of aggregating several microphone streams into a single digitised stream is a natural fit, the company says, and it has also been adding DSP operations to its core to enable the required processing. XMOS aims to provide a “front-end” facility – that is, to identify the source voice, lock on to it with microphone beamforming, carry out echo, reverberation and noise cancellation, and capture the essential spoken content. The interface to speech recognition – either local or via a cloud service such as Amazon Alexa – would be implemented by XMOS’ customer, the product designer. XMOS does, however, provide the option of on-chip trigger-word recognition so that this function can be performed ‘at source’. (The XVF3100 has this facility; the XVF3000 variant does not.)
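
To illustrate what “locking on” to a talker with a microphone array involves, here is a minimal sketch of a fixed delay-and-sum beamformer in Python. It is not XMOS’ algorithm: the XVF3000 beamformer is described as adaptive and its implementation is not given in the announcement, so this shows only the simpler textbook technique. The array radius, the sample rate and every function name here are assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def circular_array(num_mics=4, radius_m=0.04):
    """(x, y) positions of mics evenly spaced on a circle.

    The 4 cm radius is an assumed value, not a figure from XMOS.
    """
    return [(radius_m * math.cos(2 * math.pi * k / num_mics),
             radius_m * math.sin(2 * math.pi * k / num_mics))
            for k in range(num_mics)]


def steering_delays(mics, angle_rad, sample_rate):
    """Whole-sample delays that align a far-field wave arriving from angle_rad."""
    ux, uy = math.cos(angle_rad), math.sin(angle_rad)
    # A mic displaced towards the talker hears the wavefront earlier
    # by (position . direction) / c seconds.
    lead = [(x * ux + y * uy) / SPEED_OF_SOUND for x, y in mics]
    latest = min(lead)  # the mic that hears the wavefront last needs no delay
    return [round((t - latest) * sample_rate) for t in lead]


def delay_and_sum(channels, delays):
    """Delay each channel by its steering delay, then average the channels."""
    n = len(channels[0])
    out = [0.0] * n
    for signal, d in zip(channels, delays):
        for i in range(d, n):
            out[i] += signal[i - d]
    return [v / len(channels) for v in out]
```

Sound from the steered direction adds coherently after the delays, while sound from other directions arrives misaligned and is attenuated by the averaging. An adaptive beamformer goes further by updating its steering (and usually its per-channel weights) as the talker moves.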


XMOS chips, the company says, offer an integration path forward from today’s designs, which typically employ multiple ICs, including DSPs, to effect voice capture; this is a “flexible, programmable solution … a cost effective always-on voice interface in a single device.” There is the option of adding voice-trigger functions with Sensory’s TrulyHandsfree technology.


In the same release is the VocalFusion Speaker development kit (XK-VF3100-C43), which includes an XVF3000 processor card and a 4-mic circular microphone array. This kit provides a quick way to start developing far-field voice capture applications. XVF3000 devices include speech enhancement algorithms, among them an adaptive beamformer that uses signals from four microphones to track a talker as they move, coupled with high-performance, full-duplex acoustic echo cancellation. XVF3000 devices can be integrated with an applications processor or host PC either via USB for data and control, or via a combination of I2S and I2C. Developers can add custom voice and audio