Abstract

Dysarthric speech reconstruction (DSR) typically employs a cascaded system that combines automatic speech recognition (ASR) and sentence-level text-to-speech (TTS) to convert dysarthric speech into speech with normal prosody. Dysarthric patients speak more slowly, leading to an excessively long response time in such systems, making them unacceptable in long-speech scenarios. Cascaded DSR systems based on streaming ASR and incremental TTS can reduce these delays. However, patients with varying degrees of dysarthria exhibit significant pronunciation differences for the same text, resulting in poor robustness of ASR and limiting the intelligibility of reconstructed speech. In addition, incremental TTS suffers from poor prosodic feature prediction due to a limited receptive field. In this study, we propose an end-to-end simultaneous DSR system with two key innovations: 1) A frame-level adaptor module is introduced to bridge ASR and TTS. By employing explicit-implicit semantic information fusion and joint module training, it enhances the error tolerance of TTS to ASR outputs. 2) A multiple wait-k autoregressive TTS module is designed to mitigate prosodic degradation via multi-view knowledge distillation. Our system has an average response time of 1.03 seconds on Tesla A100, with an average RTF of 0.71. On UASpeech, it achieves 4.67 MOS, showing a 23.51% relative improvement in WER compared to the SOTA.

Diagram of Model

Fig: End-to-end simultaneous DSR system

Chinese commercial Test Set

intelligibility-low

Sample	Conformer RNNT(pinyin)	Label
178_M_1	n i d e q ue x ian sh i sh en m e	你的缺点是什么
178_M_2	w o x ian ch i f an l e	我先吃饭了
106_W_1	t a h ua b an j iu x iang w o m en d e sh ou	它花瓣就像我们的手
106_W_2	j iu d iao zh e x ie b u y in r en zh u m u d e j iao g uan	揪掉这些不引人注目的小花

intelligibility-middle

Sample	Conformer RNNT(pinyin)	Label
20_M_1	sh eng h uo h en p in d an y ou h en n an	生活很平淡又很难
20_M_2	y i g e r en zh u d uo j iu l e k en d ing h ui l ei l e	一个人主动久了肯定会累了
39_W_1	x ian z ai b u sh i sh uo ch i y i a	现在不是双十一啊
39_W_2	k an l ai zh en d e sh i sh ou b u d ao j ie j ie d e zh u f u l e	看来真的是收不到姐姐的祝福了