# An End-to-End Approach to Automatic Speech Assessment for Cantonese-speaking People with Aphasia

**Authors:** Ying Qin, Yuzhong Wu, Tan Lee, Anthony Pak Hin Kong

arXiv: 1904.00361 · 2019-04-02

## TL;DR

This paper introduces an end-to-end neural network approach for automatic speech assessment of Cantonese-speaking people with aphasia, outperforming traditional feature-based methods and providing insights into learned impairment features.

## Contribution

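The paper proposes an end-to-end deep learning framework for speech assessment in aphasia. CNN and sequence-to-one GRU-RNN models map fundamental speech features directly to the classification result, so pathology-specific features are learned implicitly rather than designed by hand.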
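To make the "sequence-to-one" idea concrete, here is a minimal sketch in plain Python: a GRU reads an utterance frame by frame, and only its final hidden state is passed to a binary classifier. This is an illustration, not the authors' implementation. The weight layout, the layer sizes, the random initialisation, the update convention (the one from Cho et al.), and the choice of "high score" as the positive class are all assumptions; the names `gru_step`, `sequence_to_one`, and `init_params` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, b):
    """Return W @ x + b for a list-of-rows matrix W."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]


def gru_step(x, h, p):
    """One GRU update; p holds hypothetical, untrained weights."""
    xh = x + h  # concatenate the input frame and the previous state
    z = [sigmoid(a) for a in affine(p["Wz"], xh, p["bz"])]  # update gate
    r = [sigmoid(a) for a in affine(p["Wr"], xh, p["br"])]  # reset gate
    x_rh = x + [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the old state
    cand = [math.tanh(a) for a in affine(p["Wh"], x_rh, p["bh"])]  # candidate state
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def sequence_to_one(frames, p):
    """Run the GRU over all frames, then classify from the final state only."""
    h = [0.0] * len(p["bz"])
    for x in frames:
        h = gru_step(x, h, p)
    logit = sum(w * hi for w, hi in zip(p["w_out"], h)) + p["b_out"]
    return sigmoid(logit)  # assumed meaning: probability of the "high score" class


def init_params(n_in, n_hid, seed=0):
    """Random weights for illustration; a real model would be trained."""
    rng = random.Random(seed)

    def mat():
        return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + n_hid)]
                for _ in range(n_hid)]

    return {
        "Wz": mat(), "Wr": mat(), "Wh": mat(),
        "bz": [0.0] * n_hid, "br": [0.0] * n_hid, "bh": [0.0] * n_hid,
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(n_hid)],
        "b_out": 0.0,
    }


params = init_params(n_in=3, n_hid=4)
# Toy "utterance": 3 frames, each with 3 made-up feature values.
utterance = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.4, 0.4, 0.2]]
prob_high = sequence_to_one(utterance, params)
```

Because only the last hidden state feeds the classifier, utterances of any length map to a single decision, which is what distinguishes this setup from sequence-to-sequence models.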

## Key findings

- End-to-end approach outperforms the conventional two-step approach (feature extraction, then classification) on the binary classification task
- CNN learns impairment-related features similar to human-designed features
- CNN model performs better than GRU-RNN in this task

## Abstract

Conventional automatic assessment of pathological speech usually follows two main steps: (1) extraction of pathology-specific features; (2) classification or regression on the extracted features. Given the great variety of speech and language disorders, feature design is never a straightforward task, and yet it is crucial to the performance of assessment. This paper presents an end-to-end approach to automatic speech assessment for Cantonese-speaking People With Aphasia (PWA). The assessment is formulated as a binary classification task to discriminate PWA with high subjective assessment scores from those with low scores. Sequence-to-one Recurrent Neural Network with Gated Recurrent Unit (GRU-RNN) and Convolutional Neural Network (CNN) models are applied to realize the end-to-end mapping from fundamental speech features to the classification result. The pathology-specific features used for assessment can be learned implicitly by the neural network model. The Class Activation Mapping (CAM) method is used to visualize how those features contribute to the assessment result. Our experimental results show that the end-to-end approach outperforms the conventional two-step approach in the classification task, and confirm that the CNN model is able to learn impairment-related features that are similar to human-designed features. The experimental results also suggest that the CNN model performs better than the sequence-to-one GRU-RNN model in this specific task.
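The Class Activation Mapping step mentioned in the abstract can be sketched as follows. This assumes the standard CAM setup, in which the CNN ends in global average pooling (GAP) followed by a linear layer; this summary does not state whether the paper's network is arranged exactly that way. Under that assumption, the map for a class is the weighted sum of the final-layer feature maps, and its mean equals the class score (bias omitted). The function names, the time-by-frequency axis layout, and all numbers are hypothetical.

```python
def class_activation_map(feature_maps, class_weights):
    """CAM(t, f) = sum_k w_k * F_k(t, f) over the K final-layer feature maps."""
    n_t, n_f = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * n_f for _ in range(n_t)]
    for w_k, fmap in zip(class_weights, feature_maps):
        for t in range(n_t):
            for f in range(n_f):
                cam[t][f] += w_k * fmap[t][f]
    return cam


def gap_logit(feature_maps, class_weights):
    """Class score of a GAP + linear head (bias omitted for simplicity)."""
    score = 0.0
    for w_k, fmap in zip(class_weights, feature_maps):
        n_cells = len(fmap) * len(fmap[0])
        score += w_k * sum(sum(row) for row in fmap) / n_cells
    return score


# Toy input: K=2 feature maps over 2 time steps x 3 frequency bins (made up).
feature_maps = [
    [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]],
    [[0.0, 3.0, 0.0], [1.0, 0.0, 1.0]],
]
class_weights = [0.5, -1.0]  # hypothetical linear-layer weights for one class

cam = class_activation_map(feature_maps, class_weights)
logit = gap_logit(feature_maps, class_weights)
```

Cells of `cam` with large positive values mark the regions of the input that push the decision toward the chosen class, which is what makes the map usable for inspecting what the network has learned.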

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.00361/full.md

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/1904.00361/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/1904.00361/full.md

---
Source: https://tomesphere.com/paper/1904.00361