# Factual Consistency Oriented Speech Recognition

**Authors:** Naoyuki Kanda, Takuya Yoshioka, Yang Liu

arXiv: 2302.12369 · 2023-02-27

## TL;DR

This paper introduces an optimization framework for automatic speech recognition (ASR) that reduces hallucinations by maximizing the expected factual consistency score between ASR hypotheses and ground-truth transcriptions, while keeping word error rates close to those of cross entropy-trained models.

## Contribution

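It proposes a training framework that uses a separately trained factual consistency estimator to score ASR hypotheses against ground-truth transcriptions and optimizes the ASR model to maximize the expected score. Consistency improves while word error rates stay close to those of cross entropy-trained models.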

## Key findings

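- ASR hypotheses achieve significantly higher factual consistency scores with ground-truth transcriptions on the AMI meeting corpus and the VoxPopuli corpus
- Word error rates remain close to those of cross entropy-trained ASR models
- Speech summarization quality improves, as measured by the factual consistency of meeting conversation summaries generated by a large language model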

## Abstract

This paper presents a novel optimization framework for automatic speech recognition (ASR) with the aim of reducing hallucinations produced by an ASR model. The proposed framework optimizes the ASR model to maximize an expected factual consistency score between ASR hypotheses and ground-truth transcriptions, where the factual consistency score is computed by a separately trained estimator. Experimental results using the AMI meeting corpus and the VoxPopuli corpus show that the ASR model trained with the proposed framework generates ASR hypotheses that have significantly higher consistency scores with ground-truth transcriptions while maintaining the word error rates close to those of cross entropy-trained ASR models. Furthermore, it is shown that training the ASR models with the proposed framework improves the speech summarization quality as measured by the factual consistency of meeting conversation summaries generated by a large language model.
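
To make the objective concrete, the following is a minimal sketch of an expected factual consistency loss, assuming the expectation is approximated over an N-best list of hypotheses in the style of minimum word error rate training. The abstract does not specify this approximation, so it is an assumption, as are all names here (`expected_consistency_loss`, `toy_score`). In particular, `toy_score` is a hypothetical word-overlap stand-in for the separately trained estimator and is not the estimator used in the paper.

```python
import math
from typing import Callable, List


def toy_score(hypothesis: str, reference: str) -> float:
    """Hypothetical stand-in for a trained factual consistency estimator.

    Returns the fraction of reference words that also appear in the hypothesis.
    """
    ref_words = reference.split()
    if not ref_words:
        return 1.0
    hyp_words = set(hypothesis.split())
    return sum(w in hyp_words for w in ref_words) / len(ref_words)


def expected_consistency_loss(
    log_probs: List[float],
    hypotheses: List[str],
    reference: str,
    score_fn: Callable[[str, str], float] = toy_score,
) -> float:
    """Negative expected consistency score over an N-best list (assumed setup)."""
    # Renormalize the model's hypothesis log-probabilities over the N-best list.
    max_lp = max(log_probs)
    weights = [math.exp(lp - max_lp) for lp in log_probs]
    total = sum(weights)
    posteriors = [w / total for w in weights]

    # Score each hypothesis against the ground-truth transcription.
    scores = [score_fn(h, reference) for h in hypotheses]

    # Minimizing the negated expectation maximizes the expected score.
    return -sum(p * s for p, s in zip(posteriors, scores))


reference = "the meeting is on monday"
hypotheses = ["the meeting is on monday", "the meeting is on sunday"]

uniform_loss = expected_consistency_loss([0.0, 0.0], hypotheses, reference)
peaked_loss = expected_consistency_loss(
    [math.log(0.9), math.log(0.1)], hypotheses, reference
)
```

In this sketch, moving probability mass toward the hypothesis that agrees with the reference lowers the loss, which is the behavior the framework rewards. In actual training the posteriors would come from a differentiable ASR model so that gradients could flow through them.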

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.12369/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/2302.12369/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/2302.12369/full.md

---
Source: https://tomesphere.com/paper/2302.12369