# Efficient Ensemble for Multimodal Punctuation Restoration using Time-Delay Neural Network

**Authors:** Xing Yi Liu, Homayoon Beigi

arXiv: 2302.13376 · 2024-05-29

## TL;DR

EfficientPunct is a multimodal ensemble for punctuation restoration that combines acoustic and text embeddings. It sets a new state of the art (+1.0 F1 over the previous best model) while using fewer than one-tenth of that model's inference network parameters.

## Contribution

The paper presents EfficientPunct, an ensemble of BERT and a multimodal time-delay neural network (TDNN). It outperforms the previous best model while requiring far fewer parameters at inference time.
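A TDNN layer applies a separate weight matrix to each frame in a fixed set of time offsets around the current frame and sums the results. This makes it a 1-D convolution over time. Below is a minimal, dependency-free sketch of one such layer. The offsets, ReLU activation, "valid" padding, and the names `tdnn_layer` and `matvec` are illustrative assumptions, not the configuration used by EfficientPunct.

```python
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Multiply an (out_dim x in_dim) matrix by an in_dim vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def tdnn_layer(
    frames: Sequence[Sequence[float]],
    weights: Dict[int, Matrix],
    bias: Sequence[float],
) -> List[Vector]:
    """Hypothetical TDNN layer: y[t] = ReLU(b + sum_k W_k @ x[t + k]).

    `weights` maps each context offset k (e.g. -1, 0, 1) to its own matrix W_k.
    Only positions whose full context lies inside the sequence are returned
    ("valid" padding), so the output is shorter than the input.
    """
    left = max(0, -min(weights))   # frames needed before t
    right = max(0, max(weights))   # frames needed after t
    outputs: List[Vector] = []
    for t in range(left, len(frames) - right):
        acc = list(bias)
        for k, w_k in weights.items():
            contrib = matvec(w_k, frames[t + k])
            acc = [a + c for a, c in zip(acc, contrib)]
        outputs.append([max(0.0, a) for a in acc])  # ReLU
    return outputs


if __name__ == "__main__":
    frames = [[1.0], [2.0], [3.0], [4.0]]
    window_sum = {k: [[1.0]] for k in (-1, 0, 1)}  # sums a 3-frame window
    print(tdnn_layer(frames, window_sum, bias=[0.0]))  # [[6.0], [9.0]]
```

Sparse offsets such as `{-2, 0, 2}` give the dilated context that stacked TDNN layers typically use to widen their receptive field cheaply. This is one reason temporal convolutions can be lighter than attention over the whole sequence.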

## Key findings

- Outperforms the previous best model by 1.0 F1 points
- Uses less than one-tenth of that model's inference network parameters
- Replaces attention-based fusion with forced alignment and temporal convolutions, improving both computational efficiency and accuracy
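Forced alignment tells us which acoustic frames belong to which word. Text and speech features can therefore be paired by position instead of by learned attention weights. The sketch below shows one possible way to do this. It assumes frame-level acoustic embeddings, one text embedding per word (a simplification, since BERT works on subword tokens), and `(start, end)` frame spans from a forced aligner. The concatenation scheme and the helper names `fuse_by_alignment` and `window_around` are hypothetical and are not taken from the EfficientPunct code.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def fuse_by_alignment(
    acoustic: Sequence[Sequence[float]],
    word_spans: Sequence[Tuple[int, int]],
    text_emb: Sequence[Sequence[float]],
) -> List[Vector]:
    """Concatenate each frame's acoustic vector with the text embedding of the
    word that forced alignment assigns to that frame.

    word_spans[i] = (start, end) frame indices of word i, with end exclusive.
    Frames not covered by any word (e.g. silence) get a zero text vector.
    """
    text_dim = len(text_emb[0]) if text_emb else 0
    per_frame_text = [[0.0] * text_dim for _ in acoustic]
    for (start, end), emb in zip(word_spans, text_emb):
        for t in range(start, end):
            per_frame_text[t] = list(emb)
    return [list(a) + txt for a, txt in zip(acoustic, per_frame_text)]


def window_around(fused: Sequence[Vector], center: int, radius: int) -> List[Vector]:
    """Fixed-size context window centred on one frame (e.g. a word's last frame),
    zero-padded at the utterance edges, as input to a temporal-convolution classifier."""
    dim = len(fused[0])
    return [
        list(fused[t]) if 0 <= t < len(fused) else [0.0] * dim
        for t in range(center - radius, center + radius + 1)
    ]


if __name__ == "__main__":
    acoustic = [[0.1], [0.2], [0.3], [0.4]]  # 4 frames, 1-dim acoustic features
    spans = [(0, 2), (3, 4)]                 # word 0: frames 0-1, word 1: frame 3
    words = [[1.0, 0.0], [0.0, 1.0]]         # 2-dim text embeddings
    fused = fuse_by_alignment(acoustic, spans, words)
    print(window_around(fused, center=3, radius=1))
```

The fusion step costs a single pass over the frames, whereas cross-modal attention compares every frame with every token. This fits the efficiency motivation stated in the abstract.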

## Abstract

Punctuation restoration plays an essential role in the post-processing procedure of automatic speech recognition, but model efficiency is a key requirement for this task. To that end, we present EfficientPunct, an ensemble method with a multimodal time-delay neural network that outperforms the current best model by 1.0 F1 points, using less than a tenth of its inference network parameters. We streamline a speech recognizer to efficiently output hidden layer acoustic embeddings for punctuation restoration, as well as BERT to extract meaningful text embeddings. By using forced alignment and temporal convolutions, we eliminate the need for attention-based fusion, greatly increasing computational efficiency and raising performance. EfficientPunct sets a new state of the art with an ensemble that weights BERT's purely language-based predictions slightly more than the multimodal network's predictions. Our code is available at https://github.com/lxy-peter/EfficientPunct.
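The final ensemble combines BERT's text-only predictions with the multimodal network's predictions, and it weights BERT slightly more. A minimal sketch of that idea is a weighted average of the two per-word class-probability vectors. The weight `0.6`, the label set `LABELS`, and the function names are illustrative assumptions, not values reported in this summary.

```python
from typing import List, Sequence

# Hypothetical label set, for illustration only.
LABELS = ["none", "comma", "period", "question"]


def ensemble_probs(
    p_bert: Sequence[float], p_multimodal: Sequence[float], w_bert: float = 0.6
) -> List[float]:
    """Weighted average of two probability vectors; w_bert > 0.5 favours BERT."""
    if not 0.0 <= w_bert <= 1.0:
        raise ValueError("w_bert must lie in [0, 1]")
    if len(p_bert) != len(p_multimodal):
        raise ValueError("probability vectors must have the same length")
    return [w_bert * b + (1.0 - w_bert) * m for b, m in zip(p_bert, p_multimodal)]


def predict_label(
    p_bert: Sequence[float], p_multimodal: Sequence[float], w_bert: float = 0.6
) -> str:
    """Return the label with the highest ensembled probability."""
    probs = ensemble_probs(p_bert, p_multimodal, w_bert)
    return LABELS[max(range(len(probs)), key=probs.__getitem__)]


if __name__ == "__main__":
    p_bert = [0.1, 0.5, 0.4, 0.0]  # BERT leans towards "comma"
    p_mm = [0.1, 0.2, 0.7, 0.0]    # multimodal network is confident in "period"
    print(predict_label(p_bert, p_mm))       # period
    print(predict_label(p_bert, p_mm, 1.0))  # comma (BERT alone)
```

A weight only slightly above 0.5 still lets a confident multimodal prediction override BERT, as in the example, while BERT decides the close cases.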

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.13376/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/2302.13376/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/2302.13376/full.md

---
Source: https://tomesphere.com/paper/2302.13376