Investigating large language models for their competence in extracting grammatically sound sentences from transcribed noisy utterances

Alina Wróblewska

arXiv:2410.05099·cs.CL·October 8, 2024

Open Access

TL;DR

This study evaluates large language models' ability to extract grammatically sound sentences from noisy transcribed dialogues, revealing limitations in their syntactic-semantic understanding compared to humans.

Contribution

The paper provides linguistically motivated experiments assessing LLMs' competence in extracting structured utterances from noisy speech transcriptions in Polish, highlighting their current limitations.

Findings

01 LLMs often fail to produce correctly structured utterances.

02 Performance varies depending on the model and noise level.

03 LLMs' understanding of syntactic-semantic rules is superficial.

Abstract

Selectively processing noisy utterances while effectively disregarding speech-specific elements poses no considerable challenge for humans, as they exhibit remarkable cognitive abilities to separate semantically significant content from speech-specific noise (i.e., filled pauses, disfluencies, and restarts). These abilities may be driven by mechanisms based on acquired grammatical rules that compose abstract syntactic-semantic structures within utterances. Segments without syntactic and semantic significance are consistently disregarded in these structures. The structures, in tandem with lexis, likely underpin language comprehension and thus facilitate effective communication. In our study, grounded in linguistically motivated experiments, we investigate whether large language models (LLMs) can effectively perform analogous speech comprehension tasks. In particular, we examine the…

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling