Investigating large language models for their competence in extracting grammatically sound sentences from transcribed noisy utterances
Alina Wróblewska

TL;DR
This study evaluates large language models' ability to extract grammatically sound sentences from noisy transcribed dialogues, revealing limitations in their syntactic-semantic understanding compared to humans.
Contribution
The paper provides linguistically motivated experiments assessing LLMs' competence in extracting structured utterances from noisy speech transcriptions in Polish, highlighting their current limitations.
Findings
LLMs often fail to produce correctly structured utterances.
Performance varies depending on the model and noise level.
LLMs' understanding of syntactic-semantic rules is superficial.
Abstract
Selectively processing noisy utterances while effectively disregarding speech-specific elements poses no considerable challenge for humans, as they exhibit remarkable cognitive abilities to separate semantically significant content from speech-specific noise (i.e. filled pauses, disfluencies, and restarts). These abilities may be driven by mechanisms based on acquired grammatical rules that compose abstract syntactic-semantic structures within utterances. Segments without syntactic and semantic significance are consistently disregarded in these structures. The structures, in tandem with lexis, likely underpin language comprehension and thus facilitate effective communication. In our study, grounded in linguistically motivated experiments, we investigate whether large language models (LLMs) can effectively perform analogical speech comprehension tasks. In particular, we examine the…
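To show why the task in the abstract is not solved by surface rules alone, here is a deliberately naive cleaner. It is a sketch, not the method evaluated in the paper. The filler inventory (`FILLERS`) and the function name `naive_clean` are hypothetical: "yyy", "eee" and "mmm" are assumed here as typical orthographic renderings of Polish filled pauses. The function drops filler tokens and collapses immediate word repetitions. It cannot detect restarts, which require knowing which syntactic-semantic structure was abandoned.

```python
import re

# Hypothetical filler inventory (assumption, not taken from the paper).
FILLERS = {"yyy", "eee", "mmm", "eh", "hm"}


def naive_clean(utterance: str) -> str:
    """Drop filled pauses and collapse immediate repetitions (surface rules only)."""
    # \w is Unicode-aware for str patterns, so Polish letters such as "ł" are kept;
    # punctuation is discarded as a side effect.
    tokens = re.findall(r"\w+", utterance.lower())
    kept = []
    for tok in tokens:
        if tok in FILLERS:
            continue  # filled pause: discard
        if kept and kept[-1] == tok:
            continue  # immediate repetition such as "to to": keep one copy
        kept.append(tok)
    return " ".join(kept)


if __name__ == "__main__":
    # Fillers and a repetition are handled:
    print(naive_clean("no to yyy to jest eee dobry pomysł"))
    # A restart survives, leaving an ungrammatical result:
    print(naive_clean("poszedłem do yyy pojechałem do sklepu"))
```

The first call yields "no to jest dobry pomysł". The second yields "poszedłem do pojechałem do sklepu": the abandoned fragment "poszedłem do" remains because nothing in the rules models sentence structure. That gap is the competence the study probes in LLMs.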
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling
