Do Prompts Really Prompt? Exploring the Prompt Understanding Capability of Whisper
Chih-Kai Yang, Kuan-Po Huang, Hung-yi Lee

TL;DR
This study investigates whether Whisper, a speech recognition model, truly understands prompts and how prompt quality affects its performance, revealing unexpected behaviors and limitations in prompt comprehension.
Contribution
The paper provides an empirical analysis of Whisper's prompt understanding, highlighting its limited comprehension and counter-intuitive responses to prompt quality and language cues.
Findings
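Whisper may not understand textual prompts in the way humans expect.
Stronger adherence to the topic information in a prompt does not guarantee better performance.
English prompts generally outperform Mandarin prompts on datasets of both languages, likely because of differences in the training data distributions for the two languages.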
Abstract
This research explores how the information in prompts interacts with Whisper, a high-performing speech recognition model. We compare its performance when given prompts containing correct information and prompts corrupted with incorrect information. Our results unexpectedly show that Whisper may not understand textual prompts in the way humans expect. Additionally, we find that performance improvement is not guaranteed even with stronger adherence to the topic information in textual prompts. English prompts also generally outperform Mandarin ones on datasets of both languages, likely due to differences in the training data distributions for these languages, despite the mismatch with pre-training scenarios. Conversely, we discover that Whisper exhibits awareness of misleading information in language tokens by ignoring incorrect language tokens and focusing on the correct…
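The comparison the abstract describes, correct prompts versus prompts corrupted with incorrect information, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the prompt template, the `mismatched_topic` rule, and the `compare_prompts` helper are all hypothetical, and the transcriber is passed in as a plain function so the sketch stays independent of any particular model.

```python
from typing import Callable, Dict, List, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


def mismatched_topic(topic: str, topics: List[str]) -> str:
    """Hypothetical corruption rule: swap in the next topic in the list."""
    return topics[(topics.index(topic) + 1) % len(topics)]


def compare_prompts(
    samples: List[Tuple[str, str, str]],    # (audio id, reference text, true topic)
    topics: List[str],
    transcribe: Callable[[str, str], str],  # (audio id, prompt) -> transcript
    template: str = "This utterance is about {topic}.",  # assumed wording
) -> Dict[str, float]:
    """Average WER with the correct topic in the prompt vs. a wrong one."""
    matched, mismatched = [], []
    for audio, reference, topic in samples:
        good = template.format(topic=topic)
        bad = template.format(topic=mismatched_topic(topic, topics))
        matched.append(word_error_rate(reference, transcribe(audio, good)))
        mismatched.append(word_error_rate(reference, transcribe(audio, bad)))
    return {
        "matched": sum(matched) / len(matched),
        "mismatched": sum(mismatched) / len(mismatched),
    }
```

With the open-source `whisper` package, `transcribe` could wrap something like `model.transcribe(audio, initial_prompt=prompt)["text"]`. If a model really used the topic information, the "matched" score should be lower than the "mismatched" one; the paper reports that this expectation does not reliably hold.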
Taxonomy
Topics: Ethics in Business and Education
