TimeStampEval: A Simple LLM Eval and a Little Fuzzy Matching Trick to Improve Search Accuracy
James McCammon

TL;DR
TimeStampEval introduces a benchmark and a simple two-stage method that significantly enhances the accuracy and efficiency of retrieving precise timestamps from transcripts for non-verbatim quotes, addressing a key challenge in aligning speech and text.
Contribution
The paper presents TimeStampEval, a novel benchmark and a fuzzy matching technique that improves timestamp retrieval accuracy and reduces inference costs in long transcripts.
Findings
Prompt design impacts accuracy more than model choice.
Off-by-one errors are a distinct boundary-misplacement category.
A reasoning budget of 600-850 tokens boosts accuracy significantly.
Abstract
Traditional fuzzy matching often fails when searching for quotes that are semantically identical but syntactically different across documents, a common issue when aligning official written records with speech-to-text transcripts. We introduce TimeStampEval, a benchmark for retrieving precise millisecond timestamps from long transcripts given non-verbatim quotes. Our simple two-stage method dramatically improves retrieval accuracy while cutting inference costs by over 90%. The motivating use case is an automated long-form podcast that assembles Congressional Record clips into AI-hosted narration. The technical challenge: given a sentence-timestamped transcript and a target quote that may differ due to transcription or editorial drift, return exact start and end boundaries. Standard algorithms handle verbatim text but break under fuzzier variants. Evaluating six modern LLMs on a…
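The abstract is cut off before it describes the two stages, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes stage one uses cheap fuzzy matching to narrow a long transcript to a small window of sentences, and stage two hands only that window to an LLM to pick the exact boundaries. The function names, the `pad` parameter, the record fields (`start_ms`, `end_ms`, `text`), and the sample transcript are all hypothetical, and the LLM call itself is left out.

```python
from difflib import SequenceMatcher


def candidate_window(sentences, quote, pad=1):
    """Stage 1 (assumed): locate the sentence most similar to the quote
    and return a [lo, hi) index range padded by `pad` sentences each side."""
    target = quote.lower()
    scores = [
        SequenceMatcher(None, s["text"].lower(), target).ratio()
        for s in sentences
    ]
    best = max(range(len(sentences)), key=scores.__getitem__)
    lo = max(0, best - pad)
    hi = min(len(sentences), best + pad + 1)
    return lo, hi


def build_window_prompt(sentences, quote, lo, hi):
    """Stage 2 input (assumed): format only the window for an LLM, which
    would then return the start and end sentence of the quote."""
    lines = [
        f"[{i}] {s['start_ms']}-{s['end_ms']} ms: {s['text']}"
        for i, s in enumerate(sentences[lo:hi], start=lo)
    ]
    return f"Quote: {quote}\nTranscript window:\n" + "\n".join(lines)


# Hypothetical sentence-timestamped transcript (illustrative data only).
transcript = [
    {"start_ms": 0, "end_ms": 2100, "text": "Mr. President, I rise today to speak about the budget."},
    {"start_ms": 2100, "end_ms": 4800, "text": "Families across my state are watching this debate closely."},
    {"start_ms": 4800, "end_ms": 7300, "text": "They balance their checkbooks every single month."},
    {"start_ms": 7300, "end_ms": 10400, "text": "We can not keep spending money that we do not have."},
    {"start_ms": 10400, "end_ms": 12900, "text": "That is why I am offering this amendment."},
    {"start_ms": 12900, "end_ms": 14200, "text": "I yield the floor."},
]

# The quote drifts from the transcript wording, as a written record might.
quote = "We cannot keep spending money we don't have."

lo, hi = candidate_window(transcript, quote, pad=1)
prompt = build_window_prompt(transcript, quote, lo, hi)
```

Under this assumption the cost saving would come from the model reading a handful of sentences rather than the full transcript. A quote that spans several sentences would need a larger `pad` or a multi-sentence scoring pass, which this sketch does not attempt.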
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Biomedical Text Mining and Ontologies
