WiSeBE: Window-based Sentence Boundary Evaluation

Carlos-Emiliano González-Gallardo; Juan-Manuel Torres-Moreno

arXiv:1808.08850 · cs.CL · August 28, 2018 · 1 citation

TL;DR

WiSeBE introduces a semi-supervised, multi-reference evaluation metric for Sentence Boundary Detection, providing a more reliable assessment than standard metrics by accounting for reference variability.

Contribution

The paper proposes WiSeBE, a novel window-based, semi-supervised evaluation metric that improves the reliability of SBD system assessments through multi-reference agreement.

Findings

1. WiSeBE correlates better with practical performance than standard metrics.
2. WiSeBE reveals differences in SBD system performance not captured by traditional metrics.
3. Evaluation over YouTube transcripts demonstrates WiSeBE's effectiveness and reliability.

Abstract

Sentence Boundary Detection (SBD) has been a major research topic since Automatic Speech Recognition transcripts have been used for further Natural Language Processing tasks like Part of Speech Tagging, Question Answering or Automatic Summarization. But what about evaluation? Are standard evaluation metrics like precision, recall, F-score or classification error and, more importantly, evaluating an automatic system against a unique reference, enough to conclude how well an SBD system is performing given the final application of the transcript? In this paper we propose Window-based Sentence Boundary Evaluation (WiSeBE), a semi-supervised metric for evaluating Sentence Boundary Detection systems based on multi-reference (dis)agreement. We evaluate and compare the performance of different SBD systems over a set of YouTube transcripts using WiSeBE and standard metrics. This double evaluation…
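
The question the abstract raises about scoring against a unique reference can be illustrated with a minimal sketch. This is not the WiSeBE metric itself, whose definition is not given on this page; it only computes standard precision, recall and F-score for one candidate segmentation against several references. The function name `prf`, the annotator names and the boundary positions are all hypothetical and chosen for illustration.

```python
def prf(predicted, reference):
    """Precision, recall and F1 between two sets of boundary positions."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)  # boundaries placed at the same position
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical data: token indices after which a sentence boundary is placed.
candidate = [4, 9, 15]
references = {
    "annotator_a": [4, 9, 15, 21],
    "annotator_b": [4, 12, 21],
    "annotator_c": [9, 15],
}

# Score the same candidate against each reference separately.
scores = {name: prf(candidate, ref) for name, ref in references.items()}
for name, (p, r, f) in scores.items():
    print(f"{name}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

In this toy setup the same candidate receives noticeably different scores depending on which reference is chosen, which is the kind of variability a multi-reference evaluation is meant to account for.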

Code & Models

Repositories

cic4k/wisebe (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling