Automatically Segmenting Oral History Transcripts
Ryan Shaw

TL;DR
This paper explores automated methods for segmenting oral history transcripts into topically coherent sections, compares the BayesSeg and TextTiling algorithms, and discusses why low inter-annotator agreement makes evaluation difficult.
Contribution
The paper evaluates the BayesSeg and TextTiling algorithms on oral history segmentation and highlights the need for a clearer definition of the segmentation task.
Findings
BayesSeg performs slightly better than TextTiling.
TextTiling does not significantly outperform a uniform segmentation.
Inter-annotator agreement is low, complicating evaluation.
Abstract
Dividing oral histories into topically coherent segments can make them more accessible online. People regularly make judgments about where coherent segments can be extracted from oral histories. But making these judgments can be taxing, so automated assistance is potentially attractive to speed the task of extracting segments from open-ended interviews. When different people are asked to extract coherent segments from the same oral histories, they often do not agree about precisely where such segments begin and end. This low agreement makes the evaluation of algorithmic segmenters challenging, but there is reason to believe that for segmenting oral history transcripts, some approaches are more promising than others. The BayesSeg algorithm performs slightly better than TextTiling, while TextTiling does not perform significantly better than a uniform segmentation. BayesSeg might be used…
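To make the comparison against a uniform segmentation concrete, here is a minimal sketch of a uniform baseline and a window-based disagreement score. The abstract does not say which evaluation metric the paper uses; WindowDiff is swapped in here purely as an illustration, and the function names, the boundary encoding (a boundary at position i means a break after unit i), and the example values are all assumptions. The same score can be computed between two human annotators to get a rough sense of how much they disagree.

```python
def uniform_boundaries(n_units, n_segments):
    """Split n_units (e.g. sentences or speaker turns) into n_segments
    near-equal parts. Returns boundary positions; a boundary at position i
    means a break after unit i (0-indexed)."""
    return [(k * n_units) // n_segments - 1 for k in range(1, n_segments)]


def window_diff(reference, hypothesis, n_units, k):
    """WindowDiff-style score: the fraction of sliding windows of width k in
    which the two segmentations contain a different number of boundaries.
    0.0 means identical placement; higher means more disagreement."""
    ref, hyp = set(reference), set(hypothesis)
    n_windows = n_units - k
    errors = 0
    for i in range(n_windows):
        # Boundaries separating unit i from unit i + k sit at positions i..i+k-1.
        ref_count = sum(1 for b in range(i, i + k) if b in ref)
        hyp_count = sum(1 for b in range(i, i + k) if b in hyp)
        if ref_count != hyp_count:
            errors += 1
    return errors / n_windows


# Hypothetical example: a 12-unit transcript with three segments.
annotator_a = [3, 7]                     # breaks after units 3 and 7
annotator_b = [3, 8]                     # second break placed one unit later
baseline = uniform_boundaries(12, 3)     # evenly spaced breaks

print(window_diff(annotator_a, annotator_b, n_units=12, k=2))
print(window_diff(annotator_a, baseline, n_units=12, k=2))
```

An algorithmic segmenter is only interesting if its score against the human annotations is clearly lower than the baseline's. When annotators disagree with each other about as much as they disagree with the algorithm, the score alone cannot settle which segmentation is better.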
Taxonomy
Topics: Natural Language Processing Techniques · Drilling and Well Engineering · Oral History, Memory, Narrative Analysis
