# SECTOR: A Neural Model for Coherent Topic Segmentation and Classification

**Authors:** Sebastian Arnold, Rudolf Schneider, Philippe Cudré-Mauroux, Felix A. Gers, Alexander Löser

arXiv: 1902.04793 · 2019-02-14

## TL;DR

SECTOR is a neural model that segments documents into coherent sections and assigns a topic label to each. It is supported by WikiSection, a new large dataset, and improves on CNN classifiers with baseline segmentation by 29.5 F1 points in the English city domain.

## Contribution

We introduce SECTOR, a neural architecture for joint topic segmentation and classification, along with WikiSection, a large dataset for training and evaluation.

## Key findings

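- The best model (SECTOR LSTM with Bloom filter embeddings and bidirectional segmentation) reaches 71.6% F1 for segmenting and classifying 30 topics in the English city domain.
- This is 29.5 F1 points higher than state-of-the-art CNN classifiers with baseline segmentation.
- The evaluation covers 20 architectures on English and German documents from two domains: diseases and cities.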

## Abstract

When searching for information, a human reader first glances over a document, spots relevant sections and then focuses on a few sentences for resolving her intention. However, the high variance of document structure makes it difficult to identify the salient topic of a given section at a glance. To tackle this challenge, we present SECTOR, a model to support machine reading systems by segmenting documents into coherent sections and assigning topic labels to each section. Our deep neural network architecture learns a latent topic embedding over the course of a document. This can be leveraged to classify local topics from plain text and segment a document at topic shifts. In addition, we contribute WikiSection, a publicly available dataset with 242k labeled sections in English and German from two distinct domains: diseases and cities. From our extensive evaluation of 20 architectures, we report a highest score of 71.6% F1 for the segmentation and classification of 30 topics from the English city domain, scored by our SECTOR LSTM model with Bloom filter embeddings and bidirectional segmentation. This is a significant improvement of 29.5 points F1 compared to state-of-the-art CNN classifiers with baseline segmentation.
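To make the idea of segmenting "at topic shifts" concrete, here is a minimal sketch, not the authors' method. It assumes each sentence already has a topic score vector (in SECTOR these would come from the trained network; here they are hand-written toy values). It places a boundary wherever the cosine distance between consecutive vectors exceeds a threshold, then labels each section with its highest-scoring topic. The function names, the `threshold` value, and the topic labels are all hypothetical and chosen for illustration only.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; treat zero vectors as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def find_sections(vectors, threshold=0.5):
    """Split sentence indices into (start, end) spans at large topic shifts.

    Assumption: a shift is a cosine distance above `threshold` between two
    consecutive sentence vectors. This is a simplification for illustration.
    """
    if not vectors:
        return []
    sections = []
    start = 0
    for i in range(1, len(vectors)):
        if cosine_distance(vectors[i - 1], vectors[i]) > threshold:
            sections.append((start, i))
            start = i
    sections.append((start, len(vectors)))
    return sections


def label_sections(vectors, topics, threshold=0.5):
    """Assign each section the topic with the highest summed score."""
    labeled = []
    for start, end in find_sections(vectors, threshold):
        totals = [sum(vec[k] for vec in vectors[start:end])
                  for k in range(len(topics))]
        best = max(range(len(topics)), key=totals.__getitem__)
        labeled.append((start, end, topics[best]))
    return labeled


# Toy input: five sentences scored against three hypothetical topics.
topics = ["history", "geography", "economy"]
vectors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.8, 0.2],
    [0.0, 0.1, 0.9],
]

for start, end, topic in label_sections(vectors, topics):
    print(f"sentences {start}..{end - 1}: {topic}")
```

On the toy input the sketch yields three sections (sentences 0 to 1, 2 to 3, and 4), labeled history, geography, and economy respectively. A fixed threshold on adjacent vectors is the simplest possible boundary rule; the abstract's mention of "bidirectional segmentation" suggests the paper's own procedure is more involved, so consult the full text for the actual algorithm.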

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.04793/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/1902.04793/full.md

## References

63 references — full list in the complete paper: https://tomesphere.com/paper/1902.04793/full.md

---
Source: https://tomesphere.com/paper/1902.04793