BP-Seg: A graphical model approach to unsupervised and non-contiguous text segmentation using belief propagation

Fengyi Li; Kayhan Behdin; Natesh Pillai; Xiaofeng Wang; Zhipeng Wang; Ercan Yildiz

arXiv:2505.16965 · cs.CL · September 29, 2025



TL;DR

BP-Seg introduces a graphical model-based unsupervised method for text segmentation that captures both local coherence and long-range semantic similarities using belief propagation.

Contribution

It presents a novel unsupervised approach leveraging graphical models and belief propagation for non-contiguous text segmentation.

Findings

01 Performs favorably compared to competing approaches on both an illustrative example and a dataset of long-form documents.

02 Groups sentences that are distant in the text yet semantically similar.

03 Handles long-form documents efficiently.

Abstract

Text segmentation based on the semantic meaning of sentences is a fundamental task with broad utility in many downstream applications. In this paper, we propose a graphical model-based unsupervised learning approach, named BP-Seg, for efficient text segmentation. Our method not only considers local coherence, capturing the intuition that adjacent sentences are often more related, but also effectively groups sentences that are distant in the text yet semantically similar. This is achieved through belief propagation on carefully constructed graphical models. Experimental results on both an illustrative example and a dataset with long-form documents demonstrate that our method performs favorably compared to competing approaches.
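
To make the idea in the abstract concrete, here is a minimal sketch of sum-product (loopy) belief propagation over sentence embeddings. It is not the authors' implementation: the abstract does not specify the potentials, so everything here is an assumption made for illustration. The function name `bp_segment`, the `seeds` argument (one representative sentence per segment), the temperature `beta`, and the `adjacency_bonus` that rewards neighbouring sentences for sharing a label are all hypothetical.

```python
import numpy as np

def bp_segment(emb, seeds, n_iters=10, beta=2.0, adjacency_bonus=0.5):
    """Assign each sentence to one of len(seeds) segments with loopy BP.

    Assumed model (not taken from the paper):
      unary    phi_i(c)      = exp(beta * cos(e_i, e_seed_c))
      pairwise psi_ij(a, b)  = exp(w_ij) if a == b else 1
      w_ij = beta * max(cos(e_i, e_j), 0) + adjacency_bonus * [|i - j| == 1]
    """
    emb = np.asarray(emb, dtype=float)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    n, k = emb.shape[0], len(seeds)
    sim = emb @ emb.T                       # cosine similarities

    log_unary = beta * sim[:, seeds]        # shape (n, k)

    # Edge weights: semantic similarity (long range) plus a local-coherence bonus.
    w = beta * np.clip(sim, 0.0, None)
    idx = np.arange(n - 1)
    w[idx, idx + 1] += adjacency_bonus
    w[idx + 1, idx] += adjacency_bonus
    np.fill_diagonal(w, 0.0)                # no self-interaction
    agree = np.exp(w)                       # potential value when labels match

    # msg[i, j, :] is the message from sentence i to sentence j over labels.
    msg = np.full((n, n, k), 1.0 / k)
    for _ in range(n_iters):
        log_msg = np.log(msg)
        log_in = log_msg.sum(axis=0)        # log-product of messages into each node
        # Cavity term for i -> j: everything node i knows except what j told it.
        log_cav = (log_unary[:, None, :] + log_in[:, None, :]
                   - log_msg.transpose(1, 0, 2))
        cav = np.exp(log_cav - log_cav.max(axis=2, keepdims=True))
        # sum_a cav(a) * psi(a, b) = sum_a cav(a) + (exp(w_ij) - 1) * cav(b)
        new = cav.sum(axis=2, keepdims=True) + (agree[:, :, None] - 1.0) * cav
        msg = new / new.sum(axis=2, keepdims=True)

    # Belief of each node: unary potential times all incoming messages.
    log_belief = log_unary + np.log(msg).sum(axis=0)
    return log_belief.argmax(axis=1)

# Toy embeddings: two topics interleaved, so the segments are non-contiguous.
toy_emb = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
toy_labels = bp_segment(toy_emb, seeds=[0, 1])
```

In this sketch the pairwise term carries both ingredients the abstract mentions: the similarity part lets distant sentences pull each other into the same segment, and the adjacency bonus encodes local coherence. In the toy input, sentences 0 and 2 share one direction and sentences 1 and 3 share the other, so a non-contiguous labelling is the intended outcome.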


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications