Unsupervised Inference of Data-Driven Discourse Structures using a Tree Auto-Encoder
Patrick Huber, Giuseppe Carenini

TL;DR
This paper introduces an unsupervised method using a tree auto-encoder to infer discourse structures, enabling the creation of large, diverse discourse treebanks without relying on annotated data.
Contribution
It extends latent tree induction with auto-encoding to generate task-agnostic discourse trees in an unsupervised manner, addressing the scarcity of annotated discourse data.
Findings
Enables unsupervised inference of discourse trees
Produces larger, more diverse discourse treebanks
Applicable to various tree-structured objectives
Abstract
With a growing need for robust and general discourse structures in many downstream tasks and real-world applications, the current lack of high-quality, high-quantity discourse trees poses a severe shortcoming. In order to alleviate this limitation, we propose a new strategy to generate tree structures in a task-agnostic, unsupervised fashion by extending a latent tree induction framework with an auto-encoding objective. The proposed approach can be applied to any tree-structured objective, such as syntactic parsing, discourse parsing and others. However, due to the especially difficult annotation process to generate discourse trees, we initially develop such a method to complement task-specific models in generating much larger and more diverse discourse treebanks.
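To make the idea of pairing latent tree induction with an auto-encoding objective more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The greedy argmax merge rule, the single-layer tanh composition, the untrained random weights, and every name (`encode`, `decode`, `reconstruction_loss`, `W_ENC`, `W_DEC`, `W_SCORE`, `DIM`) are assumptions made for illustration. The abstract does not specify the selection mechanism, the training procedure, or the model sizes.

```python
import math
import random

random.seed(0)
DIM = 4  # hypothetical embedding size for each text unit


def rand_matrix(rows, cols):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


W_ENC = rand_matrix(DIM, 2 * DIM)    # composes two children into a parent
W_DEC = rand_matrix(2 * DIM, DIM)    # splits a parent back into two children
W_SCORE = [random.gauss(0.0, 0.1) for _ in range(DIM)]  # scores candidate merges


def affine_tanh(matrix, vec):
    """tanh(matrix @ vec) for plain Python lists."""
    return [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in matrix]


def encode(leaves):
    """Bottom-up pass: repeatedly merge the best-scoring adjacent pair.

    Returns (root_vector, tree), where tree is an int leaf index or a
    (left, right) tuple. Greedy argmax is an assumed stand-in for a
    learned, differentiable selection step.
    """
    nodes = [(list(vec), i) for i, vec in enumerate(leaves)]
    while len(nodes) > 1:
        candidates = [
            affine_tanh(W_ENC, nodes[i][0] + nodes[i + 1][0])
            for i in range(len(nodes) - 1)
        ]
        scores = [sum(w * x for w, x in zip(W_SCORE, c)) for c in candidates]
        best = max(range(len(scores)), key=scores.__getitem__)
        merged = (candidates[best], (nodes[best][1], nodes[best + 1][1]))
        nodes[best:best + 2] = [merged]
    return nodes[0]


def decode(vec, tree):
    """Top-down pass: unfold the root vector along the induced tree."""
    if isinstance(tree, int):
        return [vec]
    out = affine_tanh(W_DEC, vec)
    return decode(out[:DIM], tree[0]) + decode(out[DIM:], tree[1])


def reconstruction_loss(leaves):
    """Mean squared error between input leaves and their reconstructions."""
    root, tree = encode(leaves)
    recon = decode(root, tree)
    sq = [(r - x) ** 2 for rv, lv in zip(recon, leaves) for r, x in zip(rv, lv)]
    return sum(sq) / len(sq), tree


if __name__ == "__main__":
    # Five made-up unit embeddings standing in for elementary discourse units.
    units = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(5)]
    loss, tree = reconstruction_loss(units)
    print("induced tree:", tree)
    print("reconstruction loss:", loss)
```

In this sketch the only training signal would be the reconstruction loss, so no annotated trees are needed. The tree is a by-product of whichever merges help the encoder compress the input. Training the weights would require a differentiable or sampled replacement for the hard argmax, which is left out here.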
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
