Who did They Respond to? Conversation Structure Modeling using Masked Hierarchical Transformer
Henghui Zhu, Feng Nan, Zhiguo Wang, Ramesh Nallapati, Bing Xiang

TL;DR
This paper introduces a masked hierarchical transformer model that uses the entire ancestral history of each utterance to identify its parent utterance(s), improving conversation structure modeling for applications such as summarization.
Contribution
The work proposes a new masking mechanism and transformer architecture that utilize the full ancestral history for better parent utterance prediction in conversations.
Findings
Significant improvement over baselines including BERT on multiple datasets.
Effective modeling of conversation structure using entire ancestral history.
Release of a new large Reddit conversation dataset.
Abstract
Conversation structure is useful both for understanding the nature of conversation dynamics and for providing features for many downstream applications such as summarization of conversations. In this work, we define the problem of conversation structure modeling as identifying the parent utterance(s) to which each utterance in the conversation responds. Previous work usually took a pair of utterances and decided whether one utterance is the parent of the other. We believe the entire ancestral history is a very important source of information for making accurate predictions. Therefore, we design a novel masking mechanism to guide the ancestor flow, and leverage the transformer model to aggregate all ancestors to predict parent utterances. Our experiments are performed on the Reddit dataset (Zhang, Culbertson, and Paritosh 2017) and the Ubuntu IRC dataset (Kummerfeld et al. 2019). In addition,…
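The idea of restricting attention to an utterance's ancestors can be illustrated with a small sketch. This is not the authors' implementation: the function name `ancestor_mask`, the parent-index representation, and the choice to let each utterance also attend to itself are assumptions made for illustration, and the paper's actual mask may differ in detail. The sketch assumes every parent appears earlier in the list than its children, so following parent links always terminates at a root.

```python
from typing import List, Optional


def ancestor_mask(parents: List[Optional[int]]) -> List[List[bool]]:
    """Build a boolean attention mask from a list of parent indices.

    parents[i] is the index of the utterance that utterance i replies to,
    or None if utterance i starts a thread (hypothetical representation).
    mask[i][j] is True when utterance j is utterance i itself or one of
    its ancestors, i.e. a position utterance i would be allowed to attend to.
    """
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j: Optional[int] = i
        # Walk up the reply chain from utterance i to its root.
        while j is not None:
            mask[i][j] = True
            j = parents[j]
    return mask


# Hypothetical thread: utterance 0 is the root, 1 and 2 reply to 0, 3 replies to 1.
parents = [None, 0, 0, 1]
mask = ancestor_mask(parents)
for row in mask:
    print(["x" if allowed else "." for allowed in row])
```

In this toy thread, utterance 3 may attend to utterances 0, 1, and 3 but not to its sibling branch (utterance 2). A mask of this shape could then be passed to a transformer attention layer so that each utterance representation aggregates only its own ancestral history.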
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Misinformation and Its Impacts
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Byte Pair Encoding · Dense Connections
