Multilevel Text Alignment with Cross-Document Attention
Xuhui Zhou, Nikolaos Pappas, Noah A. Smith

TL;DR
This paper introduces a multilevel text alignment method that adds cross-document attention to hierarchical attention encoders, enabling structural comparisons at the document-to-document and sentence-to-document levels. It is evaluated on citation recommendation and plagiarism detection.
Contribution
It equips previously established hierarchical attention encoders with a cross-document attention component that is weakly supervised from document pairs and learns to align texts at multiple levels.
Findings
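Outperforms previously established hierarchical attention encoders on citation recommendation and plagiarism detection.
Enables structural comparison at the document-to-document and sentence-to-document levels.
The cross-document attention component is weakly supervised from document pairs and can align at multiple levels.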
Abstract
Text alignment finds application in tasks such as citation recommendation and plagiarism detection. Existing alignment methods operate at a single, predefined level and cannot learn to align texts at, for example, sentence and document levels. We propose a new learning approach that equips previously established hierarchical attention encoders for representing documents with a cross-document attention component, enabling structural comparisons across different levels (document-to-document and sentence-to-document). Our component is weakly supervised from document pairs and can align at multiple levels. Our evaluation on predicting document-to-document relationships and sentence-to-document relationships on the tasks of citation recommendation and plagiarism detection shows that our approach outperforms previously established hierarchical attention encoders based on recurrent and…
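To make the idea of cross-document attention concrete, here is a minimal sketch in plain Python. It is not the authors' model: the abstract does not give the formulation, so everything below is an assumption chosen for illustration. Sentence vectors are taken as given (a real system would produce them with a hierarchical encoder), attention is unscaled dot-product attention, document vectors are mean-pooled, and alignment scores are cosine similarities. The function names `cross_document_attention` and `align` are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(vectors):
    # Average a list of equal-length vectors into one vector.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    denom = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / denom if denom else 0.0


def cross_document_attention(source_sents, target_sents):
    """Each source sentence attends over all target sentences.

    Returns (weights, contexts): one attention distribution and one
    attention-weighted target summary per source sentence.
    """
    weights, contexts = [], []
    for s in source_sents:
        w = softmax([dot(s, t) for t in target_sents])
        ctx = [sum(wi * t[d] for wi, t in zip(w, target_sents))
               for d in range(len(s))]
        weights.append(w)
        contexts.append(ctx)
    return weights, contexts


def align(source_sents, target_sents):
    """Illustrative sentence-to-document and document-to-document scores."""
    weights, contexts = cross_document_attention(source_sents, target_sents)
    target_doc = mean_pool(target_sents)
    # Sentence-to-document: each source sentence vs. the target document.
    sent_to_doc = [cosine(s, target_doc) for s in source_sents]
    # Document-to-document: fuse each source sentence with its cross-document
    # context (simple addition, an assumption), pool, then compare.
    fused = [[a + b for a, b in zip(s, c)]
             for s, c in zip(source_sents, contexts)]
    doc_to_doc = cosine(mean_pool(fused), target_doc)
    return weights, sent_to_doc, doc_to_doc


if __name__ == "__main__":
    doc_a = [[1.0, 0.0], [0.0, 1.0]]   # toy sentence vectors, document A
    doc_b = [[1.0, 0.0], [1.0, 0.1]]   # toy sentence vectors, document B
    weights, sent_to_doc, doc_to_doc = align(doc_a, doc_b)
    print(weights, sent_to_doc, doc_to_doc)
```

In this sketch the document-level score could be trained against a document-pair label, while the per-sentence scores and attention weights are read off without sentence-level labels, which is one way to picture the weak supervision the abstract describes.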
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques
