Bidirectional Context-Aware Hierarchical Attention Network for Document Understanding
Jean-Baptiste Remy, Antoine Jean-Pierre Tixier, Michalis Vazirgiannis

TL;DR
This paper introduces CAHAN, a bidirectional, context-aware hierarchical attention network that improves document understanding by incorporating sentence context, demonstrating superior performance on sentiment and topic classification tasks.
Contribution
The paper proposes and evaluates modifications to HAN that let the sentence encoder make context-aware attentional decisions, and adds a bidirectional document encoder that reads the document both forwards and backwards.
Findings
Bidirectional CAHAN outperforms standard HAN on all three large-scale sentiment and topic classification datasets tested.
Incorporating context-aware attention improves sentence encoding.
The model achieves these gains with modest additional computational cost.
Abstract
The Hierarchical Attention Network (HAN) has made great strides, but it suffers a major limitation: at level 1, each sentence is encoded in complete isolation. In this work, we propose and compare several modifications of HAN in which the sentence encoder is able to make context-aware attentional decisions (CAHAN). Furthermore, we propose a bidirectional document encoder that processes the document forwards and backwards, using the preceding and following sentences as context. Experiments on three large-scale sentiment and topic classification datasets show that the bidirectional version of CAHAN outperforms HAN everywhere, with only a modest increase in computation time. While results are promising, we expect the superiority of CAHAN to be even more evident on tasks requiring a deeper understanding of the input documents, such as abstractive summarization. Code is publicly available.
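The idea of context-aware attention described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the additive form of the attention score, the parameter names (`W_h`, `W_c`, `b`, `u`), the helper names, and the use of a running sum of already-encoded sentence vectors as the context. The sketch only shows how a context vector can influence which words a sentence encoder attends to, and how the same encoder can be run over the document in either direction.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def context_aware_attention(H, context, W_h, W_c, b, u):
    """Attend over word annotations H of one sentence, given a context vector.

    Assumed score (hypothetical form): u . tanh(W_h h + W_c context + b).
    Returns the sentence vector and the attention weights.
    """
    ctx_proj = matvec(W_c, context)
    scores = []
    for h in H:
        h_proj = matvec(W_h, h)
        hidden = [math.tanh(a + c + bi) for a, c, bi in zip(h_proj, ctx_proj, b)]
        scores.append(sum(ui * hi for ui, hi in zip(u, hidden)))
    alphas = softmax(scores)
    dim = len(H[0])
    sentence = [sum(a * h[k] for a, h in zip(alphas, H)) for k in range(dim)]
    return sentence, alphas

def encode_document(sentences, params, direction="forward"):
    """Encode sentences in order, using previously encoded ones as context.

    The running-sum context is an assumption of this sketch.
    With direction="backward", the following sentences act as context.
    """
    order = sentences if direction == "forward" else list(reversed(sentences))
    context = [0.0] * len(sentences[0][0])
    encoded = []
    for H in order:
        s, _ = context_aware_attention(H, context, *params)
        encoded.append(s)
        context = [c + si for c, si in zip(context, s)]
    return encoded if direction == "forward" else list(reversed(encoded))
```

With an all-zero context vector the encoder behaves like a context-free attention layer. A bidirectional variant could combine the forward and backward sentence vectors, for example by concatenation, though how the paper combines them is not stated in the text above.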
Taxonomy
Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Natural Language Processing Techniques
