Roof-Transformer: Divided and Joined Understanding with Knowledge Enhancement
Wei-Lin Liao, Cheng-En Su, Wei-Yun Ma

TL;DR
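Roof-Transformer encodes knowledge resources and the original input with two separate BERT encoders joined by a fusion layer, so that long inputs no longer crowd knowledge out of the sequence window. It targets tasks with long inputs, such as QA, and is also evaluated on the GLUE benchmark.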
Contribution
Introduces a dual BERT architecture with a fusion layer to better incorporate knowledge resources in long-text NLP tasks.
Findings
Improved accuracy on QA tasks.
Enhanced performance on GLUE benchmark.
Effective knowledge integration in long documents.
Abstract
Recent work on enhancing BERT-based language representation models with knowledge graphs (KGs) and knowledge bases (KBs) has yielded promising results on multiple NLP tasks. State-of-the-art approaches typically integrate the original input sentences with KG triples and feed the combined representation into a BERT model. However, as the sequence length of a BERT model is limited, such a framework supports little knowledge other than the original input sentences and is thus forced to discard some knowledge. This problem is especially severe for downstream tasks for which the input is a long paragraph or even a document, such as QA or reading comprehension tasks. We address this problem with Roof-Transformer, a model with two underlying BERTs and a fusion layer on top. One underlying BERT encodes the knowledge resources and the other one encodes the original input sentences, and the…
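The sequence-length constraint described in the abstract can be illustrated with a small token-budget sketch. It is not the authors' code: the 512-token limit is the standard BERT position limit, while the packing layouts, the special-token counts, and the helper names `pack_single` and `pack_dual` are simplifying assumptions made for illustration only.

```python
MAX_LEN = 512  # standard BERT position limit (assumed to apply to both encoders)


def pack_single(input_tokens, knowledge_tokens, max_len=MAX_LEN):
    """Hypothetical single-encoder baseline.

    Input and knowledge share one window, assumed to be laid out as
    [CLS] input [SEP] knowledge [SEP], so three positions go to special tokens.
    The input is kept first; knowledge only gets whatever room is left.
    """
    budget = max_len - 3
    kept_input = input_tokens[:budget]
    kept_knowledge = knowledge_tokens[: budget - len(kept_input)]
    return kept_input, kept_knowledge


def pack_dual(input_tokens, knowledge_tokens, max_len=MAX_LEN):
    """Hypothetical dual-encoder layout.

    Each stream gets its own window, assumed to be [CLS] tokens [SEP],
    so input length no longer reduces the room available for knowledge.
    """
    budget = max_len - 2
    return input_tokens[:budget], knowledge_tokens[:budget]


if __name__ == "__main__":
    # Toy example: a 450-token passage and 300 tokens of knowledge triples.
    passage = ["tok"] * 450
    triples = ["kg"] * 300

    _, kept_single = pack_single(passage, triples)
    _, kept_dual = pack_dual(passage, triples)

    print("knowledge tokens kept, single encoder:", len(kept_single))
    print("knowledge tokens kept, dual encoders: ", len(kept_dual))
```

Under these assumptions, the shared window keeps only 59 of the 300 knowledge tokens for the 450-token passage, while the separate windows keep all 300. How the two encoded streams are then combined is the job of the fusion layer, which this sketch does not model.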
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Weight Decay · Linear Warmup With Linear Decay · Residual Connection · Dense Connections · Layer Normalization
