Poolingformer: Long Document Modeling with Pooling Attention

Hang Zhang; Yeyun Gong; Yelong Shen; Weisheng Li; Jiancheng Lv; Nan Duan; Weizhu Chen

arXiv:2105.04371 · cs.CL · October 25, 2022 · 22 citations

TL;DR

Poolingformer introduces a two-level attention mechanism, with pooling attention at the second level, for efficient long-document modeling. It improves results on long-document QA and summarization while reducing computational cost and memory consumption.

Contribution

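It proposes a two-level attention schema for long-document modeling: a small sliding window aggregates information from neighboring tokens, and a larger window with pooling attention extends the receptive field at reduced cost, improving both efficiency and accuracy over previous models.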

Findings

01. Outperforms previous state-of-the-art models on long-sequence QA by up to 1.9 F1 points (1.9 on NQ long answer and TyDi QA passage answer, 1.6 on TyDi QA minimal answer).

02. Achieves superior results on long-sequence summarization (the arXiv benchmark).

03. Reduces computational cost and memory consumption compared with full self-attention.
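A back-of-the-envelope cost comparison makes the third finding concrete. The symbols are ours, not the paper's notation, and the bound assumes the design described in the abstract: n tokens, hidden size d, a first-level window of w_1 tokens, and a second-level window of w_2 tokens whose keys and values are pooled with stride s (the pooling step itself adds only O(nd)).

```latex
\begin{aligned}
\text{full self-attention:} \quad & O(n^2 d) \\
\text{two-level pooling attention:} \quad & O(n\,w_1\,d) + O\!\left(n\,\frac{w_2}{s}\,d\right) = O\!\left(n\left(w_1 + \frac{w_2}{s}\right)d\right)
\end{aligned}
```

With illustrative values that are not taken from the paper (n = 4096, w_1 = 128, w_2 = 512, s = 4), each query scores w_1 + w_2/s = 128 + 128 = 256 keys instead of 4096, a 16x reduction in attention scores; the attention-score memory shrinks by the same factor.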

Abstract

In this paper, we introduce a two-level attention schema, Poolingformer, for long document modeling. Its first level uses a smaller sliding window pattern to aggregate information from neighbors. Its second level employs a larger window to increase receptive fields with pooling attention to reduce both computational cost and memory consumption. We first evaluate Poolingformer on two long sequence QA tasks: the monolingual NQ and the multilingual TyDi QA. Experimental results show that Poolingformer sits atop three official leaderboards measured by F1, outperforming previous state-of-the-art models by 1.9 points (79.8 vs. 77.9) on NQ long answer, 1.9 points (79.5 vs. 77.6) on TyDi QA passage answer, and 1.6 points (67.6 vs. 66.0) on TyDi QA minimal answer. We further evaluate Poolingformer on a long sequence summarization task. Experimental results on the arXiv benchmark continue to…
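The two levels described in the abstract can be sketched in plain Python. This is a minimal, single-head illustration under our own assumptions, not the authors' implementation: there are no learned query/key/value projections, the second level uses non-overlapping mean pooling (kernel size equal to the stride), each token's larger window is mapped to the pooled blocks it overlaps, and the names `two_level_attention`, `w1`, `w2`, and `stride` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for one query (no learned projections)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)]


def sliding_window_attention(x: List[Vector], window: int) -> List[Vector]:
    """Level 1: each token attends to raw neighbours within +/- window // 2 positions."""
    n, half = len(x), window // 2
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(attend(x[i], x[lo:hi], x[lo:hi]))
    return out


def mean_pool(x: List[Vector], stride: int) -> List[Vector]:
    """Assumed pooling operator: average non-overlapping blocks of `stride` tokens."""
    d = len(x[0])
    pooled = []
    for start in range(0, len(x), stride):
        block = x[start:start + stride]
        pooled.append([sum(v[j] for v in block) / len(block) for j in range(d)])
    return pooled


def pooling_attention(h: List[Vector], window: int, stride: int) -> List[Vector]:
    """Level 2: each token attends to the pooled blocks overlapping a larger window."""
    pooled = mean_pool(h, stride)
    n, half = len(h), window // 2
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        # Pooled block b covers raw positions [b * stride, (b + 1) * stride).
        blocks = pooled[lo // stride:(hi - 1) // stride + 1]
        out.append(attend(h[i], blocks, blocks))
    return out


def two_level_attention(x: List[Vector], w1: int = 4, w2: int = 16,
                        stride: int = 4) -> List[Vector]:
    """Hypothetical composition: level-1 outputs feed the pooled level-2 attention."""
    h = sliding_window_attention(x, w1)
    return pooling_attention(h, w2, stride)


if __name__ == "__main__":
    # Toy input: 64 tokens with 8-dimensional embeddings.
    tokens = [[math.sin(0.1 * i * (j + 1)) for j in range(8)] for i in range(64)]
    out = two_level_attention(tokens, w1=8, w2=32, stride=4)
    print(len(out), len(out[0]))  # prints: 64 8
```

Because level 2 attends to roughly w2 / stride pooled vectors rather than w2 raw tokens, widening the receptive field adds only a fraction of the window size to the per-token cost. Mean pooling is a stand-in here; the paper's pooling operator and the way the two levels are combined may differ.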

Videos

Poolingformer: Long Document Modeling with Pooling Attention (SlidesLive)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management