Efficient Representation Learning via Adaptive Context Pooling
Chen Huang, Walter Talbott, Navdeep Jaitly, Josh Susskind

TL;DR
This paper introduces ContextPool, an adaptive pooling method for attention models that learns to adjust context granularity, improving efficiency and performance in language and image tasks.
Contribution
We propose ContextPool, a novel adaptive pooling technique that enhances attention models by learning to adjust context size dynamically, reducing computational cost and improving expressiveness.
Findings
Achieves state-of-the-art performance with less compute, often using fewer layers.
Outperforms recent methods that learn context sizes.
Applicable to both transformers and ConvNets.
Abstract
Self-attention mechanisms model long-range context by using pairwise attention between all input tokens. In doing so, they assume a fixed attention granularity defined by the individual tokens (e.g., text characters or image pixels), which may not be optimal for modeling complex dependencies at higher levels. In this paper, we propose ContextPool to address this problem by adapting the attention granularity for each token. Inspired by the success of ConvNets that are combined with pooling to capture long-range dependencies, we learn to pool neighboring features for each token before computing attention in a given attention layer. The pooling weights and support size are adaptively determined, allowing the pooled features to encode meaningful context with varying scale. We show that ContextPool makes attention models more expressive, achieving strong performance often with fewer layers…
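The pooling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each token comes with a positive pooling weight and a positive support size, that the support is a soft Gaussian window centered on the token, and that attention uses identity projections. The names context_pool, self_attention, weights, and sizes are hypothetical, and in a real model the weights and sizes would be predicted by a small learned network rather than passed in by hand.

```python
import math


def context_pool(features, weights, sizes):
    """Pool neighboring features around each token with a soft per-token window.

    features: list of n feature vectors (lists of floats), one per token.
    weights:  list of n positive pooling weights (hypothetical stand-in for
              the output of a learned predictor).
    sizes:    list of n positive support sizes (also a hypothetical stand-in).
    """
    n = len(features)
    dim = len(features[0])
    pooled = []
    for i in range(n):
        # Assumption: a Gaussian window centered on token i, whose width is
        # set by sizes[i]. A larger size pulls in more distant neighbors.
        mask = [math.exp(-0.5 * ((j - i) / sizes[i]) ** 2) for j in range(n)]
        coeffs = [weights[j] * mask[j] for j in range(n)]
        total = sum(coeffs)
        # Normalized weighted average of the neighbors inside the window.
        pooled.append([
            sum(coeffs[j] * features[j][d] for j in range(n)) / total
            for d in range(dim)
        ])
    return pooled


def self_attention(x):
    """Single-head dot-product self-attention with identity projections."""
    n, dim = len(x), len(x[0])
    out = []
    for i in range(n):
        scores = [
            sum(x[i][d] * x[j][d] for d in range(dim)) / math.sqrt(dim)
            for j in range(n)
        ]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        norm = sum(exps)
        out.append([
            sum(exps[j] / norm * x[j][d] for j in range(n))
            for d in range(dim)
        ])
    return out


# Toy input: four tokens with two features each.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
weights = [1.0, 1.0, 1.0, 1.0]
sizes = [0.5, 1.0, 2.0, 4.0]  # each token gets its own context scale

pooled = context_pool(tokens, weights, sizes)  # pool first ...
attended = self_attention(pooled)              # ... then attend over pooled features
```

In this sketch, a very small support size leaves a token's feature essentially unchanged, while a very large one replaces it with the weighted average of the whole sequence, so the per-token size acts as the granularity control that the abstract describes.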
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Domain Adaptation and Few-Shot Learning
