AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation
Anil Kag, Huseyin Coskun, Jierun Chen, Junli Cao, Willi Menapace, Aliaksandr Siarohin, Sergey Tulyakov, Jian Ren

TL;DR
AsCAN introduces an asymmetric hybrid convolution-transformer architecture that achieves efficient recognition and generation across multiple tasks, offering superior performance-latency trade-offs and state-of-the-art results in large-scale text-to-image synthesis.
Contribution
The paper proposes a novel asymmetric hybrid architecture combining convolutional and transformer blocks, optimized for diverse tasks and scalable to large-scale applications.
Findings
Supports multiple tasks including recognition, segmentation, and image generation.
Achieves faster inference speed than existing efficient transformer models.
Sets new state-of-the-art performance in large-scale text-to-image synthesis.
Abstract
Neural network architecture design requires making many crucial decisions. A common desideratum is that similar decisions, with minor modifications, can be reused in a variety of tasks and applications. To satisfy that, architectures must provide promising latency and performance trade-offs, support a variety of tasks, scale efficiently with respect to the amounts of data and compute, leverage available data from other tasks, and efficiently support various hardware. To this end, we introduce AsCAN, a hybrid architecture combining both convolutional and transformer blocks. We revisit the key design principles of hybrid architectures and propose a simple and effective asymmetric architecture, where the distribution of convolutional and transformer blocks is asymmetric, containing more convolutional blocks in the earlier stages, followed by more transformer blocks in…
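The asymmetric distribution described in the abstract can be illustrated with a small layout sketch. This is only an illustration under stated assumptions, not the authors' configuration: the helper `asymmetric_layout`, the four-stage split, the per-stage block counts, and the choice to place convolutional blocks before transformer blocks within a stage are all hypothetical, since the abstract here gives no exact numbers.

```python
from typing import List


def asymmetric_layout(blocks_per_stage: List[int], conv_blocks: List[int]) -> List[str]:
    """Build a per-stage block layout as strings of 'C' and 'T'.

    'C' marks a convolutional block, 'T' a transformer block.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(blocks_per_stage) != len(conv_blocks):
        raise ValueError("one conv-block count is required per stage")
    layout = []
    for total, n_conv in zip(blocks_per_stage, conv_blocks):
        if not 0 <= n_conv <= total:
            raise ValueError("conv-block count must lie between 0 and the stage size")
        # Assumption: within a stage, conv blocks come first, then transformer blocks.
        layout.append("C" * n_conv + "T" * (total - n_conv))
    return layout


# Assumed example values: early stages are mostly convolutional,
# later stages are mostly transformer blocks.
stages = asymmetric_layout(blocks_per_stage=[2, 4, 8, 4], conv_blocks=[2, 3, 2, 0])
conv_share = [s.count("C") / len(s) for s in stages]

for index, (stage, share) in enumerate(zip(stages, conv_share), start=1):
    print(f"stage {index}: {stage} (conv share {share:.2f})")
```

In this sketch the share of convolutional blocks shrinks from stage to stage, which is the sense of "asymmetric" conveyed by the abstract; a symmetric hybrid would instead keep the same convolution-to-transformer ratio in every stage.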
Taxonomy
Topics: Neural Networks and Applications · Advanced Neural Network Applications · Brain Tumor Detection and Classification
Methods: Softmax · Attention Is All You Need · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
