NARAIM: Native Aspect Ratio Autoregressive Image Models

Daniel Gallo Fernández; Robert van der Klis; Răzvan-Andrei Matişan; Janusz Partyka; Efstratios Gavves; Samuele Papa; Phillip Lippe

arXiv:2410.10012 · cs.CV · December 6, 2024

TL;DR

NARAIM introduces a pre-training method for vision models that keeps images in their native aspect ratio, preserving the original spatial context and improving performance on a downstream classification task.

Contribution

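The paper presents NARAIM, an autoregressive image model that keeps images in their native aspect ratio during pre-training. This addresses a limitation of existing autoregressive models, which are commonly trained on images that are cropped or transformed into squares.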

Findings

01. Improved classification accuracy when the native aspect ratio is preserved

02. Preservation of the original spatial context for better visual understanding

03. Evidence that preserving the native aspect ratio is effective in autoregressive image models

Abstract

While vision transformers are able to solve a wide variety of computer vision tasks, no pre-training method has yet demonstrated the same scaling laws as observed in language models. Autoregressive models show promising results, but are commonly trained on images that are cropped or transformed into square images, which distorts or destroys information present in the input. To overcome this limitation, we propose NARAIM, a vision model pre-trained with an autoregressive objective that uses images in their native aspect ratio. By maintaining the native aspect ratio, we preserve the original spatial context, thereby enhancing the model's ability to interpret visual information. In our experiments, we show that maintaining the aspect ratio improves performance on a downstream classification task.
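The abstract's central idea, feeding the model images at their native aspect ratio and training it with an autoregressive objective, can be illustrated with a small sketch. It assumes a fixed per-image patch budget, nearest-neighbor resizing, grayscale pixels, and plain next-patch prediction in raster order. The helper names (`native_grid`, `resize_nearest`, `patchify`, `autoregressive_pairs`) and the 256-patch budget are hypothetical and are not taken from the NARAIM code; the official repository uses JAX, while this stand-alone version uses plain Python so it runs without dependencies.

```python
# Minimal sketch under stated assumptions (fixed patch budget, nearest-neighbor
# resize, grayscale pixels, raster-order next-patch prediction). All helper
# names are hypothetical and do not come from the NARAIM repository.


def native_grid(height, width, max_patches):
    """Largest (rows, cols) patch grid whose cols/rows ratio stays close to
    width/height while rows * cols <= max_patches (falls back to (1, 1))."""
    best = (1, 1)
    for rows in range(1, max_patches + 1):
        cols = max(1, round(rows * width / height))
        if rows * cols > max_patches:
            break  # rows * cols only grows with rows, so no larger grid fits
        best = (rows, cols)
    return best


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize of a 2-D image stored as a list of rows."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def patchify(img, patch):
    """Split an image whose sides are multiples of `patch` into flattened
    patches, ordered row by row (raster order)."""
    patches = []
    for top in range(0, len(img), patch):
        for left in range(0, len(img[0]), patch):
            patches.append([v for row in img[top:top + patch]
                            for v in row[left:left + patch]])
    return patches


def autoregressive_pairs(patches):
    """Next-patch prediction: with a causal model, position i of `inputs`
    is trained to predict `targets[i]`, which is patch i + 1."""
    return patches[:-1], patches[1:]


# Usage: a 480x640 (height x width) toy image with 16x16 patches.
H, W, P, BUDGET = 480, 640, 16, 256
image = [[(r * W + c) % 256 for c in range(W)] for r in range(H)]
rows, cols = native_grid(H, W, BUDGET)
resized = resize_nearest(image, rows * P, cols * P)
sequence = patchify(resized, P)
inputs, targets = autoregressive_pairs(sequence)
print(rows, cols, len(sequence), len(inputs))  # 13 17 221 220
```

With a 480×640 input, the sketch keeps a 13×17 patch grid (aspect ratio about 1.31 versus the original 1.33), whereas squaring the image into a 16×16 grid would force the ratio to 1.0. Because the grid depends on each image, sequence lengths vary, so a batched implementation would need padding or masking; the sketch does not claim to reflect how NARAIM handles this.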

Code & Models

Repositories

daniel-gallo/naraim (JAX, official)

Taxonomy

Topics: Image Retrieval and Classification Techniques