Taking Shortcuts for Categorical VQA Using Super Neurons
Pierre Musacchio, Jaeyi Jeong, Dahun Kim, Jaesik Park

TL;DR
This paper introduces Super Neurons, a method that uses scalar activations from early layers of vision-language models to create accurate classifiers, enabling faster inference and improved performance on visual tasks.
Contribution
It proposes using scalar activations (Super Neurons) from early model layers for classification, offering a training-free, speed-boosting alternative to attention-based methods.
Findings
Super Neurons improve classification accuracy.
Super Neurons enable early exiting from the first model layer.
Early exiting achieves up to a 5.10x speedup.
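The early-exit finding can be pictured with a toy example. The sketch below is not the paper's implementation: the layer stack, the `forward` helper, the `exit_after` parameter, and the chosen neuron index are all hypothetical, and the layer count is arbitrary, so the ratio it prints is not meant to reproduce the reported 5.10x figure. It only shows that a decision read from one scalar activation after the first layer does not require running the remaining layers.

```python
import numpy as np

def forward(x, weights, exit_after=None):
    """Run a toy stack of tanh layers; stop once `exit_after` layers have run.

    Returns the hidden state and the number of layers actually executed.
    """
    h = x
    executed = 0
    for w in weights:
        if exit_after is not None and executed >= exit_after:
            break
        h = np.tanh(h @ w)
        executed += 1
    return h, executed

rng = np.random.default_rng(0)
dim, depth = 16, 8  # arbitrary toy sizes, not taken from the paper
weights = [rng.normal(scale=0.5, size=(dim, dim)) for _ in range(depth)]
x = rng.normal(size=(1, dim))

# Full pass versus exiting right after the first layer.
full_h, full_cost = forward(x, weights)
early_h, early_cost = forward(x, weights, exit_after=1)

# Hypothetical neuron index and threshold: read one scalar as a binary decision.
neuron = 5
decision = int(early_h[0, neuron] > 0.0)

print("layers executed (full, early):", full_cost, early_cost)
print("decision from first-layer scalar:", decision)
```

In this toy setting the saving is simply the ratio of layers executed; in a real VLM the achievable speedup would also depend on prompt processing and decoding costs.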
Abstract
Sparse Attention Vectors (SAVs) have emerged as an excellent training-free alternative to supervised finetuning or low-rank adaptation to improve the performance of Vision Language Models (VLMs). At their heart, SAVs select a few accurate attention heads for a task of interest and use them as classifiers, rather than relying on the model's prediction. In a similar spirit, we find that directly probing the raw activations of the VLM, in the form of scalar values, is sufficient to yield accurate classifiers on diverse visually grounded downstream tasks. Shifting focus from attention vectors to scalar activations dramatically increases the search space for accurate parameters, allowing us to find more discriminative neurons immediately from the first generated token. We call such activations Super Neurons (SNs). In this probing setting, we discover that enough SNs appear in the shallower…
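The probing idea in the abstract, using individual scalar activations as classifiers, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the activations are synthetic rather than extracted from a VLM, the selection rule (a midpoint threshold per neuron, ranked by accuracy on a few labeled examples) and the majority vote are one plausible choice rather than the authors' procedure, and the names `select_neurons`, `predict`, and `make_data` are hypothetical.

```python
import numpy as np

def select_neurons(acts, labels, k=3):
    """Rank scalar activations by how well a simple threshold predicts a binary label.

    acts: (n_examples, n_neurons) array; labels: (n_examples,) array in {0, 1}.
    Returns the indices of the k best neurons with their thresholds and signs.
    """
    mu0 = acts[labels == 0].mean(axis=0)
    mu1 = acts[labels == 1].mean(axis=0)
    thresholds = (mu0 + mu1) / 2.0            # midpoint between class means
    signs = np.where(mu1 >= mu0, 1.0, -1.0)   # which side of the threshold means class 1
    preds = (signs * (acts - thresholds) > 0).astype(int)
    acc = (preds == labels[:, None]).mean(axis=0)
    top = np.argsort(-acc)[:k]
    return top, thresholds[top], signs[top]

def predict(acts, idx, thresholds, signs):
    """Majority vote over the selected scalar activations (use an odd k)."""
    votes = (signs * (acts[:, idx] - thresholds) > 0).astype(int)
    return (votes.mean(axis=1) > 0.5).astype(int)

def make_data(rng, n, n_neurons=1000, informative=(3, 42, 77)):
    """Synthetic stand-in for activations: only a few neurons carry the label."""
    labels = rng.integers(0, 2, size=n)
    acts = rng.normal(size=(n, n_neurons))
    acts[:, list(informative)] += 4.0 * labels[:, None]
    return acts, labels

rng = np.random.default_rng(0)
train_acts, train_labels = make_data(rng, 200)
test_acts, test_labels = make_data(rng, 200)

idx, thr, sgn = select_neurons(train_acts, train_labels, k=3)
test_acc = (predict(test_acts, idx, thr, sgn) == test_labels).mean()
print("selected neurons:", sorted(idx.tolist()))
print("held-out accuracy:", test_acc)
```

The sketch needs no gradient updates, which matches the training-free framing of the abstract; only a handful of labeled examples are used to pick neurons and thresholds.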
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques
