Human-inspired Global-to-Parallel Multi-scale Encoding for Lightweight Vision Models
Wei Xu

TL;DR
This paper introduces GPM, a human-inspired multi-scale encoding method for lightweight vision models. It balances global and local feature processing and improves performance on classification, detection, and segmentation while using fewer resources.
Contribution
The paper proposes GPM, a novel multi-scale encoding method inspired by human vision, and develops H-GPE, a lightweight network that achieves a better trade-off between accuracy and efficiency.
Findings
H-GPE outperforms recent lightweight models in accuracy and efficiency.
GPM, inspired by human perception, effectively captures both global and local features.
H-GPE demonstrates strong results on classification, detection, and segmentation tasks.
Abstract
Lightweight vision networks have witnessed remarkable progress in recent years, yet achieving a satisfactory balance among parameter scale, computational overhead, and task performance remains difficult. Although many existing lightweight models manage to reduce computation considerably, they often do so at the expense of a substantial increase in parameter count (e.g., LSNet, MobileMamba), which still poses obstacles for deployment on resource-limited devices. In parallel, some studies attempt to draw inspiration from human visual perception, but their modeling tends to oversimplify the visual process, making it hard to reflect how perception truly operates. Revisiting the cooperative mechanism of the human visual system, we propose GPM (Global-to-Parallel Multi-scale Encoding). GPM first employs a Global Insight Generator (GIG) to extract holistic cues, and subsequently processes…
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Visual Attention and Saliency Detection
