Parameter-Inverted Image Pyramid Networks
Xizhou Zhu, Xue Yang, Zhaokai Wang, Hao Li, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai

TL;DR
The paper introduces Parameter-Inverted Image Pyramid Networks (PIIP), a novel architecture that processes multi-scale images with varying model sizes to improve efficiency and performance in vision tasks.
Contribution
PIIP employs models with different parameter sizes for each resolution level and introduces a feature interaction mechanism, reducing computation while enhancing multi-scale feature integration.
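To see why pairing higher resolutions with smaller models can save computation, the following back-of-the-envelope sketch compares a standard image pyramid (one large model at every resolution) with a parameter-inverted one. It assumes a transformer-style cost of roughly 2 FLOPs per parameter per token and ignores the quadratic attention term. The parameter counts, resolutions, and patch size are illustrative assumptions, not values taken from the paper.

```python
def approx_flops(params, resolution, patch=16):
    """Rough forward cost: ~2 FLOPs per parameter per token.

    This ignores the quadratic attention term, so it is only a coarse estimate.
    """
    tokens = (resolution // patch) ** 2
    return 2 * params * tokens


# Illustrative parameter counts (assumed, roughly small/base/large ViT scale).
BRANCH_PARAMS = {"small": 22e6, "base": 86e6, "large": 304e6}

# Hypothetical three-level pyramid, highest resolution first.
RESOLUTIONS = [1024, 512, 256]


def pyramid_cost(models, resolutions):
    """Total estimated cost when models[i] processes resolutions[i]."""
    return sum(approx_flops(BRANCH_PARAMS[m], r) for m, r in zip(models, resolutions))


# Standard pyramid: the same large model sees every resolution.
standard = pyramid_cost(["large", "large", "large"], RESOLUTIONS)

# Parameter-inverted pyramid: the smallest model sees the highest resolution.
inverted = pyramid_cost(["small", "base", "large"], RESOLUTIONS)

ratio = inverted / standard
print(f"inverted / standard cost: {ratio:.2f}")
```

Under these assumed numbers the three inverted branches cost about the same as one another, because each step up in resolution is offset by a step down in model size. The real savings depend on the actual models, resolutions, and interaction overhead.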
Findings
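Achieves superior performance in object detection, segmentation, and image classification compared with traditional image pyramids and single-branch networks.
Applied to the large-scale vision foundation model InternViT-6B, uses only 40-60% of the original computation.
Improves that model's detection and segmentation performance by 1-2%.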
Abstract
Image pyramids are commonly used in modern computer vision tasks to obtain multi-scale features for precise understanding of images. However, image pyramids process multiple resolutions of images using the same large-scale model, which requires significant computational cost. To overcome this issue, we propose a novel network architecture known as the Parameter-Inverted Image Pyramid Networks (PIIP). Our core idea is to use models with different parameter sizes to process different resolution levels of the image pyramid, thereby balancing computational efficiency and performance. Specifically, the input to PIIP is a set of multi-scale images, where higher resolution images are processed by smaller networks. We further propose a feature interaction mechanism to allow features of different resolutions to complement each other and effectively integrate information from different spatial…
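The idea of letting features from different resolutions complement each other can be illustrated with a deliberately simple stand-in. The sketch below is not the paper's interaction mechanism: it assumes two single-channel feature maps stored as nested lists and fuses them by nearest-neighbour resizing followed by addition in both directions. The function names `resize_nearest` and `interact` are hypothetical.

```python
def resize_nearest(grid, out_h, out_w):
    """Nearest-neighbour resize of a 2D grid given as a list of rows."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def interact(high, low):
    """Bidirectional fusion between a high-res and a low-res feature map.

    Each map receives the other one, resized to its own grid, by addition.
    This is an assumed stand-in, chosen only for simplicity.
    """
    low_up = resize_nearest(low, len(high), len(high[0]))
    high_down = resize_nearest(high, len(low), len(low[0]))
    new_high = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(high, low_up)]
    new_low = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(low, high_down)]
    return new_high, new_low


# Toy inputs: a 4x4 "high-resolution" map and a 2x2 "low-resolution" map.
high = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
low = [[100, 200], [300, 400]]

new_high, new_low = interact(high, low)
print(new_high)
print(new_low)
```

After the exchange, each map keeps its own spatial size but carries information from the other scale. A learned mechanism would replace the fixed resize-and-add with trainable operations.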
Code & Models
- 🤗 OpenGVLab/PIIP (model, ♡ 5)
- 🤗 OpenGVLab/PIIP-LLaVA_CLIP-BL_512-448_13B (model, 18 dl)
- 🤗 OpenGVLab/PIIP-LLaVA_ConvNeXt-B_CLIP-L_1024-336_13B (model, 11 dl)
- 🤗 OpenGVLab/PIIP-LLaVA_ConvNeXt-B_CLIP-L_640-224_7B (model, 18 dl)
- 🤗 OpenGVLab/PIIP-LLaVA-Plus_ConvNeXt-L_CLIP-L_1024-336_7B (model, 17 dl)
- 🤗 OpenGVLab/PIIP-LLaVA_ConvNeXt-L_CLIP-L_1024-336_13B (model, 16 dl)
- 🤗 OpenGVLab/PIIP-LLaVA_CLIP-BL_512-448_7B (model, 17 dl)
- 🤗 OpenGVLab/clip-vit-large-patch14to16-336 (model, 3 dl)
- 🤗 OpenGVLab/PIIP-LLaVA_ConvNeXt-L_CLIP-L_1024-336_7B (model, 16 dl)
- 🤗 OpenGVLab/PIIP-LLaVA_ConvNeXt-B_CLIP-L_1024-336_7B (model, 17 dl)
Taxonomy
Topics: Image and Signal Denoising Methods · Advanced Vision and Imaging · Advanced Image Processing Techniques
Methods: Sparse Evolutionary Training
