MC-MLP: Multiple Coordinate Frames in all-MLP Architecture for Vision
Zhimin Zhu, Jianguo Zhao, Tong Mu, Yuliang Yang, Mengyu Zhu

TL;DR
MC-MLP introduces a novel all-MLP architecture for vision that employs multiple coordinate frames through orthogonal transforms, enhancing feature learning and outperforming existing MLP models in image classification.
Contribution
The paper proposes MC-MLP, a new MLP backbone that uses multiple coordinate frames via orthogonal transforms to improve learning capacity in vision tasks.
Findings
MC-MLP outperforms most existing MLP models on image classification.
It achieves better performance than those models at the same parameter count.
Orthogonal transforms enable learning across different coordinate frames.
Abstract
In deep learning, Multi-Layer Perceptrons (MLPs) have once again garnered attention from researchers. This paper introduces MC-MLP, a general MLP-like backbone for computer vision that is composed of a series of fully-connected (FC) layers. In MC-MLP, we propose that the same semantic information has varying levels of difficulty in learning, depending on the coordinate frame of features. To address this, we perform an orthogonal transform on the feature information, equivalent to changing the coordinate frame of features. Through this design, MC-MLP is equipped with multi-coordinate frame receptive fields and the ability to learn information across different coordinate frames. Experiments demonstrate that MC-MLP outperforms most MLPs in image classification tasks, achieving better performance at the same parameter level. The code will be available at: https://github.com/ZZM11/MC-MLP.
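The idea of applying an FC layer in a different coordinate frame can be illustrated with a minimal sketch. The abstract does not say which orthogonal transform MC-MLP uses, so the orthonormal DCT-II matrix below is only an assumed stand-in, and the helper names `dct_matrix` and `fc_in_frame` are hypothetical rather than taken from the authors' code. The sketch shows that an orthogonal matrix Q changes the coordinates of a feature vector without changing its length, and that the change is undone by the transpose of Q.

```python
import numpy as np

def dct_matrix(n):
    """Build an n x n orthonormal DCT-II matrix (assumed example transform)."""
    k = np.arange(n)[:, None]   # frequency index, one per row
    i = np.arange(n)[None, :]   # sample index, one per column
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # rescale the first row so all rows have unit norm
    return m

def fc_in_frame(x, q, w, b):
    """Apply an FC layer in the frame defined by orthogonal q, then map back.

    x: (batch, n) features, q: (n, n) orthogonal matrix,
    w: (n, n) FC weights, b: (n,) FC bias.
    """
    z = x @ q.T      # express each feature vector in the new coordinate frame
    z = z @ w + b    # fully-connected layer acting in that frame
    return z @ q     # return to the original frame (inverse of q is q.T)

rng = np.random.default_rng(0)
n = 8
q = dct_matrix(n)
x = rng.standard_normal((4, n))
w = rng.standard_normal((n, n)) * 0.1
b = np.zeros(n)

y = fc_in_frame(x, q, w, b)
# The frame change alone preserves vector lengths.
lengths_preserved = np.allclose(np.linalg.norm(x @ q.T, axis=1),
                                np.linalg.norm(x, axis=1))
```

Because Q is orthogonal, the layer above is equivalent to an FC layer with weights Q^T W Q in the original frame. Any benefit therefore has to come from how the weights are structured or learned in each frame, which is consistent with the abstract's claim that the same information can be easier or harder to learn depending on the frame.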
Taxonomy
Topics: Neural Networks and Applications · CCD and CMOS Imaging Sensors · Machine Learning and ELM
