PhiNet v2: A Mask-Free Brain-Inspired Vision Foundation Model from Video
Makoto Yamada, Kian Ming A. Chai, Ayoub Rhim, Satoki Ishikawa, Mohammad Sabokrou, Yao-Hung Hubert Tsai

TL;DR
PhiNet v2 is a brain-inspired, Transformer-based self-supervised learning (SSL) model that learns from video sequences without strong augmentation, achieving performance competitive with state-of-the-art vision foundation models while aligning more closely with human visual cognition.
Contribution
The paper presents PhiNet v2, a Transformer architecture that uses variational inference to learn from sequential visual data, as a step toward more biologically plausible vision models.
Findings
Achieves performance competitive with state-of-the-art vision foundation models
Learns effectively from raw video sequences without strong augmentation
Aligns visual processing more closely with human cognition
Abstract
Recent advances in self-supervised learning (SSL) have revolutionized computer vision through innovative architectures and learning objectives, yet they have not fully leveraged insights from biological visual processing systems. Recently, a brain-inspired SSL model named PhiNet was proposed; it is based on a ResNet backbone and operates on static image inputs with strong augmentation. In this paper, we introduce PhiNet v2, a novel Transformer-based architecture that processes temporal visual input (that is, sequences of images) without relying on strong augmentation. Our model leverages variational inference to learn robust visual representations from continuous input streams, similar to human visual processing. Through extensive experimentation, we demonstrate that PhiNet v2 achieves competitive performance compared to state-of-the-art vision foundation models, while maintaining the…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · EEG and Brain-Computer Interfaces
Methods: Average Pooling · Convolution · Global Average Pooling · Kaiming Initialization · Variational Inference · Max Pooling
