VDLF-Net: Variational Feature Fusion for Adaptive and Few-Shot Visual Learning

Jiawei Yan

arXiv:2604.23641·cs.CV·April 28, 2026

TL;DR

VDLF-Net attaches a compact variational autoencoder (VAE) to a multi-scale CNN backbone and uses the VAE's latent vectors to gate the backbone features. Under standard CIFAR-100 and Mini-ImageNet protocols, it improves on the reported supervised and few-shot baselines.

Contribution

The paper introduces VDLF-Net, a model that integrates VAE-based feature fusion with a multi-scale CNN for adaptive and few-shot visual learning, and reports improvements over ResNet-50 Enhanced, VGG-16, Prototypical Networks, and Matching Networks.

Findings

01

VDLF-Net outperforms ResNet-50 Enhanced, VGG-16, Prototypical Networks, and Matching Networks on CIFAR-100 and Mini-ImageNet under standard protocols.

02

Among the ablations, removing the fine-resolution scale reduces performance the most.

03

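The KL and reconstruction terms have only a minor effect at the chosen $α$; the gains over classical episodic baselines come mainly from the full architecture and training strategy.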

Abstract

This paper introduces VDLF-Net, which attaches a compact VAE to a multi-scale CNN backbone. The VAE's latent vectors softmax-gate the backbone feature maps, and $ℓ_{2}$-normalized embeddings from the gated maps feed either supervised classification or episodic few-shot prediction. Under standard CIFAR-100 and Mini-ImageNet protocols, VDLF-Net outperforms ResNet-50 Enhanced, VGG-16, Prototypical Networks, and Matching Networks. Extensive ablations show that removing the fine-resolution scale has the greatest impact on VDLF-Net's performance. By contrast, the KL and reconstruction terms at the chosen $α$ are associated with only a minor performance reduction, indicating that the gains over classical episodic baselines originate mainly from the full VDLF-Net architecture and training strategy.
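The gating and embedding pipeline described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation. Several details are assumptions made only for illustration: the gate is taken to be a softmax over one score per scale (the abstract does not say whether gating is per scale, per channel, or per location), the gate scores are assumed to be derived from the VAE latent vector, and the few-shot head is a nearest-prototype rule in the style of Prototypical Networks. All function names (`gated_embedding`, `prototypes`, `predict`) are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]

def gated_embedding(scale_features, gate_logits):
    """Fuse per-scale feature vectors with softmax gate weights, then L2-normalize.

    scale_features: list of S pooled feature vectors, each of length D.
    gate_logits: list of S scores (ASSUMPTION: computed from the VAE latent).
    """
    weights = softmax(gate_logits)
    dim = len(scale_features[0])
    fused = [
        sum(w * feat[d] for w, feat in zip(weights, scale_features))
        for d in range(dim)
    ]
    return l2_normalize(fused)

def prototypes(support_embeddings, support_labels):
    """Average the support embeddings of each class (ASSUMPTION: prototype head)."""
    sums, counts = {}, {}
    for emb, label in zip(support_embeddings, support_labels):
        if label not in sums:
            sums[label] = [0.0] * len(emb)
            counts[label] = 0
        sums[label] = [s + e for s, e in zip(sums[label], emb)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(query_embedding, protos):
    """Return the label of the prototype closest in squared Euclidean distance."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(protos, key=lambda label: sqdist(query_embedding, protos[label]))

# Toy episode: two scales (coarse, fine), two classes, one support example each.
coarse, fine = [1.0, 0.0], [0.0, 1.0]
support = [
    gated_embedding([coarse, fine], [2.0, 0.0]),  # gate favors the coarse scale
    gated_embedding([coarse, fine], [0.0, 2.0]),  # gate favors the fine scale
]
protos = prototypes(support, ["class_a", "class_b"])
query = gated_embedding([coarse, fine], [1.5, 0.0])
print(predict(query, protos))
```

In this toy episode the query's gate leans toward the coarse scale, so its embedding lies nearer the prototype built from the coarse-dominated support example. Setting a scale's gate score very low approximates dropping that scale, which is one way to mimic the fine-resolution ablation the abstract reports.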
