# InfoMSD: an information-maximization self-distillation framework for parameter-efficient fine-tuning on artwork images

**Authors:** Feng Guan, Hao Hong, Yong Wang

PMC · DOI: 10.3389/frai.2026.1721866 · Frontiers in Artificial Intelligence · 2026-03-04

## TL;DR

This paper introduces InfoMSD, an unsupervised method for parameter-efficient fine-tuning of large vision-language models such as CLIP, so that they can recognize objects depicted in artwork images without labeled data.

## Contribution

InfoMSD introduces an unsupervised self-distillation framework with entropy regularization for efficient fine-tuning on unlabeled artwork data.

## Key findings

- InfoMSD improves accuracy by +6.43% and +3.02% over CLIP zero-shot baselines while updating less than 1% of the model parameters.
- The method outperforms existing lightweight distillation approaches by 1.35% and 0.96% in average accuracy.
- Entropy-based regularization sharpens pseudo-labels and balances class coverage effectively.
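
To make the "less than 1% of parameters" figure concrete, here is a minimal sketch of how a trainable subset might be picked by parameter name and its share of the total computed. The abstract states only that layer norm parameters and visual prompts are updated. The parameter names, sizes, and keyword list below are hypothetical toy values, not taken from the paper or from any CLIP implementation.

```python
def select_trainable(param_sizes, keywords=("ln_", "layer_norm", "prompt")):
    """Return the names of parameters to update; all others count as frozen.

    `keywords` is an assumed naming convention, not the paper's.
    """
    return [name for name in param_sizes if any(k in name for k in keywords)]


def trainable_fraction(param_sizes, trainable_names):
    """Fraction of all scalar parameters that belong to the trainable subset."""
    total = sum(param_sizes.values())
    trainable = sum(param_sizes[name] for name in trainable_names)
    return trainable / total


# Hypothetical toy model: one transformer block plus 8 prompt tokens of width 768.
toy_params = {
    "visual.prompt_tokens": 8 * 768,
    "visual.block0.ln_1.weight": 768,
    "visual.block0.ln_1.bias": 768,
    "visual.block0.attn.in_proj_weight": 3 * 768 * 768,
    "visual.block0.mlp.c_fc.weight": 768 * 3072,
}

names = select_trainable(toy_params)
fraction = trainable_fraction(toy_params, names)
print(names)
print(f"trainable fraction: {fraction:.4%}")
```

In a framework such as PyTorch, the same selection would typically be applied by setting `requires_grad` to `False` on every tensor outside the chosen subset, so that the optimizer only receives the layer norm and prompt parameters.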

## Abstract

In recent years, despite the remarkable performance of large-scale vision-language models across various visual classification tasks, their substantial parameter counts and high fine-tuning costs have hindered deployment in resource-constrained cultural and artwork settings. This work specifically addresses the task of object recognition in artwork, that is, identifying semantic objects (e.g., animals, people, everyday items) depicted within paintings, sketches, and other artistic renditions, rather than classifying artistic styles or genres. To address this issue, we propose InfoMSD, an unsupervised Information-Maximization Self-Distillation framework designed for parameter-efficient fine-tuning on unlabeled artwork imagery while preserving robust performance. Specifically, InfoMSD uses a teacher-student architecture in the self-distillation phase, where the teacher model generates pseudo-labels for artworks and the student model learns from the teacher through a cross-entropy loss. By aligning the student's predictions with the discriminative signals from the teacher's pseudo-labels and simultaneously applying entropy-based regularization to sharpen the probability distribution and balance class coverage, the framework improves both the quality of the pseudo-labels and the discriminative capacity of the model. To enable parameter-efficient fine-tuning, only the layer norm parameters and visual prompts in the student model are updated, while the remaining parameters are frozen, significantly reducing computational overhead. Extensive experimental results on artwork datasets show that InfoMSD achieves accuracy improvements of +6.43% and +3.02% over CLIP zero-shot baselines, while adjusting less than 1% of the model parameters. Compared to existing lightweight distillation methods, InfoMSD achieves average accuracy gains of 1.35% and 0.96%, respectively.
Overall, InfoMSD offers a novel, information-theoretic paradigm for unsupervised and efficient fine-tuning in object recognition within artistic imagery, balancing performance and efficiency.
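
The objective described in the abstract can be sketched as a sum of three terms: cross-entropy between the student's predictions and the teacher's pseudo-labels, a per-sample entropy term that sharpens each prediction, and a negative marginal-entropy term that rewards balanced class coverage. The sketch below is one common way to write an information-maximization loss and is an assumption, not the paper's exact formulation; the hard argmax pseudo-labels, the weights `lam_ent` and `lam_div`, and all function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(q * math.log(q + eps) for q in p)


def infomax_distill_loss(student_logits, teacher_logits, lam_ent=1.0, lam_div=1.0):
    """Assumed objective: CE to pseudo-labels + sample entropy - marginal entropy."""
    n = len(student_logits)
    probs = [softmax(row) for row in student_logits]

    # Hard pseudo-label per sample: the teacher's highest-scoring class (assumption).
    pseudo = [max(range(len(row)), key=row.__getitem__) for row in teacher_logits]

    # Cross-entropy of the student's predictions against the pseudo-labels.
    ce = -sum(math.log(p[y] + 1e-12) for p, y in zip(probs, pseudo)) / n

    # Mean per-sample entropy: minimizing it sharpens each prediction.
    cond_ent = sum(entropy(p) for p in probs) / n

    # Entropy of the batch-averaged prediction: maximizing it balances class usage.
    num_classes = len(probs[0])
    marginal = [sum(p[c] for p in probs) / n for c in range(num_classes)]
    marg_ent = entropy(marginal)

    return ce + lam_ent * cond_ent - lam_div * marg_ent


# Two-class toy batch: confident and balanced vs. confident but collapsed.
balanced = infomax_distill_loss([[10.0, 0.0], [0.0, 10.0]], [[10.0, 0.0], [0.0, 10.0]])
collapsed = infomax_distill_loss([[10.0, 0.0], [10.0, 0.0]], [[10.0, 0.0], [10.0, 0.0]])
print(balanced < collapsed)
```

In this toy batch the balanced case scores lower than the collapsed one because the marginal-entropy term penalizes assigning every sample to the same class, which is the "balance class coverage" effect the abstract attributes to the regularizer.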

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12997121/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12997121/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/PMC12997121/full.md

---
Source: https://tomesphere.com/paper/PMC12997121