Finetune-Informed Pretraining Boosts Downstream Performance
Atik Faysal, Mohammad Rostami, Reihaneh Gh. Roshan, Nikhil Muralidhar, Huaxia Wang

TL;DR
This paper introduces Finetune-Informed Pretraining (FIP), a simple, model-agnostic method that enhances the target modality's representation during pretraining, leading to better downstream performance in multimodal tasks.
Contribution
FIP biases pretraining toward the target modality by adjusting masking difficulty, loss weighting, and decoder capacity without altering the encoder or needing extra data.
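To make the three adjustments concrete, here is a minimal sketch of how per-modality settings and a weighted pretraining loss might be organized. It is an illustration under stated assumptions, not the authors' implementation: the names `ModalityConfig`, `make_configs`, and `weighted_loss`, the modality labels, and every numeric value are hypothetical, since the summary gives no specific settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModalityConfig:
    """Per-modality pretraining knobs (all values here are hypothetical)."""
    mask_ratio: float    # fraction of patches hidden from the encoder
    loss_weight: float   # multiplier on this modality's reconstruction loss
    decoder_depth: int   # number of decoder layers for this modality


# Assumed defaults: the target modality gets harder masking, a larger
# loss weight, and a deeper decoder than the other modalities.
BASE = ModalityConfig(mask_ratio=0.5, loss_weight=1.0, decoder_depth=2)
TARGET = ModalityConfig(mask_ratio=0.75, loss_weight=2.0, decoder_depth=4)


def make_configs(modalities, target):
    """Assign the biased config to the target modality, the base to the rest."""
    if target not in modalities:
        raise ValueError(f"unknown target modality: {target}")
    return {m: (TARGET if m == target else BASE) for m in modalities}


def weighted_loss(per_modality_loss, configs):
    """Combine per-modality reconstruction losses using the loss weights."""
    return sum(configs[m].loss_weight * loss
               for m, loss in per_modality_loss.items())


# Placeholder modality names; only the constellation diagram is named
# in the source, the second label is purely illustrative.
configs = make_configs(["constellation", "aux"], target="constellation")
total = weighted_loss({"constellation": 0.5, "aux": 1.0}, configs)
```

Note that nothing in this sketch touches the shared encoder: the bias lives entirely in the masking, the loss, and the decoders, which matches the description above.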
Findings
FIP improves downstream performance on wireless signal classification.
FIP does not require additional data or compute.
FIP is compatible with various multimodal masked modeling pipelines.
Abstract
Multimodal pretraining is effective for building general-purpose representations, but in many practical deployments, only one modality is heavily used during downstream fine-tuning. Standard pretraining strategies treat all modalities uniformly, which can lead to under-optimized representations for the modality that actually matters. We propose Finetune-Informed Pretraining (FIP), a model-agnostic method that biases representation learning toward a designated target modality needed at fine-tuning time. FIP combines higher masking difficulty, stronger loss weighting, and increased decoder capacity for the target modality, without modifying the shared encoder or requiring additional supervision. When applied to masked modeling on constellation diagrams for wireless signals, FIP consistently improves downstream fine-tuned performance with no extra data or compute. FIP is simple to…
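The "higher masking difficulty" component can be pictured as sampling a larger masked fraction for the target modality than for the others. The sketch below assumes uniform random patch masking, which is a common choice in masked modeling but is not confirmed by the abstract; the function name `sample_mask`, the patch count, and the ratios are hypothetical.

```python
import random


def sample_mask(num_patches, mask_ratio, rng):
    """Return a boolean list where True marks a masked patch.

    Assumes uniform random masking without replacement.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    num_masked = round(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Hypothetical ratios: the target modality is masked more aggressively.
target_mask = sample_mask(16, 0.75, rng)
other_mask = sample_mask(16, 0.5, rng)
```

With these assumed ratios, 12 of the 16 target-modality patches are hidden versus 8 of 16 for the other modality, so reconstructing the target is the harder task.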
Taxonomy
Topics: Speech and dialogue systems · Speech Recognition and Synthesis · Multimodal Machine Learning Applications
