BidirLM: From Text to Omnimodal Bidirectional Encoders by Adapting and Composing Causal LLMs
Nicolas Boizard, Théo Deschamps-Berger, Hippolyte Gisserot-Boukhlef, Céline Hudelot, Pierre Colombo

TL;DR
This paper introduces BidirLM, a method for converting causal language models into bidirectional encoders that achieves strong performance across text, vision, and audio through new training strategies and model merging.
Contribution
It presents a systematic approach for adapting causal LLMs into bidirectional encoders, including a new training objective, mitigation of catastrophic forgetting, and integration with specialized models.
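To make the causal-to-bidirectional distinction concrete, here is a minimal sketch of the two attention masks involved. It is an illustration only, not the authors' implementation: the helper names `causal_mask` and `bidirectional_mask` are hypothetical, and masks are plain nested lists where 1 means "query position may attend to key position".

```python
from typing import List

Mask = List[List[int]]  # mask[query][key] == 1 means attention is allowed


def causal_mask(seq_len: int) -> Mask:
    """Lower-triangular mask: each token attends only to itself and earlier tokens."""
    return [
        [1 if key <= query else 0 for key in range(seq_len)]
        for query in range(seq_len)
    ]


def bidirectional_mask(seq_len: int) -> Mask:
    """Full mask: every token attends to every other token, as in an encoder."""
    return [[1] * seq_len for _ in range(seq_len)]


if __name__ == "__main__":
    for row in causal_mask(3):
        print(row)
    for row in bidirectional_mask(3):
        print(row)
```

Adapting a causal LLM into an encoder swaps the first mask for the second, which is why the model needs further training: its weights were learned without ever seeing tokens to the right of the current position.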
Findings
BidirLM outperforms existing models on text, vision, and audio benchmarks.
The critical role of prior masking in successful adaptation is identified.
A scalable adaptation process without original pre-training data is developed.
Abstract
Transforming causal generative language models into bidirectional encoders offers a powerful alternative to BERT-style architectures. However, current approaches remain limited: they lack consensus on optimal training objectives, suffer from catastrophic forgetting at scale, and fail to flexibly integrate the vast ecosystem of specialized generative models. In this work, through systematic ablations on the Gemma3 and Qwen3 families, we identify the key factors driving successful adaptation, highlighting the critical role of an often-omitted prior masking phase. To scale this process without original pre-training data, we introduce a dual strategy combining linear weight merging with a lightweight multi-domain data mixture that mitigates catastrophic forgetting. Finally, we augment our encoders by merging them with specialized causal models, seamlessly transferring modality- and…
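The linear weight merging mentioned in the abstract can be sketched as a per-parameter interpolation between two checkpoints with identical shapes. This is a generic illustration under stated assumptions: the function name `linear_merge`, the coefficient `alpha`, and the toy parameter names are hypothetical, and the abstract does not say which checkpoints are merged or with what weights.

```python
from typing import Dict, List

StateDict = Dict[str, List[float]]  # toy stand-in for a model's parameter tensors


def linear_merge(base: StateDict, adapted: StateDict, alpha: float = 0.5) -> StateDict:
    """Return (1 - alpha) * base + alpha * adapted for every parameter.

    alpha is an assumed interpolation coefficient, not a value from the paper.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if base.keys() != adapted.keys():
        raise ValueError("checkpoints must share the same parameter names")

    merged: StateDict = {}
    for name, base_w in base.items():
        adapted_w = adapted[name]
        if len(base_w) != len(adapted_w):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise linear interpolation between the two checkpoints.
        merged[name] = [(1.0 - alpha) * b + alpha * a for b, a in zip(base_w, adapted_w)]
    return merged


# Hypothetical toy checkpoints with matching parameter names and shapes.
causal_weights = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
encoder_weights = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
merged = linear_merge(causal_weights, encoder_weights, alpha=0.5)
```

With `alpha=0.0` the result equals the first checkpoint and with `alpha=1.0` the second, so the coefficient controls how much of the original model's behavior is retained after adaptation.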
Code & Models
- 🤗 BidirLM/BidirLM-Omni-2.5B-Embedding · model · 8.3k dl · ♡ 40
- 🤗 BidirLM/BidirLM-0.6B-Embedding · model · 441 dl · ♡ 2
- 🤗 BidirLM/BidirLM-1.7B-Embedding · model · 103 dl · ♡ 4
- 🤗 BidirLM/BidirLM-1B-Embedding · model · 1.3k dl · ♡ 1
- 🤗 BidirLM/BidirLM-270M-Embedding · model · 55 dl · ♡ 3
- 🤗 BidirLM/BidirLM-0.6B-Base · model · 32 dl
- 🤗 BidirLM/BidirLM-1.7B-Base · model · 87 dl · ♡ 1
- 🤗 BidirLM/BidirLM-1B-Base · model · 5 dl
- 🤗 BidirLM/BidirLM-270M-Base · model · 72 dl
- 🤗 beaupi/BidirLM-Omni-2.5B-Embedding-oQ4 · model · 17 dl
- BidirLM/BidirLM-Contrastive · dataset · 1.9k dl
- BidirLM/colpali_train_retrieval · dataset · 2.3k dl
- BidirLM/natcap · dataset · 5.3k dl
- BidirLM/librispeech_contrastive · dataset · 2.2k dl
- BidirLM/laion_audio_contrastive · dataset · 1.4k dl
- BidirLM/mscoco_contrastive · dataset · 1.8k dl
- BidirLM/BidirLM-Omni-Contrastive · dataset · 105 dl
