ORPHEAS: A Cross-Lingual Greek-English Embedding Model for Retrieval-Augmented Generation
Ioannis E. Livieris, Athanasios Koursaris, Alexandra Apostolopoulou, Konstantinos Kanaris, Dimitris Tsakalidis, George Domalis

TL;DR
ORPHEAS is a specialized Greek-English embedding model designed to improve retrieval-augmented generation by capturing domain-specific and cross-lingual semantics. It outperforms state-of-the-art multilingual models on retrieval benchmarks.
Contribution
It introduces a high-quality, knowledge graph-based fine-tuning approach for Greek-English embeddings, enhancing cross-lingual retrieval in complex, domain-specific contexts.
Findings
ORPHEAS outperforms state-of-the-art multilingual models in retrieval benchmarks.
Fine-tuning on Greek with knowledge graphs maintains cross-lingual capabilities.
Domain-specific fine-tuning improves semantic representation for Greek.
Abstract
Effective retrieval-augmented generation across bilingual Greek-English applications requires embedding models capable of capturing both domain-specific semantic relationships and cross-lingual semantic alignment. Existing multilingual embedding models distribute their representational capacity across numerous languages, limiting their optimization for Greek and failing to encode the morphological complexity and domain-specific terminological structures inherent in Greek text. In this work, we propose ORPHEAS, a specialized Greek-English embedding model for bilingual retrieval-augmented generation. ORPHEAS is trained on a high-quality dataset generated by a knowledge graph-based fine-tuning methodology applied to a diverse multi-domain corpus, which enables language-agnostic semantic representations. The numerical experiments across monolingual and cross-lingual retrieval…
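To make the retrieval step the abstract refers to concrete, here is a minimal sketch of embedding-based cross-lingual retrieval: passages and a query are represented as vectors, and passages are ranked by cosine similarity to the query. Everything in it is an assumption for illustration. The three-dimensional vectors are made-up stand-ins for real embeddings, the function names `cosine` and `retrieve` are hypothetical, and nothing here reflects the actual ORPHEAS model, its dimensionality, or its API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query_vec, corpus, k=2):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine(query_vec, vec), text) for text, vec in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings (NOT produced by any real model).
# In a language-agnostic space, a Greek sentence and its English
# translation should land close together.
corpus = [
    ("Η Αθήνα είναι η πρωτεύουσα της Ελλάδας.", [0.90, 0.10, 0.00]),
    ("Athens is the capital of Greece.", [0.88, 0.12, 0.02]),
    ("Η συνταγή χρειάζεται ελαιόλαδο και ρίγανη.", [0.10, 0.20, 0.95]),
]

# Hypothetical embedding of an English query about the capital of Greece.
query_vec = [0.85, 0.15, 0.00]

top = retrieve(query_vec, corpus, k=2)
for score, text in top:
    print(f"{score:.3f}  {text}")
```

With these toy vectors, the Greek and English sentences about Athens rank above the unrelated Greek sentence, which is the behavior a cross-lingually aligned embedding space is meant to provide for a retrieval-augmented generation pipeline.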
