TWICE: What Advantages Can Low-Resource Domain-Specific Embedding Model Bring? -- A Case Study on Korea Financial Texts
Yewon Hwang, Sungbum Jung, Hanwool Lee, Sara Yu

TL;DR
This paper introduces KorFinMTEB, a new benchmark for evaluating Korean financial domain embeddings, revealing that models perform differently on culturally nuanced tasks compared to translated benchmarks, emphasizing the need for language-specific evaluation.
Contribution
The paper presents KorFinMTEB, a culturally tailored benchmark for Korean financial texts, highlighting the limitations of translated benchmarks in low-resource, domain-specific settings.
Findings
Models perform well on a translated version of FinMTEB but show performance discrepancies on KorFinMTEB.
Cultural nuances significantly impact embedding model performance.
Benchmark development should incorporate language-specific characteristics.
Abstract
Domain specificity of embedding models is critical for effective performance. However, existing benchmarks, such as FinMTEB, are primarily designed for high-resource languages, leaving low-resource settings, such as Korean, under-explored. Directly translating established English benchmarks often fails to capture the linguistic and cultural nuances present in low-resource domains. In this paper, titled TWICE: What Advantages Can Low-Resource Domain-Specific Embedding Models Bring? A Case Study on Korea Financial Texts, we introduce KorFinMTEB, a novel benchmark for the Korean financial domain, specifically tailored to reflect the unique cultural characteristics of this domain in a low-resource language. Our experimental results reveal that while models perform robustly on a translated version of FinMTEB, their performance on KorFinMTEB uncovers subtle yet critical discrepancies, especially in tasks…
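One way to make a "discrepancy" between two benchmarks concrete is to check whether they rank the same models in the same order. The sketch below is not the authors' analysis; it assumes you already have one average score per model on each benchmark and computes the Spearman rank correlation in pure Python (ties get averaged ranks). The model names and scores in the usage example are hypothetical placeholders, not results from the paper.

```python
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """Rank values in descending order (1 = best score), averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied scores.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined when all scores are equal")
    return cov / (vx * vy) ** 0.5


def spearman(scores_a: Dict[str, float], scores_b: Dict[str, float]) -> float:
    """Spearman rank correlation over the models scored on both benchmarks."""
    models = sorted(scores_a.keys() & scores_b.keys())
    if len(models) < 2:
        raise ValueError("need at least two models scored on both benchmarks")
    ranks_a = average_ranks([scores_a[m] for m in models])
    ranks_b = average_ranks([scores_b[m] for m in models])
    return pearson(ranks_a, ranks_b)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    translated = {"model_a": 0.71, "model_b": 0.68, "model_c": 0.65, "model_d": 0.60}
    native = {"model_a": 0.55, "model_b": 0.62, "model_c": 0.50, "model_d": 0.58}
    print(spearman(translated, native))
```

A value near 1 means the translated and native benchmarks largely agree on which models are best; a value near 0 (as with these made-up numbers) means a strong showing on the translated benchmark says little about performance on the native one. Rank correlation is used here instead of raw score differences because the two benchmarks need not share a score scale.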
Taxonomy
Topics: Banking Systems and Strategies
