JFinTEB: Japanese Financial Text Embedding Benchmark
Masahiro Suzuki, Hiroki Sakaji

TL;DR
JFinTEB is a comprehensive benchmark for evaluating Japanese financial text embeddings, covering diverse tasks and models to advance domain-specific research.
Contribution
It introduces the first dedicated benchmark with datasets and evaluation protocols for Japanese financial text embeddings, filling a critical resource gap.
Findings
Evaluates a wide range of embedding models on the benchmark, including Japanese-specific, multilingual, and commercial models.
Benchmark datasets and evaluation framework publicly released.
Demonstrates the importance of domain-specific embeddings for financial texts.
Abstract
We introduce JFinTEB, the first comprehensive benchmark specifically designed for evaluating Japanese financial text embeddings. Existing embedding benchmarks provide limited coverage of language-specific and domain-specific aspects found in Japanese financial texts. Our benchmark encompasses diverse task categories including retrieval and classification tasks that reflect realistic and well-defined financial text processing scenarios. The retrieval tasks leverage instruction-following datasets and financial text generation queries, while classification tasks cover sentiment analysis, document categorization, and domain-specific classification challenges derived from economic survey data. We conduct extensive evaluations across a wide range of embedding models, including Japanese-specific models of various sizes, multilingual models, and commercial embedding services. We publicly…
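Retrieval tasks in embedding benchmarks are commonly scored by embedding queries and documents, ranking documents by cosine similarity, and averaging a ranking metric over queries. The minimal sketch below assumes cosine similarity and nDCG@10 with binary relevance, the convention in MTEB-style benchmarks. The abstract does not state JFinTEB's exact scoring protocol, and every function name here (`rank_documents`, `ndcg_at_k`, `evaluate_retrieval`) is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def rank_documents(query_vec: Sequence[float],
                   doc_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Sort document ids by descending similarity (ties broken by id for determinism)."""
    return sorted(doc_vecs, key=lambda d: (-cosine(query_vec, doc_vecs[d]), d))


def ndcg_at_k(ranking: List[str], relevant: Set[str], k: int = 10) -> float:
    """nDCG@k with binary relevance: a hit at rank i (0-based) contributes 1/log2(i+2)."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, d in enumerate(ranking[:k]) if d in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def evaluate_retrieval(query_vecs: Dict[str, Sequence[float]],
                       doc_vecs: Dict[str, Sequence[float]],
                       qrels: Dict[str, Set[str]],
                       k: int = 10) -> float:
    """Mean nDCG@k over all queries; qrels maps query id -> set of relevant doc ids."""
    scores = [ndcg_at_k(rank_documents(query_vecs[q], doc_vecs), qrels.get(q, set()), k)
              for q in query_vecs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy 2-d "embeddings" standing in for the output of a real embedding model.
    docs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.7, 0.7]}
    queries = {"q1": [1.0, 0.1]}
    qrels = {"q1": {"d1"}}
    print(rank_documents(queries["q1"], docs))  # ['d1', 'd3', 'd2']
    print(evaluate_retrieval(queries, docs, qrels))  # 1.0
```

In a real evaluation the toy vectors would be replaced by the outputs of the embedding model under test. The scoring logic stays the same whichever model produces the vectors, which is what lets a benchmark compare Japanese-specific, multilingual, and commercial models on equal terms.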
