JFinTEB: Japanese Financial Text Embedding Benchmark

Masahiro Suzuki; Hiroki Sakaji

arXiv:2604.15882 · cs.IR · April 20, 2026




TL;DR

JFinTEB is a comprehensive benchmark for evaluating Japanese financial text embeddings, covering retrieval and classification tasks and a wide range of embedding models.

Contribution

JFinTEB is the first benchmark dedicated to Japanese financial text embeddings. It provides datasets and evaluation protocols for a domain that existing embedding benchmarks cover only to a limited extent.

Findings

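1. Extensive evaluation of embedding models on the benchmark, including Japanese-specific models of various sizes, multilingual models, and commercial embedding services.

2. The benchmark datasets and evaluation framework are publicly released.

3. The results demonstrate the importance of domain-specific embeddings for financial text.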

Abstract

We introduce JFinTEB, the first comprehensive benchmark specifically designed for evaluating Japanese financial text embeddings. Existing embedding benchmarks provide limited coverage of language-specific and domain-specific aspects found in Japanese financial texts. Our benchmark encompasses diverse task categories including retrieval and classification tasks that reflect realistic and well-defined financial text processing scenarios. The retrieval tasks leverage instruction-following datasets and financial text generation queries, while classification tasks cover sentiment analysis, document categorization, and domain-specific classification challenges derived from economic survey data. We conduct extensive evaluations across a wide range of embedding models, including Japanese-specific models of various sizes, multilingual models, and commercial embedding services. We publicly…
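The abstract names the two task families but does not spell out how they are scored. As a minimal sketch, assuming the embeddings have already been produced by some model, the two families could be scored as follows. Retrieval ranks candidate documents by cosine similarity to each query and reports recall@k. Classification assigns each test text to the nearest class centroid in embedding space and reports accuracy. The metrics, the nearest-centroid classifier, the toy vectors, and all function names (`rank_documents`, `recall_at_k`, `mean_recall_at_k`, `nearest_centroid_accuracy`) are illustrative assumptions, not JFinTEB's published protocol or code. The benchmark's actual evaluation framework is in the retarfi/JFinTEB repository.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# Illustrative sketch only: metrics and function names are assumptions,
# not the JFinTEB evaluation code.
Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def rank_documents(query: Vector, docs: Dict[str, Vector]) -> List[str]:
    """Document ids sorted by descending cosine similarity to the query."""
    return sorted(docs, key=lambda doc_id: cosine(query, docs[doc_id]), reverse=True)


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k ranked ids."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def mean_recall_at_k(queries: Dict[str, Vector], docs: Dict[str, Vector],
                     qrels: Dict[str, Set[str]], k: int) -> float:
    """Average recall@k over all queries (qrels maps query id -> relevant doc ids)."""
    scores = [recall_at_k(rank_documents(queries[qid], docs), qrels[qid], k)
              for qid in queries]
    return sum(scores) / len(scores)


def nearest_centroid_accuracy(train: List[Tuple[Vector, str]],
                              test: List[Tuple[Vector, str]]) -> float:
    """Classify test embeddings by the most cosine-similar class centroid."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in train:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    centroids = {label: [x / counts[label] for x in acc] for label, acc in sums.items()}
    correct = 0
    for vec, label in test:
        pred = max(centroids, key=lambda c: cosine(vec, centroids[c]))
        correct += int(pred == label)
    return correct / len(test)


# Toy usage with hypothetical 2-d "embeddings".
docs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.7, 0.7]}
queries = {"q1": [0.9, 0.1], "q2": [0.1, 0.9]}
qrels = {"q1": {"d1"}, "q2": {"d2"}}
print(rank_documents(queries["q1"], docs))       # ['d1', 'd3', 'd2']
print(mean_recall_at_k(queries, docs, qrels, 1))  # 1.0

train = [([1.0, 0.0], "positive"), ([0.9, 0.2], "positive"),
         ([0.0, 1.0], "negative"), ([0.1, 0.8], "negative")]
test = [([0.8, 0.1], "positive"), ([0.2, 0.9], "negative")]
print(nearest_centroid_accuracy(train, test))     # 1.0
```

The nearest-centroid classifier is used here only because it needs no external libraries. Embedding benchmarks often train a lightweight classifier such as logistic regression on the frozen embeddings instead, and JFinTEB's own choice should be checked in its repository.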


Code & Models

Repositories

retarfi/JFinTEB (GitHub)

Datasets

retarfi/JFinTEB
