Do We Need Domain-Specific Embedding Models? An Empirical Investigation
Yixuan Tang, Yi Yang

TL;DR
This paper empirically investigates whether general-purpose embedding models are sufficient for domain-specific tasks, using finance as a case study, and introduces FinMTEB to evaluate domain-specific embedding performance.
Contribution
The paper introduces FinMTEB, a finance domain-specific benchmark, and provides evidence that current general-purpose models underperform in specialized domains, highlighting the need for domain-specific embeddings.
Findings
Models score significantly lower on FinMTEB than on MTEB
A model's performance on MTEB does not correlate with its performance on FinMTEB
Domain-specific models are necessary for capturing specialized linguistic patterns
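One common way to check a claim like "MTEB performance does not correlate with FinMTEB performance" is a rank correlation between the two leaderboards. The sketch below computes Spearman's rho with the standard library only. This summary does not say which statistic the paper uses, so the choice of Spearman is an assumption, and the scores are hypothetical placeholders, not results from the paper.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical placeholder scores for five models (NOT taken from the paper).
general_scores = [0.70, 0.68, 0.65, 0.63, 0.60]  # stand-in for MTEB averages
domain_scores = [0.55, 0.61, 0.52, 0.60, 0.58]   # stand-in for FinMTEB averages

rho = spearman(general_scores, domain_scores)
print(f"Spearman rho = {rho:.2f}")
```

A rho near zero would mean that a model's rank on the general benchmark says little about its rank on the domain benchmark, while a value near 1 would mean the two leaderboards largely agree.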
Abstract
Embedding models play a crucial role in representing and retrieving information across various NLP applications. Recent advancements in Large Language Models (LLMs) have further enhanced the performance of embedding models, which are trained on massive amounts of text covering almost every domain. These models are often benchmarked on general-purpose datasets like the Massive Text Embedding Benchmark (MTEB), where they demonstrate superior performance. However, a critical question arises: Is the development of domain-specific embedding models necessary when general-purpose models are trained on vast corpora that already include specialized domain texts? In this paper, we empirically investigate this question, choosing the finance domain as an example. We introduce the Finance Massive Text Embedding Benchmark (FinMTEB), a counterpart to MTEB that consists of financial domain-specific text…
Peer Reviews
Decision: Submitted to ICLR 2025
1. **Valuable problem**: This paper focuses on a practical and important problem: creating high-quality text embeddings for the financial domain.
2. **Well-written paper**: The paper is clearly written and easy to follow.
3. **Comprehensive model evaluation**: The authors evaluate a diverse range of embedding models, providing a thorough overview of the current embedding model landscape. Additionally, the experimental results demonstrate the challenges posed by FinMTEB, highlighting the need f
1. **Should current general-purpose text embedding models perform well on financial embedding tasks?**: I have concerns about whether current SOTA text embedding models should be expected to perform effectively on financial embedding tasks. For example, the NV-Embed [1] model is trained on only a single finance-related dataset, FiQA. Given the limited financial data used in training, these embeddings may not perform well for finance-specific tasks.
2. **Unfair dataset language composition**: Mo
This paper explores a critical question: whether domain-specific embedding models are necessary in the era of general-purpose, large-scale language models. This paper presents the Finance Massive Text Embedding Benchmark (FinMTEB), a collection of existing or newly constructed evaluation datasets. The dataset may be useful for emphasizing finance-specific abilities in benchmarking because performance on the (general-domain) MTEB and on FinMTEB seems only weakly correlated. This paper evaluates data
The paper’s title question, "Do We Need Domain-Specific Embedding Models?", is not convincingly addressed. To conclude that domain-specific models are necessary, it would be important to demonstrate that (1) a domain-specific embedding model can outperform a general-purpose model within that domain and (2) the observed difference is specifically due to domain specialization. However, the paper does not provide such evidence. Therefore, the research question (and title) would be better phrased as
- This paper presents a massive financial text embedding benchmark, FinMTEB, which consists of 64 financial datasets. I think it is a helpful resource for the financial NLP research community.
- This paper designs a novel and comprehensive evaluation framework to prove that the performance gap between MTEB and FinMTEB is due to the models’ capabilities rather than inherent dataset complexity.
- The paper is well organized and well written.
- From my point of view, the title is not good because the main contribution of this paper is the benchmark. Why not refer to the benchmark directly in the title?
- The evidence may be insufficient to support that the general embedding is not good at domain application since only one domain, i.e., finance, is proven in this paper.
- Lack of novelty: According to the previous work [1][2][3], domain fine-tuning can improve domain-specific performance, which is well-accepted evidence. Based on this
Taxonomy
Topics: Topic Modeling
