Evaluating Embedding Models and Pipeline Optimization for AI Search Quality
Philip Zhong, Kent Chen, Don Wang

TL;DR
This paper systematically evaluates various embedding models and pipeline configurations to enhance AI search quality, demonstrating how higher-dimensional embeddings and neural re-rankers improve retrieval accuracy.
Contribution
It introduces a comprehensive evaluation framework for embedding models and pipeline strategies, including a custom dataset and metrics, to optimize AI search systems.
Findings
Higher-dimensional embeddings significantly improve search accuracy.
Neural re-rankers further enhance retrieval performance.
Finer chunking granularity leads to better results.
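To make "chunking granularity" concrete, here is a minimal sketch of a fixed-size word-window chunker. The summary does not say how the authors split transcripts, so the function name `chunk_words`, the word-based windowing, and the `chunk_size` and `overlap` parameters are all illustrative assumptions rather than the paper's method. A smaller `chunk_size` yields more, finer chunks from the same text.

```python
from typing import List


def chunk_words(text: str, chunk_size: int, overlap: int = 0) -> List[str]:
    """Split text into windows of at most `chunk_size` words.

    Hypothetical helper: word-based windows are an assumption, not the
    chunking strategy described in the paper.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    print(len(chunk_words(sample, chunk_size=4)))  # coarser granularity
    print(len(chunk_words(sample, chunk_size=2)))  # finer granularity
```

Each chunk would then be embedded and indexed separately, so the chunk size controls how much context a single retrieved unit carries.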
Abstract
We evaluate the performance of various text embedding models and pipeline configurations for AI-driven search systems. We compare sentence-transformer and generative embedding models (e.g., All-MPNet, BGE, GTE, and Qwen) at different dimensions, indexing methods (Milvus HNSW/IVF), and chunking strategies. A custom evaluation dataset of 11,975 query-chunk pairs was synthesized from US City Council meeting transcripts using a local large language model (LLM). The data pipeline includes preprocessing, automated question generation per chunk, manual validation, and continuous integration/continuous deployment (CI/CD) integration. We measure retrieval accuracy using reference-based metrics: Top-K Accuracy and Normalized Discounted Cumulative Gain (NDCG). Our results demonstrate that higher-dimensional embeddings significantly boost search quality (e.g., Qwen3-Embedding-8B/4096 achieves Top-3…
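The two reference-based metrics named in the abstract can be sketched as follows. This is an illustration under one stated assumption: each query has exactly one relevant ("gold") chunk, which matches the query-chunk pair framing but is not confirmed by the abstract. With a single relevant item the ideal DCG is 1, so NDCG@k reduces to 1 / log2(rank + 1) when the gold chunk appears at position `rank`, and 0 otherwise. The function names and sample chunk IDs are hypothetical.

```python
import math
from typing import List, Sequence


def top_k_accuracy(ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int) -> float:
    """Fraction of queries whose gold chunk appears in the first k results."""
    hits = sum(1 for results, g in zip(ranked, gold) if g in results[:k])
    return hits / len(gold)


def ndcg_at_k(results: Sequence[str], gold_id: str, k: int) -> float:
    """NDCG@k for one query, assuming a single relevant chunk (binary relevance)."""
    for rank, chunk_id in enumerate(results[:k], start=1):
        if chunk_id == gold_id:
            # DCG = 1 / log2(rank + 1); ideal DCG = 1 / log2(2) = 1.
            return 1.0 / math.log2(rank + 1)
    return 0.0


def mean_ndcg_at_k(ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int) -> float:
    """Average NDCG@k over all queries."""
    return sum(ndcg_at_k(r, g, k) for r, g in zip(ranked, gold)) / len(gold)


if __name__ == "__main__":
    # Hypothetical retrieval output: one ranked list of chunk IDs per query.
    ranked: List[List[str]] = [["c1", "c2", "c3"], ["c9", "c4", "c7"], ["c5", "c6", "c8"]]
    gold = ["c1", "c7", "c0"]
    print(top_k_accuracy(ranked, gold, k=1))
    print(top_k_accuracy(ranked, gold, k=3))
    print(mean_ndcg_at_k(ranked, gold, k=3))
```

Top-K Accuracy only records whether the gold chunk was retrieved, while NDCG also rewards placing it nearer the top of the list.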
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Expert Finding and Q&A Systems
