Chain-of-Rank: Enhancing Large Language Models for Domain-Specific RAG in Edge Device

Juntae Lee; Jihwan Bang; Seunghan Yang; Kyuhong Shim; Simyung Chang

arXiv:2502.15134·cs.CL·February 24, 2025

Open Access

TL;DR

This paper introduces Chain of Rank (CoR), a method that improves domain-specific retrieval-augmented generation on resource-limited edge devices by replacing free-form reasoning with document ranking, achieving state-of-the-art results.

Contribution

The paper proposes Chain of Rank (CoR), a new approach that replaces complex reasoning with document ranking to enhance LLM performance on edge devices.
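
To make the contrast with a free-form reasoning trace concrete, here is a minimal sketch of what a ranking-style finetuning target could look like. It is an illustration of the general idea only, not the authors' implementation: the `Doc` class, the `relevance` label, the function names, the prompt wording, the "Ranking: … / Answer: …" output format, and the toy documents are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    doc_id: int       # 1-based position of the document in the retrieved list
    text: str
    relevance: float  # hypothetical relevance label used to build the target


def build_prompt(question: str, docs: List[Doc]) -> str:
    """Format the retrieved documents and the question as model input."""
    lines = [f"[{d.doc_id}] {d.text}" for d in docs]
    return "Documents:\n" + "\n".join(lines) + f"\nQuestion: {question}\n"


def build_rank_target(docs: List[Doc], answer: str) -> str:
    """Build a short target: a ranking of document IDs, then the answer.

    The ranking stands in for a longer reasoning trace. The exact output
    format is an assumption for illustration, not taken from the paper.
    """
    ranked = sorted(docs, key=lambda d: d.relevance, reverse=True)
    ranking = " > ".join(f"[{d.doc_id}]" for d in ranked)
    return f"Ranking: {ranking}\nAnswer: {answer}"


# Toy data, invented for this sketch.
docs = [
    Doc(1, "The cafe opens at 9 am on weekdays.", 0.2),
    Doc(2, "The user's dentist appointment is on Friday at 3 pm.", 0.9),
    Doc(3, "Friday's forecast is rain.", 0.4),
]
prompt = build_prompt("When is my dentist appointment?", docs)
target = build_rank_target(docs, "Friday at 3 pm")
print(prompt + target)
```

The point of the sketch is that the target contains only a handful of ID tokens before the answer, which is the kind of output a small model may find cheaper to generate and easier to learn than a multi-sentence rationale.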

Findings

01. CoR reduces computational complexity significantly.

02. Achieves state-of-the-art results on benchmark datasets.

03. Maintains high accuracy with small-scale LLMs.

Abstract

Retrieval-augmented generation (RAG) with large language models (LLMs) is especially valuable in specialized domains, where precision is critical. To further specialize LLMs to a target domain, domain-specific RAG has recently been developed, which gives the LLM early access to the target domain via finetuning. Domain-specific RAG makes even more sense in resource-constrained environments such as edge devices, which must perform a specific task (e.g., personalization) reliably using only small-scale LLMs. While domain-specific RAG is well aligned with edge devices in this respect, it often relies on widely used reasoning techniques such as chain-of-thought (CoT). The reasoning step helps the model understand the given external knowledge, yet it is computationally expensive and difficult for small-scale LLMs to learn. To tackle this, we propose Chain of Rank (CoR), which shifts…

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks

Methods: Attention Is All You Need · Weight Decay · Linear Layer · Layer Normalization · Byte Pair Encoding · WordPiece · Dense Connections · Attention Dropout · Residual Connection