LLM Alignment as Retriever Optimization: An Information Retrieval Perspective
Bowen Jin, Jinsung Yoon, Zhen Qin, Ziqi Wang, Wei Xiong, Yu Meng, Jiawei Han, Sercan O. Arik

TL;DR
This paper introduces a novel approach to LLM alignment that applies principles from Information Retrieval (IR), specifically the retriever-reranker framework, to improve alignment quality in a simple and effective way.
Contribution
The paper presents LarPO, a new IR-inspired method for LLM alignment that simplifies the alignment process and shows significant improvements over existing approaches.
Findings
LarPO achieves 38.9% improvement on AlpacaEval2.
LarPO achieves 13.7% improvement on MixEval-Hard.
The IR-based framework effectively enhances LLM alignment quality.
Abstract
Large Language Models (LLMs) have revolutionized artificial intelligence with capabilities in reasoning, coding, and communication, driving innovation across industries. Their true potential depends on effective alignment to ensure correct, trustworthy and ethical behavior, addressing challenges like misinformation, hallucinations, bias and misuse. While existing Reinforcement Learning (RL)-based alignment methods are notoriously complex, direct optimization approaches offer a simpler alternative. In this work, we introduce a novel direct optimization approach for LLM alignment by drawing on established Information Retrieval (IR) principles. We present a systematic framework that bridges LLM alignment and IR methodologies, mapping LLM generation and reward models to IR's retriever-reranker paradigm. Building on this foundation, we propose LLM Alignment as Retriever Preference…
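The retriever-reranker mapping described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical `generate` sampler standing in for the LLM (the "retriever" that proposes candidate responses for a prompt, i.e. the "query") and a hypothetical `reward` function standing in for the reward model (the "reranker" that scores and orders those candidates). The helper names `retrieve_then_rerank` and `preference_pair` are illustrative only. Taking the best- and worst-ranked candidates as a chosen/rejected pair is a common way to build preference data, shown here as an assumption; this is not LarPO's training objective.

```python
from itertools import cycle
from typing import Callable, List, Tuple


def retrieve_then_rerank(
    prompt: str,
    generate: Callable[[str], str],       # hypothetical LLM sampler: plays the "retriever"
    reward: Callable[[str, str], float],  # hypothetical reward model: plays the "reranker"
    num_candidates: int = 4,
) -> List[Tuple[str, float]]:
    """Sample candidate responses, then order them by reward (highest first)."""
    # Retrieval stage: the LLM proposes a candidate set for the prompt (the "query").
    candidates = [generate(prompt) for _ in range(num_candidates)]
    # Reranking stage: the reward model scores each (prompt, response) pair.
    scored = [(c, reward(prompt, c)) for c in candidates]
    # Sort by score, descending, as a reranker would.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def preference_pair(ranked: List[Tuple[str, float]]) -> Tuple[str, str]:
    """Assumed illustration: top-ranked response as "chosen", bottom-ranked as "rejected"."""
    return ranked[0][0], ranked[-1][0]


if __name__ == "__main__":
    toy_responses = cycle(["short", "a longer answer", "mid answer", "x"])
    ranked = retrieve_then_rerank(
        "Explain retrieval.",
        generate=lambda p: next(toy_responses),
        reward=lambda p, r: float(len(r)),  # toy reward: longer responses score higher
    )
    print(ranked[0])               # ('a longer answer', 15.0)
    print(preference_pair(ranked))  # ('a longer answer', 'x')
```

The toy length-based reward only makes the ordering easy to check; a real reranker would be a learned reward model.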
Taxonomy
Topics: Library Science and Information Systems
