RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods

Raghav Sharma; Manan Mehta; Sai Tiger Raina

arXiv:2511.03939·cs.LG·November 7, 2025

TL;DR

This survey reviews recent advances in Reinforcement Learning from Human Feedback (RLHF), emphasizing multimodal alignment, cultural fairness, and low-latency alignment methods that aim to improve the robustness and equity of large language models.

Contribution

The survey synthesizes recent RLHF techniques that extend beyond text, analyzes the foundational algorithms, and outlines open challenges in the field.

Findings

01. Comparison of the PPO, DPO, and GRPO algorithms
02. Identification of gaps in multimodal and cultural alignment
03. Discussion of low-latency optimization challenges

Abstract

Reinforcement Learning from Human Feedback (RLHF) is the standard for aligning Large Language Models (LLMs), yet recent progress has moved beyond canonical text-based methods. This survey synthesizes the new frontier of alignment research by addressing critical gaps in multi-modal alignment, cultural fairness, and low-latency optimization. To systematically explore these domains, we first review foundational algorithms, including PPO, DPO, and GRPO, before presenting a detailed analysis of the latest innovations. By providing a comparative synthesis of these techniques and outlining open challenges, this work serves as an essential roadmap for researchers building more robust, efficient, and equitable AI systems.
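The abstract names PPO, DPO, and GRPO as the foundational algorithms the survey reviews. As a rough, non-authoritative illustration of how their per-sample objectives differ, the sketch below writes each one for scalar inputs in plain Python. It assumes the commonly cited forms (PPO's clipped surrogate, DPO's logistic loss on policy/reference log-ratio margins, GRPO's group-normalized advantages); the function names, the defaults `clip_eps=0.2` and `beta=0.1`, and the use of the population standard deviation are illustrative choices, not details taken from the survey.

```python
import math
from statistics import mean, pstdev
from typing import List


def ppo_clipped_surrogate(logp_new: float, logp_old: float,
                          advantage: float, clip_eps: float = 0.2) -> float:
    """PPO clipped surrogate for a single token/action (maximized)."""
    ratio = math.exp(logp_new - logp_old)  # pi_theta(a|s) / pi_old(a|s)
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    # Taking the minimum gives a pessimistic bound on the policy improvement.
    return min(ratio * advantage, clipped_ratio * advantage)


def softplus(x: float) -> float:
    """log(1 + exp(x)) computed without overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair of sequence log-probs (minimized)."""
    # Implicit reward margin: beta * (policy/reference log-ratio of the
    # chosen response minus that of the rejected response).
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


def grpo_advantages(group_rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Group-relative advantages for responses sampled from one prompt."""
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)  # population std; an illustrative choice
    # Each response is scored relative to its siblings, so no value model is
    # needed; these advantages then feed a PPO-style clipped objective.
    return [(r - mu) / (sigma + eps) for r in group_rewards]


if __name__ == "__main__":
    print(ppo_clipped_surrogate(logp_new=-0.9, logp_old=-1.2, advantage=1.5))
    print(dpo_loss(pi_chosen=-12.0, pi_rejected=-15.0,
                   ref_chosen=-13.0, ref_rejected=-14.0))
    print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))
```

The contrast the sketch is meant to surface: PPO needs advantages from a learned reward (and usually a value model), DPO skips the explicit reward model by optimizing directly on preference pairs against a frozen reference policy, and GRPO drops the value model by normalizing rewards within a group of sampled responses.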

Taxonomy

Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Multimodal Machine Learning Applications