OmniSapiens: A Foundation Model for Social Behavior Processing via Heterogeneity-Aware Relative Policy Optimization

Keane Ong; Sabri Boughorbel; Luwei Xiao; Chanakya Ekbote; Wei Dai; Ao Qu; Jingyao Wu; Rui Mao; Ehsan Hoque; Erik Cambria; Gianmarco Mengaldo; Paul Pu Liang

arXiv:2602.10635 · cs.AI · February 12, 2026

TL;DR

OmniSapiens introduces HARPO, a reinforcement learning (RL) method that balances learning across heterogeneous social behavioral data, yielding a foundation model with strong multitask and held-out performance.

Contribution

The paper presents HARPO, a heterogeneity-aware RL algorithm, and develops OmniSapiens-7B 2.0, a social behavior foundation model that outperforms existing models across multiple tasks.

Findings

01  OmniSapiens-7B 2.0 achieves performance gains of up to +16.85%.

02  OmniSapiens-7B 2.0 performs strongly in both multitask and held-out settings.

03  HARPO outperforms recent RL methods on heterogeneous behavioral tasks.

Abstract

To develop socially intelligent AI, existing approaches typically model human behavioral dimensions (e.g., affective, cognitive, or social attributes) in isolation. Although useful, task-specific modeling often increases training costs and limits generalization across behavioral settings. Recent reasoning RL methods facilitate training a single unified model across multiple behavioral tasks, but do not explicitly address learning across heterogeneous behavioral data. To address this gap, we introduce Heterogeneity-Aware Relative Policy Optimization (HARPO), an RL method that balances learning across heterogeneous tasks and samples. This is achieved by modulating advantages to ensure that no single task or sample carries disproportionate influence during policy optimization. Using HARPO, we develop and release OmniSapiens-7B 2.0, a foundation model for social behavior…
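The abstract does not give HARPO's actual advantage-modulation formula, so the following is only a minimal sketch of the general idea, under stated assumptions. It standardizes rewards within a rollout group (the group-relative baseline used by GRPO-style methods) and then applies a hypothetical rescaling so that each task contributes the same total absolute advantage to a batch. The function names, the equal-mass rescaling rule, and the task labels `emotion` and `humor` are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev

EPS = 1e-8  # guards against division by zero when all rewards in a group are equal


def group_relative_advantages(rewards):
    """Standardize rewards within one group of rollouts (GRPO-style baseline)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + EPS) for r in rewards]


def balance_across_tasks(advantages, task_ids):
    """Hypothetical balancing step (NOT the paper's formula).

    Rescales advantages so that every task ends up with the same total
    absolute advantage (about 1.0), regardless of how many samples it has.
    """
    totals = defaultdict(float)
    for adv, task in zip(advantages, task_ids):
        totals[task] += abs(adv)
    return [adv / (totals[task] + EPS) for adv, task in zip(advantages, task_ids)]


# Illustrative batch: six rollouts from one task, two from another.
emotion = group_relative_advantages([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
humor = group_relative_advantages([1.0, 0.0])
advantages = emotion + humor
tasks = ["emotion"] * 6 + ["humor"] * 2

balanced = balance_across_tasks(advantages, tasks)
```

Without the second step, the larger task in this toy batch would carry three times the absolute advantage of the smaller one; after rescaling, both contribute equally. How HARPO itself weights tasks and samples may differ substantially from this sketch.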

Taxonomy

Topics: Reinforcement Learning in Robotics · Digital Mental Health Interventions · Recommender Systems and Techniques