Enhancing Lightweight Vision Language Models through Group Competitive Learning for Socially Compliant Navigation

Xinyu Zhang; Atsushi Konno; Toshihiko Yamasaki; Ling Xiao

arXiv:2603.11447·cs.RO·March 13, 2026

TL;DR

This paper introduces Group Competitive Learning (GCL), a training strategy that strengthens the reasoning and decision-making of lightweight vision language models (VLMs) for socially compliant navigation, achieving high accuracy at lower computational cost.

Contribution

The paper proposes GCL, a training approach whose Group Competitive Objective (GCO) combines global semantics with distributional regularization to improve lightweight VLMs' performance on social navigation tasks.

Findings

01. GCL improves VLM F1 scores by up to 40%.

02. Lightweight models outperform larger models after GCL training.

03. GCL enables efficient and accurate social navigation in real-world scenarios.

Abstract

Social robot navigation requires a sophisticated integration of scene semantics and human social norms. Scaling up Vision Language Models (VLMs) generally improves reasoning and decision-making capabilities for socially compliant navigation. However, increased model size incurs substantial computational overhead, limiting suitability for real-time robotic deployment. Conversely, lightweight VLMs enable efficient inference but often exhibit weaker reasoning and decision-making performance in socially complex environments. Achieving both strong reasoning ability and efficiency remains an open challenge. To bridge this gap, we propose Group Competitive Learning (GCL), a strategy designed to amplify the capabilities of lightweight VLMs. Our strategy introduces the Group Competitive Objective (GCO) to harmonize global semantics with distributional regularization, alongside Asymmetric Group…

Taxonomy

Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Advanced Neural Network Applications