Safe and Optimal Learning from Preferences via Weighted Temporal Logic with Applications in Robotics and Formula 1
Ruya Karagulle, Cristian-Ioan Vasile, Necmiye Ozay

TL;DR
This paper introduces a safety-guaranteed, optimal learning framework for autonomous systems based on Weighted Signal Temporal Logic (WSTL), and uses it to capture complex preferences in robotics and Formula 1 applications.
Contribution
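It recasts WSTL learning problems, which are multi-linear in the weights when implemented naively, as Mixed-Integer Linear Programs (MILPs) by means of structural pruning and a log-transform, reducing the problem size while preserving safety guarantees.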
Findings
Successfully applied to robotic navigation tasks.
Effectively models complex preferences in Formula 1 data.
Ensures safety in preference-based learning.
Abstract
Autonomous systems increasingly rely on human feedback, expressed as pairwise comparisons, rankings, or demonstrations, to align their behavior. While existing methods can adapt behaviors, they often fail to guarantee safety in safety-critical domains. We propose a safety-guaranteed, optimal, and efficient approach for solving the learning problem from preferences, rankings, or demonstrations using Weighted Signal Temporal Logic (WSTL). WSTL learning problems, when implemented naively, lead to multi-linear constraints in the weights to be learned. By introducing structural pruning and log-transform procedures, we reduce the problem size and recast it as a Mixed-Integer Linear Program while preserving safety guarantees. Experiments on robotic navigation and real-world Formula 1 data demonstrate that the method captures nuanced preferences and models complex task objectives.
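The abstract does not spell out the log-transform, so the following is only a toy sketch of the general idea and not the authors' encoding. It assumes strictly positive weights and strictly positive coefficients, and it ignores the min/max structure of temporal-logic robustness entirely. Under those assumptions, a constraint that multiplies several unknown weights together becomes a sum once each weight is replaced by its logarithm, and a sum is linear in the new variables. The function names and the numbers are hypothetical.

```python
import math


def multilinear_holds(weights_a, coeff_a, weights_b, coeff_b):
    """Check coeff_a * prod(weights_a) >= coeff_b * prod(weights_b).

    The unknown weights appear as a product, so the constraint is
    multi-linear in them. Assumes all inputs are strictly positive.
    """
    return coeff_a * math.prod(weights_a) >= coeff_b * math.prod(weights_b)


def log_linear_holds(log_weights_a, coeff_a, log_weights_b, coeff_b):
    """Check the same constraint after substituting u_i = log(w_i).

    Taking logs of both sides turns each product into a sum, so the
    constraint is linear in the log-weights u_i. Because log is strictly
    increasing on positive numbers, the inequality direction is kept.
    """
    lhs = math.log(coeff_a) + sum(log_weights_a)
    rhs = math.log(coeff_b) + sum(log_weights_b)
    return lhs >= rhs


# Hypothetical values chosen only for illustration.
w_a, c_a = [2.0, 3.0], 1.5   # 1.5 * 2 * 3 = 9
w_b, c_b = [4.0, 1.0], 2.0   # 2.0 * 4 * 1 = 8

direct = multilinear_holds(w_a, c_a, w_b, c_b)
in_log_space = log_linear_holds(
    [math.log(w) for w in w_a], c_a,
    [math.log(w) for w in w_b], c_b,
)
print(direct, in_log_space)
```

Both checks agree on these inputs. In a solver setting, the log-weights would be the decision variables, which is what would allow such a constraint to be handed to a linear or mixed-integer linear solver.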
Taxonomy
Topics: Logic, Reasoning, and Knowledge · Constraint Satisfaction and Optimization · Formal Methods in Verification
