Learning Constraint Network from Demonstrations via Positive-Unlabeled Learning with Memory Replay
Baiyu Peng, Aude Billard

TL;DR
This paper introduces a positive-unlabeled learning framework with memory replay to infer complex, nonlinear constraints from expert demonstrations, enabling safer and more accurate planning in complex environments.
Contribution
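It presents a novel PU learning approach, combined with iterative policy updates and memory replay, that infers arbitrary and possibly nonlinear constraints from demonstrations. Unlike most prior methods, it is not restricted to linear constraints and does not require knowledge of the true constraint parameterization or the environment model.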
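The iterate-and-replay structure can be illustrated with a deliberately tiny one-dimensional toy. This is a hypothetical sketch, not the authors' algorithm. The reward is the state value x, and the true (unknown) constraint is x ≤ 1. The "policy" is a greedy rule that moves as far right as the current constraint estimate allows, and `fit_pu` is an invented reliable-negative rule standing in for the paper's PU classifier. `ReplayMemory`, `rollout`, `fit_pu`, and `learn_constraint` are illustrative names.

```python
import math
from collections import deque


class ReplayMemory:
    """Hypothetical FIFO buffer of policy-generated (unlabelled) states kept across iterations."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest states are evicted once full

    def add(self, trajectory):
        self.buffer.extend(trajectory)

    def states(self):
        return list(self.buffer)


def rollout(threshold, x_max=3.0):
    """Toy greedy 'policy': reward is x, so it starts at 0 and moves as far right as
    the current constraint estimate (x <= threshold) and the action limit allow."""
    return [0.0, min(threshold, x_max)]


def fit_pu(positives, unlabeled):
    """Invented 1-D PU rule: unlabelled states beyond every demonstrated state are taken as
    reliable negatives; the boundary goes halfway to the nearest one."""
    top = max(positives)
    beyond = [u for u in unlabeled if u > top]
    return (top + min(beyond)) / 2 if beyond else math.inf


def learn_constraint(demos, n_iters, capacity=1000):
    memory = ReplayMemory(capacity)
    threshold = math.inf                               # nothing is known to be infeasible yet
    for _ in range(n_iters):
        memory.add(rollout(threshold))                 # 1. roll out policy under current estimate
        threshold = fit_pu(demos, memory.states())     # 2. refit: demos vs. all replayed states
    return threshold, memory
```

With demonstrations `[0.0, 0.25, 0.5, 0.75, 1.0]`, three iterations give a threshold of 1.25. Each further iteration halves the distance to the true boundary at 1.0, so the estimate approaches it from the infeasible side. In this monotone toy, the newest rollout alone already supplies the nearest negative, so the buffer is not essential here. It is included only to show where replay sits in the loop.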
Findings
Successfully infers nonlinear constraints in MuJoCo environments
Outperforms the baseline in constraint accuracy
Enhances policy safety and robustness
Abstract
Planning for a wide range of real-world tasks requires knowing and specifying all constraints. However, there are cases where these constraints are either unknown or difficult to specify accurately. A possible solution is to infer the unknown constraints from expert demonstrations. Most prior work is limited to learning simple linear constraints, or requires strong knowledge of the true constraint parameterization or of the environment model. To mitigate these problems, this paper presents a positive-unlabeled (PU) learning approach to infer a continuous, arbitrary, and possibly nonlinear constraint from demonstrations. From a PU learning view, we treat all data in demonstrations as positive (feasible) data, and learn a (sub)-optimal policy to generate high-reward but potentially infeasible trajectories, which serve as unlabeled data containing both feasible and…
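The sketch below makes the positive/unlabeled split concrete by applying a classic PU recipe, the Elkan and Noto (2008) correction, to a synthetic 2-D problem. It is a swapped-in technique for illustration, not the paper's method. Demonstration states (all feasible) are the positives, and uniformly sampled states stand in for policy rollouts as the unlabeled set. A k-nearest-neighbour estimate of p(s=1|x) replaces the paper's neural constraint network. The disk constraint, `k=15`, and all function names are assumptions.

```python
import numpy as np


def knn_label_prob(queries, data, is_labeled, k):
    """Estimate g(x) = p(s=1 | x): share of the k nearest pooled points that are demonstrations."""
    d2 = ((queries[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]  # indices of the k closest points
    return is_labeled[nearest].mean(axis=1)


def fit_pu_constraint(positives, unlabeled, k=15):
    """Elkan-Noto style PU estimate of p(feasible | x) (illustrative, not the paper's model)."""
    data = np.vstack([positives, unlabeled])
    is_labeled = np.concatenate([np.ones(len(positives)), np.zeros(len(unlabeled))])
    # c = p(s=1 | y=1): average g over labelled positives. For brevity this reuses the
    # training positives instead of a held-out set, which biases c slightly upward.
    c = knn_label_prob(positives, data, is_labeled, k).mean()

    def feasibility(x):
        return np.clip(knn_label_prob(x, data, is_labeled, k) / c, 0.0, 1.0)

    return feasibility


def inside_disk(x):
    """Hypothetical ground-truth constraint, unknown to the learner: ||x|| <= 1."""
    return (x ** 2).sum(axis=1) <= 1.0


rng = np.random.default_rng(0)
candidates = rng.uniform(-1.0, 1.0, size=(2000, 2))
positives = candidates[inside_disk(candidates)][:300]   # demonstration states: all feasible
unlabeled = rng.uniform(-2.0, 2.0, size=(1000, 2))      # stand-in for rollouts: feasible + infeasible

feasible = fit_pu_constraint(positives, unlabeled)
test = rng.uniform(-2.0, 2.0, size=(2000, 2))
accuracy = np.mean((feasible(test) >= 0.5) == inside_disk(test))
print(f"agreement with true constraint: {accuracy:.2f}")
```

Because the unlabeled pool mixes feasible and infeasible states, the raw labelled-vs-unlabelled score g(x) stays near c inside the disk. Dividing by c rescales it to a feasibility probability near 1 there, while states far outside the disk score 0. The correction relies on the labelled positives being a random sample of all feasible states, an assumption that policy-generated data need not satisfy.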
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Natural Language Processing Techniques
