TL;DR
This paper introduces a novel method for learning the objective function of an optimal control system from incomplete trajectory data using a recovery matrix, enabling incremental learning of feature weights.
Contribution
It presents the recovery matrix concept and an incremental algorithm to learn feature weights from partial trajectory observations, improving data efficiency.
Findings
Effective on linear quadratic regulator systems
Successfully applied to a simulated robot manipulator
Allows incremental learning with minimal observations
Abstract
This article develops a methodology for learning the objective function of an optimal control system from incomplete trajectory observations. The objective function is assumed to be a weighted sum of features (or basis functions) with unknown weights, and the observed data is a segment of a trajectory of system states and inputs. The proposed technique introduces the concept of the recovery matrix to establish the relationship between any available segment of the trajectory and the weights of given candidate features. The rank of the recovery matrix indicates whether a subset of relevant features can be found among the candidate features and whether the corresponding weights can be learned from the segment data. The recovery matrix can be obtained iteratively, and the non-decreasing property of its rank shows that additional observations may contribute to the objective learning. Based on the…
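The rank argument in the abstract can be illustrated with a small numerical sketch. It is a simplified stand-in, not the paper's recovery matrix definition: it stacks the discrete-time stationarity (minimum principle) conditions over an observed segment into a linear system in the unknown weights and the unknown costate at the end of the segment. The toy double-integrator system, the weights `w_true`, the initial state, and the helper names `simulate`, `segment_matrix`, and `numerical_rank` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy system (not from the paper): a discrete-time double integrator.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
w_true = np.array([1.0, 2.0, 1.0])  # assumed weights on features [x1^2, x2^2, u^2]
T = 30                              # horizon length, no terminal cost


def simulate(x0):
    """Generate the optimal trajectory with a backward Riccati recursion."""
    Q, R = np.diag(w_true[:2]), w_true[2:].reshape(1, 1)
    P, gains = np.zeros((2, 2)), []
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # gains[k] is now the feedback gain applied at step k
    xs, us = [np.asarray(x0, dtype=float)], []
    for k in range(T):
        us.append(-gains[k] @ xs[-1])
        xs.append(A @ xs[-1] + B @ us[-1])
    return np.array(xs), np.array(us)


def segment_matrix(xs, us, t, l):
    """Stack the stationarity conditions for steps t..t+l-1 as H @ [w; lam] = 0,
    where lam is the unknown costate at the end of the observed segment."""
    r, n = 3, 2
    M, N = np.zeros((n, r)), np.eye(n)  # costate at step k+1 equals M @ w + N @ lam
    rows = []
    for k in range(t + l - 1, t - 1, -1):
        x, u = xs[k], us[k, 0]
        phi_u = np.array([[0.0, 0.0, 2.0 * u]])      # (d features / d u)^T, shape (1, r)
        phi_x = np.array([[2.0 * x[0], 0.0, 0.0],
                          [0.0, 2.0 * x[1], 0.0]])    # (d features / d x)^T, shape (n, r)
        rows.append(np.hstack([phi_u + B.T @ M, B.T @ N]))
        M, N = phi_x + A.T @ M, A.T @ N               # step the costate back to step k
    return np.vstack(rows)


def numerical_rank(H, rel_tol=1e-9):
    """Count singular values above a relative threshold."""
    s = np.linalg.svd(H, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))


xs, us = simulate([1.0, 0.5])

# Rank of the stacked matrix as the observed segment grows from 1 to 8 steps.
ranks = [numerical_rank(segment_matrix(xs, us, t=1, l=l)) for l in range(1, 9)]

# Once the kernel is one-dimensional, the weights follow up to a scale factor.
H = segment_matrix(xs, us, t=1, l=8)
kernel = np.linalg.svd(H)[2][-1]       # right singular vector of the smallest singular value
w_hat = kernel[:3] / kernel[:3].sum()  # fix scale and sign by normalizing the sum to 1

print("ranks by segment length:", ranks)
print("recovered weights:", w_hat)
print("true weights (normalized):", w_true / w_true.sum())
```

In this sketch the unknown vector has five entries (three weights plus a two-dimensional costate), so the weights become identifiable up to scale once the rank reaches four. Each additional observed step appends one row, which is why the rank can only stay the same or grow as the segment is extended.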
