LEGO: Leveraging Experience in Roadmap Generation for Sampling-Based Planning
Rahul Kumar, Aditya Mandalika, Sanjiban Choudhury, Siddhartha S. Srinivasa

TL;DR
LEGO is a novel algorithm that enhances sampling-based motion planning by training a CVAE with strategically chosen samples from bottleneck regions, improving roadmap quality and planning success in complex environments.
Contribution
LEGO introduces a new training approach for CVAE in motion planning, focusing on bottleneck and diverse samples, with formal guarantees and superior performance.
Findings
Significant improvements over heuristics and learned baselines.
Effective in complex obstacle environments and diverse planning problems.
Formal performance guarantees for the proposed method.
Samplers: Halton, MAPRM, RBB, Gaussian, WIS (non-learned); ShortestPath, LEGO (learned).

| Problem | Halton | MAPRM | RBB | Gaussian | WIS | ShortestPath | LEGO |
|---|---|---|---|---|---|---|---|
| Point Robot (2D) | | | | | | | |
| N-link Arm (3D) | | | | | | | |
| N-link Arm (7D) | | | | | | | |
| Snake Robot (5D) | | | | | | | |
| Snake Robot (9D) | | | | | | | |
| Manipulator (7D) | | | | | | | |
| Manipulator (8D) | | | | | | | |


Samplers: Halton, MAPRM, RBB, Gaussian, WIS (non-learned); ShortestPath, LEGO (learned).

| Problem | Halton | MAPRM | RBB | Gaussian | WIS | ShortestPath | LEGO |
|---|---|---|---|---|---|---|---|
| 2D Point Robot Planning | | | | | | | |
| 2D Large (easy) | | | | | | | |
| 2D Medium | | | | | | | |
| 2D Small (hard) | | | | | | | |
| N-Link Arm | | | | | | | |
| 3D | | | | | | | |
| 7D | | | | | | | |
| N-Link Snake Robot | | | | | | | |
| 5D | | | | | | | |
| 9D | | | | | | | |
| Manipulator Arm Planning | | | | | | | |
| Unconstrained (8D) | | | | | | | |
| Constrained (7D) | | | | | | | |

Rahul Kumar¹*, Aditya Mandalika²*, Sanjiban Choudhury²* and Siddhartha S. Srinivasa²*
¹ Department of Computer Science, Indian Institute of Technology, Kharagpur, {vernwalrahul}@iitkgp.ac.in
² Paul G. Allen School of Computer Science and Engineering, University of Washington, {adityavk, sanjibac, siddh}@cs.uw.edu
* This work was (partially) funded by the National Institute of Health R01 (#R01EB019335), National Science Foundation CPS (#1544797), National Science Foundation NRI (#1637748), the Office of Naval Research, the RCTA, Amazon, and Honda.
Abstract
We consider the problem of leveraging prior experience to generate roadmaps in sampling-based motion planning. A desirable roadmap is one that is sparse, allowing for fast search, with nodes spread out at key locations such that a low-cost feasible path exists. An increasingly popular approach is to learn a distribution of nodes that would produce such a roadmap. State-of-the-art is to train a conditional variational auto-encoder (CVAE) on the prior dataset with the shortest paths as target input. While this is quite effective on many problems, we show it can fail in the face of complex obstacle configurations or mismatch between training and testing.
We present an algorithm LEGO that addresses these issues by training the CVAE with target samples that satisfy two important criteria. Firstly, these samples belong only to bottleneck regions along near-optimal paths that are otherwise difficult-to-sample with a uniform sampler. Secondly, these samples are spread out across diverse regions to maximize the likelihood of a feasible path existing. We formally define these properties and prove performance guarantees for LEGO. We extensively evaluate LEGO on a range of planning problems, including robot arm planning, and report significant gains over heuristics as well as learned baselines.
I Introduction
We examine the problem of leveraging prior experience in sampling-based motion planning. In this framework, the continuous configuration space of a robot is sampled to construct a graph or roadmap [1, 2] where vertices represent robot configurations and edges represent potential movements of the robot. A shortest path algorithm [3] is then run to compute a path between any two vertices on the roadmap. The main challenge is to place a small set of samples in key locations such that the algorithm can find a high quality path with small computational effort as shown in Fig. 1(b).
Typically, low dispersion samplers such as Halton sequences [4] are quite effective in uniformly covering the space and thus bounding the solution quality [5] (Fig. 1(a)). However, as they decrease dispersion uniformly in C-space, a narrow passage with clearance $\delta$ in a $d$-dimensional space requires $O((1/\delta)^d)$ samples to find a path. This motivates the need for biased sampling to selectively densify in regions where there might be a narrow passage [6, 7, 8, 9, 10]. These techniques are applicable across a wide range of domains and perform quite well in practice.
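To make the low-dispersion sampler concrete, the following is a minimal sketch of a Halton sequence generator built from the radical-inverse construction. The function names and the fixed list of prime bases are illustrative choices, not taken from the paper or its code release.

```python
def radical_inverse(index: int, base: int) -> float:
    """Mirror the base-`base` digits of `index` around the radix point."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


# One prime base per dimension; extend this list for more than 9 dimensions.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23]


def halton(num_samples: int, dim: int) -> list:
    """Return the first `num_samples` Halton points in the unit cube [0, 1)^dim."""
    if dim > len(PRIMES):
        raise ValueError("add more prime bases to PRIMES")
    return [
        [radical_inverse(i, PRIMES[d]) for d in range(dim)]
        for i in range(1, num_samples + 1)
    ]


# Four points in 2D; each coordinate uses its own prime base (2 and 3).
points = halton(4, 2)
```

Because the points fill the cube evenly regardless of where the obstacles are, the number of points needed before one lands inside a thin passage grows quickly with dimension, which is the cost that biased and learned samplers try to avoid.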
However, not all narrow passages are relevant to a given query. Biased sampling techniques, which do not have access to the likelihood of the optimal path passing through a region, can still drop samples in more regions than necessary. Interestingly, the different environments that a robot operates in share a lot of structural similarity. We can use information extracted from planning on one such environment to decide how to sample on another; we can learn sampling distributions using tools such as a conditional variational auto-encoder (CVAE). Ichter et al. [11] propose a useful approximation to train a learner to sample along the predicted shortest path: given a training dataset of worlds, compute shortest paths, and train a model to independently predict nodes belonging to the path. After all, the best a generative model can do is to sample only along the true shortest path. However, this puts all of the burden on the learner. Any amount of prediction error, due to approximation or train-test mismatch, results in failure to find a feasible path.
We argue that a sampler, instead of trying to predict the shortest path, needs to only identify key regions to focus sampling at, and let the search algorithm determine the shortest path. Essentially, we ask the following question:
How can we share the responsibility of finding the shortest path between the sampler and search?
Our key insight is for the sampler to predict not the shortest path, but samples that possess two characteristics: (a) samples in bottleneck regions; these are regions containing near-optimal paths, but are difficult for a uniform sampler to reach, and (b) samples that exhibit diversity; train-test mismatch is common and to be robust to it we need to sample nodes belonging to a diverse set of alternate paths. The search algorithm can then operate on a sparse graph containing useful but diverse samples to compute the shortest path.
We present an algorithmic framework, Leveraging Experience with Graph Oracles (LEGO) summarized in Fig. 2, for training a CVAE on a prior database of worlds to learn a generative model that can be used for roadmap construction. During training (Fig. 2a), LEGO processes a uniform dense graph to identify a sparse subset of vertices. These vertices are a diverse set of bottleneck nodes through which a near-optimal path must pass. These are then fed into a CVAE [12] to learn a generative model. At test time (Fig. 2b), the model is sampled to get a set of vertices which is additionally composed with a sparse uniform graph to get a final roadmap. This roadmap is then used by the search algorithm to find the shortest path.
We make the following contributions:
1. A framework for training a CVAE to predict a roadmap with different target inputs. We identify two main shortcomings of the state-of-the-art [11] that uses the shortest path as the target input: failures in approximation, and failures due to train-test mismatch (Section IV).
2. LEGO, an algorithm that tackles both of these issues. It first generates multiple diverse shortest paths, and then extracts bottleneck nodes along such paths to use as the target input for the CVAE (Section VI).
3. We show that LEGO outperforms several learning and heuristic sampling baselines on a set of problems ranging from 2 to 9 dimensions. In particular, we show that it is robust to changes in training and test distribution (Section VII).
II Related Work
The seminal work of Hsu et al. [13] provides a crisp analysis of the shortcomings of uniform sampling techniques in the presence of artifacts such as narrow passages. This has led to a plethora of non-uniform sampling approaches that densify selectively [6, 7, 8, 9, 10].
Adaptive sampling in the context of roadmaps aims to exploit structure of the environment to place samples in promising areas. A number of works exploited structure of the workspace to achieve this. While some of them attempt to sample between regions of collision to identify narrow passages [14, 6, 15, 16, 17, 18], others sample near or on the obstacles [19, 20]. There are approaches that divide the configuration space into regions and either select different region-specific planning strategies [21] or use entropy of samples in a particular region to refine sampling [22]. Other methods try to model the free space to speed up planning [23, 24, 25]. While these techniques are quite successful in a large set of problems, they can place samples in regions where an optimal path is unlikely to traverse.
A different class of solutions looks at adapting sampling distributions online during the planning cycle. This requires a trade-off between exploration of the configuration space and exploitation of the current best solution. Preliminary approaches define a utility function to do so [26, 27] or use online learning [10]; however, these are not amenable to using priors. Diankov and Kuffner [28] employ statistical techniques to sample around a search tree. Zucker et al. [29] and Kuo et al. [30] formalize sampling as a model-free reinforcement learning problem and learn a parametric distribution. Since these are non-i.i.d. learning problems, they require interactive learning and do not enjoy the strong guarantees of supervised learning.
There has been a lot of recent effort on finding low dimensional structure in planning [31]. In particular, generative modeling tools like variational autoencoders [32] have been used to great success [33, 34, 35, 36, 37]. We base our work on Ichter et al. [11] where a CVAE is trained to learn the shortest path distribution.
III Problem Formulation
Given a database of prior worlds, the overall goal is to learn a policy that predicts a roadmap which in turn is used by a search algorithm to efficiently compute a high quality feasible path. Let $\mathcal{X}$ denote a $d$-dimensional configuration space. Let $\mathcal{X}_{\mathrm{obs}}$ be the portion in collision and $\mathcal{X}_{\mathrm{free}} = \mathcal{X} \setminus \mathcal{X}_{\mathrm{obs}}$ denote the free space. Let a path $\xi : [0, 1] \rightarrow \mathcal{X}$ be a continuous mapping from index to configurations. A path $\xi$ is said to be collision-free if $\xi(\tau) \in \mathcal{X}_{\mathrm{free}}$ for all $\tau \in [0, 1]$. Let $c(\xi)$ be a cost functional that maps $\xi$ to a bounded non-negative cost in $[0, c_{\max}]$. Moreover, we set $c(\emptyset) = \infty$. We define a motion planning problem $\Lambda = \{x_s, x_g, \mathcal{X}_{\mathrm{free}}\}$ as a tuple of start configuration $x_s$, goal configuration $x_g$ and free space $\mathcal{X}_{\mathrm{free}}$. Given a problem, a path $\xi$ is said to be feasible if it is collision-free, $\xi(0) = x_s$ and $\xi(1) = x_g$. Let $\Xi_{\mathrm{feas}}$ denote the set of all feasible paths. We wish to solve the optimal motion planning problem by finding a feasible path $\xi^*$ that minimizes the cost functional $c(\cdot)$, i.e. $c(\xi^*) = \inf_{\xi \in \Xi_{\mathrm{feas}}} c(\xi)$.
We now embed the problem on a graph $G = (V, E)$ such that each vertex $v \in V$ is an element of $\mathcal{X}$. The graph follows a connectivity rule expressed as an indicator function $I(u, v)$ to denote if two configurations should have an edge (note that this does not involve collision checking; we consider undirected graphs for simplicity, but the formulation easily extends to directed graphs). The weight of an edge $(u, v)$ is the cost of traversing the edge. We reuse $\xi$ to denote a path on the graph.
Let $|G|$ denote the cardinality of the graph, i.e. the size of $V$ (alternatively, we can also use the size of $E$). We introduce a graph operation with the notation $G \overset{+}{\leftarrow} V'$ to compactly denote insertion of a new set of vertices $V'$, i.e. $V \leftarrow V \cup V'$, and edges, $E \leftarrow E \cup \{(u, v) : u \in V', v \in V, I(u, v) = 1\}$.
A graph search algorithm Alg is given a graph $G$ and a planning problem $\Lambda$. First, it adds the start-goal pair to the graph, i.e. $G \overset{+}{\leftarrow} \{x_s, x_g\}$. It then collision checks edges against $\mathcal{X}_{\mathrm{free}}$ till it finds and returns the shortest feasible path $\xi = \mathrm{Alg}(G, \Lambda)$. The cost of such a path can hence be found by evaluating $c(\mathrm{Alg}(G, \Lambda))$. If Alg is unable to find any feasible path, it returns $\emptyset$ which corresponds to $c(\emptyset) = \infty$.
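The search procedure described above can be sketched as follows. This is one possible lazy strategy (collision-check only the edges on the current candidate shortest path, remove any that fail, and re-plan); the paper does not spell out which strategy Alg uses, so treat the structure, the adjacency-dict roadmap format, and the `edge_is_free` callback as assumptions made for illustration.

```python
import heapq
import math


def shortest_path(adj, start, goal):
    """Dijkstra on {u: {v: weight}}; returns (cost, path) or (inf, None)."""
    dist, parent, done = {start: 0.0}, {}, set()
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == goal:
            path = [goal]
            while path[-1] != start:
                path.append(parent[path[-1]])
            return d, path[::-1]
        for v, w in adj.get(u, {}).items():
            if d + w < dist.get(v, math.inf):
                dist[v], parent[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return math.inf, None


def lazy_search(adj, start, goal, edge_is_free):
    """Return the shortest collision-free path, checking edges only when needed."""
    adj = {u: dict(nbrs) for u, nbrs in adj.items()}  # do not mutate the caller's roadmap
    known_free = set()
    while True:
        cost, path = shortest_path(adj, start, goal)
        if path is None:
            return math.inf, None  # no feasible path: infinite cost
        blocked_edge_found = False
        for u, v in zip(path, path[1:]):
            key = frozenset((u, v))
            if key in known_free:
                continue
            if edge_is_free(u, v):
                known_free.add(key)
            else:
                # Drop the colliding edge in both directions and re-plan.
                adj[u].pop(v, None)
                adj.get(v, {}).pop(u, None)
                blocked_edge_found = True
                break
        if not blocked_edge_found:
            return cost, path


# Toy roadmap: the short route s-a-g is blocked, so the search falls back to s-b-g.
roadmap = {
    "s": {"a": 1.0, "b": 2.0},
    "a": {"s": 1.0, "g": 1.0},
    "b": {"s": 2.0, "g": 2.0},
    "g": {"a": 1.0, "b": 2.0},
}
blocked = {frozenset(("a", "g"))}
cost, path = lazy_search(roadmap, "s", "g", lambda u, v: frozenset((u, v)) not in blocked)
```

In this toy example only the edges on the two candidate paths are ever checked, which is why a sparse roadmap with well-placed vertices keeps the search cheap.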
Definition 1 (Dense Graph).
We assume we have a dense graph $G_{\mathrm{dense}}$ that is sufficiently large to connect the space, i.e. for any plausible planning problem, it contains a sufficiently low-cost feasible path.
Henceforth, we care about competing with $G_{\mathrm{dense}}$. We reiterate that searching this graph, $\mathrm{Alg}(G_{\mathrm{dense}}, \Lambda)$, is too computationally expensive to perform online.
We wish to learn a mapping from features extracted from the problem to a sparse subgraph of $G_{\mathrm{dense}}$. Let $y$ be a feature representation of the planning problem $\Lambda$. Let $\pi$ be a subgraph predictor oracle that maps the feature vector to a subgraph $\pi(y) \subset G_{\mathrm{dense}}$. We wish to solve the following optimization problem:
Problem 1 (Optimal Subgraph Prediction).
Given a joint distribution $P(\Lambda, y)$ of problems and features, and a dense graph $G_{\mathrm{dense}}$, compute a subgraph predictor oracle $\pi$ that minimizes the ratio of the costs of the shortest feasible paths in the subgraph and the dense graph:
$$\pi^* = \arg\min_{\pi} \; \mathbb{E}_{(\Lambda, y) \sim P(\Lambda, y)} \left[ \frac{c(\mathrm{Alg}(\pi(y), \Lambda))}{c(\mathrm{Alg}(G_{\mathrm{dense}}, \Lambda))} \right] \tag{1}$$
IV Framework for Predicting Roadmaps
We now present a framework for training graph predicting oracles as illustrated in Fig. 2(a). This is a generalization of the approach presented in [11]. The framework applies three main approximations. First, instead of predicting a subgraph $G \subset G_{\mathrm{dense}}$, we learn a mapping that directly predicts states $x \in \mathcal{X}$ in continuous space (for cases where a subgraph is preferred, e.g. $\mathcal{X}$ lies on a constraint manifold, one can design a projection operator). Secondly, instead of solving a structured prediction problem, we learn an i.i.d. sampler that will be invoked repeatedly to get a set of vertices. These vertices are then connected according to an underlying connection rule, such as $k$-NN, to create a graph. Thirdly, we compose the sampled graph with a constant sparse graph $G_{\mathrm{sparse}}$. This ensures that the final predicted graph has some minimal coverage (since $G_{\mathrm{dense}}$ is a Halton graph, we use the first elements of its Halton sequence).
The core component of the framework is a Conditional Variational Auto-encoder (CVAE) [38] which is used for approximating the desired sample distribution. CVAE is an extension of a traditional variational auto-encoder [12] which is a directed graphical model with low-dimensional Gaussian latent variables. CVAE is a conditional graphical model which makes it relevant for our application where conditioning variables are features of the planning problem. We provide a high level description for brevity, and refer the reader to [32] for a comprehensive tutorial.
Here $x$ is the output random variable, $z$ is the latent random variable and $y$ is the conditioning variable. We wish to learn two deterministic mappings: an encoder and a decoder. An encoder maps $(x, y)$ to a mean and variance of a Gaussian $q_\phi(z \mid x, y)$ in latent space, such that it is “close” to an isotropic Gaussian $\mathcal{N}(0, I)$. The decoder maps this Gaussian and $y$ to a distribution $p_\theta(x \mid z, y)$ in the output space. This is achieved by maximizing the following objective:
$$-D_{\mathrm{KL}}\big(q_\phi(z \mid x, y) \,\|\, \mathcal{N}(0, I)\big) + \mathbb{E}_{z \sim q_\phi(z \mid x, y)}\big[\log p_\theta(x \mid z, y)\big] \tag{2}$$
Note that the encoder is needed only for training. At test time, only the decoder is used to map samples from an isotropic Gaussian in the latent space to samples in the output space.
We train the CVAE by passing in a dataset $\mathcal{D} = \{(X_i, y_i)\}_{i=1}^{n}$. Here $y_i$ is the feature vector (conditioning variable) extracted from the planning problem $\Lambda_i$, and $X_i$ is the desired set of nodes extracted from the dense graph $G_{\mathrm{dense}}$ that we want our learner to predict. Hence we train the model by maximizing the following objective.
[TABLE]
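As an illustration of how a CVAE objective of this kind is evaluated for a single training pair, the sketch below computes a Monte Carlo estimate of a reconstruction term minus the closed-form KL divergence to a standard normal. It assumes a diagonal Gaussian encoder output (`mu`, `log_var`) and a unit-variance Gaussian decoder, so the log-likelihood reduces to a negative squared error up to a constant; these modelling choices and all names are assumptions, not the paper's implementation.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def cvae_objective(x, mu, log_var, decode, rng, num_latent_samples=8):
    """Monte Carlo estimate of the objective for one (x, y) pair.

    `mu` and `log_var` are the encoder outputs for (x, y); `decode(z)` is the
    decoder with the conditioning variable y already bound (hypothetical API).
    """
    reconstruction = 0.0
    for _ in range(num_latent_samples):
        # Reparameterised sample z = mu + sigma * eps with eps ~ N(0, I).
        z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]
        x_hat = decode(z)
        # log N(x; x_hat, I) up to an additive constant.
        reconstruction += -0.5 * sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return reconstruction / num_latent_samples - kl_to_standard_normal(mu, log_var)


rng = random.Random(0)
x = [0.2, 0.7]
# A decoder that reproduces x exactly with an encoder matching the prior scores 0.
perfect = cvae_objective(x, [0.0], [0.0], lambda z: x, rng)
# A decoder that ignores x, with an encoder shifted away from the prior, scores lower.
shifted = cvae_objective(x, [1.0], [0.0], lambda z: [0.0, 0.0], rng)
```

Training would sum this quantity over every node in every training set and ascend its gradient with respect to the encoder and decoder parameters, which requires an automatic differentiation framework rather than the plain Python used here.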
IV-A General Train and Test Procedure
To summarize, the overall training framework is as follows:
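1. Load a database of planning problems $\{\Lambda_i\}$ and corresponding feature vectors $\{y_i\}$.
2. For each $\Lambda_i$, extract relevant nodes $X_i$ from the dense graph $G_{\mathrm{dense}}$ by invoking the node extraction function.
3. Feed the dataset $\{(X_i, y_i)\}$ as input to the CVAE.
4. Train the CVAE and return the learned decoder $p_\theta(x \mid z, y)$.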
At test time, given a planning problem , the graph predicting oracle performs the following set of steps:
1. Extract the feature vector $y$ from the planning problem $\Lambda$.
2. Sample nodes using the decoder $p_\theta(x \mid z, y)$.
3. Connect the nodes to create a graph. Compose the sampled graph with a constant sparse graph $G_{\mathrm{sparse}}$.
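The test-time steps above can be sketched as below. The decoder here is a hand-written stand-in (`toy_decoder`) rather than a trained network, the connection rule is a plain k-nearest-neighbour rule, and composition is done by pooling the sparse and sampled vertices before connecting them; all three are assumptions made to keep the example self-contained.

```python
import math
import random


def knn_graph(vertices, k):
    """Undirected k-nearest-neighbour graph as {index: {neighbour index: distance}}."""
    adj = {i: {} for i in range(len(vertices))}
    for i, p in enumerate(vertices):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(vertices) if j != i)
        for d, j in nearest[:k]:
            adj[i][j] = d
            adj[j][i] = d  # keep the graph undirected
    return adj


def predict_roadmap(decoder, features, sparse_vertices, num_samples, k, rng, latent_dim=2):
    """Sample vertices from the decoder and compose them with the sparse vertices."""
    learned = []
    for _ in range(num_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]  # z ~ N(0, I)
        learned.append(tuple(decoder(z, features)))
    vertices = [tuple(v) for v in sparse_vertices] + learned
    return vertices, knn_graph(vertices, k)


def toy_decoder(z, features):
    """Hypothetical stand-in for a trained decoder: scatter around the start-goal midpoint."""
    (sx, sy), (gx, gy) = features
    return (0.5 * (sx + gx) + 0.05 * z[0], 0.5 * (sy + gy) + 0.05 * z[1])


sparse = [(0.1, 0.1), (0.9, 0.1), (0.1, 0.9), (0.9, 0.9)]
features = ((0.0, 0.0), (1.0, 1.0))  # start and goal only; the paper also encodes occupancy
vertices, adj = predict_roadmap(toy_decoder, features, sparse, 6, 3, random.Random(0))
```

The resulting roadmap would then be handed to the search algorithm together with the start and goal of the planning problem.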
The focus of this work is on examining variants of the node extraction function. While the parameters of the CVAE are certainly relevant (discussed in Appendix A), in this paper we ask the question:
What is a good input to provide to the CVAE?
To that end, we explore the following schemes:
1. ShortestPath: Extract nodes belonging to the shortest path. This is the baseline approach (Section V).
2. BottleneckNode: Extract nodes that correspond to bottlenecks along the shortest path (Section VI-A).
3. DiversePathSet: Extract nodes belonging to multiple diverse shortest paths (Section VI-B).
4. LEGO: Extract nodes that correspond to bottlenecks along multiple diverse shortest paths. This is our proposed approach (Section VI-C).
V The ShortestPath (Ichter et al. [11]) procedure
We examine the scheme applied in [11] of using nodes belonging to the shortest path on the dense graph as input for training the CVAE. The rationale for this scheme is that the distribution of states belonging to the shortest path might lie on a manifold that can be captured by the latent space of the CVAE. This hypothesis is validated across many high-dimensional planning domains.
We argue that the presented results should not be entirely surprising. The intrinsic difficulty of a planning problem stems from having to search in multiple potential homotopy classes to find a feasible high quality solution. This often manifests in problems involving mazes, bugtraps or narrow passages where the search has to explore and backtrack frequently. Simply increasing the dimension of the problem does not necessarily render it difficult. On the contrary, since the volume of free space increases substantially, there is often an abundance of feasible paths. The challenge, of course, is to find a manifold on which such paths lie with high probability. This is where we found the CVAE to be critical - it learns to interpolate between the start and goal along a low dimensional manifold.
However, we are interested in more difficult problems where such interpolations would break down. Based on extensive evaluations of this ShortestPath scheme, we were able to identify two concrete vulnerabilities:
1. Failure to route through gaps: Fig. 3(b) shows the output of the CVAE when there is a gap through which the search has to route to get to the goal. The model gets stuck in a poor local minimum between linearly interpolating start-goal and routing through the gap since the network is not expressive enough to map the feature vector to such a path. This is tantamount to burdening the sampler to solve the planning problem.
2. Presence of unexpected obstacles in test data: Fig. 3(c) shows the output of the CVAE when there are small, unexpected obstacles in the test data which were not present in the training data. The learned distribution samples over this obstacle as it only predicts what it thinks is the shortest path. Even if we were to have such examples in the training data, unless the feature extractor detects such obstacles, the problem remains.
VI Approach
In this section, we present LEGO (Leveraging Experience with Graph Oracles), an algorithm to train a CVAE to predict sparse yet high quality roadmaps. We do so by tackling head-on the challenges identified in Section IV. Firstly, we recognize that the learner does not have to directly predict the shortest path. Instead, we train it to predict only bottleneck nodes that can assist the underlying search in finding a near-optimal solution. Secondly, the roadmap must be robust to prediction errors of the learner. We safeguard against this by training the learner to predict a diverse set of paths with the hope that at least one of them is feasible.
VI-A Bottleneck Nodes
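We begin by noting that $G_{\mathrm{sparse}}$ has a uniform coverage over the entire configuration space. Hence, the learner only has to contribute a critical set of nodes that allow $G_{\mathrm{sparse}}$ to represent paths that are near-optimal with respect to the optimal path in $G_{\mathrm{dense}}$. We call these bottleneck nodes as they correspond to regions that are difficult for a uniform sampler to cover. We define them as:
Definition 2 (Bottleneck Nodes).
Given a dense graph $G_{\mathrm{dense}} = (V_{\mathrm{dense}}, E_{\mathrm{dense}})$, find the smallest set of nodes $V_{\mathrm{BN}}$ which in conjunction with a sparse subgraph $G_{\mathrm{sparse}}$ contains a near-optimal path, i.e.
$$V_{\mathrm{BN}} = \arg\min_{V \subset V_{\mathrm{dense}}} |V| \quad \text{s.t.} \quad c(\mathrm{Alg}(G_{\mathrm{sparse}} \oplus V, \Lambda)) \le (1 + \epsilon)\, c(\mathrm{Alg}(G_{\mathrm{dense}}, \Lambda)) \tag{4}$$
where $c(\cdot)$ is the path cost, $\Lambda$ the planning problem, and $\epsilon \ge 0$ the near-optimality tolerance. Here $\oplus$ represents a merge operation, i.e. the vertices of $G_{\mathrm{sparse}} \oplus V$ are those of $G_{\mathrm{sparse}}$ together with $V$, and its edges are those that the connectivity rule induces on this vertex set.
The optimization (4) is combinatorially hard. We present an approximate solution in Algorithm 1. We use the optimal path $\xi^*$ on the dense graph and create an inflated graph by composing $G_{\mathrm{sparse}} \oplus \xi^*$ and inflating weights of newly added edges by a factor $\alpha$. The idea is to disincentivize the search from using any of the newly added edges. This inflation factor is increased till a near-optimal path can no longer be found. At this point, the additional vertices that the shortest path on this inflated graph passes through are essential to achieve near-optimality. This is formalized by the following guarantee: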
Proposition 1 (Bounded bottleneck edge weights).
Let be the chosen bottleneck edges, be the optimal bottleneck edges and be the optimal path on .
[TABLE]
Proof.
(Sketch) Let be the optimal bottleneck nodes and be the optimal bottleneck edges. Let be the path returned by . From Definition 2, the following holds:
[TABLE]
Since is the shortest path on the inflated graph, we have:
[TABLE]
Putting the two inequalities together we have:
[TABLE]
∎
Fig. 4 illustrates the samples generated by (a) ShortestPath and (b) LEGO trained with samples from BottleneckNode; and the successful routing through narrow passages using samples from LEGO.
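The inflation idea behind bottleneck extraction can be sketched as follows. This is one reading of the procedure, not the paper's Algorithm 1: it raises the inflation factor in fixed steps, and returns the newly added vertices used at the largest factor for which the shortest path on the inflated graph is still near-optimal when measured with the true (uninflated) weights. The step schedule, the `new_edges` input format, and the stopping rule are assumptions.

```python
import heapq
import math


def dijkstra(adj, start, goal):
    """Shortest path on {u: {v: weight}}; returns a vertex list or None."""
    dist, parent, done = {start: 0.0}, {}, set()
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == goal:
            path = [goal]
            while path[-1] != start:
                path.append(parent[path[-1]])
            return path[::-1]
        for v, w in adj.get(u, {}).items():
            if d + w < dist.get(v, math.inf):
                dist[v], parent[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return None


def bottleneck_nodes(sparse_adj, new_edges, start, goal, optimal_cost,
                     epsilon=0.1, step=0.5, max_inflation=50.0):
    """`new_edges` maps (u, v) to the true weight of each edge that appears only
    after the dense-graph optimal path is composed with the sparse graph."""

    def inflated_graph(alpha):
        adj = {u: dict(nbrs) for u, nbrs in sparse_adj.items()}
        for (u, v), w in new_edges.items():
            adj.setdefault(u, {})[v] = alpha * w
            adj.setdefault(v, {})[u] = alpha * w
        return adj

    def true_cost(path):
        total = 0.0
        for u, v in zip(path, path[1:]):
            w = new_edges.get((u, v), new_edges.get((v, u)))
            total += sparse_adj[u][v] if w is None else w
        return total

    selected, alpha = set(), 1.0
    while alpha <= max_inflation:
        path = dijkstra(inflated_graph(alpha), start, goal)
        if path is None or true_cost(path) > (1.0 + epsilon) * optimal_cost:
            break  # further inflation pushes the search onto a path that is too costly
        selected = {v for v in path if v not in sparse_adj}
        alpha += step
    return selected


sparse = {
    "s": {"u": 5.0},
    "u": {"s": 5.0, "g": 5.0},
    "m": {"g": 1.1},
    "g": {"u": 5.0, "m": 1.1},
}
# Dense-graph optimal path s -> n1 -> n2 -> g (cost 3.0); n1 also connects to sparse vertex m.
new_edges = {("s", "n1"): 1.0, ("n1", "n2"): 1.0, ("n2", "g"): 1.0, ("n1", "m"): 1.0}
result = bottleneck_nodes(sparse, new_edges, "s", "g", optimal_cost=3.0)
```

In this toy graph the vertex `n2` is dropped because the sparse vertex `m` already offers a near-optimal continuation, while `n1` is kept because without it the only route is the long detour through `u`.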
VI-B Diverse PathSet
In this training scheme, we try to ensure the roadmap is robust to errors introduced by the learner. One antidote to this process is diversity of samples. Specifically, we want the roadmap to have enough diversity such that if the predicted shortest path is in fact infeasible, there are low cost alternates.
We set this up as a two player game between a planner and an adversary. The role of the adversary is to invalidate as many shortest paths on the dense graph as possible with a fixed budget of edges that it is allowed to invalidate. The role of the planner is to find the shortest feasible path on the invalidated graph and add this to the set of diverse paths. The node extraction function then returns the nodes belonging to this set. We formalize this as:
Definition 3 (Diverse PathSet).
We begin with the dense graph. At each round of the game, the adversary chooses a set of edges to invalidate:
[TABLE]
and the graph is updated by removing the invalidated edges. The planner chooses the shortest path on the updated graph, which is added to the set of diverse paths.
The optimization problem (6) is similar to a set cover problem (NP-Hard [39]) where the goal is to select edges to cover as many paths as possible. If we knew the exact set of paths to cover, it is well known that a greedy algorithm will choose a near-optimal set of edges [39]. We have the inverse problem - we do not know how many consecutive shortest paths can be covered with a budget of edges.
Algorithm 2 describes the procedure. We greedily choose a set of edges that invalidates as many consecutive shortest paths as possible until we exhaust our budget. We then apply greedy set cover. If it leads to a better solution, we continue repeating the process. At termination, we ensure:
Proposition 2 (Near-optimal Invalidated EdgeSet).
Let $\Xi$ be the contiguous set of shortest paths invalidated by Algorithm 2 using a budget of $B$ edges. Let $k^*$ be the size of the optimal set of edges that could have invalidated $\Xi$.
[TABLE]
Proof.
(Sketch) We briefly explain the equivalence to a set cover problem. Each path in the invalidated set $\Xi$ corresponds to an element that has to be covered. Each edge corresponds to a set of paths in $\Xi$, where each path in the set contains the edge. Invalidating the edge invalidates all paths in the set.
Algorithm 2 invokes a greedy set cover algorithm which at every iteration chooses the edge which covers the largest number of uncovered paths. Let $k$ be the number of edges selected by the greedy algorithm, and $k^*$ be the optimal. From [39], we have the following near-optimality guarantee:
$$k \le (1 + \ln |\Xi|)\, k^*$$
If $k < B$, where $B$ is the edge budget, i.e. we have budget remaining, we continue adding edges that can only invalidate more paths. This continues till the budget is exhausted.
∎
Fig. 4 illustrates the samples generated by (a) ShortestPath and (b) LEGO trained with samples from DiversePathSet; and the robustness to unexpected obstacles exhibited by LEGO.
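The greedy set cover step used by the adversary can be sketched in isolation. The example below takes a fixed list of paths and repeatedly invalidates the edge shared by the most not-yet-invalidated paths; it does not reproduce the full game of Algorithm 2 (budget handling and re-planning are omitted), and the function names and tie-breaking rule are illustrative assumptions.

```python
def path_edges(path):
    """Undirected edges traversed by a path given as a vertex list."""
    return {frozenset(edge) for edge in zip(path, path[1:])}


def greedy_edge_cover(paths):
    """Greedily pick edges until every path contains at least one picked edge."""
    uncovered = [edges for edges in map(path_edges, paths) if edges]
    chosen = []
    while uncovered:
        counts = {}
        for edges in uncovered:
            for edge in edges:
                counts[edge] = counts.get(edge, 0) + 1
        # Most-shared edge first; ties are broken deterministically by vertex names.
        best = max(counts, key=lambda edge: (counts[edge], sorted(edge)))
        chosen.append(best)
        uncovered = [edges for edges in uncovered if best not in edges]
    return chosen


# Two of the three paths share the edge s-a, so one invalidation removes both.
paths = [["s", "a", "g"], ["s", "a", "b", "g"], ["s", "c", "g"]]
invalidated = greedy_edge_cover(paths)
```

Here two edges suffice to invalidate all three paths, which forces the planner onto a route that shares no invalidated edge with them and so adds a genuinely different path to the diverse set.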
VI-C Combining Diversity with Bottleneck Extraction
We present LEGO in Algorithm 3, which combines the characteristics of BottleneckNode and DiversePathSet to extract a set of diverse bottleneck nodes. We first find a set of diverse paths on the dense graph. We then iterate over each path, and adversarially invalidate edges of the sparse graph to ensure it does not contain a feasible shorter path. The bottleneck nodes for this path are extracted and added to the set of nodes to be returned.
VII Experimental Results
In this section we evaluate the performance of LEGO on various problem domains and compare it against other samplers. We consider samplers that do not assume offline computation or learning such as Medial-Axis PRM (MAPRM) [14, 6], Randomized Bridge Sampler (RBB) [15], Workspace Importance Sampler (WIS) [16], a Gaussian sampler, Gaussian [20], and a uniform Halton sequence sampler, Halton [4]. Additionally, we also compare our framework against the state-of-the-art learned sampler ShortestPath [35] upon which our work is based.
Evaluation Procedure
For a given sampler and a planning problem, we invoke the sampler to generate a fixed number of samples. We then evaluate the performance of the samplers on three metrics: a) sampling time b) success rate in solving shortest path problem and c) the quality of the solution obtained, on the graph constructed with the generated samples.
Problem Domains
To evaluate the samplers, we consider a spectrum of problem domains. The 2D problems have random rectilinear walls with random narrow passages (Fig. 6(a)). These passages can be small, medium or large in width. The n-link arms are a set of line segments fixed to a base moving in a uniform obstacle field (Fig. 6(b)). The n-link snakes are arms with a free base moving through random rectilinear walls with passages (Fig. 6(c)). Finally, the manipulator problem has a 7DoF robot arm [40] manipulating a stick in an environment with varying clutter (Fig. 6(d)). Two variants are considered: constrained (7D), when the stick is welded to the hand, and unconstrained (8D), when the stick can slide along the hand.
Experiment Details
For the learned samplers ShortestPath and LEGO, we use separate training worlds and test worlds. The dense graph is an $r$-disc Halton graph [5], with vertex counts that vary from the 2D to the 8D problems. The CVAE was implemented in TensorFlow [41] with 2 dense layers of 512 units each. Input to the CVAE is a vector encoding source and target locations and an occupancy grid. Training time over 4000 examples ranged from 20 minutes for the 2D problems to 60 minutes for the 8D problems. At test time, samplers are timed out after a fixed time limit. The code is open sourced (https://github.com/personalrobotics/lego) with more details in [42].
VII-A Performance Analysis
Sampling time
Table I reports the average time each sampler takes to generate its samples across the test instances. ShortestPath and LEGO are the fastest. MAPRM and RBB both rely on heavy computation with multiple collision checking steps. WIS, by tetrahedralizing the workspace and identifying narrow passages, is relatively faster, but still slower than the learners. Unfortunately, some of the baselines time out on the manipulator planning problem due to the expense of collision checking.
Success Rate
Table II reports the success rates (with confidence intervals) on the test instances for a fixed number of sampled vertices. Success rate is the fraction of problems for which the search found a feasible solution. LEGO has the highest success rate. The baselines are competitive in 2D, but suffer on higher dimensional problems.
Normalized Path Cost
This is the ratio of cost of the computed solution w.r.t. the cost of the solution on the dense graph. Fig. 6 shows the normalized cost for Halton, ShortestPath and LEGO- these were the only baselines that consistently had bounded confidence intervals (i.e. when success rate is ). ShortestPath has the lowest cost, however LEGO is within bound of the optimal.
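As a small worked example of the two solution-quality metrics, the sketch below computes the success rate and the normalized path costs from per-problem path costs, using an infinite cost to mark a failed search. The numbers are made up for illustration and are not results from the paper; restricting the normalized cost to solved problems is also an assumption.

```python
import math


def success_rate(costs):
    """Fraction of problems for which the search returned a finite-cost path."""
    return sum(1 for c in costs if math.isfinite(c)) / len(costs)


def normalized_costs(sampler_costs, dense_costs):
    """Per-problem ratio of sampler cost to dense-graph cost, over solved problems only."""
    return [c / d for c, d in zip(sampler_costs, dense_costs) if math.isfinite(c)]


# Illustrative values: four test problems, one of which the sampler fails to solve.
sampler_costs = [10.0, math.inf, 12.0, 11.0]
dense_costs = [10.0, 9.0, 10.0, 10.0]

rate = success_rate(sampler_costs)
ratios = normalized_costs(sampler_costs, dense_costs)
```

A ratio of 1 means the sparse roadmap recovered a path as good as the dense graph's; since the dense graph is the reference, a sampler is judged by how close to 1 it stays while keeping the success rate high.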
VII-B Observations
We report on some key observations from Table II and Fig. 6.
O 1.
LEGO consistently outperforms all baselines.
As shown in Table II, LEGO has the best success rate (for samples) on all datasets. The second row in Fig. 6 shows that LEGO is within bound of the optimal path.
O 2.
LEGO places samples only in regions where the optimal path may pass.
Fig. 5 shows samples generated by various baseline algorithms on a 2D problem. The heuristic baselines use various strategies to identify important regions: MAPRM finds medial axes, RBB finds bridge points, Gaussian samples around obstacles, and WIS divides up space non-uniformly and samples accordingly. However, these methods place samples everywhere irrespective of the query. ShortestPath takes the query into account but fails to find the gaps. LEGO does a combination of both: it finds the right gaps.
O 3.
LEGO has a higher performance gain on harder problems (narrow passages) as it focuses on bottlenecks.
Table II shows how success rates vary in 2D problems with small / medium / large gaps. As the gaps get narrower, LEGO's advantage over the other samplers grows. The BottleneckNode component in LEGO seeks the bottleneck regions (Fig. 4(b)).
For the 8D manipulator planning problems, where the stick is unconstrained, LEGO and ShortestPath are almost identical. We attribute this to such problems being easier, i.e. the shortest path simply slides the stick out of the way and plans to the goal. When the stick is constrained, LEGO does far better. Fig. 6(d) shows that LEGO is able to sample around the table while ShortestPath cannot find this path.
O 4.
LEGO is robust to a certain degree of train-test mismatch as it encourages diversity.
Fig. 7 shows the success rate of learners on a 2D test environment that has been corrupted. Environment 1 is less corrupted than environment 2. Fig. 7(a) shows that on environment 1, LEGO is still the best sampler. ShortestPath (Fig. 7(c)) ignores the corruption in the environment and fails. LEGO (Fig. 7(d)) still finds the correct bottleneck. Fig. 7(b) shows that on environment 2, all learners are worse than Halton. ShortestPath (Fig. 7(e)) densifies around a particular constrained region while LEGO (Fig. 7(e)) still finds a path due to the DiversePathSet component sampling in multiple bottleneck regions.
VIII Discussion
We present a framework for training a generative model to predict roadmaps for sampling-based motion planning. We build upon state-of-the-art methods that train the CVAE using the shortest path as target input. We identify important failure modes such as complex obstacle configurations and train-test mismatch. Our algorithm LEGO directly addresses these issues by training the CVAE using diverse bottleneck nodes as target input. We formally define these terms and provide provable algorithms to extract such nodes. Our results indicate that the predicted roadmaps outperform competitive baselines on a range of problems.
Using priors in planning is a double-edged sword. While one can get astounding speedups by focusing search on a tiny portion of C-space [11], any problem not covered in the dataset can lead to catastrophic failures. This is symptomatic of the fundamental problem of over-fitting in machine learning. While one could ensure the training data covers all possible environments [43], an algorithmic solution is to explore regularization techniques for planning. We argue that DiversePathSet can be viewed as a form of regularization.
We can also include a more informed conditioning vector that captures the state of the search, e.g., the length of the current shortest path. This is similar to Informed RRT* [44]. Finally, we wish to scale to problems with varying workspace where a global planner guides the sampler to focus on relevant parts of the workspace [13, 45].
Appendix A CVAE Framework
We refer the reader to [32] for technical details and a comprehensive tutorial on CVAE. In Section A-A we describe the CVAE architecture implemented to train LEGO and ShortestPath algorithms. In Sections A-B and A-C, we study two parameters that determine the performance of the CVAE generative model.
A-A Architecture
The entire CVAE module (Fig. 8) takes as input the training samples, which in the case of LEGO are the samples in bottleneck regions and along diverse paths. Additionally, the CVAE takes as input a vector of external features, upon which the generative model is conditioned. In the problems we consider, these features include information regarding the environment such as the poses of the obstacles and the start-goal pair. A standard CVAE model consists of an encoder and a decoder, often represented by neural networks trained using the input samples and the external features.
During training, the encoder network takes as input the high-dimensional vector formed by the training sample and the external features, and encodes it into a low-dimensional latent variable vector. The latent variable is then fed into the decoder network along with the vector of external features, and the decoder outputs a sample in the configuration space. The decoder's output is used to minimize an objective function that reduces the divergence between the probability distribution of the training samples and the learned generative model, so that the model can closely reconstruct the set of training samples. During testing, only the decoder network is used to generate the required samples. The decoder takes as input a latent variable sampled from a standard normal distribution as well as the vector of external features to generate useful samples.
In our implementation of the CVAE, both encoder and decoder networks have two fully connected hidden layers with 512 units each. The specifics of the external features used in each of the planning problems considered in Section VII are discussed in Section B-A. The behavior of the generative model, in addition to the features used, also depends on certain parameters. We study the effect of these parameters and their design choices in our implementation in the following subsections.
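The encoder/decoder structure described above can be sketched as follows. This is a minimal NumPy illustration of the forward passes only (no training loop), not the authors' implementation. It assumes ReLU activations, a Gaussian latent parameterized by a mean and log-variance, and random He-style initialization, none of which are stated in the text. The function names (`init_cvae`, `encode`, `decode`, `generate`) are hypothetical.

```python
import numpy as np

HIDDEN = 512  # units per hidden layer, as stated in the text


def init_mlp(rng, sizes):
    """Return a list of (W, b) pairs for a fully connected network."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(params, x):
    """ReLU on hidden layers (an assumption), linear output layer."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    return x @ W + b


def init_cvae(rng, x_dim, y_dim, z_dim):
    # Encoder: (sample, features) -> mean and log-variance of the latent.
    enc = init_mlp(rng, [x_dim + y_dim, HIDDEN, HIDDEN, 2 * z_dim])
    # Decoder: (latent, features) -> configuration-space sample.
    dec = init_mlp(rng, [z_dim + y_dim, HIDDEN, HIDDEN, x_dim])
    return enc, dec


def encode(enc, x, y):
    """Training-time path: map samples and features to latent statistics."""
    out = mlp(enc, np.concatenate([x, y], axis=-1))
    mu, log_var = np.split(out, 2, axis=-1)
    return mu, log_var


def decode(dec, z, y):
    """Map latent variables and features to configuration-space samples."""
    return mlp(dec, np.concatenate([z, y], axis=-1))


def generate(dec, rng, y, n, z_dim):
    """Test-time path: draw z ~ N(0, I) and decode, conditioned on features y."""
    z = rng.standard_normal((n, z_dim))
    y_rep = np.tile(y, (n, 1))
    return decode(dec, z, y_rep)
```

For the 2D point robot, for instance, `x_dim = 2` and `y_dim = 102` would match the feature vector described in Section B-A; only `generate` is needed once the decoder has been trained.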
A-B Dimensionality of the Latent Variable
The latent variable captures the information available to the model through the training examples in a lower dimensional latent space. The dimensionality of the latent variable determines how efficiently the model can capture the sources of variability required to regenerate data similar to the training examples. Theoretically, a model with a larger latent dimension is at least as good as a model with a lower latent dimension. However, in practice, when the latent variable dimension is high, it becomes computationally expensive for methods like stochastic gradient descent to reduce the KL divergence between the true and the approximated distributions over the latent variables conditioned on the training examples. Fig. 9 shows the behaviors exhibited by the trained generative model for different latent variable dimensions. We choose a latent variable dimension of 3 for problems and 5 for and 9D problems.
A-C Regularization Parameter
Although VAEs are generally devoid of regularization parameters, one could introduce a parameter $\lambda$ that modifies the objective function the CVAE aims to minimize when learning the generative model. Writing $x$ for a training sample, $y$ for the vector of external features, and $z$ for the latent variable, the objective function in a CVAE is given by:
$$\mathcal{L} = \underbrace{-\mathbb{E}_{q(z \mid x, y)}\left[\log p(x \mid z, y)\right]}_{\text{reconstruction loss}} + \lambda \, \underbrace{D_{KL}\left(q(z \mid x, y) \,\|\, \mathcal{N}(0, I)\right)}_{\text{regularization}}$$
The reconstruction loss ensures that the training data can be explained with the data generated by the model; minimizing it therefore ensures proper reconstruction of the training examples. The second term captures the divergence between the prior distribution over the latent variable and the posterior given the training examples. Minimizing it ensures that the two distributions are similar. When the value of $\lambda$ is zero, the behavior of the corresponding VAE is similar to a traditional autoencoder in its capability to reconstruct the training examples. When the value of $\lambda$ is equal to 1, the objective function is that of a standard VAE. However, this often leads to over-pruning [46], where many of the dimensions of the latent variable are ignored in an attempt to reduce the KL divergence. By tuning the value of $\lambda$ between 0 and 1, one could weigh the two objectives appropriately to obtain the desired generative model behavior (Fig. 10).
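To make the trade-off between the two terms concrete, the weighted objective can be sketched numerically. This is an illustrative sketch under assumptions not stated in the text: a squared-error reconstruction loss, a diagonal-Gaussian posterior parameterized by `mu` and `log_var`, and a standard normal prior. The name `kl_weight` stands for the regularization parameter discussed above, and `weighted_cvae_loss` is a hypothetical helper.

```python
import numpy as np


def weighted_cvae_loss(x, x_hat, mu, log_var, kl_weight):
    """Reconstruction loss plus kl_weight times the KL term, averaged over the batch.

    Assumes squared-error reconstruction and a diagonal-Gaussian posterior
    N(mu, diag(exp(log_var))) against a standard normal prior.
    """
    recon = np.sum((x - x_hat) ** 2, axis=-1)
    # Closed-form KL(N(mu, sigma^2) || N(0, I)) for diagonal Gaussians.
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)
    return float(np.mean(recon + kl_weight * kl))
```

With `kl_weight = 0` only the reconstruction term remains (the autoencoder-like regime), while `kl_weight = 1` gives the unweighted sum of the two terms.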
Appendix B Experiments
In this section, we discuss the offline computation involved in training the CVAE for different planning environments considered in Section VII.
B-A Training Procedure
2D Point Robot Planning
The training data consisted of 20 randomly generated environments, as shown in Fig. 11, with 20 planning problems (start-goal pairs) in each environment. The environments were randomized in the positions of the vertical and horizontal walls and of the narrow passages through them. The CVAE was conditioned on a vector of 102 features, which included the start-goal pair (4 features) as well as the occupancy grid (100 features). The dataset generation took 4-5 hours while the training time was around 25 minutes. The CVAE was trained using samples from a roadmap with 3000 samples. The CVAE was trained to sample configurations (in 2D) of the point robot.
N-Link Arm Planning
The training procedure for the N-link arm used a roadmap with 6000 samples, which was used to plan for 20 planning problems in each of 20 randomly generated 2D environments. Fig. 12 visualizes some of the environments sampled to train the CVAE. The red and blue positions show the start and goal states respectively. The environments have randomly placed obstacles. The CVAE was conditioned on a vector of features which included the start-goal pair as well as the occupancy grid (100 features). The dataset generation took 6-7 hours while the training time was close to 30 minutes.
Snake Robot Planning
For the 5D problems, the training procedure was similar to that in the 2D problems. The training procedure for the 9D robot used a roadmap with 6000 samples, which was used to plan for 20 planning problems in each of 20 randomly generated 2D environments. Fig. 13 visualizes some of the environments sampled to train the CVAE. The red and blue positions show the start and goal states respectively. The environments varied in whether the wall was horizontal or vertical, in the offset of its position, and in the position of the narrow passage through it. The CVAE was conditioned on a vector of 118 features, which included the start-goal pair (18 features) as well as the occupancy grid (100 features). The dataset generation took 6-7 hours while the training time was close to 30 minutes. The CVAE was trained to sample configurations of the snake robot that included the base location as well as the revolute joint angles between each of the links.
Manipulator Arm Planning
The training data consisted of 20 random environments in which the obstacles were arbitrarily repositioned. In each randomly generated environment, 50 planning problems were used as input to train the CVAE model. Fig. 14 visualizes three such environments, where the positions of the table and of the obstacle on the table are modified along with the start and goal configurations. The CVAE in the constrained problem was conditioned on a vector of 46 features (48 in the unconstrained problem, since the configuration of the robot includes an additional degree of freedom), which included the start and goal configurations (14 features) and the poses of the table and the obstacle represented as homogeneous matrices (32 features). The dataset was generated in 7-8 hours while the training took around an hour. Samples from a roadmap with 30,000 configurations were used to train the CVAE. The CVAE learned to sample robot configurations, which included the joint angles at the seven revolute joints of the arm in the constrained example. The unconstrained 8D example included an additional prismatic joint value denoting where the stick is held in the hand.
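The feature count for the manipulator follows from simple arithmetic: two 7-DOF configurations give 14 features and two flattened 4 x 4 homogeneous matrices give 32, for 46 in total (48 with the extra prismatic joint). A small sketch of this layout is given below. The ordering of the features, the yaw-only rotation in `pose_matrix`, and both function names are assumptions made for illustration.

```python
import numpy as np


def pose_matrix(x, y, z, yaw):
    """4 x 4 homogeneous transform: rotation about the z-axis plus a translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0, x],
                     [s, c, 0.0, y],
                     [0.0, 0.0, 1.0, z],
                     [0.0, 0.0, 0.0, 1.0]])


def manipulator_features(q_start, q_goal, table_pose, obstacle_pose):
    """Start and goal configurations followed by the two flattened poses."""
    return np.concatenate([np.asarray(q_start, dtype=float),
                           np.asarray(q_goal, dtype=float),
                           table_pose.ravel(),
                           obstacle_pose.ravel()])


table = pose_matrix(1.0, 0.0, 0.7, 0.0)
obstacle = pose_matrix(1.0, 0.2, 0.8, np.pi / 2)
constrained = manipulator_features(np.zeros(7), np.ones(7), table, obstacle)
unconstrained = manipulator_features(np.zeros(8), np.ones(8), table, obstacle)
```

The two vectors have lengths 46 and 48 respectively.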
B-B Additional Experiment Results
BottleneckNode and DiversePathSet
In addition to the qualitative observations presented in Section VII (O1 and O2) and Fig. 4, we present here an analysis of the performance of the foundational algorithms of LEGO, namely BottleneckNode and DiversePathSet, compared to ShortestPath. Fig. 15(a) shows that on a 2D world, BottleneckNode has a significantly higher success rate than ShortestPath, almost converging to by samples. Fig. 15(b) shows that in terms of path length, ShortestPath is initially better but both are eventually comparable. This is expected because of the near-optimality objective of BottleneckNode (4). Fig. 15(c) shows that DiversePathSet has a better success rate. Fig. 15(d) shows that while both algorithms are comparable in terms of path length, DiversePathSet has a smaller variance.
B-C Roadmap Construction
To evaluate the performance of LEGO, we construct sparse roadmaps. The sparse graph consisted of 200 samples in problems and 300 samples in the case of and 9D problems. Note, however, that this sparse roadmap contains both the learned samples and samples generated from a Halton sequence. While the learned samples are concentrated near the bottleneck regions and along diverse paths, the Halton samples ensure coverage over the free regions of the configuration space as well. We analyze different proportions of Halton samples and learned samples. Fig. 16 shows the performance characteristics of LEGO on roadmaps constructed with different proportions of Halton and learned samples for the 2D point robot example. We observe that LEGO over a roadmap of 200 samples with just 30% learned samples significantly outperforms LEGO over a Halton graph. Fig. 17 visualizes the samples generated by LEGO, represented by the end-effector positions (blue) in the workspace.
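The mixing of learned and Halton samples described above can be sketched as follows. This is a minimal pure-Python illustration on the unit square, not the authors' code: `fake_learned_sampler` is a hypothetical stand-in for the trained CVAE decoder, and the choice of bases (2, 3) and of starting the sequence at index 1 are assumptions.

```python
import random


def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def halton(n, bases=(2, 3)):
    """First n points of the Halton sequence in the unit cube (indices start at 1)."""
    return [tuple(radical_inverse(i, b) for b in bases) for i in range(1, n + 1)]


def mixed_roadmap_nodes(learned_sampler, n_total, learned_fraction):
    """Combine learned samples with Halton samples into one node set."""
    n_learned = round(n_total * learned_fraction)
    return learned_sampler(n_learned) + halton(n_total - n_learned)


# Hypothetical stand-in for the learned sampler: points clustered near (0.5, 0.5).
rng = random.Random(0)


def fake_learned_sampler(n):
    def clip(v):
        return min(max(v, 0.0), 1.0)
    return [(clip(rng.gauss(0.5, 0.02)), clip(rng.gauss(0.5, 0.02)))
            for _ in range(n)]


nodes = mixed_roadmap_nodes(fake_learned_sampler, n_total=200, learned_fraction=0.3)
```

With these settings the node set holds 60 learned samples followed by 140 Halton samples; edges would then be added by whichever connection rule the roadmap planner uses.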
References
- [1] L. E. Kavraki, P. Svestka, J. C. Latombe, and M. H. Overmars. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE TRO, 1996.
- [2] S. M. LaValle. Planning Algorithms. Cambridge University Press, Cambridge, U.K., 2006. Available at http://planning.cs.uiuc.edu/.
- [3] Peter E. Hart, Nils J. Nilsson, and Bertram Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 1968.
- [4] J. H. Halton. On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals. Numerische Mathematik, 2:84–90, 1960.
- [5] Lucas Janson, Brian Ichter, and Marco Pavone. Deterministic sampling-based motion planning: Optimality, complexity, and performance. arXiv preprint arXiv:1505.00023, 2015.
- [6] Christopher Holleman and Lydia E. Kavraki. A framework for using the workspace medial axis in PRM planners. In ICRA, 2000.
- [7] D. Hsu, G. Sánchez-Ante, and Z. Sun. Hybrid PRM sampling with a cost-sensitive adaptive strategy. In ICRA, 2005.
- [8] Brendan Burns and Oliver Brock. Sampling-based motion planning using predictive models. In ICRA, 2005.
