FRoGGeR: Fast Robust Grasp Generation via the Min-Weight Metric
Albert H. Li, Preston Culbertson, Joel W. Burdick, Aaron D. Ames

TL;DR
FRoGGeR is a fast, robust grasp synthesis method that uses a novel min-weight metric to generate collision-free grasps in under a second, significantly improving efficiency over existing approaches.
Contribution
The paper introduces the min-weight metric and demonstrates its effectiveness for rapid, robust grasp generation, enabling real-time applications and refinement of other grasp candidates.
Findings
Typically generates grasps in less than 1 second
Outperforms baseline in grasp feasibility and success rate
Effective across diverse object representations
| category | method | % converged | % pick success | $\epsilon$ (1e3) | normalized $\ell^*$ | time per solve (s) | num. solves | total time (s) |
|---|---|---|---|---|---|---|---|---|
| sphere (240 total) | baseline | 12.9% (31/240) | 83.9% (26/31) | 2.7 (1.6, 3.6) | 0.32 (0.19, 0.39) | 0.67 (0.44, 1.00) | 68 (61, 73) | 27.2 (13.5, 36.2) |
| | FRoGGeR | 97.9% (235/240) | 95.3% (224/235) | 4.9 (4.2, 5.5) | 0.67 (0.55, 0.72) | 0.31 (0.18, 0.49) | 2 (1, 4) | 0.57 (0.30, 1.2) |
| box/cyl (320 total) | baseline | 52.8% (169/320) | 68.6% (116/169) | 2.2 (0.1, 3.8) | 0.20 (0.09, 0.36) | 0.79 (0.41, 1.29) | 42 (13, 53) | 13.4 (5.5, 31.1) |
| | FRoGGeR | 100% (320/320) | 81.6% (261/320) | 4.8 (3.8, 5.9) | 0.58 (0.44, 0.65) | 0.15 (0.09, 0.24) | 3 (2, 5) | 0.87 (0.44, 1.6) |
| adversarial (300 total) | baseline | 61.7% (185/300) | 43.8% (81/185) | 1.8 (0.6, 2.9) | 0.18 (0.08, 0.29) | 0.79 (0.47, 1.25) | 29 (10, 55) | 12.7 (5.8, 24.5) |
| | FRoGGeR | 100% (300/300) | 63.0% (189/300) | 4.3 (3.5, 5.1) | 0.53 (0.42, 0.63) | 0.18 (0.11, 0.30) | 3 (2, 9) | 1.0 (0.49, 3.3) |
| overall (860 total) | baseline | 44.8% (385/860) | 58.0% (223/385) | 2.0 (0.8, 3.4) | 0.19 (0.09, 0.32) | 0.73 (0.44, 1.15) | 50 (17, 63) | 13.8 (5.8, 29.5) |
| | FRoGGeR | 99.4% (855/860) | 78.8% (674/855) | 4.6 (3.7, 5.5) | 0.58 (0.45, 0.66) | 0.21 (0.12, 0.36) | 3 (1, 6) | 0.83 (0.39, 1.9) |
| Object | Reason for Exclusion |
|---|---|
| 019_pitcher_base | too big |
| 022_windex_bottle | poor model: transparency |
| 023_wine_glass | poor model: transparency |
| 024_bowl | thin walls |
| 025_mug | thin walls |
| 026_sponge | deformable, too flat |
| 028_skillet_lid | poor model: transparency |
| 029_plate | thin walls |
| 030_fork | too flat |
| 031_spoon | too flat |
| 032_knife | too flat |
| 033_spatula | too big |
| 035_power_drill | too big |
| 037_scissors | too flat |
| 038_padlock | no file |
| 039_key | no file |
| 040_large_marker | too small |
| 041_small_marker | too small |
| 042_adjustable_wrench | too flat |
| 046_plastic_bolt | no file |
| 047_plastic_nut | no file |
| 049_small_clamp | too small |
| 050_medium_clamp | too small |
| 053_mini_soccer_ball | too big |
| 059_chain | multibody |
| 076_timer | lost features, not interesting |
| Constraint | Tolerance |
|---|---|
| joint | 1e-2 |
| surface contact | 5e-4 |
| collision | 1e-3 |
| force closure | 1e-5 |
Albert H. Li†, Preston Culbertson‡, Joel W. Burdick‡, and Aaron D. Ames†,‡
† A. H. Li and A. D. Ames are with the Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA, {alberthli, ames}@caltech.edu.
‡ P. Culbertson, J. W. Burdick, and A. D. Ames are with the Department of Civil and Mechanical Engineering, California Institute of Technology, Pasadena, CA 91125, USA, {pculbert, jwb}@caltech.edu.
Abstract
Many approaches to grasp synthesis optimize analytic quality metrics that measure grasp robustness based on finger placements and local surface geometry. However, generating feasible dexterous grasps by optimizing these metrics is slow, often taking minutes. To address this issue, this paper presents FRoGGeR: a method that quickly generates robust precision grasps using the min-weight metric, a novel, almost-everywhere differentiable approximation of the classical $\epsilon$ grasp metric. The min-weight metric is simple and interpretable, provides a reasonable measure of grasp robustness, and admits numerically efficient gradients for smooth optimization. We leverage these properties to rapidly synthesize collision-free robust grasps, typically in less than a second. FRoGGeR can refine the candidate grasps generated by other methods (heuristic, data-driven, etc.) and is compatible with many object representations (SDFs, meshes, etc.). We study FRoGGeR’s performance on over 40 objects drawn from the YCB dataset, where it outperforms a competitive baseline in computation time, feasibility rate of grasp synthesis, and picking success in simulation. We conclude that FRoGGeR is fast: it has a median synthesis time of 0.834s over hundreds of experiments.
I Introduction
The success of data-driven methods for grasp synthesis has fundamentally changed manipulation in recent years. Traditional methods [1] for grasp synthesis focused largely on optimizing analytic quality metrics (e.g., the largest inscribed ball metric [2, 3, 4], which we call the $\epsilon$ metric as in [5]), which use hand-object contact points to measure a grasp’s robustness to external perturbations. However, these methods suffer from some drawbacks in practice: traditional metrics are often hard to optimize and may be non-differentiable, meaning grasps must be synthesized with slower sampling-based methods. Further, they are sensitive to the object geometry (i.e., the surface location and normals), which requires detailed object models to be constructed offline, e.g., by using object scanning rigs [6, 7].
To address these shortcomings, a number of authors consider data-driven methods for grasp synthesis (for a detailed survey, see [8]) that seek to learn grasping policies or metrics depending solely on raw sensor data, such as RGB images or point clouds. A wide variety of approaches have emerged, including using supervised learning to train CNNs to estimate grasp quality from images [9], identifying class-level keypoints to find features appropriate for manipulation [10], or learning a generative model conditioned on depth images [11]. These methods, however, have focused nearly exclusively on generating antipodal grasps for parallel-jaw grippers, or power grasps for dexterous hands.
In this work, we consider the problem of quickly refining an initial pose for a dexterous hand into a robust precision grasp for a particular object. Compared to power grasps, precision grasps are more useful for manipulation tasks that require delicate or accurate movements, such as tool use or bin packing. Our goals are twofold. First, we seek to generate these grasps quickly (in seconds rather than minutes for current methods [1, 12, 13]) while enforcing kinematic and collision constraints. Second, we seek to balance common trade-offs of grasp synthesis methods in terms of performance, speed, and interpretability.
I-A Contributions
The main contribution of this work is the formulation of grasp synthesis/refinement as a nonlinear optimization problem that leverages a novel, almost-everywhere differentiable approximation of the $\epsilon$ metric: the min-weight metric. The end result is FRoGGeR, a framework for fast robust grasp generation. The optimization problem underlying FRoGGeR can, due to the properties of the min-weight metric, be solved efficiently using commercial solvers. Additionally, FRoGGeR allows us to explicitly enforce constraints on grasps while harnessing the speed of gradient-based optimization.
We aim for our work to be compatible with existing methods in the grasping community. For instance, while FRoGGeR can synthesize grasps with no prior knowledge, it may also refine infeasible or suboptimal grasps computed by learning-based methods. Further, this paper represents object geometry implicitly using signed distance fields (SDFs), which means we can leverage existing work that learns SDFs of objects from sensor data [14]. To allow the use of mesh-based object representations, we also present practical approximations of the SDF, its gradient, and its Hessian computable using only the mesh.
In summary, our contributions are as follows:
- FRoGGeR: a fast robust grasp generator built on the min-weight metric, a novel, almost-everywhere differentiable approximation of the $\epsilon$ grasp metric with a numerically efficient gradient;
- a practical procedure based on nonlinear optimization, with an open-source implementation (available at github.com/alberthli/frogger), that generates feasible grasps on the order of seconds; and
- numerical experiments and simulations comparing grasps generated by our method to prior work, thereby demonstrating the speed of the proposed approach.
I-B Related Work
The majority of recent work in the grasping literature concerns parallel-jaw grasps, which admit simple parameterizations due to the low number of DOFs and ease of control [9, 15, 16, 17, 8]. Multifinger, dexterous grasping introduces numerous challenges, as the grasp parameterization must specify the states of each finger, and the high dimensionality of this representation demands more fine-grained control and better sensing. Moreover, the complex kinematics and potential for self-collisions complicate the search for feasible grasps. In turn, the problem of synthesizing dexterous grasps, particularly using data-driven methods, has received far less attention than parallel-jaw grasps.
Many classical methods for multifinger grasp synthesis ignore kinematic and collision constraints and only optimize for contact location by leveraging analytic grasp metrics [18]. The GraspIt! simulator explicitly considers these constraints but simplifies the problem by searching a lower-dimensional space of “eigengrasps” using simulated annealing [1]. This approach has several downsides, including the need to define eigengrasps for new hands (which is highly non-trivial), and slow computational speed in general. Overall, analytic metrics have two main drawbacks: (i) their usage is typically slow, often due to non-smoothness, and (ii) they demand high-fidelity estimates of object geometry and contact locations, diminishing their efficacy [5].
In response to these limitations, numerous authors have sought to develop data-driven methods for dexterous grasping. Existing approaches include discriminative models (i.e., those that seek to estimate the quality of a particular grasp) and generative models, which seek to directly generate grasps for novel objects, conditioned on the object geometry or perceptual data. Among these, some only enforce hand kinematics and check collisions post-hoc [5, 19]; others only optimize for contact points and check kinematic feasibility post-hoc [20]; others learn grasp pre-shapes rather than reasoning about contact [21, 22]. We refer the reader to [8] for a detailed survey of data-driven approaches to grasp synthesis.
One reason for these limitations is the difficulty in casting the nonlinear constrained grasp optimization problem in a computationally tractable way. Among methods addressing this challenge, collision constraints are typically penalized instead of enforced, leading synthesized grasps to have high amounts of infeasible interpenetration [17, 13, 23]. Other attempts at solving the unrelaxed problem are computationally prohibitive (e.g., minutes to hours in [24]).
In this work, we do not address the perception-based challenges of analytic metrics and assume knowledge of the object’s geometry and pose. Instead, we focus on mitigating their slow speed with the view that, despite their drawbacks, these metrics still provide a useful framework for grasping that is agnostic to robot model, object representation, and quality of available data. To that end, we build on prior works that formulate differentiable approximations of analytic force closure measures to recover robust grasps on any multifinger arm/hand system using bilevel optimization.
These prior works propose methods of varying complexity, including solving and differentiating a sequence of linear programs (LPs) [25], a sequence of semidefinite programs (SDPs) [26], a sum of squares program [27], or a single SDP that only approximates force closure [23, 17]. In contrast, we propose in Sec. II a single LP whose optimal value mathematically indicates force closure and whose maximization empirically yields robust grasps.
The method of Wu et al. [28] is most similar to ours as they also propose solving a bilevel optimization program with smooth collision constraints. However, instead of optimizing for robustness, they solve a feasibility problem and impose a force closure constraint parameterized as a quadratic program while training a conditional variational autoencoder (CVAE) to output performant initial grasps. We compare FRoGGeR’s formulation to theirs in Sec. IV.
We do not compare against other analytic metrics in the literature [18] for two reasons. First, for a fair comparison, they should be differentiable and agnostic to object and task, eliminating many choices such as independent contact regions or task-based methods. Second, among metrics satisfying these desiderata (e.g., the minimum singular value of the grasp matrix), the optimized grasps are not guaranteed to satisfy force closure, so they cannot provide robustness guarantees even under perfect conditions.
I-C Preliminaries
Assume a fixed-base, fully-actuated serial manipulator and dexterous hand with $n_f$ fingers contacting the object. Denote by $n$ the total DOFs of the system and $q \in \mathbb{R}^n$ the generalized positions. We let $FK^i(q)$ and $J^i(q)$ denote the forward kinematics and Jacobian of prescribed contact point $i$, and denote the hand Jacobian by $J_h(q)$. We aim to manipulate a rigid object $\mathcal{O}$ with surface $\partial\mathcal{O}$ and body frame $O$. The pose of a frame $B$ with respect to a frame $A$ is expressed $g_{AB} \in SE(3)$. Let $(p_{AB}, R_{AB})$ represent the relative position and orientation of $B$ with respect to $A$. The robot base frame is denoted $S$.
We refer to the pair $(q, \mathcal{O})$ as a grasp, where $q$ is a feasible configuration (i.e., no collisions and valid hand-object contact). In this work, we will model the fingers as point contacts with friction. We can thus define $G(q) \in \mathbb{R}^{6 \times 3n_f}$, the grasp map, which maps a vector of contact forces expressed in their local contact frames, $f_c \in \mathbb{R}^{3n_f}$, to wrenches in the object frame $O$, i.e., we can write $w = G(q) f_c$.
We use a Coulomb friction model, so there is no slip if contact forces remain in the friction cone, i.e., if $\|f_t\|_2 \leq \mu f_n$, where $f_t$ and $f_n$ denote the tangent and normal components respectively. We use a pyramidal friction cone approximation [29] with $n_s$ sides and let $n_w = n_s n_f$ denote the total number of basis wrenches forming the finite subset $\mathcal{W}$ of the grasp wrench space. We let the elements of $\mathcal{W}$ form the columns of the wrench matrix $W \in \mathbb{R}^{6 \times n_w}$ and assume there exists a subset of $\mathcal{W}$ containing 7 affinely independent basis wrenches.
We say a grasp is force closure if it can resist arbitrary disturbance wrenches in any direction, which is implied if the origin of the grasp wrench space lies in the convex hull of $\mathcal{W}$, denoted $\mathrm{conv}(\mathcal{W})$ [3]. For a thorough treatment of grasping fundamentals, we refer the reader to [29, Ch. 5].
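The pyramidal friction cone approximation and the resulting wrench matrix can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `tangent_basis` and `wrench_matrix` are hypothetical, each pyramid edge is scaled to have unit normal force, and the orientation of the tangent basis is arbitrary.

```python
import numpy as np


def tangent_basis(normal):
    """Return two unit vectors orthogonal to `normal` and to each other."""
    n = normal / np.linalg.norm(normal)
    # Use the coordinate axis least aligned with n to avoid a degenerate cross product.
    helper = np.eye(3)[np.argmin(np.abs(n))]
    t1 = np.cross(n, helper)
    t1 = t1 / np.linalg.norm(t1)
    t2 = np.cross(n, t1)
    return t1, t2


def wrench_matrix(points, normals, mu, n_sides=4):
    """Stack the basis wrenches of all contacts into a 6 x (n_f * n_sides) matrix.

    points:  (n_f, 3) contact positions expressed in the object frame.
    normals: (n_f, 3) inward-pointing surface normals at the contacts.
    mu:      Coulomb friction coefficient.
    """
    columns = []
    angles = 2.0 * np.pi * np.arange(n_sides) / n_sides
    for p, n in zip(points, normals):
        n = n / np.linalg.norm(n)
        t1, t2 = tangent_basis(n)
        for theta in angles:
            # Pyramid edge: unit normal force plus friction at its Coulomb limit.
            f = n + mu * (np.cos(theta) * t1 + np.sin(theta) * t2)
            # Wrench = (force, torque about the object-frame origin).
            columns.append(np.concatenate([f, np.cross(p, f)]))
    return np.stack(columns, axis=1)
```

With four contacts and a square pyramid (`n_sides=4`), the returned matrix has 16 columns, matching the $n_w = 16$ case discussed later for a 4-fingered hand.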
II The Min-Weight Grasp Metric
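This section introduces the min-weight metric, a simple non-binary indicator of force closure we use as an optimization objective. Specifically, we treat it as a differentiable proxy for the $\epsilon$ metric, which measures the robustness of force closure grasps by reporting the radius of the largest origin-centered ball inscribed in $\mathrm{conv}(\mathcal{W})$ [2, 3].
In the sequel, we assume that $\mathrm{conv}(\mathcal{W})$ has nonempty interior. To check whether $0 \in \mathrm{conv}(\mathcal{W})$, we can solve the following linear feasibility problem [30] over variables $\alpha \in \mathbb{R}^{n_w}$, where $\mathbf{1}$ is the vector of 1s:
$$
\begin{aligned}
\text{find} \quad & \alpha \in \mathbb{R}^{n_w} && \text{(1a)} \\
\text{s.t.} \quad & W\alpha = 0, && \text{(1b)} \\
& \mathbf{1}^\top \alpha = 1, && \text{(1c)} \\
& \alpha \succeq 0. && \text{(1d)}
\end{aligned}
$$
That is, $0 \in \mathrm{conv}(\mathcal{W})$ if and only if there exists $\alpha$ satisfying (1b)-(1d), which is equivalent to the existence of an equilibrium wrench.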
II-A The Min-Weight Metric and its Properties
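The key idea of the min-weight metric is to relax constraint (1d) by allowing negative weights $\alpha_j$. If the minimum weight in $\alpha$ is non-negative, the feasibility problem is satisfied, so $0 \in \mathrm{conv}(\mathcal{W})$. This motivates the following LP:
$$
\begin{aligned}
\ell^* = \max_{\alpha,\, \ell} \quad & \ell && \text{(2a)} \\
\text{s.t.} \quad & W\alpha = 0, && \text{(2b)} \\
& \mathbf{1}^\top \alpha = 1, && \text{(2c)} \\
& \alpha \succeq \ell \mathbf{1}. && \text{(2d)}
\end{aligned}
$$
Thus, $\alpha^*$ and $\ell^*$ are defined even when a grasp is not force closure (i.e., when $\ell^* < 0$ in the case of non-physical “pulling” contact forces). The dependence of $\ell^*$ on $q$ via the wrench matrix $W(q)$ in constraint (2b) allows us to iteratively turn suboptimal grasps into force closure ones via gradient-based nonlinear optimization. The following result formally relates $\ell^*$ and force closure status.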
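Theorem 1.
If there exists a subset of $\mathcal{W}$ containing 7 affinely independent basis wrenches, then problem (2) is feasible. Further, the optimal solution $(\alpha^*, \ell^*)$ satisfies
- (i) [Non-Force Closure] $\ell^* < 0 \iff 0 \notin \mathrm{conv}(\mathcal{W})$,
- (ii) [Robust Closure] $\ell^* > 0 \iff 0 \in \mathrm{int}(\mathrm{conv}(\mathcal{W}))$,
- (iii) [Only Equilibrium] $\ell^* = 0 \iff 0 \in \partial\,\mathrm{conv}(\mathcal{W})$,
where $\partial A$ denotes the boundary of a set $A$.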
Proof.
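For the feasibility claim, it suffices to show that there always exists $\alpha$ satisfying (2b) and (2c), since we can set $\ell$ to its minimum element. Let $\bar{W} \in \mathbb{R}^{6 \times 7}$ denote a submatrix of $W$ with 7 affinely independent columns and $\bar{\alpha} \in \mathbb{R}^7$ the associated weights in $\alpha$. Set all other weights in $\alpha$ to 0. Let $M = \begin{bmatrix} \bar{W} \\ \mathbf{1}^\top \end{bmatrix} \in \mathbb{R}^{7 \times 7}$ and $e_7 = (0, \dots, 0, 1) \in \mathbb{R}^7$.
The columns of $\bar{W}$ are affinely independent in $\mathbb{R}^6$ if and only if the columns of $M$ are linearly independent in $\mathbb{R}^7$ [31, Exercise 1.1]. Thus, $M$ is invertible, so we can always find $\bar{\alpha}$ satisfying $M\bar{\alpha} = e_7$. Equivalently, $\bar{W}\bar{\alpha} = 0$ and $\mathbf{1}^\top\bar{\alpha} = 1$, implying $W\alpha = 0$ and $\mathbf{1}^\top\alpha = 1$.
Proof of (i). By feasibility problem (1), $0 \notin \mathrm{conv}(\mathcal{W})$ implies (1) has no solution, so equivalently, $\ell^* < 0$.
Proof of (ii). $\ell^* > 0$ if and only if $\exists\, \alpha \succ 0$ such that $W\alpha = 0$ and $\mathbf{1}^\top\alpha = 1$. Since we assume $\mathrm{conv}(\mathcal{W})$ has nonempty interior, its relative interior is its interior, so $0 \in \mathrm{int}(\mathrm{conv}(\mathcal{W}))$ if and only if $\exists\, \alpha \succ 0$ such that $W\alpha = 0$ and $\mathbf{1}^\top\alpha = 1$ [31, Exercise 3.1].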
Proof of (iii). Follows immediately from (i) and (ii). ∎
Theorem 1 states that under mild assumptions, the sign of $\ell^*$ indicates force closure, justifying its maximization. Heuristically, very negative values of $\ell^*$ indicate a grasp is far from force closure while very positive values indicate the origin lies well within $\mathrm{conv}(\mathcal{W})$ (see Fig. 2). This motivates using $\ell^*$ as an approximate measure of robustness.
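Since $\ell^* \leq 1/n_w$, the normalized min-weight metric $\bar{\ell}^* := n_w \ell^* \leq 1$ is well-defined, allowing us to specify the constraint $\bar{\ell}^* \geq \bar{\ell}_{\min}$, where $\bar{\ell}_{\min}$ is a lower bound on the desired grasp robustness. In experiments, we use .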
We note that while using $\ell^*$ to measure force closure is theoretically justified, its use as a proxy for the $\epsilon$ metric is not, since $\ell^* > 0$ does not guarantee a large ball is contained in $\mathrm{conv}(\mathcal{W})$. Nevertheless, empirically, we find that $\ell^*$ and the $\epsilon$ metric are strongly correlated and maximizing $\ell^*$ improves a lower bound on the $\epsilon$ value (see Fig. 3).
Finally, as in many classical metrics, $\ell^*$ is not invariant to the object frame [4, Ch. 13.5]. While methods exist that address this [18], we do not explore them in this work.
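The min-weight LP and the sign test of Theorem 1 can be illustrated with a small numerical sketch. It uses `scipy.optimize.linprog` as a stand-in solver and is not the authors' implementation; the function name `min_weight` is hypothetical, and the normalization $n_w \ell^*$ in the demo is an assumption about how the normalized metric is defined.

```python
import numpy as np
from scipy.optimize import linprog


def min_weight(W):
    """Solve max l s.t. W a = 0, 1^T a = 1, a >= l * 1; return (l_star, a_star)."""
    d, n_w = W.shape
    # Decision vector x = (alpha_1, ..., alpha_{n_w}, l); linprog minimizes, so use -l.
    c = np.zeros(n_w + 1)
    c[-1] = -1.0
    # Equality constraints: W alpha = 0 and sum(alpha) = 1.
    A_eq = np.zeros((d + 1, n_w + 1))
    A_eq[:d, :n_w] = W
    A_eq[d, :n_w] = 1.0
    b_eq = np.zeros(d + 1)
    b_eq[d] = 1.0
    # Inequality constraints: l - alpha_j <= 0 for every j.
    A_ub = np.hstack([-np.eye(n_w), np.ones((n_w, 1))])
    b_ub = np.zeros(n_w)
    # All variables are free; linprog's default bounds would force them non-negative.
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(None, None)] * (n_w + 1), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[-1], res.x[:n_w]


if __name__ == "__main__":
    # Origin strictly inside the hull: columns are +/- unit vectors in R^6.
    W_closed = np.hstack([np.eye(6), -np.eye(6)])
    l_closed, _ = min_weight(W_closed)
    print("l* =", l_closed, "normalized (assumed n_w * l*) =", 12 * l_closed)

    # 2D toy analogue with the origin outside the hull of three points.
    W_open = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
    l_open, a_open = min_weight(W_open)
    print("l* =", l_open, "alpha* =", a_open)
```

In the first case the uniform weights $\alpha_j = 1/12$ are optimal, so $\ell^*$ is positive; in the second the equality constraints force one weight to be negative, so $\ell^*$ is negative, in line with cases (ii) and (i) of the theorem. The 2D example uses 3 affinely independent columns in $\mathbb{R}^2$ in place of the 7 required in $\mathbb{R}^6$.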
II-B Computing $\nabla_q \ell^*$ with Differentiable Optimization
We compute $\nabla_q \ell^*$ where it is defined using implicit differentiation of the KKT conditions [32] and exploit the resulting structure to compute it quickly.
For brevity, let and express (2) as
[TABLE]
Let and denote the Lagrange multipliers associated with inequality and equality constraints respectively. As in [32], we write the stationarity, primal feasibility, and complementary slackness conditions for (2):
[TABLE]
where denotes the Hadamard product. Solving the system is necessary and sufficient to solve any LP since it is convex and always satisfies Slater’s condition. Let and denote the total and partial Jacobians of a vector-valued function with respect to variables respectively. Since at the optimal solution, by implicit differentiation,
[TABLE]
We can compute explicitly or with autodifferentiation through the primitive wrench matrix , which is derived from the grasp map , with details deferred to our open-source code. We consider the case where is invertible and apply the result without further justification like a subgradient in the singular case (similar to the non-differentiable case in [33]). In particular, is given by the last row of , which we can compute via the following result:
Proposition 1 (Gradient Exploit).
Let denote the Moore-Penrose pseudoinverse. Then,
[TABLE]
Proof.
See App. -A. ∎
Proposition 1 allows efficient computation of , especially when is “small” (). For example, when (e.g., using a square pyramidal approximation for a 4-fingered hand), we find that and can be computed together in about with cvxpylayers [33] on an Nvidia A6000 GPU, whereas using our exploit, they are computed in on an Intel i9 CPU, a speedup.
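For intuition about why the gradient of the min-weight metric can be cheap to evaluate, the standard LP sensitivity argument gives a closed form in terms of the optimal primal and dual variables. The derivation below is a sketch under the assumption that the primal-dual optimum of the LP $\max \ell$ s.t. $W\alpha = 0$, $\mathbf{1}^\top\alpha = 1$, $\alpha \succeq \ell\mathbf{1}$ is unique and nondegenerate; the multiplier names $\lambda$, $\nu$, $\mu$ are this sketch's notation, and it is a complementary view rather than a restatement of Proposition 1.

```latex
\begin{aligned}
\mathcal{L}(\alpha, \ell, \lambda, \nu, \mu)
  &= -\ell + \lambda^\top(\ell\mathbf{1} - \alpha) + \nu^\top W\alpha
     + \mu\,(\mathbf{1}^\top\alpha - 1), \qquad \lambda \succeq 0, \\
\frac{\partial \ell^*}{\partial W_{ij}}
  &= -\left.\frac{\partial \mathcal{L}}{\partial W_{ij}}\right|_{(\alpha^*,\, \ell^*,\, \lambda^*,\, \nu^*,\, \mu^*)}
   = -\nu_i^*\,\alpha_j^*, \\
\frac{\partial \ell^*}{\partial q_k}
  &= -\sum_{i=1}^{6}\sum_{j=1}^{n_w} \nu_i^*\,\alpha_j^*\,\frac{\partial W_{ij}}{\partial q_k}
   = -(\nu^*)^\top \frac{\partial W}{\partial q_k}\,\alpha^*.
\end{aligned}
```

The first line is the Lagrangian of the equivalent minimization of $-\ell$, and the second applies the envelope theorem to its optimal value $-\ell^*$. As a check on the sign, take the 2D toy matrix with columns $(1,0)$, $(0,1)$, $(1,1)$: the constraints give $\alpha^* = (1, 1, -1)$, $\ell^* = -1$, and $\nu^* = (1, 1)$, and perturbing $W_{11}$ to $1 + t$ gives $\ell^* = -(1 + t)$, which matches $-\nu_1^*\alpha_1^* = -1$.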
III The FRoGGeR Formulation
III-A The Grasp Refinement Problem
FRoGGeR refines a candidate grasp configuration into a locally optimal one by solving the following nonlinear bilevel optimization program (recalling that ):
[TABLE]
To enforce joint limits, we constrain the robot configuration $q$ to lie between minimum and maximum values $q_{\min}$ and $q_{\max}$. Further, we enforce that the fingertips lie on the object surface and that no rigid bodies are interpenetrating.
To express these constraints mathematically, we first parameterize $\partial\mathcal{O}$ as the 0-level set of a twice-differentiable SDF $s: \mathbb{R}^3 \to \mathbb{R}$, which reports the distance of query points to $\partial\mathcal{O}$, with $s(x) < 0$ for all points $x$ in the interior of $\mathcal{O}$:
$$\partial\mathcal{O} = \{x \in \mathbb{R}^3 : s(x) = 0\}.$$
Second, we consider every possible pair of geometries we would like to prevent from colliding and parameterize the collision status using the differentiable constraints , where is the number of collision pairs and is an SDF between two geometries, at least one of whose state depends smoothly on . For pair , we enforce a minimum safety margin of unless it is a finger-object pair, for which we specify to allow a small amount of interpenetration. Thus, we can express optimization program (FRoGGeR) formally as
[TABLE]
III-B Gradients of the Constraint Functions
To use gradient-based methods, we must compute the gradients of the objective and each constraint. The gradient of constraint (7d) is immediately given by for . For constraint (7e), we compute for each pair of geometries the witness points . If the pair is colliding, then the witness points are the two points of furthest penetration. If the pair is not colliding, then they are the two closest points. Let indicate collision of a pair and otherwise. Then,
[TABLE]
where and are the Jacobians at witness points and and is the unit vector from to .
We use Drake [34] to compute witness points for all geometry pairs in a scene. To speed up computation, we represent nonconvex bodies as a union of convex polytopes computed using V-HACD [35]. To reduce the number of checked pairs, Drake culls distant pairs using a broadphase algorithm and we set the associated gradients to 0. When two geometries have exactly 0 signed distance, the gradient may not be defined since the witness points coincide. In this case, we use the previous value of the unit vector between them, which is initialized randomly.
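The witness-point gradient of a collision constraint can be sketched as below. This is an illustration consistent with the description above rather than the paper's exact expression: the function name `signed_distance_gradient` is hypothetical, and the sign convention (-1 for a colliding pair, +1 otherwise) is an assumption chosen so that the result matches the derivative of the signed distance.

```python
import numpy as np


def signed_distance_gradient(p_a, p_b, jac_a, jac_b, colliding):
    """Gradient w.r.t. q of the signed distance between geometries A and B.

    p_a, p_b:     (3,) witness points on A and B.
    jac_a, jac_b: (3, n) Jacobians of the witness points w.r.t. q.
    colliding:    True if the pair interpenetrates (negative signed distance).
    """
    diff = p_b - p_a
    dist = np.linalg.norm(diff)
    if dist == 0.0:
        # The caller would fall back to a previously stored direction here.
        raise ValueError("witness points coincide; the direction is undefined")
    n_hat = diff / dist  # unit vector from p_a to p_b
    sign = -1.0 if colliding else 1.0  # assumed convention for the collision indicator
    return sign * n_hat @ (jac_b - jac_a)
```

For example, consider a fixed unit sphere A at the origin and a unit sphere B centered at $(c, 0, 0)$ with $q = c$, whose signed distance is $c - 2$. Both the separated case ($c = 3$, witness points $(1,0,0)$ and $(2,0,0)$) and the penetrating case ($c = 1.5$, witness points $(1,0,0)$ and $(0.5,0,0)$) return a gradient of 1, as expected.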
III-C Object Surface Representations
The SDF representation of objects is convenient for reasoning about collision and also geometric properties, since the outward-pointing surface normal at a point $x \in \partial\mathcal{O}$ is given by $\nabla s(x)$ and principal curvatures can be computed from the Hessian $\nabla^2 s(x)$. However, supplying the true object SDF is non-trivial. Some approaches learn this representation [14, 36], while most avoid learning by using the object’s mesh or a point cloud [9, 20, 27]. Since these representations are all widely used, it is desirable for grasp synthesis methods to be compatible with any of them.
In the case of a learned or analytical SDF, computing the requisite gradients can be done via autodifferentiation. Further, if provided a dense enough point cloud, the Poisson surface reconstruction algorithm can return a watertight mesh [37]. Therefore, we focus on the case of meshes.
We assume that there exists a true smooth SDF $s$ and denote the approximation computed with the object mesh as $\tilde{s}$. Then, given a closest point $x_c$ on the mesh to a query point $x$, the gradient is simply
$$\nabla\tilde{s}(x) = \frac{x - x_c}{\tilde{s}(x)}, \qquad \text{(9)}$$
where when $\tilde{s}(x) = 0$, $\nabla\tilde{s}(x)$ is the mesh normal at $x$. To compute $\tilde{s}(x)$ and $x_c$, we use the open-source signed distance query provided by open3d that also computes the closest point on a mesh to any query point [38].
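The closest-point gradient rule can be written in a few lines. In this sketch the signed distance, closest point, and mesh normal are passed in directly (in practice they would come from a mesh query such as the one mentioned above); the function name `mesh_sdf_gradient` and the zero-distance tolerance `tol` are assumptions made for illustration.

```python
import numpy as np


def mesh_sdf_gradient(x, x_closest, signed_dist, mesh_normal, tol=1e-12):
    """Approximate SDF gradient at x from the closest mesh point.

    Dividing (x - x_closest) by the signed distance gives a unit vector that
    points outward both outside (positive distance) and inside (negative
    distance) the object. On the surface, fall back to the mesh normal.
    """
    if abs(signed_dist) < tol:
        return mesh_normal / np.linalg.norm(mesh_normal)
    return (x - x_closest) / signed_dist
```

For a unit sphere, a query point at $(0, 0, 2)$ (distance 1) and one at $(0, 0, 0.5)$ (distance -0.5) share the closest point $(0, 0, 1)$, and both yield the outward normal $(0, 0, 1)$.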
To compute whether a grasp is (robustly) force closure, we must compute the grasp map $G$, which depends on the contact frames associated with fingertip positions $FK^i(q)$ [3]. The normal component of each contact frame is the inward-pointing surface normal, i.e., $-\nabla s(FK^i(q))$. Therefore, to differentiate any objective or constraint that depends on measures of force closure, we require $\nabla^2 s$.
However, when the object is parameterized as a mesh, the surface is piecewise flat, so $\nabla^2 \tilde{s} = 0$ wherever it is defined even if $\nabla^2 s \neq 0$. Other works present methods of varying complexity to compute $\tilde{s}$ and $\nabla\tilde{s}$ that involve solving a quadratic program or deep learning, but do not consider the problem of computing the Hessian of $s$ [17, 27].
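Here, we propose a coarse but efficient approximation. Let $x$ be an arbitrary query point. Fix a small constant $\delta > 0$, randomly select 2 unit vectors denoted $v_1, v_2$, and define $x_i = x + \delta v_i$ for $i = 1, 2$. By (9), we have $\nabla\tilde{s}(x)$ and $\nabla\tilde{s}(x_i)$ for all $i$. Finally, fix $v_3 = \nabla\tilde{s}(x)$.
The directional derivative of $\nabla s$ in a direction $v$ is
$$D_v \nabla s(x) = \nabla^2 s(x)\, v. \qquad \text{(10)}$$
Further, for twice-differentiable SDFs, which satisfy $\|\nabla s\| = 1$, we must have $\nabla^2 s(x)\, \nabla s(x) = 0$. Thus, using our perturbation directions and a finite-difference approximation of the directional derivatives, $d_i = (\nabla\tilde{s}(x_i) - \nabla\tilde{s}(x))/\delta$, we can write a system of equations to estimate $\nabla^2 s(x)$,
$$\tilde{H} \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix} = \begin{bmatrix} d_1 & d_2 & 0 \end{bmatrix}, \qquad \text{(11)}$$
by solving for $\tilde{H}$. We note the third column of the right-hand side is 0 since $\nabla s$ is unchanging (close to the surface) along $v_3$, and $v_1, v_2, v_3$ are linearly independent with probability 1. $\tilde{H}$ denotes an initial estimate for $\nabla^2 s(x)$ that may not be symmetric, so we simply choose $H = \frac{1}{2}(\tilde{H} + \tilde{H}^\top)$.
The quality of our estimate is sensitive to the choice of $\delta$ and the properties of the mesh. Empirically, we find that setting $\delta$ to be roughly 10 times the average mesh edge length yields accurate enough gradients for grasp refinement. Using open3d, we find the time to estimate all of $\tilde{s}(x)$, $\nabla\tilde{s}(x)$, and $H$ is on the order of ms.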
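The finite-difference Hessian estimate described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `estimate_sdf_hessian` is hypothetical, the gradient is supplied as a generic callable `grad_fn` (an analytic sphere SDF stands in for a mesh query in the demo), and the option to pass explicit directions is added only to make the example reproducible.

```python
import numpy as np


def estimate_sdf_hessian(grad_fn, x, delta, directions=None, rng=None):
    """Estimate the SDF Hessian at x from gradient evaluations only.

    grad_fn:    callable returning the (approximate) SDF gradient at a point.
    delta:      finite-difference step along the two perturbation directions.
    directions: optional (2, 3) array of perturbation directions; random if None.
    """
    g0 = grad_fn(x)
    if directions is None:
        rng = np.random.default_rng() if rng is None else rng
        directions = rng.normal(size=(2, 3))
    v = [d / np.linalg.norm(d) for d in np.asarray(directions, dtype=float)]
    # Finite-difference directional derivatives of the gradient along v_1 and v_2.
    d = [(grad_fn(x + delta * vi) - g0) / delta for vi in v]
    # Third direction is the gradient itself, along which the Hessian acts as zero.
    V = np.column_stack([v[0], v[1], g0])
    D = np.column_stack([d[0], d[1], np.zeros(3)])
    # Solve H_tilde @ V = D via the transposed system V^T H_tilde^T = D^T.
    H_tilde = np.linalg.solve(V.T, D.T).T
    # Symmetrize the raw estimate.
    return 0.5 * (H_tilde + H_tilde.T)


if __name__ == "__main__":
    sphere_grad = lambda p: p / np.linalg.norm(p)  # gradient of ||p|| - r
    x = np.array([0.0, 0.0, 2.0])
    H = estimate_sdf_hessian(sphere_grad, x, delta=1e-4,
                             directions=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    print(H)  # close to diag(0.5, 0.5, 0), the exact sphere-SDF Hessian at x
```

For a sphere SDF the exact Hessian at $x$ is $(I - \hat{n}\hat{n}^\top)/\|x\|$ with $\hat{n} = x/\|x\|$, which gives a simple reference for checking the estimate. On a real mesh the step $\delta$ would need to span several faces for the differences to be informative, as the text notes.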
IV Experiments
We describe the high-level experimental setup and defer a detailed discussion to App. -B--F. We use the 7-DOF Franka Research 3 and 16-DOF 4-fingered Allegro hand. The system is mounted on a flat tabletop and each target object is spawned with a fixed initial pose over all trials for repeatability, since the arm/hand configuration is allowed to vary arbitrarily.
We evaluate the robustness of FRoGGeR by executing 20 “shaky pickups” per object in simulation using Drake. To do so, we generate a pick trajectory where the end-effector is lifted in and then held for . We add sinusoidal perturbations to this trajectory with amplitude and varying frequency in all spatial axes after the pick begins until the end of the simulation. A pick fails if either (1) the object rotates by more than or if the object deviates from the pick trajectory by more than at any point; or (2) the total grasp synthesis time exceeds 1 minute.
We remark that our shaking test is more dynamic than others in the literature (e.g. [16, 28]), which either do not shake or classify a shake only as a linear movement in space with zero gravity. In contrast, we simulate gravity as well as sustained high-frequency perturbations in all directions.
We compare FRoGGeR’s performance on the pickup task to a baseline presented by Wu et al., which only enforces force closure without optimizing for robustness [28].
The controller used in simulation is given by . is computed by solving for the optimal contact forces to resist external wrenches and errors in the object’s pose (e.g., [28]). is the concatenation of arm torques tracking the pick trajectory with hand torques that drive the hand configuration towards the optimized one . We project to the null space of to avoid affecting the fingertip locations.
To obtain initial configurations , we use a heuristic sampler that noisily aligns the palm with the axes of the object’s oriented bounding box with probabilities proportional to the box side lengths, motivated by observations of preferred human grasps [39]. We choose a width for the fingertips by computing the width of the appropriate axis of the bounding box. The palm is then placed from the object. To obtain the configuration variables, we solve an inverse kinematics (IK) problem as in [28], but we do not enforce collision constraints or force the fingertips to lie on the object surface. Thus, we only consider infeasible candidate grasps. To solve (7), we use the NLopt [40] implementation of SLSQP [41].
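The axis-selection step of the heuristic sampler can be illustrated as below. Only the rule "pick a bounding-box axis with probability proportional to its side length" comes from the text; the function name `sample_palm_axis`, the Gaussian perturbation, and its default scale `noise_std` are assumptions, and the sketch does not cover the subsequent fingertip-width or IK steps.

```python
import numpy as np


def sample_palm_axis(side_lengths, box_axes, noise_std=0.1, rng=None):
    """Sample a noisy alignment direction from an oriented bounding box.

    side_lengths: (3,) side lengths of the oriented bounding box.
    box_axes:     (3, 3) matrix whose columns are the box axes.
    Returns the chosen axis index and a unit direction near that axis.
    """
    rng = np.random.default_rng() if rng is None else rng
    side_lengths = np.asarray(side_lengths, dtype=float)
    # Selection probabilities proportional to the box side lengths.
    probs = side_lengths / side_lengths.sum()
    idx = int(rng.choice(3, p=probs))
    axis = np.asarray(box_axes, dtype=float)[:, idx]
    # Assumed noise model: isotropic Gaussian perturbation, then renormalize.
    noisy = axis + noise_std * rng.normal(size=3)
    return idx, noisy / np.linalg.norm(noisy)
```

A call such as `sample_palm_axis([0.2, 0.1, 0.05], np.eye(3))` would favor the longest axis of the box while still occasionally proposing the shorter ones.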
Our choice to use a coarse sampling heuristic instead of a more performant method is intentional, as our goal is to evaluate FRoGGeR’s robustness to the quality of the initial guess. We control for the resulting decrease in performance by evaluating the relative performance of our method versus the baseline under these conditions.
Thus, we do not evaluate the CVAE sampler from [28]. Further, we found that the quality of CVAE-generated grasps was not consistent for all objects in our dataset and its performance was on par with our heuristic on a small set of test objects. Ultimately, we chose to synthesize 4-finger grasps to capture the full dexterity of the Allegro hand, which are incompatible with the 3-finger CVAE sampler.
The only difference between our method and the baseline is that in (7), FRoGGeR maximizes with constraint , while the baseline has no objective and (7c) is replaced with the bilevel force closure equality constraint described in [28]. Otherwise, the same IK routines, sampler, collision geometries, and controller were used.
IV-A Object Data Processing
We only present results on objects parameterized as meshes. When supplied with analytical SDFs or well-trained deep SDFs, our method was generally both fast and performant. We use meshes to demonstrate our approach on non-smooth object representations and to validate the usefulness of the Hessian approximation from Sec. III-C.
The objects used in our experiments are from a pruned subset of the YCB dataset [7]. First, we removed all objects that were too large, small, or thin to reasonably grasp with 4 fingers from a flat table, as well as deformable or multibody objects. Second, since the YCB meshes are not watertight, we attempted to reprocess them by densely sampling points on each mesh and running Poisson reconstruction. Of these, we removed objects for which we could not produce watertight meshes due to poor data quality (e.g. from transparency, thin walls, etc.). We note that FRoGGeR works even on non-watertight meshes of adequate quality, but we take this step to eliminate the effect of poor meshes on our results. In total, we test on 43 objects belonging to three categories: spheroids, like fruits and balls; boxes/cylinders, like food containers, cans, or large cups; and adversarial objects with irregular geometry, like tools or very flat/long objects.
For simplicity, we set the friction coefficient to be for all objects, which is reasonable for the rubbery Allegro fingertips on mostly plastic objects. We also assumed a uniform density of 150 (as in [16]) and computed masses using the volume of the processed meshes. The optimizer assumed a more conservative friction coefficient of and the controller was provided the mass.
IV-B Results and Discussion
We report values related to the quality of the grasp (pick success, metric value, and ) as well as values regarding the runtime of each method in Table I. We find that overall, FRoGGeR outperforms the baseline in terms of pick success by 20 percentage points and yields values that are roughly twice as high. We find that is a noisy predictor of grasp success - with FRoGGeR, the median and IQR of were for successes and for failures. See App. -E for a full histogram.
We also found that overall, FRoGGeR was $\sim 16\times$ faster at generating grasps than the baseline, a result of $\sim 3\times$ faster single solve times and $\sim 15\times$ fewer solves required to produce a feasible grasp. This is consistent with the observation by Wu et al. that their method struggles to converge to feasible solutions when is infeasible, which requires an expensive IK pre-solve [28]. In contrast, FRoGGeR retains superior speed and feasibility rate even with a coarse IK procedure, which suggests that our formulation is also robust to poor candidate grasps. In particular, only 5 runs (all on one object) timed out using our method, whereas over half of the runs timed out for the baseline.
One explanation for this gap is that the gradient of the baseline's force closure equality constraint vanishes at force closure, which yields a constraint geometry that is difficult to satisfy. Since we do not demand that satisfies (7d), we observed that the optimization often terminated unsuccessfully, having satisfied only one of the equality constraints. In contrast, our constraint (7c) has non-zero gradients even in force closure, which we conjecture is better-posed numerically, and in particular, allows FRoGGeR to converge for a larger set of candidate grasps than the baseline.
We remark that our reported baseline pick success values are significantly lower than those reported by Wu et al. [28], which we attribute to adding shaking to the pick trajectory. When these perturbations were smaller or nonexistent, we typically observed much higher baseline pick success rates, which supports our hypothesis that enforcing only non-robust force closure yields grasps that are brittle in practice.
One of the limitations of our method is that the metric often prefers grasps where the fingertips lie on edges or corners, since these regions are typically farther from the center of the object (roughly where we place the object frame), yielding larger moment arms. Moreover, these regions allow a grasp to direct forces in “non-robust” directions with little change, drawing solutions to them. This yields unstable grasps in practice, since small deviations in the positions of the fingertips produce large changes in the contact conditions, which commonly occurs in dynamic scenarios.
Edge-seeking behavior was the most common failure mode of both methods, which is reflected by the poor performance on many adversarial objects with less low-curvature area on which to grasp. This behavior was also observed on objects in the box/cyl category, which explains the worse performance compared to spheroids. However, failures often occurred for the baseline even when no fingers were placed on edges.
Finally, we find that the overall performance of both FRoGGeR and the baseline was highly sensitive to the sampled initial conditions. For instance, if the initial width of the fingertips was not guided by object bounding boxes, both methods suffered in terms of runtime and grasp quality, as enforcing surface constraints became harder. This motivates the use of data-driven methods in identifying candidate grasps that may be synergistic with the refinement process.
V Conclusion and Future Work
We presented FRoGGeR, a fast method for generating robust precision grasps using the min-weight metric, a simple, almost-everywhere differentiable approximation of the classical epsilon metric. We have demonstrated that the min-weight metric is empirically correlated with the epsilon metric, and validated through simulation that using it as an optimization objective yields grasps that are more robust to dynamic perturbations than a baseline that only enforces a (non-robust) force closure constraint. Further, we have shown that both the solve time and the feasibility rate of FRoGGeR are superior to those of the baseline.
In the future, we hope to develop methods to combat edge-seeking behavior as well as to generalize FRoGGeR to allow non-precision grasps and a non-fixed number of contact points. Finally, we hope to explore better object representations that do not require online mesh construction or analytical SDFs to be provided beforehand.
-A Proof of Proposition 1
By direct computation, we have
[TABLE]
For brevity, let and unless otherwise stated, let functions be evaluated at the optimal primal/dual solution . Let
[TABLE]
where for convenience we denote
[TABLE]
Proof.
We have
[TABLE]
Observe that
[TABLE]
where the last equality follows because
[TABLE]
by complementary slackness. Letting and ,
[TABLE]
where we note that . By substituting (17) into (LABEL:eqn:kkt_grad_sys), the result immediately follows. ∎
We remark that if is locally Lipschitz with respect to the constraint matrix parameters and , it is differentiable everywhere except on a set of measure 0 by Rademacher's theorem (see [dewolf2021_lp_subgrad] for discussion). Further, the gradient is defined when (2) has unique primal/dual optima and in this case, is defined so is computable [dewolf2021_lp_subgrad, Prop. 4.1]. The Lipschitz condition can always be satisfied by removing degenerate constraints, so we assume it.
-B Controller Implementation Details
This section explains the structure of the controller used for the pickup task. Recall that our controller is of the form
[TABLE]
We first explain computation of . We have that
[TABLE]
where the first term is a gravity compensation torque computed using partial inverse dynamics (i.e., ignoring the inertial/Coriolis dynamical terms and assuming quasi-static operation) and the second term is a tracking term with independent components for the arm and the hand. We have
[TABLE]
The arm tracking torques are computed as
[TABLE]
with the gains set to
[TABLE]
The desired values and are computed via a differential inverse kinematics controller implemented in Drake that converts a desired end-effector pose trajectory specified in Cartesian space to joint angles and velocities that can be tracked. We defer those details to [tedrake_manipulationnotes, Ch. 3.10].
The hand tracking torques are simply given by the following proportional controller:
[TABLE]
where is the component of the refined configuration corresponding to the hand states and . We project all of these torques via the left multiplication of to the null space of , which ensures that applying them does not change the contact positions between the hand and object. We note that this projection does not affect the arm torques at all.
We now explain the computation of the optimal contact forces , which is formulated as the solution to the following quadratic program:
[TABLE]
We note that is the contact force in the concatenation of all contact forces .
The QP objective produces applied wrenches on the object as close as possible to counteracting some desired wrench whose force and torque components are expressed in the object frame. The first constraint represents the pyramidal friction cone constraints (e.g., [28]). As in [28], the second enforces a minimum normal force which we specify as and we additionally specify that if the object weighs under , we set . The final two constraints enforce torque limits by ensuring the total applied joint torques from both controller terms in (18) respect the desired limits.
The desired external wrench is computed as follows:
[TABLE]
where is the gravitational wrench expressed in the robot base frame. is an error wrench computed from the measured error in the object’s desired pose. Suppose the desired object pose in the world frame is specified as the tuple where we suppress the frame notation for brevity. We convert errors in the pose into a wrench using the formulation provided in [lee2010_se3error]:
[TABLE]
where
[TABLE]
is the map sending elements of the Lie algebra to (see [lee2010_se3error]), and is the angular velocity of the object computed using numerical differentiation of its orientation. We choose the gains
[TABLE]
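To make the orientation part of the pose error more concrete, the following sketch computes a rotation error of the form $\tfrac{1}{2}(R_d^\top R - R^\top R_d)^\vee$, a form common in geometric tracking control, where $(\cdot)^\vee$ maps a skew-symmetric matrix to a 3-vector. Whether the controller uses exactly this sign convention and scaling is an assumption on our part, and all function names below are hypothetical.

```python
import math


def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(A):
    return [list(row) for row in zip(*A)]


def vee(S):
    """Map a 3x3 skew-symmetric matrix to the corresponding 3-vector."""
    return (S[2][1], S[0][2], S[1][0])


def rotation_error(R, R_des):
    """Assumed error form: 0.5 * vee(R_des^T R - R^T R_des)."""
    A = matmul(transpose(R_des), R)
    At = transpose(A)
    S = [[0.5 * (A[i][j] - At[i][j]) for j in range(3)] for i in range(3)]
    return vee(S)


def rot_z(angle):
    """Rotation by `angle` radians about the z-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
err_small = rotation_error(rot_z(0.1), identity)
err_zero = rotation_error(rot_z(0.7), rot_z(0.7))
print(err_small, err_zero)
```

For a pure rotation about a single axis, this error is the sine of the angle offset along that axis, so it behaves like the angle itself for small errors and vanishes when the measured and desired orientations coincide.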
-C Heuristic Sampler Implementation Details
We first fix a convention for the axes of the palm of the hand. Let the -axis be the outward palm normal and the -axis be the corresponding axis that points in the direction of the fingers of the hand (for non-anthropomorphic hands, this choice may be arbitrary). The -axis is then chosen consistently with the right hand rule.
The heuristic sampler consists of the following steps: (1) from the oriented bounding box of the object (which can be computed approximately very quickly using open3d), choose an axis with which to align the palm’s -axis up to sign and use the width of this box edge to fix an initial guess for the separation of the hand’s fingers; (2) of the two remaining axes, choose one with which to align the palm’s -axis; (3) add rotational noise drawn from the von Mises distribution on the 2-sphere to randomly perturb the palm frame; (4) compute a desired location of the palm frame with respect to the object by placing it roughly from the surface of the object, which is approximated as its bounding box; (5) using the constraints on the palm frame, solve an inverse kinematics problem to recover .
The probability of choosing a given bounding box axis for alignment is proportional to its length. For instance, if the bounding box has side lengths $(\ell_1, \ell_2, \ell_3)$, then the probability of choosing the first axis is $\ell_1 / (\ell_1 + \ell_2 + \ell_3)$. For very short objects, we only accepted a palm frame whose -axis approached the object from above to avoid heavy collisions with the tabletop.
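The length-proportional axis selection in steps (1) and (2) can be sketched as follows. This is a minimal illustration rather than the released sampler: the function names and example side lengths are hypothetical, and reusing the same length-proportional rule for the second axis is our assumption.

```python
import random


def axis_probabilities(side_lengths):
    """Probability of picking each bounding-box axis, proportional to its length."""
    total = sum(side_lengths)
    return [length / total for length in side_lengths]


def sample_palm_axes(side_lengths, rng):
    """Pick two distinct bounding-box axes for aligning the two palm axes."""
    axes = list(range(len(side_lengths)))
    # Step (1): first alignment axis, weighted by side length.
    first = rng.choices(axes, weights=side_lengths, k=1)[0]
    # Step (2): one of the remaining axes (assumed to use the same weighting).
    remaining = [a for a in axes if a != first]
    weights = [side_lengths[a] for a in remaining]
    second = rng.choices(remaining, weights=weights, k=1)[0]
    return first, second


rng = random.Random(0)           # seeded for reproducibility
side_lengths = (0.1, 0.2, 0.3)   # hypothetical box dimensions
probs = axis_probabilities(side_lengths)

counts = [0, 0, 0]
num_samples = 10000
for _ in range(num_samples):
    first_axis, second_axis = sample_palm_axes(side_lengths, rng)
    counts[first_axis] += 1
frequencies = [c / num_samples for c in counts]
print(probs, frequencies)
```

With these example lengths the longest side is selected first about half of the time, which biases the sampled palm frames toward spanning the object's longest extent.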
-D Additional Dataset Processing Details
Out of non-excluded objects, we ranked the quality of the provided data in order of (i) Google 16k mesh, (ii) Poisson reconstruction, and (iii) TSDF file. For instance, if the 16k mesh was available, we would always prefer to use that as the initial mesh for processing before the Poisson reconstructed mesh. The excluded objects and the exact reasons for their exclusion are listed in Table II.
While flat utensils are generally too flat to be picked up by an Allegro hand from a flat table, we did replace the excluded mug with a teacup from the ShapeNetSem dataset [savva2015_shapenetsem] with ID 23fb2a2231263e261a9ac99425d3b306 and scaled by a factor of 0.00038748778493825193. This cup was added to the adversarial category.
-E Distribution of Normalized Min-Weight Values
The histogram of normalized min-weight values corresponding to successful and failed grasps optimized using FRoGGeR is shown here.
-F Other Experimental Parameters
For all experiments, we used a 4-sided pyramidal approximation of the friction cone. We enforced a minimum safety margin of between every collision geometry pair that was not a fingertip/object pair. For fingertip/object pairs, we allowed interpenetration up to .
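As an illustration of a 4-sided pyramidal friction cone approximation, the sketch below builds the four edge vectors of a pyramid inscribed in the Coulomb cone and tests membership, with forces expressed in a contact frame as (tangent 1, tangent 2, normal). The inscribed construction, the function names, and the friction coefficient value are assumptions made for the example and are not taken from the experiments.

```python
import math


def pyramid_edges(mu, num_sides=4):
    """Edge vectors of a pyramid inscribed in the friction cone (contact frame)."""
    edges = []
    for k in range(num_sides):
        theta = 2.0 * math.pi * k / num_sides
        edges.append((mu * math.cos(theta), mu * math.sin(theta), 1.0))
    return edges


def combine(edges, weights):
    """Conic combination of edge vectors; nonnegative weights stay in the pyramid."""
    return tuple(sum(w * e[i] for w, e in zip(weights, edges)) for i in range(3))


def in_four_sided_pyramid(force, mu, tol=1e-12):
    """Membership test for the inscribed 4-sided pyramid: |f_t1| + |f_t2| <= mu * f_n."""
    ft1, ft2, fn = force
    return abs(ft1) + abs(ft2) <= mu * fn + tol


def in_coulomb_cone(force, mu, tol=1e-12):
    """Membership test for the exact (quadratic) Coulomb friction cone."""
    ft1, ft2, fn = force
    return math.hypot(ft1, ft2) <= mu * fn + tol


mu = 0.5  # hypothetical value for illustration only
edges = pyramid_edges(mu)
force_inside = combine(edges, [1.0, 0.5, 0.0, 2.0])
force_between = (0.3, 0.3, 1.0)  # inside the true cone, outside the pyramid
print(in_four_sided_pyramid(force_inside, mu))
print(in_coulomb_cone(force_between, mu), in_four_sided_pyramid(force_between, mu))
```

Because an inscribed pyramid is a subset of the true cone, any force it admits also satisfies Coulomb friction, at the cost of rejecting some feasible forces that lie between the pyramid edges.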
We selected a specific desired point of contact on each fingertip such that the forward kinematics were fixed. This point was located on each fingertip at an angle of tilted towards the palm, measured from the very tip of each finger.
We supplied the following constraint tolerances to the optimization solver:
We note that the force closure constraint refers to the robustness constraint for our method and the QP equality constraint for the baseline. In the original implementation of the method of [28] (obtained through private correspondence), the authors used a tolerance of 1e-7. In practice, we had to loosen this slightly to obtain a reasonable feasibility rate for their method.
Finally, we use the default rigid body contact model implemented in Drake for all of our simulations, the details of which we defer to the software documentation [34].
-G Acknowledgments
We thank Victor Dorobantu for useful discussions involving our proposed Hessian approximation. We thank Ivan D. J. Rodriguez for help with setting up experiments. We thank Wu et al. for their thoughtful correspondence concerning their work. We thank Philipp Wu for constructive feedback and comments. Finally, we thank all developers and maintainers of the open-source software that made this work possible (not cited in the main text but used either directly or indirectly: JAX, trimesh, QuantEcon, Numba, FCL).
References
- [1] A. T. Miller and P. K. Allen. Graspit! A versatile simulator for robotic grasping. IEEE Robotics and Automation Magazine, 11(4):110–122, 2004.
- [2] D. G. Kirkpatrick, B. Mishra, and C. K. Yap. Quantitative Steinitz’s theorems with applications to multifingered grasping. In Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing, STOC ’90, pages 341–351, New York, NY, USA, 1990. Association for Computing Machinery.
- [3] C. Ferrari and J. Canny. Planning optimal grasps. In Proceedings 1992 IEEE International Conference on Robotics and Automation, pages 2290–2295 vol. 3, 1992.
- [4] Elon Rimon and Joel Burdick. The Mechanics of Robot Grasping. Cambridge University Press, 2019.
- [5] Daniel Kappler, Jeannette Bohg, and Stefan Schaal. Leveraging big data for grasp planning. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 4304–4311, 2015.
- [6] Laura Downs, Anthony Francis, Nate Koenig, Brandon Kinman, Ryan Hickman, Krista Reymann, Thomas B. McHugh, and Vincent Vanhoucke. Google Scanned Objects: A High-Quality Dataset of 3D Scanned Household Items. In 2022 International Conference on Robotics and Automation (ICRA), pages 2553–2560, 2022.
- [7] Berk Calli, Arjun Singh, Aaron Walsman, Siddhartha Srinivasa, Pieter Abbeel, and Aaron M. Dollar. The YCB Object and Model Set: Towards Common Benchmarks for Manipulation Research. In 2015 International Conference on Advanced Robotics (ICAR), pages 510–517, July 2015.
- [8] Rhys Newbury, Morris Gu, Lachlan Chumbley, Arsalan Mousavian, Clemens Eppner, Jürgen Leitner, Jeannette Bohg, Antonio Morales, Tamim Asfour, Danica Kragic, Dieter Fox, and Akansel Cosgun. Deep Learning Approaches to Grasp Synthesis: A Review, 2022.
