Fast and Agile Vision-Based Flight with Teleoperation and Collision Avoidance on a Multirotor
Alex Spitzer, Xuning Yang, John Yao, Aditya Dhawale, Kshitij Goel, Mosam Dabhi, Matt Collins, Curtis Boirum, and Nathan Michael

TL;DR
This paper introduces a multirotor system capable of fast, safe, and agile autonomous and teleoperated flight in GPS-denied environments, utilizing visual-inertial navigation and real-time collision avoidance.
Contribution
It presents a novel integrated system combining visual-inertial state estimation, collision-free trajectory planning, and teleoperation for high-speed multirotor flight in unstructured environments.
Findings
Autonomous flight speeds exceeded 12 m/s.
System enables safe teleoperation at 10 m/s.
Validated in outdoor field experiments.
Abstract
We present a multirotor architecture capable of aggressive autonomous flight and collision-free teleoperation in unstructured, GPS-denied environments. The proposed system enables aggressive and safe autonomous flight around clutter by integrating recent advancements in visual-inertial state estimation and teleoperation. Our teleoperation framework maps user inputs onto smooth and dynamically feasible motion primitives. Collision-free trajectories are ensured by querying a locally consistent map that is incrementally constructed from forward-facing depth observations. Our system enables a non-expert operator to safely navigate a multirotor around obstacles at speeds of 10 m/s. We achieve autonomous flights at speeds exceeding 12 m/s and accelerations exceeding 12 m/s^2 in a series of outdoor field experiments that validate our approach.
| Frame Type | Definition | Initialization Condition for Frame |
| KF (Anchor) | Sensor frames to which subsequent SFs and BFs are registered | |
| SF | Sensor frames that provide novel information about the vehicle surroundings | |
| BF | Sensor frames that are registered for one time step to accommodate the dynamic changes | |
| Experiment | Description | Duration (s) | Max. speed (m/s) | Max. yaw rate (rad/s) | Collision radius (m) |
| Outdoor-1 | High speed teleoperation outside | N/A | |||
| Outdoor-2,3 | Aggressive teleoperation with collision avoidance outside | ||||
| Indoor-1,2 | Aggressive teleoperation with collision avoidance in a dimly lit garage | ||||
| Outdoor-4 | High speed, aggressive teleoperation with collision avoidance outside | ||||
| Outdoor-5 | High speed, aggressive teleoperation with collision avoidance outside | ||||
The columns give, in order, the duration of the motion primitive, the maximum desired speed, the maximum yaw rate, and the collision radius.
∗Motion primitive duration increased adaptively as a linear function of the desired velocity change.
| Experiment | Trajectory Generation | Trajectory Pruning | Local Map Generation |
| Outdoor-2 | |||
| Outdoor-3 | |||
| Indoor-1 | |||
| Indoor-2 |
| Experiment | Safe Teleoperation | VINS Mono | Total |
| Outdoor-2 | |||
| Outdoor-3 | |||
| Indoor-1 | |||
| Indoor-2 |
Authors are with the Robotics Institute at Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh PA, 15213. {aspitzer,xuningy,johnyao,adityand,kgoel1,mdabhi,mcollin1,cboirum,nmichael}@andrew.cmu.edu
∗ Alex Spitzer and Xuning Yang contributed equally.
1 Introduction
Autonomous aerial vehicles with onboard vision-based sensing and planning have been shown to be capable of performing fast and agile maneuvers. However, long-horizon planning required to achieve a task has proven to be a challenge, particularly with limited onboard compute. We propose a fully integrated vision-based multirotor platform that allows minimally trained operators to teleoperate the vehicle at high speeds with agility, while remaining safe around obstacles in unstructured outdoor environments. At high speeds, the environment around the vehicle changes quickly, and is subject to dynamic obstacles and lighting conditions. Our multirotor architecture integrates the following to achieve agile and collision-free flight in these scenarios: 1) an extension of motion primitive-based teleoperation Yang et al. (2018) to generate jerk-continuous local trajectories, a crucial component to prevent instability in agile flights, and 2) efficient local mapping for collision avoidance purposes using a KD-tree.
Many prior works in high speed flight exploit the field-of-view (FOV) of stereo cameras for fast collision checking for autonomous flights, including Florence et al. (2016, 2018), where trajectories are constrained to be inside the FOV of the depth sensor with a max range of m. In Matthies et al. (2014), trajectories generated by RRT∗ are checked for collisions directly in the disparity space. Lopez and How (2017) present aggressive flight on a kg MAV, achieving a velocity of m/s. These methods achieve fast collision checking by circumventing the need to construct a local map and checking for collisions in the sensor’s FOV. This, however, limits the range of motions the vehicle can perform. Approaches with local map generation using a laser range finder Chen et al. (2016) and a monocular RGB camera Dey et al. (2016) have been shown to achieve maximum velocities of m/s and m/s respectively, but data processing limits the update rate of the local maps. In our approach, we give a minimally trained operator full control of the vehicle and show that fast and agile flights can be achieved with a human-in-the-loop while maintaining safety.
We perform a series of high speed collision avoidance trials in both indoor and outdoor environments with untrained operators. In our experiments, our hexarotor attains speeds exceeding m/s and accelerations exceeding m/s2. We are able to safely avoid obstacles at speeds up to m/s and accelerations of m/s2, while retaining a local map.
2 Technical Approach
2.1 Smooth Motion Primitive-Based Teleoperation
Aggressive multirotor flights demand large angular velocities and large angular accelerations, which are directly related to the jerk and snap of the reference position Mellinger and Kumar (2011). Thus, to avoid incurring large tracking error due to discontinuous trajectories, we extend forward-arc motion primitives Yang et al. (2018) to generate trajectories that retain differentiability up to jerk and continuity up to snap. From the resulting trajectories, we can calculate the desired vehicle attitude, angular velocity, and angular acceleration for use as feedforward terms in the controller.
The motion primitives are parameterized as follows. We define a local frame to be a fixed, $z$-axis-aligned frame, taken at a snapshot in time. The motion primitive definition will be provided in the local frame at the time at which an input is issued, and can be freely transformed into a fixed global frame or body frame for control purposes. An action specified by a continuous external input, such as a joystick, generates a single motion primitive. For a multirotor, we define an action as $\mathbf{a}=\{v_{x},v_{z},\omega\}$ in the local frame at which the input is issued, where $v_{x}$ is the $x$-velocity (i.e., the forward velocity of the vehicle), $v_{z}$ is the $z$-velocity, and $\omega$ is the yaw rate. Let $\mathbf{x}$ denote a vector containing the position and yaw of the vehicle. Then, the endpoint velocities are defined by the unicycle model Pivtoraiko et al. (2009). The unicycle model dynamics are given by $\dot{\mathbf{x}}(\mathbf{a},\tau)=[v_{x}\cos(\omega\tau)\quad v_{x}\sin(\omega\tau)\quad v_{z}\quad\omega]^{\top}$, where $\tau$ is the time elapsed along the motion primitive. The position endpoints are unconstrained and we enforce all higher order derivatives above velocity to be zero.
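As a rough illustration of the unicycle model above, the following Python sketch evaluates the velocity vector for a given action and elapsed time. The function name `unicycle_velocity` and the tuple layout of the action are assumptions made for this example; they are not taken from the original system.

```python
import math


def unicycle_velocity(action, tau):
    """Velocity (x_dot, y_dot, z_dot, yaw_rate) in the local frame at time tau.

    `action` is assumed to be a tuple (v_x, v_z, omega): forward speed,
    vertical speed, and yaw rate, matching the unicycle dynamics in the text.
    """
    v_x, v_z, omega = action
    return (
        v_x * math.cos(omega * tau),  # forward speed projected onto local x
        v_x * math.sin(omega * tau),  # forward speed projected onto local y
        v_z,                          # vertical speed is passed through
        omega,                        # yaw rate is constant over the primitive
    )


# Hypothetical action: 2 m/s forward, 0.5 m/s up, no turning.
print(unicycle_velocity((2.0, 0.5, 0.0), 1.0))
```

With a zero yaw rate the velocity stays along the local x-axis, while a nonzero yaw rate sweeps the heading along a forward arc at constant planar speed.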
A regeneration step occurs when a new input is received from the joystick, or the previous trajectory finishes executing. Alternatively, a fixed regeneration rate can be chosen in order to accommodate changes in the environment for collision avoidance. At each regeneration step, a library of dynamically feasible motion primitives is generated in the local frame specified by the reference state at that time, given a set of discretized actions. Each motion primitive is a vector of four polynomials of equal order that specify the trajectory along the three position coordinates and the yaw coordinate. Given an action at a regeneration step, each motion primitive in the library is generated in the local frame according to
[TABLE]
where specifies the time derivative. Note that all constraints are appropriately transformed into the local frame. The duration of the trajectory and the maximum $x$, $z$, and yaw velocities are specified according to the desired operational range. We further allow the operator to freely rotate the motion primitive library about the local frame's $z$-axis to allow for sideways slalom motions.
Snap-continuous trajectories (see Fig. 1) ensure smooth error dynamics, thus minimizing instabilities and tracking error.
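The boundary conditions described in this section can be sketched for a single coordinate as follows. The sketch assumes that the start state is fixed from position through snap (five constraints) and that the end state fixes velocity and sets acceleration, jerk, and snap to zero (four constraints), with the end position left free. Counting nine constraints suggests a degree-8 polynomial; that degree is an inference made for this example, and the helper names `deriv_row`, `solve`, `primitive_1d`, and `evaluate` are hypothetical.

```python
from fractions import Fraction
from math import factorial

DEGREE = 8  # assumed: 9 boundary constraints determine 9 coefficients


def deriv_row(order, t, degree=DEGREE):
    """Factors multiplying c_0..c_degree in the order-th derivative at time t."""
    t = Fraction(t)
    return [
        Fraction(factorial(i), factorial(i - order)) * t ** (i - order)
        if i >= order else Fraction(0)
        for i in range(degree + 1)
    ]


def solve(A, b):
    """Solve A c = b by Gauss-Jordan elimination with exact fractions."""
    n = len(b)
    M = [list(A[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def primitive_1d(start, v_end, duration):
    """Coefficients of one coordinate of a motion primitive.

    start: (position, velocity, acceleration, jerk, snap) at t = 0.
    v_end: desired end velocity, e.g. from the unicycle model.
    The end acceleration, jerk, and snap are zero; the end position is free.
    """
    A = [deriv_row(d, 0) for d in range(5)]
    A += [deriv_row(d, duration) for d in range(1, 5)]
    b = [Fraction(s) for s in start]
    b += [Fraction(v_end), Fraction(0), Fraction(0), Fraction(0)]
    return solve(A, b)


def evaluate(coeffs, order, t):
    """Value of the order-th derivative of the polynomial at time t."""
    row = deriv_row(order, t, len(coeffs) - 1)
    return sum(c * r for c, r in zip(coeffs, row))


# Hypothetical case: start at rest-like state moving at 1 m/s, reach 3 m/s in 2 s.
coeffs = primitive_1d((0, 1, 0, 0, 0), 3, 2)
print(float(evaluate(coeffs, 0, 2)))  # end position follows from the constraints
```

Because each new primitive starts from the full reference state of the previous one, chaining primitives built this way keeps the reference continuous up to snap across regeneration steps.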
2.2 Pruning & Trajectory Selection
At every time step, a family of motion primitives, called the motion primitive library, is created. The motion primitive library is constructed by discretizing the continuous input along each action dimension, such that each action is selected from a convex set with size where is the dimension of the space of each input. An example of the discretization is shown in Fig. 2.
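One way to picture the discretization is a regular grid over the three action dimensions. The sketch below assumes the same number of samples per dimension, a forward velocity range that starts at zero, and symmetric ranges for vertical velocity and yaw rate; the names `linspace` and `action_library` and all numeric values are illustrative assumptions rather than the parameters used in the experiments.

```python
from itertools import product


def linspace(lo, hi, n):
    """n evenly spaced samples from lo to hi, inclusive."""
    if n == 1:
        return [0.5 * (lo + hi)]
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def action_library(v_x_max, v_z_max, omega_max, n=5):
    """All (v_x, v_z, omega) combinations on an n x n x n grid."""
    v_xs = linspace(0.0, v_x_max, n)             # forward speed only (assumed)
    v_zs = linspace(-v_z_max, v_z_max, n)        # climb and descend
    omegas = linspace(-omega_max, omega_max, n)  # turn right and left
    return list(product(v_xs, v_zs, omegas))


# Hypothetical limits: 10 m/s forward, 2 m/s vertical, 1 rad/s yaw rate.
library = action_library(10.0, 2.0, 1.0, n=3)
print(len(library))
```

A grid with n samples in each of d dimensions yields n to the power d actions, so the sample count trades library coverage against the cost of generating and checking each primitive.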
At every time step, the operator input is mapped to the closest input in the action space, as defined by the Euclidean norm. A priority queue ordered by the distance from the operator input to each input in the action space is used to iterate through the action space until a feasible, collision-free trajectory is found. As a result, the operator input is mapped to the feasible motion primitive in the library that is parameterized by the closest discretized action. A trajectory is deemed collision-free if the minimum distance between any point along the trajectory and the surrounding environment is above the sum of the vehicle size and an operator-specified collision radius. Algorithm 1 describes teleoperation with reactive collision avoidance in detail.
The effect of this pruning algorithm is that the vehicle exhibits natural behavior in the presence of obstacles. If a pillar is in front of the vehicle, then the vehicle chooses a motion primitive some angle away and avoids the obstacle. If a wall is present, then the vehicle will choose linear velocities that gradually decrease until the vehicle is stopped.
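The selection and pruning steps in this section can be sketched as follows. This is a minimal illustration under several assumptions: trajectories are approximated by sampled forward arcs instead of the polynomial primitives, the environment is a plain list of obstacle points, and the clearance check is brute force where the actual system would query its local map. The names `arc_rollout`, `make_checker`, and `select_action` are hypothetical.

```python
import heapq
import math


def arc_rollout(action, duration=2.0, steps=20):
    """Sample positions along a forward arc for action (v_x, v_z, omega)."""
    v_x, v_z, omega = action
    points = []
    for k in range(steps + 1):
        t = duration * k / steps
        if abs(omega) < 1e-9:  # straight-line limit of the arc
            x, y = v_x * t, 0.0
        else:
            x = v_x / omega * math.sin(omega * t)
            y = v_x / omega * (1.0 - math.cos(omega * t))
        points.append((x, y, v_z * t))
    return points


def make_checker(obstacles, vehicle_radius, collision_radius):
    """Build a predicate that is True when an action's rollout keeps clearance."""
    def is_collision_free(action):
        if not obstacles:
            return True
        clearance = min(
            math.dist(p, o) for p in arc_rollout(action) for o in obstacles
        )
        return clearance > vehicle_radius + collision_radius
    return is_collision_free


def select_action(operator_input, actions, is_collision_free):
    """Return the collision-free action closest to the operator input, or None."""
    queue = [(math.dist(operator_input, a), i, a) for i, a in enumerate(actions)]
    heapq.heapify(queue)  # the index i breaks ties between equal distances
    while queue:
        _, _, action = heapq.heappop(queue)
        if is_collision_free(action):
            return action
    return None


# Hypothetical scene: a pillar 3 m straight ahead of the vehicle.
actions = [(2.0, 0.0, 0.0), (2.0, 0.0, 0.8), (2.0, 0.0, -0.8), (0.0, 0.0, 0.0)]
checker = make_checker([(3.0, 0.0, 0.0)], vehicle_radius=0.5, collision_radius=0.3)
print(select_action((2.0, 0.0, 0.0), actions, checker))
```

In this scene the straight-ahead action is pruned because it passes through the pillar, and the nearest remaining action is a turn, which mirrors the pillar behavior described above.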
2.3 Local Map Representation using KD-Trees
We present a local mapping framework that generates a spatially consistent local map of the robot surroundings represented as voxel grids. The local map is generated by retaining only the depth sensor measurements obtained at poses that lie in the vicinity of the vehicle’s current pose. This enables trajectories to span the space observed by all of the retained sensor measurements. Sequential sensor measurements obtained using a stereo imaging sensor often contain redundant information about the surroundings of the robot. The novelty of information in an incoming sensor measurement at the resolution of the voxel grids is dictated by the rotational and translational displacement of the robot between the current frame and the previous frames. To enforce spatial locality, we dynamically select anchor frames and transform subsequent sensor measurements that contain novel information about the surroundings into the anchor frames. In order to efficiently incorporate only novel information in the local map, we classify each incoming sensor measurement into one of the following categories: a KeyFrame (KF), a SubFrame (SF), or a BufferFrame (BF) (see the frame classification table). The local map (L) is updated in a locally consistent coordinate frame according to the type of sensor frame.
A sensor measurement that is more than meters away from the current KF is classified as a KF. Sensor measurements that are not new KFs, but are either more than meters in position or degrees in heading away from the previous SF, are classified as SFs. Sensor measurements that are neither KFs nor SFs are classified as BFs, which do not contain sufficient novel information about the surroundings, but are registered to L to account for dynamic changes in the scene. BFs are in L only for the iterations in which they are observed.
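The classification rule above can be sketched as a small function. The threshold values are not recoverable from the text here, so they are left as parameters, and the numbers in the usage example are placeholders chosen only for illustration. The `Pose` type and the function names are likewise assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    x: float
    y: float
    z: float
    yaw_deg: float


def translation(a, b):
    """Euclidean distance between two pose positions, in meters."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


def heading_change(a, b):
    """Absolute yaw difference in degrees, wrapped to the range [0, 180]."""
    return abs((a.yaw_deg - b.yaw_deg + 180.0) % 360.0 - 180.0)


def classify_frame(pose, current_kf, previous_sf, kf_dist, sf_dist, sf_yaw_deg):
    """Label an incoming sensor frame as 'KF', 'SF', or 'BF'."""
    if translation(pose, current_kf) > kf_dist:
        return "KF"  # far from the anchor: start a new KeyFrame
    if (translation(pose, previous_sf) > sf_dist
            or heading_change(pose, previous_sf) > sf_yaw_deg):
        return "SF"  # enough motion since the last SubFrame: novel information
    return "BF"      # little novelty: registered for the current iteration only


# Placeholder thresholds (not the paper's values): 3 m, 0.5 m, 15 degrees.
anchor = Pose(0.0, 0.0, 0.0, 0.0)
last_sf = Pose(0.9, 0.0, 0.0, 0.0)
print(classify_frame(Pose(1.0, 0.0, 0.0, 30.0), anchor, last_sf, 3.0, 0.5, 15.0))
```

Checking the KeyFrame condition first matches the order in the text, where a measurement is considered for SubFrame status only if it is not a new KeyFrame.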
References
- Chen, J., Liu, T., Shen, S.: Online generation of collision-free trajectories for quadrotor flight in unknown cluttered environments. In: Proc. of the IEEE Intl. Conf. on Robot. and Autom. Stockholm, Sweden (2016)
- Delmerico, J., Scaramuzza, D.: A benchmark comparison of monocular visual-inertial odometry algorithms for flying robots. In: Proc. of the IEEE Intl. Conf. on Robot. and Autom. Brisbane, Australia (2018)
- Dey, D., Shankar, K.S., Zeng, S., Mehta, R., Agcayazi, M.T., Eriksen, C., Daftry, S., Hebert, M., Bagnell, J.A.: Vision and learning for deliberative monocular cluttered flight. In: Field and Service Robotics, pp. 391–409. Springer (2016)
- Florence, P., Carter, J., Tedrake, R.: Integrated perception and control at high speed: Evaluating collision avoidance maneuvers without maps. In: Workshop on the Algo. Found. of Robot. San Francisco, USA (2016)
- Florence, P.R., Carter, J., Ware, J., Tedrake, R.: Nanomap: Fast, uncertainty-aware proximity queries with lazy search over local 3d data. In: Proc. of the IEEE Intl. Conf. on Robot. and Autom. Brisbane, Australia (2018)
- Liu, S., Watterson, M., Tang, S., Kumar, V.: High speed navigation for quadrotors with limited onboard sensing. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 1484–1491 (2016). DOI 10.1109/ICRA.2016.7487284
- Lopez, B.T., How, J.P.: Aggressive 3-d collision avoidance for high-speed navigation. In: Proc. of the IEEE Intl. Conf. on Robot. and Autom., pp. 5759–5765. Singapore (2017)
- Matthies, L., Brockers, R., Kuwata, Y., Weiss, S.: Stereo vision-based obstacle avoidance for micro air vehicles using disparity space. In: Proc. of the IEEE Intl. Conf. on Robot. and Autom., pp. 3242–3249. Hong Kong, China (2014)
