An Equivariant Observer Design for Visual Localisation and Mapping
Pieter van Goor, Robert Mahony, Tarek Hamel, Jochen Trumpf

TL;DR
This paper introduces a novel equivariant observer framework for visual SLAM that unifies pose and landmark estimation, ensuring stability and robustness in non-linear visual localization tasks.
Contribution
It formulates visual SLAM as a continuous-time equivariant observer on a symmetry group, providing a new stability-guaranteed approach for pose and landmark estimation.
Findings
Observer error system is almost globally asymptotically stable.
Exponential stability in the large is achieved.
Decoupled Riccati-gains improve landmark estimation stability.
Abstract
This paper builds on recent work on Simultaneous Localisation and Mapping (SLAM) in the non-linear observer community, by framing the visual localisation and mapping problem as a continuous-time equivariant observer design problem on the symmetry group of a kinematic system. The state-space is a quotient of the robot pose expressed on SE(3) and multiple copies of real projective space, used to represent both points in space and bearings in a single unified framework. An observer with decoupled Riccati-gains for each landmark is derived and we show that its error system is almost globally asymptotically stable and exponentially stable in-the-large.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
A Geometric Observer Design for Visual Localisation and Mapping
Department of Electrical, Energy and Materials Engineering
Australian National University
ACT, 2601, Australia
Department of Electrical, Energy and Materials Engineering
Australian National University
ACT, 2601, Australia
I3S (University Côte d’Azur, CNRS, Sophia Antipolis)
and Insitut Universitaire de France
Department of Electrical, Energy and Materials Engineering
Australian National University
ACT, 2601, Australia
Abstract
This paper builds on recent work on Simultaneous Localisation and Mapping (SLAM) in the non-linear observer community, by framing the visual localisation and mapping problem as a continuous-time equivariant observer design problem on the symmetry group of a kinematic system. The state-space is a quotient of the robot pose expressed on and multiple copies of real projective space, used to represent both points in space and bearings in a single unified framework. An observer with decoupled Riccati-gains for each landmark is derived and we show that its error system is almost globally asymptotically stable and exponentially stable in-the-large.
1 Introduction
Simultaneous Localisation and Mapping (SLAM) is a well-known problem in mobile robotics and has been an active area of research for the last 30 years [9]. Visual localisation and mapping refers to the particular case of the SLAM problem where the only exteroceptive sensors available are cameras. The visual localisation and mapping problem, and particularly the case where only a single monocular camera is available, continues to be of substantial interest due to the low cost and low weight, as well as the ubiquity of single camera systems [9]. While visual localisation and mapping is an established research topic with a rich history [7], it remains an active research topic, especially in the area of low-cost light-weight embedded systems [8]. State-of-the-art filters and observers approach the SLAM problem through linearisation, and do not deal well with poor initial estimation or choice of linearisation point [7]. Additionally, these methods suffer from high computational complexity and poor scalability [9, 19].
Both the SLAM and visual localisation and mapping problems have attracted interest recently in the non-linear observer community. Approaches to these problems have emerged from earlier work on attitude estimation [17, 5] and pose estimation [2, 20, 14]. Bonnabel et al. [3] exploited a novel Lie group to design an invariant Kalman Filter for the SLAM problem. Parallel work by Mahony et al. [18] developed the same Lie group and proposed a quotient manifold structure for the state-space of the SLAM problem. Work by Zlotnik et al. [21] derives a geometrically motivated observer for the SLAM problem that includes estimation of bias in linear and angular velocity inputs. For the visual localisation and mapping problem, where only bearing measurements are available, Lourenco et al. [15, 16] proposed an observer with a globally exponentially stable error system using depths of landmarks as separate components of the observer. Grabe et al. [10] derived a non-linear observer for the case where a significant number of the bearings measured are of coplanar landmarks by using the instantaneous homography constraint. Bjorne et al. [4] uses an attitude heading reference system (AHRS) to determine the orientation of the robot, and then solves the SLAM problem using a linear Kalman filter. A similar approach to the visual localisation and mapping case is undertaken in [6]. Hamel et al. have also introduced a Riccati observer [12] for the case where the orientation of the robot is known.
In this paper we present a novel non-linear geometric observer for the visual localisation and mapping problem. The approach extends the SLAM manifold presented in [18] to include bearings (such as magnetometer or gravity measurements) and landmark points in the same formulation by exploiting the structure of the real-projective space and homogeneous coordinates for bound and free vectors. The proposed state-space also allows modelling of visual features as a simple linear projection of onto . A novel Lie group termed the group is introduced and shown to be a symmetry on the measurement function of the visual localisation and mapping problem. The proposed observer uses decoupled gain matrices for each landmark point that satisfy a simple Riccati equation. As a consequence of decoupling the Riccati observer for each landmark, the computational complexity of our approach is only . Finally, the innovation on the pose of the robot is determined through finding the minimum of a novel cost function on the tangent space of , and is based on the static environment assumption common in SLAM algorithms. The resulting observer is shown to have an error system that is almost globally asymptotically stable (the basin of attraction excludes a set of measure zero) and exponentially stable in-the-large (exponentially stable on any compact set contained in the basin of attraction).
This paper consists of five sections alongside the introduction and conclusion. Section 2 introduces key notation and identities, and provides an in-depth explanation of the application of to representing points and bearings in 3d space. In Section 3, we formulate the kinematics, state-space and output of the visual localisation and mapping system, and in Section 4 we introduce the new Lie group that acts on the state-space. In Section 5 we derive a non-linear observer on the Lie group, and in Section 6 we provide the results of a simulation. The experimental results are designed to verify the theory developed throughout the paper, not to provide a comprehensive evaluation of performance.
2 Preliminaries
2.1 Notation
The special orthogonal group and special Euclidean group are denoted and respecively, with Lie algebras and . For any , the corresponding skew-symmetric matrix is denoted by
[TABLE]
This matrix has the property that, for any , where is the vector (cross) product between and .
Consider a matrix . The notations and are used to represent the rotation and translation components of respectively, and may be written as
[TABLE]
Likewise, for a matrix , the notations and represent the rotational and translational velocity components of respectively, and may be written as
[TABLE]
For any the projector is given by
[TABLE]
The operator projects vectors onto the subspace of orthogonal to . The projector and the skew-symmetric matrix are related by
[TABLE]
for any . For any the projector is similarly defined as
[TABLE]
2.2 Real Projective Space
For , define the set of equivalence classes
[TABLE]
Given two elements , the notation indicates for some . The 3-dimensional real-projective space is a smooth quotient manifold [1]. For any full rank matrix , the operation
[TABLE]
is well-defined.
Let , and define an horizontal space . Define an equivalence relationship for between elements of and . A tangent vector is the equivalence class .
For any , define the projector
[TABLE]
To see this is well-defined, let be a non-zero scalar, and check
[TABLE]
Analogously, the projector is well-defined for any .
Let be a vector representing the position of a point in space. Define the homogeneous coordinates
[TABLE]
as an embedding and refer to such points as bound vectors with foot at the origin of the reference frame and tip at the point it represents. Let be a vector representing a bearing or direction and define homogeneous coordinates
[TABLE]
as an embedding . We term \mathrlap{\overline{b}}\accentset{\hbox{{\raisebox{-1.56723pt}[0.0pt][-1.56723pt]{\circ}}}}{b} a free vector. Using these embeddings it is possible to define a map
[TABLE]
A point-type element of is any element in the subset . A bearing-type element of is any element in the subset . A full inverse of is not uniquely defined due to the sign ambiguity of elements of . However, it is possible to define a unique map by
[TABLE]
where denotes the first three elements of and , analogous to the definition. Define a projection by
[TABLE]
The following commutative diagram holds
[TABLE]
The map is smooth under restriction to either point-type elements or bearing-type elements of . Although is unable to reconstruct the full direction vector from a bearing-type element, the unsigned direction is sufficient for the observer construction that we undertake in the sequel.
3 Problem Formulation
3.1 VSLAM Total Space
The formulation of the total space for the VSLAM problem is an extension of the formulation in [18] to include not only points in 3D space but also bearings through their representations.
Raw coordinates for the VSLAM problem can be defined by fixing an arbitrary reference frame . Let and represent the robot pose and landmark coordinates respectively, defined with respect to . Note that each is either point-type or bearing-type depending on whether its last entry is zero. The total space of the VSLAM problem is the product space
[TABLE]
with elements
[TABLE]
The notation is used to simplify notation in the sequel.
Given , recalling (2) define
[TABLE]
Given two elements , the notation means that for some . The SLAM manifold is the set
[TABLE]
with quotient manifold structure [18].
An expression is well-defined on the SLAM manifold if it is invariant to the action of a rigid-body transformation of the reference frame. An important example is . Given any , one has
[TABLE]
3.2 VSLAM Kinematics
The assumption will be made that the robot is moving through a static environment. Consider the velocity input space . The kinematics of the VSLAM system are given by the function
[TABLE]
3.3 System Output
The physical measurements taken by our robot in the VSLAM system are the bearings of landmarks. Let be the body-fixed frame coordinates of a landmark . Using the basic pinhole camera model as described in [13] with invertible camera matrix , the measurement of taken by the camera is . Assuming the camera is calibrated matrix , it is easy to recover the element
[TABLE]
although the scale of this element is arbitrary and cannot be known. If is a bearing-type element, then and no information is lost through the camera projection. However, if is a point-type element, then the scale of the vector is not recoverable. In this formulation the sign of the landmark measurement (representing whether the landmark is in front of or behind the camera) is ambiguous, but this is sufficient for the observer design undertaken in Section 5. The choice of bearing-type or point-type for a particular landmark is a modelling choice based on the requirements for the resulting map of the environment.
The output space of the VSLAM system is defined as
[TABLE]
The output function of the VSLAM system is defined as
[TABLE]
The output function transforms each into body-fixed frame coordinates, and projects the result into , representing bearing-type of point-type landmark measurements with a calibrated pinhole camera.
4 Symmetry of the VSLAM Problem
4.1 Symmetry of the Total Space
We introduce a group we term Scaled Orthogonal Transformations , a subgroup of the group of similarity transforms on .
Lemma 4.1**.**
For any , the set
[TABLE]
with matrix multiplication is a subgroup of .
Proof.
Assigning matrix multiplication as the group action it is clear that is the direct product of , where is the Lie group formed by assigning multiplication as the operation on . It is straightforward to verify that is a subgroup of by considering the action for . ∎
The action of on landmarks is a rotation combined with a scaling for point-type landmarks. Recalling (2) and taking advantage of the equivalence class structure of ,
[TABLE]
There are exactly three orbits of acting on , defined by
[TABLE]
where refers to the fourth coordinate of .
The symmetry group for the VSLAM problem with landmarks in 3 dimensions is defined as a Lie group
[TABLE]
with product Lie group structure. The associated Lie algebra is denoted .
Lemma 4.2**.**
The mapping defined by
[TABLE]
where the right-hand expression depends on definition (2), is a right group action of on .
Proof.
Trivially, for any . Let and be arbitrary. Then
[TABLE]
This demonstrates that is a right action as required. ∎
Recall the orbits of described in (4.1). Given a configuration , let for some . Observe that if for some , then also, independent of the particular element . To overcome this, in the remainder of the paper it is assumed that there is never a such that . This assumption is reasonable, in that it is equivalent to assuming there are no landmarks coinciding precisely with the origin of the robot. Additionally, it is assumed that the type of each landmark (point or bearing) is known, and the landmarks are enumerated such that and represent of point- and bearing-type landmarks respectively. The reduced total space is defined as
[TABLE]
and only elements are considered from here going forward.
4.2 Lift of the VSLAM Kinematics
In order to consider the system on the group, the kinematics from the state space must be lifted onto the group. The following lemma provides the lift function.
Lemma 4.3**.**
The function , defined by
[TABLE]
where is given by
[TABLE]
is a velocity lift of the kinematics (3.2) onto with respect to the group action (10).
Proof.
To show that is a velocity lift, it is required that
[TABLE]
Equivalently, it is required to show that
[TABLE]
where .
First, it is necessary to show that is well-defined whenever . To see this, let be any non-zero scalar, and observe that
[TABLE]
Recalling the expression for provided in (3.2), it is clear that the first terms on both sides of (11) are equal. Let
[TABLE]
In order to aid in the readability of the following equations, and in (13) are chosen such that . However, it is important to note this choice is arbitrary as shown in (4.2). To show (11), consider that
[TABLE]
using the identity (1). This further reduces to
[TABLE]
where the last step follows from (13) and the choice of . From here, (11) clearly resolves to
[TABLE]
as required. This completes that proof that is a velocity lift. ∎
The kinematics of the true state of the VSLAM system are given by
[TABLE]
Choose a reference configuration . By construction, the trajectories of the lifted system kinematics
[TABLE]
project to trajectories of the VSLAM kinematics (14) via .
5 Observer Design
5.1 Observer Kinematics
Define the observer state to lie on the VSLAM group, , with kinematics given by
[TABLE]
where is an innovation term. The estimated state is given by
[TABLE]
Additional notation is helpful in simplifying the expressions that follow in the observer design. Define
[TABLE]
All expressions above are well-defined for equivalence classes in the SLAM manifold.
5.2 Landmark Observer
Theorem 5.1**.**
Let be the true state of the system, evolving with the kinematics (14). Let be arbitrary up to the requirement that, for all , and are members of the same orbit of under the action of . Define to be the observer state with kinematics defined by (5.1), and define as in (16).
Now, for , define by
[TABLE]
where are constants, and assume that there exist and such that
[TABLE]
for any time and for any . For , define
[TABLE]
Then, for every landmark , define as
[TABLE]
where and are given by (17). Let the innovation term be given by the least-squares solution to
[TABLE]
Then the estimated state coordinates converge to the true coordinates almost-globally asymptotically and exponentially in the large111For any compact set in the basin of attraction of the equilibrium, the value of the Lyapunov function converges exponentially to zero. up to equivalence on the SLAM manifold .
Proof.
To verify that is well-defined note that the cost in (5.1) is invariant to scale in the data for . A Lyapunov analysis proves the desired result.
For , recalling (5), define the error coordinates and candidate storage function as
[TABLE]
respectively. The condition (19) ensures that is well-conditioned, and remains bounded and positive-definite for all time [11]. Therefore the candidate storage function is positive definite. It remains to show that is monotonically decreasing. The kinematics of are
[TABLE]
Differentiating the candidate storage function, one has
[TABLE]
where and denote the infinum of the smallest and the supremum of the largest eigenvalues of over time, respectively. Since is chosen as a constant, and remains well-conditioned and bounded, the equilibrium is exponentially stable. Equivalently, this provides that globally exponentially.
For , define the candidate storage function
[TABLE]
Observe that is well-defined as a function of elements, since the expression is invariant to multiplication of or by any non-zero scalar. Clearly is positive definite. The kinematics of the bearing are given by
[TABLE]
This is well-defined as an element of the tangent space since any scaling of results in the same scaling of the expression for . Since , the dynamics of the norm of any chosen representative of are given by . Analogously, recalling (20) and (5.1), the kinematics of are given by
[TABLE]
and hence the dynamics of the norm of any representative of are given by . As a consequence of this and the scale invariance of (23), we may choose for readability without loss of generality. Differentiating the candidate storage function leads to
[TABLE]
which is negative definite as long as the initial directions and are not orthogonal. There are two situations in which . The first one corresponds to the stable case where ( and are parallel) while the second one corresponds to the unstable case for which ( and are orthogonal). To prove the exponential stability in the large, suppose that for some fixed . Then,
[TABLE]
Observe that, unless , such an can always be found. Therefore, almost-globally asymptotically, and exponentially in the large. Since the measurement function is invertible on bearing-type elements, this provides the desired result that almost-globally asymptotically and exponentially in the large.
Define the whole-of-system Lyapunov function
[TABLE]
From the analysis of each individual , it is clear that almost-globally asymptotically and exponentially in the large. The convergence of each provides that
[TABLE]
almost-globally asymptotically and exponentially in the large as well. This completes the proof. ∎
6 Simulation Results
To verify the observer derived in Theorem 5.1, we conducted a simulation of a vehicle equipped with a single monocular camera, observing 4 point-type landmarks and 2 bearing-type landmarks as it moves through space. The vehicle moves in a circular trajectory at a fixed height of 3 m. The body-fixed velocity is fixed to be constant, with rad/s and m/s. For simplicity, the camera frame is assumed to coincide with the body-fixed frame of the vehicle, which avoids the need for a separate computation to transform the body-fixed velocity into the camera frame. Let the true state be . The reference configuration is chosen as , where
[TABLE]
where the terms represent errors in the initial measurements. The observer is defined on , with kinematics given by (5.1) and innovation terms given by Theorem 5.1. The initial conditions and gains for the observer are chosen as
[TABLE]
The simulation was carried out by implementing the continuous time system with Euler integration using a time step of s.
Figure 1a shows the evolution of , where is the Lyapunov function of the simulated system as defined in (24). This clearly shows exponential convergence of the observer error dynamics. Figure 1b shows the evolution of the trajectory of the simulated system. Since the estimated state only converges to the true state up to equivalence on the SLAM manifold , it is necessary to assign total space coordinates to the estimate to aid the comparison. In Figure 1b the choice of total space coordinates for the estimated state is made so that the final robot pose is aligned with that of the true state. This shows that the landmarks have correctly converged to the true landmarks up to the SLAM manifold equivalence.
7 Conclusion
This paper presents an observer design posed on a novel symmetry group for the visual SLAM problem. The total space and SLAM manifold conceptualised in [18] have been extended to include free vectors. The development of the symmetry group has allowed both point-type and bearing-type landmarks to be treated in a unified framework. Riccati observers were incorporated for each of the point-type landmarks, and grant the user refined control over their convergence. The almost-global convergence of the proposed observer on both point-type and bearing-type landmarks is a contrast to many state-of-the-art Extended Kalman Filter systems, which suffer from linearisation errors. While research into the development of non-linear observers for the SLAM problem is only recent, the observer for visual SLAM presented in this paper demonstrates some of the key advantages the approach can offer.
Acknowledgment
This research was supported by the Australian Research Council through the “Australian Centre of Excellence for Robotic Vision” CE140100016.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds . Princeton University Press, Princeton, NJ, USA, January 2008.
- 2[2] G. Baldwin, R. Mahony, and J. Trumpf. A nonlinear observer for 6 DOF pose estimation from inertial and bearing measurements. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) , pages 2237–2242, 2009.
- 3[3] Axel Barrau and Silvere Bonnabel. An EKF-SLAM algorithm with consistency properties, 2016. ar Xiv:1510.06263.
- 4[4] E. Bjorne, T. A. Johansen, and E. F. Brekke. Redesign and analysis of globally asymptotically stable bearing only SLAM. In 2017 20th International Conference on Information Fusion (Fusion) , pages 1–8, July 2017.
- 5[5] S. Bonnabel, P. Martin, and P. Rouchon. Symmetry-preserving observers. IEEE Transactions on Automatic Control , 53(11):2514–2526, 2008.
- 6[6] F. Le Bras, T. Hamel, R. Mahony, and C. Samson. Observers for position and velocity bias estimation from single or multiple direction outputs. In T.I. Fossen, K.Y. Pettersen, and H. Nijmeijer, editors, Sensing and Control for Autonomous Vehicles , chapter 1. Lecture Notes in Control and Information Sciences 474, Springer, 2017.
- 7[7] Cesar Cadena, Luca Carlone, Henry Carrillo, Yasir Latif, Davide Scaramuzza, Jos´e Neira, Ian D. Reid, and John J. Leonard. Past, present, and future of simultaneous localization and mapping: Towards the robust-perception age. IEEE Transactions on Robotics , 32(6):1309–1332, December 2016.
- 8[8] J. Delmerico and D. Scaramuzza. A benchmark comparison of monocular visual-inertial odometry algorithms for flying robots. In IEEE International Conference on Robotics and Automation (ICRA) , 2018.
