Autonomous Exploration and Mapping with RFS Occupancy-Grid SLAM

This short note addresses the problem of autonomous on-line path-planning for exploration and occupancy-grid mapping using a mobile robot. The underlying algorithm for simultaneous localisation and mapping (SLAM) is based on random finite set (RFS) modelling of ranging-sensor measurements, implemented as a Rao-Blackwellised particle filter. Path-planning in general must trade off between exploration (which reduces the uncertainty in the map) and exploitation (which reduces the uncertainty in the robot pose). In this note we propose a reward function based on the Rényi divergence between the prior and the posterior densities, with RFS modelling of sensor measurements. This approach results in a joint map-pose uncertainty measure without a need to scale and tune their weights.


Introduction
The task of autonomous exploration and mapping of unknown structured environments combines the solutions to three fundamental problems in mobile robotics: (self-)localisation, mapping and motion control. Localisation is the problem of determining the position of the robot within its estimated or a given map. Mapping refers to the problem of integrating the robot's ranging and/or bearing sensor measurements into a coherent representation of the environment. Motion control deals with autonomous decision making (e.g., where to move, where to "look") for the purpose of accomplishing the mission in the most effective manner (in terms of duration and accuracy).
Algorithms for simultaneous localisation and mapping (SLAM) have been studied extensively over the last decades. While the problem is still an active research area, see [1,2], several state-of-the-art algorithms are already publicly available in the Robot Operating System (ROS) [3,4]. Integrating decision control with SLAM, thereby making the robot autonomous in modelling structured environments, proves to be a much harder problem. Heuristic approaches are reviewed and compared in [5]. More advanced approaches are based on the information gain [6-10].
Autonomous SLAM is a difficult problem because any SLAM algorithm can be seen as an inherently unstable process: unless the robot returns occasionally to already visited and mapped places (an act referred to as loop-closure), the pose (position and heading) estimate drifts away from the correct value, resulting in an inaccurate map. Decision making in autonomous SLAM therefore must balance two conflicting objectives, namely exploration and exploitation. Exploration constantly drives the robot towards unexplored areas in order to complete its task as quickly as possible. By exploitation we mean loop-closures: the robot occasionally must return to an already mapped area for the sake of correcting the error in its pose estimate.
Early approaches to robot exploration ignored the uncertainty in the pose estimate and focused on constantly driving the robot towards the nearest frontier, which is the group of cells of the occupancy-grid map on the boundary between the known and unexplored areas [11]. The current state-of-the-art in autonomous SLAM is cast in the context of partially observable Markov decision processes. The reward function is typically the expected information (defined in the information-theoretic framework) resulting from taking a particular action. Entropy reduction in joint estimation of the map and the pose was proposed in a seminal paper [6]. Assuming the uncertainties in the pose and the map are independent, the joint entropy can be computed as a sum of two entropies: the entropy of the robot pose and the entropy of the map. Several authors subsequently pointed out the drawback of this approach [9,10]: the scale of the numerical values of the two uncertainties is not comparable (the entropy of the map is much higher than the entropy of the pose). A weighted combination of the two entropies was proposed subsequently, with various approaches to balance the two terms.
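To make the map-entropy term concrete: under the usual cell-independence assumption, the entropy of an occupancy-grid map is the sum of the Bernoulli entropies of its cells. A minimal sketch (the function name and normalisation are our illustrative choices, not from the cited papers):

```python
import numpy as np

def map_entropy(r, eps=1e-12):
    """Entropy of an occupancy-grid map (in bits), assuming independent cells.

    r : array of per-cell occupancy probabilities in [0, 1].
    Each cell contributes the Bernoulli entropy -r log2 r - (1-r) log2(1-r);
    the result is normalised by the number of cells, so a fully unexplored
    map (all r = 0.5) has entropy 1.
    """
    r = np.clip(np.asarray(r, dtype=float), eps, 1.0 - eps)
    h = -r * np.log2(r) - (1.0 - r) * np.log2(1.0 - r)
    return float(h.mean())

print(map_entropy(np.full(100, 0.5)))            # unexplored map -> 1.0
print(map_entropy(np.array([0.01, 0.99, 0.5])))  # partially resolved map
```

Normalising by the number of cells makes the value comparable across grid resolutions; it is this per-cell average that starts at 1 for an uninformative map.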
Most of the autonomous SLAM algorithms have been developed in the context of a Rao-Blackwellised particle filter (RBPF)-based SLAM. We will remain in this framework and develop an autonomous SLAM for the recently proposed RFS based occupancy-grid SLAM (RFS-OG-SLAM) [2], which is also implemented as a RBPF. In this short note we propose a reward function based on the Rényi divergence between the prior and the posterior joint map-pose densities, with the RFS modelling of sensor measurements. This approach results in a joint map-pose uncertainty measure without a need to scale and tune their respective weights.

The RFS Occupancy-Grid Based SLAM
The main feature of the RFS-OG-SLAM is that sensor measurements are modelled as an RFS. This model provides a rigorous theoretical framework for the imperfect detection (occasionally resulting in false and missed detections) of a ranging sensor. Previous approaches to imperfect detection were based on ad hoc scan-matching or on the design of a likelihood function as a mixture of a Gaussian, a truncated exponential, and a uniform distribution [12] (Section 6.3).
Let the robot pose be a vector θ = [x y φ]ᵀ which consists of its position (x, y) in a planar coordinate system and its heading angle φ. Robot motion is modelled by a Markov process specified by a (known) transitional density, denoted by π(θ_k | θ_{k−1}, u_k). Here the subscript k ∈ ℕ refers to time instant t_k and u_k is the robot-control input applied during the time interval τ_k = t_k − t_{k−1} > 0.
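The note only requires that the transitional density π(θ_k | θ_{k−1}, u_k) can be sampled. As an illustration, a simple velocity motion model with additive Gaussian noise (this particular model and its parameters are our assumption, not fixed by the note) can be sampled as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_motion(theta, u, tau, sigma=(0.02, 0.02, 0.01)):
    """Draw theta_k ~ pi(. | theta_{k-1}, u_k) for a simple velocity model.

    theta : (x, y, phi) previous pose
    u     : (v, omega) linear and angular velocity command
    tau   : duration of the control interval tau_k
    sigma : std. dev. of additive Gaussian noise on (x, y, phi)
    """
    x, y, phi = theta
    v, omega = u
    x += v * tau * np.cos(phi)     # deterministic motion ...
    y += v * tau * np.sin(phi)
    phi += omega * tau
    noise = rng.normal(0.0, sigma) # ... plus process noise
    return np.array([x, y, phi]) + noise
```

In the particle filter below, each particle's pose is propagated independently by one such draw.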
The occupancy-grid map is represented by a vector m = [m_1 m_2 · · · m_N]ᵀ, where the binary variable m_n ∈ {0, 1} denotes the occupancy of the n-th grid cell, n = 1, . . . , N, with N ≫ 1 being the number of cells in the grid.
The ranging sensor on the moving robot provides the range (and azimuth) measurements of reflections from the objects within the sensor field of view. Let the measurements provided by the sensor at time t_k be represented by a set Z_k = {z_{k,1}, · · · , z_{k,|Z_k|}}, where z ∈ Z_k is a range-azimuth measurement vector. Both the cardinality of the set Z_k and the spatial distribution of its elements are random. For a measurement z ∈ Z_k which is a true return from an occupied grid cell n (i.e., with m_n = 1), we assume the likelihood function g_n(z|θ_k) is known. The probability of detecting an object occupying cell n, n ∈ {1, . . . , N} of the map, is state dependent (the probability of detection is typically less than one and may depend on the range to the obstacle, but also on other factors, such as the surface characteristics of the object, turbidity of air, a temporary occlusion, etc.) and is denoted as d_n(m_n, θ_k). Finally, Z_k may include false detections modelled as follows: their spatial distribution over the measurement space is denoted by c(z) and their count in each scan is Poisson distributed, with a mean value λ.
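The RFS measurement model above can be simulated directly: thin the true returns by the detection probability, perturb the survivors with measurement noise, and superimpose Poisson-distributed clutter. A sketch, with parameter values borrowed from the simulation setup later in the note (the function itself is our illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_scan(true_ranges, true_bearings, p_detect=0.8,
                  clutter_rate=8, r_max=2.5,
                  sigma_r=0.01, sigma_b=np.deg2rad(0.3)):
    """Draw one RFS scan Z_k.

    True returns within r_max are kept with probability p_detect
    (missed-detection model) and perturbed by Gaussian noise; a
    Poisson(clutter_rate) number of false detections is drawn with a
    uniform spatial density c(z) over the sensor's field of view.
    """
    Z = []
    for r, b in zip(true_ranges, true_bearings):
        if r <= r_max and rng.random() < p_detect:
            Z.append((r + rng.normal(0, sigma_r), b + rng.normal(0, sigma_b)))
    for _ in range(rng.poisson(clutter_rate)):
        Z.append((r_max * np.sqrt(rng.random()),   # sqrt -> uniform over disc
                  rng.uniform(-np.pi, np.pi)))
    return Z  # both |Z_k| and the elements are random
```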
The solution is formulated using the Rao-Blackwell dimension-reduction technique [13]. Application of the chain rule decomposes the joint posterior PDF as: p(θ_{1:k}, m | Z_{1:k}, u_{1:k}) = p(m | Z_{1:k}, θ_{1:k}) p(θ_{1:k} | Z_{1:k}, u_{1:k}). (1) Assuming that the occupancy of one cell in the grid map is independent of the occupancy of the other cells (the standard assumption for occupancy-grid SLAM), one can approximate p(m | Z_{1:k}, θ_{1:k}) in (1) as a product of p(m_n | Z_{1:k}, θ_{1:k}) for n = 1, . . . , N. Furthermore, the update equation for the probability of occupancy of the n-th cell, i.e., p(m_n = 1 | Z_{1:k}, θ_{1:k}) = r_{k,n}, can be expressed analytically [2]. The posterior PDF of the pose, p(θ_{1:k} | Z_{1:k}, u_{1:k}) in (1), is propagated using the standard prediction and update equations of the particle filter [13]. However, the likelihood function used in the update of the particle weights takes the form f(Z_k | Z_{1:k−1}, θ_{1:k}). Assuming the individual beams of the sensor are independent, we can approximate f(Z_k | r_{k−1}, θ_{1:k}) by a product of likelihoods p(z | r_{k−1}, θ_{1:k}) for all z ∈ Z_k, where n_z is the nearest grid cell to the point in the (x, y) plane to which, for a given pose θ_k, the range-azimuth measurement z maps. The RBPF propagates the posterior p(θ_k, m | Z_{1:k}, u_{1:k}), approximated by a weighted set of S particles {(w_k^{(i)}, θ_{1:k}^{(i)}, r_k^{(i)})}, i = 1, . . . , S.
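The structure of one RBPF cycle can be sketched as below. The RFS-specific likelihood and the analytic per-cell map update are given in [2] and are passed in here as opaque callables; the skeleton, the effective-sample-size test and the resampling threshold are our illustrative choices:

```python
import numpy as np

def rbpf_step(particles, weights, u, Z, sample_motion, likelihood, update_map):
    """One prediction/update cycle of the Rao-Blackwellised particle filter.

    particles : list of (pose, r) pairs; each particle carries its own
                occupancy-probability vector r (the Rao-Blackwellised part).
    sample_motion, likelihood, update_map : problem-specific callables
                (their RFS forms are derived in [2], only sketched here).
    """
    new_particles, new_weights = [], []
    for (pose, r), w in zip(particles, weights):
        pose = sample_motion(pose, u)        # draw from pi(theta_k | ...)
        w *= likelihood(Z, r, pose)          # f(Z_k | r_{k-1}, theta_{1:k})
        r = update_map(r, Z, pose)           # analytic per-cell update
        new_particles.append((pose, r))
        new_weights.append(w)
    w = np.asarray(new_weights)
    w /= w.sum()                             # normalise the weights
    # systematic resampling when the effective sample size drops too low
    if 1.0 / np.sum(w ** 2) < len(w) / 2:
        u0 = np.random.rand() / len(w)
        idx = np.searchsorted(np.cumsum(w), u0 + np.arange(len(w)) / len(w))
        new_particles = [new_particles[i] for i in idx]
        w = np.full(len(w), 1.0 / len(w))
    return new_particles, w
```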

Path Planning
Decision making in an autonomous SLAM algorithm includes three steps: computing a set of candidate actions; computing the reward assigned to each action; and selecting the action with the highest reward.
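The three steps amount to a single argmax over candidate actions; a minimal sketch (the callables stand in for the procedures described in the following subsections):

```python
def plan(compute_actions, expected_reward, state):
    """Select the next action: enumerate the candidates, score each one
    by its expected reward, and return the best (ties broken arbitrarily)."""
    actions = compute_actions(state)
    return max(actions, key=lambda a: expected_reward(state, a))
```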

Computing the Set of Actions
An action is a segment of the total path the robot should follow for the sake of exploring and mapping the environment. The set of actions at a decision time t_k (the options for robot motion) should ideally include both short and long displacements and all possible trajectories to the end points. Due to limited computational resources, in practice only a few actions are proposed. They fall into one of the following categories: exploration actions, place re-visiting (loop-closure) actions, or a combination of the two. The exploration actions are designed to acquire information about unknown areas in order to reduce the uncertainty in the map. Exploration actions are generated by first finding the frontier cells in the map [11,14]. Figure 1 shows a map partially built by a moving robot running the RFS-OG-SLAM algorithm and the discovered frontier cells, found using an image-processing algorithm based on the Canny edge detector [15]. An exploration action represents a trajectory along the shortest path from the robot's current pose to one of the frontiers. Because the number of frontier cells can be large, they are clustered by neighbourhood. Clusters that are too small are removed, while the cells in the centres of the remaining clusters form the set of exploratory destinations. Subsequently the A* algorithm [16] is applied to find the shortest path from the robot's current position to each of the exploratory destinations. Because the A* algorithm assumes that the moving robot's size equals the grid-cell size, its resulting path can be very close to walls or corners. The physical size of the robot is a priori known and therefore the empty space in the map which the robot can traverse is thinned accordingly (using morphological image-processing operations [15]). The place re-visiting actions guide the robot back to the already visited and explored areas.
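Frontier cells can also be found directly from their definition: free cells adjacent to unexplored ones. The sketch below uses simple probability thresholds (our illustrative choice; the note instead extracts frontiers with a Canny edge detector, but the cells found are the same in spirit):

```python
import numpy as np

def frontier_cells(r, free_max=0.2, unknown=(0.4, 0.6)):
    """Frontier cells of an occupancy grid r (2-D array of occupancy
    probabilities): free cells (r < free_max) that are 4-adjacent to at
    least one unexplored cell (r still close to the 0.5 prior)."""
    free = r < free_max
    unk = (r > unknown[0]) & (r < unknown[1])
    # pad so that the neighbour shifts never wrap around the grid border
    p = np.pad(unk, 1, constant_values=False)
    near_unknown = (p[:-2, 1:-1] | p[2:, 1:-1] |
                    p[1:-1, :-2] | p[1:-1, 2:])
    return np.argwhere(free & near_unknown)  # (row, col) indices
```

The returned cells would then be clustered by neighbourhood, small clusters discarded, and cluster centres used as exploratory destinations, as described above.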

Reward Function
Reward functions are typically based on a reduction of uncertainty and measured by comparing two different information states. In the Bayesian context, the two information states are the predicted density at time k and the updated density at time k (after processing the new hypothetical measurements, resulting from the action). However, no measurements are collected before the decision has been made and therefore an expectation operator must be applied with respect to the new measurements resulting from the action.
The reward function can be formulated as the gain defined by an information-theoretic measure, such as the Fisher information, the entropy, or the Kullback-Leibler (KL) divergence [17]. We adopt a reward function based on the Rényi divergence between the current and the future information state. The Rényi divergence between two densities, p_0(x) and p_1(x), is defined as [17]: I_α(p_1, p_0) = (α − 1)^{-1} log ∫ p_1(x)^α p_0(x)^{1−α} dx, where α ≥ 0 is a parameter which determines how much we emphasise the tails of the two distributions in the metric. In the special cases of α → 1 and α = 0.5, the Rényi divergence becomes the Kullback-Leibler divergence and the Hellinger affinity, respectively [17]. Using the particle-filter approximation of the joint posterior at time k − 1, that is, the weighted particle set {(w_{k−1}^{(i)}, θ_{1:k−1}^{(i)}, r_{k−1}^{(i)})}, i = 1, . . . , S, the expected reward function for an action u_k can be expressed as in [18] (Equation 18), where the likelihood f(Z_k | Z_{1:k−1}, θ_{1:k}) has been explained at the end of Section 2. Drawing from f(Z_k | Z_{1:k−1}, u_k) can be done by ray-casting, assuming action u_k resulted in pose θ_k^{(i)} and using the current estimate of the map. In doing so, the probability of a cast ray hitting an object at an occupancy-grid cell is made proportional to its probability of occupancy.
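For discrete distributions (e.g., normalised particle weights), the Rényi divergence reduces to a weighted sum and is straightforward to evaluate; a sketch:

```python
import numpy as np

def renyi_divergence(p1, p0, alpha=0.5, eps=1e-12):
    """Renyi divergence I_alpha(p1, p0) for discrete distributions:
    (alpha - 1)^-1 * log sum_i p1_i^alpha * p0_i^(1 - alpha).
    alpha -> 1 recovers the KL divergence; alpha = 0.5 corresponds to
    the Hellinger-affinity case used in the note."""
    p1 = np.asarray(p1, dtype=float) + eps
    p0 = np.asarray(p0, dtype=float) + eps
    if abs(alpha - 1.0) < 1e-9:  # KL limit
        return float(np.sum(p1 * np.log(p1 / p0)))
    return float(np.log(np.sum(p1**alpha * p0**(1 - alpha))) / (alpha - 1.0))

# identical prior and posterior carry no information gain
p = np.array([0.25, 0.25, 0.5])
print(renyi_divergence(p, p, alpha=0.5))  # ~0 (up to eps)
```

In the planner, p_0 plays the role of the predicted density and p_1 the (hypothetically) updated one; actions whose expected measurements change the density the most score the highest.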
An action in the context of active SLAM is a path, which can be translated into a sequence of control vectors u_k, u_{k+1}, · · · , u_{k+L}. Computation of the reward in this case is still given by (6) and (7), but one would have to replace the single-step hypothetical measurement sets and likelihoods with their multi-step counterparts over the horizon k, . . . , k + L. This is computationally very demanding, and so we approximate the reward of a multi-step action by the sum of its single-step rewards. The last step of active SLAM is to select the action with the highest reward and to subsequently execute it. Autonomous SLAM also needs to decide when to terminate its mission; the termination criterion we use is based on the number of frontier cells.

Figure 2 shows the map of the area used in simulations for testing and evaluation of the proposed path-planning algorithm. The true initial robot pose (distance is measured in arbitrary units, a.u.) is [2.2, 0.65, 80°]. The SLAM algorithm is initialised with particles of equal weights, zero pose vectors and the probability of occupancy of all cells set to 0.5. The simulated ranging sensor is characterised by a coverage of 360°, an angular resolution of 0.8°, and d_n(1, θ_k) = 0.8 if the distance between the robot and the n-th cell is less than R_max = 2.5 a.u. False detections are modelled with λ = 8 and c(z) a uniform distribution over the field of view with radius R_max. The standard deviation is set to 0.01 a.u. for range measurements and 0.3° for angular measurements. The occupancy-grid cell size is adopted as 0.02 a.u. The number of particles is S = 40. The α parameter of the Rényi divergence is set to 0.5. The mission is terminated when none of the clusters of frontier cells has more than 14 members.
Initially, at time k = 0, the map entropy is H_0 = 1. The entropies of the maps in Figure 3a,b are 0.6124 and 0.6074, respectively. An AVI movie of a single run of the algorithm can be found in the Supplementary Material. Next we show the results obtained from 30 Monte Carlo runs of the autonomous RFS-OG-SLAM, using the simulation setup described above. Figure 4a shows the error of the estimated robot position (in a.u.) and the error of the estimated robot heading (in degrees), averaged over time and over all 30 trajectories. Figure 4b displays the final map entropy versus the duration of the exploration and mapping mission. The average estimated map over the 30 Monte Carlo runs is shown in Figure 5a, while the variance is displayed in Figure 5b. Variability in the performance of the autonomous RFS-OG-SLAM is due to many causes, such as the inherent randomness of particle filtering, clustering and reward computation. Overall, however, the proposed autonomous RFS-OG-SLAM performs robustly and produces quality maps without a need for human intervention.

Conclusions
This note presented a path-planning method for autonomous exploration and mapping by the recently proposed RFS occupancy-grid SLAM. The reward function is defined as the Rényi divergence between the prior and the posterior densities, with RFS modelling of sensor measurements. This approach resulted in a joint map-pose uncertainty measure without a need to tune the weighting of the map versus the pose uncertainty. Numerical results indicate reliable performance, combining short exploration times with good quality of the estimated maps.