Analytical Aspects of Distributed MPC based on an Occupancy Grid for Mobile Robots

: In this paper, we evaluate theoretical aspects of a distributed system of noncooperative robots controlled by a distributed model predictive control scheme, which operates in a shared space. Here, for collision avoidance, the future predicted state trajectories are projected on a grid and exchanged via discrete cell indexes to reduce the communication burden. The predicted trajectories are obtained locally by each robot and carried out in the continuous space. Therefore, the quantisation does not impose the quality of the solution. We derive sufﬁcient conditions to show convergence and practical stability for the distributed control system by using an idea of a temporary roundabout derived from crossing patterns of street trafﬁc rules, which is established in a ﬁxed and ﬂexible circle size. Furthermore, a condition for the sufﬁcient prediction horizon length to recognise necessary detours is presented, which is adapted for the occupancy grid. The theoretical results match with the trajectory patterns from former numerical simulations, showing that this pattern is naturally chosen as an overall solution.


Introduction
Today, control problems with high dynamics have been of high interest in production and logistic processes or in traffic scenarios. In manufacturing processes, the technical advances have increased the utilisation of robots, especially in assembly lines in the last decades [1]. In logistic processes, autonomous, connected pallet carriers for material have increasingly been used to supply production lines with material. In common traffic scenarios, autonomous connected cars (or mobile robots) have aggregated different research fields, e.g., sensor fusion, modelling, and handling communication. One central issue is to be able to calculate a feasible solution which ensures safety in terms of collision avoidances. Examples for robotic systems can be found in formation control of robots [2], exploration of unknown environment to achieve an optimal covering with limited number of robots [3], or real-time control of distributed helicopters [4]. All these distributed systems are linked by common properties and arising problems: complexity and availability of information. Concerning complexity, these system have to be handled in a distributed manner. One approach is to use multi-agent systems to decrease the complexity and to enable the distributed system to calculate a solution sufficiently fast. The available computational capabilities on distributed systems like robots or cars ease this approach.
Additionally, the distributed system is relying on the requirement of an information exchange: In contrast to a central system, where the information about the full system state is globally available, here, each agent has to rely on local information (decentralised system) or some communication methods have to be implemented such that the agents are able to gather information from neighbours or subsystems which allow slow variation of the quantisation interval. Here, the quantisation parameters are exchanged among the subsystems to establish terminal regions which ensured that, from a warm start initial condition, the equilibrium was reached within finite time.
Other approaches to stabilise nonlinear systems used Lyapunov-based controllers to keep the system stable with tight state and input constraints, while relaxations on the states were possible via a switched control [25]. On the distributed level, in Reference [26], the Lyapunov controllers were executed in either a sequential or iterative manner while stability was ensured by a cost decrease constraint. In Reference [27], the authors relaxed the Lyapunov scheme via a time delay to allow for a temporary increase of costs. The authors in Reference [28] applied a Lyapunov controller with a global bounded state feedback controller for a tracking problem of a non-holonomic robot.
In these articles, the convergence or stability aspects were shown by utilising either a Lyapunov function or a controller, which guarantees the decrease of costs in any time instant, or of terminal constraints or costs, which then forces the predicted trajectory to achieve an end point or region, or to steer that by a terminal cost penalty or a connective constraint, ensuring that, e.g., a minimum distance or explicit bounds on the optimal value function were used to calculate the suboptimality degree of a finite horizon controller. As the considered scenario here is noncooperative and as a Lyapunov function for one robot may be infeasible if one robot has to increase its costs by taking a detour and if terminal regions or costs will restrict the freedom of feasible trajectories, we use a weaker assumption, which is similar to the movement pattern from former numerical results based on the pattern of a roundabout; see Figures 4 and 7 in Reference [10].
We focus on a noncooperative setting as each robot exhibits an individual target without imposing terminal costs or constraints following Reference [21]. In this article, we derive sufficient conditions to avoid blockings or deadlocks with the assumption that the robots optimise in a fixed order in the spacial set with quantised communication. We avoid imposing Lyapunov conditions for the cost function, which would either impose additional constraints for the OCP or terminal costs. Hence, we implement a switched control, which is inspired from street traffic rules for a roundabout structure. If the robots are below a certain distance to each other, a circle with the mean-based middle point based on the end points of the predicted trajectories of the affected systems is created. Each participating system calculates individual intersection points with this circle and follows the circular curve via a circular control law until the original control law could be reinstated again. Moreover, as a second aspect, we derive a sufficient condition for the prediction horizon length based on a given cost function to examine the sufficient length, which allows the robot to recognise the decrease of costs to take the detour instead of holding their positions probably infinitely long.
The paper is structured as follows: In the subsequent section, the problem is stated with the model of the robots including state and control constraints. We recap the construction of the occupancy grid shortly and give an outline on the construction of the constraints including a safety margin regarding the quantised communication. Then, these parts are assembled to define an OCP in Section 3, which is incorporated in the DMPC scheme on each robot. The theoretical contributions concerning a sufficient prediction horizon length and the convergence to achieve practical stability are presented in Section 4. For the convergence part, we consider two possibilities: First a fixed circle size is considered to possibly include all robots on the circle. Second, we extend this approach to a flexible circle, which considers the maximum size without interfering with targets of other robots. To complete the article, a short conclusion and outlook on future extensions is given.
Notation: R and N denote the real and natural numbers, respectively. N 0 := N ∪ {0} represents the nonnegative integers, and R ≥0 represents the nonnegative real numbers. For integers a, b ∈ N 0 with a ≤ b, the expression [a : b] denotes the sequence {a, a + 1, · · · , b}. Moreover, for a vector x ∈ R n , n ∈ N, we define the infinity norm x ∞ := max i∈[1:n] |x i |.

Problem
Based on Reference [10], we introduce the problem and necessary definitions for the robots, occupancy grid, and constraint construction shortly: Here, we utilise a distributed system of P ∈ N mobile robots. The robots follow a time discrete model stated as a control-affine system: with smooth vector fields f 0 and f i : . . , m}, one control value u p,i is imposed. A successive state of a robot is denoted by z + p and determined by f i based on its current state z p and imposed control u p . The state is represented here without loss of generality by the planar coordinates x p , y p = z p ∈ R d . Constraints for state and control are given by respectively, with U containing the origin. Assuming that our systems in Equation (1) are driftless ( f 0 ≡ 0), describing the kinematic model of a mobile robot, we assume that robots may immediately stop by the following assumption: Assumption 1 (Immediate Hold). Each robot can come to an immediate hold: Note that delays of actuator dynamics, which might invalidate this assumption are not taken into account. For a finite horizon N and individual initial positions z p (0), for each robot p, a finite control sequence is defined by which is utilised to formulate the predicted state trajectory of robot p as z u p 0; z 0 p , z u p 1; z 0 p , . . . , z u p N; z 0 p .

Occupancy Grid and Quantised Communication
Based on a spacial set given by [−x,x] × [−ȳ,ȳ], we construct the occupancy grid by partitioning the spacial set into equidistant cells: The partitioning is defined byx andȳ, both multipliers of c with wherex andȳ are both multiples of c. Furthermore, we assign an index (a, b) ∈ G to each cell; see Figure 1 for details. For the quantisation of communication among the robots, we define q : Z → G: to map a state z p onto the grid G. Now, this enables robot p to broadcast the occupied cells instead of a continuous predicted state trajectory as a sequence of occupancy tuples I p (n) ∈ (N 0 × G) N+1 at time n with Example of a grid G with a max = b max = 4 from Reference [9].
All received occupancy grid tuples I p (n) from the other robots q with q ∈ [1 : P] \ {p} are assembled to i p (n) := I 1 (n) , . . . , I p−1 (n) , I p+1 (n) , . . . , I P (n) (6) and denoted as the occupancy grid. To convert a quantised state back to obtain the spacial coordinates, we require the backward mapping h : G → R 2 by which maps a cell index (a, b) ∈ G to the spatial coordinates of the cell centre ignoring in both cases the time index n. This is then used in the following subsection to construct the coupling constraints, which are later used in the OCP of each robot.

Construction of the Constraints
The mobile robots share a collective operation space defined by Z. As their trajectories may cross each other, we have to ensure collision avoidance regarding two aspects: The first aspect is a minimum cell size of the occupancy grid to prevent robots from skipping cells or swapping cells with another robot. As the dynamics of the robots is discretised with a fixed sampling step, an overlapping of two trajectories may occur within two time instants, leading to an unnoticed crossing of robots. Hence, a minimum cell size to prevent this is imperative. The second aspect stems from a necessary physical distance among the robots and the adherence of the constraints in the discrete time instants and in between two successive time instants.
Cell size: In Reference [10], we proposed a minimum cell size c based on the maximum control value which can be imposed on the robot. This minimum cell size has to cover all possible moves of the robot in Z; therefore, it is lower bounded for each vector field f 0 and f i with i ∈ {1, . . . , m} c := max where each c i with i ∈ {0, . . . , m} covers the dimensions with Here, obeying the kinematic model of the robot, this is sufficient to avoid skipping cells. Safety margin in discrete time instants: Considering the safety margin, we have to ensure that each robot keeps a physical minimum distance to other robots, which is given by d min . As the minimum cell size can be smaller or larger then the necessary physical distance among the robots, we define the safety margin as follows: Definition 1 (Safety Margin). The safety margin among any two robots p, q ∈ P, p = q is given by the necessary physical minimum distance d min and the given minimum cell size c such that holds. Here, (x i , y i ) = z i , i ∈ {p, q} denote the planar positions of the robots.
Furthermore, we have to ensure that this safety margin holds over all predicted states z u p k; z 0 p with k ∈ [1 : N] of one robot p to any other robot q. As the information of the states of robot q is given via the quantised occupancy tuples I q (n) (k) := q z u q (n + k; z 0 q ) for k ∈ [1 : N]; for a given cell size c ≥ c, a robot may be in the centre of the communicated cell or on the boundary. Therefore, we obtain the maximal deviation of the quantised state to the continuous state of the robot by This leads to the definition of the safety margin Ψ regarding the quantised error: Proposition 1 (Quantised Prediction Safety Margin). For a given cell size c ≥ c, minimum distance among robots d min , and discrete time instants k with k ∈ [1 : N], the safety margin between two robots p, q ∈ P, p = q based on the received quantised state I q (n) (k) from robot q is If this holds, then Definition 1 is satisfied with The details of the proof utilising the triangle inequality are given in Reference [10].
Proposition 2 (Intermediate Safety Margin). Suppose that Equation (9) holds for two robots p, q ∈ P, p = q for c ≥ c. Then, the minimum safety margin to prevent intermediate constraint violation in two successive time Again, the details of the proof are given in Reference [10]. Derivation of the Constraints: Now, we utilise the received occupancy tuples I q (n) , q ∈ [1 : P] \ {p} of the other robots to construct the coupling constraints including the safety margin Ψ q . Here, the infinity norm is used as this approximate the geometry of the squared cells.
Based on the received information, we map the state back to the continuous spacial set via h I q (n)(k) . Given the safety margin Ψ q I q (n) (k) , I q (n) (k − 1) , the coupling constraint for robot p regarding robot q is defined as for all k ∈ [1 : N]. Note that, although the infinity-norm describes the property of the occupancy grid very well, it is not differentiable. Hence, a gradient-free optimisation method should be used, which is in general more time consuming than gradient-based methods; see, e.g., Reference [29]. For all robots q with q = p, the received information is condensed in i p (n). Based on the coupling constraints obeyed by robot p, the combined constraints are given by for If these constraints are satisfied, the quantised prediction safety margin based on the information i p (n) and in turn Definition 1 is ensured.

Distributed Model Predictive Control
To complete the requirements to state an OCP, we introduce a strict convex stage cost function p : Z × U → R ≥0 , which is a positive definite with respect to an individual target z * p ∈ Z for each robot. In this distributed setting, each robot is steered by a controller which solves an OCP over a finite horizon length N based on the measured state. Then, after solving its OCP, the first control value of the obtained control sequence is applied, and the procedure is repeated until a termination condition is met, i.e., the target z * p is reached. To this end, the overall OCP is stated as subject to regarding the given constraints concerning the kinematic model, control, state constraints, and coupling constraints induced by the information of the other robots and by individual initial conditions and targets. By solving the OCP, an optimal control sequence is obtained, where uniqueness is ensured by the strict convexity of the cost function p . We introduce the corresponding value function as V N p : The DMPC scheme, which is executed by each robot based on the scheme by References [30,31], is presented in algorithmic form and is divided in an initialisation phase (Algorithm 1) and an execution phase (Algorithm 2).
Algorithm 1 DMPC initialisation for the overall system 1: Given admissible states z p for initial states z 0 p for all p ∈ [1 : P] 2: for p = 1 to P do 3: for k = 1 to N do 4: Set w p = f alse 8: end for 9: Set Q = ∅ In Algorithm 1, every robot keeps its initial state in the first time instant and broadcasts this before the OCP is solved. Q and w p are needed later for the derivation of conditions for convergence and practical stability. Algorithm 2 DMPC execution for the overall system 1: Call Algorithm 1 2: for n = 0, 1, . . . do 3: for p = 1 to P do 4: if n > 0 then 5: Measure z p (n) 6: Assemble i p (n) using Algorithm 3 8: else 9: Receive I q (n) for q ∈ [1 : P] \ {p} 10: Set i p (n) according to Equation (6) 11: end if 12: Solve OCP in Equation (16) and Apply u * p (0)

13:
Broadcast I p (n) 14: end for 15: end for As stated in Reference [10], the asymmetry of information has to be handled: Robot p has information I q (n) of the robots q, q ∈ [1 : p − 1], while for the successive robots q, q ∈ [p + 1 : P], the information of the last time instant I q (n − 1) is accessible. The latter issue is solved in Algorithm 3.
Here, in line 4, the information of the last time instant is shifted to the next time instant to construct the coupling constraints g p q,1 , . . . , g p q,N−1 . The last prediction step is duplicated by using Assumption 1 to obtain g p q,N in line 6, which is used to show the recursive feasibility of the system. Furthermore, the uniqueness of the solution of the OCP is ensured by selecting a strictly convex stage cost function p .

Algorithm 3
Resolving asymmetry of communication data for robot p 1: Given robot p and communication data I q (n) for q ∈ [1 : p − 1] and I q (n − 1) for q ∈ [p + 1 : P] 2: for q = p + 1 to P do 3: for k = 1 to N − 1 do 4: end for 6: Set I q (n)(N) := I q (n − 1)(N) 7: end for 8: Set i p (n) according to Equation (6) Each robot executes Algorithm 2 to perform the DMPC scheme with given initialisation conditions and targets until the target condition is matched.

Prediction Horizon Length and Convergence
In this section, we derive sufficient conditions on the prediction horizon length to guarantee a decrease of the costs function and practical stability of the overall system. As the property of initial and recursive feasibility is needed to ensure that the algorithm does not terminate unexpectedly and a feasible solution is found for any time instant n ≥ 0, we recap shortly the assumptions and theorems which were already presented in Reference [10]: Assumption 2 (Feasible initial conditions). Given a set of robots [1 : P], we have that z 0 p ∈ Z and for all p, q ∈ [1 : P] with p = q condition g p q,0 (z 0 p , q(z 0 q )) ≥ 0 holds.
Assumption 2 states that the scenario is well defined regarding the state and coupling constraints. Based on this assumption, we can conclude the following: Lemma 1 (Initial feasibility). Consider a set of robots [1 : P] with the model in Equation (1) and the constraints in Equations (2) and (3) satisfying Assumption 1. If Assumption 2 holds, then there exists a solution to the OCP in Equation (16).
Concluding that there exists a solution to the initial optimal control problem (initial feasibility), recursive feasibility ensures that there exists a solution for any following time instant n > 0: Theorem 1 (Recursive feasibility). Suppose a set of robots [1 : P] with the underlying model in Equation (1) and the constraints in Equations (2) and (3) satisfying Assumptions 1 and 2. If Algorithm 2 is applied, then the problem is recursively feasible, i.e., for all n ∈ N 0 and all p ∈ [1 : P], there exists a solution to OCP in Equation (16).
The detailed proofs are presented in Reference [10].

Prediction horizon length with an occupancy grid
In Model Predictive Control (MPC), an appropriate prediction horizon length is crucial for convergence. Here, as the constraints utilising the maximum norm and the intermediate safety margin have to be obeyed (Proposition 2), the indifferentiability of the occupied cells has to be taken into account, which requires a suitable horizon length to enable the optimiser to recognise a decrease of costs by taking a detour. The following assumption is stated to ensure disjunct, feasible initial conditions, and targets of the robots: Assumption 3 (Disjunct initial conditions/targets). Consider a set of robots [1 : P] with the underlying model in Equation (1), the state in Equation (2), and the control constraints in Equation (3). For each robot p ∈ [1 : P], let z 0 p , z p ∈ Z, and for all p, q ∈ [1 : P] with p = q andẑ j ∈ {z 0 j , z * j } with j ∈ {p, q}, we have and forẑ p = x p , y p shall hold.
This assumption ensures the existence of a connected feasible path between all initialization points and to all targets, i.e., between any two of these points, there is one grid cell available for a third robot such that all constraints are satisfied. Additionally, to allow that other robots could pass at the bounds of Z, this is ensured by Equation (21). For an illustration, see Figure 2. Here, Ψ is included in the figure, so the overall distance for Equation (20) leads to 2Ψ + c to ensure that a robot can still move between the initial conditions or targets of the other robots. We like to point out that Assumption 3 is stricter as the target positions are included in the feasibility assumption and therefore directly induces Assumption 2 to hold. In order to obtain a sufficient prediction horizon length, which depends on the formulated cost function, a class of stage costs has to be defined: Definition 2 (Stage cost function). : Let p : Z × U → R ≥0 be a positive definite, continuous differentiable function with where d z : Z × Z → R ≥0 and d u : U × U → R ≥0 describe metrics with p (z * p , 0) = 0, p (z p , u p ) > 0 for z p ∈ Z and u p ∈ U and where 0 ≤ λ ≤ 1, λ ∈ R ≥0 describes the fraction of the included penalty of the control.
For stage cost functions based on a normed distance, we have to evaluate the worst case of a necessary feasible detour caused by circumventing another robot's cell in comparison to the direct path to ensure a sufficient prediction horizon length. Therefore, we consider the trajectory points (start and end point of the detour) of a robot at a cell to be circumvented. Then, as most pairs of start and end points of this detour are on the opposite sides of the cell, which has to be circumvented, these pairs can be classified as shown in Figure 3 by either taking the middle of a side length for start and (intermediate) targets or the diagonal length of an asymmetric pair. Figure 3. Classification of start and end combinations of robot 2 to circumvent robot 1 to obtain the ratios of detour/direct way: middle of side length (red), diagonal (blue), and asymmetric (green).
We state the following theorem to obtain the worst-case positioning of the robots, i.e., we derive a maxima of the ratio of direct path to the necessary detour that a robot has to take: Theorem 2 (Worst-case positioning of robots). Suppose a group of robots with the time discrete model in Equation (1) restricted by the state in Equation (3) and the control constraints in Equation (17) with feasible initial conditions and disjunct (intermediate) targets z p (n), z q (n), z p (n + N), z q (n + N) ∈ Z satisfying Assumption 3 with z p (n + N) = z q (n + N) and a minimum cell size c obtained by Equation (9). For two robots p, q ∈ [1 : P], q keeps its position z q (k) ≡ z q with k ∈ [n : n + N]. Then, the worst-case of positioning of two robots is bounded by the maximum of the ratio of a necessary feasible detour (3 c 2 + c for moving along the sides of the cell and c 2 + c 2 + δ c 2 + c 2 for the hypotenuse to the target) to the direct path Figure 4b) by with δ → 0 leading to the maximum ratio This coincides with the shortest feasible target distance (opposite of both sides of the cell of z p , c.f. z 2 (red) in Figure 4b) with and therefore, the ratio of shortest feasible detour to direct path is bounded by Proof. Suppose, a larger ratio of detour to direct path exists. Then, we have to select a combination of start and end points on the side length of the cell. In total, four combinations can be classified, which are depicted in Figure 3. Setting the diagonal opposite edges (Figure 3, blue pair) would lead to a ratio of detour to direct way of and therefore, this is a contradiction to the hypothesis. If a nonsymmetric pair is chosen (Figure 3, green pair) with 0 ≤ δ ≤ 1, this lead to a ratio with and therefore, all these combinations lead to a ratio of detour to direct tour of less then 2 and, therefore, a larger ratio then in Equation (23) does not exists, which completes the proof.
Theorem 2 states the worst-case ratio, which has to be covered by a necessary detour in favour to the direct path, which also determines the ratio for the cost function to let the optimiser recognise a decrease of costs by circumventing a robot. This is illustrated in Figure 4a for two robots, where the quantised position of robot 1 is given as an occupied cell. The sufficient horizon length N * p , necessary for the optimiser to recognise the decrease of costs, additionally depends on the position of the given target z * p , illustrated in Figure 4a for robot 2. Examples for up to 4 robots (P ≤ 4) are presented in the Appendix. Based on this, we can derive a lower bound for the sufficient horizon length assuming the worst-case positioning from Theorem 2, where the ratio of necessary detour to shortest way is at maximum. An example of feasible trajectories is presented in Figure 4b, illustrating the trajectories depending on the target used for the maximum ratio by Theorem 2.
Note that, although the ratio is bounded, an additional difficulty to determine a lower bound on the prediction horizon length occurs as this depends also on the formulation of the cost function: The costs do not depend solely on the difference of current z p (n) and target z * p . A positive definite cost function p defined by Definition 2 may include a penalty on the control with 0 < λ ≤ 1 to avoid large control values or large differences in the control values as many cost functions aiming for a set point are defined for the euclidean space according to Reference [32]

[Equation 3.3]. The open-loop costs
for a robot retaining its position over the prediction horizon N are given by (N − 1) p z p (n; z 0 p ), 0 . Then, if such a robot holds this position, the costs to circumvent the occupied cell have to be lower, i.e., holds. Here, a prediction horizon length, which is long enough that a decrease of state costs (λ = 0) is recognised by choosing a detour, is not necessarily long enough if control effort is also penalised (λ > 0). Hence, a sufficient prediction horizon length has to incorporate the control effort, and therefore, the whole cost function has to be considered. Using Definition 2 and the assumption that a minimiser for the OCP in Equation (16) exists, the sufficient horizon length to recognise a decrease of the open-loop costs is stated in the following theorem: Theorem 3 (Sufficient prediction horizon length). Suppose that two robots p, q are given with feasible initial conditions z 0 p,q ≡ z p,q (n) and disjunct (intermediate) targets z p (K) ∈ Z with K ∈ N 0 , K < ∞ satisfying Equations (2) and (3), minimising a cost function defined by Definition 2 with Assumptions 1 and 2 to hold and a feasible trajectory between z 0 p and z p (K) exists bounded by Theorem 2. If holds, then robot p is able to obtain a feasible control, which allows to decrease the costs according to Definition 2 and therefore is able to circumvent robot q.
Proof. Based on Assumption 2 and Theorem 2, each robot p is equipped with a feasible (intermediate) initial condition and (intermediate) target, i.e., these are feasible and not occupied by other robots for the given time interval k ∈ [n : K]. By Theorem 2 a feasible trajectory exists between z p (n) and z p (n + N) for N → ∞ and bounded costs as the maximum ratio of necessary detour and direct path is bounded. As robot p is steered to z p (N) = z p (K) with K < ∞ for a large horizon N, most of the open-loop cost values of the prediction converge to p (z p (N), 0) → 0. With the constrained control in Equation (3), the state z p (K) is reachable by incorporating the impact of the control with while the upper bound is not necessarily optimal in terms of minimal control effort. Hence, as the states of z p (n), z p (K) satisfy Theorem 2 and a minimiser for the OCP in Equation (16) exists with a bounded maximal detour, a feasible trajectory exists for a finite horizon N. Therefore, the costs are bounded by the constrained control, which defines the required minimum cell size in Equation (9).
The sufficient horizon length is illustrated in the following two examples depicting holonomic and non-holonomic robots, whereas we follow the definition that a robot is holonomic if the total degrees of freedom are equal to the controllable degrees of freedom. Here, we apply cost functions based on Reference [10], and for comparison reasons, the maximum norm constraint definition is used for both systems in Equation (13). The considered cellsize is c = 2.0, and the sampling size is T = 1.0.

Example 1 (Holonomic example). Let be two robots defined by
satisfying the constraints in Equations (2) and (3) and the control bounded by u ∈ [−1, 1] 2 with u = u 1 u 2 , u ≤ √ 2 and stage costs The costs of the open-loop prediction are presented in Table 1, illustrating the sufficient prediction horizon length to circumvent robot q with N = 7. The two columns at first show the costs for the open-loop prediction including the control impact with λ = 0.2 and λ = 0, respectively. The last column represents the stage costs without imposing any control. For an insufficient horizon length, i.e., N = 6, the solution of the optimiser reveals u p = 0. However, to calculate the costs for taking the detour, the optimal control sequence is calculated with the sufficient horizon length N = 7 and stripped by the last value. Therefore, if the control is included in the cost function, the prediction horizon has to be at least N ≥ 7 to recognise the decrease of costs in spite of the necessary detour. For an illustration, see Figure 5a.
Example 2 (Non-holonomic example). Based on Reference [33] an example of a non-holonomic robot defined in the 2D plane via with θ representing its orientation. Moreover, we suppose the given stage costs Due to the constraints of the kinematic model of the robot, the control might be higher than for the holonomic model if the robot has to turn around. The open-loop costs are given in Table 2, showing the costs with included control impact for λ = 0.2 and λ = 0. To compute the control effort for taking the detour for the shorter horizon, the optimal control sequence is computed with the longer horizon with the last value stripped as in the holonomic example. Here, the lowest sufficient horizon is N ≥ 12 to allow the optimiser to recognise the lower costs considering the detour with the additional control impact. For an illustration of the chosen trajetory, see Figure 5b. Therefore, both examples illustrate the necessary lower bound of the sufficient prediction horizon length to recognise a detour with the additional impact by penalising the control. The impact of the control values is too small to change the necessary horizon length but might have impact if a higher penalisation is chosen. A deep analysis about the design of the cost function and control effort analysis to stabilise such a non-holonomic robot is given in Reference [33].

Convergence
Utilizing the geometry of our scenario, we show sufficient conditions for the convergence of robots to their given targets. Therefore, we utilise the assumption of feasible initial conditions and targets and the creation of a circle derived from the idea of a roundabout to let the robots use a circular path to ensure deadlock-free execution. An example for the conflicting zone of four robots with corresponding initial conditions and targets is presented in Figure 6. Figure 6. Example for a collision avoidance conflict of four robots with given initial conditions z i (0) and targets z * i for i ∈ {1, 2, 3, 4} with a max = b max = 3 with conflicting area (gray).

Assumption 4 (Feasible initial conditions and targets allowing boundary values).
Consider a set of robots [1 : P] with the underlying model in Equation (1), the state in Equation (2), and the control constraints in Equation (3). For each robot p ∈ [1 : P] withẑ j ∈ {z 0 j , z * j }, j ∈ {p, q}, we have that z 0 p , z * p ∈ Z, and for all p, q ∈ [1 : P] with p = q, we have that, for allẑ p , a non-empty set M ẑ p ⊂ Z exists such that This assumption is weaker than the former one (Assumption 3) to allow for initial conditions and targets close to the boundary of the defined 2D plane that the robots are operating on. For the definition of the roundabout, the creation of the circle is presented in two versions: First, the size of the circle on the total number of robots and, second, a flexible radius is used depending on the number of conflicting robots only. Now, we show under which conditions the convergence of the robots to their targets can be guaranteed.
If the distance of the end of the predictions of two arbitrary chosen robots i, j ∈ [1 : P] satisfies represents a circle with x, y ∈ R. To obtain the intersection between the trajectory and the circle, let x p (N) , y p (N) = z p (N) and x p (N − 1) , y p (N − 1) = z p (N − 1). This allows to state the linear equation with a c = y p (N) , which can be inserted into Equation (30), and the intersection between circle and trajectory is obtained by To obtain a deadlock-free execution of the collision avoidance mechanism via the circle constraint, the following assumption is required: Assumption 5 (Target-point free circle). For a circle with centre (m x , m y ) and radius r = PΨ 2π , suppose that, for all z * p , p ∈ [1 : P], the conditions m x , m y − z * p > r and (m x , m y ) − z 0 p > r hold. Each robot evaluates the distance to other robots to switch the control law to a circular curve to establish a shared roundabout and to, therefore, ensure the collision avoidance restrictions via Algorithm 4.
If the condition in Equation (28) holds, the algorithm evaluates for each time instant that the predictions of at least two robots undermine their distance. Then, for those robots, the control law is switched and the circular trajectory is carried out for all robots in Q, i.e., all robots, which were added to the set Q. This set Q represents the robots, which satisfy Equation (28) with 2 ≤ |Q| ≤ P. The other robots are set to an immediate hold (Algorithm 4, line 12) and, hence, do not interfere with the created circle. The variable w p ensures that the evaluation of the centre is carried out once and that the control law is switched back if it was circular before. If the exit z e p of the circle is reached or the target z * p is tangential reachable (see Algorithm 4,line 18), the remaining trajectory on the circle to z e p is skipped and the robot continues under the regime of the original control law, which is set back in Algorithm 4, line 19; see Figure 7b for an illustration. To integrate Algorithm 4 into the DMPC scheme, in Algorithm 2, we include after line 11 conditions and targets fulfilling Assumption 4. If the distance between two robots is such that Equation (28) is fulfilled and Assumption 5 is satisfied for all robots p ∈ [1 : P], then the calculation of a circle and execution of the DMPC scheme in Algorithm 2 using the switched control by Algorithm 4 will allow the robots to let them converge to their targets, i.e., to steer the system practically stable.
Proof. From Assumption 4, the feasibility of the initial conditions (n = 0) and targets for all P robots is guaranteed, which is weaker than Assumption 3 with additional allowance of points near to the bounds of the 2D plane. Then, with n > 0 each robot p ∈ [1 : P] minimises the distance to the individual target z * p . Then, if the condition in Equation (28) is fulfilled for at least two robots, the centre m x , m y of the circle for the conflicting robots is calculated with Equation (29) for the given radius r and the circular control law is applied for all conflicting robots according to Equation (33). All other robots are set to an immediate hold via u p . As the condition in Equation (28) holds for any arbitrary chosen robots, the circle is created, when at least two predicted states are closer according to the given condition, and therefore, any other robot is outside this circle fulfilling Assumption 5 and additionally stops moving. With this assumption that no target is located inside the circle (z * p − m x , m y ), each robot p ∈ Q is able to calculate a feasible route via Algorithm 4 subject to the intersections z s p and z e p . Furthermore, the circumference of the circle is chosen such that all robots p ∈ [1 : P] could be integrated on the circular curve and retains the intermediate safety margin, i.e., the trajectories on the circle are feasible. As the exit point is on the opposite position of the circle for each robot, the cost function also decreases with closer distance. Close to the exit point, Algorithm 4 allows an earlier switch to the old control law in Equation (3) if a tangential movement to the original target z * p is possible under the condition that orthogonality of the tangent of the current position at the circle and the vector between current position and centre is ensured. Thus, the trajectory still ensures feasibility and allows each robot to converge closer to its target, which renders the system practically stable.
In a second version, we keep the size of the circle flexible: Instead of using a fixed radius length from Assumption 5, the circle is established for at least two robots with where Q is the set of robots involved in the circle via Equation (28); therefore Q reveals 2 ≤ |Q| ≤ P. Then, the circle is increased in Algorithm 5 as long as no other robot or target is inside the circle with z q (k) − m x , m y 2 > r for k ∈ [n : n + N] and z * q − m x , m y 2 > r, respectively.
Here, the circle is increased in Algorithm 11, line 11 as long as no current position z 0 p or target z * p is inside the circle. Again, the variable w ensures that only once the circle is calculated and not changed further. A robot p not participating in this circle switches the control to u p = 0 according to the stop assumption as long as the circle exists. Then, after the robots following the circle switches their control law back according to the leaving condition in Algorithm 5, line 22, i.e, Q = ∅, the remaining robots are unblocked, i.e., allowing a control u p = 0. According to the previous fixed circle algorithm, the DMPC algorithm is modified by including in Algorithm 2 Call Algorithm 5 after line 11. closest among all other robots. As this distance allows to establish a circle at least for these two robots, the increasing radius via Algorithm 5 allows a maximum size of the circle without incorporating the states of the other robots. Then, with switching the control law subject to Equation (33) and the subset of conflicting robots, the deadlock-free execution of this subset Q is ensured. Then, with the robots following the circular curve under the circular control law, each robot i ∈ Q is able to reach the opposite position and is able to leave on the same condition as in Theorem 4 when the target is tangential reachable. Moreover, as the exit point is closer to the target, the costs also decrease. As long as Q is not empty, the other robots are blocked to prevent them from crossing the circle. Note that, after the circle is dissolved, a minimum distance to other robots which were not participating in the circle is guaranteed by Equation (28) as, for such a robot q / ∈ Q, the distance is at least (m x , m y ) − z q 2 > r with z q ≡ z q (k) for k ∈ [n : n + N]. Moreover, as the control is set to u q = u p as long as Q = ∅, after setting back to the original control law (Algorithm 5, line 29), a feasible solution exists due to Assumption 1. The same procedure to create the circle can be repeated to resolve following conflicts while the robots are approaching to their targets, hence being able to converge to their individual targets, i.e., the system achieves practical stability.

Numerical Illustration
Detailed numerical results about the occupancy grid setting were presented in References [10,34] using, for the first, a fixed order of optimising and executing and, for the latter, a priority rule setting. Without enforcing explicitly the rule to reveal a circular curve, we like to point out one scenario with P = 4 non-holonomic robots from Reference [34], where a minimum open-loop costs priority rule is used, i.e., the robot with minimum costs over the prediction horizon goes first. We selected here two scenarios using cell sizes c = 1.5 and c = 2.0 and with start and target positions of the robots opposite to each other in the edges of the operation space. Hence, they have to circumvent each other in the centre. In Figures 8 and 9, the snapshots of the different time instants are illustrated to show the evolution of time of the trajectories and the behaviour of the robots.
In the first scenario depicted in Figure 8 with cell size c = 2.0, the large cell size demands a large radius due to the collision avoidance constraints. Robot 0 ("car0") goes at first and, hence, may start with a straight trajectory due to the fixed order and the initialisation phase in Algorithm 1. As all robots start with their initial prediction that they keep their position in the first time instant over the prediction horizon, i.e., ensuring feasibility, robot 0 has the centre of the operation space available to use this at first. The other robots ("car1-3") have to incorporate this trajectory from robot 0 and have to choose detours shown by the coloured predictions, which forms a roundabout in a clockwise manner. The occupied cells, which are reserved by the predictions of the robots, are coloured accordingly to the robots. In later time instants, the trajectories show clearly that, although the order may disadvantage the last robots most, the arrival times are quite close due to the identical behaviour of the sidestepping robots.
With cell size c = 1.5 shown in Figure 9, the robots form accordingly a smaller roundabout due to smaller cells and less necessary detours. Similar to the first scenario, robot 0 has the advantage of being enabled to take the centre of the operation space due to the fixed optimisation order. The trajectory especially of robot 3 "car3" shows that the detours are shorter as the robot crosses the centre in a more straight line than in the scenario before. Therefore, without setting implicit conditions on the behaviour of the collision avoidance in both scenarios, the optimiser leads to the pattern of a circular curve, which is also used in street traffic as roundabouts.

Conclusions
In this paper, we derived sufficient conditions for a sufficient prediction horizon length and for convergence to achieve practical stability for a system of distributed robots with quantised communication based on an occupancy grid. We used the properties of the occupancy grid to determine the sufficient prediction horizon length, which is crucial for the optimiser to detect the decrease of costs by taking a detour. The occupancy grid allows to determine the maximum ratio of feasible detour and direct path, which allows to derive such a sufficient prediction horizon length. Moreover, we utilised the idea from a roundabout to establish a control law, which describes a pattern obtained by the numeric results to show convergence for the overall system. This control law was implemented in two versions: First, a fixed radius was used to allow for all robots participating immediately in the circle. Second, this control law was extended to use a flexible radius size to reduce the necessary space to establish the roundabout.
Future considerations include the incorporation of a dynamic optimisation order (priority rules) of the robots and restrictions to traffic scenarios where the freedom of the agents (then cars) is more restricted, e.g., they have to keep within lanes. This may allow easier derivation of conditions to keep such a system deadlock-free, i.e., to steer all distributed agents to their equilibrium.