Arbitrary Microphone Array Optimization Method Based on TDOA for Specific Localization Scenarios

Various microphone array geometries (e.g., linear, circular, square, cubic, and spherical) have been used to improve the positioning accuracy of sound source localization. However, whether these array structures are optimal for specific localization scenarios is still a subject of debate. This paper addresses a microphone array optimization method for sound source localization based on TDOA (time difference of arrival). The geometric structure of the microphone array is established in parametric form. A triangulation method with TDOA is used to build the spatial sound source localization model, which consists of a group of nonlinear multivariate equations. Through a suitable transformation, the nonlinear multivariate equations are converted to a group of linear equations that can be approximately solved by the weighted least-squares method. Then, an optimization model based on the particle swarm optimization (PSO) algorithm is constructed to optimize the geometric parameters of the microphone array under different localization scenarios, in combination with the spatial sound source localization model. In the optimization model, a fitness evaluation function is established that jointly considers the positioning accuracy and robustness of the microphone array. To verify the array optimization method, two specific localization scenarios and two array optimization strategies for each localization scenario were constructed. The optimal array structure parameters were obtained through iterative numerical simulation. The localization performance of the optimal array structures obtained by the proposed method was compared with the optimal structures proposed in the literature as well as with random array structures. The simulation results show that the optimized array structures gave better positioning accuracy and robustness under both specific localization scenarios. The proposed optimization model can solve the problem of array geometric structure design based on TDOA and can customize microphone array structures for different specific localization scenarios.


Introduction
In the past two decades, microphone array technology has consistently been a hot research field. Microphone arrays are mainly used for sound source localization and identification, and have been an important practical technology with many valuable applications, such as noise source localization [1,2], target sound source tracking [3], teleconferencing systems [4,5], intelligent robots [6][7][8], and so on.
In microphone array technology, there are three main methods for sound source localization, namely, beamforming, acoustic holography, and time difference of arrival (TDOA). The beamforming method applies delay-and-sum to signals from an array of microphones, and in the direction of the source, a beam peak will form to locate the sound sources [9]. Acoustic holography reconstructs the acoustic fields to locate the sound sources by solving the inverse propagation problems [10]. Beamforming and acoustic holography methods usually involve planar microphone arrays and calculation points located on a surface at a certain distance with respect to the array, which provides a poor resolution in the direction perpendicular to the array. In recent years, beamforming with several deconvolution techniques [11] and inverse methods with additional issues [12,13] have been proposed to construct volumetric sound source imaging, which can give the exact three-dimensional (3D) coordinates of sound sources. The method based on TDOA, virtually a triangulation method, locates the sound source using geometric relationships between microphones and sound sources, which can give the spatial position of sound sources with reasonable accuracy using a small number of sensors [14]. TDOA methods have been widely used for real-time sound source localization [15,16]. Moreover, in some sound source localization scenarios, such as simple sound source tracking, TDOA methods show better application prospects. Different numbers of microphones and different kinds of array structures are used in these three methods. In general, the number of microphones used in beamforming and acoustic holography is much larger than in the TDOA method because the number of microphones has a significant influence on the reconstruction accuracy of the sound source mapping [17]. Nevertheless, the number of microphones required in the TDOA method is much smaller. 
In theory, only four microphones are needed to locate a sound source in three-dimensional space. For example, Wu and Zhu [15] used only four microphones to locate arbitrarily time-dependent acoustic sources in a free three-dimensional space in real-time. In addition, the number of microphones is not the decisive factor of its location accuracy. The array structure is another main factor that relates to the accuracy of source localization for these three methods.
Many kinds of microphone array structures are applied in sound source localization, which can mainly be divided into three categories: 1-dimensional, 2-dimensional, and 3-dimensional. The 1-dimensional array structure group mainly comprises binaural arrays [18] and linear arrays [19]. 2-dimensional array structures include square [20], cross [21], spiral [22,23], and circular geometries [24]. 3-dimensional arrays mainly include cubic [25], pyramidal [25], hemispherical [26], and spherical [27] geometries. Previous studies have analyzed the performance of various arrays, indicating that each kind of array structure is only suitable for specific localization algorithms and scenarios. No single array structure achieves good localization performance under all scenarios and algorithms. For example, the two- or three-dimensional localization accuracy of a randomly distributed array will vary widely with respect to the relative position of the sound source [28]. Therefore, the optimization of the microphone array structure has become an important research topic. Wang and Bei [29] proposed an optimization method based on acoustic holography theory to optimize the microphone coordinates on a fixed cross X-type array structure, with the main side lobe ratio and the main lobe area selected as the optimization objective function. Kodrasi et al. [30] adopted different heuristic optimization approaches and an exhaustive search approach to optimize the microphone positions for an arbitrary planar array based on the beamforming method, and these methods found near-optimal configurations. Recently, Yan and Ma [31], Sarradj [32], Bjelić et al. [33], Teng and Lv [34], and Le Courtois et al. [35] also proposed new methods for planar array optimization based on the beamforming method, and compared the array performance under different localization scenarios.
In the optimization procedure, the main-lobe width and side-lobe level are generally selected as the optimization objective function. Padois et al. [36] proposed a spherical microphone array with polyhedral discretization and compared it with a spherical array with a slightly different geometry based on the beamforming method. The results showed that the polyhedral discretization array could obtain better positioning accuracy. In 2019, Padois et al. [37] carried out further research on array geometry optimization based on time-domain beamforming. They proposed an optimal spherical microphone array geometry using a nonlinear optimization.
Numerical and experimental results showed that the optimized geometry improved the sound source mapping.
From the above, it can be seen that a great deal of research work has been done in the field of microphone array optimization for sound source localization. However, these array optimization methods are mainly based on the beamforming and acoustic holography methods. The optimization procedure is also usually based on existing array structures, such as cross array [29], circle array [31], spiral array [32], irregular planar array [33,34], spherical array [38], and so on. As such, certain constraints for the array structure have been introduced to the optimization. A pre-constrained array structure may lead to a local optimum, which may not be suitable for certain specific localization scenarios. In addition, besides the array optimization based on beamforming and acoustic holography, the research on array optimization based on TDOA is relatively rare. Zietlow et al. [39] established a simulation model based on TDOA to compare the source positioning accuracy of different microphone arrangements. The microphone arrays consist of eight microphones in three different arrangements, namely cube, twisted cube, and random. These array arrangements were fixed, and no optimization was performed for the array structures. Hu et al. [40] proposed an analytical method based on TDOA to optimize microphone array structure, which could guarantee that the sound source localization had the same performance in all directions for omni-directional estimation. However, the optimal result led to a set of nonlinear equations which could not give deterministic analytical solutions. With additional constraints, only a particular solution in a regular polyhedron form can be obtained. Further, only five kinds of array structures with a specified number of microphones belong to the solution of regular polyhedron form, including the tetrahedron (5 microphones), the hexahedron (9 microphones), the octahedron (7 microphones), the dodecahedron (21 microphones), and the icosahedron (13 microphones). 
The limited set of array structures restricts the practical application of Hu's method. Meanwhile, due to some constraints in modeling and solving the method, these five kinds of array structures may not give the best positioning results under some specific localization scenarios, such as scenarios with an asymmetrical distribution of sound sources. Therefore, more in-depth research needs to be carried out in the field of array structure optimization for sound source localization based on TDOA. This paper is devoted to an arbitrary microphone array optimization method for sound source localization based on TDOA. The proposed method is a numerical approach based on the particle swarm optimization (PSO) algorithm, which can optimize the array structure for an arbitrary number of microphones under any specific localization scenario without prior array structure information. Examples of localization scenarios were constructed to obtain the optimal array structures through the proposed method. Additionally, the optimal array structures were compared with the array structures proposed by Hu et al. [40] as well as random array structures under the constructed specific scenarios.
This article makes four main contributions. First, a numerical approach to microphone array optimization based on the PSO algorithm for the TDOA method is proposed. Second, the proposed model can perform array structure optimization with an arbitrary number of microphones, and no prior array structure information is introduced into the optimization procedure, which makes it less likely that the search is confined to a constrained local optimum. Third, the array optimization model has general applicability and can effectively solve the problem of microphone arrangement in sound source localization under different specific localization scenarios. Fourth, the fitness evaluation function constructed in the optimization model accounts for both the accuracy and the robustness of sound source localization based on TDOA. The two specific localization scenarios established here verify the proposed optimization method.
In the following sections, the optimization model is introduced in detail, and the localization performance is compared with the array structures proposed by Hu et al. as well as with random array structures. Section 2 introduces the construction of the TDOA-based sound source localization model for an arbitrary microphone array, as well as the solution for the localization model. The numerical optimization model based on PSO for an arbitrary array structure is presented in Section 3, in addition to the optimization procedure. Simulations were performed and their results are discussed in Section 4, followed by conclusions in Section 5.

Construction of Localization Model Based on TDOA
The sound source localization model is the basis of the array structure optimization. The TDOA method was used to locate the sound source. Therefore, the time difference and the spatial geometric relationship between the array and the sound source were used to establish the localization model.

Geometric Structure Parameterization for Arbitrary Microphone Array
In order to optimize the microphone array, the geometric structure of the array needs to be parameterized first. Because the sound source localization method in this paper is based on TDOA, a reference microphone is needed in the microphone array. For convenience, the coordinate of the reference microphone $M_0$ is set as $(0, 0, 0)$. Then, the other microphones' coordinates can be expressed by the radial distance $l_i$, the azimuth angle $\alpha_i$, and the elevation angle $\beta_i$ in three-dimensional space, as shown in Figure 1. The coordinates of the other microphones are $M_i = (l_i\cos\beta_i\cos\alpha_i,\ l_i\cos\beta_i\sin\alpha_i,\ l_i\sin\beta_i)$, where $i = 1, 2, \ldots, N_m$ and $N_m$ is the number of microphones excluding the reference microphone. Therefore, the parameters of the microphone array to be optimized are

$$p = (l_1, \alpha_1, \beta_1, \ldots, l_{N_m}, \alpha_{N_m}, \beta_{N_m}). \tag{1}$$

The azimuth angle is constrained to $\alpha_i \in [0^\circ, 360^\circ]$ and the elevation angle to $\beta_i \in [-90^\circ, 90^\circ]$. The range of the radial distance $l_i$ is related to the size of the microphone array and the frequency band of the sound source, as well as the requirements of the actual positioning scenario. Normally, the lower limit of $l_i$ is the diameter of the microphone $d_m$, and the upper limit of $l_i$ is $c/(2 f_m)$, where $c$ is the speed of sound and $f_m$ is the main periodic frequency present in the sources. Then, the search space of the optimization can be described as

$$\Omega = \left\{ (l_i, \alpha_i, \beta_i) \;\middle|\; d_m \le l_i \le \frac{c}{2 f_m},\ 0^\circ \le \alpha_i \le 360^\circ,\ -90^\circ \le \beta_i \le 90^\circ,\ i = 1, 2, \ldots, N_m \right\}. \tag{2}$$
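The parameterization above can be sketched in a few lines of Python. This is only an illustration: the function names `mic_coordinates` and `build_array` and the example parameter values are assumptions, not part of the original model code.

```python
import math

def mic_coordinates(l, alpha_deg, beta_deg):
    """Cartesian coordinates of a microphone given its radial distance l (m),
    azimuth alpha and elevation beta (degrees), relative to M0 at the origin."""
    a = math.radians(alpha_deg)
    b = math.radians(beta_deg)
    return (l * math.cos(b) * math.cos(a),
            l * math.cos(b) * math.sin(a),
            l * math.sin(b))

def build_array(params):
    """params: list of (l_i, alpha_i, beta_i); returns [M0, M1, ..., M_Nm]."""
    return [(0.0, 0.0, 0.0)] + [mic_coordinates(*p) for p in params]

# Illustrative array: three microphones 0.7 m from the reference, one per axis.
array = build_array([(0.7, 0.0, 0.0), (0.7, 90.0, 0.0), (0.7, 0.0, 90.0)])
```

Each microphone contributes three parameters, so an array with $N_m$ non-reference microphones is described by $3 N_m$ numbers.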

Spatial Source Localization Model Based on TDOA
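The spatial source localization model was constructed based on the TDOA method, which is a triangulation method. Suppose the coordinate of the sound source is $S = (x, y, z)$ and the coordinate of microphone $M_i$ is $(x_i, y_i, z_i)$. The mathematical description is as follows:

$$r_0 = \sqrt{x^2 + y^2 + z^2}, \qquad r_i = \sqrt{(x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2}, \tag{3}$$

where $r_0$ is the distance between the sound source and the reference microphone $M_0$, and $r_i$ is the distance between the sound source and the other microphones $M_i$, $i = 1, 2, \ldots, N_m$.
By constructing the arrival time difference from the sound source to the reference microphone and the other microphones, the spatial source localization model can be obtained. The model is as follows:

$$r_{i,0} = r_i - r_0 = c\,\tau_{i,0}, \tag{4}$$

where $\tau_{i,0}$ is the sound arrival time difference between $M_i$ and $M_0$. $\tau_{i,0}$ can be estimated by the cross-correlation method. Suppose that $u_i(t)$ and $u_0(t)$ are the acoustic signals acquired by microphones $M_i$ and $M_0$, respectively. The cross-correlation function between the two signals is

$$R_{i,0}(\tau) = E\left[u_i(t)\,u_0(t - \tau)\right], \tag{5}$$

and the time difference is taken as the lag at which the cross-correlation peaks:

$$\tau_{i,0} = \arg\max_{\tau} R_{i,0}(\tau). \tag{6}$$

The spatial source localization model consists of a group of nonlinear multivariate equations, which is difficult to solve. An alternative is to transform the model into a set of linear equations. The spatial distance satisfies the relationship shown in Equation (7):

$$r_i^2 = (r_{i,0} + r_0)^2. \tag{7}$$

Then, Equation (4) can be rewritten as:

$$r_{i,0}^2 + 2 r_{i,0} r_0 = K_i - 2 x_i x - 2 y_i y - 2 z_i z, \tag{8}$$

where $K_i = x_i^2 + y_i^2 + z_i^2$. Treating $r_0$ as an additional unknown, Equation (8) is a group of linear equations, which can be written in matrix form:

$$G_a z_a = h, \tag{9}$$

with

$$z_a = \begin{bmatrix} x \\ y \\ z \\ r_0 \end{bmatrix}, \qquad G_a = -\begin{bmatrix} x_1 & y_1 & z_1 & r_{1,0} \\ \vdots & \vdots & \vdots & \vdots \\ x_{N_m} & y_{N_m} & z_{N_m} & r_{N_m,0} \end{bmatrix}, \qquad h = \frac{1}{2}\begin{bmatrix} r_{1,0}^2 - K_1 \\ \vdots \\ r_{N_m,0}^2 - K_{N_m} \end{bmatrix}.$$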
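As a quick numerical check of the linearization, the following Python sketch builds the rows of the linear system from noise-free TDOAs and confirms that the true source, written as $z_a = (x, y, z, r_0)$, satisfies it. The sign convention for $G_a$ and $h$ follows the formulation of Chan and Ho [42]; the speed of sound (343 m/s), the microphone positions, the source position, and the function names are illustrative assumptions rather than values from this paper.

```python
import math

C = 343.0  # assumed speed of sound in air, m/s

def tdoas(mics, src):
    """Noise-free TDOAs tau_{i,0} = (r_i - r_0) / c; mics[0] is the reference."""
    r = [math.dist(m, src) for m in mics]
    return [(ri - r[0]) / C for ri in r[1:]]

def linear_system(mics, tau):
    """Rows of G_a and entries of h in G_a z_a = h, with z_a = (x, y, z, r_0)."""
    G, h = [], []
    for (xi, yi, zi), t in zip(mics[1:], tau):
        ri0 = C * t  # range difference r_{i,0}
        G.append([-xi, -yi, -zi, -ri0])
        h.append(0.5 * (ri0 ** 2 - (xi ** 2 + yi ** 2 + zi ** 2)))
    return G, h

# Hypothetical five-microphone array and source position.
mics = [(0, 0, 0), (0.7, 0, 0), (0, 0.7, 0), (0, 0, 0.7), (-0.4, -0.4, 0.4)]
src = (6.0, 1.0, 0.5)

G, h = linear_system(mics, tdoas(mics, src))
z_a = (*src, math.dist(mics[0], src))
residuals = [sum(g * z for g, z in zip(row, z_a)) - hi for row, hi in zip(G, h)]
```

With exact TDOAs every residual is zero up to rounding, which is what makes the least-squares formulation in the next subsection possible.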

Solution for Spatial Source Localization Model
When the number of microphones is $N_m = 3$, the spatial source localization model (Equation (9)) can be solved directly, as presented in [41]. However, the direct solution method may produce two different answers, which leads to localization ambiguity. Meanwhile, the accuracy and robustness of source localization are limited under the condition of $N_m = 3$. Adding redundant sensors can effectively improve the performance of source localization. When there are more than four microphones ($N_m \geq 4$), the system is overdetermined, as the number of measurements is greater than the number of unknowns. The least-squares (LS) method can be used to solve the overdetermined linear equations. Chan and Ho [42] proposed an alternative closed-form solution algorithm, valid for both distant and close sources, which applies weighted LS twice to give the localization results. Chan's method gives an explicit solution with reasonable accuracy and is non-iterative with low computational complexity. Therefore, Chan's method is well suited to the optimization calculation of the acoustic array structure in this paper.

The First Weighted Least-Square Solution Process
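In order to solve the source localization model by the least-squares method, Equation (9) should be rewritten to construct an error vector. Because of noise in the TDOA estimation, the error vector can be derived as:

$$\psi = h - G_a z_a^0, \tag{10}$$

where $(\cdot)^0$ denotes the value of a variable without noise.
Suppose that the noise of the TDOA estimation $\tau_{i,0}$ is $n_{i,0}$, so that $r_{i,0} = r_{i,0}^0 + c\,n_{i,0}$. Substituting into Equation (10) gives

$$\psi = c R n + \frac{1}{2} c^2\, n \odot n, \qquad R = \mathrm{diag}\left(r_1^0, r_2^0, \ldots, r_{N_m}^0\right), \tag{11}$$

where $n = [n_{1,0}, n_{2,0}, \ldots, n_{N_m,0}]^T$ and $\odot$ denotes the element-wise product.
In practice, the condition $r_i^0 \gg c\,n_{i,0}$ is usually satisfied. Therefore, the second term on the right-hand side of Equation (11) can be ignored. Then, the covariance matrix of $\psi$ can be given as:

$$\Psi = E(\psi\psi^T) = c^2 R Q R, \tag{12}$$

where $Q = E(nn^T) = \mathrm{Cov}(n)$. Then, the first weighted LS method is used to solve Equation (10):

$$z_a = \left(G_a^T \Psi^{-1} G_a\right)^{-1} G_a^T \Psi^{-1} h. \tag{13}$$

When the source is far from the array, each $r_i^0$ is close to $r_0$, so $R \approx r_0 I$. Because scaling $\Psi$ by a constant does not change the solution, an approximate solution of Equation (13) is

$$z_a \approx \left(G_a^T Q^{-1} G_a\right)^{-1} G_a^T Q^{-1} h. \tag{14}$$

When the source is close to the array, Equation (14) can first be used to obtain an initial solution from which $R$ is estimated; this estimate is then substituted into Equations (12) and (13) to get a more accurate result.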
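A minimal Python sketch of the far-field first-stage estimate is given below. It assumes that the TDOA noise terms are independent with equal variance, so that $Q$ is a multiple of the identity and the weighted solution reduces to ordinary least squares, $z_a = (G_a^T G_a)^{-1} G_a^T h$. The speed of sound, the six-microphone geometry, the source position, and the helper names `solve` and `first_stage` are illustrative assumptions; the paper's own implementation ran in Matlab and is not reproduced here.

```python
import math

C = 343.0  # assumed speed of sound in air, m/s

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def first_stage(mics, tau):
    """First-stage estimate (x, y, z, r_0) with Q proportional to the identity."""
    G, h = [], []
    for (xi, yi, zi), t in zip(mics[1:], tau):
        ri0 = C * t
        G.append([-xi, -yi, -zi, -ri0])
        h.append(0.5 * (ri0 ** 2 - (xi ** 2 + yi ** 2 + zi ** 2)))
    # Normal equations: (G^T G) z_a = G^T h
    GtG = [[sum(g[a] * g[b] for g in G) for b in range(4)] for a in range(4)]
    Gth = [sum(g[a] * hi for g, hi in zip(G, h)) for a in range(4)]
    return solve(GtG, Gth)

# Hypothetical array (reference first) and source, with noise-free TDOAs.
mics = [(0, 0, 0), (0.7, 0, 0), (0, 0.7, 0), (0, 0, 0.7),
        (-0.4, -0.4, 0.4), (0.5, -0.5, -0.3)]
src = (2.0, 1.0, 0.5)
r = [math.dist(m, src) for m in mics]
tau = [(ri - r[0]) / C for ri in r[1:]]
z_a = first_stage(mics, tau)
```

With noise-free TDOAs the estimate recovers the source position and $r_0$ to numerical precision; with noisy TDOAs the result would serve as the initial solution that the second weighted step refines.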

The Second Weighted Least-Square Solution Process
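The above solution of $z_a$ assumes that $x$, $y$, $z$, and $r_0$ are independent. However, $r_0$ is related to the source location by $r_0^2 = x^2 + y^2 + z^2$. In order to incorporate this relationship to give an improved estimate, the elements of $z_a$ can be expressed as

$$z_{a,1} = x^0 + e_1, \quad z_{a,2} = y^0 + e_2, \quad z_{a,3} = z^0 + e_3, \quad z_{a,4} = r_0^0 + e_4, \tag{15}$$

where $e_1$, $e_2$, $e_3$, and $e_4$ are the estimation errors of $z_a$, and $(x^0, y^0, z^0)$ are the coordinates of the real source. Then, a new error vector $\psi'$ can be obtained as:

$$\psi' = h' - G_a' z_a'^0, \tag{16}$$

with

$$h' = \begin{bmatrix} z_{a,1}^2 \\ z_{a,2}^2 \\ z_{a,3}^2 \\ z_{a,4}^2 \end{bmatrix}, \qquad G_a' = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix}, \qquad z_a' = \begin{bmatrix} x^2 \\ y^2 \\ z^2 \end{bmatrix}.$$

Substituting Equation (15) into Equation (16) and neglecting the second-order error terms gives:

$$\psi_1' = 2 x^0 e_1 + e_1^2 \approx 2 x^0 e_1, \quad \psi_2' = 2 y^0 e_2 + e_2^2 \approx 2 y^0 e_2, \quad \psi_3' = 2 z^0 e_3 + e_3^2 \approx 2 z^0 e_3, \quad \psi_4' = 2 r_0^0 e_4 + e_4^2 \approx 2 r_0^0 e_4. \tag{17}$$

The covariance matrix of $\psi'$ is

$$\Psi' = E\left(\psi'\psi'^T\right) = 4 R'\,\mathrm{cov}(z_a)\,R', \tag{18}$$

where $R' = \mathrm{diag}\left(x^0, y^0, z^0, r_0^0\right)$. Then, the second weighted LS method is used to solve Equation (16):

$$z_a' = \left({G_a'}^T \Psi'^{-1} G_a'\right)^{-1} {G_a'}^T \Psi'^{-1} h'. \tag{19}$$

The matrix $\Psi'$ is not known since $R'$ contains the true values. However, $R'$ can be approximated by using the values in $z_a$. If the source is far away, then the covariance matrix of $z_a$ can be represented as:

$$\mathrm{cov}(z_a) \approx c^2 \left(r_0^0\right)^2 \left(G_a^T Q^{-1} G_a\right)^{-1}. \tag{20}$$

Then, Equation (19) reduces to:

$$z_a' \approx \left({G_a'}^T R'^{-1} G_a^T Q^{-1} G_a R'^{-1} G_a'\right)^{-1} {G_a'}^T R'^{-1} G_a^T Q^{-1} G_a R'^{-1} h'. \tag{21}$$

The final sound source position is estimated as:

$$z_p = \sqrt{z_a'} \tag{22}$$

or

$$z_p = -\sqrt{z_a'}, \tag{23}$$

where the square root is taken element-wise and the sign of each coordinate is chosen so that the estimate lies in the region of interest.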

Numerical Optimization Method for Array Structures
Given a certain number of microphones, there are infinitely many spatial geometric structures for microphone arrays. Nevertheless, in various practical scenarios, it is necessary to find the optimal array structure to effectively reduce the positioning error of sound sources in the target area. Because the microphone array consists of multiple microphones and each microphone's coordinates have three independent variables, the structure optimization of the microphone array in this paper is a multidimensional optimization problem.
The evolutionary algorithm is a global optimization method with high robustness and broad applicability. Unlike classic optimization methods such as gradient descent and quasi-Newton methods, the evolutionary algorithm does not require the gradient of the problem being optimized. Meanwhile, the evolutionary algorithm makes few or no assumptions about the optimization problem and has great advantages in unsupervised, complex multidimensional problems that cannot be solved using traditional deterministic algorithms [43]. The genetic algorithm (GA) and the particle swarm optimization (PSO) algorithm are evolutionary algorithms. GA searches for the optimal solution by imitating the mechanism of selection and inheritance in nature. The choice of crossover rate and mutation rate in GA strongly affects the quality of the solution, and the choice mostly depends on experience. Additionally, GA can be slow to converge for high-dimensional problems. PSO is a metaheuristic global optimization algorithm whose inner workings use probabilistic transition rules to search very large spaces of candidate solutions in parallel [44]. Compared with GA, PSO has the advantages of simplicity, easy implementation, and few parameters requiring adjustment. PSO does not have genetic operations such as crossover and mutation. Instead, each particle's velocity determines its search. Another essential feature of PSO is that particles have memories. The full search and update process of PSO follows the current optimal solution. Compared with GA, PSO may converge to an optimal solution more quickly. For the optimization problem of the microphone array structure in this paper, the gradient of the optimization objective function is difficult to derive. Additionally, the structure optimization of an array with many microphones is a high-dimensional optimization problem.
These factors make PSO an effective method to solve the optimization problem in this paper. Therefore, an optimization model based on PSO was constructed to optimize the geometric parameter of the microphone array under different localization scenarios.

Optimization Model Based on PSO
The PSO algorithm employs a swarm of particles that traverse a multidimensional search space to find optima. Each particle is a potential solution and is influenced by the experiences of other particles, as well as its own experiences. Let $p_j$ be the position in the search space of the $j$-th particle, and let the number of particles be $N_p$. Then, a swarm of particles can be expressed as:

$$P = \{p_1, p_2, \ldots, p_{N_p}\}, \tag{24}$$

where each particle can be denoted as

$$p_j = \left(l_1^j, \alpha_1^j, \beta_1^j, \ldots, l_{N_m}^j, \alpha_{N_m}^j, \beta_{N_m}^j\right). \tag{25}$$

A new fitness evaluation function for the array structure optimization is constructed from the mean squared error (MSE) and the variance (VAR) of the localization results, so that it jointly considers the localization accuracy and robustness. The fitness function is as follows:

$$f(p_j) = \varphi_w\,\mathrm{MSE}(z_p) + (1 - \varphi_w)\,\mathrm{VAR}(z_p), \tag{26}$$

where $\varphi_w$ is the weight value, $\varphi_w \in [0, 1]$, and $z_p$ is the final estimated sound source position. $\mathrm{MSE}(z_p)$ is the mean squared error of the localization results, which can be defined as

$$\mathrm{MSE}(z_p) = \frac{1}{N_s} \sum_{s=1}^{N_s} \left\| z_{p,s} - z_{p,s}^0 \right\|^2, \tag{27}$$

where $z_{p,s}^0$ is the coordinate of the $s$-th real source, $z_{p,s}$ is its estimate, and $N_s$ is the number of sources involved in the optimization. $\mathrm{VAR}(z_p)$ is the variance of the localization results over the $N_s$ sources. In Equation (26), the mean squared error is used to judge the accuracy of the sound source localization results, and the variance is used to judge their robustness. The weight ratio between them can be adjusted according to the requirements of the localization scenario.
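The fitness evaluation can be sketched in Python as follows. The exact form of the variance term is an assumption here: it is taken as the variance of the per-source error distances about their mean, which is one plausible reading of "variance of the localization results". The function name `fitness` and the sample coordinates are illustrative only.

```python
import math

def fitness(estimates, truths, phi_w=0.5):
    """phi_w * MSE + (1 - phi_w) * VAR of the localization errors.

    estimates, truths: lists of (x, y, z) for the N_s sources.
    VAR is assumed to be the variance of the per-source error distances.
    """
    errors = [math.dist(e, t) for e, t in zip(estimates, truths)]
    n = len(errors)
    mse = sum(e ** 2 for e in errors) / n
    mean = sum(errors) / n
    var = sum((e - mean) ** 2 for e in errors) / n
    return phi_w * mse + (1.0 - phi_w) * var

# Two hypothetical sources located with errors of 0.3 m and 0.1 m.
truths = [(6.0, 0.0, 0.0), (0.0, 6.0, 0.0)]
estimates = [(6.3, 0.0, 0.0), (0.0, 6.1, 0.0)]
score = fitness(estimates, truths)  # MSE = 0.05, VAR = 0.01
```

Setting `phi_w` closer to 1 favors arrays with a low average error, while values closer to 0 favor arrays whose error is uniform across the target region.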
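Then, an optimization problem (minimization) is defined as:

$$\min_{p} f(p), \qquad p \in \Omega \subset \mathbb{R}^d, \tag{30}$$

where $\mathbb{R}^d$ is the $d$-dimensional real space. The PSO algorithm is used to solve this optimization problem. To seek the optimal solution, each particle moves in the direction of its previous best position ($p_{best}$) and the global best position ($g_{best}$) in the swarm. The expression of $p_{best}$ is

$$p_{best}(j, k) = \arg\min_{t = 0, 1, \ldots, k} f\left(p_j(t)\right), \qquad j = 1, 2, \ldots, N_p, \tag{31}$$

and the expression of $g_{best}$ is

$$g_{best}(k) = \arg\min_{j = 1, 2, \ldots, N_p} f\left(p_{best}(j, k)\right), \qquad k = 1, 2, \ldots, I_t, \tag{32}$$

where $k$ denotes the current iteration number, and $I_t$ denotes the maximum iteration number. The velocity $V$ and position $p$ of the particles are updated by the following equations:

$$\begin{aligned} V_j(k+1) &= w V_j(k) + c_1\,\mathrm{rand}(\cdot)\left(p_{best}(j, k) - p_j(k)\right) + c_2\,\mathrm{rand}(\cdot)\left(g_{best}(k) - p_j(k)\right), \\ p_j(k+1) &= p_j(k) + V_j(k+1), \end{aligned} \tag{33}$$

where $V$ stands for the migration velocity of the particles, which is commonly bounded to keep particles from flying out of the search space. $\mathrm{rand}(\cdot)$ are uniformly distributed random variables within the range $[0, 1]$. $c_1$ and $c_2$ are the learning factors, which are positive constants. $w$ is the inertia weight used to balance global exploration and local exploitation. Shi [45] suggested a linearly decreasing inertia weight:

$$w = w_{max} - \left(w_{max} - w_{min}\right)\frac{k}{I_t}, \tag{34}$$

where $w_{max}$ and $w_{min}$ are the maximum and minimum weights, respectively.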

PSO Optimization Procedure
The optimization procedure for the acoustic array is summarized as follows:
Step 1. Initialize the PSO parameters, including the number of particles $N_p$, the learning factors $c_1$ and $c_2$, the inertia weights $w_{max}$ and $w_{min}$, and the total iteration number $I_t$.
Step 2. Initialize the particles' positions $p_j(0)$ with a random distribution, such that the parameters of each particle ($j = 1, 2, \ldots, N_p$) do not go beyond the boundaries of the search space.
Step 5. If $f(p_j(k)) < f(p_{best}(j, k))$, update the best known particle position $p_{best}(j, k) = p_j(k)$; if $f(p_{best}(j, k)) < f(g_{best}(k))$, update the global best position $g_{best}(k) = p_{best}(j, k)$.
Step 6. Check the termination criteria: $f(g_{best}(k)) \leq \delta$ ($\delta$ is a preset threshold), or the iteration number reaches the maximum $I_t$ with the fitness function converging steadily. If neither criterion is met, repeat Steps 4 and 5. Otherwise, go to Step 7.
Step 7. Output $g_{best}(k)$, which is the best optimized result.
The flow chart of the optimization procedure is shown in Figure 2.
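The procedure can be sketched as a generic PSO loop in Python. This is a minimal illustration under stated assumptions, not the Matlab implementation used in the paper: the fitness evaluation and velocity/position update steps follow the standard PSO formulation with a linearly decreasing inertia weight, the velocity limit of half the search range and the clamping of positions to the bounds are assumed choices, and a simple sphere function stands in for the array fitness function.

```python
import random

def pso(f, bounds, n_particles=30, iters=200, c1=1.5, c2=1.5,
        w_max=0.8, w_min=0.4, delta=0.0, seed=0):
    """Minimize f over a box with a basic PSO and linearly decreasing inertia."""
    rng = random.Random(seed)
    dim = len(bounds)
    v_max = [0.5 * (hi - lo) for lo, hi in bounds]  # assumed velocity limit
    # Random initial positions inside the search space, zero initial velocity.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Initial personal and global bests.
    p_best = [p[:] for p in pos]
    p_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda j: p_val[j])
    g_best, g_val = p_best[g][:], p_val[g]
    for k in range(iters):
        w = w_max - (w_max - w_min) * k / iters  # inertia weight schedule
        for j in range(n_particles):
            for d in range(dim):
                v = (w * vel[j][d]
                     + c1 * rng.random() * (p_best[j][d] - pos[j][d])
                     + c2 * rng.random() * (g_best[d] - pos[j][d]))
                vel[j][d] = max(-v_max[d], min(v_max[d], v))
                lo, hi = bounds[d]
                pos[j][d] = max(lo, min(hi, pos[j][d] + vel[j][d]))
            val = f(pos[j])
            if val < p_val[j]:            # update personal best
                p_best[j], p_val[j] = pos[j][:], val
                if val < g_val:           # update global best
                    g_best, g_val = pos[j][:], val
        if g_val <= delta:                # threshold termination criterion
            break
    return g_best, g_val

def sphere(p):
    """Toy objective standing in for the array fitness function."""
    return sum(x * x for x in p)

best, best_val = pso(sphere, [(-5.0, 5.0)] * 4)
```

For array optimization, `f` would decode each particle into microphone coordinates, localize the training sources with the TDOA model, and return the fitness value; `bounds` would hold the limits on $l_i$, $\alpha_i$, and $\beta_i$.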

Simulation and Analysis
In order to verify the effectiveness of the method proposed in this paper, two kinds of sound source localization scenarios were constructed for microphone array optimization. One scenario was a ring-shaped sound source distribution, and the other was a cuboid sound source distribution. These two scenarios represent specific sound source localization scenarios in practical applications, such as surround sound source localization and road traffic noise source tracking. For each specific localization scenario, two strategies of structure optimization were adopted to generate two kinds of optimal structures. In addition, the regular polyhedron microphone array structure proposed by Hu et al. and random array structures were used for a comparative study of sound source localization performance. The models established in Sections 2 and 3 were implemented and run on the Matlab platform.

Scenario I: Ring-Shaped Sound Source Distribution
In scenario I, sound sources were distributed in a cyclic annular band, here referred to as the ring-shaped sound source distribution. The distribution was controlled by Equation (35), as follows:

$$C_{SI} = \left(R_{SI}\cos\theta_{SI},\ R_{SI}\sin\theta_{SI},\ h_{SI}\right), \tag{35}$$

where $C_{SI}$ is the coordinate of the sound source, $R_{SI}$ is the radius of the cyclic annular band, $\theta_{SI}$ is the azimuth angle of the source, and $h_{SI}$ is the height of the source. In scenario I, $R_{SI} \in [6\ \mathrm{m}, 6.5\ \mathrm{m}]$. The source distribution for the array structure optimization under scenario I is shown in Figure 3. The microphone array was located in the center of the ring, with the reference microphone at the origin of the coordinates, and the locations of the other microphones were obtained by optimization. For sound source localization based on TDOA, the time difference estimation error is the main influencing factor for localization accuracy. To facilitate the optimization and verification of microphone array structures, the TDOAs were obtained directly from the relative positions of the sound sources and the array microphones. A noise component was added to the TDOAs to represent measurement noise in actual applications:

$$\tau_{i,0} = \frac{r_i - r_0}{c} + \eta_{i,0}, \tag{36}$$

where $\eta_{i,0}$ is the time delay estimation noise component. $\eta_{i,0}$ is assumed to be a mutually independent, zero-mean stationary Gaussian random process with standard deviation $\sigma$. In this simulation, $\sigma$ was set as 0.01. Five microphones were selected to optimize the microphone array structures, which were compared with the tetrahedral structure array proposed by Hu et al. as well as a random array. One microphone in the array was chosen as the reference microphone, and its coordinates were set as $(0, 0, 0)$. The other microphones' coordinates were set as $M_i = (l_i\cos\beta_i\cos\alpha_i,\ l_i\cos\beta_i\sin\alpha_i,\ l_i\sin\beta_i)$. The array parameters $(l_i, \alpha_i, \beta_i)$ were set as the properties of each particle. The parameters of the PSO model were initialized as follows. The learning factors $c_1$ and $c_2$ were both set to 1.5. The maximum weight $w_{max}$ was set to 0.8. The minimum weight $w_{min}$ was set to 0.4. The weight value $\varphi_w$ of the fitness function was set to 0.5, which means that the localization accuracy and robustness were equally considered.
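The scenario setup can be sketched in Python as below. Only the radius range of 6 m to 6.5 m comes from the text; the height range, the full-circle azimuth range, the speed of sound, and the function names are placeholders chosen for illustration. The unit of $\sigma$ is not stated in the text, so the noise level passed in the usage example (in seconds) is likewise an arbitrary illustrative value.

```python
import math
import random

C = 343.0  # assumed speed of sound in air, m/s

def sample_ring_sources(n, r_range=(6.0, 6.5), h_range=(0.0, 1.0), seed=0):
    """Draw n sources (R cos(theta), R sin(theta), h) in an annular band.

    h_range and the full-circle azimuth are placeholders, not values
    taken from the scenario description.
    """
    rng = random.Random(seed)
    sources = []
    for _ in range(n):
        radius = rng.uniform(*r_range)
        theta = rng.uniform(0.0, 2.0 * math.pi)
        height = rng.uniform(*h_range)
        sources.append((radius * math.cos(theta), radius * math.sin(theta), height))
    return sources

def noisy_tdoas(mics, src, sigma, rng):
    """Geometric TDOAs relative to mics[0], plus zero-mean Gaussian noise."""
    r = [math.dist(m, src) for m in mics]
    return [(ri - r[0]) / C + rng.gauss(0.0, sigma) for ri in r[1:]]

sources = sample_ring_sources(200)
mics = [(0.0, 0.0, 0.0), (0.7, 0.0, 0.0), (0.0, 0.7, 0.0), (0.0, 0.0, 0.7)]
tau = noisy_tdoas(mics, sources[0], sigma=1e-5, rng=random.Random(1))
```

Generating the TDOAs from geometry rather than from simulated waveforms keeps the evaluation of each candidate array cheap, which matters because the fitness function is evaluated for every particle at every iteration.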
In this simulation, two optimization strategies were applied to search for the optimal array structure. For the first kind of array optimization (Opt-array I), the distances $l_i$ between $M_i$ and $M_0$ were all set to the same length of 0.7 m, which makes the array comparable with the tetrahedral structure array proposed by Hu et al. and the random array. The azimuth angle $\alpha_i$ and the elevation angle $\beta_i$ were then the geometric parameters to be optimized. Therefore, for Opt-array I, the dimension of the particles in the PSO algorithm was eight, i.e., two angles for each of the four non-reference microphones used in scenario I. The constraints of the optimization space were $\alpha_i \in [0^\circ, 360^\circ]$ and $\beta_i \in [-90^\circ, 90^\circ]$. The number of particles $N_p$ was set to 250. For the second kind of array optimization (Opt-array II), the radial distance $l_i$ was not predefined. The azimuth angle $\alpha_i$, the elevation angle $\beta_i$, and the radial distance $l_i$ were all geometric parameters to be optimized. For convenience, the radial distance $l_i$ was limited to between 0.2 m and 0.8 m, considering the general array size for sound source localization. Then, the dimension of the particles for Opt-array II was twelve, and the constraints of the optimization space were $\alpha_i \in [0^\circ, 360^\circ]$, $\beta_i \in [-90^\circ, 90^\circ]$, and $l_i \in [0.2\ \mathrm{m}, 0.8\ \mathrm{m}]$. Given that the search space is much larger than for Opt-array I, the number of particles $N_p$ was set to 450 for Opt-array II.
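The two strategies differ only in how a particle vector maps to array parameters, which can be illustrated with the following Python sketch. The function names `search_bounds` and `decode_particle` and the flat ordering of the parameters within a particle are assumptions made for illustration.

```python
def search_bounds(n_mics, fixed_l=None, l_range=(0.2, 0.8)):
    """Per-dimension (low, high) limits for a particle.

    With fixed_l set (Opt-array I style) each microphone contributes
    (alpha, beta); otherwise (Opt-array II style) it contributes (l, alpha, beta).
    """
    per_mic = [(0.0, 360.0), (-90.0, 90.0)]
    if fixed_l is None:
        per_mic = [l_range] + per_mic
    return per_mic * n_mics

def decode_particle(p, fixed_l=None):
    """Map a flat particle vector to a list of (l_i, alpha_i, beta_i)."""
    step = 2 if fixed_l is not None else 3
    if len(p) % step:
        raise ValueError("particle length does not match the parameterization")
    params = []
    for i in range(0, len(p), step):
        if fixed_l is not None:
            params.append((fixed_l, p[i], p[i + 1]))
        else:
            params.append((p[i], p[i + 1], p[i + 2]))
    return params

bounds_one = search_bounds(4, fixed_l=0.7)  # 8 dimensions
bounds_two = search_bounds(4)               # 12 dimensions
```

Moving from eight to twelve dimensions enlarges the search space considerably, which is consistent with the larger swarm used for Opt-array II.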
Then, the optimization model based on PSO ran on the Matlab platform to obtain the optimal array structure under scenario I. The fitness evolution curves of Opt-array I and the Opt-array II are shown in Figure 4. It can be seen from Figure 4 that the fitness evolution curve tended to be stable after 200 steps of iteration, which indicates that the optimization process was basically convergent. The optimization results are deemed to be the optimal array structures under scenario I.
For the manufacturing of a microphone array, an angle precision of 5° is achievable. Therefore, the angle values of the optimized arrays were all rounded to the nearest 5°. The microphone coordinates and geometric parameters of the optimal arrays, the tetrahedral array proposed by Hu et al., and a random array are listed in Table 1.
It can be seen from Table 1 that the array geometric parameters of Opt-array I and Opt-array II were different. The average radial distance of Opt-array II was larger than that of Opt-array I. At the same time, there were small differences among the four radial distances of Opt-array II. The array structures of Opt-array I and Opt-array II are shown in Figure 5. In order to verify the performance of the optimal arrays obtained in the simulation, a scenario with sound sources randomly distributed in the cyclic annular band was constructed, as shown in Figure 6, in which 200 sources were randomly distributed in the cyclic annular band of scenario I. Opt-array I, Opt-array II, the tetrahedral structure array, and the random array were used to locate these sources. The distances between the located sources and the corresponding real sources were used to measure the positioning accuracy and robustness. Meanwhile, in order to analyze the influence of input noise on array positioning performance, Gaussian random noise $\eta_{i,0}$ with four different standard deviations was added to the time delay estimation, namely, $\sigma = 0.005$, $\sigma = 0.01$, $\sigma = 0.015$, and $\sigma = 0.018$. The statistical chart is shown in Figure 7. In Figure 7, the height of each rectangular bar stands for the mean localization error, and the length of each line bar represents the standard deviation of the localization error. It can be seen from Figure 7 that the mean values and the standard deviations of the localization error increased with the input noise component of the time delay estimation. Under the same input noise amplitude, the mean values and the standard deviations for Opt-array I and Opt-array II were all much lower than those for the tetrahedral array and the random array. The larger the input noise, the more significant the gap.
This means that the optimized arrays by the proposed method could improve the accuracy and robustness of the sound source localization based on TDOA. The results illustrate the effectiveness of the array structure optimization method proposed in this paper.
Under four different input noise levels, the mean value and standard deviation of the localization error for the random array were all much larger than those for Opt-array I, Opt-array II, and the tetrahedral array, which illustrates that array optimization, whether by the method of this paper or by Hu's method, produced a positive effect. A random array may achieve excellent positioning performance, but the probability is very small. Also, the mean value and standard deviation of the localization error for Opt-array II were lower than for Opt-array I. Considering that there were differences among the four radial distances l_i of Opt-array II after array structure optimization, the optimization of the radial distance l_i helps improve the positioning performance of the microphone array, in addition to the optimization of the azimuth angle α_i and the elevation angle β_i.
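The verification procedure described above (perturb the time delays with Gaussian noise, locate each source, and collect the mean and standard deviation of the localization error) can be sketched as below. This is a simplified illustration under stated assumptions, not the paper's implementation: it uses an unweighted linear least-squares solve in place of the weighted least squares method, assumes a sound speed of 343 m/s, and treats the noise standard deviation as being in seconds. The function names are hypothetical. With the reference microphone M_0 at the origin and range differences d_i = c τ_{i,0}, squaring |s − m_i| = d_i + |s| gives the linear relation 2 m_i · s + 2 d_i r_0 = |m_i|² − d_i² in the unknowns s and r_0 = |s|.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s

def tdoa(mics, src):
    """Noise-free delays tau_{i,0} between each M_i and the reference M_0 at the origin.

    mics: (K, 3) array of positions of M_1..M_K; src: (3,) source position.
    """
    return (np.linalg.norm(src - mics, axis=1) - np.linalg.norm(src)) / C

def locate(mics, tau):
    """Least-squares source estimate from the delays (unweighted simplification).

    Solves 2 m_i . s + 2 d_i r0 = |m_i|^2 - d_i^2 for (s, r0), treating r0 as
    an independent unknown.
    """
    d = C * np.asarray(tau, dtype=float)
    A = np.column_stack([2.0 * mics, 2.0 * d])
    b = np.sum(mics ** 2, axis=1) - d ** 2
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    return sol[:3]

def error_stats(mics, sources, sigma, rng):
    """Mean and standard deviation of the localization error over all sources."""
    errs = []
    for s in sources:
        noisy_tau = tdoa(mics, s) + rng.normal(0.0, sigma, size=len(mics))
        errs.append(np.linalg.norm(locate(mics, noisy_tau) - s))
    errs = np.asarray(errs)
    return errs.mean(), errs.std()
```

Calling `error_stats` once per noise level for each candidate array yields the kind of mean and standard deviation pairs that are plotted as bars and line bars in the comparison figures. The source positions passed in would be drawn from the scenario region, whose dimensions are defined where the scenario is introduced.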

Scenario II: Cuboid-Shaped Sound Source Distribution
In scenario II, the sound sources were distributed in a cuboid space band, here referred to as the cuboid-shaped sound source distribution. The cuboid space band was 15 m × 6 m × 3 m. The microphone array was located on one side of the cuboid distribution. The location of the reference microphone coincided with the origin of the coordinate system. The constructed scenario II is shown in Figure 8. In the simulation of scenario II, five microphones were again selected to optimize the microphone array structure, which was compared with the tetrahedral structure array proposed by Hu et al. and a random array. A zero-mean Gaussian noise component η_{i,0} was introduced into the time delay estimation. The standard deviation σ of the noise was the same as in scenario I, namely, 0.01. The parameters of the PSO model were set to be the same as in scenario I.
The fitness evolution curves of Opt-array I and Opt-array II under scenario II are shown in Figure 9. It can be seen from Figure 9 that the fitness evolution curves tended to be stable after 150 iterations. At the beginning of the iteration, the fitness function value of Opt-array II was higher than that of Opt-array I. Nevertheless, after a number of iterations, the fitness function value of Opt-array II fell below that of Opt-array I as the iterations approached convergence, which means that the optimized structure of Opt-array II may have better localization performance than Opt-array I. As in scenario I, the angle values of the optimized arrays were all rounded to multiples of 5°. The microphone coordinates of the optimal arrays, the tetrahedral array proposed by Hu et al., and the random array are listed in Table 2.
To verify the performance of the arrays, a scenario with sound sources randomly distributed in the cuboid space band was constructed, as shown in Figure 11, in which 400 sources were randomly distributed in the cuboid space band. Opt-array I, Opt-array II, the tetrahedral structure array, and the random array were used to locate these sources. Gaussian random noise η_{i,0} with five different standard deviations was added to the time delay estimation, namely, σ = 0.002, σ = 0.005, σ = 0.008, σ = 0.01, and σ = 0.012. The statistics of the distances between the located sources and the corresponding real sources are shown in Figure 12. It can be seen from Figure 12 that the mean values and the standard deviations of the localization error increased with the input noise component of the time delay estimation. Under the same input noise amplitude, the mean value and the standard deviation for Opt-array I and Opt-array II were much lower than those for the tetrahedral array proposed by Hu et al. and the random array, and the gap increased rapidly with the input noise. The arrays optimized by the proposed method could therefore improve the accuracy and robustness of sound source localization based on TDOA.
Under five different input noise levels, the mean value and standard deviation of the localization error for the random array were much larger than those for Opt-array I, Opt-array II, and the tetrahedral array, especially when the input noise component was large, which illustrates that array optimization produced a positive effect. Random arrays have little chance of achieving excellent positioning performance under specific localization scenarios. Also, the mean value and standard deviation of the localization error for Opt-array II were lower than those for Opt-array I, which indicates that the optimization of the radial distance l_i helped improve the positioning performance of the microphone array, in addition to the optimization of the azimuth angle α_i and the elevation angle β_i. Moreover, the localization error reduction of Opt-array II under scenario II was more significant than that under scenario I when the input noise component was large. Considering that the differences among the radial distances l_i of Opt-array II under scenario II were much larger than those under scenario I, the radial distance was a more significant factor in the array structure optimization under scenario II than under scenario I.
In addition, compared with scenario I, the mean values and the standard deviations of the localization error in scenario II were much larger. The standard deviation rose sharply with the increase of input noise. The main reason for this is that the location area and the size of the sound source region in scenario II were much larger than in scenario I, and the sound sources were asymmetrically distributed. For scenario II, increasing the number of array microphones may help to reduce positioning errors and improve positioning robustness. Therefore, another optimization case was applied in scenario II, in which seven microphones were chosen for the array structure optimization. The octahedron structure array proposed by Hu et al. [40] and a random array with seven microphones were used for comparative study.
Two kinds of optimization strategies were also used in the simulation. In the first kind of array optimization (Opt-array-7mic I), the radial distances l_i between M_i and M_0 were set to the same length of 0.7 m. For the second kind of array optimization (Opt-array-7mic II), the radial distances l_i, the azimuth angles α_i, and the elevation angles β_i were all used as the array geometric parameters to be optimized. The constraints and the initial parameters of the optimization model were set to be the same as in the case of the five-microphone array optimization. Given that the search space was much larger than in the array optimization with five microphones, the numbers of particles N for Opt-array-7mic I and Opt-array-7mic II were set to 400 and 650, respectively.
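The difference between the two strategies can be made concrete by looking at the particle vector. For a seven-microphone array (six microphones besides the reference M_0), strategy I searches over 12 angle values, while strategy II searches over 18 values (six radial distances plus 12 angles), which is why the search space grows. The sketch below decodes a particle into microphone coordinates and pairs it with a generic, textbook particle swarm loop. It is an illustration only: the ordering of parameters in the particle vector, the angle convention, the inertia and acceleration coefficients, and all function names are assumptions, and the loop is a standard global-best PSO rather than the authors' exact model.

```python
import numpy as np

def decode_particle(p, n_mics, fixed_l=None):
    """Map a particle vector to the Cartesian positions of M_1..M_{n_mics-1}.

    Strategy I  (fixed_l given): p = [alpha_1.., beta_1..]        -> 2(n-1) values
    Strategy II (fixed_l=None):  p = [l_1.., alpha_1.., beta_1..] -> 3(n-1) values
    Angles are in degrees; the azimuth/elevation convention is an assumption.
    """
    k = n_mics - 1
    p = np.asarray(p, dtype=float)
    if fixed_l is None:
        l, a, b = p[:k], p[k:2 * k], p[2 * k:3 * k]
    else:
        l, a, b = np.full(k, fixed_l), p[:k], p[k:2 * k]
    a, b = np.radians(a), np.radians(b)
    return np.column_stack([l * np.cos(b) * np.cos(a),
                            l * np.cos(b) * np.sin(a),
                            l * np.sin(b)])

def pso_minimize(f, lower, upper, n_particles=30, n_iter=150,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Generic global-best PSO with box constraints (coefficients are assumed)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    x = rng.uniform(lower, upper, size=(n_particles, dim))  # positions
    v = np.zeros_like(x)                                    # velocities
    pbest = x.copy()
    pbest_val = np.array([f(p) for p in x])
    g = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
    history = []  # best fitness per iteration (the fitness evolution curve)
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lower, upper)  # keep particles inside the bounds
        val = np.array([f(p) for p in x])
        improved = val < pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = val[improved]
        g = int(np.argmin(pbest_val))
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
        history.append(gbest_val)
    return gbest, gbest_val, history
```

In an array optimization run, `f` would decode the particle with `decode_particle`, simulate localization over sampled sources, and return the fitness value; the `history` list then corresponds to a fitness evolution curve.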
The optimal array structures were obtained after running the optimization model on the Matlab platform under scenario II. The geometric parameters of the optimal arrays, the octahedron array, and the random array are listed in Table 3. Again, the angles of the optimized arrays were all rounded to multiples of 5°. The array structures of Opt-array-7mic I and Opt-array-7mic II are shown in Figure 13. It can be seen from Table 3 and Figure 13 that the array structures of Opt-array-7mic I and Opt-array-7mic II differed. The differences among the radial distances l_i of Opt-array-7mic II were much smaller than those of Opt-array II under scenario II.
To verify the performance of the optimal arrays, a scenario with randomly distributed sound sources in the cuboid space band was constructed, similar to Figure 11. Gaussian random noise η_{i,0} with five standard deviations was again added to the time delay estimation, namely, σ = 0.002, σ = 0.005, σ = 0.008, σ = 0.01, and σ = 0.012. The statistics of the distances between the located sources and the corresponding real sources are shown in Figure 14.
Figure 14 shows that the mean values and standard deviations of the localization error for Opt-array-7mic I and Opt-array-7mic II were lower than for the octahedron array proposed by Hu et al. and the random array, which illustrates the effectiveness of the proposed array optimization method. Comparing Figures 12 and 14, it can be seen that the mean values and the standard deviations of the localization error for Opt-array-7mic I and Opt-array-7mic II were lower than for Opt-array I and Opt-array II under scenario II. Considering that the standard deviations for the optimal arrays with seven microphones were significantly lower than those for the optimal arrays with five microphones when the input noise component was large, optimal array structures with more microphones could significantly improve the robustness of source localization based on TDOA. In addition, the mean values and the standard deviations of the localization error for the octahedron array and the random array with seven microphones were also much lower than those for the tetrahedral array and the random array with five microphones, which demonstrates that increasing the number of microphones can greatly improve the positioning accuracy and robustness of the array based on TDOA.

Conclusions
This paper proposed a method of microphone array optimization for sound source localization based on TDOA under specific localization scenarios, which can be applied to the optimization of arbitrary array structures without prior information. For any number of microphones, a better array structure can be obtained under any localization scenario. The proposed method is a numerical approach based on the particle swarm optimization algorithm. The mean squared error and the variance of the localization results, combined with a weight value, were used to construct the fitness function of the optimization model, which can consider both positioning accuracy and robustness. The geometric structure of the microphone array was established in parametric form, and its parameters were assigned as particle attributes and substituted into the optimization model to obtain improved results. Two specific localization scenarios were constructed to optimize the array structures. For both specific scenarios, two kinds of array optimization strategies were utilized to obtain two optimal array structures. The optimized array structures were compared with the regular polyhedron structure arrays under different input noise amplitudes.
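One way to read the fitness construction summarized above is as a weighted combination of an accuracy term and a robustness term. The sketch below is an assumed form, not the paper's exact definition: it takes the localization error distances of a batch of simulated sources, uses their mean squared value as the accuracy term and their variance as the robustness term, and mixes the two with a hypothetical weight `weight` in [0, 1]. The precise fitness function and weight value are those defined in the optimization model section.

```python
import numpy as np

def fitness(errors, weight=0.5):
    """Assumed fitness: weight * MSE + (1 - weight) * variance of the errors.

    errors: localization error distances (metres) over the simulated sources.
    Lower values indicate a more accurate and more robust array.
    """
    errors = np.asarray(errors, dtype=float)
    mse = np.mean(errors ** 2)   # accuracy term
    var = np.var(errors)         # robustness term (population variance)
    return weight * mse + (1.0 - weight) * var
```

With this form, an array that has a low average error but occasional large outliers is penalized through the variance term, which matches the stated goal of considering accuracy and robustness together.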
For scenario I, the mean value and the standard deviation of the localization error for Opt-array I and Opt-array II were much lower than for the tetrahedral array and the random array, and the higher the input noise, the more significant the gap. Under four different input noise levels, the mean value and standard deviation of the localization error for the random array were the largest, and those of Opt-array II were the smallest. The results indicate that the array optimization produced a positive effect, and the optimization of the radial distance l_i helped improve the positioning performance of the microphone array under scenario I.
For scenario II, the mean value and the standard deviation of the localization error for Opt-array I and Opt-array II were also much lower than those for the tetrahedral array and the random array. Under five different input noise levels, the mean value and standard deviation of the localization error for the random array were the largest, and those of Opt-array II were the smallest. The array optimization and the optimization of the radial distance l_i both showed a positive effect on the positioning performance of the microphone array under scenario II. Moreover, the localization error reduction of Opt-array II under scenario II was more significant than that under scenario I. Considering that the differences among the radial distances of Opt-array II under scenario II were much larger than those under scenario I, the radial distance was a more significant factor in the array structure optimization under scenario II than under scenario I. Under scenario II, the mean value and standard deviation of the localization error for the optimal array were much higher than those for the optimal array under scenario I.
A seven-microphone array was also optimized under scenario II and compared with the octahedron array and a random array. The results show that under five different input noise levels, the mean value and standard deviation of the localization error for Opt-array-7mic II were the smallest, and those for the random array were the largest. The mean value and standard deviation for the optimal array with seven microphones were lower than those for the optimal array with five microphones; in particular, the standard deviation was significantly lower. This indicates that an optimal array structure with more microphones can significantly improve the robustness of source localization based on TDOA.
For both specific localization scenarios, the comparison results show that the localization accuracy and robustness of the optimized array structures were better than those of the regular polyhedron array structures proposed by Hu et al. and random array structures, which illustrates the effectiveness of the proposed array structure optimization method. Random arrays may achieve excellent positioning performance, but the likelihood is small. The optimization of the radial distance l_i helped improve the positioning performance of the microphone array, in addition to the optimization of the azimuth angle α_i and the elevation angle β_i, particularly for scenario II. In the future, the efficiency of the optimization algorithm can be further studied, as well as the correlation between the positioning performance of the array and the array geometric parameters.