Sparse Method for Direction of Arrival Estimation Using Denoised Fourth-Order Cumulants Vector

Fourth-order cumulants (FOCs) vector-based direction of arrival (DOA) estimation methods of non-Gaussian sources may suffer from poor performance for limited snapshots or difficulty in setting parameters. In this paper, a novel FOCs vector-based sparse DOA estimation method is proposed. Firstly, by utilizing the concept of a fourth-order difference co-array (FODCA), an advanced FOCs vector denoising or dimension reduction procedure is presented for arbitrary array geometries. Then, a novel single measurement vector (SMV) model is established by the denoised FOCs vector, and efficiently solved by an off-grid sparse Bayesian inference (OGSBI) method. The estimation errors of FOCs are integrated in the SMV model, and are approximately estimated in a simple way. A necessary condition regarding the number of identifiable sources of our method is presented that, in order to uniquely identify all sources, the number of sources K must fulfill K≤(M4−2M3+7M2−6M)/8. The proposed method suits any geometry, does not need prior knowledge of the number of sources, is insensitive to associated parameters, and has maximum identifiability O(M4), where M is the number of sensors in the array. Numerical simulations illustrate the superior performance of the proposed method.


Introduction
Direction of arrival (DOA) estimation is a topic that has received a great deal of attention in the last several decades. It arises in many practical scenarios, such as radar, sonar, and communications [1][2][3][4]. For example, in radar and sonar systems, objects including airplanes, birds, missiles, submarines, and fish torpedoes can be tracked by estimating their DOAs. In communication systems, the DOAs of the sources can be used to improve the communication quality.
Higher-order cumulants (HOCs)-based DOA estimation methods have been developed for non-Gaussian sources [1,[5][6][7][8], mainly to overcome the limitations of second-order statistics-based DOA estimation methods such as MUltiple SIgnal Classification(MUSIC) [9] and Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) [10]. Their limitations include identifiable number of sources, colored Gaussian noise, and modeling errors. The famous 4-MUSIC method [5] uses the fourth-order cumulants (FOCs) to form a MUSIC-like method. Then, the 2q-MUSIC [6] and rectangular 2q-MUSIC [7] methods extend 4-MUSIC by utilizing 2qth-order cumulants. The rectangular 2q-MUSIC method achieves a trade-off between performance and maximal identifiable number of sources compared with the 2q-MUSIC method. To further increase the degrees of freedom (DOFs) of the FOCs-based methods, a kind of sparse linear array named multiple level nested array and the corresponding spatial smoothing MUSIC (SS-MUSIC) method were developed, which make the DOFs increase from O(M q ) to O(M 2q ) [8], where M is the number of sensors in the array. However, HOCs-based methods always suffer from performance degradation when snapshots are limited, mainly because accurate estimations of HOCs require a large number of snapshots.
Recently, a series of covariance vector-based DOA methods were proposed [11][12][13] which built a single measurement vector (SMV) model by covariance vector and were solved by sparse methods such as L1-norm-based methods [14] and sparse Bayesian learning (SBL) [15]. Consequently, these methods always perform better than other covariance-based DOA methods such as MUSIC when snapshots are limited, mainly because estimation errors of covariance are utilized to build their sparse models. Moreover, the DOFs of these methods increase to O(M 2 ), which is more than MUSIC. Analogously, the FOCs vector-based L1-norm method (4-L1) is applied in [16][17][18]. Compared with FOCs vector-based SS-MUSIC (4-SS-MUSIC) in [8], the 4-L1 method can utilize all the virtual sensors in the fourth-order difference co-array (FODCA) [8], which may give a better performance than 4-SS-MUSIC. Moreover, the 4-L1 method can be applied to any array, while 4-SS-MUSIC only suits for linear arrays. Nonetheless, the 4-L1 method is inconvenient to use in practice, since the parameter of allowable error bound is difficult to choose, and the solution of 4-L1 is sensitive to it. The allowable error bound usually needs to be chosen manually through trial-and-error for each scenario, as in [16][17][18]. Furthermore, it has been demonstrated both theoretically and empirically that the SBL technique induces less structural error (biased global minimum) and convergence error (failure in achieving the global minimum) than the L1 methods [11,[19][20][21]. On the other hand, an advanced denoising procedure of the FOCs vector may further promote the performance of FOCs vector-based DOA estimation methods. An analogous procedure can be found in covariance vector-based DOA methods in [22].
In this paper, we establish a novel denoised SMV model by FOCs vector, and solve it by an off-grid sparse Bayesian inference (OGSBI) method [21]. By utilizing the concept of the fourth-order difference co-array, the advanced denoising or dimension reduction procedure of FOCs vector is presented for any geometry. The estimation errors of FOCs are integrated in the proposed SMV model, and are approximately estimated in a simple way. A necessary condition of our method regarding the number of identifiable sources is presented such that in order to uniquely identify all sources, the number of sources K must fulfill K ≤ (M 4 − 2M 3 + 7M 2 − 6M)/8. The proposed method suits for any geometry, does not require prior knowledge of the number of sources, has maximum identifiability O(M 4 ), and is insensitive to associated parameters. The off-grid parameter estimation in our method promotes superior performance in overdetermined cases when the number of snapshots is large and signal-noise-ratio (SNR) is high, and also ensures good performance when choosing a relatively coarse grid. Numerical simulations illustrate the superior performance of the proposed method.

Data Model
One-dimensional DOA estimation is discussed in this paper. Consider an array with any geometry which consists of M identical, omnidirectional, unpolarized and isotropic sensors. The sensor locations n 1 , n 2 , ..., n M are measured by d = λ/2, and collected by the vector set S {n 1 , n 2 , ..., n M } ⊂ R 3 , where n m = [n m,x , n m,y , n m,z ] T is expressed in three dimensions for m = 1, 2, ..., M, (·) T is the transpose operator, λ is the wavelength, and R denotes the set of real numbers. The cardinality of S is denoted by |S|, and |S| = M. Each sensor receives the contribution of K zero-mean statistically independent stationary narrowband non-Gaussian sources. We denote by T the observed signals of the sensors, the source signals, and the noises of the observed signals at the tth snapshot, respectively, where t = 1, 2, ..., +∞. The observed signals are modeled as: where A = [a(θ 1 ), a(θ 2 ), ..., a(θ K )] ∈ C M×K denotes the array manifold matrix, a(θ k ) = [e j2πu T k n 1 , e j2πu T k n 2 , ..., e j2πu T k n M ] T is the steering vector for θ k , j is an imaginary number, u k = d/λ · [cos(θ k ), sin(θ k ), 0] T , θ k denotes the azimuth angle of the kth source, and C is the set of complex numbers. The noises are assumed to be statistically independent zero-mean white Gaussian and independent from each source. Build the FOCs vector c cum{·} denotes the cumulant operator, (·) * denotes complex conjugate, and the indexes i γ and γ are integers satisfying 1 ≤ i γ ≤ M and 1 ≤ γ ≤ 4, respectively. Due to model (1), the above assumptions of the sources and noises, and the properties of the cumulants [23,24], we can obtain where where , and ⊗ denotes the Kronecker product.

The Proposed Method
In this section, we present an FOCs vector-based method to perform DOA estimation. Firstly, we provide a dimension reduction method to build a novel data model by employing the concept of a fourth-order difference co-array [8]. This method can also be treated as a denoising procedure to reduce the estimation errors of FOCs. Next, according to the newly built data model, our sparse model is efficiently established and solved by the OGSBI method. Some information about the proposed method (e.g., identifiability, complexity, and comparison with the state-of-the-art methods) is also discussed.

Dimension Reduction or Denoising Procedure
In this subsection, the dimension of model (3) is reduced by utilizing a fourth-order difference co-array. Here we supplement several definitions and properties to facilitate dimension reduction. Definition 1. Fourth-order difference co-array (FODCA): Given an array with arbitrary geometry S, the fourth-order difference co-array D is defined as the set of distinct vector elements from the set where i 1 , i 2 , i 3 , i 4 = 1, 2, ..., M, as described in Section 2.
Let m 1 , m 2 , ..., m |D| denote the distinct elements in D, where |D| is the cardinality of D. The following definition gives the concept of a transformation matrix, which describes the relation between b(θ) and the steering vector on D (see Property 2 in this subsection).

Definition 2.
The transformation matrix J: Given array S, the associated FODCA D can be obtained by Definition 1. The transformation matrix J is an M 4 × |D| matrix, where the ξth column corresponds to the vector m ξ ∈ D, and the (i, ξ)th entry of J satisfies where the index ξ = 1, 2, ..., |D|, and recall that i = (i 1 − 1)M 3 + (i 2 − 1)M 2 + (i 3 − 1)M + i 4 in Section 2.

Property 1.
The transformation matrix J has full column rank.
where J + = (J H J) −1 J H is the Moore-Penrose inverse of J. Comparing the model (8) with (3), it is clear that the row dimension decreases from M 4 to |D|. Remark 1: An analogous procedure can be found in covariance vector-based DOA estimation methods [22]. Among HOCs-based DOA estimation methods, rectangular 2q-MUSIC provides two strategies to reduce the dimension of a rectangle FOCs matrix (i.e., selection strategy and averaging strategy), by eliminating redundancies of associated left and right virtual steering vectors [7]. From this perspective, the proposed dimension reduction procedure is the essential realization of the averaging strategy applying to FODCA, with an explicit formulation (8). It is clear that the averaging strategy is better than the selection strategy due to the averaging of the noise in the associated observation components [7], if the computation burden is not taken into consideration. Thus, the averaging strategy can also be treated as a denoising procedure. In addition, it can be seen that FODCA has the same geometry as a fourth-order virtual array with parameters (q, l) = (4, 2) [25], which is deduced from an eighth-order cumulants-based array processing problem. In TABLE IX from [25], the max number of identical virtual sensors in the fourth-order virtual array, with parameters (q, l) = (4, 2) and space diversity (without angular and polarization diversity), is illustrated, and it is equal to (7) after simplification due to the same geometry. The detailed derivation of (7) is provided in Appendix C, since it is not provided in [25].
Remark 2: It is clear that when 2 ≤ K < M, the limiting root mean square error (RMSE) of covariance vector-based methods is not necessarily zero [22,26], while the RMSE of the traditional MUSIC method converges to zero as signal-noise-ratio (SNR) approaches infinity. As the FODCA can be regarded as two levels of second-order difference co-array [8,18], applied to construct covariance vector-based methods, it can be deduced that the FOCs vector-based DOA estimation methods, including [8,[16][17][18], will be outperformed by traditional MUSIC in high-SNR regions when 2 ≤ K < M. From this point, we only suggest to use these methods (including the proposed method) in underdetermined cases (K ≥ M).

Sparse Model
In this subsection we develop the sparse model from (8). Letθ = [θ 1 , ...,θ I ] be a fixed sampling grid in the DOA range, where I |D|. Letθ be a uniform grid with a grid resolution r =θ 2 −θ 1 ∝ I −1 .
Supposeθ w k is the nearest grid point to θ k , where w k ∈ {1, ..., I}. Then, the steering vector a D (θ k ) can be approximated by its first-order Taylor series .., K and β w = 0 otherwise, and diag{·} denotes the operator to form a diagonal matrix by a vector or to form a vector by diagonal entries of a matrix. Neglecting approximation errors of (9), model (8) can be expressed as: where Consider the case of existing estimation errors of FOCs. Letĉ x = c x + e, whereĉ x is the estimate of c x , and e is the estimation errors vector. Then, model (12) becomes Assume e ∼ AsCN(0 M 4 , Q) [1], where AsCN denotes the asymptotic normal distribution, 0 M 4 denotes zero vector with M 4 elements, and Q ∈ C M 4 ×M 4 is a positive definite (Hermitian) matrix. The estimation of FOCs and the matrix Q are discussed in Section 3.3. Following this assumption, we have J + e ∼ AsCN(0 |D| ,Q), whereQ = J + Q(J + ) T . Furthermore, we can obtainQ − 1 2 J + e ∼ AsCN(0 |D| , I |D|×|D| ), where I |D|×|D| is |D| dimensional identity matrix. We denote bỹ Then, we can obtain the following SMV off-grid sparse model withẽ ∼ AsCN(0 |D| , I |D|×|D| ). Note that if setting β w = 0 for w = 1, ..., I, the model (17) is simplified to an on-grid DOA model.

Estimation of FOCs and the Matrix Q
To facilitate the implementation of the proposed method, in this subsection we present the estimation methods of FOCs and the matrix Q.
Let us introduce the following general notations: where L is the number of snapshots. Due to the assumptions of sources and noises, [c x ] i can be expressed as [25]: Utilizing the corresponding empirical estimators of terms in (18) Consider complex variables This solution theoretically supports the assumption that e =ĉ x − c x ∼ AsCN(0 M 4 , Q) in Section 3.2, and we aim for a robust estimation of Q in the following derivation.
Consider signals given by It is obvious that Rewrite (18) as: where y(t) is given by (20).

Lemma 1.
Given two joint distributed complex-valued random variables Y t and Z t with snapshots index t = 1, 2, ..., L, Y t , and Z t are independent from snapshot to snapshot, respectively. As L approaches infinity, it holds true that where var(·) is variance, and Y t Z t , Y t , and Z t are sample averages of Y t Z t , Y t , and Z t with L snapshots, respectively.
Proof. See Appendix D.
Due to Lemma 1, from (19) and (22), it is easy to find that when the snapshots number L is large, the termm 2,y (g 1 , g 2 * ) has much larger variance than the other two terms in (22). This implies that the estimation error of [ĉ x ] i is mainly caused by the termm 2,y (g 1 , g 2 * ), giving rise to the following approximate expression of e i : Since the signals are independent from snapshot to snapshot, and E{y(t)} = 0 M 2 , we can obtain where In (25), the term cum{y g 1 (t), y * g 2 (t), y * h 1 (t), y * h 1 (t)}/L contains the eighth-order statistics of x(t), which significantly aggravates the inaccuracy of estimation if the number of snapshots is not large enough. In addition, as the number of snapshots number approaches infinity, we get E{e i e * j } ≈ 0. Therefore, we neglect this term to obtain a robust estimation of Q, given bŷ Rewrite (26) in matrix form Note that the estimatorQ in (30) need to be revised in certain cases. It is clear that, for the noise-free case, T /L. When K < M, the ranks of both matricesR y andT y are K 2 . As a result, the estimator (27) may be singular, which may lead to a singular matrixQ. This is unallowable for model (17). Therefore, when K < M, and SNR is high, to avoid above problems, Q is estimated bŷ where 1 is a small positive number. Specially, if the observed signals are circular, we have m 2,x (l 1 , l 2 ) = 0 and m 4,x (l 1 , l 2 , l 3 , l 4 ) = 0 for ∀l 1 , l 2 , l 3 , l 4 = 1, ..., M. Then, we obtain y(t) = x(t) ⊗ x(t), and c i can be estimated by Furthermore, it is obvious that m 2,y (g 1 , h 2 ) and m 2,y (g 2 * , h 1 * ) become zero in (25). Then, the estimator of Q in (27) Accordingly, when K < M and SNR is high, for circular sources, Q is estimated bŷ where 2 is a small positive number. Remark 3: Approximate expressions for the covariances of FOCs which are more precise than (25) are derived in [5]. However, it is not easy to use in our method in practice, because several higher-order (greater than four) moments need to be estimated, which may cause the estimation of Q to be not as robust as FOCs. For example, if the number of snapshots is large enough to generate precise estimates of FOCs, but not large enough to generate ones of higher-order (greater than four) moments, the DOA estimation results may suffer from a mass of outliers in independent experiments. In comparison, (26) contains at most fourth-order statistics, which means that it is more robust and has lower computational cost. We will show that our estimators work well for the proposed method in numerical simulations.
The posterior distribution of z can be obtained with µ = ΣΦ Hc x (38) As a single snapshot case of the OGSBI method, we can obtain the following updates of α from the deduction in [21]: where E{|z w | 2 } = |µ w | 2 + Σ ww , µ w is the wth entry of µ, and Σ ww is (w, w)th entry of Σ. For β, by maximizing E{logp(c x |z, β)p(β)}, we can obtain the updates where takes the real part of a complex variable and is the Hadamard product. An easier way to estimate β is provided in [21]. We use β, P, and q hereafter to denote their truncated versions for simplicity. By (41) and ∂ ∂β {β T Pβ − 2q T β} = 2(Pβ − q), we have β new =β if P is invertible andβ = P −1 q ∈ [−r/2, r/2] K . Otherwise, we update β elementwise. That is, at each step we update one β w by fixing up the other entries of β. For w = 1, ..., K, we first letβ = where β −k denotes β without the kth entry. Then, by constraining β k ∈ [−r/2, r/2], we have The OGSBI is terminated if α κ+1 − α κ 2 α κ 2 < τ or the maximum number of iterations is reached, where κ is the iteration and τ is the predefined tolerance parameter. Then, α can be treated as the pseudo-spectrum By finding the grid indices of highest K peaks of P, denoted byp k , k = 1, ..., K, we can obtain K estimated DOAs byθ k =θ w k +β w k .
Note that the parameter ρ in the method should be given in advance. According to associated information in [21] and our testing results, the OGSBI method is insensitive to ρ if ρ is not too large. Furthermore, we have observed that by setting ρ = 0.01, the sparse Bayesian learning methods in [13,21,28] always achieve a good performance in a wide range of scenarios. By contrast, the 4-L1 method [16][17][18] is sensitive to the parameter of allowable error bound, which is not easy to estimate (it was chosen to give the best results through trial-and-error for each scenario in the corresponding simulations in [16][17][18]). Therefore, compared with the 4-L1 method, the OGSBI method has the advantage of convenience in setting parameters.
Finally, we summarize our method in Algorithm 1.

Algorithm 1
The proposed sparse method using denoised FOCs vector 1) Input: x, S, and d.

Identifiability of the Number of Sources
According to the theorems about parameter identifiability in [29][30][31] and the model (8), we have the following proposition. Proposition 1. Any K sources can be uniquely identified from model (17) if and only if where spark{B θ } is defined as the smallest number of elements in B θ that are linearly dependent [32], B θ = {b(θ) : θ ∈ Θ} ⊂ C |D| , and Θ is [−π/2, π/2) for linear array and [−π, π) for planar array.
It is generally difficult to compute spark{B θ }, except for when D is a uniform linear array (ULA). In this circumstance, D is hole-free, spark{B θ } = |D| + 1, and then K < (|D| + 1)/2 sources can be identified. However, hole-free FODCA with O(M 4 ) DOFs has not been studied so far to the best of our knowledge. Recently, [16][17][18] provided several methods to construct FODCA with larger ULA segments, with which K < (|D ULA | + 1)/2 sources can be identified, where D ULA is defined as the largest ULA segment in D.
For general arrays, because B θ ⊂ C |D| , it is obvious that According to Property 3, (46), and Proposition 1, we can obtain a necessary condition regarding the identifiability of the number of sources. (17), a necessary condition to uniquely identify all sources is that the number of sources K fulfills

Theorem 1. Consider model
Proof. See Appendix E.
In addition, there are O(M 4 ) distinct elements in D ULA for certain sparse linear arrays (SLAs), such as four-level nested arrays [8] and arrays constructed from a expanding and shift scheme which consists of coprime arrays or nested arrays [18]. Therefore, our proposed method can identify O(M 4 ) sources at most, if these arrays (not limited to) are used.

Complexity
Consider the worst case of complexity. For non-circular sources, the estimation of c x using (19) needs O(LM 4 ) multiplications. The estimation of Q using (27) or (28) needs O(LM 8 ) multiplications. Note that all the symmetries of c x and Q are not taken into account here, while it does not affect the order of complexity. Each iteration of (38) and (39) needs O(I 2 |D|) multiplications, which are much more than each iteration of (40) and (42). Let T denote the iteration number, and consider the worst case that O(|D|) = O(M 4 ). Combing all of the above processes, we can obtain that the complexity of the proposed method is O(LM 8 + I 2 M 4 T).

Comparison with State-of-the-Art Methods
In this subsection, we compare the proposed method with state-of-the-art FOCs-based ones, in terms of required geometry, identifiability, prior knowledge of the number of sources, ability to handle correlated sources, numerical complexity, and sensitivity to parameters. Associated methods are listed as follows: • 4-MUSIC [5,6]: non-redundant version with averaging strategy [7], denoted by A-4-MUSIC; • 4-SS-MUSIC [8]: with averaging strategy, denoted by A-4-SS-MUSIC; • 4-L1 [16][17][18].
Associated comparison results are illustrated in Table 1.
In Table 1 Next, we make a comparison of complexity between the proposed method and 4-L1. Firstly, if the number of snapshots L is large enough that the terms O(I 3 T 2 ) and O(I 2 M 4 T 1 ) can be negligible, the proposed method has higher complexity than 4-L1 due to the term O(LM 8 ) which is caused by the estimation of Q. Secondly, if L is small, we limit the comparison between O(I 3 T 2 ) and O( , implying that the proposed method has lower complexity than 4-L1 in one iteration. However, it is a fact that 4-L1 always has less iterations than the proposed method (i.e., T 1 > T 2 ). Thus, it is not easy to draw a conclusion. In fact, to achieve the super resolution, 4-L1 needs a dense grid, while the proposed method only needs a coarser grid due to the off-grid parameter estimation, which gives our method an advantage for complexity comparison. It should be noted that off-grid parameter estimation technique has been successfully applied to L1-norm minimization methods in [33,34]. However, to the best of our knowledge, the off-grid version of the 4-L1 method has not been published, and it is beyond the scope of this paper.

Numerical Simulations
Numerical simulations were carried out to demonstrate the superior performance of the proposed method. In the following simulations, the sampled analytic signals of statistically independent quaternary phase-shift keying (QPSK) signals were used as circular sources. The RMSE was chosen as the performance criterion, and is computed by where F is the number of independent experiments, andθ k, f is the estimate of θ k in the f th independent experiment. The RMSE performances of the four methods in Table 1 with respect to number of snapshots, grid, adjacent sources, SNR, modelling errors, geometry, and circularity of sources were further investigated. In the proposed method, we set that ρ = 0.01 and τ = 10 −4 for all scenarios, and set 1 = 10 −5 or 2 = 10 −5 for the case that K < M and SNR ≥ 8 dB. 4-L1 was solved by the SeDuMi toolbox [35], and the associated allowable error bound was chosen to give the best results through trial-and-error for each scenario as done in [16][17][18]. Each simulation executed F = 200 independent experiments for all methods.

Identifiability
Consider an SLA with three sensors located at S = {1, 2, 6} × d. It can be deduced that the associated FODCA has 19 distinct elements. That is, |D| = 19, which is equal to sup{H} when M = 3. According to Theorem 1, any greater than nine sources cannot be uniquely identified for the proposed method. Let us consider nine sources with The number of snapshots and SNR were set as L = 10 4 and SNR = 20 dB, respectively. Grid resolution was set to r = 0.5 • . Figure 1a,b shows the normalized spectra of the proposed method and 4-L1, respectively. As can be seen, all DOAs of the sources were identified successfully. Note that this was unachievable for A-4-MUSIC and A-4-SS-MUSIC. The virtual array of A-4-MUSIC contained seven distinct virtual sensors, which could identify six sources at most. The largest ULA segment of the FOCDA contained 13 distinct elements, giving rise to a maximum identifiable number of six for A-4-SS-MUSIC.

Impact of the Grid Resolution
Consider an SLA {1, 2, 4, 11, 25} × d, which is constructed from an expanding and shift scheme that consists of two nested arrays {1, 2, 4} × d and {1, 2, 4} × d [18]. It can be obtained that the number of distinct elements of the corresponding FODCA and associated largest ULA segment is |D| = 85 and |D| ULA = 55, respectively. Consider two independent sources with DOAs {0 • , 20 • } + ζ 1 , where ζ 1 is a random variable chosen uniformly within [−1 • , 1 • ]. SNR was set at SNR = 3 dB. Figure 2a,b shows the RMSE performance of the four methods with respect to number of snapshots, with grid resolution r = 0.5 • and r = 0.1 • , respectively. In Figure 2a, when snapshots number was large, the RMSEs of A-4-MUSIC, A-4-SS-MUSIC, and 4-L1 did not reduce as snapshots increased, mainly because of the mismatch errors between grid and true DOAs. Due to the off-grid parameters estimation, the proposed method outperformed the other three methods for most of simulated snapshot numbers. In Figure 2b, the grid resolution is r = 0.1 • , and it is clear that the RMSEs of A-4-MUSIC, A-4-SS-MUSIC, and 4-L1 increased to the same level of the proposed method when snapshots were greater than 50 and less than 2000. Nonetheless, the proposed method still outperformed the other three methods when snapshots were more than or equal to 2000. It can also be observed that the proposed method had smaller RMSE than A-4-MUSIC and 4-L1 when snapshots were less than 100. The RMSEs of A-4-SS-MUSIC were slightly larger than the other three methods when snapshots were greater than 50 and less than 5000, which may be caused by the fact that 4-SS-MUSIC does not utilize all the virtual sensors in associated FODCA. In addition, the dashed line in Figure 2b gives the RMSEs of the proposed method for r = 0.5 • , and it is clear that the grid resolution seldom affected the RMSEs of the proposed method when snapshots were less than 2000, which implies that when snapshots are not too large, the proposed method can reduce the computational complexity by selecting a relatively coarser grid without heavy loss of RMSE performance.

Identifiability for Adjacent Sources
Next, we investigated the identifiability of adjacent sources using the proposed method. The array and parameters were set as in Section 4.2, except the grid resolution was set to r = 0.1 • , and the DOAs of sources were set to {ζ 2 , ζ 2 + θ}, where ζ 2 is a random variable chosen uniformly within [−1 • , 1 • ], and θ = 0.5 • , 1 • , ..., 5 • are the different DOA intervals. Figure 3a,b shows the RMSE performance of four methods with different DOA intervals, when the number of snapshots is 300 and SNR is 3 dB, and when the number of snapshots is 1000 and SNR is 10 dB, respectively. It is clear that the proposed method had smallest RMSEs for most simulated DOA intervals in both simulations, demonstrating the superior identifiability of adjacent sources with the proposed method.

Impact of SNR
The array and parameters were set as in Section 4.2, except the grid resolution was set to r = 0.1 • . Figure 4 plots the RMSEs of four methods as a function of SNR, where the number of snapshots was set to L = 500. It is clear that the four methods had almost equal RMSEs when SNR was smaller than 4 dB. In the high SNR region, a relatively better performance of the proposed method appeared, benefiting from the off-grid parameter estimation as discussed in the large snapshot number region in Section 4.2.

Impact of Modelling Errors
We also studied the effect of modelling errors on the proposed method. We used the same array geometry and parameters, and the same way of generating K = 2 DOAs as in Section 4.4. We perturbed the steering vector by adding a modelling error vector e a(θ k ) to get the perturbed steering vector asã(θ k ) = a(θ k ) + e a(θ k ) , where k = 1, ..., K, and e a(θ k ) are assumed to be zero mean statistically independent Gaussian distributed with E{e a(θ k ) e H a(θ k ) = σ 2 a(θ k ) I, where σ 2 a(θ k ) = 0.0025. Figure 5 shows the RMSE performance of four methods with such modelling errors. In order to facilitate comparison, the RMSEs of the proposed methods with no modelling errors are also plotted in a dashed line. It can be observed that the RMSE performance of the four methods did not degrade heavily. However, when snapshots were more than 1000, the superior performance of the proposed method disappeared, suggesting that modelling errors did have a large impact on the off-grid parameters estimation of the proposed method.

Underdetermined Case
We investigated the performance of the proposed method in underdetermined cases. The same array {1, 2, 4, 11, 25} × d was used, and the grid resolution r = 0.1 • , while six independent sources with DOAs {−45 • , −30 • , −15 • , 0 • , 25 • , 40 • } were set. Figure 6 shows the RMSE performance of the four methods. In Figure 6a, the number of snapshots varying from 100 to 10,000 and the SNR was fixed at 20 dB. It is clear that the proposed method had the smallest RMSEs for most of the simulated snapshot numbers. In Figure 6b, SNR varied from −4 dB to 28 dB and the snapshots number was fixed at 1000. It can be observed that the proposed method had smaller RMSEs when SNR > 4 dB, but had slightly larger values when SNR ≤ 4 dB, when compared with the other three methods. The RMSE performance of A-4-SS-MUSIC was obviously outperformed by the other three methods in the large snapshot number and high SNR regions, because 4-SS-MUSIC does not utilize all the virtual sensors in the associated FODCA. Moreover, the RMSEs of each method seemed to converge to a positive constant as SNR increased. A similar phenomenon can be observed and explained in covariance vector-based methods in the underdetermined case [22,26].

Non-Linear Array Case
We now give an example to show the effectiveness of the proposed method in the non-linear array case. We randomly generated a planar array as shown in Figure 7a, where the minimum distance between sensors was half of the wavelength. The corresponding FODCA geometry is also illustrated, which had |D| = 131 distinct elements. Note that the A-4-SS-MUSIC cannot be applied for the non-linear array case. We set three sources with DOAs {−60 • , 0 • , 40 • } + ζ 3 , where ζ 3 is a random variable chosen uniformly within [−1 • , 1 • ]. SNR was set at SNR = 6 dB, and the RMSEs of A-4-MUSIC, 4-L1, and the proposed method were plotted as a function of the number of snapshots (Figure 7b). It can also be seen that the proposed method had better RMSE performance when snapshots were greater than 200.

Effectiveness for Non-Circular Sources
Finally, we consider the case of non-circular sources, using the same array {1, 2, 4, 11, 25} × d, and six independent sources with DOAs {−45 • , −30 • , −15 • , 0 • , 25 • , 40 • } as in Section 4.6. In contrast, we replaced the imaginary parts of sources by their real parts. That is, s NC (t) = R{s(t)} + jR{s(t)}, where s NC (t) is the non-circular sources to be simulated. We set SNR = 20 dB and r = 0.1 • . We estimated the FOCs by (19) for all the methods, and estimatedQ by (27) for the proposed method. Figure 8 shows the RMSEs of the four methods versus the number of snapshots. Similar to Figure 6a, it is clear that the proposed method had the smallest RMSEs for all simulated snapshot numbers. Beyond doubt, all the methods were effective for non-circular sources with the estimation of FOCs by (19).

Conclusion
In this paper, a novel SMV sparse model for non-Gaussian sources using denoised FOCs vector is established to perform DOA estimation, and solved efficiently by the OGSBI method. An advanced denoising and dimension reduction procedure of FOCs vector is provided. The estimation errors of FOCs are integrated in the proposed SMV model, and are approximately estimated in a simple way. The proposed method suits any geometry, does not need prior knowledge of the number of sources, has maximum identifiability O(M 4 ), and is insensitive to associated parameters. The off-grid parameter estimation in our method promotes superior performance when the number of snapshots is large and SNR is high, and also ensures good performance when choosing a relatively coarse grid. Numerical simulations illustrate that the proposed method has superior identifiability or RMSE performance in different scenarios, namely grid resolution, adjacent sources, SNR, modelling errors, geometry, and non-circular sources, when compared with other state-of-the-art methods.

Conflicts of Interest:
The authors declare no conflict of interest. The founding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.