Low-Complexity Joint Channel Estimation for Multi-User mmWave Massive MIMO Systems

: For multi-user millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems, the precise acquisition of channel state information (CSI) is a huge challenge. With the increase of the number of antennas at the base station (BS), the traditional channel estimation techniques encounter the problems of pilot training overhead and computational complexity increasing dramatically. In this paper, we develop a step-length optimization-based joint iterative scheme for multi-user mmWave massive MIMO systems to improve channel estimation performance. The proposed estimation algorithm provides the BS with full knowledge of all channel parameters involved in up- and down-links. Compared with existing algorithms, the proposed algorithm has higher channel estimation accuracy with low complexity. Moreover, the proposed scheme performs well even with a small number of training sequences and a large number of users. Simulation results are shown to demonstrate the performance of the proposed channel estimation algorithm.


Introduction
Millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) has been generally recognized as a strong potential key technique for enhancing quality in the future wireless communication field [1]. The demand for higher speed data transmission, shorter delays, better user experience, and denser networks is increasing with the rapid development of wireless services such as virtual reality, multimedia, and the Internet of Things [2]. The mmWave band has huge unexploited spectrum resources, which can overcome the spectrum congestion of standard wireless frequency bands and achieve an orders of magnitude increase in spectral efficiency to meet these requirements [3,4]. The work in [5] expounded that the mmWave band can greatly speed up the data transmission rate. Additionally, massive MIMO refers to adding a number of antennas at the base station (BS), which can overcome the effects of fading, average out the dominant path-loss, and make up for the short wavelength of mmWave [6][7][8], and the isolation between radiating elements is a very important issue [9,10]. The combination of mmWave and massive MIMO can be used to meet increasing demand in data traffic [11,12].
Many studies have been devoted to the research of mmWave massive MIMO multi-user systems in recent years. Compared with conventional point-to-point MIMO, multi-user MIMO with simplified resource allocation has greater advantages [13,14]. The work [15] developed blind multi-user detection based on the model of [16], which can approach the lower bound of performance under certain conditions. The work in [17] summarized the benefits, challenges, and potential solutions of mmWave massive MIMO in cellular networks.
In order to achieve the theoretical performance gain and potential benefits of mmWave massive MIMO systems, accurate channel state information (CSI) is essential, which is also regarded as one of the main difficulties in mmWave communication [18,19]. However, due to the difference in channel characteristics compared to traditional sub-6 GHz, the smaller number of RF chains, and overwhelming overhead in downlink training and uplink channel feedback, accurate acquisition of CSI is challenging [20][21][22]. Various novel channel estimation approaches have been recently proposed for mmWave massive MIMO [23][24][25][26][27]. In particular, under the assumption of the BS antennas being highly correlated, the work in [23] proposed an antenna grouping method to reduce feedback overhead. By utilizing the correlation among users, the work in [24] provided a joint CSI acquisition proposal based on a low-rank matrix. To reduce errors caused by discretization and solve the line spectral estimation problem, the work in [25] designed an iterative reweighted approach for joint dictionary parameter training and sparse signal recovery, but it had the limitation that only two parameters could be estimated. The work in [26] extended the suggestion in [25] to two dimensions and represented a super-resolution channel estimation scheme based on iterative reweighting in a point-to-point scenario. It is regrettable that such a solution in [26] had the problem of slow estimation. Considering the problem of channel estimation for mmWave MIMO systems under a transmitter impairment model, a new algorithm based on Bayesian compressive sensing (BCS) was introduced in [27]. However, the assumption of BCS was relatively harsh, and the accuracy of estimation was limited.
To obtain CSI more accurately and faster in multi-user mmWave massive MIMO, we develop a step-length optimization-based joint iterative channel estimation scheme in this paper. Our main contributions are outlined as follows: 1. For multi-user mmWave massive MIMO systems, we propose a novel joint estimation scheme, which can accurately estimate the azimuth angles of departure or arrival (AoDs/AoAs) and joint gain at the BS. Specifically, the proposed algorithm does not need to estimate the channel on the user side, and all channel parameters can be estimated at the BS, which is different from the traditional channel estimation scheme.
2. We combine the advantages of the Newton method and gradient descent method and derive an iterative step-length optimization approach to reduce computational complexity. It is confirmed that the proposed scheme has low complexity without losing the accuracy of channel estimation.
3. Compared with the traditional singular-value decomposition (SVD) algorithm and BCS algorithm under different situations, the simulation results show that the proposed algorithm achieves better channel estimation performance with fewer training data blocks.
The rest of this paper is arranged as follows. Section 2 introduces the considered multi-user mmWave massive MIMO model. Section 3 formulates the proposed joint iterative channel estimation algorithm, which accounts for the preprocessing, the choice of descent direction in the iteration, the optimization of the step-length, and the method for separating the downlink channel. Some simulation results are provided to illustrate the performance of estimation in Section 4, and Section 5 concludes this paper.
Notation: In this paper, vectors and matrices are denoted by boldface lowercase and uppercase symbols, respectively; A T , A H , A −1 , and A † correspond to the transpose, conjugate transpose, inverse, and Moore-Penrose pseudo-inverse of the matrix A; vec (A) is the vector stacked by the columns of the matrix A; ⊗ is the Kronecker product; diag(x) denotes the diagonal matrix with the vector x on its diagonal; rank(A) is the rank of A; A F , h 0 , and h 2 represent the Frobenius norm of A, the 0 -norm, and Euclidean norm of the vector h, respectively; I N is the N × N identity matrix; the sets of real and complex numbers are represented by R and C, respectively.

System Model
We considered a common mmWave massive MIMO system with hybrid precoding as illustrated in Figure 1, where the number of antennas at the BS is N BS ; but, only N RF RF chains with N BS > N RF to support K user equipment (UE), and each UE has a single receive antenna [18,28]  Let m T = 1, · · · , M T and m R = 1, · · · , M R be the time slot of the transmitter and receiver, respectively. The BS pilots S m T ,m R ∈ C N s ×L represent the signal transmitted in the (m T , m R ) th time slot, where L is the number of different pilot sequences, and its design meets the orthogonality condition S m T ,m R · S H m T ,m R = I N S ×N S [29,30]. Then, the signal y k ∈ C 1×L received by the k th user can be given by: where F T,m T = [F RF ] T,m T [F BB ] T,m T ∈ C N BS ×N S is the transmitted hybrid precoding matrix in the m T th time slot, F BB ∈ C N RF ×N S is the baseband beamformer followed by the RF beamformer F RF ∈ C N BS ×N RF , n MS m T ,m R ∈ C 1×L is the received independent and identically distributed additive white Gaussian noise elements having zero mean and the variance σ 2 k , and h k D ∈ C 1×N BS is the downlink channel vector, and by defining θ k d sin δ T,k /λ as the normalized spatial angle [31], it can be expressed as: where α k , d, and λ denote the propagation gain of the k th downlink path and the antenna spacing at the BS and carrier wavelength, respectively. d = 1 2 λ is widely adopted here [32]. δ T,k ∈ [0, 2π] is the AoD of the k th path. a (θ k ) ∈ C N BS ×1 is the steering vector at the transmitter of the k th path. In this work, we considered the typical uniform linear arrays (ULAs) [33]; a (θ k ) is given by: ( Denote φ k d sin δ R,k /λ where δ R,k is the AoA of the k th path. φ k is the normalized spatial angle the same as θ k . The uplink channel vector h k U ∈ C N BS ×1 can be expressed as: where β k is the propagation gain of the k th uplink path and b (φ k ) ∈ C N BS ×1 is the steering vector at the receiver of the k th path. For typical ULAs, b (φ k ) is given by: To reduce the overhead of downlink channel training and uplink channel feedback, we propose a joint estimation scheme, that is the user does not process the received signal and directly feeds it back to the BS [24]. For all K users, the comprehensive signal X BS m T ,m R ∈ C N S ×L received by the BS is: and N BS m T ,m R ∈ C N S ×L is the noise received by the BS after combining. By defining r k = α k β k and synthesizing all K users, we can further obtain: We further process the signal X BS m T ,m R in (7) to make it multiply right by S H m T ,m R ; we have: Next, we start to collect and arrange the signal X m T ,m R in (10). By collecting the pilots in the M T time slots of the transmitter, we can get: where By dealing with the signal X m R in the M R time slots of the receiver, we have after arranging: where W R ∈ C N S M R ×N BS and N ∈ C N S M R ×N S M T are expressed as: Now, the signal received by the BS in (10) is written in a more compact form as: where X ∈ C N S M R ×N S M T ; the combined channel matrix H ∈ C N BS ×N BS can be written as:

Joint Iterative Channel Estimation Scheme
In this section, we firstly transform the above channel estimation problem into a new algorithm form. Then, we adopt the SVD preconditioning proposal to reduce the computational complexity. To improve algorithm performance, we give an angle estimation optimization scheme by optimizing the step-length and using a combination of the gradient descent algorithm and Newton method. Finally, we propose a scheme for successfully separating the downlink channel from the combined channel matrix.

Algorithm Formulation
The estimation of the combined channel matrix H in (15) is equivalent to the estimation of combined path gain r and the two normalized spatial angles θ θ θ and φ φ φ for all paths. Due to the sparseness of the combined channel matrix H in the angle domain [34], such a problem can be easily expressed as: min where r 0 is the number of non-zero components of r, which also represents the number of estimated paths K, and γ is the error tolerance parameter related to noise. Considering the low computational efficiency of finding the optimal solution in (16), we propose to replace the 0 -norm with the use of the logarithmic sum function [25,35]: where r k is the k th element of the gain r and ξ > 0 is a positive parameter to ensure that the definition of the log-sum function is valid. By adding a data fitting parameter µ > 0, the difficulty in (17) can be formulated as an unconstrained optimization problem, which yields the following optimization: Due to the need to ensure the monotonically decreasing characteristic, we introduce a suitable iterative function instead of the log-sum function [26,36]: where t is the number of iterations and G (t) is a diagonal matrix, which is defined as: where is the path gain estimate of the k th user at the t th iteration. The minimization of the log-sum function L (r, θ θ θ, φ φ φ) in (18) is equivalent to the minimization of the surrogate function (19), which is proven as follows. To get better estimatesr (t+1) ,θθθ (t+1) ,φ φ φ (t+1) in the t th iteration and ensure the convergence of the function, the following inequality can be shown: Then, we can get: Since the maximum value is reached atr (t+1) =r (t) , we have: Combining (21) and (22), we can finally get: which explains the correctness of the above assumption. By optimizing the path gain r in (19) through the derivative method [26], we can obtain the optimal point of r and the corresponding optimal value of Z (t) , that is, where x i is the column vector of the signal X received by the BS, and f T is the column vector of the hybrid precoding matrix F T . After that, we only need to consider the optimization of the normalized spatial angles θ θ θ and φ φ φ in (26), which will be discussed in the next subsection.

Initial Angle Preprocessing
To optimize the spatial angle, we propose a preconditioning algorithm based on SVD [37,38] to select the initial value of the spatial angle effectively. Compared with using all N MS N BS angle domain grids as the initial candidate value, the proposed algorithm can rapidly find the angle domain grids closest to the true orientation angle, which can significantly reduce the computational complexity of the subsequent algorithm. Specifically, we apply SVD to the matrix X in (14) and get X = UΣV H , where U and V are unitary matrices that satisfy U H U = I N S M R ×N S M R and V H V = I N S M T ×N S M T , respectively, Σ = diag κ 1 , κ 2 , · · · , κ min(N S M R ,N S M T ) ∈ R N S M R ×N S M T is a diagonal matrix whose value on the diagonal is the singular value of X, and κ 1 κ 2 · · · κ min(N S M R ,N S M T ) 0. The signal is expressed in a more compact form as: Suppose l = 1, 2, · · · , K the larger the K singular value and their homologous singular vectors are roughly determined by K paths, we can infer: where u l and v l are the l th column of U and V, respectively; the singular value can be written as: Then, the spatial angles are normalized by (θ l , φ l ) ∈ (i−1)

N BS
, i = 1, 2, · · · , N BS and assumed to lie in the quantized points; we have: where E R ∈ C N BS ×N BS and E T ∈ C N BS ×N BS are the discrete Fourier transform matrices; we can further obtain the coarse estimates of angle, which can be formulated as: where: At this point, we get the initial candidates for the spatial angle. Furthermore, due to the uncertainty of the number of paths of the estimated channel at the beginning, we set L init K, l = 1, 2, · · · , L init to ensure the accuracy of our estimated path, and some incorrect paths will be cut off in the proposed algorithm. The iterative search scheme of angles is explained in the next subsection.

Step-Length Optimization
In the previous section, it can be seen that the objective function Z (t) (r, θ θ θ, φ φ φ) in (19) is the weighted sum of two parts, where µ is the regularization parameter [39] that affects the tradeoff to some extent. In the proposed algorithm, µ is simplified as: where e is a constant scale factor, and the squared residual at the t th iteration ω (t) is expressed as: The main task is to search for new estimates θ θ θ (t+1) and φ φ φ (t+1) from the neighborhood of θ θ θ (t) and φ φ φ (t) at the t th iteration, so that the objective function Z (t) becomes smaller and eventually stabilizes.
Considering the characteristics of the fast convergence speed of the Newton descent method and the high accuracy of the steepest descent method, we try to suggest combining the two for better performance; this searching can be accomplished by: where ρ (t) is the step-length at the t th iteration and d (t) is the corresponding descent direction.
To balance the high computational complexity of the Newton iterative algorithm, we consider using the Newton direction as the descending direction to achieve fast convergence in the first iteration, After that, we choose the gradient direction to achieve high accuracy and set it can be seen that the search direction involves the gradient calculation of the iterative function Z pe . Taking the normalized angle φ k as an example, the calculation method is explained as follows.
then Z pe can be expressed as: Take the partial derivative with respect to φ k , we can obtain: and ∂D −1 where: Since there is only a certain transposition relationship between the gradient and the partial derivative, we have the gradient of Z pe , and we can further obtain the second derivative: where: Next, we think about whether we can optimize the iteration step-length in (35) to improve the speed of iteration. Considering that if the optimal step-length of θ θ θ and φ φ φ is directly solved, the algorithm complexity is very high, and the channel is only related to the path gain r and the spatial angles θ θ θ and φ φ φ, so when the gain has been optimized, we choose to use the optimal step-length of the channel to approximate the optimal step-length of the angles. The optimization method is as follows.
Denote x = vec (X), Ω = F T T ⊗ W R , and given the spatial angles θ θ θ (t) and φ φ φ (t) at the t th iteration, we have: Then, the cost function transformed from (34) is: Therefore, such an optimization can be reformulated as: The key to this problem is how to get the suitable (t) , and take the derivative with respect to λ ρ (t) ; we can get: where: Similarly, we have: (52) can be simplified as: Now, we can obtain: Considering the complexity of the algorithm, the descent direction here is the gradient direction. Set ∂λ(ρ (t) ) ∂ρ (t) = 0; we can finally get: From this, we get the approximate optimal step-length for the angle update. During the iterative searching, the spatial angle and path gain estimates can converge quickly, until the latest estimates are almost the same as the previous ones. The proposed channel estimation scheme is shown in Algorithm 1. We can achieve the purpose of accurate and fast channel estimation, which will be verified in subsequent simulations.
Since the Newton direction is used only once, the computational complexity of each iteration mainly lies in the calculation of the gradient direction and the optimization of the step-length. The complexity to compute the gradient direction is O(N 2 s M R M T (N BS + K) K 2 ), and the complexity of the step calculation is O(K 2 N BS ). As a result, we can conclude that the proposed algorithm has the complexity O(N 2 s M R M T (N BS + K) K 2 + K 2 N BS ). In the above algorithm, we successfully estimated the space angle θ θ θ, φ φ φ and the combined gain r, where r k = α k β k . Next, the solution to the problem of how to separate the downlink gain α k from the combined gain is revealed.
Specifically, we adopt the unit symbol pilot s k = 1 sent by the UEs to the BS via the uplink channel, and the signal Y ∈ C N S M R ×1 received by the BS is written as: where β β β = [β 1 , β 2 , · · · , β k ] T , we obtain uplink gain β k , and the downlink gain α k can be separated by utilizing α k = r k β k . Since W R can be easily designed as a matrix whose columns are full rank, the above conditions can be achieved.

Algorithm 1
The optimized iterative algorithm. Input: Received signals with noise X, orthogonal pilot signals S m T ,m R , coding matrix W R and F T , the number of detection paths L init , threshold Γ and ε. Output: Estimated angles and gains of all paths.

Simulation Results
In this section, we demonstrate the simulation comparisons under different parameter scenarios, which can prove the effectiveness of the proposed step-length optimization-based joint iterative scheme. In particular, simulation results are provided to verify that the proposed algorithm can outperform the previous methods in certain aspects, as in [26,27]. Some default parameters are set to d = 0.5, N S = 4, and L = 24. Moreover, the number of antennas, UEs, and time slots is set to N BS = 64, K = 3, and M R = M T = 12, respectively, which may change in the form of variables in subsequent simulations, and we set the variable M P to represent the time slot for better presentation. As in [2,40], the signal-to-noise ratio (SNR) is defined as σ 2 s /σ 2 n , where σ 2 s is the signal power. I denotes the number of Monte Carlo runs, and the normalized mean squared error (NMSE) is defined as: (58) Figure 2 shows the convergence performance of the optimized scheme compared with the conventional super-resolution (SR) algorithm, which is reflected by the relationship between NMSE and the number of iterations for three SNR values. It is obvious that the proposed scheme can achieve convergence with a small number of iterations for each SNR value. For instance, the proposed algorithm completes convergence in about 20 iterations at an SNR of 20 dB, which also fully proves the acceleration brought by optimization.  Figure 3 compares the NMSE performance against the SNR. We assume the system design parameters N BS = 32 and M R = M T = M P , as shown in Figure 3, and the accuracy of the proposed solution is similar to that of the SR algorithm in different time slots, which proves that the proposed optimization does not sacrifice performance indicators. We can see that there is a certain proportional relationship between the time slot and the training pilot, so the NMSE performance becomes better as the number of time slots increases. In addition, the NMSE of the proposed and SR schemes is also shown in Table 1 for further comparison. It is clearly observed that accurate channel estimation can be achieved.   Figure 4 depicts the iteration time of the proposed algorithm on the basis of the hypothetical scenario in Figure 3, which can more intuitively reflect that the speed of the proposed scheme is significantly improved and the computational complexity is reduced. For example, when the SNR is 20 and the time slot is 10, the time taken by the proposed algorithm is 2.0534 s and the SR is 5.0885 s; it can be fully seen that our method achieves fast estimation. We can also see that as the SNR and time slot increase, the time consumed increase, and that is because the accuracy is also improved.   Since it optimizes all the spatialization angles simultaneously and eliminates some false paths, the proposed scheme can obtain better NMSE performance, and the processing time of the BCS method is significantly higher than the proposed scheme under the same conditions. In addition, we can see that when the number of users increases, the channel estimation accuracy of the proposed algorithm decreases; but the decrease is small, and the proposed algorithm can still effectively estimate the channel.
As SNR increases, we observe that the RMSE of SVD remains constant, and its computational complexity is lower; but our algorithm yields the best performance. This is because the SNR is related to the entire angle domain grids, and our SVD method finds the best matching grid to get coarse estimates. Similarly to Figure 5, the RMSE performance of channel estimation will decrease with the increase of users, and in contrast, the number of antennas is directly proportional to the estimated accuracy.

Conclusions
In this paper, we proposed a joint iterative channel estimation scheme based on step-length optimization for multi-user mmWave massive MIMO systems. The proposed scheme first combined the advantages of Newton method and gradient descent method, and then gave a step-length optimization approach. Compared with the conventional SR algorithm, the proposed method greatly reduced the computational complexity without losing estimation accuracy, and the proposed algorithm yielded smaller channel estimation error compared to existing BCS and SVD methods. In addition, the proposed method could effectively estimate the channel even with a certain number of UEs. The perspective of this work considered an extension to the mmWave time-varying channel for joint channel parameters estimation, which included AoDs/AoAs, path gains, and Doppler shifts.