Critical Temperature Prediction of Superconductors Based on Atomic Vectors and Deep Learning

Abstract: In this paper, a hybrid neural network (HNN) that combines a convolutional neural network (CNN) and a long short-term memory neural network (LSTM) is proposed to extract the high-level characteristics of materials for critical temperature (Tc) prediction of superconductors. Firstly, by obtaining 73,452 inorganic compounds from the Materials Project (MP) database and building an atomic environment matrix, we obtained a vector representation (atomic vector) of 87 atoms by singular value decomposition (SVD) of the atomic environment matrix. Then, the obtained atomic vectors were used to encode each superconductor, following the order of the atoms in its chemical formula. The HNN model, trained with 12,413 superconductors, was compared with three benchmark neural network algorithms and with multiple machine learning algorithms using two commonly used material characterization methods. The experimental results show that the HNN method proposed in this paper can effectively extract the characteristic relationships between the atoms of superconductors, and that it predicts the Tc with high accuracy.


Introduction
Since its discovery more than a century ago, superconductivity has remained a focus of research [1]. Superconductivity [2] is an intrinsic quantum phenomenon caused by a limited attraction between paired electrons. It exhibits unique properties such as zero direct current (DC) resistivity [2][3][4], as well as the Meissner and Josephson effects [5][6][7], and its potential applications are increasing. There is even a deep connection between the superconducting state and the Higgs mechanism in particle physics [8]. Superconductors can be roughly classified into cuprate-based, iron-based, and other exotic superconductors, and a large amount of research focuses on cuprates and iron-based compounds. Since the discovery of iron-based superconductors in 2008, iron-based superconductors with various crystal structures have been found. Their common feature is that they all contain FeAs4/FeSe4 tetrahedral layers, which experimental and theoretical studies have found to play a crucial role in superconductivity [9,10]. High-temperature superconductivity in copper oxides, first discovered in 1986 [11], led researchers on a wide-ranging quest to understand and use this new state of matter. However, many problems in superconductivity research remain unresolved. For example, the transition temperatures achieved so far are still far from what practical applications require. [...]
In the proposed HNN model, the CNN extracts the short-dependence feature relationships between atoms, and the LSTM extracts the long-dependence feature relationships between atoms. The contributions of this paper can be summarized as follows:
(1) Extensive computational tests over three standard benchmark datasets demonstrate the advanced performance of our proposed HNN model.
(2) The atomic vector characterization method used to represent superconductors offers an alternative to Magpie, one-hot, and other methods, provides a better characterization of superconductors, and can also be used to characterize other materials.
The structure of the article is as follows: first, we briefly introduce the generation process of the atomic vector; then, we introduce the source of the superconductor dataset used in this article, the method of characterizing the data with atomic vectors, and the model structure. Finally, we compare the HNN model with the CNN, LSTM, and fully connected neural network (FNN) models, as well as with multiple machine learning methods that use the traditional one-hot and Magpie material characterization methods.

Atomic Vector Generation Methods
The atomic vector (Atom2Vec) was first proposed by Quan et al. [38] of Stanford University. Below, we briefly describe the workflow of Atom2Vec. As shown in Figure 1, to capture the relationship between an atom and its environment, the first step is to generate atom-environment pairs for each compound in the material dataset. Before that, a clearer definition of the environment is needed. Atoms can be conveniently represented by their chemical symbols. The environment includes two aspects: the number of target atoms in the compound, and the remaining atoms in the compound together with their counts. For example, consider the compound Bi2Se3 from the miniature dataset of seven samples given in Figure 1 (Bi2Sb3, Bi2Se3, Bi2Te3, Sb2Se3, Sb2Te3, Bi2O3, and Sb2S3). Two atom-environment pairs are generated from Bi2Se3. For the atom Bi, the environment is expressed as "(2) Se3"; for the atom Se, the environment is expressed as "(3) Bi2". Specifically, for the first pair, the "(2)" in "(2) Se3" indicates the presence of two target atoms (here, Bi), and "Se3" indicates the presence of three Se atoms in the environment.
We obtained 73,452 binary, ternary, and quaternary inorganic compounds from the Materials Project database [40] and then generated a sparse atomic environment matrix. SVD was applied to this matrix to compress its columns: the singular vectors corresponding to the 20 largest singular values were retained, so that each element was described by a 20-dimensional vector. Elements with similar properties have similar row vectors in the atomic environment matrix and therefore obtain similar atomic vectors.
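The pair-generation step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Atom2Vec implementation: the function names are hypothetical, the parser assumes simple formulas with integer subscripts, and filling the matrix with co-occurrence counts is an assumption, since the text does not state how the entries are defined.

```python
import re
from collections import defaultdict


def parse_formula(formula):
    """Split a simple formula such as 'Bi2Se3' into [('Bi', 2), ('Se', 3)]."""
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    return [(symbol, int(count) if count else 1) for symbol, count in tokens]


def atom_environment_pairs(formula):
    """Return one (atom, environment) pair per element of the compound."""
    parts = parse_formula(formula)
    pairs = []
    for i, (symbol, count) in enumerate(parts):
        # The environment is the target-atom count plus the rest of the compound.
        rest = "".join(f"{s}{c}" for j, (s, c) in enumerate(parts) if j != i)
        pairs.append((symbol, f"({count}) {rest}"))
    return pairs


def build_environment_matrix(formulas):
    """Count how often each atom occurs in each environment (assumed entries)."""
    counts = defaultdict(int)
    for formula in formulas:
        for atom, env in atom_environment_pairs(formula):
            counts[(atom, env)] += 1
    atoms = sorted({atom for atom, _ in counts})
    envs = sorted({env for _, env in counts})
    matrix = [[counts.get((a, e), 0) for e in envs] for a in atoms]
    return atoms, envs, matrix


# The seven-compound miniature dataset mentioned in the text.
dataset = ["Bi2Sb3", "Bi2Se3", "Bi2Te3", "Sb2Se3", "Sb2Te3", "Bi2O3", "Sb2S3"]
atoms, envs, matrix = build_environment_matrix(dataset)
print(atom_environment_pairs("Bi2Se3"))
```

In this toy matrix, the rows for Bi and Sb share the environments "(2) Se3" and "(2) Te3", which is the kind of row similarity the text relies on. The resulting matrix could then be passed to an SVD routine such as `numpy.linalg.svd` to obtain low-dimensional atomic vectors.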

Dataset Selection and Material Characterization
The experimental data we selected came from the SuperCon database [25], which contains an exhaustive list of superconductors, all of which are taken from published papers. It contains two classes of compounds: one consists of metal oxides (metal-containing inorganic materials, alloy compounds, oxides, high-temperature superconductors, etc.), and the other consists of organic superconductors. We obtained 12,413 metal oxide superconductors from this database, with Tc values ranging from 0.533 K to 120 K (the distribution is shown in Figure 2a). Here, we consider the characterization of a superconducting compound AxByCz, assuming that the atomic vectors of the elements A, B, and C are $\vec{A}$, $\vec{B}$, and $\vec{C}$. The input compound can be characterized by stacking the atomic vectors in the order in which the elements appear in the chemical formula. Because the number of atoms of each element in a crystalline compound affects the properties of the material, the corresponding number of atoms is appended to the end of each atomic vector, and the compound is then characterized as follows:

$$V = \begin{bmatrix} \vec{A} & x \\ \vec{B} & y \\ \vec{C} & z \end{bmatrix}$$

where x, y, and z represent the numbers of atoms of the corresponding elements in the superconducting compound.
After examining the superconductors obtained from the SuperCon database, we found that no material contained more than eight element types; thus, we characterized each superconductor as a matrix V (n × d) as described above, where n = 8 and d = 21. For compounds with fewer than eight element types, the remaining rows were padded with zeros.
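As a minimal sketch of this encoding, the following Python snippet builds the 8 × 21 matrix for one compound. The helper names, the formula parser, and the placeholder atomic vectors in `toy_vectors` are assumptions made for illustration; real atomic vectors would come from the SVD step described earlier.

```python
import re

VECTOR_DIM = 20    # length of each atomic vector (from the text)
MAX_ELEMENTS = 8   # maximum number of element types per compound (from the text)


def parse_formula(formula):
    """Split 'La1.85Sr0.15CuO4' into [('La', 1.85), ('Sr', 0.15), ...]."""
    tokens = re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula)
    return [(symbol, float(count) if count else 1.0) for symbol, count in tokens]


def encode_compound(formula, atomic_vectors):
    """Stack [atomic vector, atom count] rows and zero-pad to MAX_ELEMENTS rows."""
    rows = []
    for symbol, count in parse_formula(formula):
        rows.append(list(atomic_vectors[symbol]) + [count])
    if len(rows) > MAX_ELEMENTS:
        raise ValueError("compound has more than MAX_ELEMENTS element types")
    while len(rows) < MAX_ELEMENTS:
        rows.append([0.0] * (VECTOR_DIM + 1))
    return rows


# Placeholder vectors for illustration only; they carry no chemical meaning.
toy_vectors = {
    symbol: [0.1 * (k + 1)] * VECTOR_DIM
    for k, symbol in enumerate(["La", "Sr", "Cu", "O"])
}
V = encode_compound("La1.85Sr0.15CuO4", toy_vectors)
print(len(V), len(V[0]))
```

Each row holds the 20 atomic-vector components followed by the atom count, so the matrix has d = 21 columns, and the unused rows stay zero.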

Inter-Atomic Short-Dependence Feature Extraction Method Based on CNN
CNN is a neural network specially designed to process data with a grid-like structure. It was originally applied in the field of computer vision to extract local features [41,42]. CNNs have achieved significant results in tasks such as natural language processing, speech recognition, and face recognition [43][44][45], indicating that a CNN is able to extract features independently. Because crystal structures are extremely complex and atoms depend on one another, it is often difficult to determine the interatomic dependencies in complex structures by relying on experts or prior knowledge. Therefore, this paper uses the CNN model (see Figure 3) to extract short-dependence feature relationships between the atoms of crystals.
As mentioned earlier, this article converts each compound into a set of stacked atomic vectors, where each element in the compound is represented by its atomic vector, which captures the attributes and the environment of the atom. In this way, the input compound can be represented as a matrix $V \in \mathbb{R}^{n \times d}$, where d is the dimension of the atomic vector plus 1, and n is the number of element types in the crystal compound. After characterizing the input compound, a convolutional layer is used to extract short-dependence features.
Specifically, the convolution layer extracts short-dependence features by continuously sliding window-shaped convolution kernels over entire rows of the matrix V. The width l of the convolution kernel is the same as the width d of the atomic vector, and the height h of the convolution kernel spans multiple adjacent rows. Experimental results show that covering three elements at a time achieves good performance. The convolution kernel slides over the matrix V and performs a convolution operation, where $V[i:j]$ represents the sub-matrix of V from the i-th row to the j-th row, and $W_i$ represents the i-th convolution kernel. Formally, the output of the convolution layer for the i-th convolution kernel is calculated as follows:

$$c_i = f\left(W_i \otimes V[i:j] + b\right)$$

where ⊗ is element-wise multiplication, $c_i$ is the feature learned by the i-th convolution kernel, b is the bias, and f is the activation function (such as the sigmoid or hyperbolic tangent function). In this study, the rectified linear unit (ReLU) was selected as the nonlinear activation function. For n convolution kernels, the generated n feature maps can be regarded as the input of the LSTM: $W = \{c_1, c_2, \ldots, c_n\}$. Here, a comma indicates a column vector connection, and $c_i$ is the feature map generated using the i-th convolution kernel.
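The sliding-window operation can be illustrated with a small pure-Python sketch. It assumes a stride of one row and no padding, which the text does not state explicitly; the function names and the toy input and kernel values are made up for illustration.

```python
def relu(x):
    """Rectified linear unit."""
    return max(0.0, x)


def conv_rows(V, kernel, bias=0.0):
    """Slide an h x d kernel down the rows of V (stride 1, no padding).

    At each position the kernel is multiplied element-wise with the h rows
    it covers, the products are summed, the bias is added, and ReLU is applied.
    """
    h = len(kernel)
    d = len(kernel[0])
    outputs = []
    for top in range(len(V) - h + 1):
        window = V[top:top + h]
        total = sum(kernel[r][c] * window[r][c] for r in range(h) for c in range(d))
        outputs.append(relu(total + bias))
    return outputs


n, d, h = 8, 21, 3
# Toy input: row r is filled with the value r + 1.
V = [[float(row + 1)] * d for row in range(n)]
# Toy kernel that averages the 3 x 21 window it covers.
kernel = [[1.0 / (h * d)] * d for _ in range(h)]
feature_map = conv_rows(V, kernel)
print(len(feature_map))
```

Because the kernel is as wide as the matrix, it can only move along the row axis, so one kernel turns the 8 × 21 input into a feature map with 8 − 3 + 1 = 6 entries, one per group of three adjacent elements.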

Inter-Atomic Long-Dependence Feature Extraction Method Based on LSTM
LSTM is a type of recurrent neural network (RNN). LSTM has achieved great success in many applications, such as unconstrained handwriting recognition [46], speech recognition [47], handwriting generation [35], and machine translation [48]. The LSTM applies the same repeated neural network module at every step. At each step t, the unit state $c_t$ is controlled by a set of gates: the forget gate $f_t$, the input gate $i_t$, and the output gate $o_t$. These gates use the previous hidden state $h_{t-1}$ and the current input $x_t$ together to decide how to update the current cell state $c_t$ and the current hidden state $h_t$ (see Figure 2b). In the LSTM transition functions, $\sigma_g$ denotes the sigmoid function $f(x) = 1/(1 + e^{-x})$, whose output lies in [0, 1], $\sigma_c$ denotes the hyperbolic tangent function, and ⊗ denotes element-wise multiplication.
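For reference, the commonly used form of the LSTM transition functions is sketched below in the notation of this section. The weight matrices $W$, $U$ and the bias vectors $b$ are assumed names that do not appear in the text, and the authors' exact formulation may differ in detail.

```latex
\begin{aligned}
f_t &= \sigma_g\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma_g\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
o_t &= \sigma_g\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
c_t &= f_t \otimes c_{t-1} + i_t \otimes \sigma_c\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(unit state)} \\
h_t &= o_t \otimes \sigma_c\left(c_t\right) && \text{(hidden state)}
\end{aligned}
```

In this form, the forget gate scales the previous unit state, the input gate scales the new candidate content, and the output gate determines how much of the updated unit state is exposed as the hidden state.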

Architecture of HNN Model
Based on the above analysis, this paper proposes an HNN model based on CNN and LSTM. The architecture of the hierarchical feature extraction model is shown in Figure 4, and the algorithm is described below in detail.
Each superconductor is represented as a matrix V, which is input into the first convolutional layer as a single channel. This layer has 32 convolution kernels of size 3 × 21 × 1. The output of the first layer is input into the second convolutional layer, which has 32 input channels and 64 convolution kernels of size 3 × 1 × 32. The output of the second layer is input into the third convolutional layer, which has 64 input channels and 128 convolution kernels of size 3 × 1 × 64. After three layers of convolution operations, the CNN obtains a 2 × 1 × 128 feature map. This feature map is input into a two-layer LSTM network with 256 forward-propagating LSTM neurons. In the first layer, we use the output of all hidden states to obtain the long-term dependency feature relationships between atoms. In the second layer, we use only the output of the last state, which is fed into the subsequent fully connected network for Tc prediction. After the LSTM network, the features of each superconductor are mapped into a 1 × 1 × 256 matrix. The Tc of each superconductor is calculated by the last layer of the fully connected network. After each convolutional layer, a batch normalization layer [49] is used to improve the convergence speed of the model and to reduce the influence of network weight initialization during the learning process. Except for the final output layer, the rectified linear unit (ReLU) [50] is used as the activation function for each layer of the neural network. The detailed parameters of each layer of the CNN model are shown in Table 1.
To ensure the stability and reliability of the computational results, the HNN and all subsequent comparative experiments (RF, GBDT, etc.) were evaluated with 10 repetitions of 10-fold cross-validation, and the average performance is reported.
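The feature-map sizes quoted above can be checked with a short shape calculation. The sketch assumes a stride of one and no padding in every convolutional layer, which is consistent with the 2 × 1 × 128 output stated in the text but is not spelled out there; the function names are hypothetical.

```python
def conv_output_size(size, kernel_size, stride=1):
    """Output length of a convolution without padding."""
    return (size - kernel_size) // stride + 1


def trace_cnn_shapes(input_shape, layers):
    """Return the (height, width, channels) shape after each convolutional layer.

    Each layer is given as (kernel_height, kernel_width, number_of_kernels).
    """
    height, width = input_shape
    shapes = []
    for kernel_h, kernel_w, n_kernels in layers:
        height = conv_output_size(height, kernel_h)
        width = conv_output_size(width, kernel_w)
        shapes.append((height, width, n_kernels))
    return shapes


# Kernel sizes and kernel counts as listed in the text.
layers = [(3, 21, 32), (3, 1, 64), (3, 1, 128)]
shapes = trace_cnn_shapes((8, 21), layers)
print(shapes)  # [(6, 1, 32), (4, 1, 64), (2, 1, 128)]
```

Under these assumptions the row dimension shrinks from 8 to 6, 4, and finally 2, so the LSTM receives a sequence of two steps with 128 features each.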
The whole model was developed in Python 3.6. The neural network models used the TensorFlow 14.0 [51] deep learning framework, and the baseline machine learning algorithms were implemented with Scikit-learn [52]. All programs except the baseline machine learning algorithms were run on a Dell server with a 3.6-GHz central processing unit (CPU) and an NVIDIA GTX 1080 Ti GPU.

Results
In this section, to show that the neural network model proposed in this paper can extract short-dependence and long-dependence features between atoms in superconductors, we first compare it with three benchmark methods: CNN, LSTM, and a six-layer FNN model with 256, 128, 64, 32, and 1 neurons in its layers and the ReLU activation function. At the same time, to illustrate the advantages of this method over traditional material characterization methods combined with machine learning algorithms for material property prediction, we also compared it with the one-hot and Magpie material characterization methods combined with the support vector machine (SVM), decision tree (DT), random forest (RF), gradient boosting decision tree (GBDT), and kernel ridge regression (KRR) algorithms. SVM [53] finds the separating hyperplane in the feature space that maximizes the margin between positive and negative samples in the training set. After introducing the kernel method, SVM can also be used to solve nonlinear problems. DT [54] is a tree structure in which each internal node represents a test on an attribute, each branch represents the outcome of that test, and each leaf node represents a prediction result. RF [55] is a special bagging [56] method that uses decision trees as the base model. GBDT [57], also known as the multiple additive regression tree (MART), learns to combine multiple weak learners effectively to build a strong learner with high prediction accuracy, which can reduce the variance and bias of the prediction model. KRR [58] is ridge regression (L2-regularized linear regression [59]) using kernel techniques; it has the same learning form as SVM, but a different loss function. To ensure the stability of the experimental results, each group of algorithms was evaluated with 10-fold cross-validation, and the results were averaged.
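The repeated cross-validation protocol can be sketched with the standard library alone. The names below are hypothetical, and `evaluate` stands in for whatever routine trains a model on the training indices and returns its error on the held-out indices; the paper does not describe its splitting code, so this is only one plausible arrangement.

```python
import random


def k_fold_indices(n_samples, k, rng):
    """Yield (train, test) index lists for one shuffled k-fold split."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def repeated_cv_score(n_samples, evaluate, k=10, repeats=10, seed=0):
    """Average evaluate(train, test) over `repeats` rounds of k-fold CV."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        for train, test in k_fold_indices(n_samples, k, rng):
            scores.append(evaluate(train, test))
    return sum(scores) / len(scores)


# Dummy evaluation for illustration: it only reports the test-fold size.
average = repeated_cv_score(100, lambda train, test: float(len(test)))
print(average)
```

Reshuffling before every repetition gives each round a different partition, so the reported average depends less on any single split.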
In machine learning, a model can be thought of as a machine with many adjustable knobs, which are called hyperparameters. Adjusting the knobs changes the model's performance. The hyperparameter search space of the neural networks mainly includes the learning rate, the optimization algorithm, and the batch size; the hyperparameter search space of the SVM, RF, KRR, DT, and GBDT algorithms mainly includes the number of decision trees, the learning rate, the sampling rate, and the maximum depth of the decision trees. Among them, the learning rate is one of the most important hyperparameters of deep neural networks. We tried learning rates from 0.1 down to 1 × 10⁻⁶, reducing the value by a factor of 10 each time.
For the regression model, we chose the mean absolute error (MAE), root-mean-square error (RMSE), and R-squared ($R^2$) as the evaluation indicators of the model. MAE reflects the average magnitude of the prediction error, and RMSE measures the deviation between the predicted value and the true value. $R^2$ is a statistic that measures the goodness of fit; it normally lies between 0 and 1, and values closer to 1 indicate a better fit. The specific calculation formulas are shown below.

$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)^2}$$

where m is the number of samples, $y_i$ and $\hat{y}_i$ are the true and predicted values of the i-th sample label (the Tc of the superconductor), and $\bar{y}$ is the average of the true labels of the m samples.
We set the initial values of all hyperparameters based on empirical intuition and then used a greedy algorithm to adjust each hyperparameter step by step instead of performing a grid search, which is not feasible due to its computational cost. Finally, all the hyperparameters of the various models were determined, as shown in Table 2.
Figure 5a shows how the MAE values of all neural network models change as the number of training epochs increases. Among them, the HNN model proposed in this paper obtained the minimum MAE value of 5.631 K after 200 training epochs. At the same time, we can see that our model stabilized after about 80 epochs and converged faster than the other three baseline models.
In addition, Table 3 comprehensively evaluates the MAE, RMSE, and $R^2$ values of the four models at the 200th epoch. From the table, it can be seen that the HNN model was better than the three benchmark models on all three metrics. Stanev et al. [35] used the Magpie features combined with the RF method and obtained an $R^2$ of 0.876, while the HNN reached 0.899. Hamidieh et al. [37] changed RF to GBDT on the basis of Reference [35] and used all the data in the SuperCon database; although the $R^2$ reached 0.920, the improvement depended largely on the larger amount of training data, and the generalization ability remains to be discussed. However, the HNN achieved a better MAE than Reference [37] with less data. From Table 3, we can also see that the LSTM method alone achieved good results, indicating that considering the dependence between atoms in superconductors can help improve the prediction results.
Next, we compared the results of predicting the Tc with multiple machine learning methods using the one-hot and Magpie material characterization methods. Figure 5b clearly shows how the MAE values of the various machine learning methods differ between the two material characterizations. To facilitate comparison, it should be pointed out that the result of the HNN model in Figure 6 was obtained by using the atomic vector characterization and the HNN method. In general, the prediction results obtained with the Magpie description were better than those obtained with the one-hot description, but the results of the HNN model proposed in this paper were still the best. Tables 4 and 5 comprehensively evaluate the RMSE, MAE, and $R^2$ results of the various machine learning algorithms under the two material descriptions. It should be noted here that RF and GBDT, two models based on the ensemble idea, also achieved good results with the Magpie description. At the same time, these two methods outperformed the other conventional machine learning algorithms when the one-hot feature description was used.
Figure 6 shows the predictions of the best model for each of the three material description methods on the test set. The abscissa represents the measured value, and the ordinate represents the predicted value. Comparing Figure 6a-c, we find that the results in Figure 6c were the worst. The Tc values predicted between 60 K and 100 K using the Magpie features and the RF method were generally lower than the measured values, and the HNN method performed better than the RF method in this temperature range. However, for superconductors with a Tc of 40 K to 60 K, the predictions of the atomic vector combined with the HNN method were not as good as those of the RF method.
The above two sets of experiments compared the predictions of different network models using the atomic vector representation with those of different machine learning algorithms using two traditional material representations. Among them, the HNN model proposed in this paper gave the best predictions. This result shows that the atomic vector characterization of the material combined with the HNN model can adequately extract the inter-atomic characteristics of superconductors and therefore has certain advantages.
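As an illustration of the three evaluation metrics reported in the comparisons above, the following minimal sketch computes RMSE, MAE, and R² for paired measured and predicted Tc values. It assumes the usual coefficient-of-determination definition of R²; the function name and the sample values are hypothetical and are not taken from the experiments.

```python
import math


def regression_metrics(measured, predicted):
    """Return (rmse, mae, r2) for paired measured and predicted values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(measured)
    errors = [p - m for m, p in zip(measured, predicted)]

    # Residual sum of squares, shared by RMSE and R^2.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n

    # Total sum of squares around the mean of the measured values.
    mean_measured = sum(measured) / n
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Made-up Tc values in kelvin, for illustration only.
measured_tc = [10.0, 20.0, 30.0, 40.0]
predicted_tc = [12.0, 18.0, 33.0, 39.0]
rmse, mae, r2 = regression_metrics(measured_tc, predicted_tc)
```

Lower RMSE and MAE and an R² closer to 1 indicate better agreement between predicted and measured values.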

Discussion
This paper proposed a prediction model for the Tc of superconductors based on deep learning. Systematic experiments and verifications showed that our HNN model has high prediction accuracy. Because deep learning has stronger generalization ability than traditional machine learning models, we can use our proposed deep learning model to predict the Tc without using DFT, which can help in the search for new superconductors. The first step in discovering new materials using deep learning methods is to establish an accurate material attribute prediction model; the second is to construct an imaginary material space (such as AxByCz with x + y + z < 10, where A, B, and C are different elements, and x, y, and z are the subscripts of the corresponding elements); the last is to use the prediction model to screen for possible new materials in this space. For example, after using FNN to build an accurate prediction model of formation energy, Jha et al. [60] screened materials with low formation energy in the constructed material space. After establishing a Tc prediction model using RF, Stanev et al. [35] screened the Inorganic Crystal Structure Database (ICSD) to find possible superconductors. Therefore, the model for predicting Tc proposed in this paper can be used to discover new superconductors.
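The screening procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in [35] or [60]: the function names, the threshold, and the placeholder predictor are hypothetical, subscripts are restricted to positive integers, and equivalent formulas (for example, A1B1C1 and A2B2C2) are not merged. In practice, the placeholder would be replaced by a trained Tc prediction model.

```python
from itertools import combinations, product


def enumerate_ternary_space(elements, max_total=10):
    """Yield formulas AxByCz with three different elements and x + y + z < max_total."""
    for a, b, c in combinations(elements, 3):
        for x, y, z in product(range(1, max_total), repeat=3):
            if x + y + z < max_total:
                yield f"{a}{x}{b}{y}{c}{z}"


def screen(candidates, predict_tc, threshold_k):
    """Return (formula, predicted Tc) pairs at or above threshold_k, highest first."""
    hits = []
    for formula in candidates:
        tc = predict_tc(formula)
        if tc >= threshold_k:
            hits.append((formula, tc))
    return sorted(hits, key=lambda item: item[1], reverse=True)


def placeholder_predict_tc(formula):
    """Hypothetical stand-in for a trained model; returns arbitrary values."""
    return 50.0 if formula.startswith("A1") else 0.0


# "A", "B", "C" are placeholder element symbols.
candidates = list(enumerate_ternary_space(["A", "B", "C"]))
hits = screen(candidates, placeholder_predict_tc, threshold_k=40.0)
```

The candidate space grows quickly with the number of elements, so a fast surrogate model is what makes this kind of exhaustive screening practical.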
Furthermore, the atomic-vector-based method of characterizing superconductors presented in this paper provides an alternative to the Magpie and one-hot characterization methods, and it can also be used to characterize other materials as inputs for a neural network in subsequent tasks.
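A minimal sketch of such an atomic-vector encoding is given below. It rests on several assumptions that are not taken from this paper: the three-dimensional vectors are invented toy values (the actual atom vectors come from the SVD of the atomic environment matrix), subscripts in the formula are ignored, and formulas are zero-padded to a fixed number of rows. All names are hypothetical.

```python
import re

# Toy atom vectors for illustration only; not the SVD-derived vectors.
TOY_ATOM_VECTORS = {
    "Y": [0.1, 0.4, -0.2],
    "Ba": [0.3, -0.1, 0.5],
    "Cu": [-0.4, 0.2, 0.1],
    "O": [0.0, -0.3, 0.6],
}


def elements_in_order(formula):
    """Return element symbols in the order they appear in the formula."""
    return re.findall(r"[A-Z][a-z]?", formula)


def encode_formula(formula, atom_vectors, max_atoms):
    """Stack atom vectors in formula order, zero-padded to max_atoms rows.

    Assumption: subscripts are ignored and padding rows are all zeros.
    """
    symbols = elements_in_order(formula)
    if len(symbols) > max_atoms:
        raise ValueError("formula has more elements than max_atoms")
    dim = len(next(iter(atom_vectors.values())))
    rows = [list(atom_vectors[s]) for s in symbols]
    rows += [[0.0] * dim for _ in range(max_atoms - len(symbols))]
    return rows


encoded = encode_formula("YBa2Cu3O7", TOY_ATOM_VECTORS, max_atoms=6)
```

The resulting fixed-size matrix, with one row per atom in formula order, is the kind of input that a convolutional or recurrent layer can consume directly.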

Conclusions
This paper proposed a new model for predicting the material properties of compounds. As input, it used a side-by-side arrangement of atomic vectors, which represent the properties of the atoms and their environment in the compound. The architecture stacks an LSTM on a convolution layer: the convolution layer extracts the short-dependence feature relationships between atoms, and the LSTM extracts the long-dependence feature relationships between atoms. To demonstrate the advantages of this model, we used the commonly used CNN, LSTM, and FNN as comparison models. The experimental results for the Tc prediction of superconducting materials showed that our proposed HNN model was superior to the three benchmark models in convergence speed and in terms of RMSE, MAE, and R². The proposed HNN method can effectively extract the characteristic relationships between the atoms of superconductors and predict the Tc.
Moreover, we used the one-hot and Magpie material characterization methods and compared the prediction results of various machine learning algorithms with the experimental results of the HNN model. The HNN model still gave the best predictions. At the same time, we also observed that, when using machine learning algorithms to predict material properties, Magpie features were generally better than one-hot features. In terms of algorithms, RF, GBDT, and other ensemble-based algorithms were generally better than the other algorithms. The first step in discovering new materials using deep learning methods is to establish an accurate material attribute prediction model; the second is to construct an imaginary material space; the last is to use the prediction model to screen for possible new materials in this space. Therefore, the model for predicting Tc proposed in this paper can be used to discover new superconductors.