Application of Support Vector Machine (SVM) in the Sentiment Analysis of a Twitter Dataset

At present, mainstream sentiment analysis methods, represented by the Support Vector Machine (SVM), do not adequately consider the vocabulary and latent semantic information in a text, and rely too heavily on statistics over sentiment words. This paper therefore proposes a Fisher kernel function based on Probabilistic Latent Semantic Analysis (PLSA) for sentiment analysis with the Support Vector Machine. The Fisher kernel function is derived from the PLSA model, so that latent semantic information carrying probabilistic characteristics can be used as classification features. This improves the classification performance of the support vector machine and addresses the problem that latent semantic characteristics are ignored in text sentiment analysis. The results show that the proposed method clearly outperforms the comparison methods.


Introduction
As an emerging field, text sentiment analysis has great potential in both research and practical applications, which explains why it has been attracting increasing attention at home and abroad [1][2][3]. Most thematic information mining relies on statistical methods such as keyword matching. However, keyword matching can only retrieve documents with the same or similar theme keywords; it cannot determine the opinions and emotions expressed in documents. Moreover, thematic information often cannot be captured by keywords alone: a document may belong to a related topic yet hold a contradictory or neutral emotional view. Sentiment analysis of such texts must therefore analyze not only the theme but also the viewpoint and position, that is, the text's orientation. Furthermore, the dominant research methods in sentiment analysis often ignore "synonymy" and "polysemy" in natural language and the semantic correlations between vocabulary and document during modeling [4][5], as well as other features such as semantic structure and the latent semantic information of documents [6]. These problems reduce the agreement between the sentiment analysis and the actual semantics, and thus the accuracy of the analysis. How to solve these problems and improve the accuracy of sentiment analysis has therefore become a challenging research topic.

Kernel Function
Research on kernel function theory for support vector machines has focused mainly on the following aspects. The first is the properties of kernel functions: several properties of the Mercer kernel are studied in depth in reference [7], and Burges studied the nature of kernel functions from a geometric point of view [8]. The second is the construction and improvement of kernel functions. Tao Wu, Hangen He et al. [9] regard the construction of a kernel function as a function interpolation problem, and show that it is not necessary to construct an explicit function satisfying the Mercer condition: the key to the performance of the kernel is to determine the inner product between test samples and training samples from the inner products among the training samples. Because no analytic expression of the kernel function is needed, the kernel can be constructed directly from the samples, opening a new channel for kernel construction; the disadvantage is the large amount of computation required for practical classification problems. Amari et al. [10] performed a Riemannian geometric analysis of kernel functions and gradually modified an existing kernel based on experimental data to better match the actual problem. The third is the selection of the kernel function. Reference [11] proposes a kernel selection method based on a hybrid genetic algorithm for minimizing the leave-one-out (Loo) upper bound. Fengting Jia [12] proposed two methods to determine the kernel function automatically: a parameter determination method based on a matrix similarity measure, and one based on the separability of the kernel function.
Both methods have the advantage of a small and convenient amount of calculation, but the disadvantage that the parameters they produce are usually satisfactory rather than optimal solutions.
The usual methods of selecting a kernel function for practical problems are as follows. The first is to select the kernel in advance based on expert prior knowledge. The second is cross-validation: different kernel functions are tried separately, and the one with the smallest error is taken as the best. For example, for the Fourier kernel and the RBF kernel on a function regression problem from signal processing, simulation experiments show that under the same data conditions the error of an SVM with the Fourier kernel is much smaller than that with the RBF kernel. The third, currently the mainstream approach, is the hybrid kernel function method proposed by Smits et al. [13], which is also the pioneering work on constructing kernel functions.
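As a rough illustration of the cross-validation strategy described above, the sketch below (a minimal example using scikit-learn on a synthetic data set; both are my own assumptions, not from the paper) tries several candidate kernels and keeps the one with the best cross-validated score:

```python
# Sketch: choosing an SVM kernel by cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic, illustrative data set.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Try several candidate kernels; GridSearchCV keeps the best-scoring one.
grid = GridSearchCV(SVC(), {"kernel": ["linear", "rbf", "poly"]}, cv=5)
grid.fit(X, y)
print(grid.best_params_["kernel"], round(grid.best_score_, 3))
```

In practice the candidate set and the error measure would be chosen according to the application, as the section notes.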

LSA & PLSA
Latent Semantic Analysis (LSA), also known as Latent Semantic Indexing, preserves the traditional vector space model's way of measuring the similarity between terms and documents by the angle between their space vectors. In addition, words and documents are further mapped into a latent semantic space, and the underlying semantics and latent topics beneath the surface of the text are mined, thereby improving information retrieval. The essence of latent semantic analysis is to find the true semantics of the vocabulary in a document and then uncover the topics that do not depend on particular words, that is, the latent semantic themes, thereby addressing the deficiencies of the Vector Space Model, which cannot consider latent semantics [14]. However, because LSA represents each word as a single point in the latent semantic space, the multiple meanings of a word all correspond to the same point and are not distinguished; LSA therefore solves only the problem of "synonymy", not "polysemy".
Probabilistic Latent Semantic Analysis (PLSA) was first proposed by Thomas Hofmann, based on a statistical latent class model, and applied to unsupervised learning. Experimental results show that the method marks an obvious improvement over standard latent semantic analysis. Later, in another article, Hofmann introduced the probabilistic latent semantic analysis method into automatic retrieval; the experimental results show that probabilistic latent semantic retrieval significantly improves accuracy compared with standard term matching and latent semantic analysis.
Derived from a statistical view of latent semantic analysis, the probabilistic latent semantic analysis method is a statistical technique for analyzing two-mode and co-occurrence data. It has been widely used in related fields such as information extraction and filtering, natural language processing, and machine learning [15][16]. In its most general application, standard statistical techniques can be applied to text filtering, text selection, and complexity control [17]. For example, the marginal likelihood can be used to estimate the performance of PLSA from its classification results. More specifically, PLSA associates a latent semantic variable with each co-occurrence of a word and a document [18].
Traditional text feature extraction methods, based on mathematical statistics, ignore the semantic relationships between words in the text, as well as the semantic problems caused by "synonymy" and "polysemy" [19].
Probabilistic latent semantic analysis is a latent-layer semantic analysis method based on a statistical latent class model, which clearly improves accuracy compared with standard term matching and latent semantic analysis. Moreover, because it is derived from the statistical viewpoint of latent semantic analysis, PLSA can use latent semantic information with probabilistic features as classification features, effectively reducing the negative effects of "synonymy" and "polysemy" in natural language and in turn improving sentiment classification.
The next section introduces the Fisher kernel function and the reason for choosing it. The Fisher kernel function measures the similarity between two objects under a generative statistical model. Combined with probabilistic latent semantic analysis, it yields a new kernel function whose inner product accounts for the probabilistic features of the latent semantics. PLSA-SVM, one of the similar methods used for comparison [6], makes no improvement to the kernel function: it simply passes the topic features from PLSA to a support vector machine for classification. The most prominent characteristic of the method proposed in this paper, in contrast, is that it derives a Fisher kernel similarity function from the PLSA model and uses it as the kernel function of the support vector machine for classification.
In this paper, probabilistic latent semantic analysis is studied for SVM-based text sentiment analysis; on this basis the Fisher kernel function is improved, and a Fisher kernel similarity function is derived to improve the sentiment classification performance of the support vector machine.

Fisher Kernel Function
Named after Ronald Fisher [20,21], the Fisher kernel in statistical classification measures the similarity between two objects under a generative statistical model. It combines the advantages of generative models with those of discriminative classification methods such as the support vector machine. Because the Fisher kernel is built on a generative probability model, it serves as a bridge between document generation and the probabilistic model and can take the entire corpus into account as background information.
The Fisher kernel is generally used in speech recognition, image classification, and related tasks [22][23][24]. Compared with traditional kernel functions, it is directly related to the samples, and it is suitable for training and test samples of variable size, which explains its successful application in speech signal recognition. Because the Fisher kernel is a local kernel, a hybrid kernel that mixes it with global kernels can have fewer parameters than a general hybrid kernel, effectively reducing time consumption; such hybrid kernels are still expected to have good learning and generalization ability.
The Fisher kernel uses the Fisher score method. The Fisher score is defined as

$U_X = \nabla_\theta \log P(X \mid \theta)$,  (1)

where $\theta$ denotes the parameter vector, which can be estimated for a Gaussian mixture model using the Expectation Maximization (EM) algorithm. The Fisher kernel is defined as

$K(X_i, X_j) = U_{X_i}^{T} I^{-1} U_{X_j}$,  (2)

where $I$ refers to the Fisher information matrix, defined as

$I = E_X\!\left[ U_X U_X^{T} \right]$.  (3)

According to Formulas (1)-(3), the Fisher kernel depends entirely on the original training sample information.
In general, the steps to calculate the Fisher kernel are as follows: First, construct a Gaussian mixture model, then use the EM algorithm to estimate the parameters of the model, and finally derive the Fisher kernel based on the Fisher score.
The Fisher kernel is correlated with the number of training samples to some degree: the more training samples, the more compact the Fisher kernel values, indicating better classification performance.

Topical Features by PLSA Model
The Probabilistic Latent Semantic Analysis model is a statistical model based on the aspect model, a latent variable model for co-occurrence data. A hidden variable $z_k$ is associated with each observation, where each observation is the appearance of a term in a document. By deriving and solving the PLSA model, we obtain the topical features of the text.
The probabilistic variables in the model are introduced as follows: a document $d_i$ is selected with probability $P(d_i)$; given the document, a topic $z_k$ is selected with probability $P(z_k \mid d_i)$; given the topic, a word is selected with probability $P(w_j \mid z_k)$. This yields an observed pair $(d_i, w_j)$ while the latent class variable $z_k$ is discarded, and the data generation process translates into a joint probability distribution:

$P(d_i, w_j) = P(d_i) \sum_{k=1}^{K} P(w_j \mid z_k) P(z_k \mid d_i)$.

The aspect model introduces a conditional independence assumption: $d_i$ and $w_j$ are conditionally independent given the latent variable $z_k$. The most intuitive interpretation of the aspect model is obtained from the conditional probability $P(w_j \mid d_i)$, which can be regarded as a convex combination of the $K$ aspects:

$P(w_j \mid d_i) = \sum_{k=1}^{K} P(w_j \mid z_k) P(z_k \mid d_i)$.

Once the model is specified, its parameters are determined by maximum likelihood. The objective function is

$L = \sum_{i} \sum_{j} n(d_i, w_j) \log P(d_i, w_j)$,

where $n(d_i, w_j)$ is the number of times word $w_j$ occurs in document $d_i$. The standard procedure for maximizing the likelihood in a probabilistic latent model is the expectation maximization (EM) algorithm.
To obtain the topical features of the text, the PLSA model must be derived and solved; the probabilities $P(w \mid z)$ and $P(z \mid d)$ are required, and they are determined by maximizing the log-likelihood function above. Maximizing the log-likelihood is equivalent to minimizing the KL divergence (relative entropy) between the empirical distribution and the parametric model. The expectation maximization (EM) algorithm trains the model effectively, determining the topic parameters for all documents and the mixing coefficients for each document.
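The EM procedure above can be sketched as follows on a toy term-document count matrix. This is a minimal illustration; the data and variable names (`n_dw`, `p_w_z`, `p_z_d`) are my own assumptions:

```python
# Minimal PLSA EM sketch: estimate P(w|z) and P(z|d) from counts n(d,w).
import numpy as np

rng = np.random.default_rng(0)
n_dw = rng.integers(0, 5, size=(6, 8)).astype(float)   # docs x words counts
D, W, K = n_dw.shape[0], n_dw.shape[1], 2              # K latent topics

p_w_z = rng.random((K, W)); p_w_z /= p_w_z.sum(1, keepdims=True)  # P(w|z)
p_z_d = rng.random((D, K)); p_z_d /= p_z_d.sum(1, keepdims=True)  # P(z|d)

for _ in range(50):
    # E-step: P(z|d,w) proportional to P(w|z) P(z|d)
    p_z_dw = p_z_d[:, :, None] * p_w_z[None, :, :]      # D x K x W
    p_z_dw /= p_z_dw.sum(1, keepdims=True) + 1e-12
    # M-step: re-estimate P(w|z) and P(z|d) from expected counts
    nz = n_dw[:, None, :] * p_z_dw                      # D x K x W
    p_w_z = nz.sum(0); p_w_z /= p_w_z.sum(1, keepdims=True) + 1e-12
    p_z_d = nz.sum(2); p_z_d /= p_z_d.sum(1, keepdims=True) + 1e-12

# Log-likelihood L = sum_{d,w} n(d,w) log P(w|d); should increase under EM.
ll = (n_dw * np.log(p_z_d @ p_w_z + 1e-12)).sum()
print(ll)
```

The rows of `p_z_d @ p_w_z` give $P(w \mid d)$, the convex combination of aspects described above.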

Improved Fisher Kernel Function
Research on the kernel function of the support vector machine ultimately serves its discriminant function. Therefore, before improving the Fisher kernel function, this section first presents the discriminant function of the support vector machine in the general case, then improves the Fisher kernel function and combines it with that discriminant function.
In this section, the Fisher kernel similarity function is derived based on the PLSA model of Section 2.2. Based on the generative model, a general method, called the Fisher kernel algorithm, is used to extract features from the sequences.
First, suppose there is an instance set $\{X_i\}$, where the similarity between $X_i$ and $X$ is represented by the kernel function $K(X_i, X)$, and each $X_i$ carries a label $S_i$ (taken as $\pm 1$).
For an example $X$, the label $\hat{S}$ is obtained by a weighted sum over all $S_i$:

$\hat{S} = \operatorname{sign}\Big(\sum_i \lambda_i S_i K(X_i, X)\Big)$.

In this formula the coefficients $\lambda_i$ are free parameters, and the kernel function $K$ itself also acts, to some extent, as a free parameter. Optimal coefficients $\lambda_i$ must be found so that the label $\hat{S}$ has the largest probability. Small changes in how this optimization problem is posed may have a great impact, such as turning the support vector machine into a generalized linear model; the parameters $\lambda$ can be obtained by maximizing

$\sum_i \log P(S_i \mid X_i, \lambda) + c$,

where $c$ denotes a term independent of $\lambda$. In fact, the form of the kernel function is universal so long as feature vectors derived from the samples can play the role of these features. The following further discusses how to make a general kernel function effective. Under the guidance of the Mercer theory, an effective kernel function simply describes the inner product between defined feature vectors, that is,

$K(X, X') = \phi(X)^{T} \phi(X')$,

where the feature vectors are given by some specific mapping $X \mapsto \phi(X)$; this is the definite requirement on a kernel function in applications. The key now lies in how to derive such a mapping from the generative probability model. It is known from the previous description that automatically defining a kernel function amounts to assuming a metric relationship between examples; it is therefore necessary to define these metric relationships directly in the generative model $P(X \mid \theta)$. The Fisher score

$U_X = \nabla_\theta \log P(X \mid \theta)$

maps the instance $X$ into a feature vector and specifies the steepness of $\log P(X \mid \theta)$. The natural gradient is derived from the ordinary gradient through the Fisher information matrix $I$:

$\tilde{\nabla}_\theta = I^{-1} \nabla_\theta$.

The mapping $X \mapsto \phi(X) = I^{-1} U_X$, which transforms the instance $X$ into a feature vector at the current setting of the parameter $\theta$, is called the natural mapping.
The core of this mapping is the inner product of these feature vectors under the Riemannian metric:

$K(X, X') = U_X^{T} I^{-1} U_{X'}$.

The Fisher score is the core of the Fisher kernel algorithm. In a logistic regression model, as shown above, the information matrix appears as the Gaussian covariance associated with the feature vectors; when this information matrix is not available in practice, it can be replaced by the identity, giving the simpler kernel function

$K(X, X') = U_X^{T} U_{X'}$,

which provides a suitable alternative to the full Fisher kernel function. From a metric point of view, for $c_0, c_1 > 0$, the following kernel serves as an equivalent kernel:

$K'(X, X') = c_0 + c_1 K(X, X')$.

In the logistic regression model, the additive constant $c_0$ corresponds to a change in the prior of the bias term in the kernel, and the multiplicative factor $c_1$ corresponds to a change in the remaining parameters as a whole.
Finally, the derived Fisher kernel function is directly substituted for the kernel $K$ in the support vector machine, and thus into the support vector machine decision function.
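This final substitution step can be sketched with scikit-learn's precomputed-kernel interface. The linear stand-in kernel and synthetic data below are illustrative assumptions; a real implementation would build the Gram matrix from the Fisher scores of the fitted PLSA model:

```python
# Sketch: plugging a custom kernel into an SVM via a precomputed Gram matrix.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = (X[:, 0] > 0).astype(int)      # synthetic labels

def gram(A, B):
    # Stand-in for K(x, x') = U_x^T U_x'; here the raw vectors play the
    # role of the Fisher score vectors.
    return A @ B.T

clf = SVC(kernel="precomputed").fit(gram(X, X), y)
pred = clf.predict(gram(X, X))     # prediction also takes a Gram matrix
print((pred == y).mean())
```

Note that with `kernel="precomputed"`, prediction requires the kernel values between test and training samples, which matches the observation in Section "Kernel Function" that only inner products between samples are needed.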

Convergence Based on Fisher Kernel Support Vector Machine
The method proposed above is based on the SVM; it remains to prove that the solution of an SVM embedded with the Fisher function is stable. This is established through two sub-problems: 1. the Fisher function is proved to be a valid kernel function for the SVM; 2. the solution of the SVM using this kernel function is proved to be stable and convergent.
Proof 1: According to Mercer's theorem, a sufficient condition for the Fisher function to be a kernel function is that the Fisher (Gram) matrix is symmetric and positive semi-definite.
First, symmetry. Each element of the Fisher matrix is defined by an inner product, so $K(X_i, X_j) = \langle \phi(X_i), \phi(X_j) \rangle = \langle \phi(X_j), \phi(X_i) \rangle = K(X_j, X_i)$; the Fisher matrix is therefore symmetric. Second, positive semi-definiteness. Since the study in this paper only involves the real field, each diagonal element $K(X_i, X_i) = \|\phi(X_i)\|^2 \ge 0$ is non-negative. Moreover, for any coefficients $c_1, \dots, c_n$,

$\sum_{i,j} c_i c_j K(X_i, X_j) = \Big\|\sum_i c_i \phi(X_i)\Big\|^2 \ge 0$,

so the Fisher matrix $\big[\langle \phi(X_i), \phi(X_j) \rangle\big]$ is positive semi-definite. It can be concluded from the above that the Fisher function satisfies Mercer's theorem and is therefore a valid kernel function.
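The two Mercer conditions argued above can also be checked numerically; the random score vectors below are stand-ins for the Fisher scores of actual samples:

```python
# Numerical check: a Gram matrix of inner products of score vectors is
# symmetric and positive semi-definite.
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(size=(10, 4))    # rows: score vectors U_X for 10 samples
K = U @ U.T                     # K_ij = <U_i, U_j>

assert np.allclose(K, K.T)                       # symmetry
assert np.linalg.eigvalsh(K).min() >= -1e-8      # positive semi-definite
print("Mercer conditions hold for this Gram matrix")
```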
Proof 2: The SVM with the Fisher kernel function has a stable optimal solution, which indicates that the proposed method is stable.
The dual form of the SVM using the Fisher kernel function is

$\max_{\alpha} \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(X_i, X_j)$
subject to $\sum_i \alpha_i y_i = 0$, $0 \le \alpha_i \le C$,

where $K(X_i, X_j)$ is the Fisher kernel. Collecting the constraints, the dual can be expressed in the following standard quadratic programming form:

$\min_{\alpha} \frac{1}{2} \alpha^{T} Q \alpha - e^{T} \alpha$
subject to $y^{T} \alpha = 0$, $0 \le \alpha_i \le C$,

where $Q_{ij} = y_i y_j K(X_i, X_j)$ and $e = (1, \dots, 1)^{T}$. Hence the SVM using the Fisher kernel function is a standard quadratic program; since $Q$ is positive semi-definite by Proof 1, the problem is convex, so the SVM converges and has a stable optimal solution.
Therefore, the method proposed in this section is stable and convergent.

Twitter Data Set
English is the most widely spoken language worldwide, which is why an English data set is selected as the corpus for experimental verification. The corpus used here is extracted from Stanford University's "Sentiment140" [25] (the data set can be downloaded from http://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip).
Each record in the data set contains the following fields:
0 - the polarity of the tweet (0 = negative, 2 = neutral, 4 = positive)
1 - tweet ID (e.g., 2087)
2 - tweet date (e.g., Sat May 16 23:58:44 UTC 2009)
3 - query (e.g., lyx); if there is no query, this value is NO_QUERY
4 - tweet user (e.g., robotickilldozr)
5 - tweet text (e.g., Lyx is cool)
There are 1.6 million records in the data set, with no empty records. Although neutral labels are mentioned in the data set description, no neutral class appears in the training set: 50% of the data is tagged negative and the other 50% positive.
Before training, the data must be preprocessed. This mainly includes removing columns that are useless for sentiment analysis, and processing @mentions (although an @mention carries some information, such as another user mentioned in the tweet, this information is not valuable for building a sentiment analysis model), URL links, stop words, etc. The preprocessed data set can then be used to train and test sentiment analysis algorithms.
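The preprocessing described above can be sketched for a single tweet as follows; the tiny stop-word list is an illustrative assumption, not the one used in the paper:

```python
# Sketch: drop @mentions and URLs, lower-case, remove stop words.
import re

STOP = {"is", "the", "a", "an", "to"}   # tiny illustrative stop list

def preprocess(tweet):
    tweet = re.sub(r"@\w+", "", tweet)           # @mentions
    tweet = re.sub(r"https?://\S+", "", tweet)   # URL links
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return [t for t in tokens if t not in STOP]

print(preprocess("@robotickilldozr Lyx is cool http://example.com"))
# -> ['lyx', 'cool']
```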

Text Sentiment Analysis Experiment Based on FK-SVM
Building on the previous section, this section combines the Fisher kernel function-based support vector machine (FK-SVM) with PLSA to propose a new text sentiment analysis method. The treatment of PLSA in text sentiment analysis and the way the models are combined are the same as in the previous section.
This section mainly applies the improved support vector machine, referred to as the FK-SVM method, which combines the advantages of the PLSA model and the Fisher kernel SVM.
Additionally, this section evaluates the effect of the FK-SVM algorithm on sentiment detection. The experimental process involves the following steps:
1. Pre-process the sentiment detection data;
2. Extract features from the preprocessed data;
3. Pass the extracted features to a linear support vector machine for training and recognition;
4. Output the classification results.

Experimental Design and Algorithm Evaluation Criteria
The FK-SVM-based text sentiment analysis method is experimentally verified in this section; the basic procedure of the FK-SVM method is shown in Figure 1. The two indexes of "recall rate" and "precision" are involved in any retrieval or selection over a large-scale data collection. Because the two indicators constrain each other, it is usually necessary to tune the "search strategy" to a suitable degree according to needs: it should be neither too strict nor too loose, and a balance between recall and precision should be achieved. This balance point is determined by the specific needs.
Assume that, when retrieving documents from a large data set, the documents can be divided into four groups: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). Of course, "negative" here does not mean wrongly classified, but denotes samples of the category "negative". The misclassified counts FP and FN are used to calculate the classifier error rate. Then:
Recall rate R: the number of relevant documents retrieved divided by the total number of relevant documents, namely $R = TP / (TP + FN)$.
Precision rate P: the number of relevant documents retrieved divided by the total number of retrieved documents, namely $P = TP / (TP + FP)$.
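The two measures above can be computed directly from the four counts; the counts below are made-up numbers for illustration:

```python
# Precision and recall from confusion-matrix counts.
def recall(tp, fn):
    return tp / (tp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

tp, fp, fn, tn = 80, 10, 20, 90      # illustrative counts
print(precision(tp, fp), recall(tp, fn))   # 80/90 and 80/100
```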

Experimental Results of FK-SVM, HIST-SVM and PLSA-SVM
To verify the superiority of the FK-SVM method, this section compares it with the text sentiment analysis methods based on HIST-SVM and PLSA-SVM [6]. The comparison covers accuracy and recall rate.
The HIST-SVM method mainly uses the histogram of words appearing in the document as the document feature, and submits this histogram feature to the SVM for text sentiment analysis.
The PLSA-SVM method mainly consists of two parts. First, the PLSA algorithm is used to model the probability distribution of the text, and text topic features are extracted. Second, the text topic features are passed to the support vector machine as features for sentiment classification, that is, text sentiment analysis. Combining the advantages of the PLSA generative model and the SVM discriminative model, this method, called PLSA-SVM, uses topics as hidden variables and uses the EM algorithm to estimate the model parameters. Detailed training methods can be found in reference [6].
The comparison involves multiple rounds of experiments as well as comparisons with different percentages of training samples.
Two experiments are conducted in this paper: 1. On the Twitter data set, five rounds of cross-validation training and testing are carried out for the FK-SVM, HIST-SVM, and PLSA-SVM methods, and the accuracy, recall rate, and corresponding averages of each round are obtained and compared. 2. On the Twitter data set, the FK-SVM, HIST-SVM, and PLSA-SVM methods are trained and tested in five rounds with different proportions of training samples, and the accuracy, recall rate, and corresponding averages of each round are obtained and compared.
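The protocol of the second experiment can be sketched as follows; the classifier and synthetic data are illustrative stand-ins for the paper's methods and Twitter corpus:

```python
# Sketch: evaluate a classifier at several training-set proportions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

scores = {}
for frac in (0.2, 0.4, 0.6, 0.8):
    Xtr, Xte, ytr, yte = train_test_split(X, y, train_size=frac,
                                          random_state=0)
    scores[frac] = SVC(kernel="linear").fit(Xtr, ytr).score(Xte, yte)
print(scores)   # test accuracy per training proportion
```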
Main experimental results and analysis. Experiment 1: In the first experiment, the FK-SVM, HIST-SVM, and PLSA-SVM methods are compared over several rounds of experiments, using recognition accuracy and recall rate as evaluation criteria, to verify the effectiveness of the proposed algorithm against HIST-SVM and PLSA-SVM. Once tested, the FK-SVM method can be compared against the HIST-SVM and PLSA-SVM experimental results. Table 1 and Figure 2 present the experimental results. Experiment 2: The second experiment builds on the first. Table 2 compares the results of the FK-SVM algorithm proposed in this paper with those of the HIST-SVM and PLSA-SVM algorithms. Table 2 and Figure 3 present the experimental results.

Experiment Analysis of Experiment 1
Test experiments are conducted using the method proposed in this section, with the classification performance presented in Table 1. As shown in Table 1, the recognition accuracy and recall rate of the FK-SVM method are higher than those of the HIST-SVM and PLSA-SVM methods; the average accuracy is 87.20% and the average recall rate is 88.30%. 1. Comparison between FK-SVM and HIST-SVM: Table 1 and Figure 2 demonstrate that the FK-SVM method proposed in this paper obtains significantly higher accuracy than the HIST-SVM method, both on average and in each comparison test. HIST-SVM, a text sentiment recognition method with good experimental results, uses the occurrence counts/frequencies of words (i.e., a statistical histogram) as text features and uses an SVM for classification. Compared with FK-SVM, HIST-SVM cannot capture the latent topics hidden beneath the words, that is, the latent sentiment categories, and therefore cannot achieve good recognition in the face of polysemy and related problems. For text classification, the FK-SVM method can map low-dimensional, linearly non-separable data into a high-dimensional space where it becomes linearly separable; moreover, with its advantage in mining the latent topics of text, especially latent sentiment topics, it is less disturbed by specific sentiment words and irregular vocabulary usage. It therefore performs better in text sentiment classification than the HIST-SVM method.

Comparison between FK-SVM and PLSA-SVM:
The experiment demonstrates that across the five rounds, accuracy and recall fluctuate within a small range and the overall results are relatively stable, indicating that both methods classify text sentiment well. The FK-SVM proposed in this paper achieves slightly higher accuracy than the PLSA-SVM method, both on average and in each comparison test, because the main difference between the two methods lies in the Fisher kernel function of the support vector machine. It can therefore be concluded that the PLSA-based Fisher kernel function plays the decisive role in these experiments. This method maps non-linear data from the low-dimensional space to a high-dimensional space through the kernel function. When the Fisher kernel function is derived from the PLSA model, the probability distribution information of the data and the high-level semantic information of PLSA, in particular the statistical information of the text itself, are taken into account. This allows the support vector machine based on the Fisher kernel function to use the probabilistic information of text units and the latent sentiment topics of the text as classification features, thus improving classification.

Experiment Analysis of Experiment 2
Test experiments are conducted using the method proposed in the previous section, with the classification performance presented in Table 2. The average accuracy and recall rate of FK-SVM are 80.96% and 79.68%, respectively. 1. Comparison between FK-SVM and HIST-SVM: Table 2 and Figure 3 indicate that the HIST-SVM method is basically stable across different percentages of training samples, while the test results of the FK-SVM method improve slightly as the percentage of training samples increases. This is because when training samples are few, the probability distributions of vocabulary, documents, and sentiment topics in the samples are strongly affected by noise. As the sample proportion increases, the estimated distributions gradually approach the real distributions, making topic mining more accurate and improving performance on the test set. Overall, the average accuracy and recall rate of the FK-SVM method are clearly higher than those of the HIST-SVM method. 2. Comparison between FK-SVM and PLSA-SVM: Experiment 2 shows that the FK-SVM method is basically stable across different percentages of training samples, with test results improving slightly as the percentage increases. This can be attributed to the Fisher kernel function in the support vector machine: the probability distribution information of the data is taken into account when the Fisher kernel function is derived. With few training samples, the probability distributions of vocabulary, documents, and sentiment topics are strongly affected by noise; as the sample ratio increases, the distributions approach the real ones, making topic mining more accurate and improving test-set performance.
Overall, the average accuracy and recall rate of the FK-SVM method are slightly higher than those of PLSA-SVM.
This section and the preceding section describe the experimental procedure of the FK-SVM method for text sentiment analysis and compare it with the PLSA-SVM method. The experimental results demonstrate that accuracy and recall are higher with the FK-SVM method than with the PLSA-SVM method. Theoretically, when the Fisher kernel function is derived from the PLSA model, the probability distribution information of the data and the high-level semantic information of PLSA, in particular the statistical information of the text itself, are taken into account. This allows the support vector machine based on the Fisher kernel function to use the probabilistic information of text units and the latent sentiment topics of the text as classification features, thus improving classification. The proposed algorithm therefore performs sentiment analysis on text well. Within a certain range, classification accuracy grows with the number of training samples, while the training complexity and training time of the model are unaffected by the scale of the test set. Furthermore, the method can be applied to large-scale text sentiment classification without an additional training process.
In the Twitter data set used in this paper, the difference between the HIST-SVM and PLSA-SVM methods is slightly smaller than on a data set composed of Internet posts. This is because Internet posts come from many sources, so their distribution is more representative of the probability distribution of real data, while Twitter, as a single data source, tends toward topic cohesion, making its probability distribution less close to the real distribution. Hence, the PLSA-SVM effect is slightly reduced, while HIST-SVM is little affected. It is therefore expected that the advantage of the FK-SVM method over the HIST-SVM and PLSA-SVM methods would be even larger on a data set of Internet posts.

Conclusions
A Fisher kernel function method based on probabilistic latent semantic analysis is proposed in this paper, which improves the kernel function of the support vector machine. The Fisher kernel function is derived from the probabilistic latent semantic analysis model, which fully considers the probabilistic and latent relationships among texts, vocabulary, and topics. The support vector machine based on the Fisher kernel function can classify text sentiment at the probability level of the generative model, that is, at the latent semantic level. The main contributions are as follows:

A Fisher kernel function based on Probabilistic Latent Semantic Analysis is proposed in this paper. More specifically, the Fisher kernel function based on probabilistic latent semantic analysis is deduced by using the Fisher function to measure the similarity between two objects under the generative statistical model. By means of this method, latent semantic information carrying probabilistic features can be used as classification features, which improves the classification performance of the support vector machine and helps address the problem that latent semantic features are ignored in text sentiment analysis.