Detection of Anomalies in Large-Scale Cyberattacks Using Fuzzy Neural Networks

Featured Application: Fuzzy neural networks are hybrid models capable of acting dynamically in the construction of expert systems through IF/THEN inference rules. This paper seeks to extract information from large-scale cyberattack datasets.

Abstract: Fuzzy neural networks are hybrid structures that can act in several pattern classification contexts, including the detection of failures and anomalous behaviors. This paper discusses the use of an artificial intelligence model based on the association between fuzzy logic and the training of artificial neural networks to recognize anomalies in transactions in the context of computer networks and cyberattacks. In addition to verifying the accuracy of the model, fuzzy rules were extracted from the massive datasets to form expert systems. The acquired rules allow the creation of intelligent systems in high-level languages with a robust level of identification of anomalies in Internet transactions, and the accuracy of the test results confirms that fuzzy neural networks can act in anomaly detection for high-security attacks on computer networks.


Introduction
The sheer volume of data these days has created new services, expectations, laws [1], and also new precautions. The amount of essential data circulating in digital solutions enables strategic decision-making to be performed more efficiently, provided that the data are treated consistently by staff and managers [2]. The organic evolution of computing resources has changed several scenarios, such as online shopping [3], service sales, and business deployment, and even the security and privacy of users' data on the internet [4].
Several companies have already recognized the power of the internet [4] and work to capture potential customers who engage in various interactions with online solutions, allowing them to create business opportunities or direct more focused marketing actions at this group of people. That happened recently in the US and Brazilian elections [5,6], where digital media was instrumental in the mass dissemination of news and information sharing. In this context of a high number of interactions among customers, companies, and people on the Internet, Big Data emerges [7], a concept that represents a large volume of data connected to several origins, mainly companies, businesses, and governments. As the increasing use of computing resources generates a large amount of data traffic, many precautions and controls need to be taken so that companies and ordinary people do not have their data stolen by malicious actors [8].
A fraudulent attack that fetches data from people through computer connections is considered a cyberattack. There are several techniques for stealing user data or information for other, illegal purposes [9]. These techniques exploit security flaws in electronic devices and human weaknesses in handling the internet-connected resources people use for their daily tasks, such as smartphones and tablets [10,11].
Diverse smart systems are being developed to identify cyberattacks in a variety of contexts [9], but obtaining a model with the expected level of fraud matching is difficult because unbalanced databases pose a considerable challenge. In general, in databases with more than 100,000 records, frauds account for less than 2% of the records. Within the computational context, this factor is considered an anomaly because it goes against the normal behavior of the system and does not happen often [12].
There is a study area that deals with the detection of anomalies in different contexts. We can highlight models that act on problems in the financial branch [13], in the health field [14], in computing, as in the paper of Fugate et al. [15], and in industry, such as the works of Hadeli et al. [16], Kumarage et al. [17], and Dong et al. [18], among others [19]. In general, these studies focus on specific elements that can generate significant financial losses or threaten people's physical integrity. More recent work addresses the identification of anomalous behaviors using artificial intelligence in vegetation [20], authorization logs [21], computer systems [22], and modern industry [23].
However, this kind of context is not simple to understand. In many cases, an anomaly specialist is rare and requires years of study and experience. Therefore, several studies propose the use of intelligent models based on knowledge to detect anomalies and, at the same time, provide a knowledge base that can serve several purposes [24], such as training the people of a company. These expert systems represent the union of two techniques commonly used in artificial intelligence: fuzzy systems and artificial neural networks [25,26]. The first brings interpretability to the results through its linguistic and interpretable characteristics. The second is responsible for advances in intelligent models that simulate human reasoning through training [27]. United, these two concepts are called fuzzy neural networks, which can act in diverse contexts such as the classification of binary patterns, as in the work of Lin et al. [28] and Meesad et al. [29], and models that operate in the financial market, as proposed by Lin et al. [30] and Kuo et al. [31]. Others address the health area, such as the models proposed by Wang et al. [32] and Cheng et al. [33], relevant aspects of industry [34][35][36], and even cyberattacks specifically, such as Batista et al. [37], Gang Wang et al. [38], and Souza et al. [39]. Several models with different characteristics have been developed for the detection of cyberattacks, such as Demertzis et al. [40][41][42] and Yusob et al. [43], which in turn may also represent an anomaly in the traditional behavior of the internet. Therefore, this paper proposes the use of a fuzzy neural network model for the detection of anomalies in cyberattacks. The model is based on the concepts of neural networks with artificial neurons using the leaky ReLU function created by Maas et al. [44], and on fuzzification procedures based on the ANFIS technique [45], capable of generating equally spaced membership functions to granularize the feature space. The second layer uses fuzzy logical neurons capable of aggregating fuzzy inputs with numerical weights. The network seeks simplicity through training based on the extreme learning machine [46]. A single aggregation neuron forms the output of the neural network, responsible for detecting anomalies in the system [47]. Therefore, the use of this model, in addition to detecting anomalies in Internet transactions, aims to create a system of fuzzy rules capable of serving as a knowledge base on possible forms of attacks and, consequently, aiding the detection of anomalies. The database used to validate the approach is a dataset commonly used for the detection of cyberattacks, and many results obtained in the literature corroborate the results obtained by the model.
The main contribution of this paper is to present a hybrid model with a high degree of assertiveness in the prediction of anomalous behaviors and to transform this behavior into a set of interpretable rules capable of building an intelligent system. Knowledge gained from fuzzy rules in the training phase allows the creation of expert systems for an audience that does not work directly with artificial intelligence concepts. Thus, neuro-fuzzy models can disseminate knowledge to a broader audience, especially those involved with access control over the internet. As the neural networks used in this paper divide the problem space into equally spaced membership functions, it is simpler to visualize the fuzzy relations, assigning them values such as small, medium, and large, according to the dimension of the problem evaluated.
The paper is organized as follows: Section 2 presents the fundamentals that support the paper, and Section 3 presents the main concepts of fuzzy neural networks and related works. Section 4 describes the training model that the algorithm uses to detect web anomaly patterns. Section 5 presents the detection methodology proposed in this article. Section 6 describes the database and the configuration of the tests, besides the results obtained. Finally, Section 7 discusses the conclusions and presents perspectives for future work.

Large-Scale Problems
The evolution of the media and the growing use of computational resources produce large-scale data volumes. This new routine affects the behaviors of software developers, marketing people, managers, and the many people involved in maintaining computerized systems and making specific decisions [7]. With a high volume of data, decision making becomes more complex and time-consuming, but this goes against today's dynamics, in which people and companies need to make consistent decisions within a reasonable time so as not to miss great opportunities [7]. Some parameters are fundamental for the evaluation of a large-scale dataset; among them, availability, reliability, performance, validation, and system parameterization stand out. When dealing with cyberattacks, the main factors to be discussed are the evaluation and validation of the results, mainly if the target system contains information of high relevance.
Based on this massive data volume, one of the concepts that would become part of the science and routine of developers and researchers in the data science area was created: big data [7]. This concept reveals the existence of an extensive data stream in small time lapses. That happens, for example, when a large number of purchase requisitions are made at the same time on an online shopping site while thousands of other requests searching for products on the same platform slow the site down. This type of experience with the new trends of the information market must ensure that decisions, analyses, and factors are checked promptly for decision-making. When reviewing a search trend for a specific product, the manager can choose to launch a lightning promotion, while the information security technicians must watch over the site's integrity and its transactions, mainly due to the significant number of malicious requests made by hackers [48].
In the Big Data context, several security factors cannot be overlooked by the technology team. Extensive data in a digital solution can generate substantial profit for solution owners, just as it can also attract a range of attacks on sensitive information [49]. When it comes to Big Data, several factors compete for the attention of those involved: processing data and information promptly, protecting the system from malicious attacks, understanding the needs of the target audience, and managing the performance of the computerized resource so that it does not lose usability requirements fundamental to solution users [50].
Large-scale attacks can be viewed as a high number of requests to a server or service in a short time. They can also result from requisitions with many characteristics to be evaluated by the systems. Cyberattacks, especially when small distortions are inserted in a large group of requests, can cause the unavailability of systems relevant to society, such as security, online shopping, and government services. Figure 1 below presents features and challenges that the Big Data concept involves for the present day.

Cybernetic Invasions and Intrusion Detectors
Intelligent models can help identify patterns related to different contexts, especially concerning aspects of the internet. The criteria adopted here follow characteristics reported in simulations conducted by Lincoln Labs through the 1998 DARPA Intrusion Detection Evaluation Program. The task of the intrusion detector is to construct a predictive model (i.e., a classifier) that can distinguish between "bad" connections, called intrusions or attacks, and regular "good" connections. These behaviors can be qualified as anomalies because the expected pattern on the internet is a proper connection. The objective of that program was to survey and evaluate intrusion detection research; in the same way, it can serve as a database to identify anomalies in Internet connections. A standard dataset to be audited, which includes a wide variety of simulated intrusions in a military network environment, was provided [51]. Figure 2 presents a possible sequence of steps through which a simulated cyberattack generates problems for individuals and businesses.
To build the database provided to the academic community, Lincoln Labs set up an environment in their labs to acquire nine weeks of raw TCP transmission data on a local area network (LAN), allowing them to simulate a typical LAN present in a national security organization. In these tests, they operated the LAN as if it were a real Air Force environment, enabling simulations of multiple network attacks [51]. The training data comprise about four gigabytes of compressed binary TCP data from seven weeks of network traffic. This data was processed into approximately five million connection records. The test data were collected over two weeks of connection records with the evaluated network [51]. For this type of study, consider [51]:
-A connection is a sequence of TCP packets starting and ending at well-defined times, between which data flows from a source IP address to a destination IP address under a well-defined protocol.
-Each connection is labeled as regular or as an attack, with precisely one specific attack type.
-Each connection record consists of about 100 bytes.
The attacks fall into four main categories [51]:
-DOS: denial of service;
-R2L: unauthorized access from a remote machine;
-U2R: unauthorized access to root (superuser) privileges;
-Probing: surveillance and other probing, such as port scanning.
As it is a realistic basis, several features appear in the test data with an intensity not present in the training data, helping the models develop the ability to identify anomalies according to the nature of new data. The datasets contain a total of 24 training attack types, with an additional 14 types appearing only in the test data. Stolfo et al. [51] defined high-level features that help distinguish standard connections from attacks statistically. In the "same host" approach, the features examine only the connections in the last two seconds that have the same destination host as the current connection and calculate statistics related to protocol behavior, service, and so on. Thus it is possible to characterize the transaction statistically.
In another approach, "same service" features examine only the connections in the last two seconds that have the same service as the current connection. To facilitate naming, the "same host" and "same service" features are called time-based traffic features of the connection records [51]. However, other techniques exploit attacks on computer networks using different methodological variations. Some probe attacks examine the hosts (or ports) using a time interval much longer than two seconds, for example, once per minute. Therefore, connection records were also sorted by destination host, and features were constructed using a window of 100 connections to the same host instead of a time window as in the statistical approach. That produces a set of host-based traffic features [51].
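As a concrete illustration, the two kinds of windows described above can be sketched as follows; the record layout (timestamp, destination host) and the helper names are hypothetical simplifications, not the actual KDD feature definitions.

```python
# Sketch of the time-based (2-second) and connection-based (100-record)
# windowed traffic features; field names here are illustrative assumptions.

def same_host_count_2s(connections, current_idx):
    """Count earlier connections within 2 s that share the current destination host."""
    ts, host = connections[current_idx]
    return sum(1 for t, h in connections[:current_idx]
               if h == host and ts - t <= 2.0)

def same_host_count_100(connections, current_idx):
    """Count same-destination-host connections among the last 100 records."""
    _, host = connections[current_idx]
    window = connections[max(0, current_idx - 100):current_idx]
    return sum(1 for _, h in window if h == host)

# (timestamp in seconds, destination host)
conns = [(0.5, "10.0.0.1"), (1.0, "10.0.0.2"), (1.8, "10.0.0.1"), (2.1, "10.0.0.1")]
print(same_host_count_2s(conns, 3))   # → 2
print(same_host_count_100(conns, 3))  # → 2
```

The connection-based window catches slow probes (e.g., one scan per minute) that fall outside any short time window, which is exactly why both feature families are needed.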
When conducting studies on DOS and probing attacks, there appear to be no sequential patterns that are frequent in R2L and U2R attack records. That is because DOS and probe attacks involve many connections to some hosts in a short time, whereas R2L and U2R attacks are embedded in the data portions of the packets and usually require only a single connection [51].
In their study, Stolfo et al. [51] used domain knowledge to add features that look for suspicious behavior in the data portions, such as the number of failed login attempts. These features are called 'content' features. There are thus several techniques for detecting anomalies according to the characteristics of the connections made by people who want to break into a computer network. In this paper, the anomaly detection performed by the intelligent models follows the standards defined by these studies.

Anomaly Detection
Anomaly detection consists of identifying patterns in a dataset whose behavior differs from what is expected. Humans can identify when something is not following a pattern because of their lived experience in the context, such as a security guard who notices inappropriate behavior during a night round, or a cook who sees that a cake is not rising as usual. These patterns are often referred to as anomalies, outliers, exceptions, aberrations, or discordant observations, among other terms, varying according to context. The terms anomaly and outlier are the most used in the context of anomaly detection [52].
Anomaly detection uses many advanced statistical techniques to determine whether an observation should be considered anomalous based on pre-established patterns. These techniques seek to evaluate behaviors that deviate from the normality of a context. Consequently, the detection of anomalies is multidisciplinary and can serve different backgrounds [53]. Anomalies can be classified as point anomalies: when an individual data instance can be considered abnormal relative to the rest of the data, it is designated a point anomaly. This classification is the simplest and is the focus of most anomaly detection research. Point anomalies are usually defined as the points out of normality, those that lie away from the central region where the other data are. As a real example, we have the detection of fraud in the use of a credit card [54]. In Figure 3, we can see different behaviors of the evaluated datasets. These outliers are considered behavioral anomalies.
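A point anomaly of the kind described above can be flagged with a simple z-score rule; the sketch below uses an illustrative threshold of two standard deviations and made-up charge amounts, conventions of this example rather than anything prescribed in the paper.

```python
# Minimal point-anomaly detector: flag values far from the mean in
# standard-deviation units. Threshold and data are illustrative assumptions.
import statistics

def point_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / std > threshold]

# Hypothetical credit card charges with one fraudulent outlier
amounts = [12.0, 15.5, 11.2, 14.8, 13.1, 950.0]
print(point_anomalies(amounts))  # → [950.0]
```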
In contextual anomalies, a data instance can be considered anomalous in a specific context but not in another type of evaluation. A temperature of 40 degrees may be normal in tropical countries but would be unusual at the poles of the planet. To determine the context of an anomaly, contextual attributes are used. For example, in geographic datasets, the longitude and latitude of a location are the contextual attributes. In time-series data, time is a contextual attribute that determines the position of an instance in the sequence [52]. Another way to represent the attributes is to evaluate the behavioral form. In a set of geographic data describing the average rainfall across the world, the amount of precipitation at any location is a behavioral attribute.
Finally, a collective anomaly refers to a whole set of data: specific data instances in a collective anomaly may not be anomalies by themselves, but their occurrence together as a collection is anomalous [52].

Related Work
As an area of interest in different contexts of science, the detection of anomalies has been the target of several researchers and fields. The work of [52] presents a survey covering the central areas of anomaly detection and the relevant work. Other papers also review the principal works of the literature, addressing diverse topics on the discovery of anomalies, such as Chandola et al. [56], Ahmed et al. [57], Estevez-Tapiador et al. [58], and Patcha and Park [53].
Other works, such as Sabahi and Movaghar [59], address the detection of anomalies as intrusions. The paper of Xie et al. [60] deals with anomalies in wireless sensors. Anomalies in hyperspectral imagery were treated with algorithms in the work of Stein et al. [61]. A work well known in the literature (Garcia et al. [62]) deals with anomalies in network invasion; others use the concepts of anomalies in web-based and network attacks [63][64][65][66][67], IP addresses [68,69], and call stack information [70]. The paper by Lee and Xiang [71] developed techniques to form a set of information to prevent more and more types of anomalies. Works were produced using Markov chains to identify anomalies [72], semi-Markov chains [73], immune-inspired algorithms [74], fuzzy judgment [75], and support vector machine concepts [76].
Another branch of research where anomaly detection is widely used is the detection of credit card fraud, such as Aleskerov et al. [77], along with monitoring smartphones [78], video anomalies [79], hardness recognition sensors based on magnetic anomalies [80], quantum anomaly detection [81], energy anomalies [82], sonar imagery [83], wide area networks [84], and fast anomaly detection in crowded scenes [85]. Recent papers apply deep learning and large-margin methods to video-based anomaly detection [86], time series anomalies [87], noisy binary search [88], classification and anomaly detection of side-channel signals with deep learning [89], and authorization logs [21] in the context of anomaly identification.

Fuzzy Neural Networks
Fuzzy neural networks (FNN) are intelligent hybrid models composed of artificial neural networks, with their training and parameter-update techniques, and fuzzy systems, which transform the data of a problem into representations in the feature space, assigning them characteristics that can be interpreted as linguistic terms. These models unite the main benefits of neural network training techniques and the interpretability of fuzzy systems, allowing intelligent systems to update parameters and perform training with more interpretable results [27].
A wide range of applications can be attributed to these types of models. Since the 1970s, they have been applied synergistically to solve complex problems in various contexts of society, industry, and science [90].
The main characteristic of this intelligent model is the replacement of the artificial neurons commonly employed in artificial neural networks by fuzzy neurons, which weight the fuzzy inputs with synaptic weights that may or may not be fuzzy numbers. This approach uses elements ranging from the transformation of the input space into fuzzy characteristics (the fuzzification process) to the answers obtained in the context to which they belong (the defuzzification process). Between these two processes, the networks extract knowledge, interpreting the characteristics of the database and transforming them into a representation through membership functions. These membership functions are constructed using techniques that measure the behavior of data in space, and this comparison may be performed by grouping, similarity, density, or distance [27].
The method by which the inputs are manipulated can define parameters such as activation functions, fuzzy set membership functions, and network topology, among others [91]. Fuzzy neural networks can use clustering methods such as c-means and its fuzzy version, fuzzy c-means [92,93], methods based on data density called clouds [94], eClustering+ [95], and ePL [96], among others. These methods help define fundamental network structures and allow the construction of fuzzy rules based on data contexts in the feature space [97].
In this context, the group of characteristics becomes represented by fuzzy elements, where these characteristics can receive qualitative labels (such as small, medium, and large, or cold, warm, and hot). Thus, the fuzzy system is considered an alternative for treating situations where binary results are insufficient to represent the problem and more factors are required to evaluate it. Fuzzy logic allows us to express data with linguistic labels, enabling a set of ages collected from patients in a clinic to be qualified as young, middle-aged, and elderly. The number of features depends on the nature of the problem being evaluated and on the people who understand the target context [27].
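The linguistic age labels mentioned above could, for instance, be modeled with Gaussian membership functions; the centers and widths below are illustrative assumptions, not values taken from the paper.

```python
# Assign fuzzy membership degrees for linguistic age labels.
# Centers and sigmas are hypothetical choices for this sketch.
import math

def gaussian_mf(x, center, sigma):
    """Gaussian membership degree of x in a fuzzy set (center, sigma)."""
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))

labels = {"young": (20, 12), "middle-aged": (45, 12), "elderly": (70, 12)}

age = 30
memberships = {name: round(gaussian_mf(age, c, s), 3)
               for name, (c, s) in labels.items()}
print(memberships)  # age 30 belongs mostly to "young", partially to "middle-aged"
```

A single age thus belongs to several sets with different degrees, which is exactly what makes the extracted rules readable by non-specialists.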
Fuzzy neural networks are applicable to the construction of expert systems, the classification of patterns, linear regression, and the prediction of time series, besides problems in robotics, aspects of nature, health, education, and industry, as described below.

Fuzzy Neural Networks and Their Practical Applications
Fuzzy neural networks are used for anomaly detection in Han and Cho [98] and Meneganti et al. [99], and learning rules can be seen in Mahoney and Chan [100]. Expert systems had been approached as a form of knowledge transfer before the 1990s, as in Wiig [101], and later in Gang Wang et al. [38] in 2010.
FNNs act on problem-solving with various types of complexity. They can work on simple pattern classification problems using pruning approaches on their architecture [102] and, at the same time, can generate patterns for the evaluation of nonlinear systems [103], time series forecasting [104], and linear and nonlinear regression problems [105,106].
Hybrid models also stand out in detecting breast cancer in women [107,108] according to characteristics informed by patients in clinical trials. In the same way, they assist in the detection of autistic traits in children [109] and in immunotherapy treatment [110].
More recent work involves the concepts of robot manipulation and control [111], prediction of chaotic series [103], effort forecasting in software construction [112], identification of anomalies in the locomotion of children and adolescents [113], and absenteeism at work [114].

Network Architecture
The fuzzy neural network described in this section is composed of three layers and is derived from the work of Souza [47]. In the first layer, fuzzification is realized through the concept of the ANFIS model [45], in its version that generates membership functions equally spaced in the sample space. The membership functions adopted in the first layer are of the Gaussian type, created with the centers and sigma values obtained by genfis1 when generating the granularization of the input space. The second layer uses logical neurons of the andneuron type [115] and of the type proposed by [91]. These neurons have weights and activation functions determined randomly, and use t-norms (calculated through the product) and s-norms (using the probabilistic sum) over all neurons of the first layer. To define the weights that connect the second layer to the output layer, the idea of the extreme learning machine [46] is used to act on a neuron with a leaky ReLU activation function. The fuzzy logical neurons of the second layer are used to solve problems of pattern recognition and to bring interpretability to the model through the extraction of fuzzy rules of the IF/THEN type. Figure 5 illustrates the feedforward topology of the fuzzy neural networks considered in this article.
The first layer is composed of neurons whose activation functions are membership functions of fuzzy sets defined for the input variables using ANFIS in its genfis1 approach. For each input variable x_ij, M membership functions A_jm, m = 1 ... M, are defined, and these membership functions are the activation functions of the corresponding neurons. Thus, the outputs of the first layer are the membership degrees associated with the input values, i.e., a_jm = µ_Am(x_ij) for j = 1 ... N and m = 1 ... M, where N is the number of inputs and M is the number of fuzzy sets for each input, defined by ANFIS [116]. Therefore, the creation of the fuzzy neurons in the first layer is an exponential problem, because the number of second-layer neurons is directly tied to the number of membership functions defined for each evaluated feature.
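A minimal sketch of this equally spaced granularization, in the spirit of genfis1 grid partitioning (though not MATLAB's actual implementation); the sigma formula is one common choice for making neighboring functions overlap.

```python
# Equally spaced Gaussian membership functions over one input dimension,
# mimicking grid partitioning. The sigma heuristic is an assumption.
import math

def equally_spaced_gaussians(x_min, x_max, m):
    """Return (center, sigma) pairs for m Gaussians spread over [x_min, x_max]."""
    centers = [x_min + i * (x_max - x_min) / (m - 1) for i in range(m)]
    sigma = (x_max - x_min) / (2 * (m - 1))  # neighbors overlap at mid-points
    return [(c, sigma) for c in centers]

def membership(x, center, sigma):
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))

mfs = equally_spaced_gaussians(0.0, 1.0, 3)
print(mfs)  # centers at 0.0, 0.5, 1.0
print([round(membership(0.4, c, s), 3) for c, s in mfs])
```

With M functions per input and N inputs, the second layer enumerates M^N combinations, which is the exponential growth noted above.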
The second layer is composed of L fuzzy logic neurons. Each neuron performs a weighted aggregation of some of the first-layer outputs. This aggregation is performed using the weights w_il (for i = 1 ... N and l = 1 ... L). For each input variable j, only one first-layer output a_jl is defined as input of the l-th neuron, so that w is sparse and each neuron of the second layer is associated with an input variable [47]. Finally, the output layer is composed of one neuron whose activation function is the leaky ReLU [44]. The output of the model is:

y = f( Σ_{j=0..l} v_j z_j ) (1)

where z_0 = 1, v_0 is the bias, and z_j and v_j, j = 1, ..., l, are the output of each fuzzy neuron of the second layer and its corresponding weight, respectively. The leaky ReLU is an improvement over the ReLU function [117], because a small linear component is kept for negative inputs of the neuron. This change allows small variations to be noticed, so that neurons that would be relevant to the model are not discarded. Its function is expressed by [44]:

f(u) = u, if u > 0; f(u) = 0.01u, otherwise

The logical neurons used in the second layer of the model are of the andneuron or unineuron type, where the input signals are individually combined with the weights before the subsequent global aggregation. The andneuron used in this work can be expressed as [118]:

z_l = T_{j=1..N} (w_jl s a_jl)

where T is a t-norm (product) and s is an s-norm (probabilistic sum). The unineuron uses the concepts of the uninorm [119] to perform more flexible operations according to the activation functions of the fuzzy neurons.
Instead of the fixed neutral elements 1 and 0 of the t-norm and the s-norm, respectively, the uninorm allows its neutral element to assume any value in the unit interval. One of the main characteristics of the uninorm is that the neutral element is no longer fixed at the extremes, being called the identity element [120]. Through this identity element (e), uninorms extend t-norms and s-norms by varying the value of e in the interval between 0 and 1, allowing the alternation between an s-norm (e = 0) and a t-norm (e = 1). The uninorm used in this work is expressed as follows [120]:

U(x, y) = e T(x/e, y/e), if x, y ∈ [0, e]; U(x, y) = e + (1 − e) S((x − e)/(1 − e), (y − e)/(1 − e)), if x, y ∈ [e, 1]; U(x, y) = min(x, y), otherwise

This formulation allows the unineuron to use either the concept of an and neuron or of an or neuron. [120] explains important concepts about the unineuron. The processing of the neuron occurs at two levels. At the first level, L1, the input signals are combined individually with the weights. At the second, global level, L2, a global aggregation operation is performed on the results of all first-level combinations.
Traditional logical neurons use t-norms and s-norms to perform the operations described. The function p is responsible for transforming the inputs and corresponding weights into individual transformed values. A formulation for the p function can be described as [91]:

p(w, a, e) = wa + (1 − w)e (6)

where (1 − w) represents the complement of w. Using the weighted aggregation reported above, the unineuron can be written as:

z_l = U_{j=1..N} p(w_jl, a_jl, e)

Fuzzy rules can be extracted from andneurons according to the following example:

Rule 1: If x_i1 is A_11 with certainty w_11 ... and x_i2 is A_21 with certainty w_21 ... then y_1 is v_1
Rule 2: If x_i1 is A_12 with certainty w_12 ... and x_i2 is A_22 with certainty w_22 ... then y_2 is v_2
Rule 3: If x_i1 is A_13 with certainty w_13 ... then y_3 is v_3

These rules allow the creation of a building base for expert systems [122]. Figure 5 presents an example of the fuzzy neural network architecture.
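The unineuron's two-level processing can be sketched as follows; the mixed-case choice of min in the uninorm and the example values are assumptions of this illustration, one common construction among several.

```python
# Unineuron sketch: each input/weight pair is transformed by
# p(w, a, e) = w*a + (1 - w)*e, then aggregated with a uninorm of identity e.
def p(w, a, e):
    """Local transformation of one input with its weight; e is the identity element."""
    return w * a + (1 - w) * e

def uninorm(x, y, e):
    """Representable uninorm: product below e, probabilistic sum above, min otherwise."""
    if x <= e and y <= e:
        return e * (x / e) * (y / e)
    if x >= e and y >= e:
        xs, ys = (x - e) / (1 - e), (y - e) / (1 - e)
        return e + (1 - e) * (xs + ys - xs * ys)
    return min(x, y)  # mixed case: one common (conjunctive) choice

def unineuron(inputs, weights, e=0.5):
    result = e  # start from the identity element of the uninorm
    for a, w in zip(inputs, weights):
        result = uninorm(result, p(w, a, e), e)
    return result

print(round(unineuron([0.9, 0.7], [1.0, 1.0], e=0.5), 3))  # → 0.94
```

With e near 1 the neuron behaves like an andneuron, and with e near 0 like an or neuron, which is the alternation described above.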

Training Fuzzy Neural Network
The membership functions in the first layer of the FNN are adopted as Gaussian. The number of neurons created with the input data partition grows exponentially with the number of membership functions and the number of features present in the problem database. The number of neurons L in the first layer is defined according to the input data and the number of membership functions M, defined parametrically. The second layer performs the aggregation of the L neurons from the first layer through the andneurons.
After the construction of the L fuzzy logical neurons, a filter selects the 200 most significant neurons (called L_s), as in [47]. The final network architecture is defined through a feature extraction technique based on l_1 regularization and resampling. The learning algorithm assumes that the output of the hidden layer composed of the candidate neurons can be written as [116]:

y = sign(z(x_i) v),

where v is the weight vector of the output layer and z(x_i) = [z_0, z_1(x_i), z_2(x_i), ..., z_{L_ρ}(x_i)] is the output vector of the second layer, with z_0 = 1 and sign a step function that transforms values greater than zero into 1 and values smaller than zero into −1. In this context, z(x_i) is considered the non-linear mapping of the input space into a space of fuzzy characteristics of dimension L_ρ [116]. The sign function is defined by:

sign(u) = 1, if u > 0; sign(u) = −1, otherwise.

Subsequently, following the determination of the network topology, the vector of output-layer weights is evaluated. In this paper, this vector is obtained through the Moore-Penrose pseudo-inverse [116]:

v = Z^+ y, (11)

where Z^+ is the Moore-Penrose pseudo-inverse [123] of Z, which yields the minimum-norm least-squares solution for the output weights.
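A minimal numeric sketch of this estimation step follows, with random placeholder data standing in for the real second-layer outputs; the sample count and 16-neuron width are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder second-layer outputs: K samples, L_rho neurons plus z_0 = 1.
K, L_rho = 200, 16
Z = np.hstack([np.ones((K, 1)), rng.uniform(size=(K, L_rho))])
y = np.sign(rng.standard_normal(K))          # labels encoded as -1 / +1

# Equation (11): v = Z^+ y, the minimum-norm least-squares solution.
v = np.linalg.pinv(Z) @ y

# Classification step: sign of the linear combination of neuron outputs.
y_hat = np.sign(Z @ v)
```

Because the pseudo-inverse returns the least-squares solution, the residual Z v − y is orthogonal to the columns of Z, which is what makes this one-shot estimate of the output weights well defined.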

Proposed Detection of Cyber Invasions Through Anomaly Detection Using Hybrid Models and the Creation of Expert Systems
The hybrid system proposed here uses fuzzy neural networks trained with a database that determines patterns of anomalies. Through these patterns, the model learns the trends and characteristics of the database, allowing it, in addition to pattern classification, to create an expert system based on fuzzy rules.
The model will have four dimensions (service, duration, bytes received, bytes sent) according to the formatting of the bases for the detection of anomalies. These four features will be combined according to equally spaced membership functions. In an example with two membership functions for each input of the model, eight Gaussian neurons are generated in the first layer and, consequently, 16 andneurons in the second layer.

(5) Construct L fuzzy logical neurons with random weights and bias on the second layer of the network by aggregating the L fuzzy neurons of the first layer.
(6) For all K inputs do
(6.1) Calculate the mapping z_k(x_k) using andneurons
end for
(7) Estimate the weights of the output layer using Equation (11).
(8) Calculate the output y using leaky ReLU with Equation (1).
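Steps (6) to (8) can be sketched as below. The andneuron mappings are stood in for by random values, and the leaky ReLU slope of 0.01 is an assumed parameter, since Equation (1) is not reproduced here.

```python
import numpy as np

def leaky_relu(u, alpha=0.01):
    # Output activation (Equation (1)); the slope alpha is an assumption.
    return np.where(u > 0, u, alpha * u)

rng = np.random.default_rng(1)
K, L = 100, 16                      # K inputs; L = 2**4 andneurons as above
Z = rng.uniform(size=(K, L))        # step (6): mappings z_k(x_k)
y = rng.integers(0, 2, size=K).astype(float)

v = np.linalg.pinv(Z) @ y           # step (7): output weights, Equation (11)
y_out = leaky_relu(Z @ v)           # step (8): network output
```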

Dataset Used
The dataset used for the experiments in this paper was originally provided in the KDD Cup 1999 and is currently available in the main data repositories for machine learning. It contains 41 attributes (34 continuous and seven categorical). However, they are reduced to four attributes (service, duration, bytes received, and bytes sent) because these are considered the most basic attributes, and among them only the service is categorical. Using the service attribute, the data is divided into http, SMTP, FTP, FTP data, and other subsets. That allows distinct types of attacks to be verified by intelligent algorithms. Here, only HTTP service data is used. Since the values of the continuous attributes are concentrated around 0, each value is transformed into a value farther from 0 by y = log(x + 0.1). The original dataset has 3,925,651 attacks (80.1%) among 4,898,431 records. A smaller set is constructed containing only 3,377 attacks (0.35%) among 976,157 records, where the logged_in attribute is positive. From this reduced dataset, 567,497 HTTP service records are used to construct the HTTP dataset [124-129].
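A preprocessing sketch follows, under the assumption that the fields carry the names used in the public KDD Cup 1999 distribution (service, duration, src_bytes, dst_bytes); the four sample rows are fabricated purely for illustration.

```python
import numpy as np
import pandas as pd

# Fabricated sample rows; real data comes from the KDD Cup 1999 files.
df = pd.DataFrame({
    "service":   ["http", "http", "smtp", "http"],
    "duration":  [0, 2, 1, 0],
    "src_bytes": [215, 162, 1022, 236],
    "dst_bytes": [45076, 4528, 387, 1228],
})

# Keep only HTTP traffic, as done in the paper.
http = df[df["service"] == "http"].copy()

# Shift the zero-concentrated continuous attributes away from 0: y = log(x + 0.1).
for col in ["duration", "src_bytes", "dst_bytes"]:
    http[col] = np.log(http[col] + 0.1)
```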
The database was selected precisely because it exhibits the main feature of a cyber attack: a large volume of requests with attacks inserted among them. Thus, protection systems are overloaded and often miss attacks that can compromise system integrity. Therefore, a system that acts dynamically in identifying these patterns, as assertively as possible, is necessary for maintaining system integrity. The database provided by the KDD Cup has the characteristics of large-scale attacks, as the number of requests is exceptionally high. Moreover, among these requests, fewer than 2% are malicious attacks. Therefore, the database meets both the anomaly detection criterion (the database is hugely unbalanced with respect to its labels) and the large-scale criterion, since it contains millions of requests.

Definitions and Models Used in the Tests
Preliminary tests were run using 10-fold cross-validation and a holdout split (70% for training and 30% for testing) to find the best value of M in the interval [2, 5] (values defined by a specialist in these problems). Another factor evaluated in the preliminary analysis was the logical neuron used. The tests used logical neurons of the andneuron or unineuron type. After performing the initial tests, the values of M and the logical neuron type that maximize training accuracy while maintaining the shortest execution time are M = 3 and the andneuron; therefore, they are used in the final experiments of this paper. Simulations were performed on a Core(TM) 2 Duo CPU at 2.27 GHz with 3 GB of RAM, and the model was implemented and executed in Matlab.
In addition to tests with traditional approaches, the results will also be compared with other hybrid models of neural networks and fuzzy systems, among which we highlight an evolving fuzzy neural network model (EFNN) [135], one that works with incremental fuzzification (IFNN) [136], and one that works with self-organizing fuzzification (SFNN) [137]. All models use the extreme learning machine to define the weights of the output layer, have three layers, and are composed of unineurons in the second layer and Gaussian neurons in the first layer. All hyperparameters were defined using cross-validation in the interval [3, 6], mainly in the fuzzification stage.
The evaluation indices are defined as sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), and AUC = (sensitivity + specificity)/2, where TP = true positive, TN = true negative, FN = false negative, and FP = false positive.
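These indices can be computed directly from the confusion-matrix counts. The sketch below assumes the balanced form AUC = (sensitivity + specificity)/2 used in this paper; the counts in the example call are illustrative, not values from Table 1.

```python
def detection_metrics(tp, tn, fp, fn):
    """Compute the paper's evaluation indices from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                  # attacks correctly caught
    specificity = tn / (tn + fp)                  # normal traffic correctly kept
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    auc = (sensitivity + specificity) / 2         # balanced AUC estimate
    return accuracy, sensitivity, specificity, auc

# Illustrative counts only.
acc, sens, spec, auc = detection_metrics(tp=90, tn=980, fp=20, fn=10)
```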

Results of Anomaly Detection
Table 1 presents the results of detecting anomalies. It shows the training percentage, the number of neurons used in the test, the expected results, and the time (in seconds). The values in parentheses represent the standard deviation of the 30 random repetitions performed. Among the results obtained in the literature, highlighting the work of Tan et al. [129], ANFIS presents results equivalent to those obtained by the model, its only shortcoming being the execution time, which was already expected to be high due to ANFIS's great sensitivity when solving problems with a large number of samples.
Despite the dataset being highly disproportionate, the model was very efficient in detecting the anomalies in the 30 tests performed. The standard deviation was very low for all parameters (except time), and the accuracy, sensitivity, and specificity corroborate that the model is a reliable identifier of anomalies. The differential of this article's proposal is the fuzzy rules generated, which can serve as a knowledge base for training and for other forms of knowledge dissemination. Table 1 presents the best accuracy results for FNN, MLP, and SVM; it is noteworthy that the best results for an anomaly base were obtained by the model described in this paper. The FNN had the best specificity results (specificity measures the proportion of actual anomalies that are correctly identified as such). The execution time of the algorithm has values close to MLP's, but its results stand out for the best assertiveness. Regarding SVM, the results are also statistically similar, but they are not interpretable.
Among the fuzzy neural network models, the model presented in this paper obtained the best accuracy results, despite spending more time in the training phase. That indicates the FNN achieves a high degree of accuracy but still needs adjustments to reach ideal response times. Other models, such as the evolving fuzzy neural network, showed shorter training times, although their success rates were considerably below those of the other models.
The method proposed in this paper works efficiently to determine anomalies in an extensive data set, as it has satisfactory results concerning sensitivity, specificity, and AUC. When a model has high numbers in these three criteria, it means that it is an excellent model for determining class labels, especially when the imbalance is high. When a model is highly accurate but has a low AUC, it means that it has not been able to correctly identify the minority class labels. As can be seen in Table 1, the fuzzy neural network model proposed in this paper had the best AUC index and, consequently, can be said to be the model that best identified anomalies in this large volume of requests.
The next topic will present the characteristics of the rules obtained.

Expert Systems in Detecting Anomalies in Cyberattacks Through Fuzzy Rules
The linguistic characteristics adopted for the formation of the rules were defined in consultation with experts in the field.
Figure 6 shows the ANFIS structure formed with one of the results of the 30 replicates performed with the model. The three dimensions were shaped using equally spaced Gaussian membership functions.
Here one can see the influence of the fuzzy inputs on the fuzzy inference system that will generate rules based on dataset knowledge. Eight fuzzy rules were generated in total, and they are presented linguistically as a knowledge base for the formation of expert systems in Figure 8. The model of neurons that represents the first layer and fuzzifies the input space is described in Figure 9. The decision space that assists in the identification of anomalies is presented in Figure 10. In the fuzzy rules, it is possible to identify the decision space of the model by the duration of a request and the numbers of bytes received and sent. Large-scale cyber attacks work by overloading data servers to make them more susceptible to attack. Thus, in Figure 7, it is possible to define the number of elements evaluated in each dimension of the problem for the FNN's decision making. Likewise, the graphic knowledge of a fuzzy neural network can be presented linguistically and relationally (Figure 8), allowing anyone interested in protecting computer systems to understand when a cyber-attack can occur, even those who are not deeply knowledgeable about artificial intelligence. This is the most significant advantage of the model, because it allows the clear and straightforward dissemination of the knowledge implicit in a database. In the relationships obtained, it can be seen that the strongest correlation with the identification of cyber-attacks is a reduction in the duration of requests, tied mostly to a low number of bytes received.
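To illustrate how such rules can be turned into an expert system in a high-level language, the sketch below encodes one hypothetical rule of the same linguistic shape; the membership centers, widths, certainty weight, and firing threshold are all invented for illustration and are not the actual rules of Figure 8.

```python
import math

def gauss(x, c, s):
    # Gaussian membership degree of x in a linguistic term centered at c.
    return math.exp(-0.5 * ((x - c) / s) ** 2)

# Hypothetical linguistic terms on the log-transformed attribute scale.
def short_duration(x):
    return gauss(x, -2.3, 1.0)

def low_bytes_received(x):
    return gauss(x, 2.0, 1.5)

def rule_anomaly(duration, bytes_received, certainty=0.9):
    # IF duration is short AND bytes_received is low THEN anomaly,
    # using the product as the AND operator and a rule certainty weight.
    return certainty * short_duration(duration) * low_bytes_received(bytes_received)

# The rule fires strongly when both antecedents match their term centers.
score = rule_anomaly(-2.3, 2.0)
```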

Conclusions
From the presented results, we can conclude that the fuzzy neural networks used in this paper can act as reliable identifiers of anomalies. Even though it is an unbalanced problem, where more than 99% of the samples belong to one category, the model behaved efficiently in identifying the anomalies and, in most of the trials, found all of them.
Regarding Table 1, it is possible to analyze some aspects of the results obtained. All models submitted to the cyber attack classification test obtained excellent results with a balance between the training and test percentages, indicating that none of the models chosen in the test suffered from overfitting. The fuzzy neural network obtained the best training and test percentages, allowing the conclusion that it has the best ability to identify cyber-attacks in this evaluation. However, it is noted that its training and testing time was longer compared to the other models in the cyber invasion test. Because of the high data volume, multi-layered networks take more time to solve problems (as can also be seen in the MLP runtime). Models with the shortest runtimes were not as effective at detecting cyber attacks. This is because, when a problem is so unbalanced (only 0.35% of the base contains attacks), specificity (correct prediction of attacks) is one of the most important indices for defining model performance; the model that obtained the best results here was the FNN, with results close to 99%.
However, it should be noted that, even with the algorithm's high execution time, the results of the model proposed in this paper were the best across the evaluation indexes, adding the possibility of obtaining knowledge about the attacks and of improving protection devices that operate in cyber threat systems with knowledge extracted from the dataset. Fuzzy rules can be easily implemented in information systems that support logical programming, as well as in programmable electronic devices.
The model can be seen as an approach to knowledge management in Big Data, since it can extract knowledge from a database and turn it into a set of linguistic rules that are more accessible to people who are not directly linked to the computer science area. This type of approach assists in the dissemination of intelligent techniques and can contribute to advances in science and the prevention of anomalies.
For future work, the challenge is to decrease the execution time of the algorithm while maintaining its ability to find the anomalies. Other fuzzification and training techniques can be tested, as well as other intelligent models, so that comparisons can be made in different contexts of artificial intelligence. Another factor that can be taken into account for future extensions of this study is the identification of anomalies in several types of cyberattacks, with more current databases, as reported in the work of Rupa Devi and Badugu [138].

Acknowledgments: The authors thank CEFET-MG and UNA.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: