Design and Application of a Case Analysis System for Handling Power Grid Operational Accidents Based on Case-Based Reasoning

Abstract: In recent years, power grid accidents have occurred frequently and higher requirements have been placed on the safe operation of power grids. In safety management worldwide, an effective practice is to structure an accident case database according to a unified standard and, based on that database, to conduct quantitative analysis to cope with accident risks. However, this is not yet the case for power safety management. Case-based reasoning (CBR) is a process that solves new problems based on the solutions to similar past problems. It works by matching a current problem against historical cases and solutions in a database in order to obtain similar case solutions or inspiration. In the matching process, past solutions may be modified where necessary to better fit the current problem. Based on the CBR method, this paper proposes how to construct a case database of power grid operational accidents, provide data support for the management of power grid risks and provide knowledge services for accurately grasping how a grid accident is developing and for making quick decisions in response to emergencies. First, it designs an operational accident case database that considers three aspects, namely case features, power grid features and accident features, based on safety management theory. Second, regarding how to use the case database, it proposes a two-level search strategy together with the corresponding similarity calculation methods for the different feature attributes of a case. Finally, it verifies the model with a demonstration on four typical grid accidents. The case database and CBR strategy proposed in this article could help China's power grids practice intelligent analysis of grid operational accidents and improve digitalization in safety management.


Introduction
Electric power is vital to commerce and the daily functioning of the world. An electric power grid comprises all of the power plants that generate electricity and the transmission and distribution lines and systems that deliver power to end-users. Grids also connect many publicly and privately owned electric utilities and other wholesale power companies [1]. The trends in power grid development are digitalization, intelligence and the Internet, which bring new challenges and opportunities to grid security management. In recent years, as China's power grids have grown larger and more complex, it has become increasingly difficult to ensure their safe and stable operation and to avoid large-scale accidents. In the construction of smart power grids, systems with efficient and interactive dispatching and control can ensure safe and efficient grid operation, which in turn places higher requirements on power grid safety management. In the United States of America, there have been increasing reports of targeting by foreign hackers [1].
In power grid safety management, typical cases have always been the main research objects. Analyzing and studying major accidents helps uncover hidden dangers in the safety management and emergency decision-making of the state government and enterprises and yields suggestions for safety supervision and early warning. Yi Jun [2] and Li Baojie [3] have put forward technical, institutional and management suggestions and measures for improving the dispatching and operation of China's power grid and the adjustment and maintenance of security control devices. However, while their research qualitatively analyzed the problems in China's power safety management, it lacked a quantitative measurement approach. They also pointed out that digitization is the future direction of research and safety management.
From the perspective of informatization, information technology in power safety management is currently applied mainly at the level of safety supervision, for example the standardized safety supervision management and control system that Beijing Electric Power Company applied across its departments in 2016. At the level of safety management, however, informatization performs poorly [4]. In power safety management, China still shows a large gap compared with advanced countries such as the United States and with other industries in China [4,5]. From the perspective of managing power grid operational accidents, current research focuses on specific cases, so it lacks a systematic approach and universal relevance; its results apply only to specific circumstances, environments and conditions and cannot produce a systematic, overall solution. The main reason for this lack of universal relevance is the extreme shortage of accident data, which confines research to case analysis. Typical accident case databases have been built for fields such as coal mining [6], public management [7], the chemical industry [8] and construction [9][10][11]. In China's power safety management, by contrast, qualitative analysis is still the main accident analysis approach taken by operators, while quantitative approaches lag far behind. In traditional practice, after an accident occurs in China, a special team of experts is organized to review and inspect the accident, identify the problems, analyze the underlying causes, draw lessons, form a solution and write a report. This method can only analyze an accident after the fact and cannot provide monitoring and early warning before the accident happens. This model can hardly adapt to the era of big data and artificial intelligence for the digital security of power grids [12]. Therefore, it is necessary to study quantitative methods for accident analysis.
Case-based reasoning (CBR) is a research area in artificial intelligence (AI). In 1982, Professor Roger C. Schank of Yale University proposed the "Dynamic Memory" theory with "Memory Organization Packets" as its core [13]. As a knowledge management method in AI, the key idea of CBR is to use experience learned from previous problems to solve new ones [14,15]. It is widely used in emergency decision-making [16], fault diagnosis [17] and medicine [18]. For emergency management decision-making in electric power accidents, a CBR-based approach can be used to retrieve similar accident cases, mine successful historical experience and provide guidance for handling new accidents.
The above literature review shows that China's research on risk and accident management in power grid operations is still based on the qualitative analysis of individual cases. Quantitative research, in contrast, mainly collects historical accident reports in text form for statistical analysis. Because accident cases are scarce and database construction lags behind, the use of historical accident cases for prediction and optimization is still at an early stage of research. With the development of artificial intelligence and the growth of accident case data, how to standardize the structured and unstructured data of accidents and establish quantitative models for early warning and decision-making has become a research hotspot. This paper proposes the architecture of a CBR-based information system for power grid operational accidents, designs the structure of the grid operational accident case database and studies the case retrieval strategy, which can offer data support for the management of power grid risks. The proposed CBR information system can find similar accident cases in the power grid operational accident database and help decision-makers make rapid decisions about an upcoming accident. This helps realize the transition from post-event management to pre-event and in-event management of grid operation accidents.
The paper is structured as follows. Section 2, based on the CBR theoretical framework, proposes a systematic framework for the case analysis system for handling power grid operational accidents. Section 3, based on database theory and with reference to the accident database structures of other industries, designs the case database structure for power grid operational accidents. Section 4 constructs three CBR-based case analysis models for power grid operational accidents. Section 5 presents a case analysis that demonstrates and verifies the effectiveness of the proposed method. Section 6 gives the discussion and managerial implications. Section 7 summarizes the paper and outlines future research.

Framework of Case Analysis System of Power Grid Accidents Database
CBR generally consists of four parts: case search, case reuse, case revision and case update. It searches for similar past cases and their solutions, extracts the features of the new case and, when the existing cases cannot be used to tackle the new problem, revises and updates the case database. Subsequently, the concept of case representation [19] was introduced to optimize the solutions to problems that the original theory did not distinguish, so as to improve the ability to solve complex problems.
Based on the idea of CBR, building a framework for a power grid accident analysis system involves three parts: database form design, case collection and database application (as shown in Figure 1). Database form design determines the form of the database and the meaning of its fields, based on the case representation method and the features of power grid accidents. Case collection obtains power grid accident case data from literature, news, newspapers and other data sources through web crawlers. Database application analyzes the current operating state of a power grid, extracts its features, searches for similar cases, picks those with the highest similarity in the database and applies them as experience to guide decision-making.

Database Design
Case representation is the basis of CBR. A good case representation not only reflects the attributes of similar cases accurately but also facilitates subsequent reasoning and analysis [20]. Case representation methods generally include semantic network representation, process representation, frame representation and object-oriented representation [21]. The specific method needs to be selected according to the attributes of the case.
Generally speaking, an accident case should include the accident time, place, actual loss, casualties, handling measures and accident causes. However, because power grid accidents have a large number and variety of feature attributes, selecting the typical attributes is key to successfully constructing a case database.
Research by many scholars on the mechanism of power grid operational accidents shows that such accidents have the following features.
First, most power grid accidents are caused by chain failures, that is, a chain reaction in which the shutdown of one component leads to the shutdown of others in the power system [22,23]. A chain failure develops as follows: after an initial failure in the system, the grid experiences wide-ranging power flow transfer and component overloading, which leads to line overloading, tripping or protection malfunction. The state of the grid then deteriorates at an accelerating rate, the failures cluster and spread further, a self-critical state of the cluster arises and finally the operational accident occurs. Identifying the faulty components is therefore the first step in handling a power grid emergency.
Second, an operational accident is a dynamic evolution process, and each accident goes through the stages of occurring, developing, evolving and recovering. When recording a case, it is therefore also necessary to analyze the accident by dissecting it into several consecutive or parallel segments.
Therefore, in addition to the basic information attributes of power grid accidents, the evolution and development features determine the process attributes of power grid accidents. In this paper, according to the features of power grid operational accidents, frame representation is used to standardize and express the accident cases. Frame representation can combine declarative knowledge with procedural knowledge, which gives it strong adaptability. At the same time, as a general and well-structured knowledge representation method, frame representation decomposes a case into a network of nodes and relationships and represents the case features in the form of frames. The accident case database structure is shown in Table 1.
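To make the idea of a frame concrete, the following is a minimal Python sketch of one accident case as a set of named slots plus a list of process segments. It is an illustration only and does not reproduce Table 1: the class names `AccidentCase` and `AccidentSegment`, the chosen slots and every value in the example record are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class AccidentSegment:
    """One stage of the accident's evolution (occurring, developing, evolving, recovering)."""
    stage: str
    description: str


@dataclass
class AccidentCase:
    """A frame: named slots for basic attributes plus a list of process segments.

    The slots are hypothetical examples, not the fields of Table 1.
    """
    case_id: str
    time: str                      # kept as text, since report formats vary
    place: str
    weather: str                   # symbol attribute
    voltage_level_kv: float        # numerical attribute
    faulty_components: Set[str]    # set attribute
    direct_cause: str              # short text attribute
    handling_measures: str         # text attribute
    process: List[AccidentSegment] = field(default_factory=list)


# Illustrative record only; none of these values come from a real accident report.
case = AccidentCase(
    case_id="demo-1",
    time="unknown",
    place="unknown",
    weather="Snow",
    voltage_level_kv=500.0,
    faulty_components={"transmission line"},
    direct_cause="line icing",
    handling_measures="de-icing and load transfer",
)
case.process.append(AccidentSegment("occurring", "initial line trip"))
case.process.append(AccidentSegment("developing", "power flow transfer overloads neighbouring lines"))
print(case.weather, len(case.process))
```

Keeping the process segments as an ordered list lets one record hold both the declarative slots and the step-by-step evolution of the accident.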

Power Grid Accident Data Collection
Data on major safety accidents are mainly sourced from newspapers, radio and television. In recent years, the Internet has become another main source of accident data [24]. Therefore, in addition to the accident data recorded in existing books and periodicals, accident news reports can be acquired from the Internet by web crawlers as a supplement.
Web crawlers are an effective tool for obtaining information resources from the Internet. By simulating a browser's access to the Internet in a program, a web crawler can obtain the required data from the Web. Currently, there are four main types of web crawlers: general crawlers, focused crawlers, incremental crawlers and deep web crawlers [25]. For obtaining accident news, focused crawlers are mainly applied. The web pages highly related to the topic of power grid accidents can be determined in advance, so it is unnecessary to crawl all pages. During crawling, links with higher relevance to the topic are put into the crawling queue in priority order, and crawling stops when certain conditions are met. In this way, the downloading of irrelevant pages is reduced as far as possible and crawler efficiency is improved.
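The priority-queue behaviour of a focused crawler can be sketched as follows. This is a simplified illustration under stated assumptions, not the crawler used in this work: the keyword list `TOPIC_TERMS`, the scoring function `relevance`, the thresholds and the in-memory `FAKE_WEB` (which stands in for real HTTP requests) are all hypothetical.

```python
import heapq
from typing import Callable, Dict, List

TOPIC_TERMS = ("power grid", "blackout", "outage", "accident")  # hypothetical topic vocabulary


def relevance(text: str) -> float:
    """Fraction of topic terms found in the text (a crude stand-in for a topic classifier)."""
    lowered = text.lower()
    return sum(term in lowered for term in TOPIC_TERMS) / len(TOPIC_TERMS)


def focused_crawl(seeds: List[str], fetch: Callable[[str], Dict],
                  max_pages: int = 10, min_relevance: float = 0.25) -> List[str]:
    """Visit links in order of relevance; stop at max_pages or when the queue is empty."""
    # heapq is a min-heap, so priorities are negated to pop the most relevant link first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    collected = []
    while frontier and len(collected) < max_pages:
        _, url = heapq.heappop(frontier)
        page = fetch(url)
        if relevance(page["text"]) >= min_relevance:
            collected.append(url)
        for link_url, anchor_text in page["links"]:
            score = relevance(anchor_text)
            # Links whose anchor text is off-topic are never queued or downloaded.
            if link_url not in seen and score >= min_relevance:
                seen.add(link_url)
                heapq.heappush(frontier, (-score, link_url))
    return collected


# In-memory stand-in for the Web so the sketch runs without network access.
FAKE_WEB = {
    "news/home": {"text": "Daily news portal", "links": [
        ("news/grid-outage", "Snowstorm causes power grid outage"),
        ("news/sports", "Football results"),
    ]},
    "news/grid-outage": {"text": "Power grid accident: blackout after line icing", "links": []},
    "news/sports": {"text": "Football results", "links": []},
}

pages = focused_crawl(["news/home"], FAKE_WEB.__getitem__)
print(pages)
```

In this toy run the sports link is never fetched, which is the efficiency gain that a focused crawler aims for.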

CBR-Based Case Analysis Model for the Power Grid Operational Accident Database
When using the case database, it is necessary to search for similar past cases and their solutions based on the features of the current problem and to apply the case with the highest similarity to solve the current problem. This requires solving two key problems: the case search strategy and the case similarity calculation.

Search Strategy
The power grid accident case database has many attributes. As the number of cases increases, searching for similar cases naturally becomes more difficult. Search strategy is one of the important issues in the application of CBR [26,27]. This paper therefore proposes a two-level search strategy: the first-level search filters the cases, and the second-level search completes the case matching.
The first-level search uses significant case attributes that classify cases efficiently. Some attributes are therefore selected as first-level search attributes, such as faulty component, voltage level, weather... A case in the database passes the first-level screening only if its values for these attributes are exactly identical to those of the target case. If these attributes differ, the similarity between the cases has almost no reference value for subsequent accident handling.
The second-level search applies fuzzy matching [28]. After the first-level search, the cases with higher attribute similarity remain. At this point, the remaining attributes, such as weather, accident process description and accident level, can be used as the second-level search input. A similarity calculation is performed based on the known features of the new case, and the cases whose similarity reaches the screening threshold are selected.
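The two levels described above can be sketched as a filter followed by a scoring pass. The sketch below rests on several assumptions: `two_level_search` and `token_overlap` are hypothetical names, the example cases are invented, and the second-level score is taken as the plain average of the attribute similarities, which is an assumed aggregation rather than the formula of this paper.

```python
from typing import Callable, Dict, List, Tuple

Case = Dict[str, str]
Similarity = Callable[[str, str], float]


def token_overlap(a: str, b: str) -> float:
    """Share of whitespace tokens in common; a placeholder for the text similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def two_level_search(target: Case, cases: List[Case], first_level: List[str],
                     second_level: Dict[str, Similarity],
                     threshold: float) -> List[Tuple[str, float]]:
    # Level 1: exact matching on every first-level attribute.
    candidates = [c for c in cases if all(c[a] == target[a] for a in first_level)]
    # Level 2: fuzzy matching; the mean of the attribute similarities is an ASSUMED aggregation.
    results = []
    for c in candidates:
        sims = [sim(c[attr], target[attr]) for attr, sim in second_level.items()]
        score = sum(sims) / len(sims)
        if score >= threshold:
            results.append((c["id"], score))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Invented cases, for illustration only.
cases = [
    {"id": "c1", "weather": "Snow", "cause": "line icing caused tripping"},
    {"id": "c2", "weather": "Snow", "cause": "tower collapse"},
    {"id": "c3", "weather": "Thunderstorm", "cause": "line icing caused tripping"},
]
target = {"weather": "Snow", "cause": "line icing caused line tripping"}

matches = two_level_search(target, cases, ["weather"], {"cause": token_overlap}, threshold=0.5)
print(matches)
```

Case `c3` is dropped by the exact-match filter even though its cause text matches well, which reflects the point that second-level similarity carries little value when the first-level attributes differ.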

Similarity Calculation
When calculating similarity, the attribute similarity between cases is calculated first, based on the known features of the new case, and the overall similarity between the cases is then obtained from the attribute similarities.
Different attribute types call for different attribute similarity calculations. At present, the power grid accident database involves four attribute types: numerical value, symbol, set and text.
When the feature attribute is a numerical value, the similarity is calculated from the values $A_{ij}$ and $B_j$ of attribute $j$ in case $A_i$ and case $B$.

When the feature attribute is a symbol, the similarity is an exact-match indicator:

$$Sim(A_{ij}, B_j) = \begin{cases} 1, & A_{ij} = B_j \\ 0, & A_{ij} \neq B_j \end{cases}$$

When the feature attribute is a set:

$$Sim(A_{ij}, B_j) = \frac{|A_{ij} \cap B_j|}{|A_{ij} \cup B_j|}$$

where $|A_{ij} \cap B_j|$ is the number of elements shared by the values of attribute $j$ in case $A_i$ and case $B$, and $|A_{ij} \cup B_j|$ is the number of all elements in the values of attribute $j$ in case $A_i$ and case $B$.

These three attribute types are all structured, and their similarity is simple to calculate. In the power grid accident case database, however, cases are stored in a semi-structured way: in addition to the three attribute types above, some feature attributes are stored as text, whose similarity is considerably harder to calculate.
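The three structured attribute similarities can be sketched in a few lines of Python. The symbol and set functions follow the exact-match and intersection-over-union definitions; the numerical function uses a range-normalised distance, which is a commonly used form assumed here for illustration and not necessarily the formula intended by this paper. The function names, the value range and the sample inputs are hypothetical.

```python
from typing import Set


def sim_numeric(a: float, b: float, lo: float, hi: float) -> float:
    """ASSUMED form: 1 minus the absolute difference normalised by the attribute's value range."""
    if hi == lo:
        return 1.0
    return 1.0 - abs(a - b) / (hi - lo)


def sim_symbol(a: str, b: str) -> float:
    """Exact match: 1 if the two symbols are identical, otherwise 0."""
    return 1.0 if a == b else 0.0


def sim_set(a: Set[str], b: Set[str]) -> float:
    """Number of shared elements divided by the number of all elements."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical inputs: voltage levels in kV within an assumed 110-1000 kV range.
print(sim_numeric(220.0, 500.0, lo=110.0, hi=1000.0))
print(sim_symbol("Snow", "Thunderstorm"))
print(sim_set({"line", "transformer"}, {"line", "breaker"}))
```

All three functions return values in the range 0 to 1 when the inputs lie inside the stated range, so they can be combined later without rescaling.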
At present, methods for calculating textual similarity include string-based methods, corpus-based methods, knowledge-base-based methods and hybrid methods [29]. Corpus-based and knowledge-base-based methods cannot be used when there is insufficient accident data, because the textual similarity cannot then be calculated. Therefore, the cosine similarity method, one of the string-based methods, is mainly applied to textual similarity calculation. From the perspective of string matching, the degree of string co-occurrence and repetition is used as the criterion for calculating similarity.
If the target is a fairly short text, such as the social impact or the direct cause of the accident, the procedure is as follows: first, divide the complete sentence into a set of independent words using a Chinese word segmentation algorithm; next, take the union of the two word sets; then, count the word frequencies of each text over this union and vectorize them; finally, apply the vector calculation model to obtain the text similarity.
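The vector calculation formula is the cosine similarity of the two word-frequency vectors:

$$Sim(A_{ij}, B_j) = \frac{\sum_{t=1}^{n} x_t y_t}{\sqrt{\sum_{t=1}^{n} x_t^2}\,\sqrt{\sum_{t=1}^{n} y_t^2}} \quad (4)$$

where $x_t$ and $y_t$ are the frequencies of the $t$-th word of the union word set in the two texts and $n$ is the number of words in the union.

If the target is a long text, such as the accident process description, the procedure is as follows: first, find the keywords of the respective paragraphs and combine them into a word set; next, take the union of the two word sets; then, count the word frequencies of each word set and vectorize them; finally, apply the vector calculation model to obtain the text similarity.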
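The short-text procedure (segment, take the union of the word sets, count frequencies, compute the cosine) can be sketched as follows. Word segmentation is assumed to have been done already: the sketch takes lists of words as input and uses whitespace-split English phrases as a stand-in for segmented Chinese text. The function name and the sample phrases are hypothetical.

```python
import math
from collections import Counter
from typing import List


def cosine_similarity(words_a: List[str], words_b: List[str]) -> float:
    """Cosine of the angle between the word-frequency vectors of two segmented texts."""
    freq_a, freq_b = Counter(words_a), Counter(words_b)
    vocabulary = sorted(set(freq_a) | set(freq_b))   # union of the two word sets
    vec_a = [freq_a[w] for w in vocabulary]          # Counter returns 0 for absent words
    vec_b = [freq_b[w] for w in vocabulary]
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0                                   # an empty text has no direction
    return dot / (norm_a * norm_b)


# Whitespace splitting stands in for a Chinese word segmentation step.
a = "line icing caused line tripping".split()
b = "icing caused tower collapse".split()
print(round(cosine_similarity(a, b), 3))
```

For long texts, the same function can be applied to the extracted keyword lists instead of the full word lists.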
Once the attribute similarities are obtained, the first-level similarity, the second-level similarity and the overall similarity between cases can be calculated, where $Sim_1$ is the similarity over the first-level search attributes; $k$ is the number of first-level search attributes; $Sim_2$ is the similarity over the second-level search attributes; and $m$ is the number of second-level search attributes.

Research Context and Case Data
Sections 3 and 4 introduced the database design and its application methods, respectively. In this section, four typical power grid accidents are selected to verify and demonstrate the design and application of the case database experimentally; the specific content of the cases is attached [30,31]. Among them, cases $A_1$, $A_2$ and $A_3$ already exist in the database and case $B$ is a new case. Using the case-based reasoning method [15], knowledge and experience for handling the emergency in case $B$ may be obtained from the first three cases.
According to the search approach above, the field for the first-level search should be determined first. From the description of the fourth case, it can initially be judged that bad weather may be the factor that caused the power grid shutdown; the weather attribute is therefore used as the field for the first-level search, as shown in Table 2.

Table 2. Case attributes of first-level search.

| Attribute | $A_1$ | $A_2$ | $A_3$ | $B$ |
|---|---|---|---|---|
| Weather | Snow | Snow | Thunderstorm | Snow |

Data Analysis: Applying CBR
The weather field is a symbol attribute, and according to the symbol similarity formula, $Sim_1(A_1, B) = 1$, $Sim_1(A_2, B) = 1$ and $Sim_1(A_3, B) = 0$.
Based on the principle of exact matching, cases $A_1$ and $A_2$ are selected from the three cases for the second-level search.
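The first-level step of this example can be reproduced in a few lines of Python, using the weather values listed in Table 2. The variable names are hypothetical; the exact-match rule is the symbol similarity applied above.

```python
# Weather values of the stored cases and of the new case B, as listed in Table 2.
weather = {"A1": "Snow", "A2": "Snow", "A3": "Thunderstorm"}
weather_b = "Snow"

# Symbol attribute: similarity is 1 on an exact match and 0 otherwise.
sim1 = {case_id: (1 if value == weather_b else 0) for case_id, value in weather.items()}

# Only exact matches proceed to the second-level search.
selected = [case_id for case_id, score in sim1.items() if score == 1]

print(sim1)      # {'A1': 1, 'A2': 1, 'A3': 0}
print(selected)  # ['A1', 'A2']
```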
In actual accident response, it is necessary to make an initial judgment on whether the accident is due to failure of power grid equipment, and this judgment is often expressed in text. Therefore, in the second-level search, the equipment failure condition can be used as the case attribute for fuzzy matching.
Equipment failure is a short text, so its similarity can be calculated using cosine similarity, as shown in Table 3. After word segmentation, the word sets are vectorized, and $Sim_2(A_1, B) = 0.69$ and $Sim_2(A_2, B) = 0.63$ are obtained according to calculation model (4).

Table 3. Case attributes of second-level search.

The reasoning result is then checked against the accident case descriptions; that is, case $A_1$ and case $B$ are verified to be the most similar cases, so the experience of handling accident $A_1$ can be referred to. All four cases are power grid accidents in bad weather. Cases $A_1$, $A_2$ and $B$ all occurred in rainy and snowy weather, as shown in Table 4, while case $A_3$ occurred in thunderstorm weather. Case $A_3$ should therefore be excluded first. The accident causes of the remaining cases can be summarized from the accident descriptions. According to the cosine similarity of the indirect cause attribute, $Sim(A_1, B) = 0.67$ and $Sim(A_2, B) = 0.5$, so $A_1$ is the case whose accident cause is more similar to that of case $B$. This result is consistent with the case-based reasoning result, which verifies the feasibility of the case-based reasoning method.

Discussion
With the development of the Internet of Things, big data and AI technologies, China's power industry has proposed constructing a ubiquitous Internet of Things. From the perspective of power safety management, how to digitize grid operation accident information and how to manage accidents scientifically are important issues worth studying.
Unlike previous studies, which mainly analyzed accident information qualitatively, this research analyzes the features of grid operational accidents, quantifies and standardizes them into a standardized database structure and, considering the complexity of grid operation accident management, proposes a two-level case retrieval strategy.
First, current grid operation accident management is mainly based on post-event analysis and diagnosis, which provides a reference for future grid operation safety management. In this paper, we put forward the idea of using CBR case data for decision support and built a framework of a CBR decision support system for grid operation accident cases.
Second, current grid operation accident data is mainly in text form, such as accident process descriptions, analyses of accident causes, countermeasures for similar accidents and future preventive measures. Such qualitative accident information is convenient for people to read and handle, but it is difficult for machines to use for scientific prediction and prevention of accidents. This paper puts forward the idea of constructing a grid operation accident case database and using CBR to retrieve grid operation accidents quickly to assist decision-making, and it designs a structure for the accident case database according to the risk features and management requirements of grid operation.
Moreover, finding similar cases in the case database is the core of a CBR decision support system. Because power grid operation accidents occur suddenly, the information about them is produced mainly in textual form and is multi-source, multi-layered and complex, which makes grid operation accidents difficult to analyze. Building on the similarity calculation of classic CBR case retrieval, we propose a two-level retrieval strategy and give similarity calculation methods for both levels. This makes it possible to search text-based power grid operation accident cases hierarchically, find similar accident cases, identify the factors behind the accident and provide rapid decision support for an upcoming accident.

Managerial Implications
The major findings of this study also have significant managerial implications for grid security management. Current power grid operation safety management is mainly after-the-fact management: after a grid operation accident occurs, relevant information is collected, the accident is analyzed, its cause is identified and the lessons are summarized and applied to future power grid operation safety management. However, according to Heinrich's Law, any system accident has its own internal laws [32]. Enhancing safety management may reduce the likelihood of accidents or delay their occurrence, but it cannot completely prevent them.
With the development of information technology and safety management theory, there is an urgent need for a system for the safe operation of power grids that shifts from the traditional post-accident management mode to an early warning and management control mode. One development of China's power security management in the 2020s is to build an intelligent power security system. This paper studies a CBR-based decision support system for grid operational accidents, designs a database of grid operation accident cases and gives a two-level case retrieval strategy. In management terms, (1) this helps realize the transition from post-event management to pre-event and in-event management of grid operation accidents; (2) it is an extensible framework that can apply new-generation information technologies such as artificial intelligence to grid operation accident management and so realize scientific grid management; and (3) it is a way to achieve an intelligent security system in the field of power grid security management.

Conclusion and Future Work
Based on a qualitative analysis of China's power grid operational accidents, this paper designs a CBR framework for power grid accidents and the database structure for power grid operational accident cases, and establishes a case reasoning analysis model for handling grid operational accidents. The main conclusions are as follows: (1) A framework for the analysis of power grid operational accidents is proposed, which includes three parts: case data collection, case representation and case application. (2) Considering the attributes of a power grid, of its operational accidents and of the cases, the structure of the operational accident case database is designed. This database supports semi-structured data storage: the structured attributes of operational accidents are retained in storage, and unstructured attributes such as the accident evolution process and the accident cause are recorded in text form. (3) To improve data retrieval efficiency when applying the case database, this paper proposes a two-level retrieval strategy for CBR analysis. The first level uses exact matching and the second level uses fuzzy matching. The second-level matching is used to calculate the comprehensive similarity of the cases, and several case similarity calculation methods based on attribute similarity are given for the different attribute types.
Regarding future work: (1) In terms of data collection, this paper proposes the idea of a themed crawler. In future work, such a crawler can be designed based on the features of grid operational accidents. (2) In terms of case retrieval, given that case information is currently scarce while individual cases contain a great deal of text, this paper proposes a comprehensive case similarity calculation method that considers text attributes. As power safety informatization advances and the case database system is built up, once the number of accident cases reaches a certain level, a corpus of grid operational accidents can be formed, and a distribution-based representation method and an ontology-based method can be used to improve the accuracy of the similarity calculation. (3) This paper mainly proposes the structure and analysis model of a CBR-based grid operational accident case database. In future implementation, the system still needs to be engineered and applied to grid operational accident management and control.