Towards a Science Map on Sustainability in Higher Education

Abstract: This article analyses how the concept of sustainability is being incorporated into global research on higher education. The study applies different scientometric reviews of global research between 1991 and 2018, using text mining techniques to generate first- and second-generation bibliometric indicators; the latter are displayed in science maps. A total of 6724 articles and conference proceedings were collected from the Web of Science and Scopus databases. From the results obtained, it was possible to build a canvas of the main institutions that have significantly contributed to the topic of sustainability in higher education, and it was found that 40.58% of the records originated in institutions from the United States, China, the United Kingdom, and Australia. This study also provides insight into emerging trends, themes, and patterns of research in the area of sustainability worldwide. The presence of terms such as regional planning and environmental protection among the top keywords suggests a greater interest in issues of sustainable planning and social awareness, and indicates that higher education is becoming the cornerstone of environmental awareness, innovation, and guidance to achieve sustainability goals in higher education institutions, as well as in society and government.


Introduction
Sustainability is an issue relevant to all (organizations and individuals) due to their interaction with the environment. Environmental awareness is fundamental when making decisions about sustainability [1]. Thus, according to the United Nations, education is the basis for improving the quality of life and sustainable development of society [2]. Initiatives such as the Kyoto Declaration, the Global Higher Education for Sustainability Partnership (GHESP), the Turin Declaration, among others [3], have established a framework for integrating sustainability into higher education. Aleixo outlined some of the major challenges for the implementation of sustainable development and the role of higher education [4].
Sustainability in higher education embodies a multitude of approaches. Ramos compares 33 publications that describe efforts in campus operations, sustainability reporting, organizational change management, and curricula development, among others [5]. Accordingly, several authors have evaluated sustainability in higher education in terms of dimensions; for example, Lozano [3,6] defines the following five dimensions: (a) education (referring to courses and curricula), (b) research,

• To identify the stages of evolution of the field.
• To identify the transformation in the main research topics over time.
• To determine the main authors, affiliations, and countries that have contributed the most to the scientific production of the field.
• To visualize the progress of collaborations between institutions.
• To show the development of research topics.
• To detect the main groups of collaboration.

Methodology
Tech mining methodology employs the text in the records to combine documents from different scientific and technological databases and make bibliometric analyses [18,19]. It analyzes bibliometric elements such as author's name, author's keywords, affiliation, among others, as well as the interaction between them [16].
This study was conducted according to the text mining methodology proposed by Porter and Cunningham [15]. It was enhanced with the social network analysis technique to generate a science map, as shown in Figure 1. The following sections describe the actions taken at the different stages of the text mining process.

Figure 1. Methodology of the current study (text mining and social network analysis). Based on [15].
On the one hand, the analysis of a single bibliometric element gives rise to first-generation indicators. On the other hand, the analysis of the relationship between two elements produces second-generation indicators, for example, co-occurrence maps or co-citation maps [20]. A bibliometric map allows second-generation indicators to be seen in a simple way. It creates a network in which the relationships between the elements involved can be studied through Social Network Analysis (SNA) [21-23]. Within the network, the nodes represent bibliometric elements and the edges represent the interactions between the two elements they connect [24]. Depending on the element that forms the network, it is possible to perform different types of analysis, such as geospatial, temporal, topic, workgroup or modeling [25]. When this method is applied to a scientific field, the result is called a science map, and its use facilitates the visual representation of the evolution of the field and of the main actors involved [16,17].

Data Identification, Selection and Collection
For the collection of scientific and technological information, an advanced search strategy was required, according to what is considered good practice [19,26]. The databases that enable this type of search, and that were used as information sources in this study, were: Web of Science (WoS), Scopus, EBSCO, Science Direct, Emerald Insight, Education Database (ProQuest) and ERIC.
The data sample was obtained from the search executed in all the databases mentioned above (see Figure A1, Appendix A), according to the combination of the following elements:
• Keywords: sustainability, sustainable, "higher education" and "academic career".
• File type: article and conference proceedings.
• Specific fields: title, abstract and author's keywords.
• Boolean operators: OR and AND.
• Publication year: between 1991 and 2018.
After comparing the results obtained in each of the different databases, it was concluded that the WoS and Scopus databases provided the largest amount of representative data in the field. In addition, these two databases have been frequently used in the creation of science maps, together with Google Scholar [27,28]. Using Query (1) in the core collection of WoS, a total of 3,024 records were retrieved, whereas 3,700 records were obtained from Scopus using Query (2) in the advanced search option.
TS = ((sustainability OR sustainable) AND ("higher education" OR "academic career")) (1)
TITLE-ABS-KEY ((sustainability OR sustainable) AND ("higher education" OR "academic career")) (2)
The data were downloaded in "delimited Tab UTF-8" format from the WoS database, while "RIS" format was the option given in Scopus. With these actions, the data collection stage was completed (see Figure 1).
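As an illustration of how the search elements combine into Queries (1) and (2), the following minimal Python sketch assembles the Boolean expression from the two keyword groups. The function name boolean_query is hypothetical and is not part of any database interface; only the field tags TS and TITLE-ABS-KEY come from the queries above.

```python
def boolean_query(topic_terms, context_phrases):
    """Join each group with OR, then combine the two groups with AND."""
    topics = " OR ".join(topic_terms)
    # Multi-word phrases are quoted so they are searched as exact phrases.
    contexts = " OR ".join(f'"{phrase}"' for phrase in context_phrases)
    return f"(({topics}) AND ({contexts}))"


core = boolean_query(
    ["sustainability", "sustainable"],
    ["higher education", "academic career"],
)

# Field tags as they appear in Queries (1) and (2).
wos_query = f"TS = {core}"
scopus_query = f"TITLE-ABS-KEY {core}"

print(wos_query)
print(scopus_query)
```

Keeping the Boolean core in one place makes it easier to check that every database was searched with the same logical expression, even though each one uses its own field syntax.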

Treatment of Records and Fields
This section describes the stages of data cleansing, basic analysis, advanced analysis, and visualization, which were assisted by the specialized text mining software Vantage Point (VP) version 11 [15,29]. Firstly, the data obtained from WoS and Scopus were merged to generate a sample of 6724 records. Subsequently, duplicated records with the same title and abstract were eliminated. The fields title, country, affiliation, author, publication year and keywords of the remaining records were first processed with the cleaning command "list cleanup" incorporated within VP, and then a specific "matching rule set" was used in each field:
• "organization name (depth ignore)" for affiliation and title
• "person names" for author names
• "general" for country, author's keywords and publication year
In each case, it was necessary to supervise the links suggested by the VP program, particularly when cleaning titles, affiliations and author's keywords, where considerable manual effort was necessary to achieve the desired results. As a final step, duplicated titles were eliminated once more. At the end of the cleaning stage, a total of 5,074 records were obtained, which is equivalent to a reduction of 24.53% of the initial sample.
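The merge and duplicate-removal step can be sketched in a few lines of Python. This is only an approximation under stated assumptions: the study used Vantage Point with supervised matching rules, whereas the sketch below uses a simple normalized (title, abstract) key, and the record fields and sample records are hypothetical.

```python
import re


def normalize(text):
    """Lowercase and collapse punctuation and whitespace so trivial variants match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def deduplicate(records):
    """Keep the first record seen for each normalized (title, abstract) pair."""
    seen = set()
    unique = []
    for record in records:
        key = (normalize(record["title"]), normalize(record.get("abstract", "")))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def reduction_percent(initial, final):
    """Share of the initial sample removed during cleaning, in percent."""
    return 100.0 * (initial - final) / initial


# Hypothetical records standing in for the WoS and Scopus exports.
wos = [{"title": "Campus Sustainability: A Review", "abstract": "We review ...", "source": "WoS"}]
scopus = [
    {"title": "Campus sustainability - a review", "abstract": "We review...", "source": "Scopus"},
    {"title": "Green curricula", "abstract": "A case study.", "source": "Scopus"},
]

cleaned = deduplicate(wos + scopus)
print(len(cleaned))
print(reduction_percent(6724, 5074))
```

With the sample sizes reported above (6724 merged records, 5,074 after cleaning), reduction_percent evaluates to roughly 24.5%. An exact-key match like this one would miss the near-duplicates that required manual supervision in VP.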
For the analysis of first-generation bibliometric indicators, the distribution of scientific production was generated and the appearance of the author's keywords for the first time each year revealed the evolution of the area. Furthermore, main authors, affiliations and countries were identified according to the contributions made from 1991 to 2018.
In the advanced analysis stage, which corresponds to the second-generation indicators, a co-citation map of authors and co-occurrence maps of keywords and affiliations were created. The latter were created from dynamic co-occurrence matrices, while the co-citation map was built from a static autocorrelation matrix. The different matrices were exported to Gephi, where the visualization step was carried out.
The maps were built with a representative subset of the bibliographic elements, created by choosing the main 500 author's keywords, the main 762 author's affiliations or the main 204 authors. The analysis focused on the growth stage of the field, i.e., between 2000 and 2018. In the case of the co-occurrence maps, it was possible to segment the time into two periods, 2000-2009 and 2010-2018. To achieve this, it was necessary to employ the "time interval" filter in Gephi and to define the starting and finishing years of each period. In the different maps, the "average weighted degree" was used to assign the node size, which is proportional to the participation of the node within the network. The edges show the interactions of two elements in a common document. Therefore, the thicker the edge, the stronger the interaction between the two elements of the graph.
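To make the relationship between co-occurrence, edge weight and node size concrete, the following minimal Python sketch builds weighted co-occurrence edges from per-record keyword lists and derives a weighted degree for each keyword. It is not the procedure implemented in VP or Gephi; the function names and the three sample records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(keyword_lists):
    """Count, for each keyword pair, the number of records in which both appear."""
    edges = Counter()
    for keywords in keyword_lists:
        # Sorting gives each undirected pair a single canonical orientation.
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum the weights of all edges incident to each node."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Hypothetical author's keywords of three records.
records = [
    ["sustainability", "higher education", "curricula"],
    ["sustainability", "higher education"],
    ["sustainability", "climate change"],
]

edges = cooccurrence_edges(records)
degree = weighted_degree(edges)

for node, value in degree.most_common():
    print(node, value)
```

In this toy example the pair (higher education, sustainability) has weight 2 because it occurs in two records, so it would be drawn with the thickest edge, and sustainability would be the largest node.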
With the purpose of clearly presenting the evolution of the different bibliometric maps, the elements with similar characteristics were grouped by using the community detection functions of social network analysis. In the particular case of the co-occurrence map of author's keywords, the areas proposed by WoS and Scopus were not used because they are based on different categories. The principal components decomposition (PCD) was tested, which generated 26 factors. However, one single group contained 96.8% of the keywords, which was not useful for the purposes of the analysis. Some of the community detection methods available in Gephi include the Leiden algorithm, the Girvan-Newman clustering, and modularity. The application of the Leiden algorithm and the Girvan-Newman clustering to the graph of the author's keywords generated a dominant community that agglutinated at least 97% of the elements. The modularity function, for its part, segmented all the elements into six communities with similar distribution percentages when using the default settings. After considering the results obtained by the three grouping methods, it was decided to implement modularity in all cases. Modularity groups elements within communities according to a quality index for a partition of a network [30-32]. Although it is a widely used method to detect communities, it may fail to identify small modules, depending on the total size of the network and the degree of interconnection of the modules [33].
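The quality index mentioned above can be illustrated with a short Python sketch that evaluates the modularity Q of a given partition, using the common definition Q = Σ_c [L_c/m − (d_c/2m)²] for an undirected, unweighted graph, where m is the number of edges, L_c the number of edges inside community c and d_c the sum of the degrees of its nodes. This is a sketch under those assumptions: it only scores a partition and does not search for the best one, as the Gephi function does, and the toy graph is hypothetical.

```python
def modularity(edges, communities):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    community_of = {
        node: index for index, group in enumerate(communities) for node in group
    }
    internal = [0] * len(communities)    # edges with both ends in the community
    degree_sum = [0] * len(communities)  # sum of node degrees per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in range(len(communities))
    )


# Hypothetical graph: two triangles joined by a single bridge edge (c, d).
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]

split = modularity(edges, [{"a", "b", "c"}, {"d", "e", "f"}])
single = modularity(edges, [{"a", "b", "c", "d", "e", "f"}])
print(split, single)
```

Splitting the toy graph into its two triangles gives Q = 5/14 (about 0.357), whereas placing every node in one community gives Q = 0. A partition dominated by a single community, like those reported for the other two methods, therefore carries little information about the structure of the network.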
In the case of the science map of the author's keywords, its number of triangles was used as a numerical metric to account for the evolution in each time segment, so that a node with a greater number of triangles is related to a greater influence within the community [34]. To activate this index in Gephi, it was necessary to execute the function "average clustering coefficient" in each period of time.
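The number of triangles of a node can be computed directly from the edge list, as in the following minimal Python sketch. It is an illustration rather than the Gephi implementation, and the keyword graph is hypothetical.

```python
from itertools import combinations


def triangles_per_node(edges):
    """For each node, count the pairs of its neighbors that are themselves connected."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops cannot form triangles
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return {
        node: sum(
            1
            for a, b in combinations(sorted(adjacent), 2)
            if b in neighbors[a]
        )
        for node, adjacent in neighbors.items()
    }


# Hypothetical co-occurrence edges between author's keywords.
edges = [
    ("sustainability", "education"),
    ("education", "curricula"),
    ("sustainability", "curricula"),
    ("sustainability", "recycling"),
]

triangles = triangles_per_node(edges)
print(triangles)
```

Here sustainability, education and curricula each belong to one triangle, while recycling, which is attached to the network by a single edge, belongs to none.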
As SNA results depend on input data [25], special emphasis was placed on the stages of identification, selection, collection, and data cleansing within the text mining process.

Results
It was found that the field's scientific production amounted to 5,074 records from 1991 to 2018 (see Figure 2). Its distribution shows an exponentially increasing trend, consistent with the exponential growth law proposed by Price [35], although 2018 showed a slight decrease of 4.8% relative to the previous year. The fitted function is shown in Equation (3), and its coefficient of determination (R²) was 0.9629.
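An exponential trend of this kind can be fitted by ordinary least squares on the logarithm of the yearly counts. The Python sketch below shows one way to do so. It is not the fitting procedure of the study: the yearly counts are hypothetical, and the coefficient of determination is computed in log space, which is an assumption because the text does not state how R² was obtained.

```python
import math


def fit_exponential(years, counts):
    """Fit count = a * exp(b * (year - years[0])) by least squares on ln(count)."""
    xs = [year - years[0] for year in years]
    ys = [math.log(count) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    ln_a = mean_y - b * mean_x
    # Coefficient of determination of the linear fit in log space.
    ss_res = sum((y - (ln_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return math.exp(ln_a), b, r_squared


# Hypothetical yearly record counts that double every year.
years = [2000, 2001, 2002, 2003, 2004]
counts = [10, 20, 40, 80, 160]

a, b, r_squared = fit_exponential(years, counts)
print(a, b, r_squared)
```

For the doubling series used here, the fitted growth rate b equals ln 2 and R² equals 1 up to floating-point error. Real publication counts would scatter around the curve and give a value below 1, such as the 0.9629 reported above.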

Another way to identify the evolution of the field is through the appearance of new terms per year. In this case, author's keywords were analyzed, since they are the ones that best define the content of an article or conference proceeding. As shown in Figure 2, the appearance of new author's keywords showed rapid growth from 1999, which reaffirms that this is a scientific field in continuous evolution.
Analyzing the terms related to the topic that appeared under the author's keywords for the first time, such as Sustainability and University sector, it was found that they emerged separately in 1999, whereas the keyword Sustainability in higher education appeared for the first time in 2006. Table 1 presents the top five author's keywords with the highest number of appearances over time. When analyzing the appearance of new keywords by period, it can be noticed that in the period 1991-1999 an average of 14 keywords appeared for the first time per year. In contrast, during the period 2000-2009, an average of 222 new keywords appeared per year, which is about 15.9 times (1585% of) the average of the first period. The 2010-2018 period recorded an average of 1029 new appearances per year, which is about 4.6 times (464% of) the average of the previous period.
Regarding the countries that have significantly contributed to the scientific production of the field under study, the U.S.A., U.K., China, and Australia stand out among the 131 countries analyzed, with 40.58% of the records and more than 400 documents each (see Table 2).
[...] (Figure 3). Moreover, the period exhibits 760 nodes and 1741 edges (Figure 4).
The affiliation of the authors revealed that 2,996 institutions are involved in the topic. However, it should be noted that the study considered institutions with branches in different countries as different affiliations, for example, the Royal Melbourne Institute of Technology with branches in Australia, Malaysia, and Sweden.
In order to create the technological landscape of sustainability in higher education, it was decided, after the analyses conducted, that the bibliometric maps would be created with the most representative elements of the fields affiliation, keywords and authors during the period 2000-2018, which, according to the results, corresponds to the growth stage of the field. The first technological canvas shows the participation of institutions with more than three records. Therefore, the representation of the

After comparing the maps shown in Figures 3 and 4, it can be observed that between 2000 and 2009 there was marked cooperation between institutions in Australia, Brazil, the United States of America, Japan, and the United Kingdom. The greatest number of interactions in this first period, however, took place between Wageningen University in the Netherlands, Florida Gulf Coast University in the U.S.A. and Earth Charter Initiative (civil society) in Costa Rica. It should be noted that interactions occur within the communities themselves (edges and nodes of the same color). In this period, the following institutions stand out: The University of Technology (Australia), Leuphana University of Luneburg (Germany) and Griffith University (Australia), with more than 30 participations in different works.
In contrast, during the second period (2010-2018), 86.84% of the institutions had at least one interaction with another institution, and the interactions expanded to neighboring communities. On the second canvas, the Metropolitan University (UK), University of Technology (Australia), Griffith University (Australia), Leuphana University of Luneburg (Germany) and Arizona State University (USA) were notably the institutions with the highest activity, each with more than 30 joint participations with different affiliations. The last two belong to the top 10 institutions with the highest contributions to the field (see Figure 5). These elite institutions showed greater participation in scientific production after 2010, with an upward trend, although many of them decreased their activity in 2017.
Similarly, a bibliometric map of the author's keywords was used to study the different aspects of sustainability in higher education. A total of 11,611 author's keywords distributed over 5074 records were counted, 73.79% of which were used in only a single record, while 364 terms were used in 10 or more records. For a clean visualization, it was decided to create the science map with the top 500 author's keywords and to observe their evolution during the periods 2000-2009 and 2010-2018. As a result, the co-occurrence map generated 500 nodes, 15,761 edges, and six communities. The first period contained 398 nodes (79.6% of the chosen words) and 4624 edges (interactions), see Figure 6. In contrast, the period 2010-2018 presented 499 keywords (99.8%) and 13,534 edges (Figure 7), which represents nearly three times as many interactions as in the previous period.
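The keyword statistics reported above (number of distinct keywords, share used in a single record, and number of terms used in at least 10 records) can be derived from per-record keyword lists with a simple frequency count. The following Python sketch is illustrative only; the function name, the threshold parameter and the sample records are hypothetical.

```python
from collections import Counter


def keyword_usage(keyword_lists, threshold=10):
    """Return (distinct keywords, % used in one record, keywords in >= threshold records)."""
    # set() ensures a keyword repeated within one record is counted once.
    counts = Counter(k for keywords in keyword_lists for k in set(keywords))
    total = len(counts)
    singles = sum(1 for c in counts.values() if c == 1)
    frequent = sum(1 for c in counts.values() if c >= threshold)
    return total, 100.0 * singles / total, frequent


# Hypothetical author's keywords of three records.
records = [
    ["sustainability", "curricula"],
    ["sustainability", "recycling"],
    ["sustainability"],
]

total, single_share, frequent = keyword_usage(records, threshold=3)
print(total, round(single_share, 2), frequent)
```

In the toy data, two of the three distinct keywords occur in a single record, and only sustainability reaches the threshold of three records. The same long-tailed pattern, on a much larger scale, motivates restricting the science map to the top 500 keywords.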

What is more, the main 30 author's keywords by period were identified according to the weighted degree, as presented in Figure 8. This includes the principal components decomposition (PCD) and the number of triangles of each keyword.

The period 2000-2009 shows that research focused mainly on education and sustainable development. In turn, these nodes are related to author's keywords such as energy efficiency, climate change, campus sustainability, environmental management, agriculture, recycling or waste management, to name a few, which shows a concern for environmental issues. In the ranking of author's keywords corresponding to the second period, it can be observed that some terms remain in the same positions of the table, others go down (down arrows) or up (up arrows), and some new appearances occurred (12 new keywords marked with stars).
On the other hand, keywords such as Institutions of higher education, university, innovation, and Sustainability stand out in the ranking, since their numbers of triangles increased fivefold with respect to the first period. The presence of the keywords education for sustainable development, university sector, universities and sustainability education confirms that higher education institutions play an active role in the development of the professional profile and in actions within the institution itself to achieve sustainability, because these keywords are connected to engineering education, curricula, curriculum, leadership, competences, sustainable university or decision making, to mention some. The appearance of the terms regional planning and environmental protection among the top keywords suggests a greater interest in issues of sustainable planning and social awareness, because they are linked to nodes such as educational management, developing countries, social responsibility, greenhouse gas or societies, and institution, to name a few.
Another significant aspect of the analysis of keywords is the presence of economics and information management that strengthen the aspect of innovation within the field since they are linked to nodes such as knowledge management, information and communication technologies, strategic planning, technology transference, creativity or university students.
Continuing with the analysis of the data, an autocorrelation map of authors with a minimum of four documents was created to illustrate the network of collaboration between them. The graph generated 204 nodes, 304 edges and 66 communities (Figure 9). The map depicts seven communities with more than 6 members each, located on the periphery of the canvas, while the authors with fewer collaborations are located in the center of the image. Likewise, it is possible to observe the interaction of five of the larger groups (left side of the map), in which the top 11 authors are present (see Table 3). In contrast, the other two large groups have no interaction with any neighboring communities (right side of the image). The largest community is illustrated with purple nodes and includes several of the field's elite authors, whereas the second largest community in the network (green nodes) corresponds exclusively to authors from the Asian region. Walter Leal Filho, who is affiliated to a German institution, is the most prolific author in the field, and one-quarter of the 11 most prolific researchers are affiliated to an institution located in the United States of America.
Figure 9. Autocorrelation map of 204 authors with at least four records.

Discussion
The findings of this study suggest that the period between 2010 and 2018 was the most productive in the area of sustainability in higher education, as demonstrated by the spread and increase of the interactions both between institutions and between author's keywords. The latter could be analyzed numerically through the rise in the number of triangles of each word. Although the scientific production decreased slightly in 2018 compared to the previous year, the data reported here appear to support the assumption that the field is still in a stage of growth, since the number of author's keywords appearing for the first time continues to increase year after year.
Through the science map of the author's keywords and the new author's keywords that appeared for the first time per year, it is possible to infer that the development of the field during the first period was directed towards environmental issues and the development of academic competence, whereas during the second period higher education became the cornerstone of environmental awareness, innovation and guidance to achieve sustainability goals in higher education institutions, as well as in society and government. Collaborations take place mainly between higher education institutions within the same region. However, the integration of communication technologies has triggered worldwide collaborations. It is noteworthy that the creation of cooperative groups among authors is limited if they do not share similar languages or cultures. If the field continues to grow, it is expected that the network of authors will expand, leading to an increase in collaborations and the creation of more cohesive working groups.
Additionally, the concept of "green universities" appeared for the first time in the scientific literature in 2013, followed by "sustainable development goals" and "social innovation" in 2014, which result from the growing participation and incorporation of different sectors. More recently, new tools have been integrated into the field, such as "environmental indicators" and the "triple helix model", which appeared in 2018, and "science maps" may appear in 2019; these are expected to continue to make headway.
Bibliometric analysis has been applied previously to sustainability in higher education, but the results were limited to the use of a single database [13,14]. In contrast, this study, using text mining and social network analysis, provides a global view of the field. The authors of this study therefore consider that the findings offer a reasonable approach to better understand all its dimensions and could be used in the elaboration of public policies that promote its development and contribute to social benefit.

Conclusions
A science map of sustainability in higher education fosters the understanding of the current state of science and, therefore, contributes to the definition of development objectives in the field. The results of this study reveal that the earliest records, from 1991, account for the beginning of sustainable development; however, sustainability in higher education actually began after 1999, and it was not until 2005 that the scientific production of the field increased sharply.
The countries that have made the largest contributions are the United States of America, the United Kingdom, China and Australia. The top 20 countries are located in North America and the European Union. These institutions showed an increased dynamism during the period 2010-2018. Similarly, new author's keywords proliferated in this same period, which suggests that the field is still growing. It is not surprising that the number of institutions working in the field doubled during the last eight years, and their interactions are expected to increase eight-fold with respect to the initial period. Furthermore, cooperation between authors and institutions is also expected to grow and consolidate in many more working groups.
According to the results obtained, the application of the methodologies of text mining and social network analysis makes it possible to analyze the development of a scientific field in simple terms. In this case, it helped to identify the structure and main actors involved in research on sustainability in higher education, visualizing its origins and tracing its evolution.

[...] Title and abstract. Query: ab((sustainability OR sustainable) AND ("higher education" OR "academic career")) OR ti((sustainability OR sustainable) AND ("higher education" OR "academic career")). Records: 1,120.
6. ERIC. All available. Title, abstract and descriptors (keyword). Query: abstract:((sustainability OR sustainable) AND ("higher education" OR "academic career")) OR descriptor:((sustainability OR sustainable) AND ("higher education" OR "academic career")) OR title:((sustainability OR sustainable) AND ("higher education" OR "academic career")). Records: 1,444.
Emerald Insight. All available. Title, abstract and keyword. Query: Title=((sustainability OR sustainable) AND ("higher education" OR "academic career")) OR Abstract=((sustainability OR sustainable) AND ("higher education" OR "academic career")) OR Keyword=((sustainability OR sustainable) AND ("higher education" OR "academic career")). [...]
Figure A1. Data collection: searches in different databases.