Impact of Entrepreneurial Ecosystem Discussions in Smart Cities: Comprehensive Assessment of Social Media Data

Discussions on “smart cities” have gained popularity over the past two decades, and the concept has shown potential for tackling cities’ environmental, social, and economic challenges. Smart cities are known as a system of physical infrastructure, information and communications technology (ICT) infrastructure, and social infrastructure that exchange information flowing between their many different subsystems. The “smart cities” concept has been introduced with various dimensions; among those, the embedded ICT infrastructure plays a decisive role in the functions of the system. One of the important derivatives of ICT is the new communication media known as social network services (SNSs), which are emergent and introduce additional functionalities to “smart cities”. This paper seeks to advance the understanding of SNSs in smart cities in order to evaluate their effects on the innovation and entrepreneurial ecosystem. This agenda has been tackled through a rigorous methodological approach to capture and evaluate the presence of entrepreneurially concerned discussions in a popular SNS outlet (Twitter). Beyond the methodological contribution on handling big data in SNSs to gain insights into innovation and entrepreneurial aspects of smart cities, the findings distinguish the influence of a particular category of content generators (professionals) who drive the largest share of interactions in SNSs.


Introduction
The growth of population, technological development, and urbanization associated with cities are recognized as contemporary challenges that call for novel, efficient, effective, and economic approaches to better governance. Challenges in developing the infrastructure, economy, and services need to be addressed to increase the living standards of communities. The emergence of the "smart city" concept can be considered a response to such challenges, ensuring that cities can develop economically whilst protecting the environment and the quality of life of citizens. Smart technologies offer cities exciting possibilities for providing new services and integrated city infrastructure, as well as for supporting innovation, digital entrepreneurship, and sustainable city development [1,2]. According to the World Economic Forum [3], a growing number of cities around the world are implementing ambitious smart city programs and projects across a range of themes, including governance, local economic development, citizen participation, urban living, the natural and built environment, and sustainable transport.
Smart city thinking has escalated over the last two decades in both scientific literature and international policy. Cities play a prime role in social and economic aspects worldwide and have a huge impact on the environment [4]. An in-depth analysis of the existing literature revealed that the smart city is a multi-faceted concept with many elements and dimensions [5]. Descriptions of smart cities now include the qualities of people and communities as well as information and communications technologies (ICTs). Smart cities are known as a system of physical infrastructure, ICT infrastructure, and social infrastructure exchanging information that flows between their many different subsystems [6]. Major cities can even serve as a good representation of a nation's economic success or failure. According to Beattie [7], this is because the tricky business of development and urbanization can play a big role in a country's economic prosperity. Entrepreneurship and innovation are major concerns for an economy and, consequently, within the boundary of a city; therefore, the competitiveness of a city today is determined by its innovativeness and economic strength [8]. While researchers have found that smart cities are more entrepreneurial than other cities [9,10], an analysis of the detailed characteristics accounting for this higher entrepreneurial activity within smart cities has not been conducted.
One of the major resources connected to the success of smart cities is the societal or cultural capital within the city boundaries. The role of social capital in urban development is emphasized in parallel with the technical aspects of a city [11]. The importance of human and social capital has been recognized by smart city definitions in previous literature and is seen as a fundamental aspect of any smart city [6,12–14]. Social capital has also been seen as an important dimension for facilitating innovation and entrepreneurship in smart cities, since smart cities have the infrastructure to bridge and facilitate the connectivity of society for entrepreneurial activity. Anthopoulos [15] discussed resident satisfaction and recommended activities that facilitate data collection and analytics to enhance municipal planning for performance improvement. Despite the recognized importance of human and social capital in smart cities, measuring and assessing this aspect has remained a challenge. Performance measurement studies on smart city dimensions, especially social and human capital, rely on outcome indicators that, by their nature, involve medium- to long-term observation and detection times [16]. As a result, insight coming from society is lacking, and cities are unable to absorb the information that society generates.
This research focuses on measuring smart city social and human capital performance with respect to innovation and entrepreneurship ecosystem activity. Owing to ICT advancements, smart cities have the infrastructure to bridge and facilitate the connectivity of society. Within the broad spectrum of ICT applications, the emerging presence of mass media communications such as social network services (SNSs) and social media has not been taken into account for studying innovation and entrepreneurship ecosystems in smart cities. Publicly available data sources such as Twitter have facilitated massive data collection, which can leverage research at the intersection of social sciences, data sciences, and indicator design, thus informing the research community of major opinions and topics of interest among the general population [17,18] that cannot otherwise be collected through traditional means of research (e.g., surveys, interviews, focus groups) [19]. At the same time, citizens are empowered to use common technology-oriented platforms to communicate among themselves, which has resulted in the widespread use of social network services among citizens. Despite this interest, there seems to be a very limited understanding of what "social networking services" or "social media" exactly represent and do to societies. In the present case, social media discussion is taken as a crucial pillar in regulating entrepreneurial spirit in smart cities. Therefore, this paper explores the role of social network services in smart cities from the vantage point of the innovation and entrepreneurial ecosystem. The aim is to address the following research questions:

• Can SNS analytics measure entrepreneurial ecosystem activities within cities? This calls for a methodological approach to utilizing SNS data to identify the presence of impactful entrepreneurial discussion.

• From the standpoint of the impact of SNSs on smart cities, what type of content in SNSs is more influential in innovation and entrepreneurial ecosystem discussions?
To study the presence and impact of SNSs in shaping the entrepreneurial and innovation ecosystem in smart cities, the literature was first reviewed carefully to position the trend and the need. This agenda was then tackled through a rigorous methodological approach to capture and evaluate the presence of entrepreneurially concerned discussion in a popular SNS outlet (Twitter). A thorough process of detecting and capturing relevant tweets was performed to evaluate the usage of SNSs in promoting innovation and entrepreneurship-related discussions. Based on a recognized smart city index, London was selected as the case for applying the methods to capture social capital on innovation and entrepreneurial activity. This investigation provides observations on detecting the stakeholders who promote innovation and entrepreneurial discussion in SNSs.

Background
In this section, definitions will be provided with evidence from the literature. This section offers a background summary of the interpretation of previous research on smart cities and the role of social network services in innovation and entrepreneurial ecosystems.

Definition of Smart Cities
The "smart city" concept has become increasingly popular in scientific literature and policy reports over the last two decades. A simple search for the "smart city" keyword in article titles in Web of Science (WoS), a scientific literature-indexing engine, returned slightly over two thousand records as of 25 November 2019. Figure 1 illustrates the publication growth trend of the "smart city" concept over the years, showing a sharp growth in the usage of the term since 2012. The apparent decrease in the number of publications in 2019 is because the search was undertaken before the end of 2019. Cities are considered key players in social and economic aspects from a global perspective; therefore, to understand the importance of cities as key elements of the future, the definitions of "smart cities" are explored in this section.
The United Nations Population Fund indicates that in 2008, about 3.3 billion people, more than 50 percent of the global population, lived in urban areas. This share is expected to increase to 70 percent by 2050, according to a United Nations report [20]. The urbanization figure in Europe is currently 75 percent of the population, and it is expected to reach 80 percent by 2020 [20].
The term "smart cities" first appeared in the 1990s, with a focus on the significance of new ICT with regard to modern infrastructures within cities. Since then, smart city definitions have broadened to include settings where ICT bridges information and digital services with the participation of society and communities [21]. An in-depth analysis of the emerging literature revealed that the meaning of a smart city is multi-faceted and is a concern of interdisciplinary studies [21,22]. A bibliometric analysis of WoS publications indicates domains such as ICT, Computer Science, Urban Studies, and Green Sustainable Science Technology as the top science categories to which the "smart cities" concept contributes. From a systemic point of view, smart cities are encouraged to integrate technologies to disseminate services over their networks for future developments. The capability of ICT infrastructure as a facilitator for creating new communication media becomes crucial and therefore requires broadband network development, mass communication platform creation, improvement of citizens' technology skills, and institutional changes [23]. The vantage point of smart cities as a structure that enables the aforementioned movements has been seen as an opportunity for information exchange that flows between their many different subsystems [24]. A comprehensive definition of smart cities was provided by Nijkamp and Kourtit [25]: "Smart cities are the result of knowledge-intensive and creative strategies aiming at enhancing the socio-economic, ecological, logistic and competitive performance of cities. Such smart cities are based on a promising mix of human capital (e.g., skilled labor force), infrastructural capital (e.g., high-tech communication facilities), social capital (e.g., intense and open network linkages), and entrepreneurial capital (e.g., creative and risk-taking business activities)".
A more recent classification by Neirotti et al. [26] defined two major domains for the smart city concept with regard to the exploitation of tangible and intangible urban assets: (1) the hard domain, which concerns energy, lighting, environment, transportation, buildings, and health care and safety issues, and (2) the soft domain, which addresses education, society, government, and economy. Shapiro [27] and Holland [11] argued for soft domain aspects of smart cities, such as human capital, rather than hard domain aspects like ICT, as the driver of smart city creation. According to Caragliu et al. [14], a city is smart "when investments in human and social capital and traditional (transport) and modern (ICT) communication infrastructure fuel sustainable economic growth and a high quality of life, with a wise management of natural resources, through participatory governance" (p. 70). Descriptions of smart cities now appreciate soft domain aspects like the qualities of people and communities as well as ICTs [6,9,28]. The new perspective, which aims to inspire a sense of community among citizens, draws on the earlier bottom-up knowledge scheme and recognizes the importance of factors that emulate the concept of smart communities, where members and institutions work in partnership to transform their environment [29]. Smart communities make conscious decisions on technology use for tackling societal challenges, which not only increases quality of life but also serves as a means to reinvent the city's capabilities for new communal practices [30]. The California Institute for Smart Communities is an early example of an organization focusing on how communities could become smart and how a city could be designed to implement information technologies [31].
The vast range of contexts has led to a diverse and nebulous smart city design space, where there is little consensus over what smart cities are and what form they should take. This inhibits communal discourse and slows down the development and widespread deployment of smart city technologies and policies [11]. More crucially, it is a barrier to citizen engagement and bottom-up design: communities are unlikely to engage with, identify, and then design solutions for civic problems while the smart city concept is incoherent, unapproachable, and hard to measure. The agenda for this research was to study the bridge between the embedded soft and hard domain aspects of smart cities and smart communities. On the one hand, the hard domain is where infrastructure such as ICT has a decisive role in the functions of the smart city. On the other hand, the term has also been applied to soft domains, where approaches toward culture and social inclusion in a smart city are supposed to offer environments for entrepreneurship that are accessible to all citizens. Barbara-Sancheze [32] explored the role of the smart city as a generator of new entrepreneurial initiatives in Spain, confirming the relationship between smart cities and the entrepreneurship rate. The aspect of smart cities considered in this research concerns ICT-provided opportunities such as social network services, and hence the utilization of social capital for entrepreneurial ecosystem activities. Data in social network services, as a communication platform, are used to study the content of discussions on innovation and entrepreneurship in the smart city, and the general procedure for systematically dealing with SNS data is described. Furthermore, by analyzing the data and operationalizing the extracted simplified metrics, an attempt was made to identify the influential content in SNSs regarding innovation and entrepreneurial discussions.
Therefore, the conceptual framework for approaching smart cities within the focus of this research should offer insights into the operationalization of social network services data and the magnitude of the effect of SNS content in the context of innovation and entrepreneurship discussions.

The Role of Social Network Services (SNSs) in Innovation and Entrepreneurial Ecosystems
Innovation and entrepreneurship are highly intertwined, mutually dependent concepts and are recognized as core components of the wealth and competitiveness of cities and countries [33]. Innovation is an inherently human endeavor: successful innovation occurs when people with skills, experience, and capabilities come together to understand or predict, and then address, existing challenges, while entrepreneurship is the attempt to set up and scale those efforts [34]. Richter et al. [9] attempted to solidify the connection between the smart city and entrepreneurship by identifying six main characteristics ranging from ICT infrastructure and high-tech industries to social capital and social inclusion. The term "ecosystem" is added to capture the complex relationships formed between actors and entities as stakeholders when studying both innovation and entrepreneurship from a holistic perspective [35,36]. The ecosystem analogy informs the design of system-level innovation and entrepreneurial activities and has been used in the highly intertwined literature on "innovation" and "entrepreneurship", while the prefix "eco" in innovation ecosystems implies a specifically ecological aspect [37,38]. A recent description by Adner [39] defines an "ecosystem" as the alignment structure of the multilateral set of partners that need to interact in order for a focal value proposition to materialize.
Applied to both innovation and entrepreneurship, ecosystems can have many components, including the existence of prior ventures, a patent system, a culture tolerating failure, incubators, grant programs, and investments by business angels and/or venture capitalists [40]. An entrepreneurial (or entrepreneurship) ecosystem refers to the human, financial, and professional resources and the institutional environment that support and nurture new ventures in a specific geographic location [41]. Entrepreneurship, or the act of entrepreneurs, is crucial in any innovation ecosystem. According to Erikson [42], the dynamic and challenging nature of the innovation ecosystem of smart cities requires entrepreneurs to adopt more important roles than usual in identifying and exploiting opportunities. Smart cities are introduced as territories that connect the physical, information technology, social, and business infrastructure to leverage the capability of learning and innovation, which is built into the collective intelligence of the city and its population [43]. The smart infrastructure of cities can tackle existing challenges in innovation and entrepreneurship ecosystems; in particular, ICT services, as one of the dimensions of smart cities, can enhance the innovation and entrepreneurship ecosystem. Smart cities have the infrastructure to bridge and facilitate the connectivity of society and, in general, the social capital for entrepreneurial activity. With the emergence of social network services in the past decade, a new medium has been created to represent society that has not yet received proper attention. The social infrastructure, such as intellectual and social capital, represented by SNSs is an indispensable endowment to smart cities, as it allows for "connecting people and creating relationships" [6].
ICTs also offer new avenues for openness by providing access to social media content and interactions that are created through the social interaction of users via highly accessible web-based technologies.
Social media platforms have grown significantly over the last decade. According to the online statistics and market research source Statista [44], over 70 percent of Internet users were social network users in 2017, and these figures are expected to grow. It is estimated that the number of social media users will increase from 2.34 billion in 2016 to 2.95 billion in 2020 [45]. Social networking is one of the most popular online activities, with high user engagement rates that expand mobile possibilities. The growth of the SNS user base is universal, and SNSs are increasingly used by diverse age groups [46,47]. Social network services have grown at an unprecedented rate and are now so well established that they are among the most visited services on the Internet, with usage that changes little from year to year [48]. A recent evaluation of actively used social networking services by Pew Internet indicates that Facebook, including its subsidiary Instagram, is the dominant platform, with a 76% active-user login rate, while Twitter is reported at 42% [49].
It is therefore reasonable to say that social media represents a revolutionary new trend with the potential to enhance existing cultures of openness and foster new ones [50]. Social media empowers its users with the ability to publish or broadcast information inexpensively, giving them a platform to democratize information and communication in real time. However, despite all the facilitation of information creation and dissemination, there seems to be a very limited understanding of what "social media" or "social networking services" exactly represent and eventually do to societies. Meanwhile, in smart city programs that have received great publicity, there has been less discussion about regimes for evaluating and measuring the societal and soft domain aspects of smart cities. The lack of metrics for grasping societal activities is described in the 'Global Innovators: International Case Studies and Smart Cities' [51] report, which notes the inadequacy of existing evaluation approaches that tend to be non-standard and focused on implementation processes and investment metrics rather than on city outcomes and impacts.
This paper aims to investigate social capital for innovation and entrepreneurship within smart cities by examining social networking services as a derivative of one of the major dimensions of smart cities. This research presents the utilization of SNSs for understanding and capturing entrepreneurship-related discussions and further investigates the impact of various profile types on SNSs regarding entrepreneurial spirit. The research also aims to shed light on how social network services are reshaping contemporary smart cities. The focus is on how smart cities should optimally deploy and exploit data from SNSs as part of their competitive strategies, and on how analytical methods, tools, and techniques are best utilized to support operations. Furthermore, in the presented case study, social media discussions are seen as a crucial pillar in regulating entrepreneurship-related activities in smart cities. Therefore, the attempt is to capture and isolate entrepreneurship-related discussions in the smart city case via SNS outlets and to evaluate the influence of the content, and of the content generators' profile types, on overall SNS interactions.

Research Methodology
This section describes the approach to utilizing computational advancements for analyzing social network services data in a systematic process. The approach uses semantic and linguistic analyses to detect major topical discussions on Twitter as the SNS platform under study. The following describes a systematic approach to analyzing social network services data, comprising a general process for SNS data collection, topic discovery, and topic-content analysis. Furthermore, the interpretation of the analysis discloses insightful characteristics of tweets regarding their topic of discussion and the characteristics of the content generator.
Internet data are available in various formats, and social network services provide one form of these data. Before such data were available, in the early 20th century, sociologists interviewed people to understand their social connections and, in this manner, formed small social networks for analysis. Today, owing to activity on social networking platforms such as Twitter, it is possible to study the tremendous live content of SNSs, together with networks of millions of nodes and billions of edges. The rise of computational power in the past decade has opened new opportunities for data analysis. At around the same time, exponential growth in Internet usage has accelerated the generation of enormous amounts of data. The ability to quickly access these multifaceted data and the availability of ever-increasing computational power have led to the rapid development of the field of social data analytics. Gartner [52], a research and advisory firm on information technology, defines social data analytics as the analysis of people's interactions in social contexts, often with data obtained from social networking services. SNS data are often unstructured, meaning the information is not organized in a pre-defined manner and does not follow a pre-defined data model. Unstructured information is typically text-heavy but may also contain data such as dates, numbers, and facts. Advancements in data mining and text analytics are employed in this study to analyze SNS data for insightful information.
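To make the idea of turning unstructured text into structured data concrete, the following is a minimal sketch, assuming tweets arrive as raw text strings. The regular expressions and the `TweetRecord` fields are hypothetical simplifications; Twitter's own entity-extraction rules are more elaborate, and this is not the authors' code.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, simplified patterns (e.g., an e-mail address would be misread as a mention).
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")
URL_RE = re.compile(r"https?://\S+")


@dataclass
class TweetRecord:
    """Structured view of one raw tweet (hypothetical schema)."""
    text: str
    hashtags: list = field(default_factory=list)
    mentions: list = field(default_factory=list)
    urls: list = field(default_factory=list)
    clean_text: str = ""


def structure_tweet(text: str) -> TweetRecord:
    """Extract hashtags, mentions, and URLs, and produce normalized text."""
    urls = URL_RE.findall(text)
    # Remove URLs first so URL fragments are not mistaken for hashtags.
    no_urls = URL_RE.sub(" ", text)
    hashtags = [h.lower() for h in HASHTAG_RE.findall(no_urls)]
    mentions = [m.lower() for m in MENTION_RE.findall(no_urls)]
    # Drop mentions, keep hashtag words without '#', lowercase, collapse whitespace.
    cleaned = MENTION_RE.sub(" ", no_urls).replace("#", " ")
    cleaned = " ".join(cleaned.lower().split())
    return TweetRecord(text, hashtags, mentions, urls, cleaned)


record = structure_tweet("Pitch night at @TechHub tonight! #Startup #London https://example.com/x")
print(record.hashtags)    # ['startup', 'london']
print(record.clean_text)  # pitch night at tonight! startup london
```

Keeping the hashtag words in the cleaned text (rather than deleting them) is a design choice: hashtags often carry the topical signal that later steps rely on.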
In this paper, the focus was on obtaining insights from SNSs as a major component of smart cities regarding entrepreneurial ecosystem activity. The overall architecture for processing SNS data is presented graphically in Figure 2. Twitter (twitter.com) was chosen as the data collection platform, as it is a microblogging platform used by millions of users; however, the process is largely generalizable to data from most SNS platforms. The process comprises three major phases, capture, curate, and consume, each with two sub-phases, as shown in Figure 2.
Capture: This is the process of collecting data, which comprises selecting the data source, searching for the data, and collecting the data for further use. Entering a search query is the primary way to specify the content of interest to retrieve. Various specifications, such as keywords, length, and date, can be set to target the topic of interest; in other words, the required data are obtained by a set of criteria embedded in the search query. Some SNS platforms, such as Twitter, also offer the possibility of retrieving data via a live stream.
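The idea of a search query as "a set of criteria" can be sketched as a filter over candidate records. This is a minimal sketch, assuming tweets are already available as dictionaries with hypothetical `text`, `lang`, and `created` fields; it does not call the Twitter API, whose query syntax and field names differ.

```python
from datetime import date

# Hypothetical sample records; real API payloads use different field names.
TWEETS = [
    {"text": "Demo day for #startup founders", "lang": "en", "created": date(2017, 6, 15)},
    {"text": "Lovely weather in the park", "lang": "en", "created": date(2017, 7, 2)},
    {"text": "New #Innovation hub opens", "lang": "en", "created": date(2017, 9, 10)},
    {"text": "Nouvelle #startup à Paris", "lang": "fr", "created": date(2017, 7, 20)},
]


def matches_query(tweet, terms, start, end, lang="en"):
    """True if the tweet satisfies every criterion of the query.

    Term matching is a simple case-insensitive substring test, so '#startup'
    also matches '#startups'.
    """
    text = tweet["text"].lower()
    has_term = any(term.lower() in text for term in terms)
    in_window = start <= tweet["created"] <= end
    return has_term and in_window and tweet["lang"] == lang


def capture(tweets, terms, start, end, lang="en"):
    """Keep only the tweets that match all query criteria."""
    return [t for t in tweets if matches_query(t, terms, start, end, lang)]


hits = capture(TWEETS, ["#startup", "#innovation", "entrepreneur"],
               date(2017, 6, 1), date(2017, 8, 30))
print([t["text"] for t in hits])  # ['Demo day for #startup founders']
```

Each criterion (term, date window, language) removes one kind of irrelevant record: the second tweet has no query term, the third falls outside the window, and the fourth is not in English.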
Curate: Data curation is a broad term indicating processes and activities related to the organization and integration of data collected from various sources. Data retrieval methods are often loosely controlled, resulting in out-of-range values. Data preparation is therefore performed to reduce the irrelevant and redundant data in the collected set; this normalization is necessary for better knowledge discovery results in the forthcoming steps. Data analysis can depend heavily on the context of the study and the expected results, but its two primary tasks are data feature extraction and data classification. Feature extraction derives values (features) from the data that are relevant to the knowledge discovery process, facilitating further distinction and categorization of the data. Classification of the data is performed to reduce its dimensionality; it follows from the general hypothesis of the knowledge discovery task and distinguishes the best-fit data points from the mass. In this case study, topic modeling was performed to identify the major clusters of discussion according to their topics. Topic modeling is a type of statistical modeling for discovering the abstract "topics" that occur in a collection of documents. Latent Dirichlet allocation (LDA) is an example of a topic model and was used in this study to assign each tweet's text to a particular topic.
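As an illustration of how LDA discovers topics, the following is a minimal collapsed Gibbs sampler written from the standard LDA formulation. It is a sketch only, assuming pre-tokenized documents; the paper does not state which LDA implementation or hyperparameters were used, and the `alpha`, `beta`, and `n_iter` values here are illustrative assumptions.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for latent Dirichlet allocation.

    docs: list of token lists. Returns (theta, phi): per-document topic
    proportions and per-topic word distributions (posterior mean estimates).
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]            # topic counts per document
    nkw = [defaultdict(int) for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                              # total tokens per topic
    z = []                                           # topic assignment of every token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | rest), up to a normalizing constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
             for d, doc in enumerate(docs)]
    phi = [{w: (nkw[k][w] + beta) / (nk[k] + V * beta) for w in vocab}
           for k in range(n_topics)]
    return theta, phi


# Illustrative, already-cleaned tweet texts (made-up data).
corpus = [
    "startup funding seed investor pitch",
    "investor funding pitch startup seed",
    "hackathon code developer tech",
    "developer tech code hackathon",
]
docs = [s.split() for s in corpus]
theta, phi = lda_gibbs(docs, n_topics=2, n_iter=100)
for k, dist in enumerate(phi):
    print(f"topic {k}:", sorted(dist, key=dist.get, reverse=True)[:3])
```

In practice a library implementation would normally be used; the hand-written sampler simply makes the per-token resampling step that drives topic discovery visible.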
Consume: This refers to publishing the information derived from the data in a presentable format. The insights from the results can be presented in a visually appealing way or used as a metric to be combined with other data points for further interpretation. In this study, the major topical structure of the text was extracted to simplify and capture the dominant themes of the discussions.
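One simple way to reduce a topic model's output to "the dominant theme" of each tweet, and to summarize the discussion as a whole, is to take each document's highest-probability topic and report topic shares. This is a sketch under the assumption that a document-topic matrix is already available; the numbers below are made up for illustration.

```python
from collections import Counter


def dominant_topics(doc_topic):
    """Index of the highest-probability topic for each document (ties: lowest index)."""
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topic]


def topic_shares(doc_topic):
    """Fraction of documents whose dominant topic is each topic."""
    counts = Counter(dominant_topics(doc_topic))
    n = len(doc_topic)
    return {k: counts[k] / n for k in sorted(counts)}


# Illustrative doc-topic proportions for four tweets and three topics.
theta = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
]
print(dominant_topics(theta))  # [0, 1, 0, 2]
print(topic_shares(theta))     # {0: 0.5, 1: 0.25, 2: 0.25}
```

Such shares are the kind of simplified metric that can be visualized or combined with other data points, such as interaction counts.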
With the systematic SNS data analysis procedure explained, the next section demonstrates its use in a case study.

Evaluating Entrepreneurial Ecosystem Activity on Twitter: London City Case Experiment
The background literature discusses the importance of emerging social network services in smart cities and the need to investigate the effect of entrepreneurial discussions in the innovation ecosystem. This section emphasizes a systematic approach to analyzing data from SNSs and new ways of benchmarking social capital by focusing on social network services. To this end, an experiment was conducted to detect and capture entrepreneurial discussions on Twitter, one of the dominant social network services. A popular microblogging tool, Twitter has grown considerably since its launch in July 2006; it is an online news and social networking service where users post and interact with messages called "tweets", which were restricted to 140 characters at the time of data collection. Twitter users can post their opinions or share information about a subject publicly. Twitter has 316 million users worldwide [53], providing a unique opportunity to understand societal discussions and, in this case study, entrepreneurship-related discussions.
The initial interest of the study was to capture innovation and entrepreneurship-related discussion from social network services as one of the major themes that need to be studied in smart cities. Startups are considered a good representation of the societal practice of entrepreneurship. Startups are increasingly seen as significant contributors to national job creation, and employment and gross national product data demonstrate the shift to an innovative, startup-dominated economy [54]. Therefore, fostering the startup ecosystem is seen as a measure for improving the national economy [55]. The case study experiment was conducted to collect activity related to the startup ecosystem in the studied city and thereby capture the relevant societal discussions oriented toward innovation and entrepreneurship.
Twitter is an SNS platform that represents, and acts as a support infrastructure for, startups, which are organically socially active. The study collected a sample of tweets from a region (city) and extracted features (words and hashtags) related to startup activities; in addition, hashtags were decomposed and analyzed, and the extracted information was reused for classification purposes. Connectedness on Twitter is operationalized through hashtags, as they are the most common feature users employ to connect and relate within a larger networked discourse [56]. Hashtags on Twitter have been used both to separate the stream of tweets and to unite discussion streams. This functionality of Twitter has been studied in political science, communication studies, and social sciences [57–59].
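Decomposing a compound hashtag such as #startuplife into words can be done with dictionary-based word segmentation. The sketch below uses dynamic programming and prefers segmentations with fewer, longer words; the tiny `VOCAB` set and the fewest-words rule are hypothetical assumptions, since the paper does not describe how its hashtags were decomposed.

```python
from functools import lru_cache

# Hypothetical mini-vocabulary; a real system would use a large word list.
VOCAB = {"start", "up", "startup", "life", "tech", "london", "hack",
         "a", "thon", "hackathon", "small", "business"}


def segment_hashtag(tag, vocab=VOCAB):
    """Split a hashtag into known words, preferring the fewest words.

    Falls back to the unsplit tag when no full segmentation exists.
    """
    s = tag.lstrip("#").lower()

    @lru_cache(maxsize=None)
    def best(i):
        # Best segmentation of s[i:] as a tuple of words, or None if impossible.
        if i == len(s):
            return ()
        candidates = []
        for j in range(i + 1, len(s) + 1):
            word = s[i:j]
            if word in vocab:
                rest = best(j)
                if rest is not None:
                    candidates.append((word,) + rest)
        return min(candidates, key=len) if candidates else None

    result = best(0)
    return list(result) if result is not None else [s]


print(segment_hashtag("#startuplife"))    # ['startup', 'life']
print(segment_hashtag("#SmallBusiness"))  # ['small', 'business']
```

Memoizing `best` keeps the search polynomial in the hashtag length, and the fewest-words rule keeps "hackathon" whole instead of splitting it into "hack", "a", "thon".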
Twitter provides an application programming interface (API) to access tweets and information about posted content and users; the potential bias of the Twitter API has been discussed by recent research [60]. Twitter data have been used for a wide range of studies, such as on the stock market [61], brand analysis [62], and election analysis [63]. The unique characteristics and features of Twitter as a microblogging service are illustrated in Figure 3. With respect to these characteristics, a multi-component semantic and linguistic framework was developed to collect Twitter data, prepare and analyze the data, and discover insightful information. To demonstrate the steps for utilizing SNS data for valuable insights, a highly ranked smart city was selected. Using the Cities in Motion Index (CIMI), Berrone et al. [64] evaluated 181 cities in more than 80 countries to determine the smartest cities in the world. According to the index results, New York (USA), London (UK), and Paris (France) ranked first, second, and third, respectively. London has also been considered one of the top smart cities in other global rankings [65,66]. In this study, London was selected for further analysis owing to its high ranking as a smart city and its use of the English language, which facilitates the text analytics tasks. Accordingly, the search queries were constructed to capture the most relevant content regarding the startup scene and entrepreneurial activity.

Data Collection
This phase collected relevant tweets using Twitter's API [67]. Based on the background literature, major keywords were identified to capture entrepreneurial ecosystem activity on Twitter (e.g., Entrepreneurship, Startup, Innovation). Popular hashtag recommender toolkits, such as http://hashtagify.me, https://ritetag.com, and https://www.trendsmap.com, were used to discover relevant hashtags and their proximity to innovation- and entrepreneurship-related discussions. These toolkits analyze the co-occurrence network of tweets and their hashtags; given an input keyword, they recommend related hashtags based on this background information. Figure 4 illustrates the proximity of hashtags to the subject of the initial search (#startup #startups #entrepreneur #tech #sme #innovation #entrepreneurship #startuplife #hackathon), which was used to detect extended hashtags and relevant discussions. Twitter's API provides both historical and real-time data collection; the real-time method randomly collects 10% of publicly available tweets. The real-time method was used to randomly collect publicly available English tweets matching queries built from the pre-defined hashtags described above within a specific period. The extended query collected approximately 4000 related tweets between 06/01/2017 and 08/30/2017 in the city of London (the geolocation of the retrieved tweets was specified as London). The raw data are available at https://goo.gl/mZumDp. Table 1 shows a sample of the collected and processed tweets: the textual content, the users, and the overall interaction (sum of likes and retweets) for each tweet.
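The internal methods of the recommender toolkits named above are not described in the paper. The underlying idea of ranking hashtags by how often they co-occur with a set of seed hashtags can nevertheless be sketched in Python. This is an illustrative assumption, not the algorithm used by hashtagify.me, RiteTag, or Trendsmap; the function name `related_hashtags` and the sample tweets are hypothetical, while the seed set is the initial search listed for Figure 4.

```python
import re
from collections import Counter

# Initial search hashtags (Figure 4), lower-cased and without '#'.
SEED_HASHTAGS = {"startup", "startups", "entrepreneur", "tech", "sme",
                 "innovation", "entrepreneurship", "startuplife", "hackathon"}


def related_hashtags(tweets, seeds, top_n=5):
    """Rank non-seed hashtags by the number of tweets in which they co-occur
    with at least one seed hashtag."""
    counts = Counter()
    for text in tweets:
        tags = {t.lower() for t in re.findall(r"#(\w+)", text)}
        if tags & seeds:                 # tweet mentions at least one seed
            for tag in tags - seeds:     # count each candidate once per tweet
                counts[tag] += 1
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = ["#startup pitch night #fintech",
              "#innovation in #fintech",
              "#fintech #london",
              "#startuplife #london #coffee"]
    print(related_hashtags(sample, SEED_HASHTAGS, top_n=3))
```

Tweets without any seed hashtag are ignored, so "#fintech #london" does not raise either tag's score; frequently co-occurring candidates can then be added to an extended query.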

Curate
In this phase, the analysis of tweets proceeded through data feature extraction and data classification. For the SNS data collected from Twitter, the investigation began with an empirical analysis of the dynamics of the discussions, followed by a study of their topical structure and, finally, the extraction of the characteristics of the major content producers. The Twitter analytics process was run on the Azure cloud computing platform [68], and the process pipeline is shown in Figure 5. After the retrieved tweets were imported as input data, a total of 4014 individual tweets were considered for the analysis. A filtering process was applied to structure the data and reduce noise, that is, to separate the natural language text of each tweet from other data types (i.e., hashtags, mentions, URLs, and non-English tweets, if any). In addition, feature extraction isolated valuable data points, such as the number of retweets and likes and the profile identifiers, alongside the textual content of the tweets, as these data points are leveraged later for further insights. The processing was carried out in R, a free programming language and software environment for statistical computing supported by the R Foundation for Statistical Computing. The R script further processed the natural language text of each tweet through tokenization, lemmatization, and stop-word removal, steps that are necessary to prepare the raw tweet content for classification and topic extraction. The R script for text preprocessing and topic modeling was inspired by the packages compiled by Dmitriy Selivanov [69], which offer fast vectorization, topic modeling, distances, and word embeddings in R. Topic modeling was used as a classification task to reveal the topical structure of the discussions.
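The study implemented these preprocessing steps in R. For illustration only, a comparable Python sketch follows: it separates URLs, mentions, and hashtags from the natural language text, tokenizes the remaining text and removes stop words, and keeps the likes and retweets needed later for the interaction measure. The regular expressions, the short stop-word list, and the names `CuratedTweet` and `curate` are assumptions; lemmatization and non-English filtering are omitted because they require language resources not shown here.

```python
import re
from dataclasses import dataclass

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
# Lower-case alphabetic tokens, optionally with one apostrophe part (e.g. "don't").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")

# Deliberately short, illustrative stop-word list.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "for", "on",
              "at", "is", "are", "our", "your", "with", "this", "we", "you"}


@dataclass
class CuratedTweet:
    user: str
    tokens: list
    hashtags: list
    mentions: list
    urls: list
    likes: int
    retweets: int

    @property
    def interaction(self):
        # Overall interaction as defined in the study: likes + retweets.
        return self.likes + self.retweets


def curate(user, text, likes, retweets):
    """Split a raw tweet into natural-language tokens and other data types."""
    urls = URL_RE.findall(text)
    no_urls = URL_RE.sub(" ", text)
    mentions = MENTION_RE.findall(no_urls)
    hashtags = HASHTAG_RE.findall(no_urls)
    plain = HASHTAG_RE.sub(" ", MENTION_RE.sub(" ", no_urls)).lower()
    tokens = [t for t in TOKEN_RE.findall(plain) if t not in STOP_WORDS]
    return CuratedTweet(user, tokens, hashtags, mentions, urls, likes, retweets)


if __name__ == "__main__":
    t = curate("alice", "Join us at the #hackathon with @TechHub https://example.com/x tonight!", 3, 2)
    print(t.tokens, t.hashtags, t.mentions, t.urls, t.interaction)
```

URLs are removed first so that fragments such as "#anchor" inside a link are not mistaken for hashtags.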
Topic modeling can be described as a method for finding groups of words (i.e., topics) in a collection of documents (in this case, tweets) that best represent the information in the collection. It can also be thought of as a form of text mining, a way of obtaining recurring patterns of words in textual material [70]. The technique used to obtain topic models in this study was latent Dirichlet allocation (LDA), and the accompanying visualization toolkit (LDAvis) was used to show the major Twitter discussion topics visually [71].
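To make the LDA step more concrete, the following is a minimal, self-contained Python sketch of LDA fitted by collapsed Gibbs sampling, one standard inference method for the model. It is not the R implementation used in the study (the cited packages may use a different inference algorithm), and the hyperparameters `alpha`, `beta`, and `n_iter` are illustrative assumptions.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists. Returns (topic_word, doc_topic), where
    topic_word[k] maps each word to P(word | topic k) and doc_topic[d]
    lists P(topic | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    topics = range(n_topics)

    n_dk = [[0] * n_topics for _ in docs]     # tokens of doc d assigned to topic k
    n_kw = [[0] * n_vocab for _ in topics]    # times word v is assigned to topic k
    n_k = [0] * n_topics                      # tokens assigned to topic k
    z = []                                    # topic assignment of every token

    # Random initial assignment.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the current token from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta)
                           / (n_k[t] + n_vocab * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    topic_word = [{vocab[v]: (n_kw[k][v] + beta) / (n_k[k] + n_vocab * beta)
                   for v in range(n_vocab)} for k in topics]
    doc_topic = [[(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
                 for d, doc in enumerate(docs)]
    return topic_word, doc_topic


def top_words(topic_word, n=5):
    """The n most probable words of each topic, as used to name topics."""
    return [sorted(topic, key=topic.get, reverse=True)[:n] for topic in topic_word]
```

The top words per topic play the same role as the keywords the study used to name its six clusters; the topic-word distributions are also the input to inter-topic distance maps such as those drawn by LDAvis.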
Following the three-step procedure for systematic SNS data analysis described in the research methodology, the final 'Consume' layer, which presents the classification results visually, is reported in the next section, "Results and Findings".

Results and Findings
So far, the research process has captured entrepreneurial ecosystem activity by focusing on the startup scene in the smart city of London. The relevant discussions on social network services (in this study, Twitter) were collected and curated to transform SNS data into insightful information. The dynamic discussions and interactions on SNSs regarding entrepreneurially oriented matters can represent social capital, as explained in earlier sections. In this section, the SNS data are examined in more depth to detect the most influential content and the types of profiles that generate it. First, a categorization analysis was performed on the textual content of the SNS data to obtain a broad overview and identify the general topics of discussion. Next, a statistical model was applied to estimate how content type and profile type relate to the impressions (interactions) a tweet receives.

Content Type Categorization
The topical structure of the SNS discussion identified with LDA is visualized in Figure 6, which illustrates the general topical themes of the discussion. The six major clusters were named after the major keywords under each topic. In the visualization, each circle's size reflects that topic's share of the discussion relative to the other topics, and the two-dimensional map indicates the distances between topics. As part of the data consumption and insight generation task, the metadata of each posted tweet and its associated profile under each topic were used to detect influential profiles based on their overall interaction (the number of retweets and likes received for the post). This information reveals how content (tweets) receives attention in different topics depending on who generated it. The motivation for categorizing the Twitter profiles of content generators stems largely from the fact that humans, as intelligent individuals, bring complex factors to the consumption and dissemination of information on SNSs [72,73]. Therefore, because different profile types have different purposes and cater to different needs, categorizing the content generators in each of the six topical discussions helps to measure the impact and influence of each category. The categorization definitions and process were inspired by Uddin et al. [74], and, in line with the study's aims, three major types of Twitter profile were defined as follows:
Personal profiles: These accounts contain personal content, have no ties to business, and do not mention corporate or brand information. They are created by individuals who do not wish to be identified with their employer, typically to acquire news, learn, have fun, etc. Generally, these individuals show low to moderate levels of social interaction.
Professional profiles: Individual users who communicate their professional views on Twitter. They share useful information on specific topics and engage in healthy discussion related to their specialist interests and expertise. Professional users tend to be highly interactive: they follow many accounts and are followed by many.
Corporate and business profiles: These differ from personal and professional users in that they follow a marketing and business agenda on Twitter. Their profile description accurately states their motives, and similar behavior can be observed in their tweeting patterns. Frequent tweeting and less interaction are the two key factors that separate business users from both personal and professional users. Their content is primarily corporate. Such accounts are often managed by company teams working under a brand name related to the company, providing corporate news and support.
Under each of the six discussion topics, profiles were ranked by their tweets' interaction ratio (number of retweets + number of likes), then manually observed and categorized according to the three profile descriptions above. Figure 7 illustrates this manual categorization of the top content generators, that is, the Twitter accounts with the highest tweet interaction ratios. The cut-off point was set at 60 Twitter accounts so that tweets from all six content categories were covered. These top 60 content producers generated a total of 1170 interactions, and their tweets were manually reviewed to identify the profile type. As Figure 7 shows, professional users have the most influence overall. Across topical content categories, professional users generate the largest influence in the educational, motivational, promotional, and events topics. Corporate and business profiles are the next most influential after professional users, particularly in the news, educational, and promotional categories. Counting likes alone, professional users receive more interaction, especially in the educational and motivational categories, whereas business profiles receive more interaction in the news category, with the motivational category ranking second. Personal profiles have the lowest influence of the three profile categories in terms of both retweets and likes. The two interaction measures also differ in their distribution across topics: the motivational and educational categories received the most retweets, whereas for likes, the most-interacted categories shifted to events and news.
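The aggregation behind Figure 7 can be sketched in Python as follows, assuming each tweet record carries its user, LDA topic label, likes, and retweets, and that profile types have already been assigned manually, as in the study. Ranking accounts by total interaction across all topics is a simplification (the study ranked profiles within each topic), and the field names and data in this sketch are hypothetical.

```python
from collections import defaultdict


def interaction(tweet):
    # Interaction ratio as defined in the study: retweets + likes.
    return tweet["likes"] + tweet["retweets"]


def top_accounts(tweets, n=60):
    """Sum interaction per account and return the n highest-scoring accounts."""
    totals = defaultdict(int)
    for t in tweets:
        totals[t["user"]] += interaction(t)
    return sorted(totals, key=totals.get, reverse=True)[:n]


def interaction_by_topic_and_profile(tweets, profile_type, accounts):
    """Total interaction per (topic, profile type) for tweets by the given accounts.

    profile_type maps each account to 'Personal', 'Professional', or
    'Corporate & Business' (assigned manually in the study).
    """
    selected = set(accounts)
    table = defaultdict(int)
    for t in tweets:
        if t["user"] in selected:
            table[(t["topic"], profile_type[t["user"]])] += interaction(t)
    return dict(table)


if __name__ == "__main__":
    tweets = [
        {"user": "a", "topic": "Educational", "likes": 10, "retweets": 5},
        {"user": "b", "topic": "News", "likes": 2, "retweets": 1},
        {"user": "c", "topic": "Educational", "likes": 4, "retweets": 4},
        {"user": "a", "topic": "Motivational", "likes": 3, "retweets": 0},
    ]
    profiles = {"a": "Professional", "b": "Corporate & Business", "c": "Personal"}
    print(interaction_by_topic_and_profile(tweets, profiles, top_accounts(tweets, n=2)))
```

Splitting `interaction` into separate likes and retweets totals would reproduce the two views compared in the paragraph above.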

Content Type Impression in SNSs
Following the description of the tweets' content types and profile category interactions, this section examines whether the type of tweet content and the type of Twitter profile that generated it had a significant effect on the interaction (impression) a tweet received. In other words, the aim of this experiment was to determine how much of the variation in the dependent variable (the tweet's received interaction/impression) can be explained by the independent variables (the tweet's content type and profile category).
Multiple regression analysis is most often used to (a) predict new values of the dependent variable from the independent variables and (b) determine how much of the variation in the dependent variable is explained by the independent variables. Multiple regression extends simple linear regression, which is used when there is only one continuous independent variable, by modeling the relationship between several independent variables and a single dependent variable (in this case, the tweets' interaction), with the independent variables used to predict the dependent variable. For example, with four independent variables $X_1$ through $X_4$ and dependent variable $Y$, the multiple regression model is

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon,$$

where $\beta_0$ is the intercept (also known as the constant), $\beta_1$ is the slope parameter (also known as the slope coefficient) for $X_1$, and so forth, and $\varepsilon$ represents the error term. In this experiment, the dependent variable was the tweet's received interaction/impression, and the independent variables were 18 dichotomous variables describing the combinations of six tweet content types (Educational, Motivational, Promotion, Events, News, and Viral) and three profile categories (Personal, Professional, and Corporate & Business). Before fitting the regression model, multicollinearity was checked to decide which of any highly correlated variables to drop before proceeding; otherwise, it becomes difficult to tell which variable contributes to the explained variance, and technical problems arise in calculating the multiple regression model.
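The study fitted its model in SPSS. As a hedged illustration of the same technique, the Python sketch below fits a multiple regression by ordinary least squares with NumPy, reports $R^2$ and adjusted $R^2$, and computes variance inflation factors (VIFs), one common multicollinearity diagnostic; the paper does not state which diagnostic it used, and the function names are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept column.

    Returns (coefficients, r2, adjusted_r2); coefficients[0] is beta_0.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return coef, r2, adj_r2


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all other columns. Large values flag multicollinearity."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        _, r2_j, _ = fit_ols(others, X[:, j])
        vifs.append(float("inf") if r2_j >= 1.0 else 1.0 / (1.0 - r2_j))
    return vifs


if __name__ == "__main__":
    # Two hypothetical 0/1 indicators and a noise-free response.
    x1 = np.array([1, 0, 1, 0, 1, 0])
    x2 = np.array([0, 1, 1, 0, 0, 1])
    y = 2 + 3 * x1 + 5 * x2
    print(fit_ols(np.column_stack([x1, x2]), y))
    print(variance_inflation_factors(np.column_stack([x1, x2])))
```

With 0/1 indicator variables, if every observation falls into exactly one category, including all indicators together with the intercept makes the design perfectly collinear (an infinite VIF), so one reference category is normally dropped. Each remaining coefficient is then the expected difference in $Y$ relative to that reference category.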
To ensure that the regression model could be interpreted accurately, the other assumptions required for multiple regression were also checked, namely independence of observations, linearity (tested using scatterplots), and homoscedasticity (details of these checks are given in Appendix A).
The multiple regression was run in SPSS Statistics. Twelve variables were entered into the model (the viral tweet type was excluded because it had few observations and including it did not improve the overall explanatory power of the model). The $R^2$ for the overall model was 37.4%, with an adjusted $R^2$ of 32.5%, which is a moderate effect size according to Cohen [75]. The statistical significance of the model can be seen in the analysis of variance (ANOVA) in Table 2: the 12 variables statistically significantly predicted the dependent variable (interaction), F(12, 155) = 7.706, p < 0.0005. The coefficient values are given in the Coefficients table (Table 3); the full statistical summary, including the coefficients for all variables and the assumption checks, can be found in Appendix A (Tables A1-A7 and Figures A1 and A2).
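The reported statistics can be cross-checked with the standard formulas $R^2_{adj} = 1 - (1 - R^2)(n - 1)/(n - k - 1)$ and $F = (R^2/k)/((1 - R^2)/(n - k - 1))$. Assuming the usual OLS degrees of freedom, F(12, 155) implies k = 12 predictors and n = 12 + 155 + 1 = 168 observations; this n is derived here, not stated in the text. The short Python check below recomputes both statistics from the rounded $R^2$.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (plus an intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall F statistic of an OLS model with an intercept."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


k = 12                    # predictors entered into the model
df_residual = 155         # residual degrees of freedom from F(12, 155)
n = df_residual + k + 1   # implied number of observations: 168

print(round(adjusted_r2(0.374, n, k), 3))  # 0.326 (reported: 32.5%)
print(round(f_statistic(0.374, n, k), 2))  # 7.72 (reported: 7.706)
```

The small gaps are consistent with $R^2$ having been rounded to 37.4%: an unrounded value of about 0.3737 gives F ≈ 7.707 and an adjusted $R^2$ of about 32.5%.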
The coefficients of four variables, ed-prof (educational content from professional profiles), mot-pers and mot-prof (motivational content from personal and professional profiles), and new-crop (news content from corporate profiles), were statistically significant (p < 0.0005). Because these slope coefficients are statistically significant and positive, each can be interpreted as the expected increase in interaction associated with a tweet of the corresponding content and profile type, holding the other variables constant. For example, based on the coefficients in Table 3, a motivational tweet generated by a professional user is expected to receive approximately 21 more interactions, the highest expected increase among the independent variables.

Discussion and Conclusions
In this paper, an attempt was made to capture social capital related to entrepreneurial ecosystem activity in smart cities' infrastructure by utilizing SNS data. In doing so, the paper proposes several key contributions to ongoing research and theory.
As noted previously, prior research has defined the smart city from various perspectives. Building on the soft domain classification of smart cities described by Neirotti et al. [26], which mainly considers society, government, and economy, this paper seeks to advance the understanding of the soft domain of smart cities. In particular, the human and social aspects empowered by ICT were examined in order to capture and evaluate their effects on the innovation and entrepreneurial ecosystems within smart cities. The presence of entrepreneurially related discussion was captured and evaluated on a social network services platform (in this study, Twitter). The process for utilizing SNS data was explained systematically and put into practice through a case study. London was selected as the smart city in which the systematic process of retrieving information from SNSs was applied to startup discussions, the major community representing innovation- and entrepreneurship-related discussions. The aim was to identify the influential forces on SNSs that promote and reinforce entrepreneurially related discussion in smart cities.
Prior research has conceptualized multiple dimensions of smart cities and has paid attention to social networking services for smart urban planning [76][77][78][79]. This study seeks to advance approaches to analyzing social network services data and to show what such data can add regarding the soft domain features of smart cities. The focus was to give part of this picture more color by concentrating on one important aspect of smart city design: how smart cities can leverage the presence of SNSs for entrepreneurial activity within the innovation ecosystem. Technologies that generate intelligence from SNS data are important because smart city services are increasingly based on the collection and analysis of complex datasets. This study demonstrated a systematic process in which innovation and entrepreneurial discussions in the city of London were retrieved from Twitter over a three-month period. In the data curation phase, topic modeling techniques were used to extract the six major topical discussions (Educational, Motivational, Promotion, Events, News, and Viral). Furthermore, categorizing three profile types (Personal, Professional, and Corporate & Business) gave an insightful picture of high-interaction tweets by generating profile and topical theme. Multiple regression modeling was then used to investigate the significance of profile and content type for the interaction received, by determining the proportion of the variation in the dependent variable (interaction) explained by the independent variables (Twitter profiles and content type).
The multiple regression models showed that educational content generated by professional users, motivational content generated by personal and professional users, and news content generated by corporate users contribute significantly to the interaction received from general users on SNSs.
The theoretical arguments developed here may inform future efforts to understand how various types of content on SNSs interact with and influence users. In the context of entrepreneurially related discussion, the level of interaction a type of content receives in relation to the characteristics of the profile that generated it may contribute significantly to emerging theory in the field of entrepreneurship. The results also inform the debate about shifting the emphasis in smart city design away from the mass installation of smart technologies and toward making citizens smarter, so that they can use technology. Previous studies have attempted to bring some clarity to the smart city design space by categorizing how cities negotiate physical space [80], generate and manage development policy [28], and balance their human, technological, and institutional dimensions [81,82]. As a platform for mass communication, SNSs can provide an environment that increases engagement between citizens and other major stakeholders, such as companies and various agencies. It can be concluded that the latest ICTs and SNSs have transformed the traditional meaning of citizen participation. A smart city is a place with high social inclusion of its inhabitants, largely due to ICT infrastructure that facilitates communication and information dissemination. This new understanding of citizen participation through SNSs has important implications for the planning and design of future cities: utilizing SNSs in city planning makes it a collective challenge and responsibility of both governments and citizens [83].

Limitations and Future Research
The novelty of this research lies in the proposed strategy for systematically making sense of social network services data. However, there are limitations and difficulties associated with retrieving, validating, classifying, and generalizing SNS data, which are also addressed in the literature [84,85]. The following paragraphs describe each limitation found in this study and how it was addressed.
The Twitter API service promises a random sample of its data for researchers, journalists, consultants, and government analysts to study human behavior. Although little information is available on Twitter's sampling mechanism, the company has stated that the sample is random in the sense that each element has an equal probability of being chosen [86]. The scientific community recognizes this as a potential limitation; to mitigate it, this study collected data over a longer period. Another potential limitation of Twitter is its applicability and popularity for urban-level studies. Although the city case in this study was selected carefully, the methodology is designed to be replicable with other similar SNS applications and in other cities. To ensure the validity and reliability of the automated text analytics and natural language processing, human judgment and intervention were incorporated to guard against major biases.
Despite these limitations, this work can be considered a contribution to the literature and a starting point for empirical analysis that uses quantitative SNS data to calculate impact in discussions of the entrepreneurial ecosystem. The propositions advanced in this paper lend themselves to further empirical testing on social network services platforms. The systematized SNS data analysis and the templates used here highlight how interaction with SNS content differs according to its theme and the profile that generated it. The approach can be used to benchmark SNS activity through a new metric design, which could initiate more "citizen-led" studies of smart cities and promote large-scale, population-wide initiatives in smart city research agendas.
Funding: The author is honored and thankful to have the support of the Foundation for Economic Education (grant number ) to conduct this study.

Conflicts of Interest:
The author declares no conflicts of interest.

Table A3. Variables Entered/Removed.