A Preliminary Attempt at the Identification and Financial Estimation of the Negative Health Effects of Urban and Industrial Air Pollution Based on the Agglomeration of Gdańsk

Abstract: This article marks the first attempt on a Polish and European scale to identify the relationship between urban and industrial air pollution and the health conditions of urban populations, while also estimating the financial burden of incidence rates among urban populations for diseases selected in the course of this study as causally related to such pollution. This paper presents the findings of a pilot study based on general regression models, intended to explore air pollutants with a statistically relevant impact on the incidence of selected diseases within the Agglomeration of Gdańsk in the years 2010–2018. In discussing the city's industrial functions, the study takes into consideration the existence within its limits of a large port that services thousands of ships every year, contributing substantially to the volume of emissions (mainly NOx and PM) to the air. The causes considered include the impact of air pollution, seasonality, land- and sea-based emissions, as well as their mutual interactions. All of the factors and their interactions have a significant impact (p ≤ 0.05) on the incidence of selected diseases in the long term (9 years). The source data were obtained from the Polish National Health Fund (NFZ), the Agency for Regional Monitoring of Atmosphere in the Agglomeration of Gdańsk (ARMAAG), the Chief Inspectorate of Environmental Protection (GIOŚ), and the Port of Gdańsk Harbourmaster. The study used 60 variables representing the diseases, classified into 19 groups. The resulting findings were used to formulate a methodology for estimating the financial burden of the negative health effects of air pollution for the agglomeration, and will be utilized as a reference point for further research in selected regions of Poland.


Aim of the Research
The main objective of this paper is to precisely identify the adverse health effects of air pollution reported in urban and industrial agglomerations, both in quantitative terms (incidence) and financial terms (cost burden of medical treatment), in the context of the region's sustainable development policy.
To accomplish this objective, a precise identification has to be made of the diseases related to air pollution and their causes. At this stage, the research project focuses on the air pollution aspect. The problem of air pollution has remained unnoticed for many years, leading to an accumulation of hazards with a direct health impact, such as smog episodes. The cause-and-effect mechanism behind this phenomenon is not adequately studied, with the literature only naming potential causes but providing little indication as to their long-term influence on human health.

Motivation
Polish cities with different degrees of air pollution problems, such as Warsaw [WAW], Tricity (Gdańsk, Sopot and Gdynia) [TRJ], Cracow [KRA], Zabrze [ZAB] and Nowy Sącz [NSA], have been selected as case studies for the identification of statistically relevant factors behind particularly health-threatening diseases. A relatively long nine-year period (2010–2018) has been considered so as to identify the direct cause-and-effect relationships, as well as determine the long-term effects of exposure to a polluted environment, while also taking into account the interactions between pollutant concentrations, weather conditions and time of exposure. The results for Tricity will set the stage for studies on other urban areas. Besides the reasons discussed above, Tricity has been selected as a pilot case study because of its seaside location and the influence of other factors associated with sea port emissions, in particular, those attributable to ships (engine exhaust emissions from arriving/departing ships and exhaust from power generators in ships moored at berth).
A precise identification of the factors requires the use of statistical tools. Of these, the most fundamental one is the reference grid of the automatic air monitoring system. Of all the cities concerned, the most extensive grid is found in Tricity. The grid should ensure that measurements are reliable so as to enable the data derived from them to be utilized for exploring models of health impact with key implications for city residents. Statistical methodology is of prime importance for this purpose [1].
The use of quantitative methods, including stochastic and exploratory techniques, in environmental studies does not seem to be sufficient for practical purposes. There is no comprehensive dedicated analytical system to address this issue, nor research on the subject. At the initial stage of work, the methodological emphasis was placed on data quality assessment through the authors' own data quality method [2], which uses harmonic models and robust estimators in addition to the classical outlier tests with their iterative extensions. The results obtained demonstrate that the proposed solution complements the classical methods and significantly extends their range of applications. The tool is also of practical value owing to its high effectiveness, numerical efficiency and simplicity.

Global Background and Local Air Quality Problems
According to UNFPA (United Nations Population Fund), the world's population reached 7.715 billion in 2019. On a global scale, urban residents make up over 50% of the overall worldwide population. The same figure stands at approximately 75% for Europe alone. Moreover, estimates show that, by 2030, the world population (which will by then reach 8.5 billion) will include 5 billion city dwellers (over 60%) [3]. This means that the maintenance of air quality, especially in large urban areas, will be an increasingly serious challenge for institutions and governing bodies managing the quality of the environment. In various locations in the world, as indicated by the World Health Organization's data (WHO Global Ambient Air Quality Database), air quality deviates significantly not only from the rather restrictive WHO guidelines, but also from the usually more liberal local legal regulations. As a result, 92% of the world's population lives in conditions where the WHO standards are exceeded [4]. This in turn causes ambient air pollution to account for an estimated 4.2 million deaths per year due to stroke, heart disease, lung cancer and chronic respiratory diseases. Although the most unfavorable situation applies to some Asian and African countries and the Middle East, Poland is one of the most polluted countries in the European Union.
As a consequence of the relatively high emissions of air pollutants in Poland, limit values of particulate matter (PM10 and PM2.5) concentrations (according to Directive 2008/50/EC) [6] as well as benzo(a)pyrene (BaP) target values (according to Directive 2004/107/EC) [5] are regularly exceeded. In some areas (a relatively small number, 4–6, of locations), calendar-year limit values for nitrogen dioxide (NO2) as well as target values for ozone (O3) and arsenic (As) are also not complied with. The principal sources of air pollutant emissions to ambient air include the municipal and household sector as well as road transport. According to the European Environment Agency's (EEA) Air Pollutant Emissions Data Viewer (most recent data from 2017), the municipal and household sector in Poland was mostly responsible in 2017 for the emission of PM10, PM2.5 and carbon monoxide (CO). Its annual emission of PM10 was 125,082 tons (Mg) (50.8% of the total national emission), of PM2.5 78,937 tons (53.6% of the total emission), and of CO 1,598,900 tons (62.9% of the total emission). The sector's share in the total emission balance is highest for benzo(a)pyrene: solid fuel (mainly coal and wood) incineration in the commercial and household sector is responsible for 83.6% of the total BaP national emission (i.e., 34 tons). Meanwhile, road transportation is primarily responsible for nitrogen oxides (NOx) and CO emission. Its annual emission of NOx in 2017 amounted to 297,356 tons (37.0% of the total national emission), and of CO to 588,444 tons (23.1% of the total emission). These two sectors of the Polish economy overwhelmingly shape the air quality, although the impact on the so-called background concentration also involves the sectors of energy production and distribution as well as industrial processes and product use.
A quantitative (model) identification of diseases arising from long-term exposure (more than 9 years) to air pollution has never been made in Poland. Such identification will allow the estimation of the actual financial burden of air pollution to society. This paper is an interdisciplinary project addressing three main areas of concern: health, environment and economy. The three aspects are interconnected from a statistical perspective and hard-wired into information systems currently under construction. The wide thematic scope of this project will allow us to address only certain Gdańsk-specific questions with key implications for the achievement of the research objectives set out in this paper.

Social Background of the Issue and Literature Review
This section focuses on air pollution as one of the most dangerous environmental impacts on the development and functioning of the respiratory system. The key respiratory diseases include: asthma, chronic obstructive pulmonary disease (COPD), and respiratory infections.
Research shows that both short- and long-term exposure to common air pollutants at elevated concentrations is associated with heightened incidence and mortality rates for respiratory diseases [7,8]. One of the key pollutants with adverse effects on the respiratory tract is suspended particulate matter. Depending on particle size, suspended matter may penetrate various parts of the respiratory tract. Particulate matter deposits in the upper parts of the respiratory tract may aggravate the symptoms of asthma and COPD [9]. Water-soluble gaseous pollutants (e.g., SO2) are absorbed mainly in the upper parts of the respiratory tract, promoting damage to upper airways and primary bronchi. Gases with lower water solubility (e.g., NO2 and O3) mainly affect the lower respiratory tract [10].
Bronchial asthma is estimated to affect approximately 235 million people globally, causing 345 thousand deaths every year [11]. The sharp increase in the worldwide incidence of asthma, especially in industrialized countries, has made it the most common chronic disease in children [12].
Chronic obstructive pulmonary disease (COPD) is characterized by partially irreversible restriction of airflow through the respiratory tract, triggering an inflammatory response to various harmful substances [13]. COPD is a major problem in developing and developed countries alike. Estimates suggest that, in 2020, the condition will have become the world's third leading cause of death and the fifth cause of motor impairment or even disability, generating high social and economic costs [14]. Poland has a high number of COPD and asthma sufferers (estimated at approx. 6 million in total), 80% of whom have not been adequately diagnosed and therefore not offered proper medical attention [15,16].
Smoking tobacco remains the largest risk factor for COPD, accounting for 80% of instances of this illness [14]. This is followed by exposure to occupational hazards and to polluted air [14,17]. However, COPD also affects non-smokers. Research conducted in the USA under the NHANES III project found that 19.2% of diagnosed cases of COPD among 10 thousand adults aged 30-75 were attributable to exposure to polluted air in the workplace. In the sample of non-smokers, exposure to occupational hazards accounted for 31.1% of cases of COPD [18]. The increase in COPD incidence worldwide cannot be satisfactorily accounted for solely by smoking tobacco without regard to any other factors [17].
Exposure to air pollutants substantially increases the incidence of respiratory infections, including pneumonia, especially in children [19] and the elderly [20]. It is noteworthy that pneumonia is among the leading causes of death in developed countries. As for the elderly, some studies argue that a link exists between short-term exposure to air pollutants and incidence of pneumonia [21,22]. Long-term exposure to air pollutants has also been shown to be a risk factor for respiratory infections. A survey conducted in Hamilton (Canada) on subjects aged over 65 revealed a correlation between heightened exposure to nitrogen dioxide and/or PM2.5 and an increased number of hospitalizations for pneumonia [22,23].

Economic Background of the Issue and Literature Overview
The economic effects of health loss due to diseases related to air pollution may be studied from a number of perspectives: financial (lost earnings), social (lost GDP), social insurance (pay-out on health insurance and disability pensions), and taxpayer (National Health Fund, Ministry of Health). The analysis considers direct (medical and non-medical) costs, indirect costs, and social costs representing the total burden to the patient and to the economy as a whole. Direct costs are financial burdens to society in the form of money transfers from the healthcare system to entities providing medical services. This cost group represents the main cost component of illness, as it takes the form of cash transfers flowing from the National Health Fund to hospitals or from patients to hospitals. These are not the only costs of illness, however. There are also indirect costs, which make up more than half of total medical costs [24], broadly defined as production losses [25]. Additionally, the indirect costs component also includes the costs of out-of-system medical care, costs of free-of-charge labor, compensation mechanisms, and the group dependency effect [26]. It is noteworthy that indirect costs of medical care are usually related to disease itself, while direct costs are usually associated with the treatment process or preventive measures. Therefore, by footing direct costs, it is possible to reduce indirect costs.
Due to a lack of financial data on procedures funded by the National Health Fund, this article will focus on the first group of costs, i.e., direct costs of medical treatment.

Materials and Methods
Primary data drawn from the following sources have been used to construct the statistical models:
Health data: National Health Fund database covering the period from 1 January 2010 to 31 December 2018 on all health services rendered in Poland by region (14,387,846 services).
Air pollution data: hourly data for the Tricity area collected from five measurement stations located within the Agglomeration of Gdańsk (AM2, AM3, AM5, AM6 and AM8) for the period from 2010 to 2018 (Figure 1). The stations are located close to the Bay of Gdańsk. This study includes gaseous pollutants (sulfur dioxide, nitrogen dioxide, nitrogen oxides, ozone, carbon monoxide and carbon dioxide), particulate pollutants (PM10 and PM2.5) as well as weather parameters (temperature, relative air humidity, wind force and direction, and rainfall). Additional data on benzo(a)pyrene from the Chief Inspectorate of Environmental Protection's manually operated station in Gdańsk, covering the period from 1 January 2010 to 21 December 2018, have been copied from the air quality website at http://powietrze.gios.gov.pl/pjp/archives.
Ship traffic data: data from the port of Gdańsk (54°25'N, 18°39'E) covering the period from 1 January 2010 to 6 November 2017.
All of the data have been entered into a single database after being counted and added up (health data, ship traffic data) or averaged (data from the Agency for Regional Monitoring of Atmosphere in the Agglomeration of Gdańsk and the Chief Inspectorate of Environmental Protection). The article utilizes a number of statistical methods and models (analysis of variance, ANOVA; analysis of covariance, ANCOVA; Cluster Analysis, CA; Principal Component Analysis, PCA; and others), of which General Regression Models (GRMs) are the most important.
GRMs are an extension of the General Linear Model (GLM) family. A general linear model may be treated as an extension of multiple linear regression for a single dependent variable, so the multiple regression model underlies the general linear model. The general purpose of multiple regression (a term first used by Pearson in 1908) is to quantify the relationships between multiple independent (controlled, explanatory) variables and a dependent (criterion, explained) variable.
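The counting and averaging step described here can be illustrated with a minimal sketch. It assumes aggregation by calendar day, which the text does not state; the sample records, the field names and the helper functions `daily_means`, `daily_counts` and `merge_daily` are hypothetical and serve only to show the principle.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical sample records; the real data come from ARMAAG and the NFZ.
hourly_no2 = [
    ("2010-01-01 00:00", 21.0),
    ("2010-01-01 01:00", 19.0),
    ("2010-01-02 00:00", 30.0),
]
service_dates = ["2010-01-01", "2010-01-01", "2010-01-02"]


def daily_means(records):
    """Average hourly concentrations within each calendar day."""
    buckets = defaultdict(list)
    for stamp, value in records:
        day = datetime.strptime(stamp, "%Y-%m-%d %H:%M").date()
        buckets[day].append(value)
    return {day: mean(values) for day, values in buckets.items()}


def daily_counts(dates):
    """Count health services (or ship calls) per calendar day."""
    counts = defaultdict(int)
    for text in dates:
        counts[datetime.strptime(text, "%Y-%m-%d").date()] += 1
    return dict(counts)


def merge_daily(pollutant_means, service_counts):
    """Join both aggregates into one table keyed by day."""
    days = sorted(set(pollutant_means) | set(service_counts))
    return [
        {
            "day": day,
            "no2_mean": pollutant_means.get(day),
            "services": service_counts.get(day, 0),
        }
        for day in days
    ]


table = merge_daily(daily_means(hourly_no2), daily_counts(service_dates))
```

Days without any pollutant reading keep `None` in the averaged column, so that gaps in the monitoring data remain visible rather than being silently treated as zero.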
The basic multiple regression model in its general form is as follows:

$$Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$$

where: $Y$ is the explained variable; $X_1, \dots, X_k$ are the predictors (controlled variables); $b_0, \dots, b_k$ are the regression coefficients; $k$ is the number of predictors.
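As an illustration of how the coefficients of a multiple regression model can be estimated, the following minimal sketch fits the model by ordinary least squares through the normal equations. The choice of estimator, the toy data and the function names are assumptions made for this example; the study itself relied on a dedicated statistical package.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_regression(predictors, y):
    """Return [b0, b1, ..., bk] minimising the sum of squared residuals."""
    rows = [[1.0] + list(p) for p in predictors]  # prepend intercept column
    size = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(size)]
           for i in range(size)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(size)]
    return solve_linear_system(xtx, xty)


# Hypothetical toy data: (NO2, PM10) concentrations and case counts
# generated exactly as cases = 5 + 2 * NO2 + 0.5 * PM10.
predictors = [(10, 20), (12, 30), (15, 25), (20, 40), (18, 22)]
cases = [5 + 2 * no2 + 0.5 * pm10 for no2, pm10 in predictors]
coefficients = fit_multiple_regression(predictors, cases)
```

Because the toy response is generated without noise, the fitted coefficients recover the intercept and the two slopes used to build it, up to rounding error.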
In contrast to the multiple regression model, which is better suited to continuous predictors (strong measurement scales, such as weather measurements or pollutant concentration measurements), the general linear model can be readily applied to any analysis of variance (ANOVA) design featuring qualitative (categorized) predictors, to any analysis of covariance (ANCOVA) design that combines qualitative (categorized) predictors, such as heating periods or rainfall, with continuous ones, and to any regression design featuring continuous predictors.
In the case of qualitative predictors, the levels may be coded in the design matrix X using either an overparameterized model or a sigma-restricted model.
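The two ways of coding a qualitative predictor, commonly called overparameterized (indicator) coding and sigma-restricted (effect) coding, can be sketched as follows. The season variable and the function names are hypothetical, and the convention of coding the last level as -1 is an assumption of this example.

```python
def overparameterized_coding(levels, values):
    """One indicator (0/1) column per level of the qualitative predictor."""
    return [[1 if v == level else 0 for level in levels] for v in values]


def sigma_restricted_coding(levels, values):
    """One column fewer than there are levels; the last level is coded -1.

    Taken over the levels, every column sums to zero.
    """
    reference = levels[-1]
    rows = []
    for v in values:
        if v == reference:
            rows.append([-1] * (len(levels) - 1))
        else:
            rows.append([1 if v == level else 0 for level in levels[:-1]])
    return rows


# Hypothetical qualitative predictor: season of each observation.
seasons = ["winter", "spring", "summer"]
observed = ["winter", "summer", "spring", "winter"]

indicator_matrix = overparameterized_coding(seasons, observed)
effect_matrix = sigma_restricted_coding(seasons, observed)
```

The indicator columns always add up to the intercept column, which is why the overparameterized design is rank deficient and needs a generalized inverse, whereas the sigma-restricted design has full column rank.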
Unlike other models, the General Regression Model is not a model in the strict sense but a modeling pathway comprising a variety of model classes and estimation methods, among them the identical slopes model. GRM consolidates all these models and allows a cause-and-effect relationship to be identified regardless of the measurement scale of the independent variables.
The first step in model quality assessment is to verify how well the empirical data fit the model, that is to say, to test the goodness of fit with the available error measures. The most commonly used metric for goodness-of-fit assessment is the determination coefficient:

$$\bar{R}^2 = 1 - \frac{n-1}{n-k-1} \cdot \frac{\sum_{t=1}^{n} (x_t - \hat{x}_t)^2}{\sum_{t=1}^{n} (x_t - \bar{x})^2}$$

where: $x_t$ is the value of variable X at time or period t; $\hat{x}_t$ is the theoretical value of variable X at time or period t; $\bar{x}$ is the mean value of variable X in a time series of n observations; $n$ is the number of observations; $k$ is the number of explanatory variables.
This metric shows the goodness of the model's fit with the empirical data. Its primary advantage is normalization. In fact, the metric is simply an adjusted coefficient of determination based on a "penalty factor" that favors multivariate models with fewer independent variables. In this study, the metric also plays an explanatory role in that it specifies what portion of the variability in disease incidence is accounted for by the model, and therefore to what extent the identified predictors account for total disease incidence variability.

The evaluation of model errors, expressed as the root mean square error, holds a central place in model quality assessment:

$$RMSE = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (x_t - \hat{x}_t)^2}$$

where: $x_t$ and $\hat{x}_t$ are the actual and theoretical values of variable X at time or period t, and $n$ is the number of observations. This metric shows how far the actual values deviate from the theoretical values determined in the model.

Useful, though not always accurate, information can also be extracted from the random-error-based variability coefficient, which relates the error to the mean level of the phenomenon:

$$V = \frac{RMSE}{\bar{x}} \cdot 100\%$$

where $\bar{x}$ is the mean value of variable X. The formula gives an idea of the size of the root mean square error relative to the mean level.

The AICC (Akaike Information Criterion with a correction for small sample sizes) criteria are generalized FPE (Akaike's Final Prediction Error) criteria proposed by Akaike.
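The fit measures discussed here (the adjusted determination coefficient, the root mean square error and the error-based variability coefficient) can be computed as in the sketch below. Dividing the squared error sum by n in the root mean square error is an assumption of this example, as are the function name `fit_metrics` and the toy values.

```python
import math


def fit_metrics(actual, fitted, k):
    """Goodness-of-fit measures for a model with k explanatory variables."""
    n = len(actual)
    mean_level = sum(actual) / n
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    ss_tot = sum((a - mean_level) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    # The factor (n - 1) / (n - k - 1) penalises additional predictors.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    rmse = math.sqrt(ss_res / n)  # assumption: divisor n, not n - k - 1
    cv_percent = 100 * rmse / mean_level
    return {"r2": r2, "adj_r2": adj_r2, "rmse": rmse, "cv_percent": cv_percent}


# Hypothetical actual and model-fitted incidence values.
actual = [10.0, 12.0, 14.0, 16.0, 18.0]
fitted = [11.0, 11.0, 15.0, 15.0, 18.0]
metrics = fit_metrics(actual, fitted, k=1)
```

For these toy values the plain coefficient of determination is 0.9, and the adjustment for a single predictor lowers it to roughly 0.867.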
The AICC criteria differ from AIC (Akaike Information Criterion) in weighting adjustment. AIC and AICC statistics are based on the quotient of the maximum likelihood function. Criterion design is done by comparing the estimated model with the full (ideal) model. The model with the lowest criterion value is thought to be the best.
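A minimal sketch of model selection by the lowest criterion value is given below. It assumes the common least-squares form of AIC for Gaussian errors (defined up to an additive constant) and the usual small-sample correction; the candidate models, their residual sums of squares and the function names are hypothetical.

```python
import math


def aic_least_squares(rss, n, k):
    """AIC for a Gaussian least-squares fit, up to an additive constant.

    rss is the residual sum of squares, n the number of observations and
    k the number of estimated parameters.
    """
    return n * math.log(rss / n) + 2 * k


def aicc_least_squares(rss, n, k):
    """AIC plus the small-sample correction 2k(k + 1) / (n - k - 1)."""
    return aic_least_squares(rss, n, k) + 2 * k * (k + 1) / (n - k - 1)


# Hypothetical candidates: (label, residual sum of squares, parameters).
n_obs = 108  # e.g. nine years of monthly observations
candidates = [
    ("seasonality only", 520.0, 3),
    ("seasonality + NO2", 430.0, 4),
    ("all interactions", 425.0, 12),
]
scores = {label: aicc_least_squares(rss, n_obs, k)
          for label, rss, k in candidates}
best_model = min(scores, key=scores.get)
```

In this toy comparison the largest model reduces the residual sum of squares only marginally, so the penalty for its additional parameters outweighs the gain and the intermediate model obtains the lowest value.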
Eventually, the formula proposed by Makridakis as an extension of the previous variants was adopted as the key selection criterion.

The calculation methodology adopted in the economic section comprises aggregated cost categories, namely medical care costs, hospitalization costs, diagnostic costs, medication costs and costs of specialist consultancy services provided to in-patients in all hospitals in Gdańsk and Gdynia within the period in question. This breakdown into five cost categories is consistent with the standards for calculation of medical treatment costs adopted by Polish medical service providers as the basis for charging fees for contracted treatments. The value data have been drawn from a register of services subsidized by the National Health Fund and provided by the Clinic of Infectious Diseases and Allergology at the Military Institute of Medicine in Warsaw in 2018 for diseases selected in the course of this study as being correlated with air pollution in urban agglomerations. The classification includes a calculation of total medical costs of specific diseases, expressed as the combined products of lump-sum financial costs disclosed by the service provider. The data have been additionally refined by the inclusion of refund amounts paid monthly by the National Health Fund to medical service providers for each completed procedure relating to diseases covered by the register.

Results
For all the relevant diseases represented by the variables in the first column (Table 1), the percentage of valid observations (column "% Valid Obs") and basic distribution characteristics (asymmetry and kurtosis) have been calculated. The column "standard Normality tests" contains the results of a Kolmogorov-Smirnov (K-S) test, a K-S test with Lilliefors correction and a Shapiro-Wilk Francia test with Royston correction. The columns "Distribution 1st Similar" and "Distribution 2nd Similar" contain the maximum likelihood estimation results for the two most similar distributions based on the Minimum Likelihood Criterion. The tests have shown a deviation from the normal distribution for variables when their levels are not considered. For all variables, the normal distribution (with possible Box-Cox logarithmic transformations) is the best empirical approximation; therefore, it has been assumed that the distributions of the empirical variables are largely similar to the normal distribution.
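The distribution characteristics mentioned here, and the effect of a logarithmic Box-Cox transformation on a right-skewed variable, can be illustrated with the sketch below. It uses moment-based (population) estimators of asymmetry and excess kurtosis, which is an assumption; the statistical package used in the study may apply bias-corrected variants. The sample values are hypothetical.

```python
import math


def central_moment(values, order):
    """Central moment of the given order with divisor n."""
    m = sum(values) / len(values)
    return sum((v - m) ** order for v in values) / len(values)


def skewness(values):
    """Moment coefficient of asymmetry; zero for a symmetric sample."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Kurtosis minus 3, so that the normal distribution scores zero."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


def box_cox(values, lam):
    """Box-Cox transform of positive values; lam = 0 gives the logarithm."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


# Hypothetical right-skewed daily case counts.
counts = [1.0, 2.0, 4.0, 8.0, 16.0]
logged = box_cox(counts, 0)
```

The raw counts are clearly right-skewed, while their logarithms are evenly spaced and therefore symmetric, which is the kind of improvement the transformation is meant to achieve before normal-theory models are applied.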
The next step was to identify, for each disease, factor models significantly correlated with incidence rates without singling out emissions of maritime origin (Table 2), and factor models considering only sea winds, i.e., emissions of maritime origin (Table 3). A comparison of the outcomes of the two sets of models allows conclusions to be drawn regarding the health impact of emissions of maritime or port origin on urban populations.

Identification of Disease Factors
The disease factors for the Agglomeration of Gdańsk were identified in two stages. The first stage was to identify models without singling out sea wind as a factor (full models), i.e., ones which considered all emission sources, both local and maritime. The second stage was to set up models for sea winds only (designation of variable: WSea) to allow the identification of factors attributable to ships entering/leaving the port and moored at berth, including those that continued to emit exhaust gases from their power generators during cargo-handling operations. The principal ingredients of pollution attributable to ships at the port include CO2, CO, SOx, NOx and PM [27]. With the entry into force in 2015 of the Sulphur Directive [28], the use of heavy fuel oils (HFO) was banned across the entire Baltic Sea area for ships without adequate equipment (scrubbers) to purify sulfur emissions. While shipping no longer seems to emit appreciable amounts of SO2, the other pollutants, especially NO2 and PM, still continue to seriously pollute urban air within the agglomeration.
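The second stage, restricting the data to sea winds and forming interaction terms between predictors, might be sketched as follows. The wind sector chosen here (north-west through north to east) is purely hypothetical, since the study does not state which directions were treated as sea winds; the field names and functions are likewise illustrative.

```python
# Hypothetical sea-wind sector in degrees (direction the wind blows FROM).
# The study does not specify the sector; this value is for illustration only.
SEA_SECTOR = (315.0, 90.0)


def is_sea_wind(direction_deg, sector=SEA_SECTOR):
    """True if the wind direction lies inside the sector (may wrap past 360)."""
    start, end = sector
    d = direction_deg % 360
    if start <= end:
        return start <= d <= end
    return d >= start or d <= end  # sector passes through north


def sea_wind_subset(observations):
    """Keep only observations recorded under sea winds."""
    return [o for o in observations if is_sea_wind(o["wind_dir"])]


def add_interaction(observations, a, b):
    """Add the product of predictors a and b as a column named 'a * b'."""
    name = f"{a} * {b}"
    return [{**o, name: o[a] * o[b]} for o in observations]


# Hypothetical observations: wind direction, NO2 level, number of ships.
observations = [
    {"wind_dir": 20.0, "NO2": 18.0, "ShipNo": 4},
    {"wind_dir": 200.0, "NO2": 25.0, "ShipNo": 2},
    {"wind_dir": 340.0, "NO2": 12.0, "ShipNo": 5},
]
sea_only = sea_wind_subset(observations)
with_interaction = add_interaction(sea_only, "NO2", "ShipNo")
```

The product column corresponds to factors written in the form NO2 * ShipNo, where the effect of one predictor is allowed to depend on the level of the other.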
With this in mind, a list was drawn up to identify diseases whose incidence within the Agglomeration of Gdańsk is attributable to air pollution generated, among other sources, by the city port. The diseases include abnormalities of heartbeat (R00), cough (R05), abnormalities of breathing (R06), pain in the throat and chest (R07), acute nasopharyngitis, acute sinusitis, acute pharyngitis, acute tonsillitis, acute laryngitis and tracheitis, acute obstructive laryngitis and epiglottitis, acute upper respiratory infections of multiple or unspecified sites (J00-J06), influenza due to unidentified virus (J11), viral pneumonia (J12-J18), acute bronchitis, bronchiolitis and unspecified acute lower respiratory infection (J20-J22), vasomotor and allergic rhinitis (J30), chronic rhinitis, nasopharyngitis and pharyngitis, sinusitis, nasal polyp and other disorders of the nose and nasal sinuses (J31-J34), chronic laryngitis and laryngotracheitis, diseases of the vocal cords and larynx not elsewhere classified, and other diseases of the upper respiratory tract (J37-J39), bronchitis (J40-J42), emphysema and other chronic obstructive pulmonary diseases (J43-J44), asthma and status asthmaticus (J45-J46), bronchiectasis (J47), angina pectoris (I20), acute myocardial infarction, subsequent myocardial infarction and certain current complications following acute myocardial infarction (I21-I23), cerebral infarction and stroke not specified as haemorrhage or infarction (I63-I64), as well as occlusion and stenosis of precerebral arteries not resulting in cerebral infarction (I65-I66). However, the share of incidence variability explained by the models (compare the "%Var" columns in Tables 2 and 3) ranges widely from 2.0% to 79.1%. Also, separate analyses are required for outcomes obtained for urban pollution not including shipping operations at port and for pollution outcomes including that factor. The details are presented in Tables 2 and 3.
The three most noteworthy of the 19 identified diseases are acute severe asthma, chronic obstructive pulmonary disease and pneumonia. Air pollution penetrates into the body by means of the lungs, which act as a protective filter and therefore bear the brunt of infections transmitted by air. The most severe consequences include pneumonia, which is a cause of numerous deaths, acute severe asthma, as well as COPD, a life-threatening condition which exposes the healthcare system in Poland to considerable financial losses. For that reason, the further discussion will focus in detail on these three diseases to explain the correlations identified by stochastic models.

Acute severe asthma [J45-J46]
The applied full models account for 12.2% of variability in incidence rates for bronchial asthma (R²) depending on air pollution. The statistically relevant factors correlating with incidence point to a strong influence of annual seasonality, with periodic peaks of incidence related to natural trends, variable concentrations of CO2 and O3, and changing patterns of ship traffic, where causes may include the spread of chemical compounds resulting from loading/unloading operations (CO2, NO2).
The findings point to the prominent role of CO2, O3 and NO2. These compounds are powerful on their own (NO2), as well as in combination with wind (WV). Interestingly, a statistically relevant interaction of O3 with particulate matter PM10 has been found.
Models for sea wind direction account for approx. 6% of variability in incidence rates for asthma. The following variables have been identified as statistically relevant factors:
WV.Wsea * ShipNo | NO2.Wsea * PRES.Wsea | PM10.Wsea * PM2.5.Wsea | PRES.Wsea | RAIN.Wsea * PM2.5.Wsea
This is a refinement of the findings explaining the influence of pollutants (NO2, PM10) attributable to ship traffic, and of PM2.5 in conjunction with PM10 and in combination with rain.

Chronic obstructive pulmonary disease (COPD) [J43-J44]
Full models account for 13.1% of variability in incidence rates for COPD (cf. Table 2). Statistically relevant factors correlating with incidence include:
MM | TRJ.NO2 * ShipNo | YYYY * TRJ.PM10 | TRJ.O3 * TRJ.BaP
The findings show that the factors with the most impact on incidence of COPD within the Tricity area include: NO2 related to ship traffic, PM10, and interactions between O3 and BaP.
Another factor with a strong influence on incidence is seasonality. Models for sea wind direction account for approx. 9.6% of the variability. Statistically relevant factors include the following variables:
TEMP.Wsea | NOX.Wsea * WV.Wsea | RAIN.Wsea Y_N * TEMP.Wsea
That is to say, temperature fluctuations and seasonality of incidence as well as NOx in conjunction with sea wind (WV.Wsea).

Pneumonia [J12-J18]
Full models account for 46.9% of variability in incidence rates for pneumonia (Table 2). This is among the highest values, pointing to a strong correlation with incidence.
Statistically relevant factors include the following variables:
MM | MM * YYYY | NO2 * WV | PM2.5 * WV | YYYY * PRES | PM2.5 * BaP | DD * MM | CO. WV * BaP | TEMP * ShipNo | SO2 * ShipNo | NO2 * HUMID | O3 * PM2.5
The findings show that the leading factors with impact on the incidence of pneumonia within the Tricity area include: NO2, PM2.5, CO, BaP and SO2 related to emissions from ships.
In addition to seasonality (MM.YYYY.DD), incidence is also affected by weather conditions (wind, temperature, humidity). Noteworthy in this regard are interactions between O3 and PM2.5 and between ship traffic and temperature (TEMP * ShipNo).
Models for sea wind direction account for approx. 49% of the variability, a high proportion. The statistically relevant factors in these models give a more detailed picture of the situation by pointing to a strong influence of BaP, both periodically (YYYY * BaP) and in conjunction with O3. Also, previous findings have been confirmed: CO2 and SO2 blown in from the sea and ports have a serious impact on incidence.

Identification of the Financial Costs of Medical Treatment
Treatment costs to medical service providers have been calculated for the 19 diseases identified above on the basis of medical treatment price lists for various types of respiratory diseases supplied by the Military Institute of Medicine. Due consideration has been given to the criterion of relevance to life safety. The calculation covers only three diseases named in Section 3.1, items 1-3. This has helped to establish the average cost rates for each of the five cost groups and the amounts refunded by the National Health Fund. The figures are set out in Table 6. Even a cursory analysis has shown a wide disparity between actual costs paid by medical service providers (hospitals) and amounts recovered from the National Health Fund in reimbursement for these services. Funding gaps are widest for pneumonia. The figures presented below may be utilized in the future to conduct a more detailed study of the Polish healthcare system with the aim of closing the gap between medical costs and available refunds. This also shows the scale of expenditure on specific diseases.
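The comparison between provider costs and refunds described above can be sketched as follows. This is a minimal illustration only: the record structure, field names and figures are hypothetical placeholders and are not taken from Table 6.

```python
from dataclasses import dataclass


@dataclass
class TreatmentCost:
    """Average cost and refund for one disease (all names are illustrative)."""
    disease: str
    provider_cost: float  # average cost borne by the hospital, in PLN
    nfz_refund: float     # average amount refunded by the NFZ, in PLN

    @property
    def gap(self) -> float:
        """Funding gap: what the provider pays minus what it recovers."""
        return self.provider_cost - self.nfz_refund

    @property
    def coverage(self) -> float:
        """Share of the provider's cost covered by the refund."""
        return self.nfz_refund / self.provider_cost


# Placeholder figures for illustration only; they are NOT the values of Table 6.
records = [
    TreatmentCost("disease A", provider_cost=4000.0, nfz_refund=3000.0),
    TreatmentCost("disease B", provider_cost=2500.0, nfz_refund=2000.0),
]

# The disease with the widest funding gap.
widest = max(records, key=lambda r: r.gap)
```

Reporting both the absolute gap and the coverage ratio makes it possible to compare diseases whose treatment costs differ by an order of magnitude.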

Regarding the Health and Environmental Component
The research presented in this paper is a preliminary pilot study bringing together, for the first time, a vast collection of over 14 million records on medical services with environmental data. The study has clearly shown a cause-and-effect relationship between air pollution generated by industrial operations, ports and shipping, and disease incidence rates within the Agglomeration of Gdańsk. However, the results are far from final. More research is necessary to study other agglomerations and cities for a more accurate understanding of relations between the impact factors, the agglomeration's location and the associated pollution levels. Such further research will verify the accuracy of the GRM model.
Another area of concern relates to the coverage of data collected by public health institutions (mainly the National Health Fund, NFZ) and environmental monitoring bodies. Record-keeping should preferably evolve towards a detailed description of medical services to allow the identification of the gender and age of patients, the length of medical leave due to a particular disease, and the type of medical services provided. As for environmental data, it would be desirable to increase the number of atmospheric measurement stations to allow the investigation of the spatial distribution of the incidence of diseases attributable to air pollution. It would also be useful to map the location of the major emitters of pollutants (e.g., refineries, waste incineration plants and timber mills).
The areas for future research into the impact of pollution on diseases include both methodological and factual aspects of the problem. Further work in these areas should identify pollutants with the greatest impact on the incidence of various diseases in society and, by extension, also set up a framework to combat pollution. Such a framework should indicate action to target those sources of pollution whose elimination will be the most beneficial.

Regarding the Economic Component
Within the economic section of the paper, some concerns may arise as to the calculated average costs of treatment of selected diseases. Without listing these costs separately by specific types of medical services (hospitalization, medication, diagnostics, etc.), it is impossible to accurately identify which financial costs are attributable to urban air pollution. That is why it is necessary to keep records listing the costs incurred by a medical service provider in offering specific medical procedures. Such records should preferably indicate the amount of National Health Fund (NFZ) refunds provided under contract with a specific medical service provider. Such a step will paint a more accurate picture of the Polish healthcare system's funding shortfall.
Also, establishing a correlation between the incidence rates of diseases resulting from air pollution and their treatment costs would allow a much more adequate framework to be set up to support the process of treatment and to match refunds with service provider needs. In effect, this would considerably reduce the disparities between financial expectations and actual cash flows. It would therefore be necessary to examine the issue of the funding shortfall in more detail.
With respect to further detailed research, it would be desirable to analyze the cost variability over time of the treatment of diseases related to ship traffic within ports so as to identify the economic impact of ports on the health condition of populations residing in port cities. It would be especially interesting to look at cost variation in the Baltic Sea Region in response to the 2015 Sulphur Directive.
A further step with interesting implications for social economics would be to extend the calculations to the external costs of diseases. These costs would have to be correlated with average salaries, the resulting lost earnings, costs to employers, who, according to Polish law, are obliged to pay sickness benefits for the first 33 days of sick leave, and costs to the Polish Social Insurance Institution (ZUS), which is obliged to provide sick pay from the 34th day onwards. This would set the stage for even more in-depth research to identify the full indirect costs of diseases and the total economic costs of urban air pollution. It would have the added benefit of supporting in-depth cost optimization models for medical treatment, while also identifying extreme deviations from average values, the structure of medical treatment costs, and how these figures differ across various medical service recipients.
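The split of sick-leave costs between employers and ZUS can be illustrated with a short sketch. It assumes, as a simplification, that the 33-day threshold is applied to a single spell of sick leave and that a flat daily benefit is paid. The function names and the daily benefit parameter are hypothetical and do not come from the study.

```python
from typing import Dict, Tuple

EMPLOYER_DAYS = 33  # days of sick leave paid by the employer, as stated in the text


def split_sick_leave(days: int, employer_days: int = EMPLOYER_DAYS) -> Tuple[int, int]:
    """Return (days paid by the employer, days paid by ZUS) for one spell of leave."""
    if days < 0:
        raise ValueError("days must be non-negative")
    employer = min(days, employer_days)
    return employer, days - employer


def indirect_cost(days: int, daily_benefit: float) -> Dict[str, float]:
    """Allocate the sick-pay cost of one spell between the employer and ZUS.

    daily_benefit is a hypothetical flat rate; real benefits depend on salary.
    """
    employer_days, zus_days = split_sick_leave(days)
    return {
        "employer": employer_days * daily_benefit,
        "zus": zus_days * daily_benefit,
    }
```

For example, a 40-day spell is split into 33 days borne by the employer and 7 days borne by ZUS. Lost earnings and other external costs would need to be added separately.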

Conclusions
This study is a pilot project, as the identified models require in-depth analysis and interpretation in order to fully understand the impact on every disease. It is also necessary to compare the findings with their counterparts for other Polish cities such as Cracow and Warsaw. The findings now available are valuable because they reveal, for the first time in Poland, some significant impact factors and their interactions that contribute to selected diseases in the long term. However, further research is necessary to fully understand the statistical importance of disease factors depending on the region and the degree of local pollution. This would allow for a full estimate of the costs to urban populations resulting from industrial and urban air pollution.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Acute heart attack
TRJ_I22 Another heart attack (reinfarction)
TRJ_I23 Complications occurring during acute myocardial infarction
TRJ_I24 Other acute forms of ischemic heart disease
TRJ_I25 Chronic ischemic heart disease
TRJ_I46 Cardiac arrest
TRJ_I47 Paroxysmal tachycardia
TRJ_I48 Atrial fibrillation
TRJ_I49 Other cardiac arrhythmia
TRJ_I50 Heart failure
TRJ_I51 Heart disease not precisely defined and complications of heart disease
TRJ_I52 Other cardiac dysfunction in diseases classified elsewhere
TRJ_I63 Cerebral infarction
TRJ_I64 Stroke, not defined as hemorrhagic or infarcted
TRJ_I65 Blockage and narrowing of the pre-cerebral arteries that do not cause cerebral infarction
TRJ_I66 Blockage and narrowing of the cerebral arteries that do not cause cerebral infarction
TRJ_I67 Other cerebrovascular diseases
TRJ_I68 Cerebrovascular disorders in diseases occurring elsewhere
TRJ_I69 Consequences of cerebrovascular diseases
TRJ_R00 Heart disorders
TRJ_R05 Cough
TRJ_R06 Breathing disorders
TRJ_R07 Sore throat and chest
TRJ_J00 Acute inflammation of the nose and throat (common cold)
TRJ_J01 Acute sinusitis
TRJ_J02 Acute pharyngitis
TRJ_J03 Acute tonsillitis
TRJ_J04 Acute laryngotracheitis
TRJ_J05 Acute obstructive laryngitis and epiglottitis
TRJ_J06 Acute upper respiratory tract infection with multiple or unspecified localization
TRJ_J11 Flu caused by an unidentified virus
TRJ_J12 Viral pneumonia, not elsewhere classified
TRJ_J13 Streptococcal pneumonia (Streptococcus pneumoniae)
TRJ_J14 Pneumonia caused by influenza bacillus (Haemophilus influenzae)
TRJ_J15 Bacterial pneumonia, not elsewhere classified
TRJ_J16 Pneumonia caused by other microorganisms not elsewhere classified
TRJ_J17 Pneumonia in diseases classified elsewhere
TRJ_J18 Pneumonia caused by an unspecified microorganism
TRJ_J20 Acute bronchitis
TRJ_J21 Acute bronchiolitis
TRJ_J22 Unspecified acute lower respiratory infection
TRJ_J30 Angioedema and allergic rhinitis
TRJ_J31 Chronic nasopharyngitis
TRJ_J32 Chronic sinusitis
TRJ_J33 Nasal polyp
TRJ_J34 Other diseases of the nose and paranasal sinuses
TRJ_J35 Chronic tonsil and pharyngeal tonsil diseases
TRJ_J36 Peritonsillar abscess
TRJ_J37 Chronic laryngitis and tracheitis
TRJ_J38 Inflammation of the vocal cords and larynx, not elsewhere classified
TRJ_J39 Other diseases of the upper respiratory tract
TRJ_J40 Bronchitis not defined as acute or chronic
TRJ_J41 Chronic, simple, and mucopurulent bronchitis
TRJ_J42 Unspecified chronic bronchitis
TRJ_J43 Emphysema
TRJ_J44 Other chronic obstructive pulmonary disease
TRJ_J45 Bronchial asthma
TRJ_J46 Status asthmaticus
TRJ_J47 Bronchiectasis

Appendix B
Due to their volume (several thousand verified factor interactions), the final results of the significance tests are presented in summary form.
The GRM model identification process took place in three main stages. In the first stage, models without interactions were tested. The significance of each parameter was assessed using standard tests based on Student's t distribution, the significance of the model as a whole using Fisher's distribution (F test), and the degree of variation explained by the model using multiple R, R² and adjusted R².
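The model-level statistics named in the first stage are related by standard formulas, which the sketch below writes out. It assumes an ordinary least-squares model with an intercept, n observations and k predictors. The function names are illustrative, and the values of n and k in the usage note are hypothetical, since the paper does not report them here.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with an intercept, n observations and k predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2: float, n: int, k: int) -> float:
    """Overall F statistic, with (k, n - k - 1) degrees of freedom."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than parameters")
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def multiple_r(r2: float) -> float:
    """Multiple correlation coefficient R, the square root of R^2."""
    return r2 ** 0.5
```

As a usage example, `adjusted_r2(0.469, n=108, k=12)` would adjust the 46.9% reported for the full pneumonia model under the purely hypothetical assumption of 108 monthly observations and 12 terms.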
In the second stage, all statistically significant interactions of independent variables up to the second degree were additionally identified. For this purpose, two parallel iterative model-building procedures were used: forward stepwise and best subset with the R² criterion. By comparing the results of both procedures, the independent variables significantly related to the dependent variable (selected cases of diseases) were selected.
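A forward stepwise search over main effects and second-degree interactions can be sketched as below. This is not the Statistica procedure used in the study. It is a simplified stand-in that adds, at each step, the term giving the largest gain in R² and stops when the gain falls below a threshold (`min_gain`, a hypothetical parameter), whereas the study relied on significance tests. All function and variable names are illustrative.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-9:
            raise ValueError("singular design matrix (collinear terms)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 of a least-squares fit of y on the given columns plus an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / n
    ss_res = sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
                 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def candidate_terms(data):
    """Main effects plus all pairwise (second-degree) interaction products."""
    terms = dict(data)
    for a, b in combinations(data, 2):
        terms[f"{a}*{b}"] = [u * v for u, v in zip(data[a], data[b])]
    return terms


def forward_stepwise(data, y, min_gain=0.01):
    """Greedily add the term with the largest R^2 gain until the gain is too small."""
    terms = candidate_terms(data)
    selected, best_r2 = [], 0.0
    while True:
        scores = {}
        for name in terms:
            if name in selected:
                continue
            try:
                scores[name] = r_squared([terms[s] for s in selected + [name]], y)
            except ValueError:
                continue  # skip terms that are collinear with the current model
        if not scores:
            break
        best = max(scores, key=scores.get)
        if scores[best] - best_r2 < min_gain:
            break
        selected.append(best)
        best_r2 = scores[best]
    return selected, best_r2
```

Called with a dictionary mapping variable names to equally long lists of values, `forward_stepwise(data, y)` returns the selected terms and the R² of the resulting model. A best-subset search would instead evaluate every combination of terms up to a given size, which is why the two procedures can disagree and were compared.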
The final stage of model identification was to build the model using only variables with a statistically significant association with the dependent variable. If there was more than one such model, the one with the higher R² was chosen and the assumptions of its applicability were examined, i.e., the normality of the random component was assessed separately within the levels of each factor and each interaction. Due to the large volume of results, a table was built with the names of the factors significantly related to the dependent variable (according to Appendix A).
In some cases, it was necessary to repeatedly test various factor systems and their interactions, despite the use of iterative procedures. The final model was the one with the fewest factors whose explained variance was the highest or similar to that of more complex models.
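The selection rule for the final model can be expressed compactly. In the sketch below, "similar" explained variance is interpreted as lying within a fixed tolerance of the best R². The tolerance value, the tuple layout and the model names are assumptions made for illustration only.

```python
def pick_parsimonious(models, tolerance=0.01):
    """Pick the model with the fewest factors among those close to the best R^2.

    models: iterable of (name, number_of_factors, r2) tuples.
    tolerance: hypothetical cut-off for what counts as "similar" explained variance.
    """
    models = list(models)
    best_r2 = max(r2 for _, _, r2 in models)
    close = [m for m in models if best_r2 - m[2] <= tolerance]
    # Fewest factors first; ties are broken in favour of the higher R^2.
    return min(close, key=lambda m: (m[1], -m[2]))
```

A tighter tolerance favours the more complex model, so the cut-off is itself a modelling decision.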
In the intermediate identification stages, selected additional tools were used: Pareto charts, Ljung-Box and Box-Pierce Q tests (in testing model stationarity), and other visualization tools.
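For reference, the Q statistics mentioned above can be computed directly from the sample autocorrelations of a series such as model residuals. The sketch below uses the standard textbook formulas and is not the Statistica implementation. The function names are illustrative.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of the series x at the given positive lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def box_pierce_q(x, max_lag):
    """Box-Pierce statistic: Q = n * sum of squared autocorrelations up to max_lag."""
    n = len(x)
    return n * sum(autocorrelation(x, k) ** 2 for k in range(1, max_lag + 1))


def ljung_box_q(x, max_lag):
    """Ljung-Box statistic: Q = n (n + 2) * sum over k of rho_k^2 / (n - k)."""
    n = len(x)
    return n * (n + 2) * sum(
        autocorrelation(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )
```

Under the null hypothesis of no autocorrelation, both statistics are compared against a chi-square distribution whose degrees of freedom equal the number of lags (reduced by the number of fitted parameters when applied to model residuals). Large values indicate remaining serial dependence.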
Appendix B presents the final working results (printouts from the Statistica system, with manual corrections): tables of model estimations, results of significance tests for each factor and interaction in the models, as well as selected intermediate stages of the process of identifying selected variables. Due to the size of the result sets (on the order of several thousand pages), it is not possible to present all of them in detail.
Selected sample results of the model identification process for TRJ_sum_J00_J06: