Data-Driven Modelling of the Complex Interaction between Flocculant Properties and Floc Size and Structure

Abstract: Polymeric flocculants are widely used due to their ability to efficiently promote flocculation at low dosages. However, fundamental background knowledge about how they act and interact with the substrates is often scarce, or insufficient to infer the best chemical configuration for treating a specific effluent. Inductive, data-driven approaches offer a viable solution, enabling the development of effective solutions for each type of effluent, overcoming the knowledge gap. In this work, we present such an inductive workflow, which combines the statistical design of experiments and predictive modelling, and demonstrate its effectiveness in the development of anionic polymeric flocculants for the treatment of a real effluent from the potato crisps manufacturing industry. Based on the results presented, it is possible to conclude that the hydrodynamic diameter, charged fraction and concentration are the parameters with a stronger influence on the characteristics of flocs obtained when using copolymers, while the charged fraction, concentration and hydrophobic content present a stronger influence on the characteristics of flocs obtained using terpolymers containing a hydrophobic monomer.


Introduction
Coagulation/flocculation strategies are widely used in effluent treatment due to their capacity to destabilize and aggregate colloids. Flocculation is quite often used in combination with other more advanced techniques. Organic polymeric flocculants are the most frequently used due to their ability to flocculate efficiently at low dosages, producing large aggregates, contrary to what happens when using traditional inorganic coagulants. Polyelectrolytes are water-soluble macromolecules, natural or synthetic, containing ionic charges along the polymer chain. Depending on their charge, they can be classified as anionic (negative charge), cationic (positive charge) or amphoteric (both negative and positive charges). Chemical structure, charge density and molecular weight are considered as the most important features influencing their performance and application. Depending on the type, charge distribution and molecular weight, they can be used for many applications in industry, namely as flocculants in effluent treatment [1].
The separation of particles from suspensions by polymers can be associated with different flocculation mechanisms, such as charge neutralization, polymer bridging and electrostatic patch interactions. These mechanisms depend strongly on the way the adsorption of flocculants on the particle surfaces occurs, which, in turn, is related to the chemical affinity between the polymer and the particle.
Regarding the use of flocculation in the treatment or pre-treatment of this type of effluent, it may be necessary to modify the commonly used polyelectrolytes due to the hydrophobic nature of the organic matter. Thus, the use of terpolymers, obtained by introducing a certain amount of a hydrophobic monomer in the polymer chain, should favour the flocculation process and thus the removal of suspended material, eventually with lower amounts of polymer. However, terpolymers are usually more expensive and require a more controlled synthesis process, so it is important to compare their performance with that of more common copolymers, which are normally less expensive.
In a previous work [15], the authors presented an evaluation of the flocculation process used in the treatment of effluent from the potato crisps manufacturing industry, using laser diffraction spectroscopy (LDS) as the technique for flocculation monitoring. In the present study, making use of the data obtained through LDS, an extensive multivariate statistical analysis was conducted in order to identify the most critical flocculant parameters influencing the floc size and structure. The most appropriate flocculant for the treatment of a certain wastewater can thus be pre-identified using an advanced, data-driven screening process. Floc properties related to the application of an additive with specific characteristics can be easily assessed, minimizing the use of experimental resources. The data-driven approach consists of a first stage where the statistical design of experiments is conducted and evaluated. Then, upon completion of the experimental trials in a random order, the LDS results are collected and analyzed. This second stage comprises the development of predictive models explaining the variability in the responses (outputs obtained in the experimental trials), which are then used to assess the importance of the several factors under study for floc size and structure, as well as their interactions. These models also provide the basis for eventually identifying the set of factor values that leads to a flocculant with the properties desired by the user.

Materials
Health-friendly anionic polyelectrolytes were used for the flocculation experiments: copolymers and terpolymers of acrylamide (AAm) and acrylamido-2-methyl-1-propanesulfonic acid sodium (Na-AMPS), synthesized by inverse-emulsion polymerization with different charge densities, and fully described and characterized in a previous manuscript from the authors [15]. In the case of the terpolymers, three different hydrophobic monomers were used: stearyl methacrylate (SMA), ethyl acrylate (EA) and lauryl methacrylate (LMA). Table 1 presents a summary of the composition of the polyelectrolytes used. The flocculant solutions were prepared with distilled water at 0.4% (w/w). In order to guarantee the effectiveness of the flocculants, the diluted solutions were prepared every day.

Table 1. A summary of the anionic polyelectrolytes used [15]. The initial composition at the beginning of polymerization and the organic phase used in the formulation are supplied.
• Copolymers Poly(AAm-Na-AMPS): 50AC, 80AC, 50AP and 80AP.
• Terpolymers Poly(AAm-Na-AMPS-EA): 50A1EC, 50A3EC, 80A1EC and 80A3EC.
• Terpolymers Poly(AAm-Na-AMPS-LMA): 50A1LC, 50A3LC, 80A1LC and 80A3LC.
• Terpolymers Poly(AAm-Na-AMPS-SMA): 50A1SC, 50A3SC, 80A1SC and 80A3SC.

Charge density was determined by elemental analysis using an element analyser, EA 1108 CHNS-O (Fisons), and 2,5-Bis(5-tert-butyl-benzoxazol-2-yl) thiophene as the standard. C, H and N elemental analyses were performed, and the values for the N element were used in the calculation of the charged fraction. At least three measurements for each sample were performed.
The flocculation tests were performed on an industrial oily effluent from the potato chips manufacturing industry, supplied by the Adventech Group (Portugal).

Flocculation Process Monitoring
LDS was used to monitor the flocculation process under low stirring conditions. LDS gives information on the evolution with time of the median floc size, d(0.5), and simultaneously, on the floc structure described by the scattering exponent [16], SE, which is indicative of floc compactness. Detailed information regarding this procedure is available elsewhere [15]. The tests were performed in a Malvern Mastersizer 2000 (Malvern Instruments, Enigma Business Park, UK). Two hundred millilitres of effluent sample were added to 600 mL of distilled water in the equipment beaker, together with the amount of hydrochloric acid required to reach a pH of 6. A pH of 6 was selected for the tests since it was the one that led to higher destabilization of the effluent during the off-line pretesting (a lower charge of the effluent). Flocculants were tested at concentrations from 3.3 mg/L to 13 mg/L, and the required amount of flocculant was added at once to the effluent. During the flocculation process, the suspension vessel was stirred mechanically using the sample unit facilities of the Malvern Mastersizer 2000, at a low stirring speed of 300 rpm (to prevent floc breakage [15]). The size of the flocs was measured every 36 s for a period of 6 min, until the floc size stabilized. The scattering exponent (SE) of the flocs, which provides information about floc structure, was calculated off-line at the end of the flocculation process, once the floc size stabilized (6 min).

The Statistical Design of Experiments and Multivariate Data Analysis
The statistical design of experiments (DOE) is a systematic approach to define the process conditions at which experiments are to be carried out, in order to extract the maximum information for a given purpose [17,18]. DOE involves the simultaneous consideration of several factors (typically 5-10), which can be quantitative (e.g., temperatures, flows, etc.) or qualitative (type of flocculant, polymer, etc.), and which can be associated with linear, bilinear or non-linear effects on the response. In this work, the type of DOE adopted was a full factorial design. This design allows the estimation of all of the main effects and two-factor interactions (in the present case, the design even allowed for the estimation of a possible quadratic effect of concentration). Therefore, it is a suitable design for screening and modelling purposes, with full resolution to estimate all terms in Equation (1), where y_i stands for the response in the i-th experimental run, and x_{i,j} represents the i-th value of the j-th factor or input variable. Furthermore, additional covariates were also included in the model to capture the influence of other potentially relevant variables, which were not systematically manipulated in the DOE framework. The data collected from the experiments planned under a DOE approach were analysed using modelling frameworks, usually based on ordinary least squares (OLS) regression. OLS regression is a method used in the analysis of linear and non-linear relations between a response variable and one or more predictor variables. It provides estimates for the parameters of a linear regression model such as the one presented in Equation (1), which are optimal in the least squares sense [19].
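Equation (1) is not reproduced in this text. A form consistent with the description above (an intercept, main effects and two-factor interactions) is sketched below; it is an assumption about the model structure, not the authors' exact expression. The symbols k (number of predictors), β (regression coefficients) and ε_i (random error) are notation introduced here for illustration.

```latex
y_i = \beta_0
    + \sum_{j=1}^{k} \beta_j \, x_{i,j}
    + \sum_{j=1}^{k-1} \sum_{l=j+1}^{k} \beta_{jl} \, x_{i,j} \, x_{i,l}
    + \varepsilon_i
```

Under this form, OLS selects the coefficients that minimize the sum of squared residuals over all runs. A quadratic effect of concentration, as mentioned above, would correspond to one additional term in the squared concentration.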
OLS is a suitable methodology when the factors are uncorrelated. However, when they are correlated, OLS estimates become unstable and the model predictions unreliable. This is known as the collinearity problem of OLS, which can be diagnosed with the support of statistical metrics, such as the Variance Inflation Factor (VIF). When collinearity is a problem, multivariate techniques such as Partial Least Squares (PLS) regression should be applied instead. PLS finds the linear combinations of factors that present maximum covariance with the response, and derives a model linking such linear combinations (or latent variables) and the response. In this way, collinearity is no longer a problem, as correlated variables appear together in the linear combinations, which act as new predictors. The estimated model can be recast in the format of a conventional regression model, the only difference being that the model parameters are estimated using different estimation principles [20].
In the present study, five predictor variables were considered, corresponding to different characteristics of the flocculants: the hydrophobic content (design factor), the number of methylene groups in the hydrophobic aliphatic chain (Nr of carbons in hydrophobic chain; design factor), the concentration of polymer in the process (design factor), the charged fraction (covariate) and the hydrodynamic diameter of the polymer (Rh) (covariate). Molecular weight was not considered as a predictor, since there is a linear correlation between molecular weight and hydrodynamic diameter for the same type of polymer [22], and thus there is no need to include both in the model (mitigating, in this way, the collinearity problem). The response variables are the scattering exponent (SE) and median floc size (d0.5), both determined after 6 min of flocculation.
Different combinations of the design factor levels were considered according to the selected DOE design, and the experimental responses were recorded together with the values for the covariate variables (Table 2).
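As an illustration of how a full factorial design with a randomized run order can be generated, a minimal Python sketch follows. The factor names and level values are hypothetical placeholders chosen for the example (only the concentration range of 3.3 to 13 mg/L comes from the text); they are not the levels actually used in the study.

```python
import itertools
import random

# Hypothetical factor levels, for illustration only (not the study's design table).
FACTOR_LEVELS = {
    "hydrophobic_content_mol_pct": [1, 3],
    "carbons_in_hydrophobic_chain": [2, 12, 18],
    "concentration_mg_per_L": [3.3, 8.0, 13.0],
}


def full_factorial(levels):
    """Return every combination of factor levels as a list of dicts."""
    names = list(levels)
    combos = itertools.product(*(levels[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def randomized_run_order(runs, seed=0):
    """Return a shuffled copy of the runs; the seed keeps the order reproducible."""
    rng = random.Random(seed)
    shuffled = list(runs)
    rng.shuffle(shuffled)
    return shuffled


design = full_factorial(FACTOR_LEVELS)
run_order = randomized_run_order(design)
print(len(design))  # 2 * 3 * 3 = 18 runs
```

Randomizing the run order, as done in the experimental trials, helps prevent time-related drifts from being confounded with the factor effects.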

Results and Discussion
In order to maximize the insights extracted from data analysis, the experimental results were treated from different perspectives. More specifically, the responses (SE, d0.5) were modelled separately, and the type of polymer employed (copolymer or terpolymer) was considered both separately and altogether. The analysis of each situation is reported next, in separate subsections:
• Case 1: The prediction of the SE for copolymers.
• Case 2: The prediction of the d0.5 for copolymers.
• Case 3: The prediction of the SE for terpolymers.
• Case 4: The prediction of the d0.5 for terpolymers.
• Case 5: The prediction of the SE for copolymers and terpolymers.
• Case 6: The prediction of the d0.5 for copolymers and terpolymers.

Case 1: The Prediction of the SE for Copolymers
In the case of copolymers, the predictor set (i.e., the set of experimental factors or input variables) is formed by the design factor concentration, and the covariates hydrodynamic diameter and charged fraction. A model was developed using OLS, resulting in the significant effects presented in Figure 1. The p-value reflects the significance of each effect: the lower the p-value, the more significant the effect is (i.e., the more likely it is to be different from zero). A p-value below 0.01 (the significance level adopted) is considered to be statistically significant. LogWorth is defined as −log10(p-value). This transformation adjusts p-values to a more appropriate scale. A value of LogWorth exceeding 2 is considered to be significant (it corresponds to a p-value < 0.01). The bar graph shows the LogWorth values and a line at 2 for reference of statistical significance. The hydrodynamic diameter is the most important factor in this case, followed by the charged fraction and concentration.
Figure 2 shows the summary of fit report. RSquare estimates the proportion of variation in the dependent variable that is explained by the estimated regression model. An RSquare close to 1 indicates a good fit to the experimental data, whereas an RSquare near 0 indicates that the model is unable to explain the variation of the response. For this specific case, RSquare is about 0.97, suggesting a good fitting ability. RSquare Adj is the RSquare adjusted for the number of parameters in the model. Root Mean Square Error estimates the standard deviation of the random error.
The parameter estimates report (Figure 3) shows the estimates of the model parameters and the associated significance (Prob>|t|, the p-value associated with each coefficient). Std Error represents the standard deviation of the estimated coefficients. The VIF (Variance Inflation Factor) checks for the presence of collinearity in the predictors. High VIFs indicate a collinearity issue among the terms in the model. This value should be lower than 5 and never larger than 10. The VIF values obtained in this case indicate no collinearity issues, suggesting that the regression method used is suitable for describing the experimental data.
Figure 4 shows an actual by predicted plot, which plots the observed values against the predicted values of the response. It is possible to verify that all the observations lie within the prediction intervals, confirming the stability and accuracy of the model developed for the SE for the copolymers.
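The LogWorth transformation and the significance threshold described in this case can be sketched in a few lines of Python. The effect names and p-values below are hypothetical and serve only to illustrate the calculation; they are not the values reported in Figure 1.

```python
import math


def logworth(p_value):
    """LogWorth = -log10(p-value); larger values mean stronger evidence of an effect."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must lie in the interval (0, 1]")
    return -math.log10(p_value)


def is_significant(p_value, alpha=0.01):
    """An effect is flagged when its LogWorth exceeds -log10(alpha), i.e. 2 for alpha = 0.01."""
    return logworth(p_value) > -math.log10(alpha)


# Hypothetical p-values, for illustration only.
example_effects = {"effect_A": 0.0001, "effect_B": 0.004, "effect_C": 0.05}

for name, p in example_effects.items():
    print(name, round(logworth(p), 2), is_significant(p))
```

Working on the LogWorth scale makes very small p-values easier to compare on a bar graph, with the reference line at 2 marking the 0.01 significance level.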

Case 2: The Prediction of the d0.5 for Copolymers
For this case, considering the effect summary table (Figure 5), the variables with the highest positive contribution to floc size are concentration and hydrodynamic diameter, but the influence of charged fraction, even though smaller, cannot be overlooked. Figure 6 shows the corresponding summary of fit report for this case. The RSquare achieved is about 0.92, also indicating a good fitting ability. The parameter estimates report (Figure 7) shows the VIF values obtained for this case, which are all below 5, indicating no collinearity issues and suggesting that the regression method used is suitable for this model.
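To make the collinearity check concrete, the following Python sketch computes VIF values from a set of predictor columns, using the standard identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictor correlation matrix. This is an illustrative implementation, not the software routine used by the authors, and the example data are hypothetical coded levels.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of predictor columns (lists of equal length)."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        mean = sum(col) / n
        dev = [value - mean for value in col]
        norm = math.sqrt(sum(d * d for d in dev))
        scaled.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in scaled] for ci in scaled]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """VIF_j is the j-th diagonal element of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical coded levels of two orthogonal design factors: each VIF is 1.
orthogonal = [[-1, -1, 1, 1], [-1, 1, -1, 1]]
print(variance_inflation_factors(orthogonal))
```

Orthogonal design factors give VIF values of 1, while covariates that are not manipulated in the design can push the VIF upwards, which is why the threshold of 5 is checked in each case.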

Case 3: The Prediction of the SE for Terpolymers
In the case of terpolymers, the predictor set is composed of the design factors (the hydrophobic content, the number of methylene groups in the hydrophobic monomer and the polymer concentration) and the covariates (the charged fraction and the hydrodynamic diameter). For this case, according to the effect summary table (Figure 9), the charged fraction is the factor having the highest contribution to the response, followed by the hydrodynamic diameter, the number of methylene groups in the hydrophobic chain, the interaction between the charged fraction and hydrodynamic diameter, and that between the number of methylene groups in the hydrophobic aliphatic chain and the charged fraction. Still, as in the previous cases, the two main parameters most influencing the response are the charged fraction and hydrodynamic diameter. For this specific case, the charged fraction seems to have even more influence than the hydrodynamic diameter, in contrast with the last two cases (for copolymers), perhaps due to the superior performance of the 50 polymer series, when compared with the 80 series, for the terpolymers tested in this effluent.

The summary of fit report (Figure 10) presents an RSquare of about 0.81, indicating a good fitting ability. The parameter estimates report is presented in Figure 11. The VIF values obtained in this case are also below 5, indicating no collinearity issues and leading to the conclusion that the model is suitable for this case.
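The fit statistics quoted in the summary of fit reports (RSquare, RSquare Adj and Root Mean Square Error) follow from the observed and predicted responses. The Python sketch below uses the usual textbook definitions; the function name, argument names and the sample numbers are hypothetical and are not taken from the study.

```python
import math


def fit_summary(observed, predicted, n_terms):
    """Return RSquare, adjusted RSquare and RMSE.

    n_terms is the number of estimated model terms excluding the intercept.
    """
    n = len(observed)
    mean = sum(observed) / n
    ss_total = sum((y - mean) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    dof = n - n_terms - 1  # residual degrees of freedom
    r_square = 1.0 - ss_resid / ss_total
    r_square_adj = 1.0 - (1.0 - r_square) * (n - 1) / dof
    rmse = math.sqrt(ss_resid / dof)
    return {"r_square": r_square, "r_square_adj": r_square_adj, "rmse": rmse}


# Hypothetical responses and predictions, for illustration only.
summary = fit_summary([1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.2, 3.8, 5.0], n_terms=1)
print(summary)
```

Because the adjusted RSquare penalizes every additional term, it is the more cautious figure to compare when models for copolymers and terpolymers contain different numbers of effects.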

Case 4: The Prediction of the d0.5 for Terpolymers
Following the same analysis steps for Case 4, one obtains the effect summary table presented in Figure 13, where concentration is clearly the most important factor. However, the hydrodynamic diameter, the charged fraction and the interaction between the charged fraction and concentration cannot be neglected. For this case, in the summary of fit report (Figure 14), RSquare is about 0.78, indicating a significant fitting capability.

The parameter estimates report (Figure 15) shows the VIF values obtained in this case, which are below 5, indicating no collinearity issues and suggesting that the regression method used is suitable for this model. Figure 16 shows the actual response against the predicted response for this case. The model describes most of the observations quite well. However, a higher number of points fall outside the prediction interval. It must be stressed that the models considering terpolymers have higher numbers of observations, making it more difficult to derive a single model fitting all the points.

Case 5: The Prediction of the SE for Copolymers and Terpolymers
We have also investigated the possibility of building a global model to simultaneously fit all of the polymers considered. The last two cases reported here correspond to this analysis. In this case, we have adopted the PLS estimation approach, due to the appearance of collinearity when all of the data are combined. The estimated PLS model explains 90% of the variability in SE (with eight latent variables). The variable importance plot represents the variable importance in projection (VIP) values for each predictor variable in a PLS model (Figure 17), which is a measure of their relevance in the model. If a variable has a small coefficient and a small VIP, then it can be concluded that its importance is reduced. A value of 0.8 is generally considered to be a significant VIP, and a blue line is drawn on the plot at 0.8 to better identify these situations. Seven variables have VIP values above 0.8. In this case, the hydrodynamic diameter, the interaction of the hydrophobic content and charged fraction, and the interaction of the number of methylene groups in the aliphatic chain and the hydrodynamic diameter seem to be the variables that have greater influence on the response, even if the hydrophobic content and charged fraction also have a quite high influence in the model.
The influence of polymer characteristics on the SE values was evaluated using the current regression method (Figure 18). It is possible to verify the presence of a noticeable interaction effect triggered by the hydrophobic content. When there is no hydrophobic content (copolymers only), the SE values decrease with the increase of the hydrodynamic diameter and concentration, since when there is a high amount of polymer and the polymer chains are longer, there is more space between bridged particles, leading to more open flocs (a lower SE). Understandably, in this case, there is no influence of the number of methylene groups on the response SE.
When the hydrophobic content is at a maximum (terpolymers with 3 mol% of hydrophobic content) (Figure 19), the SE values increase with the increase of the charged fraction, hydrodynamic diameter and number of methylene groups in the hydrophobic chain. Thus, the presence of the hydrophobic monomer has the capability to change the way the SE depends on other predictors, such as the charged fraction and hydrodynamic diameter. The influence of the hydrophobic chain and charged fraction can be attributed to the increased regions in the polymer chain that are able to interact with the oily effluent particles, increasing the compactness of the flocs (higher SE values).
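The VIP measure used in this case can be sketched with the commonly used definition for a single-response PLS model, in which each predictor's squared normalized weight is averaged over the latent variables, weighted by the response variation each latent variable explains. The weight vectors and explained sums of squares below are hypothetical inputs that would normally come from a fitted PLS model; the software used by the authors may differ in detail.

```python
import math


def vip_scores(weights, explained_ss):
    """Variable importance in projection for a single-response PLS model.

    weights: one weight vector per latent variable, each of length p.
    explained_ss: response sum of squares explained by each latent variable.
    """
    n_predictors = len(weights[0])
    total_ss = sum(explained_ss)
    scores = []
    for j in range(n_predictors):
        accumulated = 0.0
        for w, ss in zip(weights, explained_ss):
            norm_sq = sum(v * v for v in w)
            accumulated += ss * (w[j] ** 2) / norm_sq
        scores.append(math.sqrt(n_predictors * accumulated / total_ss))
    return scores


# Hypothetical two-component model with three predictors, for illustration only.
example_weights = [[0.8, 0.6, 0.0], [0.0, 0.6, 0.8]]
example_ss = [3.0, 1.0]
vips = vip_scores(example_weights, example_ss)
relevant = [j for j, v in enumerate(vips) if v > 0.8]  # indices above the 0.8 line
print(vips, relevant)
```

By construction, the squared VIP values average to 1 across predictors, which is why a cut-off somewhat below 1, such as the 0.8 line drawn in the plots, is used to separate relevant from less relevant variables.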

Case 6: The Prediction of the d0.5 for Copolymers and Terpolymers
A PLS model was also used to handle the collinearity issues in this case (89% of the Y-variability explained with seven latent variables). The variable importance plot for case 6 is presented in Figure 20. The concentration and the interaction between the hydrophobic content and the hydrodynamic diameter appear to be the variables with the largest influence on floc size, although the hydrodynamic diameter also has a significant impact on the response. The interaction between the hydrophobic content and the number of carbons, the interaction between the hydrophobic content and the concentration, and the interaction between the number of carbons and the charged fraction have the shortest bars, indicating that they are not particularly correlated with floc size.

The influence of the polymer characteristics on the median floc size can be observed in the profiles shown in Figure 21. When there is no hydrophobic content (copolymers only), the median floc size increases with the charged fraction, hydrodynamic diameter and concentration, because a larger number of attachment regions in the polymer chain is available to adsorb particles, which consequently leads to larger flocs. As expected, the number of methylene groups has no influence on the d0.5.
When the hydrophobic content is at its maximum (terpolymers with 3 mol% of hydrophobic content) (Figure 22), the influence of the charged fraction and hydrodynamic diameter is the opposite of that observed when there is no hydrophobic content. Once again, the hydrophobic content strongly influences the trends for the other predictors. The influence of the number of methylene groups seems to be stronger when the polymer has a higher hydrophobic content. A larger number of methylene groups led to smaller flocs, suggesting that there is an ideal hydrophobic chain length for obtaining large flocs and high flocculation efficiency. This interaction effect is quite interesting and offers new possibilities for developing better flocculants in the future.
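The trend reversal described above is what a two-factor interaction term produces in a regression model. As a minimal sketch, assume a response of the form y = b0 + b_main·x + b_h·h + b_int·x·h, where x is a predictor such as the hydrodynamic diameter (in coded units) and h is the hydrophobic content in mol%. The slope of the response with respect to x is then b_main + b_int·h, which changes sign at h = −b_main/b_int. The coefficient values and function names below are hypothetical, chosen only for illustration; they are not the fitted coefficients of the models discussed here.

```python
def slope_wrt_predictor(b_main, b_int, hydrophobic_content):
    """Slope dy/dx for y = b0 + b_main*x + b_h*h + b_int*x*h at a given h."""
    return b_main + b_int * hydrophobic_content


def crossover_content(b_main, b_int):
    """Hydrophobic content at which the slope with respect to x changes sign."""
    if b_int == 0:
        return None  # no interaction, so the trend never reverses
    return -b_main / b_int


# Hypothetical coefficients (illustrative only, not fitted values).
B_MAIN = 2.0   # effect of x when there is no hydrophobic content
B_INT = -1.0   # change in that effect per mol% of hydrophobic content

slope_copolymer = slope_wrt_predictor(B_MAIN, B_INT, 0.0)   # copolymers, 0 mol%
slope_terpolymer = slope_wrt_predictor(B_MAIN, B_INT, 3.0)  # terpolymers, 3 mol%
crossover = crossover_content(B_MAIN, B_INT)

print(slope_copolymer, slope_terpolymer, crossover)
```

With these illustrative values, the response increases with x for copolymers and decreases for terpolymers at 3 mol%, and the crossover lies inside the studied range of hydrophobic content.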

Conclusions
The experimental design and data analysis made it possible to infer which polyelectrolyte characteristics are critical for the size and structure of the flocs, and how they affect these properties quantitatively. The hydrodynamic diameter, charged fraction and concentration are the parameters with the greatest influence on the size and structure of the flocs obtained with the copolymers studied, while the charged fraction, concentration and hydrophobic content proved to have more influence on the characteristics of the flocs produced using the terpolymers developed. The effect of the hydrophobic content suggests that hydrophobicity is favourable for the flocculation of oily effluents; however, there is an optimum hydrophobic content for floc size, above which the effect is no longer beneficial.
Finally, the analysis conducted showed that there is a strong dependence of the responses upon the level of hydrophobic content employed, as the trends regarding the remaining factors can even be inverted by manipulating this factor. This effect is mostly due to the distinct interaction between the effluent oily particles and the hydrophobic chains. These results open new perspectives for developing flocculants with tailor-made characteristics for processing a given effluent of interest.
