Partial Least Square Model (PLS) as a Tool to Predict the Diffusion of Steroids Across Artificial Membranes

One of the most challenging goals in modern pharmaceutical research is to develop models that can predict a drug's behavior, particularly its permeability in human tissues. Since permeability is closely related to molecular properties, numerous characteristics are needed to develop a reliable predictive tool. The present study attempts to decode permeability by correlating the apparent permeability coefficient (Papp) of 33 steroids with their physicochemical and structural properties. The Papp of the molecules was determined by in vitro experiments and used as the Y variable in a Partial Least Squares (PLS) model, while 37 pharmacokinetic and structural properties were used as X descriptors. The developed model was subjected to internal validation and appears robust, with good predictive potential (R2Y = 0.902, RMSEE = 0.00265379, Q2Y = 0.722, RMSEP = 0.0077). Based on the results, specific properties (logS, logP, logD, PSA and VDss) proved to be more important than others for drug Papp. The models can be used to predict the permeability of a new candidate drug, avoiding needless animal experiments as well as time- and material-consuming experiments.


Introduction
Steroids are an important category of active pharmaceutical ingredients (APIs). Their structure is characterized by a rigid steroid ring system of cyclopentane-perhydro-phenanthrene, or sterane [1]. Steroids are small lipophilic molecules and, based on their genomic characteristics, they can enter the target cell by a passive diffusion mechanism (mainly by the transcellular route) across plasma membranes [2,3]. As they are derived from cholesterol, they are insoluble in water, and they have many pharmacologic effects in almost every major system of the body, including the endocrine, cardiovascular, musculoskeletal, nervous and immune systems [4]. Owing to their properties, they can be administered through almost every available route, such as oral [5], buccal [6,7], transdermal [8], vaginal [9], otic [10], ocular [11], nasal [12], inhalation [13] and intravenous [14].
A candidate drug should have appropriate physicochemical and pharmacological properties in order to successfully pass pre-clinical and clinical trials. Such a compound must exhibit an acceptable pharmacokinetic profile in terms of absorption, distribution, metabolism, excretion and tolerable toxicity (ADMET). The simultaneous optimization of these processes is one of the main challenges of current pharmacological research [15][16][17]. Unfortunately, the methods involved are laborious and extremely time-consuming, typically requiring 10-13 years [18,19].
Nowadays, there is a huge number of new candidate drugs that are designed and synthesized in the laboratory. In order to minimize the consumed cost and time, the pharmacokinetic behavior of the compound can be predicted using computational tools (e.g., cheminformatics) providing reliable pharmacokinetic models [15].
Quantitative structure activity/property relationship (QSAR/QSPR) studies correlate the physicochemical properties of a compound to biological activity [16]. Such studies have been extensively used for developing predictive models in which the chemical structures and biological properties are correlated. Alternately, such data could be obtained through in vitro, ex vivo and in vivo experiments [15].
Due to development of cheminformatics, there are plenty of QSAR modeling techniques, such as support vector machine (SVM), artificial neural networks (ANNs), multiple linear regression (MLR), principal component analysis (PCA) and partial least squares (PLS) regression [15]. The PLS method provides the possibility for linear correlation of numerous observations and multiple X variables with one or more Y variables [17].
Generally, PLS is a rapid and effective method for developing robust and reliable QSAR models. It has been widely used for the design of plenty of predictive patterns, such as for the placental-barrier permeability [18], blood-brain-barrier permeability from simulated chromatographic conditions [19], central nervous system (CNS) drug exposure [20], blood-brain barrier permeation of α-adrenergic and imidazoline receptor ligands using the parallel artificial membrane permeability assay (PAMPA) technique [21]. Additionally, PLS tool was used to discover potent Wee1 inhibitors [22], to evaluate 2-cyano-pyrimidine analogs as cathepsin-K inhibitors [23] and also to characterize the performance of dry powder inhalers [24].
The main scope of this research is to develop a new model that would be able to predict the permeability of a compound having the chemical structure of steroids. This approach is based on the correlation of its characteristics (physicochemical and structural properties) with the permeability of the molecule determined by in vitro experiments. In the present study the permeability of 33 steroids has been investigated using vertical Franz type diffusion cells including a synthetic cellulose membrane as model membrane [25]. Due to low water solubility of steroids, solubility enhancers (e.g., Polyethylene Glycol and Polysorbate 80) were used in order to achieve the desirable concentration for each compound. The obtained experimental results were treated using the Partial Least Squares (PLS) methodology. The developed models were validated and were found to be statistically significant with good predictive ability.

Dataset Compilation
The present study involves processing the experimental results with the PLS methodology. The chemometric software Simca-P (Soft Independent Modeling of Class Analogies; version 9; Umetrics, Uppsala, Sweden) [26,27] was used to construct the classical PLS models.
The object of the research was to investigate the effect of several properties of steroids on their permeability across a hydrophilic cellulose membrane. Five models were developed, because the Y response variable was either calculated differently or refers to four separate sampling times after the first hour of the experiments. Models P2h, P4h, P6h and P8h describe the amount of steroid that permeated the artificial membrane at 2 h, 4 h, 6 h and 8 h, respectively (Y variable: permeability, µg/cm²), whereas model Papp uses the apparent permeability coefficient as the Y variable. In the present study, the theoretical explanation of steroid permeability was based mainly on model Papp, which is considered the most important. Each of the five models contained 32 observations (analytes belonging to the steroids) with 46 X variables and one Y variable. The large number of X variables was considered necessary, even though some of them proved to be of minor interest. To build the proposed models, it was important to carefully collect and record the most important properties and structural characteristics of the analytes. Each dataset consists of three parts. The first is the column containing the observations (33 analytes). The second is the main part of each dataset and contains the physicochemical and structural characteristics of the analytes. There are 37 descriptors (physicochemical properties), which were calculated using a series of different software packages or free online databases (Table 1).
In more detail, the studied compounds were imported into the free cheminformatics program Data Warrior [28] in order to predict the clogP (calculated partition coefficient, log(Coctanol/Cwater)), the clogS (water solubility at 25 °C, log mol/L), the number of hydrogen bond acceptors and donors, the number of aromatic rings, carboxyl groups, carbonyl groups and hydroxyl groups, as well as the molecular complexity, the total surface area (Å²), the relative polar surface area (Å²), the polar surface area (PSA, Å²), the shape index, the molecular flexibility and the drug-likeness. Descriptors related to the pharmacokinetic properties of the compounds were calculated by entering the simplified molecular-input line-entry system (SMILES) strings of the drugs into the freely accessible web server pkCSM [29]. The pharmacokinetic properties employed were Caco2 permeability (log Papp), intestinal absorption (% absorption), skin permeability (log Kp), volume of distribution at steady state (VDss, log L/kg), blood-brain barrier (BBB) permeability (logBB), CNS permeability (logPS) and total clearance (log mL/min/kg).
The melting point (°C) of the compounds was obtained from the Open Melting Point Dataset [30] and from EPA DSSTox [31]. The topological polar surface area (Å²) [32] was also obtained from PubChem data [33]. Moreover, Marvin, a free ChemAxon tool [34], was used to draw and characterize the chemical structures of the compounds for the calculation of pKa, logP, number of rings, distribution coefficient (logD) at pH 7.4 and water solubility at 25 °C (logS, log mol/L). The molar volume Vm (cm³), molar refractivity (cm³), PSA (Å²) and polarizability (cm³) were obtained via ACD/Labs [35]. All the above descriptors represented the X variables of the model developed and they are summarized in Table 1.
It is important to mention that some descriptors (e.g., logP) were calculated using more than one software program since there was a need to confirm their dominant role in the model. The structural features were found in the constitutional parameters and are outlined with nine descriptors used to decode the chemical structure of the analytes on the same basis. This was achieved by using integer numbers and zero to indicate the presence, the multiplicity or the absence of a structural characteristic.
The third part of the PLS dataset is a column with the Y variable, which corresponds to the permeability of the drugs calculated from the Franz cell experiments. The Y variable is expressed as the apparent permeability Papp, or as the permeability P2h, P4h, P6h or P8h at the different sampling times (2 h, 4 h, 6 h, 8 h).
Variable Importance in the Projection (VIP) column plots provide information about the importance of the parameters in the dataset. However, apart from the importance of a descriptor in a model, it is crucial to know whether its impact on the response is positive or negative. For this purpose, it was necessary to evaluate the loadings plots (w*c[1] versus w*c[2]) of the models for the first two components.
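To make the VIP idea concrete, the following is a minimal sketch of a single-response PLS fit (NIPALS) followed by a commonly used VIP formula. It is not the Simca-P implementation; the function names, the synthetic data and the use of NumPy are assumptions made for illustration only.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Fit a single-response PLS model by NIPALS on autoscaled data."""
    X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    y = (y - y.mean()) / y.std(ddof=1)
    weights, scores, y_loadings = [], [], []
    for _ in range(n_components):
        w = X.T @ y
        w = w / np.linalg.norm(w)      # unit-length weight vector
        t = X @ w                      # scores of this component
        p = X.T @ t / (t @ t)          # X loadings
        c = y @ t / (t @ t)            # y loading of this component
        X = X - np.outer(t, p)         # deflate X
        y = y - c * t                  # deflate y
        weights.append(w)
        scores.append(t)
        y_loadings.append(c)
    return np.array(weights).T, np.array(scores).T, np.array(y_loadings)

def vip_scores(W, T, c):
    """VIP_j = sqrt(p * sum_a(SS_a * w_ja^2) / sum_a(SS_a)), with SS_a = c_a^2 * t_a't_a."""
    n_vars = W.shape[0]
    ss = c ** 2 * np.sum(T ** 2, axis=0)   # Y variation explained per component
    return np.sqrt(n_vars * (W ** 2 @ ss) / ss.sum())

# Synthetic illustration (not the study data): only descriptor 0 drives y.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=30)
W, T, c = pls1_nipals(X, y, n_components=2)
vip = vip_scores(W, T, c)
```

With this definition the squared VIP values average to one, which is why VIP > 1 is a natural threshold for an above-average descriptor. The sign of a descriptor's first-component weight (here `W[j, 0]`) indicates whether it pushes the response up or down, which is the information read from the loadings plot.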

Validation
Normalization of the observations (values of both X and Y variables) was achieved using mean centering and unit variance scaling. Validation of the PLS models was performed using three techniques: cross-validation (CV), external validation and internal validation [26,36].
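Mean centering with unit variance scaling (often called autoscaling) can be sketched in a few lines. The helper name and the example values below are hypothetical, and the sample standard deviation is assumed as the scaling factor.

```python
from statistics import mean, stdev

def autoscale(column):
    """Mean-center a column and scale it to unit variance (sample standard deviation)."""
    m, s = mean(column), stdev(column)
    if s == 0:
        raise ValueError("a constant column cannot be scaled to unit variance")
    return [(value - m) / s for value in column]

logp = [1.5, 2.0, 3.5, 3.0]   # hypothetical logP values, for illustration only
scaled = autoscale(logp)
```

After scaling, every descriptor has mean 0 and standard deviation 1, so descriptors measured in very different units contribute on an equal footing.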
First, cross-validation (CV) was performed by dividing the data into seven parts; each 1/7th of the samples was excluded in turn and a model was built with the remaining 6/7th. The Y values for the excluded data were then predicted by this new model and the procedure was repeated until all samples had been predicted once. If the original model is valid, the plot of predicted versus measured Y values will be a straight line with the RMSEE (Root Mean Square Error of Estimation) as low as possible (Figure 1), calculated from Equation (1).
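The body of Equation (1) is not reproduced in this text. A commonly used form of the root mean square error, assumed here rather than taken from the source, is:

```latex
\mathrm{RMSEE} = \sqrt{\frac{\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^{2}}{n}}
```

Some chemometric packages divide by the residual degrees of freedom (for example n − A − 1, with A the number of components) instead of n; which variant was used is not stated here.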
In Equation (1), ŷi represents the estimated Papp value for the i-th object and yi the reference Papp value. The prediction error sum of squares (PRESS) is a good measure of the predictive power of the model and indicates whether a component is significant (a component is considered significant when PRESS/residual sum of squares < 1). Using the appropriate number of significant components, the total models were fitted (Table 2) according to the Haaland and Thomas criteria [37]. (ȳ represents the mean of the true Papp values in the predictor set.)
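The seven-part cross-validation and the PRESS statistic described above can be sketched as follows. To stay self-contained, the sketch swaps in a one-variable least-squares line for the PLS model; the fold assignment (every k-th sample), the data and the function names are illustrative assumptions, and Q2 is taken as 1 − PRESS/SS, with SS the sum of squares of y about its mean.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares line; a stand-in for the PLS model (assumption)."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def cross_validated_q2(xs, ys, k=7):
    """Leave out every k-th sample in turn, predict it, and return (PRESS, Q2)."""
    predictions = [None] * len(ys)
    for fold in range(k):
        held_out = set(range(fold, len(ys), k))
        train = [i for i in range(len(ys)) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            predictions[i] = slope * xs[i] + intercept
    y_mean = mean(ys)
    press = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    total_ss = sum((y - y_mean) ** 2 for y in ys)
    return press, 1 - press / total_ss

# Hypothetical data: a straight line with a small alternating disturbance.
xs = list(range(14))
ys = [2 * x + 1 + (0.1 if x % 2 else -0.1) for x in xs]
press, q2 = cross_validated_q2(xs, ys)
```

Every sample is predicted exactly once by a model that never saw it, so PRESS and Q2 measure predictive rather than fitting ability.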
The reliability of the models was also verified with the response permutation methodology (internal validation). In this process, the Y data are not changed but are randomly reordered. The PLS model is then fitted again to the permuted Y data and the R2Y and Q2Y values are recalculated. These are compared with the initial values, providing a first indication of the validity of the model. The process is repeated (20 permutations per model) and the results provide a statistical evaluation of the significance of the R2Y and Q2Y parameters of the initial model. In the resulting diagram, the y-axis represents the R2Y/Q2Y values of all models and the x-axis represents the correlation coefficient between the permuted and initial responses. To summarize the results, regression analysis is applied to both R2Y and Q2Y and the regression lines are obtained. The original estimates are considered statistically significant when the intercepts of these lines fall within the limits for permutations (Figure 2), which are set to R2Y < 0.3−0.4 and Q2Y < 0.05 [38].
External validation was performed by dividing the data set of model Papp into two equal parts, a training set and a test set. The model was calculated on the training set and used to predict the test set, and then the roles of the two sets were swapped. The quality of the external prediction was assessed by Q2 (Q2 train = 75.4, Q2 test = 71.5) and by the Root Mean Square Error of Prediction (RMSEP) from Equation (2), which was equal to 0.00770361 for the training set and 0.00764925 for the test set.
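The response permutation test described above can be illustrated with a short sketch. A one-variable least-squares line again stands in for the PLS model (for such a line, R2 equals the squared Pearson correlation); the data, the seed, the function names and the use of the absolute correlation on the x-axis are assumptions for illustration.

```python
import random
from statistics import mean

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5

def permutation_intercept(xs, ys, n_permutations=20, seed=0):
    """Shuffle y, refit, regress R2 on |corr(y_perm, y)| and return the intercept."""
    rng = random.Random(seed)
    corr = [1.0]                         # the original y correlates perfectly with itself
    r2 = [pearson_r(xs, ys) ** 2]        # R2 of the original (stand-in) model
    for _ in range(n_permutations):
        y_perm = ys[:]
        rng.shuffle(y_perm)
        corr.append(abs(pearson_r(y_perm, ys)))
        r2.append(pearson_r(xs, y_perm) ** 2)
    mc, mr = mean(corr), mean(r2)
    slope = (sum((c - mc) * (r - mr) for c, r in zip(corr, r2))
             / sum((c - mc) ** 2 for c in corr))
    return mr - slope * mc

# Hypothetical data with a genuine x-y relationship.
xs = list(range(20))
ys = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]
intercept = permutation_intercept(xs, ys)
```

A low intercept means that models built on scrambled responses explain little, so the fit of the original model is unlikely to be a chance result.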
The fact that the two estimates are similar means that the two subsets carry similar information and can be combined into a single data set. External prediction may also be used to predict the Y values of new entities, in other words, entities excluded from the data set. Hence, the model can be applied either to interpret the behavior of a steroid based on its physicochemical properties or to predict the behavior of an unknown drug in the human body. PLS regression analysis is appropriate since it uses linear correlations and at the same time predicts with high reliability.

Interpretation of Steroids Permeability Through PLS
The permeability of a group of steroids across an artificial membrane was estimated using a hydrophilic cellulose membrane and the apparent permeability coefficient values were calculated. The main PLS model, Papp, was established using 32 compounds and a 47-descriptor analysis aimed at identifying the most critical molecular properties that influence permeability across the artificial membrane. According to the VIP plot of the Papp model (Figure 3), logS, logP, logD (at pH 5.5 and 7.4), PSA (topological and relative) and VDss were found to be the most influential descriptors (VIP > 1) on the apparent permeability of the tested steroids through the cellulose membrane. All the other descriptors were found to have a similar and non-discriminating effect on the permeability of the tested compounds (VIP < 1). Further information on the positive or negative effect of the X variables on the permeability is derived from the w*c[1] versus w*c[2] scatter plot for the Papp model in Figure 4. Drug dissolution is almost always a precondition for adequate permeability and absorption and, therefore, poor aqueous solubility is commonly associated with limited drug bioavailability [39]. It has also been shown that poor solubility may originate from high lipophilicity, resulting in poor permeability [40]. Consistent with this consensus, the findings of the current study support the positive contribution of logS (marked red in Table 3) and the respective negative effect of logP (marked blue in Table 3) on the Papp of the tested steroids. PSA has recently been recognized as a useful predictor of permeability. It defines the polar part of a molecule and correlates with passive molecular transport through membranes. It has been previously observed that compounds with PSA < 60 Å² are highly permeable, in contrast to those with PSA > 120 Å², which are poorly permeable [41]. In that context, optimal permeability has been recognized when PSA is below 120 Å².
Apart from prednisolone 21-sodium succinate (PSA = 141 Å²), which has been classified as an outlier, PSA values for all steroids evaluated in the present study were below the cutoff value (PSA < 110 Å²) suggested for the identification of poorly permeable compounds. Even though it has been recognized that lower PSA contributes to higher permeability [41], that trend was not confirmed here, mainly because of the relatively narrow range of PSA values among the tested steroids.
Lipophilicity is considered one of the main factors with a positive effect on drug permeation across biological membranes. However, an inverse relationship between logP and permeability may be encountered upon increasing drug lipophilicity, due to a greater tendency for drug partitioning from the aqueous phase to the membrane [42]. It has been previously proposed that steroid permeation through a cellulose acetate membrane is a sequence of adsorption and desorption events with an intermittent membrane diffusion process, with the latter being dependent on the permeant's molecular size, its interaction with the membrane and the membrane's structural characteristics [43]. Such interaction might be favored with decrease in steroid polarity because, despite its hydrophilic nature, cellulose acetate remains more hydrophobic relative to the water [43]. Even though molecular size and polarity (with the latter being typically expressed as PSA or H-donors and acceptors) have been adversely associated with drug permeation [44,45], a positive correlation between steroids' polarity and permeability has been previously recognized. In particular, among three oestrogens of similar molecular size and distinctive polarities, an increase in permeability was observed with decreasing steroid-membrane interactions [46]. An inverse correlation between clogP and steroid permeability across Caco-2 cell monolayers was also recognized by Faasen et al., [47]. The results demonstrated a faster diffusion of the more hydrophilic steroids across the cell monolayers compared to the more hydrophobic ones. These findings coincide with the findings of the current study, concluding that steroids with lower logP gravitate towards a higher permeability.
Volume of distribution at steady state (VD ss ) is rendered as a solid indicator of drug distribution in the body reflecting its ability to permeate membranes and bind in tissues. Certain criteria have been defined to discriminate between drugs with high and low VD ss . LogP has been shown to be a significant determinant of VD ss , which along with the presence of Cl atoms and molecule compactness, have a positive contribution on this descriptor, while polarity and strong electrophiles have a negative contribution on VD ss [48]. High VD ss values (> 42 L), representative of more lipophilic drugs, indicate a high likeliness of drug distribution throughout body tissues, whereas low VD ss values (< 3 L) associate with a predominant location in the systemic circulation [49]. According to the findings of the current study, a negative correlation between P app and VD ss was obtained, which aligns with the positive correlation between logP and VD ss also observed in the present study.
Among the steroids evaluated, 4-chlorotestosterone demonstrated the lowest and prednisone and prednisolone the highest P app value. As illustrated in Figure 5a, the presence of the Cl atom seems to be the most determinant descriptor affecting P app of 4-chlorotestosterone. The chlorine atom as substituent in a molecule has been shown to increase its lipophilicity [19,50]. This positive contribution of Cl atoms to logP justifies the decrease in the apparent permeability of 4-chlorotestosterone, due to the negative correlation between logP and P app , as already demonstrated in the present study. On the other hand, for both steroids showing the highest P app , a combination of the same descriptors (logP and logS) was identified to be the most discriminative (Figure 5b,c). The steroids with the highest aqueous solubility and the lowest lipophilicity tend to diffuse faster across the cellulose membrane, compared to the most hydrophobic and less soluble steroids, which tend to a lower permeability. Additionally, androstanolone was considered as outlier during the first 4 h of the in vitro permeability study, showing significantly higher P app values compared to the rest of the tested steroids. Based on its contribution plot at both 2 h and 4 h, it is evident that a combination of parameters related to the molecular size of androstanolone (number of double bonds, shape index, molar refractivity, polarizability, MW) are lower than the respective average values of the tested compounds, whereas pK a was found to be higher compared to the average pK a values of the tested steroids. Since all steroids remain unionized in the pH used in the current study, the contribution of pK a to P app may be considered negligible. On the other hand, results signify the importance of molecular size on P app with an inverse relationship existing between the two.
As already mentioned, drug diffusion across membranes consists of a series of events including drug transfer from the hydrophilic aqueous environment of the donor compartment, through the more hydrophobic (relative to the water) membrane to the hydrophilic aqueous environment of the receptor compartment. The ease of drug diffusion may be explained by elucidating the most significant parameters affecting drug permeability in a time-dependent manner. As seen in Table 3, the most critical descriptors affecting the amount of drug permeated at 2 h mainly relate to the molecular size of the permeants including shape index, MW and molar volume which is also directly related to the refractivity index and polarizability of the steroids tested [19], all the above are marked green in Table 3.
All these parameters contribute negatively to drug permeation, which could translate to hindering drug diffusion to the receptor phase and, instead, increasing their retention and affinity towards the membrane. This trend seems to change with time, with logS (red marked on Table 3) and logP (blue marked on Table 3) being the dominant descriptors affecting drug permeation thereafter.
The utility of in silico models in predicting drug permeability across biological membranes has been recognized as a time-and cost-efficient tool to facilitate drug discovery and development. The PLS model has been previously employed to identify the most critical molecular parameters affecting the permeability and retention of 17b-carboxamide steroids across an artificial membrane (parallel artificial membrane permeability assay (PAMPA)), as a means to predict their permeability across human skin [41]. In another study, Zhang et al., (2015) confirmed the good predictability of the PLS model, highlighting its potential utility as a high-throughput screening tool of placental drug permeability [18]. PLS and the genetic algorithm-PLS method have also been found appropriate in identifying the optimal subset of descriptors that have a significant contribution on drugs' permeability across Caco-2 cell monolayers [51], as well as on in vivo human drug intestinal permeability [52].
The present study is initially considered to be a reliable tool for the development of a theoretical background that will explain the permeability of steroids to biological membranes. In addition, the remarkable ability of PLS models to predict the behavior of drugs increases the usefulness of the proposed technique in designing new more effective steroids.

Reagents, Materials, Solutions
Acetonitrile (ACN) and water (HPLC grade) were purchased from VWR Chemicals (Radnor, USA), and Sigma-Aldrich (Darmstadt, Germany) respectively. For LC-MS analyses, water and ACN were both LC-MS gradient grade and provided by Sigma-Aldrich (Darmstadt, Germany).
The dialysis tubing cellulose membrane (flat width 43 mm) was obtained from Sigma-Aldrich (Darmstadt, Germany).
The corticosteroid substances (Table 4) were United States Pharmacopeia (USP) grade and were obtained from Sigma-Aldrich (Darmstadt, Germany).

Solubility Study
Solubility studies were carried out for the most lipophilic corticosteroid based on its logS value, obtained from Marvin (COMP 9, logS = −5.79, 25 °C). The study was conducted in PBS (pH 7.4) in the presence of polyethylene glycol 200 (PEG 200) and polysorbate 80 (Tween 80), used as co-solvents at different ratios owing to their ability to enhance the water solubility of lipophilic drugs. In detail, an excess amount of the drug was added to the above-mentioned solvent mixtures and sonicated for 1 h at 30 °C. The mixtures were then kept under mild agitation for 48 h at room temperature to facilitate dissolution. Any visible remaining drug particulates were removed by centrifugation at 2000× g for 20 min. The supernatants were quantified by HPLC analysis using the conditions described in Table 5. Based on the results of the solubility study, PBS containing 40% (w/w) PEG 200 and 0.2% (w/w) Tween 80 was used as the solvent mixture. The same procedure was followed for all compounds in this solvent mixture and a final concentration of 100 µg/mL was selected for the in vitro permeation studies.

In Vitro Permeation Studies
The cellulose membrane was properly treated and mounted in the Franz diffusion cells (diffusion area 4.9 cm², compartment volume 20 mL). The acceptor compartment was filled with PBS pH 7.4 and the donor compartment was filled with 1 mL of the formulation described above (100 µg/mL of the compounds). Permeation studies were conducted under constant stirring (90 rpm) at 37 °C. Samples of 0.5 mL were withdrawn from the acceptor compartment at predetermined time intervals (30 min, 1 h, 2 h, 4 h, 6 h, 8 h) and replaced with fresh, preheated PBS. Experiments were performed in triplicate for each compound, and blank experiments containing only the medium were also performed. The samples were analyzed by HPLC without any pretreatment.
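Because each 0.5 mL sample removes drug from the acceptor compartment before fresh PBS is added, cumulative amounts are often corrected for the drug withdrawn at earlier time points. The text does not state whether this correction was applied; the sketch below shows the usual calculation with the cell dimensions given above, and the function name and example concentrations are hypothetical.

```python
def cumulative_permeated(concentrations, cell_volume=20.0, sample_volume=0.5, area=4.9):
    """Cumulative amount permeated per unit area (ug/cm2) at each sampling time.

    concentrations: acceptor concentrations (ug/mL) in sampling order.
    """
    amounts, removed = [], 0.0
    for conc in concentrations:
        # drug currently in the cell plus drug taken out with earlier samples
        amounts.append((conc * cell_volume + removed) / area)
        removed += conc * sample_volume
    return amounts

amounts = cumulative_permeated([1.0, 2.0])   # hypothetical acceptor concentrations
```

The second value includes the 0.5 µg removed with the first sample, so the curve is not biased downward by repeated sampling.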
Steady state flux (Jss) was calculated from the slope of the linear section of the plot of the amount of permeated compound per unit area (µg/cm²) against time. The apparent permeability coefficient (Papp) was calculated using Equation (3), where Cd is the initial concentration of the drug in the donor compartment and Jss is the steady state flux.
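The body of Equation (3) is not reproduced in this text. The sketch below assumes the usual relation Papp = Jss / Cd, with Jss obtained as the least-squares slope of the linear section; the function names and the data points are hypothetical, and the resulting unit (cm/h) follows from the units assumed in the comments.

```python
def steady_state_flux(times_h, amounts):
    """Least-squares slope of amount per area (ug/cm2) versus time (h)."""
    n = len(times_h)
    mt, ma = sum(times_h) / n, sum(amounts) / n
    return (sum((t - mt) * (a - ma) for t, a in zip(times_h, amounts))
            / sum((t - mt) ** 2 for t in times_h))

def apparent_permeability(jss, donor_conc):
    """Papp = Jss / Cd (assumed form); ug/cm2/h divided by ug/mL (= ug/cm3) gives cm/h."""
    return jss / donor_conc

times = [2, 4, 6, 8]                     # sampling times in the linear section (h)
amounts = [1.0, 2.0, 3.0, 4.0]           # hypothetical amounts permeated (ug/cm2)
jss = steady_state_flux(times, amounts)
papp = apparent_permeability(jss, 100.0)  # donor concentration of 100 ug/mL
```

Only points from the linear section should be passed to the slope calculation; early points from the lag phase would lower the estimated flux.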

HPLC Experimental Conditions/Method Validation
The drug content was quantified using either an HPLC-UV (High-Performance Liquid Chromatography-Ultraviolet) or an LC-MS (Liquid Chromatography-Mass Spectrometry) instrument. The HPLC-UV setup was equipped with two LC-20AD pumps, a SIL-20AC HT auto-sampler, a CTO-20AC column oven and an SPD-M20A diode array detector (Shimadzu). For LC-MS analysis, a Shimadzu LC-MS 2020 single-quadrupole mass spectrometer with an electrospray ionization (ESI) source was used. A nitrogen gas generator, N2 LCMS (Nitrogen Generator, Claind), was used throughout this study. The temperature of the curved desolvation line was set at 250 °C, the N2 nebulizer gas flow was maintained at 1.5 L/min and the drying gas flow was set at 15 L/min, while the interface voltage was set at 4.5 kV in positive mode. The analytical column temperature was kept constant at 30 °C. The stationary phase was a C18 column (4.6 × 50 mm, 2.5 µm, Shimadzu). The sample injection volume was 5 µL in all cases. The mobile phase was a binary mixture of acetonitrile and water at an appropriate ratio for each compound in order to avoid prolonged analysis times. The HPLC-UV and LC-MS experimental conditions used for the analysis of each compound are described in Table 5.
Both the HPLC-UV and LC-MS methods were validated in-house according to ICH (International Conference on Harmonisation) guidelines [53]. The calibration curve for each compound was linear (r2 > 0.999) in the range from the LOQ to 20 µg/mL (six calibration levels). Regression analysis, LOD (Limit of Detection) and LOQ (Limit of Quantification) values are tabulated in Table 6. Samples were analyzed in triplicate.
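The text does not state which ICH approach was used to derive the limits. One option described in the ICH guideline is based on the residual standard deviation of the calibration line (σ) and its slope (S), with LOD = 3.3σ/S and LOQ = 10σ/S. The sketch below assumes that approach; the function name and calibration data are hypothetical.

```python
def calibration_limits(conc, response):
    """Fit response = slope * conc + intercept and return (slope, LOD, LOQ)."""
    n = len(conc)
    mc, mr = sum(conc) / n, sum(response) / n
    slope = (sum((c - mc) * (r - mr) for c, r in zip(conc, response))
             / sum((c - mc) ** 2 for c in conc))
    intercept = mr - slope * mc
    residual_ss = sum((r - (slope * c + intercept)) ** 2 for c, r in zip(conc, response))
    sigma = (residual_ss / (n - 2)) ** 0.5   # residual standard deviation of the line
    return slope, 3.3 * sigma / slope, 10 * sigma / slope

# Hypothetical six-level calibration (ug/mL versus detector response).
conc = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0]
noise = [0.2, -0.2, 0.1, -0.1, 0.15, -0.15]
response = [10.0 * c + e for c, e in zip(conc, noise)]
slope, lod, loq = calibration_limits(conc, response)
```

Both limits come out in the concentration units of the calibration standards, so they can be compared directly with the lower end of the calibration range.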

Conclusions
An attempt was made to describe, experimentally and theoretically, the ability of a drug to permeate human tissues and be distributed in the body. For this purpose, five different PLS regression models were applied, using the permeability factor Papp as the Y variable, for a series of steroids/drugs versus their physicochemical and structural properties (X variables). The Papp factor was determined by in vitro drug permeability experiments across a cellulose membrane. According to the VIP values of the Papp model, the two factors with the strongest effect were logS and logP, which dominate the phenomenon with opposite influences. It is also notable that the permeability of steroids depends on numerous parameters and cannot be attributed to a single factor (physicochemical property or structural feature). Finally, it is worth noting that one of the steroids, 4-chlorotestosterone, which bears a chloro substituent, did not penetrate the membrane at all, which makes it unique.
The PLS model seems to accurately describe this simulation and predict with reliability the behavior for an unknown drug. Based on such databases, researchers could use the information provided to predict whether a drug can be distributed in a tissue via passive transfer.