Key Challenges in Designing CHO Chassis Platforms

: Following the success of and the high demand for recombinant protein-based therapeutics during the last 25 years, the pharmaceutical industry has invested signiﬁcantly in the development of novel treatments based on biologics. Mammalian cells are the major production systems for these complex biopharmaceuticals, with Chinese hamster ovary (CHO) cell lines as the most important players. Over the years, various engineering strategies and modeling approaches have been used to improve microbial production platforms, such as bacteria and yeasts, as well as to create pre-optimized chassis host strains. However, the complexity of mammalian cells curtailed the optimization of these host cells by metabolic engineering. Most of the improvements of titer and productivity were achieved by media optimization and large-scale screening of producer clones. The advances made in recent years now open the door to again consider the potential application of systems biology approaches and metabolic engineering also to CHO. The availability of a reference genome sequence, genome-scale metabolic models and the growing number of various “omics” datasets can help overcome the complexity of CHO cells and support design strategies to boost their production performance. Modular design approaches applied to engineer industrially relevant cell lines have evolved to reduce the time and effort needed for the generation of new producer cells and to allow the achievement of desired product titers and quality. Nevertheless, important steps to enable the design of a chassis platform similar to those in use in the microbial world are still missing. In this review, we highlight the importance of mammalian cellular platforms for the production of biopharmaceuticals and compare them to microbial platforms, with an emphasis on describing novel approaches and discussing still open questions that need to be resolved to reach the objective of designing enhanced modular chassis CHO cell lines.


Introduction
Following the approval of the first biopharmaceutical products by regulatory authorities in the early 1980s [1], innovation in the biopharmaceutical field has triggered the generation of many novel compounds that demonstrated great therapeutic potential towards the treatment of existing and

Model-based Engineering Approaches
The availability of a full genome sequence enables researchers to reconstruct cellular metabolism in silico. This knowledge is captured in genome-scale metabolic models (GSMMs), which are essential for the rational identification of engineering targets [38,39]. It is, however, not obvious how to find a feasible implementation of the aforementioned strategies. Due to the size and complexity of the metabolic model, computational support is necessary in order to narrow down and select promising intervention strategies from the combinatorial universe of possible modifications. Computational strain design methods, many of which are based on constraint based reconstruction and analysis of cellular metabolism [40][41][42][43], are available towards this end [44] and are continuously being refined.
One of the earliest available methods was OptKnock [24]. It utilizes a bi-level optimization framework to find knockouts that maximize both product synthesis rate and growth rate. Since then, several improvements and alternatives evolved, summarised in Table 1 as a brief overview.  [46] and genetic design through local search (GDLS) frameworks [50,64] to predict gene deletion strategies that would achieve growth-coupled production for 20 non-native commodity chemicals (e.g., 1-Butanol, 1-Propanol, etc.). As a highlight, 15 out of 20 cases demonstrated a possible growth-coupling. For the other five cases (e.g., 2-phenylethanol and 4-hydroxybutyrate), it was not possible to determine any adequate interventions [64].
Failing to achieve growth-coupling may indicate a fundamental inability or the over-constraining of design choices [25]. However, growth-coupling strategies often introduce auxotrophies that need to be addressed by appropriate media supplements. For instance, growth-coupled production of itaconic acid in E. coli is infeasible on minimal M9 medium, but possible upon supplementation of methionine and glutamate, among other molecules [65,66]. OptCouple is a recent tool that aims to overcome such shortcomings by predicting more comprehensive intervention strategies consisting of, e.g., gene deletions, knock-ins and medium supplements [61].

Chassis Strains
In the last decades, numerous studies that aimed at improving product yield and productivity in a custom-tailored fashion emerged [67]. Thus, the question arises if there exists a common chassis cell that could act as host for generating many different cell factories [68,69]. Ideally, chassis cells are essentially pre-optimized microbes of minimal functionality that-in a "plug and play" fashion-can be turned into a full-scale cell factory by inserting any pathway that reliably achieves the desired product yield and productivity [70]. This would allow fast generation of new host cell lines, because the cells would have already been optimised before, and would then be reused, needing only minimal adaptation for the production of new products of interest [71]. As an example, the production of organic acids (e.g., succinic acid and malic acid) [72] and biofuels (e.g., n-butanol) [73] was implemented using potential chassis microbial strains.
A promising strategy for the generation of a chassis strain is the optimization of carbon metabolic flux towards the production of a desired product. This can be achieved by the removal of unnecessary genes and pathways from a genome, thus creating cells with high controllability and production efficiency compared to the original strains [74]. Consequently, a lower number of metabolites is produced in the cells, which helps to reroute the resources towards the synthesis of the industrial product of interest [75,76]. The production of a target compound is also often limited by the availability of specific metabolic precursors. To overcome this challenge, chassis strains can be engineered to overexpress biosynthetic pathways for these metabolites. Improvements in the availability of metabolic precursors such as malonyl-CoA, acetyl-CoA and aromatic amino acids were previously described [77][78][79][80].

Dynamic Control of Cell Metabolism
Cells are commonly engineered at the gene or pathway levels. Taking strain engineering one step further, metabolism can even be optimized to dynamically respond to the cell's current state in a way that maximizes the synthesis of a target product. The expression of the product of interest can be timed to start at a certain point during the cultivation, for example when the cell concentration is sufficient [81,82]. The switch could be triggered by the addition of a small molecule or by engineering cells to respond to intracellular or extracellular signals [83]. Such a strategy was successfully applied to E.coli [84,85], where a toggle switch was used to redirect flux from the TCA cycle towards the synthesis of target products after reaching a sufficiently high cell concentration. Further, feedback loops could be put in place. These could activate biosynthetic pathways if precursors become limiting or degradation pathways if toxic byproducts accumulate [86]. If the synthesis of a product requires several steps, the expression of the individual enzymes might be fine-tuned such that no intermediates are accumulated [87]. Thus, by reducing the metabolic burden, additional resources for increasing the production of the product of interest can be freed.

Requirements for Such Strategies to Work
To apply the aforementioned strategies, it is necessary to better understand cells at different levels, such as genome sequence, RNA and protein expression, interactions between various macromolecules, signalling and metabolic pathways [88].
The field of genomics and transcriptomics witnessed a great advance in maturity as a result of the optimization of next generation sequencing (NGS) technologies, which are now routinely used to sequence whole genomes and to quantify gene expression using RNAseq [89]. Based on these information sets on the potential of an organism, GSMMs can be built in order to predict the best and most efficient interventions for metabolic engineering. Advances in other fields, such as proteomics, metabolomics or fluxomics will help to improve the predictive capabilities of these models. Based on these advances, the generation of accurate omics data and their analysis using meaningful models, helps to solve different aspects of the cellular "black box" and to improve our understanding of the genotype-phenotype interactions in cells.
In order to apply these predictions in vivo it is necessary to have robust tools for gene deletions, insertions and editing. The tools at our disposal have become efficient and more rapid with the discovery of the CRISPR/Cas9 system [90,91], the availability of cheap and easy-to-use DNA cloning techniques and the possibility to synthesize large numbers of genes of interest.
Finally, to succeed and to really take advantage of the full "design" space, it is necessary to have a collection of well-characterized synthetic biology parts, such as promoters, terminators, regulatory sequences, or genes encoding sensor proteins that can react to various stimuli. Any combination of these parts should result in predictable behaviour. Such a collection of parts is already available for bacteria http://parts.igem.org/Main_Page where one can choose from a wide selection of promoters, terminators and genes involved in various cellular processes, such as signalling, cell death, motility, recombination and many more. These parts can be combined into circuits with the use of logical gates, such as AND, OR, NOR or NAND. These circuits then can perform a wide variety of functions, such as oscillations, toggle switching, timekeeping or band-pass control [92].

Importance of Mammalian Cells
Cultivated mammalian cell lines have emerged as a powerful tool to produce complex biopharmaceuticals, which is why the optimization of these platforms is a major objective of the biopharmaceutical industry. Within mammalian platforms, CHO cells are the most widely used cell line. While, as indicated by their name, these cells were derived from the ovary of the Chinese hamster, they are in fact mainly of epithelial phenotype [93]. One of the main advantages of mammalian cells is that they are able to perform complex PTMs. These are required to sustain optimal pharmacokinetic and pharmacodynamic properties for many biopharmaceuticals [94], where glycosylation represents one of the most important quality control attributes [95,96] and the most common structurally diversified modification in secreted proteins [97].
To satisfy the need for large scale production of complex biotherapeutics, various strategies have been employed to boost the titer of the product of interest, mostly based on media optimization and on high-throughput screening for good producers [98,99]. Alternative optimization strategies based on modular design, synthetic biology, and systems metabolic engineering, hold tremendous promise to further improve the productivity, yield and product quality, and thus to reduce the time and cost of cell line development. Nonetheless, applying such rational engineering tools to mammalian cells is more difficult compared to other platforms. This is due to the need to consider many additional factors, namely the large genome, sophisticated regulatory, signalling and metabolic networks, genome instability and epigenetic regulation.

Omics Landscape of CHO
CHO cells were originally established in the late 1950s by Theodore T. Puck [100]. Since then, the family of CHO cells has expanded, giving rise to various new lineages such as CHO-K1, CHO-S, CHO-GS -, CHO-DG44, etc. These lineages were developed to fulfill specific industrial requirements, such as suspension culture or specific gene deficiencies that enabled selection, and are the result of genetic modifications via chemical and radiation mutagenesis [101], targeted gene knockouts and adaptation to new culture conditions [102,103].
The CHO-K1 genome published in 2011 [104] was an important stepping stone towards the application of systems biology methods to CHO. However, in contrast to microbial systems, the genome of CHO cells often contains various chromosomal abnormalities caused by their genetic instability [105,106]. Thus, cells belonging to the same lineage within a CHO family can have distinct genetic information and phenotypes. Differences in phenotype can even be observed between cells that nominally belong to the same lineage, but were grown in different laboratories and under various culture conditions. These can in part be explained by the structural variations in the genome and the accumulated genetic changes such as single nucleotide polymorphisms (SNPs), transgene copy number variations and chromosomal rearrangements [107][108][109][110]. Vcelar et al. showed that chromosomal rearrangements within the genome of a population are observed during subcloning, adaption of the cells to a new medium or simply during long-term cultivation [111,112]. On top of this, variations in phenotypes that cannot be explained by genomic diversity and variation alone, are frequently observed, even in subclones of subclones [113]. These facts suggested that the genetic information of the CHO-K1 cells sequenced in 2011 [104] are not representative for all CHO cell lines and subclones, so a reliable and stable reference genome was needed. This was addressed by the generation of a common reference genome of the Chinese hamster Cricetulus griseus [19,114] which was further improved by a more complete genome assembly in 2018 [115]. These, along with the genomes of other cell lines sequenced in the meantime [19,109,110], serve as basic datasets for in silico studies of CHO via integrative analyses of omics data to support cell line and metabolic engineering studies.

Comparison of CHO to Bacteria and Yeast
From a systems biology perspective, mammalian cells are considered more complex than microbial systems as their genome is by far larger than that of E. coli and S. cerevisiae. The assembly of the sequenced CHO-K1 genome comprises 2.45 Gb with 24,383 predicted genes [104], while microbial cells used by the biotechnological industry have a smaller genome size by one to two orders of magnitude (Table 2). A comparison of E. coli, S. cerevisiae and CHO cell platforms is summarized in Table 2. Even though having a larger genome does not necessarily relate to the cell's morphological complexity, it can be an indication of the intricacy of its proteome, fluxome, transcriptome and metabolome, and, in particular, of its regulatory capacities. Even from the viewpoint of the basic proteins that are encoded in these genomes, the proteins constituting prokaryotic cells are considered less complex than those of eukaryotic ones. The latter idea was claimed by different researchers such as Zhang et al. and Wang et al. [116,117], stating that the organism's protein structural complexity (e.g., length) can directly affect the growth performance of cells. It was demonstrated that, when optimizing for growth, a higher growth rate was observed for cell types containing smaller proteins. This is due to the cells' tendency to increase their mass-normalized kinetic efficiencies during growth [118].
In addition, cultivating mammalian cells is considered more demanding compared to microbial organisms, especially when focusing on their bioprocessing requirements. Due to the lack of a cell wall, there is significantly higher shear sensitivity and the cell culture medium must contain a higher number of essential nutrients compared to microbial systems. Bacteria (e.g., E. coli) and yeast (e.g., S. cerevisiae) can grow in a simple medium containing solely basic elements (e.g., glucose and salts) and usually only in specific cases a few amino acids or vitamins are supplemented. In contrast, mammalian cells require a larger and more complex set of nutrients, including amino acids, organic acids, vitamins, cofactors, carbohydrates and salts. This complexity of the growth medium reveals the strict nutritional demand of mammalian cells.
A major difference between microbial and mammalian cells is the fact that the genome of the latter actually encodes many different types of cells and developmental stages, namely more than 100 different types of tissue that are part of a mammalian body. To ensure correct expression of the required genes at the necessary level in each of these different tissue types, a much more complex regulatory network is required that includes highly sophisticated mechanisms such as epigenetics and chromatin remodeling that simply are not necessary for microbial cells and therefore are not present or only at immature levels of development [119]. Apart from these chromatin state and epigenetic mechanisms, other regulatory factors are abundant in mammalian cells, such as microRNAs or long-non-coding RNAs (lncRNAs), which are transcribed in large numbers [120,121]. While only 2-3% of the genome are regions coding for proteins, in total approximately 30% of the genome are actively transcribed in CHO cells (unpublished results), indicating the regulatory importance of these transcripts and the high level of sophistication.

Engineering CHO for Improved Performance
In view of the many different aspects of a well performing mammalian production cell line, several engineering approaches have been developed to address, typically individually, the many challenges encountered during cell line development and manufacturing of highly complex biotherapeutics. These include increasing cell specific productivity, maintaining a balance between the competing interests of growth and productivity, generating an efficient and targeted metabolism to enhance both, ensuring product quality and ensuring stability both of phenotype and of productivity.

Increasing Cell Specific Productivity and Process Performance
The larger part of the omics studies presented on CHO focuses on the identification of limiting pathways and genes and their subsequent engineering by overexpression or knock-out [134,135]. Most of these addressed the endoplasmic reticulum and the unfolded protein response, as cells producing high amounts of a "foreign" protein tend to have problems in processing and assembling such large cargos. Here, approaches to either overexpress specific helper proteins such as protein disulfide isomerase or to upregulate the entire ER were reported, as shown for example in [136][137][138].
The success of many of these studies was hampered by the fact that the differential regulation of a single gene is not likely to completely change the behaviour or phenotype of a cell line, so that these approaches were only successful if indeed the engineered gene was the main limiting factor due to its low expression level in the studied cell line [139]. Very little overlap in specific genes was observed in all of these studies, although frequently similar pathways were identified [140,141]. This led to the search for global engineering approaches where multiple genes and entire pathways could be controlled in a single step. One option that was extensively investigated, was the engineering of microRNAs (miRNAs), which globally regulate post-transcriptional processing of mRNAs and protein translation. With the annotation of miRNAs [142][143][144], their potential towards enhancing protein productivity by influencing various cellular pathways (e.g., cell cycle, apoptosis, metabolism, protein expression, etc.) was taken advantage of [145][146][147][148][149].
With respect to process performance, the main focus has been on the reduction of the process associated impurities such as host cell proteins (HCPs), which are considered a hurdle, especially for downstream processing. From this perspective, performing knockouts of pro-apoptotic genes (e.g., caspases), proteases, or overexpression of anti-apoptotic genes, chaperones or proteins involved in the secretory pathway has been previously described. Indeed, several studies demonstrated that overexpressing anti-apoptotic genes may improve the productivity of the cells through extending their lifespan [150,151]. It also results in a lower release of proteins from lysed cells, which were shown to make up a considerable amount of the total protein in culture supernatants [152,153]. Here, Fukuda et al. recently described a novel approach where they developed Anxa2-and Ctsd-knockout CHO cell lines aiming at minimizing the release of HCPs into the supernatant of cultures used for the production of biotherapeutics [154]. Subsequently, Kol et al. described a novel model-based approach to predict the effect of decreasing the secretion of HCPs on CHO and its impact on the cell productivity. These predictions helped directing the design of "clean" cells by knocking-out 14 genes (using multiplex CRISPR-Cas9) that were proven to be responsible for the production of HCPs [155]. The main outcomes of this study were the improvement of the production of recombinant proteins, in part due to the release of resources, and the reduction of impurities at the end of the culture. More targeted approaches were directed against specific HCPs that are known to pose problems in downstream processing [156].
An excellent summary of the different engineering approaches that have been applied to CHO so far is provided by Fischer et al. [157].

Maintaining the Balance between Growth and Productivity
As described earlier, growth-coupled production is a common design principle employed for the generation of several compounds using microbial cell factories. However, this approach is limited to simple metabolites which can be stoichiometrically coupled to growth, and cannot be applied to protein production, since it is competitive to growth. Hence, a contrasting solution is usually applied for the production of recombinant proteins in CHO cells, where growth and production phases are separated [158]. The switch from high-proliferation to high-production is commonly triggered by a reduction in cultivation temperature [159][160][161] or by treating the cells with certain chemicals, such as sodium butyrate, which promotes gene expression and growth suppression [162]. In addition, cell cycle arrest in the G1 phase was achieved by controlling the activity of cyclin-dependent kinase inhibitors (cdkis) and resulted in an increase in the specific productivity of the cells [163,164].
In fact, various approaches have already been employed to control the proliferation of the cells during cultivation. So far, however, mathematical modeling, although of great promise, has not been fully exploited to simulate bioprocesses and design better control of the switch from cell proliferation to increased heterologous protein production. One of the few examples is a study by Klamt et al., who computationally compared the volumetric productivities of two-stage fermentation strategies against the conventional one-stage production system. [165].

Making Metabolism more Efficient
As already mentioned, cellular resources are limited and need to be shared between growth and recombinant protein production [166]. In the omics studies mentioned above, energy metabolism was the second most frequently found differentially regulated pathway, after unfolded protein response. Therefore, a deeper understanding of CHO metabolism was the main objective after the genomic sequence of CHO became available, which helped with the reconstruction of a genome-scale metabolic model of CHO cells [20]. This model provides the toolkit for in silico metabolic engineering and medium optimization [167]. Combined with different mathematical approaches and experimental data, the model can be used to identify possible metabolic bottlenecks and predict specific targets for metabolic engineering [51]. It allows the study of the cells by applying linear programming-based strategies such as FBA or flux variability analysis (FVA) along with other computational tools that are already commonly used for bacteria or yeast (see Table 1). Furthermore, feeding the GSMM with omics data (e.g., transcriptomics, proteomics, fluxomics and metabolomics) will further improve the in silico prediction accuracy and serve as a basis for cell line engineering [168], employing the tools summarized in Table 1.
Prediction tools based on metabolic modeling have only recently started to be applied to the design of engineering strategies in CHO cells, while also contributing to characterize and understand them better. One of the first examples of an application of the CHO GSMM to an industrial process is the work done by Calmels et al., who curated the genome-scale metabolic model [20] and tailored it to a CHO-DG44 producer cell line. They performed corrections, such as modifying 601 reactions (for example silencing of 537 amino acids transporters), which lead to an improvement of the growth rate and exometabolome predictions [169]. In addition, the secretory pathway was integrated into the GSMM of CHO by Gutierrez et al. to enable predictions of energetic and machinery demands of secreted proteins [170], which might lead to better predictions of engineering targets that aim at improving protein production, as shown for example in the work of Kol et al. [155].
Metabolism is an important network of chemical reactions involved in every feature of cellular function. In the case of CHO cells, the metabolism is considered suboptimal. Due to the complexity of the metabolic network compared to microbial systems, it is still unclear how the cells react to nutrient availability and why certain pathways are used by the cells and not others. A major focus for many years was on the inefficient utilisation of glucose and the rapid accumulation of, in part toxic, waste products.
Usually, CHO cells consume high amounts of glucose, which leads to the production of lactate, despite sufficient oxygen supply (the so called "Warburg effect") [171]. This is a result of the inefficient use of glucose by the cells. Efforts trying to decrease lactate dehydrogenase A (LDHA) activity using small interfering RNAs (siRNAs) vectors showed promise in reducing the levels of lactate in the culture without influencing the process productivity [172]. Additionally, the metabolism of amino acids leads to the production of toxic by-products which can impact cell growth or productivity. Ammonia represents one of the major byproducts of amino acids metabolism (e.g., glutamine catabolism) [173]. Furthermore, several other byproducts of the amino acid metabolism (e.g., formate, homocysteine or indolelactate) as well as metabolites originating from lipid metabolism were determined to be detrimental to CHO growth [173,174]. To overcome these issues, numerous successful interventions, such as engineering amino acid catabolism, were performed to reduce the accumulation of toxic metabolites and to improve growth and product titers of CHO cell cultures [157,[175][176][177].
CHO cells are auxotrophic for several amino acids and therefore need to take them up from the medium to support growth and recombinant protein production. However, the uptake of these amino acids is quite low, even though the amount in the medium is high [131], which might be due to insufficient capacity of the transporters. Increasing the transport capacity of the essential amino acids or inserting the complete pathways for their synthesis might be favourable for cell growth and protein production. Geoghegan et al. identified amino acid transporters that are likely upregulated in producer cell lines and suggested that overexpression of one or more amino acid transporters might be beneficial for growth and productivity [178].

Ensuring Product Quality
An additional area of focus, apart from improving growth and final titers, is controlling the PTMs, especially glycosylation. Glycosylation patterns are often heterogeneous and they are heavily influenced by the culture status [179,180]. It has been described that glycosylation patterns within CHO platforms can be modulated by varying the culture conditions (e.g., culture medium supplements) [181]. Identifying the role of the individual glycosylation enzymes as well as the limiting steps of glycosylation would help to engineer CHO cells to consistently produce completely glycosylated proteins with desired glycosylation patterns. In particular for monoclonal antibodies, the specific glycan structure plays an important role in the immunoactivity of the product [182] which led to the development of production cell lines that lack, for instance, fucosyltransferase FUT8, or that overexpress enzymes to generate more complex glycan structures [183,184]. Coats et al. showed that with increased productivity, the quality of N-glycosylation of EpoFc decreases [185]. The next step would be to identify the rate limiting step(s) and overexpress the necessary glycosylation enzymes or pathways for precursor synthesis. Fisher et al. described several approaches for modulating post-translational modifications of recombinant proteins by genome editing in CHO [157].
Steps towards custom and consistent glycosylation have already been taken. For example, a panel of cell lines with custom glycosylation patterns was created with the use of CRISPR/Cas9 technology [186]. In another study, the level of galactosylation was manipulated based on predictions from a kinetic model, leading to a reduction in glycan heterogeneity [187]. While for monoclonal antibodies with their relatively simple glycosylation pattern, work on detailed control has already been initiated, the field is still open for more complex proteins bearing multiple glycosylation structures with high prevalence of tetra-antennary structure and the need for full terminal sialylation [188].

Maintaining Stability
The high variations of behaviour observed between different CHO cell lines and subclones are two sides of Yin and Yang: On the one hand, this high variation contributes largely to their adaptability and ease of cultivation and also usually enables the isolation of a subclone that is able to produce the product of interest in large amounts (even if thousands of clones need to be screened to find such). On the other hand, once such a subclone is identified, there is the danger of phenotypic drift and instability, where the term is usually applied to instability in specific productivity of the transgene rather than other types of phenotypic drift that relate to other cellular properties. While these two types of instability should be separated, there are some common features that underlie both types [189].
For phenotypic instability the main causes are likely a mixture of genetic changes as discussed above and changes in the expression pattern of genes, where changes in expression levels are more frequent than ON/OFF types of differential expression. Traditionally, studies of genetic variation have focused on the coding genes, where of course mutations may cause loss of function or enhanced activity. However, mutations in regulatory regions have been largely ignored so far, mostly because of our limited understanding of their underlying mechanisms, but they could conceivably contribute to the differences in the transcriptome of the coding genes, even if the coding genes themselves are not directly affected. On top of such variations that are connected to changes in the genome sequence, there are the complex epigenetic mechanisms that determine the precise level of expression of all transcripts in a cell [190]. In fact, a recent study of the impact of genome variation and altered DNA-methylation patterns of the genes involved in the oxidative phosphorylation of cells, including the mitochondrial genome, revealed that for cells with specific phenotypes there is an accumulation of both genomic variants and changes in DNA-methylation in genes that are likely to be associated with that phenotype [108].
Analysis of these epigenetic regulatory mechanisms is still in its infancy in the field of CHO cell research, even though some basic mechanisms have been explored. Typically, when discussing DNA-methylation, one only thinks of the fact that the status of methylation of CpG islands in the promoter will effectively determine whether a gene is expressed or not. However, of all Cs in a genome, between 60 and 80% are methylated, including 100% of the Cs in transcribed regions and most of the Cs in inactive heterochromatin regions. The regions of the genome that show the highest variation in DNA methylation levels under different culture conditions or between subclones are those in chromatin states that have regulatory function, such as the areas around the transcription start sites and both genic and other enhancers [109]. Another level of regulation is the presence or absence of activating or repressing histone marks which can lead to constitutive expression or to moderate upregulation or downregulation of genes, in particular in response to short term changes in the environment, such as nutrient depletion or the accumulation of waste molecules [191]. High amounts of histone methylation (at certain amino acid positions) and low amounts of acetylation in the promoter's environment can also decrease the expression of recombinant proteins [107]. Recent developments now allow the controlled manipulation of such epigenetic marks at specific sites using CRISPR/dCas9 tools [192,193]. Finally, it was recently shown that random changes in a population's DNA-methylation pattern induce higher phenotypic diversity and thus enable a more efficient isolation of "outliers" with enhanced performance [194]. A more detailed understanding of these mechanisms is still required, but the new tools recently developed will contribute to generate such understanding while at the same time providing the possibility of more sophisticated and reliable control to researchers.
As mentioned before, the main focus of all "stability" related research so far has been on the stability of productivity. Here, several factors come into play that work both on the level of genome and epigenome. Traditionally, recombinant cell lines were generated by random integration of the transgene(s) into the host cell genome. Thus, as random integration usually occurs at adventitious double strand breaks that just happened to be present in the cell at the time of transfection, these integration sites could reside in all types of chromatin regions, including regions of high transcriptional activity and such of low activity or even silenced chromatin states. Consequently, the resulting productivity may vary, as it was shown that adjacent chromatin states can spread into newly integrated sequences. To avoid this, genetic sequences that maintain an open chromatin configuration have been used [195,196]. Direct silencing of the CMV promoter used for most transgenes, either by DNA-methylation or by changes in the histone modifications has also been shown [197][198][199][200]. These different strategies to silence the gene of interest are most likely due to the cell's drive to remove the stress of high productivity [201][202][203], by which ever means possible. Therefore, a frequent cause of loss of productivity is also the complete deletion of the recombinant genes from the genome [204,205].
One way of controlling most of the above that has been extensively investigated over the last years, is targeted integration into safe harbours. For this purpose, well-characterized and transcriptionally active regions are identified that enable high stability and continuous expression of the transgenes over long periods of time [206]. Such sites can be identified either computationally, based on transcriptome data, by random selection [207][208][209] or using recombinase pseudosites [210] or viral vectors [208]. For reuse, "landing-pads" containing recombination sites and selection markers [208,210] are inserted into these stable sites, allowing easy integration of heterologous DNA of interest and thereby facilitating the development of stable producer cell lines with a known genomic environment [208]. Some approaches target already known open chromatin areas of the genome [211][212][213]. Another example is the Hipp11 gene locus [214]. The main advantage of these approaches is the efficiency and reduced time required for establishing a new production clone.

What is Missing Towards the Construction of a CHO Chassis Cell?
Developing strong modular mammalian expression platforms is an important R&D goal of the biopharmaceutical industry. As the development of stable CHO production cell lines is still a challenging and slow process, the establishment of pre-optimised chassis cell lines with plug-and-play modules that can be combined at will and as needed is a promising approach [215], even if it is more challenging than in simpler microbial systems.
The most important obstacle here is our lack of knowledge on the many regulatory mechanisms that are in place in cells to enable them to respond to different conditions and challenges. As an example, the regulatory function of long-non-coding RNAs was already discussed. Currently, for lncRNAs, there are 127,802 transcripts and 56,946 genes annotated, however, only for 1867 is their function understood or at least known to be associated with a given phenotype in 3762 cases [216,217]. Similarly, for coding genes, many of these, while annotated, are still labeled as EST (expressed sequence tag), with no understanding of their function or at best a prediction. In many cases, the annotation is derived from studies of embryogenesis and development or disease association, which does not properly capture their everyday function in in vitro cultivated cells. This is why frequently the analysis of differentially expressed genes results in pathway enrichments such as "Neural Development" which are hard to interpret in the context of a recombinant production cell line derived from the ovary and grown in protein free suspension culture. In fact, the development of the human proteome atlas revealed that there are very few proteins that are expressed exclusively in only one tissue type, while the majority is expressed in multiple ones, albeit at different levels and in some cases even in different intracellular locations [218]. As long as no precise functional annotation for all the genes and transcripts that are expressed in CHO is available for the context of cells grown in a bioprocessing environment, our understanding of CHO metabolism will remain perforce incomplete, making the full design of a chassis cell line, as already done for microbial cells [69], difficult.
In addition, such design strategies would require the availability of parts and modules for different functionality that can be combined at will and as required for the production of a given product (Figure 1). Due to the complexity of mammalian cells and the large number of coding genes and apparently required non-coding transcripts, a full design of such a chassis cell line seems quite a challenge in the near future. However, both the selection of pre-adapted cell lines [219,220] and the development of pre-optimised and pre-engineered cell lines with specific properties have already been addressed to good purpose [155,175,176,207,221]. With the recently emerged and still emerging tools, such model-based designed engineering approaches become more and more feasible, where multiple genes are targeted to achieve a specific purpose. New circuits and tools from synthetic biology [222][223][224], including as an example a panel of both endogenous promoters with defined properties [133,225] as well as artificial ones that can be adapted and modified as needed [226,227], will contribute and will be more and more required. Such engineering approaches will require a diverse set of such control units that are based on different mechanisms, as more and more genes may have to be controlled at different expression levels. Such expression levels may be ON/OFF, which is easily achieved by CRISPR/Cas9 strategies of deletion or knock-in, but fine-tuning the expression level and integrating it into endogenous cellular circuits so as to make them responsive to the cellular state, is still a challenge. Here, again, recently developed tools based on dCas9 linked to different modulators of expression or of epigenetic marks, will come in useful [192,193]. Finally, apart from deletion and overexpression and modulation of transcription, in the future other control strategies that target mRNA processing or translation will be needed to enlarge the range and number of possible interventions that can be used for this purpose [228]. As the load of cargo to be engineered increases, artificial chromosomes are a promising alternative that could be used to integrate an entire circuit and multiple genes in predefined combinations into cells in a single step [229].
As the genomic instability and large number of genomic variation of CHO cells is associated with their rapid growth and high division rate, as in other rapidly growing cell lines such as cancer cells, engineering them for more precise DNA replication and a lower rate of faulty chromosome segregation bears the risk of reducing growth rate to a level that is industrially unfeasible. This may be the price we have to pay for high growth rate and good process performance. Nevertheless, with respect to production stability, many successful approaches have been described that can be combined and used for an integrated designer cell line, including insulators [230], scaffold attachment regions [231] and CpG island regions [232]. In addition, removal of transposons or of chromosomal hotspots for rearrangements such as translocations may promote a higher stability of the entire genome [233], similarly as it was performed for E. coli [75]. Finally, simply speeding up cell line development by new and more efficient selection mechanisms or by the above mentioned use of targeted integration and of pre-optimised cell lines, will reduce the number of divisions that a cell line will undergo until ready for manufacturing, thus also reducing the accumulation of variants or mutations. Finally, in all these approaches the impact of the product gene and its specific requirements for translation, processing and secretion will have to be taken into account [215,234]. Steps towards creating a Chinese hamster ovary (CHO) chassis cell line. A combination of various in silico tools is used to predict intervention targets (e.g., for deletion or overexpression). A chassis cell line is constructed by performing genetic changes that boost growth or productivity and ensure desired product quality and stability. Various transgenes of interest can be inserted into a pre-defined landing-pad in the chassis cell line. Afterwards, the performance of the producer cell lines is validated experimentally and the results can be used to further optimize the chassis cell line.
In the attempt to reach a fully designed chassis cell line-rather than an optimised or engineered one-model-based predictions and computational strategies for the identification of the most useful engineering strategies will play a key role. A challenge that still needs to be addressed here is the availability of tools to combine and correlate the different omics data sets in a comprehensive and automated way. Traditionally, most of these are analysed separately [235]. A proper connection should link different omics layers in order to obtain a complete picture of their interrelationship and its combined influence on the system [236]. So far, several algorithms have been developed to integrate omics data, such as transcriptomics, proteomics or metabolomics into GSMMs [237][238][239][240][241][242][243][244][245], but no truly comprehensive solution is yet available. Thus, with the rapid advances that have been achieved over the last years, both in our basic understanding of cellular mechanisms and regulatory circuits, and with the new tools that have emerged, we are today in a much better position to aim for the design of mammalian chassis cell lines with defined characteristics, even though a number of challenges still need to be resolved.

Conflicts of Interest:
The authors declare no conflict of interest.