Investigating the Molecular Genetic Basis of Cytoplasmic Sex Determination Caused by Wolbachia Endosymbionts in Terrestrial Isopods

In animals, sexual differences between males and females are usually determined by sex chromosomes. Alternatively, sex may also be determined by vertically transmitted intracellular microbial endosymbionts. The best known cytoplasmic sex manipulative endosymbiont is Wolbachia which can, for instance, feminize genetic males into phenotypic females in the terrestrial isopod Armadillidium vulgare. However, the molecular genetic basis of cytoplasmic sex determination is unknown. To identify candidate genes of feminization induced by Wolbachia strain wVulC from A. vulgare, we sequenced the genome of Wolbachia strain wCon from Cylisticus convexus, the most closely related known Wolbachia strain to wVulC that does not induce feminization, and compared it to the wVulC genome. Then, we performed gene expression profiling of the 216 resulting wVulC candidate genes throughout host developmental stages in A. vulgare and the heterologous host C. convexus. We identified a set of 35 feminization candidate genes showing differential expression during host sexual development. Interestingly, 27 of the 35 genes are present in the f element, which is a piece of a feminizing Wolbachia genome horizontally transferred into the nuclear genome of A. vulgare and involved in female sex determination. Assuming that the molecular genetic basis of feminization by Wolbachia and the f element is the same, the 27 genes are candidates for acting as master sex determination genes in A. vulgare females carrying the f element.


Introduction
Sex determination is a key biological pathway which governs the sexual differentiation of an individual into a male or female and its ability to produce male or female gametes. Although development as either a male or female is a generally conserved feature in animals, there is an amazing diversity of modes of sex determination [1,2]. The most common mode of sex determination is genotypic sex determination, in which nuclear genetic elements control sex determination. Prime examples of genotypic sex determination are sex chromosomes, i.e., chromosomes that carry sex-determining factors. Sex chromosome systems can take two major forms: male heterogamety (when males are heterozygous for the sex chromosomes and females homozygous, called XY systems), and female heterogamety (when females are heterozygous for the sex chromosomes and males homozygous, called ZW systems). The second mode of sex determination is environmental sex determination, in which external stimuli control sex determination. Environmental sex determination is defined here in So far, many studies aiming at identifying candidate genes involved in Wolbachia-host interactions used targeted approaches, i.e., by focusing on genes containing functional domains with prior knowledge of involvement in molecular interactions. Considering the quite limited amount of functional knowledge of Wolbachia genes (e.g., 23% of hypothetical genes in wVulC genome), it may be relevant to consider alternative strategies to identify candidate effectors to elucidate molecular mechanisms of induced phenotypes [38]. In this study, we used a combination of comparative genomics and expression profiles throughout host development to identify candidate genes of feminization induced by Wolbachia wVulC, without prior functional knowledge of genes. Specifically, we sequenced the genome of Wolbachia wCon from the terrestrial isopod Cylisticus convexus and compared it to the wVulC genome. wCon was selected because it is the most closely related known strain to wVulC available in our laboratory that does not induce feminization (but CI) [39,40]. In addition, it has been shown that wVulC also induces feminization when transfected into C. convexus and wCon also induces CI when transfected into A. vulgare [39,41], indicating that the observed phenotypes are related to the Wolbachia strains. Next, we performed gene expression profiling of selected wVulC genes throughout host developmental stages in wVulC native host A. vulgare and following wVulC transfection in a heterologous host (C. convexus). C. convexus was selected for comparison with A. vulgare because: (i) wVulC has been shown to induce feminization after transfection in C. convexus, and (ii) A. vulgare and C. convexus differ in their timing of sexual differentiation [41]. Using this strategy, we identified a set of 35 candidate genes out of the 1888 wVulC genes that may be implicated in feminization induced by Wolbachia wVulC in the host A. vulgare, two of which appear to be particularly promising.

Terrestrial Isopod Lines
All C. convexus individuals naturally infected with Wolbachia strain wCon were from our laboratory line CCw, derived from individuals caught in Avanton, France, in 2004. All C. convexus individuals carrying Wolbachia strain wVulC were from our laboratory line AW (derived from individuals caught in Villedaigne, France, in 1997) and were experimentally infected with Wolbachia strain wVulC (from our laboratory line ZN, derived from individuals caught in Celles sur Belle, France, 1991), as described in Badawi et al. [41]. All A. vulgare individuals naturally infected with Wolbachia strain wVulC were from our laboratory line ZN. Isopods were reared at 20 • C with food ad libitum (dead lime tree leaves and carrots) under a natural photoperiod, except those in cross-breeding and juveniles which were reared under a 18L:6D photoperiod.

wCon DNA Enrichment and Genome Sequencing
To identify the C. convexus tissue naturally possessing the highest Wolbachia wCon density, DNA was extracted from ovaries, the nervous chain and the hemolymph of four C. convexus females using a standard phenol/chloroform extraction [42]. Proportions of wCon, mitochondrial and nuclear DNA were estimated with quantitative PCR targeting single copy genes wsp, cytochrome oxidase I (COI) and androgenic hormone (AH), respectively. Reactions and cycle conditions were as described in Le Clec'h et al. [43] and in Table S1. Copy numbers were estimated using calibration range curves as described in Le Clec'h et al. [43]. We assessed the ratios of the base pair number from mitochondria (R mt ) and nucleus (R nuc ) relative to the wCon genome, as follows: R mt = (G mt × N COI )/(G Wo × N wsp ) and R nuc = (G nuc × N AH )/(G Wo × N wsp ), where G mt , G nuc and G Wo are mitochondrial, nuclear and Wolbachia genome sizes, respectively, and N COI , N AH and N wsp are COI, AH and wsp copy numbers, respectively. Absolute genome sizes were estimated as 1.7 Mb and 2 Gb for wCon and nuclear genomes respectively, based on information available for the closely related Wolbachia wVulC/ A. vulgare complex (http://www.genomesize.com/). C. convexus mitochondrial genome length is 14 kb, based on sequencing data [44].
Ovaries were selected for further analyses as they contained the highest ratio of wCon copies. We dissected ovaries, as well as nerve cords of 30 C. convexus sisters infected with the wCon Wolbachia strain. First, we performed PCR amplification and Sanger sequencing of the Wolbachia-specific markers wsp and ftsZ using the DNA of the nerve cords from each of the 30 females as templates and then verified that the Wolbachia strain was indeed wCon (see Table S1 for primers and PCR conditions). Next, to enrich the sample in wCon DNA, we homogenized the ovaries of the 30 females with a Dounce tissue grinder B within a PBS solution containing sucrose (0.25 M) and L-glutamine (5 mM) to crush cells while keeping nuclei intact. We then passed the solution through a 5 µm filter that specifically retained nuclei but not Wolbachia. Next, the DNA of the Wolbachia-enriched solution was extracted using a standard phenol/chloroform procedure. RNA contaminants were removed with RNase A treatment (0.2 µg/µL at 37 • C for 1 h). Then, the sample was purified through another round of phenol/chloroform extraction. We determined the relative proportion of wCon DNA relative to mitochondrial and nuclear DNA with quantitative PCR (qPCR), as described above. The DNA sample was used by GenoScreen (Lille, France) to prepare a Nextera sequencing library. The library was sequenced by GenoScreen in a 1/4 and a 1/8 454 GS FLX sequencer runs with Titanium chemistry.
Bad quality reads and mitochondrial reads were filtered out and the remaining reads were assembled de novo with gsAssembler (Newbler 2.6) with default parameters except seed step set to 1, yielding 5153 contigs. wCon contigs were retrieved using BLAT (minimal identity set to 60), Mauve and R2CAT [45][46][47]. Six representative Wolbachia reference genomes from supergroups A and B were used to retrieve wCon contigs: wMel (NC_002978.6), wRi (NC_012416.1), wPip-Pel (NC_010981.1), wAlB (NZ_CAGB00000000.1), wVitB (NZ_AERW00000000.1) and wVulC (ALWU00000000). The assembly of 389 contigs showing high similarity with Wolbachia genomes were manually improved by exploiting the de Bruijn graph to join contigs truncated due to repeats, which resulted in a final wCon assembly of 237 contigs. Gene prediction and annotation was performed with PROKKA pipeline [48] and manually curated in artemis. Clusters of Orthologous Groups of proteins (COG) were assigned to each predicted protein using BLASTp with minimal identity of 70% on at least 70% of the protein sequences. The annotated genome sequence and raw sequence data were deposited in GenBank under bioproject accession number PRJNA439208.

In Silico Classification of wVulC Genes
To determine a list of feminization candidate genes, all annotated genes from the wVulC genome (GenBank accession number ALWU00000000) were processed using a homemade pipeline of Perl scripts (Supplementary Materials 1). First, according to genome annotation, all repeats (including transposable elements, genes coding for prophages and other repeated genes), pseudogenes (including all coding sequences of genes annotated as "pseudogenes" and small coding sequences of genes split in two chunks) and small peptides (<50 amino acids) were removed. Repeats were discarded because they were not well suited for downstream expression analyses by PCR. Pseudogenes and small peptides were discarded as they are unlikely to be functional genes. Second, all genes belonging to the core genome of Wolbachia strains that infect arthropods were discarded. This core genome was computed using OrthoMCL [49] with a minimal identity of 70% and a minimal e-value of 10 −6 , using the following five complete Wolbachia genomes from supergroups A and B: wMel (NC_002978.6), wRi (NC_012416.1), wHa (NC_021089.1), wNo (NC_021084.1), wPip-Pel (NC_010981.1). Finally, all proteins from wVulC showing 100% similarity with their homolog in the closely related, non-feminizing wCon genome on at least 90% of their length, based on BLASTp searches, were discarded. The remaining genes were considered as candidates and were selected for expression analyses.

Expression of wVulC Candidate Genes throughout Host Sexual Development
To investigate the expression of Wolbachia candidate genes during host sexual development, genetic crosses involving A. vulgare and C. convexus individuals were performed between uninfected males and females carrying the wVulC Wolbachia strain. In the terrestrial isopods A. vulgare and C. convexus, embryos that inherited the feminizing Wolbachia strain wVulC develop into functional females. This conversion of genetic males into functional phenotypic females occurs during host sexual differentiation. In brief, A. vulgare juvenile development occurs within a period of 10 to 15 weeks after the release of juveniles from the female ventral pouch. During this period, eight post-embryonic stages were defined, each corresponding to an intermolt stage. Gonads differentiate during stages 4 to 6 and after this period the experimental reversion of gonadal sex becomes impossible [50][51][52]. Therefore, Wolbachia is assumed to act before or during sexual differentiation as it inhibits male gonad differentiation and hence converts genetic males into phenotypic females. C. convexus has a different sexual differentiation timing compared to A. vulgare, as it shows a one-stage shift and occurs during stages 3 to 5 [41]. After hatching, juveniles were sampled individually 2-3 days after each molt, from the first molt (stage 1) to the seventh molt (stage 7), and stored in liquid nitrogen.
First, we tested whether wVulC candidate genes are expressed in any of the gonadal differentiation key stages (3 to 6) in A. vulgare juveniles. We co-extracted DNA and RNA of 3, 2, 2 and 2 juvenile pools sampled from stages 3 to 6 respectively, using the AllPrep DNA/RNA kit (Qiagen ® , Venlo, The Netherlands), according to the manufacturer's instructions. We confirmed wVulC presence in each sample by amplifying molecular markers gatB and wsp by PCR (Table S1 for primers and PCR conditions). RNA samples were then treated with DNase I (4u at 37 • C for 1h and inactivated at 75 • C for 10 min, New England Biolabs ® , Ipswich, MA, USA) and cDNA synthesis was performed using the Superscript SSIII kit (Invitrogen ® , Carlsbad, CA, USA), according to the manufacturer's instructions, and using 250 ng of RNA (estimated with Nanodrop, Thermo Fisher, Waltham, MA, USA). Then, RT-PCR was performed on all candidate genes (Table S2 for primers and PCR conditions) and PCR products were run on an agarose gel 1.5%. After staining the gel for 10 min in an ethidium bromide bath, bands were revealed under UV light. Candidate genes being expressed during at least one host developmental stage were selected for expression quantification.
Wolbachia wVulC gene expression was quantified by quantitative real-time PCR (qRT-PCR) on three successive samplings. The first sampling was composed of 40, 24, 16, 16 and 16 A. vulgare juvenile pools sampled from stages 2 to 6, respectively. The second sampling was composed of triplicated pools of 10, 10, 9, 6, 6, 6 and 6 A. vulgare juveniles corresponding to stages 1 to 7, respectively. The third sampling was composed of triplicated pools of eight C. convexus juveniles for each stage from 1 to 7. All quantitative experiments were conducted with two technical replicates, using a LightCycler LC480 device (Roche ® , Basel, Switzerland). Reactions were made in a total volume of 10 µL composed of 1× Fast SYBR GREEN Master Mix (Roche ® , Basel, Switzerland), 5 µM of each primer, and 2.5 µL of 4×-diluted cDNA synthesized as above. Cycle conditions were as described in Le Clec'h et al. [43].
To identify appropriate reference genes, we measured the expression of 15 Wolbachia housekeeping genes (wsp, coxA, atpD, sucB, gatB, gltA, L2, L20, S4, fabF, pyrB, purF, tkt, hcpA and fbpA) by qRT-PCR in A. vulgare stages 2 to 6. We then selected those exhibiting a highly correlated expression (r > 0.95) using the BestKeeper software (version 1) [53] (Table S1 for primers and PCR conditions). At last, we verified that the expression of the selected genes remained constant across stages in A. vulgare and C. convexus relative to the wsp copy number. This procedure resulted in the validation of six genes: gltA, L2, L20, S4, fabF and purF. Expression ratios were calculated by combining efficiency and 2 ∆∆ct , according to LightCycler 480 SW v1.5 software manual (Roche ® , Basel, Switzerland). Efficiency and cycle threshold were assessed by real-time PCR (RT-PCR) Miner [54]. Then, each expression ratio was calibrated against stage 2 (corresponding to the earliest stage for which all genes were amplified in all samplings). Under-and over-expression were considered with thresholds of 0.5 and 2, respectively for all successive samplings.
To compare wVulC gene expression patterns between A. vulgare and C. convexus hosts, we used cross-correlation tests using R software [55]. This test measures correlation by taking into account gene expression patterns shifted between stages (hence taking into account the one-stage shift of sexual differentiation that occurs between the two hosts [41]). This test was applied after an autocorrelation test aimed at verifying the absence of autocorrelation between the different time steps. In positive cross-correlation tests, we computed the η 2 value to verify that the variance did not influence the cross-correlation test.

Identification of Candidate Genes in the f Element
Considering the f element as the Wolbachia nuclear insert characterized by Leclercq et al. [16], the presence of the final set of feminization candidate genes from wVulC in the f element was investigated using BLASTp against f element sequences sensu Leclercq et al. [16] (GenBank accession numbers LYUU01002088.1-LYUU01002096.1), with a maximal e-value set to 0.001 and at least 90% of identity.

Sequencing of wCon Wolbachia Genome
To sequence the wCon Wolbachia genome, we first applied an enrichment method, as Wolbachia DNA is often in low abundance compared to host genetic material (nuclear and mitochondrial).
To evaluate which C. convexus tissue contains the highest abundance in Wolbachia DNA, we quantified three single-copy genes by means of qPCR (androgenic hormone for nuclear DNA, cytochrome oxidase I for mitochondrial DNA and wsp for Wolbachia DNA) in three different tissues (gonads, hemolymph and nervous cord). The highest level of wCon DNA was found in gonads (~1 Wolbachia bp for 330 nuclear bp), followed by the nervous cord (~1 Wolbachia bp for~720 nuclear bp) and hemolymph (~1 Wolbachia bp for~34,720 nuclear bp) ( Figure 1A). These figures correspond to 1-4 Wolbachia cells per host cell on average. The quantity of mitochondrial nucleotides was lower than that of Wolbachia (~1 Wolbachia bp for~0.4 mitochondrial bp in ovaries; Figure 1B). Subsequently, we applied our DNA extraction and enrichment protocol to a pool of 30 gonad pairs from C. convexus sisters, reaching a >300-fold enrichment in wCon DNA ( Figure 1C,D).

Identification of Candidate Genes in the f Element
Considering the f element as the Wolbachia nuclear insert characterized by Leclercq et al. [16], the presence of the final set of feminization candidate genes from wVulC in the f element was investigated using BLASTp against f element sequences sensu Leclercq et al. [16] (GenBank accession numbers LYUU01002088.1-LYUU01002096.1), with a maximal e-value set to 0.001 and at least 90% of identity.

Sequencing of wCon Wolbachia Genome
To sequence the wCon Wolbachia genome, we first applied an enrichment method, as Wolbachia DNA is often in low abundance compared to host genetic material (nuclear and mitochondrial). To evaluate which C. convexus tissue contains the highest abundance in Wolbachia DNA, we quantified three single-copy genes by means of qPCR (androgenic hormone for nuclear DNA, cytochrome oxidase I for mitochondrial DNA and wsp for Wolbachia DNA) in three different tissues (gonads, hemolymph and nervous cord). The highest level of wCon DNA was found in gonads (~1 Wolbachia bp for ~330 nuclear bp), followed by the nervous cord (~1 Wolbachia bp for ~720 nuclear bp) and hemolymph (~1 Wolbachia bp for ~34,720 nuclear bp) ( Figure 1A). These figures correspond to 1-4 Wolbachia cells per host cell on average. The quantity of mitochondrial nucleotides was lower than that of Wolbachia (~1 Wolbachia bp for ~0.4 mitochondrial bp in ovaries; Figure 1B). Subsequently, we applied our DNA extraction and enrichment protocol to a pool of 30 gonad pairs from C. convexus sisters, reaching a >300-fold enrichment in wCon DNA ( Figure 1C,D).  Sequencing of this sample generated 395,850 reads representing a total of 110,934,433 bp. 16,491 reads were filtered out for low quality and 47,065 reads that mapped against C. convexus mitochondrial genome were discarded. In total, 163,099 reads (43%) were mapped against the final set of wCon contigs. Therefore, qPCR estimation of wCon DNA proportion (50%) was close to that obtained by sequencing ( Figure 1D,E). From these reads, we assembled the wCon Wolbachia genome in 237 contigs, with a total size of 2.1 Mb, an average sequencing depth of 25×, a N50 of 4.5 kb and a GC content of 34.7%. Contig size ranged from 404 to 71,427 bp. PROKKA annotation pipeline revealed the presence of 2359 coding sequences with an average size of 702 bp per gene, 35 transfer RNA (tRNA) and 3 ribosomal RNA (rRNA) genes (Table 1). Genome size is likely overestimated since redundant repeats are present at the ends of contigs resulting from manual scaffolding, as described in Section 2.2. Consistently, assignment of annotated coding DNA sequences (CDS) to COG categories revealed that wCon contains a high number of genes in the category "Replication, Recombination and DNA Repair" among the 7 Wolbachia genomes compared ( Figure S2). This COG category includes transposable elements (402 repeats in wCon).
Core genome analysis performed on five complete Wolbachia genomes from arthropods (wMel, wPip-Pel, wRi, wHa and wNo) revealed the presence of 762 orthologous groups in their core genome. We found 752 and 755 of these orthologous groups in the wCon and wVulC genomes, respectively (748 of which are shared by wCon and wVulC). Furthermore, 35 tRNA and 3 rRNA genes were characterized in both wCon and wVulC genomes (Table 1). Altogether, these analyses indicate that we obtained reliable assemblies of the wCon and wVulC genomes, capturing most of the genetic information despite assembly fragmentation (237 and 10 contigs for wCon and wVulC, respectively).

In Silico Determination of Candidate Genes
The wVulC Wolbachia genome contains 1888 annotated coding sequences. A succession of in silico filters (see Materials and Methods) allowed us to establish a set of feminization candidate genes. As a result, based on the wVulC annotation, we excluded all repeats (n = 792), pseudogenes (n = 26), genes <150 bp (n = 52) and small CDS of genes split in two CDS (n = 16). We also removed genes orthologous to the ones identified in the core genome of five Wolbachia strains of arthropods (wMel, wRi, wPip-Pel, wHa and wNo) (n = 721). This number is different from the above core genome analysis (n = 755) because a subset of core genes was discarded as part of previous filters. Next, remaining wVulC proteins were compared to their homologs in the non-feminizing wCon Wolbachia genome (CI-inducing strain), and those showing 100% similarity (n = 16) were excluded (Table S3, Figure S1). The 265 remaining wVulC genes were considered as a first set of feminization candidate genes and were processed for expression studies.

Candidate Gene Expression throughout Host Development
Among the 265 remaining wVulC genes, successful PCR amplification was obtained for 216 candidate genes. Then, gene expression of the 216 candidate genes from the wVulC Wolbachia genome was investigated by RT-PCR throughout A. vulgare development, in particular during the stages of sexual differentiation, i.e., from stages 3 to 6. We found that 139 out of 216 wVulC genes were expressed during at least at one of these host developmental stages (Table S3). Next, gene expression of the 139 candidate genes was investigated by qRT-PCR throughout developmental stages of the host A. vulgare, using two successive samplings. First, we investigated expression profiles of these genes during host developmental stages 2 to 6, using stage 2 for calibration (see Materials and Methods, Figure S3). This analysis identified 13 and 29 genes that were 2-fold under and over-expressed, respectively, in at least one of stages 3 to 6. Expression profiles of these 42 differentially expressed genes were then investigated within the frame of a second sampling. The second sampling was designed to confirm and refine expression profiles of wVulC candidate genes throughout the full development sequence of the host A. vulgare (i.e., from stages 1 to 8). In total, seven genes were below the 2-fold expression threshold for all replicates. Therefore, 35 genes were consistently under or over-expressed, making them the final set of candidate genes.
Finally, expression profiles of the 35 wVulC candidate genes were investigated, during developmental stages 1 to 7 in the heterologous host C. convexus, in which wVulC transfection was shown to be associated with feminization ( Figure 2) [41]. We found that all 35 wVulC genes were expressed during C. convexus development, but only 29 of these genes were significantly down or up-regulated in both hosts ( Figure 2, Table S3). A cross-correlation analysis of gene expression profiles between the two hosts indicated that three wVulC genes exhibited the same gene expression profile throughout the entire developmental sequence in both hosts: wVul_1408 and wVul_1821 with one-stage shift that perfectly matches the one-stage shift of sexual differentiation existing between A. vulgare and C. convexus (Figure 2) [41], and wVul_0881, but without any stage shift between the two hosts ( Figure S4).

Prediction of General Function of wVulC Candidate Genes
We investigated the general function of the 35 wVulC candidate genes that are consistently over or under-expressed throughout A. vulgare development. We were able to infer functions for 12 of these genes, in which we found a total of 14 different domains (up to 4 domains per gene) ( Table 2; Table S4). We unraveled the following predicted domains associated with: 1.
DNA binding: Smc (Structural Maintenance of Chromosomes) in wVul_0067, wVul_0085 and wVul_0375 that binds DNA and acts in organization and segregation of chromosomes for partition [69,70]; AAA (ATPases Associated with diverse cellular Activities) [71] in wVul_0375 that are involved in diverse cellular functions including DNA replication, gene expression and also vesicle-mediated transport and peroxisome assembly; p53 interaction protein in wVul_1775 that is responsible for DNA binding [72].
Membranes: Mitofilin in wVul_0085 that is an inner membrane protein domain of mitochondria that controls cristae morphology [77,78]; IncA in wVul_1360 that is associated with homotypic fusion of inclusions in the obligate intracellular bacterium Chlamydia trachomatis [79]; OmpA-like superfamily in wVul_0359 that is an outer membran protein domain [80]; Trp in wVul_1408 and wVul_1775 that is a transient-receptor-potential calcium channel protein [81].

Presence of Candidate Genes in the f Element
We searched the f element sequence for the 35 wVulC candidate genes. Blast results indicated that 27 genes are present in the f element with copy numbers varying from 1 to 5 (Table S5). The 8 wVulC candidate genes with no hit in the f element were wVul_0609, wVul_0626, wVul_1360, wVul_1408, wVul_1454, wVul_1646, wVul_1660 and wVul_1696. Among the 27 genes present in both wVulC and the f element, 20 presented identical copies at the amino acid level (Table S5). The lowest amino acid identity was observed for wVul_0198 (95.35% identity). There was no significant enrichment of differentially expressed genes in the f element (Fisher's Exact test, p = 0.22).

Discussion
Our analyses resulted in the identification of a reduced set of 35 candidate genes that may be implicated in Wolbachia wVulC-mediated feminization in the native host A. vulgare and the heterologous host C. convexus. The vast majority of wVulC genes that we tested are consistently expressed throughout the entire host development sequence, as previously observed for Wolbachia wMel in its host Drosophila melanogaster [82]. Only one-fourth of genes analyzed by qRT-PCR (35/139) are expressed differentially, of which only one-third are functionally annotated (12/35). For example, we noted the presence of several eukaryotic-like domain proteins (ANK, TPR and LRR) and an outer membrane protein (OmpA-like) in our set of candidate genes. Identifying genes encoding such domains among our candidates is not really surprising, as previous host/Wolbachia molecular interaction studies have often searched for such functional domains when using targeted strategies to identify candidate genes [27][28][29]. One advantage of the strategy we used here was the ability to identify candidate genes without any a priori knowledge of their function. This allowed us to identify functionally annotated candidate genes among the "usual suspects", as well as candidate genes for which no functional information is available. Genes from the latter category may be prime choices for future functional studies, in the context of feminization induced by wVulC. This point may be relevant in identifying Wolbachia effectors, as exemplified by cifA and cifB, and their homologs cidA and cidB, which were not functionally annotated before their identification as CI-effectors in wMel and wPip-Pel strains, respectively [35,36].
The candidate gene approach we used also has its own potential limitations. First, as it largely relies on gene expression patterns, we implicitly assumed that gene expression and protein expression are correlated, which may not necessarily be true. This assumption is, however, generally made by studies involving gene expression. Second, we opted to discard core genes and mobile genetic elements. In particular, the rationale for removing mobile genetic elements was their repeated nature that makes them poorly suited to the design of specific PCR assays. Nevertheless, it is clear that genes encoded by mobile genetic elements may be involved in host-Wolbachia interactions, as perfectly exemplified by the cifA and cifB genes that are located in prophage WO [35,36]. However, our strategy did not dismiss those genes as candidate genes. Instead, we opted to give them lower priority in our expression analyses. Third, our approach is based on gene expression profiling throughout host development. However, during host development, Wolbachia is likely involved in various biological processes unrelated to feminization. For example, while host cells are dividing, Wolbachia cell population is also in expansion, replicating fast and colonizing new cells [83][84][85][86][87][88]. Therefore, a subset of our candidate genes may have passed our successive filters not because they are involved in feminization but because they are involved in a confounding biological process taking place during host sexual development.
In this context, the three domains Smc, Hook and Spc7 found in the wVul_0067, wVul_0085 and wVul_0375 candidate genes, are consistent with the involvement of these candidate genes in cell division [67][68][69][70]. Moreover, Wolbachia cell multiplication requires lipid recruitment to generate new membranes and additional nutrient supply. Thus, it is conceivable that the wVul_1360 candidate gene containing an IncA domain may contribute to lipid recruitment, as it seems to act in C. trachomatis [79,89,90]. This may also explain the presence of several lipase or peptidase domains among our candidate genes, which may contribute to additional nutrient supply. Interestingly, two candidate genes (wVul_1408 and wVul_1775) possess both a eukaryotic-like domain (ANK) and a peptidase/protease domain, suggesting that Wolbachia might cleave host peptides.
Among the 35 feminization candidate genes we identified, 27 were also found in the f element, which is a piece of a feminizing Wolbachia genome horizontally transferred in the nuclear genome of A. vulgare and involved in female sex determination [16]. Assuming that the molecular genetic basis of feminization by Wolbachia and the f element is the same, the 27 aforementioned genes are candidates for acting as master sex determination genes in the A. vulgare genome. However, in spite of the high nucleotide similarity of these genes in wVulC and the f element, their function or regulation may not necessarily be conserved in the two genetic entities. This may be the case especially when considering that Wolbachia genes are located in the cytoplasm and controlled in a prokaryotic context while f element genes are located in the nucleus and controlled in a eukaryotic context. Furthermore, feminization induced by Wolbachia and the f element presents differences (reviewed in [15]). For example, it is possible to experimentally masculinize young females carrying the f element by grafting of androgenic glands [91]. By contrast, young females infected by Wolbachia are not masculinized using the same experimental procedure [92,93]. Thus, the underlying mechanisms of feminization by the f element and Wolbachia may differ, at least partly. Nevertheless, given that the f element is derived from Wolbachia, it is parsimonious to consider that the molecular genetic basis for feminization shows at least some overlap between the two feminizers. For this reason, it is relevant to consider the 27 Wolbachia feminization candidate genes present in the f element as candidate genes for master sex determination in females carrying the f element. In this context, it will be useful to investigate the expression profiles of these candidate genes during sexual development of A. vulgare females carrying the f element.
In conclusion, we identified a set of 35 Wolbachia wVulC genes that may be involved in the induction of cytoplasmic sex determination in the terrestrial isopod A. vulgare. Our results open the avenue for further functional analyses of these candidate genes, which may provide additional insights into the molecular genetic basis and cellular mechanisms of feminization. Of the 35 genes, 2 genes (wVul_1408 and wVul_1821) presented an expression profile in the heterologous host C. convexus which is correlated with that in the natural host A. vulgare and showing a one-stage shift corresponding to the one-stage shift existing in the sexual development sequence of A. vulgare and C. convexus. While wVul_1821 contains no known functional domain, wVul_1408 contains two known functional domains, including an Ankyrin Repeat Domain and a protease domain. Although wVul_1408 is absent from the f element, it nevertheless constitutes a prime candidate for being involved in Wolbachia wVulC-mediated feminization. While wVul_1408 may constitute a top candidate in subsequent studies aiming at unraveling the molecular genetic basis of feminization, it may also be important to invest efforts in the identification of the function of other candidate genes such as wVul_1821. This will ultimately allow us to better understand the impact of Wolbachia endosymbionts on the mechanisms of sex determination of their terrestrial isopod hosts. It will also shed new light on the role of Wolbachia in sex chromosome turnovers experienced by A. vulgare in particular [16], and terrestrial isopods in general [94].
Author Contributions: M.B., P.G. and R.C. conceived and designed the experiments; M.B. and I.G. performed the experiments; M.B. and B.M. analyzed the data; P.G. and R.C. contributed reagents/materials/analysis tools; M.B., P.G. and R.C. wrote the paper.