The Mitochondrial Genome of Amara aulica (Coleoptera, Carabidae, Harpalinae) and Insights into the Phylogeny of Ground Beetles

Carabidae are one of the most species-rich families of beetles, comprising more than 40,000 described species worldwide. Forty-three complete or partial mitochondrial genomes (mitogenomes) from this family have been published in GenBank to date. In this study, we sequenced a nearly complete mitogenome of Amara aulica (Carabidae), using a next-generation sequencing method. This mitogenome was 16,646 bp in length, which encoded the typical 13 mitochondrial protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes, and a putative control region. Combining with the published mitogenomes of Carabidae and five outgroup species from Trachypachidae, Gyrinidae and Dytiscidae, we performed phylogenetic estimates under maximum likelihood and Bayesian inference criteria to investigate the phylogenetic relationships of carabid beetles. The results showed that the family Carabidae was a non-monophyletic assemblage. The subfamilies Cicindelinae, Elaphrinae, Carabinae, Trechinae and Harpalinae were recovered as monophyletic groups. Moreover, the clade (Trechinae + (Brachininae + Harpalinae)) was consistently recovered in all analyses.


Introduction
The Carabidae, also known as carabid beetles or ground beetles, are among the most species-rich families in Caraboidea. They currently comprise more than 40,000 described species worldwide, which can be classified into 16 subfamilies [1] and 86 tribes [2][3][4]. Carabid beetles are often considered as indicators of ecological changes, and are used as the biocontrol agents against insect pests in crops [5][6][7]. Furthermore, some researches indicated that carabids could contribute to weed management in agroecosystems (as reviewed in [8]).
The taxonomy of carabid beetles has been extensively studied. Traditionally, phylogenetic reconstructions of carabids are based on the morphological characters, for example, the male [9,10] and female genitalia [11] and the wing folding structures [12]. Liebherr and Will (1998) recovered Carabidae as a non-monophyletic assemblage, with the characters of the female reproductive tract [13]. By analyzing the larval morphology, Arndt (1998) retrieved Carabidae as a monophyletic group, with the members of Rhysodidae excluded [14]. Kavanaugh (1998) investigated the relationships among the basal carabids and recovered Trachypachidae as sister to all carabid taxa [15]. The Cicindelinae (tiger beetles) was found to be related to the tribes Carabini, Cychrini, Cicindelini and Omophronini [15]. Grebennikov and Maddison (2005) analyzed the phylogenetic relationships within the supertribe Trechitae based on larval morphology [16]. Beutel et al. (2006) applied morphological characters of adults and larvae to recover Carabidae as a sister to Omoglymmius (Rhysodidae), which together form In recent years, sequences of mitochondrial genome (mitogenome) have been widely used to investigate insect phylogenetic relationships, molecular evolution and conservation genetics [36,37,[48][49][50][51][52]. As a class of molecular marker, the mitogenome has the characteristics of maternal inheritance, rapid evolution rate, simple genetic structure and rare recombination [53]. The typical insect mitochondrial genome is a closed-circular and double-stranded DNA molecule of nearly 16 kb in length, and contains 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes, two ribosomal RNA (rRNA) genes and one large AT-rich noncoding control region. The mitogenome provides an increasingly complete picture of phylogenetic relationships of insects through a large number of taxon sampling [51,54]. With the development of sequencing technology, next-generation sequencing (NGS) provides a much more cost-effective and time-saving method to generate a great number of mitogenome sequences simultaneously [37,51,55].
In this paper, we sequenced the nearly complete mitogenome of Amara aulica from the subfamily Harpalinae, by using a next-generation sequencing method. Combined with other 48 published mitogenome sequences, we reconstructed the phylogenetic relationships of the main lineages in Carabidae, under the maximum likelihood (ML) and Bayesian inference (BI) criteria.

Sampling and DNA Extraction
The species A. aulica is native to Europe and has been introduced to Asia and North America [56][57][58]. Adult specimens were collected from Zhengzhou, Henan Province (the geospatial coordinates: 34.723 • N, 113.635 • E). No specific permits were required for the insects sampled for this study.
After the samples were directly killed and preserved in absolute ethanol, they were stored in the dark at −20 • C in Entomological Museum of Henan Agricultural University (voucher number: EMHAU-2015-Zz122902) for further experiment. Total genomic DNA of the individual specimen was extracted from the thorax with the TIANamp Micro DNA kit (TIANGEN BIOTECH CO., LTD, Beijing, China), following the manufacturer's instructions.

Mitochondrial Genome Sequencing and Assembly
Next-generation-sequencing (NGS) technology was applied to obtain the mitogenome sequences. Genomic DNA was pooled with other insect species, which had a distantly phylogenetic relationship to A. aulica. In the pool, the DNA concentrations were approximately equimolar. The library was constructed by using the Illumina TruSeqTM DNA Sample Prep Kit (Illumina, San Diego, CA, USA), with the insert size of 350 bp. Following sequencing was conducted on an Illumina HiSeq2500 platform at Shanghai OE Biotech CO., LTD, with the 150-base paired-end strategy.
NGS QC toolkit [59] was used to filter raw data for quality control. The high-quality reads were assembled using IDBA-UD v. 1.1.1 (Hong Kong, China) [60], with the following settings: the minimum size of contig of 200, an initial k-mer size of 40, an iteration size of 10 and a maximum k-mer size of 90. Three mitochondrial gene fragments (cox1, cob and rrnL) were pre-sequenced for bait sequences, by using traditional polymerase chain reaction and Sanger sequencing. The primers for the polymerase chain reactions were used as those in Song et al. (2016b) [61]. The local-blasting searches were implemented in BioEdit [62], in order to identify the mitochondrial contig.

Mitogenome Annotation and Analysis
The initial mitogenome annotation was conducted in MITOS web [63]. The start codon, stop codon and length of each protein-coding gene were further checked and adjusted by alignment to the published carabid beetle mitogenomes in GenBank. The mitogenome organization of A. aulica was presented in Table 1. The genome structure image was generated in CGView (http://stothard.afns. ualberta.ca/cgview_server/) ( Figure 1). The composition skew was calculated based on the AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C) formulas [64]. The newly determined mitogenome sequence of A. aulica was deposited in GenBank, accession number MN335930.

Sequence Alignment
Our taxon sample included 49 beetle mitogenome sequences representing 12 subfamilies of Carabidae (44 taxa) and three families of Trachypachidae, Gyrinidae and Dytiscidae as outgroups (five taxa) (Table. 2). The protein-coding genes were aligned separately using TranslatorX [65] with the following parameters: Genetic code = "invertebrate mitochondrial", Protein alignment = "MAFFT", and the stop codons were excluded. Both the mitochondrial tRNA and rRNA genes were aligned using the program MAFFT under the iterative refinement method of "E-INS-i" [66]. The alignments were checked in MEGA 7 [67] and ambiguously aligned positions were manually excluded. Gaps were pruned using the online version of Gap Strip/Squeeze v2.1.0, with 40% Gap tolerance. Finally, the resulting alignments were concatenated together to make the dataset of PCGRNA (including 13 protein-coding genes, 22 tRNA genes two and rRNA genes), with the Perl script FASconCAT_v1.0 [68]. The mean ka (nonsynonymous substitution rate) and ks (synonymous substitution rate) values were calculated using DnaSP version 5 (Barcelona, Spain) [69].

Sequence Alignment
Our taxon sample included 49 beetle mitogenome sequences representing 12 subfamilies of Carabidae (44 taxa) and three families of Trachypachidae, Gyrinidae and Dytiscidae as outgroups (five taxa) ( Table 2). The protein-coding genes were aligned separately using TranslatorX [65] with the following parameters: Genetic code = "invertebrate mitochondrial", Protein alignment = "MAFFT", and the stop codons were excluded. Both the mitochondrial tRNA and rRNA genes were aligned using the program MAFFT under the iterative refinement method of "E-INS-i" [66]. The alignments were checked in MEGA 7 [67] and ambiguously aligned positions were manually excluded. Gaps were pruned using the online version of Gap Strip/Squeeze v2.1.0, with 40% Gap tolerance. Finally, the resulting alignments were concatenated together to make the dataset of PCGRNA (including 13 protein-coding genes, 22 tRNA genes two and rRNA genes), with the Perl script FASconCAT_v1.0 [68]. The mean ka (nonsynonymous substitution rate) and ks (synonymous substitution rate) values were calculated using DnaSP version 5 (Barcelona, Spain) [69].

Phylogenetic Analyses
In the phylogenetic analyses, our taxon sample included 46 beetle species representing 12 subfamilies of Carabidae, namely, Brachininae, Broscinae, Carabinae, Cicindelinae, Elaphrinae, Harpalinae, Nebriinae, Paussinae, Promecognathinae, Rhysodinae, Scaritinae and Trechinae. In addition, two mitogenome sequences from Dytiscidae and Gyrinidae respectively, and one from Trachypachidae were selected as outgroups. A total of 49 mitogenome sequences representing the taxa described above were compiled to make the data matrix of 49taxa_PCGRNA.
Phylogenetic trees were built based on the dataset of 49taxa_PCGRNA, under the maximum likelihood and Bayesian inferences. Maximum likelihood analysis was carried out using IQ-TREE [70] and applied the data partition schemes and best-fitting models pre-determined by PartitionFinder 2 [71] (Table S1). The data blocks were defined by genes and codon positions. Branch support was assessed using fast bootstrap analysis with 10,000 replicates. The Bayesian analysis was performed using PhyloBayes MPI v.1.5a [72]. Two parallel runs with four chains were performed, and started from a random topology. The site-heterogeneous CAT-GTR model was used for the analysis, which was originally developed to reduce long-branch attraction artifacts by modelling site-specific features of sequence evolution [73]. Convergence of runs was assessed using bpcomp program implemented in PhyloBayes to ensure that analyses had reached stationarity and that the maxdiff value was less than 0.1. Trees sampled after the burn-in from the two runs were combined and used to build a 50% majority rule consensus tree, with bpcomp program.
To investigate the potential effect of long-branch taxa on tree reconstruction, we compiled a reduced taxon dataset, namely the dataset of 48taxa_PCGRNA. In which, the long-branched Rhysodes sp. was removed. The same phylogenetic analyses were repeated with the dataset of 48taxa_PCGRNA. The sequence alignments supporting the phylogenetic results generated in this article are available in figshare (DOI: 10.6084/m9.figshare.11669280).

Next-Generation Sequencing Output and Mitogenome Organization
In total, 4,110,380 mapped bases were generated by sequencing from the Illumina HiSeq2500. The mean base coverage of the mitochondrial contig was 248-fold. The nearly complete mitogenome of A. aulica was 16,646 bp in length. The only gap occurred in the putative control region.
The obtained mitogenome of A. aulica consisted of 13 protein-coding genes, 22 tRNA genes, two rRNA genes and a partial control region ( Figure 1). There are 23 genes encoded on the heavy strand, while the remaining 14 genes encoded on the light strand. The organization of A. aulica mitogenome was compact, because only 29 bp gene overlaps were identified in 11 gene junctions, with the length ranging from one to seven nucleotides. A total of 116 bp intergenic spacers were found in 16 positions, which had the lengths ranging from one to 32 bp. The largest intergenic regions (32 bp) lied between trnW and trnC. The average nucleotide composition of the full mitogenome sequence was: A = 41.2%, T = 39.2%, C = 11.5% and G = 8.0%, which shows a strong bias towards A and T nucleotides (80.4%). In the A. aulica mitogenome, the AT-skew is 0.025, whereas the GC-skew is −0.179 (Table 3).

Protein-Coding Gene
The protein-coding genes had a total length of 11,194 bp, which encoded 3719 amino acid residues and the 37 bp stop codons. Nine out of 13 protein-coding genes were encoded on the heavy strand, while the remaining four were encoded on the light strand. All the protein-coding genes started with the typical ATN codons, except for the cox1 gene. The start codon ATT was used for nad3, nad5, nad4l and atp8, ATG for cox2, cox3, atp6, nad4 and cob, ATA for nad1, nad2 and nad6. The cox1 gene was initiated with the unusual CGA. The cox2 gene used a single T as the stop codon, while the rest of protein-coding genes ended with the complete termination codon TAA or TAG. The relative synonymous codon usage (RSCU) of A. aulica mitogenome are presented in Figure S1 and Table S2. The results showed that UUA (Leu2), AUU (Ile), UUU (Phe), AUA (Met) and AAU (Asn) were the five most frequently used codons. It was obvious that all of them were AT-rich codons. The A+T content of protein-coding genes was 78.5%, and the third codon positions had the highest A + T content (Table 3).

Transfer RNAs
Twenty-two tRNA genes were identified in the mitogenome of A. aulica and ranged in length from 64 bp to 72 bp. The full length of tRNA genes was 1478 bp. Fourteen tRNA genes were located on the heavy strand, and the remaining eight were encoded on the light strand. The inferred secondary structures for tRNAs are provided in Figure S2. With the exception of trnS1, all tRNA genes can be folded into the typical cloverleaf secondary structure. In the structure of trnS1, the dihydrouridine arm was replaced by a simple loop, which is a common character in most of insect mitogenomes published.

Ribosomal RNAs
The large ribosomal gene (rrnL) was 1293 bp in length, which was located between the trnL (CUN) and trnV. The small ribosomal gene (rrnS) was 699 bp, which was located between trnV and the control region. The inferred secondary structures of both rrnL and rrnS are shown in Figures S3 and S4. The secondary structure of rrnL contained five domains (labeled I, II, IV, V and VI) and 50 helices. The rrnS gene was composed of three domains (labeled I, II, III) and 30 helices.

Phylogenetic Analysis
Based on the results from PartitonFinder, six partition schemes were selected for the dataset of 49taxa_PCGRNA, and the GTR+I+G or GTR+G model was the preferred model for the corresponding partition (Table S1). Both Bayesian trees and ML trees revealed an extremely long terminal branch corresponding to the Rhysodes (Figures 2 and 3). Moreover, the placement of Rhysodes varied between analyses. In the ML tree under the site-homogeneous GTR model, the Rhysodes was retrieved as sister group to Cicindelinae, and both together were sister to the remaining carabid beetle lineages (including Trachypachidae). This branching pattern may be due to long-branch attraction effect. The substitution rate analyses indicated that the Rhysodes has been engaged in a process of accelerated rate of evolution, with the highest ka/ks values among the species analyzed (Table 4). In the long-branch extraction analyses, the removal of the Rhysodes did not change the tree topology greatly (Figures S5 and S6).    The family Trachypachidae always embedded within Carabidae, rendering the latter to be a non-monophyletic assemblage. In the ML analysis based on the dataset of 49taxa_PCGRNA, the Trachypachidae was the sister to the subfamily Carabinae, whereas in the Bayesian analysis based on the same dataset, the Trachypachidae was placed in an intermediate position between the subfamily Cicindelinae and the remaining carabid beetles.  The family Trachypachidae always embedded within Carabidae, rendering the latter to be a non-monophyletic assemblage. In the ML analysis based on the dataset of 49taxa_PCGRNA, the Trachypachidae was the sister to the subfamily Carabinae, whereas in the Bayesian analysis based on the same dataset, the Trachypachidae was placed in an intermediate position between the subfamily Cicindelinae and the remaining carabid beetles.
Within the family Carabidae, four subfamilies with multiple taxon sampling (Cicindelinae, Carabinae, Elaphrinae and Harpalinae) were consistently recovered as monophyletic groups with high support (BP ≥ 96, PP ≥ 0.92). The Cicindelinae were placed as sister group to the remaining ingroup taxa. The monophyly of Trechinae remained elusive due to the ambiguous classification of the exemplar of Carabidae sp. (GenBank accession number: KT696200). A sister group relationship between Brachininae and Harpalinae was strongly supported (BP = 89, PP = 0.98). The phylogenetic positions of the remaining carabid subfamilies were unstable across phylogenetic analyses.
The subfamily Harpalinae had the largest taxon coverage in this study, which allowed us to address some lower taxonomic relationships within this group. The newly sequenced A. aulica was strongly supported as a sister to Amara communis (BP = 100, PP = 0.99). At the tribe level, the Pterostichini was found to be paraphyletic, with Sphodrini embedded therein. The Zabrini formed a sister group to the clade comprising Pterostichini and Sphodrini. The relationships between the rest of harpaline tribes remained largely unresolved in the Bayesian trees ( Figure 3, Figure S6). In contrast, ML trees elucidated a clearer relationship: the intermediate position of Harpalini, and a sister-group relationship of Hexagoniini with Lebiini ( Figure 2, Figure S5).

Discussion
Previous studies have shown that the site-heterogeneous CAT-GTR model implemented in Bayesian analysis can effectively suppress the long-branch attraction artefacts in the animal phylogeny [52,[74][75][76][77]. The long-branched Rhysodinae was pulled toward a more derived position and away from the Cicindelinae in the PhyloBayes trees. However, analyses using the site-heterogeneous CAT-GTR model showed limited resolution on the subfamily relationships among Promecognathinae, Paussinae and Elaphrinae (Figure 3, Figure S6).
The family Carabidae was recovered as non-monophyletic, with respect to Trachypachidae (Figures 2 and 3, Figures S5 and S6). Maddison et al. (2009) [24] supported the nested placement of Trachypachidae within a monophyletic Geadephaga, based on the nuclear gene sequences. However, the sister group of Trachypachidae within Geadephaga is undetermined. Trachypachids were placed with Carabitae, migadopines, elaphrines or a large clade comprising the majority of carabids [24]. In the study of Mckenna et al. (2015) [42] with expanding nuclear gene markers, the placement of Trachypachidae was still unclear. It clustered with Calosoma (Carabidae) or other Carabini [42]. These branching patterns resulted in a paraphyletic Carabidae. The similar situation was revealed in the current analyses based on the mitogenome sequence data.
Within Carabidae, the subfamily relationships changed depending on analyses. Compared with ML trees, the deep divergences among several carabid subfamilies were unresolved in the Bayesian trees ( Figure 3, Figure S6). Tree topology comprising very short internodes of early divergences occurred frequently in phylogenetic analysis [78,79]. Lack of resolution may be owing to non-optimal substitution rates, insufficient and conflicting phylogenetic signal. The short internal branches associated with the deep-level relationships of carabids (the large polytomy) also emerged in the prior studies [23]. The authors attributed this to inappropriate methods of inference. Rogue taxa may be another factor leading to weak nodal support and very short internal branches [38]. In addition, rapid radiation of beetle insects may explain the generally short diverging nodes between major groupings at the base of the carabid tree. A large clade comprising Trechinae, Brachininae and Harpalinae was consistently recovered in all analyses. The Brachininae formed a sister group to Harpalinae, both of which were sister to Trechinae. These two sister group relationships were strongly supported (BP ≥ 89, PP ≥ 0.98). This result was concordant with previous studies [23,45].

Conclusions
The Harpalinae is a megadiverse group within the family Carabidae. However, mitogenome sequences available for Harpaline are very limited. Here, we presented the detailed description of the nearly complete mitogenome of A. aulica (Carabidae, Harpalinae). In this mitogenome, gene order and content are consistent with the hypothesized ancestral insect [49]. The new mitogenome sequence was added to investigate the phylogenetic relationships among carabid beetles. The results supported the Carabidae to be a non-monophyletic group with respect to the Trachypachidae. Four subfamilies within Carabidae were strongly supported, namely Cicindelinae, Carabinae, Elaphrinae and Harpalinae. The Cicindelinae was retrieved as sister to all other carabid lineages. The Trechinae (including Carabidae sp.-KT696200) formed a sister group to the clade of (Brachininae + Harpalinae). These results demonstrated that mitogenome sequences can be useful for resolving the subfamily relationships of Carabidae.