Phylogenomic Reconstruction Sheds Light on New Relationships and Timescale of Rails (Aves: Rallidae) Evolution

: The integration of state-of-the-art molecular techniques and analyses, together with a broad taxonomic sampling, can provide new insights into bird interrelationships and divergence. Despite their evolutionary significance, the relationships among several rail lineages remain unresolved as does the general timescale of rail evolution. Here, we disentangle the deep phylogenetic structure of rails using anchored phylogenomics. We analysed a set of 393 loci from 63 species, representing approximately 40% of the extant familial diversity. Our phylogenomic analyses reconstruct the phylogeny of rails and robustly infer several previously contentious relationships. Concatenated maximum likelihood and coalescent species-tree approaches recover identical topologies with strong node support. The results are concordant with previous phylogenetic studies using small DNA datasets, but they also supply an additional resolution. Our dating analysis provides contrasting divergence times using fossils and Bayesian and non-Bayesian approaches. Our study refines the evolutionary history of rails, offering a foundation for future evolutionary studies of birds.


Introduction
The rails (Aves: Rallidae) are a remarkable group of birds with many secretive species that are difficult to observe and others that have managed to colonise very isolated islands despite their apparent poor ability to fly [1][2][3][4]. It is a widespread family of birds extensively distributed throughout insular and continental settings and only absent in polar regions, waterless deserts, and mountains above the snow line. Interest in rail diversity, evolution, and biogeography has led to studies of their relationships using morphology and PCR-based Sanger approaches [5,6]. However, phylogenetic inferences for rails have proven difficult with morphology [5], and despite considerable effort with molecules (e.g., [6][7][8][9][10]), their relationships and diversification dynamics remain poorly understood, with disagreements regarding key relationships due to few markers and species availability. Recently, one study utilized mitogenomes to explore their basal relationships and diversification, but sampling was too limited for understanding deep evolutionary patterns as they lacked good taxon representation at the genus level [11].
Garcia-R et al. [12] published the most comprehensive molecular phylogeny of the rails using approximately 70% of recognised extant and recently extinct rail species diversity. This study clarified many of the systematic issues within the group and established eight main clades ("Aramides", "Rallus", "Fulica", "Laterallus", "Porzana", "Gallicrex", Porphyrio, and Rallina) to help resolve taxonomic confusion arising from evolutionary convergence. Several important aspects of the systematics and taxonomy of the rails have been in flux after the study of Garcia-R et al. [12], including the re-establishment of the genus Zapornia to several species previously placed in Porzana and the description of a new genus in recent years [13]. Furthermore, some relationships are just beginning to settle, including the monotypic ocellated crake (Micropygia schomburgkii) from the tropical forest of the New World [14] and a species within Atlantisia [15]. Nonetheless, some relationships remain largely unresolved or understudied, despite their evolutionary significance, and the general timescale of rail evolution is poorly known. For example, the relationships of the endemic genus Rallicula from New Guinea and the monotypic Rouget's rail (Rougetius rougetii) from east Africa have not been explored from a molecular perspective, and their placement on the rail phylogeny is uncertain.
The deep origin of the group is also a matter of debate, because important discrepancies persist in the phylogenetic signals retrieved from fossils and DNA data [16][17][18][19][20][21][22][23]. The adequate diagnostic material of fossil rails in continental deposits and subfossil insular endemics has been assigned to modern genera, showing that young crown group lineages were present during Pliocene and Pleistocene times (< 5 Mya) [24,25], while older fossils appear not to be directly ancestral to current rallid crown groups [26][27][28][29]. Garcia-R et al. [11], using mitochondrial DNA (mtDNA) genomes, showed that the temporal origin and diversification of the rails occurred during the Eocene ca. 40.5  Mya. By comparison, previous studies with reduced sampling using nuclear DNA (nDNA) sequences and different calibration constraints [17,30,31] estimated the origin of the Rallidae to be of the Miocene age ca. 20 Mya, nearly half the age estimated for mtDNA.
Genome-scale data can facilitate the construction of well-supported phylogenies across the Tree of Life and improve the estimation of accurate and precise molecular dating. The recent application of dense taxon sampling using large numbers of genes through modern phylogenomic approaches has resolved relationships and dates in groups of birds (e.g., [22]) and other vertebrates (e.g., [32]). Here, we disentangle the deep phylogenetic structure of the rails using anchored phylogenomics. This is the first evolutionary hypothesis of the rails derived from genomic-scale data, and we aim to use these data to resolve several previously contentious relationships. We analysed a dataset of 393 loci for 63 species, representing approximately 40% of the extant familial diversity, including all major lineages and deep branches of the tree. The taxa selected were chosen attempting to maximize sampling from different clades while prioritising time and resources. The integration of state-of-theart molecular techniques, together with a broad taxonomic sampling and analytical approaches, provides new insights into rail relationships and evolution.

Data Collection
We collected tissues from 63 rails, including species not previously found in other molecularbased approaches. Genomic DNA was extracted using the standard protocol of the QIAGEN Dneasy kit. Prior to library preparation, the quantity and quality of the DNA extractions were inspected using Qubit and a 2% TAE agarose gel, respectively. Sequencing data were generated in an Illumina HiSeq2500 platform at the Center for Anchored Phylogenenomics at Florida State University (www.anchoredphylogeny.com), following Lemmon et al. [33]. Briefly, the DNA extracted was fragmented (150-350 bp) using a Covaris E220 focused-ultrasonicator with Covaris microTUBES. Subsequently, library preparation and indexing were performed on a Beckman-Coulter Biomek FXp liquid-handling robot, following a protocol modified from Meyer and Kircher [34]. Anchored hybrid enrichment was performed using a custom SureSelect kit (Agilent Technologies) targeting loci derived from Prum et al. [22]. Sequencing was performed in the Translational Science Laboratory in the College of Medicine at Florida State University.

Sequence Assembly and Alignment
After sequencing, paired reads were processed downstream, following the methods described in Prum et al. [22]. Paired reads were compared using all possible degrees of overlap. For each overlap possibility, the number of matches between the two reads was counted. Then, using a binomial model, the probability of obtaining that many matches by chance was computed. Lastly, paired reads were merged when one of these probabilities (corresponding to a particular degree of overlap) was less than 1 × 10 −10 , and the next smallest probability (corresponding to a different degree of overlap) was more than 1 × 10 −7 . This stringent filter prevents reads containing repetitive regions from being merged [35]. After reads were merged, mismatches were reconciled using base-specific quality scores, which were combined to form the new quality scores for the merged read [35]. Reads failing to meet the probability criterion were kept separate but still used in the assembly [35]. Reads were assembled into contigs using a pipeline described by Ruane et al. [36] and Hamilton et al. [37]. After filtering out consensus sequences generated from fewer than 100 reads, sets of orthologous sequences were obtained based on pairwise sequence distances. Sequences were aligned using MAFFT v7.023b [38] with "-genafpair -maxiterate 1000" flags. The alignment for each locus was then trimmed/masked, following Ruane et al. [36] and Hamilton et al. [37], with "good" sites identified as those containing >40% identity and fewer than 12 missing/unmasked characters removed from the alignment. The data are available in the Zenodo archive (http:// 10.5281/zenodo.3576641).

Phylogenetic Analyses and Divergence Time Estimates
The sequences from similar loci to those included in our dataset were extracted and aligned from other birds using data provided by Prum et al. [22] (Supplementary Table 1). The interpretation of the phylogenetic relationships was based primarily on this full dataset with concatenated and coalescent species tree approaches in RAxML v8.2.8 [39] and ASTRAL-II v4.10.12 [40], respectively. The maximum likelihood (ML) trees were estimated for each locus individually and for the concatenated alignment partitioned by the gene using a GTR + gamma model and a rapid bootstrap with 1000 replicates [41]. A coalescent species tree was summarized with ASTRAL-II using individual trees as inputs and multi-locus bootstrapping. ASTRAL-II uses a heuristic search to find the species tree that agrees with the largest number of quartet trees induced by the set of input gene trees.
We used calibration constraints from the fossil information of Gruiformes and Rallidae to estimate the divergence times in BEAST2 v2.4.7 [42]. We applied the method of Claramunt and Cracraft [23], in which prior calibration densities were modeled based on available fossil records. We placed an exponential prior on the crown ages of Gruiformes (offset: 52, mean: 8.5) and a lognormal prior on the age of the Rallidae (offset: 32.6, mean: 1.1, standard deviation: 1.8) based on densities inferred by Stervander et al. [15], which were themselves based on thoroughly examined fossil data [43,44]. Belgirallus is considered a well-constrained and the oldest fossil record of the Rallidae [27,28], but it is sometimes shown outside the crown group Rallidae [26]. We then also used Belgirallus as a stem group representative to compare and assess the impact on ages predicted. We combined the results of two independent runs of 50 million generations where chains were sampled every 5000th generation using a Birth-Death process prior. A burn-in of 20% was implemented to obtain ESS values above 200. Given the computational limits, the Bayesian coalescent species tree method in BEAST2 was applied to a subset of the data. To reduce the computational time demanded from the large dataset, we randomly selected a subset of 34 loci (a total of 57,241 nucleotides) with the most complete data for all taxa and variable sites > 25% (to maximize the coverage of sites) [45]. The substitution model for each locus was determined in jModelTest2 [46] using the Bayesian information criteria. For comparison, we performed a timetree analysis using a non-Bayesian model in RelTime [47]. This approach implemented in MEGAX [48] allows for using the full anchored loci dataset without the computer burn caused by BEAST2. We used the GTR + Γ model with five gamma categories, fixed the topology to that of the ML tree, and implemented the same calibration parameters as in BEAST2. For this analysis, we used an ML tree that included a more extensive set of bird clades as an outgroup (Supplementary Table 1).

Discussion
We reported the relationship of the rails using a 393-gene dataset with dense taxonomic sampling (63 rail species). The evolutionary relationships in our analyses are mostly in agreement with those found using multi-gene phylogenetic datasets [12]. However, our dataset outperforms previous concatenated alignments in terms of data quantity and provides strong statistical support for deep clade divergences, except for the splitting of Himantornis + "Fulica", and Porphyrio, which receives < 90% bootstrap support. The relationships of species within Rallicula are resolved now, showing that they are outside of Rallidae and sister to Sarothrura [12] and Mentocrex [14], and are all better treated as the distinct family Sarothruridae. Both Rallicula and Sarothrura present sexual dichromatism, a condition only present in two species of the rails, the watercock (Gallicrex cinerea) and the little crake (Zapornia parva) [49]. Another novel result includes the proposal of a distinctive and separate placement of the African Nkulengu rail, an outcome concordant with morphological data [5,49] but conflicting with previous molecular analysis [12]. The apparent explanations for this could be the use of an incorrectly identified sample in Garcia-R et al. [12] or the increased molecular data in our study that unambiguously place this species separate from other clades. The genus Crex, composed of two species, the corn crake (C. crex) and the African crake (C. egregia), is not monophyletic. The African crake is shown in our phylogenetic analysis to form a monophyletic group with the Rouget's rail (Rougetius rougetii), also found in Africa. Our analysis confirms the relationship among the ocellated crake (Micropygia schomburgkii) and russet-crowned crake (Rufirallus viridis) as suggested in a previous molecular phylogenetic study [14]. The enigmatic white-browed crake (Amaurornis cinerea), which has been placed in several genera, including Poliolimnas, Porphyrio, and Porzana, is now shown in our analysis to be closely related to the New Guinea flightless rail (Megacrex inepta). This result, and the clustering of the watercock (Gallicrex cinerea), renders Amaurornis as a nonmonophyletic group despite the current placing of several species, previously assigned to this genus, to Zapornia (e.g., Z. akool and Z. flavirostra). Our results also showed that the Inaccessible rail (Atlantisia rogersi) is embedded within the "Laterallus" clade and closely related to the black rail (Laterallus jamaicensis). The current generic assignment of this species to Atlantisia will show Laterallus as non-monophyletic. The taxonomic transferring of the Inaccessible rail to Laterallus has been recommended by Stervander et al. [15], which will also include the Ascension crake (A. elpenor), if its placement to the monotypic Mundia [51] is unjustified. Testing a hypothesis of the relationship of these species will be needed using a dataset that includes the Ascension crake. Future taxonomic changes are warranted for species within Crex, Gallirallus, Porzana, and Gallinula [13].
The unresolved affinities of Belgirallus and other Oligocene rail-like fossils [26,29] make it difficult to estimate the approximate age of crown Rallidae. These fossils are only known from partial limb bones, and there are no current phylogenies that place them into the crown Rallidae. Furthermore, identification of these fossils is complicated by the paraphyly of traditional Rallidae, and some of the earlier rallid fossils seem to belong to Sarothruridae [52]. Our crown and stem inferences, nonetheless, place an uncertainty on Belgirallus and provide a range of calibration points for divergence time analyses. The divergence times obtained using a Bayesian relaxed-clock model and a non-Bayesian model were mostly similar when Belgirallus was used as a crown Rallidae representative. As illustrated in our node dating analyses, the root and origin of Rallidae were estimated to be 33 (32)(33)(34)(35)(36) Mya using a Bayesian approach or 34 (32)(33)(34)(35)(36)(37)(38)(39)(40)(41)(42)(43)(44) Mya with a non-Bayesian method. However, the credibility intervals (uncertainty) for the origin of Rallidae estimated in RelTime were rather wide compared with the 95% highest posterior density (HPD) intervals in BEAST2. RelTime has been used in studies with large datasets [53], which are intractable with Bayesian analysis, showing an efficient way to estimate divergence times [54]. Previous inferences of divergence times using RelTime have been in disagreement with Bayesian relaxed molecular clocks, [55] but this seems to be related to the priors assigned in Bayesian approaches [56]. The divergence times obtained from both methods when Belgirallus was used as a crown representative of Ralloidea were significantly younger, especially with RelTime, and in concordance with earlier studies that dated the clade's crown group between 18 (12-24) and 22  Mya [17,30,31,57]. Most recent studies have recovered significantly older ages around 40.5  Mya [11,14] that conflict markedly with conclusive rail fossils identified in the Early Miocene [58]. The difficulty in estimating a consistent divergence times of the Rallidae crown group originates in part from the scarcity of reliable old crown fossils, the calibration constraints used, the methods implemented, and the relationship between branch lengths from mitochondrial versus nuclear data [18,[59][60][61]. The uncertainty in the timescale of rails can be refined and updated as new early fossils are discovered.
Large-scale, multilocus data, combined with improved analytical tools for inferring phylogenetic trees and new methods and fossils for tree calibration, provide unprecedented opportunities for resolving phylogenetic relationships and estimating the divergence time of monophyletic groups. The rails are frequently undersampled in phylogenomic studies (e.g., [22,62]), and our results provide a robust framework for building the phylogenomic supertree of birds. This is important for the study of avian macroevolution and biogeography [63] and for resolving discrepancies in the phylogenetic signals retrieved from geo-temporal fossils and molecular data.
Supplementary Materials: The following are available online at www.mdpi.com/xxx/s1; Figure S1: Complete phylogenetic relationships among the rails and other birds based on the coalescent ASTRAL analysis; Figure S2: Timetrees estimated with RelTime with Belgirallus as a stem (A) and crown (B) group representative. The bar in each node represents confidence intervals.; Table S1: List of species of the rails and other birds from Prum et al. [22] used in this study. The Handbook of the Birds of the World (HBW) was followed for current nomenclature and species delimitations. Acronyms for museums are the following: