A collection of 13 archaeal and 46 bacterial genomes reconstructed from marine metagenomes derived from the North Sea

Abstract: Marine bacteria are key drivers of ocean biogeochemistry. Despite the increasing number of studies, the complex interaction of marine bacterioplankton communities with their environment is still not fully understood. Additionally, our knowledge about prominent marine lineages is mostly based on genomic information retrieved from single isolates, which do not necessarily represent these groups. Consequently, deciphering the ecological contributions of individual bacterioplankton community members is a major challenge in marine microbiology. In the present study, we reconstructed 13 archaeal and 46 bacterial metagenome-assembled genomes (MAGs) from four metagenomic data sets derived from the North Sea. Archaeal MAGs were affiliated with Marine Group II within the Euryarchaeota. Bacterial MAGs mainly belonged to marine groups within the Bacteroidetes as well as the Alpha- and Gammaproteobacteria. In addition, two bacterial MAGs were classified as members of the Actinobacteria and Verrucomicrobiota, respectively. The reconstructed genomes contribute to our understanding of important marine lineages and may serve as a basis for further research on functional traits of these groups. Dataset: The metagenome-assembled genomes have been deposited in NCBI GenBank under the accessions QXXR00000000-QXZX00000000 (submission ID SUB4359442). For further details see


Summary
The present data set comprises 59 archaeal and bacterial metagenome-assembled genomes (MAGs). These MAGs were reconstructed from four metagenomic data sets derived from the North Sea. Four seawater samples were taken at three sites in the North Sea, three at 3 m depth and one at 350 m depth. Free-living planktonic communities were harvested from these samples by serial filtration. Environmental DNA was extracted from the harvested microbial communities and subjected to next-generation sequencing. The obtained sequencing data were quality filtered and screened for contamination prior to metagenome assembly. MAGs were reconstructed from the assembled metagenomic data sets using two independent binning approaches followed by a refinement step. In total, 13 archaeal MAGs affiliated with Marine Group II (MGII) and 46 bacterial MAGs, mainly assigned to marine groups within the Bacteroidetes as well as the Alpha- and Gammaproteobacteria, were obtained. All MAGs have been deposited in GenBank. They provide a basis for further studies aiming at understanding the complex interaction of important marine lineages with their environment.
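As a small consistency check on the deposited data, the accession range given in the abstract (QXXR00000000-QXZX00000000) can be expanded into individual accessions. The sketch below assumes, and this is only an assumption, that the range is inclusive and that each MAG received one consecutive four-letter prefix; the helper names are hypothetical. Under that assumption the range contains 59 accessions, which matches the number of MAGs reported here.

```python
from string import ascii_uppercase


def prefix_to_int(prefix):
    """Read a four-letter accession prefix as a base-26 number (A=0 ... Z=25)."""
    value = 0
    for letter in prefix:
        value = value * 26 + ascii_uppercase.index(letter)
    return value


def int_to_prefix(value, width=4):
    """Convert a base-26 number back into a prefix of the given width."""
    letters = []
    for _ in range(width):
        value, remainder = divmod(value, 26)
        letters.append(ascii_uppercase[remainder])
    return "".join(reversed(letters))


def expand_accession_range(first, last):
    """List all accessions from first to last (inclusive).

    Assumption: only the four-letter prefix changes between accessions,
    while the numeric suffix stays the same.
    """
    start = prefix_to_int(first[:4])
    end = prefix_to_int(last[:4])
    suffix = first[4:]
    return [int_to_prefix(i) + suffix for i in range(start, end + 1)]


accessions = expand_accession_range("QXXR00000000", "QXZX00000000")
print(len(accessions))  # 59
```

The authoritative mapping between accessions and individual MAGs is the one listed in Supplementary Table S1.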

Data Description
Here, we report 59 MAGs extracted from four seawater samples taken in the North Sea (Figure 1). The North Sea is a typical coastal shelf sea. Coastal shelf seas of the temperate zone are highly productive because of the continuous nutrient supply by rivers. The North Sea is connected to the Atlantic Ocean via the English Channel in the south and the Norwegian Sea in the north. The southern part has a water depth of less than 50 m and is subject to strong tidal currents, which resuspend nutrients from the sediment and break down water stratification. The northern part of the North Sea is deeper (up to 725 m) and strong tidal currents do not occur. The southern region in particular has undergone high nutrient loading and warming during the last 40 years [1,2].
Figure 1. Sampling sites in the North Sea (stations 2, 13, and 14). Note that numbering follows ship stations. The map was generated in R using the maps and mapdata packages [3–5].

Archaeal Metagenome-Assembled Genomes
A total of 13 archaeal MAGs were reconstructed from the four seawater metagenomes (Supplementary Table S1). Completeness ranged from 54.3% to 84.39% (average 70.84%) and contamination from 0% to 8.33% (average 1.39%). Classification with GTDB-Tk [6] placed all genomes within Marine Group II (MGII; Euryarchaeota). Members of this group have been frequently observed in various marine ecosystems [7,8]. For instance, Pernthaler et al. observed blooms of MGII archaea during spring and summer in the German Bight, accounting for >30% of the total picoplankton abundance [7]. Although this marine lineage is abundant and frequently observed, our understanding of it is still rudimentary [9].
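Ranges and averages of this kind can be recomputed from a table of per-genome quality estimates such as Supplementary Table S1. The following is a minimal sketch; the record layout, the function name, and the three example values are hypothetical placeholders and are not taken from the data set.

```python
from statistics import mean


def summarize_quality(records):
    """Return range and average of completeness and contamination (in %)."""
    completeness = [r["completeness"] for r in records]
    contamination = [r["contamination"] for r in records]
    return {
        "n": len(records),
        "completeness_range": (min(completeness), max(completeness)),
        "completeness_mean": mean(completeness),
        "contamination_range": (min(contamination), max(contamination)),
        "contamination_mean": mean(contamination),
    }


# Placeholder values for illustration only, NOT the values of the reported MAGs.
example_mags = [
    {"name": "mag_a", "completeness": 60.0, "contamination": 0.0},
    {"name": "mag_b", "completeness": 70.0, "contamination": 1.5},
    {"name": "mag_c", "completeness": 80.0, "contamination": 3.0},
]

summary = summarize_quality(example_mags)
print(summary)
```

Applying the same function separately to the archaeal and the bacterial rows of the table would give one summary per domain.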

Bacterial Metagenome-Assembled Genomes
A total of 46 bacterial MAGs were reconstructed (Supplementary Table S1). Completeness ranged from 50.86% to 97.52% (average 77.96%) and contamination from 0% to 9.73% (average 2.85%). Classification with GTDB-Tk [6] affiliated most MAGs with important marine lineages, including the Roseobacter clade, the SAR92 clade, and the OM60/NOR5 clade. Interestingly, four MAGs were assigned to Planktomarina temperata within the Roseobacter RCA cluster, which is abundant in temperate and polar regions [10–13]. The first genome belonging to the RCA cluster was published recently, highlighting the global abundance of this marine pelagic group [14]. Five genomes were affiliated with the SAR92 clade, an important gammaproteobacterial marine lineage that has been frequently observed in the North Sea [15,16]. Five genomes were assigned to Luminiphilus, a genus within the OM60/NOR5 clade, which is abundant in coastal marine ecosystems [11,17].
Deciphering the functional traits of prominent marine lineages is a major challenge in marine microbiology. Information on biogeochemical and functional traits of prominent marine bacterial lineages is often missing, and what is available is mostly based on genomic information retrieved from single isolates [14,18,19]. These isolates do not necessarily represent abundant marine lineages. This is exemplified by the unexpected discovery of respiratory nitrate reductases in members of the SAR11 clade [20], which point to a lifestyle adapted to anoxic conditions. Here, we present 59 genomes belonging to important marine lineages, such as the marine Roseobacter lineage [14] and the SAR92 clade [18]. These genomes may serve as a basis for further research on functional traits of important marine clades and their contribution to ecosystem services.

Sampling and Sample Preparation
Seawater samples were collected in the North Sea at three sites on board the RV Heincke in July 2011 (Figure 1). Three samples were taken at 3 m depth (stations 2, 13, and 14) and one sample was taken at 350 m depth (station 13). Sampling and filtration were performed as described previously [21]. In brief, samples were prefiltered through a glass fiber filter (Whatman GF/D, GE Healthcare, Freiburg, Germany). Bacterioplankton was subsequently harvested from 10 L of prefiltered sample using a filter sandwich consisting of a glass fiber filter (Whatman GF/F, GE Healthcare) and a 0.2-µm polycarbonate filter (Whatman Nuclepore, GE Healthcare). Filter samples were stored at −80 °C or on dry ice during transport from ship to laboratory.

DNA Extraction and Sequencing
DNA was extracted and purified according to Weinbauer et al. [22]. The DNA was subsequently further purified using the peqGOLD gel extraction kit (Peqlab, Erlangen, Germany). The extracted DNA was sequenced at the Göttingen Genomics Laboratory on an Illumina Genome Analyzer IIx (Illumina, San Diego, CA, USA).

Assembly and Genome Reconstruction
The generated metagenomic data sets were processed as follows. FASTQ files derived from Illumina sequencing were processed with Trimmomatic version 0.36 [23]. Processing included the removal of adapter sequences and low-quality regions (settings: ILLUMINACLIP:adaptor.fa:2:30:10:2 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36). Processed reads were assembled using metaSPAdes version 3.12.0 [24]. To determine the coverage of each contig, unassembled reads were mapped onto the obtained scaffolds using bowtie2 version 2.3.2 [25]. SAM files were converted to sorted BAM files using samtools version 1.7 [26]. The depth was calculated with the jgi_summarize_bam_contig_depths script supplied with MetaBAT [27]. MetaBAT version 0.32.5 [27] and MyCC version 2017 [28] were used to reconstruct archaeal and bacterial genomes with a minimum input sequence length of 2500 bp. To increase the overall accuracy and to remove potential contamination, the obtained genomes were refined using Binning_refiner [29]. Completeness and contamination were determined using CheckM version 0.7 [30]. Genomes were taxonomically classified using GTDB-Tk version 1.0.2 and GTDB release 86 [6,31].
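The first part of this workflow, from read trimming to MetaBAT binning, can be sketched as a list of command lines. The sketch below is illustrative and not the exact invocation used in this study: it assumes paired-end reads and that the quality-filtered reads are the ones mapped back to the scaffolds, and all file names, the sample label, and the function name are hypothetical. Only the Trimmomatic settings and the 2500 bp minimum length are taken from the text. The commands are only printed, not executed; MyCC, Binning_refiner, CheckM, and GTDB-Tk are left out.

```python
import shlex

# Trimmomatic steps exactly as listed in the text above.
TRIM_STEPS = [
    "ILLUMINACLIP:adaptor.fa:2:30:10:2",
    "LEADING:3",
    "TRAILING:3",
    "SLIDINGWINDOW:4:15",
    "MINLEN:36",
]
MIN_CONTIG_LENGTH = 2500  # minimum input sequence length for binning (bp)


def build_commands(sample, reads_1, reads_2):
    """Return the command lines for one sample as lists of arguments."""
    # All file and directory names below are hypothetical placeholders.
    p1, u1 = f"{sample}_1P.fastq", f"{sample}_1U.fastq"
    p2, u2 = f"{sample}_2P.fastq", f"{sample}_2U.fastq"
    assembly_dir = f"{sample}_assembly"
    scaffolds = f"{assembly_dir}/scaffolds.fasta"
    index = f"{sample}_index"
    sam, bam = f"{sample}.sam", f"{sample}.sorted.bam"
    depth = f"{sample}_depth.txt"
    return [
        # Adapter and quality trimming (paired-end mode is an assumption).
        ["java", "-jar", "trimmomatic-0.36.jar", "PE",
         reads_1, reads_2, p1, u1, p2, u2, *TRIM_STEPS],
        # Metagenome assembly.
        ["metaspades.py", "-1", p1, "-2", p2, "-o", assembly_dir],
        # Read mapping to obtain per-contig coverage.
        ["bowtie2-build", scaffolds, index],
        ["bowtie2", "-x", index, "-1", p1, "-2", p2, "-S", sam],
        # SAM to sorted BAM.
        ["samtools", "sort", "-o", bam, sam],
        # Depth table for MetaBAT.
        ["jgi_summarize_bam_contig_depths", "--outputDepth", depth, bam],
        # Binning with the minimum contig length stated in the text.
        ["metabat", "-i", scaffolds, "-a", depth,
         "-o", f"{sample}_bins/bin", "-m", str(MIN_CONTIG_LENGTH)],
    ]


commands = build_commands("station13_350m", "raw_1.fastq", "raw_2.fastq")
for command in commands:
    print(shlex.join(command))
```

Building the commands as argument lists keeps the settings in one place and lets them be inspected before anything is run; each list could be passed to `subprocess.run` once the paths have been adapted.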

Supplementary Materials:
The following are available online at www.mdpi.com/xxx/s1, Table S1: Submission details and genome characteristics.
Funding: This work was funded by the Deutsche Forschungsgemeinschaft (DFG) within the Collaborative Research Center TRR 51. The research cruise was funded under grant no. AWI-HE361_00. Additionally, we acknowledge support by the DFG and the Open Access Publication Funds of the Göttingen University.