Utility of Circulating Cell-Free RNA Analysis for the Characterization of Global Transcriptome Profiles of Multiple Myeloma Patients

In this study, we evaluated the utility of extracellular RNA (exRNA) derived from the plasma of multiple myeloma (MM) patients for whole transcriptome characterization. exRNA from 10 healthy controls (HC), five newly diagnosed (NDMM), and 12 relapsed and refractory (RRMM) MM patients were analyzed and compared. We showed that ~45% of the exRNA genes were protein-coding genes and ~85% of the identified genes were covered >70%. Compared to HC, we identified 632 differentially expressed genes (DEGs) in MM patients, of which 26 were common to NDMM and RRMM. We further identified 54 and 191 genes specific to NDMM and RRMM, respectively, and these included potential biomarkers such as LINC00863, MIR6754, CHRNE, ITPKA, and RGS18 in NDMM, and LINC00462, PPBP, RPL5, IER3, and MIR425 in RRMM, that were subsequently validated using droplet digital PCR. Moreover, single nucleotide polymorphisms and small indels were identified in the exRNA, including mucin family genes that demonstrated different rates of mutations between NDMM and RRMM. This is the first whole transcriptome study of exRNA in hematological malignancy and has provided the basis for the utilization of exRNA to enhance our understanding of the MM biology and to identify potential biomarkers relevant to the diagnosis and prognosis of MM patients.


Introduction
Multiple myeloma (MM) is a neoplasm of terminally differentiated plasma cells (PC), which at diagnosis is usually present throughout the bone marrow (BM) and frequently, with disease progression, spreads to extramedullary locations. The treatment of MM has witnessed significant progress with the introduction of proteasome inhibitors and immunomodulatory agents; however, the disease remains incurable with MM cells acquiring resistance to systemic therapies due to the accumulation of additional mutations that are often not present during the initial stages of the disease [1]. Resistance to therapy is mediated through the genetic evolution of MM cells under selective pressure treatment with more resistant clones emerging that possess growth and survival advantages [2].
Next-generation deep sequencing of MM tumor specimens has identified widespread genetic heterogeneity in MM [3,4]. Gene expression profiling (GEP) studies of PC isolated from patients presenting with the pre-MM precursor syndrome, monoclonal gammopathy of undetermined significance (MGUS), and symptomatic MM, have identified specific transcriptional changes that occur at different stages of the disease and these studies have reported pathways that had not been previously implicated in MM pathogenesis [5]. The current practice for obtaining genetic information about the tumor is to perform sequential BM biopsies and this is confounded by the known inter and intra-clonal spatial and genetic heterogeneity of the tumor(s) [1,2,6]. An alternative approach that we believe will provide a more comprehensive transcriptional picture is to analyze the extracellular RNA (exRNA) from the peripheral blood (PB) plasma, as this theoretically contains genetic material that arises not only from multiple independent MM tumors but also from the tumor microenvironment (TME).
Nucleic acids are released into the plasma and serum through cellular apoptosis, necrosis, and the spontaneous release of DNA/RNA-lipoprotein complexes, amongst other sources [7]. The presence of cell-free (cf) nucleic acids in the human blood was described more than 60 years ago [8]. A number of recent studies in breast, ovarian, lung, and colorectal cancers have indicated that tumor(s) from both primary and secondary sites release nucleic acids into the PB, as such, circulating cfDNA contains a representation of the entire genome of the tumor with DNA sourced from multiple independent tumors. Whole genome sequencing and whole exome sequencing of the cfDNA can be utilized to identify mutations associated with acquired resistance to cancer therapy without the need to perform sequential biopsies of the tumor [9,10]. Also, some studies have uncovered the presence of microRNAs and specific long noncoding RNAs in the exosomes, an extracellular vesicle subtype ranging from 30-150 nm in size [11], isolated from the blood of MM patients [12,13]. However, exRNA has not been systematically explored, with only a limited number of studies exploring the relevance of specific mRNA transcripts [14][15][16][17][18]. More recently, technologies have advanced that promote the reliable extraction and analysis of exRNA using high-throughput sequencing technologies, and a limited number of studies have also performed whole transcriptome studies of exRNA in pregnancy and Alzheimer's disease [19][20][21]. However, to our knowledge, no studies have explored the whole transcriptome of exRNA in MM or other cancers.
Global gene expression profiling of PC from normal, MGUS, smoldering (asymptomatic) myeloma (SMM), and symptomatic MM patients has indicated that MGUS patients have a profile that is different to SMM and MM patients and that a gene-based classification system can be employed for MM [22][23][24]. In this study, we profiled the whole transcriptome of exRNA obtained from the PB plasma of healthy controls (HC) and MM patients to identify potential biomarkers for both newly diagnosed (NDMM) and relapsed and refractory (RRMM) MM patients.

Peripheral Blood (PB) Collection and Processing
Peripheral blood (PB) samples were obtained from healthy volunteers (HC) and MM patients using 10 mL Streck RNA blood collection tubes (BCTs) following informed consent as per the Alfred Hospital Human Research and Ethics Committee. Immediately upon sample collection, the tubes were inverted to mix the blood with the preservative in the collection tube. The patented preservative in Cell-Free RNA BCT preserves exRNA in plasma and prevents the release of non-target background RNA from blood cells during sample processing and storage [25,26]. The Streck RNA tubes allow the collection of high-quality stabilized exRNA for rare target detection and determining accurate exRNA concentrations. Streck RNA BCT tubes were stored at room temperature and processed for plasma collection within 24 h of PB collection. Plasma was separated from PB through centrifugation at 820× g for 10 min. The supernatant was collected without disturbing the cellular layer and centrifuged again at 16,000× g for 10 min to remove any residual cellular debris. The supernatant was collected and stored at −80 • C until exRNA extraction.

exRNA Extraction
Frozen plasma samples (6 mL) were used to extract the exRNA using the QIAamp circulating nucleic acid kit (Qiagen, Hilden, Germany), according to the manufacturers' instructions. Subsequently, a Qubit 2.0 Fluorometer (Life Technologies, Victoria, Australia) was used to measure the RNA yield, according to the manufacturer's instructions with freshly prepared RNA standards. The maximum input volume utilized for the QUBIT assay was 5 µL. The RNA concentration was quantified as being between 0.5-90 ng/µL. The extracted RNA was stored at −80 • C until further processing for RNA-Seq.

Transcriptome Library Construction and Sequencing
After the RNA sample was treated using the Ribo-Zero Gold rRNA Removal Kit (Illumina, San Diego, CA, USA), a total amount of 100 ng RNA was used for cDNA library construction using the TruSeq™ RNA Library Preparation Kit v2 protocol (Illumina, San Diego, CA, USA), as previously described [27]. Then, the amplified cDNA libraries were assessed for quality using both the Agilent 2100 Bioanalyzer and qRT-PCR. After primer ligation sequencing, qRT-PCR was again used for quantification, and the final library was sequenced on an Illumina HiSeq 2000 platform (paired-end 100 bp) at the Australian Genomics Research Facility (Melbourne, Australia).

Analysis of Transcriptome Data
Raw transcriptome sequencing data were cleaned using SOAPnuke under default parameters [28]. The clean reads were aligned to the human reference genome (ENSEMBL, GRCH38.85) using Hisat2 and the expression of protein-coding and noncoding genes were profiled using Stringtie [29]. HTSeq (v0.9.0, https://htseq.readthedocs.io) was used to obtain the raw read counts aligned to each gene.
The TPM (transcripts per million reads) method was used to normalize the gene expression in each sample, and lowly expressed (<5 TPM) genes were filtered. To perform differential expression analysis, we first removed individual differences among samples by filtering genes with high coefficient values (>0.8) of standard variation in each group. An R package named as edgeR was used to calculate the statistical values for differentially expressed genes (DEGs) [30] and the DEGs were identified based on the following criteria: TPM > 5, log2 fold change (Log2FC) > 1 or Log2FC < −1, p-value < 0.05 and FDR < 0.05. Principle component analysis was performed using the R package "prcomp". Genome alignment files could be accessed on the National Center for Biotechnology Information (NCBI) platform under the accession number SRA893057.

Functional Analysis
Functional analysis of DEGs was performed using the DAVID Bioinformatics Resources 6.8 (https://david.ncifcrf.gov).

Identification of SNP and Indel Variations
Variants including SNPs and indels were called using the Genome Analysis Toolkit (GATK v3.4) pipeline, according to the Best Practice workflow. In brief, the clean reads for each sample were aligned to the human reference genome (GRCH38) using the STAR 2-pass alignment steps [31]. Then, the resulting SAM file was put through the usual Picard processing steps, including adding read group information, sorting, marking duplicates, and indexing (http://broadinstitute.github.io/picard/). Next, the RNA-Seq specific GATK tool called SplitNCigarReads was used to split the reads into exon segments and hard-clip any sequences overhanging into the intronic regions. After the indel realignment and base recalibration, we used HaplotypeCaller to call SNPs and indels, followed by variant filtering with recommended commands and parameters. We further filtered the results by removing the variants detected in less than half of the sample size. Final variants in MM patients excluded the events detected in HC. ANNOVAR was used to annotate the variants [32].

Droplet Digital PCR
We randomly selected 10 samples used for RNA-Seq, including three HC, two NDMM, and five RRMM patients for gene expression validation using droplet digital PCR (ddPCR). Initially, 3 mL of plasma samples were used for exRNA extraction, as described. Then, a NanoDrop 1000 was used to quantify the RNA yield. One-Step RT-ddPCR Advanced Kit for Probes (Bio-Rad Laboratories, Hercules, CA, USA) was used for the detection of five DEGs, including RGS18 (dHsaCPE5054808), ITPKA (dHsaCPE5192800), PPBP (dHsaCPE5039499), FTH1 (dHsaCPE5031151), and IER3 (dHsaCPE5190327), according to the manufacturer's instructions. Briefly, the reaction mix (22 µL) was prepared with 5 µL Supermix (1×), 2 µL reverse transcriptase (20 U/µL), 1 µL DTT (300 nM), 1 µL gene probe (900 nM), 1 µL total RNA, and 12 µL RNase-free water. Then, the droplets were generated on the Automated Droplet Generator (Bio-Rad), followed by the thermal cycling reaction. The cycling conditions were conducted on a C1000 Touch Thermal Cycler (Bio-Rad) as follows: reverse transcription (46 • C, 60 min), enzyme activation (95 • C, 10 min), 40 cycles of denaturation (95 • C, 30 s), and annealing/extension (60 • C, 1 min), and enzyme deactivation (98 • C, 10 min). After thermal cycling, the plates were read by the QX200 Droplet Reader (Bio-Rad) and the QuantaSoft software was used to analyze the data. To reduce the errors, we employed GAPDH (Glyceraldehyde-3-Phosphate Dehydrogenase) as an internal control and the gene expression was shown relative to the absolute copies of GAPDH. Aberrant values (>500 copies/µL) were set to null, and samples with no gene expression were not counted for the calculation of the average gene expression.

Characteristics of MM Patients and RNA Profiles
To investigate the biomarker potential of exRNA for NDMM and RRMM, this study recruited a total of 27 volunteers, including 10 HC (six males and four females), five NDMM (three males and two females) patients, and 12 RRMM (seven males and five females) patients. No difference was found between the groups based on gender, moreover, as shown in Table 1, the mean age of NDMM (66 years) and RRMM (64.6 years) were also similar (older than the HC group: 41.9 years). After the total RNA was extracted, we evaluated the RNA profiles of exRNA using an Agilent Bioanalyzer 2100. Unlike the RNA profiles of whole cell lysates, the exRNA profiles lacked the 18S and 28S rRNA peaks (Supplementary Figure S1). Whether this is one of the characteristics of exRNA requires further experiments to explore because in this instance, the exRNA sample was treated with the Ribo-Zero Gold rRNA Removal Kit.

Overview of the Transcriptome Sequencing
After low-quality reads, adapter and contamination reads were filtered by SOAPnuke; we obtained a total of 762.50 million paired reads (28.24 million reads per sample on average) ( Table 2). The lowest Q20 and Q30 of this sequencing were 93.83% and 86.53%, respectively ( Table 2). Using Hisat2, 88.40-96.78% of the clean reads were aligned to the human reference genome (GRCH38), and 83.20-94.73% of the clean reads were paired matched (Table 2). Then, we used Stringtie to profile the gene expression in each sample. As shown in Table 2, 23,274 to 38,293 genes were identified in the MM patients and HC, respectively. We next evaluated the coverage of identified genes in each sample and found that 79.76-88.58% of the identified genes were covered more than 70% by the sequencing reads ( Figure 1A). Gene annotation showed that 38.71-50.51% of the identified genes had the potential of protein-coding capacity while 1.77-2.33% of the identified genes were to be experimentally confirmed (TEC) ( Figure 1B). Three long-noncoding RNAs, including RN7SL1, RN7SL2, and RN7SL4P, were the most highly expressed genes in the HC, NDMM, and RRMM samples, followed by the Metazoa_SRP (Metazoan signal recognition particle RNA). We next analyzed the gene ontology in terms of the cellular component to source the exRNA molecules in cells. The top 20 enriched cellular components of GO were shared by NDMM, RRMM, and HC ( Figure 1C). With the exception of the nucleus and the cytoplasm, the genes were enriched in the membrane and extracellular vesicle-related GO terms, such as the "plasma membrane (GO:0005886)" and "extracellular exosome (GO:0070062)". Table 2. Overview of the extracellular RNA (exRNA) transcriptome sequencing. Q20 and Q30 were applied to the clean reads. The number of mapped reads was given by Hisat2 and '.5' means that one of the paired reads was mapped. Percentages were calculated for mapped reads and paired reads matched to the genome. The numbers of genes identified in the samples (>5 transcripts per million reads (TPM)) were listed.  We next analyzed the gene ontology in terms of the cellular component to source the exRNA molecules in cells. The top 20 enriched cellular components of GO were shared by NDMM, RRMM, and HC ( Figure 1C). With the exception of the nucleus and the cytoplasm, the genes were enriched in the membrane and extracellular vesicle-related GO terms, such as the "plasma membrane

Dysregulated exRNA in MM Patients
To identify potential exRNA biomarkers for NDMM and RRMM, we performed differential gene expression analysis using edgeR under the criteria described in the methods. First, we compared all the MM (NDMM + RRMM) patients to the HC and identified 375 DEGs (224 up-regulated and 151 down-regulated) ( Figure 1D, Supplementary Table S1). As shown in Figure 1E Figure 1E). The highly up-regulated protein-coding genes included immunoglobulin genes and GOLGA8O, while the highly down-regulated protein-coding genes included HIST1H4A and RPL13A.
Functional analysis using DAVID bioinformatics resources showed that 10 and 11 DEGs in RRMM compared to HC were involved in the biological process of "SRP-dependent co-translational protein targeting to membrane" (GO:0006614) and "Ribosome" (hsa03010) pathways, respectively. Also, two DEGs in NDMM compared to HC were involved in the molecular function of "chromatin DNA binding" (GO:0031490). According to the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotation, we also found that ITPKA, which was up-regulated in NDMM, was related to Next, we compared the exRNA profiles of NDMM and RRMM. Using edgeR we identified a total of 24 genes differentially expressed in RRMM relative to NDMM, including 12 up-regulated and 12 down-regulated genes ( Figure 1D). The distribution of different gene types in the DEGs of RR and ND ( Figure 1E) showed that six protein-coding DEGs, including two up-regulated genes (IGKV1-5 and OR4N4) and four down-regulated genes (AC017081.1, UBB, CHRNE, and USP17L17) were dysregulated in RRMM compared to NDMM. Among them, UBB and CHRNE were up-regulated in NDMM and UBB was also up-regulated in MM (NDMM + RRMM) (Supplementary Table S1). Also, we found a lincRNA gene AC137934.1 up-regulated in RRMM compared to NDMM but down-regulated in NDMM compared to HC.
Functional analysis using DAVID bioinformatics resources showed that 10 and 11 DEGs in RRMM compared to HC were involved in the biological process of "SRP-dependent co-translational protein targeting to membrane" (GO:0006614) and "Ribosome" (hsa03010) pathways, respectively. Also, two DEGs in NDMM compared to HC were involved in the molecular function of "chromatin DNA binding" (GO:0031490). According to the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotation, we also found that ITPKA, which was up-regulated in NDMM, was related to signal transduction. Among the DEGs in the RRMM, PPBP (down-regulated) and CCL4L2 (up-regulated) were involved in the "chemokine signaling pathway" (ko04062), HIST1H4A (down-regulated) was related to "Viral carcinogenesis" (ko05203), and PCBP1 (down-regulated) was a regulator of "spliceosome" (ko03040).

Gene Variants in MM Patients
We next used the GATK pipeline to detect SNPs and small indels in MM patients. As shown in Figure 3A, we first identified 3,606,315 SNPs in HC, of which 79,008 were found in more than five individuals. We also identified 97,547 SNPs in more than two NDMM and 159,506 SNPs in more than six RRMM. The SNPs identified in HC were classified as background and were filtered from the MM patients, resulting in 734,79 and 125,399 SNPs in NDMM and RRMM, respectively. There were 19,320 SNPs shared between NDMM and RRMM ( Figure 3A). ANNOVAR was used to annotate the SNPs, and the results showed that 137 exonic SNPs were identified in both NDMM and RRMM, including one splicing event, 80 nonsynonymous, and 57 synonymous single-nucleotide variants (SNVs) ( Figure 3A). The shared nonsynonymous SNPs all have been annotated in dbSNP (Table 3). Seven SNPs were identified in more than four NDMM patients and more than 10 RRMM patients, including rs61821060, rs2363468, rs1941635, rs2172521, rs7199961, rs73714227, and rs4728137. rs61821060 is a mutation of the RPFIA4 gene (chr1 203039046, G->C) and was found in five NDMM and 11 RRMM patients. There were 388 and 861 exonic SNPs specific to NDMM and RRMM, respectively ( Figure 3A). The 388 NDMM specific exonic SNPs included 168 nonsynonymous (Supplementary Table S2), 219 synonymous, and one stop-loss SNP. Notably, we found one nonsynonymous SNP (chr7, 100958135, T->C, MUC3A) that has not been reported in the Single Nucleotide Polymorphism Database (dbSNP). The 861 RRMM specific exonic SNPs included 403 nonsynonymous (Supplementary Table S3), 446 synonymous, two stop-gain and 10 unknown SNPs. Of the 403 nonsynonymous SNPs we identified five novel mutations, two occurred in MUC5AC (chr11, 1191524, T->C; chr11, 1191527, C->T), and three in MUC3A (chr7, 100953992, C->A; chr7, 100954123, G->A; chr7, 100954136, C->G). Next, we analyzed the mutations occurring in intestinal mucin genes, such as MUC3A, MUC5AC, MUC12 and MUC16. It was revealed that more SNPs occurred in mucin genes in RRMM than in NDMM ( Figure 3B), particularly the MUC5AC gene.
We next evaluated for the presence of indels in the MM patients. As shown in Figure 3C, a total of 15,384 and 3833 indels were identified in NDMM and RRMM, respectively. NDMM and RRMM shared 1505 indels, of which one was derived from the exonic region. The shared exonic indel was annotated as rs11402251, which is located in the 10th exon of the VSIG10L gene and is a frameshift insertion. Two known indels (rs5843453 and rs55745992) were identified specifically in RRMM, while 13 known indels were identified specifically in NDMM ( Figure 3D). Interestingly, nine out of the 13 NDMM specific indels had no frameshift function to their mother genes. The left indels included frameshift deletions on P2RX5 and DEFB126, frameshift insertions to GPATCH4, and a stop-gain insertion to ZNF480 ( Figure 3D).
Functional analysis by DAVID showed that mutated genes for both NDMM and RRMM were involved in "Olfactory transduction (hsa04740)" and "GO:0007186~G-protein-coupled receptor signaling pathway". In addition, RRMM specific mutated genes were also involved in "ECM-receptor interaction (hsa04512)", "GO:0007155~cell adhesion", "GO:0071813~lipoprotein particle binding", and "PI3K-AKT signaling pathway (hsa04151)". We next evaluated for the presence of indels in the MM patients. As shown in Figure 3C, a total of 15,384 and 3833 indels were identified in NDMM and RRMM, respectively. NDMM and RRMM shared 1505 indels, of which one was derived from the exonic region. The shared exonic indel was annotated as rs11402251, which is located in the 10th exon of the VSIG10L gene and is a frameshift insertion. Two known indels (rs5843453 and rs55745992) were identified specifically in RRMM, while 13 known indels were identified specifically in NDMM ( Figure 3D). Interestingly, nine out of the 13 NDMM specific indels had no frameshift function to their mother genes. The left indels included frameshift deletions on P2RX5 and DEFB126, frameshift insertions to GPATCH4, and a stop-gain insertion to ZNF480 ( Figure 3D).

Digital Droplet PCR Validation
We next utilized ddPCR to validate five DEGs (RGS18, ITPKA, PPBP, FTH1, and IER3) in the exRNA samples from 10 participants (three HC, two NDMM, and five RRMM). We used relative normalized expression (RNE, relative to GAPDH) to present the gene expression detected by ddPCR. Figure 4 showed the comparison of gene expression identified by ddPCR and RNA-Seq. Among

Digital Droplet PCR Validation
We next utilized ddPCR to validate five DEGs (RGS18, ITPKA, PPBP, FTH1, and IER3) in the exRNA samples from 10 participants (three HC, two NDMM, and five RRMM). We used relative normalized expression (RNE, relative to GAPDH) to present the gene expression detected by ddPCR. Figure 4 showed the comparison of gene expression identified by ddPCR and RNA-Seq. Among them, the expression levels of three genes (RGS18, ITPKA, and PPBP) were consistent between these two experiments in HC, NDMM, and RRMM. In addition, FTH1 was up-regulated in NDMM compared to HC (Log2FC = 1.78, FDR = 2.16 × 10 −7 , Supplementary Table S1) and its up-regulation was also confirmed by ddPCR. The abnormal performance of FTH1 in Figure 4 is due to its high expression in the three HCs. The disagreement of ddPCR and RNA-Seq on the expression change of IER3 require further experiments for validation. them, the expression levels of three genes (RGS18, ITPKA, and PPBP) were consistent between these two experiments in HC, NDMM, and RRMM. In addition, FTH1 was up-regulated in NDMM compared to HC (Log2FC = 1.78, FDR = 2.16 × 10 −7 , Supplementary Table S1) and its up-regulation was also confirmed by ddPCR. The abnormal performance of FTH1 in Figure 4 is due to its high expression in the three HCs. The disagreement of ddPCR and RNA-Seq on the expression change of IER3 require further experiments for validation.

Discussion
In this study, we evaluated plasma-derived exRNA from MM patients and healthy individuals ( Table 2). The exRNA has been reported to be subjected to degradation, instability, low abundance, and intracellular communication from specimen processing [33,34]; however, in our study we demonstrated that 79.76-88.58% of the identified genes were covered more than 70%, which means that a large set of gene transcripts was complete in the exRNA tested. The stability of exRNA in the plasma or serum may be due to the protection of extracellular vesicles, such as exosomes, microparticles, microvesicles, or multivesicles [35], which are shed from cellular surfaces into the bloodstream [11]. In support of this, we observed that the RNA profiles generated by the Agilent Bioanalyzer 2100 were similar to exosomal RNA profiles [36], which lack 18S and 28S rRNA peaks. Furthermore, our results showed that 2002-2135 genes were related to the "extracellular exosome (GO:0070062)" ( Figure 1C). New evidence has shown that a large proportion of human blood plasma cf-DNA is localized in exosomes [37]; however, the proportion of exosomal RNA in exRNA requires further investigation.

Discussion
In this study, we evaluated plasma-derived exRNA from MM patients and healthy individuals ( Table 2). The exRNA has been reported to be subjected to degradation, instability, low abundance, and intracellular communication from specimen processing [33,34]; however, in our study we demonstrated that 79.76-88.58% of the identified genes were covered more than 70%, which means that a large set of gene transcripts was complete in the exRNA tested. The stability of exRNA in the plasma or serum may be due to the protection of extracellular vesicles, such as exosomes, microparticles, microvesicles, or multivesicles [35], which are shed from cellular surfaces into the bloodstream [11]. In support of this, we observed that the RNA profiles generated by the Agilent Bioanalyzer 2100 were similar to exosomal RNA profiles [36], which lack 18S and 28S rRNA peaks. Furthermore, our results showed that 2002-2135 genes were related to the "extracellular exosome (GO:0070062)" ( Figure 1C). New evidence has shown that a large proportion of human blood plasma cf-DNA is localized in exosomes [37]; however, the proportion of exosomal RNA in exRNA requires further investigation.
No studies have investigated the global transcriptome profiles of exRNA in MM. In this study, we identified genes in exRNA that could potentially be used as diagnostic and prognostic biomarkers for MM, including 18 up-regulated and eight down-regulated genes ( Figure 2B, Supplementary Table S1). GOLGA8O was the only up-regulated protein-coding gene common to both NDMM and RRMM. It has been reported to be significantly down-regulated in patients with major depressive disorder [38]; however, the function of this gene in cancers has not been investigated. The common down-regulated genes in MM included TRAK2, a possible regulator of endosome-to-lysosome trafficking of membrane cargo, including the epidermal growth factor receptors (EGFR) [39]. Endosome-to-lysosome trafficking is also an important process in exosome biogenesis [11]. In addition, TRAK1, the ortholog of TRAK2, has been identified as MGb2-Ag-a novel cancer biomarker [40]. In MM, the down-regulation of TRAK2 might be due to the up-regulation of miR-19a [41,42], which is a member of miR-17~92a cluster and can target the TRAK2 gene [41].
In this study, we identified 54 genes specifically dysregulated in NDMM, including MYOD1, UBB, MIR6754, and PACERR (Supplementary Table S1). The hypermethylation of MYOD1 has been reported to be a prognostic biomarker for both colorectal and cervical cancers [43,44]. The polyubiquitin gene UBB, encoding a regulatory protein involved in ubiquitin, has been identified to be up-regulated more than 100-fold in MM patient cells versus normal twin plasma cells [45]. Moreover, the knockdown of UBB can significantly decrease the expression of ubiquitin, which is essential for cancer cell growth. UBB may, therefore, represent a potential target in anticancer treatment including in MM [46]. miR-6754 has been related to cell proliferation and invasion in non-small cell lung cancer [47]. PACERR is the antisense gene of PGTS2 (prostaglandin-endoperoxide synthase 2) that has been related to colorectal cancer and breast cancer [48,49]. There are currently no published studies that relate MIR6754 and PACERR or other noncoding RNAs to MM; however, their expression patterns in NDMM suggest that they might play a role in the progression of MM and may also represent potential biomarkers for MM.
Among the 191 genes specifically dysregulated in RRMM patients, seven down-regulated (PPBP, EEF1B2, RPL5, RPL19, CH507-9B2.3, EEF2, and RPS27) and three up-regulated (VN1R1, IER3 and POTED) protein-coding genes were identified (Supplementary Table S1). Studies have reported the overexpression of most of these genes in multiple cancers, with the exception of CH507-9B2.3 [50][51][52][53][54]; however, they have also been found to be down-regulated in some specific cancers. For example, the chemokine PPBP (pro-platelet basic protein), also known as CXCL7, is decreased in pancreatic and ovarian cancers [55]. The eukaryotic translation elongation factors EEF1B2 and EEF2 are down-regulated in breast, esophageal, and lung cancers while EEF1B2 is found with repression in some other cancer types, such as head and neck, leukemia, and pancreatic cancer [54]. Whether the down-regulation of these genes is specific to MM requires further study. Only three protein-coding genes were specifically up-regulated in RRMM, including VN1R1, POTED, and IER3 (Supplementary Table S1). Among them, VN1R1 has been shown to be overexpressed in prostate adenocarcinomas and glioblastoma cancer cells [56,57] and POTED in prostate cancer patients, making it a potential molecule for targeted therapy [58]. Interestingly, IER3 has been shown in pancreatic ductal adenocarcinoma to effectively mitigate against cellular stress induced by starvation or exposure to gemcitabine [59]. Notably, IER3 has also been demonstrated to have an anti-apoptotic role in MM endothelial cells and is overexpressed in MM plasma cells [60].
In this study, we also identified gene variants using the transcriptome data ( Figure 3). Of interest were the RRMM specific indels related to both cell adhesion and PI3K-AKT signaling pathways. Unfortunately, the hotspot mutations on KRAS, NRAS, and TP53 genes, which are common in MM [70], were not identified in this study, probably due to the low coverage of these genes [71]. However, our findings revealed that mucin genes, such as MUC3A, MUC5AC, MUC12, and MUC16, may play a role in MM progression, as they were frequently mutated in the MM patient, particularly the RRMM patients. Moreover, MUC16 mutations, the most frequently mutated in this analysis, have been previously demonstrated to be associated with a higher tumor mutational burden and superior survival outcomes in gastric cancer patients [72]. Additionally, uterine endometrioid endometrial adenocarcinoma harbor a high-frequency of the MUC3A mutations [73]. Whether mucin gene variants are derived from MM is still unknown, and our results indicate that mucin genes might be related to the MM pathogenesis. These results demonstrate the potential capacity of exRNA interrogation to identify genetic variants; however, we acknowledge that more sequencing data (high-coverage) and DNA sequencing are required to validate the genomic mutations of the described mucin genes and more patient samples are needed for large scale scanning.

Conclusions
In conclusion, this is the first transcriptome study of exRNA in cancer and in our hands, we have demonstrated that~85% of genes in exRNA were covered more than 70% and that~45% of exRNA genes were protein-coding genes. DEGs identified in MM patients, including GOLGA8O and TRAK2 may be potential biomarkers for the disease. Importantly, we also identified specific differentially expressed protein-coding genes in both NDMM and RRMM including cancer-associated genes such as MYOD1, UBB, VN1R1, POTED, and IER3, and while they may have potential as biomarkers also warrants further study to determine their potential role in the pathogenesis of the disease. In addition, a range of nonsynonymous mutations were identified in the exRNA, including multiple mutated mucin genes and their role in MM also warrants further evaluation. While preliminary, these data demonstrate that exRNA may represent a valuable, non-invasive compartment not only for the identification of new diagnostic and prognostic biomarkers in MM but also for the study of the biology of the disease.