The EBV sequences are available for download as BAM alignments from the Public directory at the DCC: https://cgci-data.nci.nih.gov/Public/BLGSP/WGS/L2/.
The 106 open-access BAM files contain the Epstein-Barr virus (EBV) sequences extracted from the genomes of the BLGSP patient cohort included in the following publication:
Grande BM, Gerhard DS, Jiang A, et al. Genome-wide discovery of somatic coding and non-coding mutations in pediatric endemic and sporadic Burkitt lymphoma. Blood. 2019 Mar 21;133(12):1313-1324. (PMID: 30617194)
The following intentionally stringent criteria were used to ensure that no human reads were included in the BAM files:
- Only reads aligned to the EBV genome (chrEBV) in the reference (GenBank accession AJ507799.2) were included.
- Unmapped reads were excluded.
- Reads whose mate did not align to the same chromosome (i.e. chrEBV) were excluded.
- Reads with more than 5 soft- or hard-clipped bases, as occurs in split reads (e.g. reads spanning an EBV genome integration site), were excluded.
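The four criteria above amount to a per-read keep/discard rule. The following is a minimal Python sketch of that rule, not the pipeline actually used to produce these BAM files. It assumes each alignment is available as a handful of SAM fields (the `SamRecord` type and the function names are hypothetical), uses the standard SAM flag bits, treats a read whose mate is unmapped as failing the same-chromosome rule, and counts soft- (S) and hard-clipped (H) bases across the whole CIGAR string.

```python
import re
from dataclasses import dataclass

EBV_CONTIG = "chrEBV"   # name of the EBV contig in the reference (AJ507799.2)
MAX_CLIPPED_BASES = 5   # reads with more clipped bases than this are excluded

# Standard SAM FLAG bits.
FLAG_PAIRED = 0x1
FLAG_UNMAPPED = 0x4
FLAG_MATE_UNMAPPED = 0x8

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass
class SamRecord:
    """Hypothetical minimal subset of SAM fields needed by the filter."""
    qname: str
    flag: int
    rname: str   # contig the read aligned to
    rnext: str   # contig of the mate ("=" means same as rname)
    cigar: str


def clipped_bases(cigar: str) -> int:
    """Total soft-clipped plus hard-clipped bases in a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "SH")


def keep_read(rec: SamRecord) -> bool:
    """Return True only if the read passes all four exclusion criteria."""
    # Exclude unmapped reads (their RNAME may still name the mate's contig).
    if rec.flag & FLAG_UNMAPPED:
        return False
    # Keep only reads aligned to the EBV contig.
    if rec.rname != EBV_CONTIG:
        return False
    # For paired reads, the mate must also be aligned to the EBV contig
    # (assumption: an unmapped mate counts as "not on the same chromosome").
    if rec.flag & FLAG_PAIRED:
        if rec.flag & FLAG_MATE_UNMAPPED:
            return False
        mate_contig = rec.rname if rec.rnext == "=" else rec.rnext
        if mate_contig != EBV_CONTIG:
            return False
    # Exclude reads with more than 5 soft- or hard-clipped bases.
    return clipped_bases(rec.cigar) <= MAX_CLIPPED_BASES


if __name__ == "__main__":
    reads = [
        SamRecord("r1", 99, "chrEBV", "=", "101M"),     # kept
        SamRecord("r2", 69, "chrEBV", "=", "*"),        # unmapped
        SamRecord("r3", 97, "chrEBV", "chr6", "101M"),  # mate on chr6
        SamRecord("r4", 99, "chrEBV", "=", "80M21S"),   # 21 clipped bases
        SamRecord("r5", 99, "chrEBV", "=", "3S98M"),    # kept (3 clipped)
    ]
    print([r.qname for r in reads if keep_read(r)])
```

Running the script prints `['r1', 'r5']`. With real BAM files, the same predicate could be applied to records read through a BAM library, mapping each field from the library's record object.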
As an additional check, the number of reads passing these criteria was counted in EBV-negative tumors; in the absence of human-read contamination, virtually none would be expected. Of 35 EBV-negative genomes, 25 (71%) had exactly zero reads. Of the remaining 10, nine had between 1 and 19 reads and one had 90. When a few randomly selected reads from these samples were aligned to the human genome, only short matches (20-30 bp) were found, which are expected to be spurious. Therefore, these are believed to be genuine EBV reads.
Given that EBV is ubiquitous (e.g. over 90% of adults globally and most African children are infected), it is possible that EBV-infected normal B cells were present at very low levels in otherwise EBV-negative tumor biopsies. This would explain the few EBV reads found in EBV-negative Burkitt lymphoma (BL) samples. More generally, EBV reads are commonly found in human DNA sequencing data; for more information, see http://www.cureffi.org/2013/02/01/the-decoy-genome/ . Therefore, we are confident that there are virtually no human reads in these EBV BAM files, consistent with the strict criteria that were used.