
Using TARGET Data

- ANNOUNCEMENT -

Harmonized TARGET Data Released in the NCI GDC Data Portal

The NCI Genomic Data Commons (GDC) includes the data that the TARGET project has created. GDC harmonizes TARGET data using GDC alignment and variant calling pipelines and makes the harmonized data available in the GDC Data Portal.

The GDC applies multiple variant calling pipelines to produce somatic variant calls in the Variant Call Format (VCF). The GDC plans to release new aggregated mutations in the Mutation Annotation Format (MAF) for each sample in the near future. The currently available MAFs do not reflect the VCF updates. Additional details are available in the GDC Data Release Notes.

To learn more about the GDC, please visit the GDC Web Site or attend a GDC Webinar.
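
As a rough illustration of how the harmonized VCF files described above might be located programmatically, the sketch below builds a search filter in the nested JSON style used by the GDC API. It only constructs the request parameters and sends nothing. The field names (`cases.project.program.name`, `data_format`, `access`) and the `files` endpoint named in the comment reflect our understanding of the public GDC API and should be checked against the current GDC API documentation. The helper names are hypothetical.

```python
import json


def build_gdc_file_filter(program="TARGET", data_format="VCF", access=None):
    """Build a GDC-style filter for files of one data format in one program.

    Assumption: field names follow the public GDC API; verify before use.
    """
    clauses = [
        {"op": "in", "content": {"field": "cases.project.program.name",
                                 "value": [program]}},
        {"op": "in", "content": {"field": "data_format",
                                 "value": [data_format]}},
    ]
    if access is not None:
        # Expected values are "open" or "controlled" (assumption).
        clauses.append({"op": "in", "content": {"field": "access",
                                                "value": [access]}})
    return {"op": "and", "content": clauses}


def build_query_params(filters, size=10):
    """Serialize the filter into query parameters for a GET request.

    These would be sent to an endpoint such as
    https://api.gdc.cancer.gov/files (assumption, not called here).
    """
    return {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,data_format,access",
        "size": str(size),
    }
```

A caller could pass the result of `build_query_params(build_gdc_file_filter(access="open"))` to an HTTP client of their choice. Controlled access files would additionally require an authentication token obtained after dbGaP approval.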

........................

The TARGET Initiative produces large-scale genomic data for a set of pediatric cancers and provides the research community access to those data. The goal for broadly sharing TARGET data is to facilitate the discovery of therapeutic targets for childhood cancers and catalyze the translation of these discoveries into clinical applications.

Learn how to search and download TARGET data by reading the sections below.


About the Data

TARGET project teams took an integrated approach to identify genetic alterations within tumors from children enrolled primarily in Children’s Oncology Group clinical or biology studies. TARGET utilized various complementary genomics methods, such as gene expression and next-generation sequencing platforms, to analyze tumor and matched normal samples and relapse samples when available. Resulting data were correlated with clinical outcome to extract biological insights and reveal potentially targetable clinical markers in pediatric cancers. Visit the TARGET Project Experimental Methods page for detailed information describing how TARGET data were generated by genomic platform, including protocols for establishing high-quality nucleic acid samples.

Genome-Scale Characterization

Researchers used array-based techniques to analyze tumor and matched normal samples for gross changes to genome structure and expression. Data from these methods can be analyzed individually by platform, as well as integrated with other array or sequence data, to construct a more comprehensive genomic profile.

  • Gene expression profiling
  • Chromosome-specific copy number analysis
  • Methylation profiling (including some sequencing)
  • miRNA profiling (including some sequencing)

Sequencing

Researchers used next-generation sequencing to analyze tumor and matched normal samples for mutations, gene fusions, and other alterations present in childhood cancers. The acute lymphoblastic leukemia and neuroblastoma studies additionally employed targeted sequencing for certain case cohorts.

  • Whole Genome Sequencing
  • Whole Exome Sequencing
  • Transcriptome Sequencing (mRNA-seq and/or miRNA-seq)
  • Targeted Capture Sequencing (primarily for verification and validation)
  • Targeted Sanger Sequencing (including kinome)

Open vs. Controlled TARGET Data

TARGET employs stringent human subject protection and data access policies to protect the privacy and confidentiality of research participants. Therefore, TARGET data are available to the scientific community in two tiers: open or controlled access. Both types of data can be accessed through the TARGET Data Matrix.

Visit the Guide to Accessing Data page for a visual and interactive guide on how to access all TARGET data. Please refer to this guide as you read the information on open vs. controlled access below and the section How to Access Protected Data.

Open Access Data

Open access data generally consist of verified and analyzed data that cannot be used to identify individual patients. These data can be analyzed, for example, to make correlations between molecular subtypes and clinical outcomes. Most researchers may find open access data sufficient in fulfilling their research needs. For non-sequencing-based TARGET analyses, such as gene expression array and methylation array, raw data may also be open access. TARGET provides the scientific community the maximum amount of open access data allowable by informed consent.

Researchers can access these data by clicking on any link labeled “Open” in the TARGET Data Matrix. Data Use Certification (i.e. approval) is not required, and researchers may explore data without restriction. Examples of open access data:

  • Clinical information that could not be used to identify patients
  • Tissue pathology data
  • Chromosome-specific (segmented) copy number alterations and loss of heterozygosity
  • Mutations

Controlled Access Data

Data within this category present a small risk of patient re-identification. While stripped of direct patient identifiers as defined by HIPAA, controlled access data contain specific patient/tumor information and unverified or raw molecular data (e.g. array-based and sequence files). These data can be used to perform sophisticated bioinformatic analyses.

Researchers must obtain approval in the form of Data Use Certification (DUC) to access and download controlled data. They must apply for DUC by submitting requests through NCBI’s dbGaP (National Center for Biotechnology Information’s database of Genotypes and Phenotypes). Requestors must agree to the Data Use Limitations specific to this TARGET study. Examples of controlled access data:

  • Specific genotype or phenotype data for each case
  • Raw sequence files for an individual case

How to Access Protected Data

Below are step-by-step instructions for how to access protected TARGET data. The Guide to Accessing Data provides a visual and interactive overview of these steps.

  1. Obtain Data Use Certification through dbGaP
  2. Maintain User Accounts for Data Access
  3. Access Protected Data via the TARGET Data Matrix
  4. Get Help If You Have Trouble Accessing Data

1. Obtain Data Use Certification through dbGaP

  • Log in to dbGaP (https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login) to apply for access to controlled TARGET data.
    • All users must have an eRA Commons account or HHS credentials (for intramural investigators) to submit requests for access. Further information can be found on the NCBI dbGaP homepage.
  • Complete the electronic dbGaP Data Access Request (SF 424 (R&R)) form, which requires a brief description of the investigator’s intended use of the data. To be approved for a Data Use Certification (DUC), requestors must:
    • Agree not to try to identify and/or contact the patients
    • Agree with the Data Use Limitations of the TARGET Initiative:
      Requests for controlled access data will be considered for research projects that can only be conducted using pediatric data (i.e. the research objectives cannot be accomplished using data from adults) and that focus on the development of more effective treatments, diagnostic tests, or prognostic markers for childhood cancers. Moreover, TARGET data can be used for research relevant to the biology, causes, treatment, and late complications of treatment of pediatric cancers. Applications proposing methods, software, or other tool development are not considered acceptable uses of the data.
  • Submit the completed SF 424 (R&R) form electronically to dbGaP for consideration of data access approval.
    • Upon SF 424 (R&R) form submission, the signing official of the Principal Investigator’s institution will review and certify the submission if relevant institutional policies and applicable laws and regulations (if any) have been followed.
    • After the signing official has certified the submission, the SF 424 (R&R) application will be sent to the NCI Data Access Committee (DAC) to review for approval. The approval review process can take 2-4 weeks.
    • Approval in the form of a DUC allows the investigator data access to TARGET data for one calendar year.
  • Submit a progress report to the DAC no later than one year after obtaining the DUC. A progress report is a condition of continued data access. Approved users may also apply to renew their access to protected data when they submit the report. DAC staff will send a reminder to submit the annual progress report and renew approval status, if needed, approximately one month before the access termination deadline. If the requestor does not submit the progress report or request a renewal, access to the data will cease.
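
The timeline in step 1 (access for one calendar year, with a reminder about one month before access ends) can be sketched as simple date arithmetic. This is only an illustration: the function names are hypothetical, "approximately one month" is assumed to mean 30 days, and an approval dated February 29 is assumed to end on February 28 of the following year. The DAC's actual dates govern.

```python
from datetime import date, timedelta


def access_end_date(approval: date) -> date:
    """Return the date one calendar year after DUC approval."""
    try:
        return approval.replace(year=approval.year + 1)
    except ValueError:
        # Approval fell on Feb 29; the following year has no such day.
        return approval.replace(year=approval.year + 1, day=28)


def reminder_date(approval: date, days_before: int = 30) -> date:
    """Return the assumed date of the DAC reminder (30 days before end)."""
    return access_end_date(approval) - timedelta(days=days_before)


def progress_report_overdue(approval: date, today: date) -> bool:
    """True if the one-year deadline has passed without a renewal."""
    return today > access_end_date(approval)
```

For example, an approval dated 15 March 2023 would give an access end date of 15 March 2024 under these assumptions.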

2. Maintain User Accounts for Data Access

Intramural investigators with an approved DUC may access protected TARGET data using their HHS credentials. Investigators outside of HHS with an approved DUC require two separate user accounts to access protected TARGET data:

  • Access to TARGET data stored and maintained at NCBI and the NCI Genomic Data Commons (GDC) – approved users can access TARGET data stored at NCBI using the eRA Commons account associated with the original Data Access Request. TARGET data stored at NCBI includes raw and aligned reads from next-generation sequencing (FASTQ and BAM files), which are all accessible via the NCBI Sequence Read Archive (SRA). TARGET data stored at the GDC includes raw and aligned reads from next-generation sequencing (FASTQ and BAM files), as well as some aggregate data (including mutation calls and other associated molecular data).
  • Access to data stored and maintained at the OCG Data Coordinating Center (DCC) at the National Cancer Institute (NCI) – approved users outside of HHS are required to use eRA Commons account credentials to log on to Globus.org in order to access TARGET controlled access data housed at the NCI’s OCG DCC. (Approved PIs and designated downloaders will receive an email with detailed instructions on how to use Globus.org to access OCG DCC data upon approval). This eRA Commons account will be used to access data at the OCG DCC, which includes most of the genomic data generated for the TARGET Initiative (clinical information, all levels of chip-based molecular characterization, and higher-level sequencing data). *** The password on this account needs to be updated every 90 days, but for some instances can be extended. Instructions are distributed when the account is created. ***

The OCG DCC houses data produced by TARGET project teams, including analyzed data they generated for related manuscripts. This differs from what is at the GDC: the GDC downloaded all of the raw TARGET next-generation sequencing data and performed its own analysis, producing its own level 3 (analyzed) data files. For more information on maintaining or troubleshooting these data access accounts, visit the Guide to Accessing Data page.

3. Access Protected Data via the TARGET Data Matrix

Approved users may access protected TARGET data through the TARGET Data Matrix with either HHS credentials (intramural investigators) or eRA Commons credentials via Globus.org as outlined in #2 (extramural investigators).

  • A Globus.org account associated with eRA Commons is required to access controlled access TARGET data. (Users may already have another Globus account that is not linked to eRA Commons; however, only Globus accounts linked to eRA Commons can be used to access this dataset.)
  • Access data stored at the OCG DCC directly through the TARGET Data Matrix (requires eRA Commons account for extramural investigators):
    • Protected clinical information
    • Raw chip-based molecular characterization data
    • Processed sequencing data (upper level files, excluding BAM files; i.e. VCF or MAF files)
  • Access sequence files stored at NCBI and the GDC indirectly through hyperlinks on the TARGET Data Matrix (requires eRA Commons account for extramural investigators):
    • FASTQ/BAM files stored in the Sequence Read Archive (SRA) accessible through NCBI dbGaP – next-generation whole genome, exome, mRNA-seq, miRNA-seq, targeted capture, methyl-seq, ChIP-seq
    • FASTQ/BAM/VCF files stored in the GDC accessible through the NCI GDC website with eRA login – next-generation whole genome, exome, mRNA-seq, miRNA-seq, targeted capture

4. Get Help If You Have Trouble Accessing Data

How to Navigate the TARGET Data Matrix

TARGET data are available to the research community and accessible through a tabular, easy-to-use Data Matrix. The matrix version history has been updated throughout the initiative as datasets were added. Users should note the matrix version and date when they download a dataset, because alternative versions of the TARGET Data Matrix exist.

The Data Matrix links to both open and controlled access TARGET data. To obtain specific datasets or metadata, including descriptions of each project, users can hover over the text within the table and click to access the appropriate files. “Sample and Data Relationship Format” files, or SDRFs, map all entities of an experiment (cases, samples, libraries, files, etc.) together, so users can determine which files are connected to which cases. These files can be found in the METADATA directory of each type of analysis. Metadata files, including MAGE-TAB-formatted SDRF and IDF files, map cases within a study to related data files produced by the project.
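
To illustrate how an SDRF can be used to find the files connected to each case, here is a minimal sketch that reads a tab-delimited SDRF and groups file names by case. It assumes the case identifier is in a "Source Name" column and that file columns end in "Data File" (for example "Array Data File" or "Derived Array Data File"), which are common MAGE-TAB conventions. Column names in individual TARGET SDRFs may differ, and the sample rows and identifiers below are invented for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical SDRF content; real TARGET SDRFs have many more columns.
EXAMPLE_SDRF = (
    "Source Name\tSample Name\tArray Data File\tDerived Array Data File\n"
    "CASE-0001\tCASE-0001-tumor\traw_a.txt\tseg_a.txt\n"
    "CASE-0001\tCASE-0001-normal\traw_b.txt\t\n"
    "CASE-0002\tCASE-0002-tumor\traw_c.txt\t\n"
)


def files_by_case(sdrf_text, case_column="Source Name",
                  file_suffix="Data File"):
    """Map each case to the sorted list of file names on its rows."""
    rows = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = [name.strip() for name in next(rows)]
    case_idx = header.index(case_column)
    # Use column positions because SDRF headers may repeat a column name.
    file_idxs = [i for i, name in enumerate(header)
                 if name.endswith(file_suffix)]
    mapping = defaultdict(set)
    for row in rows:
        if len(row) <= case_idx or not row[case_idx].strip():
            continue  # skip blank or truncated rows
        case = row[case_idx].strip()
        for i in file_idxs:
            if i < len(row) and row[i].strip():
                mapping[case].add(row[i].strip())
    return {case: sorted(files) for case, files in mapping.items()}
```

Calling `files_by_case(EXAMPLE_SDRF)` groups the three invented rows into two cases. For a downloaded SDRF, the file's text would be read from disk and the column arguments adjusted to match its header.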

The TARGET project in acute lymphoblastic leukemia (ALL) is separated by phase: Phase 1, the pilot portion of the initiative; Phase 2; and Phase 3, Acute Leukemia of Ambiguous Lineage.

Data Levels

  • Raw or low-level data files (level 1)
  • Normalized and integrated data (levels 2 and 3)
  • Summarized findings (level 4)

Data Access Code

  • Blue = open access
  • Red = controlled access (NCI & NCBI)
  • Black = unavailable
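
The access color code above, combined with the open and controlled access rules described earlier, can be expressed as a small lookup. This is a hypothetical helper for a user's own download scripts, not part of the Data Matrix itself.

```python
from enum import Enum


class Access(Enum):
    OPEN = "open access"
    CONTROLLED = "controlled access"
    UNAVAILABLE = "unavailable"


# Link colors as listed in the Data Access Code.
COLOR_TO_ACCESS = {
    "blue": Access.OPEN,
    "red": Access.CONTROLLED,
    "black": Access.UNAVAILABLE,
}


def can_download(color: str, has_approved_duc: bool) -> bool:
    """Decide whether a matrix entry of this color can be downloaded."""
    access = COLOR_TO_ACCESS[color.strip().lower()]
    if access is Access.UNAVAILABLE:
        return False
    if access is Access.OPEN:
        return True  # no Data Use Certification required
    return has_approved_duc  # controlled access requires an approved DUC
```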

Types of Data Found in the Matrix

  • Names of diseases studied
  • Clinical information, including outcomes
  • Types of molecular data generated, and platforms used
  • Metadata descriptions about each individual project (SDRF, IDF)
  • Multi-level chip-based and sequencing data links

We want the TARGET Data Matrix to meet the needs of the research community and encourage users to send comments, questions, and suggestions for improvement to OCG@mail.nih.gov.