
HIPC transcriptional profiling data standard

The HIPC Data Standards Working Group proposes the following data standard for transcriptional profiling (microarray and RNA-seq) data:

  • For microarrays (Affymetrix and Illumina):
    • Raw data (e.g., .CEL files) should be submitted to GEO
    • Processed data (expression values used in publication) should be submitted to GEO
  • For RNA-seq:
    • Raw data (FASTQ) should be submitted to SRA
    • Processed data (e.g., TPM values) should be submitted to GEO
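
The routing rules in the list above can be written down as a small lookup table, which may be handy when checking a submission plan. This is only a sketch: the key spellings ("microarray", "rna-seq", "raw", "processed") and the function name `target_repository` are illustrative assumptions, not part of the standard.

```python
# Routing table transcribed from the list above.
# Key spellings are illustrative assumptions, not HIPC vocabulary.
REPOSITORY_FOR = {
    ("microarray", "raw"): "GEO",        # e.g., .CEL files
    ("microarray", "processed"): "GEO",  # expression values used in publication
    ("rna-seq", "raw"): "SRA",           # FASTQ
    ("rna-seq", "processed"): "GEO",     # e.g., TPM values
}


def target_repository(assay: str, level: str) -> str:
    """Return the repository the standard names for this assay and data level."""
    key = (assay.strip().lower(), level.strip().lower())
    try:
        return REPOSITORY_FOR[key]
    except KeyError:
        # Anything outside the four listed cases is not covered by the standard.
        raise ValueError(f"no rule for assay={assay!r}, level={level!r}") from None
```

For example, `target_repository("RNA-seq", "raw")` returns `"SRA"`, while an unlisted combination raises `ValueError` rather than guessing.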

NOTE: In cases where it is not possible to make raw data publicly available, processed data can be deposited into GEO without an accompanying raw data submission. In this case, the following statement should be added to the GEO records: "The submitter declares that the raw data cannot be made available via any mechanism due to patient privacy concerns."

ImmPort will link to these data using the Experiment Sample templates, which will include:

  • Repository Name: GEO.
  • Repository Accession: the GEO sample accession number (GSMxxx).
  • For RNA-seq data, the GSM record in GEO should include a reference to the SRA accession for the raw data. Note that for next-generation sequencing, GEO can broker the complete set of raw data files (e.g., FASTQ) to the SRA database on your behalf (https://www.ncbi.nlm.nih.gov/geo/info/faq.html#rawdata).
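
As a rough illustration, the two repository fields above could be assembled and sanity-checked before being entered into a template. The field labels are taken from the list above; the function name `repository_fields` and the dictionary layout are assumptions, since the actual template format is not shown here. The check only confirms the GSM-plus-digits shape of a GEO sample accession, not that the record exists.

```python
import re

# GEO sample accessions have the form "GSM" followed by digits.
GSM_PATTERN = re.compile(r"GSM\d+")


def repository_fields(gsm_accession: str) -> dict:
    """Build the repository fields for one experiment sample (hypothetical helper)."""
    accession = gsm_accession.strip()
    if not GSM_PATTERN.fullmatch(accession):
        # Series (GSE) or platform (GPL) accessions are rejected here,
        # because the standard asks for the sample-level accession.
        raise ValueError(f"not a GEO sample accession: {gsm_accession!r}")
    return {
        "Repository Name": "GEO",
        "Repository Accession": accession,
    }
```

Rejecting non-GSM accessions early is a design choice of this sketch: it catches the common slip of pasting a series accession where a sample accession is expected.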

RECOMMENDATION: After ImmPort data submission, the GEO metadata in the SOFT file should be updated to include a link back to the data in ImmPort. This will make users of GEO aware that related data are available through ImmPort. To do so, include the tag: !Sample_ImmPort_expsamp_acc = ImmPort ExperimentSample ID
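
A minimal sketch of adding that tag to a SOFT file with plain text processing follows. It assumes the usual SOFT layout, in which each sample block starts with a `^SAMPLE = GSMxxx` line followed by `!Sample_...` attribute lines. The function name `add_immport_tag` and the ID `ES0001` used in the usage note are hypothetical placeholders; how the updated file is then sent back to GEO is outside the scope of this sketch.

```python
TAG = "!Sample_ImmPort_expsamp_acc"  # tag name as given in the recommendation


def add_immport_tag(soft_text: str, gsm_accession: str, expsample_id: str) -> str:
    """Return SOFT text with the ImmPort tag set on one sample block.

    Assumes each sample block begins with a line "^SAMPLE = <accession>".
    """
    out = []
    in_target = False
    for line in soft_text.splitlines():
        if line.startswith("^"):
            # A new entity block starts; check whether it is the wanted sample.
            in_target = line.replace(" ", "") == f"^SAMPLE={gsm_accession}"
            out.append(line)
            if in_target:
                out.append(f"{TAG} = {expsample_id}")
            continue
        if in_target and line.startswith(TAG):
            # Drop any earlier value so the tag appears only once.
            continue
        out.append(line)
    return "\n".join(out) + "\n"
```

Calling `add_immport_tag(text, "GSM2", "ES0001")` places the tag line directly under the `^SAMPLE = GSM2` line and leaves other sample blocks unchanged; running it twice yields the same text, so it is safe to re-apply.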

See NIH Genomic Data Sharing and examples of Data Repositories and Trusted Partners