CMB-LCA | Cancer Moonshot Biobank - Lung Cancer Collection

DOI: 10.7937/3CX3-S132 | Data Citation Required | Image Collection

Location: Lung
Species: Human
Subjects: 23
Data Types: CT, DX, MR, NM, PT, US, Pathology
Cancer Types: Lung Cancer
Size: 59.92GB
Supporting Data: Clinical
Status: Public, Ongoing
Updated: 2023/12/07

Summary

The Cancer Moonshot Biobank is a National Cancer Institute initiative to support current and future investigations into drug resistance and sensitivity and other NCI-sponsored cancer research initiatives, with the aim of improving researchers' understanding of cancer and how to intervene in cancer initiation and progression. During the course of this study, biospecimens (blood and tissue removed during medical procedures) and associated data will be collected longitudinally from at least 1000 patients across at least 10 cancer types. These patients represent the demographic diversity of the U.S. and are receiving standard-of-care cancer treatment at multiple NCI Community Oncology Research Program (NCORP) sites.

This collection contains de-identified radiology and histopathology imaging procured from subjects in NCI’s Cancer Moonshot Biobank-Lung Cancer (CMB-LCA) cohort. Associated genomic, phenotypic and clinical data will be hosted by The Database of Genotypes and Phenotypes (dbGaP) and other NCI databases. A summary of Cancer Moonshot Biobank imaging efforts can be found on the Cancer Moonshot Biobank Imaging page.

Data Access

Version 4: Updated 2023/12/07

Added TCIA restricted data.

Images
Data Type: MR, NM, DX, US, CT, PT
Format: DICOM
Access Points: Download requires NBIA Data Retriever
Subjects: 16; Studies: 52; Series: 305; Images: 29,915
License: CC BY 4.0

Images of the head (see Restricted License)
Data Type: CT, MR, PT
Format: DICOM
Access Points: Download requires NBIA Data Retriever
Subjects: 12; Studies: 30; Series: 284; Images: 29,695
License: TCIA Restricted

Tissue Slide Images, Pathology Metadata
Data Type: Pathology
Format: JSON and SVS
Access Points: Download requires IBM-Aspera-Connect plugin
Subjects: 23; Images: 39
License: CC BY 4.0

Additional Resources for this Dataset

The Database of Genotypes and Phenotypes (dbGaP) hosts genomic, phenotypic, and clinical data for NCI’s Cancer Moonshot Biobank (CMB) project. Information and access to the data can be found at:

The NCI Cancer Research Data Commons (CRDC) provides access to additional data and a cloud-based data science infrastructure that connects data sets with analytics tools to allow users to share, integrate, analyze, and visualize cancer research data.

Citations & Data Usage Policy

Data Citation Required: Users must abide by the TCIA Data Usage Policy and Restrictions. Attribution must include the following citation, including the Digital Object Identifier:

Data Citation

Cancer Moonshot Biobank. (2022). Cancer Moonshot Biobank – Lung Cancer Collection (CMB-LCA) (Version 4) [dataset]. The Cancer Imaging Archive. https://doi.org/10.7937/3CX3-S132

Detailed Description

Introduction

Biobank radiology imaging data on TCIA contains the “days from enrollment (registration)” for each scan, embedded in the DICOM files (DICOM tag (0012,0052)). This allows for temporal alignment between the imaging on TCIA and the clinical events data found on the Biobank Catalog.
Note: So that the images display properly in DICOM readers, the radiology imaging data also contains de-identified dates that preserve the temporal sequence of scans in a given study.

Days from enrollment (registration)

In addition to the modified date fields in the DICOM header, the “days from registration” value is calculated and stored in the DICOM tag (0012,0052) Longitudinal Temporal Offset from Event, with the associated tag (0012,0053) Longitudinal Temporal Event Type set to “REGISTRATION”. Here is an example DICOM header from a scan where the patient’s imaging was performed 2 days before registration, resulting in a negative offset value.

(0012,0052) Longitudinal Temporal Offset from Event -2.0
(0012,0053) Longitudinal Temporal Event Type REGISTRATION
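
The two tags shown above can be read together to get a signed day offset. The following is a minimal sketch, not TCIA's own tooling: the function name `days_from_registration` and the stand-in `header` object are hypothetical, and the function only assumes that `ds` maps (group, element) tags to objects with a `.value` attribute. A dataset loaded with `pydicom.dcmread(path)` is assumed to behave this way, but that is not exercised here.

```python
from types import SimpleNamespace

OFFSET_TAG = (0x0012, 0x0052)      # Longitudinal Temporal Offset from Event
EVENT_TYPE_TAG = (0x0012, 0x0053)  # Longitudinal Temporal Event Type


def days_from_registration(ds):
    """Return the offset in days from registration, or None if unavailable.

    `ds` is any mapping from (group, element) tags to objects that expose
    a `.value` attribute (assumption: a pydicom Dataset fits this shape).
    """
    offset = ds.get(OFFSET_TAG)
    event = ds.get(EVENT_TYPE_TAG)
    if offset is None or event is None:
        return None
    if str(event.value).strip().upper() != "REGISTRATION":
        return None  # the offset is relative to some other event
    return float(offset.value)


# Hypothetical stand-in for a parsed header, mirroring the example above.
header = {
    OFFSET_TAG: SimpleNamespace(value=-2.0),
    EVENT_TYPE_TAG: SimpleNamespace(value="REGISTRATION"),
}
print(days_from_registration(header))  # -2.0
```

A negative result means the scan was acquired before registration; checking the event type first avoids misreading an offset that was recorded relative to a different event.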

To filter your search results using this information, use the “Clinical Trial Time Points” filter in our data portal at https://nbia.cancerimagingarchive.net/nbia-search/.

De-identification of DICOM dates

De-identification of dates for this dataset follows the “Retain Longitudinal With Modified Dates Option” of DICOM PS3.15 Annex E, which allows dates to be retained as long as they are modified from the originals. TCIA implements this with a technique that de-identifies the dates while preserving the longitudinal relationship between them: each patient’s date of registration is mapped to January 1, 1960, and all other dates are offset relative to it. This normalized date system was chosen to make it obvious that the dates are not real, and to make it easy to determine how much time has passed between the date of registration and the patient’s related imaging studies.

For example, if the real date of a patient’s registration was 03/27/2018 and the original imaging Study Date was 03/29/2018, then the de-identified TCIA Study Date would be 01/03/1960 (two days after the base date of 01/01/1960).
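
The date arithmetic in this example can be sketched in a few lines of Python. This is an illustration of the shift as described in the text, not TCIA's actual de-identification code; the names `BASE_DATE`, `deidentify_date`, and `days_since_registration` are hypothetical.

```python
from datetime import date

BASE_DATE = date(1960, 1, 1)  # de-identified stand-in for the registration date


def deidentify_date(original, registration):
    """Shift `original` so that `registration` maps to BASE_DATE."""
    return BASE_DATE + (original - registration)


def days_since_registration(deidentified):
    """Recover the signed day offset from a de-identified date."""
    return (deidentified - BASE_DATE).days


shifted = deidentify_date(date(2018, 3, 29), date(2018, 3, 27))
print(shifted)                           # 1960-01-03
print(days_since_registration(shifted))  # 2
```

Under the same arithmetic, a scan taken two days before registration would land on 1959-12-30, which is consistent with the negative offset shown in the DICOM header example.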

Other Publications Using this Data

TCIA maintains a list of publications which leverage TCIA data. If you have a manuscript you’d like to add please contact the TCIA Helpdesk.

TCIA Citation

Clark, K., Vendt, B., Smith, K., Freymann, J., Kirby, J., Koppel, P., Moore, S., Phillips, S., Maffitt, D., Pringle, M., Tarbox, L., & Prior, F. (2013). The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository. In Journal of Digital Imaging (Vol. 26, Issue 6, pp. 1045–1057). Springer Science and Business Media LLC. https://doi.org/10.1007/s10278-013-9622-7

Acknowledgement

The Cancer Moonshot Biobank program requests that publications using data from this program include the following statement: “Data used in this publication were generated by the National Cancer Institute’s Cancer Moonshot Biobank.”

Previous Versions

Version 3: Updated 2023/10/19

Note: Additional data for existing patients and new patients.

Images
Data Type: MR, NM, PT, CT, US, DX
Format: DICOM
Studies: 52; Series: 305; Images: 29,915
License: CC BY 4.0

Tissue Slide Images, Pathology Metadata
Data Type: Pathology
Format: JSON and SVS
Images: 39
License: CC BY 4.0

Version 2: Updated 2022/08/29

Note: Removed Scout and similar series (Scout, Topogram, Localizer, and 1 Sec Capture) that did not have corresponding MR/CT image series.

Images
Format: DICOM
License: CC BY 4.0

Tissue Slide Images
Format: SVS
License: CC BY 4.0

Version 1: Updated 2022/08/12

Images
Format: DICOM
License: CC BY 4.0

Tissue Slide Images
Format: SVS
License: CC BY 4.0