Cancer Imaging Checklist for DAta Sharing (CICADAS)
This checklist was developed to guide data publishers in preparing a TCIA (The Cancer Imaging Archive) data abstract and description. It ensures that researchers have the information they need to determine whether your dataset is suitable for their projects. Following these guidelines makes your data easier to discover on TCIA and through search engines.
Your dataset description should be comprehensive and informative, providing users with all the necessary details to utilize your data effectively. The focus should be primarily on describing the dataset itself, not your research projects (though links to related research can be included). Use the outline below when drafting your dataset description:
Title
- Full Title: Provide a clear and concise title (recommended length: 110 characters or fewer).
- Short Title: A brief identifier (<30 characters) consisting only of letters, numbers, and dashes (no spaces). Avoid including terms like “data” or “dataset.”
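The title constraints above can be checked automatically before submission. The following is a minimal Python sketch, not part of any TCIA tooling: the function names, the constant names, and the example titles are illustrative assumptions. It treats the 110-character limit as a recommendation and reads "<30 characters" as at most 29.

```python
import re
from typing import List

FULL_TITLE_MAX = 110   # recommended maximum from the checklist
SHORT_TITLE_MAX = 29   # "<30 characters" means at most 29
# Letters, digits, and dashes only (no spaces or other punctuation).
SHORT_TITLE_PATTERN = re.compile(r"[A-Za-z0-9-]+")
# "dataset" contains "data", so one substring check covers both terms.
DISCOURAGED_TERM = "data"


def check_full_title(title: str) -> List[str]:
    """Return warnings for a full title (empty list if none)."""
    warnings = []
    if not title.strip():
        warnings.append("full title is empty")
    if len(title) > FULL_TITLE_MAX:
        warnings.append(
            f"full title has {len(title)} characters; "
            f"{FULL_TITLE_MAX} or fewer is recommended"
        )
    return warnings


def check_short_title(short_title: str) -> List[str]:
    """Return problems found in a short title (empty list if none)."""
    problems = []
    if not SHORT_TITLE_PATTERN.fullmatch(short_title):
        problems.append("use only letters, numbers, and dashes")
    if len(short_title) > SHORT_TITLE_MAX:
        problems.append(f"use fewer than {SHORT_TITLE_MAX + 1} characters")
    if DISCOURAGED_TERM in short_title.lower():
        problems.append('avoid terms like "data" or "dataset"')
    return problems


if __name__ == "__main__":
    # Hypothetical titles used only to exercise the checks.
    print(check_short_title("Lung-CT-2024"))
    print(check_short_title("Lung CT Dataset"))
```

Note that the substring test is deliberately simple and would also flag words such as "metadata"; adjust it if that is too strict for your titles.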
Abstract (Maximum 1,000 Characters)
Provide a brief overview of the dataset, including:
- Number of subjects.
- Types of imaging data included.
- Types of non-imaging supporting data (e.g., image classifications, segmentations, demographics, treatment details, outcomes).
- Potential research applications of the dataset.
Introduction (Unlimited Length)
Provide background on the dataset, including its purpose and what makes it unique (i.e., how other researchers will benefit from its publication). There is no hard limit on length, but we encourage you to be as concise as possible.
Methods (Unlimited Length)
The following subsections provide information about how the data were selected, acquired and prepared for publication.
Subject Inclusion and Exclusion Criteria
Describe the criteria used to select subjects, such as the approximate date range of image acquisitions, demographic characteristics, clinical characteristics (e.g., disease, stage, severity), and treatment history.
Data Acquisition
- Radiology:
  - CT Scanning Parameters: Scanner vendor and model (e.g., Siemens SOMATOM Definition Flash), tube voltage (kVp), tube current (mA), pitch, and radiation dose metrics (e.g., CTDI, DLP). Reconstruction parameters: slice thickness and reconstruction kernel. Whether the scan was contrast-enhanced. Scan protocol details (e.g., helical scan mode) and any postprocessing performed (e.g., multiplanar reconstruction).
  - MRI Acquisition Details: Magnetic field strength (e.g., 1.5T, 3T). Sequence type (e.g., T1-weighted, T2-weighted, FLAIR, diffusion-weighted imaging). Technical parameters such as repetition time (TR), echo time (TE), inversion time (TI) if applicable, slice thickness, field of view (FOV), matrix size, and any specific coils used (e.g., head coil). Use of contrast agents, timing post-contrast, and dynamic imaging protocols if applicable.
  - PET/CT or PET/MRI Specifics: Details about the radiotracer (e.g., FDG) and the injected activity (in mCi or MBq). Timing between tracer injection and image acquisition. Attenuation correction methods and reconstruction algorithms.
  - Ultrasound Specifications: Transducer type and frequency. Imaging modes (B-mode, Doppler, elastography) along with machine settings such as gain, depth, and focal zones.
- Histopathology: Details regarding tissue fixation (e.g., formalin-fixed, paraffin-embedded tissue), staining methods (e.g., H&E, immunohistochemistry) and any counterstaining, scanner manufacturer, resolution (e.g., 20x or 40x magnification), pixel size, and file format (e.g., SVS, TIFF).
- Clinical: Data capture process (e.g., electronic health records, manual entry), types of data included, and any relevant standards used in the data dictionary.
- Other data: Details about any other supporting data collected such as genomic or proteomic data.
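When a collection spans several scanners or protocols, reporting the range or distribution of each acquisition parameter is usually more accurate than quoting a single value. The Python sketch below shows one way to produce such a summary. It assumes the parameters have already been extracted from the image headers into one record per series; the field names and all example values are hypothetical and do not describe any real dataset.

```python
from collections import Counter

# Hypothetical per-series records, e.g. exported from image headers.
series_records = [
    {"manufacturer": "Vendor A", "kvp": 120, "slice_thickness_mm": 1.0},
    {"manufacturer": "Vendor A", "kvp": 120, "slice_thickness_mm": 2.5},
    {"manufacturer": "Vendor B", "kvp": 100, "slice_thickness_mm": 1.0},
]


def summarize_categorical(records, field):
    """Count how many series share each value of a field."""
    return Counter(r[field] for r in records if r.get(field) is not None)


def summarize_numeric(records, field):
    """Return (min, max) of a numeric field, or None if no series has it."""
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        return None
    return min(values), max(values)


if __name__ == "__main__":
    print(dict(summarize_categorical(series_records, "manufacturer")))
    print(summarize_numeric(series_records, "kvp"))
    print(summarize_numeric(series_records, "slice_thickness_mm"))
```

The resulting counts and ranges can be pasted into the Data Acquisition subsection, for example as "tube voltage 100 to 120 kVp".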
Data Analysis
- File Format Conversions: Describe any changes made to the original data format, such as conversions from DICOM to NIfTI or JPEG. Include details about software or scripts used for these conversions and any quality control steps taken to verify data fidelity.
- Image Preprocessing Steps: List any preprocessing pipelines applied, such as noise reduction, image normalization, intensity scaling, or bias-field correction. If image registration or motion correction algorithms were used, provide details and reference the software or algorithm.
- Manual Annotation and Segmentation Protocols: Detail the software, guidelines, or protocols used for manual or semi-automatic annotations. Include information on inter- and intra-observer variability assessments, if applicable, and any training provided to annotators.
- Quality Control and Validation: Describe automated or manual quality control steps (e.g., visual review or statistical checks) used to verify image integrity or segmentation accuracy. Include details about any reprocessing that was triggered by these QC measures.
- Automated Image Analyses: Explain any mathematically derived features (e.g., radiomic features) that were computed from the images. Provide details on algorithms, models or pipelines that generate these derived metrics and note any cut-offs or thresholds used.
- Scripts, Code, and Software Versions: Reference any custom scripts (with version numbers) or open-source tools used in the analysis pipeline. Provide links to public repositories or documentation where others can find more details about the pipeline.
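One common way to quantify the inter-observer variability mentioned under Manual Annotation and Segmentation Protocols is the Dice similarity coefficient between two annotators' masks of the same structure. The Python sketch below is illustrative only. It assumes binary masks flattened to equal-length sequences of 0/1 values, and returning 1.0 when both masks are empty is a convention chosen for this example, not something the checklist prescribes.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity: 2 * |intersection| / (|A| + |B|) for two binary masks.

    Masks are flat sequences of 0/1 (or False/True) values of equal length.
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for v in mask_a if v)
    size_b = sum(1 for v in mask_b if v)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


# Toy example: two hypothetical annotators labelling the same 8 voxels.
annotator_1 = [0, 1, 1, 1, 0, 0, 1, 0]
annotator_2 = [0, 1, 1, 0, 0, 1, 1, 0]
print(dice_coefficient(annotator_1, annotator_2))  # prints 0.75
```

If you report such a metric, state which structures and which annotator pairs it was computed over.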
Usage Notes (Unlimited Length)
Provide practical guidance for data users, such as:
- Explanations of data organization and naming conventions (e.g., significance of identifiers, explanation of time points, directory and filename structures).
- Details about intended training/test groupings or other noteworthy subsets of the data.
- Instructions for using spreadsheets or unusual file formats.
- Recommendations for software that can be used to open the data.
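A naming convention is easiest for users to apply when it can be expressed as a pattern. As an illustration, the Python sketch below parses a purely hypothetical filename scheme, <collection>-<subject>_<timepoint>_<modality>.nii.gz. The scheme, the field names, and the example filename are assumptions made for this example and are not a TCIA requirement.

```python
import re

# Hypothetical scheme: <collection>-<subject>_<timepoint>_<modality>.nii.gz
FILENAME_PATTERN = re.compile(
    r"(?P<collection>[A-Za-z0-9]+)-(?P<subject>\d{4})"
    r"_(?P<timepoint>T\d+)_(?P<modality>[A-Z]+)\.nii\.gz"
)


def parse_filename(filename):
    """Split a filename into its named parts, or return None if it does not match."""
    match = FILENAME_PATTERN.fullmatch(filename)
    return match.groupdict() if match else None


# Example with a made-up filename that follows the hypothetical scheme.
print(parse_filename("DEMO-0001_T0_CT.nii.gz"))
```

Including a pattern like this in the Usage Notes, alongside a plain-language explanation of each field, lets users group files by subject or time point without guessing.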
External Resources (Optional)
Include information about related datasets, source code, or other tools stored outside of TCIA (e.g., other databases, GitHub, Hugging Face) that you recommend to users of your data.
References
Include citations for any references relevant to your dataset.
By adhering to the CICADAS checklist, you can ensure your dataset is both comprehensive and easily accessible to the research community.