SAROS | SAROS - A large, heterogeneous, and sparsely annotated segmentation dataset on CT imaging data

DOI: 10.25737/SZ96-ZG60 | Data Citation Required | Analysis Result

Cancer Types: Adenocarcinoma, Breast, Corpus Endometrial Carcinoma, COVID-19(non-cancer), Cutaneous Melanoma, Ductal Adenocarcinoma, Head and Neck Carcinomas, Head and Neck Squamous Cell Carcinoma, Healthy Controls (non-cancer), Kidney Cancer, Liver Hepatocellular Carcinoma, Lung Adenocarcinoma, Lung Cancer, Lung Squamous Cell Carcinoma, Melanoma, Non-small Cell Lung Cancer, Soft-tissue Sarcoma, Squamous Cell Carcinoma, Stomach Adenocarcinoma, Uterine Corpus Endometrial Carcinoma
Location: Breast, Chest, Extremities, Head-Neck, Kidney, Liver, Lung, Pancreas, Skin, Stomach, Uterus
Subjects: 882
Size: 87.14 MB
Supporting Data: Segmentations
Updated: 2024/03/07

Summary

Sparsely Annotated Region and Organ Segmentation (SAROS) contributes a large, heterogeneous semantic segmentation annotation dataset for existing CT imaging cases on TCIA. The goal of this dataset is to provide high-quality annotations for building body composition analysis tools (References: Koitka 2020 and Haubold 2023). Existing in-house segmentation models were used to generate annotation candidates on randomly selected cases. Medical residents and students then manually reviewed and corrected the generated annotations on every fifth axial slice; all other slices were set to an ignore label (numeric value 255).
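Because only every fifth axial slice carries reviewed labels, downstream code has to skip the ignore value 255 when it locates annotated slices or computes metrics. The following is a minimal sketch of that idea using NumPy on a toy array. The function names, the choice of the last array axis as the axial axis, and the slice positions in the toy volume are assumptions made for illustration, not part of the dataset tooling.

```python
import numpy as np

IGNORE_LABEL = 255  # ignore value stated in the dataset description


def annotated_slices(seg, axis=-1):
    """Indices along `axis` of slices holding at least one non-ignore voxel."""
    axis = axis % seg.ndim
    other_axes = tuple(i for i in range(seg.ndim) if i != axis)
    return np.flatnonzero((seg != IGNORE_LABEL).any(axis=other_axes))


def dice_ignoring(pred, target, label):
    """Dice score for one label, counted only where the reference is annotated."""
    valid = target != IGNORE_LABEL
    p = (pred == label) & valid
    t = (target == label) & valid
    denom = int(p.sum()) + int(t.sum())
    if denom == 0:
        return float("nan")  # label absent from both masks
    return 2.0 * int((p & t).sum()) / denom


# Toy volume: everything ignored except two slices (positions are illustrative).
target = np.full((4, 4, 10), IGNORE_LABEL, dtype=np.uint8)
target[:, :, [0, 5]] = 0
target[1:3, 1:3, [0, 5]] = 2
pred = np.full_like(target, 2)  # a prediction that marks every voxel as label 2

print(annotated_slices(target))
print(dice_ignoring(pred, target, 2))
```

Voxels on ignored slices contribute to neither the numerator nor the denominator, so a prediction is neither rewarded nor penalized for what it outputs there.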

900 CT series from 882 patients were randomly selected from the following TCIA collections (number of CTs per collection in parentheses): ACRIN-FLT-Breast (32), ACRIN-HNSCC-FDG-PET/CT (48), ACRIN-NSCLC-FDG-PET (129), Anti-PD-1_Lung (12), Anti-PD-1_MELANOMA (2), C4KC-KiTS (175), COVID-19-NY-SBU (1), CPTAC-CM (1), CPTAC-LSCC (3), CPTAC-LUAD (1), CPTAC-PDA (8), CPTAC-UCEC (26), HNSCC (17), Head-Neck Cetuximab (12), LIDC-IDRI (133), Lung-PET-CT-Dx (17), NSCLC Radiogenomics (7), NSCLC-Radiomics (56), NSCLC-Radiomics-Genomics (20), Pancreas-CT (58), QIN-HEADNECK (94), Soft-tissue-Sarcoma (6), TCGA-HNSC (1), TCGA-LIHC (33), TCGA-LUAD (2), TCGA-LUSC (3), TCGA-STAD (2), TCGA-UCEC (1).
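As a quick sanity check, the per-collection counts listed above can be summed to confirm that they add up to the stated 900 CT series. The snippet below only restates the numbers from the list; the variable names are illustrative.

```python
# CT series per source collection, copied from the list above.
ct_per_collection = {
    "ACRIN-FLT-Breast": 32,
    "ACRIN-HNSCC-FDG-PET/CT": 48,
    "ACRIN-NSCLC-FDG-PET": 129,
    "Anti-PD-1_Lung": 12,
    "Anti-PD-1_MELANOMA": 2,
    "C4KC-KiTS": 175,
    "COVID-19-NY-SBU": 1,
    "CPTAC-CM": 1,
    "CPTAC-LSCC": 3,
    "CPTAC-LUAD": 1,
    "CPTAC-PDA": 8,
    "CPTAC-UCEC": 26,
    "HNSCC": 17,
    "Head-Neck Cetuximab": 12,
    "LIDC-IDRI": 133,
    "Lung-PET-CT-Dx": 17,
    "NSCLC Radiogenomics": 7,
    "NSCLC-Radiomics": 56,
    "NSCLC-Radiomics-Genomics": 20,
    "Pancreas-CT": 58,
    "QIN-HEADNECK": 94,
    "Soft-tissue-Sarcoma": 6,
    "TCGA-HNSC": 1,
    "TCGA-LIHC": 33,
    "TCGA-LUAD": 2,
    "TCGA-LUSC": 3,
    "TCGA-STAD": 2,
    "TCGA-UCEC": 1,
}

total_series = sum(ct_per_collection.values())
largest = max(ct_per_collection, key=ct_per_collection.get)
print(f"{len(ct_per_collection)} collections, {total_series} CT series, largest: {largest}")
```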

A script to download and resample the images is provided in our GitHub repository: https://github.com/UMEssen/saros-dataset

The annotations are provided in NIfTI format and were created on CT volumes with 5 mm slice thickness. The annotation files define foreground labels on the same axial slices and align pixel-perfectly. In total, 13 semantic body regions and 6 body part labels were annotated, each with an index that corresponds to a numeric value in the segmentation file.

Body Regions

  1. Subcutaneous Tissue
  2. Muscle
  3. Abdominal Cavity
  4. Thoracic Cavity
  5. Bones
  6. Parotid Glands
  7. Pericardium
  8. Breast Implant
  9. Mediastinum
  10. Brain
  11. Spinal Cord
  12. Thyroid Glands
  13. Submandibular Glands

Body Parts

  1. Torso
  2. Head
  3. Right Leg
  4. Left Leg
  5. Right Arm
  6. Left Arm
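The two index lists above translate directly into lookup tables. The sketch below, which assumes NumPy and uses a hand-made toy array rather than a real segmentation, maps numeric values to label names and counts voxels per label while skipping the ignore value 255. Any value that appears in neither list (for example 0) is reported as unlisted, since its meaning is not stated here.

```python
import numpy as np

# Index-to-name tables taken from the two lists above.
BODY_REGIONS = {
    1: "Subcutaneous Tissue", 2: "Muscle", 3: "Abdominal Cavity",
    4: "Thoracic Cavity", 5: "Bones", 6: "Parotid Glands",
    7: "Pericardium", 8: "Breast Implant", 9: "Mediastinum",
    10: "Brain", 11: "Spinal Cord", 12: "Thyroid Glands",
    13: "Submandibular Glands",
}
BODY_PARTS = {
    1: "Torso", 2: "Head", 3: "Right Leg",
    4: "Left Leg", 5: "Right Arm", 6: "Left Arm",
}
IGNORE_LABEL = 255  # slices that were not reviewed


def voxel_counts(seg, names):
    """Count voxels per label name, leaving out the ignore label."""
    annotated = seg[seg != IGNORE_LABEL]
    values, counts = np.unique(annotated, return_counts=True)
    return {
        names.get(int(v), f"unlisted value {int(v)}"): int(c)
        for v, c in zip(values, counts)
    }


# Hand-made toy array standing in for one body-region segmentation slice.
toy = np.array([[255, 255, 255], [0, 2, 2], [5, 1, 255]], dtype=np.uint8)
counts = voxel_counts(toy, BODY_REGIONS)
print(counts)
```

The same function works for a body-part segmentation by passing `BODY_PARTS` instead of `BODY_REGIONS`.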

The labels which were modified or require further commentary are listed and explained below:

  • Subcutaneous Adipose Tissue: The cutis was included in this label because it is hard to differentiate on 5 mm CT.
  • Muscle: All muscular tissue was segmented contiguously and not separated into individual muscles. Fasciae and intermuscular fat are therefore included in the label. Inter- and intramuscular fat is subtracted automatically in downstream processing.
  • Abdominal Cavity: This label includes the pelvis. The label does not separate between the positional relationships of the peritoneum.
  • Mediastinum: The International Thymic Malignancy Group (ITMIG) scheme was used for the segmentation guidelines.
  • Head + Neck: The neck is confined by the base of the trapezius muscle.
  • Right + Left Leg: The legs are separated from the torso by the line between the two lowest points of the Rami ossa pubis.
  • Right + Left Arm: The arms are separated from the torso by the diagonal between the most lateral point of the acromion and the tuberculum infraglenoidale.
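The Muscle note above says that inter- and intramuscular fat is subtracted automatically, but it does not describe how. One common way to do this in body composition analysis is Hounsfield-unit thresholding inside the muscle label. The sketch below illustrates that approach on toy arrays; the adipose window of -190 to -30 HU and the function name are assumptions for illustration and are not taken from this dataset description.

```python
import numpy as np

MUSCLE_LABEL = 2  # index of "Muscle" in the body-region list
# Assumed adipose-tissue window in Hounsfield units; not stated in the source.
ADIPOSE_HU_RANGE = (-190, -30)


def split_muscle_label(ct_hu, seg):
    """Split the muscle label into lean and fatty voxels by HU thresholding."""
    lo, hi = ADIPOSE_HU_RANGE
    in_muscle = seg == MUSCLE_LABEL
    fat = in_muscle & (ct_hu >= lo) & (ct_hu <= hi)
    lean = in_muscle & ~fat
    return lean, fat


# Toy 2x2 example: three voxels labeled muscle, one of them at fat density.
ct_hu = np.array([[40, -80], [55, -100]])
seg = np.array([[2, 2], [2, 1]])
lean, fat = split_muscle_label(ct_hu, seg)
print(int(lean.sum()), int(fat.sum()))
```

The voxel at -100 HU is left out of both masks because it lies outside the muscle label, which shows that the threshold is applied only within the annotated region.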

For reproducibility on downstream tasks, five cross-validation folds and a test set were pre-defined and are described in the provided spreadsheet. Segmentation strictly followed anatomical guidelines, which were modified only where needed to improve segmentation efficiency.
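A pre-defined split of this kind is typically consumed by grouping cases per fold and holding one fold out for validation. The sketch below shows that pattern with the standard `csv` module. The column names (`case_id`, `split`), the fold names (`fold-1` to `fold-5`, `test`), and the sample rows are all hypothetical placeholders; the actual layout of the provided spreadsheet should be checked before adapting this.

```python
import csv
import io

# Made-up stand-in for the spreadsheet; column names and values are hypothetical.
SAMPLE_CSV = """case_id,split
case_a,fold-1
case_b,fold-2
case_c,fold-3
case_d,fold-4
case_e,fold-5
case_f,test
"""


def load_splits(handle):
    """Group case identifiers by the split they are assigned to."""
    splits = {}
    for row in csv.DictReader(handle):
        splits.setdefault(row["split"], []).append(row["case_id"])
    return splits


def cross_validation_split(splits, val_fold):
    """Return (train, validation, held-out test) case lists for one fold."""
    train = [
        case
        for name, cases in sorted(splits.items())
        if name not in (val_fold, "test")
        for case in cases
    ]
    return train, list(splits.get(val_fold, [])), list(splits.get("test", []))


splits = load_splits(io.StringIO(SAMPLE_CSV))
train, val, held_out = cross_validation_split(splits, "fold-1")
print(train, val, held_out)
```

Keeping the test cases out of every training list, regardless of which fold is used for validation, is what makes results comparable across studies that use the same split.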

Data Access

Version 2: Updated 2024/03/07

The segmentations of 91 cases were updated to improve segmentation quality. In some cases, bones (mostly the ribs) had been incorrectly annotated as “muscle”. These mistakes were corrected, improving the segmentation accuracy in these areas.

Title Data Type Format Access Points Subjects Studies Series Images License
SAROS Segmentations NIFTI and ZIP CC BY 4.0
Segmentation Information Spreadsheet CSV CC BY 4.0

Collections Used In This Analysis Result

Title | Data Type | Format | Access Points | Subjects | Studies | Series | Images | License
Source Images: ACRIN-HNSCC-FDG-PET/CT (48), Anti-PD-1_MELANOMA (2), HNSCC (17), Head-Neck Cetuximab (12), QIN-HEADNECK (94), TCGA-HNSC (1) | CT | DICOM | | 174 | 174 | 174 | 56,400 | TCIA Restricted
Source Images: ACRIN-FLT-Breast (32), ACRIN-NSCLC-FDG-PET (129), Anti-PD-1_Lung (12), C4KC-KiTS (175), CPTAC-CM (1), CPTAC-LSCC (3), CPTAC-LUAD (1), CPTAC-PDA (8), CPTAC-UCEC (26), LIDC-IDRI (133), NSCLC Radiogenomics (7), Pancreas-CT (58), Soft-tissue-Sar | CT | DICOM | | 614 | 626 | 632 | 126,796 | CC BY 3.0
Source Images: NSCLC-Radiomics (56), NSCLC-Radiomics-Genomics (20) | CT | DICOM | | 76 | 76 | 76 | 8,807 | CC BY-NC 3.0
Source Images: COVID-19-NY-SBU (1), Lung-PET-CT-Dx (17) | CT | DICOM | | 18 | 18 | 18 | 2,654 | CC BY 4.0
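The four license groups above can be cross-checked against the totals in the summary: the subject counts should add up to 882 patients and the series counts to 900 CT series. The snippet below restates the table values and sums them; the dictionary layout is only an illustration.

```python
# Counts per license group, copied from the source-image table above.
license_groups = {
    "TCIA Restricted": {"subjects": 174, "studies": 174, "series": 174, "images": 56_400},
    "CC BY 3.0": {"subjects": 614, "studies": 626, "series": 632, "images": 126_796},
    "CC BY-NC 3.0": {"subjects": 76, "studies": 76, "series": 76, "images": 8_807},
    "CC BY 4.0": {"subjects": 18, "studies": 18, "series": 18, "images": 2_654},
}

totals = {
    key: sum(group[key] for group in license_groups.values())
    for key in ("subjects", "studies", "series", "images")
}
print(totals)
```

The source images fall under four different licenses, so any pipeline that downloads them needs to track which license applies to each case.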

Citations & Data Usage Policy

Data Citation Required: Users must abide by the TCIA Data Usage Policy and Restrictions. Attribution must include the following citation, including the Digital Object Identifier:

Data Citation

Koitka, S., Baldini, G., Kroll, L., van Landeghem, N., Haubold, J., Sung Kim, M., Kleesiek, J., Nensa, F., & Hosch, R. (2023). SAROS – A large, heterogeneous, and sparsely annotated segmentation dataset on CT imaging data (SAROS) (Version 2) [Data set]. The Cancer Imaging Archive. https://doi.org/10.25737/SZ96-ZG60

Acknowledgements

To the entire annotation lab team at the Institute for Artificial Intelligence in Medicine (IKIM, University Hospital Essen), we express our profound gratitude for your meticulous efforts in data segmentation. Your dedication ensures accuracy and efficiency, paving the way for this collection. Thank you for your invaluable contribution.

To all collections that shared their data and made it possible for us to prepare the segmentations: thank you! Your contributions made it possible to provide an openly available segmentation dataset for CT-based body composition analysis.

Publications Using This Data

TCIA maintains a list of publications that leverage TCIA data. If you have a manuscript you’d like to add, please contact TCIA’s Helpdesk.

TCIA Citation

Clark, K., Vendt, B., Smith, K., Freymann, J., Kirby, J., Koppel, P., Moore, S., Phillips, S., Maffitt, D., Pringle, M., Tarbox, L., & Prior, F. (2013). The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository. In Journal of Digital Imaging (Vol. 26, Issue 6, pp. 1045–1057). Springer Science and Business Media LLC. https://doi.org/10.1007/s10278-013-9622-7

Previous Versions

Version 1: Updated 2023/09/28

Title Data Type Format Access Points Subjects Studies Series Images License
SAROS Segmentations NIFTI CC BY 4.0
Segmentation Information Spreadsheet CSV CC BY 4.0
