CLEAR item#14

“Data overlap. State if any part of the dataset was used in a previous publication. Describe the differences between the current study and previous studies in terms of study purpose and methodology.” [1] (from the article by Kocak et al.; licensed under CC BY 4.0)

Reporting examples for CLEAR item#14

Example#1. “The study population was composed of locoregionally advanced HNSCC patients (TNM7 stage III-IVA/B (M0)) receiving curative treatment between 2008 and 2017, collected within the framework of the BD2Decide project (http://www.bd2decide.eu/, accessed on 13 May 2021, H2020-PHC30-689715, IRB P-number P0125, ClinicalTrials.gov Identifier: NCT02832102) […] The collected patient population was originally staged at diagnosis of the TNM7 staging system. During the BD2Decide project, these patients were re-staged to I-IVA/B (M0) using the newly developed TNM8 staging system.” [2] (from the article by Keek et al.; licensed under CC BY 4.0)

Example#2. “The image datasets were gathered from The Cancer Imaging Archive (TCIA) open-access dataset, and the respective gene expression profiles were acquired through The Cancer Genome Atlas (TCGA). Motivated by prior studies that indicated that TIME is correlated with the prediction of breast cancer, we created and verified the association of imaging phenotypes with TIME by using three datasets. There was zero patient overlap across the three datasets, and descriptive and clinical statistics of all three cohorts are shown below in Table 1.” [3] (from the article by Han et al.; licensed under CC BY 4.0)

Example#3. “Patients were retrospectively selected from a population scale cohort comprising 207 adult patients, extending a previous series.” [4] (from the article by Nenning et al.; licensed under CC BY-NC 4.0)

Example#4. “Study subjects or cohorts overlap: Some study subjects have been previously reported in: Sepulcri, M.; Fusella, M.; Cuppari, L.; Zorz, A.; Paiusco, M.; Evangelista, L. Value of 18F-fluorocholine PET/CT in predicting response to radical radiotherapy in patients with localized prostate cancer. Clin. Transl. Radiat. Oncol. 2021, 30, 71–77, doi:10.1016/j.ctro.2021.07.002” [5] (from the article by Marturano et al.; licensed under CC BY 4.0)

Explanation and elaboration of CLEAR item#14

Declaring whether datasets overlap in a radiomics publication is important chiefly for recognizing potential bias in the results [6]. When datasets overlap, the radiomic model may be biased toward memorizing specific cases rather than learning generalizable features, which can lead to an overly optimistic assessment of the model’s performance [6–8]. Reporting dataset overlap allows readers to gauge the generalization capability of the proposed radiomic model and to understand its robustness across different patient populations and imaging devices; a model that performs well on completely independent datasets is more likely to be clinically applicable [6, 8]. Declaring dataset overlap also matters for reproducibility, since it clarifies the methodology and enables readers to compare the study with previous research. Knowing whether the dataset used in a new study overlaps with datasets from earlier studies is essential for evaluating progress in the field and for ensuring that new findings are not merely repetitions of earlier results [9]. Although this item is best reported in the materials and methods section of a paper, some reputable journals encourage reporting such data overlaps in the paper’s declarations section (see Example#4), underscoring its importance in academic publishing as well.
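As a minimal illustration of the check that underlies a “zero patient overlap” statement such as the one in Example#2, the sketch below uses set intersection over patient identifiers. The cohort lists and ID format are hypothetical placeholders, not data from any cited study:

```python
# Minimal sketch: verifying that two hypothetical cohorts share no patients.
# Patient IDs below are illustrative placeholders, not real study data.

def overlapping_ids(cohort_a, cohort_b):
    """Return the set of patient IDs present in both cohorts."""
    return set(cohort_a) & set(cohort_b)

training_cohort = ["P001", "P002", "P003", "P004"]
external_cohort = ["P101", "P102", "P003"]  # "P003" appears in both lists

overlap = overlapping_ids(training_cohort, external_cohort)
if overlap:
    # Any shared patients should be declared per CLEAR item#14.
    print(f"Dataset overlap detected: {sorted(overlap)}")
else:
    print("Zero patient overlap across cohorts.")
```

In practice the same comparison would be run over the full identifier lists of every cohort pair before asserting in the manuscript that the datasets are disjoint.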

References

  1. Kocak B, Baessler B, Bakas S, et al (2023) CheckList for EvaluAtion of Radiomics research (CLEAR): a step-by-step reporting guideline for authors and reviewers endorsed by ESR and EuSoMII. Insights Imaging 14:75. https://doi.org/10.1186/s13244-023-01415-8
  2. Keek SA, Wesseling FWR, Woodruff HC, et al (2021) A Prospectively Validated Prognostic Model for Patients with Locally Advanced Squamous Cell Carcinoma of the Head and Neck Based on Radiomics of Computed Tomography Images. Cancers 13:3271. https://doi.org/10.3390/cancers13133271
  3. Han X, Cao W, Wu L, Liang C (2022) Radiomics Assessment of the Tumor Immune Microenvironment to Predict Outcomes in Breast Cancer. Front Immunol 12. https://doi.org/10.3389/fimmu.2021.773581
  4. Nenning K-H, Gesperger J, Furtner J, et al (2023) Radiomic features define risk and are linked to DNA methylation attributes in primary CNS lymphoma. Neuro-Oncol Adv 5:vdad136. https://doi.org/10.1093/noajnl/vdad136
  5. Marturano F, Guglielmo P, Bettinelli A, et al (2023) Role of radiomic analysis of [18F]fluoromethylcholine PET/CT in predicting biochemical recurrence in a cohort of intermediate and high risk prostate cancer patients at initial staging. Eur Radiol 33:7199–7208. https://doi.org/10.1007/s00330-023-09642-9
  6. Moskowitz CS, Welch ML, Jacobs MA, et al (2022) Radiomic Analysis: Study Design, Statistical Analysis, and Other Bias Mitigation Strategies. Radiology 304:265–273. https://doi.org/10.1148/radiol.211597
  7. Priestley M, O’Donnell F, Simperl E (2023) A Survey of Data Quality Requirements That Matter in ML Development Pipelines. J Data Inf Qual 15:11:1-11:39. https://doi.org/10.1145/3592616
  8. Chen H, Chen J, Ding J (2021) Data Evaluation and Enhancement for Quality Improvement of Machine Learning. IEEE Trans Reliab 70:831–847. https://doi.org/10.1109/TR.2021.3070863
  9. Park JE, Park SY, Kim HJ, Kim HS (2019) Reproducibility and Generalizability in Radiomics Modeling: Possible Strategies in Radiologic and Statistical Perspectives. Korean J Radiol 20:1124–1137. https://doi.org/10.3348/kjr.2018.0070
