Slice-prompted HR-CTV interactive segmentation for cervical cancer brachytherapy: A multi-center study.
In computed tomography (CT)-guided cervical cancer brachytherapy, manual contouring of the high-risk clinical target volume (HR-CTV) is time-consuming and expertise-dependent. Furthermore, automated approaches struggle with the ambiguous boundaries of the HR-CTV.
We aimed to develop a clinically efficient interactive segmentation framework integrating deep learning with clinician expertise.
We propose a slice-prompted interactive segmentation method (SPSeg) for HR-CTV delineation in CT-guided cervical cancer brachytherapy. Clinicians provided sparse prompts by manually outlining the HR-CTV on key slices, which were then encoded into a 3D U-Net architecture to guide full-volume segmentation. We investigated two architectural variants: SPSeg-Mono, which jointly processes the CT images and the prompt masks with a single encoder; and SPSeg-Dual, which employs two separate encoders for image and prompt and fuses their features at a deeper level. The model was trained on 640 CT scans (160 patients) and validated on 160 scans (40 patients) from a single center, then externally tested on three multi-center cohorts of 400 scans (100 patients), 115 scans (40 patients), and 150 scans (30 patients). Evaluation included the Dice similarity coefficient (DSC), the 95% Hausdorff distance (HD95), a 5-point Likert scale for clinical acceptability, time efficiency, and inter-observer agreement.
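The slice-prompt idea and the DSC metric can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the toy volume, the choice of prompt slices, and the two-channel stacking (one plausible way to feed a single-encoder model such as SPSeg-Mono) are all assumptions made for illustration. HD95 is omitted for brevity.

```python
import numpy as np

def build_prompt_volume(reference_mask, prompt_slices):
    """Keep the clinician's contour only on the chosen axial slices (hypothetical helper)."""
    prompt = np.zeros_like(reference_mask)
    for z in prompt_slices:
        prompt[z] = reference_mask[z]
    return prompt

def stack_mono_input(ct_volume, prompt_volume):
    """Stack CT and prompt mask as channels, giving shape (2, D, H, W).

    Assumed input layout for a single-encoder network; the paper does not
    specify the exact encoding.
    """
    return np.stack(
        [ct_volume.astype(np.float32), prompt_volume.astype(np.float32)], axis=0
    )

def dice(pred, ref, eps=1e-8):
    """Dice similarity coefficient between two binary masks."""
    pred = pred.astype(bool)
    ref = ref.astype(bool)
    intersection = np.logical_and(pred, ref).sum()
    return (2.0 * intersection + eps) / (pred.sum() + ref.sum() + eps)

# Toy data: an 8-slice volume with a box-shaped "HR-CTV" on slices 2..5.
ct = np.random.default_rng(0).normal(size=(8, 16, 16))
reference = np.zeros((8, 16, 16), dtype=np.uint8)
reference[2:6, 4:12, 4:12] = 1

# Three prompt slices, standing in for the clinician-outlined key slices.
prompt = build_prompt_volume(reference, prompt_slices=[2, 4, 5])
net_input = stack_mono_input(ct, prompt)

# Overlap of the sparse prompt itself with the full reference contour.
score = dice(prompt, reference)
print(net_input.shape, round(float(score), 3))
```

In this toy case the prompt covers three of the four target slices, so its DSC against the full reference is below 1; in the study, the network is what fills in the unprompted slices.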
Performance consistently improved with the addition of prompt slices, with SPSeg-Dual outperforming SPSeg-Mono. Without prompts, the model yielded DSCs of 0.83, 0.76, and 0.76, and HD95s of 7.5, 10.1, and 11.6 mm for Test Sets 1, 2, and 3, respectively. With the addition of just three prompt slices, DSCs increased significantly to 0.95, 0.92, and 0.91, while HD95s decreased to 2.1, 3.1, and 3.2 mm, respectively (all p < 0.001). Qualitative scores confirmed high clinical acceptability (mean Likert scores > 3), and the interactive method substantially reduced contouring time for both clinicians (from 11.7 to 1.7 min for Clinician A, and from 9.9 to 1.5 min for Clinician B). It also improved inter-observer agreement, with DSC increasing from 0.88 to 0.93 and HD95 decreasing from 3.2 to 2.5 mm (p < 0.001).
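The reported contouring times can be converted into relative reductions with a short calculation. The sketch below uses only the times stated above; the helper name `percent_reduction` is hypothetical.

```python
def percent_reduction(before_min, after_min):
    """Relative reduction in contouring time, in percent."""
    return 100.0 * (before_min - after_min) / before_min

# (manual time, interactive time) in minutes, as reported in the abstract.
times = {"Clinician A": (11.7, 1.7), "Clinician B": (9.9, 1.5)}

reductions = {
    name: percent_reduction(before, after)
    for name, (before, after) in times.items()
}

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% less contouring time")
```

Both clinicians therefore spent roughly 85% less time contouring with the interactive method.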
The proposed SPSeg method effectively integrates clinical expertise with deep learning, offering a highly precise and efficient solution for HR-CTV delineation in cervical cancer brachytherapy.
Authors
Peng Peng, Liu Liu, Tang Tang, Li Li, Yang Yang, Shao Shao, Cao Cao, Song Song, Huo Huo, Yang Yang