
Deep reinforcement learning for tumor treatment optimization

Application of action-guided deep reinforcement learning to improve exploration and exploitation in the optimization of tumor treatments.

Abstract (original)

Inverse treatment planning is pivotal in tumor treatment planning. It enables the multi-objective optimization of radiation dose delivery, ensuring precise tumor targeting while sparing surrounding healthy tissues. This process often requires frequent parameter adjustments to achieve the desired balance between objectives, making it both labor-intensive and time-consuming. Deep reinforcement learning (DRL) provides an automated, model-based planning solution, aimed at reducing reliance on human expertise and enhancing the efficiency of objective parameter optimization. However, most current approaches apply DRL to inverse planning without fully leveraging the knowledge embedded in the continuous state-action space, defined by the coupling between nonstationary planning states and continuous decision variables. This may result in insufficient exploration and exploitation, leading to inefficient optimization. This work introduces an innovative action-guided DRL (AgDRL) approach for automatic inverse planning. Our goal is to enhance exploration and exploitation by leveraging insightful guidance from reward-guided actions. The implementation of AgDRL incorporates both exploitation and exploration in the action-state space. For exploitation, high-reward actions are employed as guidance to achieve the optimal action adjustment. For exploration, low-reward actions are recommended as training resets to explore a broader range of the latent state space. Quantitative and qualitative experiments are conducted in various settings to evaluate the proposed method. The results are assessed using DRL-related metrics (e.g. reward gains) and clinical-related measurements (e.g. dose-volume histograms, DVHs). 
Experimental results on a real-world rectal cancer dataset empirically demonstrate that the proposed AgDRL-based approach significantly improves optimization efficiency through a high-reward strategy while enhancing exploration diversity via a low-reward strategy, consistently outperforming the MatRad treatment planning optimization platform.
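To make the exploitation/exploration mechanism described in the abstract concrete, the sketch below illustrates the general idea in Python: high-reward actions from a transition buffer pull the agent's proposed continuous action toward promising regions (exploitation), while states associated with low-reward actions are reused as training resets (exploration). All names (`ActionGuide`, `guided_action`, `reset_state`, `guide_weight`, `top_k`) are hypothetical; this is a minimal illustration of the concept, not the authors' AgDRL implementation.

```python
# Illustrative sketch only -- hypothetical names, not the paper's implementation.
import random


class ActionGuide:
    """Buffer of (state, action, reward) transitions that supplies
    high-reward actions as guidance (exploitation) and low-reward
    states as training resets (exploration)."""

    def __init__(self, top_k=5, guide_weight=0.3):
        self.buffer = []                  # list of (state, action, reward)
        self.top_k = top_k                # how many extreme transitions to use
        self.guide_weight = guide_weight  # pull strength toward the guide action

    def record(self, state, action, reward):
        self.buffer.append((state, action, reward))

    def guided_action(self, proposed_action):
        """Exploitation: nudge the agent's proposed continuous action
        toward the mean of the top-k highest-reward actions seen so far."""
        if not self.buffer:
            return proposed_action
        best = sorted(self.buffer, key=lambda t: t[2], reverse=True)[: self.top_k]
        guide = sum(a for _, a, _ in best) / len(best)
        w = self.guide_weight
        return (1 - w) * proposed_action + w * guide

    def reset_state(self, default_state):
        """Exploration: restart an episode from a state tied to a
        low-reward action, so poorly explored regions get revisited."""
        if not self.buffer:
            return default_state
        worst = sorted(self.buffer, key=lambda t: t[2])[: self.top_k]
        return random.choice(worst)[0]
```

In a training loop, `guided_action` would wrap the policy's output before it is applied to the planning environment, and `reset_state` would replace the default episode reset; the trade-off between the two is controlled here by `guide_weight` and `top_k` (both illustrative knobs).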

This article is a summary of a publication in the International Journal of Neural Systems. For the full article, all details, and references, please refer to the original source.

Read the full article

DOI: 10.1142/S0129065726500206