Title

Predicting Discontinuation of Docetaxel Treatment for Metastatic Castration-Resistant Prostate Cancer (mCRPC) With Hill-Climbing and Random Forest

Publication Date

11-30-2015

Document Type

Article

Abstract

Motivation: In the DREAM 9.5 Prostate Cancer subchallenge 2, we developed models to predict which patients with metastatic castration-resistant prostate cancer (mCRPC) would discontinue docetaxel therapy. The input data consist of 131 variables measured in three clinical trials: Memorial Sloan Kettering (MSK, 476 patients), Celgene (526 patients), and Sanofi (598 patients). The goal is to predict which patients in a fourth clinical trial, AstraZeneca (AZ, 470 patients), would discontinue treatment due to adverse events within 3 months.

Data & Methods: Data cleansing was done separately within each clinical trial, and the cleaned data were then merged. Our cleansing and pre-processing procedures included imputation of missing data and removal of clinical variables with a high percentage of missing values. Data augmentation was also performed by converting selected multi-label variables into binary variables. Univariate feature selection methods did not perform well, so we adopted a hill-climbing approach that optimized the AUC within 10-fold cross-validation of the training data. We also addressed class imbalance (1292 negative and 197 positive samples) by randomly removing negative samples to reach a ratio of roughly 60% negative to 40% positive samples. We applied random forest using Sanofi as the hold-out, setting the parameter “mtry” to 25% of the number of features and the number of trees to 100 times the number of features. Our predictive model, using MSK and Celgene data as the training set and Sanofi data as the test set, yielded AUC=0.165, accuracy=0.9, precision=0.21, F1=0.092 and recall=0.06.

Results: Our final submission predicting discontinuation of docetaxel in the AstraZeneca clinical trial (using MSK, Celgene and Sanofi as training data) resulted in an AUC of 0.13. Across the 470 patients in the AstraZeneca clinical trial, 8 patients were predicted to discontinue treatment within 3 months.

Acknowledgement: Hung and Yeung are supported by NIH grant U54-HL127624. This project used computing resources provided by Microsoft Azure. We thank all students in TCSS 588 Bioinformatics in Spring 2015 at University of Washington Tacoma who contributed to this project.
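
The abstract names three technical steps: random undersampling of negatives, random forest settings tied to the feature count, and hill-climbing feature selection. The following is a minimal Python sketch of how those steps might look, not the authors' implementation. The function names, the toggle-one-feature search rule, the empty starting set, and the `score` callback are all assumptions made for illustration; in the abstract, the quantity being optimized is the AUC from 10-fold cross-validation of a random forest, which is not reproduced here.

```python
import random


def undersample_negatives(labels, neg_fraction=0.6, seed=0):
    """Keep every positive and a random subset of negatives.

    Negatives are dropped until they make up roughly `neg_fraction`
    of the returned sample (the abstract reports roughly 60%).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_neg = min(len(neg), round(len(pos) * neg_fraction / (1.0 - neg_fraction)))
    return sorted(pos + rng.sample(neg, n_neg))


def forest_settings(n_features):
    """Settings stated in the abstract: mtry = 25% of the number of
    features, number of trees = 100 times the number of features."""
    return {"mtry": max(1, round(0.25 * n_features)),
            "n_trees": 100 * n_features}


def hill_climb(features, score, max_rounds=100):
    """Greedy hill climbing over feature subsets (assumed variant).

    Each round tries adding or removing one feature and keeps the
    single change with the highest score; the search stops when no
    change improves on the current best. `score` is a hypothetical
    callable mapping a frozenset of features to a number.
    """
    selected, best = frozenset(), float("-inf")
    for _ in range(max_rounds):
        candidates = [selected ^ {f} for f in features]
        candidates = [c for c in candidates if c]  # skip the empty subset
        if not candidates:
            break
        scored = [(score(c), c) for c in candidates]
        top_score, top = max(scored, key=lambda pair: pair[0])
        if top_score <= best:
            break
        selected, best = top, top_score
    return sorted(selected), best
```

In practice, `score` would wrap model training and cross-validated AUC evaluation on the undersampled training rows, and `forest_settings` would be called with the size of the candidate feature subset.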

Publication Title

F1000Research

Volume

4

DOI

10.7490/f1000research.1111091.1

Publisher Policy

open access

Open Access Status

OA Journal
