Table 10. F-ratios and p-values of the ANOVA comparisons among SRRT-SEM, GEFS, random subspace method, gradient boosting regression tree, random forest regressor, AdaBoost regression tree and regression tree (rows per method for RMSE and MAE; numeric entries not recoverable).
The 10-fold cross-validation results in Table 9 show that the proposed SRRT-SEM has the lowest average RMSE and MAE and the highest average R². The discrepancy between the predicted and the real survival time is small for the prediction model with the proposed regression method: the average gap between the predicted value and the true value is 9.1194 months. The R² values indicate that the SRRT-SEM model explains 38.93% of the variability in survival months, more than GEFS (23.35%), the random subspace method (16.39%), random forest (19.61%), the gradient boosting regression tree (19.45%), the AdaBoost regression tree (14.88%) and the regression tree (11.98%).
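The three indicators can be illustrated with a minimal sketch. It assumes the standard definitions of RMSE, MAE and the coefficient of determination, which the text does not spell out, and averages each indicator over the folds. The function names and the toy data are hypothetical and are not taken from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted survival months."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted survival months."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination (standard definition, assumed here)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def average_over_folds(folds):
    """Average each indicator over folds; folds is a list of (y_true, y_pred)."""
    n = len(folds)
    return {
        "RMSE": sum(rmse(t, p) for t, p in folds) / n,
        "MAE": sum(mae(t, p) for t, p in folds) / n,
        "R2": sum(r2(t, p) for t, p in folds) / n,
    }


# Hypothetical toy data: two folds of (true, predicted) survival months.
toy_folds = [
    ([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]),
    ([5.0, 15.0, 25.0], [5.0, 15.0, 25.0]),
]
scores = average_over_folds(toy_folds)
```

Under these definitions, an R² of 0.3893 corresponds to the statement that the model explains 38.93% of the variability in survival months.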
To assess the efficiency of the proposed method in terms of the performance indicators, we carried out statistical significance tests using the commercial software SPSS (Version 19.0), comparing the proposed method with GEFS, the random subspace method, the gradient boosting regression tree, random forest, the AdaBoost regression tree and the regression tree. Analysis of variance (ANOVA) is used to analyze the RMSE, MAE and R² values obtained by the compared methods. A difference is considered statistically significant if the p-value is less than 0.05. Table 10 summarizes the resulting F-ratios and p-values.
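As an illustration of the F-ratio reported by such a test, the following sketch computes a one-way ANOVA F-ratio by hand for groups of per-fold indicator values. It is not the SPSS procedure used in the study. The function name and the toy values are hypothetical, and a one-way design is an assumption.

```python
def one_way_anova_f(groups):
    """Return (F-ratio, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, e.g. the per-fold RMSE values of each method.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical per-fold values for two methods (not data from the study).
method_a = [1.0, 2.0, 3.0]
method_b = [2.0, 3.0, 4.0]
f_ratio, df_between, df_within = one_way_anova_f([method_a, method_b])
```

The p-value is the upper-tail probability of the F distribution with (df_between, df_within) degrees of freedom at the computed F-ratio. If SciPy is available, `scipy.stats.f.sf(f_ratio, df_between, df_within)` gives this value, and the difference is declared significant when it falls below 0.05.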
Table 11. Performance comparison of two-stage model and one-stage regression model. Columns: SRRT-SEM, GEFS, Random Subspace Method, Random Forest, Gradient Boosting Regression Tree, AdaBoost Regression Tree, Regression Tree.
From Table 10, we can see that the p-values of the proposed method versus the compared methods are all smaller than 0.05 for the three indicators, indicating a statistically significant difference between the performance of the proposed method and that of each compared algorithm. Among the compared methods, GEFS is significantly better than the regression tree in terms of RMSE. In terms of MAE, GEFS, the gradient boosting regression tree and the random forest are significantly better than the AdaBoost regression tree. In terms of R², GEFS is significantly better than the random subspace method, the AdaBoost regression tree and the regression tree, while the gradient boosting regression tree and the random forest are significantly better than the regression tree. From Table 9, it is also worth noting that the real ensemble size of the random subspace method, GEFS, the random forest regressor, the gradient boosting regression tree and the AdaBoost regression tree is 100, while that of the SRRT-SEM regressor has a mean value of 21 thanks to the selective ensemble strategy. Although the model selection process gives SRRT-SEM a higher time complexity in training, SRRT-SEM takes less time in actual prediction than the compared ensemble methods because its smaller ensemble has lower complexity.
4.4.4. Comparison of the two-stage model and one-stage regression model
Cancer survival time prediction can also be realized by a one-stage regression model, in which survival time is predicted directly regardless of whether a case is a survival case (survived more than 5 years). The two-stage model and the one-stage regression model are compared on the seven regression methods above in terms of RMSE, MAE and R², and the comparison results are summarized in Table 11. Each difference is computed as the indicator value of the one-stage regression model minus that of the two-stage model, where every indicator value is obtained by averaging the results of the 10 folds. All RMSE and MAE differences are positive and all R² differences are negative, which indicates that the proposed two-stage model has smaller RMSE and MAE and higher R². The performance gain includes a reduction of more than 1.6 months in prediction error and an improvement of over 0.11% in model explanation ability, which supports the two-stage model as a good model for cancer survival time prediction.
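The difference computation described above can be sketched as follows: for each indicator, the fold-averaged value of the two-stage model is subtracted from that of the one-stage model. The function names and the toy per-fold values are hypothetical and are not results from the study.

```python
def indicator_differences(one_stage, two_stage):
    """Fold-averaged one-stage value minus fold-averaged two-stage value.

    one_stage, two_stage: dicts mapping an indicator name to per-fold values.
    """
    diffs = {}
    for name in one_stage:
        one_avg = sum(one_stage[name]) / len(one_stage[name])
        two_avg = sum(two_stage[name]) / len(two_stage[name])
        diffs[name] = one_avg - two_avg
    return diffs


def two_stage_is_better(diffs):
    """True when errors are higher and R2 is lower for the one-stage model."""
    return diffs["RMSE"] > 0 and diffs["MAE"] > 0 and diffs["R2"] < 0


# Hypothetical per-fold values for one regression method (two folds only).
one_stage = {"RMSE": [12.0, 14.0], "MAE": [9.0, 11.0], "R2": [0.2, 0.3]}
two_stage = {"RMSE": [10.0, 12.0], "MAE": [8.0, 9.0], "R2": [0.3, 0.4]}
diffs = indicator_differences(one_stage, two_stage)
```

With this sign convention, positive RMSE and MAE differences and a negative R² difference all favor the two-stage model, which is the pattern the text reports for every method.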