Matthew Berkowitz

Title: Improving random (survival) forest predictions and estimates through targeted tuning and bias correction
Date: September 4th, 2025
Time: 10:00am
Location: LIB 2020 & Zoom
Supervised by: Tom Loughin & Rachel Altman

Abstract:

Random forests (RFs) have become a cornerstone for nonparametric estimation and prediction, yet their finite-sample properties, particularly in survival analysis and quantile estimation, remain under-studied and often misaligned with practitioners' goals. We present three complementary studies that collectively diagnose and remedy these shortcomings. First, through an extensive simulation study, we compare survival forest methods in terms of their ability to produce accurate survival function estimates and point predictions. We identify six top performers and offer context-specific recommendations, demonstrating how the method and data structure influence accuracy. Second, we introduce a novel tuning procedure for RFs that improves the accuracy of estimated quantiles and produces valid, relatively narrow one-sided or two-sided prediction intervals. Standard approaches for building RFs often result in excessively biased quantile estimates. To reduce this bias, our proposed tuning procedure minimizes "quantile coverage loss" (QCL), which we define as the estimated bias of the marginal quantile coverage probability estimate based on the out-of-bag sample. We adapt QCL tuning to handle censored data and demonstrate its use with random survival forests. QCL tuning led to quantile estimates with substantially more accurate coverage probabilities than those produced by alternative approaches. Third, we explain fundamental sources of bias of RF-based estimated conditional distribution functions (ECDFs). For given covariate values, the ECDF produced by a RF is typically based on observations that are not identically distributed, which can result in considerable bias in both the ECDF itself and quantities estimated from the ECDF. Therefore, we propose a novel, two-stage, bias-adjustment procedure that aims to reduce the bias of the ECDF and any estimates derived from it, including estimates of means and quantiles.


Using an estimate of the relationship between the RF-based estimate of the ECDF and the covariates, we develop a bias adjustment for any estimated quantile. On average, our bias-correction procedure more effectively reduced conditional bias in median estimates, and it often lowered MSE. It also produced some of the least biased tail-quantile estimates, as well as prediction intervals that remained valid over a wider region.
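
As a rough illustration of the quantile coverage loss (QCL) idea described in the abstract, the following minimal Python sketch compares the empirical coverage of out-of-bag quantile estimates with the target level and picks the candidate setting with the smallest gap. It is not the authors' implementation: the function names, the use of the absolute gap as the loss, and the idea of tuning over a dictionary of candidate settings are all assumptions made for illustration, and the sketch ignores censoring.

```python
def quantile_coverage_loss(y, oob_quantiles, tau):
    """Absolute gap between empirical out-of-bag coverage and the target level tau.

    y             : observed responses
    oob_quantiles : out-of-bag estimates of the tau-quantile, one per observation
    tau           : target quantile level in (0, 1)

    Assumption: the "estimated bias" is taken here as (empirical coverage - tau),
    and its absolute value is used as the loss.
    """
    if len(y) != len(oob_quantiles) or len(y) == 0:
        raise ValueError("y and oob_quantiles must be non-empty and the same length")
    covered = sum(1 for yi, qi in zip(y, oob_quantiles) if yi <= qi)
    return abs(covered / len(y) - tau)


def tune_by_qcl(candidates, tau):
    """Return the candidate setting whose out-of-bag quantiles minimise the loss.

    candidates : dict mapping a (hypothetical) tuning-parameter value to a
                 tuple (y, oob_quantiles) obtained from a forest fitted with it
    """
    return min(
        candidates,
        key=lambda setting: quantile_coverage_loss(*candidates[setting], tau),
    )


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0]
    # Hypothetical out-of-bag 0.9-quantile estimates under two settings.
    candidates = {
        "setting_a": (y, [2.0, 2.0, 2.0, 2.0]),
        "setting_b": (y, [4.0, 4.0, 4.0, 4.0]),
    }
    print(tune_by_qcl(candidates, tau=0.9))
```

In practice, the out-of-bag quantile estimates would come from a fitted random forest; here they are supplied directly so the sketch stays self-contained.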