This paper addresses estimation of the partial linear single-index model (PLSIM) for longitudinal data. It proposes a hybrid Random Forest (RF)-based estimator, complemented by statistical regularization, to ensure parameter stability and accurate recovery of the nonlinear component. The approach has three parts: estimating g(⋅) via Random Forests, with the number of trees chosen conservatively by a 1-SE rule under subject-wise K-fold cross-validation; estimating the index parameter β; and estimating the coefficients θ of the linear component. On balanced simulated data with sample sizes N = 50, 100, and 150 and true function g(u) = sin(u), the hybrid estimator recovered g(⋅) accurately across the sample-supported domain, with a coefficient of determination of R_g^2 ≈ 0.96-0.98 and a mean squared error that decreased as the sample size increased, while overall model performance stabilized at R^2 ≈ 0.90-0.92 and MSE ≈ 0.10-0.13. Biases in θ were small across all scenarios, and the β estimate remained functionally stable, as reflected in a close visual match between the true and estimated functions, with systematically smaller marginal deviations than the conventional two-stage estimator, which was more sensitive to the bootstrap parameter and to index error and had a higher overall MSE at the larger sample size. The results indicate that combining RF with statistical methods offers a practical and accurate route to estimating longitudinal PLSIM models, with straightforward applicability and limited parameter tuning. The study suggests that performance could be improved further by extending the framework to account for heteroscedasticity and random effects.
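As an illustration of the tree-count selection described above, the following is a minimal sketch and not the authors' implementation. It assumes that "conservative" means choosing the smallest number of trees whose mean cross-validated error lies within one standard error of the minimum, and that subject-wise folds are formed by assigning all observations of a subject to the same fold. The function names `subject_folds` and `one_se_choice`, the candidate grid, and the error values are hypothetical and made up for illustration only.

```python
import math
import random
from statistics import mean, stdev


def subject_folds(subject_ids, k, seed=0):
    """Assign each observation to a fold so that all rows of one subject share a fold."""
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    # Deal the shuffled subjects round-robin into k folds.
    fold_of = {s: i % k for i, s in enumerate(subjects)}
    return [fold_of[s] for s in subject_ids]


def one_se_choice(n_trees_grid, fold_errors):
    """Return the smallest tree count whose mean CV error is within 1 SE of the best.

    fold_errors[j] is the list of per-fold errors obtained with n_trees_grid[j].
    """
    means = [mean(e) for e in fold_errors]
    ses = [stdev(e) / math.sqrt(len(e)) for e in fold_errors]
    best = min(range(len(means)), key=means.__getitem__)
    threshold = means[best] + ses[best]
    eligible = [n for n, m in zip(n_trees_grid, means) if m <= threshold]
    return min(eligible)


# Hypothetical longitudinal layout: four subjects with repeated measurements.
ids = [1, 1, 1, 2, 2, 3, 3, 3, 4, 4]
folds = subject_folds(ids, k=2)

# Hypothetical per-fold CV errors for each candidate number of trees.
grid = [50, 100, 200, 400]
errors = [
    [0.30, 0.34, 0.32],
    [0.20, 0.24, 0.22],
    [0.18, 0.24, 0.21],
    [0.19, 0.23, 0.20],
]
chosen = one_se_choice(grid, errors)
print(folds, chosen)
```

In this made-up example the lowest mean error occurs at 400 trees, but 200 trees falls within one standard error of it, so the rule returns the smaller forest. Keeping each subject inside a single fold prevents repeated measurements from the same subject from appearing in both the training and validation parts.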
Keywords
Partial Linear Single-Index Model; Longitudinal Data; Random Forest (RF); Local Polynomial; Semiparametric; Two-Stage.