Proceedings of International Conference on Applied Innovation in IT  ·  2026/03/31  ·  Vol. 14  ·  Issue 1  ·  pp. 81–87
Random Forest-Based Estimation of Partial Linear Single-Index Model in Longitudinal Data
Hussein Jabbar Bayyoodh and Mohammed Sadeq Aldouri
This paper addresses estimation of the partial linear single-index model (PLSIM) for longitudinal data. It proposes a hybrid Random Forest (RF)-based estimator, complemented by statistical regularization, to stabilize the parameter estimates and recover the nonlinear component accurately. The approach estimates g(·) with Random Forests, choosing the number of trees conservatively through a 1-SE rule under subject-wise K-fold cross-validation, and also estimates the index vector β and the coefficients θ of the linear component. On balanced simulated data with sample sizes N = 50, 100, 150 and true function g(u) = sin(u), the hybrid estimator recovered g(·) accurately across the sample-supported domain, with a coefficient of determination R_g^2 ≈ 0.96–0.98 and a mean squared error (MSE) that decreased as the sample size increased, while overall model performance stabilized at R^2 ≈ 0.90–0.92 and MSE ≈ 0.10–0.13. The biases in θ were small in all scenarios, and the β estimate remained functionally stable, as reflected in a close visual match between the true and estimated curves. Marginal deviations were systematically smaller than those of the conventional two-stage estimator, which was more sensitive to the bootstrap parameter and to index error and had a higher overall MSE at the larger sample. The results indicate that combining RF with statistical methods offers a practical and accurate route to estimating longitudinal PLSIM models, with straightforward applicability and limited parameter tuning. The study suggests that performance could be improved further by extending the framework to account for heteroscedasticity and random effects.
Keywords: Partial Linear Single-Index Model, Longitudinal Data, Random Forest (RF), Local Polynomial, Semiparametric, Two-Stage.
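The abstract names two tuning devices without spelling them out: subject-wise K-fold cross-validation and a 1-SE rule for the number of trees. The following is a minimal sketch of how the two could fit together, not the authors' implementation. The function names `subject_wise_folds` and `one_se_choice`, the reading of "conservative" as "smallest forest within one standard error of the best", and all numeric values are assumptions made for illustration only.

```python
import random
import statistics


def subject_wise_folds(subject_ids, k, seed=0):
    """Assign folds per subject, not per row (hypothetical helper).

    Returns one fold index per row, so that all repeated measurements
    of a subject fall into the same fold.
    """
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    fold_of_subject = {s: i % k for i, s in enumerate(subjects)}
    return [fold_of_subject[s] for s in subject_ids]


def one_se_choice(cv_errors):
    """Apply a 1-SE rule to per-fold validation errors (hypothetical helper).

    cv_errors maps a candidate value (here: number of trees) to the list
    of per-fold errors obtained for it. The smallest candidate whose mean
    error is within one standard error of the best mean is returned.
    """
    summary = {}
    for cand, errs in cv_errors.items():
        mean = statistics.mean(errs)
        se = statistics.stdev(errs) / len(errs) ** 0.5
        summary[cand] = (mean, se)
    best = min(summary, key=lambda c: summary[c][0])
    threshold = summary[best][0] + summary[best][1]
    return min(c for c, (mean, _) in summary.items() if mean <= threshold)


# Illustrative layout: 6 subjects with 3 visits each, split into 3 folds.
ids = [s for s in range(6) for _ in range(3)]
folds = subject_wise_folds(ids, k=3)

# Made-up per-fold MSE values for three candidate forest sizes.
errors = {
    50: [0.140, 0.150, 0.145],
    200: [0.118, 0.126, 0.122],
    500: [0.117, 0.125, 0.120],
}
n_trees = one_se_choice(errors)
```

With these made-up errors the largest forest has the lowest mean error, but the 200-tree forest lies within one standard error of it and is therefore selected. Splitting by subject keeps correlated repeated measurements out of the validation fold, which would otherwise make the cross-validated error look too optimistic.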

Proceedings of the International Conference on Applied Innovations in IT by Anhalt University of Applied Sciences is licensed under CC BY-SA 4.0  ·  This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License

ISSN 2199-8876