- Title
Learning de-biased regression trees and forests from complex samples.
- Authors
Nalenz, Malte; Rodemann, Julian; Augustin, Thomas
- Abstract
Regression trees and forests are widely used due to their flexibility and predictive accuracy. Whereas typical tree induction assumes independently and identically distributed (i.i.d.) data, in many applications the training sample follows a complex sampling structure, including the unequal-probability sampling often found in survey data. A 'naive estimation' that simply ignores the sampling weights may then be substantially biased. This article analyzes the bias arising from naive estimation of regression trees or forests under complex sample designs and proposes ways of de-biasing. This is achieved by bridging tree learning and survey statistics, exploiting the correspondence between the mean-squared-error splitting criterion in regression trees and variance estimation. Transferring population variance estimation approaches from survey statistics to tree induction indeed considerably reduces the bias in the resulting trees, both in the predictions and in the tree structure. The latter is particularly crucial if the trees are to be interpreted. Our methodology is extended to random forests, where we show on simulated data and a housing dataset that correcting for complex sample designs leads to overall much better predictive accuracy and more trustworthy interpretation. Interestingly, corrected forests can surpass forests learned on i.i.d. samples in terms of accuracy, which also has important implications for adaptive data collection approaches.
- Subjects
RANDOM forest algorithms; POPULATION transfers; REGRESSION trees; SUPERVISED learning; ACQUISITION of data
- Publication
Machine Learning, 2024, Vol. 113, Issue 6, p. 3379
- ISSN
0885-6125
- Publication type
Article
- DOI
10.1007/s10994-023-06439-1