====== Random Forests ======
  
Random forests ((https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#intro|Random forests)) are among the best out-of-the-box methods highly valued by developers and data scientists. For a better understanding of the process, an imaginary weather forecast problem might be considered, represented by the following true decision tree (figure {{ref>Weatherforecastexample}}):
Some advantages:
  * RF uses more knowledge than a single decision tree (illustrated by the sketch after this list).
  * Furthermore, the more diverse the initial information sources used, the more diverse the models will be, and the more robust the final estimate.
  * This is true because a single data source might suffer from data anomalies reflected in model anomalies.
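
To make the first advantage concrete, the following minimal sketch compares a single decision tree with a random forest trained on the same data. It assumes scikit-learn; the synthetic dataset and parameter choices are illustrative assumptions, not part of the original material.

<code python>
# Illustrative sketch: single decision tree vs. random forest.
# The synthetic dataset stands in for a real weather-style dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=8, random_state=0)

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

print("single tree  :", cross_val_score(tree, X, y, cv=5).mean())
print("random forest:", cross_val_score(forest, X, y, cv=5).mean())
</code>

On most runs, the forest scores noticeably higher because the combined vote of many diverse trees cancels out the errors of any single tree.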
  
Each tree's strength depends on various factors, including its depth and the features it uses for splitting. However, there is a trade-off between correlation and strength. For example, reducing m (the number of features considered at each split) increases the diversity among the trees, lowering correlation. Still, it may also reduce the strength of each tree, as it may limit its access to highly predictive features.
  
Despite this trade-off, Random Forests balance these dynamics by optimising m to minimise the ensemble error. Generally, a moderate reduction in m lowers correlation without significantly compromising the strength of each tree, thus leading to an overall decrease in the forest's error rate.
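
As a rough illustration of this tuning, the sketch below (again assuming scikit-learn; the dataset and the candidate values of m are arbitrary choices) sweeps ''max_features'', which plays the role of m, and reports the out-of-bag (OOB) error, an ensemble-error estimate that Random Forests provide without a separate test set.

<code python>
# Illustrative sketch: how the ensemble (OOB) error responds to m.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=8, random_state=0)

# oob_score=True scores each sample only with the trees that did not
# see it during bootstrap sampling, so no held-out test set is needed.
for m in (2, 4, 8, 16, 20):
    rf = RandomForestClassifier(n_estimators=300, max_features=m,
                                oob_score=True, random_state=0).fit(X, y)
    print(f"m = {m:2d}   OOB error = {1 - rf.oob_score_:.3f}")
</code>

Typically, the error is lowest at an intermediate m: very small values weaken the individual trees, while values close to the total feature count make them too correlated.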
  
  
**Implications for the Forest Error Rate:** The forest error rate in a Random Forest model is influenced by the correlation among the trees and the strength of each tree. Specifically:
  * Increasing correlation among trees typically increases the error rate, as it reduces the ensemble's ability to correct individual trees' errors.
  * Increasing the strength of each tree (i.e., reducing its error rate) generally decreases the forest error rate, as each tree becomes a more reliable classifier.
Consequently, an ideal Random Forest model strikes a balance between individually strong and sufficiently diverse trees, typically achieved by tuning the m parameter.
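
The correlation/strength trade-off can also be observed directly. The sketch below (once more with scikit-learn and synthetic data; all settings are illustrative assumptions) measures, for several values of m, the mean accuracy of the individual trees (a simple proxy for their strength) and the mean pairwise correlation of their predictions on a held-out set.

<code python>
# Illustrative sketch: strength vs. correlation of individual trees.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

for m in (2, 8, 20):
    rf = RandomForestClassifier(n_estimators=50, max_features=m,
                                random_state=0).fit(X_tr, y_tr)
    # Predictions of every individual tree on the held-out set.
    preds = np.array([tree.predict(X_te) for tree in rf.estimators_])
    # Strength proxy: mean accuracy of the individual trees.
    strength = (preds == y_te).mean()
    # Correlation: mean pairwise correlation of the trees' predictions.
    corr = np.corrcoef(preds)
    pairwise = corr[np.triu_indices_from(corr, k=1)].mean()
    print(f"m = {m:2d}   strength = {strength:.3f}   "
          f"correlation = {pairwise:.3f}")
</code>

Smaller m should show lower correlation (more diverse trees) at the cost of somewhat lower strength, which is exactly the balance the m parameter is tuned for.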