====== Random Forests ======
  
Random forests ((https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#intro|Random forests)) are among the best out-of-the-box methods highly valued by developers and data scientists. For a better understanding of the process, an imaginary weather forecast problem might be considered, represented by the following true decision tree (figure {{ref>Weatherforecastexample}}):
Some advantages:
  * RF uses more knowledge than a single decision tree (illustrated by the sketch after this list).
  * Furthermore, the more diverse the initial information sources used, the more diverse the models will be, and the more robust the final estimate.
  * This is true because a single data source might suffer from data anomalies reflected in model anomalies.
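
To make the first advantage concrete, the following minimal sketch compares a single decision tree with a random forest trained on the same data. It assumes scikit-learn; the synthetic dataset and parameter choices are illustrative assumptions, not part of the original material.

<code python>
# Illustrative sketch: single decision tree vs. random forest.
# The synthetic dataset stands in for a real weather-style dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=8, random_state=0)

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

print("single tree  :", cross_val_score(tree, X, y, cv=5).mean())
print("random forest:", cross_val_score(forest, X, y, cv=5).mean())
</code>

On most runs, the forest scores noticeably higher because the combined vote of many diverse trees cancels out the errors of any single tree.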
  
Each tree's strength depends on various factors, including its depth and the features it uses for splitting. However, there is a trade-off between correlation and strength. For example, reducing m (the number of features considered at each split) increases the diversity among the trees, lowering correlation. Still, it may also reduce the strength of each tree, as it may limit its access to highly predictive features.
  
Despite this trade-off, Random Forests balance these dynamics by optimising m to minimise the ensemble error. Generally, a moderate reduction in m lowers correlation without significantly compromising the strength of each tree, thus leading to an overall decrease in the forest's error rate.
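
As a rough illustration of this tuning, the sketch below (again assuming scikit-learn; the dataset and the candidate values of m are arbitrary choices) sweeps ''max_features'', which plays the role of m, and reports the out-of-bag (OOB) error, an ensemble-error estimate that Random Forests provide without a separate test set.

<code python>
# Illustrative sketch: how the ensemble (OOB) error responds to m.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=8, random_state=0)

# oob_score=True scores each sample only with the trees that did not
# see it during bootstrap sampling, so no held-out test set is needed.
for m in (2, 4, 8, 16, 20):
    rf = RandomForestClassifier(n_estimators=300, max_features=m,
                                oob_score=True, random_state=0).fit(X, y)
    print(f"m = {m:2d}   OOB error = {1 - rf.oob_score_:.3f}")
</code>

Typically, the error is lowest at an intermediate m: very small values weaken the individual trees, while values close to the total feature count make them too correlated.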
  
  
**Implications for the Forest Error Rate:** The forest error rate in a Random Forest model is influenced by the correlation among the trees and the strength of each tree. Specifically:
  * Increasing correlation among trees typically increases the error rate, as it reduces the ensemble's ability to correct individual trees' errors.
  * Increasing the strength of each tree (i.e., reducing its error rate) generally decreases the forest error rate, as each tree becomes a more reliable classifier.
Consequently, an ideal Random Forest model strikes a balance between individually strong and sufficiently diverse trees, typically achieved by tuning the m parameter.
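
The correlation/strength trade-off can also be observed directly. The sketch below (once more with scikit-learn and synthetic data; all settings are illustrative assumptions) measures, for several values of m, the mean accuracy of the individual trees (a simple proxy for their strength) and the mean pairwise correlation of their predictions on a held-out set.

<code python>
# Illustrative sketch: strength vs. correlation of individual trees.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

for m in (2, 8, 20):
    rf = RandomForestClassifier(n_estimators=50, max_features=m,
                                random_state=0).fit(X_tr, y_tr)
    # Predictions of every individual tree on the held-out set.
    preds = np.array([tree.predict(X_te) for tree in rf.estimators_])
    # Strength proxy: mean accuracy of the individual trees.
    strength = (preds == y_te).mean()
    # Correlation: mean pairwise correlation of the trees' predictions.
    corr = np.corrcoef(preds)
    pairwise = corr[np.triu_indices_from(corr, k=1)].mean()
    print(f"m = {m:2d}   strength = {strength:.3f}   "
          f"correlation = {pairwise:.3f}")
</code>

Smaller m should show lower correlation (more diverse trees) at the cost of somewhat lower strength, which is exactly the balance the m parameter is tuned for.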