
Extremely Randomized Trees

Time: 11 min

Level: Advanced

Model Type: Ensemble

A free guide for those new to machine learning: how to build an Extremely Randomized Trees ensemble and choose good hyperparameters with scikit-learn in Python.

About the Model

The Random Forest ensemble method is the better known of the two, so to explain the Extremely Randomized Trees ensemble method we start by contrasting these two machine learning models.


The RandomForestRegressor and the ExtraTreesRegressor are two popular ensemble learning techniques for regression in machine learning. Ensembles are like combining multiple expert opinions to make a more accurate prediction. Both Random Forest and Extra Trees are ensembles of decision trees, but they have distinct differences:


  1. Bootstrap Aggregation (Bagging):

    • Random Forest: It employs a technique called bootstrapped sampling. It builds multiple decision trees by resampling the training data with replacement. This introduces diversity in the dataset used to train each tree.

    • Extra Trees: Unlike Random Forest, it does not use bootstrapped sampling by default; each tree is trained on the whole original training set (scikit-learn's ExtraTreesRegressor exposes an optional bootstrap parameter). Instead, it introduces diversity by randomizing both the feature selection and the splitting threshold at each node of each tree.

  2. Feature Selection:

    • Random Forest: It selects features for splitting nodes in each tree based on the best among a random subset of features. This randomization helps in reducing overfitting.

    • Extra Trees: Here, the split selection is even more random. It still draws a random subset of candidate features, but rather than searching for the best threshold, it draws a random threshold for each candidate feature at each node. This higher degree of randomness can lead to more diverse trees.

  3. Splitting Strategy:

    • Random Forest: It uses the best split among the selected features based on criteria like Gini impurity or Mean Squared Error (MSE).

    • Extra Trees: It uses random splits: a cut-point is drawn at random for each candidate feature, and the best of these random splits is kept, rather than exhaustively optimizing each threshold.

  4. Predictions:

    • Random Forest: Predictions are made by averaging the predictions of all the trees in the forest for regression tasks.

    • Extra Trees: Similarly, predictions are made by averaging the predictions of all the trees.

  5. Bias-Variance Trade-off:

    • Random Forest: It typically has lower bias and slightly higher variance compared to Extra Trees, because each split is still optimized over the candidate features.

    • Extra Trees: It tends to have slightly higher bias but lower variance compared to Random Forest because of its more aggressive randomization.

  6. Computational Efficiency:

    • Random Forest: Generally, it is computationally more expensive compared to Extra Trees due to the exhaustive search for feature splits.

    • Extra Trees: It is often faster to train than Random Forest because of its simplified splitting strategy. The short sketch after this list fits both models on the same data to make the comparison concrete.

  7. Tuning Parameters:

    • Both Random Forest and Extra Trees have parameters that can be tuned, such as the number of trees in the ensemble and the maximum depth of the trees.
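Here is a minimal sketch that fits both ensembles on the same synthetic regression problem and reports cross-validated R² and training time. The dataset, the 100-tree ensemble size, and the scoring choice are illustrative assumptions, not recommendations.

import time

from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression data (illustrative assumption)
X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)

for name, model in [
    ("Random Forest", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("Extra Trees", ExtraTreesRegressor(n_estimators=100, random_state=0)),
]:
    start = time.perf_counter()
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    elapsed = time.perf_counter() - start
    print(f"{name}: mean R^2 = {scores.mean():.3f}, 5-fold CV time = {elapsed:.1f}s")

Extra Trees usually finishes faster here because it skips the exhaustive threshold search, while both models make predictions by averaging their individual trees.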


Free Python Code Example for the ExtraTrees Ensemble Method


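Below is a minimal sketch of training an ExtraTreesRegressor and tuning a few common hyperparameters with GridSearchCV. The synthetic dataset and the parameter grid are illustrative assumptions; adjust them for your own data.

from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic regression data (illustrative assumption)
X, y = make_regression(n_samples=1000, n_features=15, noise=5.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_grid = {
    "n_estimators": [100, 300],      # number of trees in the ensemble
    "max_depth": [None, 10, 20],     # maximum depth of each tree
    "max_features": ["sqrt", 1.0],   # size of the random feature subset per split
    "min_samples_split": [2, 5],     # minimum samples required to split a node
}

search = GridSearchCV(
    ExtraTreesRegressor(random_state=42),
    param_grid,
    cv=5,
    scoring="r2",
    n_jobs=-1,
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Test R^2:", search.best_estimator_.score(X_test, y_test))

The grid above covers the parameters most people tune first (ensemble size, tree depth, feature subset size, and minimum split size); there is no single best setting that works for every dataset.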

A Little Bit More About Extremely Randomized Trees


Extremely Randomized Trees (ExtraTrees) share many similarities with Random Forests, as both are ensemble methods based on decision trees. However, there are key differences:


  1. Randomness Level: The primary distinction is the level of randomness. ExtraTrees take the randomness concept a step further. In Random Forests, feature subsets are randomly selected for each split, but the cut-point for each candidate feature is then optimized; in ExtraTrees, the cut-points themselves are also chosen randomly. This makes ExtraTrees more randomized, leading to more diverse and less correlated trees (the single-tree sketch after this list shows the difference).

  2. Bias-Variance Tradeoff: ExtraTrees tend to have a higher bias and lower variance compared to Random Forests. The extreme randomness in feature and split selection makes ExtraTrees less likely to fit the training data perfectly, leading to a more biased model. However, this bias often results in improved generalization to unseen data.
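To see the random-split idea at the level of a single tree, here is a minimal sketch comparing the root-node threshold chosen by a standard DecisionTreeRegressor with the one drawn by scikit-learn's single-tree ExtraTreeRegressor on the same toy data. The sine-wave dataset is an illustrative assumption.

import numpy as np
from sklearn.tree import DecisionTreeRegressor, ExtraTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))                    # one noisy feature (toy data)
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

best_split_tree = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, y)
random_split_tree = ExtraTreeRegressor(max_depth=1, random_state=0).fit(X, y)

# tree_.threshold[0] is the cut-point used at the root node
print("Optimized root threshold:", best_split_tree.tree_.threshold[0])
print("Random root threshold:   ", random_split_tree.tree_.threshold[0])

The decision tree searches the candidate cut-points for the one that minimizes the error, while the extra tree draws a cut-point at random from the feature's range; averaging many such trees is what trades a little bias for lower variance.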


Benefits of Extremely Randomized Trees:


  1. Reduced Overfitting: Due to their extreme randomness, ExtraTrees are less prone to overfitting. They provide a good option when dealing with noisy or limited data.

  2. Faster Training: ExtraTrees can be quicker to train than Random Forests since they require less computation to determine the best splits at each node.

  3. Exploration of Feature Space: ExtraTrees are valuable for exploring the feature space. By introducing more randomness in feature selection, they can uncover potentially important features that might be overlooked by other methods.
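As a concrete example of the third point, here is a minimal sketch that ranks features by the impurity-based importances of a fitted ExtraTreesRegressor. The synthetic dataset (10 informative features out of 25) is an illustrative assumption.

from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

# Synthetic data where only 10 of 25 features carry signal (illustrative assumption)
X, y = make_regression(n_samples=1500, n_features=25, n_informative=10, random_state=7)

model = ExtraTreesRegressor(n_estimators=200, random_state=7).fit(X, y)

# Rank features by impurity-based importance; permutation importance is a useful
# cross-check because impurity-based scores can favour high-cardinality features.
ranked = sorted(enumerate(model.feature_importances_), key=lambda item: item[1], reverse=True)
for idx, score in ranked[:10]:
    print(f"feature {idx:2d}: importance = {score:.3f}")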


Real World Application of Extremely Randomized Trees