
Python Machine Learning Guided Project: Decision Tree Pre- and Post-Pruning Techniques - Level 6, 50 min

Updated: Aug 21, 2023


Post-Pruned Decision Tree


In this Python ML project, we will explore an individual DecisionTree and learn to use pre-pruning and post-pruning techniques. Pre-pruning techniques are generally easier to use and involve setting hyperparameters that limit the growth of our decision trees. Post-pruning techniques are a little harder to work with, but they are important to understand before applying them to a random forest of decision trees. Let's practice pre- and post-pruning techniques in sklearn in this Python Machine Learning project.
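
The walkthrough itself happens in the videos below; as a rough starting point, here is a minimal setup sketch that the later snippets build on. The project's actual dataset isn't shown in this text, so sklearn's built-in breast-cancer data stands in as an assumption.

```python
# Minimal setup sketch -- the project's real dataset isn't shown here,
# so sklearn's built-in breast-cancer data stands in.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
```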




Part 1





Send Data Science Teacher Brandyn a message if you have any questions










Part 2



Part 3





Part 4



Part 5





Post-Pruned Decision Tree

We will use cost complexity pruning, via sklearn's cost_complexity_pruning_path, in the post-pruning of our DecisionTreeClassifier.



User-Defined Function to Create a Confusion Matrix

In this ML classification task, we look at building a user-defined function to plot our predictions. In classification, it is very important to understand whether your errors are false positives or false negatives, as each carries very different real-world costs depending on your business use case.
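
As a hedged sketch of what such a helper might look like (the function name and styling here are illustrative, not the project's exact code):

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

def plot_confusion(y_true, y_pred, title="Confusion Matrix"):
    """Heatmap of the confusion matrix so false positives and
    false negatives are easy to tell apart."""
    cm = confusion_matrix(y_true, y_pred)
    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
    plt.title(title)
    plt.xlabel("Predicted label")
    plt.ylabel("Actual label")
    plt.show()
```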


Unpruned Decision Tree

An unpruned decision tree has a propensity to overfit because it just keeps growing around every little nook and cranny in our dataset. As data scientists, we need to learn to control our model, because an overfit model won't just give bad predictions; it can give wildly erratic ones.
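
Continuing from the setup sketch above, a quick way to see the overfit is to compare train and test accuracy on a tree grown with no limits (the exact numbers depend on the data):

```python
from sklearn.tree import DecisionTreeClassifier

# No growth limits: the tree keeps splitting until every leaf is pure.
unpruned = DecisionTreeClassifier(random_state=42)
unpruned.fit(X_train, y_train)
print("train accuracy:", unpruned.score(X_train, y_train))  # typically 1.0
print("test accuracy: ", unpruned.score(X_test, y_test))    # noticeably lower
```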



Controlling our model with hyperparameters like max_depth will do a lot to prevent overfitting. However, max_depth truncates every branch of the decision tree at the same depth, when one side of the tree might benefit from more splits without overfitting.
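
A minimal pre-pruning sketch, assuming the same train/test split; the depth value of 4 is illustrative and would normally be tuned, e.g. with cross-validation:

```python
from sklearn.tree import DecisionTreeClassifier

# Pre-pruning: cap the depth (other options include min_samples_leaf
# and min_samples_split).
prepruned = DecisionTreeClassifier(max_depth=4, random_state=42)
prepruned.fit(X_train, y_train)
print("train accuracy:", prepruned.score(X_train, y_train))
print("test accuracy: ", prepruned.score(X_test, y_test))
```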


Cost Complexity Path Pruning

In post-pruning, we grow a full tree first and then, after examining each branch, prune away the unnecessary ones. The post-pruning technique we will use is cost complexity pruning: we look at the ccp_alpha hyperparameter and, using a for loop, examine the tree's accuracy as we slowly increase ccp_alpha, to determine where to set the parameter so each branch grows to its optimal depth.
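
A sketch of that loop, using sklearn's cost_complexity_pruning_path to generate the candidate alphas rather than guessing them; the selection rule at the end (best test accuracy) is an illustrative choice:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# The pruning path gives the effective alphas at which branches collapse.
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train
)

train_scores, test_scores = [], []
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=42)
    tree.fit(X_train, y_train)
    train_scores.append(tree.score(X_train, y_train))
    test_scores.append(tree.score(X_test, y_test))

# Illustrative selection rule: keep the alpha with the best test accuracy.
best_alpha = path.ccp_alphas[np.argmax(test_scores)]
postpruned = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=42)
postpruned.fit(X_train, y_train)
```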



Comparing the Unpruned, Pre-Pruned and Post-Pruned Decision Trees


We plot the train and test accuracy scores for the unpruned, pre-pruned and post-pruned decision trees. We can see that we achieve a much-improved score with pre-pruning techniques like setting max_depth. ccp_alpha is harder to set: we grow out our tree and study it to determine where to set ccp_alpha so that each branch grows its optimal amount without overfitting. We achieve a smaller increase in the test-set accuracy with this technique, but it optimizes our model further.
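
A sketch of that comparison plot, reusing the three trees fitted in the snippets above:

```python
import matplotlib.pyplot as plt
import numpy as np

models = {"unpruned": unpruned, "pre-pruned": prepruned,
          "post-pruned": postpruned}
x = np.arange(len(models))
train_acc = [m.score(X_train, y_train) for m in models.values()]
test_acc = [m.score(X_test, y_test) for m in models.values()]

# Side-by-side bars make the train/test gap of each tree easy to compare.
plt.bar(x - 0.2, train_acc, width=0.4, label="train")
plt.bar(x + 0.2, test_acc, width=0.4, label="test")
plt.xticks(x, list(models))
plt.ylabel("accuracy")
plt.legend()
plt.show()
```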


