In this Python machine learning lesson, we will focus on understanding our models in the hope of explaining how they make predictions. In simple linear regression this is easy: the coefficient values are straightforward to interpret. As we move to a more advanced model like a random forest, understanding exactly how the model decides to vote one way or another becomes very time-consuming, and in most cases too time-consuming to justify walking through every tree in the forest. And as we move into deep learning, this kind of direct inspection simply isn't possible, so we need another way to understand how our models make their predictions.
The SHAP library's summary plot is an amazing tool for understanding how a model makes its decisions. With a better understanding of which features are important, we can decide which features to leave out, which could improve our scores. We can also use this understanding to engineer new features, creating compositions of features that support the model in its decision-making process.
If we can better understand our model, we can better understand what actions to take to improve its overall predictive performance.
As with most things in the SHAP library, we start by using shap.kmeans to build a background dataset. We then pass this into the KernelExplainer, from which point we will be able to get our Shapley values.
Shapley values are a concept from cooperative game theory adapted for machine learning interpretation, providing a fair way to distribute the contribution of each feature to a model's prediction by considering all possible feature combinations.
Shapley values let us understand the impact different values of a feature have on the resulting prediction. This is not the same as how well the variable aided predictions, as with feature importances in the sklearn library. Although they say nothing directly about accuracy, Shapley values give us an understanding of each feature's contribution, as a team member, to the final prediction.
This helps us better understand the individual impacts of features, suggests new ideas for feature engineering, and offers us the ability to extract valuable business insights from our machine learning model.