
Python Machine Learning Guided Project - Spaceship Titanic part 2 ML Predict, Level 8, 49 min

Updated: Aug 21, 2023

In the second part of the Python Guided Machine Learning Project, the data scientist picks up where the data analyst left off. We use the data analyst's insights to guide the data scientist's preprocessing strategy for machine learning.

This is extremely helpful in a team setting because it lets the data scientist focus on building the model. And as we will see, there is a lot to try when building a model.



Here we go a step further and don't just select the best model: we use a pairplot in Seaborn to plot the hyperparameters against the mean test score from our Scikit-learn grid search, to understand what is really impacting the output of the model.


This workflow is also set up with an experimental-science approach: it makes it easy to change the preprocessing and the feature selection. In Python we use Pandas, Seaborn, and Scikit-learn for this Kaggle competition prediction.
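One way to keep preprocessing and feature selection easy to swap is to build everything into a single Scikit-learn pipeline. A minimal sketch, assuming a few Spaceship Titanic-style columns (the `num_cols`/`cat_cols` lists and imputation strategies here are hypothetical placeholders, not the project's exact choices):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column lists -- swap in whatever the feature selection keeps.
num_cols = ["Age", "RoomService", "FoodCourt"]
cat_cols = ["HomePlanet", "CryoSleep", "Destination"]

# Keeping the preprocessing inside the pipeline means changing a strategy
# (or a column list) is a one-line edit, and the search re-runs cleanly.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), num_cols),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), cat_cols),
])

model = Pipeline([("prep", preprocess),
                  ("gbc", GradientBoostingClassifier(random_state=0))])
```

Because the whole chain is one estimator, trying a different scaler or dropping a column list is a single edit, and the same `model` object plugs straight into a grid search.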














Follow Data Science Teacher Brandyn

















Seaborn Pairplot of Sklearn GridSearchCV

Use pairplot in Seaborn to plot each hyperparameter against the other hyperparameters in a grid search, and really understand the effect each hyperparameter has on the final test score.
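The key object here is `GridSearchCV.cv_results_`, which converts straight into a DataFrame. A minimal sketch of the idea, using a synthetic dataset and a hypothetical search space rather than the project's actual features and grid:

```python
import pandas as pd
import seaborn as sns
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Small synthetic stand-in for the preprocessed training data.
X, y = make_classification(n_samples=200, random_state=0)

# Hypothetical search space -- adjust to the model you are tuning.
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100],
                "max_depth": [2, 3],
                "learning_rate": [0.05, 0.1]},
    cv=3,
)
grid.fit(X, y)

# cv_results_ holds one row per candidate: the param_* columns are the
# hyperparameters, and mean_test_score is the cross-validated score.
results = pd.DataFrame(grid.cv_results_)
param_cols = [c for c in results.columns if c.startswith("param_")]
plot_df = results[param_cols + ["mean_test_score"]].astype(float)

sns.pairplot(plot_df, x_vars=param_cols, y_vars=["mean_test_score"])
```

Each panel then shows one hyperparameter on the x-axis against the mean test score, so flat panels (score barely moves) and steep panels (score swings) are visible at a glance.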



Create a User-Defined Pairplot Function of the GridSearchCV History
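Wrapping the recipe in a function makes it reusable after every search. This is a sketch under assumed names (`plot_search_history` is a hypothetical helper, not necessarily the one used in the video):

```python
import pandas as pd
import seaborn as sns

def plot_search_history(grid, score="mean_test_score"):
    """Pairplot a fitted GridSearchCV's hyperparameters against a score column.

    Hypothetical helper -- it just packages the cv_results_ -> pairplot
    recipe so it can be reused after every search in the notebook.
    """
    results = pd.DataFrame(grid.cv_results_)
    param_cols = [c for c in results.columns if c.startswith("param_")]
    plot_df = results[param_cols + [score]].copy()
    # Cast hyperparameters to numeric where possible so Seaborn treats them
    # as continuous axes; string-valued ones (e.g. loss) stay categorical.
    for c in param_cols:
        converted = pd.to_numeric(plot_df[c], errors="coerce")
        if converted.notna().all():
            plot_df[c] = converted
    return sns.pairplot(plot_df, x_vars=param_cols, y_vars=[score])
```

Calling `plot_search_history(grid)` on any fitted search then produces the same one-row grid of score-vs-hyperparameter panels, whatever the parameter grid was.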

Compare and see how each Hyperparameter actually impacts the final test score

A huge benefit of seeing all the hyperparameters next to each other is that we can start to understand what is actually impacting the final predictions. A large range of n_estimators values all produce similar top scores, which suggests that n_estimators is not the most important hyperparameter by itself.


It is better to go with the simplest version of what works best.


We can see that deviance appears most often among the top models for the loss hyperparameter. That gives us confidence that it is the loss we should be using in our model's hyperparameters.
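The same "which value dominates the top models" check can be done numerically. A sketch on a toy `cv_results_`-style frame (the scores below are made up for illustration; in practice build the frame from the real search with `pd.DataFrame(grid.cv_results_)`):

```python
import pandas as pd

# Toy stand-in for cv_results_ -- invented scores, for illustration only.
results = pd.DataFrame({
    "param_loss": ["deviance", "exponential", "deviance",
                   "deviance", "exponential"],
    "mean_test_score": [0.81, 0.78, 0.80, 0.79, 0.74],
})

# Take the top-scoring candidates and count which loss value dominates them.
top = results.nlargest(3, "mean_test_score")
loss_counts = top["param_loss"].value_counts()
print(loss_counts)  # here deviance fills all three of the top rows
```

Seeing one value monopolize the top rows is the tabular version of what the pairplot shows visually.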



Learn how to Submit your Prediction to the Kaggle Competition
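A Kaggle submission is just a two-column CSV. A minimal sketch with stand-in data (the PassengerId values and predictions here are placeholders; in the project the predictions come from the fitted pipeline, and the Spaceship Titanic competition expects a boolean Transported column):

```python
import pandas as pd

# Hypothetical stand-in for the competition test set and the model's output.
test = pd.DataFrame({"PassengerId": ["0013_01", "0018_01"]})
preds = [True, False]  # in the project this is model.predict(test_features)

# Kaggle scores exactly these two columns; index=False keeps the file clean.
submission = pd.DataFrame({"PassengerId": test["PassengerId"],
                           "Transported": preds})
submission.to_csv("submission.csv", index=False)
```

The resulting file can be uploaded on the competition page, or with the official Kaggle CLI (assuming the `kaggle` package and an API token are set up): `kaggle competitions submit -c spaceship-titanic -f submission.csv -m "first attempt"`.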




