- Dec 22, 2022

Python Machine Learning Guided Project - Spaceship Titanic part 2 ML Predict, Level 8, 49 min

Updated: Aug 21, 2023

In the second part of the Python Guided Machine Learning Project, the data scientist picks up where the data analyst left off. We use the data analyst's insights to guide the data scientist's preprocessing strategy for machine learning.

This handoff is extremely helpful in a team setting, because the data scientist can focus on building the model. And as we will see, there is a lot to try when building a model.

Here we go a step further than just selecting the best model. We use a pairplot in Seaborn to plot the hyperparameters against the mean test score from our Sklearn grid search, so we can understand what is really impacting the output of the model.

This workflow also follows an experimental science approach: it is set up so that preprocessing and feature selection are easy to change between runs. We use Pandas, Seaborn and Sklearn in Python to make predictions for this Kaggle competition.
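One way to make preprocessing and feature selection easy to change is to keep them as a few variables at the top of an Sklearn `Pipeline`. The sketch below is only an illustration under assumptions: the tiny DataFrame is made up, the column names are assumed to mirror the Kaggle data, and `GradientBoostingClassifier` is a guess at the model family rather than something confirmed by the workbook.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up rows for illustration; column names are assumed, not taken from the workbook.
df = pd.DataFrame({
    "HomePlanet": ["Earth", "Europa", "Mars", "Earth", "Europa", "Mars", "Earth", "Mars"],
    "Age": [39.0, 24.0, None, 33.0, 16.0, 44.0, 26.0, 28.0],
    "RoomService": [0.0, 109.0, 43.0, 0.0, 303.0, 0.0, 42.0, None],
    "Transported": [False, True, False, False, True, True, True, False],
})

# Experiment knobs: edit these lines to try a different feature set or strategy.
numeric_features = ["Age", "RoomService"]
categorical_features = ["HomePlanet"]
numeric_strategy = "median"

preprocess = ColumnTransformer([
    # Fill missing numeric values with the chosen statistic.
    ("num", SimpleImputer(strategy=numeric_strategy), numeric_features),
    # Fill missing categories, then one-hot encode them.
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_features),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", GradientBoostingClassifier(n_estimators=20, random_state=0)),
])

X = df[numeric_features + categorical_features]
y = df["Transported"]
model.fit(X, y)
predictions = model.predict(X)
print(predictions)
```

Because the preprocessing lives inside the pipeline, changing `numeric_strategy` or the feature lists reruns the whole experiment without touching the rest of the code.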

Template Workbook

Solutions Workbook

Data Science Teacher Brandyn YouTube Channel

One on one time with Data Science Teacher Brandyn

Dataset on Kaggle


Seaborn Pairplot of Sklearn GridSearchCV

Use a pairplot in Seaborn to plot each hyperparameter in a grid search against the other hyperparameters and the mean test score, and really understand the effect each hyperparameter has on the final test score.

A huge benefit of seeing all the hyperparameters next to each other is that we can start to understand what is actually impacting the final predictions. A large range of n_estimators values all produce similar top scores, which suggests that n_estimators by itself is not the most important hyperparameter.

When several settings score about the same, it is better to go with the simplest version of what works best.
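One possible way to turn "the simplest version of what works best" into a rule is to keep every model within a small tolerance of the top score and then take the one with the fewest estimators. The scores and the `tolerance` value below are invented for illustration and are not results from this project.

```python
import pandas as pd

# Invented scores for illustration only; not results from the project.
results = pd.DataFrame({
    "n_estimators": [50, 100, 200, 400, 800],
    "mean_test_score": [0.790, 0.803, 0.804, 0.805, 0.804],
})

# Hypothetical cutoff: how far below the best score still counts as "about the same".
tolerance = 0.005

best_score = results["mean_test_score"].max()
near_best = results[results["mean_test_score"] >= best_score - tolerance]

# Among the near-best models, prefer the smallest (simplest) ensemble.
simplest = near_best.sort_values("n_estimators").iloc[0]
print(int(simplest["n_estimators"]), simplest["mean_test_score"])
```

With these made-up numbers, the rule picks the 100-estimator model even though the 400-estimator model has the nominally highest score.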

We can see that deviance appears among the top models most often for the loss hyperparameter. That gives us confidence that deviance is the loss we should use in our model's hyperparameters.
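The same observation can be checked with a quick count: take the top few rows of the grid search results and tally the loss values. The table below is a hand-made stand-in with invented scores, and `top_n` is a hypothetical choice.

```python
import pandas as pd

# Hand-made stand-in for grid search results; the scores are invented.
results = pd.DataFrame({
    "loss": ["deviance", "exponential", "deviance", "deviance", "exponential", "deviance"],
    "mean_test_score": [0.805, 0.801, 0.804, 0.803, 0.795, 0.790],
})

top_n = 4  # hypothetical: how many of the best models to inspect

# Keep the top_n rows by score, then count how often each loss appears.
top_models = results.nlargest(top_n, "mean_test_score")
loss_counts = top_models["loss"].value_counts()
print(loss_counts)
```

A count like this backs up what the pairplot shows visually, and it is easy to repeat for any other categorical hyperparameter.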

Learn how to Submit your Prediction to the Kaggle Competition
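As a rough sketch of the submission step, the predictions can be written to a CSV file with Pandas. The passenger IDs and predictions below are hypothetical placeholders, and the `PassengerId` and `Transported` column names are assumed from the competition's sample submission file, so check them against that file before uploading.

```python
import pandas as pd

# Hypothetical placeholders: in the project these would come from the test set
# and from the fitted model's predict() call.
passenger_ids = ["0013_01", "0018_01", "0019_01"]
predictions = [True, False, True]

# Assumed submission layout: one row per passenger, two columns.
submission = pd.DataFrame({
    "PassengerId": passenger_ids,
    "Transported": predictions,
})

# index=False keeps the DataFrame index out of the file.
submission.to_csv("submission.csv", index=False)

# Read the file back to confirm the layout; keep the IDs as strings.
check = pd.read_csv("submission.csv", dtype={"PassengerId": str})
print(check)
```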
