
Python Machine Learning Guided Project, Early Diabetes Prediction - Level 2, 15 minutes

Updated: Aug 21, 2023

Follow along with this guided Python ML project. In this beginner project, we build a classification model in Sklearn to predict whether or not a person has early-stage diabetes, a medical prediction task for machine learning.


Learn the basics of setting up a data science project, work through the exploratory data analysis to understand the data, then use the insights you've collected to build a machine learning model in Python with Sklearn.













Follow Data Science Teacher Brandyn















Use sns.histplot in a for loop to plot the univariate distributions.

An important part of machine learning projects is understanding distributions in your data. Here we plot many histograms to inspect the distributions of our features.

Notes on the distributions are important for understanding what needs to be done in preprocessing.

As we go through the exploratory data analysis, we will gather insights related to building our machine learning model. We will use these insights to guide how we complete the preprocessing of the data.



Treating outliers is an important preprocessing step to get the data ready for our model.

Outliers can have a big impact on the predictiveness of our ML models. We will use the Pandas .clip method to truncate outliers. This is a good way to deal with outliers when a feature is roughly normally distributed.
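Here is a minimal sketch of truncating outliers with `Series.clip`. The bounds used (mean plus or minus three standard deviations) are an assumption for illustration, as are the helper name `clip_outliers` and the sample values; the project may pick its limits differently.

```python
import pandas as pd


def clip_outliers(series, n_std=3.0):
    """Clip values to within n_std standard deviations of the mean.

    The 3-standard-deviation default is an illustrative assumption,
    reasonable when the feature is roughly normally distributed.
    """
    mean = series.mean()
    std = series.std()
    return series.clip(lower=mean - n_std * std, upper=mean + n_std * std)


# Hypothetical feature: 19 typical readings and one extreme value.
glucose = pd.Series([50.0] * 19 + [500.0])
clipped = clip_outliers(glucose)
```

The extreme reading is pulled down to the upper bound while the typical readings are left unchanged, so no rows are dropped.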

Set up your train-test split to allow for experimentation with features.

Set up your train-test split with Sklearn in a way that lets you experiment with different features heading into the modeling section of our ML project.
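One way to keep the split easy to experiment with is to pass the feature list as an argument, so trying a different feature set only means changing one list. This is a sketch under assumptions: the helper `make_split`, the column names, and the toy data are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def make_split(data, features, target, test_size=0.25, random_state=42):
    """Split the chosen feature columns (X) and the target column (y)."""
    X = data[features]
    y = data[target]
    # stratify=y keeps the class balance similar in train and test sets.
    return train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )


# Toy data with hypothetical column names.
df = pd.DataFrame({
    "age": list(range(30, 50)),
    "glucose": list(range(90, 130, 2)),
    "bmi": [22 + i * 0.5 for i in range(20)],
    "diabetes": [0, 1] * 10,
})

features = ["age", "glucose"]  # edit this list to try other features
X_train, X_test, y_train, y_test = make_split(df, features, "diabetes")
```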



Use StandardScaler and PCA in Sklearn to perform model preprocessing.

Standardize the features and take principal components so our model can make better use of the data. Use StandardScaler for the standardization and PCA to get the principal components in Sklearn.
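A minimal sketch of this step, assuming random stand-in arrays and an arbitrary choice of two components (the project's actual number of components is not stated here). Both transformers are fitted on the training data only and then applied to the test data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Random stand-in for already-split feature matrices (assumption).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(80, 4))
X_test = rng.normal(size=(20, 4))

scaler = StandardScaler()
pca = PCA(n_components=2)  # two components is an illustrative choice

# Fit on the training set only, then reuse the fitted transformers on the test set.
X_train_pca = pca.fit_transform(scaler.fit_transform(X_train))
X_test_pca = pca.transform(scaler.transform(X_test))
```

Fitting on the training set alone avoids leaking information from the test set into the preprocessing.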

Create a model_factory function to allow easy testing of many different models.


A good practice with Sklearn is to build a model factory function that trains each model and gets all the scores in one line of Python code.


Here we build a model factory function that creates either a RandomForest or a Bagging machine learning model, trains it, and returns the train and test scores for each Sklearn model.
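The following is a sketch of what such a function might look like, not the project's exact implementation. The signature, the model keys (`"random_forest"`, `"bagging"`), and the returned dictionary are assumptions, and the data comes from Sklearn's `make_classification` rather than the diabetes dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical registry mapping a short name to an Sklearn estimator class.
MODELS = {
    "random_forest": RandomForestClassifier,
    "bagging": BaggingClassifier,
}


def model_factory(name, X_train, X_test, y_train, y_test, **params):
    """Build the named model, fit it, and return its train and test accuracy."""
    params.setdefault("random_state", 0)  # reproducible runs by default
    model = MODELS[name](**params)
    model.fit(X_train, y_train)
    return {
        "model": name,
        "train_score": model.score(X_train, y_train),
        "test_score": model.score(X_test, y_test),
    }


# Synthetic classification data standing in for the real features.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One line per model to train it and collect both scores.
results = [
    model_factory(name, X_train, X_test, y_train, y_test, n_estimators=50)
    for name in MODELS
]
```

Comparing `train_score` with `test_score` for each entry gives a quick read on overfitting before tuning any one model further.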





