Follow along with this guided Python machine learning project. In this beginner project, we build a classification model with Sklearn to predict whether a person has early-stage diabetes.
Learn the basics of setting up a data science project, work through the exploratory data analysis to understand the data, then use the insights you've collected to build a machine learning model in Python with Sklearn.
An important part of machine learning projects is understanding distributions in your data. Here we plot many histograms to inspect the distributions of our features.
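As a rough illustration of the histogram step, here is a minimal sketch using the Pandas DataFrame.hist method. The column names (age, glucose, bmi) and the synthetic values are assumptions made up for this example; they are not the project's actual dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs without a display
import numpy as np
import pandas as pd

# Synthetic stand-in data; the real project would load its own dataset instead.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.normal(48, 12, 200).round(),
    "glucose": rng.normal(110, 25, 200),
    "bmi": rng.normal(28, 5, 200),
})

# One histogram per numeric column, laid out in a single row.
axes = df.hist(bins=20, figsize=(9, 3), layout=(1, 3))
fig = axes.flat[0].get_figure()
fig.tight_layout()
# In a notebook the figure renders inline; in a script, save it with fig.savefig(...).
```

Looking at all features side by side makes it easier to spot skewed columns and long tails before choosing a preprocessing step.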
As we go through the exploratory data analysis, we gather insights related to building our machine learning model. We use these insights to guide how we preprocess the data.
Outliers can have a big impact on the predictive performance of our ML models. We use the Pandas .clip method to cap outliers at chosen lower and upper bounds. This is a good way to deal with outliers when a feature has a roughly normal distribution.
Set up your train-test split with Sklearn in a way that lets you experiment with different feature sets heading into the modeling section of the project.
Standardize the features and take principal components so the model can make better use of the data. Use StandardScaler for the standardization and PCA for the principal components, both from Sklearn.
A good practice with Sklearn is to build a model factory function that trains each model and returns all of its scores in one line of Python code.
Here we build a model factory function that creates either a RandomForest or a Bagging machine learning model, trains it, and returns the train and test scores for each Sklearn model.
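A minimal sketch of capping outliers with clip might look like the following. The 1st and 99th percentile bounds are an assumption chosen for illustration (the project may use different thresholds), and the glucose series is synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic, roughly normal feature with a few extreme values injected by hand.
rng = np.random.default_rng(0)
s = pd.Series(rng.normal(100, 15, 500), name="glucose")
s.iloc[:3] = [400, 350, -50]

# Assumed bounds: the 1st and 99th percentiles of the feature.
lower, upper = s.quantile([0.01, 0.99])

# clip keeps every row but replaces values outside the bounds with the bound itself.
clipped = s.clip(lower=lower, upper=upper)
```

Because clip caps values instead of dropping rows, the dataset keeps its size, which matters when the data is small.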
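One way to keep the split flexible is to wrap train_test_split in a small helper that takes a list of feature names. This is only a sketch: the helper name make_split, the feature_sets dictionary, the column names, and the synthetic data are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a binary target column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.normal(48, 12, 200),
    "glucose": rng.normal(110, 25, 200),
    "bmi": rng.normal(28, 5, 200),
    "target": rng.integers(0, 2, 200),
})

# Hypothetical named feature sets to compare during modeling.
feature_sets = {
    "all": ["age", "glucose", "bmi"],
    "no_bmi": ["age", "glucose"],
}

def make_split(data, features, target="target", test_size=0.2, seed=42):
    """Split the chosen feature columns and the target into train and test parts."""
    X = data[features]
    y = data[target]
    # stratify keeps the class balance similar in both parts
    return train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )

X_train, X_test, y_train, y_test = make_split(df, feature_sets["all"])
```

Fixing the seed means that switching to feature_sets["no_bmi"] changes only the columns, not which rows land in the test set.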
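The standardize-then-PCA step can be sketched with a small Sklearn pipeline. The number of components (3) and the synthetic arrays are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales, standing in for real data.
rng = np.random.default_rng(0)
scales = np.array([1, 10, 100, 5, 50])
X_train = rng.normal(size=(160, 5)) * scales
X_test = rng.normal(size=(40, 5)) * scales

# Standardize first so no feature dominates PCA purely because of its units.
prep = make_pipeline(StandardScaler(), PCA(n_components=3))

# Fit on the training data only, then apply the same transform to the test data.
X_train_pc = prep.fit_transform(X_train)
X_test_pc = prep.transform(X_test)

# Share of variance captured by each retained component.
explained = prep.named_steps["pca"].explained_variance_ratio_
```

Fitting the scaler and PCA on the training split alone avoids leaking information from the test set into the preprocessing.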
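A model factory along these lines could be sketched as follows. The function name model_factory, its arguments, and the hyperparameters are hypothetical, and the data comes from Sklearn's make_classification rather than the diabetes dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

def model_factory(kind, X_train, y_train, X_test, y_test, **params):
    """Build the requested model, fit it, and return (model, train_score, test_score)."""
    models = {
        "random_forest": RandomForestClassifier,
        "bagging": BaggingClassifier,
    }
    if kind not in models:
        raise ValueError(f"unknown model kind: {kind!r}")
    params.setdefault("random_state", 0)  # reproducible unless the caller overrides it
    model = models[kind](**params)
    model.fit(X_train, y_train)
    # score returns mean accuracy for Sklearn classifiers
    return model, model.score(X_train, y_train), model.score(X_test, y_test)

# Synthetic classification data standing in for the preprocessed features.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

rf, rf_train, rf_test = model_factory(
    "random_forest", X_train, y_train, X_test, y_test, n_estimators=50
)
bag, bag_train, bag_test = model_factory(
    "bagging", X_train, y_train, X_test, y_test, n_estimators=20
)
```

Returning the train and test scores together makes it easy to spot overfitting, which shows up as a large gap between the two.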