
Python Machine Learning Guided Project: Cell Phone Price Prediction (Level 3, 30 minutes)

Updated: Aug 21, 2023

In this Python project, we will use Pandas and Seaborn to perform exploratory data analysis in an effort to understand how our features impact our target. A big part of this is paying close attention to the distributions of the features used in our supervised machine learning project.



After we've explored the data and extracted valuable insights for our business partners, along with ideas on how to build our model, we will use Sklearn to preprocess the data for our ML model.





Part 1



Part 2



Part 3







Follow Data Science Teacher Brandyn












Try many different models in Sklearn and see which is best for your data problem

A good practice in machine learning is to try one model of every major type on your ML problem. All of the models try to predict the same thing, but they use different math to do it. No matter how well you understand the math, humans just aren't capable of thinking through all the interrelationships among the features and how they relate to the target. An easier solution is to try them all. Sklearn makes it rather easy to try RandomForest, Bagging, AdaBoost, and GradientBoosting.
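As a minimal sketch of that idea, the snippet below compares the four ensemble models with 5-fold cross-validation. It assumes a regression target (price) and uses synthetic data from make_regression as a stand-in for the cell phone data set; if your target is a price category, the Classifier versions of each model work the same way. The n_estimators values are illustrative, not tuned.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    AdaBoostRegressor,
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the cell phone features and price target (assumption)
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=42)

# One model from each ensemble family mentioned above
models = {
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=42),
    "Bagging": BaggingRegressor(n_estimators=50, random_state=42),
    "AdaBoost": AdaBoostRegressor(n_estimators=50, random_state=42),
    "GradientBoosting": GradientBoostingRegressor(n_estimators=50, random_state=42),
}

# Mean 5-fold cross-validated R^2 for each model (same unshuffled folds for all)
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    results[name] = scores.mean()

# Print the models from best to worst mean R^2
for name, score in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:>16}: {score:.3f}")
```

Cross-validation scores every model on the same folds, so the ranking is less sensitive to one lucky train/test split.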


Take note of the distributions in the data set to determine how to preprocess it for your model

As we go through our EDA section, we will collect insights specifically about the distributions of our features, because they will be very important for correctly preprocessing our features for our machine learning model.
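One quick way to collect that insight is to pair describe() with Pandas' skew(). The sketch below uses two hypothetical columns (battery_power, num_reviews) filled with random data, and the |skew| > 1 cutoff is a common rule of thumb rather than a fixed standard.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical features: one roughly normal, one exponential (right-skewed)
df = pd.DataFrame({
    "battery_power": rng.normal(loc=1200, scale=300, size=1000),
    "num_reviews": rng.exponential(scale=50, size=1000),
})

# describe() summarizes center and spread; skew() measures asymmetry
print(df.describe().round(1))
skewness = df.skew()
print(skewness.round(2))

# Rule-of-thumb cutoff (assumption): |skew| > 1 suggests a transform may help
skewed_cols = skewness[skewness.abs() > 1].index.tolist()
print("Consider transforming:", skewed_cols)
```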



Put a correlation matrix into Seaborn's heatmap

In our bivariate analysis, we will plot the correlation matrix with Pandas .corr() and Seaborn's heatmap() to give us an easy-to-read view of the linear correlations among our features.
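A minimal version of that plot is sketched below. The columns and their relationships are made up for illustration; on the real data you would pass your own DataFrame. Note that .corr() computes Pearson correlation by default, so it only captures linear relationships, and numeric_only=True (pandas 1.5 and later) skips non-numeric columns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(1)
n = 500

# Hypothetical phone features with a built-in linear link between RAM and price
ram = rng.normal(4, 1.5, n)
df = pd.DataFrame({
    "ram_gb": ram,
    "storage_gb": 16 * ram + rng.normal(0, 10, n),
    "weight_g": rng.normal(180, 20, n),
})
df["price"] = 100 * df["ram_gb"] + rng.normal(0, 50, n)

# Pearson correlation matrix of the numeric columns
corr = df.corr(numeric_only=True)

# Annotated heatmap with a fixed -1..1 color scale
fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Pearson correlation matrix")
fig.tight_layout()
fig.savefig("correlation_heatmap.png")
```

Fixing vmin and vmax keeps the colors comparable across plots, so a weak correlation never looks as dark as a strong one.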


Log Transformation on an exponential distribution

After log transformation: better, but not perfect.

Exponentially distributed features can be difficult for many ML models, and it is often better to take a log transform of an exponential distribution to bring it closer to a normal distribution. This technique is imperfect, but it will most likely make the average a better representation of the data and lead to better predictions.
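The sketch below applies NumPy's log1p, which computes log(1 + x) and therefore also handles zeros, to a hypothetical exponentially distributed column and compares skewness before and after. Because the log of an exponential variable is itself somewhat left-skewed, the result is closer to symmetric but not normal, which is why the transformed distribution looks better but not perfect.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Hypothetical right-skewed feature drawn from an exponential distribution
s = pd.Series(rng.exponential(scale=200, size=2000), name="num_ratings")

# log1p = log(1 + x): safe for zeros and compresses the long right tail
s_log = np.log1p(s)

print(f"skew before: {s.skew():.2f}, after: {s_log.skew():.2f}")
print(f"mean vs median before: {s.mean():.1f} vs {s.median():.1f}")
print(f"mean vs median after:  {s_log.mean():.2f} vs {s_log.median():.2f}")
```

Comparing the mean to the median is a quick check of the claim above: the closer they are, the better the average represents a typical value.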


Use Pandas get_dummies() to one-hot encode

We use Pandas get_dummies() to one-hot encode our categorical features and our bucketized continuous features.
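Here is a small sketch of that step with made-up rows. The brand and screen_in columns and the bin edges passed to pd.cut are hypothetical; dtype=int makes get_dummies emit 0/1 columns instead of booleans.

```python
import pandas as pd

# Hypothetical rows: one categorical column and one continuous column
df = pd.DataFrame({
    "brand": ["Apple", "Samsung", "Apple", "Google"],
    "screen_in": [6.1, 6.8, 5.4, 6.3],
})

# Bucketize the continuous screen size into labeled, right-inclusive bins
df["screen_bucket"] = pd.cut(
    df["screen_in"], bins=[0, 5.9, 6.4, 10], labels=["small", "medium", "large"]
)

# One-hot encode the category and the buckets; screen_in is left as-is
encoded = pd.get_dummies(df, columns=["brand", "screen_bucket"], dtype=int)
print(encoded)
```

Passing columns= limits encoding to the listed columns, so numeric features you still want as numbers pass through untouched.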


Create a function to get the scores from many different Sklearn models so you can determine which is best

With Sklearn, it's handy to create a small function that fits each model and returns its scores.
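One possible shape for that helper is sketched below. The name get_scores and its signature are hypothetical, and the data again comes from make_regression as a stand-in for the real features. Returning both the training and test R^2 (what score() reports for Sklearn regressors) makes it easy to spot a model that overfits, since a large gap between the two is a warning sign.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split


def get_scores(model, X_train, X_test, y_train, y_test):
    """Hypothetical helper: fit the model, return (train R^2, test R^2)."""
    model.fit(X_train, y_train)
    return model.score(X_train, y_train), model.score(X_test, y_test)


# Synthetic stand-in data with a held-out test split
X, y = make_regression(n_samples=400, n_features=6, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = [
    RandomForestRegressor(n_estimators=50, random_state=0),
    GradientBoostingRegressor(random_state=0),
]

# Score every model on the same split and report the train/test gap
scores = {}
for model in models:
    name = type(model).__name__
    scores[name] = get_scores(model, X_train, X_test, y_train, y_test)
    train_r2, test_r2 = scores[name]
    print(f"{name}: train R^2={train_r2:.3f}, test R^2={test_r2:.3f}")
```

Adding another model is a one-line change to the models list, which is the main appeal of wrapping the fit-and-score steps in a function.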
