- Dec 12, 2022

Python Machine Learning Guided Project Cell Phone Price Prediction - Level 3, 30 minutes

Updated: Aug 21, 2023

In this Python project, we will use Pandas and Seaborn to perform our exploratory data analysis in an effort to understand how our features impact our target. A big aspect of this is paying close attention to the distributions of the features used in our supervised machine learning project.

After we've explored the data, extracted valuable insights for our business partners, and gathered ideas on how to build our model, we will use Sklearn to preprocess the data for our ML model.

Template Workbook

Solutions Workbook

Part 1

Part 2

Part 3

Data Science Teacher Brandyn YouTube Channel

One on one time with Data Science Teacher Brandyn

Dataset on Kaggle



Image: Try many different models in Sklearn and see which is best for your data problem

A good practice in Machine Learning is to try one of every major model type on your ML problem. All models try to predict the same thing, but with different math. No matter how well you understand the math, humans just aren't capable of thinking through all the interrelationships among features and how they relate to the target. An easier solution is to try them all. Sklearn makes it rather easy to try RandomForest, Bagging, AdaBoost, and GradientBoosting.

As we go through our EDA section, we will collect insights specifically about the distributions of our features, because that will be very important for preprocessing our features correctly for our Machine Learning model.
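One way to collect those distribution insights is to compare summary statistics and skewness per feature. The snippet below is only a sketch on synthetic data: the column names `battery_mah` and `price` and the skewness threshold of 1 are assumptions for illustration, not values from the project workbook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the phone data; column names are hypothetical.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "battery_mah": rng.normal(4000, 500, size=1000),   # roughly symmetric
    "price": rng.exponential(scale=300, size=1000),    # long right tail
})

# A mean well above the median (50%) hints at a right-skewed feature.
summary = df.describe().T[["mean", "50%", "min", "max"]]
print(summary)

# Skewness near 0 suggests symmetry; large positive values suggest a long right tail.
skewness = df.skew()
print(skewness)

# Flag features above a rule-of-thumb threshold (the value 1 is an assumption).
skewed_features = skewness[skewness.abs() > 1].index.tolist()
print(skewed_features)
```

Features flagged this way are candidates for a transformation before modeling, while roughly symmetric ones can usually be left as they are.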

Image: Put a correlation matrix into Seaborn's heatmap

In our bivariate analysis, we will compute the correlation matrix with Pandas .corr() and plot it with Seaborn's heatmap() to give us an easy understanding of the linear correlations among our features.
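A minimal sketch of that step might look like the following. The data is synthetic and the column names (`ram_gb`, `battery_mah`, `price`) are hypothetical; the styling arguments are just one reasonable choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data where price depends linearly on RAM (an assumption for illustration).
rng = np.random.default_rng(0)
ram = rng.normal(4, 1, size=500)
df = pd.DataFrame({
    "ram_gb": ram,
    "battery_mah": rng.normal(4000, 500, size=500),
    "price": 100 * ram + rng.normal(0, 50, size=500),
})

# DataFrame.corr() computes pairwise Pearson correlations by default.
corr = df.corr()

# annot=True writes each coefficient in its cell; vmin/vmax pin the color scale to [-1, 1].
ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
ax.set_title("Feature correlations")
plt.tight_layout()
```

Keep in mind that Pearson correlation only captures linear relationships, so a weak coefficient does not rule out a non-linear link to the target.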

Image: Log Transformation on an exponential distribution

Image: After log transformation, better but not perfect.

Exponential distributions are difficult for many ML models, and it is often better to take a log transform of an exponentially distributed feature to bring it closer to a normal distribution. This is an imperfect technique, but it will most likely make the average a better representation of the data and make for better predictions.
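The effect can be checked numerically. This sketch draws a synthetic exponential sample (the `price` name and the scale of 300 are assumptions) and compares skewness before and after `np.log1p`, which is used here instead of a plain log so that zeros do not cause problems.

```python
import numpy as np
import pandas as pd

# Synthetic, exponentially distributed feature; name and scale are hypothetical.
rng = np.random.default_rng(0)
price = pd.Series(rng.exponential(scale=300, size=5000), name="price")

# log1p computes log(1 + x), which stays defined when a value is 0.
log_price = np.log1p(price)

# Skewness should move closer to 0 after the transform, though not all the way.
print("skew before:", price.skew())
print("skew after: ", log_price.skew())

# The mean and median should also sit closer together on the log scale.
print("before: mean", price.mean(), "median", price.median())
print("after:  mean", log_price.mean(), "median", log_price.median())
```

If the transformed column is used as a model target, predictions come back on the log scale and can be converted with `np.expm1`.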

Image: Use Pandas to one hot encode using get_dummies

We use Pandas get_dummies() to one-hot encode our categorical features and our bucketized continuous features.
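As a rough illustration, a continuous column can be bucketized with `pd.cut` and then encoded together with a categorical column. The column names, bin edges, and labels below are made up for the example and are not taken from the project dataset.

```python
import pandas as pd

# Tiny hypothetical frame: one categorical and one continuous feature.
df = pd.DataFrame({
    "brand": ["A", "B", "A", "C"],
    "ram_mb": [512, 1500, 3000, 2500],
})

# Bucketize the continuous feature; bin edges and labels are assumptions.
df["ram_bucket"] = pd.cut(
    df["ram_mb"],
    bins=[0, 1000, 2000, 4000],
    labels=["low", "mid", "high"],
)

# One-hot encode both columns; dtype=int yields 0/1 instead of booleans.
encoded = pd.get_dummies(df, columns=["brand", "ram_bucket"], dtype=int)
print(encoded)
```

Each category becomes its own 0/1 column, such as `brand_A` or `ram_bucket_high`, while untouched columns like `ram_mb` pass through unchanged.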

Image: Create a function to get the scores from many different sklearn models so you can determine

With Sklearn, it's handy to create a small function that fits each model and gets its score.
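Such a helper could be sketched as follows. This is not the function from the solutions workbook: `score_models` is a hypothetical name, the data comes from `make_regression`, and the regressor variants are used on the assumption that price is treated as a continuous target. If the target is a set of price classes, the matching `...Classifier` estimators would be swapped in.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    AdaBoostRegressor,
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.model_selection import train_test_split


def score_models(models, X_train, X_test, y_train, y_test):
    """Fit each model on the training split and return {name: test score}."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # For regressors, .score() returns R^2 on the given data.
        scores[name] = model.score(X_test, y_test)
    return scores


# Synthetic regression data standing in for the preprocessed phone features.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Bagging": BaggingRegressor(random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "GradientBoosting": GradientBoostingRegressor(random_state=0),
}

scores = score_models(models, X_train, X_test, y_train, y_test)

# Print the models from best to worst test score.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Scoring on a held-out split keeps the comparison honest, since every model is judged on data it did not see during fitting.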
