
DataSimple Guided Projects

Data science, and learning to code in Python or really any programming language, is unlike learning a traditional academic skill. Learning to write Python and build machine learning or deep learning models is more like learning to paint: we need to explore what is possible and follow our imagination in how we analyze data, pre-process it for a machine learning model, or architect a deep learning model.


There are so many possibilities that it's hard to know where to start. In this section, you'll find our free-to-use Python guided projects, where we practice data analysis with Pandas and Seaborn, machine learning with Sklearn, and deep learning with TensorFlow.

Python Machine Learning Guided Projects

Explore the many machine learning models available in Python with Sklearn. Machine learning can handle a wide range of tasks. Let's look at regression and classification problems with Sklearn, using models such as LinearRegression, ARDRegression, decision trees, random forests, gradient boosting, and NuSVR.

Supervised Learning with Sklearn

This starter project is great for those new to Sklearn and machine learning. Learn how to set up an ML workflow: use pandas and seaborn in Python to perform your data analysis, then use Sklearn to make the train/test split and produce your final test predictions.
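The workflow described above (analyze with pandas, split with Sklearn, predict on the held-out test set) can be sketched as follows. This is a minimal illustration, not the project's own notebook: it assumes Sklearn's built-in diabetes dataset as a stand-in and LinearRegression as the model, and it prints summaries instead of drawing seaborn plots to keep dependencies light.

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Stand-in dataset (assumption): features plus a "target" column as one DataFrame.
df: pd.DataFrame = load_diabetes(as_frame=True).frame

# Data analysis: numeric summary and each feature's correlation with the target.
# In the project, seaborn plots (e.g. seaborn.heatmap(df.corr())) would play this role.
print(df.describe().T[["mean", "std"]])
print(df.corr()["target"].drop("target").sort_values(ascending=False))

# Hold out 20% of rows as a test set the model never sees during training.
X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit on the training split, then make the final predictions on the test split.
model = LinearRegression().fit(X_train, y_train)
test_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, test_pred)
print(f"Test MAE: {mae:.1f}")
```

Keeping the test split untouched until the very end is what makes the final score an honest estimate of performance on new data.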

Python Simple Instructional ML Random Forest Project

In this Python guided project, you can follow along and build your first simple random forest machine learning model using RandomForestClassifier from Sklearn. When building an ML workflow, it is a good idea to have a simple base model that your more robust model tries to beat. Here, logistic regression acts as our base model and the random forest is our more complex, robust model.
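A minimal sketch of the baseline-versus-challenger idea, assuming Sklearn's built-in breast cancer dataset as a stand-in for the project's data (the project's own dataset and settings may differ). Logistic regression is wrapped with a StandardScaler because it is sensitive to feature scale; the random forest is not.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in binary classification data (assumption).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Simple base model: scaled logistic regression.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
# More complex model that tries to beat the baseline.
forest = RandomForestClassifier(n_estimators=200, random_state=0)

scores = {}
for name, model in [("logistic_regression (baseline)", baseline), ("random_forest", forest)]:
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on the held-out split
    print(f"{name}: test accuracy = {scores[name]:.3f}")

if scores["random_forest"] <= scores["logistic_regression (baseline)"]:
    print("The random forest did not beat the baseline on this split.")
```

If the complex model cannot beat the simple one, that is useful information in itself: the extra complexity is not paying for itself on this data.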

Simple Clustering Iris Flowers

Level 1, 23 minutes

Follow along in this free-to-use simple clustering project. We use the classic Iris data set for k-means clustering with Sklearn. To choose the number of centroids for k-means, we use Yellowbrick's elbow method plotting tool. There is no definitive way to say which number of clusters is best, but the elbow method is a valuable technique for getting a sense of a good number of centroids.
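The numbers behind an elbow plot can be computed directly with Sklearn, as in the sketch below. This is an illustration of the technique rather than the project's code: Yellowbrick's KElbowVisualizer automates the same loop and draws the plot, while here we only print the inertia for each k. The range of k (1 to 10) is an arbitrary choice.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data  # 150 flowers x 4 measurements

# Elbow method: fit k-means for a range of k and record the inertia
# (within-cluster sum of squared distances to the nearest centroid).
ks = list(range(1, 11))
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

for k, inertia in zip(ks, inertias):
    print(f"k={k:2d}  inertia={inertia:8.2f}")
# Plot inertia against k and look for the "elbow": the k after which
# adding more centroids stops reducing inertia by much.
```

Inertia always shrinks as k grows, so the goal is not the minimum but the point of diminishing returns, which is why the choice remains a judgment call.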

Coming Soon

Smart Watch Price Prediction

Level 3, 25 minutes

In this simple ensemble-method project, we explore the major types of ensemble methods in Sklearn: Random Forest, Gradient Boosting, and Bagging regression models. It is good practice to try more than one model. The complex interactions that drive a model's predictions are hard for our human brains to interpret, so it's best to experiment with different models to find the one that works best for your dataset.
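Trying several ensemble models side by side can be as simple as the sketch below. It assumes synthetic data from make_regression as a stand-in for the smart watch dataset, and it compares models by mean 5-fold cross-validated R^2; the hyperparameters are illustrative defaults, not tuned values from the project.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a price-prediction dataset (assumption).
X, y = make_regression(n_samples=400, n_features=10, noise=20.0, random_state=0)

models = {
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    # With no estimator given, BaggingRegressor bags decision trees.
    "bagging": BaggingRegressor(n_estimators=50, random_state=0),
}

# Score every model the same way so the comparison is fair.
results = {name: cross_val_score(m, X, y, cv=5, scoring="r2").mean() for name, m in models.items()}
for name, r2 in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>18}: mean CV R^2 = {r2:.3f}")
```

Cross-validation rather than a single split makes the ranking less dependent on which rows happened to land in the test set.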


Classic Car MPG

Level 3, 24 minutes

In this Python regression project, we predict the MPG of classic cars using ensemble methods such as RandomForestRegressor and GradientBoostingRegressor. This supervised machine learning project is a great beginner Python project for practicing ensemble methods.


Credit Card Approvals 

Level 4, 40 minutes

In this Python project, we use Sklearn for a supervised classification problem, with a focus on error analysis: understanding precision versus recall and why we might prioritize one over the other. We will use Yellowbrick's error analysis tools to try to improve our model's score.
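The precision/recall trade-off can be seen by moving a classifier's decision threshold, as in this sketch. It assumes an imbalanced synthetic dataset in place of the credit card data and treats class 1 as a hypothetical "approve" label. Precision answers "of the applications we approved, how many should have been?"; recall answers "of the applications that should have been approved, how many did we catch?". Yellowbrick's ClassificationReport visualizes these same per-class metrics.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: class 1 = "approve" (hypothetical label), about 30% of rows.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]  # predicted probability of class 1

# Lowering the threshold predicts class 1 more often: recall rises, precision usually falls.
results = {}
for threshold in (0.3, 0.5, 0.7):
    pred = (proba >= threshold).astype(int)
    results[threshold] = (
        precision_score(y_test, pred, zero_division=0),
        recall_score(y_test, pred),
    )
    p, r = results[threshold]
    print(f"threshold={threshold:.1f}  precision={p:.3f}  recall={r:.3f}")
```

Which side to favor depends on the cost of each error: wrongly approving a risky applicant hurts precision, while wrongly rejecting a good one hurts recall.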


Polish Car Price Regression

Level 5, 40 minutes

Predict the price of cars in Poland. This supervised learning problem in Python is a regression problem. In this project, we focus on linear regression techniques, including the PassiveAggressiveRegressor and ARDRegression models. Ever wonder which machine learning regression model is best?

In this project, we test a diverse set of ML models in Sklearn. We use ARDRegression and KNeighborsRegressor as base estimators in a BaggingRegressor ensemble. We also test the GradientBoostingRegressor and the RandomForestRegressor in this guided Python project.

 

We tune these models with a Bayesian hyperparameter search (Bayesian optimization). Rather than trying every combination the way a traditional grid search does, Bayesian optimization builds a probabilistic model of how the score depends on the hyperparameters, using the results observed so far as prior knowledge, and picks the next setting to try by weighing the expected score against its uncertainty. This usually explores the parameter space more efficiently than an exhaustive grid search, and the model's uncertainty estimates give a better sense of how sensitive the score is to each parameter and of the trade-offs between them.
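The loop behind Bayesian optimization can be sketched with Sklearn's GaussianProcessRegressor as the probabilistic surrogate and expected improvement as the rule for choosing the next setting. This is a hand-rolled illustration, not necessarily the tool the project uses (libraries such as scikit-optimize provide a ready-made BayesSearchCV). It assumes synthetic data, tunes a single hyperparameter (n_neighbors of a KNeighborsRegressor over 1 to 30), and uses hypothetical helper names `objective`, `expected_improvement`, and `bayesian_search`.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=0)


def objective(k: int) -> float:
    """Mean 5-fold CV R^2 of a KNN regressor with k neighbors (higher is better)."""
    return cross_val_score(KNeighborsRegressor(n_neighbors=k), X, y, cv=5, scoring="r2").mean()


def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement over the best score so far (maximization)."""
    sigma = np.maximum(sigma, 1e-9)  # avoid division by zero
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)


def bayesian_search(candidates, n_init=3, n_iter=7, seed=0):
    rng = np.random.default_rng(seed)
    # Start from a few randomly chosen settings.
    tried = [int(k) for k in rng.choice(candidates, size=n_init, replace=False)]
    scores = [objective(k) for k in tried]
    # Surrogate model of score as a function of k; WhiteKernel absorbs CV noise.
    gp = GaussianProcessRegressor(
        kernel=Matern(nu=2.5) + WhiteKernel(), normalize_y=True, random_state=seed
    )
    for _ in range(n_iter):
        gp.fit(np.array(tried, dtype=float).reshape(-1, 1), np.array(scores))
        remaining = [k for k in candidates if k not in tried]
        if not remaining:
            break
        mu, sigma = gp.predict(np.array(remaining, dtype=float).reshape(-1, 1), return_std=True)
        # Evaluate the untried setting the surrogate considers most promising.
        next_k = remaining[int(np.argmax(expected_improvement(mu, sigma, max(scores))))]
        tried.append(next_k)
        scores.append(objective(next_k))
    best = int(np.argmax(scores))
    return tried[best], scores[best], tried


best_k, best_score, tried = bayesian_search(list(range(1, 31)))
print(f"evaluated k in order: {tried}")
print(f"best n_neighbors={best_k}, mean CV R^2={best_score:.3f}")
```

Here only 10 of the 30 candidate values are ever evaluated; a grid search would fit all 30. The savings grow quickly when several hyperparameters are tuned at once.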

Shapley Values

Follow along with this Python regression project. Here we tackle a common problem in house price prediction: having too many features and needing to choose which ones to use. We use Shapley values to measure each feature's real impact on the final prediction, and then decide which features can be removed because they don't help.
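To show what a Shapley value measures, the sketch below computes exact Shapley values by enumerating every coalition of features. It assumes a simple definition in which a feature "absent" from a coalition is replaced by its dataset mean; this is one of several choices, and libraries such as shap use background datasets and much faster algorithms (for example TreeExplainer for tree models). Exact enumeration grows as 2^n, so it is only practical for a handful of features. The data is synthetic, with only 2 of 5 features carrying signal, and `shapley_values` is a hypothetical helper name.

```python
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one row x. Features outside a coalition are set to baseline."""
    n = len(x)
    coalitions = [frozenset(c) for size in range(n + 1) for c in combinations(range(n), size)]
    # Build every masked row at once so the model is called a single time.
    rows = np.tile(baseline, (len(coalitions), 1))
    for r, c in enumerate(coalitions):
        idx = list(c)
        rows[r, idx] = x[idx]
    value = dict(zip(coalitions, predict(rows)))

    phi = np.zeros(n)
    for i in range(n):
        for c in coalitions:
            if i in c:
                continue
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(len(c)) * factorial(n - len(c) - 1) / factorial(n)
            phi[i] += weight * (value[c | {i}] - value[c])
    return phi


# Synthetic stand-in for house-price data: 5 features, only 2 informative (assumption).
X, y = make_regression(n_samples=300, n_features=5, n_informative=2, noise=5.0, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

baseline = X.mean(axis=0)  # "average house" used for absent features
phis = np.array([shapley_values(model.predict, row, baseline) for row in X[:50]])

# Global importance: mean absolute contribution of each feature across rows.
importance = np.abs(phis).mean(axis=0)
for j in np.argsort(importance)[::-1]:
    print(f"feature {j}: mean |Shapley value| = {importance[j]:.2f}")
```

For each row the contributions add up exactly to the gap between that row's prediction and the baseline prediction. Features whose mean absolute contribution is close to zero are the natural candidates for removal.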


In the second part of this Python guided machine learning project, the data scientist picks up where the data analyst left off, using the data analyst's insights as a guide.
This is extremely helpful in a team setting, because it lets the data scientist focus on building the model, and as we will see, there is a lot to try when building a model.
