DataSimple Guided Projects
Learning data science and learning to code in Python, or really any programming language, is unlike learning a traditional academic skill. Learning to write Python and build machine learning or deep learning models is more like learning to paint: we need to explore what is possible and follow our imagination in how we perform data analysis, pre-process the data for our machine learning model, or architect our deep learning model.
There are so many possibilities that it's hard to know where to start. In this section, you will find our free-to-use Python guided projects, where we practice data analysis with Pandas and Seaborn, machine learning with Sklearn, and deep learning with TensorFlow.
Python Machine Learning Guided Projects
Explore the many machine learning models available in Python with Sklearn. Machine learning is very powerful in the range of tasks it can handle. Let's look at regression and classification problems with Sklearn and use models like LinearRegression, ARDRegression, DecisionTrees, RandomForest, GradientBoosting, and NuSVR.
Supervised Learning with Sklearn
Simple Linear Regression House Price Prediction
Level 1, 24 minutes
This starter project is great for those new to Sklearn and machine learning. Learn how to set up an ML workflow: use pandas and seaborn in Python to perform your data analysis, then use Sklearn to do the train-test split and make your final test predictions.
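As a rough illustration of that workflow, here is a minimal sketch. It assumes a synthetic, single-feature table (the `sqft` and `price` columns are hypothetical stand-ins, not the project's actual dataset) and uses mean absolute error as one possible way to score the test predictions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a house price table (hypothetical columns).
rng = np.random.default_rng(0)
sqft = rng.uniform(500, 3500, size=200)
price = 50_000 + 120 * sqft + rng.normal(0, 10_000, size=200)
houses = pd.DataFrame({"sqft": sqft, "price": price})

# Separate the feature matrix from the target column.
X = houses[["sqft"]]
y = houses["price"]

# Hold out 20% of the rows so the final score comes from unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training rows only, then predict the held-out rows.
model = LinearRegression()
model.fit(X_train, y_train)
test_predictions = model.predict(X_test)

test_mae = mean_absolute_error(y_test, test_predictions)
print(f"Learned price per square foot: {model.coef_[0]:.1f}")
print(f"Test mean absolute error: {test_mae:.0f}")
```

In a real project, the data analysis step with pandas and seaborn would come before the split and guide which columns go into `X`.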
Simple Random Forest Classification
Level 1, 28 minutes
In this Python guided project, you can follow along and build your first simple Random Forest machine learning model using RandomForestClassifier from Sklearn. It is a good idea in an ML workflow to have a simple base model that your more robust model will try to beat. Here, Logistic Regression acts as our base model and Random Forest acts as our more complex, robust model.
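The base-model idea can be sketched as follows. This is only an illustration on synthetic data from `make_classification` (an assumption, not the project's dataset), and accuracy is just one possible metric for the comparison.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for a real dataset.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Simple base model: sets the score the complex model has to beat.
baseline = LogisticRegression(max_iter=1000)
baseline.fit(X_train, y_train)
baseline_accuracy = accuracy_score(y_test, baseline.predict(X_test))

# More complex model, evaluated on exactly the same held-out rows.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
forest_accuracy = accuracy_score(y_test, forest.predict(X_test))

print(f"Logistic Regression accuracy: {baseline_accuracy:.3f}")
print(f"Random Forest accuracy:       {forest_accuracy:.3f}")
```

If the complex model does not clearly beat the base model, that is useful information in itself: the simpler model may be the better choice.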
Level 1, 23 minutes
Follow along in this free-to-use Simple Clustering Project. In this project we use the classic Iris data set for K-means clustering with sklearn. To determine the number of centroids for our k-means model, we will use Yellowbrick's elbow method plotting tool to find a good number of clusters. There is no certain way to say which is the best number of clusters, but the elbow method is a valuable technique for getting a sense of how many centroids suit k-means.
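To show what the elbow method measures, the sketch below computes the quantity behind the plot directly with sklearn's `KMeans` and its `inertia_` attribute. It deliberately does not use Yellowbrick's plotting tool, so treat it as a plain-sklearn stand-in rather than the project's code; the range of 1 to 8 clusters is an arbitrary choice.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# The classic Iris measurements (labels are ignored for clustering).
X = load_iris().data

# Fit k-means for a range of cluster counts and record the inertia,
# i.e. the sum of squared distances from each point to its centroid.
cluster_counts = list(range(1, 9))
inertias = []
for k in cluster_counts:
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    kmeans.fit(X)
    inertias.append(kmeans.inertia_)

# The "elbow" is where adding another cluster stops reducing inertia much.
for k, inertia in zip(cluster_counts, inertias):
    print(f"k={k}: inertia={inertia:.1f}")
```

Plotting `inertias` against `cluster_counts` gives the elbow curve; the bend is a judgment call, not a definitive answer.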
Coming Soon
Level 3, 25 minutes
In this simple ensemble method project, we explore the major types of ensemble methods in Sklearn, such as the Random Forest, Gradient Boosting, and Bagging ML regression models. It is good practice to try more than one model: the complex interactions that happen during prediction are hard for our human brains to interpret, so it's best to experiment with different models to find the best machine learning model for your dataset.
Level 3, 24 minutes
In this Python Regression project, we will be predicting the MPG of classic cars. Use ensemble methods like RandomForestRegressor and GradientBoostingRegressor in this supervised machine learning project. This is a great beginner Python project to practice machine learning with ensemble methods.
Level 4, 40 minutes
In this Python project, we will use Sklearn for a supervised classification problem, with a focus on error analysis. We will look at precision versus recall and why we would want to focus on one versus the other. We will be using the error analysis tools in Yellowbrick to try to improve our model's score.
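The precision versus recall trade-off can be illustrated with a small sketch. It assumes synthetic, imbalanced data and a Logistic Regression classifier (both are assumptions for illustration), and it uses sklearn's metric functions rather than Yellowbrick. Lowering the decision threshold flags more rows as positive, which can only keep or raise recall, usually at the cost of precision.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data: roughly 15% of rows are the positive class.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.85, 0.15], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
positive_proba = clf.predict_proba(X_test)[:, 1]


def precision_recall_at(threshold):
    """Return (precision, recall) when predicting positive at or above threshold."""
    predicted = (positive_proba >= threshold).astype(int)
    precision = precision_score(y_test, predicted, zero_division=0)
    recall = recall_score(y_test, predicted, zero_division=0)
    return precision, recall


precision_default, recall_default = precision_recall_at(0.5)
precision_low, recall_low = precision_recall_at(0.2)

print(f"threshold 0.5: precision={precision_default:.2f}, recall={recall_default:.2f}")
print(f"threshold 0.2: precision={precision_low:.2f}, recall={recall_low:.2f}")
```

Which side to favor depends on the cost of each error: missing a true positive argues for recall, while acting on a false alarm argues for precision.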
Decision Tree, Pre and Post Pruning Techniques
Level 6, 50 minutes
In this Python project, we will use Sklearn for a supervised classification problem, with a focus on error analysis. We will look at precision versus recall and why we would want to focus on one versus the other. We will be using the error analysis tools in Yellowbrick to try to improve our model's score.
Level 5, 40 minutes
Predict the price of cars in Poland. This supervised learning problem in Python is a regression problem. In this project we will focus on linear regression techniques, including the PassiveAggressiveRegressor and ARDRegression models. Ever wonder which is the best machine learning regression model?
Power Transformer - Ensemble Method, Partial Correlation Error Analysis
Level 7, 48 minutes
In this project we test a diverse set of ML models in sklearn. Here we use ARDRegression and KNeighborsRegressor as the base models in a BaggingRegressor ensemble method. We also test the Gradient Boosting Regressor and the Random Forest Regressor in this guided Python project.
We use these models in a Bayesian grid search to enhance the optimization process by leveraging prior knowledge and incorporating uncertainty estimation. By using a probabilistic approach, Bayesian grid search explores the parameter space more efficiently and effectively than traditional grid searches. It allows for a more informed decision-making process by providing posterior distributions and credible intervals, enabling a deeper understanding of parameter sensitivities and trade-offs.
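The model line-up described above can be sketched as follows. This covers only the model comparison, scored with ordinary cross-validation; the Bayesian search step is left out. The data comes from `make_regression` and every hyperparameter value is an illustrative assumption, not the project's configuration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.linear_model import ARDRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic regression data standing in for the project's dataset.
X, y = make_regression(n_samples=200, n_features=8, noise=15.0, random_state=0)

# The base model is passed positionally as the first argument of BaggingRegressor.
candidates = {
    "bagged_ard": BaggingRegressor(
        ARDRegression(), n_estimators=10, random_state=0
    ),
    "bagged_knn": BaggingRegressor(
        KNeighborsRegressor(n_neighbors=5), n_estimators=10, random_state=0
    ),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Score every candidate with the same 5-fold cross-validation (R^2).
cv_scores = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    cv_scores[name] = scores.mean()

for name, score in sorted(cv_scores.items(), key=lambda item: -item[1]):
    print(f"{name}: mean R^2 = {score:.3f}")
```

Because the synthetic target here is linear, the linear base model has an advantage; on real data the ranking may look quite different, which is the reason for testing a diverse set.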
Follow along with this Python Regression Project. Here we will deal with a common problem in house price prediction: too many features, and how to choose which to use. We will use Shapley values to help us determine the real impact of each feature on the final prediction, and then decide which features can be removed because they don't help.
Spaceship Titanic Part 2 ML Predictions, Competition Submissions
Level 8, 49 minutes
In the second part of this Python Guided Machine Learning Project, the data scientist picks up where the data analyst left off. We use the data analyst's insights to guide the data scientist.
This is extremely helpful in a team setting, so the data scientist can focus on building the model. And as we will see, there is a lot to try when building a model.