
Python Data Analysis Guided Project - Analyze Dog Breeds, Level 2, 31 min

Updated: Aug 21, 2023


free, beginner, instructional, data analysis in Python
Use Seaborn to understand how fur color affects the height of dogs in this data analysis project

In this guided Python data analysis project for beginners, we will explore dog breeds from this Kaggle data set. We will use our Python data analysis skills to understand the eye color, fur color, and height of common dog breeds.


We start our Python data analysis project with a little preprocessing to enable our analyses. This is needed because some features are semi-structured: each row holds a list with a different number of items. The character traits feature, for example, lists a different number of traits for each breed, which is not amenable to data analysis. To solve this issue we will use Pandas' explode function to turn these features into structured data ready for analysis.




After we complete a univariate analysis of each feature, we move on to our bivariate data analysis, where we determine how one column affects another.


We will look at how the dogs' fur color, character traits, and common health issues affect their height and life span. We will use Seaborn's histplot with the hue argument to change the color of each category in our histogram plot.











Follow Data Science Teacher Brandyn














Turn a feature from semi-structured to structured data with Pandas' explode

A common problem is that a feature holds a list of categories in each row, and the length of that list varies from row to row. To fix this issue we first use Pandas' str.split function to turn what is one long string into an actual list data type. Once the long string is a list, the feature is ready for Pandas' explode function.
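
The split-then-explode step can be sketched as follows. This is a minimal example on made-up data: the column names (breed, fur_color), the values, and the ", " separator are assumptions for illustration and may differ from the Kaggle data set.

```python
import pandas as pd

# Hypothetical sample rows; names and values are illustrative only.
df = pd.DataFrame({
    "breed": ["Beagle", "Poodle"],
    "fur_color": ["White, Brown, Black", "White, Black"],
})

# str.split turns each comma-separated string into a Python list.
df["fur_color"] = df["fur_color"].str.split(", ")

# explode gives every list element its own row and repeats the other columns.
exploded = df.explode("fur_color").reset_index(drop=True)
print(exploded)
```

Each breed now appears once per fur color, so functions such as value_counts can treat every color as its own observation.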



Pandas Plot on value_counts

After we've turned a feature into structured data we are able to complete our data analysis, and here we look at the most common fur colors of dog breeds. We do this by calling Pandas' plot on the output of value_counts to create a bar graph.
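
A minimal sketch of that bar graph, assuming an already exploded fur color column. The sample values are made up, and the Agg backend line is only there so the snippet runs without a display; in a notebook it can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical exploded fur colors, one row per breed/color pair.
fur_color = pd.Series(
    ["White", "Brown", "Black", "White", "Black", "White"], name="fur_color"
)

# value_counts counts each category and sorts from most to least common.
color_counts = fur_color.value_counts()

# kind="bar" draws one bar per category.
fig, ax = plt.subplots()
color_counts.plot(kind="bar", ax=ax)
ax.set_xlabel("Fur color")
ax.set_ylabel("Count")
```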


Logical indexing of value_counts

Here we apply logical indexing to the output of Pandas' value_counts function to plot only the values that are greater than one, which makes our plot more user-friendly.
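
The filtering step might look like the sketch below. The sample values are hypothetical; the threshold of one comes from the text above.

```python
import pandas as pd

# Hypothetical exploded fur colors; two of them appear only once.
fur_color = pd.Series(["White", "Brown", "Black", "White", "Black", "Merle"])

color_counts = fur_color.value_counts()

# color_counts > 1 is a Boolean mask; indexing with it keeps only the True rows.
common_colors = color_counts[color_counts > 1]
print(common_colors)
```

The filtered Series can be plotted the same way as before, for example with common_colors.plot(kind="bar").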



Numeric features stored as a range of values

In our Python data analysis project we noticed that the height feature had the object data type when we first called Pandas' info function, which gives us a count of the non-null values and the data type of each column in our DataFrame.

Upon inspecting this column we see that it's represented as a range of heights, so we will need to clean this feature before we can analyze it.
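
A small sketch of this inspection, on made-up data. The "13-15" range format is an assumption about how the height strings look, and the exact dtype label that info prints for text columns can vary between pandas versions.

```python
import pandas as pd

# Hypothetical rows; the real height strings may be formatted differently.
df = pd.DataFrame({
    "breed": ["Beagle", "Poodle"],
    "height": ["13-15", "15-24"],
})

# info lists the non-null count and data type of every column.
df.info()

# Looking at a few raw values shows why the column is not numeric.
print(df["height"].head())
```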


Create functions to extract values from a string, then apply the functions

To extract the values needed from the string in this feature we will create two user-defined functions: one to extract the minimum value and one to extract the maximum value.



After we create each function we will use Pandas' apply function to run it on the column, and we will save the output to a new column.
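
One way to write the two functions is sketched below. It assumes the range is a string such as "13-15" and pulls the numbers out with a regular expression. The helper _numbers, the function names, and the new column names are hypothetical, not the ones used in the project.

```python
import re
import pandas as pd

# Hypothetical rows; the last one has no height recorded.
df = pd.DataFrame({
    "breed": ["Beagle", "Poodle", "Unknown"],
    "height": ["13-15", "15-24", None],
})

def _numbers(text):
    """Return every number found in text as a float (empty list if none)."""
    if not isinstance(text, str):
        return []
    return [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)]

def min_height(text):
    """Smallest number in the range string, or NaN when nothing is found."""
    nums = _numbers(text)
    return min(nums) if nums else float("nan")

def max_height(text):
    """Largest number in the range string, or NaN when nothing is found."""
    nums = _numbers(text)
    return max(nums) if nums else float("nan")

# apply calls the function once per value and returns a new Series.
df["min_height"] = df["height"].apply(min_height)
df["max_height"] = df["height"].apply(max_height)
print(df)
```

Returning NaN for missing or unparseable values keeps the new columns numeric, so they can be plotted directly.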


Use Pandas' plot to plot the distribution of height

After we've extracted the min and max values from the string we use Pandas' plot with kind="hist" to plot the distribution of this continuous variable.
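
A minimal sketch of the histogram, using made-up maximum heights. The bins value is an arbitrary choice for this small sample, and the Agg backend line is only needed when running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical maximum heights, already extracted as numbers.
max_height = pd.Series([15.0, 24.0, 28.0, 10.0, 22.0, 26.0], name="max_height")

# kind="hist" bins the continuous values and draws one bar per bin.
fig, ax = plt.subplots()
max_height.plot(kind="hist", bins=5, ax=ax)
ax.set_xlabel("Maximum height")
```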


Use hue argument in Seaborn to change color by category

Lastly, because we changed each feature from a semi-structured to a structured format, we are able at the end of our project to understand how fur color and common health problems affect the height and longevity of common dog breeds.

