TryPy 01 - Exploring St. Louis Blood Toxicity Data
Part 0. Setup Steps
- Create a repo on GitHub named eds217-trypy-01
- Clone to create a version-controlled project
- Create some subfolder infrastructure (nbs, data, figs)
- Create and save a new Jupyter notebook (.ipynb file) named stl-lead-yourinitials.ipynb in the nbs folder (for example, mine would be stl-lead-kc.ipynb).
- Make sure to associate the notebook with the eds217_2023 environment.
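The subfolder step can also be done from Python rather than by hand. This is a minimal sketch, not part of the assignment: the helper name `make_subfolders` is hypothetical, and the demonstration runs in a temporary directory so nothing is written to a real project.

```python
import tempfile
from pathlib import Path


def make_subfolders(project_root, names=("nbs", "data", "figs")):
    """Create the project subfolders under project_root and return their paths."""
    paths = []
    for name in names:
        folder = Path(project_root) / name
        # exist_ok=True makes the call safe to re-run if the folder already exists.
        folder.mkdir(parents=True, exist_ok=True)
        paths.append(folder)
    return paths


# Demonstration in a throwaway directory (an assumption for this sketch);
# in practice, pass the path to your cloned eds217-trypy-01 repo instead.
with tempfile.TemporaryDirectory() as tmp:
    created = make_subfolders(tmp)
    created_names = sorted(p.name for p in created if p.is_dir())

print(created_names)
```

To use it on the real project, call `make_subfolders(".")` from the root of the cloned repository.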
Part 1 - Get the data
"""
Create a new variable containing
the link to the .csv file on
the EDS_221 github repository.
"""
url = ('https://raw.githubusercontent.com/'
       'allisonhorst/EDS_221_programming-essentials/'
       'main/activities/stl_blood_lead.csv')
"""
pandas can read a csv file into a
dataframe directly from a url:
"""
import pandas as pd  # needed for pd.read_csv; skip if already imported
stl_lead = pd.read_csv(url)
Read more about the data here.
Part 2 - Explore the data
In your .ipynb file:
- Create a code cell that imports the numpy and pandas packages and run the cell to import the packages.
- Use the code above to read the url for stl_blood_lead.csv into a pandas DataFrame called stl_lead
- Do some basic exploration of the dataset using the DataFrame info and describe commands.
- In a new code chunk, from stl_lead create a new column called prop_white that contains the percent of each census tract identifying as white (variable white in the dataset divided by variable totalPop, times 100).
Hint: df['new_col'] = df['col_a'] / df['col_b'] will create a new column new_col in the dataframe df that contains the value of col_a / col_b.
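The exploration and new-column steps might look roughly like the sketch below. It uses a tiny made-up DataFrame named `toy` (a hypothetical stand-in, not values from stl_blood_lead.csv) so that it runs without a network connection. Only the column names white, totalPop, and prop_white come from the exercise.

```python
import numpy as np  # imported as the first step asks; not otherwise used in this sketch
import pandas as pd

# Made-up rows for illustration only; the real values come from stl_blood_lead.csv.
toy = pd.DataFrame({
    "white": [50, 300, 120],
    "totalPop": [200, 400, 480],
})

# Basic exploration: column types and non-null counts, then summary statistics.
toy.info()
print(toy.describe())

# Percent identifying as white: white divided by totalPop, times 100.
toy["prop_white"] = toy["white"] / toy["totalPop"] * 100
print(toy["prop_white"].tolist())
```

The same three calls should apply to stl_lead once it has been read from the url.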
Part 3 - Create a scatterplot
- Import matplotlib (import matplotlib.pyplot as plt)
- Create a scatterplot graph of the percentage of children in each census tract with elevated blood lead levels (pctElevated) versus the percent of each census tract identifying as white.
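One possible shape for the scatterplot is sketched below on made-up numbers. It reads "versus" as pctElevated on the y-axis against prop_white on the x-axis, which is an assumption. The `toy` DataFrame and the axis label text are hypothetical and stand in for stl_lead.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration only; not values from the real dataset.
toy = pd.DataFrame({
    "prop_white": [10.0, 35.0, 60.0, 85.0],
    "pctElevated": [20.0, 5.0, 12.0, 9.0],
})

fig, ax = plt.subplots()
# x: percent identifying as white; y: percent of children with elevated blood lead.
points = ax.scatter(toy["prop_white"], toy["pctElevated"])
ax.set_xlabel("Percent identifying as white")
ax.set_ylabel("Percent of children with elevated blood lead")
```

In a Jupyter notebook the figure renders when the cell finishes. In a plain script, add `plt.show()` or `fig.savefig(...)`.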
Part 4 - Create a histogram
- Create a histogram of only the pctElevated column in the data frame
- Customize the fill, color, and size aesthetics - test some stuff! Feel free to make it awful.
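A histogram with a few customized aesthetics could be sketched as follows. The data are made up, and the particular colors, bin count, and figure size are arbitrary choices for illustration, not recommendations.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for illustration only; substitute stl_lead in the notebook.
toy = pd.DataFrame({"pctElevated": [2.0, 4.0, 5.0, 9.0, 12.0, 12.5, 20.0]})

fig, ax = plt.subplots(figsize=(6, 4))  # figure size in inches
counts, bin_edges, patches = ax.hist(
    toy["pctElevated"],
    bins=5,              # number of equal-width bins
    color="orange",      # bar fill
    edgecolor="purple",  # bar outline
    linewidth=2,         # outline thickness in points
)
ax.set_xlabel("pctElevated")
ax.set_ylabel("Number of census tracts")
```

`ax.hist` returns the per-bin counts, the bin edges, and the bar patches, which is handy for checking that every row landed in a bin.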