
TryPy 01 - Exploring St. Louis Blood Toxicity Data

Part 0 - Setup Steps

Create a repo on GitHub named eds217-trypy-01
Clone to create a version-controlled project
Create some subfolder infrastructure (nbs, data, figs)
Create and save a new Jupyter notebook (.ipynb file) named stl-lead-yourinitials.ipynb in the nbs folder (for example, mine would be stl-lead-kc.ipynb).
Make sure to associate the notebook with the eds217_2023 environment.
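
If you would rather create the subfolders from Python than by hand, here is one possible sketch using pathlib. The helper name make_project_dirs is made up for this example, and the demonstration runs in a temporary directory; in practice you would pass the path to your cloned eds217-trypy-01 repo instead.

```python
from pathlib import Path
import tempfile


def make_project_dirs(root):
    """Create the nbs, data, and figs subfolders under root (hypothetical helper)."""
    root = Path(root)
    created = []
    for name in ("nbs", "data", "figs"):
        folder = root / name
        # exist_ok=True makes this safe to re-run if the folder is already there
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Demonstration in a throwaway directory (an assumption for this sketch);
# replace tmp with the path to your cloned repo when doing this for real.
with tempfile.TemporaryDirectory() as tmp:
    folders = make_project_dirs(tmp)
    folder_names = sorted(f.name for f in folders)
    all_exist = all(f.is_dir() for f in folders)
```

Creating the folders in your file browser or terminal works just as well; the only thing that matters is that nbs, data, and figs end up in the project root.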

Part 1 - Get the data


import pandas as pd

# Create a new variable containing the link to the .csv file
# in the EDS_221 GitHub repository.
url = (
    'https://raw.githubusercontent.com/'
    'allisonhorst/EDS_221_programming-essentials/'
    'main/activities/stl_blood_lead.csv'
)

# pandas can read a CSV file into a DataFrame directly from a URL:
stl_lead = pd.read_csv(url)

Part 2 - Explore the data

In your .ipynb file:

Create a code cell that imports the numpy and pandas packages and run the cell to import the packages.
Use the code above to read stl_blood_lead.csv from the url into a pandas DataFrame called stl_lead.
Do some basic exploration of the dataset using the DataFrame info() and describe() methods.
In a new code cell, add a column to stl_lead called prop_white that contains the percent of each census tract identifying as white (variable white in the dataset divided by variable totalPop, times 100).

Hint: df['new_col'] = df['col_a'] / df['col_b'] will create a new column new_col in the dataframe df that contains the value of col_a / col_b. Multiply the result by 100 to turn a proportion into a percent.
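
To see how these pieces fit together before trying them on the real data, here is a minimal sketch on a tiny stand-in DataFrame. The column names match the ones mentioned above, but the DataFrame toy and all of its values are made up for illustration; they are not from the St. Louis dataset.

```python
import pandas as pd

# Made-up values for illustration only; NOT the real St. Louis data
toy = pd.DataFrame({
    "white": [50, 300, 120],
    "totalPop": [200, 400, 480],
    "pctElevated": [12.0, 3.5, 8.0],
})

# info() prints column names, dtypes, and non-null counts
toy.info()

# describe() returns summary statistics (count, mean, std, min, quartiles, max)
summary = toy.describe()

# Column arithmetic is element-wise, so this computes one percent per row
toy["prop_white"] = toy["white"] / toy["totalPop"] * 100
```

The same three steps should carry over to stl_lead once it has been read in, assuming its columns are named as described in the exercise.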

Part 3 - Create a scatterplot

Import matplotlib (import matplotlib.pyplot as plt)
Create a scatterplot of the percentage of children in each census tract with elevated blood lead levels (pctElevated, on the y-axis) versus the percent of each census tract identifying as white (prop_white, on the x-axis).
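
One way to build a scatterplot with matplotlib is sketched below. It uses a small made-up DataFrame called toy (not the real data), and the axis label text is just a suggestion.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for illustration only; NOT the real St. Louis data
toy = pd.DataFrame({
    "prop_white": [25.0, 75.0, 50.0],
    "pctElevated": [12.0, 3.5, 8.0],
})

# plt.subplots() returns a figure and a single set of axes to draw on
fig, ax = plt.subplots()

# x values first, then y values
ax.scatter(toy["prop_white"], toy["pctElevated"])

# Label the axes so the plot can be read on its own
ax.set_xlabel("Percent identifying as white")
ax.set_ylabel("Percent of children with elevated blood lead")
```

In a Jupyter notebook the figure should render below the cell when it finishes running. Swapping toy for stl_lead is left to you.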

Part 4 - Create a histogram

Create a histogram of only the pctElevated column in the DataFrame.
Customize the fill color, edge color, and size - test some stuff! Feel free to make it awful.
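
As a starting point for the customization, here is a hedged sketch of a histogram with a few styling arguments set. The values list is made up for illustration, and the particular colors, bin count, and figure size are arbitrary choices, not recommendations.

```python
import matplotlib.pyplot as plt

# Made-up values for illustration only; NOT the real pctElevated column
values = [1.0, 2.5, 3.0, 4.5, 8.0, 12.0, 12.5, 15.0]

# figsize sets the overall size of the figure in inches (width, height)
fig, ax = plt.subplots(figsize=(6, 4))

# color sets the bar fill, edgecolor the bar outline, linewidth the outline thickness
counts, bin_edges, patches = ax.hist(
    values,
    bins=5,
    color="orange",
    edgecolor="purple",
    linewidth=2,
)

ax.set_xlabel("pctElevated")
ax.set_ylabel("Number of census tracts")
```

Passing stl_lead['pctElevated'] in place of values should give the histogram the exercise asks for; from there, try changing one argument at a time to see what it does.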