Practice 4-1: Pandas 🐼
Part 0. Setup Steps
- Create a repo on GitHub named `eds217-practice-4-1-pandas`
- Clone it to create a version-controlled project
- Create some subfolder infrastructure (`nbs`, `data`, `figs`)
- Create and save a new Jupyter notebook (`.ipynb` file) named `practice_4-1_pandas.ipynb` in the `nbs` folder.
- Open the notebook in VSCode or Jupyter Notebook/Lab.
- Make sure to associate the notebook with the `eds217_2023` environment.
📚 Practice 0.
- Create a cell that imports your essential data science libraries (`numpy` as `np`, `matplotlib.pyplot` as `plt`, and `pandas` as `pd`).
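One way to write that cell, using the conventional aliases the exercise asks for:

```python
# Standard aliases used throughout the course
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
```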
📚 Practice 1.
Import some radiation data for our practice session.

Use pandas' `read_csv()` function to import the data from GitHub and create a new DataFrame named `bsrn`:

```python
data_url = "https://raw.githubusercontent.com/environmental-data-science" \
           "/eds217_2023/main/data/BSRN_GOB_2019-10.csv"
```
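A minimal sketch of the import step. Wrapping `pd.read_csv()` in a small helper (the name `load_bsrn` is hypothetical, not part of the course materials) lets the same call read either the GitHub URL or a local copy of the file. Downloading from the URL requires network access.

```python
import pandas as pd

# URL from the exercise; Python joins the two adjacent string literals.
data_url = ("https://raw.githubusercontent.com/environmental-data-science"
            "/eds217_2023/main/data/BSRN_GOB_2019-10.csv")

def load_bsrn(source=data_url):
    """Hypothetical helper: read the BSRN CSV from a URL or a local path."""
    return pd.read_csv(source)
```

In the notebook, `bsrn = load_bsrn()` creates the DataFrame; `bsrn = pd.read_csv(data_url)` does the same thing without the helper.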
📚 Practice 2. Using the DataFrame `bsrn`:
- Print a list of your DataFrame’s column names.
- How many values are there in the entire DataFrame?
- What is the data type of the first column?
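A sketch that collects all three answers for any DataFrame; `summarize` is a hypothetical helper name. Note that `size` counts every cell (rows times columns), including missing values, while `count().sum()` counts only the non-missing ones, so it is worth deciding which one the question means.

```python
import pandas as pd

def summarize(df: pd.DataFrame) -> dict:
    """Hypothetical helper returning the answers to Practice 2 for df."""
    return {
        "columns": df.columns.tolist(),          # list of column names
        "n_values": df.size,                     # rows * columns, NaNs included
        "n_non_missing": int(df.count().sum()),  # only non-NaN cells
        "first_dtype": df.dtypes.iloc[0],        # dtype of the first column
    }

# Usage in the notebook: summarize(bsrn)
```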
📚 Practice 3.
- Create a new DataFrame containing the first record for each day and the following columns: the timestamp of the record, incoming shortwave radiation, direct and diffuse radiation, and incoming longwave radiation. (Hint: the BSRN station collects data every minute.)
- Create a new Series containing the temperature values every hour at the top of the hour.
- Convert the `DATE` column of `bsrn` to `datetime` using the `pd.to_datetime()` function.
- Set the `DATE` column as the index of `bsrn` using the `set_index()` method. (`set_index()` returns a new DataFrame by default, so assign the result back to `bsrn`.)
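A sketch of these steps as small, hypothetical helper functions. The slicing helpers assume the file starts at 00:00 and has no missing minutes, so that row position maps directly onto time of day. Apart from `DATE` and `SWD_Wm2`, the column names in the usage comments (`DIR_Wm2`, `DIF_Wm2`, `LWD_Wm2`, `T2_degC`) are assumptions; check `bsrn.columns` first.

```python
import pandas as pd

MINUTES_PER_DAY = 24 * 60  # the hint: one BSRN record per minute
MINUTES_PER_HOUR = 60

def first_record_each_day(df, columns):
    """Every 1440th row, i.e. the first record of each day.

    Assumes the data start at 00:00 with no gaps.
    """
    return df[columns].iloc[::MINUTES_PER_DAY]

def top_of_hour(series):
    """Every 60th value, i.e. the reading at hh:00 under the same assumption."""
    return series.iloc[::MINUTES_PER_HOUR]

def index_by_date(df, date_col="DATE"):
    """Parse date_col to datetime64 and make it the index (returns a new DataFrame)."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    return out.set_index(date_col)

# Hypothetical usage (column names other than DATE and SWD_Wm2 are assumed):
# daily_first = first_record_each_day(
#     bsrn, ["DATE", "SWD_Wm2", "DIR_Wm2", "DIF_Wm2", "LWD_Wm2"])
# temp_hourly = top_of_hour(bsrn["T2_degC"])
# bsrn = index_by_date(bsrn)
```

Once `bsrn` has a DatetimeIndex, a time-based selection such as `bsrn[bsrn.index.minute == 0]` picks the top-of-hour records without relying on the no-gap assumption.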
📚 Practice 4.
Calculate the mean incoming shortwave, outgoing shortwave, incoming longwave, and outgoing longwave radiation over the entire month.
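A minimal sketch: selecting the four columns and calling `.mean()` averages each one over every row, i.e. over the whole month (missing values are skipped by default). `SWD_Wm2` and `SWU_Wm2` appear in the Practice 5 starter code; `LWD_Wm2` and `LWU_Wm2` are assumed names for the longwave columns, and `monthly_means` is a hypothetical helper.

```python
import pandas as pd

# SWD/SWU names come from the starter code; LWD/LWU names are assumptions.
RAD_COLS = ["SWD_Wm2", "SWU_Wm2", "LWD_Wm2", "LWU_Wm2"]

def monthly_means(df, cols=RAD_COLS):
    """Mean of each radiation column over all rows of df (the whole month)."""
    return df[cols].mean()

# Usage in the notebook: monthly_means(bsrn)
```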
📚 Practice 5.
- Add a column `NET_SW` to `bsrn` with the net shortwave radiation, $R^{\downarrow}_{SW} - R^{\uparrow}_{SW}$.
- Add a column `NET_LW` to `bsrn` with the net longwave radiation, $R^{\downarrow}_{LW} - R^{\uparrow}_{LW}$.
- Add a column `NET_RAD` to `bsrn` with the net total radiation.
- Create a new DataFrame with the day of the month and daily mean values of shortwave incoming, shortwave outgoing, longwave incoming, longwave outgoing radiation, and net total radiation. (Hint: use masking!)

Net radiation is given by the following equation:

$$R_{N} = R^{\downarrow}_{SW} - R^{\uparrow}_{SW} + R^{\downarrow}_{LW} - R^{\uparrow}_{LW}$$

where $R^{\downarrow}_{SW}$ and $R^{\uparrow}_{SW}$ are incoming and outgoing shortwave radiation, respectively, and $R^{\downarrow}_{LW}$ and $R^{\uparrow}_{LW}$ are incoming and outgoing longwave radiation, respectively.
```python
# To get you started, here's some example code to create a new DataFrame containing each day,
# the average incoming shortwave (solar) radiation, avg_SWD (aka SW_in), and the average
# outgoing shortwave (solar) radiation, avg_SWU (aka SW_out).
# It assumes bsrn already has a DatetimeIndex (see Practice 3).
daily_rad = []
for d in bsrn.index.day.unique():
    avg_SWD = bsrn.SWD_Wm2[bsrn.index.day == d].mean()
    avg_SWU = bsrn.SWU_Wm2[bsrn.index.day == d].mean()
    # Append the current day and these two values to our list of daily radiation values:
    daily_rad.append([d, avg_SWD, avg_SWU])

daily_SW = pd.DataFrame(daily_rad, columns=['day', 'SW_in', 'SW_out'])
```
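A sketch of the remaining Practice 5 steps: the net columns follow the equation term by term, and `daily_means` extends the starter code's masking pattern to all five columns. It assumes `bsrn` has a DatetimeIndex and that the longwave columns are named `LWD_Wm2` and `LWU_Wm2`; both helper names are hypothetical.

```python
import pandas as pd

def add_net_radiation(df):
    """Return a copy of df with NET_SW, NET_LW and NET_RAD columns (W/m^2).

    SWD_Wm2/SWU_Wm2 come from the starter code; LWD_Wm2/LWU_Wm2 are assumed names.
    """
    out = df.copy()
    out["NET_SW"] = out["SWD_Wm2"] - out["SWU_Wm2"]  # R_SW(down) - R_SW(up)
    out["NET_LW"] = out["LWD_Wm2"] - out["LWU_Wm2"]  # R_LW(down) - R_LW(up)
    out["NET_RAD"] = out["NET_SW"] + out["NET_LW"]   # R_N
    return out

DAILY_COLS = ["SWD_Wm2", "SWU_Wm2", "LWD_Wm2", "LWU_Wm2", "NET_RAD"]

def daily_means(df, cols=DAILY_COLS):
    """One row per day of the month, averaged with a boolean mask per day.

    Assumes df has a DatetimeIndex.
    """
    rows = []
    for d in df.index.day.unique():
        mask = df.index.day == d  # True for every record on day d
        rows.append([d, *df.loc[mask, cols].mean()])
    return pd.DataFrame(rows, columns=["day", *cols])

# Hypothetical usage in the notebook:
# bsrn = add_net_radiation(bsrn)
# daily_rad_df = daily_means(bsrn)
```

If you prefer to skip the explicit loop, `bsrn.groupby(bsrn.index.day)[DAILY_COLS].mean()` computes the same daily means.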
📚 Practice 6.
- Use a `list` of `list`s to construct a DataFrame named `df1` containing the data in the first table below.
- Use a `dict` to construct a DataFrame named `df2` containing the data in the second table below.
- Concatenate `df1` and `df2` into a new DataFrame with all 9 rivers.
- Create a new DataFrame `rivers` with the discharge, mouth, source, and continent information, and add it to your concatenated DataFrame to produce a DataFrame with all of the data in the third table below.

Data for `df1`:

River | Length (km) | Drainage area (km²) |
---|---|---|
Amazon | 6400 | 7,050,000 |
Congo | 4371 | 4,014,500 |
Yangtze | 6418 | 1,808,500 |
Mississippi | 3730 | 3,202,230 |

Data for `df2`:

River | Length (km) | Drainage area (km²) |
---|---|---|
Zambezi | 2574 | 1,331,000 |
Mekong | 4023 | 811,000 |
Murray | 2508 | 1,061,469 |
Rhône | 813 | 98,000 |
Cubango | 1056 | 530,000 |

All nine rivers, with the additional columns:

River | Length (km) | Drainage area (km²) | Discharge (m³/s) | Mouth | Source | Continent |
---|---|---|---|---|---|---|
Amazon | 6400 | 7,050,000 | 209,000 | Atlantic Ocean | Rio Mantaro | South America |
Congo | 4371 | 4,014,500 | 41,200 | Atlantic Ocean | Lualaba River | Africa |
Yangtze | 6418 | 1,808,500 | 30,166 | East China Sea | Jianggendiru Glacier | Asia |
Mississippi | 3730 | 3,202,230 | 16,792 | Gulf of Mexico | Lake Itasca | North America |
Zambezi | 2574 | 1,331,000 | 3,400 | Indian Ocean | Miombo Woodlands | Africa |
Mekong | 4023 | 811,000 | 16,000 | South China Sea | Lasagongma Spring | Asia |
Murray | 2508 | 1,061,469 | 767 | Southern Ocean | Australian Alps | Oceania |
Rhône | 813 | 98,000 | 1,710 | Mediterranean Sea | Rhône Glacier | Europe |
Cubango | 1056 | 530,000 | 475 | Okavango Delta | Bié Plateau | Africa |
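A sketch of all four steps using the values from the tables above. The column names (`Length_km`, `Drainage_area_km2`, `Discharge_m3s`, and so on) and the name `full_rivers` are choices made for this example, not names required by the exercise. `merge(..., on="River")` joins the two DataFrames by river name, so the row order of `rivers` does not need to match.

```python
import pandas as pd

# list of lists: one inner list per row
df1 = pd.DataFrame(
    [["Amazon", 6400, 7_050_000],
     ["Congo", 4371, 4_014_500],
     ["Yangtze", 6418, 1_808_500],
     ["Mississippi", 3730, 3_202_230]],
    columns=["River", "Length_km", "Drainage_area_km2"],
)

# dict: one key per column, one list of values per key
df2 = pd.DataFrame({
    "River": ["Zambezi", "Mekong", "Murray", "Rhône", "Cubango"],
    "Length_km": [2574, 4023, 2508, 813, 1056],
    "Drainage_area_km2": [1_331_000, 811_000, 1_061_469, 98_000, 530_000],
})

# stack the rows; ignore_index renumbers the index 0..8
all_rivers = pd.concat([df1, df2], ignore_index=True)

# extra attributes keyed by river name
rivers = pd.DataFrame({
    "River": ["Amazon", "Congo", "Yangtze", "Mississippi", "Zambezi",
              "Mekong", "Murray", "Rhône", "Cubango"],
    "Discharge_m3s": [209_000, 41_200, 30_166, 16_792, 3_400,
                      16_000, 767, 1_710, 475],
    "Mouth": ["Atlantic Ocean", "Atlantic Ocean", "East China Sea",
              "Gulf of Mexico", "Indian Ocean", "South China Sea",
              "Southern Ocean", "Mediterranean Sea", "Okavango Delta"],
    "Source": ["Rio Mantaro", "Lualaba River", "Jianggendiru Glacier",
               "Lake Itasca", "Miombo Woodlands", "Lasagongma Spring",
               "Australian Alps", "Rhône Glacier", "Bié Plateau"],
    "Continent": ["South America", "Africa", "Asia", "North America",
                  "Africa", "Asia", "Oceania", "Europe", "Africa"],
})

# join on the shared "River" column (inner join keeps all_rivers' row order)
full_rivers = all_rivers.merge(rivers, on="River")
```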
📚 Practice 7.
Use the `plt` module to create a visualization of your radiation data and/or the rivers data. You can find examples of plotting functions in our prior exercises. Also, don't be afraid to experiment, or try using GitHub Copilot or ChatGPT to generate some initial code.
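One possible starting point, sketched as a hypothetical helper: a line plot of radiation columns against time. It assumes `bsrn` has a DatetimeIndex and that the incoming longwave column is named `LWD_Wm2`; the `../figs/` path in the usage comment assumes the notebook runs from the `nbs` folder.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_radiation(df, cols=("SWD_Wm2", "LWD_Wm2")):
    """Line plot of selected radiation columns against a DatetimeIndex."""
    fig, ax = plt.subplots(figsize=(10, 4))
    for col in cols:
        ax.plot(df.index, df[col], label=col, linewidth=0.5)
    ax.set_xlabel("Date")
    ax.set_ylabel("Radiation (W m$^{-2}$)")
    ax.legend()
    fig.tight_layout()
    return fig, ax

# Hypothetical usage in the notebook:
# fig, ax = plot_radiation(bsrn)
# fig.savefig("../figs/bsrn_radiation.png", dpi=150)
```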
📚 Practice 8.
- Export your `rivers` DataFrame to a CSV file in your repository's `data` folder.
- Export your `bsrn` DataFrame to a CSV file in your repository's `data` folder.
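A sketch of the export step using `DataFrame.to_csv()`; `export_csv` is a hypothetical helper and the file names are placeholders. The default `data_dir="../data"` assumes the notebook runs from the `nbs` folder. Because `bsrn` uses `DATE` as its index after Practice 3, writing it with `keep_index=True` keeps the timestamps in the file.

```python
from pathlib import Path

import pandas as pd

def export_csv(df, filename, data_dir="../data", keep_index=False):
    """Write df to data_dir/filename and return the path that was written."""
    path = Path(data_dir) / filename
    path.parent.mkdir(parents=True, exist_ok=True)  # create data/ if missing
    df.to_csv(path, index=keep_index)
    return path

# Hypothetical usage (file names are placeholders):
# export_csv(rivers, "rivers.csv")
# export_csv(bsrn, "bsrn.csv", keep_index=True)
```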