Practice 4-1: Pandas 🐼
Part 0. Setup Steps
- Create a repo on GitHub named `eds217-practice-4-1-pandas`
- Clone to create a version-controlled project
- Create some subfolder infrastructure (`nbs`, `data`, `figs`)
- Create and save a new jupyter notebook (`.ipynb` file) named `practice_4-1_pandas.ipynb` in the `nbs` folder.
- Open the notebook in VSCode or jupyter notebook/lab
- Make sure to associate the notebook with the `eds217_2023` environment.
📚 Practice 0.
- Create a cell that imports your essential data science libraries (`numpy` as `np`, `matplotlib.pyplot` as `plt`, and `pandas` as `pd`).
📚 Practice 1.
Import some radiation data for our practice session
Use pandas' `read_csv()` function to import the data from GitHub and create a new DataFrame named `bsrn`.
```python
data_url = "https://raw.githubusercontent.com/environmental-data-science" \
           "/eds217_2023/main/data/BSRN_GOB_2019-10.csv"
```
📚 Practice 2.
Using the DataFrame `bsrn`:
- Print a list of your DataFrame’s column names.
- How many values are there in the entire DataFrame?
- What is the data type of the first column?
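A minimal sketch of the inspection steps above, run on a tiny in-memory stand-in rather than the real file so it works offline. The three rows and the name `demo` are made up for illustration; the real `bsrn` DataFrame would come from passing `data_url` to `pd.read_csv()` instead.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for the real CSV (values are invented).
csv_text = """DATE,SWD_Wm2,SWU_Wm2
2019-10-01 00:00:00,0.0,0.0
2019-10-01 00:01:00,0.0,0.0
2019-10-01 00:02:00,1.5,0.4
"""

# read_csv accepts a URL, a file path, or a file-like object such as StringIO.
demo = pd.read_csv(io.StringIO(csv_text))

print(list(demo.columns))    # column names as a plain Python list
print(demo.size)             # total number of values (rows x columns)
print(demo.dtypes.iloc[0])   # data type of the first column
```

Note that `size` counts every cell, while `shape` reports rows and columns separately.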
📚 Practice 3.
- Create a new DataFrame containing the first record for each day and the following columns: the timestamp of the record, incoming shortwave radiation, direct and diffuse radiation, and incoming longwave radiation. (Hint: the BSRN station collects data every minute).
- Create a new Series containing the temperature values every hour at the top of the hour.
- Convert the `DATE` column to `datetime` using the `pd.to_datetime()` function.
- Set the `DATE` column as the index of the DataFrame using the `set_index()` method.
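One possible way to approach these steps, sketched on synthetic one-minute data. The column name `T_degC`, the DataFrame name `demo`, and all values are assumptions for illustration; check your own column names first. The sketch assumes the records start at midnight and have no gaps, so that every 1440th row starts a day and every 60th row falls on the hour.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: three days of one-minute records (values are invented).
n = 3 * 24 * 60
times = pd.date_range("2019-10-01 00:00", periods=n, freq="min")
demo = pd.DataFrame({
    "DATE": times.strftime("%Y-%m-%d %H:%M:%S"),  # timestamps stored as text
    "SWD_Wm2": np.random.default_rng(0).uniform(0, 1000, n),
    "T_degC": np.linspace(10, 25, n),             # assumed column name
})

# First record of each day: step through the rows 1440 minutes at a time.
first_of_day = demo.iloc[::1440][["DATE", "SWD_Wm2"]]

# Temperature at the top of each hour: every 60th row of one column (a Series).
hourly_temp = demo["T_degC"].iloc[::60]

# Convert the text timestamps to datetime and use them as the index.
demo["DATE"] = pd.to_datetime(demo["DATE"])
demo = demo.set_index("DATE")
```

Positional slicing only works when no records are missing; once the index is a `DatetimeIndex`, selecting on attributes such as `demo.index.minute == 0` is more robust.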
📚 Practice 4.
Calculate the mean incoming shortwave, outgoing shortwave, incoming longwave, and outgoing longwave radiation over the entire month.
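As a small illustration, selecting several columns with a list and calling `.mean()` returns one mean per column. The values below are invented, and `LWD_Wm2` is an assumed column name.

```python
import pandas as pd

# Tiny made-up DataFrame standing in for bsrn.
demo = pd.DataFrame({
    "SWD_Wm2": [0.0, 400.0, 800.0],
    "SWU_Wm2": [0.0, 100.0, 200.0],
    "LWD_Wm2": [300.0, 320.0, 340.0],  # assumed column name
})

# A list of column names selects a sub-DataFrame; mean() then works column-wise.
means = demo[["SWD_Wm2", "SWU_Wm2"]].mean()
print(means)
```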
📚 Practice 5.
- Add a column `'NET_SW'` to `bsrn` with the net shortwave radiation.
- Add a column `'NET_LW'` to `bsrn` with the net longwave radiation.
- Add a column `'NET_RAD'` to `bsrn` with the net total radiation.
- Create a new DataFrame with the day of the month and daily mean values of shortwave incoming, shortwave outgoing, longwave incoming, longwave outgoing radiation, and net total radiation. (Hint: use masking!).

Net radiation is given by the following equation:

$$R_{N} = R^{\downarrow}_{SW} - R^{\uparrow}_{SW} + R^{\downarrow}_{LW} - R^{\uparrow}_{LW}$$

where $R^{\downarrow}_{SW}$ and $R^{\uparrow}_{SW}$ are incoming and outgoing shortwave radiation, respectively, and $R^{\downarrow}_{LW}$ and $R^{\uparrow}_{LW}$ are incoming and outgoing longwave radiation, respectively.
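As a quick numeric check of the net radiation equation, here is a sketch on two made-up rows (roughly a daytime and a nighttime record). The longwave column names `LWD_Wm2` and `LWU_Wm2` are assumed by analogy with the shortwave names and may differ in your file.

```python
import pandas as pd

# Two invented records: one daytime, one nighttime.
demo = pd.DataFrame({
    "SWD_Wm2": [800.0, 0.0],    # incoming shortwave
    "SWU_Wm2": [200.0, 0.0],    # outgoing shortwave
    "LWD_Wm2": [350.0, 300.0],  # incoming longwave (assumed name)
    "LWU_Wm2": [450.0, 380.0],  # outgoing longwave (assumed name)
})

# Column arithmetic is element-wise, so each row gets its own net value.
demo["NET_SW"] = demo["SWD_Wm2"] - demo["SWU_Wm2"]
demo["NET_LW"] = demo["LWD_Wm2"] - demo["LWU_Wm2"]
demo["NET_RAD"] = demo["NET_SW"] + demo["NET_LW"]
print(demo[["NET_SW", "NET_LW", "NET_RAD"]])
```

In the nighttime row there is no shortwave radiation, so the net total equals the (negative) net longwave term.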
```python
# To get you started, here's some example code to create a new dataframe containing each day,
# the average incoming shortwave (solar) radiation, avg_SWD (aka SW_in),
# and the average outgoing shortwave (solar) radiation, avg_SWU (aka SW_out):
daily_rad = []
for d in bsrn.index.day.unique():
    avg_SWD = bsrn.SWD_Wm2[bsrn.index.day == d].mean()
    avg_SWU = bsrn.SWU_Wm2[bsrn.index.day == d].mean()
    # Append the current day and these two values to our list of daily radiation values:
    daily_rad.append([d, avg_SWD, avg_SWU])

daily_SW = pd.DataFrame(daily_rad, columns=['day', 'SW_in', 'SW_out'])
```
📚 Practice 6.
- Use a `list` of `list`s to construct a DataFrame named `df1` containing the data in the first table below.
- Use a `dict` to construct a DataFrame named `df2` containing the data in the second table below.
- Concatenate `df1` and `df2` into a new DataFrame with all 9 rivers.
- Create a new DataFrame `rivers` with the discharge, mouth, source, and continent information and add this to the concatenated DataFrame from the previous step to produce a DataFrame with all of the data in the final table below.

| River | Length ($\text{km}$) | Drainage area ($\text{km}^2$) |
|---|---|---|
| Amazon | 6400 | 7,050,000 |
| Congo | 4371 | 4,014,500 |
| Yangtze | 6418 | 1,808,500 |
| Mississippi | 3730 | 3,202,230 |

| River | Length ($\text{km}$) | Drainage area ($\text{km}^2$) |
|---|---|---|
| Zambezi | 2574 | 1,331,000 |
| Mekong | 4023 | 811,000 |
| Murray | 2508 | 1,061,469 |
| Rhône | 813 | 98,000 |
| Cubango | 1056 | 530,000 |

| River | Length ($\text{km}$) | Drainage area ($\text{km}^2$) | Discharge ($\text{m}^3/\text{s}$) | Mouth | Source | Continent |
|---|---|---|---|---|---|---|
| Amazon | 6400 | 7,050,000 | 209,000 | Atlantic Ocean | Rio Mantaro | South America |
| Congo | 4371 | 4,014,500 | 41,200 | Atlantic Ocean | Lualaba River | Africa |
| Yangtze | 6418 | 1,808,500 | 30,166 | East China Sea | Jianggendiru Glacier | Asia |
| Mississippi | 3730 | 3,202,230 | 16,792 | Gulf of Mexico | Lake Itasca | North America |
| Zambezi | 2574 | 1,331,000 | 3,400 | Indian Ocean | Miombo Woodlands | Africa |
| Mekong | 4023 | 811,000 | 16,000 | South China Sea | Lasagongma Spring | Asia |
| Murray | 2508 | 1,061,469 | 767 | Southern Ocean | Australian Alps | Oceania |
| Rhône | 813 | 98,000 | 1,710 | Mediterranean Sea | Rhône Glacier | Europe |
| Cubango | 1056 | 530,000 | 475 | Okavango Delta | Bié Plateau | Africa |
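The construction patterns named in this practice can be sketched on a four-river subset of the tables above. The variable names (`part1`, `part2`, `combined`, `extra`, `full`) and the shortened column names are chosen for illustration only, and `merge` is just one way to add the extra columns.

```python
import pandas as pd

columns = ["River", "Length_km", "Drainage_km2"]

# From a list of lists: each inner list is one row.
part1 = pd.DataFrame(
    [["Amazon", 6400, 7050000],
     ["Congo", 4371, 4014500]],
    columns=columns,
)

# From a dict: each key is a column name, each value is that column's data.
part2 = pd.DataFrame({
    "River": ["Zambezi", "Mekong"],
    "Length_km": [2574, 4023],
    "Drainage_km2": [1331000, 811000],
})

# Stack the rows; ignore_index=True renumbers the index 0..n-1.
combined = pd.concat([part1, part2], ignore_index=True)

# Extra information keyed by river name.
extra = pd.DataFrame({
    "River": ["Amazon", "Congo", "Zambezi", "Mekong"],
    "Continent": ["South America", "Africa", "Africa", "Asia"],
})

# Match rows on the shared "River" column instead of relying on row order.
full = combined.merge(extra, on="River")
print(full)
```

Merging on a key column is a defensive choice: if both DataFrames are known to list the rivers in the same order, `pd.concat(..., axis=1)` would also work.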
📚 Practice 7.
Use the `plt` module to create a visualization of your radiation data and/or the rivers data. There are examples of plotting functions from our prior exercises. Also, don’t be afraid to experiment or try using GitHub Copilot or ChatGPT to generate some initial code.
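As one possible starting point, here is a sketch of a bar chart of the river lengths from the table in Practice 6. The figure size, labels, and title are arbitrary choices, not requirements.

```python
import matplotlib.pyplot as plt
import pandas as pd

# River lengths (km) taken from the table in Practice 6.
lengths = pd.Series(
    [6400, 4371, 6418, 3730, 2574, 4023, 2508, 813, 1056],
    index=["Amazon", "Congo", "Yangtze", "Mississippi", "Zambezi",
           "Mekong", "Murray", "Rhône", "Cubango"],
)

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(lengths.index, lengths.values)      # one bar per river
ax.set_ylabel("Length (km)")
ax.set_title("River lengths")
ax.tick_params(axis="x", rotation=45)      # tilt names so they do not overlap
fig.tight_layout()
```

In a notebook the figure renders when the cell finishes; `fig.savefig(...)` could be used to write it to your `figs` folder.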
📚 Practice 8.
- Export your `rivers` DataFrame to a CSV file in your repository’s data folder.
- Export your `bsrn` DataFrame to a CSV file in your repository’s data folder.
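A minimal sketch of the export step using `DataFrame.to_csv()`. A temporary directory stands in for the repository so the example runs anywhere; the file name `rivers_demo.csv` is hypothetical. From a notebook in `nbs`, the target would more likely be a relative path such as `../data/`.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Small stand-in for the rivers DataFrame.
demo = pd.DataFrame({"River": ["Amazon", "Congo"], "Length_km": [6400, 4371]})

# Temporary directory used here in place of the repository's data folder.
data_dir = Path(tempfile.mkdtemp()) / "data"
data_dir.mkdir(parents=True, exist_ok=True)

out_path = data_dir / "rivers_demo.csv"   # hypothetical file name
demo.to_csv(out_path, index=False)        # index=False drops the 0..n-1 row labels

# Read the file back to confirm the export worked.
round_trip = pd.read_csv(out_path)
print(round_trip)
```

For a DataFrame whose index holds real data, such as `bsrn` after `set_index()` on the timestamps, leave `index` at its default of `True` so the timestamps are written out.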