Return to Course Home Page

Practice 4-1: Pandas 🐼

Part 0. Setup Steps

Create a repo on GitHub named eds217-practice-4-1-pandas

Clone to create a version-controlled project

Create some subfolder infrastructure (nbs, data, figs)

Create and save a new Jupyter notebook (.ipynb file) named practice_4-1_pandas.ipynb in the nbs folder.

Open the notebook in VS Code or Jupyter Notebook/Lab.

Make sure to associate the notebook with the eds217_2023 environment.

📚 Practice 0.

Create a cell that imports your essential data science libraries (numpy as np, matplotlib.pyplot as plt, and pandas as pd).

📚 Practice 1.

Import some radiation data for our practice session

Use pandas' read_csv() function to import the data from GitHub and create a new DataFrame named bsrn.

data_url = "https://raw.githubusercontent.com/environmental-data-science" \
    "/eds217_2023/main/data/BSRN_GOB_2019-10.csv"
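As a minimal sketch of how read_csv() behaves, the cell below reads a tiny in-memory CSV instead of the real file. The two rows of values and the name demo are made up for illustration; passing data_url to read_csv() in place of the StringIO object should work the same way.

```python
import io

import pandas as pd

# Hypothetical stand-in for the remote file: two rows of made-up values.
csv_text = (
    "DATE,SWD_Wm2,SWU_Wm2\n"
    "2019-10-01 00:00:00,0.0,0.0\n"
    "2019-10-01 00:01:00,1.5,0.5\n"
)

# read_csv accepts a file path, a URL string, or any file-like object.
demo = pd.read_csv(io.StringIO(csv_text))
print(demo.shape)
print(demo.head())
```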

📚 Practice 2. Using the DataFrame bsrn:

Print a list of your DataFrame’s column names.

How many values are there in the entire DataFrame?

What is the data type of the first column?
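If you are unsure which attributes to reach for, here is a small sketch on a made-up DataFrame (toy and its column names are hypothetical, not part of the BSRN data). The same attributes exist on bsrn.

```python
import pandas as pd

# Hypothetical toy data, only to show the attributes.
toy = pd.DataFrame({"reading_id": [1, 2, 3], "temp_C": [21.5, 19.0, 23.2]})

print(list(toy.columns))   # column names as a plain Python list
print(toy.shape)           # (number of rows, number of columns)
print(toy.size)            # total number of values: rows * columns
print(toy.dtypes.iloc[0])  # data type of the first column
```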

📚 Practice 3.

Create a new DataFrame containing the first record for each day and the following columns: the timestamp of the record, incoming shortwave radiation, direct and diffuse radiation, and incoming longwave radiation. (Hint: the BSRN station collects data every minute).

Create a new Series containing the temperature values every hour at the top of the hour.

Convert the DATE column to datetime using the pd.to_datetime() function.

Set the DATE column as the index of the DataFrame using the set_index() method.
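One way to approach these steps is positional slicing with a step, sketched below on made-up data. The DataFrame toy, its value column, and the assumption that no minutes are missing are all hypothetical; check the real column names in bsrn before adapting it.

```python
import numpy as np
import pandas as pd

# Hypothetical two days of one-minute records (2 * 1440 rows), timestamps stored as text.
n = 2 * 24 * 60
toy = pd.DataFrame({
    "DATE": pd.date_range("2019-10-01", periods=n, freq="min").astype(str),
    "value": np.arange(n),
})

# With one record per minute and no gaps, every 1440th row starts a new day
# and every 60th row falls at the top of an hour.
first_of_day = toy.iloc[::1440][["DATE", "value"]]
hourly = toy["value"].iloc[::60]

# Parse the text timestamps, then use them as the index.
toy["DATE"] = pd.to_datetime(toy["DATE"])
toy = toy.set_index("DATE")

print(first_of_day)
print(len(hourly))
print(type(toy.index))
```

Slicing by position is simple but silently breaks if records are missing; once the index is a DatetimeIndex, selecting on the timestamps themselves is more robust.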

📚 Practice 4.

Calculate the mean incoming shortwave, outgoing shortwave, incoming longwave, and outgoing longwave radiation over the entire month.
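As a sketch of the pattern, selecting several columns with a list and calling mean() returns one mean per column. The values below are made up, and the extra column is hypothetical.

```python
import pandas as pd

# Hypothetical values; SWD_Wm2 and SWU_Wm2 mirror names used elsewhere in this practice.
toy = pd.DataFrame({
    "SWD_Wm2": [0.0, 100.0, 200.0],
    "SWU_Wm2": [0.0, 20.0, 40.0],
    "other": [1, 2, 3],
})

# A list of column names selects a sub-DataFrame; mean() then averages each column.
means = toy[["SWD_Wm2", "SWU_Wm2"]].mean()
print(means)
```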

📚 Practice 5.

Add a column ‘NET_SW’ to bsrn with the net shortwave radiation.

Add a column ‘NET_LW’ to bsrn with the net longwave radiation.

Add a column ‘NET_RAD’ to bsrn with the net total radiation.
Net radiation is given by the following equation:

$$R_{N} = R^{\downarrow}_{SW} - R^{\uparrow}_{SW} + R^{\downarrow}_{LW} - R^{\uparrow}_{LW}$$

where $R^{\downarrow}_{SW}$ and $R^{\uparrow}_{SW}$ are incoming and outgoing shortwave radiation, respectively, and $R^{\downarrow}_{LW}$ and $R^{\uparrow}_{LW}$ are incoming and outgoing longwave radiation, respectively.
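The equation maps directly onto column arithmetic. The sketch below uses made-up values, and the longwave column names LWD_Wm2 and LWU_Wm2 are assumptions by analogy with SWD_Wm2 and SWU_Wm2; confirm the actual names in bsrn.

```python
import pandas as pd

# Hypothetical values in W/m^2; the LW column names are assumed, not confirmed.
toy = pd.DataFrame({
    "SWD_Wm2": [500.0, 0.0],
    "SWU_Wm2": [100.0, 0.0],
    "LWD_Wm2": [350.0, 300.0],
    "LWU_Wm2": [450.0, 380.0],
})

# Net = incoming minus outgoing, computed row by row for the whole column.
toy["NET_SW"] = toy["SWD_Wm2"] - toy["SWU_Wm2"]
toy["NET_LW"] = toy["LWD_Wm2"] - toy["LWU_Wm2"]
toy["NET_RAD"] = toy["NET_SW"] + toy["NET_LW"]
print(toy)
```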

Create a new DataFrame with the day of the month and daily mean values of shortwave incoming, shortwave outgoing, longwave incoming, longwave outgoing radiation, and net total radiation. (Hint: use masking!).

# To get you started, here's some example code to create a new dataframe
# containing each day, and the average incoming shortwave (solar) radiation,
# avg_SWD (aka SW_in), and the average outgoing shortwave (solar) radiation,
# avg_SWU (aka SW_out):
daily_rad = []
for d in bsrn.index.day.unique():
    avg_SWD = bsrn.SWD_Wm2[bsrn.index.day == d].mean()
    avg_SWU = bsrn.SWU_Wm2[bsrn.index.day == d].mean()
    # Append the current day and these two values to our list of daily radiation values:
    daily_rad.append([d, avg_SWD, avg_SWU])
daily_SW = pd.DataFrame(daily_rad, columns=['day', 'SW_in', 'SW_out'])
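For comparison, here is a sketch of the same daily-mean idea on made-up data, first with a single boolean mask and then with groupby(), which is a different technique from the masking loop the hint asks for. The DataFrame toy and its values are hypothetical.

```python
import pandas as pd

# Hypothetical data: two records per day over two days.
idx = pd.to_datetime([
    "2019-10-01 00:00", "2019-10-01 12:00",
    "2019-10-02 00:00", "2019-10-02 12:00",
])
toy = pd.DataFrame(
    {"SWD_Wm2": [0.0, 400.0, 0.0, 600.0], "SWU_Wm2": [0.0, 80.0, 0.0, 120.0]},
    index=idx,
)

# Masking: the comparison gives one True/False per row, and .loc keeps the True rows.
day1_mean = toy.loc[toy.index.day == 1, "SWD_Wm2"].mean()

# Alternative (not masking): group rows by day of month and average every column at once.
daily = toy.groupby(toy.index.day).mean()
daily.index.name = "day"

print(day1_mean)
print(daily)
```

The loop with masks makes each step explicit, which is the point of the exercise; groupby() is shorter once the idea is clear.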

📚 Practice 6.

Use a list of lists to construct a DataFrame named df1 containing the data in the table below.

| River | Length (km) | Drainage area (km²) |
|---|---|---|
| Amazon | 6400 | 7,050,000 |
| Congo | 4371 | 4,014,500 |
| Yangtze | 6418 | 1,808,500 |
| Mississippi | 3730 | 3,202,230 |

Use a dict to construct a DataFrame named df2 containing the data in the table below.

| River | Length (km) | Drainage area (km²) |
|---|---|---|
| Zambezi | 2574 | 1,331,000 |
| Mekong | 4023 | 811,000 |
| Murray | 2508 | 1,061,469 |
| Rhône | 813 | 98,000 |
| Cubango | 1056 | 530,000 |

Concatenate df1 and df2 into a new DataFrame with all 9 rivers.

Create a new DataFrame rivers with the discharge, mouth, source, and continent information and add this to your concatenated DataFrame from the previous step to produce a DataFrame with all of the data in the table below.

| River | Length (km) | Drainage area (km²) | Discharge (m³/s) | Mouth | Source | Continent |
|---|---|---|---|---|---|---|
| Amazon | 6400 | 7,050,000 | 209,000 | Atlantic Ocean | Rio Mantaro | South America |
| Congo | 4371 | 4,014,500 | 41,200 | Atlantic Ocean | Lualaba River | Africa |
| Yangtze | 6418 | 1,808,500 | 30,166 | East China Sea | Jianggendiru Glacier | Asia |
| Mississippi | 3730 | 3,202,230 | 16,792 | Gulf of Mexico | Lake Itasca | North America |
| Zambezi | 2574 | 1,331,000 | 3,400 | Indian Ocean | Miombo Woodlands | Africa |
| Mekong | 4023 | 811,000 | 16,000 | South China Sea | Lasagongma Spring | Asia |
| Murray | 2508 | 1,061,469 | 767 | Southern Ocean | Australian Alps | Oceania |
| Rhône | 813 | 98,000 | 1,710 | Mediterranean Sea | Rhône Glacier | Europe |
| Cubango | 1056 | 530,000 | 475 | Okavango Delta | Bié Plateau | Africa |
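The construction, concatenation, and joining steps in this practice follow a common pattern, sketched here on deliberately generic data. The names left, right, combined, extra, and full and the columns name, x, and y are hypothetical, and merge() is only one of several ways to add columns.

```python
import pandas as pd

# From a list of lists: each inner list is one row; column names are passed separately.
left = pd.DataFrame([["a", 1], ["b", 2]], columns=["name", "x"])

# From a dict: each key is a column name, each value holds that column's data.
right = pd.DataFrame({"name": ["c", "d"], "x": [3, 4]})

# Stack the rows; ignore_index=True renumbers the index 0..n-1.
combined = pd.concat([left, right], ignore_index=True)

# Extra columns for the same items, matched up by the shared "name" column.
extra = pd.DataFrame({"name": ["a", "b", "c", "d"], "y": [10.0, 20.0, 30.0, 40.0]})
full = combined.merge(extra, on="name")

print(full)
```

Matching on a key column means the rows of the second DataFrame do not have to be in the same order as the first.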

📚 Practice 7.

Use the plt module to create a visualization of your radiation data and/or the rivers data. There are examples of plotting functions from our prior exercises. Also, don’t be afraid to experiment or try using GitHub Copilot or ChatGPT to generate some initial code.
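If you want a starting point, here is a minimal line-plot sketch on made-up daily values. The DataFrame toy and its numbers are hypothetical, and the Agg backend line is only there so the cell runs without a display; in a notebook you can drop it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical daily means, shaped like the daily_SW example above.
toy = pd.DataFrame({"day": [1, 2, 3], "SW_in": [200.0, 250.0, 180.0]})

fig, ax = plt.subplots()
ax.plot(toy["day"], toy["SW_in"], marker="o")
ax.set_xlabel("Day of month")
ax.set_ylabel("Mean incoming shortwave (W/m^2)")
ax.set_title("Toy daily shortwave radiation")
```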

📚 Practice 8.

Export your rivers DataFrame to a CSV file in your repository’s data folder.

Export your bsrn DataFrame to a CSV file in your repository’s data folder.
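A sketch of the export step using to_csv() is shown below. A temporary directory stands in for your repository's data folder, and toy, data_dir, and out_path are hypothetical names; in your notebook you would point the path at the real data folder instead.

```python
import tempfile
from pathlib import Path

import pandas as pd

toy = pd.DataFrame({"name": ["a", "b"], "x": [1, 2]})

# Stand-in for the repository's data folder (assumption for this sketch).
data_dir = Path(tempfile.mkdtemp())
out_path = data_dir / "toy.csv"

# index=False skips the 0..n-1 row labels, which carry no information here.
toy.to_csv(out_path, index=False)

# Read the file back to confirm what was written.
round_trip = pd.read_csv(out_path)
print(round_trip)
```

For a DataFrame whose index holds real data, such as a datetime index, leave index at its default of True so the timestamps are written to the file.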
