Practice 4-1: Pandas 🐼

Part 0. Setup Steps

  • Create a repo on GitHub named eds217-practice-4-1-pandas
  • Clone to create a version-controlled project
  • Create some subfolder infrastructure (nbs, data, figs)
  • Create and save a new Jupyter notebook (.ipynb file) named practice_4-1_pandas.ipynb in the nbs folder.
  • Open the notebook in VS Code or Jupyter Notebook/Lab
  • Make sure to associate the notebook with the eds217_2023 environment.

📚 Practice 0.

  • Create a cell that imports your essential data science libraries (numpy as np, matplotlib.pyplot as plt, and pandas as pd).

📚 Practice 1.

Import some radiation data for our practice session.

Use pandas' read_csv() function to import the data from GitHub and create a new DataFrame named bsrn:

data_url = "https://raw.githubusercontent.com/environmental-data-science" \
            "/eds217_2023/main/data/BSRN_GOB_2019-10.csv"
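As a rough illustration of the call pattern, read_csv() accepts a URL string, a local path, or any file-like object. The sketch below reads from an in-memory string so that it runs offline; the column names DATE, SWD_Wm2, and SWU_Wm2 appear elsewhere in this handout, but the two rows of values are made up for the example and are not taken from the station file.

```python
import io

import pandas as pd

# Made-up two-row stand-in for the station file (values are NOT real data).
csv_text = (
    "DATE,SWD_Wm2,SWU_Wm2\n"
    "2019-10-01 00:00:00,0.0,0.0\n"
    "2019-10-01 00:01:00,1.5,0.3\n"
)

# read_csv() takes a file-like object here; passing data_url works the same way.
toy = pd.read_csv(io.StringIO(csv_text))
print(toy.head())
```

For the exercise itself, the same call with data_url in place of the StringIO object should be all that is needed.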

📚 Practice 2. Using the DataFrame bsrn:

  1. Print a list of your DataFrame’s column names.
  2. How many values are there in the entire DataFrame?
  3. What is the data type of the first column?
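The attributes that answer these three questions can be sketched on a small toy DataFrame. The frame below is a hypothetical stand-in (made-up values, not the station data); the same attributes apply to bsrn.

```python
import pandas as pd

# Hypothetical toy frame; the values are made up for illustration.
toy = pd.DataFrame({
    "DATE": ["2019-10-01 00:00:00", "2019-10-01 00:01:00"],
    "SWD_Wm2": [0.0, 1.5],
    "SWU_Wm2": [0.0, 0.3],
})

column_names = list(toy.columns)   # column labels as a plain Python list
n_values = toy.size                # number of rows times number of columns
first_dtype = toy.dtypes.iloc[0]   # dtype of the first column, by position

print(column_names)
print(n_values)
print(first_dtype)
```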

📚 Practice 3.

  1. Create a new DataFrame containing the first record for each day and the following columns: the timestamp of the record, incoming shortwave radiation, direct and diffuse radiation, and incoming longwave radiation. (Hint: the BSRN station collects data every minute).

  2. Create a new Series containing the temperature values every hour at the top of the hour.

  3. Convert the DATE column to datetime using the pd.to_datetime() function.

  4. Set the DATE column as the index of the DataFrame using the set_index() method.
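One possible approach to the steps above, sketched on toy data: because the station logs one record per minute, stepping through the rows with iloc picks out regular intervals. This assumes there are no missing or duplicated records, which should be checked on the real file. The temperature column name T_degC is hypothetical, and the toy values are made up.

```python
import numpy as np
import pandas as pd

# Toy stand-in: two days of one-minute records, with DATE stored as text
# (as it would be straight after read_csv()). T_degC is a hypothetical name.
stamps = pd.date_range("2019-10-01 00:00", periods=2 * 24 * 60, freq="min")
toy = pd.DataFrame({
    "DATE": stamps.strftime("%Y-%m-%d %H:%M:%S"),
    "T_degC": np.linspace(10.0, 20.0, len(stamps)),
})

# 1440 minutes per day: every 1440th row is the first record of a day.
daily_first = toy.iloc[::1440]

# 60 minutes per hour: every 60th row falls at the top of an hour.
hourly_temp = toy["T_degC"].iloc[::60]

# Convert the text column to datetimes, then use it as the index.
toy["DATE"] = pd.to_datetime(toy["DATE"])
toy = toy.set_index("DATE")
```

To keep only some columns in the daily selection, a list of column labels can be passed first, for example toy[["DATE", "T_degC"]].iloc[::1440].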

📚 Practice 4.

Calculate the mean incoming shortwave, outgoing shortwave, incoming longwave, and outgoing longwave radiation over the entire month.
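A minimal sketch of averaging several columns at once: selecting a list of columns and calling mean() returns one value per column. SWD_Wm2 and SWU_Wm2 are names used later in this handout; LWD_Wm2 and LWU_Wm2 are assumed by analogy and may not match the real file. The values are made up.

```python
import pandas as pd

# Toy values, not station data. The LW column names are an assumption.
toy = pd.DataFrame({
    "SWD_Wm2": [0.0, 800.0, 0.0, 600.0],
    "SWU_Wm2": [0.0, 200.0, 0.0, 100.0],
    "LWD_Wm2": [300.0, 350.0, 310.0, 330.0],
    "LWU_Wm2": [400.0, 450.0, 390.0, 430.0],
})

rad_cols = ["SWD_Wm2", "SWU_Wm2", "LWD_Wm2", "LWU_Wm2"]

# mean() on a DataFrame averages down each column and returns a Series.
means = toy[rad_cols].mean()
print(means)
```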

📚 Practice 5.

  1. Add a column ‘NET_SW’ to bsrn with the net shortwave radiation.
  2. Add a column ‘NET_LW’ to bsrn with the net longwave radiation.
  3. Add a column ‘NET_RAD’ to bsrn with the net total radiation.

    Net radiation is given by the following equation:

    R_{N} = R^{\downarrow}_{SW} - R^{\uparrow}_{SW} + R^{\downarrow}_{LW} - R^{\uparrow}_{LW}

    where R^{\downarrow}_{SW} and R^{\uparrow}_{SW} are incoming and outgoing shortwave radiation, respectively, and R^{\downarrow}_{LW} and R^{\uparrow}_{LW} are incoming and outgoing longwave radiation, respectively.

  4. Create a new DataFrame with the day of the month and daily mean values of shortwave incoming, shortwave outgoing, longwave incoming, longwave outgoing radiation, and net total radiation. (Hint: use masking!)

    # To get you started, here's some example code to create a new DataFrame containing each day,
    # the average incoming shortwave (solar) radiation, avg_SWD (aka SW_in),
    # and the average outgoing shortwave (solar) radiation, avg_SWU (aka SW_out):
    
    daily_rad = []
    for d in bsrn.index.day.unique():
        avg_SWD = bsrn.SWD_Wm2[bsrn.index.day == d].mean()
        avg_SWU = bsrn.SWU_Wm2[bsrn.index.day == d].mean()
        # Append the current day and these two values to our list of daily radiation values:
        daily_rad.append([d, avg_SWD, avg_SWU])
    
    daily_SW = pd.DataFrame(daily_rad, columns=['day', 'SW_in', 'SW_out'])
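The net-radiation columns are plain column arithmetic, which can be sketched on toy data as below. The sketch also shows groupby() as an alternative to the masking loop for daily means; this is a different technique from the one the hint asks for, included only for comparison. The LWD_Wm2 and LWU_Wm2 names are assumed by analogy with the shortwave columns, and all values are made up.

```python
import pandas as pd

# Toy frame with a DatetimeIndex: two records on each of two days.
idx = pd.to_datetime([
    "2019-10-01 00:00", "2019-10-01 12:00",
    "2019-10-02 00:00", "2019-10-02 12:00",
])
toy = pd.DataFrame({
    "SWD_Wm2": [0.0, 800.0, 0.0, 600.0],
    "SWU_Wm2": [0.0, 200.0, 0.0, 100.0],
    "LWD_Wm2": [300.0, 350.0, 310.0, 330.0],  # assumed column name
    "LWU_Wm2": [400.0, 450.0, 390.0, 430.0],  # assumed column name
}, index=idx)

# Net = incoming minus outgoing, computed row by row for whole columns.
toy["NET_SW"] = toy["SWD_Wm2"] - toy["SWU_Wm2"]
toy["NET_LW"] = toy["LWD_Wm2"] - toy["LWU_Wm2"]
toy["NET_RAD"] = toy["NET_SW"] + toy["NET_LW"]

# Masking, as in the starter code: a boolean array selects one day's rows.
day1_net = toy.loc[toy.index.day == 1, "NET_RAD"].mean()

# Alternative (not masking): group rows by day of month and average each group.
daily = toy.groupby(toy.index.day).mean()
print(daily)
```

Both routes should give the same daily means; the masking loop makes the row selection explicit, while groupby() handles every column and every day in one call.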

📚 Practice 6.

  1. Use a list of lists to construct a DataFrame named df1 containing the data in the table below.

    | River       | Length (km) | Drainage area (km^2) |
    | Amazon      | 6400        | 7,050,000            |
    | Congo       | 4371        | 4,014,500            |
    | Yangtze     | 6418        | 1,808,500            |
    | Mississippi | 3730        | 3,202,230            |

  2. Use a dict to construct a DataFrame named df2 containing the data in the table below.

    | River   | Length (km) | Drainage area (km^2) |
    | Zambezi | 2574        | 1,331,000            |
    | Mekong  | 4023        | 811,000              |
    | Murray  | 2508        | 1,061,469            |
    | Rhône   | 813         | 98,000               |
    | Cubango | 1056        | 530,000              |

  3. Concatenate df1 and df2 into a new DataFrame with all 9 rivers.

  4. Create a new DataFrame rivers with the discharge, mouth, source, and continent information and add this to the combined DataFrame from step 3 to produce a DataFrame with all of the data in the table below.

    | River       | Length (km) | Drainage area (km^2) | Discharge (m^3/s) | Mouth             | Source               | Continent     |
    | Amazon      | 6400        | 7,050,000            | 209,000           | Atlantic Ocean    | Rio Mantaro          | South America |
    | Congo       | 4371        | 4,014,500            | 41,200            | Atlantic Ocean    | Lualaba River        | Africa        |
    | Yangtze     | 6418        | 1,808,500            | 30,166            | East China Sea    | Jianggendiru Glacier | Asia          |
    | Mississippi | 3730        | 3,202,230            | 16,792            | Gulf of Mexico    | Lake Itasca          | North America |
    | Zambezi     | 2574        | 1,331,000            | 3,400             | Indian Ocean      | Miombo Woodlands     | Africa        |
    | Mekong      | 4023        | 811,000              | 16,000            | South China Sea   | Lasagongma Spring    | Asia          |
    | Murray      | 2508        | 1,061,469            | 767               | Southern Ocean    | Australian Alps      | Oceania       |
    | Rhône       | 813         | 98,000               | 1,710             | Mediterranean Sea | Rhône Glacier        | Europe        |
    | Cubango     | 1056        | 530,000              | 475               | Okavango Delta    | Bié Plateau          | Africa        |
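The three construction patterns this practice asks for can be sketched on a four-river subset of the tables above. The column labels (River, Length_km, Drainage_area_km2, Continent) and the names df_a, df_b, both, extra, and combined are hypothetical choices for the example; merge() on the River column is one way to attach extra columns, and others (such as join() on a shared index) would also work.

```python
import pandas as pd

cols = ["River", "Length_km", "Drainage_area_km2"]  # hypothetical labels

# List of lists: each inner list is one row; column names are given separately.
df_a = pd.DataFrame(
    [["Amazon", 6400, 7050000], ["Congo", 4371, 4014500]],
    columns=cols,
)

# Dict: each key is a column name, each value is that column's data.
df_b = pd.DataFrame({
    "River": ["Zambezi", "Mekong"],
    "Length_km": [2574, 4023],
    "Drainage_area_km2": [1331000, 811000],
})

# Stack the rows; ignore_index=True renumbers the result 0..n-1.
both = pd.concat([df_a, df_b], ignore_index=True)

# Extra attributes in their own frame, matched back on the River column.
extra = pd.DataFrame({
    "River": ["Amazon", "Congo", "Zambezi", "Mekong"],
    "Continent": ["South America", "Africa", "Africa", "Asia"],
})
combined = both.merge(extra, on="River")
print(combined)
```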

📚 Practice 7.

Use matplotlib.pyplot (plt) to create a visualization of your radiation data and/or the rivers data. There are examples of plotting functions in our prior exercises. Also, don't be afraid to experiment or try using GitHub Copilot or ChatGPT to generate some initial code.

📚 Practice 8.

  1. Export your rivers DataFrame to a CSV file in your repository’s data folder.

  2. Export your bsrn DataFrame to a CSV file in your repository’s data folder.
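A minimal sketch of the export call, written against a temporary folder so that it runs anywhere. In the notebook, a relative path such as ../data/rivers.csv would be the likely target, but that path and the file name rivers_demo.csv are assumptions about your repository layout.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two rows taken from the rivers table; column labels are hypothetical.
toy = pd.DataFrame({"River": ["Rhône", "Cubango"], "Length_km": [813, 1056]})

# Stand-in for the repository's data folder (a temporary directory here).
data_dir = Path(tempfile.mkdtemp()) / "data"
data_dir.mkdir(parents=True, exist_ok=True)
out_path = data_dir / "rivers_demo.csv"

# index=False drops the 0..n-1 row labels, which carry no information here.
toy.to_csv(out_path, index=False)

# Read the file back to confirm the round trip.
round_trip = pd.read_csv(out_path)
print(round_trip)
```

For a DataFrame whose index holds real information, such as bsrn after set_index(), leaving index at its default of True keeps the timestamps in the file.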