In the past 7 weeks, you have been working with Google mobility data. Now, let's combine that data with covid-19 data to see if we can derive some interesting insights. There are multiple sources of COVID-19 data. Maybe the country that you chose has its separate data source.
Feel free to use either of these data sources or something you found on your own!
import pandas as pd
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import pearsonr
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
This dataframe should combine mobility data and covid-19 data of your chosen country. There are different types of covid data available such as the number of positively tested cases, hospital admission, fatality rates, government stringency index, etc. Provide a brief explanation or data dictionary of your new dataframe. Keep in mind that you need to associate these two datasets, then pick municipal, provincial, or nationwide data accordingly.
The mobility data (NZ_nation.csv) is from New Zealand and retrieved from Google after which it was processed to only contain the national data.
The Covid-19 data (owid-covid-data.csv) for New Zealand is retrieved from OurWorldInData. This file contains data for all countries and is filtered in the next step to keep only New Zealand.
The files: 'Final Assignment.ipynb', 'NZ_nation.csv' and 'owid-covid-data.csv' are all placed in the same directory.
# Read the csv file of the mobility data from New Zealand as a dataframe
df_mobi_nz = pd.read_csv('NZ_nation.csv')
df_mobi_nz = df_mobi_nz.iloc[: , 1:]
# Clean the data by dropping columns of the dataframe that will not be used
df_mobi_nz = df_mobi_nz.drop(columns=['country_region_code', 'country_region', 'sub_region_1',
'sub_region_2','metro_area', 'iso_3166_2_code',
'census_fips_code', 'place_id'])
# Read the data of New Zealand from the csv file of the Covid-19 data as a dataframe
df_covid = pd.read_csv('owid-covid-data.csv')
df_covid_nz = df_covid[df_covid.loc[:, 'location'] == 'New Zealand']
# Clean the data by dropping columns of the dataframe that will not be used
df_covid_nz = df_covid_nz.drop(columns=['iso_code', 'continent', 'location'])
# Join both dataframes on the column 'date' to one dataframe
df_nz = pd.merge(df_mobi_nz, df_covid_nz, how='outer', on='date')
The dataframe 'df_nz' is made from the mobility data (df_mobi_nz) and the Covid-19 data (df_covid_nz) of New Zealand.
The dataframe of the mobility data 'df_mobi_nz' shows the changes in mobility (in percentages) during 2020/2021.
With this data the impact of Covid-19 on the change in mobility in different places can be analysed and visualised.
The dataframe of the Covid-19 data 'df_covid_nz' shows different parameters on Covid-19, for example:
the total cases, weekly ICU admissions and the stringency index.
With the dataframe 'df_nz', the correlation between changes in the mobility and Covid-19 data, and between their peaks / valleys, can be analysed and visualised.
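As a minimal sketch of what the outer join on 'date' does, the example below merges two toy frames that stand in for 'df_mobi_nz' and 'df_covid_nz'. The values are made up, and the `indicator=True` column is added here only for illustration; it is not used in the notebook.

```python
import pandas as pd

# Toy stand-ins for df_mobi_nz and df_covid_nz (values are made up)
mobility = pd.DataFrame({
    'date': ['2020-02-15', '2020-02-16', '2020-02-17'],
    'residential_percent_change_from_baseline': [0.0, 1.0, 2.0],
})
covid = pd.DataFrame({
    'date': ['2020-02-16', '2020-02-17', '2020-02-18'],
    'new_cases': [0.0, 1.0, 3.0],
})

# An outer join keeps every date from both frames; values missing on one side become NaN
merged = pd.merge(mobility, covid, how='outer', on='date', indicator=True)
# Sort by date so that the row position keeps following the calendar
merged = merged.sort_values('date').reset_index(drop=True)
print(merged)
```

The '_merge' column shows which dates exist in only one of the sources. Those rows carry NaN values that later steps such as find_peaks have to cope with.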
As you already know, there are various peaks/valleys in the changes of mobility activity data. In this assignment, find peaks/valleys (if available) in the covid data.
After identifying peaks from two datasets, you need to check if there are common peaks. Most likely, the peaks do not intersect on the same day, so it should be possible to provide a certain offset to combine peaks/valleys that are close to each other. A visual representation of this problem is shown in the following image:
Below are the challenges that need to be solved for this part:
Motivate your selection for the data choice for finding the common peaks
This algorithm uses the row indices of the data, since each index corresponds to one date. The algorithm starts after the indices of the peaks or valleys of both activities (question 3) or both types of Covid-19 data (question 4) have been found with the scipy function 'find_peaks'.
Lists are made for the matched indices of the first activity, matched indices of the second activity and the dates.
An offset is set, where the numeric value corresponds to the number of days.
Only the date column from the 'df_nz' dataframe is selected into another dataframe.
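Then, for every pair of indices (i, j) from the two sets, three if-statements are checked: whether i equals j, whether i - offset equals j, and whether i + offset equals j. When one of them holds, both indices and the date of the match are stored.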
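The matching step can be sketched as a small function. Note that this is a variant, not the notebook's exact logic: the notebook accepts a pair only when the indices are equal or exactly `offset` days apart, whereas the sketch below assumes the intent is to accept any distance up to `offset`. The function name `match_within_offset` and the index values are hypothetical.

```python
import numpy as np

def match_within_offset(idx_a, idx_b, offset):
    """Pair indices from idx_a and idx_b that lie at most `offset` positions apart."""
    pairs = []
    for i in idx_a:
        for j in idx_b:
            # Accept exact matches and every distance up to the offset
            if abs(int(i) - int(j)) <= offset:
                pairs.append((int(i), int(j)))
    return pairs

# Made-up peak indices for two series
peaks_a = np.array([10, 25, 40])
peaks_b = np.array([11, 27, 50])

pairs = match_within_offset(peaks_a, peaks_b, offset=2)
print(pairs)  # [(10, 11), (25, 27)]
```

With the notebook's three conditions and offset = 2, the pair (10, 11) would not be reported, because the indices differ by one day rather than by zero or two.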
The two activities that are chosen for comparing the common peaks and valleys are 'Retail and recreation' and 'Grocery and pharmacy'.
Data in the columns of these activities shows the change of these activities in percentages compared to baseline.
I chose these activities because they are closely related. To perform these activities outdoors you need to go to stores.
Usually different types of stores are located in a shopping centre or area. Therefore, I expect that when one of the two activities shows a peak, the other one might also show a peak around the same time.
To avoid copying or writing the names of the activities every time, a list was made with the names of the columns of the activities.
act_list = ['retail_and_recreation_percent_change_from_baseline', 'grocery_and_pharmacy_percent_change_from_baseline',
'parks_percent_change_from_baseline', 'transit_stations_percent_change_from_baseline',
'workplaces_percent_change_from_baseline', 'residential_percent_change_from_baseline'
]
# INITIALISATION
# Use find_peaks to determine the indices of the peaks of the first activity
peaks_act, _ = find_peaks(df_nz[act_list[0]], height=10, distance=5)
# Use find_peaks to determine the indices of the peaks of the second activity
peaks_act_two, _ = find_peaks(df_nz[act_list[1]], height=10, distance=5)
# Use find_peaks to determine the indices of the valleys of the first activity
valleys_act, _ = find_peaks(-(df_nz[act_list[0]]), height=10, distance=5)
# Use find_peaks to determine the indices of the valleys of the second activity
valleys_act_two, _ = find_peaks(-(df_nz[act_list[1]]), height=10, distance=5)
# Create empty lists
match_act = []
match_act_two = []
date_list_act = []
# Retrieve only the date column from dataframe 'df_nz'
df_date = df_nz['date']
# Set the offset
offset = 2
# Execute the if-statements
for i in peaks_act:
    for j in peaks_act_two:
        if i == j:
            match_act.append(i)
            match_act_two.append(j)
            date_list_act.append(df_date.iloc[i])
        elif (i - offset) == j:
            match_act.append(i)
            match_act_two.append(j)
            date_list_act.append(df_date.iloc[i-offset])
        elif (i + offset) == j:
            match_act.append(i)
            match_act_two.append(j)
            date_list_act.append(df_date.iloc[i+offset])
# Convert the lists to arrays
df_max_act = np.array(df_nz[act_list[0]].iloc[match_act])
df_max_act_two = np.array(df_nz[act_list[1]].iloc[match_act_two])
date_list_act = np.array(date_list_act)
# Make a dataframe from all the arrays
df_max_act_all = pd.DataFrame({'date': date_list_act, act_list[0]: df_max_act, act_list[1]: df_max_act_two})
# Display all common peaks of both activities found with the offset algorithm
df_max_act_all
| | date | retail_and_recreation_percent_change_from_baseline | grocery_and_pharmacy_percent_change_from_baseline |
|---|---|---|---|
| 0 | 2021-04-01 | 22.0 | 40.0 |
| 1 | 2021-04-26 | 19.0 | 12.0 |
| 2 | 2021-05-08 | 22.0 | 15.0 |
| 3 | 2021-05-15 | 21.0 | 11.0 |
| 4 | 2021-05-20 | 21.0 | 11.0 |
| 5 | 2021-06-05 | 23.0 | 15.0 |
| 6 | 2021-06-10 | 20.0 | 11.0 |
| 7 | 2021-06-17 | 18.0 | 10.0 |
| 8 | 2021-07-03 | 22.0 | 12.0 |
| 9 | 2021-07-10 | 22.0 | 13.0 |
| 10 | 2021-07-15 | 16.0 | 15.0 |
| 11 | 2021-07-22 | 20.0 | 15.0 |
| 12 | 2021-07-31 | 20.0 | 13.0 |
| 13 | 2021-08-07 | 21.0 | 13.0 |
| 14 | 2021-08-12 | 22.0 | 14.0 |
# PLOT THE DATAFRAME
# Plot the dataframe with the common peaks between the two activities
fig = px.scatter(df_max_act_all, x='date', y=[act_list[0], act_list[1]],
title = 'Common peaks between two activities',
labels={'date': 'Date', 'value': 'Percentage change', 'variable': 'Type of mobility data'})
# Move the location of the legend to the upper right corner
fig.update_layout(legend=dict(yanchor='top', y=1.25, xanchor='right', x=0.99))
# Show the graph
fig.show()
# Create empty lists
match_act = []
match_act_two = []
date_list_act = []
# Retrieve only the date column from dataframe 'df_nz'
df_date = df_nz['date']
# Set the offset
offset = 2
# Execute the if-statements
for i in valleys_act:
    for j in valleys_act_two:
        if i == j:
            match_act.append(i)
            match_act_two.append(j)
            date_list_act.append(df_date.iloc[i])
        elif (i - offset) == j:
            match_act.append(i)
            match_act_two.append(j)
            date_list_act.append(df_date.iloc[i-offset])
        elif (i + offset) == j:
            match_act.append(i)
            match_act_two.append(j)
            date_list_act.append(df_date.iloc[i+offset])
# Convert the lists to arrays
df_min_act = np.array(df_nz[act_list[0]].iloc[match_act])
df_min_act_two = np.array(df_nz[act_list[1]].iloc[match_act_two])
date_list_act = np.array(date_list_act)
# Make a dataframe from all the arrays
df_min_act_all = pd.DataFrame({'date': date_list_act, act_list[0]: df_min_act, act_list[1]: df_min_act_two})
# Display all common valleys of both activities found with the offset algorithm
df_min_act_all
| | date | retail_and_recreation_percent_change_from_baseline | grocery_and_pharmacy_percent_change_from_baseline |
|---|---|---|---|
| 0 | 2020-03-26 | -92.0 | -58.0 |
| 1 | 2020-04-10 | -95.0 | -91.0 |
| 2 | 2020-04-25 | -89.0 | -53.0 |
| 3 | 2020-05-03 | -87.0 | -38.0 |
| 4 | 2020-05-25 | -29.0 | -16.0 |
| 5 | 2020-05-31 | -29.0 | -13.0 |
| 6 | 2020-06-08 | -21.0 | -14.0 |
| 7 | 2020-06-24 | -16.0 | -10.0 |
| 8 | 2020-06-29 | -17.0 | -13.0 |
| 9 | 2020-08-03 | -10.0 | -10.0 |
| 10 | 2020-08-30 | -25.0 | -11.0 |
| 11 | 2020-09-07 | -13.0 | -10.0 |
| 12 | 2020-12-25 | -83.0 | -89.0 |
| 13 | 2021-01-01 | -32.0 | -26.0 |
| 14 | 2021-02-15 | -27.0 | -13.0 |
| 15 | 2021-04-02 | -45.0 | -68.0 |
# PLOT THE DATAFRAME
# Plot the dataframe with the common valleys between the two activities
fig = px.scatter(df_min_act_all, x='date', y=[act_list[0], act_list[1]],
title = 'Common valleys between two activities',
labels={'date': 'Date', 'value': 'Percentage change', 'variable': 'Type of mobility data'})
# Move the location of the legend to the upper right corner
fig.update_layout(legend=dict(yanchor='top', y=1.25, xanchor='right', x=0.99))
# Show the graph
fig.show()
The two types of Covid-19 data that are chosen for comparing the common peaks and valleys are 'New tests' and 'New cases'.
Data in the columns of these types show the numbers of the new tests and the new cases of Covid-19, respectively.
I chose these types of Covid-19 data because I think they could be related.
If there is a sharp increase (peak) in new tests being done, I would expect that there could also be an increase in new cases.
This of course does depend on how fast the results of the tests are provided and registered.
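To make the delay argument concrete, the sketch below estimates at which lag two series line up best by shifting one against the other and computing the Pearson correlation for each shift. The series are synthetic (the "cases" are simply the "tests" delayed by two days), and the helper `lagged_corr` is a hypothetical name that does not appear in the notebook.

```python
import pandas as pd

# Synthetic data: 'cases' repeats 'tests' two days later
tests = pd.Series([1, 2, 3, 4, 5, 4, 3, 2], dtype=float)
cases = tests.shift(2)

def lagged_corr(a, b, lag):
    """Pearson correlation between a[t] and b[t + lag]; NaN pairs are ignored."""
    return a.corr(b.shift(-lag))

# Try lags of 0 to 3 days and keep the one with the highest correlation
best = max(range(0, 4), key=lambda k: lagged_corr(tests, cases, k))
print(best)
```

On real data the best lag would only be a rough indication of the reporting delay, since both series are noisy and contain missing days.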
# INITIALISATION
# Set the first type of Covid-19 data
covid_data = 'new_tests'
# Set the second type of Covid-19 data
covid_data_two = 'new_cases'
# Use find_peaks to determine the indices of the peaks of the first type of Covid-19 data
peaks_cov, _ = find_peaks(df_nz[covid_data], height=1000, distance=5)
# Use find_peaks to determine the indices of the peaks of the second type of Covid-19 data
peaks_cov_two, _ = find_peaks(df_nz[covid_data_two], height=10, distance=5)
# Use find_peaks to determine the indices of the valleys of the first type of Covid-19 data
# Note: on the negated series, height=10 only keeps valleys where the original value is <= -10
valleys_cov, _ = find_peaks(-(df_nz[covid_data]), height=10, distance=5)
# Use find_peaks to determine the indices of the valleys of the second type of Covid-19 data
# Note: on the negated series, height=1 only keeps valleys where the original value is <= -1
valleys_cov_two, _ = find_peaks(-(df_nz[covid_data_two]), height=1, distance=5)
# Create empty lists
match_cov = []
match_cov_two = []
date_list_cov = []
# Retrieve only the date column from dataframe 'df_nz'
df_date = df_nz['date']
# Set the offset
offset = 2
# Execute the if-statements
for i in peaks_cov:
    for j in peaks_cov_two:
        if i == j:
            match_cov.append(i)
            match_cov_two.append(j)
            date_list_cov.append(df_date.iloc[i])
        elif (i - offset) == j:
            match_cov.append(i)
            match_cov_two.append(j)
            date_list_cov.append(df_date.iloc[i-offset])
        elif (i + offset) == j:
            match_cov.append(i)
            match_cov_two.append(j)
            date_list_cov.append(df_date.iloc[i+offset])
# Convert the lists to arrays
df_max_cov = np.array(df_nz[covid_data].iloc[match_cov])
df_max_cov_two = np.array(df_nz[covid_data_two].iloc[match_cov_two])
date_list_cov = np.array(date_list_cov)
# Make a dataframe from all the arrays
df_max_cov_all = pd.DataFrame({'date': date_list_cov, covid_data: df_max_cov, covid_data_two: df_max_cov_two})
# Display all common peaks of both types of Covid-19 data found with the offset algorithm
df_max_cov_all
| | date | new_tests | new_cases |
|---|---|---|---|
| 0 | 2020-03-22 | 2192.0 | 50.0 |
| 1 | 2020-09-30 | 6359.0 | 12.0 |
| 2 | 2021-07-12 | 15348.0 | 18.0 |
| 3 | 2021-09-21 | 19194.0 | 24.0 |
# PLOT THE DATAFRAME
# Plot the dataframe with the common peaks between the two types of Covid-19 data
fig = px.scatter(df_max_cov_all, x='date', y=[covid_data, covid_data_two],
title = 'Common peaks between two types of Covid-19 data',
labels={'date': 'Date', 'value': 'Quantity', 'variable': 'Type of Covid-19 data'})
# Move the location of the legend to the upper right corner
fig.update_layout(legend=dict(yanchor='top', y=1.25, xanchor='right', x=0.99))
# Show the graph
fig.show()
# Create empty lists
match_cov = []
match_cov_two = []
date_list_cov = []
# Retrieve only the date column from dataframe 'df_nz'
df_date = df_nz['date']
# Set the offset
offset = 2
# Execute the if-statements
for i in valleys_cov:
    for j in valleys_cov_two:
        if i == j:
            match_cov.append(i)
            match_cov_two.append(j)
            date_list_cov.append(df_date.iloc[i])
        elif (i - offset) == j:
            match_cov.append(i)
            match_cov_two.append(j)
            date_list_cov.append(df_date.iloc[i-offset])
        elif (i + offset) == j:
            match_cov.append(i)
            match_cov_two.append(j)
            date_list_cov.append(df_date.iloc[i+offset])
# Convert the lists to arrays
df_min_cov = np.array(df_nz[covid_data].iloc[match_cov])
df_min_cov_two = np.array(df_nz[covid_data_two].iloc[match_cov_two])
date_list_cov = np.array(date_list_cov)
# Make a dataframe from all the arrays
df_min_cov_all = pd.DataFrame({'date': date_list_cov, covid_data: df_min_cov, covid_data_two: df_min_cov_two})
# Display all common valleys of both types of Covid-19 data found with the offset algorithm
df_min_cov_all
| date | new_tests | new_cases |
|---|---|---|
# PLOT THE DATAFRAME
# Plot the dataframe with the common valleys between the two types of Covid-19 data
fig = px.scatter(df_min_cov_all, x='date', y=[covid_data, covid_data_two],
title = 'Common valleys between two types of Covid-19 data',
labels={'date': 'Date', 'value': 'Quantity', 'variable': 'Type of Covid-19 data'})
# Move the location of the legend to the upper right corner
fig.update_layout(legend=dict(yanchor='top', y=1.25, xanchor='right', x=0.99))
# Show the graph
fig.show()
The activity chosen for comparing the common peaks / valleys is 'Residential' and the type of Covid-19 data is 'New cases'.
Data in the columns of the activity and type of Covid-19 data show the change in residential percentage compared to baseline and the new cases of Covid-19, respectively.
I chose the activity and the type of Covid-19 data because I thought there might be a correlation, since a peak in new cases could result in restrictions, which itself could result in an increase in change in residential percentages compared to baseline.
I used a different method (a built-in method) for the time offset in this question than that of question 3 and 4, since the assignment information said that the offset algorithm only applied to those two questions.
The built-in method uses pandas.Grouper to group dates (and the values corresponding to them) by week with the offset alias W-MON. With pandas' default settings each of these weekly bins ends on a Monday and is labelled with that Monday's date.
After the indices of the peaks or valleys of both the activity and the type of Covid-19 data are found with the scipy function 'find_peaks', the values of the activity and type of Covid-19 data and the dates are retrieved from the 'df_nz' dataframe with the use of the indices.
For each activity and type of Covid-19 data the values and the date column are stored in two separate dataframes.
Then, for each of the two newly made dataframes the date column is converted to datetime format.
The data from each of these dataframes are grouped per week with the W-MON alias.
The largest value of the activity / type of Covid-19 data is chosen when it concerns peaks, otherwise in the case of valleys the smallest value is chosen.
All the values are stored in a new dataframe.
Hereafter, the NaN values of the newly created dataframe are dropped.
The two dataframes with the grouped values are then merged into one dataframe on the date column resulting in a dataframe
with the common peaks / valleys.
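The weekly grouping step can be sketched on a few made-up peak values. The column name 'value' and the numbers are assumptions for illustration only. The example also shows how W-MON bins behave with pandas' defaults: each bin runs from Tuesday up to and including Monday and carries the Monday's date as its label.

```python
import pandas as pd

# Made-up peaks: one on Monday 4 Jan 2021, one on Tuesday 5 Jan, one on Monday 11 Jan
peaks = pd.DataFrame({
    'date': pd.to_datetime(['2021-01-04', '2021-01-05', '2021-01-11']),
    'value': [5.0, 7.0, 6.0],
})

# Group into W-MON bins, keep the largest peak per bin and drop empty bins
weekly = (peaks.groupby(pd.Grouper(freq='W-MON', key='date'))['value']
               .max()
               .dropna()
               .reset_index())
print(weekly)
```

The Tuesday and the following Monday end up in the same bin, so two peaks from different series are treated as common when they fall in the same Tuesday-to-Monday window, even if they are up to six days apart.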
# INITIALISATION
# Set the type of Covid-19 data
covid_data = 'new_cases'
# Use find_peaks to determine the indices of the peaks of the activity
peaks_actcov, _ = find_peaks(df_nz[act_list[5]], height=20, distance=5)
# Use find_peaks to determine the indices of the peaks of the type of Covid-19 data
peaks_actcov_two, _ = find_peaks(df_nz[covid_data], height=10, distance=5)
# Use find_peaks to determine the indices of the valleys of the activity
valleys_actcov, _ = find_peaks(-(df_nz[act_list[5]]), height=1, distance=5)
# Use find_peaks to determine the indices of the valleys of the type of Covid-19 data
# Note: on the negated series, height=1 only keeps valleys where the original value is <= -1
valleys_actcov_two, _ = find_peaks(-(df_nz[covid_data]), height=1, distance=5)
# Assign the data related to the peaks of the activity to a dataframe (copied so the date column can be converted)
df_max_actcov = df_nz[[act_list[5], 'date']].iloc[peaks_actcov].copy()
# Assign the data related to the peaks of the type of Covid-19 data to a dataframe (copied for the same reason)
df_max_actcov_two = df_nz[[covid_data, 'date']].iloc[peaks_actcov_two].copy()
# Group the peaks of the activity into weekly bins (W-MON) and drop all NaN values
df_max_actcov['date'] = pd.to_datetime(df_max_actcov['date'])
df_group_actcov = df_max_actcov.groupby(pd.Grouper(freq='W-MON', key='date'))[act_list[5]].max().to_frame(act_list[5]).reset_index()
df_group_actcov = df_group_actcov.dropna(thresh=2)
# Group the peaks of the type of Covid-19 data into weekly bins (W-MON) and drop all NaN values
df_max_actcov_two['date'] = pd.to_datetime(df_max_actcov_two['date'])
df_group_actcov_two = df_max_actcov_two.groupby(pd.Grouper(freq='W-MON', key='date'))[covid_data].max().to_frame(covid_data).reset_index()
df_group_actcov_two = df_group_actcov_two.dropna(thresh=2)
# Merge both weekly groups to find the common peaks
df_merge_max_actcov = pd.merge(df_group_actcov, df_group_actcov_two, on='date')
# Display all common peaks of the activity and the type of Covid-19 data, grouped by week
df_merge_max_actcov
| | date | residential_percent_change_from_baseline | new_cases |
|---|---|---|---|
| 0 | 2020-03-30 | 38.0 | 85.0 |
| 1 | 2020-04-06 | 37.0 | 89.0 |
| 2 | 2020-04-13 | 42.0 | 44.0 |
| 3 | 2020-04-20 | 37.0 | 20.0 |
| 4 | 2021-08-23 | 38.0 | 32.0 |
| 5 | 2021-08-30 | 38.0 | 84.0 |
# PLOT THE DATAFRAME
# Plot the dataframe with the common peaks between the activity and the type of Covid-19 data
fig = px.scatter(df_merge_max_actcov, x='date', y=[act_list[5], covid_data],
title = 'Common peaks between an activity and a type of Covid-19 data',
labels={'date': 'Date', 'value': 'Percentage change / number of cases',
'variable': 'Type of mobility data / type of Covid-19 data'})
# Move the location of the legend to the upper right corner
fig.update_layout(legend=dict(yanchor='top', y=1.25, xanchor='right', x=1.05))
# Show the graph
fig.show()
Analysis:
From the plot above it can be seen that the common peaks are concentrated in two periods: from the end of March through April 2020, and at the end of August 2021.
According to Wikipedia, New Zealand introduced its alert level system on 21 March 2020 and moved to alert level 4, which corresponds to a lockdown, on 25 March 2020; it moved to alert level 4 again on 17 August 2021. This explains the increase in residential percentages compared to baseline around these periods.
The first lockdown was initiated in response to an increase in new cases, which could imply an indirect relationship between new cases and the change in residential percentage: increase in new cases → lockdown → increase in change of residential percentage.
However, this was not the case with the second lockdown, since that lockdown was issued because of a single detected case of the Delta variant (according to Wikipedia). A possible explanation for the peaks in new cases in that period is that, after the discovery of this case, more people were tested, which resulted in more detected cases. In that case there is no direct or indirect relationship between new cases and the changes in residential percentages compared to baseline for this period.
Wikipedia source:
https://en.wikipedia.org/wiki/COVID-19_alert_levels_in_New_Zealand
# Calculate the covariance matrix
covariance = np.cov(df_merge_max_actcov[act_list[5]], df_merge_max_actcov[covid_data])
print('The covariance matrix is:')
print()
print(covariance)
print()
# Calculate the Pearson's correlation coefficient
pearson, _ = pearsonr(df_merge_max_actcov[act_list[5]], df_merge_max_actcov[covid_data])
print('The Pearson\'s correlation coefficient is: %.3f' % pearson)
The covariance matrix is:

[[  3.46666667 -10.2       ]
 [-10.2        935.2       ]]

The Pearson's correlation coefficient is: -0.179
Analysis:
The covariance matrix and the Pearson's correlation coefficient are used to measure the linear correlation between two variables.
From the results of the statistical analysis above, it can be concluded that there is no positive linear correlation between the peaks of the new cases and the residential percentage compared to baseline; the coefficient of -0.179 indicates at most a weak negative one. With only six common peaks, this estimate is very uncertain.
As said before, there might however be an indirect relationship.
That indirect relationship is only causal in the sense that an increase in new cases → lockdown → increase in change of residential percentage.
It does not determine the values themselves. For example, an increase of 50 new cases does not mean an increase of 50 % in the change of residential percentage compared to baseline.
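As a check on how the two statistics relate, the sketch below recomputes the Pearson coefficient from the covariance matrix, using the six common peaks listed in the table above. The coefficient is the covariance divided by the product of the two standard deviations, which are the square roots of the diagonal entries.

```python
import numpy as np

# Values of the six common peaks from the table of common peaks
residential = np.array([38.0, 37.0, 42.0, 37.0, 38.0, 38.0])
new_cases = np.array([85.0, 89.0, 44.0, 20.0, 32.0, 84.0])

# Sample covariance matrix: variances on the diagonal, covariance off the diagonal
cov = np.cov(residential, new_cases)

# Pearson's r = cov(x, y) / (std(x) * std(y))
r = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
print(round(r, 3))  # -0.179
```

The covariance of -10.2 looks large next to the residential variance of about 3.47, but dividing by both standard deviations puts it on the scale from -1 to 1, where it turns out to be small.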
# Assign the data related to the valleys of the activity to a dataframe (copied so the date column can be converted)
df_min_actcov = df_nz[[act_list[5], 'date']].iloc[valleys_actcov].copy()
# Assign the data related to the valleys of the type of Covid-19 data to a dataframe (copied for the same reason)
df_min_actcov_two = df_nz[[covid_data, 'date']].iloc[valleys_actcov_two].copy()
# Group the valleys of the activity into weekly bins (W-MON) and drop all NaN values
df_min_actcov['date'] = pd.to_datetime(df_min_actcov['date'])
df_group_actcov = df_min_actcov.groupby(pd.Grouper(freq='W-MON', key='date'))[act_list[5]].min().to_frame(act_list[5]).reset_index()
df_group_actcov = df_group_actcov.dropna(thresh=2)
# Group the valleys of the type of Covid-19 data into weekly bins (W-MON) and drop all NaN values
df_min_actcov_two['date'] = pd.to_datetime(df_min_actcov_two['date'])
df_group_actcov_two = df_min_actcov_two.groupby(pd.Grouper(freq='W-MON', key='date'))[covid_data].min().to_frame(covid_data).reset_index()
df_group_actcov_two = df_group_actcov_two.dropna(thresh=2)
# Merge both weekly groups to find the common valleys
df_merge_min_actcov = pd.merge(df_group_actcov, df_group_actcov_two, on='date')
# Display all common valleys of the activity and the type of Covid-19 data, grouped by week
df_merge_min_actcov
| date | residential_percent_change_from_baseline | new_cases |
|---|---|---|
# PLOT THE DATAFRAME
# Plot the dataframe with the common valleys between the activity and the type of Covid-19 data
fig = px.scatter(df_merge_min_actcov, x='date', y=[act_list[5], covid_data],
title = 'Common valleys between an activity and a type of Covid-19 data',
labels={'date': 'Date', 'value': 'Percentage change / number of cases',
'variable': 'Type of mobility data / type of Covid-19 data'})
# Move the location of the legend to the upper right corner
fig.update_layout(legend=dict(yanchor='top', y=1.25, xanchor='right', x=1.05))
# Show the graph
fig.show()
Analysis:
Since no common valleys were found, no further analysis was performed.
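One possible reason for the empty result, stated here as an assumption rather than a checked diagnosis of this dataset: find_peaks applies `height` to the series it receives, so on a negated series `height=1` requires the original value to be at most -1. Daily case counts are normally non-negative, so such a threshold would reject every valley. The sketch below uses made-up numbers to show the effect.

```python
import numpy as np
from scipy.signal import find_peaks

# Made-up daily counts with local minima at positions 2 and 6
cases = np.array([5, 3, 0, 4, 6, 2, 1, 3, 7, 4], dtype=float)

# height=1 on the negated series demands -cases >= 1, i.e. cases <= -1
valleys_strict, _ = find_peaks(-cases, height=1)

# Without a height threshold every interior local minimum is reported
valleys_all, _ = find_peaks(-cases)

print(valleys_strict)  # no valleys pass the threshold
print(valleys_all)     # positions of the local minima
```

If valleys in a non-negative series are wanted, the threshold could be expressed on the negated scale (for example `height=-5` to keep valleys with at most 5 cases) or left out and replaced by `prominence`. Either choice would change the results reported above.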
I chose to visualise the mobility and (several parameters of) the Covid-19 data of New Zealand and Australia, in order to compare the two countries. With this comparison the difference between mobility trends and the Covid-19 effects and response can be seen and analysed.
Although the data from New Zealand is ready for use, the data of Australia also needs to be prepared and made ready for use.
The data of the mobility reports of Australia ('2020_AU_Region_Mobility_Report.csv', '2021_AU_Region_Mobility_Report.csv') were retrieved from Google and stored in the folder with the assignment and the other data from New Zealand. The Covid-19 data of Australia was retrieved from the same OurWorldInData file as that of New Zealand (owid-covid-data.csv).
# Read the 2020 and 2021 Google mobility data of Australia as dataframes
df_2020 = pd.read_csv('2020_AU_Region_Mobility_Report.csv')
df_2021 = pd.read_csv('2021_AU_Region_Mobility_Report.csv')
# Combine the two dataframes into one
df_aus = pd.concat([df_2020,df_2021]).reset_index(drop=True)
# Select only the country data of Australia
df_mobi_aus = df_aus[df_aus['sub_region_1'].isna() & df_aus['sub_region_2'].isna()]
# Clean the data by dropping columns of the dataframe that will not be used
df_mobi_aus = df_mobi_aus.drop(columns=['country_region_code', 'country_region', 'sub_region_1',
'sub_region_2','metro_area', 'iso_3166_2_code',
'census_fips_code', 'place_id'])
# Select only the Australian Covid-19 data from the OurWorldInData file
df_covid_aus = df_covid[df_covid.loc[:, 'location'] == 'Australia']
# Clean the data by dropping columns of the dataframe that will not be used
df_covid_aus = df_covid_aus.drop(columns=['iso_code', 'continent', 'location'])
In order to compare both dataframes, visually but also statistically, both dataframes need to cover the same time period. Therefore two if-statements test which dataframe is shorter, and the other dataframe is then truncated to the same length. Since the dates are in chronological order, equal lengths also give the same time period, provided both dataframes start on the same date and have no missing days.
if len(df_mobi_nz) < len(df_mobi_aus):
    df_mobi_aus = df_mobi_aus.iloc[0:len(df_mobi_nz)]
elif len(df_mobi_nz) > len(df_mobi_aus):
    df_mobi_nz = df_mobi_nz.iloc[0:len(df_mobi_aus)]
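An alternative that does not rely on equal start dates is to keep only the dates that occur in both dataframes. The sketch below shows this on two toy frames; the column 'parks' and all values are made up, and the frames merely stand in for 'df_mobi_nz' and 'df_mobi_aus'.

```python
import pandas as pd

# Toy stand-ins with overlapping but not identical date ranges
nz = pd.DataFrame({'date': ['2020-02-15', '2020-02-16', '2020-02-17'],
                   'parks': [1, 2, 3]})
aus = pd.DataFrame({'date': ['2020-02-16', '2020-02-17', '2020-02-18'],
                    'parks': [4, 5, 6]})

# Dates present in both frames
common = sorted(set(nz['date']) & set(aus['date']))

# Keep only the shared dates in each frame
nz_aligned = nz[nz['date'].isin(common)].reset_index(drop=True)
aus_aligned = aus[aus['date'].isin(common)].reset_index(drop=True)
print(nz_aligned)
print(aus_aligned)
```

After this step both frames cover exactly the same dates, so row i in one frame refers to the same day as row i in the other.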
All types of mobility data of New Zealand
# Transform the dataframe into a structure that can better be read by plotly.express
nz_names = [nz for nz in df_mobi_nz.columns if nz != 'date']
df_nz_melt = df_mobi_nz.melt(id_vars=['date'], var_name='name', value_vars=nz_names)
# Plot the dataframe with the different types of mobility data
fig = px.line(df_nz_melt, 'date', 'value', color='name', symbol='name',
title = 'Mobility data of New Zealand during 2020/2021',
labels={'date': 'Date', 'value': 'Percentage change', 'name': 'Type of mobility data'})
# Move the location of the legend to the upper right corner, above the plot
fig.update_layout(legend=dict(yanchor='top', y=1.5, xanchor='right', x=0.99))
# Show the graph
fig.show()
Pearson's correlation coefficient - New Zealand
# Make a separate dataframe to exclude the date column in order to perform the computation
df_mobi_nz_pear = df_mobi_nz.iloc[: , 1:]
# Compute the Pearson's correlation coefficient
df_mobi_nz_hm = df_mobi_nz_pear.corr(method='pearson')
# Make a heatmap to display the results
sns.heatmap(df_mobi_nz_hm, annot=True)
plt.title('Pearson\'s correlation coefficient - New Zealand')
Analysis:
From the heatmap above, several strong correlations can be seen.
Between the 'Residential' activity and the three activities 'Retail and recreation', 'Public transport' and 'Workplaces' there is a strong negative correlation. When these three activities, which take place outside the home, decrease, 'Residential', which takes place at home, increases.
Between 'Retail and recreation' and the two activities 'Grocery and pharmacy' and 'Public transport' there is a strong positive correlation. When people go shopping in retail stores, they often also shop for groceries and might use public transport to get to the stores.
Between 'Workplaces' and 'Public transport' there is a strong correlation. When people go to work, they often use public transport.
All types of mobility data of Australia
# Transform the dataframe into a structure that can better be read by plotly.express
aus_names = [aus for aus in df_mobi_aus.columns if aus != 'date']
df_aus_melt = df_mobi_aus.melt(id_vars=['date'], var_name='name', value_vars=aus_names)
# Plot the dataframe with the different types of mobility data
fig = px.line(df_aus_melt, 'date', 'value', color='name', symbol='name', title = 'Mobility data of Australia during 2020/2021',
labels={'date': 'Date', 'value': 'Percentage change', 'name': 'Type of mobility data'})
# Move the location of the legend to the upper right corner, above the plot
fig.update_layout(legend=dict(yanchor='top', y=1.5, xanchor='right', x=0.99))
# Show the graph
fig.show()
Pearson's correlation coefficient - Australia
# Make a separate dataframe to exclude the date column in order to perform the computation
df_mobi_aus_srcc = df_mobi_aus.iloc[: , 1:]
# Compute the Pearson's correlation coefficient
df_mobi_aus_hm = df_mobi_aus_srcc.corr(method='pearson')
# Make a heatmap to display the results
sns.heatmap(df_mobi_aus_hm, annot=True)
plt.title('Pearson\'s correlation coefficient - Australia')
Analysis:
From the heatmap above, several strong correlations can be seen.
Between the 'Residential' activity and the three activities 'Retail and recreation', 'Public transport' and 'Workplaces' there is a strong negative correlation. When these three activities, which take place outside the home, decrease, 'Residential', which takes place at home, increases.
Between 'Workplaces' and 'Public transport' there is a strong correlation. When people go to work, they often use public transport.
Same types of mobility data for New Zealand and Australia
# RETAIL AND RECREATION
# Plot the 'Retail and recreation' percentage change of New Zealand and Australia
ax = df_mobi_nz.plot(x = 'date', y = act_list[0], label='New Zealand',
figsize=([15,8]))
df_mobi_aus.plot(x = 'date', y = act_list[0], label='Australia', ax=ax)
# Set the title and labels of x and y-axis
plt.title('Retail and recreation', fontsize=18)
plt.xlabel('Date', fontsize=15)
plt.ylabel('Percentage change from baseline', fontsize=15)
# Set the ticks of the x and y-axis
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
# Set the x-ticks to the left
plt.setp(ax.xaxis.get_majorticklabels(), ha='left' )
# Set a grid
plt.grid(True, which='major', axis='y')
# Change the size and location of the legend
plt.legend(loc=2, prop={'size': 12})
# GROCERY AND PHARMACY
# Plot the 'Grocery and pharmacy' percentage change of New Zealand and Australia
ax = df_mobi_nz.plot(x = 'date', y = act_list[1], label='New Zealand',
figsize=([15,8]))
df_mobi_aus.plot(x = 'date', y = act_list[1], label='Australia', ax=ax)
# Set the title and labels of x and y-axis
plt.title('Grocery and pharmacy', fontsize=18)
plt.xlabel('Date', fontsize=15)
plt.ylabel('Percentage change from baseline', fontsize=15)
# Set the ticks of the x and y-axis
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
# Set the x-ticks to the left
plt.setp(ax.xaxis.get_majorticklabels(), ha='left' )
# Set a grid
plt.grid(True, which='major', axis='y')
# Change the size and location of the legend
plt.legend(prop={'size': 12})
# PARKS
# Plot the 'Parks' percentage change of New Zealand and Australia
ax = df_mobi_nz.plot(x = 'date', y =act_list[2], label='New Zealand', figsize=([15,8]))
df_mobi_aus.plot(x = 'date', y =act_list[2], label='Australia', ax=ax)
# Set the title and labels of x and y-axis
plt.title('Parks', fontsize=18)
plt.xlabel('Date', fontsize=15)
plt.ylabel('Percentage change from baseline', fontsize=15)
# Set the ticks of the x and y-axis
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
# Left-align the x-tick labels
plt.setp(ax.xaxis.get_majorticklabels(), ha='left')
# Show horizontal grid lines at the major y-ticks
plt.grid(True, which='major', axis='y')
# Change the font size of the legend
plt.legend(prop={'size': 12})
# PUBLIC TRANSPORT
# Plot the 'Public transport' percentage change of New Zealand and Australia
ax = df_mobi_nz.plot(x = 'date', y =act_list[3], label='New Zealand', figsize=([15,8]))
df_mobi_aus.plot(x = 'date', y =act_list[3], label='Australia', ax=ax)
# Set the title and labels of x and y-axis
plt.title('Public transport', fontsize=18)
plt.xlabel('Date', fontsize=15)
plt.ylabel('Percentage change from baseline', fontsize=15)
# Set the ticks of the x and y-axis
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
# Left-align the x-tick labels
plt.setp(ax.xaxis.get_majorticklabels(), ha='left')
# Show horizontal grid lines at the major y-ticks
plt.grid(True, which='major', axis='y')
# Change the font size of the legend
plt.legend(prop={'size': 12})
# WORKPLACES
# Plot the 'Workplaces' percentage change of New Zealand and Australia
ax = df_mobi_nz.plot(x = 'date', y =act_list[4], label='New Zealand', figsize=([15,8]))
df_mobi_aus.plot(x = 'date', y =act_list[4], label='Australia', ax=ax)
# Set the title and labels of x and y-axis
plt.title('Workplaces', fontsize=18)
plt.xlabel('Date', fontsize=15)
plt.ylabel('Percentage change from baseline', fontsize=15)
# Set the ticks of the x and y-axis
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
# Left-align the x-tick labels
plt.setp(ax.xaxis.get_majorticklabels(), ha='left')
# Show horizontal grid lines at the major y-ticks
plt.grid(True, which='major', axis='y')
# Change the size and location of the legend
plt.legend(loc=4, prop={'size': 12})
# RESIDENTIAL
# Plot the 'Residential' percentage change of New Zealand and Australia
ax = df_mobi_nz.plot(x = 'date', y =act_list[5], label='New Zealand', figsize=([15,8]))
df_mobi_aus.plot(x = 'date', y =act_list[5], label='Australia', ax=ax)
# Set the title and labels of x and y-axis
plt.title('Residential', fontsize=18)
plt.xlabel('Date', fontsize=15)
plt.ylabel('Percentage change from baseline', fontsize=15)
# Set the ticks of the x and y-axis
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
# Left-align the x-tick labels
plt.setp(ax.xaxis.get_majorticklabels(), ha='left')
# Show horizontal grid lines at the major y-ticks
plt.grid(True, which='major', axis='y')
# Change the font size of the legend
plt.legend(prop={'size': 12})
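The six plotting cells above differ only in the column and the title, so they could be folded into one function. The following is a minimal sketch of that idea, not the code used for the figures above: `plot_activity`, `demo_nz`, `demo_aus` and the numbers in them are hypothetical stand-ins for `df_mobi_nz` and `df_mobi_aus`.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_activity(df_a, df_b, column, title, labels=("New Zealand", "Australia")):
    """Plot one mobility column of two dataframes on shared axes (hypothetical helper)."""
    ax = df_a.plot(x="date", y=column, label=labels[0], figsize=(15, 8))
    df_b.plot(x="date", y=column, label=labels[1], ax=ax)
    # Title, axis labels and tick sizes as in the cells above
    ax.set_title(title, fontsize=18)
    ax.set_xlabel("Date", fontsize=15)
    ax.set_ylabel("Percentage change from baseline", fontsize=15)
    ax.tick_params(axis="both", labelsize=12)
    # Left-align the x-tick labels and show horizontal grid lines
    plt.setp(ax.xaxis.get_majorticklabels(), ha="left")
    ax.grid(True, which="major", axis="y")
    ax.legend(prop={"size": 12})
    return ax


# Tiny synthetic stand-ins for the two mobility dataframes (made-up numbers)
dates = pd.date_range("2020-03-01", periods=5, freq="D")
demo_nz = pd.DataFrame({"date": dates, "parks_percent_change_from_baseline": [0, -5, -20, -40, -35]})
demo_aus = pd.DataFrame({"date": dates, "parks_percent_change_from_baseline": [2, -3, -10, -25, -30]})

# One entry per activity; only one is shown here
titles = {"parks_percent_change_from_baseline": "Parks"}
axes = {col: plot_activity(demo_nz, demo_aus, col, title) for col, title in titles.items()}
```

With the real data, `titles` would map each entry of `act_list` to its display name, and the loop would replace the six separate cells.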
Pearson's correlation coefficient - Same types of mobility data of New Zealand and Australia
# Calculate the Pearson's correlation coefficient
pear = []
for i in act_list:
    pearson, _ = pearsonr(df_mobi_nz[i], df_mobi_aus[i])
    pear.append(pearson)
# Convert the list of activities and the list of Pearson's coefficients to a dataframe
df_stat = pd.DataFrame({'Activity':act_list, 'Pearson\'s correlation coefficient':pear})
# Display the dataframe
df_stat
| | Activity | Pearson's correlation coefficient |
|---|---|---|
| 0 | retail_and_recreation_percent_change_from_base... | 0.764589 |
| 1 | grocery_and_pharmacy_percent_change_from_baseline | 0.758653 |
| 2 | parks_percent_change_from_baseline | 0.488620 |
| 3 | transit_stations_percent_change_from_baseline | 0.721324 |
| 4 | workplaces_percent_change_from_baseline | 0.677701 |
| 5 | residential_percent_change_from_baseline | 0.752879 |
Analysis:
The highest values of Pearson's correlation coefficient (between 0.72 and 0.76) are found for the activities 'Retail and recreation', 'Grocery and pharmacy', 'Public transport', and 'Residential'.
This implies that when the percentage change for these activities increased or decreased in New Zealand, the same largely happened in Australia.
Although an increase or decrease in the percentage change of an activity in New Zealand does not cause an increase or decrease in Australia, it does suggest that people in both countries behaved in a similar way, resulting in similar mobility trends for these activities.
For 'Workplaces' (0.68) the correlation was somewhat weaker, and for 'Parks' (0.49) it was a lot weaker.
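`pearsonr` pairs the two series by position, so it assumes that row k of both dataframes refers to the same date and that neither column contains missing values. The sketch below shows one way to make that pairing explicit by merging on the date first. The dataframes `demo_nz` and `demo_aus` and their numbers are made up for illustration and are not the mobility data used above.

```python
import pandas as pd
from scipy.stats import pearsonr

col = "parks_percent_change_from_baseline"

# Synthetic stand-ins: the two series start on different days and one has a gap
demo_nz = pd.DataFrame({"date": pd.date_range("2020-03-01", periods=6, freq="D"),
                        col: [1.0, 2.0, None, 4.0, 5.0, 6.0]})
demo_aus = pd.DataFrame({"date": pd.date_range("2020-03-02", periods=6, freq="D"),
                         col: [5.0, 7.0, 9.0, 11.0, 13.0, 15.0]})

# Keep only the dates present in both dataframes, then drop rows with a missing value
paired = demo_nz.merge(demo_aus, on="date", suffixes=("_nz", "_aus")).dropna()

# Pearson's r on the date-aligned, complete pairs
r, p_value = pearsonr(paired[col + "_nz"], paired[col + "_aus"])
print(len(paired), round(r, 3))
```

In this toy example the Australian values are an exact linear function of the New Zealand values on the shared dates, so the coefficient comes out at 1; real mobility data will of course give lower values such as those in the table above.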
To compare both dataframes, visually as well as statistically, they need to cover the same time period. Therefore, an if-statement tests which dataframe is shorter, and the other dataframe is truncated to the same length. Since the dates are in chronological order, equal lengths also result in the same time period, provided both dataframes start on the same date and have no missing days.
if len(df_covid_nz) < len(df_covid_aus):
    df_covid_aus = df_covid_aus.iloc[0:len(df_covid_nz)]
elif len(df_covid_nz) > len(df_covid_aus):
    df_covid_nz = df_covid_nz.iloc[0:len(df_covid_aus)]
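Truncating by length only lines up the periods when both dataframes begin on the same day. As an alternative sketch, the overlap can be taken from the dates themselves. The names `demo_nz`, `demo_aus`, `nz_common` and `aus_common` are hypothetical, and the example assumes the date column has been parsed with `pd.to_datetime` (or holds ISO-formatted strings, which sort the same way).

```python
import pandas as pd

# Synthetic stand-ins: New Zealand starts two days earlier than Australia
demo_nz = pd.DataFrame({"date": pd.date_range("2020-02-28", periods=6, freq="D"),
                        "value": range(6)})
demo_aus = pd.DataFrame({"date": pd.date_range("2020-03-01", periods=6, freq="D"),
                         "value": range(6)})

# The common period runs from the later start date to the earlier end date
start = max(demo_nz["date"].min(), demo_aus["date"].min())
end = min(demo_nz["date"].max(), demo_aus["date"].max())

# Keep only the rows inside the common period (both bounds inclusive)
nz_common = demo_nz[demo_nz["date"].between(start, end)].reset_index(drop=True)
aus_common = demo_aus[demo_aus["date"].between(start, end)].reset_index(drop=True)
```

Here both results cover 2020-03-01 up to and including 2020-03-04, regardless of how many rows each input had.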
To avoid copying or writing out the names of the types of Covid-19 data every time, a list was made with the column names of the Covid-19 data that were to be visualised.
cov_list = ['total_cases_per_million', 'total_deaths_per_million', 'stringency_index']
# NEW ZEALAND
# Select the date column and the first parameter of the cov_list into a new dataframe
df_cov_nz_one = df_covid_nz[['date', cov_list[0]]]
# Change the name of the first parameter of the cov_list
# (rename returns a new dataframe instead of modifying a slice in place)
df_cov_nz_one = df_cov_nz_one.rename(columns={cov_list[0]: 'New Zealand'})
# Transform the dataframe into a structure that can better be read by plotly.express
nz_names_one = [nz for nz in df_cov_nz_one.columns if nz != 'date']
nz_melt_one = df_cov_nz_one.melt(id_vars=['date'], var_name='name', value_vars=nz_names_one)
# AUSTRALIA
# Select the date column and the first parameter of the cov_list into a new dataframe
df_cov_aus_one = df_covid_aus[['date', cov_list[0]]]
# Change the name of the first parameter of the cov_list
# (rename returns a new dataframe instead of modifying a slice in place)
df_cov_aus_one = df_cov_aus_one.rename(columns={cov_list[0]: 'Australia'})
# Transform the dataframe into a structure that can better be read by plotly.express
aus_names_one = [aus for aus in df_cov_aus_one.columns if aus != 'date']
aus_melt_one = df_cov_aus_one.melt(id_vars=['date'], var_name='name', value_vars=aus_names_one)
# MERGE THE TWO DATAFRAMES INTO ONE
df_nz_aus_one= pd.concat([nz_melt_one, aus_melt_one])
# PLOT THE DATAFRAME
# Plot the total cases per million inhabitants of both countries
fig = px.line(df_nz_aus_one, 'date', 'value', color='name', symbol='name',
              title = 'Total cases per million inhabitants',
              labels={'date': 'Date', 'value': 'Number of cases per million inhabitants', 'name': 'Country'})
# Place the legend above the plot area, in the upper right corner
fig.update_layout(legend=dict(yanchor='top', y=1.25, xanchor='right', x=0.99))
# Show the graph
fig.show()
# NEW ZEALAND
# Select the date column and the second parameter of the cov_list into a new dataframe
df_cov_nz_two = df_covid_nz[['date', cov_list[1]]]
# Change the name of the second parameter of the cov_list
# (rename returns a new dataframe instead of modifying a slice in place)
df_cov_nz_two = df_cov_nz_two.rename(columns={cov_list[1]: 'New Zealand'})
# Transform the dataframe into a structure that can better be read by plotly.express
nz_names_two = [nz for nz in df_cov_nz_two.columns if nz != 'date']
nz_melt_two = df_cov_nz_two.melt(id_vars=['date'], var_name='name', value_vars=nz_names_two)
# AUSTRALIA
# Select the date column and the second parameter of the cov_list into a new dataframe
df_cov_aus_two = df_covid_aus[['date', cov_list[1]]]
# Change the name of the second parameter of the cov_list
# (rename returns a new dataframe instead of modifying a slice in place)
df_cov_aus_two = df_cov_aus_two.rename(columns={cov_list[1]: 'Australia'})
# Transform the dataframe into a structure that can better be read by plotly.express
aus_names_two = [aus for aus in df_cov_aus_two.columns if aus != 'date']
aus_melt_two = df_cov_aus_two.melt(id_vars=['date'], var_name='name', value_vars=aus_names_two)
# MERGE THE TWO DATAFRAMES INTO ONE
df_nz_aus_two= pd.concat([nz_melt_two, aus_melt_two])
# PLOT THE DATAFRAME
# Plot the total deaths per million inhabitants of both countries
fig = px.line(df_nz_aus_two, 'date', 'value', color='name', symbol='name',
              title = 'Total deaths per million inhabitants',
              labels={'date': 'Date', 'value': 'Number of deaths per million inhabitants', 'name': 'Country'})
# Place the legend above the plot area, in the upper right corner
fig.update_layout(legend=dict(yanchor='top', y=1.25, xanchor='right', x=0.99))
# Show the graph
fig.show()
# NEW ZEALAND
# Select the date column and the third parameter of the cov_list into a new dataframe
df_cov_nz_three = df_covid_nz[['date', cov_list[2]]]
# Change the name of the third parameter of the cov_list
# (rename returns a new dataframe instead of modifying a slice in place)
df_cov_nz_three = df_cov_nz_three.rename(columns={cov_list[2]: 'New Zealand'})
# Transform the dataframe into a structure that can better be read by plotly.express
nz_names_three = [nz for nz in df_cov_nz_three.columns if nz != 'date']
nz_melt_three = df_cov_nz_three.melt(id_vars=['date'], var_name='name', value_vars=nz_names_three)
# AUSTRALIA
# Select the date column and the third parameter of the cov_list into a new dataframe
df_cov_aus_three = df_covid_aus[['date', cov_list[2]]]
# Change the name of the third parameter of the cov_list
# (rename returns a new dataframe instead of modifying a slice in place)
df_cov_aus_three = df_cov_aus_three.rename(columns={cov_list[2]: 'Australia'})
# Transform the dataframe into a structure that can better be read by plotly.express
aus_names_three = [aus for aus in df_cov_aus_three.columns if aus != 'date']
aus_melt_three = df_cov_aus_three.melt(id_vars=['date'], var_name='name', value_vars=aus_names_three)
# MERGE THE TWO DATAFRAMES INTO ONE
df_nz_aus_three= pd.concat([nz_melt_three, aus_melt_three])
# PLOT THE DATAFRAME
# Plot the stringency index of both countries
fig = px.line(df_nz_aus_three, 'date', 'value', color='name', symbol='name',
              title = 'Stringency index',
              labels={'date': 'Date', 'value': 'Stringency index', 'name': 'Country'})
# Place the legend above the plot area, in the upper right corner
fig.update_layout(legend=dict(yanchor='top', y=1.25, xanchor='right', x=0.99))
# Show the graph
fig.show()
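The three cells above repeat the same select, rename, melt and concatenate steps for each Covid-19 parameter. The sketch below shows how those steps might be collected in one function that returns the same long layout (`date`, `name`, `value`). `to_long`, `demo_nz` and `demo_aus` are hypothetical names with made-up numbers, not part of the original notebook.

```python
import pandas as pd


def to_long(df_a, df_b, column, names=("New Zealand", "Australia")):
    """Stack one column of two dataframes into date / name / value rows (hypothetical helper)."""
    parts = [
        # rename and assign both return new dataframes, so the inputs stay untouched
        df[["date", column]].rename(columns={column: "value"}).assign(name=name)
        for df, name in zip((df_a, df_b), names)
    ]
    return pd.concat(parts, ignore_index=True)[["date", "name", "value"]]


# Synthetic stand-ins for df_covid_nz and df_covid_aus (made-up numbers)
demo_nz = pd.DataFrame({"date": ["2020-03-01", "2020-03-02"], "stringency_index": [20.0, 25.0]})
demo_aus = pd.DataFrame({"date": ["2020-03-01", "2020-03-02"], "stringency_index": [15.0, 30.0]})

long_df = to_long(demo_nz, demo_aus, "stringency_index")
print(long_df)
```

The result has the same columns as the melted dataframes above, so it could be passed to `px.line` with `'date'`, `'value'` and `color='name'` in the same way.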
Analysis:
The three graphs above visualise the differences between New Zealand and Australia with regard to these Covid-19 parameters.
In the beginning, New Zealand and Australia had a similar number of cases and deaths per million inhabitants.
However, after July 2020 the number of cases and deaths per million rose sharply in Australia, prompting more restrictions, which resulted in a higher Stringency Index.
From October 2020 until August 2021 the number of cases and deaths per million rose a little in New Zealand and was relatively steady in Australia.
After August 2021 the number of cases and deaths per million rose sharply in Australia, resulting in a higher Stringency Index. Although the number of cases per million in New Zealand also went up, the number of deaths per million did not increase as much. This might explain why the Stringency Index only went up a little.
Comparison of stringency indices of New Zealand and Australia
A comparison of the stringency indices of New Zealand and Australia was visualised.
The total cases per million inhabitants were set as the x-values and the total deaths per million inhabitants as the y-values.
With this visualisation, any correlation between stricter government policy with regard to Covid-19 restrictions, total cases and total deaths in New Zealand and Australia can be seen.
The data of New Zealand and Australia are both grouped into weekly bins anchored on Monday ('W-MON') to make the animation run more smoothly and be easier to read. With pandas' default settings each of these bins ends on, and is labelled with, a Monday.
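As a small check of how pandas forms the weekly bins, the sketch below groups two weeks of made-up daily values with `freq='W-MON'`. With the default settings of `pd.Grouper`, each bin runs from a Tuesday up to and including the following Monday and carries that Monday as its label. The dataframe `daily` is a hypothetical stand-in for the Covid-19 data.

```python
import pandas as pd

# Fourteen made-up daily values, from Tuesday 2020-03-17 to Monday 2020-03-30
daily = pd.DataFrame({
    "date": pd.date_range("2020-03-17", "2020-03-30", freq="D"),
    "stringency_index": [float(v) for v in range(14)],
})

# Weekly minimum per 'W-MON' bin, as in the grouping cells of this notebook
weekly = (
    daily.groupby(pd.Grouper(freq="W-MON", key="date"))["stringency_index"]
    .min()
    .reset_index()
)
print(weekly)
```

The first seven days end up under the label 2020-03-23 and the next seven under 2020-03-30, so a date shown in the animation refers to the week that ends on that Monday.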
# NEW ZEALAND
# Group the Covid-19 data of New Zealand into weekly bins anchored on Monday ('W-MON')
# and keep the minimum of each parameter per week (missing values are filled in further below)
df_covid_nz['date'] = pd.to_datetime(df_covid_nz['date'])
df_cov_nz_gro = df_covid_nz.groupby(pd.Grouper(freq='W-MON', key='date'))[cov_list].min().reset_index()
# Assign a new column with the value of the country 'New Zealand'
df_cov_nz_gro = df_cov_nz_gro.assign(country = 'New Zealand')
# Fill the missing values of total deaths per million with zero
df_cov_nz_gro[cov_list[1]] = df_cov_nz_gro[cov_list[1]].fillna(0)
# Fill the one missing value of the Stringency Index with a specific value
# Since only one day was missing, I filled it with the value of the day before and after (both had the same value)
# This missing value of the Stringency Index occurred only in the dataset of New Zealand
df_cov_nz_gro[cov_list[2]] = df_cov_nz_gro[cov_list[2]].fillna(81.02)
# AUSTRALIA
# Group the Covid-19 data of Australia into weekly bins anchored on Monday ('W-MON')
# and keep the minimum of each parameter per week (missing values are filled in further below)
df_covid_aus['date'] = pd.to_datetime(df_covid_aus['date'])
df_cov_aus_gro = df_covid_aus.groupby(pd.Grouper(freq='W-MON', key='date'))[cov_list].min().reset_index()
# Assign a new column with the value of the country 'Australia'
df_cov_aus_gro = df_cov_aus_gro.assign(country = 'Australia')
# Fill the missing values of total deaths per million with zero
df_cov_aus_gro[cov_list[1]] = df_cov_aus_gro[cov_list[1]].fillna(0)
# MERGE THE TWO DATAFRAMES INTO ONE
# Concatenate both dataframes into one dataframe
df_nz_aus= pd.concat([df_cov_nz_gro, df_cov_aus_gro])
df_nz_aus['date'] = df_nz_aus['date'].astype(str)
# Plot the dataframe with the stringency index, total cases per million inhabitants, total deaths per million inhabitants
# of New Zealand and Australia
fig = px.scatter(df_nz_aus, x=cov_list[0], y=cov_list[1], animation_frame='date',
animation_group='country', size=cov_list[2], color='country', hover_name='country', size_max=50,
range_x=[-200,3600], range_y=[-10,50], title = 'Stringency index of New Zealand and Australia',
labels={cov_list[0]: 'Total cases per million inhabitants',
cov_list[1]:'Total deaths per million inhabitants',
'country': 'Country'})
# Show the graph
fig.show()
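In the cells above, the single gap in the New Zealand Stringency Index was filled with a hard-coded number. As an alternative sketch, the last known value can be carried forward with `Series.ffill()`, which gives the same result whenever the neighbouring values are equal and needs no manual lookup. The dataframe `weekly_demo` and its values are made up for illustration.

```python
import pandas as pd

# Made-up weekly stringency values with one gap
weekly_demo = pd.DataFrame({
    "date": pd.date_range("2021-08-23", periods=4, freq="W-MON"),
    "stringency_index": [80.0, None, 80.0, 75.0],
})

# Carry the last observed value forward into the gap
weekly_demo["stringency_index"] = weekly_demo["stringency_index"].ffill()
print(weekly_demo)
```

A gap at the very start of a series has no earlier value to copy, so it would remain missing after `ffill()` and needs separate handling.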
Analysis:
The graph above animates, per week, the total cases per million (x-axis), the total deaths per million (y-axis) and the Stringency Index (SI, shown as the size of the circle).
While governments usually look at (an increase in) new cases when tightening restrictions, I think the total cases and deaths per million are also interesting to visualise and analyse to see whether there is a correlation.
In the beginning (week of 2020-03-23) both countries had about the same SI.
Then, until the week of 2020-05-18, the SI of New Zealand was higher than that of Australia, even though both countries had a comparable number of cases and deaths per million.
After that, the SI of New Zealand decreased a lot, while that of Australia first decreased a bit and then increased a lot. The number of cases and deaths per million in Australia also increased strongly.
From the week of 2020-08-24 until the week of 2020-09-21, the SI of New Zealand increased, after which it decreased a bit.
From the week of 2020-08-10 until the week of 2020-09-21, the SI of Australia stayed the same. Although there was no sharp increase in the number of cases per million, there was a very sharp increase in the number of deaths per million.
From the week of 2020-09-21 until the week of 2021-08-16, the SIs of New Zealand and Australia fluctuated a bit.
Then, in the week of 2021-08-23, the SI of New Zealand increased very sharply and became higher than that of Australia, even though Australia had a larger number of total cases and deaths per million. The increase in the SI of New Zealand might be related to a new case of the Delta variant that was detected the week before, on 17 August 2021 (Wikipedia source: https://en.wikipedia.org/wiki/COVID-19_alert_levels_in_New_Zealand).
From the week of 2021-08-23, the SI of New Zealand stayed the same for two weeks, after which (from the week of 2021-09-13) it decreased a bit and stayed the same until the week of 2021-11-01 (the end of the data). The number of cases per million did increase during this period, but the number of deaths per million increased only a little.
For Australia, the SI stayed the same from the week of 2021-08-23 until the week of 2021-11-01 (the end of the data), although the number of cases per million increased a lot, faster than in New Zealand, and the number of deaths per million increased a bit.
10% of the final grade is for code review for the final assignment. Information about partners will be released after the assignment submission deadline.
90% of the final grade is divided among the following categories, which vary for different questions (see image below):
Criteria:
You can obtain maximum points for the question if you have:
You can obtain bonus points if you use: