TIL6010 - final assignment G.P. van Loon 5408393

For this assignment, I'm going to compare covid and mobility data on a provincial level in the Netherlands. As a first step, I'm importing the necessary libraries.

Part I - Data import

First, I'm going to import and combine dataframes of the two types of data I found:

I'm starting off with the covid data.

Next, I want to collapse the three rows that now exist per date per province into a single row per date per province that holds all the information.

When I just resample over the dates, the information is forced into one row, but it ends up identical for all provinces. That is not what I want, so I try the following:

This way I can resample without losing data. The resampling also drops the province column because it contains strings, so I re-add it afterwards.
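As an illustration of the idea, here is a minimal sketch of one way to get a single row per date per province without losing the province column. It is not the code used in this notebook: the toy data, the column names (`cases`, `hospitalized`, `deceased`) and the assumption that each of the three rows carries one measure are all hypothetical.

```python
import pandas as pd

# Hypothetical toy data: three rows per date per province, one measure filled in per row.
rows = [
    ("2020-03-01", "Noord-Brabant", 5, 0, 0),
    ("2020-03-01", "Noord-Brabant", 0, 1, 0),
    ("2020-03-01", "Noord-Brabant", 0, 0, 0),
    ("2020-03-02", "Noord-Brabant", 7, 0, 0),
    ("2020-03-02", "Noord-Brabant", 0, 2, 0),
    ("2020-03-02", "Noord-Brabant", 0, 0, 1),
    ("2020-03-01", "Zuid-Holland", 2, 0, 0),
    ("2020-03-01", "Zuid-Holland", 0, 1, 0),
    ("2020-03-01", "Zuid-Holland", 0, 0, 0),
    ("2020-03-02", "Zuid-Holland", 3, 0, 0),
    ("2020-03-02", "Zuid-Holland", 0, 0, 0),
    ("2020-03-02", "Zuid-Holland", 0, 0, 0),
]
raw = pd.DataFrame(rows, columns=["date", "province", "cases", "hospitalized", "deceased"])
raw["date"] = pd.to_datetime(raw["date"])

# Group by province AND by day, so each province keeps its own numbers
# and the province column survives the aggregation.
daily = (
    raw.groupby(["province", pd.Grouper(key="date", freq="D")])[
        ["cases", "hospitalized", "deceased"]
    ]
    .sum()
    .reset_index()
)
print(daily)
```

Grouping on the province together with the date is what keeps the provinces apart; resampling on the date alone would combine them.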

Now, I'm importing the mobility data. The most noteworthy thing I do is changing the column names and reordering the columns.

Here I do something similar to what I did with the covid data, with a few small differences:

Merging the dataframes then becomes relatively easy:
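For reference, a merge of this kind could look like the sketch below. The two small dataframes and the column name `grocery_trips` are made up for the example; the only assumption that matters is that both dataframes have one row per date per province.

```python
import pandas as pd

# Hypothetical covid data: one row per date per province.
covid = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-01"]),
    "province": ["Noord-Brabant", "Noord-Brabant", "Zuid-Holland"],
    "cases": [5, 7, 2],
})

# Hypothetical mobility data with the same keys.
mobility = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-01"]),
    "province": ["Noord-Brabant", "Noord-Brabant", "Zuid-Holland"],
    "grocery_trips": [-3.0, -10.0, -1.0],
})

# Merge on both keys; validate raises an error if either side
# still has more than one row per (date, province).
merged = covid.merge(
    mobility, on=["date", "province"], how="inner", validate="one_to_one"
)
print(merged)
```

The `validate="one_to_one"` argument is a cheap check that the earlier step really left one row per date per province.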

Part II - Data processing

In the second part, I'm going to find common peaks for two kinds of trips and for two types of covid data, all for the province of Noord-Brabant.

The peaks_dict is necessary to create a list of the peaks to plot. I will also include the dates_dict containing the dates of the peaks and valleys. I do this because I eventually want to be able to compare dates and not numbers.

When I have the plots, I'm going to use code to look for common peaks and valleys as well. Just reading them from the graphs is too simple of course :)

Additional information is also given in the comments in the code.
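To make the idea of comparing dates rather than numbers concrete, here is a small sketch. It is not the notebook's own peak detection: the helper names `peak_dates` and `common_dates`, the toy weekly series and the simple "higher than both neighbours" rule are assumptions chosen for the example. Valleys could be found the same way with the comparisons reversed.

```python
import pandas as pd

def peak_dates(series):
    """Return the dates where a value is strictly higher than both neighbours."""
    is_peak = (series > series.shift(1)) & (series > series.shift(-1))
    return list(series[is_peak].index)

def common_dates(dates_a, dates_b, tolerance_days=0):
    """Return the dates in dates_a that lie within tolerance_days of a date in dates_b."""
    tol = pd.Timedelta(days=tolerance_days)
    return [a for a in dates_a if any(abs(a - b) <= tol for b in dates_b)]

# Hypothetical weekly values for two trip purposes.
weeks = pd.date_range("2021-01-04", periods=7, freq="7D")
work = pd.Series([1, 3, 2, 5, 4, 6, 1], index=weeks)
grocery = pd.Series([2, 4, 1, 1, 3, 2, 2], index=weeks)

work_peaks = peak_dates(work)
grocery_peaks = peak_dates(grocery)

# Peaks that fall in exactly the same week.
same_week = common_dates(work_peaks, grocery_peaks)
# Peaks that fall at most one week apart.
within_week = common_dates(work_peaks, grocery_peaks, tolerance_days=7)
print(same_week, within_week)
```

Working with dates means that series in different units can be compared directly, since only the timing of the peaks matters.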

Now I'm going to compare the mobility and the covid data.

The logic is the same as for the previous questions.

As can be seen, there are quite a few peaks in hospitalizations that are followed by peaks in trips. It could therefore be stated that the covid numbers actually affected transport behaviour in Noord-Brabant.
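The "followed by" relation can also be checked in code. The sketch below is only an illustration: the function name `followed_by`, the 14-day window and the example dates are assumptions, not values taken from the data.

```python
from datetime import date, timedelta

def followed_by(leading, following, max_lag_days=14):
    """Pair each leading date with the first following date that comes
    strictly later, but no more than max_lag_days later."""
    window = timedelta(days=max_lag_days)
    pairs = []
    for lead in leading:
        later = [f for f in following if timedelta(0) < f - lead <= window]
        if later:
            pairs.append((lead, min(later)))
    return pairs

# Hypothetical peak dates, for illustration only.
hospital_peaks = [date(2020, 4, 6), date(2020, 11, 2)]
trip_peaks = [date(2020, 4, 13), date(2020, 8, 3)]

pairs = followed_by(hospital_peaks, trip_peaks)
print(pairs)
```

A pair found this way only says that one peak came shortly after the other; the size of the window is a choice that changes how many pairs are found.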

Although I don't think it'll add much, I will show plots of the two activities below, also to verify the results found. As mentioned, they are not in the same plot because of the different units.

Part III - Data visualisation

For this last part, I'm going to check my personal experience of the covid narrative around the province of Noord-Brabant, by:

First, I show my variables for this part.

Then, I'm going to make a graph of the covid data of Noord-Brabant, just to explore the data and make some first observations.

What is interesting is that the number of cases seems to be very low during the first wave, especially compared with the end of 2020 and the whole of 2021. This is because the Netherlands only started free PCR testing in June 2020, and by then the lockdown and the nice weather had already brought the number of cases down. So when looking at the first wave, it's better to toggle off the number of cases and look at the number of hospitalized patients and the number of deaths.

I moved to The Hague in July 2020, but before that I lived in Eindhoven. During the first wave from March-May 2020, I noticed the following things:

For this story, I want to check whether this feeling I had was justified, or whether Noord-Brabant was actually doing much worse than other provinces.

So let's make a bar chart showing the number of cases per province for every month.
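The aggregation behind such a chart could be sketched as follows. The toy data and the column names are hypothetical; the plotting call is left as a comment because it needs a plotting library.

```python
import pandas as pd

# Hypothetical daily cases per province.
daily = pd.DataFrame({
    "date": pd.to_datetime(
        ["2020-03-30", "2020-03-31", "2020-04-01", "2020-03-31", "2020-04-01"]
    ),
    "province": [
        "Noord-Brabant", "Noord-Brabant", "Noord-Brabant",
        "Zuid-Holland", "Zuid-Holland",
    ],
    "cases": [10, 12, 8, 4, 6],
})

# One row per month, one column per province, summed cases in the cells.
monthly = daily.assign(month=daily["date"].dt.to_period("M")).pivot_table(
    index="month", columns="province", values="cases", aggfunc="sum"
)
print(monthly)

# With matplotlib installed, monthly.plot(kind="bar") would draw
# one group of bars per month and one bar per province.
```

Putting the provinces in the columns is what makes each month show up as one group of bars.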

By autoscaling the axes, the number of cases can also be read in months with high numbers.

This shows that the narrative around Noord-Brabant was not completely unfair in the beginning.

To further illustrate this, let's add a geospatial map of the cumulative number of cases for each province at the latest date of the dataframe.

Here, too, it can be seen that Noord-Brabant has not necessarily been the only province with a high number of cases.

Having observed this, let's add Noord-Holland and Zuid-Holland to the line graph a couple of blocks back.

From this plot, it can be seen that Noord-Brabant for a short while had the most reported cases and hospitalizations, but the other two provinces quickly took over, especially Zuid-Holland. The narrative around Brabant being the worst province was thus not really fair, or at least only for April 2020.

As mentioned, another thing I experienced during covid is that this narrative around Noord-Brabant made people in Eindhoven adhere to the covid rules quite strictly.

Let's find out! We make a graph showing the trips for three trip purposes during the covid period.

This plot shows some interesting findings. You can look at specific data by toggling. Clearly my personal, anecdotal experience was not representative of the whole province. Of the three provinces, Noord-Brabant has the highest relative number of trips for all three trip purposes over the covid period. It would be interesting to look into the differences between specific cities, as those might give different results. For now, it is interesting to see that (although the differences are not huge) the citizens of Noord-Brabant seem to have travelled the most of these three provinces during covid times, despite the high number of cases. In the other two provinces, people decreased their trips more because of covid.

What would also be interesting, however, is to see whether people from the three provinces reacted in the same way to peaks in the number of cases in their respective province, that is, whether peaks in cases/hospitalizations resulted in fewer trips in the week(s) after. For Noord-Brabant, we have already seen this at the end of part II of this assignment. Let's put the results together for all three provinces.

This could be done for all of the trip purposes, but at least for grocery and pharmacy trips:

All in all, it can be stated that:

It should be noted, however, that using provincial data lumps together quite different regions:

Therefore, it would be good to look into the city data before coming to hard conclusions.