Data-Mining the Covid Numbers

Story title image

Today I want to dig a little into the Covid data, as I see that there is definitely some confusion about it.

The main reason I started this project was the statement that lockdown policies were tied to the “R-Value”, while scientists in the public media said that the R-Value could only be guesstimated. I found that unbelievable. So let’s see what we can mine out.

First of all, which data do we have available?

I will be using the data from Johns Hopkins University, as so far they seem to publish more precise data, and faster. The two main values we have are the total number of infected persons and the number of currently infected persons. From those two metrics, we will derive everything else.

the image shows charts of total infected and currently infected

It doesn’t look like much yet, but let’s find some easy things to pull out of it:

Preparing the Data

picture is showing some cells of excel with random information

First of all, we need something that is easy to work with. For this I will use Excel, as it is fast and easy. Programming something in Python would work as well, but might be too much work for a short project.

The data is entered into two columns as shown above. Each row represents one day of the pandemic. In Switzerland, the pandemic started on February 25.

New Daily Infections

One value that will probably come in handy is the number of new infections per day. I will not take that data from Johns Hopkins University but derive it from the total infections. This saves copying time in the future.

We just have to subtract the previous day’s total infections from the current day’s total infections.
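
For readers who prefer code over a spreadsheet, the subtraction could be sketched in a few lines of Python. The numbers below are made up for illustration and are not the Johns Hopkins figures, and the function names are my own. The 5-day median is the smoothing shown in the chart; using a trailing window (the last five days) is an assumption on my part.

```python
from statistics import median

def daily_new(totals):
    """Difference between each day's running total and the previous day's."""
    # The first day has no predecessor, so its new cases equal its total.
    return [totals[0]] + [today - before for before, today in zip(totals, totals[1:])]

def rolling_median(values, window=5):
    """Median over the last `window` days (fewer at the start of the series)."""
    return [median(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]

# Made-up running totals, one entry per day (NOT real data).
total_infected = [1, 1, 8, 18, 27, 42, 56, 90]

new_infections = daily_new(total_infected)
smoothed = rolling_median(new_infections)

print(new_infections)
print(smoothed)
```

In Excel the same thing is one subtraction per row plus a `MEDIAN` over the last five cells.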

the image shows new infections per day and 5d median

What we can see is a small bump in infections in recent days, but more on that later.

Total healed and Infection time

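The total number of healed persons can be calculated as the total infections minus the currently infected. Note that this figure also includes deceased persons, since they no longer count as currently infected. To put it in context, I will plot it against the total infected.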

the chart shows the total infected and total healed. Both curves look similar except a horizontal offset

That might not show much yet, except that our hospitals are far from full. Additionally, we can derive the infection time from it. If we move the total healed curve to the left until it overlaps with the total infected curve, the size of the shift is the infection time. For Switzerland, that is 16 days; for Germany, 15 days. The difference seems to be within measurement accuracy.
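
I did the shifting by eye in Excel, but it could also be automated. The following is a minimal sketch, not the method used for the charts: it tries every shift and keeps the one with the smallest average gap between the two curves. The scoring rule, the function names and the data are all assumptions for illustration. The toy data is built so that everyone heals exactly 3 days after infection.

```python
def total_healed(total_infected, currently_infected):
    """Healed so far = everyone ever infected minus those still infected."""
    return [t - c for t, c in zip(total_infected, currently_infected)]

def best_shift(total_infected, healed, max_shift=30):
    """Return the shift (in days) that makes the two curves overlap best."""
    def mismatch(shift):
        # Compare infections on day d with healed on day d + shift.
        pairs = list(zip(total_infected, healed[shift:]))
        # Average gap, so shifts with fewer overlapping days are not favoured.
        return sum(abs(t - h) for t, h in pairs) / len(pairs)

    # Only try shifts that leave at least one day of overlap.
    candidates = range(0, min(max_shift, len(healed) - 1) + 1)
    return min(candidates, key=mismatch)

# Toy data (NOT real): everyone heals exactly 3 days after infection.
total_infected = [1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
true_duration = 3
healed_truth = [0] * true_duration + total_infected[:-true_duration]
currently_infected = [t - h for t, h in zip(total_infected, healed_truth)]

healed = total_healed(total_infected, currently_infected)
infection_time = best_shift(total_infected, healed)
print(infection_time)
```

With real data the curves never overlap perfectly, so the result should be read as a rough estimate, just like the manual version.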

the chart shows the total infected and total healed. this time moved by amount of days so they overlap

The infection time will come in handy later for the R-Value.

Healed per Day

screenshot of the weird findings.

From the total healed we can derive the daily healed. As with the new daily infections, we subtract yesterday’s total healed from today’s total healed.

Here I have some question marks about how Johns Hopkins University gathers this data. Since May 16, the healed cases appear to be reported in batches of 100, which seems very odd to me:

Unfortunately we have to assume the values are correct, but I will have to use the less robust average instead of the median. I find the median more reliable as it removes outliers effectively. But as our data here consists almost entirely of outliers, we have to use the average.
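
To illustrate why the median breaks down on batch-reported data, here is a small sketch with made-up numbers (not the real series): most days report zero healed and every few days a batch of 100 arrives. The 5-day window and the helper name `rolling` are my own choices.

```python
from statistics import mean, median

# Made-up daily healed counts reported in batches (NOT real data).
daily_healed = [0, 0, 100, 0, 0, 0, 100, 0, 0, 0]

def rolling(values, func, window=5):
    """Apply `func` to the last `window` values for each day."""
    return [func(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]

median_5d = rolling(daily_healed, median)
mean_5d = rolling(daily_healed, mean)

print(median_5d)  # the batches are treated as outliers and disappear
print(mean_5d)    # the batches are spread over the window
```

The median treats every batch as an outlier and stays at zero the whole time, while the average at least spreads each batch over the surrounding days.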

the chart shows the total infected and total healed. this time moved by amount of days so they overlap

We can now also overlay daily healed vs. daily infected.

the chart shows the daily healed vs newly infected. healed lags behind infected, obviously

What we can clearly see here is that, from around April 1 on, the recovered patients consistently outnumbered the newly infected. That basically means that the hospitals had less and less work from that day on.

The mystic R-Value and outlook

To get the R-Value, one first has to know its definition. The definition is as follows:

“the expected number of cases directly generated by one case in a population where all individuals are susceptible to infection.”

How do we get there?

We already have the main information.

- How long does an infection last on average? 16 days

- How many infected persons are there currently?

- How many new infections are there per day?

The R-Value can therefore be calculated the following way:

(New Infections * Duration of one infection) / current number of infected people
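
The reasoning behind it: new infections divided by the currently infected gives the number of new cases one infected person causes per day, and multiplying by the duration of an infection gives the number caused over the whole infection. As a small Python sketch (the function name is my own and the example numbers are hypothetical, chosen only to make the arithmetic easy to follow):

```python
def r_value(new_infections, currently_infected, infection_days=16):
    """(new infections * duration of one infection) / currently infected."""
    if currently_infected <= 0:
        # Nobody is infected, so the ratio is undefined.
        return None
    return new_infections * infection_days / currently_infected

# Hypothetical example: 50 new cases per day, 800 people currently infected.
example = r_value(50, 800)
print(example)
```

In this hypothetical case the value comes out at exactly 1, meaning every infected person infects one other person on average and the number of active cases stays flat.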

The result for Switzerland looks like this:

the chart shows the r value

Additionally, I created my own metric to highlight whether the situation is worsening or improving. More green than red means it’s getting better. More red than green means it’s getting worse.

The higher / lower the values, the more intense the effect.

the chart shows a metric which I personally found more useful than the r value

Oh oh, the second wave will roll over us!

But hold on. First, let’s look at those two metrics for Germany as well, with the guesstimated R-Value of the Robert Koch Institute in the background:

the charts show the R-Value and the infection development for Germany

Germany has a similar curve. The rising R-Value only had a little bump. Also, the infection development metric is bumping around like crazy towards the end.

Why is that so? Let’s look at the data around June 15, right before the spike:

We have a total of 292 active cases and ~15 new cases per day in Switzerland. The numbers are in fact so low that a single hotspot will already throw the values around drastically. A single short spike is therefore not enough to predict that the second wave is coming.
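
To put numbers on that, here is a quick calculation using the formula from above with the 16-day infection time. The 292 active cases and 15 new cases per day are the figures from the text; the hotspot adding 20 extra cases on a single day is purely hypothetical.

```python
INFECTION_DAYS = 16   # infection time derived earlier for Switzerland
ACTIVE_CASES = 292    # active cases around June 15
BASELINE_NEW = 15     # roughly 15 new cases per day
HOTSPOT_EXTRA = 20    # hypothetical: one hotspot adds 20 cases on one day

def r_value(new_infections, currently_infected, infection_days=INFECTION_DAYS):
    """(new infections * duration of one infection) / currently infected."""
    return new_infections * infection_days / currently_infected

baseline = r_value(BASELINE_NEW, ACTIVE_CASES)
with_hotspot = r_value(BASELINE_NEW + HOTSPOT_EXTRA, ACTIVE_CASES)

print(round(baseline, 2))      # 0.82
print(round(with_hotspot, 2))  # 1.92
```

A single event with 20 cases more than doubles the value for that day, even though nothing has changed in the rest of the country.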

As some might guess, a virus does not come and then fade away for good. For example, cases of swine flu have recently been found again.

I doubt that we can eradicate the virus entirely (which is what we are currently trying to do). Rather, it will turn into something that we humans have to live with. There will be hotspots in the near future.

However, I do not expect a second wave similar to the first one anytime soon. What happens next January or February, time will tell.