1 Introduction

As discussed in class, the central element of Big Data is making predictions. We want to predict all manner of things, using all manner of sources of data, and protect ourselves against all kinds of mistakes in making our predictions. To the extent that we can do a good job of predicting, we will have accomplished our goals.

As with many other skills, the simplest way to become good at making predictions is to practice. To that end, we are going to have a simple prediction competition.

2 Topic

A topic of major concern, for the US at large and potentially for students personally, is the progress of the ongoing campaign to vaccinate the populace against Covid-19. I would therefore like to know how many US residents will have received at least one dose of any Covid vaccine by the end of the day on April 30. At the risk of being repetitive, the target is below:

How many people in the US will have received at least one dose of a Covid vaccine by the end of April?

3 Questions

Specifically, I want three numbers, in answer to two questions:

  1. What is your prediction? (one number)
  2. What is your 90% confidence interval on your prediction? (two numbers: the lower and upper bounds)

4 Details of Question

The answer to this question will be determined by the US CDC’s vaccine tracker, using the total number of people vaccinated with at least one dose. Specifically, I’ll look at the Wayback Machine’s archive of that webpage that is closest to, but after, midnight at the start of May 1 (US Eastern time, i.e. the time zone the CDC is in).
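The selection rule above can be sketched in a few lines of Python. This is only an illustration, not the procedure I will literally run: the function name `pick_snapshot`, the year, and the capture times are all hypothetical placeholders. The only fact relied on is that US Eastern time on May 1 is daylight time, four hours behind UTC.

```python
from datetime import datetime, timedelta, timezone

# US Eastern Daylight Time (UTC-4) is in effect on May 1.
EASTERN_DAYLIGHT = timezone(timedelta(hours=-4))


def pick_snapshot(snapshot_times, cutoff):
    """Return the earliest snapshot strictly after the cutoff, or None."""
    later = [t for t in snapshot_times if t > cutoff]
    return min(later) if later else None


# Hypothetical example: the year and the capture times are placeholders.
cutoff = datetime(2021, 5, 1, 0, 0, tzinfo=EASTERN_DAYLIGHT)
captures = [
    datetime(2021, 5, 1, 3, 30, tzinfo=timezone.utc),  # 23:30 Eastern, April 30: too early
    datetime(2021, 5, 1, 6, 15, tzinfo=timezone.utc),  # 02:15 Eastern, May 1
    datetime(2021, 5, 1, 14, 0, tzinfo=timezone.utc),  # 10:00 Eastern, May 1
]
chosen = pick_snapshot(captures, cutoff)
print(chosen.astimezone(EASTERN_DAYLIGHT))
```

Because the timestamps carry time zone information, captures recorded in UTC can be compared directly against a cutoff expressed in Eastern time.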

5 Data

The CDC provides a rather comprehensive summary of all state vaccination records which may be of use. That data can be found by clicking the button titled “Data Table for Covid-19 Vaccinations in the United States” on the vaccine tracker page. In order to facilitate your predictions, I’ve saved that data table three times in the last month and made that data available to you for: March 3rd, March 18th, and March 28th.

6 Submissions

To enter the competition proper, you must submit your prediction (consisting of 3 numbers) in two locations before the start of class on April 1. The first location is on Canvas. The second location is a Google form here.

The 3 numbers that make up your submission are the following:

  • A single number representing your prediction
  • Two numbers denoting the range that is your 90% confidence region. For simplicity, the confidence region must be a single closed interval, specified by its two endpoints.

You are permitted to change your submission up to the start of class.

In the Google form, I’ve also included questions about your interest in repeating this exercise down the line, space for general feedback, and space for you to (optionally) describe how you arrived at your prediction.

7 Scoring

This is a graded homework assignment. To receive full credit, you must simply submit answers to Canvas. However, as I’m not optimistic about being able to download all your answers from Canvas in a usable manner, to actually compete, you also need to complete the Google form.

To win the competition, you need merely beat my prediction, which I will publicize at the start of class. That is, your point estimate must be closer to the observed outcome than mine. [Note: this means you need to have a different prediction than I do.]

There will also be an honorable mention for whoever has the narrowest confidence interval that contains the observed outcome.
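The two rules above can be sketched as follows. This is a minimal sketch, assuming predictions are plain numbers and intervals are (lower, upper) pairs. The function names and all numbers are hypothetical, and tie-breaking is not specified by the rules, so the sketch simply keeps the first narrowest interval it encounters.

```python
def beats_instructor(student_prediction, instructor_prediction, outcome):
    """True if the student's point estimate is strictly closer to the outcome."""
    return abs(student_prediction - outcome) < abs(instructor_prediction - outcome)


def honorable_mention(intervals, outcome):
    """Return the name whose interval is narrowest among those containing the outcome.

    `intervals` maps a name to a (lower, upper) pair. Returns None if no
    interval contains the outcome. Ties go to the first entry (an assumption).
    """
    widths = {
        name: upper - lower
        for name, (lower, upper) in intervals.items()
        if lower <= outcome <= upper
    }
    return min(widths, key=widths.get) if widths else None


# Hypothetical placeholder values, not real vaccination counts.
outcome = 150
instructor = 145
intervals = {"A": (140, 160), "B": (149, 152), "C": (151, 153)}

print(beats_instructor(148, instructor, outcome))  # closer than the instructor
print(beats_instructor(145, instructor, outcome))  # identical prediction cannot win
print(honorable_mention(intervals, outcome))
```

The strict inequality in `beats_instructor` is what makes an identical prediction a non-winner, and a very narrow interval only helps if it still contains the outcome.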

8 Awards

Awards are TBD.

9 Future competitions

If there is sufficient interest, we may do several more competitions.