April 1st, 2021

## Today’s Class

• Predictions
• Questions
2. Quick Review
• Regression
• False Discovery Intro
3. False Discovery Rate, More than you wanted to know
4. Loss functions
5. My prediction walkthrough
6. Homework intro (if time?)

## Predictions

“How many people in the US will have had at least one dose by end of day on April 30th?”

• Prediction: 148 million

• 90% CI: [130,169] million.

• Based on CDC trend data: not the data I gave you, but clearly available on the page with the target numbers.
• I pulled it in directions that felt better. Code online/later.
• You don’t always have the best data

• But you could probably still do pretty well with the data I gave you. Calibrating the CI would be tough, though.

## Other

• 1-indexing
• Usually you want to save scripts, not workspaces.
• Stay organized. Folders for homeworks, etc.
• Consider using shared drives or github to collaborate
• Office hours will be Fridays at 9AM

## Regression

The basic model is as follows:

$$Perc.OneDose = \beta_0 + \beta_1 Delivered.100k + \beta_2 Perc.TwoDose + \epsilon$$
where $$E[\epsilon] = 0$$.

We care about $$\beta_1$$ or perhaps $$\beta_2$$. What are they?
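
One way to read the coefficients, assuming the linear model above holds: compare the expected outcome for two observations that differ by one unit in $$Delivered.100k$$ while $$Perc.TwoDose$$ is held fixed. The symbols $$d$$ and $$t$$ are just illustrative placeholder values, not part of the model.

$$E[Perc.OneDose \mid Delivered.100k = d + 1,\ Perc.TwoDose = t] - E[Perc.OneDose \mid Delivered.100k = d,\ Perc.TwoDose = t] = \beta_1$$

The same argument with the roles swapped gives $$\beta_2$$ as the expected change per unit of $$Perc.TwoDose$$ with $$Delivered.100k$$ held fixed, and $$\beta_0$$ is the expected value when both predictors are zero.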

## Testing

We can compare p-values, which measure how extreme the observed data are under the null, to a pre-set threshold ($$\alpha$$) that controls our chance of a false discovery.

But with lots of variables, how do we think about things?

1. No correction? Expect about $$p\alpha$$ false rejections when all $$p$$ nulls are true.
2. Bonferroni? At most a 5% chance of any false rejection (for $$\alpha = 0.05$$).

Both seem aggressive. Want a middle ground.
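
To make the two extremes concrete, here is a minimal simulation sketch in Python (standard library only). The setup is hypothetical: 1000 tests where every null is true, so each p-value is uniform on [0, 1] and every rejection is a false one. The function name `count_rejections` and the seed are illustrative choices, not course code.

```python
import random

def count_rejections(p_values, alpha=0.05):
    """Return (uncorrected, bonferroni) rejection counts for a list of p-values."""
    k = len(p_values)
    # No correction: reject whenever p <= alpha.
    uncorrected = sum(p <= alpha for p in p_values)
    # Bonferroni: reject whenever p <= alpha / (number of tests).
    bonferroni = sum(p <= alpha / k for p in p_values)
    return uncorrected, bonferroni

# Assumption: all 1000 nulls are true, so p-values are Uniform(0, 1).
rng = random.Random(0)
p_values = [rng.random() for _ in range(1000)]

uncorrected, bonferroni = count_rejections(p_values)
print("uncorrected false rejections:", uncorrected)  # expected value is 1000 * 0.05 = 50
print("Bonferroni false rejections:", bonferroni)
```

Without a correction the count should land near 50, while the Bonferroni threshold of 0.05 / 1000 will usually reject nothing at all, which is the gap a middle-ground method tries to fill.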

## Large Scale Testing

Notation Changed

We wish to test $$K$$ null hypotheses simultaneously: $$H_{0,1}, H_{0,2}, \ldots, H_{0,K}$$. Out of the $$K$$ null hypotheses, $$N_0$$ are true nulls and $$N_1 = K - N_0$$ are false, i.e. there is an effect.
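
The notation can be sketched with a small Python simulation (standard library only). Everything numeric here is an assumption for illustration: $$K = 200$$, $$N_1 = 20$$, and false nulls are given p-values drawn uniformly below 0.01 as a crude stand-in for "there is an effect". In real data we never observe which nulls are true; the simulation only knows because it generated them.

```python
import random

def simulate_p_values(k, n1, rng):
    """Return (is_true_null, p_values) for k tests, the first n1 of which have a real effect."""
    is_true_null = [i >= n1 for i in range(k)]
    p_values = [
        # True nulls: Uniform(0, 1). False nulls (assumed): Uniform(0, 0.01).
        rng.random() if null else rng.random() * 0.01
        for null in is_true_null
    ]
    return is_true_null, p_values

rng = random.Random(1)
K, N1 = 200, 20                      # hypothetical sizes
is_true_null, p_values = simulate_p_values(K, N1, rng)
N0 = sum(is_true_null)               # number of true nulls, K - N1

alpha = 0.05
rejected = [p <= alpha for p in p_values]
# Rejections among true nulls are false rejections; among false nulls they are real findings.
false_rejections = sum(r and null for r, null in zip(rejected, is_true_null))
true_rejections = sum(r and not null for r, null in zip(rejected, is_true_null))
print("N0 =", N0, "N1 =", N1)
print("false rejections:", false_rejections, "true rejections:", true_rejections)
```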