April 1st, 2021

Today’s Class

  1. Assorted Business
    • Predictions
    • Questions
  2. Quick Review
    • Regression
    • False Discovery Intro
  3. False Discovery Rate, More than you wanted to know
  4. Loss functions
  5. My prediction walkthrough
  6. Homework intro (if time?)

Assorted Business


“How many people in the US will have had at least one dose by end of day on April 30th?”

  • Prediction: 148 million

  • 90% CI: [130,169] million.

  • Based on CDC trend data – not what I gave you

    • but clearly available on the page with target numbers.
    • I nudged the estimate in directions that felt more plausible. Code online/later.
  • You don’t always have the best data

  • But you could probably still do pretty well with the data I gave. CI calibration would be tough.



  • R comments
    • 1-indexing
    • Usually you want to save scripts, not workspaces.
    • Stay organized. Folders for homeworks, etc.
    • Consider using shared drives or GitHub to collaborate
  • Office hours will be Fridays at 9 AM

Questions from you?

Quick Review


The basic model is as follows:

\(Perc.OneDose = \beta_0 + \beta_1 Delivered.100k + \beta_2 Perc.TwoDose + \epsilon\),
where \(E[\epsilon] = 0\).

We care about \(\beta_1\) or perhaps \(\beta_2\). What are they?
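A minimal sketch of fitting this model by ordinary least squares, written in Python rather than the course's R, using only the standard library. The data are simulated and the "true" coefficients are made up for illustration; they are not the class data or results. Under the model, \(\beta_1\) is the expected change in Perc.OneDose per one-unit increase in Delivered.100k with Perc.TwoDose held fixed, and \(\beta_2\) is the analogous slope for Perc.TwoDose.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares fit of y on the predictors in rows, plus an intercept.

    Returns [beta_0, beta_1, ...] by solving the normal equations
    (X'X) beta = X'y.
    """
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Simulated data: the coefficients and ranges below are made up.
random.seed(1)
true_beta = [5.0, 0.3, 1.2]
rows, y = [], []
for _ in range(200):
    delivered_100k = random.uniform(40, 80)
    perc_two_dose = random.uniform(10, 25)
    epsilon = random.gauss(0, 1)  # mean-zero noise, matching E[epsilon] = 0
    rows.append([delivered_100k, perc_two_dose])
    y.append(true_beta[0] + true_beta[1] * delivered_100k
             + true_beta[2] * perc_two_dose + epsilon)

beta_hat = fit_ols(rows, y)
print("estimated beta_0, beta_1, beta_2:", beta_hat)
```

The estimated slopes should land close to the made-up values used to generate the data, which is a quick check that the fit is doing what the model says.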


We can compare p-values, which measure how extreme the data are under the null hypothesis, to a pre-set threshold (\(\alpha\)) that controls our chance of a false discovery.

But with lots of variables, how do we think about things?

  1. No correction? About \(p\alpha\) false rejections expected (with \(p\) variables, if all nulls are true).
  2. Bonferroni? At most a 5% chance of any false rejection (for \(\alpha = 0.05\)).

Both seem extreme: the first is too permissive, the second too conservative. We want a middle ground.
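The arithmetic behind the two options can be sketched as follows (Python, standard library only). The function names are made up for illustration, and the "any false rejection" formula assumes independent tests with every null true, which is an assumption and not something the slides state.

```python
def expected_false_rejections(p, alpha):
    """Expected number of false rejections with p true nulls, no correction."""
    return p * alpha


def prob_any_false_rejection(p, alpha):
    """Chance of at least one false rejection across p tests.

    Assumes the tests are independent and all p nulls are true.
    """
    return 1 - (1 - alpha) ** p


def bonferroni_threshold(p, alpha):
    """Per-test threshold under the Bonferroni correction."""
    return alpha / p


alpha = 0.05
for p in (1, 10, 100, 1000):
    per_test = bonferroni_threshold(p, alpha)
    print(
        f"p={p:5d}  "
        f"expected false rejections (no correction)={expected_false_rejections(p, alpha):7.2f}  "
        f"P(any false rejection, no correction)={prob_any_false_rejection(p, alpha):.3f}  "
        f"P(any false rejection, Bonferroni)={prob_any_false_rejection(p, per_test):.3f}"
    )
```

With no correction the chance of at least one false rejection climbs toward 1 as \(p\) grows, while the Bonferroni threshold \(\alpha/p\) keeps it below \(\alpha\) at the cost of a much stricter per-test cutoff.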

FDR Redux

Large Scale Testing

Notation Changed

We wish to test \(K\) simultaneous null hypotheses: \[H0_1,H0_2,\ldots,H0_K \] Out of the \(K\) null hypotheses, \(N_0\) are true nulls and \(N_1 = K-N_0\) are false – i.e. there is an effect.
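To make the notation concrete, here is a small simulation sketch in Python (standard library only). The values of \(K\), \(N_0\), the effect size, and the use of one-sided z-tests are all assumptions chosen for illustration; none of them come from the class data.

```python
import random
from statistics import NormalDist


def simulate_p_values(k, n0, effect, rng):
    """Simulate K one-sided z-tests, the first n0 of which are true nulls.

    Returns (p_values, is_true_null). True nulls have mean 0; the
    remaining K - n0 tests have mean `effect`.
    """
    std_normal = NormalDist()
    p_values, is_true_null = [], []
    for i in range(k):
        null_is_true = i < n0
        mean = 0.0 if null_is_true else effect
        z = rng.gauss(mean, 1.0)
        p_values.append(1.0 - std_normal.cdf(z))  # upper-tail p-value
        is_true_null.append(null_is_true)
    return p_values, is_true_null


def count_rejections(p_values, is_true_null, threshold):
    """Return (false rejections, true rejections) at the given threshold."""
    false_rej = sum(1 for p, t in zip(p_values, is_true_null) if p <= threshold and t)
    true_rej = sum(1 for p, t in zip(p_values, is_true_null) if p <= threshold and not t)
    return false_rej, true_rej


# Made-up settings: K tests, N0 true nulls, N1 = K - N0 real effects.
K, N0, ALPHA = 1000, 900, 0.05
N1 = K - N0
rng = random.Random(2021)
p_values, is_true_null = simulate_p_values(K, N0, effect=3.0, rng=rng)

uncorrected = count_rejections(p_values, is_true_null, ALPHA)
bonferroni = count_rejections(p_values, is_true_null, ALPHA / K)
print("K =", K, " N0 =", N0, " N1 =", N1)
print("no correction (false, true rejections):", uncorrected)
print("Bonferroni    (false, true rejections):", bonferroni)
```

In a simulation we know which nulls are true, so false and true rejections can be counted directly; with real data only the total number of rejections is observed.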