This homework focuses on causal inference.

1 Setup

Download the (cleaned) JTPA data from the website, and load it.

library(tidyverse)
load("../../lectures/l15/jtpa/jtpa.RData")

Drop race and gender characteristics.

jtpa = jtpa %>% select(-male,-black,-hispanic)

Estimate the overall ATE.

summary(lm(y~offer,data=jtpa))$coefficients
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 15040.504   274.8839 54.715838 0.0000000000
## offer        1159.433   336.2652  3.447973 0.0005668874
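
Because offer is a randomized binary indicator, its coefficient in this regression equals the difference in mean outcomes between the offer and no-offer groups. The sketch below checks that identity. It uses simulated data with an assumed data-generating process, not the JTPA file.

```r
# Hypothetical simulated data, not the JTPA file: a randomized binary offer
set.seed(0)
n = 1000
offer = rbinom(n, 1, 2/3)
y = 15000 + 1000 * offer + rnorm(n, sd = 5000)

# OLS slope on the binary treatment indicator
slope = unname(coef(lm(y ~ offer))["offer"])
# Difference in group means
diff_means = mean(y[offer == 1]) - mean(y[offer == 0])

all.equal(slope, diff_means)  # TRUE: the two estimates coincide
```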

2 Questions

2.1 Q1

Calculate the average treatment effect for the married subpopulation (married == 1).

  1. Subset to just the married individuals.
  2. Regress the outcome y on offer.
  3. Show the regression summary.

How does this compare to the overall ATE? Does this regression tell us that the program has a larger effect for married individuals?
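
The three steps above can be sketched on simulated data. The data frame `sim` and its effect sizes are hypothetical stand-ins for `jtpa`. Only the column names `y`, `offer`, and `married` mirror the assignment.

```r
# Hypothetical stand-in for jtpa: randomized offer, a married indicator, and an
# assumed effect of 1000 for unmarried and 2000 for married individuals
set.seed(1)
n = 2000
sim = data.frame(offer = rbinom(n, 1, 2/3), married = rbinom(n, 1, 0.4))
sim$y = 15000 + 1000 * sim$offer + 1000 * sim$offer * sim$married +
  rnorm(n, sd = 2000)

# 1. Subset to married individuals
married_only = subset(sim, married == 1)
# 2. Regress y on offer within the subset
married_fit = lm(y ~ offer, data = married_only)
# 3. Coefficient table from the regression summary
married_coef = summary(married_fit)$coefficients
married_coef
```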

2.2 Q2 - Subgroup Comparisons

Regress y on offer*married using the full data. Does the program have a different effect for married individuals, and is that difference statistically significant?
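
Here is a sketch of the interaction regression on hypothetical simulated data, using an assumed data-generating process rather than JTPA. In `y ~ offer * married`, R expands the formula to `offer + married + offer:married`. The `offer:married` row therefore estimates the difference between the married and unmarried effects, and its p-value tests whether that difference is zero.

```r
# Hypothetical simulated data with an assumed larger effect for married individuals
set.seed(2)
n = 2000
sim = data.frame(offer = rbinom(n, 1, 2/3), married = rbinom(n, 1, 0.4))
sim$y = 15000 + 1000 * sim$offer + 1000 * sim$offer * sim$married +
  rnorm(n, sd = 2000)

# offer * married expands to offer + married + offer:married
int_fit = lm(y ~ offer * married, data = sim)
int_coef = summary(int_fit)$coefficients
# Estimate, std. error, t value and p-value for the difference in effects
int_coef["offer:married", ]

# Implied effect for married individuals: offer + offer:married
b = coef(int_fit)
married_effect = unname(b["offer"] + b["offer:married"])
married_effect
```

This model gives each `married` group its own intercept and slope. As a result, `offer + offer:married` reproduces the point estimate from a married-only regression of y on offer. The standard errors can differ slightly, because the pooled model uses a single residual variance for both groups.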

2.3 Q3 - Targeting

Drop treatment variables, the Index, and uptake.

jtpa = jtpa %>% select(-train,-classroom,-Index,-OJT_JSA,-f2sms)

Estimate a fully interacted linear model (y~(.-offer)^5) separately for the treatment group and the control group.

Predict each individual's treatment effect: obtain predictions from both models for every individual, then subtract the control-model prediction from the treatment-model prediction.

  1. Plot a histogram of the individual treatment effects.
  2. Find the subset of individuals for whom the predicted treatment effect is greater than $300.
  3. Calculate the average treatment effect for this group (by regressing y~offer in this subset) and show the regression summary.
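
The procedure above fits one outcome model per arm and takes the difference; this is sometimes called a T-learner. Below is a minimal sketch on hypothetical simulated data. It assumes three binary covariates `x1`, `x2`, `x3` and an effect that depends on `x1`. With only three covariates, the interaction order is 3 rather than 5. The `- offer` term drops the arm indicator, which is constant within each arm's subset.

```r
# Hypothetical simulated stand-in for jtpa (assumed data-generating process)
set.seed(3)
n = 4000
sim = data.frame(
  x1 = rbinom(n, 1, 0.5),
  x2 = rbinom(n, 1, 0.5),
  x3 = rbinom(n, 1, 0.5),
  offer = rbinom(n, 1, 2/3)
)
# Assumed effect: 200 when x1 == 0, 1000 when x1 == 1
sim$y = 15000 + 500 * sim$x2 + sim$offer * (200 + 800 * sim$x1) +
  rnorm(n, sd = 2000)

# Same fully interacted model, fit separately in each arm
fit_t = lm(y ~ (. - offer)^3, data = subset(sim, offer == 1))
fit_c = lm(y ~ (. - offer)^3, data = subset(sim, offer == 0))

# Individual effect = treatment-model prediction minus control-model prediction
sim$tau_hat = predict(fit_t, newdata = sim) - predict(fit_c, newdata = sim)

# 1. Histogram of predicted individual treatment effects
hist(sim$tau_hat, breaks = 50, main = "Predicted treatment effects",
     xlab = "Predicted treatment effect")

# 2. Individuals with a predicted effect above $300
targeted = subset(sim, tau_hat > 300)
# 3. ATE within the targeted group
target_coef = summary(lm(y ~ offer, data = targeted))$coefficients
target_coef
```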

2.4 Q4

Build a holdout sample.

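# Fix the seed so the 20% holdout split is reproducible
set.seed(11451)
# floor() makes the integer sample size explicit
holdout_ind = sample(1:nrow(jtpa), floor(nrow(jtpa)*0.2), replace=FALSE)
holdout = jtpa[holdout_ind,]
train = jtpa[-holdout_ind,]
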
  1. As in Q3, estimate a fully interacted model separately for each of treatment and control – but only using the training data.
  2. Predict the treatment effects for individuals in the holdout sample (as in Q3).
  3. Plot outcomes (y) against predicted treatment effects in the holdout sample. (Recommended: deal with overplotting by using ggplot with alpha=0.1, or geom_hex.)
  4. As in Q3, find the average treatment effect for individuals in the holdout sample who have a predicted treatment effect greater than $300.

Is the out-of-sample ATE different from the in-sample ATE? What does this tell you about our targeting choice?
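
Here is a sketch of the four holdout steps on hypothetical simulated data. It assumes three binary covariates and an effect of 200 when `x1 == 0` and 1000 when `x1 == 1`; this is not JTPA. It uses base graphics with translucent points instead of ggplot, so it runs without extra packages. With ggplot2, `geom_hex()` is the alternative the assignment suggests.

```r
# Hypothetical simulated stand-in for jtpa (assumed data-generating process)
set.seed(4)
n = 4000
sim = data.frame(
  x1 = rbinom(n, 1, 0.5),
  x2 = rbinom(n, 1, 0.5),
  x3 = rbinom(n, 1, 0.5),
  offer = rbinom(n, 1, 2/3)
)
sim$y = 15000 + 500 * sim$x2 + sim$offer * (200 + 800 * sim$x1) +
  rnorm(n, sd = 2000)

# 20% holdout, 80% training
holdout_ind = sample(seq_len(n), floor(n * 0.2))
holdout = sim[holdout_ind, ]
train = sim[-holdout_ind, ]

# 1. Fit the arm-specific models on the training data only
fit_t = lm(y ~ (. - offer)^3, data = subset(train, offer == 1))
fit_c = lm(y ~ (. - offer)^3, data = subset(train, offer == 0))

# 2. Predicted treatment effects for the holdout individuals
holdout$tau_hat = predict(fit_t, newdata = holdout) -
  predict(fit_c, newdata = holdout)

# 3. Outcomes against predicted effects; translucent points reduce overplotting
plot(holdout$tau_hat, holdout$y, pch = 16, col = rgb(0, 0, 0, alpha = 0.1),
     xlab = "Predicted treatment effect", ylab = "y")

# 4. Out-of-sample ATE among holdout individuals with predicted effect > $300
targeted_out = subset(holdout, tau_hat > 300)
out_coef = summary(lm(y ~ offer, data = targeted_out))$coefficients
out_coef
```

The models never see the holdout rows, so the step 4 estimate is not fit to the same noise that drove the targeting choice.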

3 Submission

Due Wed May 26th at 11:59:59 pm.