This homework focuses on causal inference.
Download the (cleaned) JTPA data from the website, and load it.
library(tidyverse)
load("../../lectures/l15/jtpa/jtpa.RData")
Drop race and gender characteristics.
jtpa = jtpa %>% select(-male,-black,-hispanic)
Estimate the overall ATE.
summary(lm(y~offer,data=jtpa))$coefficients
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 15040.504   274.8839 54.715838 0.0000000000
## offer        1159.433   336.2652  3.447973 0.0005668874
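Because offer is a binary indicator, the coefficient on offer in this regression is the difference in mean y between the two groups. The sketch below checks that identity on simulated data; the data frame `sim` and all of its values are made up for illustration and are not the JTPA data.

```r
# Simulated stand-in for the JTPA data (hypothetical values, illustration only)
set.seed(1)
n = 500
sim = data.frame(offer = rbinom(n, 1, 2/3))
sim$y = 15000 + 1200 * sim$offer + rnorm(n, sd = 5000)

# Coefficient on offer from the regression of y on offer
ols_ate = unname(coef(lm(y ~ offer, data = sim))["offer"])

# Difference in mean outcomes between the offer = 1 and offer = 0 groups
diff_ate = mean(sim$y[sim$offer == 1]) - mean(sim$y[sim$offer == 0])

# The two estimates agree up to floating-point tolerance
all.equal(ols_ate, diff_ate)
```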
Calculate the average treatment effect for the married subpopulation (married == 1) by regressing y on offer.
How does this compare to the overall ATE? Does this regression tell us whether the program has a larger effect for married individuals?
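As a rough sketch of the mechanics only, the subgroup regression can be run by subsetting the data before calling lm. The example uses a simulated data frame `sim` with made-up coefficients, so the numbers it prints say nothing about the JTPA data.

```r
# Simulated stand-in with a married indicator (hypothetical values)
set.seed(2)
n = 1000
sim = data.frame(offer = rbinom(n, 1, 0.5), married = rbinom(n, 1, 0.3))
sim$y = 15000 + 1000 * sim$offer + 2000 * sim$married +
  500 * sim$offer * sim$married + rnorm(n, sd = 5000)

# Regression on the full sample and on the married subsample
fit_all = lm(y ~ offer, data = sim)
fit_married = lm(y ~ offer, data = subset(sim, married == 1))

# Put the two offer rows side by side (estimate, std. error, t value, p-value)
res = rbind(overall = summary(fit_all)$coefficients["offer", ],
            married = summary(fit_married)$coefficients["offer", ])
res
```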
Regress y on offer*married (in the full data). Is the difference in the program's effect for married individuals statistically significant?
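A minimal sketch of the interacted regression, again on simulated data (the data frame `sim` and its coefficients are hypothetical). With two binary regressors the interacted model is saturated, so the offer coefficient plus the offer:married coefficient reproduces the estimate from the married-only regression, and the offer:married row carries the standard error and p-value for the difference between the groups.

```r
# Simulated stand-in with a married indicator (hypothetical values)
set.seed(3)
n = 1000
sim = data.frame(offer = rbinom(n, 1, 0.5), married = rbinom(n, 1, 0.3))
sim$y = 15000 + 1000 * sim$offer + 2000 * sim$married +
  500 * sim$offer * sim$married + rnorm(n, sd = 5000)

# offer*married expands to offer + married + offer:married
fit_int = lm(y ~ offer * married, data = sim)
summary(fit_int)$coefficients

# Effect for married individuals implied by the interacted model
effect_married = unname(coef(fit_int)["offer"] + coef(fit_int)["offer:married"])

# Same quantity from the regression on the married subsample only
effect_subset = unname(coef(lm(y ~ offer, data = subset(sim, married == 1)))["offer"])

all.equal(effect_married, effect_subset)
```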
Drop treatment variables, the Index, and uptake.
jtpa = jtpa %>% select(-train,-classroom,-Index,-OJT_JSA,-f2sms)
Estimate a separate fully interacted linear model (y~(.-offer)^5) for the treatment and control groups.
Predict treatment effects for each individual, by getting each of the two above model predictions, and subtracting the control prediction from the treatment prediction.
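The two steps above can be sketched as follows. This is an illustration under stated assumptions, not a reference solution: the data frame `sim`, its covariates (married, hsorged, age) and the outcome equation are all simulated stand-ins, and with only three covariates the ^5 in the formula stops at three-way interactions.

```r
# Simulated stand-in with a few covariates (hypothetical names and values)
set.seed(4)
n = 2000
sim = data.frame(
  offer   = rbinom(n, 1, 0.5),
  married = rbinom(n, 1, 0.3),
  hsorged = rbinom(n, 1, 0.7),
  age     = sample(22:60, n, replace = TRUE)
)
sim$y = 12000 + 100 * sim$age + 1500 * sim$married + 1000 * sim$offer +
  800 * sim$offer * sim$married + rnorm(n, sd = 5000)

# One fully interacted model per arm; offer is removed from the regressors
fit_treat   = lm(y ~ (. - offer)^5, data = subset(sim, offer == 1))
fit_control = lm(y ~ (. - offer)^5, data = subset(sim, offer == 0))

# Predict both potential outcomes for every individual, then subtract
# the control prediction from the treatment prediction
pred_treat   = predict(fit_treat, newdata = sim)
pred_control = predict(fit_control, newdata = sim)
sim$tau_hat  = pred_treat - pred_control

summary(sim$tau_hat)
```

Both models are fit before tau_hat is added to the data frame, so the new column is never picked up by the `.` in the formula.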
Build a holdout sample.
set.seed(11451)
# Hold out a random 20% of rows (floor() makes the sample size an integer)
holdout_ind = sample(seq_len(nrow(jtpa)),floor(nrow(jtpa)*0.2),replace=FALSE)
holdout = jtpa[holdout_ind,]
train = jtpa[-holdout_ind,]
Is the out-of-sample ATE different from the in-sample ATE? What does this tell you about our targeting choice?
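One simple way to set up the comparison is to estimate the offer coefficient separately in the training and holdout samples. The sketch below does this on simulated data; the data frame `sim` is hypothetical, and the sketch assumes the comparison is made on the plain regression of y on offer, which may not be the exact comparison intended here.

```r
# Simulated stand-in (hypothetical values, illustration only)
set.seed(5)
n = 2000
sim = data.frame(offer = rbinom(n, 1, 0.5))
sim$y = 15000 + 1000 * sim$offer + rnorm(n, sd = 5000)

# Same 80/20 split pattern as the holdout code above
holdout_ind = sample(seq_len(n), floor(n * 0.2), replace = FALSE)
sim_holdout = sim[holdout_ind, ]
sim_train   = sim[-holdout_ind, ]

# ATE estimate in each sample: coefficient on offer
ate_in  = unname(coef(lm(y ~ offer, data = sim_train))["offer"])
ate_out = unname(coef(lm(y ~ offer, data = sim_holdout))["offer"])

c(in_sample = ate_in, out_of_sample = ate_out)
```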
Due Wed May 26th at 11:59:59 pm.