Adding and Understanding Features
Multiple Linear Regression models (MLRM), estimated via OLS, have very good properties if all assumptions (A1-A5, A6’) hold.
Up until now, we have discussed how to estimate them, and analyze them under “optimal” assumptions, in simplified cases.
Today we will add other “minor” features to MLR, and aim to better understand how it works.
Something that we do not emphasize enough: before analyzing your data, it is important to examine the nature of the data (summary stats, ranges, scales).
When I talk about scaling and shifting, I refer exclusively to affine transformations of the following type:
\[x^* = a\,x + c \quad \text{or} \quad x^* = a(x + c_1) + c_2 \]
They either shift the data or change its scale, not its shape! (Logs change the shape.)
Applying affine transformations to the data has NO effect on your model whatsoever (same t’s, same F’s, same \(R^2\)).
However, your \(\beta\)’s will change. This can help with understanding and explaining the results. The table below shows the same regression with the dependent variable expressed in four different units.
|         | Oz | lbs | Kgs | Gr |
|---------|----|-----|-----|----|
| male    | 3.123*** | 0.195*** | 0.089*** | 88.605*** |
|         | (1.071) [2.916] | (0.067) [2.916] | (0.030) [2.916] | (30.389) [2.916] |
| white   | 5.404*** | 0.338*** | 0.153*** | 153.346*** |
|         | (1.392) [3.882] | (0.087) [3.882] | (0.039) [3.882] | (39.497) [3.882] |
| cigs    | -0.480*** | -0.030*** | -0.014*** | -13.628*** |
|         | (0.091) [-5.288] | (0.006) [-5.288] | (0.003) [-5.288] | (2.577) [-5.288] |
| lfaminc | 1.053* | 0.066* | 0.030* | 29.867* |
|         | (0.632) [1.664] | (0.040) [1.664] | (0.018) [1.664] | (17.946) [1.664] |
| _cons   | 110.603*** | 6.913*** | 3.138*** | 3138.351*** |
|         | (2.071) [53.410] | (0.129) [53.410] | (0.059) [53.410] | (58.760) [53.410] |
| N       | 1388 | 1388 | 1388 | 1388 |
| R2      | 0.046 | 0.046 | 0.046 | 0.046 |

Standard errors in parentheses, t statistics in brackets.
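A sketch (not the original code) of how a table like the one above can be produced. I assume the bwght dataset loaded via frause, whose variables match those in the table, with the outcome rescaled into the four units:

```stata
* Sketch: the same regression with the outcome in four different units
frause bwght, clear                  // assumed dataset; bwght in ounces, bwghtlbs in pounds
gen bwghtkg = bwghtlbs*0.45359237    // pounds to kilograms
gen bwghtgr = bwghtkg*1000           // kilograms to grams
foreach y in bwght bwghtlbs bwghtkg bwghtgr {
    qui: reg `y' male white cigs lfaminc
    est sto m_`y'
}
esttab m_bwght m_bwghtlbs m_bwghtkg m_bwghtgr, se nogaps nomtitle
```

The slopes and intercept rescale with the outcome, while the t statistics, F and \(R^2\) stay identical across columns.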
Important
Re-Scaling is an important tool/trick that can be used for interpreting more complex models.
\[\tilde w = \frac{w-\bar w}{\sigma_w} \rightarrow E(\tilde w)=0 \text{ and } Var(\tilde w) = 1 \]
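As a minimal sketch (not from the original slides), a regressor can be standardized with egen’s std() function; its coefficient is then the effect of a one-standard-deviation increase:

```stata
* Sketch: standardizing a variable so that E(z)=0 and Var(z)=1
frause bwght, clear                  // dataset used only for illustration
egen z_cigs = std(cigs)              // (cigs - mean(cigs)) / sd(cigs)
reg bwght z_cigs male white lfaminc  // coefficient on z_cigs = effect of a 1-SD increase in cigs
```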
Dummies are variables that take only two values (preferably 0 and 1).
They are used to capture qualitative (binary) characteristics (e.g., Democrat, union worker, etc.)
When used in regression analysis, they represent “shifts” in the Intercept: \[y = b_0 + b_1 male + b_2 x_1 + b_3 x_2 + e \]
Unless further restrictions are imposed, you cannot include dummies for all categories in the model (the dummy-variable trap); one category must be left out as the base.
\[\begin{aligned} y &= b_0 + b_1 black + b_2 hispanic + b_3 other + b_4 x + e && \text{Base = White} \\ y &= b_0 + b_1 young + b_2 old + b_3 x + e && \text{Base = Adult} \end{aligned} \]
frause beauty, clear
** union is also a dummy
** m1: looks treated as continuous
qui:reg lwage exper union educ female looks
est sto m1
** m2: looks collapsed into a single dummy (looks >= 4)
gen looks_good = looks>=4 if !missing(looks)
qui:reg lwage exper union educ female looks_good
est sto m2
** m3: looks as a full set of dummies (base category = 1)
qui:reg lwage exper union educ female i.looks
est sto m3
** m4: looks as a full set of dummies (base category = 3)
qui:reg lwage exper union educ female ib3.looks
est sto m4
esttab m1 m2 m3 m4, se star( * 0.1 ** 0.05 *** 0.01 ) nogaps nomtitle
display _n "Exact Change Union : " %5.3f (exp(_b[union])-1)*100 "%"
----------------------------------------------------------------------------
(1) (2) (3) (4)
----------------------------------------------------------------------------
exper 0.0137*** 0.0134*** 0.0135*** 0.0135***
(0.00119) (0.00120) (0.00120) (0.00120)
union 0.201*** 0.201*** 0.196*** 0.196***
(0.0305) (0.0307) (0.0306) (0.0306)
educ 0.0737*** 0.0750*** 0.0735*** 0.0735***
(0.00528) (0.00528) (0.00528) (0.00528)
female -0.448*** -0.450*** -0.446*** -0.446***
(0.0293) (0.0294) (0.0293) (0.0293)
looks 0.0555***
(0.0201)
looks_good 0.0276
(0.0299)
1.looks 0 -0.266**
(.) (0.134)
2.looks 0.146 -0.121***
(0.139) (0.0439)
3.looks 0.266** 0
(0.134) (.)
4.looks 0.264* -0.00255
(0.136) (0.0312)
5.looks 0.422** 0.156
(0.173) (0.111)
_cons 0.408*** 0.565*** 0.338** 0.604***
(0.0968) (0.0774) (0.149) (0.0781)
----------------------------------------------------------------------------
N 1260 1260 1260 1260
----------------------------------------------------------------------------
Standard errors in parentheses
* p<0.1, ** p<0.05, *** p<0.01
Exact Change Union : 21.598%
But:
\[\begin{aligned} y &= b_0 + b_1 x_1 + b_2 x_1^2 + b_3 x_2 + e \\ \frac{dy}{dx_1} &= b_1 + 2 b_2 x_1 = 0 \\ x_1^* &= - \frac{b_1}{2b_2} \end{aligned} \]
To consider
frause hprice2, clear
** rooms squared, created by hand (compare with factor notation below)
gen rooms2 = rooms*rooms
qui:reg lprice lnox dist rooms
est sto m0
qui:reg lprice lnox dist rooms rooms2
est sto m1
qui:reg lprice lnox dist c.rooms c.rooms#c.rooms
est sto m2
esttab m0 m1 m2, se varwidth(20) star(* 0.1 ** 0.05 *** 0.01) nogaps
--------------------------------------------------------------------
(1) (2) (3)
lprice lprice lprice
--------------------------------------------------------------------
lnox -0.968*** -0.975*** -0.975***
(0.110) (0.106) (0.106)
dist -0.0291*** -0.0223** -0.0223**
(0.0102) (0.00995) (0.00995)
rooms 0.302*** -0.724*** -0.724***
(0.0189) (0.171) (0.171)
rooms2 0.0794***
(0.0131)
c.rooms#c.rooms 0.0794***
(0.0131)
_cons 9.793*** 13.05*** 13.05***
(0.271) (0.599) (0.599)
--------------------------------------------------------------------
N 506 506 506
--------------------------------------------------------------------
Standard errors in parentheses
* p<0.1, ** p<0.05, *** p<0.01
Turn point: 4.55
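The turning point reported above follows from \(x_1^* = -b_1/(2b_2)\) and can be computed from the stored coefficients, for example (a sketch, reusing model m1 from the code above):

```stata
* Sketch: turning point of the quadratic in rooms
est restore m1                                          // model with rooms and rooms2
display "Turn point: " %4.2f -_b[rooms]/(2*_b[rooms2])
```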
Variable | Min p1 p5 p10 p25 p50 p75 p90 p99 Max
-------------+----------------------------------------------------------------------------------------------------
rooms | 3.56 4.52 5.3 5.59 5.88 6.21 6.62 7.15 8.34 8.78
------------------------------------------------------------------------------------------------------------------
** with rooms2 created by hand, margins cannot see that rooms2 = rooms^2
qui:reg lprice lnox dist rooms rooms2
margins, dydx(rooms)
** with factor notation, margins accounts for the quadratic term
qui:reg lprice lnox dist c.rooms c.rooms#c.rooms
margins, dydx(rooms)
Average marginal effects Number of obs = 506
Model VCE: OLS
Expression: Linear prediction, predict()
dy/dx wrt: rooms
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
rooms | -.7236433 .1706763 -4.24 0.000 -1.058973 -.3883139
------------------------------------------------------------------------------
Average marginal effects Number of obs = 506
Model VCE: OLS
Expression: Linear prediction, predict()
dy/dx wrt: rooms
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
rooms | .2747106 .0188463 14.58 0.000 .2376831 .3117382
------------------------------------------------------------------------------
For this, you may need to create dummies manually or use explicit interactions (Stata’s factor notation does it for you). Options 1 and 3 will allow you to use margins. For overall groups (all women, all union workers), you need to decide how to get representative samples.
\[\begin{aligned} y &= a_0 + a_1 x_1 + a_2 x_2 + a_3 x_1 x_2 + e \\ \frac{\Delta E(y|x_1,x_2) }{\Delta x_1} &= a_1 + a_3 x_2 \\ \frac{\Delta E(y|x_1,x_2) }{\Delta x_2} &= a_2 + a_3 x_1 \end{aligned} \]
\[\begin{aligned} y &= b_0 + b_1 x_1 + b_2 x_2 + b_3 (x_1-\bar x_1)(x_2-\bar x_2) + e \\ \frac{\Delta E(y|x_1,x_2) }{\Delta x_1} &= b_1 + b_3 (x_2-\bar x_2) \simeq b_1 \\ \frac{\Delta E(y|x_1,x_2) }{\Delta x_2} &= b_2 + b_3 (x_1-\bar x_1) \simeq b_2 \end{aligned} \]
\[\begin{aligned} y &= b_0 + b_1 x_1 + b_2 (x_1-\bar x_1)^2 + b_3 x_2 + e \\ \frac{\Delta E(y|x_1,x_2) }{\Delta x_1} &= b_1 + 2 b_2 (x_1-\bar x_1) \simeq b_1 \\ \end{aligned} \]
\[wage=b_0 + b_1 female + b_2 educ + b_3 educ \times female + e \]
In Stata, this interaction can be estimated directly with factor-variable notation.
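A minimal sketch (not the original code), using the beauty data for illustration:

```stata
* Sketch: education slope allowed to differ by gender
frause beauty, clear
reg wage i.female##c.educ
margins female, dydx(educ)    // returns to education for men and for women
```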
\[\begin{aligned} FT: & \ y = b_0 + b_1 x_1 + b_2 x_2 + g_0 d + g_1 x_1 d + g_2 x_2 d + e \\ D0: & \ y = b_0 + b_1 x_1 + b_2 x_2 + e && \text{if } d=0 \\ D1: & \ y = (b_0+g_0) + (b_1+g_1) x_1 + (b_2+g_2) x_2 + e && \text{if } d=1 \\ & \ y = a_0 + a_1 x_1 + a_2 x_2 + e && \text{if } d=1 \\ CS1: & \ H_0: g_0=g_1=g_2=0 \end{aligned} \]
\[\begin{aligned} M1 &: y = b_0 + b_1 x_1 + b_2 x_2 + e \\ M2 &: y = b_0 + b_1 x_1 + b_2 x_2 + b_3 d + e \\ \text{if } D=0 &: y = b_{00} + b_{01} x_1 + b_{02} x_2 + e_0 \\ \text{if } D=1 &: y = b_{10} + b_{11} x_1 + b_{12} x_2 + e_1 \end{aligned} \]
F-Stat (similar to before):
\[\begin{aligned} F_{M1} = \frac{(SSR_{M1}-SSR_0-SSR_1)/(k+1)}{(SSR_0+SSR_1)/(n - 2(k+1))} \\ F_{M2} = \frac{(SSR_{M2}-SSR_0-SSR_1)/k}{(SSR_0+SSR_1)/(n - 2(k+1))} \end{aligned} \]
frause gpa3, clear
drop if cumgpa==0
replace sat = sat /100
qui:reg cumgpa sat hsperc tothrs
est sto m1
qui:reg cumgpa sat hsperc tothrs female
est sto m2
qui:reg cumgpa sat hsperc tothrs if female==0
est sto m3
qui:reg cumgpa sat hsperc tothrs if female==1
est sto m4
qui:reg cumgpa i.female##c.(sat hsperc tothrs)
est sto m5
esttab m1 m2 m3 m4 m5, mtitle( Simple With_fem Men Women Full_int) ///
se star(* .1 ** 0.05 *** 0.01) nogaps noomitted
(98 observations deleted)
variable sat was int now float
(634 real changes made)
--------------------------------------------------------------------------------------------
(1) (2) (3) (4) (5)
Simple With_fem Men Women Full_int
--------------------------------------------------------------------------------------------
sat 0.0933*** 0.0938*** 0.0679*** 0.177*** 0.0679***
(0.0133) (0.0130) (0.0151) (0.0244) (0.0146)
hsperc -0.00865*** -0.00730*** -0.00748*** -0.00869*** -0.00748***
(0.00105) (0.00106) (0.00119) (0.00219) (0.00116)
tothrs -0.000599 -0.000586 -0.00155** 0.00141 -0.00155**
(0.000662) (0.000647) (0.000771) (0.00111) (0.000748)
female 0.277***
(0.0493)
0.female 0
(.)
1.female -0.855**
(0.333)
1.female#c~t 0.109***
(0.0310)
1.female#c~c -0.00121
(0.00271)
1.female#c~s 0.00296**
(0.00145)
_cons 1.900*** 1.782*** 2.070*** 1.215*** 2.070***
(0.149) (0.147) (0.173) (0.257) (0.168)
--------------------------------------------------------------------------------------------
N 634 634 483 151 634
--------------------------------------------------------------------------------------------
Standard errors in parentheses
* p<.1, ** p<0.05, *** p<0.01
** H0: equal slopes (intercepts may differ)
test 1.female#c.sat 1.female#c.hsperc 1.female#c.tothrs
** H0: all coefficients equal (Chow-type test)
test 1.female 1.female#c.sat 1.female#c.hsperc 1.female#c.tothrs
** slopes by group
margins female, dydx(sat hsperc tothrs)
( 1) 1.female#c.sat = 0
( 2) 1.female#c.hsperc = 0
( 3) 1.female#c.tothrs = 0
F( 3, 626) = 6.26
Prob > F = 0.0003
( 1) 1.female = 0
( 2) 1.female#c.sat = 0
( 3) 1.female#c.hsperc = 0
( 4) 1.female#c.tothrs = 0
F( 4, 626) = 12.75
Prob > F = 0.0000
Average marginal effects Number of obs = 634
Model VCE: OLS
Expression: Linear prediction, predict()
dy/dx wrt: sat hsperc tothrs
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
sat |
female |
0 | .0679302 .014607 4.65 0.000 .0392454 .0966149
1 | .1772582 .0273126 6.49 0.000 .1236229 .2308936
-------------+----------------------------------------------------------------
hsperc |
female |
0 | -.00748 .0011573 -6.46 0.000 -.0097526 -.0052073
1 | -.0086922 .0024524 -3.54 0.000 -.0135081 -.0038763
-------------+----------------------------------------------------------------
tothrs |
female |
0 | -.0015482 .0007477 -2.07 0.039 -.0030165 -.0000798
1 | .001412 .0012472 1.13 0.258 -.0010371 .0038612
------------------------------------------------------------------------------
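These F statistics can also be computed by hand from the SSRs, following the formulas given earlier. A sketch reusing the stored models (m1 pooled, m2 pooled with the female intercept, m3 men, m4 women; k = 3 slopes); the results should match the two test statistics above:

```stata
* Sketch: Chow-type F statistics from stored sums of squared residuals
est restore m3
scalar ssr0 = e(rss)                                           // SSR for men
est restore m4
scalar ssr1 = e(rss)                                           // SSR for women
est restore m1
scalar F_M1 = ((e(rss)-ssr0-ssr1)/4)/((ssr0+ssr1)/(e(N)-2*4))  // all coefficients
est restore m2
scalar F_M2 = ((e(rss)-ssr0-ssr1)/3)/((ssr0+ssr1)/(e(N)-2*4))  // slopes only
display "F_M1 = " %5.2f F_M1 _n "F_M2 = " %5.2f F_M2
```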
\[y = b_0 + b_1 x_1 + b_2 x_1^2 + e \rightarrow \frac{dy}{dx} = b_1 + 2b_2 x_1 \]
In Stata, margins can do this only for factor-notation interactions. For manually constructed variables you need f_able, or you must compute the effects by hand.
IMPORTANT: A low \(R^2\) does not mean a bad model, nor does a high \(R^2\) mean a good one.
\[R^2_{adj} = 1-\frac{SSR/(n-k-1)}{SST/(n-1)}=1-(1-R^2)\frac{n-1}{n-k-1} \]
\[\begin{aligned} M1: & y = b_0 + b_1 x_1 + b_2 x_2 + e \\ M2: & y = b_0 + b_1 x_1 + b_3 x_3 + e \\ M3: & y = b_0 + b_1 ln(x_1) + b_2 ln(x_2) + e \\ M4: & y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + e \end{aligned} \]
\[\begin{aligned} \ln(y) &= a_0 + a_1 x_1 + a_2 x_2 + \varepsilon \\ y &= \exp(a_0 + a_1 x_1 + a_2 x_2 + \varepsilon) \\ E(y|x_1,x_2) &= e^{a_0 + a_1 x_1 + a_2 x_2} \times E(e^{\varepsilon}) \\ & E(e^{\varepsilon}) \neq 1 \end{aligned} \]
Let’s call \(E(e^{\varepsilon}) = \alpha_0\)
Option 1 : \(\alpha_0 = n^{-1} \sum \exp(\hat\varepsilon_i)\)
Option 2 : Under Normality of \(\varepsilon\), \(\alpha_0 = \exp(\hat \sigma^2/2)\)
Option 3 : Call \(\hat m = \exp(a_0 + a_1 x_1 + a_2 x_2)\).
Regress \(y\) on \(\hat m\) without intercept. \(\alpha_0 = \frac{\hat m'y}{\hat m'\hat m}\)
\[R^2 = Corr(y,\hat y)^2 \text{ or } 1-\frac{\sum(y_i-\alpha_0 \hat m_i)^2}{\sum(y-\bar y)^2} \]
frause oaxaca, clear
drop if lnwage==.
gen wage = exp(lnwage)
qui:reg lnwage educ exper tenure female married divorced
predict lnw_hat
predict lnw_res, res
** Case 1:
egen alpha_01 = mean( exp(lnw_res))
** Case 2:
qui:sum lnw_res
gen alpha_02 = exp(r(Var)/2)
** Case 3: regress wage on exp(lnw_hat) without a constant
gen elnw_hat = exp(lnw_hat)
qui: reg wage elnw_hat, nocons
gen alpha_03 = _b[elnw_hat]
gen wage_1 = elnw_hat
gen wage_2 = elnw_hat*alpha_01
gen wage_3 = elnw_hat*alpha_02
gen wage_4 = elnw_hat*alpha_03
mata: y = st_data(.,"wage"); my = mean(y)
mata: yh = st_data(.,"wage_1 wage_2 wage_3 wage_4")
mata:"R2_1 "; 1 - sum((y:-yh[,1]):^2)/sum( (y:-my):^2 )
mata:"R2_2 "; 1 - sum((y:-yh[,2]):^2)/sum( (y:-my):^2 )
mata:"R2_3 "; 1 - sum((y:-yh[,3]):^2)/sum( (y:-my):^2 )
(Excerpt from the Swiss Labor Market Survey 1998)
(213 observations deleted)
(option xb assumed; fitted values)
R2_1
.1569552664
R2_2
.1692562931
R2_3
.1658805115
\[\begin{aligned} D &= b_0 + b_1 x_1 + b_2 x_2 +b_3 x_3 + e \\ E(D|Xs) &= P(D=1|Xs) \\ &= b_0 + b_1 x_1 + b_2 x_2 +b_3 x_3 \end{aligned} \]
Note:
E(inlf|X) = 0.707 - 0.018 age + 0.040 educ + 0.023 exper + 0.013 kidsge6 - 0.272 kidslt6 - 0.003 nwifeinc
N = 753 R^2 = 0.254
\[Var(y|x)=p(x)*(1-p(x)) \]
Thus, the standard errors will be incorrect, affecting inference.
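A minimal sketch (not the original code) of how the LPM above could be estimated with heteroskedasticity-robust standard errors, assuming the mroz data loaded via frause:

```stata
* Sketch: linear probability model with robust standard errors
frause mroz, clear
reg inlf age educ exper kidsge6 kidslt6 nwifeinc, robust
```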
\[Children = b_0 + b_1 age + b_2 education + e\]
E(children|X) = -1.997 + 0.175 age - 0.090 educ
N = 4361 R^2 = 0.560
If we re-center each regressor at values of interest, \(y = b_0 + b_1 (x_1-c_1) + b_2 (x_2-c_2) + b_3 (x_3-c_3) + e\), then \(b_0\) is the expected value of \(y\) when \(x_1=c_1\), \(x_2=c_2\) and \(x_3=c_3\). Thus, it is now useful!
Using this affine transformation, we can easily make predictions (and obtain their SEs) for any specific values of interest.
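A sketch of code (not necessarily the original) that would produce output like the one below, assuming the gpa2 data; the centered variables follow the re-centering just described:

```stata
* Sketch: predictions at sat=1200, hsperc=30, hsize=5
frause gpa2, clear
qui: reg colgpa sat hsperc hsize c.hsize#c.hsize
margins, at(sat=1200 hsperc=30 hsize=5)
* equivalently, re-center the regressors so that _cons equals that prediction
gen sat0    = sat - 1200
gen hsperc0 = hsperc - 30
gen hsize0  = hsize - 5
gen hsize20 = hsize^2 - 25
reg colgpa sat0 hsperc0 hsize0 hsize20
```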
Adjusted predictions Number of obs = 4,137
Model VCE: OLS
Expression: Linear prediction, predict()
At: sat = 1200
hsperc = 30
hsize = 5
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
_cons | 2.700075 .0198778 135.83 0.000 2.661104 2.739047
------------------------------------------------------------------------------
Source | SS df MS Number of obs = 4,137
-------------+---------------------------------- F(4, 4132) = 398.02
Model | 499.030503 4 124.757626 Prob > F = 0.0000
Residual | 1295.16517 4,132 .313447524 R-squared = 0.2781
-------------+---------------------------------- Adj R-squared = 0.2774
Total | 1794.19567 4,136 .433799728 Root MSE = .55986
------------------------------------------------------------------------------
colgpa | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
sat0 | .0014925 .0000652 22.89 0.000 .0013646 .0016204
hsperc0 | -.0138558 .000561 -24.70 0.000 -.0149557 -.0127559
hsize0 | -.0608815 .0165012 -3.69 0.000 -.0932328 -.0285302
hsize20 | .0054603 .0022698 2.41 0.016 .0010102 .0099104
_cons | 2.700075 .0198778 135.83 0.000 2.661104 2.739047
------------------------------------------------------------------------------
When modeling \(y = b_0 + \delta \ trt + b_1 x_1 + b_2 x_2 + e\), the treatment effect \(\delta\) was estimated under a homogeneity assumption (only an intercept shift).
This assumption can be relaxed by estimating separate models or using interactions.
Effects can be estimated manually (separate models), with margins, or using shifts! (A margins-based sketch is shown after the output below.)
Using Separate models: \[\begin{aligned} y &= b^0_0 + b^0_1 x_1 + b^0_2 x_2 + e^0 \text{ if trt=0} \\ y &= b^1_0 + b^1_1 x_1 + b^1_2 x_2 + e^1 \text{ if trt=1} \\ & ATE = E(\hat y_1 - \hat y_0 ) \\ & ATT = E(\hat y_1 - \hat y_0 | trt=1) \\ & ATU = E(\hat y_1 - \hat y_0 | trt=0) \end{aligned} \]
Or using Model Shifts:
\[\begin{aligned} y &= b_0 + \delta_{ate} trt + b_1 x_1 + g_1 trt (x_1- E(x_1)) + e \\ y &= b_0 + \delta_{att} trt + b_1 x_1 + g_1 trt (x_1- E(x_1|trt=1)) + e \\ y &= b_0 + \delta_{atu} trt + b_1 x_1 + g_1 trt (x_1- E(x_1|trt=0)) + e \\ \end{aligned} \]
frause jtrain98, clear
foreach i in earn96 educ age married {
    ** center at the control-group mean (for ATU)
    sum `i' if train==0, meanonly
    gen atu_`i' = (`i' - r(mean))*train
    ** center at the treated-group mean (for ATT)
    sum `i' if train==1, meanonly
    gen att_`i' = (`i' - r(mean))*train
    ** center at the overall mean (for ATE)
    sum `i' , meanonly
    gen ate_`i' = (`i' - r(mean))*train
}
qui:reg earn98 train earn96 educ age married
est sto m1
qui:reg earn98 train earn96 educ age married ate*
est sto m2
qui:reg earn98 train earn96 educ age married atu*
est sto m3
qui:reg earn98 train earn96 educ age married att*
est sto m4
esttab m1 m2 m3 m4, keep(train) mtitle(Homogenous ATE ATU ATT) se
----------------------------------------------------------------------------
(1) (2) (3) (4)
Homogenous ATE ATU ATT
----------------------------------------------------------------------------
train 2.411*** 3.106*** 3.533*** 2.250***
(0.435) (0.532) (0.667) (0.449)
----------------------------------------------------------------------------
N 1130 1130 1130 1130
----------------------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
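For comparison, a sketch of the margins-based approach mentioned above (a fully interacted model); the point estimates should match the ATE, ATT and ATU columns in the table:

```stata
* Sketch: treatment effects via a fully interacted model and margins
frause jtrain98, clear
reg earn98 i.train##c.(earn96 educ age married)
margins, dydx(train)                 // ATE: averaged over everyone
margins if train==1, dydx(train)     // ATT: averaged over the treated
margins if train==0, dydx(train)     // ATU: averaged over the untreated
```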
Next week: Lifting Assumptions (Heteroskedasticity)