Multiple Regression Analysis: Inference and Asymptotics

Are they Significant?

Fernando Rios-Avila

How do you know if what you see is relevant?

  • Last time, we talked a bit about the estimation of the MLRM. For those who do not remember:

\[\hat\beta=(X'X)^{-1}X'y \]

  • We also saw how, under A5 (homoskedasticity), we can estimate the variance-covariance matrix of the coefficients (a quick Mata check of both formulas is sketched at the end of this list):

\[Var(\hat\beta) = \frac{\sum \hat e^2}{N-K-1} (X'X)^{-1} \]

  • The next question: how to know how precise your estimates are?

  • That should be simple: just divide the coefficient by its standard error. The larger this ratio, the more precise (and the more significant) the estimate.

  • Is this enough to say something about the population coefficients?

(let's assume A1-A5 hold)
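
As a quick check of both formulas above, here is a minimal sketch in Mata (Stata's matrix language), using the gpa1 data from the college GPA example later in these slides; the coefficients and standard errors it prints match -regress-:

Code
frause gpa1, clear
qui reg colgpa hsgpa act skipped
mata:
    y = st_data(., "colgpa")
    X = (st_data(., ("hsgpa", "act", "skipped")), J(rows(y), 1, 1))  // add constant
    b = invsym(X'X)*X'y                      // beta-hat = (X'X)^-1 X'y
    e = y - X*b                              // residuals
    s2 = (e'e)[1,1]/(rows(X)-cols(X))        // sum(e^2)/(N-K-1)
    V = s2*invsym(X'X)                       // variance-covariance of beta-hat
    (b, sqrt(diagonal(V)))                   // coefficients and SEs, as in -regress-
end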

Distribution of coefficients

  • The right answer is…Perhaps.

  • Unless you know something about the distribution of \(\beta's\), it would be hard to make any inferences from the estimates. Why?

  • Because not all distributions are made equal!

Code
clear
range x -4 4 1000
gen funiform = 0 
replace funiform = 1/(2*sqrt(3)) if inrange(x,-sqrt(3),sqrt(3))

gen fnormal = 0 
replace fnormal = normalden(x)

gen fchi2 = 0 
replace fchi2 = sqrt(8)*chi2den(4,x*sqrt(8)+4)

integ funiform x, gen(F1)
integ fnormal x, gen(F2)
integ fchi2 x, gen(F3)

set scheme white2
color_style egypt
replace x = x + 1.5
two (area funiform x           , pstyle(p1) color(%20)) ///
    (area funiform x if F1<0.05, pstyle(p1) color(%80)) /// 
    (area funiform x if F1>0.95, pstyle(p1) color(%80)) /// 
    (area fnormal  x           , pstyle(p2) color(%20)) ///  
    (area fnormal  x if F2<0.05, pstyle(p2) color(%80)) /// 
    (area fnormal  x if F2>0.95, pstyle(p2) color(%80)) /// 
    (area fchi2    x           , pstyle(p3) color(%20)) /// 
    (area fchi2    x if F3>0.95, pstyle(p3) color(%80)) /// 
    (area fchi2    x if F3<0.05, pstyle(p3) color(%80)), ///
    xlabel(-4 / 4) legend(order(2 "Uniform" 5 "Normal" 8 "C-Chi2")) /// 
    xtitle("Beta hat Distribution") ///
    xline( 0, lstyle(1) lwidth(1)) xline(1.5)

graph export images/f4_1.png, replace width(1200)  

Not all Distributions are the Same

New Assumption

  • A6: Errors are normal \(e\sim N(0,\sigma^2_e)\).
    • A1-A6 are the Classical Linear Model Assumptions.
    • This assumes the outcome is "conditionally" normal: \(y|X \sim N(X\beta,\sigma^2_e)\).
    • And with this assumption, OLS is no longer just BLUE. It is now BUE (Best Unbiased Estimator)!

Why does it matter?

  • If you combine two variables with the same distribution, the combined variable will generally not have the same distribution as its "parents".
  • Except with normals! If you add two normally distributed variables, the sum will also be normal. (Don't believe me? Try the quick check below.)
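
A minimal simulation sketch of this claim (the means, standard deviations, and variable names are arbitrary choices):

Code
clear
set obs 10000
set seed 123
gen a = rnormal(0, 1)        // N(0,1)
gen b = rnormal(2, 3)        // N(2,3)
gen c = a + b                // sum of two independent normals
summarize c                  // mean near 2, sd near sqrt(1 + 9), about 3.16
histogram c, normal          // bell-shaped: the sum is again normal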

Recall:

\[\hat \beta=\beta + (X'X)^{-1}X'e \]

If \(e\) is normal, then the \(\hat\beta's\) will also be normal

And this works for ANY Sample size!

If \(e\) normal then \(\beta\) is normal

  • If \(\hat \beta's\) are normal, then we can use this distribution to make inferences about \(\beta's\) using normal distribution.

  • This is good, because we know how to do math with normal distributions, and we can use the standardized ratio:

\[z_j = \frac{\hat \beta_j - \beta_j}{sd(\hat\beta_j)}\sim N(0,1) \]

  • Where \(\beta_j\) is what you think the True Population parameter is (your hypothesis), and \(\hat\beta_j\) is what you estimate in your data.
  • Depending on the size of this statistic, you either reject your hypothesis or fail to reject it.

but do we “know” \(sd(\beta)\)?

Do we “know” \(sd(\beta)\)?

We don't, which is why we cannot use the normal directly. Instead, we use a t-distribution, which relies on \(se(\hat\beta)\):

\[t_j = \frac{\hat \beta_j - \beta_j}{se(\hat\beta_j)}\sim t_{N-k-1} \]

Then

  • If \(e\) is normal, \(\hat\beta\) will be normal.
  • When samples are "small", the standardized \(\hat\beta\) will follow a t-distribution.
  • But as \(N\rightarrow \infty\), \(t_{N-k-1}\) converges to \(N(0,1)\).
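A quick way to see this convergence in Stata (the degrees of freedom below are arbitrary choices):

Code
* two-sided 5% critical values shrink toward the normal one as df grows
display invt(10, 0.975)        // about 2.23
display invt(100, 0.975)       // about 1.98
display invt(1000, 0.975)      // about 1.96
display invnormal(0.975)       // 1.96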

Testing Hypothesis

  • The idea of hypothesis testing is contrasting the “evidence” from your data (estimates) with the beliefs we have about the population.

\[y=\beta_0 + \beta_1 x_1 + \beta_2 x_2 + e \]

Say I have two hypotheses.

  • \(x_2\) has no effect on \(y\). ie \(H_0: \beta_2 = 0\)

  • \(x_1\) has an effect equal to 1. ie \(H_0: \beta_1 = 1\)

    • Notice we make hypotheses about the population coefficients, not the estimates

I can “test” each hypothesis separately using a “t-statistic”

\[ \color{green}{t_2=\frac{\hat \beta_2 - 0}{se(\hat \beta_2)}} ; t_1=\frac{\hat \beta_1 - 1}{se(\hat \beta_1)} \]
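
In Stata, lincom reports exactly these t-statistics after estimation; a minimal sketch using the generic names \(y, x_1, x_2\) from the model above:

Code
reg y x1 x2
lincom x2          // t-stat for H0: b2 = 0
lincom x1 - 1      // t-stat for H0: b1 = 1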

Types of Hypothesis:

When talking about hypothesis testing, there are two types of tests:

  • One sided: when your alternative hypothesis states that the parameter is either strictly larger or strictly smaller than your null value.
    • Education has no effect on wages vs. returns to education are positive.
    • Skipping class has no effect on grades vs. skipping class reduces grades.
  • Two sided: when your alternative hypothesis simply says "it is different from".
    • Returns to education are 10% vs. they are not 10%.
    • Skipping class reduces grades by 0.5 points vs. it does not reduce them by 0.5 points.

In both cases, you use the same t-statistic.

\[t_\beta=\frac{\hat\beta - \beta_{hyp}}{se(\hat \beta)} \sim t_{N-k-1} \]

What changes are the "thresholds" used to judge whether something is significant or not.

One sided test:

\[\begin{aligned} H_0: & \beta_k=\beta^{hyp}_k \text{ vs } H_1: \beta_k>\beta^{hyp}_k \\ & t_{\beta_k}>t_{N-k-1}(1-\alpha) \\ H_0: & \beta_k=\beta^{hyp}_k \text{ vs } H_1: \beta_k<\beta^{hyp}_k \\ & t_{\beta_k}<-t_{N-k-1}(1-\alpha) \end{aligned} \]

  • Where \(\alpha\) is your level of significance, and \(t_{N-k-1}(1-\alpha)\) is the critical value.

  • \(\alpha\) determines the "risk" of committing a Type I error: rejecting the null when it is true.

  • Intuitively, the smaller \(\alpha\) is, the more positive (negative) "t" needs to be to reject the null.

Two sided test:

\[\begin{aligned} H_0: & \beta_k=\beta^{hyp}_k \text{ vs } H_1: \beta_k \neq \beta^{hyp}_k \\ & | t_{\beta_k} | >t_{N-k-1}(1-\alpha/2) \end{aligned} \]

  • Similar to before, except that one needs to consider both tails of the distribution to determine the critical values (note the \(t_{N-k-1}(1-\alpha/2)\)).

  • Intuitively, the smaller \(\alpha\) is, the larger the absolute value of "t" needs to be to reject the null.
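
For reference, these critical values can be looked up with invt; here using 137 degrees of freedom, the df of the college GPA example below:

Code
display invt(137, 0.95)      // one-sided 5% critical value, about 1.656
display invt(137, 0.975)     // two-sided 5% critical value, about 1.977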

But what is a Type I error? And why don't we "accept" \(H_0\)?

Why do we never accept?

  • As stated a few times before, the \(\hat \beta\) are just approximations to the true \(\beta\) coefficients. They are the "evidence" you have based on the available data.
  • With this evidence, you can reject some hypotheses (some more strongly than others).
  • However, there could exist many scenarios that fit the evidence.

Code
clear
range x -5 5 1000
gen fx = normalden(x) 
set scheme white2
color_style tableau
gen xx = x+1
two (area fx x , pstyle(p1) color(%20)) ///
    (area fx x if x<invnormal(.025), pstyle(p1) color(%80) ) ///
    (area fx x if x>invnormal(.975), pstyle(p1) color(%80) ) ///
    (area fx xx , pstyle(p2) color(%20)) ///
    (area fx xx if x<invnormal(.025), pstyle(p2) color(%80) ) /// 
    (area fx xx if x>invnormal(.975), pstyle(p2) color(%80) ) ///
    , xline(1.8) legend(order(1 "H0: b=0" 4 "H0: b=1"))  ///
    xlabel(-4(2)4)  ylabel(0(.1).5)  xsize(8) ysize(4)
graph export images/f4_2.png, height(1000)  replace

What about Type I and Type II errors?

  • Because we do not know the truth, we are bound to commit errors in our assessment of the data.

  • So given the evidence in the data and the hypothesis, there are four possible scenarios:

    • GOOD: you either reject \(H_0\) when it is false, or do not reject it when it is true.
    • Type I error (\(TE-I\)): you reject \(H_0\) when it is true.
    • Type II error (\(TE-II\)): you do not reject \(H_0\) when it is false (something else was true).

Code
clear
range x -5 5 1000
gen fx = normalden(x) 

gen xxx=x+3 
two (area fx x , pstyle(p1) color(%20)) ///
    (area fx xxx, pstyle(p2) color(%20)) ///
    (area fx x if x>2, pstyle(p1) color(%80)) ///
    (area fx xxx if xxx <2, pstyle(p2) color(%80))  ///
    ,legend(order(1 "H0"3 "Type I " 2 "H1"  4 "Type II ") cols(2))   ///
    xlabel(-4(2)7)  ylabel(0(.1).5) xsize(8) ysize(4)  
graph export images/f4_3.png, height(1000)  replace

Example: Determinants of College GPA

Code
frause gpa1, clear
reg colgpa hsgpa act skipped

      Source |       SS           df       MS      Number of obs   =       141
-------------+----------------------------------   F(3, 137)       =     13.92
       Model |  4.53313314         3  1.51104438   Prob > F        =    0.0000
    Residual |  14.8729663       137  .108561798   R-squared       =    0.2336
-------------+----------------------------------   Adj R-squared   =    0.2168
       Total |  19.4060994       140  .138614996   Root MSE        =    .32949

------------------------------------------------------------------------------
      colgpa | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       hsgpa |   .4118162   .0936742     4.40   0.000     .2265819    .5970505
         act |   .0147202   .0105649     1.39   0.166    -.0061711    .0356115
     skipped |  -.0831131   .0259985    -3.20   0.002    -.1345234   -.0317028
       _cons |   1.389554   .3315535     4.19   0.000     .7339295    2.045178
------------------------------------------------------------------------------
  • Hypothesis: Skipping classes has no effect on College GPA.

\[H_0: \beta_{skip} = 0 \text{ vs } H_1: \beta_{skip} \neq 0 \]

  • Test, with \(\alpha=5\%\): \(|t_{skip}|=3.2\) vs \(t_{n-k-1}(0.975)\):
Code
display invt(141-4,0.975)

1.9774312

  • Conclusion: \(H_0\) is rejected.

  • Hyp: Skipping classes has no effect on College GPA vs. it has a negative effect

\[H_0: \beta_{skip} = 0 \text{ vs } H_1: \beta_{skip}<0 \]

  • Test, with \(\alpha=5\%\): \(t_{skip}=-3.2 < -t_{n-k-1}(0.95)=-1.6560\)
  • Also Reject \(H_0\)

  • \(t_{ACT}=1.39\)
  • Hyp: ACT has no effect on College GPA vs It has a non-zero effect
  • Hyp: ACT has no effect on College GPA vs it has a positive effect
  • Critical values:
    • \(t_{137}(0.95)=1.6560\): do not reject \(H_0\) with \(\alpha = 5\%\)
    • \(t_{137}(0.90)=1.2878\): reject \(H_0\) with \(\alpha = 10\%\)
  • Hyp: Each GPA point in high school translates into half a point in College GPA, vs. the effect is less than 0.5
test hsgpa = 0.5
lincom hsgpa - 0.5

 ( 1)  hsgpa = .5

       F(  1,   137) =    0.89
            Prob > F =    0.3482

 ( 1)  hsgpa = .5

------------------------------------------------------------------------------
      colgpa | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         (1) |  -.0881838   .0936742    -0.94   0.348    -.2734181    .0970505
------------------------------------------------------------------------------
  • Critical at 5%: \(t=-1.6560\)
  • Cannot Reject \(H_0\)

p-values

  • Something you may or may not have noticed: the significance level \(\alpha\) can be chosen by the researcher.
    • Conventional levels are 10%, 5% and 1%.
  • This may lead to researchers choosing any value that would make their theory fit.
  • There is a better alternative: use the \(p\)-value, which captures the smallest significance level at which you could reject your null.

\[p\text{-value} = P(T_{N-k-1}>|t_{stat}|) \text{ (one-sided)} \quad \text{or} \quad p\text{-value} = 2\,P(T_{N-k-1}>|t_{stat}|) \text{ (two-sided)} \]

  • The smaller, the better! (for rejection)
  • How?
    • One tail : display 1 - t(n-k-1, abs(tstat))
    • Two tails: display 2 - 2*t(n-k-1, abs(tstat))
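
For example, for the skipped-classes t-statistic from the GPA example above (\(|t|=3.20\), 137 df), the equivalent ttail() function gives:

Code
display ttail(137, 3.20)        // one-tail p-value
display 2*ttail(137, 3.20)      // two-tail p-value, about 0.002 as in the output above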

Code
clear
range x -5 5 1000
gen fx = normalden(x) 
local p1:display %5.1f 100*2*(1-normal(1.5)) 
local p2:display %5.1f 100*2*(1-normal(2.5)) 
two (area fx x , pstyle(p1) color(%20)) ///
    (area fx x if x<-1.5, pstyle(p3) color(%80) ) ///
    (area fx x if x>+1.5, pstyle(p3) color(%80) ) ///
    (area fx x if x<invnormal(.025), pstyle(p1) color(%80) ) ///
    (area fx x if x>invnormal(.975), pstyle(p1) color(%80) ) ///
    (area fx x if x<-2.5, pstyle(p2) color(%80) ) ///
    (area fx x if x>+2.5, pstyle(p2) color(%80) ) , ///
    xline(-1.5 2.5) legend(order(4 "{&alpha}=5%" 6 "p-value = `p2'%" 2 "p-value = `p1'%")) ///
    ylabel(0(.1).5) xlabel(-5 0 5 -1.5 2.5)
graph export images/f4_4.png, replace width(1200)

Note on Statistical Significance

  1. Statistically significant does not mean meaningful. And a lack of significance does not mean a variable is unimportant.
    • Keep in mind that standard errors may be larger or smaller because of other factors (sample size or multicollinearity).
  2. Be careful when discussing effect sizes; units matter (a US$1 increase in the minimum wage is different from a 1-peso increase).
  3. If a result is non-significant, pay attention to the magnitude and its relevance for your research. Does it have the correct sign?
  4. Incorrect signs with significant results: either there is something wrong, or you found something interesting.

Confidence Intervals

  • This is a third approach to assessing how precise or significant an estimate is: you provide a range of plausible values, given the coverage level and the SE.

\[CI(\beta_i) = [\hat \beta_i - \hat \sigma_{\beta_i} t_{n-k-1}(1-\alpha/2),\ \hat \beta_i +\hat \sigma_{\beta_i} t_{n-k-1}(1-\alpha/2)] \]

  • Interpretation:
    • If we were to draw many samples and build the interval each time, the true \(\beta\) would be inside the interval \((1-\alpha)\times 100\%\) of the time.
  • It also lets you see which other "hypotheses" would be consistent with the evidence (values inside the interval cannot be rejected as nulls).

CI vs T-critical and P values

  1. t-stat and p-values are calculated based on standardized coefficients (ratio of coefficient and SE)
  2. CI are calculated based on the actual coefficient and SE.

If the p-value of a t-statistic is exactly 0.05, then the 95% CI will not include 0 (at the limit), and the t-critical (\(\alpha=5\%\)) will be the same as the t-statistic.

In other words, if you use the same \(\alpha\), your conclusions will be the same whether you use the t-stat, the p-value, or the CI.
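
As a check, the CI formula above reproduces the interval that regress reports for skipped in the GPA example (137 residual df):

Code
frause gpa1, clear
qui reg colgpa hsgpa act skipped
display _b[skipped] - invt(137, 0.975)*_se[skipped]    // lower bound, about -0.135
display _b[skipped] + invt(137, 0.975)*_se[skipped]    // upper bound, about -0.032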

Code
clear
range x -5 5 1000
gen fx = normalden(x) 
local p1:display %5.1f 100*2*(1-normal(1.5)) 
local p2:display %5.1f 100*2*(1-normal(2.5)) 
gen xx = x+2

two (area fx x , pstyle(p1) color(%10)) ///
    (area fx x if x<invnormal(.025), pstyle(p1) color(%60) ) ///
    (area fx x if x>invnormal(.975), pstyle(p1) color(%60) ) ///
    (area fx xx , color(gs1%10) ) ///
    (area fx xx if x<invnormal(.005),  color(gs1%80) ) ///
    (area fx xx if x>invnormal(.995),  color(gs1%80) ) ///
    (area fx xx if x<invnormal(.025),  color(gs1%60) ) ///
    (area fx xx if x>invnormal(.975),  color(gs1%60) ) ///
    (area fx xx if x<invnormal(.05),  color(gs1%40) ) ///
    (area fx xx if x>invnormal(.95),  color(gs1%40) ), ///
    xline(2) legend(order(5 "CI-1%" 7 "CI-5%" 9 "CI-10%")) ///
    ylabel(0(.1).5) xlabel(-5 0 5 )  
graph export images/f4_5.png, replace width(1000)

Let's make things interesting (harder)

Testing Linear Combinations:

  • You may be interested in testing particular linear combinations of coefficients:

\(b_1 - b_2 =0 ; b_2+b_3=1 ; 2*b_4-b_5=b_6\)

  • Doing this is "simple": because it is a single linear combination, you can still use a t-statistic.

\(t-stat = \frac{2*\hat b_4 -\hat b_5 -\hat b_6}{se(2*\hat b_4 -\hat b_5 -\hat b_6)}\)

  • You just need the SE of the combined coefficient (which requires knowing the variances and covariances).

  • The easy way: use Stata:

reg y x1 x2 x3 x4 x5 x6
lincom x1 - x2
lincom 2*x4 - x5 - x6
test (x1-x2=0) (x2+x3=1) (2*x4-x5=x6), mtest

Harder Way: (if you dare)

Matrix Multiplication

Assume the constant is the last coefficient: \[V( 2 b_4-b_5- b_6) = R V_\beta R' ; \quad R = [0,0,0,2,-1,-1,0] \]

where \(R\) encodes the restriction, and \(V_\beta\) is the variance-covariance matrix of the \(\hat\beta's\).

Then your t-stat

\[t-stat = \frac{2*b_4-b_5- b_6}{\sqrt{V(2*b_4-b_5- b_6)}} \]
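
A sketch of this computation in Stata's matrix language (generic names y and x1-x6 as above; the final 0 in R corresponds to the constant):

Code
qui reg y x1 x2 x3 x4 x5 x6
matrix b = e(b)'                         // column vector of coefficients
matrix V = e(V)                          // variance-covariance matrix
matrix R = (0, 0, 0, 2, -1, -1, 0)       // restriction 2*b4 - b5 - b6
matrix num = R*b
matrix den = R*V*R'
display "t-stat = " num[1,1]/sqrt(den[1,1])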

Alternative: Substitution

  • One can manipulate the regression model to obtain a model in which the constrained combination appears as a single coefficient.
  • Once that model is estimated, testing becomes simple:

\[\begin{aligned} & y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + e \\ h0: & b_1 - 2b_2 +b_3=0 \rightarrow \theta = b_1 - 2b_2 +b_3 \rightarrow b_1 = \theta + 2b_2 - b_3 \\ & y = b_0 + ( \theta + 2b_2 - b_3) x_1 + b_2 x_2 + b_3 x_3 + e \\ & y = b_0 + \theta x_1 + b_2( x_2 +2 x_1) + b_3 (x_3-x_1) + e \\ & y = b_0 + \theta x_1 + b_2 \tilde x_2 + b_3 \tilde x_3 + e \\ \end{aligned} \]

Here, testing \(\theta=0\) is the same as testing \(b_1 - 2b_2 +b_3=0\) in the original model.
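
A minimal Stata sketch of this substitution, again with the generic names y, x1, x2, x3 (the new variable names are arbitrary):

Code
gen x2_t = x2 + 2*x1
gen x3_t = x3 - x1
reg y x1 x2_t x3_t
* the coefficient (and t-stat) on x1 is now theta, i.e., b1 - 2*b2 + b3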

Note

I always ask something like this on the midterm, so brush up on your math.

Testing Multiple Restrictions

What if you are interested in testing multiple restrictions:

\[\begin{aligned} y &= b_0 + b_1 x_1 + b_2 x_2 +b_3 x_3 + e \\ & H_0: b_1 = 0 ; b_2 - b_3 =0 \\ & H_1: H_0 \text{ is false} \end{aligned} \]

Easy way: Stata command test allows you to do this

Otherwise, you can do it by hand:

  1. Estimate unrestricted model (original) and “save” \(SSR_{ur}\) or \(R_{ur}^2\)
  2. Impose restrictions on the model and “save” \(SSR_r\) or \(R_{r}^2\)
  3. Estimate F-stat:

\[F_{q,n-k-1} = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)} \text{ or } \frac{(R^2_{ur}-R^2_r)/q}{(1-R^2_{ur})/(n-k-1)} \sim F(q,n-k-1) \]

\(SSR\) Sum of Squared Residuals, \(q\) number of restrictions

  • The idea: you are comparing how the overall fit of the model changes when the restrictions are imposed (a by-hand sketch follows below).
  • If the restrictions only slightly decrease the model's fit, you cannot reject them.
  • Otherwise, they are rejected! (You just don't know which restriction drives the rejection.)
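
As an illustration, here is the by-hand version of this F-test for the baseball-salary example that appears later in these slides (q = 3 exclusion restrictions); it should match the F reported by test below:

Code
frause mlb1, clear
qui reg lsalary years gamesyr bavg hrunsyr rbisyr    // unrestricted model
scalar ssr_ur = e(rss)
scalar dfres  = e(df_r)
qui reg lsalary years gamesyr                        // restricted model (q = 3)
scalar ssr_r = e(rss)
scalar F = ((ssr_r - ssr_ur)/3) / (ssr_ur/dfres)
display "F(3, " dfres ") = " F "   p-value = " Ftail(3, dfres, F)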

Overall Model Significance

One test we often don't look at anymore is the test of the overall significance of the model:

\[H_0: x_1, x_2, \dots , x_k \text{ do not explain y} \]

\[H_0: \beta_1=\beta_2=\dots=\beta_k =0 \]

Under this null, a model with only an intercept would fit just as well as the one with the covariates.

\[F_{q,n-k-1} = \frac{(R^2_{ur}-\color{red}{R^2_r})/q}{(1-R^2_{ur})/(n-k-1)} \sim F(q,n-k-1) \]

In this case \(\color{red}{R^2_r}=0\) and the number of restrictions is \(q=k\).
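
Plugging the numbers from the house-price example below into this formula recovers the overall F reported by regress (R-squared = 0.7728, q = k = 4, n = 88):

Code
display ((0.7728 - 0)/4) / ((1 - 0.7728)/(88 - 4 - 1))    // about 70.6, as in the output below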

Not for the faint of heart

Matrix form of the F-stat. Restrictions:

\[H_0: R_{q,k+1}\beta_{k+1,1}=c_{q,1} \]

First, define the variance matrix of the restrictions: \[ \Sigma_R = R_{q,k+1} V_\beta R'_{q,k+1} \]

Second: F-statistic

\[ F\text{-stat} = \frac 1 q (R\hat\beta-c)' \Sigma_R^{-1} (R\hat\beta-c) \]
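
A sketch of this computation in Stata's matrix language for the two restrictions used earlier (\(b_1=0\) and \(b_2-b_3=0\)), with the generic names y, x1, x2, x3:

Code
qui reg y x1 x2 x3
matrix b = e(b)'
matrix V = e(V)
matrix R = (1, 0, 0, 0 \ 0, 1, -1, 0)    // one row per restriction; last column is the constant
matrix c = (0 \ 0)
matrix d = R*b - c
matrix S = R*V*R'
matrix F = d'*invsym(S)*d/2              // q = 2
display "F(2, " e(df_r) ") = " F[1,1] "   p-value = " Ftail(2, e(df_r), F[1,1])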

Example

frause hprice1, clear
reg lprice lassess bdrms llotsize lsqrft

      Source |       SS           df       MS      Number of obs   =        88
-------------+----------------------------------   F(4, 83)        =     70.58
       Model |  6.19607473         4  1.54901868   Prob > F        =    0.0000
    Residual |  1.82152879        83   .02194613   R-squared       =    0.7728
-------------+----------------------------------   Adj R-squared   =    0.7619
       Total |  8.01760352        87  .092156362   Root MSE        =    .14814

------------------------------------------------------------------------------
      lprice | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     lassess |   1.043065    .151446     6.89   0.000     .7418453    1.344285
       bdrms |   .0338392   .0220983     1.53   0.129    -.0101135    .0777918
    llotsize |   .0074379   .0385615     0.19   0.848    -.0692593    .0841352
      lsqrft |  -.1032384   .1384305    -0.75   0.458     -.378571    .1720942
       _cons |    .263743   .5696647     0.46   0.645    -.8692972    1.396783
------------------------------------------------------------------------------
test (lassess=1)
test (lassess=1) (bdrms=llotsize=lsqrft=0)

 ( 1)  lassess = 1

       F(  1,    83) =    0.08
            Prob > F =    0.7768

 ( 1)  lassess = 1
 ( 2)  bdrms - llotsize = 0
 ( 3)  bdrms - lsqrft = 0
 ( 4)  bdrms = 0

       F(  4,    83) =    0.67
            Prob > F =    0.6162
frause mlb1, clear
reg lsalary years gamesyr bavg hrunsyr rbisyr

      Source |       SS           df       MS      Number of obs   =       353
-------------+----------------------------------   F(5, 347)       =    117.06
       Model |  308.989208         5  61.7978416   Prob > F        =    0.0000
    Residual |  183.186327       347  .527914487   R-squared       =    0.6278
-------------+----------------------------------   Adj R-squared   =    0.6224
       Total |  492.175535       352  1.39822595   Root MSE        =    .72658

------------------------------------------------------------------------------
     lsalary | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       years |   .0688626   .0121145     5.68   0.000     .0450355    .0926898
     gamesyr |   .0125521   .0026468     4.74   0.000     .0073464    .0177578
        bavg |   .0009786   .0011035     0.89   0.376    -.0011918     .003149
     hrunsyr |   .0144295    .016057     0.90   0.369    -.0171518    .0460107
      rbisyr |   .0107657    .007175     1.50   0.134    -.0033462    .0248776
       _cons |   11.19242   .2888229    38.75   0.000     10.62435    11.76048
------------------------------------------------------------------------------
test bavg hrunsyr rbisyr

 ( 1)  bavg = 0
 ( 2)  hrunsyr = 0
 ( 3)  rbisyr = 0

       F(  3,   347) =    9.55
            Prob > F =    0.0000

Let's take one more step: what if the errors are not normal?

There goes A6

Introduction

  • When considering the topic of asymptotic theory, there are a few concepts that are important to consider.

    1. Asymptotics refers to the properties of OLS as \(N\rightarrow \infty\).
    2. When samples grow, we care more about consistency than about "just" unbiasedness.
    3. We are also concerned with how dispensable the normality assumption is when samples grow large.

What is consistency?

  • Up until now, we have been concerned with Unbiased estimates

\[E(\hat\beta)=\beta \]

  • In large samples, this is no longer enough. One requires Consistency!
    • Consistency says that as \(N\rightarrow \infty\) then \(plim \hat \beta = \beta\).
    • Formally, \(\lim_{N\rightarrow\infty} P(|\hat \beta - \beta|<\varepsilon) = 1\): the sampling variance shrinks to zero, so we can estimate \(\beta\) almost surely.
    • This is closely related to asymptotic unbiasedness.
  • In linear regression analysis, consistency can be achieved under a weaker A4': \(Cov(e,x)=0\); that is, we only require the errors to be uncorrelated with (linearly independent of) the regressors.

Consistency vs Bias

What about Normality Assumption?

  • Everything we have seen so far was possible under the normality assumption of the errors.
    • if \(e\) is normal, then \(b\) is normal (even in small samples), thus we can use \(t\), \(F\), etc
  • But what if this assumption fails? would we care?
  • Perhaps. If your sample is small, \(b\) will not be normal, and standard procedures will not work.
  • In large samples, however, the \(\hat\beta's\) will be approximately normal even if \(e\) is not, thanks to the Central Limit Theorem (CLT).
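
A small simulation sketch of this result; the data-generating process, sample size, and number of replications are my own choices:

Code
clear
set seed 101
capture program drop simbeta
program define simbeta, rclass
    clear
    set obs 500
    gen x = rnormal()
    gen e = rchi2(1) - 1          // skewed, clearly non-normal error with mean 0
    gen y = 1 + 0.5*x + e
    qui reg y x
    return scalar b = _b[x]
end
simulate b = r(b), reps(1000) nodots: simbeta
summarize b                       // centered near 0.5
histogram b, normal               // sampling distribution looks approximately normal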

Good news

  • Bottom line, when \(N\) is large, you do not need \(e\) to be normal.

  • if A1-A5 hold, you can rely on asymptotic normality!

  • Thus you can still use t’s and F’s, but you can also use LM

LM-Lagrange Multiplier

  • While you can still use the t-stat and F-stat to draw inferences from your model, there is a better test (given the large sample): the Lagrange Multiplier (LM) statistic.
  • The idea: does imposing the restrictions affect the model's fit?
  1. Regress \(y\) on the restricted set \(x_1,\dots,x_{k-q}\), and obtain the residuals \(\tilde e\)
  2. Regress \(\tilde e\) on all \(x's\), and obtain \(R^2_e\).

If the excluded regressors truly do not matter, \(R^2_e\) should be very small.

  3. Compare \(nR^2_e\) with \(\chi^2_q(1-\alpha)\), and draw conclusions.

Example:

frause crime1, clear
qui: reg narr86 pcnv avgsen tottime ptime86 qemp86
** H0: avgsen=0 and tottime=0
test (avgsen=0) (tottime=0)

 ( 1)  avgsen = 0
 ( 2)  tottime = 0

       F(  2,  2719) =    2.03
            Prob > F =    0.1310
qui: reg narr86 pcnv                ptime86 qemp86
* Predict residuals of constrained model
predict u_tilde , res
* regress residuals against all variables
reg u_tilde  pcnv avgsen tottime ptime86 qemp86

      Source |       SS           df       MS      Number of obs   =     2,725
-------------+----------------------------------   F(5, 2719)      =      0.81
       Model |  2.87904835         5  .575809669   Prob > F        =    0.5398
    Residual |  1924.39392     2,719  .707757969   R-squared       =    0.0015
-------------+----------------------------------   Adj R-squared   =   -0.0003
       Total |  1927.27297     2,724  .707515773   Root MSE        =    .84128

------------------------------------------------------------------------------
     u_tilde | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        pcnv |  -.0012971    .040855    -0.03   0.975    -.0814072    .0788129
      avgsen |  -.0070487   .0124122    -0.57   0.570     -.031387    .0172897
     tottime |   .0120953   .0095768     1.26   0.207    -.0066833     .030874
     ptime86 |  -.0048386   .0089166    -0.54   0.587    -.0223226    .0126454
      qemp86 |   .0010221   .0103972     0.10   0.922    -.0193652    .0214093
       _cons |  -.0057108   .0331524    -0.17   0.863    -.0707173    .0592956
------------------------------------------------------------------------------
Code
display "Chi2(2)=" `=e(N)*e(r2)'
display "Its p-value=" %5.4f `=1-chi2(2, `=e(N)*e(r2)' )'

Chi2(2)=4.0707294 Its p-value=0.1306

Try making it a program if you “dare”
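
Following that invitation, here is one way such a program could look. This is only a sketch: the name lmtest, its interface, and the q() option are my own choices.

Code
capture program drop lmtest
program define lmtest, rclass
    * first variable: residuals from the restricted model; the rest: the full regressor list
    syntax varlist(min=2), q(integer 2)
    gettoken resid xvars : varlist
    qui reg `resid' `xvars'
    local lm = e(N)*e(r2)
    local p  = chi2tail(`q', `lm')
    return scalar lm = `lm'
    return scalar p  = `p'
    display "LM(" `q' ") = " %8.4f `lm' "   p-value = " %5.4f `p'
end
* usage with the residuals and regressors from the example above:
lmtest u_tilde pcnv avgsen tottime ptime86 qemp86, q(2)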

That's all, folks!

Part II:

Addressing Problems with MRA