Research Methods II

Session 7: Micro-Simulations and Monte Carlo Methods

Fernando Rios-Avila

Simulation Methods in Economics

What is Monte Carlo Simulation?

  • Monte Carlo simulation is a generic name for methods that use random numbers to simulate a process.

  • In econometrics, Monte Carlo methods are used to study the properties of estimators, and to evaluate the performance of statistical tests.

  • This can be a useful tool to understand some of the properties of estimators, or even problems related to violations of assumptions.

  • It can also be used to evaluate the performance of estimators in finite samples, and to compare different estimators. (A generic recipe is sketched below.)
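
As a point of reference, here is a minimal sketch of the generic Monte Carlo recipe using a postfile loop (an alternative to the simulate command used below). The sample size, number of replications, and statistic are arbitrary choices for illustration:

* Minimal Monte Carlo skeleton with -postfile- (illustrative values only)
capture postclose sim
tempfile results
postfile sim xbar using `results', replace
forvalues r = 1/1000 {
    quietly {
        clear
        set obs 50
        gen x = rnormal(0,1)            // 1. simulate data from a known process
        summarize x                     // 2. compute the statistic of interest
        post sim (r(mean))              // 3. store it
    }
}
postclose sim
use `results', clear
summarize xbar                          // 4. study its sampling distribution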

Example: Mean vs Median

  • Which of these estimators is more robust and efficient when samples are small?
  • Let's set up a program to simulate this:
// define a program
capture program drop mean_vs_median
program define mean_vs_median, eclass
  syntax, [nobs(int 100)]
  clear
  ** Set  # of obs
  set obs `nobs'
  ** Generate a random variable
  gen x = rnormal(0,1)
  ** Calculate mean and median
  qui:sum x,d
  ** Store results
  matrix b = r(mean), r(p50)
  ** post results
  matrix colname b = "mean" "median"
  ereturn post b
end
mean_vs_median
ereturn display

Number of observations (_N) was 0, now 100.
------------------------------------------------------------------------------
             | Coefficient
-------------+----------------------------------------------------------------
        mean |  -.1445693
      median |  -.1970271
------------------------------------------------------------------------------

Now that the program is set up, let's run it 1,000 times:

set seed 101
simulate, reps(1000): mean_vs_median, nobs(500)
sum

      Command: mean_vs_median, nobs(500)

Simulations (1,000): ..................................................... 1,000 done

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     _b_mean |      1,000     .001419     .043875  -.1660358   .1525889
   _b_median |      1,000    .0000899    .0544004  -.1851489   .1853068

Conclusion: when N=500 and the distribution is normal, the mean is more efficient than the median (smaller standard deviation across replications).

Changing assumptions: from the normal to the t-distribution

// define a program
capture program drop mean_vs_median
program define mean_vs_median, eclass
  syntax, [nobs(int 100) rt(int 5)]
  clear
  set obs `nobs'
  gen x = rt(`rt')
  qui:sum x,d
  matrix b = r(mean), r(p50)
  matrix colname b = "mean" "median"
  ereturn post b
end
 set seed 101
simulate, reps(1000) nodots: mean_vs_median, nobs(500) rt(2)
sum
simulate, reps(1000) nodots: mean_vs_median, nobs(500) rt(4)
sum
simulate, reps(1000) nodots: mean_vs_median, nobs(500) rt(6)
sum


      Command: mean_vs_median, nobs(500) rt(2)


    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     _b_mean |      1,000   -.0199475    .2203307  -4.369244   .6070946
   _b_median |      1,000    .0002406    .0640181  -.1970648   .1889922

      Command: mean_vs_median, nobs(500) rt(4)


    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     _b_mean |      1,000    .0007863    .0647392  -.2136446   .2056455
   _b_median |      1,000    .0024002    .0602134  -.1902502   .2014696

      Command: mean_vs_median, nobs(500) rt(6)


    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     _b_mean |      1,000    .0007195    .0531958  -.1984012   .2054568
   _b_median |      1,000    .0008841    .0580835  -.1863388   .2035508

Conclusion: with heavy tails (t with 2 df) the median is far more efficient than the mean; with 4 df the median is still slightly better, and by 6 df the mean regains a small advantage.
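
Since simulate leaves one observation per replication in memory, we can also summarize each estimator by its bias and RMSE (the true value here is 0). A minimal sketch:

* Bias and RMSE of each estimator across replications (true value = 0)
foreach v in _b_mean _b_median {
    quietly summarize `v'
    display as text "`v': bias = " as result %8.5f r(mean) ///
        as text "   RMSE = " as result %8.5f sqrt(r(mean)^2 + r(Var))
}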

Properties of estimators

  • Monte Carlo methods can also be used to study the properties of estimators.
  • Consider the following example:
    • We want to study the properties of the OLS estimator when the error term is heteroskedastic.

\[y_i = \beta_0 + \beta_1 x_i + u_i \exp(\gamma x_i)\]

  • What are the consequences of heteroskedasticity in the OLS estimator?

  • Let's set up a simulation to study this.

// define a program
capture program drop ols_hetero
program define ols_hetero, eclass
  syntax, [nobs(int 100) b0(real 1) b1(real 1) gamma(real 1)]
  clear
  set obs `nobs'
  gen x = rnormal(0,1)
  gen u = rnormal(0,1)
  gen y = `b0' + `b1' * x + u*exp(`gamma'*x)
  // run regression (under homoskedasticity)
  qui:reg y x
  // store results
  matrix b = _b[_cons], _se[_cons], _b[x], _se[x]
  matrix colname b = "b0" "se0" "b1" "se1"
  ereturn post b
end
simulate, reps(1000) nodots: ols_hetero, nobs(500) b0(1) b1(1) gamma(1)
sum 


      Command: ols_hetero, nobs(500) b0(1) b1(1) gamma(1)


    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       _b_b0 |      1,000    .9979773    .1210228   .6013685   1.434932
      _b_se0 |      1,000    .1196379    .0220658   .0737395   .2717805
       _b_b1 |      1,000    .9864205    .2661498   .0342865     1.8922
      _b_se1 |      1,000    .1195836    .0210792   .0808713   .2756631

The OLS coefficients remain (roughly) unbiased, but the conventional standard error of the slope (about 0.12 on average) badly understates its actual sampling variability (about 0.27), so inference based on it will over-reject.
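
A quick way to quantify this, assuming the simulate results are still in memory, is to compute the empirical coverage of a nominal 95% confidence interval for \(\beta_1\) (true value 1). A sketch:

* Empirical coverage of the nominal 95% CI for b1 (true value = 1)
gen byte covered = abs(_b_b1 - 1) <= invnormal(0.975) * _b_se1
summarize covered

Coverage well below 0.95 confirms the size distortion.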

Correcting for heteroskedasticity

We can correct for heteroskedasticity using robust standard errors.

// define a program
capture program drop ols_hetero
program define ols_hetero, eclass
  syntax, [nobs(int 100) b0(real 1) b1(real 1) gamma(real 1)]
  clear
  set obs `nobs'
  gen x = rnormal(0,1)
  gen u = rnormal(0,1)
  gen y = `b0' + `b1' * x + u*exp(`gamma'*x)
  // run regression with robust standard errors
  qui:reg y x, robust
  // store results
  matrix b = _b[_cons], _se[_cons], _b[x], _se[x]
  matrix colname b = "b0" "se0" "b1" "se1"
  ereturn post b
end
simulate, reps(1000) nodots: ols_hetero, nobs(500) b0(1) b1(1) gamma(1)
sum 


      Command: ols_hetero, nobs(500) b0(1) b1(1) gamma(1)


    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       _b_b0 |      1,000     1.00342    .1180407   .6255065     1.3921
      _b_se0 |      1,000    .1171892    .0219696   .0807092   .3564852
       _b_b1 |      1,000     1.00355    .2589393  -.2493259   1.956252
      _b_se1 |      1,000    .2407364    .0913925   .1092693   1.174903

With robust standard errors, the reported standard error of the slope (about 0.24 on average) now tracks its actual sampling variability (about 0.26). Alternatively, we can correct for heteroskedasticity using (feasible) weighted least squares:

// define a program
capture program drop ols_hetero
program define ols_hetero, eclass
  syntax, [nobs(int 100) b0(real 1) b1(real 1) gamma(real 1)]
  clear
  set obs `nobs'
  gen x = rnormal(0,1)
  gen u = rnormal(0,1)
  gen y = `b0' + `b1' * x + u*exp(`gamma'*x)
  // step 1: OLS to obtain residuals
  qui:reg y x
  predict uhat, resid
  // step 2: model the log of the squared residuals as a function of x
  gen lnuhat2 = ln(uhat^2)
  qui:reg lnuhat2 x
  predict lnhx
  gen hx=exp(lnhx)
  // step 3: weighted (FGLS) regression, weighting by the inverse of the fitted variance
  qui:reg y x [w=1/hx]
  // store results
  matrix b = _b[_cons], _se[_cons], _b[x], _se[x]
  matrix colname b = "b0" "se0" "b1" "se1"
  ereturn post b
end
simulate, reps(1000) nodots: ols_hetero, nobs(500) b0(1) b1(1) gamma(1)
sum 


      Command: ols_hetero, nobs(500) b0(1) b1(1) gamma(1)


    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       _b_b0 |      1,000    1.001033    .0465683   .8545606   1.252005
      _b_se0 |      1,000    .0478398    .0084316   .0288236   .1462429
       _b_b1 |      1,000    1.001063    .0277128   .8875045   1.301835
      _b_se1 |      1,000    .0270911    .0091106   .0088032   .1357872

More on Monte Carlo Simulations

  • Most often, Monte Carlo simulations are used to study the properties of new estimators.

  • The structure of the simulation, however, will depend on the estimator you want to study, and may not be fully generalizable.

    • Notice that in the previous example, we assumed that all data needed to be simulated.
    • But, we could just as well simulate only “parts” of the data, and use observed data for the rest.
frause oaxaca, clear
probit lfp female educ age agesq married divorced
predict lfp_xb, xb
matrix b=e(b)
(Excerpt from the Swiss Labor Market Survey 1998)

Iteration 0:  Log likelihood = -634.26553  
Iteration 1:  Log likelihood = -453.80541  
Iteration 2:  Log likelihood = -429.25759  
Iteration 3:  Log likelihood = -428.40991  
Iteration 4:  Log likelihood = -428.40986  
Iteration 5:  Log likelihood = -428.40986  

Probit regression                                       Number of obs =  1,647
                                                        LR chi2(6)    = 411.71
                                                        Prob > chi2   = 0.0000
Log likelihood = -428.40986                             Pseudo R2     = 0.3246

------------------------------------------------------------------------------
         lfp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      female |  -1.644408   .1498206   -10.98   0.000    -1.938051   -1.350765
        educ |   .1051167   .0251728     4.18   0.000     .0557789    .1544544
         age |   .1045008   .0385916     2.71   0.007     .0288627    .1801389
       agesq |  -.0012596   .0004463    -2.82   0.005    -.0021344   -.0003848
     married |  -1.692076   .1887409    -8.97   0.000    -2.062001    -1.32215
    divorced |    -.68518   .2319883    -2.95   0.003    -1.139869   -.2304912
       _cons |   .4413072   .7648821     0.58   0.564    -1.057834    1.940448
------------------------------------------------------------------------------
* Latent model LFP =1(lfp_xb + e>0)
capture program drop probit_sim
program  probit_sim, eclass
  capture drop lfp_hat
  gen lfp_hat=(lfp_xb + rnormal(0,1))>0
  probit lfp_hat female educ age agesq married divorced, from(b, copy)
end
simulate _b _se, reps(500) nodots: probit_sim
ren lfp_hat* *
sum ,sep(7)


      Command: probit_sim


    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   _b_female |        500    -1.67208    .1594671  -2.262682  -1.280128
     _b_educ |        500     .108218    .0251913   .0453378   .1974491
      _b_age |        500    .1022977     .038849   -.004108   .2443264
    _b_agesq |        500   -.0012359    .0004483  -.0029406  -.0000835
  _b_married |        500   -1.720405    .1903062  -2.425838  -1.247641
 _b_divorced |        500   -.7046396    .2310935  -1.387282   .1018858
     _b_cons |        500    .5167748    .7899037  -1.920398   2.650355
-------------+---------------------------------------------------------
  _se_female |        500    .1548574    .0187049   .1234346   .2547092
    _se_educ |        500    .0255194    .0012991   .0223761    .029805
     _se_age |        500    .0389973    .0017384   .0336116   .0466778
   _se_agesq |        500    .0004507    .0000184   .0003941   .0005336
 _se_married |        500    .1987176    .0234131   .1538517   .3142596
_se_divorced |        500    .2410041    .0207673    .199707   .3420741
    _se_cons |        500    .7709869    .0397739   .6426999   .9831291
  • To some extent, this is similar to the imputation methods we have seen before.
  • More complex versions of this can be used to elaborate micro-simulations.

What are micro-simulations?

  • Micro-simulation is a technique in which micro units act and interact in a way that can be aggregated to the level of interest.

  • A micro simulation model can be seen as a set of rules, which operates on a sample of micro units (individuals, households, firms, etc.) to produce a set of outcomes.

  • The goal is to produce synthetic datasets that can be used to estimate the effects of policy changes.

  • Because micro-simulations are based on micro-data, they have the potential to capture the heterogeneity of the population. (decisions, preferences, etc.)

  • We can also think about micro-simulations as a way to simulate/predict/impute the behavior of a population of interest.
    • Thus a lot of what you learned in terms of Modeling, imputations, and matching can be applied here.

So what do we need?

  • We need a population of micro units (individuals, households, firms, etc.) that is representative of the population of interest. (survey data)

  • Details on a policy change that we want to simulate. (policy parameters)

  • A set of rules that describe how the micro units interact with each other and with the policy change. (model for behavior)

  • An outcome of interest that we want to study. (outcome variable)

How do we do it?

  • Depending on the type of analysis we want to do, the structure of a micro-simulation can be very simple or very complex.

  • Consider the following example:

    • We want to study the effect of a policy that aims to increase the minimum wage in the labor market. What effects would this have?
      • Higher wages ?
      • increase/decrease in employment?
      • Changes in the distribution of wages?
      • Changes in the Economic structure?
  • More complex models require more sophisticated interactions between the micro and macro units and the policy change.

  • However, simpler models can be useful, at least to study the first-order (statistical) effects of a policy change.

Example: The case of higher education

  • Consider the following. The government wants to increase educational attainment in the population.

  • To do so, they want to evaluate the impact that 2 additional years of schooling for people with less than 12 years of education would have on the population.

  • How do we do this?

  • For simplicity, let's use the oaxaca dataset.

We can start by modeling the effect of education on wages.

\[\log(wage) = \beta X + \beta_e educ + u\]

frause oaxaca, clear
reg lnwage educ exper tenure female age, 
predict  res, res
(Excerpt from the Swiss Labor Market Survey 1998)

      Source |       SS           df       MS      Number of obs   =     1,434
-------------+----------------------------------   F(5, 1428)      =    101.03
       Model |   105.60186         5   21.120372   Prob > F        =    0.0000
    Residual |  298.517944     1,428  .209046179   R-squared       =    0.2613
-------------+----------------------------------   Adj R-squared   =    0.2587
       Total |  404.119804     1,433  .282009633   Root MSE        =    .45722

------------------------------------------------------------------------------
      lnwage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        educ |   .0743986   .0052612    14.14   0.000     .0640782    .0847191
       exper |   .0022628   .0018958     1.19   0.233    -.0014562    .0059817
      tenure |   .0021805   .0019783     1.10   0.271    -.0017001    .0060611
      female |  -.1321951   .0254327    -5.20   0.000    -.1820846   -.0823056
         age |   .0136344   .0017751     7.68   0.000     .0101523    .0171165
       _cons |   1.985785   .0732567    27.11   0.000     1.842083    2.129488
------------------------------------------------------------------------------
(213 missing values generated)

ID the policy change

clonevar educ2=educ
replace educ=educ+2 if educ<12
predict yhat2, xb
(1,099 real changes made)
(213 missing values generated)

So what is the effect on wages? (if there is no selection bias)

gen wage = exp(lnwage)
gen wage2 = exp(yhat2+res)
gen wage_diff = wage2-wage
tabstat wage_diff
replace educ = educ2
(213 missing values generated)
(213 missing values generated)
(213 missing values generated)

    Variable |      Mean
-------------+----------
   wage_diff |  2.947924
------------------------
(1,099 real changes made)

On average, wages increase by 2.95.

But is this the only effect??

  • What about the effect on employment?

  • This is a more complex model, and we need further assumptions

    • Assume anyone who wants to work, will find a job.
    • Those employed remain employed
    • Those not employed may transition into employment at the margin

\[P(lfp=1|X, educ) = \Phi(\beta X + \beta_e educ)\]

First model the probability of employment

frause oaxaca, clear
qui:probit lfp educ female age single married kids6 kids714
predict lfp_xb, xb
predict pr_org, pr

replace lfp_xb = lfp_xb + _b[educ] * 2 if educ<12
gen plfp = normal(lfp_xb)

** Before and after the policy change
sum plfp pr_org 
(Excerpt from the Swiss Labor Market Survey 1998)
(1,099 real changes made)

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        plfp |      1,647    .8995986    .1614962   .0777393   .9999999
      pr_org |      1,647    .8707874    .1925782   .0604887   .9999999

So now we have the original probability of employment and the probability of employment after the policy change.

How do we know who will transition from not working to working?

  • Option 1. Assign new workers based on the relative change in the probability of employment.
gen dprob = (plfp-pr_org)/pr_org
clonevar lfp_post1 = lfp
replace lfp_post1 =1 if lfp==0 & dprob>runiform()
sum  lfp_post1 lfp
(35 real changes made)

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   lfp_post1 |      1,647    .8919247    .3105698          0          1
         lfp |      1,647     .870674    .3356624          0          1
  • Option 2. Simulate employment status based on the original and post-policy probability of employment.
** Option 2
drop2 unf lfp_org lfp_post
gen unf = runiform()
gen lfp_org  = pr_org>unf
gen lfp_post = plfp  >unf

sum lfp_post  lfp_org
variable unf not found
variable lfp_org not found
variable lfp_post not found

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
    lfp_post |      1,647    .9046752    .2937523          0          1
     lfp_org |      1,647    .8737098    .3322771          0          1
  • Both options are valid, but
    • The first requires imputing wages only for the new workers.
    • The second requires imputing wages for the entire population.

Imputing wages:

  • We could assume no selection bias.
  • We may need to use only variables available for everyone, or impute the missing ones (exper, tenure), perhaps setting them to 0.
reg lnwage educ female age single married kids6 kids714, 
predict  res, res
predict lnwage_hat, xb
** Simulating Known wages component
replace lnwage_hat = lnwage_hat + _b[educ] * 2 if educ<12
** Simulating random component
** For those already working:
replace lnwage_hat = lnwage_hat + res if lfp==1
** Simulate unobserved
qui: sum res, 
replace lnwage_hat = lnwage_hat + rnormal(0,r(sd)) if lfp_post1==1 & lfp==0

gen wage_post = exp(lnwage_hat) if lfp_post1==1 | lfp==1
gen wage = exp(lnwage)
sum wage wage_post
sgini wage wage_post

      Source |       SS           df       MS      Number of obs   =     1,434
-------------+----------------------------------   F(7, 1426)      =     79.23
       Model |  113.158257         7  16.1654653   Prob > F        =    0.0000
    Residual |  290.961547     1,426  .204040355   R-squared       =    0.2800
-------------+----------------------------------   Adj R-squared   =    0.2765
       Total |  404.119804     1,433  .282009633   Root MSE        =    .45171

------------------------------------------------------------------------------
      lnwage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        educ |   .0709179   .0050262    14.11   0.000     .0610584    .0807774
      female |  -.1427501   .0244736    -5.83   0.000    -.1907583   -.0947419
         age |    .016475   .0014033    11.74   0.000     .0137222    .0192277
      single |  -.0711724   .0443651    -1.60   0.109    -.1582002    .0158554
     married |  -.0977016   .0379654    -2.57   0.010    -.1721755   -.0232276
       kids6 |   .1085073   .0239699     4.53   0.000     .0614873    .1555272
     kids714 |   .0656187    .019681     3.33   0.001      .027012    .1042254
       _cons |   1.999222   .0902563    22.15   0.000     1.822173    2.176272
------------------------------------------------------------------------------
(213 missing values generated)
(1,099 real changes made)
(1,434 real changes made)
(35 real changes made)
(178 missing values generated)
(213 missing values generated)

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        wage |      1,434    32.39167    16.12498   1.661434   192.3077
   wage_post |      1,469    35.19396    16.94609   1.873127   221.6129

Gini coefficient for wage, wage_post

-----------------------
    Variable |      v=2
-------------+---------
        wage |   0.2460
   wage_post |   0.2356
-----------------------

This is equivalent to running a single imputation round. However, we could repeat the process M times to get a measure of the precision of our estimates (see the sketch below).
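
A minimal sketch of how that repetition could be organized: wrap the steps above in an rclass program (here a hypothetical educ_sim that returns the mean post-policy wage) and run it through simulate. The returned statistic, covariate lists, and number of replications are illustrative choices, not the only way to do it.

* Sketch: repeat the policy simulation M times to gauge its Monte Carlo variability
capture program drop educ_sim
program define educ_sim, rclass
    frause oaxaca, clear
    // employment transition (Option 1 above)
    qui probit lfp educ female age single married kids6 kids714
    qui predict double xb, xb
    qui gen double pr_org = normal(xb)
    qui gen double pr_new = normal(xb + _b[educ]*2*(educ<12))
    qui gen byte lfp_post = lfp
    qui replace lfp_post = 1 if lfp==0 & (pr_new-pr_org)/pr_org > runiform()
    // wages with 2 extra years of education for those with educ<12
    qui reg lnwage educ female age single married kids6 kids714
    qui predict double res, res
    qui predict double lnw_hat, xb
    qui replace lnw_hat = lnw_hat + _b[educ]*2 if educ<12
    qui sum res
    qui replace lnw_hat = lnw_hat + cond(lfp==1, res, rnormal(0, r(sd)))
    qui gen double wage_post = exp(lnw_hat) if lfp_post==1
    qui sum wage_post
    return scalar mwage_post = r(mean)
end
set seed 101
simulate mwage_post = r(mwage_post), reps(100) nodots: educ_sim
sum mwage_post

The spread of mwage_post across replications gives a rough measure of the uncertainty coming from the stochastic assignment and imputation steps.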

What else can we do?

  • We could go further and also simulate where would people work, and how would the economy change.
  • We could also account for selection problems
  • or make a more explicit model for distributional analysis.

1st Micro-simulation

A Review

  • We have just used a simple micro-simulation to study the effect of a policy change (education) on wages and employment.

  • This simulation had many assumptions, and we could have done it in many different ways.

  • Among other things, we assumed no selection bias, an instantaneous change in education, and no aging of the population.

  • We also made important assumptions regarding the transition from non-employment to employment, and the imputation of wages.

Not the only way to do it

  • While many micro-simulations are based on stochastic simulations, there are other ways to do it.

  • In Hotchkiss et al. (forthcoming), we use a deterministic micro-simulation to study the effect of tax reforms on welfare changes and their distribution.

  • How did we do it?

Tax Reform and Household Welfare

  • We concentrated mostly on couple households with young children (0–18 years old).
  • Using a Heckman selection model, we impute wages for non-working individuals (based on PMM).
  • Using observed and imputed wages, we impute the tax liability for each household before and after the reform (via TAXSIM), and estimate after-tax wages.
  • We estimate household labor supply based on household utility and a non-linear Tobit model.
  • Using the household labor supply models, we predict labor supply and utility changes given the reform.
    • The outcome is how much better/worse off households would be after the reform.

Levy Institute Microsimulation Model (LIMM)

Intro

  • At Levy, we have also constructed a micro-simulation model to run employment simulations.
  • Method first developed for estimating the impact of the American Recovery and Reinvestment Act of 2009
    • Convert spending into jobs by industry and occupation (I/O matrix)
    • Assign potential workers to jobs
    • Predict earnings and hours
  • For the work with LIMEW and LIMTIP, this has also been used for distributional analysis of employment assignment and service use.

LIMM: Step I

Job Creation

  • Consider a policy: Road construction, Services provision, etc.
    • Calculate changes in final demand for each industry the policy creates
    • Using I-O tables estimate change in total output for each industry
    • Use that change in output to estimate change demand for labor inputs
      • Transform the changes in labor inputs into jobs generated (taking wages into account)
    • Distribute changes across occupations (within industry)
      • Using, for example, shares of employment by occupation within industry
  • With this, we have the total number of jobs created by industry and occupation (a toy numeric sketch follows below).
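
A toy sketch of the input-output step, with made-up numbers for a three-industry economy (the coefficient matrix, demand shock, and jobs-per-output figures are purely illustrative):

* Toy I-O example (all numbers hypothetical)
matrix A  = (0.10, 0.05, 0.02 \ 0.15, 0.20, 0.10 \ 0.05, 0.10, 0.25)  // input coefficients
matrix dF = (100 \ 0 \ 50)              // change in final demand by industry
matrix L  = inv(I(3) - A)               // Leontief inverse
matrix dX = L * dF                      // change in total output by industry
matrix e  = (8 \ 12 \ 5)                // jobs per unit of output (assumed)
matrix dJobs = diag(e) * dX             // jobs created by industry
matrix list dJobs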

LIMM: Step II

Job Assignment

  • Given the Total change in Jobs, we need to assign workers to those jobs.
    • Who are the potential workers?
      • Not in the labor force: potential workers, but not looking for work (may depend on characteristics); exclude the retired, disabled, and students
      • Unemployed: Most likely to take a job (pool may not be large enough)
      • Underemployed: Working part-time, but willing to work full-time, or people aiming for better jobs. (May create job openings)
      • and Employed: May be willing to change jobs (may create job openings)

Job Assignment II

  • Two steps: modeling job creation and job assignment

  • A probit/logit model is estimated to predict the likelihood of working, and this is used to assign jobs.

    • Job assignment is done using a multinomial model (mprobit) to predict the likelihood of working in a given occupation and industry.

    • Ideally, you would like to do both at the same time, but that is a very complex model to estimate. Instead, each one is estimated separately.

\[\begin{aligned} I^*(ind = 1 ) &= \beta_1 X + e_1 \\ I^*(ind = 2 ) &= \beta_2 X + e_2 \\ &\vdots \\ I^*(ind = k ) &= \beta_k X + e_k \end{aligned} \]

where \(I^*(ind = k)\) is the latent likelihood of working in industry \(k\) given characteristics. Whether to use mprobit or mlogit depends on the assumed distribution of \(e_k\).

You choose industry \(k\) if \(I^*(ind = k)\) is the highest among all industries (a toy sketch of this rule follows below).
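
A toy sketch of the prediction-and-assignment rule, using simulated data and hypothetical variable names (in practice the model would be fitted on observed workers and predictions made for potential workers):

* Toy example: assign each unit the industry with the highest predicted probability
clear
set seed 7
set obs 500
gen educ = ceil(runiform()*15)
gen age  = 20 + int(runiform()*40)
gen ind  = 1 + int(runiform()*3)                  // observed industry (1-3), toy data
qui mprobit ind educ age
predict double p1 p2 p3, pr                       // predicted probability of each industry
egen double pmax = rowmax(p1 p2 p3)
gen ind_assigned = cond(p1==pmax, 1, cond(p2==pmax, 2, 3))
tab ind_assigned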

Job Assignment III

  • Job assignment is done as follows:

    • For each potential worker, calculate the probability of working. Those with the highest probabilities are assigned jobs first.

    • Worker \(i\) is assigned to the industry with the highest likelihood of working (until all jobs are assigned).

      • This can be followed by a similar assignment for occupation within industry.
      • Or industry and occupation can be combined (\(p_o \times p_i\)).

Note

  • Job allocation must consider the jobs available and the likelihood of working in a given occupation/industry.
  • You may want to avoid purely deterministic assignment.
  • Jobs could instead be assigned based on random draws from the distribution of likelihoods (see the sketch below).
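
A toy sketch of both assignment rules on simulated data (the predicted probability p_work and the number of openings are made up for illustration):

* Toy example: capacity-constrained job assignment
clear
set seed 202
set obs 1000
gen double p_work = runiform()              // hypothetical predicted prob. of taking a job
local njobs = 150                           // hypothetical number of new jobs
* (a) deterministic: highest predicted probabilities are assigned first
gsort -p_work
gen byte job_det = _n <= `njobs'
* (b) stochastic: accept with probability p_work, then fill openings in order
gen byte accepted = runiform() < p_work
gen byte job_rnd  = accepted & sum(accepted) <= `njobs'
count if job_det
count if job_rnd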

Earning contributions

  • Because some of the newly created jobs go to people who are employed in Family businesses (Farm and non farm income), one has to account for the “loss” of income from those jobs.

  • This is done by estimating the contribution of each member to the family business, and imputing the loss of income from the now “formally employed” member.

    • For example, modeling farm income, predicting that income without the family member, and imputing the difference as the loss of income.

Job Assignment IV: Wages and Hours

  • Hours and wages are imputed using a Heckman model and a multi-stage process (a minimal two-step sketch follows the list):
  1. Use a probit model to predict the likelihood of working.
  2. Given the model obtain the inverse mills ratios. \(imr = \phi(\alpha'x)/\Phi(\alpha'x)\)
  3. Model wages and hours using the IMR as a regressor (Heckit model); this includes information on imputed occupations/industries.
  4. Use Imputed wages and hours to Match data to “donor” individuals. (PMM)
  5. For people “potentially” leaving a family business, compare new wages to family business contributions.
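
As a minimal sketch of steps 1-3 on the oaxaca data used earlier (the covariate lists are illustrative, and step 4's PMM matching is omitted):

* Two-step Heckit sketch: selection probit, IMR, then the outcome equation
frause oaxaca, clear
qui probit lfp educ female age married kids6          // 1. likelihood of working
predict double s_xb, xb
gen double imr = normalden(s_xb)/normal(s_xb)         // 2. inverse Mills ratio
qui reg lnwage educ exper tenure female age imr       // 3. wage equation with IMR

Stata's heckman command (with the twostep option) implements the same two-step idea directly.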

Hours of HP Re-assignment

  • People taking jobs may have to change their contributions to household production. (less time for HP)
  • But the rest of the family may also have to adjust their contributions to HP. (more time for HP?)
    • This is done via matching, where the “donors” are working individuals with similar family characteristics.
    • Or all working individuals.
    • Currently, “recipients” are all individuals in households with at least one new worker.
      • It may be more sensible to impute hours.
      • One may argue that those not working may have to increase their hours of HP, or do nothing.

Other considerations

  • Assignment of child care services

Assessing the quality of the simulation

  • In principle, there is no way to know if the simulation is correct.
    • There is no “true” value to compare to.
    • However, one may want to at least ensure the simulation replicates the data.
    • One can potentially use the simulation to predict the changes from a past policy, and compare them to the actual changes.
  • We do want to make “sanity” checks
    • Are results consistent and plausible?
    • Are distributions of post-assignment outcomes consistent with the data?

Limitations

  • Since we use existing data, we make the implicit assumption of “no behavioral changes” in other respects
    • Things change as far as we can model them.
  • Results are typically point estimates, and do not account for uncertainty.
    • We can use multiple imputation, Monte Carlo simulations, or bootstrapping to account for uncertainty.
  • The results are only as good as the data we use and the assumptions we make.
    • We can use sensitivity analysis to test the robustness of the results to changes in assumptions.

That's All, Folks!