Research Methods II

Session 3: Measuring Inequality

Fernando Rios-Avila

Measuring Inequality

What is Inequality?

  • Economic inequality refers to how economic variables are distributed among individuals in a group, among groups in a population, or among countries.
  • Inequality of What?
    • inequality of opportunities, for example access to employment or education
    • inequality of outcomes, for example material dimensions of human well-being, such as the level of income, educational attainment, health status and so on.
  • For now we will focus on income inequality.

How do you analyze (measure) inequality?

  • Several approaches have been used to analyze inequality:

    • Intuitive approach
      • Unaxiomatic approach used to describe inequality.
    • Normative approach-Social welfare
      • Uses explicit concepts of welfare functions to quantify inequality
    • Information theory
      • Quantifies inequality treating it as a problem of comparing income distribution probabilities.
    • Axiomatic approach
      • Uses a series of axioms to create measures of inequality

Preliminaries

  • Regardless of the approach, there are some basic steps required to measure inequality
    • Define the population of interest
    • Define the measure of interest
    • Adjust for prices (if necessary)
    • Adjust for individual heterogeneity (needs) (if necessary)

Mathematical Preliminaries

  • Let \(y_i\) be the income of individual \(i\) in the population. Assume that \(y_i>0\).
  • Assume that \(y_i\) can be characterized by a probability density function \(f(y)\).

\[\begin{aligned} y_i &\sim f(y) \rightarrow \int_{-\infty}^{z} f(y) dy = F(z) \\ F(0)&=0 \ \& \ F(\infty)=1 \\ F(Q_y(p)) &=p \rightarrow Q_y(p) = F^{-1}(p) \end{aligned} \]

The \(p\)-th quantile of \(y_i\) is the value \(Q_y(p)\) such that a share \(p\) of the population has income at or below \(Q_y(p)\).

Mathematical Preliminaries

Mean of Standard of Living:

\[\mu_y = E(y) = \int_{-\infty}^{\infty} y f(y) dy=\int_0^1Q(p)dp \]

Finally, the inequality measure can be written as:

\[I(y)=I(\mu_y, f(.)) = I(\mu_y, F(.)) \]
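The definitions above can be checked numerically. The following is a minimal sketch with simulated lognormal incomes; the variable and scalar names (`y`, `q_y`, `mu_y`) are illustrative assumptions, not part of the course data. It verifies that \(F(Q_y(0.5))\) is close to 0.5 and that the mean is roughly the average of the quantile function over \(p\).

```stata
* Minimal sketch with simulated data (all names are illustrative)
clear
set seed 12345
set obs 10000
gen y = exp(rnormal(10, 0.5))      // lognormal "income", strictly positive

* Q(p): 99 percentiles of y, stored in the first 99 observations
pctile q_y = y, nq(100)

* F(z): share of observations with income at or below the median Q(0.5)
count if y <= q_y[50]
display "F(Q(0.5)) = " (r(N)/_N)

* Mean of y versus the average of the quantile function over p
summarize y, meanonly
scalar mu_y = r(mean)
summarize q_y, meanonly
display "mean(y) = " mu_y "   average of Q(p) = " r(mean)
```

The two means will not match exactly, because a grid of 99 percentiles leaves out the extreme tails; the gap should shrink with a finer grid.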

Visualization tools

  • There are several tools that can be used to visualize income distribution:
    • Density Function/Histogram
    • Pen parade/Cumulative Distribution Function
    • Lorenz Curve

  • Density functions and histograms are used to visualize the distribution of income in the population.

  • They can be used to detect multimodality, skewness, etc.

  • They can also be used to compare distributions across groups.

  • Stata Commands

    histogram varname [if] [in] [weight] [, options]

    kdensity varname [if] [in] [weight] [, options]

set scheme white2
set linesize 255
color_style tableau
qui: frause oaxaca, clear
* histogram only accepts frequency weights, so rescale wt to integers
sum wt, meanonly
gen int wt2 = round(wt/r(min))
* overlay the histogram and the kernel density of log wages
qui: two histogram lnwage [fw=wt2] ///
    || kdensity lnwage [fw=wt2], ///
    ysize(5) xsize(9) xtitle("Log Wages") ///
    legend(order(1 "Histogram" 2 "Kernel Density") pos(6) col(2))

  • A different approach to visualize the distribution.
  • The Pen Parade plots the values of the variable of interest in ascending order.
    • y-axis: Q(p); x-axis: p
  • The CDF plots the cumulative distribution of the variable of interest.
    • y-axis: p; x-axis: Q(p)
  • They give a sense of the distribution and make it easy to compare low and high values.
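* 99 percentiles of log wages, stored in the first 99 observations
qui: pctile qlnwage = lnwage [w=wt], nq(100)
qui: gen qwage = exp(qlnwage)
qui: gen p = _n if _n<100    // p = 1, ..., 99 (in percent)
* Pen Parade: Q(p) against p; CDF: p against Q(p)
scatter qwage p, connect(l) name(m1, replace) ysize(5) xsize(8) title("Pen Parade")
scatter p qwage, connect(l) name(m2, replace) ysize(5) xsize(8) title("Cumulative Distribution Function")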

  • The Lorenz Curve is perhaps the most popular tool to visualize income distribution.

  • This curve plots the cumulative share of income vs the cumulative share of population.

    • How much of total income is held by the bottom X% of the population ?
  • “Easy” to read:

    • The closer it is to the 45 degree line, the more equal the distribution.
  • Not easy to implement with negative and zero incomes.

  • Comparison across groups may be ambiguous.

  • It is always increasing, and at an increasing rate, with respect to X% (the curve is convex).

Assume data is sorted by income

x-axis: Cum share of population

\[P_j = \frac{\sum_{i=1}^j w_i}{\sum_{i=1}^n w_i}\]

y-axis: Cum share of income

\[LC_j = \frac{\sum_{i=1}^j y_i w_i}{\sum_{i=1}^n y_iw_i}\]
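As a small worked example of the two formulas above (the numbers are made up for illustration), take four individuals with incomes \(y = (1, 2, 3, 4)\) and equal weights \(w_i = 1\). Total income is 10, so:

\[\begin{aligned} P_j &= \left(\tfrac{1}{4}, \tfrac{2}{4}, \tfrac{3}{4}, 1\right) \\ LC_j &= \left(\tfrac{1}{10}, \tfrac{1+2}{10}, \tfrac{1+2+3}{10}, \tfrac{1+2+3+4}{10}\right) = (0.1, 0.3, 0.6, 1) \end{aligned} \]

The bottom 50% of the population holds 30% of total income, and every point lies below the 45 degree line.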

frause oaxaca, clear
gen wage = exp(lnwage)
sort wage wt  // sort by income and weight (missing wages sort last)
// Estimate totals for non-missing data
egen twage = total(wage * wt) if !missing(wage)
egen tpop  = total(wt)        if !missing(wage)
// Cumulative shares in percent: sum() in gen is a running sum
gen lc_i = sum(wage*wt/twage)*100 if !missing(wage)
gen p_i  = sum(wt/tpop)*100      if !missing(wage)
two (line lc_i p_i) /// Lorenz Curve
    ( function x, range(0 100) ) , /// 45 degree line
    aspect(1) ysize(5) xsize(8) ///
    xtitle("Cumulative Share of Population") ///
    ytitle("Cumulative Share of Income") ///
    legend(off)

Lorenz Curve
ssc install glcurve // installs command for Generalized Lorenz Curve
glcurve wage [aw = wt], /// provides variable and weight
    lorenz // Request plotting the Lorenz Curve

Inequality Measures

There are several measures of inequality. The most popular are:

  • Interquantile Range (IQR) (or normalizations) \[IQR(\#1,\#2) = Q(\#2) - Q(\#1) \]

  • Interquantile Share Ratio (for example, the Palma ratio: income share of the top 10% over that of the bottom 40%, \(ISR(0.4, 0.9)\)) \[ISR(\#1,\#2) = \frac{1-LC(\#2)}{LC(\#1)} \]

  • Coefficient of Variation (CV) \[CV = \frac{\sigma_y}{\mu_y} \]
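The three measures above can be computed with built-in commands. This is a minimal sketch on simulated lognormal incomes; the data and all scalar names (`q10`, `q40`, `q90`, `ytot`, `top10`, `bot40`) are assumptions made for illustration.

```stata
* Minimal sketch with simulated data (all names are illustrative)
clear
set seed 12345
set obs 10000
gen y = exp(rnormal(10, 0.5))

* Interquantile range and ratio based on Q(0.1) and Q(0.9)
_pctile y, percentiles(10 40 90)
scalar q10 = r(r1)
scalar q40 = r(r2)
scalar q90 = r(r3)
display "IQR(10,90) = " (q90 - q10) "   p90/p10 = " (q90/q10)

* Palma ratio: income share of the top 10% over the share of the bottom 40%
summarize y, meanonly
scalar ytot = r(sum)
summarize y if y > q90, meanonly
scalar top10 = r(sum)/ytot
summarize y if y <= q40, meanonly
scalar bot40 = r(sum)/ytot
display "Palma ratio = " (top10/bot40)

* Coefficient of variation
summarize y
display "CV = " (r(sd)/r(mean))
```

With survey data, the same steps would need weights in both `_pctile` and `summarize`.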

Inequality Measures

  • Lorenz Curve:

\[LC(p) = \frac{\int_0^p Q_y(u)du}{\int_0^1 Q_y(u)du} = \frac{1}{\mu_y} \int_0^p Q_y(u)du \]

  • Property 1: The Lorenz Curve is a non-decreasing function of \(p\).

\[\frac{\partial LC(p)}{\partial p} = \frac{Q_y(p)}{\mu_y} \geq 0\]

  • Property 2: The Lorenz Curve is a convex function of \(p\) (it increases at an increasing rate). \[\frac{\partial^2 LC(p)}{\partial p^2} = \frac{1}{\mu_y f(Q_y(p))} \geq 0\]
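For reference, a short sketch of where the two derivatives come from, assuming \(f\) is strictly positive on the support of \(y\). Differentiating the identity \(F(Q_y(p)) = p\) with respect to \(p\) gives the slope of the quantile function, which is then used to differentiate \(LC(p)\) twice:

\[\begin{aligned} f(Q_y(p)) \frac{\partial Q_y(p)}{\partial p} &= 1 \ \rightarrow \ \frac{\partial Q_y(p)}{\partial p} = \frac{1}{f(Q_y(p))} \\ \frac{\partial LC(p)}{\partial p} &= \frac{1}{\mu_y}\frac{\partial}{\partial p}\int_0^p Q_y(u)du = \frac{Q_y(p)}{\mu_y} \\ \frac{\partial^2 LC(p)}{\partial p^2} &= \frac{1}{\mu_y}\frac{\partial Q_y(p)}{\partial p} = \frac{1}{\mu_y f(Q_y(p))} \geq 0 \end{aligned} \]

A non-negative second derivative means the curve bends upward: its slope is below 1 for incomes below the mean and above 1 for incomes above it.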

Inequality Measures: Gini Coefficient

  • The Gini coefficient is the most popular measure of inequality.
  • It is defined as twice the area between the Lorenz Curve and the 45 degree line.

\[Gini(y) = 2 \int_0^1 (p-LC(p)) dp\]

  • where \(p-LC(p)\) is the income share that the bottom \(p\) share of the population “loses” relative to perfect equality.

  • It is bounded between 0 (perfect equality) and 1 (complete inequality).

  • When Lorenz curves do not cross, the Gini provides an unambiguous ranking of inequality.

  • Equivalently, it can be written in covariance form:

\[Gini(y) = \frac{2}{\mu_y} Cov(y_p,p)\]
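Both expressions for the Gini can be computed by hand. The sketch below uses simulated, unweighted data and a midpoint convention for the rank \(p\); both choices, and all variable and scalar names, are assumptions for illustration. The two results should agree up to a small finite-sample difference.

```stata
* Minimal sketch with simulated data (all names are illustrative)
clear
set seed 12345
set obs 10000
gen y = exp(rnormal(10, 0.5))
summarize y, meanonly
scalar mu = r(mean)

* p: fractional rank of each observation (midpoint convention)
sort y
gen p = (_n - 0.5)/_N

* Covariance form: Gini = (2/mu) * Cov(y, p)
quietly correlate y p, covariance
scalar gini_cov = 2*r(cov_12)/mu

* Area form: Gini = 2 * integral of (p - LC(p)) dp, approximated by a sum
gen lc  = sum(y)/(mu*_N)           // cumulative income share LC(p)
gen gap = (_n/_N - lc)/_N          // (p - LC(p)) * dp
summarize gap, meanonly
scalar gini_area = 2*r(sum)

display "Gini (covariance) = " gini_cov "   Gini (area) = " gini_area
```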

Implementation

  • Stata has plenty of commands that can be used to estimate the Gini
    • search gini for a few examples
  • I suggest four commands:
    • fastgini (ssc install fastgini)
    • ineqdeco (ssc install ineqdeco)
    • sgini (ssc install sgini)
    • rif (ssc install rif)

capture:ssc install sgini
sgini wage 

Gini coefficient for wage

-----------------------
    Variable |      v=2
-------------+---------
        wage |   0.2460
-----------------------
capture:ssc install ineqdeco
ineqdeco wage [pw=wt]
 
Percentile ratios

----------------------------------------------------------
  All obs |    p90/p10     p90/p50     p10/p50     p75/p25
----------+-----------------------------------------------
          |      3.154       1.694       0.537       1.771
----------------------------------------------------------
  
Generalized Entropy indices GE(a), where a = income difference
 sensitivity parameter, and Gini coefficient

----------------------------------------------------------------------
  All obs |     GE(-1)       GE(0)       GE(1)       GE(2)        Gini
----------+-----------------------------------------------------------
          |    0.23199     0.14240     0.12282     0.13398     0.26273
----------------------------------------------------------------------
   
Atkinson indices, A(e), where e > 0 is the inequality aversion parameter

----------------------------------------------
  All obs |     A(0.5)        A(1)        A(2)
----------+-----------------------------------
          |    0.06292     0.13273     0.31693
----------------------------------------------

Other Inequality Measures

  • There are other approaches that can be used to measure inequality.

    • Normative approach-Social welfare: Uses explicit concepts of welfare functions to quantify inequality

\[I_A(y,\varepsilon) = 1 - \left( \frac{1}{N} \sum_{i=1}^N \left(\frac{y_i}{\mu_y}\right)^{1-\varepsilon} \right)^\frac{1}{1-\varepsilon} \]

where \(\varepsilon\) is a measure of inequality aversion.

  • Information theory: Quantifies inequality treating it as a problem of comparing income distribution probabilities. How far are we from full entropy?

\[I_{GE}(y,\alpha)=\frac{1}{\alpha(\alpha-1)}\left[\frac{1}{N} \sum_{i=1}^N \left(\frac{y_i}{\mu_y}\right)^\alpha -1\right] \]

  • Axiomatic approach
    • Uses a series of axioms to create measures of inequality
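The Atkinson and Generalized Entropy indices can be computed directly from relative incomes \(y_i/\mu_y\). This is a minimal sketch on simulated, unweighted data with \(\varepsilon = 0.5\) and \(\alpha = 2\); the names `rel`, `eps`, and `alpha` are illustrative assumptions. It uses the standard GE normalization \(1/(\alpha(\alpha-1))\), under which \(GE(2)\) equals half the squared coefficient of variation.

```stata
* Minimal sketch with simulated data (all names are illustrative)
clear
set seed 12345
set obs 10000
gen y = exp(rnormal(10, 0.5))
summarize y, meanonly
gen rel = y/r(mean)                 // relative income y_i / mu_y

* Atkinson index with inequality aversion eps = 0.5 (formula valid for eps != 1)
scalar eps = 0.5
gen a_term = rel^(1 - eps)
summarize a_term, meanonly
display "A(0.5) = " (1 - r(mean)^(1/(1 - eps)))

* Generalized Entropy index with alpha = 2
scalar alpha = 2
gen ge_term = rel^alpha
summarize ge_term, meanonly
display "GE(2)  = " ((r(mean) - 1)/(alpha*(alpha - 1)))

* Check: GE(2) should be close to CV^2/2
summarize y
display "CV^2/2 = " ((r(sd)/r(mean))^2/2)
```

The last two numbers differ only because `summarize` divides the variance by \(N-1\) rather than \(N\).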

How to Compare Inequality

a note

Significance test

  • As discussed in Session 1, we can use a t-test to compare means.
    • This requires estimating the standard error of the mean, for example with the mean or regress commands.
  • Similarly, it may be as important to test whether two distributions (or inequality measures) are different.
    • This requires estimating the standard error of the inequality measure.
    • This is not as straightforward as the mean.
  • Easiest methods:
    • Bootstrap: requires bootstrap weights for survey data.
    • Influence function: requires deriving the influence function of the inequality measure.

Bootstrap

  • The bootstrap is a non-parametric method to estimate the standard error of a statistic. It is based on resampling the data and re-estimating the statistic in each resample.

bootstrap gini=r(coeff): sgini wage


warning: sgini does not set e(sample), so no observations will be excluded from the resampling because of missing values or other reasons. To exclude observations, press Break, save the data, drop any observations that are to be excluded, and rerun
         bootstrap.

Bootstrap results                                        Number of obs = 1,647
                                                         Replications  =    50

      Command: sgini wage
         gini: r(coeff)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        gini |   .2460329   .0063639    38.66   0.000     .2335599    .2585059
------------------------------------------------------------------------------
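The same bootstrap logic works with any statistic that a program returns in `r()`. The sketch below defines a hypothetical helper, `mygini` (not an existing command), that computes an unweighted Gini through the covariance formula and assumes no missing values; the data are simulated.

```stata
* Minimal sketch with simulated data; mygini is a hypothetical helper
clear
set seed 12345
set obs 1000
gen y = exp(rnormal(10, 0.5))

capture program drop mygini
program define mygini, rclass
    syntax varname
    tempvar p
    sort `varlist'
    quietly gen double `p' = (_n - 0.5)/_N      // fractional rank
    quietly correlate `varlist' `p', covariance
    local cov = r(cov_12)
    quietly summarize `varlist', meanonly
    return scalar gini = 2*`cov'/r(mean)        // Gini = (2/mu)*Cov(y,p)
end

* Resample observations with replacement 200 times and re-estimate the Gini
bootstrap gini = r(gini), reps(200) seed(101): mygini y
```

This simple resampling treats observations as independent draws; as noted earlier, survey data would call for bootstrap weights instead.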
  • RIF (Recentered Influence Function) is a method that uses the influence function of a statistic to estimate its standard error.

rifhdreg wage , rif(gini)


Linear regression                               Number of obs     =      1,434
                                                F(0, 1433)        =       0.00
                                                Prob > F          =          .
                                                R-squared         =     0.0000
                                                Root MSE          =     .24613

------------------------------------------------------------------------------
             |               Robust
        wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _cons |   .2460329   .0064995    37.85   0.000     .2332833    .2587825
------------------------------------------------------------------------------
Distributional Statistic: gini
Sample Mean    RIF gini :  .24603
  • Better yet, because you can use regressions (RIF-regressions), you can use weights, and test for differences in inequality across groups.

rifhdreg wage ibn.female [pw=wt], rif(gini) over(female) noconstant


Linear regression                               Number of obs     =      1,434
                                                F(2, 1432)        =     640.56
                                                Prob > F          =     0.0000
                                                R-squared         =     0.5056
                                                Root MSE          =     .25631

------------------------------------------------------------------------------
             |               Robust
        wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      female |
          0  |   .2377884   .0084103    28.27   0.000     .2212907    .2542862
          1  |   .2820059   .0128488    21.95   0.000     .2568015    .3072103
------------------------------------------------------------------------------
Distributional Statistic: gini
Sample Mean    RIF gini :  .25805

rifhdreg wage i.female [pw=wt], rif(gini) over(female)


Linear regression                               Number of obs     =      1,434
                                                F(1, 1432)        =       8.29
                                                Prob > F          =     0.0040
                                                R-squared         =     0.0073
                                                Root MSE          =     .25631

------------------------------------------------------------------------------
             |               Robust
        wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    1.female |   .0442175   .0153565     2.88   0.004     .0140938    .0743412
       _cons |   .2377884   .0084103    28.27   0.000     .2212907    .2542862
------------------------------------------------------------------------------
Distributional Statistic: gini
Sample Mean    RIF gini :  .25805

How to Decompose Inequality

Introduction

  • Sometimes, we may be interested in determining which factors, and to what extent, explain inequality.
    • But what do we mean by “explain”?
  • There are several approaches to decompose inequality.
    • by sources: Explain how inequality is related to source of income
    • by groups: How much of the inequality is explained by inequality within groups, and between groups.
  • We could also consider how inequality gaps are related to characteristics or returns to such characteristics.

Decompose by sources

  • Some inequality indices are well suited to decompose inequality by sources.
    • Variance Decomposition (Shorrocks 1982)
    • Gini Decomposition by source (Lerman and Yitzhaki 1985)
  • The idea is to assess the inequality (or concentration) of each source of income, and then combine them to obtain their contribution to overall inequality.

Variance Decomposition

Setup: \[\begin{aligned} Y &= y_1 + y_2 + \cdots + y_K \\ I_V(Y) &= V(Y) = Cov(Y,Y) \\ &= Cov(Y,y_1) + Cov(Y,y_2) + \cdots + Cov(Y,y_K) \end{aligned} \]

The covariance, however, can be rewritten as \(Cov(Y,y_k)=\rho_k \sigma_Y \sigma_{y_k}\), where \(\rho_k\) is the correlation between \(y_k\) and \(Y\); thus

\[\begin{aligned} V(Y) = \sigma^2_Y &= \rho_1 \sigma_Y \sigma_{y_1} + \rho_2 \sigma_Y \sigma_{y_2} + \cdots \\ \sigma_Y&= \rho_1 \sigma_{y_1} + \rho_2 \sigma_{y_2} + \cdots \end{aligned} \]

Finally, if we divide everything by \(\mu_y\) and multiply each term by \(\mu_{yk}/\mu_{yk}\), we get:

\[\begin{aligned} \frac{\sigma_Y}{\mu_y} = CV(Y) &= \rho_1 \frac{1}{\mu_{y}} \frac{\mu_{y1}}{\mu_{y1}}\sigma_{y_1} + \rho_2 \frac{1}{\mu_{y}} \frac{\mu_{y2}}{\mu_{y2}} \sigma_{y_2} + \cdots \\ &= \rho_1 sh_1 CV(y_1) + \rho_2 sh_2 CV(y_2) + \cdots \end{aligned} \]

where \(sh_k = \mu_{yk}/\mu_y\) is the share of source \(k\) in total income.
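The identity \(CV(Y) = \sum_k \rho_k \, sh_k \, CV(y_k)\) can be verified numerically. Below is a minimal sketch with two simulated income sources; the sources and all names (`y1`, `y2`, `ytot`, `cv_sum`) are made up for illustration.

```stata
* Minimal sketch with simulated data (all names are illustrative)
clear
set seed 12345
set obs 10000
gen y1 = exp(rnormal(10, 0.5))             // first source
gen y2 = 0.2*y1 + exp(rnormal(8, 1))       // second source, correlated with y1
gen ytot = y1 + y2

quietly summarize ytot
scalar mu = r(mean)
scalar cv_total = r(sd)/r(mean)

* Accumulate rho_k * sh_k * CV_k over sources
scalar cv_sum = 0
foreach v in y1 y2 {
    quietly correlate ytot `v'
    scalar rho = r(rho)
    quietly summarize `v'
    scalar cv_sum = cv_sum + rho*(r(mean)/mu)*(r(sd)/r(mean))
}
display "CV(Y) = " cv_total "   sum of rho_k*sh_k*CV_k = " cv_sum
```

The two numbers should coincide up to rounding, since the decomposition is an exact identity.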

Example Stata

ssc install ineqfac

frause limew1972.dta, clear
capture: ssc install ineqfac
ineqfac basemi inchomewealth incnonhomewealth govtr_n pubcon tx_n valhp [aw=hhwgt]
 
Inequality decomposition by factor components (Shorrocks' method)
-------------------------------------------------------------------------------
Factor              |  100*s_f     S_f   100*m_f/m  rho_f   CV_f CV_f/CV(Total)
--------------------+----------------------------------------------------------
basemi              |   29.115    0.284   44.434   0.711   0.900    0.922
inchomewealth       |    1.872    0.018    4.302   0.318   1.335    1.367
incnonhomewealth    |   43.136    0.421    7.905   0.754   7.065    7.235
govtr_n             |   -0.979   -0.010    5.868  -0.107   1.528    1.565
pubcon              |    2.091    0.020    8.303   0.238   1.035    1.060
tx_n                |   15.629    0.153   11.462   0.754   1.767    1.809
valhp               |    9.136    0.089   17.726   0.502   1.003    1.027
--------------------+----------------------------------------------------------
Total               |  100.000    0.977  100.000   1.000   0.977    1.000
-------------------------------------------------------------------------------
Note: s_f is the proportionate contribution of factor f
 to inequality of Total, where ...
 s_f = rho_f*sd(f)/sd(Total) 
     = rho_f*[m(factor_f)/m(totvar)]*[CV(factor_f)/CV(totvar)].
 S_f = s_f*CV(Total). rho_f = corr(f,Total). 
 m_f = mean(f). sd(f) = std.dev. of f. CV_f = sd(f)/m_f.

Gini Decomposition by source

  • The Gini index could also be decomposed by source of income.

\[\begin{aligned} Gini(y) &= \frac{2}{\mu_y} Cov(y,F(y)) = \frac{2}{\mu_y} Cov\left(\sum y_k,F(y)\right) =\frac{2}{\mu_y} \sum Cov\left( y_k,F(y)\right) \end{aligned} \]

  • So the Gini is \(2/\mu_y\) times the sum of the covariances of each source of income with \(F(y)\)

\[\begin{aligned} Gini(y) &= \sum \frac{2}{\mu_y} \times Cov( y_k,F(y)) \times \frac{\mu_{yk}}{\mu_{yk}} \times \frac{Cov( y_k,F(y_k))}{Cov ( y_k,F(y_k))} \\ &=\sum \frac{Cov ( y_k,F(y) )}{Cov ( y_k,F(y_k) )} \times \frac{2Cov( y_k,F(y_k))}{\mu_{yk}} \times \frac{\mu_{yk}}{\mu_y} \\ &= \sum R_k \times G_k \times sh_k \end{aligned} \]

Gini Decomposition by source

\[\begin{aligned} Gini(y)&= \sum R_k \times G_k \times sh_k \end{aligned} \]

  • \(R_k\) = Gini Correlation; \(G_k\) = Gini of source \(k\); \(sh_k\) = share of source \(k\) in total income.
  • \(R_k\times G_k\) is the Concentration Index of \(y_k\)
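The three components can be computed by hand from covariances with fractional ranks. This is a minimal sketch with two simulated sources and no weights; the data and every name (`F_ytot`, `Rk`, `Gk`, `shk`, and so on) are illustrative assumptions.

```stata
* Minimal sketch with simulated data (all names are illustrative)
clear
set seed 12345
set obs 10000
gen y1 = exp(rnormal(10, 0.5))
gen y2 = 0.2*y1 + exp(rnormal(8, 1))
gen ytot = y1 + y2

* Fractional ranks F(.) for total income and for each source
foreach v in ytot y1 y2 {
    egen r_`v' = rank(`v')
    gen F_`v' = r_`v'/_N
}
quietly summarize ytot, meanonly
scalar mu_tot = r(mean)

scalar gini_sum = 0
foreach v in y1 y2 {
    quietly correlate `v' F_ytot, covariance
    scalar c_tot = r(cov_12)                 // Cov(y_k, F(y))
    quietly correlate `v' F_`v', covariance
    scalar c_own = r(cov_12)                 // Cov(y_k, F(y_k))
    quietly summarize `v', meanonly
    scalar Rk  = c_tot/c_own                 // Gini correlation
    scalar Gk  = 2*c_own/r(mean)             // Gini of source k
    scalar shk = r(mean)/mu_tot              // share of source k
    display "`v': R = " Rk "  G = " Gk "  sh = " shk
    scalar gini_sum = gini_sum + Rk*Gk*shk
}

quietly correlate ytot F_ytot, covariance
display "Gini(Y) = " (2*r(cov_12)/mu_tot) "   sum of R_k*G_k*sh_k = " gini_sum
```

The last line should show two matching numbers, because the product \(R_k \times G_k \times sh_k\) simplifies back to \(\frac{2}{\mu_y} Cov(y_k, F(y))\).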

Example Stata

You probably already have it installed

ssc install sgini

sgini basemi inchomewealth incnonhomewealth govtr_n pubcon tx_n valhp [aw=hhwgt], source

Gini coefficient for basemi, inchomewealth, incnonhomewealth, govtr_n, pubcon, tx_n, valhp

Note: inchomewealth has 690 negative observations (used in calculations).
Note: incnonhomewealth has 10359 negative observations (used in calculations).
-----------------------
    Variable |      v=2
-------------+---------
      basemi |   0.4787
inchomewea~h |   0.6161
incnonhome~h |   0.9503
     govtr_n |   0.7099
      pubcon |   0.4913
        tx_n |   0.6036
       valhp |   0.5021
-----------------------

Decomposition by source:
  TOTAL =  basemi +  inchomewealth +  incnonhomewealth +  govtr_n +  pubcon +  tx_n +  valhp


Parameter: v=2
--------------------------------------------------------------------------------
             |    Share   Coeff.    Corr.    Conc.  Contri.  %Contri. Elasticity
    Variable |        s        g        r    c=g*r    s*g*r   s*g*r/G  s*g*r/G-s
-------------+------------------------------------------------------------------
      basemi |   0.4443   0.4787   0.8870   0.4246   0.1887   0.4868    0.0425
inchomewea~h |   0.0430   0.6161   0.4617   0.2845   0.0122   0.0316   -0.0114
incnonhome~h |   0.0790   0.9503   0.7724   0.7340   0.0580   0.1497    0.0707
     govtr_n |   0.0587   0.7099  -0.2298  -0.1631  -0.0096  -0.0247   -0.0834
      pubcon |   0.0830   0.4913   0.4593   0.2257   0.0187   0.0483   -0.0347
        tx_n |   0.1146   0.6036   0.8776   0.5297   0.0607   0.1567    0.0420
       valhp |   0.1773   0.5021   0.6601   0.3314   0.0587   0.1516   -0.0257
-------------+------------------------------------------------------------------
       TOTAL |   1.0000   0.3876   1.0000   0.3876   0.3876   1.0000    0.0000
--------------------------------------------------------------------------------

Decompose by groups

  • There is a second type of decomposition one may be interested in.
    • How much of the inequality is explained by inequality within groups, and between groups.
  • For example, consider two cases:
    • Two (eq size) groups that have access to the same level of income, but within each group, all resources are held by one individual.
    • Two (eq size) groups, one has 80% of the income, and the other 20%, but within groups income is equally distributed.
  • It is possible to understand the source of inequality by decomposing it by groups.
    • Entropy Indices (and Atkinson) are well suited for this type of decomposition. (see help ineqdeco)
    • The Gini is not as straightforward, but it is possible.
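The two cases above can be worked out explicitly. As an illustration, assume each group has \(m\) individuals.

Case 1 (equal group totals, one individual per group holds everything): group means are equal, so there is no between-group inequality. The cumulative income share is 0 for everyone except the two richest individuals, which gives

\[Gini(y) = \frac{2m+1}{2m} - \frac{2}{2m}\left(\frac{1}{2} + 1\right) = 1 - \frac{1}{m}\]

and all of it comes from inequality within groups.

Case 2 (income split 20/80, equal incomes within groups): the Lorenz Curve is piecewise linear, and all inequality is between groups.

\[\begin{aligned} LC(p) &= \begin{cases} 0.4p & p \leq 0.5 \\ 0.2 + 1.6(p-0.5) & p > 0.5 \end{cases} \\ Gini(y) &= 1 - 2\int_0^1 LC(p)dp = 1 - 2(0.05 + 0.30) = 0.3 \end{aligned} \]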

GINI Decomposition by groups

  • Decomposition of the Gini coefficient by groups (Milanovic and Yitzhaki 2002)

  • The method: To decompose the GINI by groups, one can use the following:

\[\begin{aligned} Gini(y) &= \sum_{k=1}^K s_k O_k Gini(y_k) + Gini_{bw} \end{aligned} \]

  • where \(s_k\) is the share of group \(k\) in total income, \(Gini(y_k)\) is the Gini of group \(k\), \(O_k\) is a measure of overlapping across groups, and \(Gini_{bw}\) is the Gini between groups.

\[Gini_{bw} = \frac{2}{\mu_y} Cov(\mu_k,\bar F_k)\]

Overlapping

  • Overlapping \(O_k\) measures to what extent the distribution of income in group \(k\) overlaps with the distribution of income in other groups.

  • If there is no overlapping, then \(O_k=p_k\) (the population share of group \(k\)), and incomes are fully stratified.

  • Otherwise, this adjustment factor ensures that the weighted sum of the group Ginis plus the between-group Gini equals the Gini of the total population.

Example Stata

ssc install anogi (Tom Masterson is one of the authors)
ssc install moremata (needed for anogi)

capture:ssc install anogi
capture:ssc install moremata
anogi limew [aw= hhwgt ], by(educl) detail

Analysis of Gini

--------------------------------------------------
                          |      Coef.          %
--------------------------+-----------------------
Overall Gini              |   .3875623     100.00
                          |
G_wo = sum s_i*G_i*O_i    |   .3405469      87.87
G_b                       |   .0470154      12.13
                          |
IG   = sum s_i*G_i        |   .3657639      94.38
IGO  = sum s_i*G_i(O_i-1) |   -.025217      -6.51
BGp  = G_bp               |   .1317698      34.00
BGO  = G_b - G_bp         |  -.0847544     -21.87
--------------------------+-----------------------
Mean of limew             |    20403.6
N. of obs                 |      44872
N. of subgroups           |          4
--------------------------------------------------


Detailed statistics for subgroups

-------------------------------------------------------------------------------------------
             |         N          p       mean          s          G          O          F 
-------------+-----------------------------------------------------------------------------
           1 |  18244.58   .4065916   15466.07   .3081992   .3844549   .9986868   .3963075 
           2 |  14552.19   .3243045   21019.76   .3340981    .338758   .9239628   .5363091 
           3 |  5609.305   .1250068   22705.72   .1391112   .3640736   .9435821   .5546081 
           4 |  6465.924   .1440971   30951.73   .2185916   .3817625   .8370495   .6634934 
-------------------------------------------------------------------------------------------

That's all for today…

Until next week!