The data come from the Panel Study of Income Dynamics (PSID), survey years 1981 to 1992; earnings data from 1980 are also included. The sample consists of 579 white women who were followed over the whole period, giving 6,948 observations (579 individuals over 12 years). The data frame contains the following columns:

  • id: Individual identifier

  • year: Survey year

  • age: Calculated age in years (based on year and month of birth)

  • educ: Years of schooling

  • children: Total number of children in family unit, ages 0-17

  • s: Participation dummy, =1 if worked (hours>0)

  • lnw: Log of real average hourly earnings

  • lnw80: Log earnings in 1980

  • agesq: Age squared

  • children_lag1: Number of children in t-1

  • children_lag2: Number of children in t-2

  • lnw2: Log of real average hourly earnings

  • Lnw: Log of real average hourly earnings
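The derived columns above (agesq, children_lag1, children_lag2) can be reproduced from the raw variables. The following is a minimal base-R sketch on a small made-up panel, not the code used to build PSID2; the toy values and the helper `lag_k` are hypothetical, and how the original data handle the first survey years is not documented here.

```r
# Toy balanced panel (hypothetical values): 3 individuals, 4 survey years
panel <- data.frame(
  id       = rep(1:3, each = 4),
  year     = rep(1981:1984, times = 3),
  age      = c(30:33, 41:44, 25:28),
  children = c(0, 1, 1, 2,  2, 2, 2, 1,  0, 0, 1, 1)
)

# Sort by individual and year so that "previous row" means "previous year"
panel <- panel[order(panel$id, panel$year), ]

# Hypothetical helper: shift a vector by k positions, padding the start with NA
lag_k <- function(x, k) c(rep(NA, k), head(x, -k))

# ave() applies the lag separately within each id, so values never
# leak from one individual to the next
panel$children_lag1 <- ave(panel$children, panel$id,
                           FUN = function(x) lag_k(x, 1))
panel$children_lag2 <- ave(panel$children, panel$id,
                           FUN = function(x) lag_k(x, 2))

# Age squared, as in the agesq column
panel$agesq <- panel$age^2

print(panel)
```

In this sketch the first one or two rows of each individual are NA because no earlier wave exists in the toy data.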

Usage

PSID2

Format

An object of class data.frame with 6948 rows and 13 columns.

References

Anastasia Semykina, Jeffrey M Wooldridge (2013). “Estimation of dynamic panel data models with sample selection.” Journal of Applied Econometrics, 28(1), 47-61.

Mikhail Zhelonkin, Marc G. Genton, Elvezio Ronchetti (2019). ssmrob: Robust Estimation and Inference in Sample Selection Models. R package version 0.7, https://CRAN.R-project.org/package=ssmrob.

Ott Toomet, Arne Henningsen (2008). “Sample Selection Models in R: Package sampleSelection.” Journal of Statistical Software, 27(7). https://www.jstatsoft.org/article/view/v027i07.

Examples

data(PSID2)
# Distribution of the outcome variable; the column is referenced
# through the data frame, so attach() is not needed
hist(PSID2$Lnw)

selectEq <- s ~ educ + age + children + year
outcomeEq <- Lnw ~ educ + age + children
HCinitial(selectEq, outcomeEq, data = PSID2)
#> xs(Intercept)        xseduc         xsage    xschildren        xsyear 
#>   1.904417294   0.021724081  -0.019771859  -0.169149600  -0.021483592 
#> xo(Intercept)        xoeduc         xoage    xochildren         sigma 
#>   0.492835520   0.128685876  -0.009081435  -0.119664822   0.854195970 
#>           rho 
#>   1.426127369 
# Note that the two-step estimate of rho (1.426) is greater than 1,
# i.e. outside the admissible range [-1, 1] for a correlation
summary(HeckmanGe(selectEq, outcomeEq, 1, 1, data = PSID2))
#> 
#> --------------------------------------------------------------
#>        Generalized Heckman Model (Package: ssmodels)          
#> --------------------------------------------------------------
#> --------------------------------------------------------------
#> Maximum Likelihood estimation 
#> optim function with method BFGS-iterations numbers: 36 
#> Log-Likelihood: -7461.423 
#> AIC: 14944.85 BIC: 15020.15 
#> Number of observations: ( 1057 censored and 5891 observed ) 
#> 11 free parameters ( df= 6937 ) 
#> --------------------------------------------------------------
#> Probit selection equation:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  1.887573   0.156411  12.068  < 2e-16 ***
#> educ         0.021938   0.008207   2.673 0.007532 ** 
#> age         -0.019202   0.002390  -8.034 1.10e-15 ***
#> children    -0.166495   0.020306  -8.199 2.85e-16 ***
#> year        -0.023440   0.006078  -3.857 0.000116 ***
#> --------------------------------------------------------------
#> Outcome equation:
#>               Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  0.4878258  0.0566149   8.617   <2e-16 ***
#> educ         0.1205652  0.0031053  38.826   <2e-16 ***
#> age         -0.0003632  0.0008558  -0.424    0.671    
#> children    -0.0494552  0.0075144  -6.581    5e-11 ***
#> --------------------------------------------------------------
#> Dispersion terms:
#>       Estimate Std. Error t value Pr(>|t|)    
#> Sigma   1.6934     0.0096   176.4   <2e-16 ***
#> --------------------------------------------------------------
#> Correlation terms:
#>     Estimate Std. Error t value Pr(>|t|)
#> Rho  0.08879    0.08450   1.051    0.293
#> --------------------------------------------------------------
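The note in the example, that the two-step estimate of rho exceeds 1, follows from how the textbook Heckman two-step procedure recovers rho: as a ratio of two separately estimated quantities, with nothing forcing the result into [-1, 1]. The sketch below illustrates that procedure in base R on simulated data. It is an illustration under stated assumptions, not the internals of HCinitial, which may differ in detail; all variable names and the simulated design are hypothetical.

```r
set.seed(1)
n <- 5000

# Simulated design (hypothetical): z enters only the selection equation
x <- rnorm(n)
z <- rnorm(n)
rho_true <- 0.5
u1 <- rnorm(n)
u2 <- rho_true * u1 + sqrt(1 - rho_true^2) * rnorm(n)  # corr(u1, u2) = rho_true

s <- as.integer(0.5 + x + z + u1 > 0)        # participation dummy
y <- ifelse(s == 1, 1 + 0.5 * x + u2, NA)    # outcome observed only if s == 1
d <- data.frame(s, y, x, z)

# Step 1: probit for selection, then the inverse Mills ratio
probit <- glm(s ~ x + z, family = binomial(link = "probit"), data = d)
idx    <- predict(probit, type = "link")     # fitted linear index
d$imr  <- dnorm(idx) / pnorm(idx)

# Step 2: OLS on the selected sample with the inverse Mills ratio added
ols      <- lm(y ~ x + imr, data = d, subset = s == 1)
b_lambda <- unname(coef(ols)["imr"])         # estimates rho * sigma

# Recover sigma and rho from the second-step quantities
sel        <- d$s == 1
delta      <- d$imr * (d$imr + idx)
sigma2_hat <- mean(resid(ols)^2) + b_lambda^2 * mean(delta[sel])
rho_hat    <- b_lambda / sqrt(sigma2_hat)    # not constrained to [-1, 1]

c(sigma = sqrt(sigma2_hat), rho = rho_hat)
```

Because `rho_hat` is computed after the fact rather than estimated as a bounded parameter, values outside [-1, 1] can occur in real samples, as in the HCinitial output above. Maximum likelihood fits such as HeckmanGe estimate the correlation jointly with the other parameters.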