Peer-Reviewed

Using the Markov Chain Monte Carlo Method to Make Inferences on Items of Data Contaminated by Missing Values

Received: 7 April 2013     Published: 2 May 2013
Abstract

Markov Chain Monte Carlo (MCMC) is a method used to estimate parameters of interest under difficult conditions, such as when data are missing or when the underlying distributions do not satisfy the assumptions of maximum likelihood procedures. Its objective is to obtain a probability distribution, known in Bayesian analysis as the posterior distribution, from which the target parameters can be estimated. In this paper, we consider the case where data are contaminated with missing values and must therefore be handled with appropriate missing data techniques before inferences are made. We review the mathematics of MCMC procedures in the presence of missing data. We then use real data to compare inferences from three approaches: multiple imputation based on the multivariate normal (MVN) model, which uses the MCMC procedure; case deletion (CD), which discards subjects with missing values from the analysis; and fully conditional specification (FCS) multiple imputation, which fills in missing values using a sequence of regression models. Assuming that data are missing completely at random (MCAR) on continuous, normally distributed variables, we obtain the following findings: (1) The higher the proportion of missing data on a variable of interest, the more the relationship between that variable and the dependent variable is distorted, whichever missing data method is applied. (2) The multiple imputation methods produce similar estimates, which are better than those from case deletion. (3) Once the proportion of missing data becomes high, none of the missing data techniques can preserve an initially existing relationship between the dependent variable and some of the covariates of interest in the dataset.
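As an illustration only, the kind of comparison described in the abstract can be sketched in Python on simulated data. This is not the procedure used in the paper: the data are synthetic, only one covariate has missing values, and the imputation step is a bootstrap-based stochastic regression imputation standing in for the MVN (MCMC) and FCS methods, pooled with Rubin's rules. All function names, the sample size, the true slope, and the number of imputations are assumptions chosen for the example.

```python
import numpy as np

def make_mcar(x, prop, rng):
    """Set a random fraction `prop` of x to NaN, independently of the data (MCAR)."""
    x_miss = x.copy()
    n_miss = int(round(prop * x.size))
    idx = rng.choice(x.size, size=n_miss, replace=False)
    x_miss[idx] = np.nan
    return x_miss

def ols(x, y):
    """Fit y = a + b*x; return coefficients, residual variance, and Cov(coefficients)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef, sigma2, cov

def case_deletion(x_miss, y):
    """Slope of y on x and its variance, using only subjects with x observed."""
    keep = ~np.isnan(x_miss)
    coef, _, cov = ols(x_miss[keep], y[keep])
    return coef[1], cov[1, 1]

def multiple_imputation(x_miss, y, m, rng):
    """Simplified multiple imputation (assumed stand-in, not MVN/MCMC or full FCS)."""
    obs = ~np.isnan(x_miss)
    obs_idx = np.flatnonzero(obs)
    n_mis = int((~obs).sum())
    estimates, variances = [], []
    for _ in range(m):
        # Bootstrap the complete cases so imputation-model uncertainty is propagated.
        boot = rng.choice(obs_idx, size=obs_idx.size, replace=True)
        coef, sigma2, _ = ols(y[boot], x_miss[boot])  # imputation model: x ~ 1 + y
        x_imp = x_miss.copy()
        noise = rng.normal(0.0, np.sqrt(sigma2), n_mis)
        x_imp[~obs] = coef[0] + coef[1] * y[~obs] + noise
        fit, _, cov = ols(x_imp, y)                   # analysis model: y ~ 1 + x
        estimates.append(fit[1])
        variances.append(cov[1, 1])
    # Rubin's rules: pooled estimate, within- and between-imputation variance.
    q_bar = float(np.mean(estimates))
    within = float(np.mean(variances))
    between = float(np.var(estimates, ddof=1))
    return q_bar, within + (1 + 1 / m) * between

rng = np.random.default_rng(2013)
n, true_slope = 500, 0.5
x = rng.normal(size=n)
y = 1.0 + true_slope * x + rng.normal(size=n)

results = {}
for prop in (0.1, 0.3, 0.5):
    x_miss = make_mcar(x, prop, rng)
    cd = case_deletion(x_miss, y)
    mi = multiple_imputation(x_miss, y, m=20, rng=rng)
    results[prop] = {"n_missing": int(np.isnan(x_miss).sum()), "CD": cd, "MI": mi}
    print(f"missing={prop:.0%}  CD slope={cd[0]:.3f} (se {np.sqrt(cd[1]):.3f})  "
          f"MI slope={mi[0]:.3f} (se {np.sqrt(mi[1]):.3f})")
```

The sketch only shows the mechanics of deleting incomplete cases versus imputing and pooling. It makes no claim to reproduce the paper's findings, which rest on real data and several covariates.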

Published in American Journal of Theoretical and Applied Statistics (Volume 2, Issue 3)
DOI 10.11648/j.ajtas.20130203.12
Page(s) 48-53
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2013. Published by Science Publishing Group

Keywords

Markov Chain Monte Carlo (MCMC), Missing Data, Missing Completely At Random (MCAR), Multiple Imputation, Multivariate Normal Model (MVN), Fully Conditional Specification (FCS)

Cite This Article
  • APA Style

    I. Karangwa, D. Kotze. (2013). Using the Markov Chain Monte Carlo Method to Make Inferences on Items of Data Contaminated by Missing Values. American Journal of Theoretical and Applied Statistics, 2(3), 48-53. https://doi.org/10.11648/j.ajtas.20130203.12

    ACS Style

    I. Karangwa; D. Kotze. Using the Markov Chain Monte Carlo Method to Make Inferences on Items of Data Contaminated by Missing Values. Am. J. Theor. Appl. Stat. 2013, 2(3), 48-53. doi: 10.11648/j.ajtas.20130203.12

    AMA Style

    I. Karangwa, D. Kotze. Using the Markov Chain Monte Carlo Method to Make Inferences on Items of Data Contaminated by Missing Values. Am J Theor Appl Stat. 2013;2(3):48-53. doi: 10.11648/j.ajtas.20130203.12

  • @article{10.11648/j.ajtas.20130203.12,
      author = {I. Karangwa and D. Kotze},
      title = {Using the Markov Chain Monte Carlo Method to Make Inferences on Items of Data Contaminated by Missing Values},
      journal = {American Journal of Theoretical and Applied Statistics},
      volume = {2},
      number = {3},
      pages = {48-53},
      doi = {10.11648/j.ajtas.20130203.12},
      url = {https://doi.org/10.11648/j.ajtas.20130203.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtas.20130203.12},
     year = {2013}
    }
    

  • TY  - JOUR
    T1  - Using the Markov Chain Monte Carlo Method to Make Inferences on Items of Data Contaminated by Missing Values
    AU  - I. Karangwa
    AU  - D. Kotze
    Y1  - 2013/05/02
    PY  - 2013
    N1  - https://doi.org/10.11648/j.ajtas.20130203.12
    DO  - 10.11648/j.ajtas.20130203.12
    T2  - American Journal of Theoretical and Applied Statistics
    JF  - American Journal of Theoretical and Applied Statistics
    JO  - American Journal of Theoretical and Applied Statistics
    SP  - 48
    EP  - 53
    PB  - Science Publishing Group
    SN  - 2326-9006
    UR  - https://doi.org/10.11648/j.ajtas.20130203.12
    VL  - 2
    IS  - 3
    ER  - 

Author Information
  • I. Karangwa, Department of Statistics and Population Studies, University of the Western Cape, Cape Town, South Africa

  • D. Kotze, Department of Statistics and Population Studies, University of the Western Cape, Cape Town, South Africa
