Implementation and Application of MCMC Algorithms in the Statistical Analysis of Medical Data (医学数据统计分析中MCMC算法的实现与应用)

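English title: The Implementation and Application of MCMC in Statistical Analysis of Medical Studies
Author: Ma Yueyuan (马跃渊)
Degree level: Master's
Discipline: Epidemiology and Health Statistics
Chinese keywords: MCMC ; Gibbs sampler ; Bayes ; 算法 (algorithm) ; 软件 (software)
English keywords: MCMC ; Gibbs sampler ; Bayes ; algorithm ; software
Degree year: 2004
Supervisor: Xu Yongyong (徐勇勇)
Discipline code: 100401
Degree-granting institution: Fourth Military Medical University (第四军医大学)
Thesis submission date: 2004-04-01
Chair of the defense committee: Yan Hong (颜虹)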

Abstract

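Computation is the lifeline of Bayesian statistics, both for its development and for its widespread application, and MCMC techniques are a powerful tool for solving this problem. Research on MCMC and its related algorithms helps Bayesian methods find broader use in practice.
     Although researchers abroad have studied MCMC extensively within the Bayesian framework and have proposed many algorithms, few of these methods are actually easy to implement on a computer. This project therefore focuses on the concrete implementation of MCMC methods, with the emphasis on automating the computation. Building on a study of the basic theory and algorithms of MCMC, the work applies the principles and methods of computer stochastic simulation, together with object-oriented programming, to explore in theory and in practice how to program MCMC algorithms and how to apply them to statistical models.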
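     Drawing on the theoretical study and practical experience, the basic approach to implementing the Gibbs sampling algorithm can be summarized as follows. First, choose a statistical model for the problem at hand and select a prior distribution for each parameter. Next, use the model to establish the parent-child relationships among the nodes (parameters) of the DAG (directed acyclic graph) model, and from these relationships construct the full conditional distribution of each parameter to be estimated. Finally, draw random samples for each parameter with the adaptive rejection sampling (ARS) algorithm, repeating the sampling cycle until enough samples have been collected for estimation. Following this approach, a set of objects and functions was written in Delphi that provides an initial implementation of Gibbs sampling and a software environment suited to continued development; the software has been provisionally named ARSP. In this environment, a later developer only needs to write statements that define variables and assign values to them in order to compute a new statistical model. The system derives each full conditional distribution automatically from the parent-child relationships among the random variables, and the developer does not need to consider any implementation detail of that computation. To extend the environment, a developer only needs to define a new distribution type, that is, its parameters and the formula for its density function. The distributions currently defined are the uniform, binomial, Poisson, normal, gamma, beta, t and Pareto distributions. The output includes descriptive statistics (mean, median, standard deviation, quartiles, 95% CI, kurtosis and skewness) and two kinds of plot: histograms, which describe the posterior distribution of a parameter, and trace plots, which are used to diagnose convergence of the simulation. For deeper analysis of the simulated Markov chains, users can export intermediate results to external files in several formats that most statistical packages can read. The system also offers a richer set of data interfaces than WinBUGS, covering dBASE, Paradox, MS Access, MS Excel and TXT data, which widens the range of data sources and makes data entry, editing and validation more convenient. The software has a Windows-style interface with mouse support and interacts with the user through menus, buttons and text boxes; a basic interface framework is already in place.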
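
The sampling cycle described above can be sketched in a few lines of Python. This is an illustration only, not the thesis's Delphi code: the `Node` class, the example model (normal data with a normal prior on the mean `mu` and a gamma prior on the precision `tau`), the prior settings and the grids are all assumptions made for the example, and a simple grid-based draw stands in for ARS to keep the sketch short. What it is meant to show is how each full conditional can be assembled automatically from a node's own density and the densities of its children in the DAG.

```python
import math
import random


def normal_logpdf(x, mean, precision):
    """Log density of a normal distribution with the given mean and precision."""
    return 0.5 * math.log(precision / (2.0 * math.pi)) - 0.5 * precision * (x - mean) ** 2


def gamma_logpdf(x, shape, rate):
    """Log density of a gamma distribution (x > 0) with the given shape and rate."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


class Node:
    """One vertex of the DAG: a current value and log p(value | parents)."""

    def __init__(self, value, grid=None):
        self.value = value
        self.grid = grid          # candidate values for a parameter; None for data
        self.children = []        # nodes whose density depends on this one
        self.log_density = None   # zero-argument callable, set when the model is built


def log_full_conditional(node, x):
    """log p(node = x | all other nodes), up to an additive constant."""
    saved, node.value = node.value, x
    total = node.log_density() + sum(child.log_density() for child in node.children)
    node.value = saved
    return total


def draw(node, rng):
    """Grid-based stand-in for ARS: pick a grid point with probability
    proportional to the full conditional density at that point."""
    logs = [log_full_conditional(node, x) for x in node.grid]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    return rng.choices(node.grid, weights=weights, k=1)[0]


def gibbs(parameters, n_iter, rng):
    """Update every parameter in turn from its full conditional, n_iter times."""
    trace = {name: [] for name in parameters}
    for _ in range(n_iter):
        for name, node in parameters.items():
            node.value = draw(node, rng)
            trace[name].append(node.value)
    return trace


def build_model(data):
    """Hypothetical model: y_i ~ N(mu, 1/tau), mu ~ N(0, 1/0.01), tau ~ Gamma(1, 1)."""
    mu = Node(0.0, grid=[0.1 * i for i in range(-50, 51)])
    tau = Node(1.0, grid=[0.1 * i for i in range(1, 41)])
    mu.log_density = lambda: normal_logpdf(mu.value, 0.0, 0.01)
    tau.log_density = lambda: gamma_logpdf(tau.value, 1.0, 1.0)
    for y in data:
        obs = Node(y)
        obs.log_density = lambda obs=obs: normal_logpdf(obs.value, mu.value, tau.value)
        mu.children.append(obs)
        tau.children.append(obs)
    return {"mu": mu, "tau": tau}


rng = random.Random(1)
data = [rng.gauss(2.0, 1.0) for _ in range(20)]
trace = gibbs(build_model(data), n_iter=300, rng=rng)
kept = trace["mu"][50:]           # discard the first 50 draws as burn-in
print("posterior mean of mu:", sum(kept) / len(kept))
```

Adding a new distribution in this sketch only requires a new log-density function, which mirrors the extension mechanism the abstract describes; the grid draw is a coarse approximation and would be replaced by ARS or another exact sampler in real use.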
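     In this environment, ARSP can model and simulate the following statistical problems: descriptive statistics, simple and multiple linear regression, logistic regression with random effects, variance components models, normal hierarchical models, bioequivalence testing for crossover designs, Poisson models and meta-analysis. The great majority of the results can be cross-checked against those computed by WinBUGS. At present ARSP is limited to generalized linear models; its shortcomings are that it is less computationally efficient than WinBUGS, that its results still show some deviations, and that the user interface needs further refinement.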
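     The thesis also discusses several issues in applying MCMC and in improving the performance of MCMC algorithms, such as the number of iterations, convergence diagnostics and reparameterization.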
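
As one concrete example of a convergence diagnostic, the sketch below computes the Gelman-Rubin potential scale reduction factor from several chains run in parallel. The abstract does not say which diagnostics ARSP provides beyond trace plots, so the choice of this statistic, the function name `gelman_rubin` and the simulated chains are assumptions for illustration.

```python
import random
import statistics


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of one parameter."""
    n = len(chains[0])                                            # draws per chain
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])   # W
    between = n * statistics.variance(chain_means)                # B
    pooled = (n - 1) / n * within + between / n                   # pooled variance estimate
    return (pooled / within) ** 0.5


rng = random.Random(7)
# Four chains that sample the same distribution.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Four chains stuck around different values.
stuck = [[rng.gauss(5.0 * k, 1.0) for _ in range(1000)] for k in range(4)]
print("mixed chains:", gelman_rubin(mixed))
print("stuck chains:", gelman_rubin(stuck))
```

A factor close to 1 suggests that the chains agree with one another; a value clearly above 1 suggests that more iterations, or a reparameterization, may be needed.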
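     In summary, this work gives a preliminary account of how to program MCMC methods. Practice has shown the approach to be feasible overall and easy to implement on a computer. ARSP, the software developed on this basis, runs largely stably, is easy to extend, and lends itself well to continued development.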

English abstract

Computation is the key issue for Bayesian methods, and MCMC is increasingly used as an effective approach to it. Studying MCMC can therefore promote wider application of Bayesian statistics.
    Although statisticians abroad have proposed many MCMC algorithms, few of them are easy to implement on a computer. Our aim is therefore to build a software framework in which users can not only evaluate their Bayesian models but also extend the environment to suit their own models. Building on MCMC theory, we used Monte Carlo methods and object-oriented programming to implement the application. In the process we also tried to find a general way to program MCMC and to apply it to Bayesian models.
    We summarize our implementation approach as follows. First, determine the form of the model and the prior distributions of its parameters. Next, construct the DAG from the model and build the full conditional distribution of each parameter. Then sample from each full conditional distribution using ARS, and repeat until enough samples have been obtained. Following this idea, we wrote the code and built computational software, though it is still in an initial form. The software provides a development environment in which other developers can set up their computations using only definition and assignment statements, without knowing any implementation details. We have defined many commonly used distributions, including the uniform, binomial, Poisson, normal, gamma, beta and Pareto distributions. Results are reported as the mean, median, standard deviation, quartiles, skewness and kurtosis. The statistical charts include histograms and trace plots. In addition, our software supports more data formats than WinBUGS: Paradox, dBASE, MS Access, MS Excel and ASCII TXT.
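
The summary statistics listed above can all be computed directly from the stored draws of a parameter. The following Python sketch is a hedged illustration, not ARSP's code: the function name `summarize`, the moment-based definitions of skewness and excess kurtosis, the use of the 2.5% and 97.5% sample quantiles as a 95% interval, and the simulated gamma draws are assumptions.

```python
import random
import statistics


def summarize(draws):
    """Descriptive statistics for one parameter's simulated draws."""
    n = len(draws)
    mean = statistics.fmean(draws)
    # Central moments (divisor n), used for skewness and excess kurtosis.
    m2 = sum((x - mean) ** 2 for x in draws) / n
    m3 = sum((x - mean) ** 3 for x in draws) / n
    m4 = sum((x - mean) ** 4 for x in draws) / n
    q1, q2, q3 = statistics.quantiles(draws, n=4)
    cuts = statistics.quantiles(draws, n=40)   # 39 cut points in steps of 2.5%
    return {
        "mean": mean,
        "median": statistics.median(draws),
        "sd": statistics.stdev(draws),
        "quartiles": (q1, q2, q3),
        "interval95": (cuts[0], cuts[-1]),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,
    }


rng = random.Random(2004)
# Stand-in for the draws of one parameter after burn-in.
draws = [rng.gammavariate(3.0, 2.0) for _ in range(5000)]
for name, value in summarize(draws).items():
    print(name, value)
```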
    We applied our software to simple and multiple linear regression, logistic regression with random effects, a variance components model, a normal hierarchical model, a bioequivalence test for a crossover design, a Poisson model and a meta-analysis. Most of our results were similar to those of WinBUGS. One restriction is that the model must be a generalized linear model. The software is also somewhat less efficient than WinBUGS, and its user interface needs further development.
    We also discuss some strategies for improving MCMC.
    Our approach to implementing MCMC proved workable, and the software we developed runs stably. The software is an open system and can be easily extended.

