A Study of Multivariate Statistical Analysis Methods for Atypical Data
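Abstract
Multivariate analysis methods are statistical methods commonly used in medical research. The desirable properties of the classical multivariate analysis models and their fitting methods rest on the assumption of typical data. Whether the data obtained in a study are typical with respect to an assumed model is unknown, and they are often atypical. Atypical data interfere with the fitting of classical multivariate statistical models, and the resulting models have poor robustness. This thesis proposes a class of fitting methods for multivariate analysis models that are less affected by atypical data and have better robustness; the classical fitting methods are special cases of this class. The study covers the following work:
     1. Robust biased estimates of regression coefficients are proposed. Robust estimation and biased estimation of regression coefficients are two approaches to building linear regression models from atypical data. Robust estimation resists interference from outliers, and biased estimation overcomes the effect of multicollinearity among the independent variables. Simulation experiments showed that robust estimates are affected by multicollinearity and that biased estimates are affected by outliers. When outliers and multicollinearity are both present in atypical data, neither robust estimation nor biased estimation readily yields a correct linear regression model. On the basis of a literature review and simulation experiments, three robust biased estimation methods are defined. They combine the robust M-estimate with biased estimates and resist both outliers and multicollinearity. The three methods are the robust principal component estimate, the robust ridge estimate, and the robust root-root estimate. For seven types of data, simulation results show that the three methods are uniformly better than the LS estimate, the M-estimate, the principal component estimate, the ridge estimate, and the root-root estimate. The robust principal component estimate is fairly convenient in practice, but the robust ridge and robust root-root estimates remain difficult to apply, chiefly because the problem of determining the optimal k value is unsolved and needs further study. Their theoretical advantages provide a basis for future research.
     2. The theory of the generalized root-root estimate is refined and extended, and its properties are examined by simulation.
     3. A biased estimation method for logistic regression coefficients is proposed. The purpose of a logistic regression model is to describe the relationship between the dependent variable and the independent variables, and the regression coefficients have a clear practical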
Multivariate data analysis methods, including linear regression, logistic, and discriminant models, are widely used in medical research. Their theory is compact and their properties are attractive, provided that the data used to fit the postulated model are optimal. In medical research, whether the observed data are optimal for the postulated model is often unknown. If the data are non-optimal, statistical inferences drawn from the fitted model are perturbed, lose their theoretical meaning, and can lead to erroneous conclusions. This paper presents a group of multivariate data analysis methods. They are more robust than the ordinary methods, which, without loss of generality, can be regarded as special cases of the corresponding new methods.
    Three robust biased estimates of multiple regression coefficients are proposed in this paper. Robust estimation and biased estimation of regression coefficients are two kinds of model fitting methods for non-optimal data; they overcome the negative influence of outliers and of multicollinearity, respectively. Simulation shows that robust methods are ineffective when multicollinearity exists and that biased methods break down when outliers are present. Neither robust nor biased methods can therefore be used to fit regression models when outliers and multicollinearity coexist in observational data. To fit regression models to non-optimal data that include both outliers and multicollinearity, biased estimates and the M-estimate of linear regression coefficients were combined mathematically. This yielded three robust biased model fitting methods: the robust principal component estimate, the robust ridge estimate, and the robust root-root estimate. Comparative simulations show that the new methods are uniformly better than the ordinary LS estimate, the M-estimate, and the biased methods. The robust principal component estimate is practical, but the other two new methods are not convenient to apply because no technique is yet available for determining the optimal k value. The simulations make clear that an optimal k value exists, however, and this paper provides a platform for further theoretical studies in this field.
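The abstract does not give the formulas for the robust biased estimates, so the following is only a minimal sketch of one way an M-estimate and a ridge estimate could be combined. It assumes Huber weights, a residual scale based on the median absolute deviation, and iteratively reweighted least squares; the function names `huber_weights` and `robust_ridge`, the tuning constant `c`, and the default `k` are hypothetical choices for illustration and are not taken from the thesis.

```python
import numpy as np


def huber_weights(residuals, scale, c=1.345):
    """Huber weights: 1 inside the cutoff c, c/|u| outside (u = residual/scale)."""
    u = np.abs(residuals) / scale
    return np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))


def robust_ridge(X, y, k=0.1, c=1.345, n_iter=100, tol=1e-10):
    """Illustrative robust ridge fit (an assumption, not the thesis's formula).

    Returns (intercept, coefficients) on the original scale of X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Standardize so that Z'Z is the correlation matrix of the predictors.
    x_mean = X.mean(axis=0)
    x_norm = X.std(axis=0) * np.sqrt(n)
    Z = (X - x_mean) / x_norm
    A = np.column_stack([np.ones(n), Z])

    # Ridge penalty k on the slopes only; the intercept is left unpenalized.
    D = np.eye(p + 1)
    D[0, 0] = 0.0

    w = np.ones(n)  # the first pass is an ordinary (unweighted) ridge fit
    b = np.zeros(p + 1)
    for _ in range(n_iter):
        AtW = A.T * w  # same as A.T @ diag(w)
        b_new = np.linalg.solve(AtW @ A + k * D, AtW @ y)
        r = y - A @ b_new
        # Robust residual scale: normalized median absolute deviation.
        scale = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
        w = huber_weights(r, scale, c)
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if converged:
            break

    # Map the standardized slopes back to the original predictor scale.
    coef = b[1:] / x_norm
    intercept = b[0] - coef @ x_mean
    return intercept, coef
```

In this sketch, setting `k=0` with a very large `c` gives every observation weight 1 and no penalty, so the fit reduces to ordinary least squares. That mirrors the abstract's remark that the ordinary methods are special cases of the new ones. The choice of `k` is left to the caller, since the abstract states that determining the optimal k value is an open problem.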
