Power Analysis of the K-Index for Detecting Cheating on Examinations
Abstract
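As examinations take on ever greater importance, the cheating that is widespread in them has become increasingly serious. In recent years in particular, organized, cross-regional cheating using high-tech devices has grown rampant in some national-level examinations. This lowers the validity of the examinations. It also seriously undermines their fairness and impartiality and keeps them from performing their evaluation and selection functions. At present, methods for preventing and investigating cheating in China are largely limited to in-room proctoring, and effective methods for post-test detection are lacking. Direct on-site evidence of high-tech cheating is often hard to obtain, so cheating frequently cannot be established. Post-test detection with sound statistical methods is therefore especially important and urgent.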
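The K-index is a statistical method used in multiple-choice examinations such as the GRE and TOEFL to detect answer copying. However, it applies only to copiers and sources within a single examination room whose range of suspects has already been determined. In cross-regional, high-tech organized cheating, neither the copiers nor the sources are known in advance, and they are not confined to one room. Applying the K-index to the civil service Administrative Aptitude Test (AAT) therefore requires an adjustment: every pair of examinees, not only those within a suspect group, must be compared. The existing literature does not explain how to determine b, a key quantity in the K-index. This thesis proposes a method for determining b and recommends b values for the full AAT and for each of its subtests, making it possible to use the K-index in large-scale examinations in China. In addition, the original K-index is not suited to tests with a large amount of guessing. The improved K-index developed in this study greatly reduces the Type I error rate when screening the AAT for cheating.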
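To make the adjusted procedure concrete, here is a minimal Python sketch of the K-index as it is usually described in the answer-copying literature.

- **Definitions.** For a candidate copier c and source s, let W_s be the number of items s answered incorrectly. Let m be the number of those items on which c chose the same incorrect option.
- **The index.** K = P(M >= m), where M ~ Binomial(W_s, p). Here p is the rate at which examinees in c's wrong-score group match the wrong answers of s. A small K is evidence of copying.
- **What the sketch does.** It estimates p by the raw group mean. Following the adjustment described above, it scans every ordered pair instead of a pre-selected suspect list.
- **What it leaves out.** It does not reproduce any smoothing of p used by the original K-index, the thesis's method for choosing b, or its correction for guessing.
- **Assumptions for illustration.** The helper names (`upper_tail`, `k_index`, `screen_all_pairs`), the one-character-per-item response strings, and the treatment of omits as wrong answers are all hypothetical.

```python
from math import comb


def upper_tail(n, p, m):
    """P(X >= m) for X ~ Binomial(n, p)."""
    if m <= 0:
        return 1.0
    if m > n:
        return 0.0
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(m, n + 1))


def n_wrong(resp, key):
    """Number of wrong answers; any non-keyed response, including an omit, counts (assumption)."""
    return sum(a != k for a, k in zip(resp, key))


def matching_wrong(r1, r2, key):
    """Number of items on which both examinees chose the same incorrect option."""
    return sum(a == b and a != k for a, b, k in zip(r1, r2, key))


def k_index(responses, key, copier, source):
    """K for one ordered (copier, source) pair, or None if p cannot be estimated."""
    src = responses[source]
    w_s = n_wrong(src, key)
    if w_s == 0:
        return 1.0  # the source made no errors, so there is nothing to match
    m = matching_wrong(responses[copier], src, key)
    r = n_wrong(responses[copier], key)
    # Reference group: other examinees with the same wrong-answer count as the copier.
    group = [j for j, resp in enumerate(responses)
             if j not in (copier, source) and n_wrong(resp, key) == r]
    if not group:
        return None
    # Raw (unsmoothed) match rate of the group with the source's wrong answers.
    p = sum(matching_wrong(responses[j], src, key) for j in group) / (len(group) * w_s)
    return upper_tail(w_s, p, m)


def screen_all_pairs(responses, key, alpha):
    """Compute K for every ordered pair; return (K, copier, source) with K <= alpha, smallest first."""
    flagged = []
    for c in range(len(responses)):
        for s in range(len(responses)):
            if c == s:
                continue
            k_val = k_index(responses, key, c, s)
            if k_val is not None and k_val <= alpha:
                flagged.append((k_val, c, s))
    return sorted(flagged)


if __name__ == "__main__":
    key = "AAAA"  # toy answer key, one character per item
    responses = ["BBBA", "BBBA", "CCCA", "BCCA"]
    print(k_index(responses, key, copier=1, source=0))  # (1/6)**3, about 0.0046
```

The toy matrix is only there to show the mechanics. With four examinees the reference groups are tiny, and for some pairs the estimated p is 0, which makes K collapse to 0. The screen is meaningful only on a full examinee population.

Screening N examinees means N(N-1) ordered pairs. At a per-pair level alpha, up to about alpha·N(N-1) pairs could be flagged by chance even if everyone worked independently. This is why the all-pairs adjustment calls for a much stricter threshold than a conventional 0.05.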
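Whatever detection method is used, its results must be accurate, so they should be subjected to hypothesis testing. Power analysis reflects a hypothesis test's ability to correctly detect a treatment effect. In recent years, researchers abroad have incorporated power analysis into decisions based on hypothesis tests. In China, by contrast, the significance level is still the sole decision criterion, which leaves a clear gap with research abroad. Power analysis of the K-index helps to identify examinees who actually cheated while avoiding false accusations, which is important for safeguarding the validity and fairness of examinations.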
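Through a power analysis of the K-index, this thesis examines the relationships among sample size, effect size, significance level and power. It varies the number of copiers, the number of copied items and the ability level of the source. The results show that the power of the K-index is low only when the proportion of copiers, the proportion of copied items and the sample size are all small. In most situations its power is high, and its decisions are accurate and reliable. This provides strong support for applying the K-index to cheating detection in large-scale examinations in China.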
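The relationship between significance level, effect size and power can be illustrated with a simplified analytic model. The Python sketch below is an assumption for illustration, not the thesis's own simulation design.

- **Under H0.** The number of matching wrong answers M follows Binomial(W_s, p). The test rejects when M reaches m*, the smallest count whose null tail probability is at most alpha.
- **Under a hypothetical H1.** The copier took k of the source's W_s wrong answers, and these always match. Each of the remaining W_s - k wrong answers matches independently with probability p. Power is then P(k + Binomial(W_s - k, p) >= m*).
- **Roles of the quantities.** k plays the role of effect size. W_s stands in for the source's ability, since a stronger source has fewer wrong answers on which copying can show up.
- **What is not modeled.** p is held fixed, so the model ignores sample size, which in practice enters through how precisely p is estimated. It also ignores the proportion of copiers.
- **Hypothetical names.** The function names `critical_value` and `power` are invented for this sketch.

```python
from math import comb


def upper_tail(n, p, m):
    """P(X >= m) for X ~ Binomial(n, p)."""
    if m <= 0:
        return 1.0
    if m > n:
        return 0.0
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(m, n + 1))


def critical_value(w_s, p, alpha):
    """Smallest match count m* with P(M >= m* | H0) <= alpha; w_s + 1 if none exists."""
    for m in range(w_s + 1):
        if upper_tail(w_s, p, m) <= alpha:
            return m
    return w_s + 1


def power(w_s, p, k, alpha):
    """P(reject H0) when k of the source's w_s wrong answers were copied.

    Hypothetical H1: the k copied items always match, and each of the other
    w_s - k wrong answers matches independently with probability p.
    """
    m_star = critical_value(w_s, p, alpha)
    return upper_tail(w_s - k, p, m_star - k)


if __name__ == "__main__":
    p = 0.2  # assumed null match probability, held fixed
    for w_s in (10, 25):  # fewer source errors ~ higher-ability source
        for alpha in (0.01, 0.001):
            row = [round(power(w_s, p, k, alpha), 3) for k in range(0, w_s + 1, 5)]
            print(f"W_s={w_s:2d}  alpha={alpha}:  k=0,5,... -> {row}")
```

With k = 0 the function returns the exact Type I error rate of the test. Because M is discrete, this rate is at most alpha rather than equal to it. Power never decreases as k grows, and it never increases when alpha is made stricter. This mirrors the trade-off between significance level and power that the power analysis addresses.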
Cheating has become more serious as the functions of testing have been strengthened. In national examinations in recent years especially, inter-regional cheating using high-tech devices has grown increasingly serious. It undermines the fairness and honesty of tests, reduces their validity, and restricts their evaluation and selection functions. At present, measures against cheating are limited to monitoring in the examination room, and valid methods for detecting cheating after a test are lacking. Because direct on-site evidence of high-tech cheating is hard to obtain, such cheating cannot be detected effectively. Using scientific statistical methods to detect cheating after a test is therefore especially important and pressing.
The K-index is a statistical method for detecting answer copying on multiple-choice tests such as the GRE and TOEFL. However, it applies only to copiers and sources within a suspect group in one examination room. In inter-regional, high-tech group cheating, the copiers and sources are unknown and not confined to one room. Applying the K-index to the Administrative Aptitude Test (AAT) therefore requires comparing all examinees. The literature gives no account of how to determine b, a critical factor in the K-index. This thesis proposes a method for fixing b and recommends b values for the full AAT and each of its subtests, which makes it possible to apply the K-index to large-scale tests. In addition, the original K-index is not suitable for tests in which many responses are guesses. This research improves the K-index and greatly reduces the Type I error rate of cheating detection on the AAT.
Any cheating-detection method must ensure that its results are accurate, so hypothesis testing should be applied to them. Power analysis reflects the ability of a hypothesis test to correctly detect a treatment effect. In recent years, power analysis has been introduced into statistical decision-making abroad. Domestic researchers, however, still use the significance level as the only criterion, which shows an obvious gap between domestic and international research. Power analysis of the K-index helps to detect genuine copiers correctly and avoid wrongful accusations, and so helps to maintain the validity and fairness of tests.
This thesis analyzes the power of the K-index while varying the number of copiers, the number of copied items and the source's ability level. On that basis it discusses the relations among sample size, effect size, significance level and power. The results show that power is low only when the proportion of copiers, the proportion of copied items and the sample size are all small. In most situations the power of the K-index is high and its conclusions are reliable. This strongly supports applying the K-index to detect cheating in national large-scale tests.