Comparison of Clustering Methods for Mixed Data (混合型数据聚类方法的比较)


English title: Comparison of Clustering Methods for Mixed Data
Authors: Liu Chao (刘超); Yao Qinghua (姚清华); Le Ran (乐然)
Keywords: mixed data; clustering validity; clustering stability
Journal code: TJJC
Journal: Statistics & Decision (统计与决策)
Affiliations: Mathematics and Systems Science Institute, Beijing University of Aeronautics and Astronautics; LMIB of the Ministry of Education, Beijing University of Aeronautics and Astronautics; Academy for Advanced Interdisciplinary Studies, Peking University
Publication date: 2019-05-28 10:17
Publisher: Statistics & Decision (统计与决策)
Year: 2019
Volume and number: v.35; No.527
Language: Chinese
Record ID: TJJC201911016
Page count: 4
Issue: 11
CN: 42-1009/C
Pages: 66-69

Abstract

To make sound use of real-world data, this paper explores clustering methods suited to the increasingly common mixed medical data. It analyzes and compares two typical clustering methods for mixed data, K-prototypes and ClustMD, improves the procedure for selecting each method's key parameters, and proposes a clustering stability index. Case analysis results indicate that both methods are highly effective and stable, and that each has its own advantages and disadvantages. K-prototypes is recommended when the variables are strongly correlated, when missing data are severe, or when there are relatively many non-continuous variables.
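
For readers unfamiliar with K-prototypes, the following is a minimal sketch of the mixed dissimilarity measure it is generally built on (Huang, 1998): squared Euclidean distance on the numeric attributes plus a weight `gamma` times the number of mismatched categorical attributes. The function names, the tuple layout of a prototype, and the example values are illustrative assumptions; this is not the authors' implementation, and it does not reproduce the parameter selection procedure or stability index proposed in the paper.

```python
from typing import List, Sequence, Tuple

Prototype = Tuple[Sequence[float], Sequence[str]]  # (numeric part, categorical part)


def kprototypes_distance(
    x_num: Sequence[float],
    x_cat: Sequence[str],
    proto_num: Sequence[float],
    proto_cat: Sequence[str],
    gamma: float = 1.0,
) -> float:
    """Mixed dissimilarity between one record and one cluster prototype."""
    # Numeric attributes: squared Euclidean distance.
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    # Categorical attributes: simple matching (count of mismatches).
    categorical_part = sum(a != b for a, b in zip(x_cat, proto_cat))
    # gamma balances the influence of the two attribute types.
    return numeric_part + gamma * categorical_part


def nearest_prototype(
    x_num: Sequence[float],
    x_cat: Sequence[str],
    prototypes: List[Prototype],
    gamma: float = 1.0,
) -> int:
    """Index of the prototype with the smallest mixed dissimilarity."""
    distances = [
        kprototypes_distance(x_num, x_cat, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return distances.index(min(distances))


if __name__ == "__main__":
    # Hypothetical prototypes with two numeric and two categorical attributes.
    prototypes = [([0.0, 0.0], ["a", "x"]), ([5.0, 5.0], ["b", "y"])]
    print(nearest_prototype([4.0, 5.0], ["b", "x"], prototypes, gamma=2.0))
```

A larger `gamma` gives categorical mismatches more influence relative to numeric distance, which is why the choice of this weight is usually treated as a key parameter of the method.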

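References

[1] Huang Z X. Extensions to the K-means Algorithm for Clustering Large Data Sets With Categorical Values [J]. Data Mining and Knowledge Discovery, 1998, (2).
[2] McParland D, Gormley I C. Model Based Clustering for Mixed Data: clustMD [J]. Advances in Data Analysis and Classification, 2016, 10(2).
[3] Liu Qiang, Deng Lei, Jia Zhenhong, et al. An Improved Weighted K-prototypes Algorithm [J]. Laser Journal (激光杂志), 2014, 35(1).
[4] Liu Yanchi, Gao Xuedong, Guo Hongwei, et al. A Combined Evaluation Method for Clustering Validity [J]. Computer Engineering and Applications (计算机工程与应用), 2011, (19).
[5] Chen Wei, Wang Lei, Jiang Ziyun. A Clustering Algorithm for Mixed-Attribute Data Based on K-prototypes [J]. Journal of Computer Applications (计算机应用), 2010, 30(8).
[6] Liu Xintao, Liu Xiaoguang, Shen Qi, et al. To Merge or Not to Merge: A Comparison of Two Similarity-Based Cluster Analysis Methods [J]. Acta Ecologica Sinica (生态学报), 2013, 33(11).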
