Abstract
Existing personalized privacy anonymization techniques do not adequately protect numerical sensitive attributes against proximity breaches. To address this problem, this paper proposes a clustering-based anonymity model, the (ε_i, k)-anonymity model. The model first uses clustering to partition the sensitive attribute values, sorted in ascending order, into several value intervals. It then defines an (ε_i, k)-anonymity principle that protects numerical sensitive attributes against proximity breaches. Finally, a maximum-bucket-first algorithm is proposed to implement the (ε_i, k)-anonymity principle. Experimental results show that, compared with existing proximity-breach-resistant schemes for numerical sensitive attributes, the proposed scheme incurs lower information loss, runs more efficiently, and effectively reduces the risk of user privacy leakage.
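The pipeline summarized in the abstract (partition the sorted sensitive values into intervals, then build groups by always drawing from the largest buckets first) can be illustrated with a minimal Python sketch. The abstract does not give the formal definition of the (ε_i, k)-anonymity principle or the exact clustering method, so everything below is an assumption made for illustration: `split_into_intervals` is a hypothetical helper that cuts the sorted values at the widest gaps as a stand-in for the clustering step, and `max_bucket_first` is a hypothetical helper that forms each group from one value out of each of the k currently largest buckets, so that values in a group come from different intervals. It is not the paper's algorithm.

```python
def split_into_intervals(values, num_intervals):
    """Sort values and cut them at the (num_intervals - 1) widest gaps.

    Assumption: a simple 1-D gap-based split stands in for the
    clustering step, which the abstract does not specify.
    """
    ordered = sorted(values)
    if num_intervals <= 1 or len(ordered) <= 1:
        return [ordered]
    # Candidate cut positions, widest gap first.
    positions = sorted(
        range(1, len(ordered)),
        key=lambda i: ordered[i] - ordered[i - 1],
        reverse=True,
    )
    cuts = sorted(positions[: num_intervals - 1])
    intervals, start = [], 0
    for cut in cuts:
        intervals.append(ordered[start:cut])
        start = cut
    intervals.append(ordered[start:])
    return intervals


def max_bucket_first(buckets, k):
    """Form groups of k values, one from each of the k largest buckets.

    Assumption: each interval is treated as one bucket, and a group is
    valid when its k values come from k different buckets. Values that
    cannot be grouped this way are returned separately as leftovers.
    """
    buckets = [list(b) for b in buckets if b]
    groups = []
    while sum(1 for b in buckets if b) >= k:
        # Largest buckets first; empty buckets sink to the end.
        buckets.sort(key=len, reverse=True)
        groups.append([b.pop() for b in buckets[:k]])
    leftovers = [v for b in buckets for v in b]
    return groups, leftovers


if __name__ == "__main__":
    salaries = [1, 2, 3, 50, 51, 100]  # made-up sensitive values
    intervals = split_into_intervals(salaries, 3)
    groups, leftovers = max_bucket_first(intervals, 2)
    print(intervals)
    print(groups, leftovers)
```

Drawing from the largest buckets first keeps the bucket sizes balanced, which tends to leave fewer values that cannot be placed in a group; how the actual scheme handles such residual records is not stated in the abstract.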