Deep neural network framework and transformed MFCCs for speaker's age and gender classification


Authors: Zakariya Qawaqneh^a (zqawaqne@my.bridgeport.edu) ; Arafat Abu Mallouh^a (aabumall@my.bridgeport.edu) ; Buket D. Barkana^b (bbarkana@bridgeport.edu)
Keywords: Deep neural network ; DNN ; I-Vector ; MFCCs ; Speaker age and gender classification
Journal: Knowledge-Based Systems
Publication year: 2017
Publication date: 1 January 2017
Volume: 115
Issue: Complete
Pages: 5-14
Full-text size: 1499 K

Abstract

Speaker age and gender classification is one of the most challenging problems in speech processing. Although many studies have focused on improving feature extraction and classifier design, classification accuracies are still not satisfactory. The key issue in identifying a speaker's age and gender is to generate robust features and to design an in-depth classifier. Age and gender information is concealed in a speaker's speech, which is affected by many factors, such as background noise, speech content, and phonetic divergences. The success of the deep neural network (DNN) architecture in many applications motivated this work to propose a new speaker age and gender classification system that uses a bottleneck feature (BNF) extractor together with a DNN. This work makes two major contributions: the introduction of shared class labels among misclassified classes to regularize the weights in the DNN, and the generation of a transformed MFCC feature set. The proposed system uses HTK to find tied-state triphones for all utterances, which are used, for the first time in age and gender classification, as labels for the output layer of the DNNs. The BNF extractor is used to generate the transformed MFCC features. The new features are evaluated with two classifiers, a DNN and an I-Vector classifier. The transformed MFCCs are observed to be more effective than the traditional MFCCs in speaker age and gender classification: using the transformed MFCCs improves the overall classification accuracies by about 13%.
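
The idea of a bottleneck feature extractor can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the 39-dimensional input frame, the linear bottleneck activation, and all function names below are assumptions chosen for illustration, and the weights are random rather than trained. In a system like the one described, the network would first be trained to predict tied-state triphone labels, and only then would the activations of the narrow bottleneck layer be read out as transformed features.

```python
import math
import random


def sigmoid(z):
    """Logistic activation used for the ordinary hidden layers."""
    return 1.0 / (1.0 + math.exp(-z))


def identity(z):
    """Linear activation, assumed here for the bottleneck layer."""
    return z


def build_network(sizes, rng):
    """Create dense layers as (weights, biases) pairs for consecutive sizes."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def dense(x, layer, activation):
    """Apply one dense layer to the input vector x."""
    weights, biases = layer
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def bottleneck_features(frame, layers, bottleneck_index):
    """Forward one frame and return the activations of the bottleneck layer.

    Layers after the bottleneck are needed only while training the network;
    they are skipped when extracting features.
    """
    h = frame
    for i, layer in enumerate(layers):
        if i == bottleneck_index:
            return dense(h, layer, identity)
        h = dense(h, layer, sigmoid)
    raise ValueError("bottleneck_index is outside the network")


# Hypothetical sizes: 39 input coefficients, a 13-unit bottleneck,
# and 100 output units standing in for tied-state triphone labels.
rng = random.Random(0)
layers = build_network([39, 64, 13, 64, 100], rng)
frame = [rng.gauss(0.0, 1.0) for _ in range(39)]  # stand-in for one MFCC frame
transformed = bottleneck_features(frame, layers, bottleneck_index=1)
print(len(transformed))
```

With these assumed sizes, each input frame is mapped to a 13-dimensional vector, which could then be passed to a downstream classifier in place of the original frame.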
