Combination of Long-Term and Short-Term Features for Age Identification from Voice

BÜYÜK, OSMAN; Arslan, Mustafa

doi:10.4316/aece.2018.02013

BÜYÜK O., Arslan M. L.

ADVANCES IN ELECTRICAL AND COMPUTER ENGINEERING, vol. 18, no. 2, pp. 101-108, 2018 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 18 Issue: 2
Publication Date: 2018
DOI: 10.4316/aece.2018.02013
Journal Name: ADVANCES IN ELECTRICAL AND COMPUTER ENGINEERING
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Pages: pp. 101-108
Kocaeli University Affiliated: Yes

Abstract

In this paper, we propose using Gaussian mixture model (GMM) supervectors in a feed-forward deep neural network (DNN) for age identification from voice. The GMM is trained with short-term mel-frequency cepstral coefficients (MFCC). The proposed GMM/DNN method is compared with a feed-forward DNN and a recurrent neural network (RNN) in which the MFCC features are used directly. We also compare it with the classical GMM and GMM/support vector machine (SVM) methods. Baseline results are obtained with a set of long-term features that have been commonly used for age identification in previous studies. A feed-forward DNN and an SVM are trained using the long-term features. All the systems are tested on a speech database that consists of 228 female and 156 male speakers. We define three age classes for each gender: young, adult and senior. In the experiments, the proposed GMM/DNN significantly outperforms all the other DNN types. Only the GMM/SVM method achieves comparable performance. Moreover, the experimental results show that age identification performance improves significantly when the decisions of the short-term and long-term systems are combined. The combination yields an absolute improvement of approximately 4% over the best standalone system.
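
The abstract does not state how the GMM supervectors are built. A common construction, assumed here and not confirmed by the record, is to adapt the means of a background GMM to each utterance by MAP estimation and concatenate the adapted means into one fixed-length vector, which can then be fed to a DNN or an SVM. The sketch below illustrates that idea in plain Python for a diagonal-covariance GMM; the function names, the `relevance` parameter and its default value of 16 are hypothetical choices for illustration, not details taken from the paper.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at one frame x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def map_supervector(frames, weights, means, variances, relevance=16.0):
    """MAP-adapt the GMM means to one utterance and concatenate them.

    frames: list of feature vectors (e.g. MFCC frames) for one utterance.
    weights, means, variances: parameters of the background GMM.
    relevance: assumed relevance factor (must be > 0), hypothetical default.
    """
    n_comp, dim = len(means), len(means[0])
    n = [0.0] * n_comp                          # zeroth-order statistics
    f = [[0.0] * dim for _ in range(n_comp)]    # first-order statistics

    for x in frames:
        # Component posteriors for this frame, computed in the log domain.
        logp = [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        top = max(logp)
        post = [math.exp(lp - top) for lp in logp]
        total = sum(post)
        for c in range(n_comp):
            g = post[c] / total
            n[c] += g
            for d in range(dim):
                f[c][d] += g * x[d]

    supervector = []
    for c in range(n_comp):
        # Components that saw more data move further from the prior mean.
        alpha = n[c] / (n[c] + relevance)
        for d in range(dim):
            data_mean = f[c][d] / n[c] if n[c] > 0.0 else means[c][d]
            supervector.append(alpha * data_mean + (1.0 - alpha) * means[c][d])
    return supervector
```

Under these assumptions, every utterance maps to a vector of length (number of components) × (feature dimension), regardless of its duration, which is what allows a feed-forward network with a fixed input size to consume variable-length speech.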