Combination of Long-Term and Short-Term Features for Age Identification from Voice


BÜYÜK O., Arslan M. L.

ADVANCES IN ELECTRICAL AND COMPUTER ENGINEERING, vol.18, pp.101-108, 2018 (Journal Indexed in SCI)

  • Volume: 18 Issue: 2
  • Publication Date: 2018
  • DOI: 10.4316/aece.2018.02013
  • Journal Name: ADVANCES IN ELECTRICAL AND COMPUTER ENGINEERING
  • Pages: pp.101-108

Abstract

In this paper, we propose to use Gaussian mixture model (GMM) supervectors in a feed-forward deep neural network (DNN) for age identification from voice. The GMM is trained with short-term mel-frequency cepstral coefficients (MFCC). The proposed GMM/DNN method is compared with a feed-forward DNN and a recurrent neural network (RNN) that use the MFCC features directly. We also compare it with the classical GMM and GMM/support vector machine (SVM) methods. Baseline results are obtained with a set of long-term features commonly used for age identification in previous studies; a feed-forward DNN and an SVM are trained on these long-term features. All systems are tested on a speech database of 228 female and 156 male speakers. We define three age classes for each gender: young, adult, and senior. In the experiments, the proposed GMM/DNN significantly outperforms all the other DNN types; only the GMM/SVM method achieves comparable performance. Moreover, age identification performance improves significantly when the decisions of the short-term and long-term systems are combined: the combination yields approximately 4% absolute improvement over the best standalone system.
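To make the notion of a GMM supervector concrete, the sketch below shows one common way such a vector can be built: the means of a background GMM are MAP-adapted to the MFCC frames of a single utterance and then stacked into one fixed-length vector that a DNN or SVM could take as input. The abstract does not state how the supervectors are derived, so the MAP mean adaptation, the diagonal covariances, the relevance factor of 16, and every name here (`map_supervector`, `log_gauss_diag`, the toy dimensions) are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def log_gauss_diag(frames, means, variances):
    """Log-density of each frame under each diagonal-covariance Gaussian.

    frames: (T, D), means: (K, D), variances: (K, D) -> result: (T, K)
    """
    diff = frames[:, None, :] - means[None, :, :]
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    mahalanobis = np.sum(diff ** 2 / variances[None, :, :], axis=2)  # (T, K)
    return -0.5 * (log_norm[None, :] + mahalanobis)


def map_supervector(frames, weights, means, variances, relevance=16.0):
    """MAP-adapt the GMM means to one utterance and stack them.

    The relevance factor is an assumed hyperparameter, not a value
    taken from the paper.
    """
    # Posterior probability of each mixture component for each frame.
    log_p = np.log(weights)[None, :] + log_gauss_diag(frames, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)  # (T, K)

    # Zeroth- and first-order sufficient statistics.
    n_k = post.sum(axis=0)  # (K,)
    first = post.T @ frames  # (K, D)
    e_k = first / np.maximum(n_k, 1e-10)[:, None]

    # Interpolate between the utterance statistics and the prior means.
    alpha = (n_k / (n_k + relevance))[:, None]
    adapted = alpha * e_k + (1.0 - alpha) * means

    # Concatenate the K adapted mean vectors into one supervector.
    return adapted.reshape(-1)


# Toy example: random numbers stand in for a trained GMM and real MFCCs.
rng = np.random.default_rng(0)
K, D = 4, 13  # assumed: 4 mixtures, 13-dimensional MFCC frames
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
variances = np.ones((K, D))
frames = rng.normal(size=(200, D))

sv = map_supervector(frames, weights, means, variances)
```

Under these assumptions every utterance, whatever its length, maps to a vector of K × D values, which is what lets a variable-length sequence of short-term frames feed a fixed-input classifier.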