Application of Voice Conversion for Cross-Language Rap Singing Transformation

Tuerk O., Bueyuek O., Haznedaroglu A., Arslan L. M.

IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, 19 - 24 April 2009, pp.3597-3598, (Full-Text Paper)

Publication Type: Conference Paper / Full-Text Paper
Volume number:
DOI: 10.1109/icassp.2009.4960404
City of Publication: Taipei
Country of Publication: Taiwan
Pages: pp.3597-3598
Affiliated with Kocaeli University: Yes

Abstract

Voice conversion enables generation of a desired speaker's voice from audio recordings of another speaker. In this paper, we focus on a music application and describe the first steps towards generating voices of music celebrities using conventional voice conversion techniques. Specifically, rap singing transformations from English to Spanish are performed using parallel training material in English. Weighted codebook mapping based voice conversion is employed with two different alignment methods and temporal smoothing of the transformation filter. The first aligner uses an HMM trained for each source recording to force-align the corresponding target recording. The second aligner employs speaker-independent HMMs trained on a large number of speakers. Additionally, a smoothing step is devised to reduce discontinuities and to improve performance. The results of subjective evaluations indicate that both aligners perform equivalently well. The proposed smoothing technique significantly improves both similarity to the target singer and quality, regardless of the alignment method.
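
The abstract does not give implementation details, so the following is only a minimal sketch of what weighted codebook mapping with temporal smoothing might look like, not the authors' method. It assumes paired source and target codebooks of spectral feature vectors (obtained from aligned parallel recordings), exponential distance-based weights, and a simple moving average over time. The function names, the `gamma` sharpness parameter, and the `window` length are all hypothetical. The paper smooths the transformation filter, whereas this sketch smooths the codebook weights as a simplified stand-in.

```python
import numpy as np


def codebook_weights(frames, source_codebook, gamma=1.0):
    """Soft-assign each frame (T, D) to the K source codewords (K, D).

    Assumption: weights decay exponentially with Euclidean distance;
    gamma is a hypothetical sharpness parameter.
    """
    diffs = frames[:, None, :] - source_codebook[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)  # shape (T, K)
    # Subtract the per-frame minimum distance for numerical stability.
    logits = -gamma * (dists - dists.min(axis=1, keepdims=True))
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def smooth_over_time(weights, window=3):
    """Moving average of the weights along the time axis.

    Assumption: a uniform window with edge padding; window must be odd
    so that the output keeps the same number of frames.
    """
    if window <= 1:
        return weights
    if window % 2 == 0:
        raise ValueError("window must be odd")
    pad = window // 2
    padded = np.pad(weights, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    smoothed = np.stack(
        [np.convolve(padded[:, k], kernel, mode="valid")
         for k in range(weights.shape[1])],
        axis=1,
    )
    # Renormalise so each frame's weights still sum to one.
    return smoothed / smoothed.sum(axis=1, keepdims=True)


def convert(frames, source_codebook, target_codebook, gamma=1.0, window=3):
    """Map source frames to target space as a weighted sum of target codewords."""
    weights = codebook_weights(frames, source_codebook, gamma)
    weights = smooth_over_time(weights, window)
    return weights @ target_codebook  # shape (T, D)
```

With `window=1` the conversion is purely frame-by-frame. A larger odd `window` trades some spectral detail for fewer frame-to-frame discontinuities, which is the kind of trade-off the smoothing step in the abstract is meant to address.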