Voice Verification Using Artificial Neural Networks and Mel Frequency Cepstral Coefficient Feature Extraction



Article Info
Submitted: 20-06-2016
Published: 27-05-2017
Section: Research Articles

Voice recordings are an important part of the evidence concerning a suspect, so the suspect's voice must be verified to substantiate the allegations against them. This research aims to develop a voice verification system using artificial neural networks and mel frequency cepstral coefficient (MFCC) feature extraction. The input data consist of a voice recording whose speaker is unknown and a recording of a known speaker, which serves as the comparison data. Each input is processed by feature extraction consisting of framing, windowing, fast Fourier transform, mel frequency warping, and discrete cosine transform, which yields the mel frequency cepstral coefficients. The coefficients of each frame in each input voice are used as input to pattern recognition with an artificial neural network. The network outputs are then analyzed with decision logic to determine whether the two voices are the same. The output of the system is therefore a decision on whether the tested voice matches the comparison voice. On the test data, the voice verification system combining mel frequency warping and artificial neural networks achieves an accuracy of 96%. This accuracy suggests the system can be an option for helping resolve issues in the verification of voice recordings.
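The feature extraction chain described in the abstract (framing, windowing, FFT, mel frequency warping, DCT) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the author's implementation: the 16 kHz sampling rate, 25 ms frames with a 10 ms hop, the Hamming window, the 512-point FFT, 26 triangular mel filters, 13 retained coefficients, and all helper names are illustrative choices. The neural network itself is not shown, and `decide_same_speaker` is a hypothetical majority-vote rule standing in for the unspecified decision logic, assuming the network emits one match score in [0, 1] per frame.

```python
import numpy as np


def hz_to_mel(f):
    # Common mel-scale formula (assumed; other variants exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def frame_signal(x, frame_len, hop):
    # Framing: split the signal into overlapping frames (zero-pad very short input)
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def mel_filterbank(n_filters, n_fft, sr):
    # Mel frequency warping: triangular filters equally spaced on the mel scale
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_ii(x, n_out):
    # Orthonormal DCT-II along the last axis, keeping the first n_out coefficients
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def mfcc(signal, sr=16000, frame_ms=25.0, hop_ms=10.0,
         n_fft=512, n_filters=26, n_ceps=13):
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = frame_signal(np.asarray(signal, dtype=float), frame_len, hop)
    frames = frames * np.hamming(frame_len)               # windowing
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft  # FFT power spectrum
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_mel = np.log(mel_energy + 1e-10)                  # avoid log(0)
    return dct_ii(log_mel, n_ceps)                        # one row per frame


def decide_same_speaker(frame_scores, threshold=0.5):
    # Hypothetical decision rule: accept if most frames score at or above threshold
    scores = np.asarray(frame_scores, dtype=float)
    return bool(np.mean(scores >= threshold) > 0.5)
```

For example, `mfcc(signal)` on one second of 16 kHz audio returns a 98 x 13 matrix, one row of coefficients per frame, which matches the per-frame input the abstract describes for the neural network.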


Voice Verification System; Mel Frequency Cepstral Coefficient; Mel-Frequency Warping; Artificial Neural Network

  1. Andi Kurniawan
    Universitas Sultan Fatah Demak, Indonesia