
SPRATAMA MODEL FOR INDONESIAN PARAPHRASE DETECTION USING BIDIRECTIONAL LONG SHORT-TERM MEMORY AND BIDIRECTIONAL GATED RECURRENT UNIT

*Titin Siswantining  -  Department of Mathematics, Universitas Indonesia, Indonesia
Stanley Pratama  -  Department of Mathematics, Universitas Indonesia, Indonesia
Devvi Sarwinda  -  Department of Mathematics, Universitas Indonesia, Indonesia
Open Access. Copyright (c) 2022 MEDIA STATISTIKA, licensed under CC BY-NC-SA 4.0 (http://creativecommons.org/licenses/by-nc-sa/4.0).

Abstract
Paraphrasing is a way of writing a sentence in other words while keeping the same intent or meaning. Automatic paraphrase detection can be performed with Natural Language Sentence Matching (NLSM), a subtask of Natural Language Processing (NLP). NLP covers computational techniques for processing text in general, while NLSM is used specifically to determine the relationship between two sentences. With the development of neural networks (NN), NLP tasks can now be handled more easily by computers. Far more paraphrase-detection models have been developed for English than for Indonesian, which has less training data available. This study proposes the SPratama model, which performs paraphrase detection for Indonesian using Recurrent Neural Networks (RNN), namely the Bidirectional Long Short-Term Memory (BiLSTM) and the Bidirectional Gated Recurrent Unit (BiGRU). The data used are the "Quora Question Pairs" taken from Kaggle and translated into Indonesian with Google Translate. The results indicate that the proposed model achieves an accuracy of around 80% in detecting paraphrased sentences.
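For readers who want a concrete starting point, the sketch below shows how a sentence-pair classifier of the kind described in the abstract can be assembled in Keras: a shared encoder built from a Bidirectional LSTM followed by a Bidirectional GRU produces a vector for each sentence, and a small feed-forward head decides whether the pair is a paraphrase. This is a minimal illustration, not the published SPratama architecture; the vocabulary size, sequence length, layer widths, and the absolute-difference merge are assumed choices for demonstration.

```python
# Minimal sketch of a BiLSTM + BiGRU paraphrase detector (assumed hyperparameters,
# not the authors' exact SPratama configuration).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 20_000   # assumed vocabulary size
MAX_LEN = 30          # assumed maximum question length in tokens
EMB_DIM = 128         # embedding size, e.g. matching a 128-d pretrained embedding

def build_encoder():
    """Shared encoder: embedding -> BiLSTM -> BiGRU -> sentence vector."""
    inp = layers.Input(shape=(MAX_LEN,), dtype="int32")
    x = layers.Embedding(VOCAB_SIZE, EMB_DIM, mask_zero=True)(inp)
    x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
    x = layers.Bidirectional(layers.GRU(64))(x)
    return Model(inp, x, name="shared_encoder")

encoder = build_encoder()

q1 = layers.Input(shape=(MAX_LEN,), dtype="int32", name="question_1")
q2 = layers.Input(shape=(MAX_LEN,), dtype="int32", name="question_2")
v1, v2 = encoder(q1), encoder(q2)

# Combine the two sentence vectors (concatenation plus absolute difference),
# then classify paraphrase vs. non-paraphrase.
diff = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([v1, v2])
merged = layers.Concatenate()([v1, v2, diff])
merged = layers.Dense(64, activation="relu")(merged)
merged = layers.Dropout(0.3)(merged)
out = layers.Dense(1, activation="sigmoid", name="is_paraphrase")(merged)

model = Model([q1, q2], out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Dummy forward pass with random token ids, just to show the expected shapes.
x1 = np.random.randint(1, VOCAB_SIZE, size=(4, MAX_LEN))
x2 = np.random.randint(1, VOCAB_SIZE, size=(4, MAX_LEN))
print(model.predict([x1, x2]).shape)  # (4, 1)
```

In practice, the token-id inputs would come from a tokenizer fitted on the translated Quora Question Pairs, and the sigmoid output would be thresholded at 0.5 to label a pair as paraphrase or not.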
Keywords: natural language processing; natural language sentence matching; recurrent neural network


