
Human Action Recognition (HAR) Classification Using MediaPipe and Long Short-Term Memory (LSTM)

*Ichsan Arsyi Putra  -  Departemen Teknik Komputer, Fakultas Teknik, Universitas Diponegoro, Jl. Prof. Soedarto, S.H., Tembalang, Semarang 50275, Indonesia
Oky Dwi Nurhayati  -  Departemen Teknik Komputer, Fakultas Teknik, Universitas Diponegoro, Jl. Prof. Soedarto, S.H., Tembalang, Semarang 50275, Indonesia
Dania Eridani  -  Departemen Teknik Komputer, Fakultas Teknik, Universitas Diponegoro, Jl. Prof. Soedarto, S.H., Tembalang, Semarang 50275, Indonesia
Open Access Copyright (c) 2022 TEKNIK


Human Action Recognition (HAR) is an important research topic in the machine learning and computer vision domains. One proposed method combines the MediaPipe library with Long Short-Term Memory (LSTM) networks, using testing accuracy and training duration as the indicators of model performance. This research adapted proposed LSTM models to implement HAR on image features extracted with the MediaPipe library, and compared the LSTM models by their testing accuracy and training duration. The research was conducted under the OSEMN methodology (Obtain, Scrub, Explore, Model, and iNterpret). The dataset was the Weizmann dataset, prepared with data preprocessing and data augmentation. Video features extracted by MediaPipe Pose were used to train and validate neural network models built around Long Short-Term Memory layers. The process concluded with model performance evaluation based on confusion matrix interpretation and calculations of accuracy, error rate, precision, recall, and F1-score. This research yielded seven LSTM model variants, with the highest testing accuracy of 82% requiring a training duration of 10 minutes and 50 seconds.
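The LSTM layers at the core of the compared models gate information through the frame sequence. As a minimal sketch (not the paper's trained network), a single LSTM cell step can be written in plain Python with scalar states; the weights below are illustrative placeholders, and real layers use weight matrices, but the gating logic is the same:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM cell step for a scalar input and scalar state.

    W maps each gate name to a (w_x, w_h, b) triple of input weight,
    recurrent weight, and bias. All values here are illustrative.
    """
    f = sigmoid(W["f"][0] * x + W["f"][1] * h_prev + W["f"][2])    # forget gate
    i = sigmoid(W["i"][0] * x + W["i"][1] * h_prev + W["i"][2])    # input gate
    o = sigmoid(W["o"][0] * x + W["o"][1] * h_prev + W["o"][2])    # output gate
    g = math.tanh(W["g"][0] * x + W["g"][1] * h_prev + W["g"][2])  # candidate state
    c = f * c_prev + i * g   # new cell state: keep part of the old, add new
    h = o * math.tanh(c)     # new hidden state exposed to the next layer
    return h, c

# Run a toy three-step sequence through the cell with fixed weights.
W = {k: (0.5, 0.5, 0.0) for k in "fiog"}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.5]:
    h, c = lstm_step(x, h, c, W)
```

In the paper's setting, each `x` would instead be the vector of MediaPipe Pose landmarks for one video frame, and the final hidden state would feed a classification layer over the action labels.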
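The evaluation metrics named above all follow from the confusion matrix. A sketch of those calculations in plain Python (the 3x3 matrix is dummy data for illustration, not a result from the paper; rows are true classes, columns are predictions):

```python
# Illustrative confusion matrix for a 3-class problem (rows: true labels,
# columns: predicted labels). These counts are made up for the example.
confusion = [
    [50,  3,  2],   # true class 0
    [ 4, 45,  6],   # true class 1
    [ 1,  5, 49],   # true class 2
]

def metrics(cm):
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))       # diagonal = correct predictions
    accuracy = correct / total
    error_rate = 1.0 - accuracy
    per_class = []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                        # class i missed
        fp = sum(cm[r][i] for r in range(n)) - tp   # others predicted as class i
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append((precision, recall, f1))
    return accuracy, error_rate, per_class

acc, err, per_class = metrics(confusion)
```

For the multi-class case in the paper, the per-class precision, recall, and F1-score would typically be averaged across the action classes to report a single figure.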

Keywords: Classification; Deep Learning; Human Action Recognition; MediaPipe; Long Short-Term Memory




Last update: 2023-11-28 05:35:59
