
Human Action Recognition (HAR) Classification Using MediaPipe and Long Short-Term Memory (LSTM)

*Ichsan Arsyi Putra  -  Departemen Teknik Komputer, Fakultas Teknik, Universitas Diponegoro, Jl. Prof. Soedarto, S.H., Tembalang, Semarang 50275, Indonesia
Oky Dwi Nurhayati  -  Departemen Teknik Komputer, Fakultas Teknik, Universitas Diponegoro, Jl. Prof. Soedarto, S.H., Tembalang, Semarang 50275, Indonesia
Dania Eridani  -  Departemen Teknik Komputer, Fakultas Teknik, Universitas Diponegoro, Jl. Prof. Soedarto, S.H., Tembalang, Semarang 50275, Indonesia
Open Access Copyright (c) 2022 TEKNIK

Abstract

Human Action Recognition (HAR) is an important research topic in the Machine Learning and Computer Vision domains. One proposed approach combines the MediaPipe library with Long Short-Term Memory (LSTM) networks, using testing accuracy and training duration as indicators of model performance. This research adapted previously proposed LSTM models to perform HAR on image features extracted by the MediaPipe library and compared the models on their testing accuracy and training duration. The research followed the OSEMN method (Obtain, Scrub, Explore, Model, and iNterpret). The dataset was the Weizmann dataset, to which data preprocessing and data augmentation were applied. Video features extracted by MediaPipe Pose were used to train and validate neural network models built around LSTM layers. Model performance was then evaluated by interpreting confusion matrices and calculating accuracy, error rate, precision, recall, and F1-score. The research yielded seven LSTM model variants; the highest testing accuracy was 82%, reached with a training duration of 10 minutes and 50 seconds.

Keywords: Classification; Deep Learning; Human Action Recognition; MediaPipe; Long Short-Term Memory
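
The evaluation step named in the abstract (accuracy, error rate, precision, recall, and F1-score derived from a confusion matrix) can be illustrated with a minimal sketch. The function name `per_class_metrics`, the row/column convention (rows are true classes, columns are predicted classes), and the 3-class matrix are assumptions made for illustration; the numbers are made up and are not the paper's results, and the authors' actual evaluation code is not shown on this page.

```python
from typing import Dict, List


def per_class_metrics(cm: List[List[int]]) -> Dict[str, object]:
    """Derive accuracy, error rate, and per-class precision/recall/F1.

    Assumed convention: cm[i][j] counts samples whose true class is i
    and whose predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    accuracy = correct / total

    precision, recall, f1 = [], [], []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted class differs
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(f)

    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up 3-class confusion matrix, for illustration only.
example_cm = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
metrics = per_class_metrics(example_cm)
print(metrics)
```

Precision, recall, and F1-score are reported per class here; averaging them across classes (for example a macro average) is one common way to summarize a multi-class HAR model in a single figure.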

