
BIG DATA PIPELINE INFRASTRUCTURE DESIGN IN MSME E-COMMERCE SYSTEMS WITH A FOCUS ON DATA SOURCE PROCESSING USING ORCHESTRATION TOOLS

Isro' Rizky Wibowo  -  Faculty of Computer Science, Universitas Dian Nuswantoro, Indonesia
Ramadhan Rakhmat Sani  -  Faculty of Computer Science, Universitas Dian Nuswantoro, Indonesia
Ika Novita Dewi  -  Faculty of Computer Science, Universitas Dian Nuswantoro, Indonesia
*Farrikh Alzami  -  Faculty of Computer Science, Universitas Dian Nuswantoro, Indonesia
Ifan Rizqa  -  Faculty of Computer Science, Universitas Dian Nuswantoro, Indonesia
Abu Salam  -  Faculty of Computer Science, Universitas Dian Nuswantoro, Indonesia
Candra Irawan  -  Faculty of Computer Science, Universitas Dian Nuswantoro, Indonesia
Diana Aqmala  -  Faculty of Economics and Business, Universitas Dian Nuswantoro, Indonesia
Submitted: 20 Oct 2023; Published: 30 Jan 2024.
Open Access. Copyright (c) 2024 Transmisi: Jurnal Ilmiah Teknik Elektro under http://creativecommons.org/licenses/by-sa/4.0.

Abstract
In the digital era, Micro, Small and Medium Enterprises (MSMEs) need to use data to improve business performance, for example in customer targeting, product development, and pricing strategy. Apache Airflow is an orchestration tool for building data scraping pipelines that are scalable, flexible, and easy to monitor. One such pipeline is the Central Java MSME data scraping pipeline, which collects business registration information, business type, location, contacts, products, and financial information from various websites, including the Central Java Provincial Government website, basic goods price comparison tables, and specialized news sites. The captured data is stored in a data warehouse for further analysis by the Central Java souvenir entrepreneurs association (ASPOO). Apache Airflow manages the scraping pipeline in the Central Java MSME e-commerce system and keeps it running smoothly; its built-in dashboard is used to monitor pipelines and troubleshoot issues. Overall, the scraping pipeline is a valuable tool for collecting and analyzing data on the MSME sector in Central Java. It is scalable, flexible, and easy to use, can be adapted to different user needs, and can be integrated with various systems.
Keywords: Big Data Pipeline, Data Scraping, Pipeline, Apache Airflow, MSMEs
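The scrape, transform, and store stages described in the abstract can be illustrated with a minimal sketch. It uses only the Python standard library and is not the authors' implementation: the sample HTML, the `prices` table, and the function names `extract`, `transform`, `load`, and `run_pipeline` are all hypothetical, and an in-memory SQLite database stands in for the data warehouse. In an orchestrated setup, each of the three functions would typically become one Apache Airflow task.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical sample page standing in for a basic goods price table.
SAMPLE_HTML = """
<table>
  <tr><th>commodity</th><th>price_idr</th></tr>
  <tr><td>Rice</td><td>13.500</td></tr>
  <tr><td>Sugar</td><td>16.000</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            # Header rows contain only <th> cells, so they stay empty and are skipped.
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())


def extract(html):
    """Scrape stage: pull raw cell text out of the HTML table."""
    parser = TableParser()
    parser.feed(html)
    return parser.rows


def transform(rows):
    """Transform stage: drop the '.' thousands separator and cast prices to int."""
    return [(name, int(price.replace(".", ""))) for name, price in rows]


def load(records, conn):
    """Load stage: write the cleaned records to the (stand-in) warehouse."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices (commodity TEXT, price_idr INTEGER)"
    )
    conn.executemany("INSERT INTO prices VALUES (?, ?)", records)
    conn.commit()


def run_pipeline(html, conn):
    """Run the three stages in order and return what was stored."""
    load(transform(extract(html)), conn)
    return conn.execute(
        "SELECT commodity, price_idr FROM prices ORDER BY commodity"
    ).fetchall()
```

Calling `run_pipeline(SAMPLE_HTML, sqlite3.connect(":memory:"))` runs the stages end to end. Keeping each stage as a separate function is what lets an orchestrator schedule, retry, and monitor them independently.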

Article Info
Section: Article - Information Technology and Computers
Language: EN
