Prediction of Salinity Based on Meteorological Data Using the Backpropagation Neural Network Method

Salinity is the level of salt dissolved in water. The salinity of seawater can affect the hydrological balance and climate change. Seawater salinity varies from area to area depending on its influencing factors, namely evaporation and precipitation (rainfall). One way to determine salinity is to take seawater samples, which is time-consuming and costly. In this study, seawater salinity is predicted from time series patterns of evaporation and precipitation using an artificial neural network, namely the backpropagation neural network. The evaporation and precipitation data were derived from the ECMWF dataset and the salinity data from NOAA, all taken at the coordinate point (-9.625, 113.625) south of Java Island. The salinity, evaporation, and precipitation data were arranged into 7-day time series patterns. Several backpropagation architectures were tested, varying the learning rate, the number of hidden layers, and the number of nodes in each hidden layer, to obtain the best result. The best salinity prediction achieved a MAPE of 2.063% with an architecture of 14 input nodes, 2 hidden layers with 10 and 2 nodes, 1 output node, and a learning rate of 0.7. The predicted salinity ranged from 33 to 35 ppt. Therefore, the backpropagation prediction system can be considered good at providing information about seawater salinity south of Java Island.


Introduction
Indonesia is known as a maritime country, with two-thirds of its territory consisting of sea. According to Law No. 17 of 1985, the total area of the Indonesian sea is 5.9 million km². This gives Indonesia's territorial waters great potential to advance the country's economy (Lasabuda, 2013). According to data from the Central Statistics Agency (BPS), the Java Sea had the highest fishery production, with average catches of 200,903 tons in Central Java, 88,453.8 tons in East Java, and 308,416.5 tons in West Java from 2010 to 2017 (BPS, 2019). The fish caught in the Java Sea vary, including tuna, stingray, anchovies, squid, and others (Sentosa et al., 2016).
Several factors influence the diversity of fish species in the sea, one of which is salinity (Subagiyo et al., 2015). Salinity is the amount of salt dissolved in water. Salinity strongly affects the morphology, size, and types of organisms living in an area (Arisandi et al., 2011). Salinity is an important parameter in both oceanographic and meteorological studies. Data on variations in sea surface salinity are useful for studying the hydrological balance and climate change (Najib et al., 2017). Meanwhile, according to Tubalawony and Kusmanto (2012), surface salinity and temperature are indicators of upwelling events. The climate in the Java Sea follows a seasonal pattern: the dry season lasts from June to September, while the rainy season lasts from November to March. In waters with such seasonal variation, salinity needs to be known together with sea surface temperature, salt content, and the abundance of fauna (fish), so accurate salinity data are required (Islami, 2013). Salinity data are obtained using instruments such as a refractometer and a salinometer. A refractometer measures the refractive index of a liquid to determine its dissolved salt content, while a salinometer passes an electric current through the solution and measures its electrical conductivity: the greater the conductivity, the higher the salt content. Both instruments require time-consuming sample collection at accurately located points in the ocean, whereas salinity can also be determined from evaporation and rainfall (precipitation) in an area (Grodsky and Carton, 2018; Ratnawati et al., 2018). Therefore, this study predicts salinity without field sampling by using time series data of the factors that affect salinity, namely evaporation and precipitation.
A previous study predicted salinity in the Taiwan Strait from MODIS remote sensing data using the improved genetic algorithm combining operation tree (IGAOT) method, obtaining a correlation coefficient (CC) of 0.38 and a root mean squared error (RMSE) of 1.88 (Chen et al., 2017). Rajabi-Kiasari and Hasanlou (2020) predicted sea salinity in the Persian Gulf and compared Support Vector Regression (SVR), an artificial neural network (ANN), random forest (RF), and a gradient boosting machine (GBM). These studies show that salinity prediction requires a method that recognizes data patterns well, so this study uses a neural network method, namely backpropagation.
The backpropagation neural network is one of the most popular methods for data pattern recognition, prediction, and forecasting. A previous study that predicted PDAM (regional drinking water company) water distribution in Malang City achieved an accuracy of 97.99% (Sawitri et al., 2018). Another study predicted rainfall in Pekanbaru City using backpropagation with an accuracy of 96% (Jauhari et al., 2016). The learning process of backpropagation is influenced by the number of hidden layers, the learning rate, and other architectural factors (Wahyuni et al., 2017). Jumhuriyah et al. (2020) tested backpropagation architectures with various numbers of hidden-layer nodes and learning rates and found the best architecture at 100 nodes and a learning rate of 0.4. Furthermore, Wahyu et al. (2020) obtained an accuracy of 99.03% with a learning rate of 0.3 in their best architecture. Therefore, this study applies the backpropagation neural network with several architectural experiments to determine the best model for predicting seawater salinity south of Java from meteorological data, namely precipitation and evaporation. The salinity predictions are expected to help fishermen determine fishing areas for certain types of fish and to support the Indonesian economy by increasing salt production.

Materials and Method
In this study, salinity in the southern waters of Java Island was predicted following the workflow shown in Figure 1. The data were daily time series of evaporation and precipitation rates (as input parameters) and salinity (as the target) from April 1, 2017 to July 31, 2019, taken at the coordinate point (-9.625, 113.625). Evaporation and precipitation data were obtained from the ECMWF dataset (ECMWF, 2019), while salinity data were obtained from NOAA (National Oceanic and Atmospheric Administration) (NOAA, 2019).

Data analysis
Missing values in the evaporation, precipitation, and salinity time series were filled from the neighboring known values using linear interpolation (Moon et al., 2019) in Equation (1); the pre-processed data are presented in Table 1. The data were then split into 70% training data and 30% testing data.
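A minimal Python sketch of this pre-processing follows. It assumes the usual linear-interpolation formula, y_t = y_t0 + (t - t0)(y_t1 - y_t0)/(t1 - t0), for a missing day t between the nearest known days t0 < t < t1; the exact notation of Equation (1) is not reproduced here. It also assumes that leading or trailing gaps (with no known neighbor on one side) take the nearest known value, and that the 70/30 split is chronological, which is consistent with the test predictions covering the final months of the record. The function names are hypothetical.

```python
import bisect
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def fill_missing_linear(series: Sequence[Optional[float]]) -> List[float]:
    """Replace None entries by linear interpolation between the nearest known values."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no known values")
    filled: List[float] = []
    for t, value in enumerate(series):
        if value is not None:
            filled.append(float(value))
            continue
        pos = bisect.bisect_left(known, t)  # index of the first known day after t
        if pos == 0:  # leading gap (assumption): copy the first known value
            filled.append(float(series[known[0]]))
        elif pos == len(known):  # trailing gap (assumption): copy the last known value
            filled.append(float(series[known[-1]]))
        else:
            t0, t1 = known[pos - 1], known[pos]
            y0, y1 = series[t0], series[t1]
            filled.append(y0 + (t - t0) * (y1 - y0) / (t1 - t0))
    return filled


def chronological_split(rows: Sequence[T], train_percent: int = 70) -> Tuple[List[T], List[T]]:
    """First train_percent % of rows for training, the rest for testing (order preserved)."""
    cut = len(rows) * train_percent // 100  # integer arithmetic avoids float rounding
    return list(rows[:cut]), list(rows[cut:])
```

For example, `fill_missing_linear([34.0, None, None, 35.5])` returns `[34.0, 34.5, 35.0, 35.5]`. Keeping the split chronological, rather than shuffling, prevents the model from being tested on days that lie between training days.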

Time series
A periodic data set, or time series, is a collection of data arranged in chronological order. Time series data are commonly used to track the development of a situation or event at a fixed time interval, which can be seconds, minutes, hours, days, weeks, months, or years (Khasanah et al., 2019). In this study, a time window of 1 week (7 days) was used. Evaporation data are denoted E1, E2, …, En and precipitation data P1, P2, …, Pn, while the target (effect) variable, salinity, is denoted S1, S2, …, Sn, where n is the total number of data. The data pattern can be seen in Table 2.
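The 7-day windowing can be sketched in Python as follows. It assumes, consistent with the 14 inputs of the final model, that each pattern holds the 7 evaporation values and the 7 precipitation values of the days before day t, and that the target is the salinity on day t. The ordering of E before P within a pattern and the name `make_patterns` are assumptions.

```python
from typing import List, Sequence, Tuple


def make_patterns(evap: Sequence[float], precip: Sequence[float],
                  salinity: Sequence[float], window: int = 7
                  ) -> Tuple[List[List[float]], List[float]]:
    """Build (inputs, targets) pairs from three aligned daily series.

    inputs[k]  = E[t-window .. t-1] followed by P[t-window .. t-1]  (2 * window values)
    targets[k] = S[t]
    """
    if not (len(evap) == len(precip) == len(salinity)):
        raise ValueError("series must have the same length")
    inputs: List[List[float]] = []
    targets: List[float] = []
    for t in range(window, len(salinity)):
        inputs.append(list(evap[t - window:t]) + list(precip[t - window:t]))
        targets.append(salinity[t])
    return inputs, targets
```

With 852 daily values per variable, this construction yields 852 - 7 = 845 patterns of 14 inputs each.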

Backpropagation neural network method
The backpropagation neural network is a neural network model with supervised learning (Howard, 2019). It is also one of the most popular methods for solving complex pattern recognition problems. In pattern recognition, the method consists of two phases: the forward propagation phase and the backward propagation phase (Sawitri et al., 2018). Backpropagation uses a multilayer network composed of neurons connected between layers, namely the input layer, the hidden layer, and the output layer, as shown in Figure 2. The network has weights from the input layer to the hidden layer, plus a bias unit with a value of 1, and weights from the hidden layer to the output layer, also plus a bias unit with a value of 1. The steps of backpropagation (Alkronz et al., 2019) can be seen in Table 3.
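The two phases can be sketched in plain Python as below. This is a generic illustration, not the authors' code: it assumes a sigmoid activation in every layer (the paper reports sigmoid for its best model), a squared-error loss, weights updated after each pattern, and initial weights drawn uniformly from [-0.5, 0.5]. The class and method names are hypothetical. Each non-input unit has a bias weight whose input is fixed at 1.

```python
import math
import random
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class BackpropNet:
    """Fully connected sigmoid network trained pattern by pattern with backpropagation."""

    def __init__(self, sizes: Sequence[int], seed: int = 0) -> None:
        rng = random.Random(seed)
        # weights[layer][k][j]: weight from unit j of layer `layer` to unit k of the next layer
        self.weights = [[[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                         for _ in range(n_out)]
                        for n_in, n_out in zip(sizes[:-1], sizes[1:])]
        # biases[layer][k]: weight from a bias unit (value fixed at 1) to unit k of the next layer
        self.biases = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for n_out in sizes[1:]]

    def forward(self, x: Sequence[float]) -> List[List[float]]:
        """Forward phase: activations of every layer, input layer first."""
        acts = [list(x)]
        for W, b in zip(self.weights, self.biases):
            prev = acts[-1]
            acts.append([sigmoid(bk + sum(w * a for w, a in zip(row, prev)))
                         for row, bk in zip(W, b)])
        return acts

    def train_step(self, x: Sequence[float], target: Sequence[float], lr: float) -> float:
        """Backward phase for one pattern; returns the squared error before updating."""
        acts = self.forward(x)
        out = acts[-1]
        # output error factor: (t - y) times the sigmoid derivative y * (1 - y)
        deltas = [(t - y) * y * (1.0 - y) for t, y in zip(target, out)]
        for layer in range(len(self.weights) - 1, -1, -1):
            W, prev = self.weights[layer], acts[layer]
            # propagate error factors one layer back using the weights before this update
            prev_deltas = [sum(deltas[k] * W[k][j] for k in range(len(W))) * a * (1.0 - a)
                           for j, a in enumerate(prev)]
            for k, d in enumerate(deltas):
                for j, a in enumerate(prev):
                    W[k][j] += lr * d * a  # weight change = alpha * delta * activation feeding it
                self.biases[layer][k] += lr * d  # bias input is 1
            deltas = prev_deltas  # unused after the input layer
        return sum((t - y) ** 2 for t, y in zip(target, out))
```

For the best model reported in this study, the layer sizes would be `[14, 10, 2, 1]`, and one training epoch would call `train_step` once per training pattern with the chosen learning rate.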
Table 2 (final row, partially recovered): … E(t+(n-5)) … E(t+(n-1)) | … P(t+(n-6)) P(t+(n-5)) … P(t+(n-1)) | S(t+n)

In this study, several backpropagation network architecture models were built to determine the best model for the training process. Each architecture model is defined by the size of the input layer, the number of hidden layers and their nodes, and the output layer, as illustrated in Table 4, and each model was trained with learning rates from 0.1 to 0.9 in steps of 0.1.
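The experiment grid can be sketched as below. The `score` callable is a hypothetical stand-in for training one model and returning its test MAPE; only the 4th architecture (hidden layers of 10 and 2 nodes) is known from the text, so any other entries passed as `architectures` are placeholders for the models in Table 4. Learning rates are built as k/10 rather than by multiplying or repeatedly adding 0.1, so that each tested value is the double nearest 0.1, 0.2, …, 0.9.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Architecture = Tuple[int, ...]  # node counts of the hidden layers, e.g. (10, 2)
Results = Dict[Tuple[Architecture, float], float]


def learning_rates() -> List[float]:
    """0.1, 0.2, ..., 0.9; k / 10 gives the double nearest each value (0.1 * 7 would not)."""
    return [k / 10 for k in range(1, 10)]


def grid_search(architectures: Sequence[Architecture],
                score: Callable[[Architecture, float], float]
                ) -> Tuple[Architecture, float, float, Results]:
    """Score every (architecture, learning rate) pair; return the lowest-scoring one."""
    results: Results = {}
    for arch in architectures:
        for lr in learning_rates():
            results[(arch, lr)] = score(arch, lr)
    (best_arch, best_lr), best_score = min(results.items(), key=lambda item: item[1])
    return best_arch, best_lr, best_score, results


def mean_by_architecture(results: Results) -> Dict[Architecture, float]:
    """Mean score of each architecture over all learning rates."""
    grouped: Dict[Architecture, List[float]] = {}
    for (arch, _lr), value in results.items():
        grouped.setdefault(arch, []).append(value)
    return {arch: mean(values) for arch, values in grouped.items()}
```

`mean_by_architecture` corresponds to the per-architecture mean MAPE used to compare models across learning rates.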

Mean Absolute Percentage Error (MAPE)
MAPE measures accuracy as the average, over all periods, of the absolute error divided by the observed value (De Myttenaere et al., 2016). The MAPE value is formulated as:

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$

where $y_i$ is the target data, $\hat{y}_i$ is the predicted data, and $n$ is the amount of data. The smaller the MAPE, the better the prediction; the MAPE criteria are shown in Table 5.
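The formula translates directly into code. This sketch computes MAPE in percent and rejects zero targets, for which MAPE is undefined; salinity values of roughly 31 to 38 ppt never hit that case. The function name is hypothetical.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

As a rough reading of the reported result, a MAPE of 2.063% at salinities near 34 ppt corresponds to an average absolute error of about 0.7 ppt (0.02063 × 34 ≈ 0.70).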

Results and Discussion
This study predicted the salinity of seawater in the waters south of Java Island from meteorological data, namely precipitation and evaporation, using the backpropagation neural network method. To obtain the best results, experiments were carried out on several backpropagation network architectures. The data cover April 2017 to July 2019, with 852 data points per variable. Missing values were then filled using Equation (1), as shown in Table 6. A time series data pattern was created as shown in Table 2, with precipitation and evaporation as input variables and salinity as the target variable, for use in the learning stage of the backpropagation neural network.

Table 3 (backpropagation steps, partially recovered): … at each data point, calculate the error factor δ and multiply it by the derivative of the activation function; calculate the weight change by multiplying the learning rate α, the error factor δ, and the output value; calculate the weight change by multiplying α, δ, and the data value; update each jk weight as new weight = old weight + Δjk; update the remaining weights as new weight = old weight + Δ; End; calculate the error with MSE; iter = iter + 1; End.

Table 7 shows the results of the architecture experiments. The most accurate network for predicting salinity is the 4th architecture model, shown in Figure 3, with 14 input nodes, 2 hidden layers (10 nodes in the first hidden layer and 2 nodes in the second), and 1 output node, using the sigmoid activation function. The best result for this model was obtained at a learning rate of 0.7, with a MAPE of 2.063%.
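Because a sigmoid output lies strictly between 0 and 1, salinity targets in ppt have to be rescaled for training and the network's predictions mapped back to ppt afterwards. The text does not state which scaling was used; the sketch below assumes min-max scaling into [0.1, 0.9], one common choice that keeps targets away from the sigmoid's asymptotes, with the minimum and maximum taken from the training data. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_minmax(train_values: Sequence[float]) -> Tuple[float, float]:
    """Minimum and maximum of the training data (reused unchanged for test data)."""
    lo, hi = min(train_values), max(train_values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return lo, hi


def scale(values: Sequence[float], lo: float, hi: float,
          a: float = 0.1, b: float = 0.9) -> List[float]:
    """Map [lo, hi] linearly onto [a, b]."""
    return [a + (v - lo) * (b - a) / (hi - lo) for v in values]


def unscale(values: Sequence[float], lo: float, hi: float,
            a: float = 0.1, b: float = 0.9) -> List[float]:
    """Inverse of scale: map network outputs in [a, b] back to the original units."""
    return [lo + (s - a) * (hi - lo) / (b - a) for s in values]
```

Fitting `lo` and `hi` on the training portion only keeps information from the test period out of the training stage.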
Table 7 also shows that the choice of network architecture influences the MAPE more than the learning rate does. The 4th architecture model has a lower mean MAPE than the other models, at 2.239%. The salinity predictions from 19 November 2018 to 31 July 2019 obtained with this backpropagation model are shown in Figure 4. Results of this accuracy can be considered good for salinity prediction (Xu et al., 2017).
As seen in the comparison chart of the prediction results (Figure 4), the red line shows the actual salinity data, ranging from 31 to 38 ppt, while the blue line shows the predicted data, ranging from 33 to 35 ppt. The actual salinity data vary widely in March and May 2019 and contain many missing values, so the predictions in those periods are poor. The data contain outliers with widely spread values, which need to be handled (Kieu et al., 2019). Salinity can also be predicted from Landsat salinity image data (El-Battay et al., 2017) or salinity … (Yu et al., 2017). For the backpropagation method, adding calculations such as momentum, an adaptive learning rate, and gradient (Hameed et al., 2016), or others, is recommended to optimize the training process. Other forecasting methods, such as support vector regression (SVR) (Suwanto and Novitasari, 2020) or the adaptive neuro-fuzzy inference system (ANFIS), could also be used to achieve better accuracy.
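The momentum suggestion can be illustrated with classical (heavy-ball) momentum, one of several variants: each weight change adds a fraction μ of the previous change, Δw(t) = α·(-∂E/∂w) + μ·Δw(t-1). In the backpropagation steps, α·(-∂E/∂w) is the plain change α·δ·(input to the weight). The sketch below applies this rule to a toy one-dimensional quadratic rather than salinity data; μ = 0.9 and the function names are assumptions.

```python
from typing import Callable, Tuple


def momentum_update(w: float, neg_grad: float, prev_change: float,
                    lr: float, mu: float) -> Tuple[float, float]:
    """One weight update with momentum; returns (new weight, change applied)."""
    change = lr * neg_grad + mu * prev_change
    return w + change, change


def minimise(grad: Callable[[float], float], w0: float,
             lr: float, mu: float, steps: int) -> float:
    """Gradient descent on one weight; mu = 0 reduces to the plain update."""
    w, change = w0, 0.0
    for _ in range(steps):
        w, change = momentum_update(w, -grad(w), change, lr, mu)
    return w
```

On E(w) = (w - 3)², with gradient 2(w - 3), starting at w = 0 with α = 0.01, 150 plain steps leave w about 0.15 from the minimum, while 150 momentum steps with μ = 0.9 land within about 0.002 of it.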

Conclusions
The backpropagation neural network method performed very well in predicting seawater salinity south of Java. The input data were precipitation and evaporation arranged in 7-day time series patterns. Several experiments were carried out by varying the number of hidden layers, the number of hidden-layer nodes, and the learning rate of the backpropagation architecture. The best result was the 4th architecture model, with 2 hidden layers (10 nodes in the first and 2 in the second), which achieved a MAPE of 2.063% at a learning rate of 0.7, with predicted salinity ranging from 33 to 35 ppt.