Statistical analysis of orographic rainfall for eight north-east regions of India with special focus on Sikkim

Autoregressive integrated moving average (ARIMA) models are used to predict the rain rate of orographic rainfall over a long period, from 1980 to 2018. Because orographic rainfall can cause landslides and other natural disasters, rainfall prediction for these areas is important. In this research, statistical calculations are carried out on rainfall data for twelve regions of India (Cherrapunji, Darjeeling, Dawki, Ghum, Itanagar, Kanchenjunga, Mizoram, Nagaland, Pakyong, Saser Kangri, Slot Kangri, and Tripura) from eight states and territories (Sikkim, Meghalaya, West Bengal, Ladakh (a Union Territory of India), Arunachal Pradesh, Mizoram, Tripura, and Nagaland) at varying altitudes. The model's output is assessed using several error measures. The model's performance is represented by the fit value, which is reliable for the north-east region of India with increasing altitude. The statistical dependability of the rainfall prediction is shown by the estimated parameters. The lowest value of root mean square error (RMSE) indicates the best prediction for orographic rainfall.


INTRODUCTION
Orographic rainfall is characterized as widespread rainfall with a rain rate between 25 mm/hr and 60 mm/hr. This type of rainfall has parameter values of A = 300 to 350 and b = 0.5 to 0.7 in the popular rainfall-radar reflectivity (Z-R) relation $Z = A R^{b}$, where Z is the radar reflectivity factor and R is the rain rate. It forms when low cloud, with an approximate value of 0.85 to 0.9, accompanied by wind gusts of 6 to 7 km/hr, causes rainfall over the hills almost 70% of the time during a year.
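As a small worked illustration of the Z-R power law in its standard form Z = A·R^b, the sketch below evaluates the reflectivity factor at the two ends of the quoted rain-rate range. The function names and the choice A = 300, b = 0.5 are illustrative assumptions picked from the ranges above; they are not values fitted in this study.

```python
import math


def reflectivity(rain_rate_mm_per_hr, a, b):
    """Radar reflectivity factor Z from the power law Z = A * R**b."""
    if rain_rate_mm_per_hr < 0:
        raise ValueError("rain rate must be non-negative")
    return a * rain_rate_mm_per_hr ** b


def to_dbz(z):
    """Convert a reflectivity factor Z to logarithmic units (dBZ)."""
    return 10.0 * math.log10(z)


# Illustrative parameters taken from the quoted ranges (assumption, not fitted).
A_EXAMPLE, B_EXAMPLE = 300.0, 0.5

for rate in (25.0, 60.0):
    z = reflectivity(rate, A_EXAMPLE, B_EXAMPLE)
    print(f"R = {rate} mm/hr -> Z = {z:.1f}, {to_dbz(z):.1f} dBZ")
```

With b = 0.5 the relation reduces to a square root, so R = 25 mm/hr gives Z = 300 × 5 = 1500.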
At frequencies above 10 GHz, the attenuation of signals due to rain is a serious problem for many essential communication systems. Research on rain attenuation and rainfall prediction aims to establish reliable prediction accuracy for these methods, and assessing their applicability requires a database, which can be collected from the meteorological department. In this research, we collected rainfall data from Giovanni (NASA) for the 39 years from 1980 to 2018.
Many prediction methods for rain attenuation have been discussed, since rain occurs for more than half of the time during a year. Rain greatly increases the chance of landslides, which are very dangerous for the surrounding area. Nonlinear time series have also been used in many rainfall studies [1], applying different techniques and considering various inputs as causes of rain [2]. In India, where agriculture is the primary source of economic growth, accurate rainfall forecasting is critical. Some studies use regression models, neural networks, and clustering to improve rainfall prediction [3]. "The approaches based on autoregressive integrated moving average (ARIMA), the fuzzy time series (FST) model, and the non-parametric method have been discussed in many literatures" [4]. In other studies, a traditional regression model was adjusted to forecast rainfall by iterating over existing data and adding the error percentage to the input, as well as by taking several rainfall-related inputs such as wind gust, humidity, and temperature.
Modelling of rainfall is critical in areas like north-east India, where the Indian summer monsoon lasts approximately half the year. Many studies and techniques address prediction. Some of the ARIMA literature computes missing observations using the Kalman filter [5], which allows a partially diffuse initial state vector. The spatial autoregressive moving averages (SARMAS) algorithm calculates an approximation of multiplicative models [6]. Several algorithms compute fast results for ARIMA models [7], as well as error estimates for detecting possible interventions in the time series data [8]. To handle time series formed by different variations of monthly data, an improved ARIMA was developed [9]; considering the high spatiotemporal variation in rainfall distribution, an ARIMA model was also developed for forecasting and prediction of monthly rainfall [10]. Semi-empirical methods are also used for rain prediction, mainly the International Telecommunication Union-Radiocommunication sector (ITU-R) recommended attenuation for slant-path and terrestrial links, which affects the propagation path [11]. Scaling the rain attenuation with an artificial neural network supports quick monitoring of rain attenuation [12], [13]. Measurements of attenuation time series on satellite-earth links have also been made [14]. Forecast accuracy has also been evaluated and compared among the different models fitted to a time series [15]. A modified ARIMA modelling technique captures the time correlation and probability distribution of the records [16]. Some architectures combine simple tuning with the ARIMA model [17]. A correction mechanism is applied to the sum of the predicted findings in medium- and long-term software failure time forecasting [18].
The effectiveness of such methods has also been demonstrated in experiments on time series [19]. A Metro-wheel based ARIMA model addresses stationarity evaluation and transformation [20], and the Box-Jenkins approach emphasises identifying a suitable time series model [21]. Improving forecasting accuracy [22] by combining models is an active research area for ARIMA models.
In this paper, we describe forecasting for different hill stations, with a statistical analysis of prediction using a regression model based on 39 years of historical data for India [23]. Most of these tropical areas are orographic in nature, and sudden rain may cause landslides, which are a serious problem for people, society, and agriculture. The purpose of this research is therefore to analyse rainfall prediction statistically using historical data, so that human lives can be protected from landslides and other natural disasters caused by rain. To this end, we take different tropical regions, analyse them statistically through the ARIMA model equation, and compute parameters such as the mean, standard deviation, and variance, followed by the F-statistic and p-value. Based on these parameters, we obtain the absolute error, which helps in predicting rainfall for future use, since rainfall prediction is important in many respects for reducing the risk of landslides and other societal problems.

RESEARCH METHOD
We have taken twelve tropical regions from the eight states listed in Table 1, as shown in Figure 1. All the regions considered are orographic across the different rainfall seasons, which affects the temperature and the hills of each region. We therefore collected data for all the regions listed in Table 1 from Giovanni (NASA), so that forecasts can give early warning of problems and help protect the environment and human lives. The climate of these places is subtropical, with heavy rain from May to September. The workflow is shown in Figure 2. Sudden rainfall in these places is a primary cause of natural hazards that threaten human life, so it is very important to study the area for the benefit of its people. Thirty-nine years of historical rainfall data for the twelve regions (Cherrapunji, Darjeeling, Dawki, Ghum, Itanagar, Kanchenjunga, Mizoram, Nagaland, Pakyong, Saser Kangri, Slot Kangri, and Tripura) are smoothed and processed with a white noise test. After processing, the data are fed into the ARIMA model, which is fine-tuned for lower prediction error. The model is then evaluated in terms of mean square error (MSE), root mean square error (RMSE), and mean absolute error (MAE) [5].
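The three error measures named above can be sketched as follows. This is a minimal illustration using their standard definitions; the function names and the sample numbers are hypothetical and are not data or code from this study.

```python
import math


def _check(observed, predicted):
    """Validate that both series are non-empty and of equal length."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("series must be non-empty and of equal length")


def mse(observed, predicted):
    """Mean square error: average of the squared residuals."""
    _check(observed, predicted)
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error: square root of the MSE, in the data's units."""
    return math.sqrt(mse(observed, predicted))


def mae(observed, predicted):
    """Mean absolute error: average of the absolute residuals."""
    _check(observed, predicted)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical monthly rainfall values (mm), for illustration only.
observed = [10.0, 12.0, 9.0, 15.0]
predicted = [11.0, 10.0, 9.0, 18.0]

print("MSE :", mse(observed, predicted))   # residuals -1, 2, 0, -3 -> 14 / 4 = 3.5
print("RMSE:", rmse(observed, predicted))
print("MAE :", mae(observed, predicted))   # (1 + 2 + 0 + 3) / 4 = 1.5
```

RMSE penalises large misses more heavily than MAE, which is one reason both are usually reported together when comparing candidate models.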

EQUATION AND METHOD
The ARMA model [1]:
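In a standard textbook form, assumed here and possibly differing in notation and sign convention from [1], an ARMA model with autoregressive order $n_a$ and moving-average order $n_c$ can be written as follows. The symbols $\phi_i$, $\theta_j$, and $e_t$ are introduced for illustration only.

```latex
y_t = \sum_{i=1}^{n_a} \phi_i \, y_{t-i} + e_t + \sum_{j=1}^{n_c} \theta_j \, e_{t-j}
```

Here $y_t$ is the rainfall series at time step $t$, $e_t$ is a zero-mean white noise term, $\phi_i$ are the autoregressive coefficients, and $\theta_j$ are the moving-average coefficients. An ARIMA model applies the same structure to the series after it has been differenced to make it stationary.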

RESULTS AND DISCUSSION
The developed model is used to forecast monthly precipitation at twelve locations ten steps ahead. The lowest error percentages for the selected regions are further cross-checked by the forecast techniques, which suggest that the forecasts are close to the observed average rainfall intensity. Before carrying out the rainfall prediction, we performed some statistical calculations for these regions to help improve the result. In Table 2, the regions are listed with their respective altitudes in meters. In addition, the mean, standard deviation, and variance are computed in this study. Table 3 gives the colors used for each region in the rainfall prediction graphs, for easier reading. Figure 3 depicts the rainfall of three hill stations, Cherrapunji, Darjeeling, and Dawki, at altitudes of 1430 m, 2042 m, and 45 m, respectively, over a thirty-nine-year period. As shown in the graph, rainfall at Cherrapunji is quite high compared with the other hill stations; Cherrapunji is noted for having the most rainfall in India. Determining the expected precipitation for the three stations therefore gives an early indicator of excessive rainfall. Similarly, Figure 4 shows the predicted rainfall for Ghum, Itanagar, and Kanchenjunga. Kanchenjunga has the highest altitude of the regions considered. Temperatures in this region range from 10 °C to 28 °C, and the South-West Monsoon brings rain to Kanchenjunga. Figure 5 shows the rainfall prediction for Mizoram, Nagaland, and Pakyong. For Saser Kangri, Slot Kangri, and Tripura, Figure 6 depicts the actual and predicted rainfall. The model built for these areas predicted the rainfall accurately. After the ten-step-ahead prediction over the 39 years, we carried out error estimation for all these regions using the R² value, F-statistic, and p-value.
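To illustrate what forecasting several steps ahead involves, the sketch below performs a recursive multi-step forecast with a purely autoregressive model: each predicted value is fed back in as an input for the next step. It is a simplification under stated assumptions. The moving-average terms of the fitted model are omitted (their future shocks have zero expectation), and the function name and coefficients are hypothetical, not the values estimated in this study.

```python
def forecast_ar(history, phi, steps, intercept=0.0):
    """Recursive multi-step forecast for an AR(p) model.

    y_hat[t] = intercept + phi[0] * y[t-1] + phi[1] * y[t-2] + ...
    """
    p = len(phi)
    if p < 1:
        raise ValueError("at least one AR coefficient is required")
    if len(history) < p:
        raise ValueError("history must contain at least p values")
    if steps < 0:
        raise ValueError("steps must be non-negative")

    window = list(history[-p:])  # the p most recent values, newest last
    forecasts = []
    for _ in range(steps):
        # phi[0] multiplies the newest value, phi[1] the one before it, etc.
        y_next = intercept + sum(c * y for c, y in zip(phi, reversed(window)))
        forecasts.append(y_next)
        window = window[1:] + [y_next]  # slide the window forward
    return forecasts


# Hypothetical AR(1) example: each forecast is half the previous value.
print(forecast_ar([8.0], phi=[0.5], steps=3))  # [4.0, 2.0, 1.0]
```

Because each step reuses earlier forecasts, errors accumulate with the horizon, which is why the ten-step-ahead error measures are the relevant test of the model.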
The F-statistic is calculated as the ratio of two scaled sums of squares of the dataset's elements, namely the variance explained by the model to the residual variance, and so indicates how much of the dataset's variability the model accounts for. The p-value is the probability, under the null hypothesis of a statistical test, of obtaining a result at least as extreme as the one observed; a small p-value indicates that the fit is unlikely to be due to chance. According to the F-test in Table 4, the observed R-squared is reliable, and the p-value test confirms the outcome of the F-test: the R-squared is credible and the fit to the data set is not a chance result. As a result, the prediction model is statistically sound and may be applied to complex rainfall scenarios such as orographic forecasting. Residual diagnostics were performed on all candidate models, and the best models, those that produce white noise residuals with well-behaved autocorrelation function (ACF) plots, were chosen. Table 5 shows that the model coefficients are less than 10, indicating that a compact model can predict complicated variables like orographic rainfall. The RMSE of the dependent variable (historical rainfall), shown in Table 4, reveals a close match to the expected estimate. Measures of fit such as MAE are computed, a few candidate models are shortlisted, and the final model is selected on the basis of the lowest RMSE and MAE for prediction.
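The link between R-squared and the F-statistic can be made concrete with the standard formula for a regression with an intercept, F = (R²/k) / ((1 − R²)/(n − k − 1)), where n is the number of observations and k the number of fitted coefficients excluding the intercept. The sketch below assumes this regression-style overall F-test; the study does not state which exact form was used, and the function name and sample numbers are hypothetical.

```python
def f_statistic(r_squared, n_obs, n_params):
    """Overall F-statistic of a regression computed from its R-squared.

    n_params counts the fitted coefficients, excluding the intercept.
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    dof_model = n_params
    dof_resid = n_obs - n_params - 1
    if dof_model < 1 or dof_resid < 1:
        raise ValueError("not enough observations for the number of parameters")
    explained = r_squared / dof_model            # scaled explained variation
    unexplained = (1.0 - r_squared) / dof_resid  # scaled residual variation
    return explained / unexplained


# Hypothetical example: R^2 = 0.5 with 12 observations and 1 coefficient.
print(f_statistic(0.5, n_obs=12, n_params=1))  # (0.5 / 1) / (0.5 / 10) = 10.0
```

The corresponding p-value is the upper-tail probability of this statistic under an F distribution with (k, n − k − 1) degrees of freedom; it could be obtained, for example, from `scipy.stats.f.sf` if SciPy is available.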
The scatter index is lowest for Cherrapunji, Darjeeling, and Tripura, indicating that the rainfall forecast delivers its best outcome there; because of their orographic nature, MAE is likewise at a minimum for these three regions. For better results, we set the model's na and nc values to 6 and 8 for Darjeeling, and to 6 and 4 for Cherrapunji, Tripura, and the other places, where na and nc are the model's polynomial order and delays, respectively.
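The scatter index is not defined in the text. A commonly used definition, assumed here, is the RMSE divided by the mean of the observed values, which makes the error comparable across stations with very different rainfall totals. The sketch below uses that definition; the function name and numbers are hypothetical.

```python
import math


def scatter_index(observed, predicted):
    """Scatter index, assumed here to be RMSE / mean(observed)."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("series must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    if mean_obs == 0:
        raise ValueError("mean of observed values must be non-zero")
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / mean_obs


# Hypothetical monthly rainfall values (mm), for illustration only.
observed = [10.0, 12.0, 9.0, 15.0]
predicted = [11.0, 10.0, 9.0, 18.0]
print(f"scatter index = {scatter_index(observed, predicted):.3f}")
```

A lower value means the forecast errors are small relative to the typical rainfall at that station.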

CONCLUSION
Rainfall, which has a direct impact on agriculture, is the main contributor to natural calamities such as landslides in these regions. To mitigate this problem, such events must be forecast at an early stage. The regression model optimized for this problem has an acceptable error value for the different orographic regions and can make accurate predictions. Rainfall data from more stations at higher altitudes will be required in the future to validate the improved rain forecast model.