Title: Comparative assessment of univariate and multivariate imputation models for varying lengths of missing rainfall data in a humid tropical region: a case study of Kozhikode, Kerala, India.
Authors: Kannegowda, Naveena; Udayar Pillai, Surendran; Kommireddi, Chinni Venkata Naga Kumar; Fousiya
Abstract: Accurate measurement of meteorological parameters is crucial for weather forecasting and climate change research. However, missing observations in rainfall data can pose a challenge to these efforts. Traditional methods of imputation can lead to increased uncertainty in predictions. Additionally, varying lengths of missing data and nonlinearity in rainfall distribution make it difficult to rely on a single imputation method in all situations. To address this issue, our study compared univariate and multivariate imputation models for different lengths of missing daily rainfall observations in a humid tropical region. We used 33 years of weather data from Kozhikode, a city in the Kerala region, and evaluated the selected models using accuracy measures such as Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Nash–Sutcliffe Efficiency (NSE) and Mean Absolute Relative Error (MARE). Among the considered univariate and multivariate imputation models, Kalman filter coupled time series models such as Kalman–ARIMA (mean RMSE = 11.90, mean MAE = 4.46) and Kalman Smoothing with structural time series (mean RMSE = 11.37, mean MAE = 5.28) were found to be best for small-range (< 7 days) imputation of rainfall data. Random Forest (mean RMSE = 16.57, mean MAE = 8.0) and Kalman Smoothing with structural time series (mean RMSE = 16.84, mean MAE = 8.09) performed well for medium-range (8–15 days) rainfall imputation. The Random Forest technique was found to be suitable for the large (≤ 30 days) (mean RMSE = 15.45, mean MAE = 6.77) and very large (> 30 days) (mean RMSE = 12.91, mean MAE = 3.42) missing-length groups, and Kalman–ARIMA performed best for the mixed-day series (RMSE = 9.7, MAE = 3.52). NSE and MARE values for different gap margins in rainfall data (≥ 1 mm) suggest that Kalman Smoothing (KS) based models, as representative univariate models, perform exceptionally well when dealing with a small number of missing observations. Notably, multivariate models such as Principal Component Analysis (PCA) and Random Forest outperformed univariate models for medium to large gap margins. Considering these findings, multivariate techniques are recommended for imputing a large number of missing rainfall values, while the use of univariate models can be limited to imputing short gaps in rainfall data. The identified imputation models provide effective solutions for filling missing data of various lengths in all stations' datasets in humid tropical regions, thus enhancing rainfall-related analysis and enabling more accurate weather forecasts and climate change research.
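The four accuracy measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the toy rainfall values, and the `min_obs` cutoff are assumptions made for illustration. The cutoff mirrors the abstract's restriction to rainfall of at least 1 mm and keeps MARE from dividing by zero on dry days. The formulas are the standard textbook definitions.

```python
import math


def rmse(obs, pred):
    """Root Mean Square Error between observed and imputed values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean Absolute Error between observed and imputed values."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def mare(obs, pred, min_obs=1.0):
    """Mean Absolute Relative Error over days with observed rainfall >= min_obs.

    The min_obs cutoff (in mm) is an illustrative assumption.
    """
    pairs = [(o, p) for o, p in zip(obs, pred) if o >= min_obs]
    return sum(abs(o - p) / o for o, p in pairs) / len(pairs)


# Hypothetical daily rainfall (mm): values held out as "missing" and their imputations.
observed = [0.0, 12.0, 30.0, 2.0, 0.0]
imputed = [1.0, 10.0, 33.0, 2.0, 0.0]

for name, metric in [("RMSE", rmse), ("MAE", mae), ("NSE", nse), ("MARE", mare)]:
    print(f"{name}: {metric(observed, imputed):.3f}")
```

Lower RMSE, MAE and MARE and an NSE closer to 1 indicate a better imputation. To compare models across gap lengths in this way, each metric would be computed separately for every artificially removed gap group.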
Subjects: MISSING data (Statistics); WEATHER & climate change; STANDARD deviations; RAINFALL; CLIMATE change forecasts; KALMAN filtering; WEATHER forecasting
Publication: Acta Geophysica, 2024, Vol 72, Issue 4, p2663
ISSN: 1895-6572
Publication type: Article
DOI: 10.1007/s11600-023-01152-y
