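Cite this article: | SHI Jin, ZHOU Jiaogen, WANG Hui, et al. Analyzing the Uncertainty Induced by Methods Used to Calculate the Missing Data in Time Series: A Case Study Based on Meteorological and Hydrological Data in Small Watershed[J]. Journal of Irrigation and Drainage, 2019, 38(2): 84-92. |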
|
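Abstract (translated from the Chinese): |
【Objective】To estimate the missing values that commonly occur in time series effectively, and thereby improve the quality of time-series data.【Method】Using meteorological data (minimum temperature, maximum temperature, solar radiation) and hydrological data (precipitation, surface runoff) from long-term fixed-site observation in a typical subtropical small watershed as samples, and applying computer simulation, we compared the performance of five estimation methods: linear interpolation, K-nearest neighbor interpolation, polynomial interpolation, spline interpolation and kernel density estimation. We analyzed the uncertainty of missing-value estimation under different sampling time steps (daily and monthly) and different proportions of missing data (1%, 5%, 10%, 15%, 20%). Three cross-validation metrics, root mean square error (RMSE), mean absolute error (MAE) and the Pearson correlation coefficient (r), were used to evaluate the five methods.【Result】① All five methods performed well, with Pearson correlation coefficients of 0.62 to 0.99 (P<0.05); kernel density estimation and polynomial interpolation performed best. ② Increasing the proportion of missing data and the sampling time step reduced the estimation accuracy of all five methods. ③ The coefficient of variation (CV) of the data sets was significantly correlated with the evaluation metrics (RMSE, MAE and r) (P<0.05).【Conclusion】Kernel density estimation and polynomial interpolation give relatively more reliable estimates, and the coefficient of variation is an important factor affecting estimation uncertainty. |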
Keywords: missing values; estimation methods; coefficient of variation; uncertainty; time series |
DOI:10.13522/j.cnki.ggps.2017.0421 |
Classification number: |
Funding: |
|
Analyzing the Uncertainty Induced by Methods Used to Calculate the Missing Data in Time Series: A Case Study Based on Meteorological and Hydrological Data in Small Watershed |
SHI Jin, ZHOU Jiaogen, WANG Hui, GAN Lei, SHEN Jianlin, LI Xi, LI Yuyuan, WU Jinshui
|
1. College of Engineering, Hunan Agricultural University, Changsha 410128, China; 2. Key Laboratory of Agro-ecological Processes in Subtropical Region, Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha 410125, China; 3. College of Resources and Environmental Sciences, Hunan Normal University, Changsha 410081, China
|
Abstract: |
【Objective】Incomplete data are common in meteorological and hydrological analysis. This paper analyzes the uncertainty introduced when such missing data are estimated with different interpolation methods.【Method】We used meteorological data (minimum temperature, maximum temperature and solar radiation) and hydrological data (rainfall and stream flow), collected from long-term field observation in a typical small watershed in a subtropical zone, as samples, and simulated missing values by computer. We compared the estimates produced by five interpolation methods: the linear interpolation method (LIM), the K-nearest neighbor interpolation method (KNNM), the polynomial interpolation method (PIM), the spline interpolation method (SIM) and the kernel density estimation method (KDEM). We then analyzed the uncertainty resulting from the sampling time step (daily and monthly) and the proportion of missing data (1%, 5%, 10%, 15%, 20%). Root mean square error (RMSE), mean absolute error (MAE) and the Pearson correlation coefficient (r) were used as cross-validation criteria to evaluate the five methods.【Result】① All five methods performed well in estimating the missing meteorological data, with r ranging from 0.62 to 0.99 (P<0.05). In general, the KDEM and PIM were more accurate than the other three methods. ② The accuracy of all five methods deteriorated when the sampling time step changed from daily to monthly and when the proportion of missing data increased. ③ The coefficient of variation (CV) of the data sets was significantly correlated with the evaluation indices (RMSE, MAE and r) (P<0.05).【Conclusion】The KDEM and PIM are relatively more reliable, and the coefficient of variation (CV) of a data set is an important factor affecting the accuracy of all five interpolation methods. |
Key words: missing data; interpolation methods; coefficient of variation; uncertainty; time series |
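
The evaluation procedure summarized in the abstract (remove a known fraction of observations, re-estimate them, then score the estimates with RMSE, MAE and r) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the series is synthetic, only linear interpolation is shown out of the five methods, the gaps are placed uniformly at random, and the helper names (`simulate_missing`, `linear_fill`, `scores`, `coefficient_of_variation`) are hypothetical.

```python
import numpy as np


def simulate_missing(n, fraction, rng):
    """Return a boolean mask that marks randomly chosen interior points as missing."""
    n_missing = max(1, int(round(fraction * n)))
    # Keep both end points observed so linear interpolation never extrapolates.
    idx = rng.choice(np.arange(1, n - 1), size=n_missing, replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return mask


def linear_fill(values, mask):
    """Estimate the masked points by linear interpolation between observed neighbours."""
    t = np.arange(values.size)
    filled = values.copy()
    filled[mask] = np.interp(t[mask], t[~mask], values[~mask])
    return filled


def scores(truth, estimate):
    """Return (RMSE, MAE, Pearson r) of the estimates against the withheld values."""
    err = estimate - truth
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    r = float(np.corrcoef(truth, estimate)[0, 1])
    return rmse, mae, r


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return float(np.std(values, ddof=1) / np.mean(values))


rng = np.random.default_rng(0)
days = np.arange(365)
# Synthetic stand-in for a daily temperature record (assumed, not the paper's data):
# a seasonal cycle plus Gaussian noise.
series = 20.0 + 10.0 * np.sin(2 * np.pi * days / 365.0) + rng.normal(0.0, 2.0, days.size)

results = {}
for fraction in (0.01, 0.05, 0.10, 0.15, 0.20):
    mask = simulate_missing(series.size, fraction, rng)
    filled = linear_fill(series, mask)
    # Score only the points that were withheld.
    results[fraction] = scores(series[mask], filled[mask])

print("CV of the series:", round(coefficient_of_variation(series), 3))
for fraction, (rmse, mae, r) in results.items():
    print(f"missing={fraction:.0%}  RMSE={rmse:.3f}  MAE={mae:.3f}  r={r:.3f}")
```

Scoring only the withheld points mirrors the cross-validation idea: the true values are known because they were removed deliberately. Swapping `linear_fill` for another estimator would allow the same loop to compare methods, and repeating the loop over several random masks would give a spread for each metric rather than a single value.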