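Cite this article: SHI Jin, ZHOU Jiaogen, WANG Hui, et al. Analyzing the Uncertainty Induced by Methods Used to Calculate the Missing Data in Time Series: A Case Study Based on Meteorological and Hydrological Data in Small Watershed[J]. Journal of Irrigation and Drainage, 2019, 38(2): 84-92.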
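Uncertainty Analysis of Missing-Value Estimation for Point-Source Time Series Data: A Case Study of Meteorological and Hydrological Data from a Small Watershed
SHI Jin, ZHOU Jiaogen, WANG Hui, GAN Lei, SHEN Jianlin, LI Xi, LI Yuyuan, WU Jinshui
1. College of Engineering, Hunan Agricultural University, Changsha 410128; 2. Key Laboratory of Agro-ecological Processes in Subtropical Region, Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha 410125; 3. College of Resources and Environmental Sciences, Hunan Normal University, Changsha 410081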
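Abstract:
【Objective】To estimate effectively the missing values that are common in time series, and thereby improve the quality of time series data.【Method】Using meteorological data (minimum temperature, maximum temperature, solar radiation) and hydrological data (precipitation, surface runoff) from long-term fixed-site observation in a typical subtropical small watershed as samples, and applying computer simulation, we compared the performance of five estimation methods: linear interpolation, K-nearest-neighbor interpolation, polynomial interpolation, spline interpolation and kernel density estimation. We analyzed the uncertainty of missing-value estimation under different sampling time steps (daily and monthly) and different proportions of missing data (1%, 5%, 10%, 15%, 20%). Three cross-validation metrics, root mean square error (RMSE), mean absolute error (MAE) and the Pearson correlation coefficient (r), were used to assess the five methods.【Result】① All five methods performed well, with Pearson correlation coefficients of 0.62 to 0.99 (P<0.05); kernel density estimation and polynomial interpolation performed best. ② Increasing the proportion of missing data and the sampling step reduced the estimation accuracy of all five methods. ③ The coefficient of variation (CV) of the data sets was significantly correlated with the evaluation metrics (RMSE, MAE and r) (P<0.05).【Conclusion】Estimates from kernel density estimation and polynomial interpolation are relatively more reliable, and the coefficient of variation is an important factor affecting estimation uncertainty.
Keywords: missing values; estimation methods; coefficient of variation; uncertainty; time series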
DOI:10.13522/j.cnki.ggps.2017.0421
Analyzing the Uncertainty Induced by Methods Used to Calculate the Missing Data in Time Series: A Case Study Based on Meteorological and Hydrological Data in Small Watershed
SHI Jin, ZHOU Jiaogen, WANG Hui, GAN Lei, SHEN Jianlin, LI Xi, LI Yuyuan, WU Jinshui
1. College of Engineering, Hunan Agricultural University, Changsha 410128, China; 2. Key Laboratory of Agro-ecological Processes in Subtropical Region, Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha 410125, China; 3. College of Resources and Environmental Sciences, Hunan Normal University, Changsha 410081, China
Abstract:
【Objective】Incomplete data are common in meteorological and hydrological time series. This paper analyzes the uncertainty introduced when such missing data are estimated with different interpolation methods.【Method】We used meteorological data (minimum temperature, maximum temperature and solar radiation) and hydrological data (rainfall and stream flow) collected from long-term field observation in a typical small subtropical watershed as samples. Using computer simulation, we compared the performance of five interpolation methods: the linear interpolation method (LIM), the K-nearest neighbor interpolation method (KNNM), the polynomial interpolation method (PIM), the spline interpolation method (SIM) and the kernel density estimation method (KDEM). We then analyzed the uncertainty resulting from the sampling time step (daily and monthly) and the proportion of missing data (1%, 5%, 10%, 15%, 20%). Root mean square error (RMSE), mean absolute error (MAE) and the Pearson correlation coefficient (r) were used as cross-validation criteria to evaluate the five methods.【Result】① All five methods estimated the missing values well, with r ranging from 0.62 to 0.99 (P<0.05). In general, the KDEM and PIM were more accurate than the other three methods. ② The accuracy of all five methods deteriorated when the sampling time step changed from daily to monthly and when the proportion of missing data increased. ③ The coefficient of variation (CV) of the data sets was significantly correlated with the evaluation metrics (RMSE, MAE and r) (P<0.05).【Conclusion】The KDEM and PIM are relatively more reliable, and the coefficient of variation (CV) of the data sets is an important factor in the uncertainty of all five interpolation methods.
Key words:  missing data; interpolation methods; coefficient of variation; uncertainty; time series
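
The evaluation procedure summarized in the abstract (remove a known fraction of values, estimate them, then score the estimates with RMSE, MAE and r) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic daily series stands in for the watershed observations, values are removed uniformly at random, only linear interpolation is shown (the other four methods are omitted), and the helper names `mask_random`, `linear_fill`, `evaluate` and `coefficient_of_variation` are hypothetical.

```python
import numpy as np


def mask_random(values, missing_fraction, rng):
    """Return a boolean mask marking randomly chosen interior points as missing."""
    n = len(values)
    n_missing = max(1, int(round(missing_fraction * n)))
    # Keep both endpoints observed so linear interpolation never extrapolates.
    idx = rng.choice(np.arange(1, n - 1), size=n_missing, replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return mask


def linear_fill(values, mask):
    """Estimate masked points by linear interpolation between observed neighbors."""
    t = np.arange(len(values))
    filled = values.copy()
    filled[mask] = np.interp(t[mask], t[~mask], values[~mask])
    return filled


def evaluate(true_values, estimates):
    """Return (RMSE, MAE, Pearson r) for estimates against withheld true values."""
    err = estimates - true_values
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    r = float(np.corrcoef(true_values, estimates)[0, 1])
    return rmse, mae, r


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return float(np.std(values, ddof=1) / np.mean(values))


# Assumption: a synthetic two-year daily "temperature" series (seasonal cycle plus noise).
rng = np.random.default_rng(0)
days = np.arange(730)
series = 20 + 8 * np.sin(2 * np.pi * days / 365) + rng.normal(0, 1.5, size=days.size)

results = {}
for frac in (0.01, 0.05, 0.10, 0.15, 0.20):  # missing proportions listed in the abstract
    mask = mask_random(series, frac, rng)
    filled = linear_fill(series, mask)
    # Score only the points that were withheld.
    results[frac] = evaluate(series[mask], filled[mask])

print(f"CV of series: {coefficient_of_variation(series):.3f}")
for frac, (rmse, mae, r) in results.items():
    print(f"missing={frac:.0%}  RMSE={rmse:.3f}  MAE={mae:.3f}  r={r:.3f}")
```

A single random draw per missing proportion gives noisy scores; repeating each proportion many times with different seeds and averaging would give a steadier comparison, though the number of repetitions used in the study is not stated in the abstract.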