Introduction to Emerging Methods for Imputation in Official .ppt
Introduction to Emerging Methods for Imputation in Official Statistics
Ventspils 08/2006
Pasi Piela, 25.08.2006

Overview
1. Imputation in the quality framework of official statistics
2. Classes of imputation methods
3. Requirements for imputation in official statistics
4. Past research work
5. Statistical clustering
6. Best imputation methods
7. Processing imputation (and editing)
8. New research plans
9. Multiple imputation
10. Fractional imputation
11. Multilevel model based imputation

Imputation is defined as the process of statistical replacement of missing values. Editing and imputation are undertaken as part of a quality improvement strategy to improve accuracy, consistency and completeness.

Classes of imputation methods
- A1. Deterministic imputation, or A2. Stochastic imputation
- B1. Logical imputation, B2. Real donor imputation, or B3. Model donor imputation
- C1. Single imputation, or C2. Multiple imputation
- D1. Hot-deck, or D2. Cold-deck

Five requirements for imputation in official statistics (Chambers, 2001)
- Predictive Accuracy: The imputation procedure should maximise the preservation of true values.
- Distributional Accuracy: The preservation of the distribution of the true values is also important.
- Estimation Accuracy: The imputation procedure should reproduce the lower order moments of the distributions of the true values.
- Imputation Plausibility: The imputation procedure should lead to imputed values that are plausible.
- Ranking Accuracy: The imputation procedure should maximise the preservation of order in the imputed values.

Past research work
- traditional methods: cell-mean imputation, regression imputation, random donor, nearest neighbour, etc.
- advanced imputation techniques based on statistical clustering (e.g. K-means)
- homogeneous imputation classes
- hierarchical clustering (e.g. classification and regression trees)
- "modern" statistical pattern recognition methods

Statistical clustering for imputation
- imputation classes, cells, clusters, groups
- average point locating the cluster
- Computational methods?
- ...or searching for appropriate imputation cells by using categorical sorting variables?

K-means Clustering
The basic variance minimization clustering algorithm. The "Voronoi region" (as part of the tessellation) for cluster i is given by

$$V_i = \{\, x_k : \lVert x_k - w_i \rVert \le \lVert x_k - w_j \rVert \ \text{for all } j \,\}$$

where $\lVert \cdot \rVert$ refers to the Euclidean norm (distance). At the iteration time t + 1 the weight $w_i$ is updated by

$$w_i(t+1) = \frac{1}{\#V_i} \sum_{x_k \in V_i} x_k$$

where $\#V_i$ refers to the number of units $x_k$ in $V_i$.

Classification and Regression Trees
- WAID - Weighted Automatic Interaction Detection software
- EU FP4 Project AUTIMP
[Figure: tree diagram. A node for Y with binary splits on X1 (at 5) and on X2 (at 7). Labels: Target variable: categorical or continuous; Predictor variable; The original data; Original data split into two separate parts.]
Only one variable in turn defines the split.

Neural networks for clustering: Self-Organizing Maps, SOM
The basic SOM defines a mapping from the input data space R^n onto a latent space consisting typically of a two-dimensional array of nodes or neurons.

Best methods
- Nearest neighbour (hot-deck) by Euclidean distance metrics
- The best method is actually a system that includes several competitive imputation methods
- The development and evaluation of new imputation methods is closely connected to the software development.

Processing imputation (and editing)
The Banff system of Statistics Canada is a good example of a computerized editing and imputation process. It is a collection of specialized SAS procedures "each of which can be used independently or put together in order to satisfy the edit and imputation requirements of a survey", as stated in the Banff manual (Statistics Canada, 2006).
- successor to the better-known GEIS
- currently Statistics Finland is evaluating Banff

Verifying edits, generating implied edits and extremal points

(Pre-)view of failure rates, fine-tuning the edits

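The idea of previewing failure rates before fine-tuning the edits can be illustrated with a small sketch. This is not Banff code (Banff is a collection of SAS procedures); the records, field names and edit rules below are hypothetical and only show how per-edit failure rates might be tabulated.

```python
# Hypothetical survey records; field names are illustrative only.
records = [
    {"turnover": 120.0, "costs": 80.0, "employees": 4},
    {"turnover": 50.0, "costs": 90.0, "employees": 2},
    {"turnover": -10.0, "costs": 5.0, "employees": 1},
    {"turnover": 200.0, "costs": 150.0, "employees": 0},
]

# Hypothetical edits: each is a named rule that a clean record must satisfy.
edits = {
    "turnover >= 0": lambda r: r["turnover"] >= 0,
    "costs <= turnover": lambda r: r["costs"] <= r["turnover"],
    "employees >= 1": lambda r: r["employees"] >= 1,
}


def failure_rates(records, edits):
    """Return, for each edit, the share of records that fail it."""
    n = len(records)
    return {
        name: sum(1 for r in records if not rule(r)) / n
        for name, rule in edits.items()
    }


rates = failure_rates(records, edits)
for name, rate in rates.items():
    print(f"{name}: {rate:.0%} of records fail")
```

An edit with a very high failure rate is a candidate for review: either the data are poor or the rule is too strict.
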
Three basic outlier detection methods

Identifying which fields require imputation and how to satisfy edits.

Here: logical imputation (one possible value that allows the record to pass the edits).

Nearest neighbour via constructing a k-dimensional tree.

Using user-defined or some of the 20 hard-coded algorithms.

Adjusting and rounding the data so that they add up to specific totals.

Mass imputation procedure (not handled in this presentation)

Plans for future research - intro
Multiple imputation, MI, was not handled here because of the context of the research. Detailed research on imputation variance and careful analysis of datasets with a hierarchical, multilevel nature (note the difference from the previously mentioned hierarchical clustering methods) containing cross-classifications and missingness were also excluded. This leads us to the forthcoming research, which is outlined next.

Multiple imputation
- Donald Rubin, 1987
- Bayesian framework
- very famous and popular family of imputation methods
- rarely used in official statistics
- several imputed datasets are combined to reach an estimate for the imputation variance
- assumptions are strict and there also exist some difficulties with MI variance estimation as discussed by Rao (2005) and Kim et al. (2004)

Fractional imputation
- a sort of mixed real donor and model donor "hot-deck" imputation method for a population divided into imputation cells
- involves using more than one donor for a recipient
- simply: three imputed values might be assigned to each missing value, with each entry allocated a weight of 1/3 of the nonrespondent's original weight
- Kim and Fuller (2004): superior to MI
- designed to reduce the imputation variance while MI gives only a simple way to estimate it
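
The weight-splitting idea in the fractional imputation slide can be sketched as follows. This is a minimal illustration only, not the estimator of Kim and Fuller (2004): the function name, the record layout (`cell`, `weight`, `y`) and the random donor selection are assumptions made for the example, and every imputation cell is assumed to contain at least one respondent.

```python
import random


def fractional_impute(units, n_donors=3, seed=0):
    """Replace each nonrespondent by up to n_donors weighted donor values.

    units: list of dicts with keys "cell", "weight" and "y" (None if missing).
    Returns a list of (y, weight) pairs for use in weighted estimates.
    """
    rng = random.Random(seed)

    # Collect the observed values (potential donors) by imputation cell.
    donors = {}
    for u in units:
        if u["y"] is not None:
            donors.setdefault(u["cell"], []).append(u["y"])

    completed = []
    for u in units:
        if u["y"] is not None:
            completed.append((u["y"], u["weight"]))
            continue
        # Draw donors without replacement from the recipient's own cell
        # (assumes the cell has at least one respondent).
        pool = donors[u["cell"]]
        chosen = rng.sample(pool, min(n_donors, len(pool)))
        # Split the recipient's original weight equally among the donors.
        share = u["weight"] / len(chosen)
        completed.extend((y, share) for y in chosen)
    return completed


# Hypothetical data: two imputation cells, one nonrespondent in each.
units = [
    {"cell": "A", "weight": 10.0, "y": 5.0},
    {"cell": "A", "weight": 10.0, "y": 7.0},
    {"cell": "A", "weight": 10.0, "y": 9.0},
    {"cell": "A", "weight": 12.0, "y": None},
    {"cell": "B", "weight": 20.0, "y": 1.0},
    {"cell": "B", "weight": 20.0, "y": None},
]

completed = fractional_impute(units)
total_weight = sum(w for _, w in completed)
weighted_sum = sum(y * w for y, w in completed)
print("weighted mean:", weighted_sum / total_weight)
```

In cell A the nonrespondent's weight of 12 is split into three entries of weight 4 each, so the total weight of the completed data set equals that of the original sample.
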