Designation: D7915 - 14

Standard Practice for
Application of Generalized Extreme Studentized Deviate (GESD) Technique to Simultaneously Identify Multiple Outliers in a Data Set¹

This standard is issued under the fixed designation D7915; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.

¹ This practice is under the jurisdiction of ASTM Committee D02 on Petroleum Products, Liquid Fuels, and Lubricants and is the direct responsibility of Subcommittee D02.94 on Coordinating Subcommittee on Quality Assurance and Statistics. Current edition approved May 1, 2014. Published June 2014. DOI: 10.1520/D7915-14.

Copyright ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States

1. Scope

1.1 This practice provides a step by step procedure for the application of the Generalized Extreme Studentized Deviate (GESD) Many-Outlier Procedure to simultaneously identify multiple outliers in a data set. (See Bibliography.)

1.2 This practice is applicable to a data set comprising observations that are represented on a continuous numerical scale.

1.3 This practice is applicable to a data set comprising a minimum of six observations.

1.4 This practice is applicable to a data set where the normal (Gaussian) model is reasonably adequate for the distributional representation of the observations in the data set.

1.5 The probability of false identification of outliers associated with the decision criteria set by this practice is 0.01.

1.6 It is recommended that the execution of this practice be conducted under the guidance of personnel familiar with the statistical principles and assumptions associated with the GESD technique.

1.7 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory limitations prior to use.

2. Terminology

2.1 Definitions of Terms Specific to This Standard:

2.1.1 outlier, n: an observation (or a subset of observations) which appears to be inconsistent with the remainder of the data set.

3. Significance and Use

3.1 The GESD procedure can be used to simultaneously identify up to a pre-determined number of outliers (r) in a data set, without having to pre-examine the data set and make a priori decisions as to the location and number of potential outliers.

3.2 The GESD procedure is robust to masking. Masking describes the phenomenon where the existence of multiple outliers can prevent an outlier identification procedure from declaring any of the observations in a data set to be outliers.

3.3 The GESD procedure is automation-friendly, and hence can easily be programmed as automated computer algorithms.

4. Procedure

4.1 Specify the maximum number of outliers (r) in a data set to be identified.

4.1.1 The recommended maximum number of outliers (r) by this practice is two (2) for data sets with six to twelve observations.

4.1.2 For data sets with more than twelve observations, the recommended maximum number of outliers (r) is the lesser of ten or 20 %.

4.1.3 The recommended values for r in 4.1.1 and 4.1.2 are not intended to be mandatory. Users can specify other values based on their specific needs.

4.2 Compute test statistic T for each observation in the initial starting data set (DTS_0) as follows:

$$T = \frac{|x - \bar{x}|}{s} \qquad (1)$$

where:
$x$ = an observation in the data set,
$\bar{x}$ = average calculated using all observations in the data set, and
$s$ = sample standard deviation calculated using all observations in the data set.

4.3 Remove the observation in the data set with the largest absolute magnitude of the test statistic T and form a reduced data set (DTS_i), where i = number of observations removed from the initial data set.

4.4 Re-calculate T for all observations in the reduced data set from 4.3.

4.5 Repeat steps 4.3 to 4.4 until r number of observations have been removed from the initial data set. That is, until calculation of all T values for all observations in the reduced data set DTS_r has been completed.

4.6 Compare the maximum T computed in each data set (DTS_0 to DTS_r) to a critical value λ_critical associated with the data set DTS_i, where λ is chosen based on a false identification probability of 0.01. See Table A1.1 in Annex A1 for values applicable to different data set sizes.

4.7 Identify the data set DTS_m for which the maximum T exceeds λ_critical, and m (number of observations removed from the initial data set DTS_0) is the largest value (0 ...

Each row lists an observation x and its T value (T_i) in every data set DTS_i that still contains it; "-" marks a data set from which the observation has been removed.

x               T_0    T_1    T_2    T_3    T_4    T_5    T_6
35.0           0.30   0.44   0.64   0.97   0.94   1.05   1.16
36.6           0.05   0.04   0.17   0.37   0.32   0.40   0.49
34.7           0.37   0.52   0.73   1.08   1.06   1.17   1.29
36.2           0.04   0.14   0.29   0.52   0.48   0.56   0.66
37.0           0.14   0.06   0.05   0.22   0.17   0.24   0.32
25.3           2.44   2.85      -      -      -      -      -
37.2           0.18   0.11   0.00   0.15   0.09   0.16   0.24
41.3           1.09   1.12   1.20   1.38   1.50   1.49   1.49
26.0           2.29   2.68   3.27      -      -      -      -
24.6           2.60      -      -      -      -      -      -
33.5           0.63   0.81   1.08   1.53   1.52   1.65      -
35.5           0.19   0.32   0.49   0.78   0.75   0.85   0.95
35.4           0.21   0.34   0.52   0.82   0.79   0.89   1.00
39.9           0.78   0.78   0.79   0.86   0.96   0.93   0.90
39.2           0.62   0.60   0.59   0.60   0.69   0.65   0.60
36.6           0.05   0.04   0.17   0.37   0.32   0.40   0.49
37.2           0.18   0.11   0.00   0.15   0.09   0.16   0.24
33.2           0.70   0.89   1.16   1.64   1.64      -      -
34.0           0.52   0.69   0.93   1.34   1.33   1.45   1.59
35.7           0.15   0.27   0.43   0.71   0.67   0.77   0.87
39.2           0.62   0.60   0.59   0.60   0.69   0.65   0.60
42.1           1.26   1.32   1.43   1.68      -      -      -
35.7           0.15   0.27   0.43   0.71   0.67   0.77   0.87
40.2           0.84   0.85   0.88   0.97   1.08   1.05   1.02
36.6           0.05   0.04   0.17   0.37   0.32   0.40   0.49
41.1           1.04   1.07   1.14   1.31   1.43   1.41   1.40
41.1           1.04   1.07   1.14   1.31   1.43   1.41   1.40
39.1           0.60   0.58   0.56   0.56   0.65   0.61   0.56
40.6           0.93   0.95   1.00   1.12   1.23   1.21   1.19
41.3           1.09   1.12   1.20   1.38   1.50   1.49   1.49

Data set      DTS_0  DTS_1  DTS_2  DTS_3  DTS_4  DTS_5  DTS_6
average       36.37  36.78  37.19  37.60  37.43  37.60  37.77
std dev        4.54   4.02   3.42   2.68   2.58   2.48   2.38
T_max          2.60   2.85   3.27   1.68   1.64   1.65   1.59
λ_critical     3.24   3.22   3.20   3.18   3.16   3.14   3.11
m                 0      1      2      3      4      5      6

5.2.4 From 4.7, the largest m value for which the maximum T value of the data set DTS_m exceeds λ_critical is 2 (s...
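
The procedure in 4.2 to 4.7 can be sketched in a few lines of Python. This is an illustrative sketch, not part of the standard: the function name gesd_outliers and its arguments are hypothetical, the critical values are passed in by the caller (here the λ_critical row of the worked example, since Table A1.1 is not reproduced in this text), and it assumes that the observations carrying the maximum T in DTS_0 through DTS_m are the ones flagged as outliers, which is the usual reading of GESD but is stated in a part of the practice not included above.

```python
from statistics import mean, stdev


def gesd_outliers(data, critical_values):
    """Sketch of steps 4.2 to 4.7.

    critical_values[i] is the critical value for DTS_i (i = 0..r) and must be
    supplied by the caller. Returns (m, flagged), or (None, []) when no
    maximum T exceeds its critical value.
    """
    remaining = list(data)
    removed = []  # observation with the largest T in DTS_0, DTS_1, ...
    t_max = []    # largest T found in each DTS_i
    for _ in critical_values:
        x_bar, s = mean(remaining), stdev(remaining)
        # 4.2 / 4.4: T = |x - x_bar| / s for every remaining observation
        t_values = [abs(x - x_bar) / s for x in remaining]
        idx = max(range(len(remaining)), key=t_values.__getitem__)
        t_max.append(t_values[idx])
        # 4.3: drop that observation to form the next reduced data set
        # (the pop after DTS_r is not needed by the procedure but is harmless)
        removed.append(remaining.pop(idx))
    # 4.6 / 4.7: largest m whose maximum T exceeds its critical value
    exceed = [i for i, (t, lam) in enumerate(zip(t_max, critical_values)) if t > lam]
    if not exceed:
        return None, []
    m = max(exceed)
    # Assumption: observations removed in DTS_0..DTS_m are the outliers
    return m, removed[: m + 1]


# The 30 observations and the critical values from the worked example table
DATA = [35.0, 36.6, 34.7, 36.2, 37.0, 25.3, 37.2, 41.3, 26.0, 24.6,
        33.5, 35.5, 35.4, 39.9, 39.2, 36.6, 37.2, 33.2, 34.0, 35.7,
        39.2, 42.1, 35.7, 40.2, 36.6, 41.1, 41.1, 39.1, 40.6, 41.3]
LAMBDA_CRITICAL = [3.24, 3.22, 3.20, 3.18, 3.16, 3.14, 3.11]

m, flagged = gesd_outliers(DATA, LAMBDA_CRITICAL)
print(m, flagged)  # 2 [24.6, 25.3, 26.0]
```

Run on the example data, the sketch gives m = 2, in line with 5.2.4. Note that the maximum T in DTS_0 and DTS_1 (2.60 and 2.85) does not exceed the critical value on its own; only DTS_2 does (3.27 against 3.20), which illustrates the masking effect described in 3.2.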