Extreme Events: Examining the “Tails” of a Distribution

Eric W. Adams, Ph.D.    Professor Samarin Ghosh, Ph.D.
Member ASHRAE

Abstract

Although our engineering training treats all physics as deterministic, we also know that random variation is a normal part of nature. The strength of parts and the loads on parts vary. Unusually low strengths and unusually high loads do occur: for example, a flood or a hurricane in the case of a building or a bridge, or a slug of liquid refrigerant in the case of a compressor. Accidents can occur when such extreme events happen. Failure of a part occurs when the load on the part is greater than its strength. Extreme events happen much more frequently than predicted by theories based on the normal distribution; for this reason, statisticians describe extreme value distributions as “heavy tailed”. In this paper, models of extreme values are discussed for both load and strength.
Modeling examples are given for loads and for the strength of materials, with applications to predicting time to failure and maintenance intervals. Extreme values are a part of our normal engineering lives.

Introduction

Most engineering problems are not, by their nature, completely deterministic. While deterministic physics may govern simple electrical circuits via Ohm’s law, neither the applied voltage nor the resistance is completely deterministic. Even the most basic electrical circuit, such as a light bulb, is subject to variation. Small differences in material properties and manufacturing affect the resistance of the wire in the bulb, even when the circuit is new, and as the circuit ages the resistance varies more. Material properties and age also affect the voltage delivered by a battery powering the circuit. The result is that a nominally deterministic problem has many features of a problem with random variations.

Human factors are another source of seemingly “random” variation. ASHRAE Standard 55 (2010) attempts to define the thermal environmental parameters that lead to comfort for human occupants. This problem is full of variation. First, in the same room environment, occupants will have different levels of clothing and different metabolic rates. Second, experiments (Fanger, 1972) have shown that people in the same environment, with the same clothing, at nominally the same metabolic rate, still do not respond identically to the question “are you too hot or too cold?” In a building, there are always multiple spaces (or zones), and the spaces are not identical, so there is further variation in the comfort of occupants.

To overcome the problem of variation, engineers use “factors of safety” or other constants in expressions derived from “experience”. The notion is that
the deterministic expressions are used for design, and an added “margin” is then applied to account for the unknown variations in the load and strength of the structure. Since failure occurs when load is greater than strength, and the levels of load and strength are not truly deterministic, the question becomes: what is the probability that load is greater than strength?¹ In this question, the mean load and strength are not as important as the extreme values of load and strength.

¹ Is this probability acceptable? Although the consequences of failure will not be discussed here, their importance cannot be overemphasized. The level of analysis and/or the factor of safety used in design must be much larger for events that endanger life than for events that might make us uncomfortable.

Eric W. Adams is Manager, Aeroacoustics, Vibration, and Indoor Air Quality at Carrier Corporation, Syracuse, New York. Samarin Ghosh is Assistant Professor of Biostatistics at Weill Cornell Medical College, NY.

LV-11-C033    270    ASHRAE Transactions
©2011, American Society of Heating, Refrigerating and Air-Conditioning Engineers, Inc. (www.ashrae.org). Published in ASHRAE Transactions, Volume 117, Part 1. For personal use only. Additional reproduction, distribution, or transmission in either print or digital form is not permitted without ASHRAE’s prior written permission.

Unfortunately, classes in basic statistics focus on statistics for the mean and on predicting the main effects of various factors. The distributions learned in basic statistics, such as the Gaussian or normal distribution, are valuable for predicting main effects but are not suitable for predicting extreme values (see O’Connor, 2002).

Consider the thermal comfort problem for an entire building with many spaces. For simplicity, we will ignore human factors and assume that all people react identically to the environment. We will also ignore radiation effects from the walls and through the windows. The “strength” variable in this example is temperature (to include radiation, one might use an operative temperature). The temperature in each zone will be slightly different, and we will model it as random. Zoning, personal control, or other control strategies will certainly affect the standard deviation of the temperature, but will not change the basic fact of variation: all sensors and systems have variation. The “load” variable is the combination of clothing and metabolic rate that determines whether the occupants are comfortable. Consider the case of “too cold”: a person will be too cold if the combination of clothing and metabolism is too small for the given temperature. The statistical problem is to determine how often someone in the building is too cold. It is important to predict the extremes of the distributions: how many people are wearing very light clothing, and how many rooms are much colder than average.

Consider a second problem: is the strength of a beam sufficient to hold a given load when both the beam strength and the load are subject to variation? Consider the charts in Figure 1. In Figure 1a, the load is much less than the strength; in terms of the thermal comfort problem, all people are dressed so that they will not be too cold (ignore the “too hot”
problem). In Figure 1a, the probability of the beam breaking is the small gray shaded area where the two distributions overlap. In that region, load is greater than strength, even though the average load is much smaller than the average strength. In the case of Figure 1a, the probability of failure is very small. In Figure 1b, the strength is not sufficiently larger than the load, so some fraction of the time the load is larger than the strength and failure will occur.

In Figures 1a and 1b, normal distributions were naively assumed for both load and strength. For simplicity, the standard deviation is assumed to be unity, but this assumption can be relaxed without any change in the conclusions. The normal distribution has a very light tail: a very small fraction of the population lies more than 3 standard deviations from the mean. Further, the normal distribution is symmetric, so the probability of an event a certain distance above the mean is equal to the probability of an event the same distance below the mean.

In Figure 1c, the same mean and standard deviation are assumed for both load and strength, but non-normal distributions are used. These distributions are skewed rather than symmetric. The load distribution is assumed to be positively (right) skewed. This occurs physically because many loads cannot be negative, while in practice there is often little reason for the maximum load to be limited; thus the distribution must be right skewed. The strength distribution shown is left skewed, indicating that some items have a much lower strength than others. This is often the case because flaws limit the strength of an item, whereas a completely unflawed (e.g. single crystal) item will have a maximum possible strength. Further, weakness might arise from the natural effects of aging, which tend to create an upper limit for the strength value but no lower limit. The failure rate of the items in Figure 1c is seen to be larger than in Figure 1b, even though the means and standard deviations are identical! The distributions in Figure 1c have “heavy” tails compared with the normal distribution, meaning that a larger fraction of the population lies at the extremes. This is quantified in Figure 1d, where the probability of an event larger than a given value is shown for the distributions in Figures 1b and 1c.

[Figure 1, panels a) to d). Panel d axes: Normalized Strength or Load (horizontal) versus probability (vertical, log scale); curves: Normal and Extreme Value.]
Figure 1: Comparison of load and strength curves. a) strength much greater than load, low failure rate; b) strength only slightly greater than load; c) strength greater than load, with more typical distribution functions; d) probability of an extreme event for the cases in Figures 1b and 1c.

Figure 1d can be used to understand the error made when the normal distribution is used to assess the probability of extreme events. The normal distribution can substantially underpredict the probability of extreme events. For example, at odds of 1 in 1000 ($10^{-3}$ on the vertical scale), the magnitude of the event is about the mean plus three standard deviations if the normal distribution is used. If the extreme value distribution is used, the magnitude of the event is predicted to be six standard deviations above the mean. Alternatively, examine the probability of an event occurring four standard deviations above the mean. For the normal distribution,
the probability is less than $10^{-4}$, whereas for the extreme value distribution the probability is approximately 1 in 100. Since extreme events (such as failures) often have very negative consequences (in contrast to the example of the fraction of people who are too cold), underpredicting the probability of extreme events can lead to financial or, in the case of safety, human catastrophe.

To understand the origin of extreme value theory, consider the problem of “records”: very low or very high values associated with a distribution. If one randomly draws sample load or strength values from the distribution in Figure 1b and records only the highest values, it can be shown mathematically that the distribution of these extreme values will have the shape given in Figure 1c (see O’Connor, 2002, or Rausand and Hoyland, 2004). This process is exactly the same as the problem of predicting failure in engineering. Over the course of time, loads appear on the part. These loads, while nominally deterministic, have a random component. The only value of importance when considering the failure of the part is the highest (or “record”) load in the given time. If one considers equal increments of time and records the highest load in each increment, the problem is exactly analogous to the statistical problem, and the distribution of those record values is given by Figure 1c.

Development

Given the wide range of possible distributions, the problem of describing the extremes of a distribution may appear hopeless. First, it is a relatively simple problem in order statistics to describe the statistics of the minimum or maximum of a random sample of n numbers from a given cumulative distribution function, $F(x)$ (see Hogg, McKean, and Craig, 2005, or Rausand and Hoyland, 2004):

$$T_{(1)} = \min\{T_1, T_2, T_3, \ldots, T_n\} = U_n, \qquad T_{(n)} = \max\{T_1, T_2, T_3, \ldots, T_n\} = V_n \tag{1}$$
$$F_{U_n}(u) = 1 - \left[1 - F(u)\right]^n, \qquad F_{V_n}(u) = \left[F(u)\right]^n$$

Luckily, a number of researchers (e.g. Cramér, 1946; Gumbel, 1958; Pickands, 1975) developed Extreme Value Theory², which showed that in the limit of large samples (large n, exactly what we have in engineering, where there is a very large number of load and strength values in each increment of time), and under a wide range of conditions, only a few models are needed to describe the statistics of the largest and/or smallest value of a distribution. For details of applying extreme value theory to reliability modeling, see Rausand and Hoyland (2004) and O’Connor (2002).

Type I extreme value distributions are used to describe the minimum and maximum (the left and right tails, respectively) of exponential-type distributions, which include most standard distributions such as the normal, lognormal, and exponential. These are usually referred to as the Gumbel distributions of the smallest and largest extreme, for the left and right tails respectively. It is these distributions that are shown in Figure 1c. This distribution is not limited in magnitude, meaning the smallest
extreme can be negative. A Gumbel distribution of the largest extreme would be used to model loads in many situations. The case where the variable may have an upper limit can also be treated by more advanced techniques (see Einmahl and Smeets, 2009). The probability density functions for the type I extreme value distributions are:

$$f(x) = \frac{1}{\theta}\exp\left(-\frac{x-u}{\theta}\right)\exp\left[-\exp\left(-\frac{x-u}{\theta}\right)\right] \quad \text{for the maximum values} \tag{2}$$
$$f(x) = \frac{1}{\theta}\exp\left(\frac{x-u}{\theta}\right)\exp\left[-\exp\left(\frac{x-u}{\theta}\right)\right] \quad \text{for the minimum values}$$

where $u$ is the mode (the most probable value, the “peak” of the probability density function) and $\theta$ is the scale parameter. It can be shown that the mode and scale are related to the mean, $\mu$, and standard deviation, $\sigma$, of the distribution as:

$$\mu = u + 0.577\,\theta \quad \text{for the maximum values}$$
$$\mu = u - 0.577\,\theta \quad \text{for the minimum values}$$
$$\sigma = 1.283\,\theta \quad \text{for both maximum and minimum values}$$

² Two classes of extreme value distributions exist. This paper covers only the first class, using generalized extreme value distributions. The second class of problems uses generalized Pareto distributions to handle exceedance-over-threshold problems.

If the log of the load or strength is extreme value distributed, a type II extreme value distribution is used. Type III extreme value distributions are the limiting case for the minimum of a set of values that are bounded on the left. The Weibull distribution describes such cases, which are typical of the strength of materials (strength must be positive, and one is interested in the smallest strength of the collection of bonds making up a material). Another typical example of a Type III extreme
value distribution is the smallest time to failure of an assembly subject to many possible failure modes. Thus, the Weibull distribution arises naturally in the engineering analysis of extreme values. The Weibull distribution (cumulative distribution function, or CDF) is:

$$F(t) = 1 - \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right] \tag{3}$$

The two constants in the distribution have engineering meaning. $\eta$ is the time by which 63.2% of the population has failed, since $F(\eta) = 1 - e^{-1} \approx 0.632$. This is a time of little practical interest in reliability, because the engineers would all be fired long before that threshold. However, it gives a method of comparing failure rates, since engineering improvements to the product will increase $\eta$. Some products, for example bearings, use a different failure percentage, such as 10%, to define a B(10) life. The other Weibull parameter, $\beta$, is the “slope”, which describes the rate of failure. All slopes less than one have a decreasing