Introduction to Bayesian inference and computation for social science data analysis
Introduction to Bayesian inference and computation for social science data analysis
Nicky Best
Imperial College, London
www.bias-project.org.uk

Outline
- Overview of Bayesian methods
- Illustration of conjugate Bayesian inference
- MCMC methods
- Examples illustrating: analysis using informative priors; hierarchical priors, meta-analysis and evidence synthesis; adjusting for data quality; model uncertainty
- Discussion

Overview of Bayesian inference and computation

Overview of Bayesian methods
- Bayesian methods have been widely applied in many areas: medicine / epidemiology / genetics, ecology / environmental sciences, finance, archaeology, political and social sciences.
- Motivations for adopting the Bayesian approach vary: for some it is a natural and coherent way of thinking about science and learning; for others it is a pragmatic choice that is suitable for the problem in hand.

Overview of Bayesian methods
- Medical context: FDA draft guidance (www.fda.gov/cdrh/meetings/072706-bayesian.html): "Bayesian statistics ... provides a coherent method for learning from evidence as it accumulates."
- Evidence can accumulate in various ways: sequentially; through measurement of many similar units (individuals, centres, sub-groups, areas, periods); through measurement of different aspects of a problem.
- Evidence can take different forms: data; expert judgement.

Overview of Bayesian methods
- The Bayesian approach also provides a formal framework for propagating uncertainty.
- It is well suited to building complex models by linking together multiple sub-models.
- Estimates and uncertainty intervals can be obtained for any parameter, function of parameters or predictive quantity of interest.
- Bayesian inference doesn't rely on asymptotics or analytic approximations.
- An arbitrarily wide range of models can be handled using the same inferential framework.
- The focus is on specifying realistic models, not on choosing an analytically tractable approximation.

Bayesian inference
- Distinguish between x, the known quantities (data), and θ, the unknown quantities (e.g. regression coefficients, future outcomes, missing observations).
- Fundamental idea: use probability distributions to represent uncertainty about unknowns.
- Likelihood model for the data: p(x | θ).
- Prior distribution representing current uncertainty about the unknowns: p(θ).
- Applying Bayes theorem gives the posterior distribution: p(θ | x) = p(x | θ) p(θ) / p(x) ∝ p(x | θ) p(θ).
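As a toy numerical illustration of this update rule (not an example from the slides), the posterior can be evaluated on a discrete grid of θ values; the grid, prior and data below are made up purely for illustration:

```python
import numpy as np

# Toy illustration of Bayes theorem on a grid (hypothetical example):
# posterior is proportional to likelihood times prior.
theta = np.linspace(0.01, 0.99, 99)              # grid of candidate values for the unknown θ
prior = np.full_like(theta, 1.0 / theta.size)    # flat prior over the grid (an assumption)

n, x = 10, 7                                     # hypothetical data: 7 "successes" in 10 trials
likelihood = theta**x * (1.0 - theta)**(n - x)

unnormalised = likelihood * prior
posterior = unnormalised / unnormalised.sum()    # the sum plays the role of p(x)

print("posterior mean of θ:", float(np.sum(theta * posterior)))
```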
Conjugate Bayesian inference
- Example: election poll (from Franklin, 2004*).
- Imagine an election campaign where (for simplicity) we have just a Government/Opposition vote choice.
- We enter the campaign with a prior distribution for the proportion supporting Government. This is p(θ).
- As the campaign begins, we get polling data. How should we change our estimate of Government's support?

* Adapted from Charles Franklin's Essex Summer School course slides: http://www.polisci.wisc.edu/users/franklin/Content/Essex/Lecs/BayesLec01p6up.pdf

Conjugate Bayesian inference
Data and likelihood
- Each poll consists of n voters, x of whom say they will vote for Government and n - x of whom will vote for the Opposition.
- If we assume we have no information to distinguish voters in their probability of supporting Government, then we have a binomial distribution for x:
  p(x | θ) = (n choose x) θ^x (1 - θ)^(n - x)
- This binomial distribution is the likelihood p(x | θ).

Conjugate Bayesian inference
Prior
- We need to specify a prior that expresses our uncertainty about the election (before it begins) and conforms to the nature of the parameter θ, i.e. is continuous but bounded between 0 and 1.
- A convenient choice is the Beta distribution: p(θ) ∝ θ^(a - 1) (1 - θ)^(b - 1).

Conjugate Bayesian inference
- The Beta(a, b) distribution can take a variety of shapes depending on its two parameters a and b.
- Mean of the Beta(a, b) distribution: a / (a + b).
- Variance of the Beta(a, b) distribution: ab / [(a + b)² (a + b + 1)].

Conjugate Bayesian inference
Posterior
- Combining a Beta prior with the binomial likelihood gives the posterior distribution:
  p(θ | x) ∝ θ^(x + a - 1) (1 - θ)^(n - x + b - 1), i.e. θ | x ~ Beta(x + a, n - x + b).
- When the prior and posterior come from the same family, the prior is said to be conjugate to the likelihood.
- This occurs when the prior and likelihood have the same kernel.
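The next slide turns a prior guess (mean 0.5, standard deviation of about 0.05) into a Beta(50, 50) prior. A small helper for that moment matching, using the mean and variance formulas above, might look like this (the helper function is hypothetical, not from the slides):

```python
def beta_params_from_moments(mean, sd):
    """Moment-match a Beta(a, b) distribution to a given mean and standard deviation,
    using mean = a / (a + b) and variance = ab / ((a + b)^2 (a + b + 1))."""
    k = mean * (1.0 - mean) / sd**2 - 1.0   # k = a + b
    return mean * k, (1.0 - mean) * k       # returns (a, b)

# Prior belief about Government support: centred at 0.5 with sd of roughly 0.05
a, b = beta_params_from_moments(0.5, 0.05)
print(a, b)   # about (49.5, 49.5), i.e. approximately a Beta(50, 50) prior
```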
Conjugate Bayesian inference
- Suppose I believe that Government only has the support of half the population, and I think that estimate has a standard deviation of about 0.05.
- This is approximately a Beta(50, 50) distribution.
- We observe a poll with 200 respondents, 120 of whom (60%) say they will vote for Government.
- This produces a posterior which is a Beta(120 + 50, 80 + 50) = Beta(170, 130) distribution.

Conjugate Bayesian inference
- Prior mean: E(θ) = 50/100 = 0.5.
- Posterior mean: E(θ | x, n) = 170/300 = 0.57.
- Posterior SD: √Var(θ | x, n) = 0.029.
- The frequentist estimate is based only on the data: the MLE is x/n = 120/200 = 0.60.
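These numbers can be checked with a few lines of code (a sketch assuming scipy is available; the prior and poll counts are those given above):

```python
from scipy import stats

a, b = 50, 50        # Beta(50, 50) prior
n, x = 200, 120      # poll: 120 of 200 respondents support Government

posterior = stats.beta(a + x, b + n - x)    # conjugate update: Beta(170, 130)

print("posterior mean:", posterior.mean())  # ~0.567
print("posterior SD:  ", posterior.std())   # ~0.029
print("MLE x/n:       ", x / n)             # 0.60, the frequentist estimate
```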
Conjugate Bayesian inference
A harder problem
- What is the probability that Government wins?
- It is not 0.57 or 0.60: those are expected vote shares, not the probability of winning.
- How to answer this? Frequentists have a hard time with this one. They can obtain a p-value for testing H0: θ ≤ 0.5, but this isn't the same as the probability that Government wins (it's actually the probability of observing data more extreme than 120 out of 200 if H0 is true).
- It is easy from the Bayesian perspective: calculate Pr(θ > 0.5 | x, n), the posterior probability that θ > 0.5.
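With the Beta(170, 130) posterior this is simply a Beta tail area (a sketch, again assuming scipy):

```python
from scipy import stats

posterior = stats.beta(170, 130)

# Posterior probability that Government has majority support
print("Pr(θ > 0.5 | x, n):", posterior.sf(0.5))   # sf(0.5) = 1 - cdf(0.5), roughly 0.99
```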
Bayesian computation
- All Bayesian inference is based on the posterior distribution.
- Summarising posterior distributions involves integration, e.g. E(θ | x) = ∫ θ p(θ | x) dθ.
- Except for conjugate models, the integrals are usually analytically intractable.
- Use Monte Carlo (simulation) integration (MCMC).

Bayesian computation
- Suppose we didn't know how to analytically integrate the Beta(170, 130) posterior ... but we do know how to simulate from a Beta distribution.
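For example, a minimal Monte Carlo version of the posterior summaries (a sketch assuming numpy; the number of draws is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(42)
draws = rng.beta(170, 130, size=100_000)    # simulate from the Beta(170, 130) posterior

# Monte Carlo estimates of posterior summaries
print("mean:         ", draws.mean())                        # ~0.567
print("SD:           ", draws.std())                         # ~0.029
print("95% interval: ", np.percentile(draws, [2.5, 97.5]))
print("Pr(θ > 0.5):  ", (draws > 0.5).mean())                # ~0.99
```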
Bayesian computation
- The samples can also be used to estimate posterior tail-area probabilities, percentiles, variances etc.
- It is difficult to generate independent samples when the posterior is complex and high dimensional.
- Instead, generate dependent samples from a Markov chain having p(θ | x) as its stationary distribution: Markov chain Monte Carlo (MCMC).
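As a toy illustration of the idea (the algorithm below is a generic random-walk Metropolis sampler, not one given in the slides), a Markov chain can be built whose stationary distribution is the Beta(170, 130) posterior, using only its unnormalised kernel; the proposal scale and chain length are arbitrary choices:

```python
import numpy as np

def log_kernel(theta):
    """Unnormalised log posterior kernel of Beta(170, 130): theta^169 * (1 - theta)^129."""
    if theta <= 0.0 or theta >= 1.0:
        return -np.inf
    return 169.0 * np.log(theta) + 129.0 * np.log(1.0 - theta)

rng = np.random.default_rng(0)
theta, chain = 0.5, []
for _ in range(20_000):
    proposal = theta + rng.normal(0.0, 0.05)        # symmetric random-walk proposal
    log_accept = log_kernel(proposal) - log_kernel(theta)
    if np.log(rng.uniform()) < log_accept:          # Metropolis accept/reject step
        theta = proposal
    chain.append(theta)

burned = np.array(chain[2_000:])                    # discard burn-in
print("MCMC posterior mean:", burned.mean())        # ~0.567
print("MCMC Pr(θ > 0.5):   ", (burned > 0.5).mean())
```

General-purpose MCMC software automates exactly this kind of construction for far more complicated posteriors.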
Illustrative Examples

Borrowing strength
- Bayesian learning: borrowing "strength" (precision) from other sources of information.
- An informative prior is one such source.
- "Today's posterior is tomorrow's prior."
- The relevance of the prior information to the current study must be justified.

Informative priors
- Example 1: Western and Jackman (1994)*.
- An example of regression analysis in comparative research: what explains cross-national variation in union density?
- Union density is defined as the percentage of the work force who belong to a labour union.
- Two issues:
  - Philosophical: the data represent all available observations from the population, so a conventional (frequentist) analysis based on the long-run behaviour of a repeatable data mechanism is not appropriate.
  - Practical: a small, collinear dataset yields imprecise estimates of the regression effects.
* Slides adapted from Jeff Grynaviski: http://home.uchicago.edu/grynav/bayes/abs03.htm

Informative priors
- Competing theories:
  - Wallerstein: union density depends on the size of the civilian labour force (LabF).
  - Stephens: union density depends on industrial concentration (IndC).
  - Note: these two predictors correlate at -0.92.
- Control variable: presence of a left-wing government (LeftG).
- Sample: n = 20 countries with a continuous history of democracy since World War II.
- Fit a linear regression model to compare the theories:
  union density_i ~ N(μ_i, σ²)
  μ_i = b0 + b1 LeftG_i + b2 LabF_i + b3 IndC_i
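A minimal sketch of how such a model can be fitted with an informative Gaussian prior on the coefficients, assuming for simplicity a known error variance; the data and prior values below are synthetic placeholders, not Western and Jackman's actual dataset or priors. With prior b ~ N(m0, V0) and likelihood y ~ N(Xb, σ²I), the posterior for b is N(mn, Vn) with Vn = (V0⁻¹ + XᵀX/σ²)⁻¹ and mn = Vn(V0⁻¹m0 + Xᵀy/σ²):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the 20-country dataset (placeholder values, not the real data)
n = 20
X = np.column_stack([np.ones(n),            # intercept
                     rng.normal(size=n),    # LeftG (placeholder)
                     rng.normal(size=n),    # LabF  (placeholder)
                     rng.normal(size=n)])   # IndC  (placeholder)
y = X @ np.array([50.0, 5.0, 3.0, 2.0]) + rng.normal(scale=5.0, size=n)

sigma2 = 25.0                               # error variance assumed known, for simplicity

# Informative Gaussian prior on b = (b0, b1, b2, b3): hypothetical means and variances
m0 = np.array([50.0, 0.0, 5.0, 0.0])
V0 = np.diag([100.0, 25.0, 4.0, 4.0])

# Conjugate update: posterior for b is N(mn, Vn)
Vn = np.linalg.inv(np.linalg.inv(V0) + X.T @ X / sigma2)
mn = Vn @ (np.linalg.inv(V0) @ m0 + X.T @ y / sigma2)

for name, mean, sd in zip(["b0", "b1 (LeftG)", "b2 (LabF)", "b3 (IndC)"],
                          mn, np.sqrt(np.diag(Vn))):
    print(f"{name}: posterior mean {mean:.2f}, sd {sd:.2f}")
```

With predictors as collinear as LabF and IndC, the likelihood alone says little about b2 and b3 separately; the informative prior is what restores precision, which is the point of this example.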