    BioInformatics (3).ppt

    • Resource ID: 378972       File size: 1.45 MB       Pages: 39
    • Format: PPT
    BioInformatics (3)

    Computational Issues

    • Data Warehousing: organising biological information into a structured entity (the world's largest distributed DB)
    • Function Analysis (Numerical Analysis):
        • Gene Expression Analysis: applying sophisticated data mining and visualisation to understand gene activities within an environment (Clustering)
    • Integrated Genomic Study: relating structural analysis with functional analysis
    • Structure Analysis (Symbolic Analysis):
        • Sequence Alignment: analysing a sequence using comparative methods against existing databases to develop hypotheses concerning relatives (genetics) and functions (Dynamic Programming and HMM)
        • Structure Prediction: predicting the 3D structure of a protein from its sequence (Inductive LP)

    Data Warehousing: Mapping Biologic into Data Logic

    Structure Analysis: Alignments & Scores

    Global (e.g. haplotype):
        ACCACACA
        ::xx::x:
        ACACCATA
        Score = 5(+1) + 3(-1) = 2
    Suffix (shotgun assembly): ACCACACA : ACACCATA, Score = 3(+1) = 3
    Local (motif): ACCACACA : ACACCATA, Score = 4(+1) = 4

    [Figure: A comparison of the homology search and the motif search for functional interpretation of sequence information. Labels: Homology Search; Motif Search; New sequence; Retrieval; Similar sequence; Expert knowledge; Sequence interpretation; Sequence database (Primary data); Knowledge acquisition; Motif library (Empirical rules); Inference.]

    Search and learning problems in sequence analysis

    (Whole genome) Gene Expression Analysis

    Quantitative Analysis of Gene Activities (Transcription Profiles)

    Gene Expression

    [Figure labels: Biotinylated RNA from experiment; GeneChip expression analysis probe array; Image of hybridized probe array; Each probe cell contains millions of copies of a specific oligonucleotide probe; Streptavidin-phycoerythrin conjugate.]

    (Sub)cellular inhomogeneity (see figure)

    • Cell-cycle differences in expression
    • XIST RNA localized on inactive X-chromosome

    Cluster Analysis

    [Figure labels: Protein/protein complex; Genes; DNA regulatory elements.]

    Functional Analysis via Gene Expression

    [Figure labels: Pairwise Measures; Clustering; Motif Searching/.]

    Clustering Algorithms

    A clustering algorithm attempts to find natural groups of components (or data) based on some similarity. It also finds the centroid of each group of data. To determine cluster membership, most algorithms evaluate the distance between a point and the cluster centroids. The output of a clustering algorithm is essentially a statistical description of the cluster centroids, together with the number of components in each cluster.

    Clusters of Two-Dimensional Data

    Key Terms in Cluster Analysis

    • Distance & similarity measures
    • Hierarchical & non-hierarchical
    • Single/complete/average linkage
    • Dendrograms & ordering

    Distance Measures: Minkowski Metric

    Most Common Minkowski Metrics

    An Example

    [Figure: two points x and y, with side labels 4 and 3.]

    Manhattan distance is called Hamming distance when all features are binary.

    [Table: Gene Expression Levels Under 17 Conditions (1 = High, 0 = Low).]

    Similarity Measures: Correlation Coefficient

    [Figure: three panels plotting Expression Level against Time for Gene A and Gene B.]

    Distance-based Clustering

    • Assign a distance measure between data.
    • Find a partition such that:
        • the distance between objects within a partition (i.e. the same cluster) is minimised;
        • the distance between objects from different clusters is maximised.
    • Issues:
        • It requires defining a distance (similarity) measure in situations where it is unclear how to assign one.
        • What relative weighting should one attribute be given versus another?
        • The number of possible partitions is super-exponential.

    Normalized Expression Data

    Hierarchical & non-hierarchical

    Hierarchical Clustering Techniques

    Hierarchical Clustering

    Given a set of N items to be clustered, and an N×N distance (or similarity) matrix, the basic process of hierarchical clustering is this:

    1. Start by assigning each item to its own cluster, so that if you have N items, you now have N clusters, each containing just one item. Let the distances (similarities) between the clusters equal the distances (similarities) between the items they contain.
    2. Find the closest (most similar) pair of clusters and merge them into a single cluster, so that you now have one cluster fewer.
    3. Compute the distances (similarities) between the new cluster and each of the old clusters.
    4. Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.

    The distance between two clusters is defined as the distance between:

    • their nearest neighbors (Single-Link Method);
    • their furthest neighbors (Complete-Link Method);
    • their centroids; or
    • all cross-cluster pairs, averaged.

    Computing Distances

    • Single-link clustering (also called the connectedness or minimum method): the distance between one cluster and another is taken to be the shortest distance from any member of one cluster to any member of the other cluster. If the data consist of similarities, the similarity between one cluster and another is taken to be the greatest similarity from any member of one cluster to any member of the other cluster.
    • Complete-link clustering (also called the diameter or maximum method): the distance between one cluster and another is taken to be the longest distance from any member of one cluster to any member of the other cluster.
    • Average-link clustering: the distance between one cluster and another is taken to be the average distance from any member of one cluster to any member of the other cluster.

    [Figure: Single-Link Method. Points a, b, c, d; Euclidean distance matrix; merge steps (1), (2), (3).]

    [Figure: Complete-Link Method. Points a, b, c, d; Euclidean distance matrix; merge steps (1), (2), (3).]

    [Figure: Compare Dendrograms. Single-Link and Complete-Link dendrograms on an axis marked 0, 2, 4, 6.]

    Ordered dendrograms

    There are 2^(n-1) linear orderings of n elements (n = number of genes or conditions), so maximizing adjacent similarity is impractical. Order instead by average expression level, time of maximum induction, or chromosome positioning (Eisen98).

    Which clustering methods do you suggest for the following two-dimensional data?

    [Figure from Nadler and Smith, Pattern Recognition Engineering, 1993.]

    Problems of Hierarchical Clustering

    • It is more concerned with the complete tree structure than with the optimal number of clusters.
    • There is no possibility of correcting for a poor initial partition.
    • Similarity and distance measures rarely have strict numerical significance.

    Normalized Expression Data

    Tavazoie et al. 1999 (http://arep.med.harvard.edu)

    Non-hierarchical clustering

    Clustering by K-means

    • Given a set S of N p-dimensional vectors, with no prior knowledge about the set, the K-means clustering algorithm forms K disjoint nonempty subsets such that each subset locally minimizes some measure of dissimilarity. The aim is an optimal overall dissimilarity across all subsets, although in general the algorithm is only guaranteed to converge to a local optimum.
    • The K-means algorithm has time complexity O(RKN), where K is the number of desired clusters and R is the number of iterations needed to converge.
    • Using the Euclidean distance between the coordinates of any two genes in the space reflects ignorance of a more biologically relevant measure of distance.
    • K-means is an unsupervised, iterative algorithm that minimizes the within-cluster sum of squared distances from the cluster mean.
    • The first cluster center is chosen as the centroid of the entire data set, and subsequent centers are chosen by finding the data point farthest from the centers already chosen.
    • 200-400 iterations.

    K-Means Clustering Algorithm

    1) Select an initial partition of k clusters.
    2) Assign each object to the cluster with the closest center.
    3) Compute the new centers of the clusters.
    4) Repeat steps 2 and 3 until no object changes cluster.

    [Figure: Gene 1 and Gene 2 plotted against axes Time-point 1, Time-point 2 and Time-point 3.]

    Normalized Expression Data from microarrays

    [Table: rows Gene 1 ... Gene N; columns T1, T2, T3.]

    Representation of expression data

    [Figure label: dij.]

    Identifying prevalent expression patterns (gene clusters)

    [Figure: axes Time-point 1, Time-point 2 and Time-point 3; three panels of Normalized Expression against Time-point.]

    Evaluate Cluster contents

    [Figure: Genes by MIPS functional category. Labels: Glycolysis; Nuclear Organization; Ribosome; Translation; Unknown.]
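The three scores on the "Alignments & Scores" slide can be checked with a short sketch. It assumes ungapped comparison with +1 per match and -1 per mismatch, which is what the slide's arithmetic implies. The function names are illustrative and do not come from the slides, and a practical aligner would use dynamic programming with gap penalties rather than this brute-force scan.

```python
def ungapped_score(a, b, match=1, mismatch=-1):
    """Score two equal-length strings position by position."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def suffix_prefix_score(a, b, match=1, mismatch=-1):
    """Best score of a suffix of `a` laid over an equal-length prefix of `b`."""
    best = 0
    for k in range(1, min(len(a), len(b)) + 1):
        best = max(best, ungapped_score(a[-k:], b[:k], match, mismatch))
    return best


def local_score(a, b, match=1, mismatch=-1):
    """Best-scoring ungapped segment over every diagonal (offset) of a vs. b."""
    best = 0
    for offset in range(-(len(b) - 1), len(a)):
        running = 0
        for i in range(len(a)):
            j = i - offset
            if 0 <= j < len(b):
                step = match if a[i] == b[j] else mismatch
                running = max(0, running + step)  # restart when the run goes negative
                best = max(best, running)
    return best


s, t = "ACCACACA", "ACACCATA"
print(ungapped_score(s, t))       # 2  (global)
print(suffix_prefix_score(s, t))  # 3  (suffix overlap, as in shotgun assembly)
print(local_score(s, t))          # 4  (local motif)
```

The same pair of sequences yields a different score under each mode because each mode restricts which part of the two strings must be covered by the comparison.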

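The Minkowski metric and the three linkage rules from the hierarchical clustering slides can be sketched together. The formula used here, d_r(x, y) = (sum over i of |x_i - y_i|^r)^(1/r), is the standard definition and is an assumption, since the deck's own equation did not survive in the text. `minkowski`, `LINKAGE` and `agglomerate` are illustrative names, and the closest pair is found by brute force rather than by updating a distance matrix.

```python
from itertools import combinations


def minkowski(x, y, r=2):
    """Minkowski distance of order r (r=1 is Manhattan, r=2 is Euclidean)."""
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)


LINKAGE = {
    "single": min,                            # nearest cross-cluster pair
    "complete": max,                          # furthest cross-cluster pair
    "average": lambda ds: sum(ds) / len(ds),  # mean over all cross-cluster pairs
}


def agglomerate(points, linkage="single", r=2):
    """Merge clusters bottom-up; return [(cluster_a, cluster_b, distance), ...]."""
    combine = LINKAGE[linkage]
    clusters = [(i,) for i in range(len(points))]  # step 1: one item per cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for ca, cb in combinations(clusters, 2):   # step 2: find the closest pair
            d = combine([minkowski(points[i], points[j], r) for i in ca for j in cb])
            if best is None or d < best[0]:
                best = (d, ca, cb)
        d, ca, cb = best
        clusters = [c for c in clusters if c not in (ca, cb)] + [ca + cb]
        merges.append((ca, cb, d))                 # steps 3-4: distances recomputed next pass
    return merges


points = [(0.0,), (1.0,), (3.0,), (7.0,)]
for name in LINKAGE:
    print(name, [round(d, 3) for _, _, d in agglomerate(points, name)])
```

The merge distances are the heights at which branches join in the corresponding dendrogram, so printing them for each linkage rule shows how single-link and complete-link trees diverge on the same data.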

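The K-means procedure described in the slides can be sketched as follows, assuming Euclidean distance and Python 3.8+ for `math.dist`. The initialisation follows the deck's description (first center at the centroid of the whole data set, later centers at the farthest data point); reading "farthest from the centers already chosen" as "largest distance to its nearest chosen center" is an assumption, as are the function name and the `max_iter` cap of 400.

```python
import math


def kmeans(points, k, max_iter=400):
    """Return (labels, centers) for a list of equal-length numeric tuples."""
    dim = len(points[0])

    def mean(pts):
        return tuple(sum(p[d] for p in pts) / len(pts) for d in range(dim))

    # Initialisation: centroid of all data, then repeatedly the farthest point
    # (assumed here to mean farthest from its nearest already-chosen center).
    centers = [mean(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))

    labels = None
    for _ in range(max_iter):
        # Step 2: assign each object to the cluster with the closest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:  # step 4: stop when no object changes cluster
            break
        labels = new_labels
        # Step 3: recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = mean(members)
    return labels, centers


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(data, k=2)
print(labels)
print(centers)
```

Each pass costs roughly K distance evaluations per point, which matches the O(RKN) complexity quoted in the slides for R passes over N points.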