BioInformatics (3).ppt
BioInformatics (3).ppt (39 pages), shared by a site member and available to read online.
BioInformatics (3)

Computational Issues
- Data Warehousing: organising biological information into a structured entity (the world's largest distributed DB)
- Function Analysis (Numerical Analysis)
  - Gene Expression Analysis: applying sophisticated data mining and visualisation to understand gene activities within an environment (clustering)
- Integrated Genomic Study: relating structural analysis with functional analysis
- Structure Analysis (Symbolic Analysis)
  - Sequence Alignment: analysing a sequence using comparative methods against existing databases to develop hypotheses concerning relatives (genetics) and functions (dynamic programming and HMM)
  - Structure Prediction: predicting the 3D structure of a protein from its sequence (Inductive LP)

Data Warehousing: Mapping Biologic into Data Logic

Structure Analysis: Alignments & Scores
- Global (e.g. haplotype): ACCACACA / ACACCATA, match pattern ::xx::x: (":" match, "x" mismatch), Score = 5(+1) + 3(-1) = 2
- Suffix (shotgun assembly): ACCACACA / ACACCATA, Score = 3(+1) = 3
- Local (motif): ACCACACA / ACACCATA, Score = 4(+1) = 4

A comparison of the homology search and the motif search for functional interpretation of sequence information.
[Figure labels: Homology Search; Motif Search; New sequence; Retrieval; Similar sequence; Expert knowledge; Sequence interpretation; Sequence database (Primary data); Knowledge acquisition; Motif library (Empirical rules); Expert knowledge; New sequence; Inference; Sequence interpretation]

Search and learning problems in sequence analysis

(Whole genome) Gene Expression Analysis
Quantitative Analysis of Gene Activities (Transcription Profiles)

Gene Expression
[Figure labels: Biotinylated RNA from experiment; GeneChip expression analysis probe array; Image of hybridized probe array; Each probe cell contains millions of copies of a specific oligonucleotide probe; Streptavidin-phycoerythrin conjugate]

(Sub)cellular inhomogeneity (see figure)
- Cell-cycle differences in expression
- XIST RNA localized on inactive X-chromosome

Cluster Analysis
[Figure labels: Protein/protein complex; Genes; DNA regulatory elements]

Functional Analysis via Gene Expression
- Pairwise Measures
- Clustering
- Motif Searching/...

Clustering Algorithms
A clustering algorithm attempts to find natural groups of components (or data) based on some similarity. It also finds the centroid of each group of data. To determine cluster membership, most algorithms evaluate the distance between a point and the cluster centroids. The output from a clustering algorithm is basically a statistical description of the cluster centroids, with the number of components in each cluster.

Clusters of Two-Dimensional Data

Key Terms in Cluster Analysis
- Distance & similarity measures
- Hierarchical & non-hierarchical
- Single/complete/average linkage
- Dendrograms & ordering

Distance Measures: Minkowski Metric

Most Common Minkowski Metrics

An Example
[Figure labels: 4, 3, x, y]
Manhattan distance is called Hamming distance when all features are binary.
Gene Expression Levels Under 17 Conditions (1-High, 0-Low)

Similarity Measures: Correlation Coefficient
[Figure: three panels plotting Expression Level against Time for Gene A and Gene B]

Distance-based Clustering
- Assign a distance measure between data
- Find a partition such that:
  - Distance between objects within a partition (i.e. same cluster) is minimised
  - Distance between objects from different clusters is maximised
- Issues:
  - Requires defining a distance (similarity) measure in situations where it is unclear how to assign it
  - What relative weighting to give to one attribute vs another?
  - Number of possible partitions is super-exponential

Normalized Expression Data

Hierarchical & non-hierarchical

Hierarchical Clustering Techniques

Hierarchical Clustering
Given a set of N items to be clustered, and an NxN distance (or similarity) matrix, the basic process of hierarchical clustering is this:
1. Start by assigning each item to its own cluster, so that if you have N items, ...
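
The three scores on the "Alignments & Scores" slide can be reproduced with a short Python sketch. It assumes ungapped comparisons scored +1 per match and -1 per mismatch, as the slide's arithmetic suggests. The function names are illustrative and do not come from the slides, and the local score uses a gap-free variant of the Smith-Waterman recurrence, which is an assumption about what the slide intends.

```python
def score_pair(x, y, match=1, mismatch=-1):
    """Score one aligned pair of symbols (+1 match, -1 mismatch by default)."""
    return match if x == y else mismatch


def global_score(a, b):
    """Ungapped global score: compare the sequences position by position."""
    if len(a) != len(b):
        raise ValueError("ungapped global comparison needs equal lengths")
    return sum(score_pair(x, y) for x, y in zip(a, b))


def suffix_overlap_score(a, b):
    """Length of the longest suffix of `a` that exactly equals a prefix of `b`."""
    best = 0
    for k in range(1, min(len(a), len(b)) + 1):
        if a[-k:] == b[:k]:
            best = k
    return best


def local_score(a, b):
    """Best ungapped local score: H[i][j] = max(0, H[i-1][j-1] + s(a_i, b_j))."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = max(0, prev[j - 1] + score_pair(x, y))
            best = max(best, curr[j])
        prev = curr
    return best


s1, s2 = "ACCACACA", "ACACCATA"
print(global_score(s1, s2), suffix_overlap_score(s1, s2), local_score(s1, s2))
```

With the slide's two sequences this prints 2, 3 and 4, matching the global, suffix and local scores listed above.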
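
The "Clustering Algorithms" slide describes the output as cluster centroids plus member counts, with membership decided by distance to the centroids. A minimal sketch of that idea follows, assuming Euclidean distance and a handful of hypothetical 2D points and labels; none of the names or data come from the slides.

```python
import math
from collections import defaultdict


def centroids_and_counts(points, labels):
    """Return {label: (centroid, member_count)} for already-labelled points."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    summary = {}
    for label, members in groups.items():
        dim = len(members[0])
        centroid = tuple(
            sum(m[d] for m in members) / len(members) for d in range(dim)
        )
        summary[label] = (centroid, len(members))
    return summary


def nearest_cluster(point, summary):
    """Assign a point to the cluster whose centroid is closest (Euclidean)."""
    return min(summary, key=lambda label: math.dist(point, summary[label][0]))


# Hypothetical data: two visually separate groups in the plane.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0), (10.0, 12.0)]
labels = ["a", "a", "b", "b", "b"]

summary = centroids_and_counts(points, labels)
print(summary)
print(nearest_cluster((2.0, 2.0), summary))
```

`math.dist` needs Python 3.8 or later. Iterative methods such as k-means repeat these two steps (recompute centroids, reassign points) until the assignment stops changing.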
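
The formula on the "Minkowski Metric" slide did not survive extraction. The standard definition is d_r(x, y) = (sum_i |x_i - y_i|^r)^(1/r), where r = 1 gives the Manhattan distance, r = 2 the Euclidean distance, and r tending to infinity the "sup" (maximum) distance. The sketch below assumes that definition. The points differ by 4 and 3 along the two axes, which appears to be the example on the slide, and the binary vectors are hypothetical stand-ins for the 17-condition expression table.

```python
import math


def minkowski(x, y, r):
    """Minkowski distance of order r (r >= 1; r = math.inf gives the sup metric)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if r < 1:
        raise ValueError("r must be at least 1")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(r):
        return float(max(diffs))
    return sum(d ** r for d in diffs) ** (1.0 / r)


x, y = (0, 0), (4, 3)
manhattan = minkowski(x, y, 1)         # |4| + |3|
euclidean = minkowski(x, y, 2)         # sqrt(4^2 + 3^2)
sup = minkowski(x, y, math.inf)        # max(|4|, |3|)

# Hypothetical binary expression profiles (1 = high, 0 = low).
gene_a = [1, 0, 1, 1, 0, 0, 1]
gene_b = [1, 1, 1, 0, 0, 0, 1]
hamming = minkowski(gene_a, gene_b, 1)  # counts the positions that differ

print(manhattan, euclidean, sup, hamming)
```

For binary features each absolute difference is 0 or 1, so the Manhattan sum is just the number of mismatched positions, which is why the slide equates it with the Hamming distance.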
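
The "Correlation Coefficient" slides compare expression profiles by shape instead of magnitude. A minimal sketch using the Pearson correlation coefficient follows; the slides do not say which coefficient they use, so Pearson is an assumption, and the three gene profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two profiles of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (spread_x * spread_y)


# Hypothetical expression levels at five time points.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [10.0, 20.0, 30.0, 40.0, 50.0]  # same trend, ten times the magnitude
gene_c = [5.0, 4.0, 3.0, 2.0, 1.0]       # opposite trend

print(pearson(gene_a, gene_b))  # close to +1
print(pearson(gene_a, gene_c))  # close to -1
print(math.dist(gene_a, gene_b))
```

Gene A and gene B are far apart in Euclidean distance yet perfectly correlated, which illustrates why a correlation-based similarity can group genes that rise and fall together even when their absolute expression levels differ.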
