Historical Perspective The Relational Model revolutionized .ppt
Historical Perspective

- The Relational Model revolutionized transaction processing systems
  - DBMS gave access to the data stored
  - OLTPs are good at putting data into databases
- The data explosion
  - Increase in use of electronic data gathering devices, e.g. point-of-sale, remote sensing devices etc.
  - Data storage became easier and cheaper with increasing computing power

Problems

- DBMS gave access to the data stored but no analysis of data
- Analysis required to unearth the hidden relationships within the data, i.e. for decision support
- Size of databases has increased, e.g. VLDBs; need automated techniques for analysis
  as they have grown beyond manual extraction

Obstacles

- The typical scientific user knew nothing of commercial business applications
- The business database programmers knew nothing of massively parallel principles
- The solution was for database software producers to create easy-to-use tools and form strategic relationships with hardware manufacturers

What is data mining?

- "the non-trivial extraction of implicit, previously unknown, and potentially useful information from data"
  (William J Frawley, Gregory Piatetsky-Shapiro and Christopher J Matheus)
- Data mining is the analysis of data and the use of software techniques for finding patterns and regularities in sets of data.
- The computer is responsible for finding the patterns by identifying the underlying rules and features in the data.
- It is possible to strike gold in unexpected places as the data mining software extracts patterns not previously discernible or so obvious that no-one has noticed them before.
- Mining analogy:
  - large volumes of data are sifted in an attempt to find something worthwhile
  - in a mining operation large amounts of low grade materials are sifted through in order to find something of value

Books:

- Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001, ISBN 1-55860-489-8.
- Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 1999, ISBN 1-55860-552-5.

Data Mining vs. DBMS

- DBMS: queries based on the data held, e.g.
  - last month's sales for each product
  - sales grouped by customer age etc.
  - list of customers who lapsed their policy
- Data Mining: infer knowledge from the data held to answer queries, e.g.
  - what characteristics do customers share who lapsed their policies and how do they differ from those who renewed their policies?
  - why is the Cleveland division so profitable?

Characteristics of a data mining system

- Large quantities of data
  - volume of data so great it has to be analyzed by automated techniques, e.g. POS, satellite information, credit card transactions etc.
- Noisy, incomplete data
  - imprecise data is characteristic of all data collection
  - databases are usually contaminated by errors; cannot assume that the data they contain is entirely correct, e.g. some attributes rely on subjective or measurement judgments
- Complex data structure: conventional statistical analysis not possible
- Heterogeneous data stored in legacy systems

Who needs data mining?

- "Who(ever) has information fastest and uses it wins"
  (Don Keough, former president of Coca-Cola)

Data Mining Applications

- Medicine: drug side effects, hospital cost analysis, genetic sequence analysis, prediction etc.
- Finance: stock market prediction, credit assessment, fraud detection etc.
- Marketing/sales: product analysis, buying patterns, sales prediction, target mailing, identifying unusual behavior etc.
- Knowledge Acquisition
  - Expert systems are models of real world processes
  - Much of the information is available straight from the process, e.g. in production systems, data is collected for monitoring the system
  - knowledge can be extracted using data mining tools
  - experts can verify the knowledge
- Engineering: automotive diagnostic expert systems, fault detection etc.

Data Mining Goals

- Classification
  - DM system learns from examples or the data how to partition or classify the data
    i.e. it formulates classification rules
  - Example: customer database in a bank
  - Question: Is a new customer applying for a loan a good investment or not?
  - Typical rule formulated:
    if STATUS = married and INCOME > 10000 and HOUSE_OWNER = yes
    then INVESTMENT_TYPE = good
- Association
  - Rules that associate one attribute of a relation to another
  - Set-oriented approaches are the most efficient means of discovering such rules
  - Example: supermarket database
    - 72% of all the records that contain items A and B also contain item C
    - the specific percentage of occurrences, 72, is the confidence factor of the rule
- Sequence/Temporal
  - Sequential pattern functions analyze collections of related records and detect frequently occurring patterns over a period of time
  - The difference between sequence rules and other rules is the temporal factor
  - Example: retailer's database
    - Can be used to discover the set of purchases that frequently precedes the purchase of a microwave oven

Data Mining and Machine Learning

- Data Mining (DM) or Knowledge Discovery in Databases (KDD) is about finding understandable knowledge
- Machine Learning (ML) is concerned with improving performance of an agent
  - training a neural network to balance a pole is part of ML,
    but not of KDD
- Efficiency of the algorithm and scalability is more important in DM or KDD
  - DM is concerned with very large, real-world databases
  - ML typically looks at smaller data sets
- ML has laboratory type examples for the training set
- DM deals with real world data. Real world data tend to have problems such as:
  - missing values
  - dynamic data
  - noise

Statistical Data Analysis

- Ill-suited for Nominal and Structured Data Types
- Completely data driven: incorporation of domain knowledge not possible
- Interpretation of results is difficult and daunting
- Requires expert user guidance

Stages of the Data Mining Process

- Data pre-processing
  - heterogeneity resolution
  - data cleansing
  - data warehousing
- Applying Data Mining Tools: extraction of patterns from the pre-processed data
- Interpretation and evaluation: the user bias can direct DM tools to areas of interest
  - attributes of interest in databases
  - goal of discovery
  - domain knowledge
  - prior knowledge or belief about the domain

Techniques

- Machine Learning methods
- Statistics: can be used in several data mining stages
  - data cleansing, i.e. the removal of erroneous or irrelevant data
  - EDA, exploratory data analysis, e.g. frequency counts, histograms etc.
  - data selection - sampling
    facilities and so reduce the scale of computation
  - attribute re-definition
  - data analysis: measures of association and relationships between attributes, interestingness of rules, classification etc.
- Visualization: enhances EDA, makes patterns more visible
- Clustering (Cluster Analysis)
  - Clustering and seg
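
Worked sketch for two of the Data Mining Goals (classification and association). This is a minimal illustration, not the method of any particular data mining tool. The function names, the "unknown" fallback label, and the toy baskets are hypothetical. The ">" comparison on INCOME is an assumption, since the operator is missing from the slide text. The baskets are made-up data chosen to give a round number, not the supermarket data behind the 72% figure.

```python
from typing import Any, Iterable, Mapping, Set


def investment_type(customer: Mapping[str, Any]) -> str:
    """Apply the example bank rule to one customer record.

    Assumptions: INCOME is compared with ">", and customers the rule
    does not cover get the label "unknown" (the slides give no else-branch).
    """
    if (
        customer["STATUS"] == "married"
        and customer["INCOME"] > 10000
        and customer["HOUSE_OWNER"] == "yes"
    ):
        return "good"
    return "unknown"


def confidence(
    transactions: Iterable[Set[str]],
    antecedent: Set[str],
    consequent: Set[str],
) -> float:
    """Fraction of records containing `antecedent` that also contain `consequent`."""
    # Keep only the records that contain every antecedent item (subset test).
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0  # rule never fires, so report zero confidence
    with_both = [t for t in with_antecedent if consequent <= t]
    return len(with_both) / len(with_antecedent)


# Hypothetical toy baskets: four contain A and B, three of those also contain C.
baskets = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B"},
]

if __name__ == "__main__":
    applicant = {"STATUS": "married", "INCOME": 25000, "HOUSE_OWNER": "yes"}
    print(investment_type(applicant))
    print(confidence(baskets, {"A", "B"}, {"C"}))
```

With these toy baskets the rule "A and B implies C" holds in 3 of the 4 records that contain both A and B, so its confidence factor is 0.75, the same kind of quantity as the 72% in the supermarket example.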