C20.0046- Database Management SystemsLecture #28.ppt
《C20.0046- Database Management SystemsLecture #28.ppt》由会员分享,可在线阅读,更多相关《C20.0046- Database Management SystemsLecture #28.ppt(46页珍藏版)》请在麦多课文档分享上搜索。
1、M.P. Johnson, DBMS, Stern/NYU, Sp2004,1,C20.0046: Database Management Systems Lecture #28,Matthew P. Johnson Stern School of Business, NYU Spring, 2004,M.P. Johnson, DBMS, Stern/NYU, Sp2004,2,Agenda,Previously: Failure/recovery Next: Finish Data mining WebsearchProj5 due today no extensions! Final E
2、xam Thursday, 5/6,10-11:50am 1-minute responses,M.P. Johnson, DBMS, Stern/NYU, Sp2004,3,Method: Genetic Algorithms,for optimization problems each possible solution x as some fitness f(x) space of possible solutions a surface goal: find an x at the top of a hill on the surface example: TSP with limit
3、ations must go here before some point cannot go from here to there 25 cities 24! Possible ways billions of years, under reasonable assumptions GAs are one method for hill climbing Avoid local maxima,M.P. Johnson, DBMS, Stern/NYU, Sp2004,4,GA motivation,Analogy from genetics: Create from random a pop
4、. of poss. solns Most probably very bad, but Compute fitness f(x) for each member x Create next generation by picking relatively good pairs Merge half of info from each to create new one repeat Examples: Glower, Dawkins, etc.,M.P. Johnson, DBMS, Stern/NYU, Sp2004,5,Moodys example (D&S),Financial ser
5、vices needed helpdesk dispatching system send right person for each IT problem experienced technicians to solve hard problems minimize loss of value due to computer downtime minimize individual wait time tell users when ere issues would be fixed schedules did not have to be perfect but good job-shop
6、 scheduling,M.P. Johnson, DBMS, Stern/NYU, Sp2004,6,Moodys example,SOGA: Scheduling Optimizing GAFitness function: fitness inverse to total amount of downtime, max indy. waittime on live set of current jobs! includes time of request, type of user, type of prob., etc. updates schedules every 10-15 mi
7、nutes, based on new inputs not-yet-started jobs can be reassigned,M.P. Johnson, DBMS, Stern/NYU, Sp2004,7,Neural Networks,Also do hill climbing, but completely different idea based on connections between neurons in brain but used for supervised learningsimple NN: input layer with 3 nodes hidden laye
8、r with 2 nodes output layer with 1 node each node points to each node in next level each node as some activation level and a something like a critical mass Draw picture What kind of graph is this?,M.P. Johnson, DBMS, Stern/NYU, Sp2004,8,Neural Networks,values passed into input nodes represent the pr
9、oblem instance given weighted sum of inputs to neuron, sends out pulse only if the sum is greater than its threshold values output by hidden nodes sent to output node if the weighted sum going into output node is high enough, it outputs 1, otherwise 0,M.P. Johnson, DBMS, Stern/NYU, Sp2004,9,NN appli
10、cations,plausible application: we have data about potential customers party registration married or not gender income levelor: have credit applicant information employment income home ownership bankruptcy should we give a credit card to him?,M.P. Johnson, DBMS, Stern/NYU, Sp2004,10,How NNs work,hope
11、: plug in customer out comes whether we should market toward himHow does it get the right answer?Initially, all weights are random! But: we assume we have data for lots of people which we know to be either interested in our products or not we have data for both kinds so, when we plug in one of these
12、 customers, we know what the right answer is supposed to be,M.P. Johnson, DBMS, Stern/NYU, Sp2004,11,How NNs work,can used the Backpropagation algorithm: for each known problem instance, plug in and look at answer if answer is wrong, cane edges weights in one way; o.w., change them the opposite way
13、(details omitted) repeatthe more iterations we do, the more the NN learns our known data with enough confidence, can apply NN to unknown customer data to learn whether to market toward them,M.P. Johnson, DBMS, Stern/NYU, Sp2004,12,LBS example,Investments goal: maximize return on investments buy/sell
14、 right securities at right timelots of time-series data for different props for different stocks return market signals pick right ones reactsoln: create NN for each stock retrain weekly,M.P. Johnson, DBMS, Stern/NYU, Sp2004,13,Decision Trees,One technology: decision trees Yet another use of trees! E
15、ach node one attribute Its children possible values of that attribute E.g.: given votes on issues, dist. Dems v. GOP Each path from root to leaf is one rule Like 20Qs/BSTs/B-trees whats important diff? Learn rules from the data Given value of this and this and that, this value will be something Cust
16、omer details potential good customer, financial details good credit risk, etc.,M.P. Johnson, DBMS, Stern/NYU, Sp2004,14,Decision Trees,Details: for binary property, two out edges, but may be more for continuous property (income), divide values into discrete ranges a property may appear more tan once
17、Example: top node: history of bankruptcy? if yes, REJECT if no, ten: employed? If no, (maybe look for high monthly housing payment) If yes, Algorithms: ID3, C4.5, CART,M.P. Johnson, DBMS, Stern/NYU, Sp2004,15,DB Functionality,Association study frequencies of items occurring together buys(x,cereal) b
18、uys(x,milk) Mail order tulip bulbs Conservative party voters intuitionTechniques: Decision trees GAs - Glower,M.P. Johnson, DBMS, Stern/NYU, Sp2004,16,Mining v. warehousing,Warehousing: let user search, group by interesting properties Give me the sales of A4s by year and dealer, for these colors Use
19、r tries to learn from results which properties are important/interesting Mining: tell the user what the interesting properties are Whats driving sales?,M.P. Johnson, DBMS, Stern/NYU, Sp2004,17,Social/political concerns,Privacy TIA Sensitive data Allow mining but not queries Opt-in/opt-out “Dont be e
20、vil.”,M.P. Johnson, DBMS, Stern/NYU, Sp2004,18,For more info,See Dahr & Stein: Seven Methods for Transforming Corporate Data into Business Intelligence (1997) Drawn on above A few years old, but very accessibleData mining courses offered here,M.P. Johnson, DBMS, Stern/NYU, Sp2004,19,Next topic: Webs
21、earch,Websearch is not running queries on the web Info/connections downloaded from web Crawl paths from tree rooted at homepage Special database is formed On request, we run query on DB Draw picture Issues: DNS bottleneck search strategy refresh strategy robot exclusion protocol bad HTML, non-respon
22、sive servers, etc.,M.P. Johnson, DBMS, Stern/NYU, Sp2004,20,Crawling issues,Content-seen test compute fingerprint/hash of page content also: URL-seen test DNS bottleneck to view page by text link, must get address BP claim: 87% crawling time DNS look-up Referring to webpages use DocIDs, not addresse
23、s more popular pages get shorter DocIDs (why?) many commodity machines frequent crashes Logging, checkpointing,M.P. Johnson, DBMS, Stern/NYU, Sp2004,21,Crawling,Comparatively straightforward although Google uses PageRank for crawling, tooFirst, must get pages Spiders Prof. Davis (NYU/CS): http:/www.
24、cs.nyu.edu/courses/fall02/G22.3033-008/WebCrawler.javahttp:/pages.stern.nyu.edu/mjohnson/dbms/eg/WebCrawler.javaRun program: sales% java WebCrawler http:/pages.stern.nyu.edu/mjohnson/dbms 200,M.P. Johnson, DBMS, Stern/NYU, Sp2004,22,Inverted indices,Basic idea of finding pages: Create inverted index
25、 mapping words to pagesFirst, think of each webpage as a tuple One column for every possible word True means the word appears on the page Index on all columns Now can search: youre fired select * from T where youre=T and fired=T,M.P. Johnson, DBMS, Stern/NYU, Sp2004,23,Inverted indices,Can simplify
- 1.请仔细阅读文档,确保文档完整性,对于不预览、不比对内容而直接下载带来的问题本站不予受理。
- 2.下载的文档,不会出现我们的网址水印。
- 3、该文档所得收入(下载+内容+预览)归上传者、原创作者;如果您是本文档原作者,请点此认领!既往收益都归您。
下载文档到电脑,查找使用更方便
2000 积分 0人已下载
下载 | 加入VIP,交流精品资源 |
- 配套讲稿:
如PPT文件的首页显示word图标,表示该PPT已包含配套word讲稿。双击word图标可打开word文档。
- 特殊限制:
部分文档作品中含有的国旗、国徽等图片,仅作为作品整体效果示例展示,禁止商用。设计者仅对作品中独创性部分享有著作权。
- 关 键 词:
- C200046DATABASEMANAGEMENTSYSTEMSLECTURE28PPT

链接地址:http://www.mydoc123.com/p-379239.html