A Review of Information Filtering Part II: Collaborative Filtering.ppt
A Review of Information Filtering
Part II: Collaborative Filtering
Chengxiang Zhai, Language Technologies Institute, School of Computer Science, Carnegie Mellon University

Outline
- A Conceptual Framework for Collaborative Filtering (CF)
- Rating-based Methods (Breese et al. 98)
  - Memory-based methods
  - Model-based methods
- Preference-based Methods (Cohen et al. 99 & Freund et al. 98)
- Summary & Research Directions

What is Collaborative Filtering (CF)?
- Making filtering decisions for an individual user based on the judgments of other users
- Inferring an individual's interests/preferences from those of other, similar users
- General idea: given a user u, find similar users u1, ..., um, then predict u's preferences based on the preferences of u1, ..., um

CF: Applications
- Recommender systems: books, CDs, videos, movies, potentially anything!
- Can be combined with content-based filtering
- Example (commercial) systems:
  - GroupLens (Resnick et al. 94): Usenet news rating
  - Amazon: book recommendation
  - Firefly (purchased by Microsoft?): music recommendation
  - Alexa: web page recommendation

CF: Assumptions
- Users with a common interest will have similar preferences
- Users with similar preferences probably share the same interest
- Examples: "interest is IR" => "read SIGIR papers"; "read SIGIR papers" => "interest is IR"
- A sufficiently large number of user preferences are available

CF: Intuitions
- User similarity: if Jamie liked the paper, will I like the paper? If Jamie liked the movie, I'll like the movie, supposing Jamie and I viewed similar movies in the past six months
- Item similarity: since 90% of those who liked Star Wars also liked Independence Day, and you liked Star Wars, you may also like Independence Day

Collaborative Filtering vs. Content-based Filtering
- Basic filtering question: will user U like item X?
- Two different ways of answering it:
  - Look at what U likes => characterize X => content-based filtering
  - Look at who likes X => characterize U => collaborative filtering
- The two can be combined

Rating-based vs. Preference-based
- Rating-based: user preferences are encoded as numerical ratings on items
  - Complete ordering
  - Absolute values can be meaningful
  - But values must be normalized before they can be combined
- Preference-based: user preferences are represented by a partial ordering of items
  - Partial ordering only
  - Easier to exploit implicit preferences

A Formal Framework for Rating
- Users: U = {u1, u2, ..., ui, ..., um}
- Objects: O = {o1, o2, ..., oj, ..., on}
- Ratings form an m x n matrix with entries Xij = f(ui, oj); only some entries are observed (e.g., 3, 1.5, 2, ...), the rest are unknown

The Task
- Unknown function f: U x O -> R
- Assume f values are known for some (u, o) pairs
- Predict f values for the other (u, o) pairs
- Essentially function approximation, like other learning problems
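To make the setup concrete, here is a minimal sketch (not from the slides; the users, objects, and rating values are invented) of the partially observed rating matrix that every method below tries to complete, with NaN marking the unknown f(u, o) values.

```python
import numpy as np

# Hypothetical users and objects; ratings are on an arbitrary numeric scale.
users = ["u1", "u2", "u3"]
objects = ["o1", "o2", "o3", "o4"]

# Partially observed rating matrix X: X[i, j] = f(ui, oj), NaN = unknown.
X = np.array([
    [3.0, np.nan, 1.5, np.nan],
    [np.nan, 2.0, np.nan, 4.0],
    [2.5, np.nan, np.nan, 3.5],
])

observed = ~np.isnan(X)              # mask of known f(u, o) values
to_predict = np.argwhere(~observed)  # the (u, o) pairs we must fill in
print(to_predict)
```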
Where Are the Intuitions?
- Similar users have similar preferences: if u ≈ u', then for all o's, f(u, o) ≈ f(u', o)
- Similar objects have similar user preferences: if o ≈ o', then for all u's, f(u, o) ≈ f(u, o')
- In general, f is "locally constant": if u ≈ u' and o ≈ o', then f(u, o) ≈ f(u', o')
- "Local smoothness" makes it possible to predict unknown values by interpolation or extrapolation
- What does "local" mean?

Two Groups of Approaches
- Memory-based approaches: f(u, o) = g_u(o) ≈ g_{u'}(o) if u ≈ u'; find "neighbors" u' of u and combine their g_{u'}(o) values
- Model-based approaches: assume a structure/model (object clusters, user clusters, f defined on clusters), so f(u, o) = f(c_u, c_o); estimation & probabilistic inference

Memory-based Approaches (Breese et al. 98)
- General ideas:
  - X_ij: rating of object j by user i
  - n_i: average rating of all objects by user i
  - Normalized ratings: V_ij = X_ij - n_i
  - Memory-based prediction: predict X_aj as n_a + k · Σ_i w(a, i) · V_ij, where k is a normalizing factor
- Specific approaches differ in w(a, i), the distance/similarity between users a and i

User Similarity Measures
- Pearson correlation coefficient (sum over commonly rated items)
- Cosine measure
- Many other possibilities!
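As an illustration of these formulas, the sketch below (assuming the NaN-masked rating matrix X from the earlier snippet; the function names are mine, not from the slides) computes Pearson and cosine weights w(a, i) and uses them in the memory-based prediction, with k chosen as 1 / Σ_i |w(a, i)|.

```python
import numpy as np

def pearson_weight(X, a, i):
    """Pearson correlation w(a, i) over items rated by both user a and user i."""
    both = ~np.isnan(X[a]) & ~np.isnan(X[i])
    if both.sum() < 2:
        return 0.0
    xa, xi = X[a, both], X[i, both]
    da, di = xa - xa.mean(), xi - xi.mean()
    denom = np.sqrt((da ** 2).sum() * (di ** 2).sum())
    return float((da * di).sum() / denom) if denom > 0 else 0.0

def cosine_weight(X, a, i):
    """Cosine ("vector") similarity of the two users' rating vectors (missing = 0)."""
    xa, xi = np.nan_to_num(X[a]), np.nan_to_num(X[i])
    denom = np.linalg.norm(xa) * np.linalg.norm(xi)
    return float(xa @ xi / denom) if denom > 0 else 0.0

def predict(X, a, j, weight=pearson_weight):
    """Memory-based prediction: X_hat[a, j] = n_a + k * sum_i w(a, i) * (X[i, j] - n_i)."""
    n_a = np.nanmean(X[a])
    num, norm = 0.0, 0.0
    for i in range(X.shape[0]):
        if i == a or np.isnan(X[i, j]):
            continue
        w = weight(X, a, i)
        num += w * (X[i, j] - np.nanmean(X[i]))
        norm += abs(w)
    return n_a if norm == 0 else n_a + num / norm   # k = 1 / sum_i |w(a, i)|
```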
Improving User Similarity Measures (Breese et al. 98)
- Dealing with missing values: default ratings
- Inverse User Frequency (IUF): similar to IDF
- Case amplification: use w(a, i)^p, e.g., p = 2.5
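A hedged sketch of how these two refinements might be plugged into the cosine weight above; the IUF definition log(m / m_j) mirrors IDF, but the exact weighting scheme used in Breese et al. 98 may differ in detail.

```python
import numpy as np

def inverse_user_frequency(X):
    """IUF (analogous to IDF): log(m / m_j), where m_j = number of users who rated item j."""
    m = X.shape[0]
    rated = (~np.isnan(X)).sum(axis=0)
    return np.log(m / np.maximum(rated, 1))

def iuf_cosine_weight(X, a, i):
    """Cosine similarity with each item's contribution scaled by its IUF."""
    iuf = inverse_user_frequency(X)
    xa, xi = np.nan_to_num(X[a]) * iuf, np.nan_to_num(X[i]) * iuf
    denom = np.linalg.norm(xa) * np.linalg.norm(xi)
    return float(xa @ xi / denom) if denom > 0 else 0.0

def amplify(w, p=2.5):
    """Case amplification: raise the weight to a power p > 1, preserving its sign."""
    return np.sign(w) * (abs(w) ** p)
```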
Model-based Approaches (Breese et al. 98)
- General ideas:
  - Assume the data/ratings are explained by a probabilistic model with parameter theta
  - Estimate/learn the model parameter theta from the data
  - Predict an unknown rating using E[x_{k+1} | x_1, ..., x_k], computed with the estimated model
- Specific methods differ in the model used and in how the model is estimated

Probabilistic Clustering
- Cluster users based on their ratings
- Assume ratings are observations of a multinomial mixture model with parameters p(C) and p(x_i | C)
- Model estimated using standard EM
- Predict ratings using E[x_{k+1} | x_1, ..., x_k]
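The following is a rough EM sketch for such a cluster model, under the assumption of integer ratings in {1, ..., n_values} and a latent user class C; the variable names, smoothing constants, and defaults are illustrative, not from the slides.

```python
import numpy as np

def em_cluster_model(X, n_classes=2, n_values=5, n_iter=50, seed=0):
    """EM for a mixture ("cluster") model of ratings: each user has a latent class C and,
    given C, the ratings on different items are independent multinomials.
    X: (users x items) integer ratings in {1..n_values}, with np.nan for missing."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    pc = np.full(n_classes, 1.0 / n_classes)                     # p(C = c)
    px = rng.dirichlet(np.ones(n_values), size=(n_classes, n))   # p(x_j = v | C = c)
    observed = ~np.isnan(X)
    for _ in range(n_iter):
        # E-step: responsibility r[i, c] proportional to p(c) * prod_j p(x_ij | c) over observed items
        log_r = np.log(pc)[None, :].repeat(m, axis=0)
        for i in range(m):
            for j in np.where(observed[i])[0]:
                v = int(X[i, j]) - 1
                log_r[i] += np.log(px[:, j, v] + 1e-12)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate p(C) and p(x_j | C) from the soft assignments
        pc = r.mean(axis=0)
        px = np.full((n_classes, n, n_values), 1e-3)             # small smoothing
        for i in range(m):
            for j in np.where(observed[i])[0]:
                v = int(X[i, j]) - 1
                px[:, j, v] += r[i]
        px /= px.sum(axis=2, keepdims=True)
    return pc, px

def expected_rating(pc, px, user_ratings, j):
    """E[x_j | observed ratings]: per-class mean rating for item j, weighted by the
    posterior over the user's class. user_ratings: {item index: observed rating}."""
    values = np.arange(1, px.shape[2] + 1)
    log_post = np.log(pc)
    for k, v in user_ratings.items():
        log_post += np.log(px[:, k, int(v) - 1] + 1e-12)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return float(post @ (px[:, j] @ values))
```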
Bayesian Network
- Use a Bayesian network to capture object/item dependencies
- Each item/object is a node
- The (dependency) structure is learned from all the data
- Model parameters: p(x_{k+1} | pa(x_{k+1})), where pa(x_{k+1}) is the set of parents/predictors of x_{k+1} (represented as a decision tree)
- Predict ratings using E[x_{k+1} | x_1, ..., x_k]

Three-way Aspect Model (Popescul et al. 2001)
- CF + content-based
- Generative model: (u, d, w) triples (user, document, word) as observations, z as the hidden variable
- Standard EM
- Essentially clustering the joint data
- Evaluated on ResearchIndex data
- Found it is better to treat (u, w) pairs as observations
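Below is a sketch of EM for this three-way aspect model, assuming the observations come as a list of (u, d, w) index triples; it follows the general pLSA-style updates for P(u, d, w) = Σ_z P(z) P(u|z) P(d|z) P(w|z), and is not code from the original paper.

```python
import numpy as np

def aspect_model_em(triples, n_users, n_docs, n_words, n_z=4, n_iter=50, seed=0):
    """EM for the three-way aspect model: P(u, d, w) = sum_z P(z) P(u|z) P(d|z) P(w|z).
    `triples` is a list of observed (u, d, w) index triples (one entry per occurrence)."""
    rng = np.random.default_rng(seed)
    pz = np.full(n_z, 1.0 / n_z)
    pu = rng.dirichlet(np.ones(n_users), size=n_z)   # P(u | z), shape (n_z, n_users)
    pd = rng.dirichlet(np.ones(n_docs), size=n_z)    # P(d | z)
    pw = rng.dirichlet(np.ones(n_words), size=n_z)   # P(w | z)
    for _ in range(n_iter):
        new_pz = np.zeros(n_z)
        new_pu = np.zeros((n_z, n_users))
        new_pd = np.zeros((n_z, n_docs))
        new_pw = np.zeros((n_z, n_words))
        for (u, d, w) in triples:
            # E-step: posterior over the hidden aspect z for this observation
            post = pz * pu[:, u] * pd[:, d] * pw[:, w] + 1e-12
            post /= post.sum()
            # Accumulate expected counts for the M-step
            new_pz += post
            new_pu[:, u] += post
            new_pd[:, d] += post
            new_pw[:, w] += post
        # M-step: renormalize expected counts into probabilities
        pz = new_pz / new_pz.sum()
        pu = new_pu / new_pu.sum(axis=1, keepdims=True)
        pd = new_pd / new_pd.sum(axis=1, keepdims=True)
        pw = new_pw / new_pw.sum(axis=1, keepdims=True)
    return pz, pu, pd, pw
```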
Evaluation Criteria (Breese et al. 98)
- Rating accuracy: average absolute deviation between predicted and actual ratings over P_a, the set of items for which predictions are made
- Ranking accuracy: expected utility of the ranked list, with an exponentially decaying viewing probability; the "halflife" is the rank at which the viewing probability drops to 0.5, and d is the neutral rating
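A sketch of these two criteria as described in Breese et al. 98 (function names and default parameters are mine); the scores reported in the paper additionally normalize the expected utility R_a by its maximum attainable value, which is omitted here.

```python
import numpy as np

def average_absolute_deviation(predicted, actual):
    """Rating accuracy: mean |predicted - actual| over the predicted items P_a."""
    predicted, actual = np.asarray(predicted, float), np.asarray(actual, float)
    return float(np.abs(predicted - actual).mean())

def expected_utility(predicted, actual, d=3.0, halflife=5.0):
    """Ranking accuracy (R_a): items are ranked by predicted rating, and the utility
    max(actual - d, 0) at rank j (1-based) is discounted by 2^((j - 1) / (halflife - 1)),
    so the implied viewing probability is 0.5 at rank `halflife`."""
    order = np.argsort(-np.asarray(predicted, float))   # best predictions first
    actual = np.asarray(actual, float)[order]
    ranks = np.arange(1, len(actual) + 1)
    return float(np.sum(np.maximum(actual - d, 0.0) / 2 ** ((ranks - 1) / (halflife - 1))))
```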
Datasets

Results
- BN & CR+ are generally better than VSIM & BC
- BN is best with more training data
- VSIM is better with little training data
- Inverse User Frequency is effective
- Case amplification is mostly effective

Summary of Rating-based Methods
- Effectiveness: both memory-based and model-based methods can be effective
- The correlation method appears to be robust
- The Bayesian network works well with plenty of training data, but not very well with little training data
- The cosine similarity method works well with little training data

Summary of Rating-based Methods (cont.)
- Efficiency: memory-based methods are slower than model-based methods at prediction time
- Learning can be extremely slow for model-based methods

Preference-based Methods (Cohen et al. 99, Freund et al. 98)
- Motivation: Expli…