Text Mining Techniques for Patent Analysis.ppt
Text Mining Techniques for Patent Analysis

Yuen-Hsien Tseng, National Taiwan Normal University, samtseng@ntnu.edu.tw

- Yuen-Hsien Tseng, Yeong-Ming Wang, Yu-I Lin, Chi-Jen Lin and Dai-Wei Juang, “Patent Surrogate Extraction and Evaluation in the Context of Patent Mapping”, accepted for publication in Journal of Information Science, 2007 (SSCI, SCI)
- Yuen-Hsien Tseng, Chi-Jen Lin, and Yu-I Lin, “Text Mining Techniques for Patent Analysis”, to appear in Information Processing and Management, 2007 (SSCI, SCI, EI)

Outline

- Introduction
- A General Methodology
- Technique Details
- Technique Evaluation
- Application Example
- Discussions
- Conclusions

Introduction: Why Patent Analysis?

- Patent documents contain 90% of research results and are valuable to the following communities:
  - Industry
  - Business
  - Law
  - Policy-making
- If carefully analyzed, they can:
  - reduce R&D time by 60% and R&D cost by 40%
  - show technological details and relations
  - reveal business trends
  - inspire novel industrial solutions
  - help make investment policy

Introduction: Gov. Efforts

- Patent analysis (PA) has received much attention since 2001
  - Korea: to develop 120 patent maps in 5 years
  - Japan: patent mapping competition in 2004
  - Taiwan: more and more patent maps (PM) were created
    - Example: for “carbon nanotube” (CNT), 5 experts dedicated more than 1 month
- Asian countries such as China, Japan, Korea, Singapore, and Taiwan have invested various resources in patent analysis
- PA requires a lot of human effort
  - Assisting tools are in great need

Typical Patent Analysis Scenario

1. Task identification: define the scope, concepts, and purposes for the analysis task.
2. Searching: iteratively search, filter, and download related patents.
3. Segmentation: segment, clean, and normalize structured and unstructured parts.
4. Abstracting: analyze the patent content to summarize their claims, topics, functions, or technologies.
5. Clustering: group or classify analyzed patents based on some extracted attributes.
6. Visualization: create technology-effect matrices or topic maps.
7. Interpretation: predict technology or business trends and relations.

Technology-Effect Matrix

- To make decisions about future technology development
  - seeking chances in those sparse cells
- To inspire novel solutions
  - by understanding how patents are related
  - so as to learn how novel solutions were invented in the past and can be invented in the future
- To predict business trends
  - by showing the trend distribution of major competitors in this map

Part of the T-E matrix (from STIC) for “Carbon Nanotube”

Topic Map of Carbon Nanotube

Text Mining - Definition

- Knowledge discovery is often regarded as a process to find implicit, previously unknown, and potentially useful patterns
  - Data mining: from structured databases
  - Text mining: from a large text repository
- In practice, text mining (TM) involves a series of user interactions with the text mining tools to explore the repository and find such patterns.
- After being supplemented with additional information and interpreted by experienced experts, these patterns can become important intelligence for decision-making.

Text Mining Process for Patent Analysis: A General Methodology

- Document preprocessing
  - Collection creation
  - Document parsing and segmentation
  - Text summarization
  - Document surrogate selection
- Indexing
  - Keyword/phrase extraction
  - Morphological analysis
  - Stop word filtering
  - Term association and clustering
- Topic clustering
  - Term selection
  - Document clustering/categorization
  - Cluster title generation
  - Category mapping
- Topic mapping
  - Trend map - Aggregation map
  - Query map - Zooming map

Example: A US Patent Doc.

- See Example or this URL: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=5,695,734.PN.&OS=PN/5,695,734&RS=PN/5,695,734

Download and Parsing into DBMS

NSC Patents

- 612 US patents whose assignee contains “National Science Council”, downloaded on 2005/06/15

Document Parsing and Segmentation

- Data conversion
  - Parsing unstructured texts and citations into structured fields in DBMS
- Document segmentation
  - Partition the full patent texts into 6 segments: abstract, application, task, summary, feature, claim
  - Only 9 empty segments in 6*92 = 552 CNT patent segments (1.63%)
  - Only 79 empty segments in 6*612 = 3672 NSC patent segments (2.15%)

NPR Parsing for Most-Frequently Cited Journals and Citation Age Distribution

- Data are for 612 NSC patents

Automatic Summarization

- Segment the doc. into paragraphs and sentences
- Assess sentences, considering their:
  - Positions
  - Clue words
  - Title words
  - Keywords
- Select sentences
  - Sort by the weights and select the top-k sentences.
- Assemble the selected sentences
  - Concatenate the sentences in their original order

Example: Auto-summarization, MS Word (blue) vs. Ours (red)

Evaluation of Each Segment

- abs: the Abstract section of each patent
- app: FIELD OF THE INVENTION
- task: BACKGROUND OF THE INVENTION
- sum: SUMMARY OF THE INVENTION
- fea: DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
- cla: Claims section of each patent
- seg_ext: summaries from each of the sets: abs, app, task, sum, and fea
- full: full texts from each of the sets: abs, app, task, sum, and fea

Evaluation Goal

- Analyze a human-crafted patent map to see which segments have more important terms
- Purposes (so as to):
  - allow analysts to spot the relevant segments more quickly for classifying patents in the map
  - provide insights to possibly improve automated clustering and/or categorization in creating the map

Evaluation Method

- In the manual creation of a technology-effect matrix, it is helpful to be able to quickly spot the keywords that can be used for classifying the patents in the map.
- Once the keywords or category features are found, patents can usually be classified without reading all the texts.
- Thus a segment or summary that retains as many important category features as possible is preferable.
- Our evaluation design is therefore to reveal which segments contain the most such features compared to the others.

Patent Maps for Evaluation

- All patent maps are from STPI

Empty segments in the six patent maps

Feature Selection

- Well studied in machine learning
- Best feature selection algorithms
  - Chi-square, information gain, ...
- But to select only a few features, correlation coefficient is better than chi-square
  - co = 1 if FN = FP = 0 and TP ≠ 0 and TN ≠ 0

Best and worst terms by Chi-square and correlation coefficient

- Data are from a small real-world collection of 116 documents with only two exclusive categories, construction vs. non-construction in civil engineering tasks

Some feature terms and their distribution in each set for the category FED in CNT

- Note: The correlation coefficients in each segment correlate with the set counts of the ordered features: the larger the set count, the larger the correlation coefficient in each segment.

Occurrence distribution of 30 top-ranked terms in each set for some categories in CNT

- M_Best_Term_Coverage(Segment, Category) =

Occurrence distribution of manually ranked terms in each set for some categories in CNT

- R_Best_Term_Coverage(Segment, Category) =

Occurrence distribution of terms in each segment averaged over all categories in CNT

- M_Best_Term_Coverage(Segment) =
- R_Best_Term_Coverage(Segment) =

Maximum correlation coefficients in each set averaged over all categ
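
As an illustration of the Document Parsing and Segmentation step, the sketch below cuts a patent description into the app, task, sum, and fea segments using the section headings listed on the Evaluation of Each Segment slide, and recomputes the empty-segment rates quoted in the slides. It is only a rough sketch: the function names `segment_description` and `empty_rate` are hypothetical, the exact-heading match is an assumption (real patents vary their headings), and the slides do not describe the parser actually used.

```python
import re

# Headings taken from the "Evaluation of Each Segment" slide.
SEGMENT_HEADINGS = {
    "app": "FIELD OF THE INVENTION",
    "task": "BACKGROUND OF THE INVENTION",
    "sum": "SUMMARY OF THE INVENTION",
    "fea": "DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT",
}


def segment_description(description):
    """Split a description into named segments (hypothetical helper).

    Assumes each heading appears verbatim; a missing heading yields
    an empty segment.
    """
    segments = {name: "" for name in SEGMENT_HEADINGS}
    found = []
    for name, heading in SEGMENT_HEADINGS.items():
        match = re.search(re.escape(heading), description)
        if match:
            found.append((match.start(), match.end(), name))
    found.sort()  # order the headings by where they occur in the text
    for i, (_, end, name) in enumerate(found):
        # A segment runs from the end of its heading to the next heading.
        next_start = found[i + 1][0] if i + 1 < len(found) else len(description)
        segments[name] = description[end:next_start].strip()
    return segments


def empty_rate(segmented_docs):
    """Fraction of empty segments over all segments of all documents."""
    total = sum(len(doc) for doc in segmented_docs)
    empty = sum(1 for doc in segmented_docs for text in doc.values() if not text)
    return empty / total


# The percentages quoted in the slides follow from the raw counts.
cnt_rate = round(9 / (6 * 92) * 100, 2)     # CNT patents
nsc_rate = round(79 / (6 * 612) * 100, 2)   # NSC patents
```

Here `cnt_rate` and `nsc_rate` reproduce the 1.63% and 2.15% figures from the 9/552 and 79/3672 counts.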
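
The steps on the Automatic Summarization slide (score each sentence, keep the top-k, restore the original order) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the clue-word list, the equal weighting of the four signals, the naive sentence splitter, and the function name `summarize` are all hypothetical.

```python
import re

# Hypothetical clue words; the slides do not list the ones actually used.
CLUE_WORDS = {"invention", "present", "object", "provide", "method"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def summarize(text, title, keywords, k=2):
    """Return the k highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    title_words = set(tokenize(title))
    keyword_set = {w.lower() for w in keywords}

    scored = []
    for idx, sentence in enumerate(sentences):
        words = set(tokenize(sentence))
        score = 0.0
        # Position: assume first and last sentences carry extra weight.
        if idx == 0 or idx == len(sentences) - 1:
            score += 1.0
        score += len(words & CLUE_WORDS)    # clue words
        score += len(words & title_words)   # title words
        score += len(words & keyword_set)   # keywords
        scored.append((score, idx))

    # Sort by weight (highest first, earlier sentence wins ties), keep top-k.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    # Concatenate the selected sentences in their original order.
    chosen = sorted(idx for _, idx in top)
    return " ".join(sentences[i] for i in chosen)
```

Sorting the selected indices before joining is what keeps the summary readable: the ranking decides which sentences survive, while the document order decides how they are presented.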
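
The remark on the Feature Selection slide that co = 1 when FN = FP = 0 can be checked numerically. The sketch below assumes the usual 2x2 contingency counts for a term and a category (TP: in-category documents containing the term, FP: out-of-category documents containing it, FN: in-category documents lacking it, TN: out-of-category documents lacking it) and the common formulation in which chi-square equals N times the squared correlation coefficient. The slides do not spell out either formula, so both definitions are assumptions.

```python
import math


def correlation_coefficient(tp, fp, fn, tn):
    """Signed term-category correlation from a 2x2 contingency table."""
    denom = math.sqrt((tp + fn) * (fp + tn) * (tp + fp) * (fn + tn))
    if denom == 0:
        return 0.0  # degenerate table: term or category is constant
    return (tp * tn - fn * fp) / denom


def chi_square(tp, fp, fn, tn):
    """Chi-square as N times the squared correlation coefficient."""
    n = tp + fp + fn + tn
    return n * correlation_coefficient(tp, fp, fn, tn) ** 2


# A term found in every in-category document and nowhere else ...
positive = (10, 0, 0, 10)   # (TP, FP, FN, TN)
# ... versus a term found only outside the category.
negative = (0, 10, 10, 0)

co_pos = correlation_coefficient(*positive)
co_neg = correlation_coefficient(*negative)
chi_pos = chi_square(*positive)
chi_neg = chi_square(*negative)
```

Under these assumptions both terms receive the same chi-square score, while the correlation coefficient separates them by sign. That is one plausible reading of why the slide prefers the correlation coefficient when only a few features are kept: it ranks terms that indicate membership ahead of terms that indicate non-membership.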
