BS ISO 24611-2013 Language resource management Morpho-syntactic annotation framework (MAF).pdf
BS ISO 24611-2013 Language resource management Morpho-syntactic annotation framework (MAF).pdf (72 pages), shared by a member and available to read online.
BSI Standards Publication
BS ISO 24611:2012
Language resource management - Morpho-syntactic annotation framework (MAF)

raising standards worldwide
NO COPYING WITHOUT BSI PERMISSION EXCEPT AS PERMITTED BY COPYRIGHT LAW

BS ISO 24611:2012 BRITISH STANDARD

National foreword

This British Standard is the UK implementation of ISO 24611:2012. The UK participation in its preparation was entrusted to Technical Committee TS/1, Terminology.

A list of organizations represented on this committee can be obtained on request to its secretary.

This publication does not purport to include all the necessary provisions of a contract. Users are responsible for its correct application.

© The British Standards Institution 2013. Published by BSI Standards Limited 2013.
ISBN 978 0 580 54234 3
ICS 01.020

Compliance with a British Standard cannot confer immunity from legal obligations.

This British Standard was published under the authority of the Standards Policy and Strategy Committee on 31 March 2013.

Amendments issued since publication
Date    Text affected

INTERNATIONAL STANDARD ISO 24611
First edition 2012-11-01
Reference number ISO 24611:2012(E)
© ISO 2012

Language resource management - Morpho-syntactic annotation framework (MAF)
Gestion des ressources langagières - Cadre d'annotation morphosyntaxique (MAF)

COPYRIGHT PROTECTED DOCUMENT

© ISO 2012
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.

ISO copyright office
Case postale 56
CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland

Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 The MAF meta-model
  4.1 Overview
  4.2 MAF Meta-model
5 Segmenting with tokens
  5.1 General
  5.2 Formal description:
  5.3 Embedding notation
  5.4 Alternate representation for TEI based documents
  5.5 Stand-off notation
  5.6 Informative attributes
  5.7 Completing the inline token notation
    5.7.1 Joining tokens in embedded mode
    5.7.2 Overlapping tokens
6 Word-forms as linguistic units
  6.1 Formal description:
  6.2 Token attachment
    6.2.1 One token; one word-form
    6.2.2 Several contiguous tokens; one word-form
    6.2.3 Several discontinuous tokens; one word-form
    6.2.4 Zero token; one word-form
    6.2.5 One token; several word-forms
  6.3 Referring to lexical entries
  6.4 Compound word-forms
  6.5 Identification of word-forms within a TEI-compliant document
7 Morpho-syntactic content
  7.1 General
  7.2 Using feature structures
  7.3 Compact morpho-syntactic tags
  7.4 FSR libraries
  7.5 Designing tagsets
  7.6 Formal description:
8 Handling ambiguities
  8.1 Word-form content ambiguities
  8.2 Lexical Ambiguities
  8.3 Structural ambiguities
    8.3.1 Structural ambiguities with word-forms
    8.3.2 Structural ambiguities with tokens
  8.4 Simplified structuring variants
    8.4.1 Non-ambiguous linear representation
    8.4.2 Mixed linear and lattice representation
  8.5 Expanding the simplified variants
    8.5.1 Separating tokens and word-forms
    8.5.2 Wrapping into local lattices
    8.5.3 Merging local lattices
    8.5.4 Removing
  8.6 Formal description: and
Annex A (informative) Encoded example using the MAF serialization
Annex B (normative) MAF specification
  B.1 Elements
    B.1.1
    B.1.2
    B.1.3
    B.1.4
    B.1.5
    B.1.6
    B.1.7
    B.1.8
  B.2 Model classes
  B.3 Attribute classes
    B.3.1 att.token.information
    B.3.2 att.token.join
    B.3.3 att.token.span
    B.3.4 att.wordForm.content
    B.3.5 att.wordForm.tokens
  B.4 Macros
    B.4.1 data.certainty
    B.4.2 data.code
    B.4.3 data.count
    B.4.4 data.duration.w3c
    B.4.5 data.enumerated
    B.4.6 data.key
    B.4.7 data.language
    B.4.8 data.name
    B.4.9 data.numeric
    B.4.10 data.pointer
    B.4.11 data.probability
    B.4.12 data.temporal.w3c
    B.4.13 data.truthValue
    B.4.14 data.word
    B.4.15 data.xTruthValue
Annex C (normative) Morpho-syntactic data categories
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of technical committees is to prepare International Standards. Draft International Standards adopted by the technical committees are circulated to the member bodies for voting. Publication as an International Standard requires approval by at least 75 % of the member bodies casting a vote.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights.

ISO 24611 was prepared by Technical Committee ISO/TC 37, Terminology and other language and content resources, Subcommittee SC 4, Language resource management.

Introduction

ISO/TC 37/SC 4 focuses on the definition of models and formats for the representation of annotated language resources. To this end, it has generalised the modelling strategy initiated by its sister committee, SC 3, for the representation of terminological data [Romary, 2001], through which linguistic data models are seen as the combination of a generic data pattern (a meta-model), which is further refined through a selection of data categories that provide the descriptors for this specific annotation level. Such models are defined independently of any specific formats, and ensure that an implementer has the necessary conceptual instrument with which to design and compare formats with regard to their degrees of interoperability.

One important aspect of representing any kind of annotation is the capacity to provide a clear and reliable semantics for the various descriptors used, either in the form of formal features and feature values, or directly as objects in a representation that is expressed, for instance, in XML. In order to be shared across various annotation schemas and encoding

Link: http://www.mydoc123.com/p-586678.html