BS ISO 24611-2013 Language resource management - Morpho-syntactic annotation framework (MAF).pdf

Resource ID: 586678    File size: 1.82 MB    Pages: 72    Format: PDF

raising standards worldwide
NO COPYING WITHOUT BSI PERMISSION EXCEPT AS PERMITTED BY COPYRIGHT LAW

BSI Standards Publication
BS ISO 24611:2012
Language resource management - Morpho-syntactic annotation framework (MAF)

BRITISH STANDARD

National foreword

This British Standard is the UK implementation of ISO 24611:2012. The UK participation in its preparation was entrusted to Technical Committee TS/1, Terminology. A list of organizations represented on this committee can be obtained on request to its secretary.

This publication does not purport to include all the necessary provisions of a contract. Users are responsible for its correct application.

© The British Standards Institution 2013. Published by BSI Standards Limited 2013.
ISBN 978 0 580 54234 3
ICS 01.020

Compliance with a British Standard cannot confer immunity from legal obligations.

This British Standard was published under the authority of the Standards Policy and Strategy Committee on 31 March 2013.

Amendments issued since publication
Date    Text affected

INTERNATIONAL STANDARD ISO 24611
First edition 2012-11-01
Reference number ISO 24611:2012(E)

Language resource management - Morpho-syntactic annotation framework (MAF)
Gestion des ressources langagières - Cadre d'annotation morphosyntaxique (MAF)

COPYRIGHT PROTECTED DOCUMENT
© ISO 2012
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.

ISO copyright office
Case postale 56, CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland

Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 The MAF meta-model
  4.1 Overview
  4.2 MAF Meta-model
5 Segmenting with tokens
  5.1 General
  5.2 Formal description:
  5.3 Embedding notation
  5.4 Alternate representation for TEI based documents
  5.5 Stand-off notation
  5.6 Informative attributes
  5.7 Completing the inline token notation
    5.7.1 Joining tokens in embedded mode
    5.7.2 Overlapping tokens
6 Word-forms as linguistic units
  6.1 Formal description:
  6.2 Token attachment
    6.2.1 One token; one word-form
    6.2.2 Several contiguous tokens; one word-form
    6.2.3 Several discontinuous tokens; one word-form
    6.2.4 Zero token; one word-form
    6.2.5 One token; several word-forms
  6.3 Referring to lexical entries
  6.4 Compound word-forms
  6.5 Identification of word-forms within a TEI-compliant document
7 Morpho-syntactic content
  7.1 General
  7.2 Using feature structures
  7.3 Compact morpho-syntactic tags
  7.4 FSR libraries
  7.5 Designing tagsets
  7.6 Formal description:
8 Handling ambiguities
  8.1 Word-form content ambiguities
  8.2 Lexical Ambiguities
  8.3 Structural ambiguities
    8.3.1 Structural ambiguities with word-forms
    8.3.2 Structural ambiguities with tokens
  8.4 Simplified structuring variants
    8.4.1 Non-ambiguous linear representation
    8.4.2 Mixed linear and lattice representation
  8.5 Expanding the simplified variants
    8.5.1 Separating tokens and word-forms
    8.5.2 Wrapping into local lattices
    8.5.3 Merging local lattices
    8.5.4 Removing
  8.6 Formal description: and
Annex A (informative) Encoded example using the MAF serialization
Annex B (normative) MAF specification
  B.1 Elements
    B.1.1
    B.1.2
    B.1.3
    B.1.4
    B.1.5
    B.1.6
    B.1.7
    B.1.8
  B.2 Model classes
  B.3 Attribute classes
    B.3.1 att.token.information
    B.3.2 att.token.join
    B.3.3 att.token.span
    B.3.4 att.wordForm.content
    B.3.5 att.wordForm.tokens
  B.4 Macros
    B.4.1 data.certainty
    B.4.2 data.code
    B.4.3 data.count
    B.4.4 data.duration.w3c
    B.4.5 data.enumerated
    B.4.6 data.key
    B.4.7 data.language
    B.4.8 data.name
    B.4.9 data.numeric
    B.4.10 data.pointer
    B.4.11 data.probability
    B.4.12 data.temporal.w3c
    B.4.13 data.truthValue
    B.4.14 data.word
    B.4.15 data.xTruthValue
Annex C (normative) Morpho-syntactic data categories
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of technical committees is to prepare International Standards. Draft International Standards adopted by the technical committees are circulated to the member bodies for voting. Publication as an International Standard requires approval by at least 75 % of the member bodies casting a vote.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights.

ISO 24611 was prepared by Technical Committee ISO/TC 37, Terminology and other language and content resources, Subcommittee SC 4, Language resource management.

Introduction

ISO/TC 37/SC 4 focuses on the definition of models and formats for the representation of annotated language resources. To this end, it has generalised the modelling strategy initiated by its sister committee, SC 3, for the representation of terminological data [Romary, 2001], through which linguistic data models are seen as the combination of a generic data pattern (a meta-model), which is further refined through a selection of data categories that provide the descriptors for this specific annotation level. Such models are defined independently of any specific formats, and ensure that an implementer has the necessary conceptual instrument with which to design and compare formats with regard to their degrees of interoperability.

One important aspect of representing any kind of annotation is the capacity to provide a clear and reliable semantics for the various descriptors used, either in the form of formal features and feature values, or directly as objects in a representation that is expressed, for instance, in XML. In order to be shared across various annotation schemas and encoding applications, such a semantics should be implemented as a centralised registry of concepts: we will henceforth refer to these as data categories. As such, data categories should satisfy the following constraints.

- From a technical point of view, they must provide unique, stable references (implemented as persistent identifiers, in the sense of ISO 24619) such that the designer of a specific encoding schema can refer to them in his or her specification. By doing so, two annotations will be deemed to be equivalent when they are in fact defined in relation to the same data categories (as feature and feature value).
- From a descriptive point of view, each unique semantic reference should be associated with precise documentation combining a full text elicitation of the meaning of the descriptor with the expression of specific constraints that bear upon the category.

In recent years, ISO has developed a general framework for representing and maintaining such a registry of data categories, encompassing all domains of language resources. This initiative, described in ISO 12620, has led to the implementation of an online environment providing access to all data categories that have been standardized in the context of the various language resource-related activities within ISO, or specifically as part of the maintenance of the data category registry. It also provides access to the various data categories that individual language technology practitioners have defined in the course of their own work and decided to share with the community.

The ISO data category registry, as available through the ISOCat (www.isocat.org) implementation, is intended as a flat marketplace of semantic objects, providing only a limited set of ontological constraints. The objective there is to facilitate the maintenance of a comprehensive descriptive environment where new categories are easily inserted and reused without the need for any strong consistency check with the registry at large. Indeed, the following basic constraints are part of the data category model, as defined in ISO 12620:

- simple generic-specific relations, when these are useful for the proper identification of interoperability descriptors between data categories. For instance, the fact that /properNoun/ is a sub-category of /noun/ makes it possible to compare morpho-syntactic annotations based on different descriptive levels of granularity;
- the description of conceptual domains, in the sense of ISO 11179, to identify, when known or applicable, the possible values of so-called complex data categories. For instance, it can be used to record that possible values of /grammaticalGender/ (limited to a small group of languages, Romary 2011) could be a subset of /masculine/, /feminine/ and /neutral/;
- language-specific constraints, either in the form of specific application notes or as explicit restrictions bearing upon the conceptual domains of complex data categories. For instance, it is possible to express explicitly that /grammaticalGender/ in French can only take the two values: /masculine/ and /feminine/.

This International Standard provides a comprehensive framework for the representation of morpho-syntactic (also referred to as part-of-speech) annotations. Such an annotation level corresponds to a first lexical abstraction level over language data (textual or spoken) and, depending on the language to be annotated, together with the characteristics of the annotation tool or annotation scheme that is being used, can vary enormously in structure and complexity.

In order to deal with such complex issues as ambiguity and determinism in morpho-syntactic annotation, this International Standard introduces a meta-model that draws a clear distinction between the two levels of tokens (representing the surface segmentation of the source) and word-forms (identifying lexical abstractions associated with groups of tokens). These two levels share the following specificities: on the one hand, they can be represented as simple sequences and as local graphs such as multiple segmentations and ambiguous compounds; on the other hand, any n-to-n combination can stand between word-forms and tokens.

As linguistic segments (sometimes called markables in the literature; see, for instance, Carletta et al., 1997), tokens may be embedded in the source document as inline mark-up, or they may point remotely to it by means of so-called stand-off annotations. As linguistic abstractions, word-forms can be qualified by various linguistic features characterising the morpho-syntactic properties that are instantiated in the realisation of the lexical entry within the annotated text. Such properties may range from the simple indication of a lemma up to an explicit reference to a lexical entry in a dictionary.

In most existing applications of morpho-syntactic annotation, linguistic properties are expressed by means of so-called tags; these codes refer to basic feature structures (see early examples in Monachini and Calzolari, 1994). Such codes may also provide morphological information about a word-form, including its part of speech (e.g. noun, adjective or verb), and features such as number, gender, person, mood and verbal tense.

In keeping with the general modelling strategy of ISO/TC 37, this International Standard/MAF provides means of relating morpho-syntactic tags expressed as feature structures (compliant with ISO 24610) to the data categories available in ISOCat. A normative annex of this International Standard elicits a core set of data categories that can be used as reference for most current morpho-syntactic annotation tasks in a multilingual context.

However, when implementers of this International Standard find these categories inappropriate in either coverage, scope or semantics, they are encouraged to use ISOCat to define their own categories in compliance with ISO/TC 37 principles.

Associated with the meta-model, MAF also provides a default XML syntax that may be used to serialise MAF-compliant annotation models. Since many existing projects are based on the Text Encoding Initiative (TEI) guidelines (www.tei-c.org), particularly in digital humanities, where a proper encoding of textual sources is essential, this International Standard also gives guidance on how to articulate the MAF model with TEI-compliant encodings. Indeed, the TEI guidelines already offer a variety of constructs and mechanisms to cope with many issues relevant to spoken corpora and their annotations (Romary and Witt, 2012).

Finally, it should be noted here that this International Standard forms the conceptual basis for the development of the ISO 24614 series on word segmentation, whereby all general principles and rules defined in ISO 24614-1, as well as the constraints expressed in additional parts for specific languages, are to be understood according to the token/word-form dichotomy.

Language resource management - Morpho-syntactic annotation framework (MAF)

1 Scope

This International Standard provides a framework for the representation of annotations of word-forms in texts; such annotations concern tokens, their relationship with lexical units, and their morpho-syntactic properties. It describes a metamodel for morpho-syntactic annotation that refers to the data categories contained in the ISOCat data category registry (DCR, as defined in ISO 12620). It also describes an XML serialization for morpho-syntactic annotations, with equivalences to the guidelines of the TEI (Text Encoding Initiative).

2 Normative references

The following ref
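
The two-level distinction between tokens and word-forms described in the Introduction can be illustrated with a minimal Python sketch. It is not the MAF XML serialization and is not taken from the standard: the class names `Token` and `WordForm`, the tag codes in `TAG_LIBRARY`, the feature name `partOfSpeech`, and the French sample sentence are all assumptions made for illustration. Tokens are held stand-off, as character offsets into the source, and word-forms point to tokens by identifier, so one word-form can cover several tokens and one token can carry several word-forms.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Token:
    """Surface segment; points into the source by character offsets (stand-off)."""
    id: str
    start: int
    end: int


@dataclass
class WordForm:
    """Lexical abstraction attached to zero, one, or several tokens."""
    lemma: str
    token_ids: list
    features: dict = field(default_factory=dict)


# Hypothetical compact tags, each standing for a small feature structure.
TAG_LIBRARY = {
    "NFS": {"partOfSpeech": "noun", "grammaticalGender": "feminine"},
    "NMS": {"partOfSpeech": "noun", "grammaticalGender": "masculine"},
}


def tokenize(source):
    """Whitespace segmentation only; real tokenizers are language-specific."""
    return [Token(f"t{i}", m.start(), m.end())
            for i, m in enumerate(re.finditer(r"\S+", source), start=1)]


def surface(word_form, tokens, source):
    """Recover the surface string covered by a word-form's tokens."""
    by_id = {t.id: t for t in tokens}
    return " ".join(source[by_id[i].start:by_id[i].end]
                    for i in word_form.token_ids)


source = "la pomme de terre du jardin"
tokens = tokenize(source)

word_forms = [
    WordForm("le", ["t1"]),
    # Several contiguous tokens, one word-form (a compound).
    WordForm("pomme de terre", ["t2", "t3", "t4"], dict(TAG_LIBRARY["NFS"])),
    # One token, several word-forms: "du" contracts "de" + "le".
    WordForm("de", ["t5"]),
    WordForm("le", ["t5"]),
    WordForm("jardin", ["t6"], dict(TAG_LIBRARY["NMS"])),
]

for wf in word_forms:
    print(wf.lemma, "<-", surface(wf, tokens, source), wf.features)
```

Keeping the token identifiers on the word-form side, rather than nesting word-forms inside tokens, is one simple way to allow the n-to-n combinations mentioned in the Introduction; an inline (embedded) representation would need a different structure.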

