BS ISO 24612:2012 Language resource management - Linguistic annotation framework (LAF)
raising standards worldwide

NO COPYING WITHOUT BSI PERMISSION EXCEPT AS PERMITTED BY COPYRIGHT LAW

BSI Standards Publication

BS ISO 24612:2012

Language resource management - Linguistic annotation framework (LAF)

BRITISH STANDARD

National foreword

This British Standard is the UK implementation of ISO 24612:2012.

The UK participation in its preparation was entrusted to Technical Committee TS/1, Terminology.

A list of organizations represented on this committee can be obtained on request to its secretary.

This publication does not purport to include all the necessary provisions of a contract. Users are responsible for its correct application.

© The British Standards Institution 2012. Published by BSI Standards Limited 2012

ISBN 978 0 580 54235 0
ICS 01.020

Compliance with a British Standard cannot confer immunity from legal obligations.

This British Standard was published under the authority of the Standards Policy and Strategy Committee on 30 November 2012.

Amendments issued since publication
Date    Text affected

INTERNATIONAL STANDARD ISO 24612
First edition 2012-06-15
Reference number ISO 24612:2012(E)

Language resource management - Linguistic annotation framework (LAF)
Gestion des ressources langagières - Cadre d'annotation linguistique (LAF)

COPYRIGHT PROTECTED DOCUMENT

© ISO 2012

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.

ISO copyright office
Case postale 56, CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland

Contents

Foreword
Introduction
1 Scope
2 Terms and definitions
3 LAF specification
3.1 Overview
3.2 LAF data model
3.3 LAF architecture
3.4 XML pivot format
3.5 XML elements for the resource header
3.6 Elements in the primary data document header
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of technical committees is to prepare International Standards. Draft International Standards adopted by the technical committees are circulated to the member bodies for voting. Publication as an International Standard requires approval by at least 75 % of the member bodies casting a vote.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights.

ISO 24612 was prepared by Technical Committee ISO/TC 37, Terminology and other language and content resources, Subcommittee SC 4, Language resource management.

Introduction

Effective creation, encoding, processing and management of language resources is facilitated by a single high-level data model that supports analysis and design of both annotation schemes and representation formats. This International Standard is designed to support the development and use of computer applications relying on linguistically annotated resources and the exchange of these resources among different applications.

1 Scope

This International Standard specifies a linguistic annotation framework (LAF) for representing linguistic annotations of language data such as corpora, speech signal and video. The framework includes an abstract data model and an XML serialization of that model for representing annotations of primary data. The serialization serves as a pivot format to allow annotations expressed in one representation format to be mapped onto another.

NOTE Standardization of linguistic data categories that provide annotation content is provided by ISO 12620 and other related International Standards.

2 Terms and definitions

For the purposes of this document, the following terms and definitions apply.

2.1 primary data
electronic representation of language data
EXAMPLE Text, image, speech signal.
Note to entry: Typically, primary data objects are addressed by “locations” in an electronic file, for example, the span of characters comprising a sentence or word, or a point at which a given temporal event begins or ends (as in speech annotation). More complex data objects may consist of a list or set of contiguous or non-contiguous locations in primary data.

2.2 annotate, verb
process of adding linguistic information to primary data (2.1)

2.3 annotation, noun
linguistic information added to primary data (2.1), independent of its representation

2.4 representation
format in which the annotation (2.3) is rendered, independent of its content
EXAMPLE XML, list or bracketed format, tab-delimited text.

2.5 segmentation annotation
annotation (2.3) that delimits linguistic elements that appear in the primary data (2.1)
Note to entry: These elements include (1) continuous segments (appearing contiguously in the primary data), (2) super- and sub-segments, where groups of segments will comprise the parts of a larger segment (e.g. contiguous word segments typically comprise a sentence segment), (3) discontinuous segments (linking continuous segments), and (4) landmarks (e.g. timestamp) that note a point in the primary data. In current practice, segmental information may or may not appear in the document containing the primary data itself.

2.6 linguistic annotation
annotation (2.3) that provides linguistic information about the segments in the primary data (2.1)
EXAMPLE Morphosyntactic annotation in which a part of speech and lemma are associated with each segment in the data.
Note to entry: The identification of a segment as a word, sentence, noun phrase, etc. also constitutes linguistic annotation. In current practice, when it is possible to do so, segmentation and identification of the linguistic role or properties of that segment are often combined (e.g. syntactic bracketing, or delimiting each word in the document with an XML element tha
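
The distinction drawn in 2.1, 2.5 and 2.6 between primary data, segmentation annotation and linguistic annotation can be illustrated with a minimal Python sketch. It is not the XML pivot format defined by this standard: the class names `Segment` and `Annotation`, the helper `text_of`, the sample sentence and the feature labels (`pos`, `lemma`, `type`) are all hypothetical, chosen only to show segments addressing character locations in unmodified primary data and linguistic information attached to those segments by reference.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    """Hypothetical segmentation annotation: a character span in the primary data."""
    id: str
    start: int  # offset of the first character (inclusive)
    end: int    # offset just past the last character (exclusive)


@dataclass
class Annotation:
    """Hypothetical linguistic annotation that points at a segment by its id."""
    target: str
    features: dict = field(default_factory=dict)


# Primary data (2.1): left untouched; everything else refers to it by location.
primary_data = "Dogs bark."

# Segmentation annotation (2.5): two word segments and one super-segment.
segments = [
    Segment("s1", 0, 4),    # "Dogs"
    Segment("s2", 5, 9),    # "bark"
    Segment("s3", 0, 10),   # the whole sentence
]

# Linguistic annotation (2.6): information about the segments (illustrative labels).
annotations = [
    Annotation("s1", {"pos": "NOUN", "lemma": "dog"}),
    Annotation("s2", {"pos": "VERB", "lemma": "bark"}),
    Annotation("s3", {"type": "sentence"}),
]


def text_of(segment: Segment, data: str) -> str:
    """Return the stretch of primary data that a segment addresses."""
    return data[segment.start:segment.end]


# Resolve each annotation to the text of the segment it targets.
by_id = {s.id: s for s in segments}
resolved = [(text_of(by_id[a.target], primary_data), a.features) for a in annotations]
```

Keeping segments and linguistic information in separate structures mirrors the separation the definitions make; the same annotations could then be rendered in any representation (2.4) without touching the primary data.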