Reference number ISO/IEC TS 30135-6:2014(E)
© ISO/IEC 2014

TECHNICAL SPECIFICATION ISO/IEC TS 30135-6
First edition 2014-11-15

Information technology - Digital publishing - EPUB3 - Part 6: EPUB Canonical Fragment Identifier
Technologies de l'information - Publications numériques - EPUB3 - Partie 6: Identificateurs de fragment canoniques EPUB

COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2014
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO's member body in the country of the requester.

ISO copyright office
Case postale 56, CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland

Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of the joint technical committee is to prepare International Standards. Draft International Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as an International Standard requires
approval by at least 75 % of the national bodies casting a vote.

In other circumstances, particularly when there is an urgent market requirement for such documents, the joint technical committee may decide to publish an ISO/IEC Technical Specification (ISO/IEC TS), which represents an agreement between the members of the joint technical committee and is accepted for publication if it is approved by 2/3 of the members of the committee casting a vote.

An ISO/IEC TS is reviewed after three years in order to decide whether it will be confirmed for a further three years, revised to become an International Standard, or withdrawn. If the ISO/IEC TS is confirmed, it is reviewed again after a further three years, at which time it must either be transformed into an International Standard or be withdrawn.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.

The ISO/IEC TS 30135 series was prepared by the Korean Agency for Technology and Standards (as the KS X 6070 series) with the International Digital Publishing Forum and was adopted, under a special "fast-track procedure", by Joint Technical Committee ISO/IEC JTC 1, Information technology, in parallel with its approval by the national bodies of ISO and IEC.

ISO/IEC TS 30135 consists of the following parts, under the general title Information technology - Document description and processing languages - EPUB 3:
- Part 1: Overview
- Part 2: Publications
- Part 3: Content Documents
- Part 4: Open Container Format
- Part 5: Media Overlay
- Part 6: Canonical Fragment Identifier
- Part 7: Fixed-Layout Documents

EPUB Canonical Fragment Identifier (epubcfi) Specification
Recommended Specification 11 October 2011

THIS VERSION
http://www.idpf.org/epub/linking/cfi/epub-cfi-20111011.html
LATEST VERSION
http://www.idpf.org/epub/linking/cfi/epub-cfi.html
PREVIOUS VERSION
http://www.idpf.org/epub/linking/cfi/epub-cfi-20110908.html

A diff of changes from the previous draft is available at this link.
Please refer to the errata for this document, which may include some normative corrections.

Copyright © 2011 International Digital Publishing Forum
All rights reserved. This work is protected under Title 17 of the United States Code. Reproduction and dissemination of this work with changes is prohibited except with the written permission of the International Digital Publishing Forum (IDPF).
EPUB is a registered trademark of the International Digital Publishing Forum.

Editors
Peter Sorotokin, Adobe
Garth Conboy, Google Inc.
Brady Duga, Google Inc.
John Rivlin, Google Inc.
Don Beaver, Apple Inc.
Kevin Ballard, Apple Inc.
Alastair Fettes, Apple Inc.
Daniel Weck, DAISY Consortium

TABLE OF CONTENTS
1. Overview
  1.1. Purpose and Scope
  1.2. Terminology
  1.3. Conformance Statements
2. EPUB CFI Definition
  2.1. Introduction
  2.2. Syntax
  2.3. Character Escaping
3. EPUB CFI Processing
  3.1. Path Resolution
    3.1.1. Step Reference to Child Node (/)
    3.1.2. XML ID Assertion ([)
    3.1.3. Step Indirection (!)
    3.1.4. Terminating Step - Character Offset (:)
    3.1.5. Terminating Step - Temporal Offset (~)
    3.1.6. Terminating Step - Spatial Offset (@)
    3.1.7. Terminating Step - Temporal-Spatial Offset (~ + @)
    3.1.8. Text Location Assertion ([)
    3.1.9. Side Bias ([ + ;s=)
    3.1.10. Examples
  3.2. Sorting Rules
  3.3. Intra-Publication CFIs
  3.4. Simple Ranges
  3.5. Intended Target Location Correction
4. Extending EPUB CFIs
References

1 Overview

1.1 Purpose and Scope

This specification, EPUB Canonical Fragment Identifier (epubcfi), defines a standardized method for referencing arbitrary content within an EPUB Publication through the use of fragment identifiers.

The Web has proven that the concept of hyperlinking is tremendously powerful, but EPUB Publications have been denied much of the benefit that hyperlinking makes possible because of
the lack of a standardized scheme to link into them. Although proprietary schemes have been developed and implemented for individual Reading Systems, without a commonly-understood syntax there has been no way to achieve cross-platform interoperability. The functionality that can see significant benefit from breaking down this barrier, however, is varied: from reading location maintenance to annotation attachment to navigation, the ability to point into any Publication opens a whole new dimension not previously available to developers and Authors.

This specification attempts to rectify this situation by defining an arbitrary structural reference that can uniquely identify any location, or simple range of locations, in a Publication: the EPUB CFI. The following considerations have strongly influenced the design and scope of this scheme:
- The mechanism used to reference content should be interoperable: references to a reading position created by one Reading System should be usable by another.
- Document references to EPUB content should be enabled in the same way that existing hyperlinks enable references throughout the Web.
- Each location in an EPUB file should be able to be identified without the need to modify the document.
- All fragment identifiers that reference the same logical location should be equal when compared.
- Comparison operations, including tests for sorting and comparison, should be able to be performed without accessing the referenced files.
- Simple manipulations should be possible without access to the original files (e.g., given a reference deep in a file, it should be possible to generate a reference to the start of the file).
- Identifier resolution should be reasonably efficient (e.g., processing of the first chapter is not required to resolve a fragment identifier that points to the last chapter).
- References should be able to recover their target locations through parser variations and document revisions.
- Expression of simple, contiguous ranges should be supported.
- An extensible mechanism to accommodate future reference recovery heuristics should be provided.

1.2 Terminology

Please refer to the EPUB Specifications for definitions of EPUB-specific terminology used in this document.

Standard EPUB CFI
  A Publication-level EPUB CFI links into an EPUB Publication. The path preceding the EPUB CFI references the location of the Publication.

Intra-Publication EPUB CFI
  An intra-Publication EPUB CFI allows one Content Document to reference another within the same Publication. The path preceding the EPUB CFI references the current Publication's Package Document. Refer to Intra-Publication CFIs for more information.

1.3 Conformance Statements

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

All sections of this specification are normative except where identified by the informative status label "This section is informative". The application of informative status to sections and appendices applies to all child content and subsections they may contain.

All examples in this specification are informative.

2 EPUB CFI Definition

2.1 Introduction

This section is informative

A fragment identifier is the part of an IRI [RFC3987] that defines a location within a resource. Syntactically, it is the segment attached to the end of the resource IRI starting with a hash (#). For HTML documents, IDs and named anchors are used as fragment identifiers, while for XML documents the Shorthand XPointer [XPTRSH] notation is
used to refer to a given ID.

A Canonical Fragment Identifier (CFI) is a similar construct to these, but expresses a location within an EPUB Publication. For example:

  book.epub#epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)

The function-like string immediately following the hash (epubcfi(...)) indicates that this fragment identifier conforms to the scheme defined by this specification, and the value contained in the parentheses is the syntax used to reference the location within the specified Publication (book.epub). Using the processing rules defined in Path Resolution, any Reading System can parse this syntax, open the corresponding Content Document in the Publication and load the specified location for the User.

A complete definition of the EPUB CFI syntax is provided in the next section.

NOTE: epub has been prepended to the name of the scheme as a more generic CFI-like scheme may be defined in the future for all XML+ZIP-based file formats.

2.2 Syntax

(EBNF productions ISO/IEC 14977)

  fragment = "epubcfi(" , ( path | range ) , ")" ;
  path = step , local_path ;
  range = path , "," , local_path , "," , local_path ;
  local_path = { step | "!" } , [ termstep ] ;
  step = "/" , integer , [ "[" , assertion , "]" ] ;
  termstep = terminus , [ "[" , assertion , "]" ] ;
  terminus = ( ":" , integer ) | ( "@" , number , ":" , number ) | ( "~" , number ) | ( "~" , number , "@" , number , ":" , number ) ;
  number = ( digit-non-zero , { digit } , [ "." , { digit } , digit-non-zero ] ) | ( zero , [ "." , { digit } , digit-non-zero ] ) ;
  integer = zero | ( digit-non-zero , { digit } ) ;
  assertion = [ csv ] , { parameter } ;
  parameter = ";" , value-no-space , "=" , csv ;
  csv = value , { "," , value } ;
  value = string-escaped-special-chars ;
  value-no-space = value - ( [ value ] , space , [ value ] ) ;
  special-chars = circumflex | square-brackets | parentheses | comma | semicolon | equal ;
  escaped-special-chars = ( circumflex , circumflex ) | ( circumflex , square-brackets ) | ( circumflex , parentheses ) | ( circumflex , comma ) | ( circumflex , semicolon ) | ( circumflex , equal ) ;
  character-escaped-special = ( character - special-chars ) | escaped-special-chars ;
  string-escaped-special-chars = character-escaped-special , { character-escaped-special } ;
  string = character , { character } ;
  digit = zero | digit-non-zero ;
  digit-non-zero = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
  zero = "0" ;
  space = " " ;
  circumflex = "^" ;
  double-quote = '"' ;
  square-brackets = "[" | "]" ;
  parentheses = "(" | ")" ;
  comma = "," ;
  semicolon = ";" ;
  equal = "=" ;
  character = ? Unicode Characters ? ;

Unicode Characters

The definition of allowed Unicode characters is the same as XML 1.0. This excludes the surrogate blocks, FFFE, and FFFF:

  #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

Document authors are encouraged to avoid "compatibility characters", as defined in section 2.3 of Unicode. The characters defined in the following ranges are also discouraged. They are either control characters or permanently undefined Unicode characters:

  [#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDEF],
  [#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
  [#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
  [#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
  [#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
  [#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
  [#x10FFFE-#x10FFFF].
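As a rough illustration of the step, integer and escaped-character productions in the grammar above, the following Python sketch parses a single path with an optional character-offset terminus. It is not part of the specification: the names parse_path and unescape are hypothetical, ranges and the temporal and spatial terminus forms are deliberately left out, and assertion text is returned as one string rather than being split into csv values and parameters.

```python
import re

# integer = zero | ( digit-non-zero , { digit } ), so no leading zeros.
INTEGER = r"(?:0|[1-9][0-9]*)"
# Assertion text: a circumflex-escaped character, or any character that is
# not a circumflex, bracket or parenthesis. Commas, semicolons and equals
# signs are let through because they also act as csv/parameter separators.
BODY = r"(?:\^.|[^\^\[\]()])*"
STEP = re.compile(rf"/({INTEGER})(?:\[({BODY})\])?")
OFFSET = re.compile(rf":({INTEGER})(?:\[({BODY})\])?")


def unescape(text):
    """Drop the circumflex in front of each escaped character."""
    return re.sub(r"\^(.)", r"\1", text)


def parse_path(fragment):
    """Parse 'epubcfi(...)' holding one path with an optional ':' terminus.

    Returns a list of tuples: ('step', index, assertion), ('indirect',)
    or ('offset', value, assertion), where assertion may be None.
    """
    if not (fragment.startswith("epubcfi(") and fragment.endswith(")")):
        raise ValueError("not an epubcfi fragment")
    body = fragment[len("epubcfi("):-1]
    pos, parts = 0, []
    while pos < len(body):
        if body[pos] == "!":
            parts.append(("indirect",))
            pos += 1
            continue
        kind, match = "step", STEP.match(body, pos)
        if match is None:
            kind, match = "offset", OFFSET.match(body, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        raw = match.group(2)
        assertion = None if raw is None else unescape(raw)
        parts.append((kind, int(match.group(1)), assertion))
        pos = match.end()
        if kind == "offset" and pos != len(body):
            raise ValueError("a terminating step must come last")
    # path = step , local_path: the first component must be a step.
    if not parts or parts[0][0] != "step":
        raise ValueError("a path must begin with a step")
    return parts
```

For instance, parse_path("epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)") yields six steps and one indirection followed by an offset of 10, while an index written with a leading zero, such as /06, is rejected.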
A Canonical Fragment Identifier (CFI) consists of an initial sequence epubcfi that identifies this particular reference method, and a parenthesized path or range. A path is built up as a sequence of structural steps to reference a location. A range is a path followed by two local (or relative) paths that identify the start and end of the range.

Steps can either be navigational or terminating. Navigational steps may be repeated as necessary (e.g., to count elements, to process children or to follow references). There may be only one terminating step, which, if present, must be the last step in the sequence.

Substrings in brackets are extensible assertions that improve the robustness of traversing paths and migrating them from one revision of the document to another. These assertions preserve additional information about traversed elements of the document, which makes it possible to recover the intended location even after some modifications are made to the Publication.

Although the value definition in the syntax above allows any sequence of characters, a circumflex (^) must be used to escape the following characters to ensure their presence does not interfere with parsing:
- brackets ([, ])
- circumflex (^)
- comma (,)
- parentheses ((, ))
- semicolon (;)

Example of an EPUB CFI that points to a location after the text "2[1]":

  epubcfi(/6/7[chap05ref]!/4[body01]/10/2/1:3[2^[1^]])

The following rules apply to the use of numbers and integers within the path or range:
- leading zeros are not allowed for numbers or integers (to ensure uniqueness);
- trailing zeros are not allowed in the fractional part of a number;
- zero must be represented as the integer 0;
- numbers in the range 1 > N > 0 must have a leading "0.";
- integral numbers must be represented as integers.

2.3 Character Escaping

As described in Syntax, the EPUB CFI grammar contains characters that have a