1、INTERNATIONAL STANDARD ISOjlEC 15200 First edition 1996-l 2-01 Information technology - Adaptive Lossless Data Compression algorithm (ALDC) Technologies de Iinformation - Algorithme de compression de don&es dadaptation sans pertes (ALDCI Reference number ISO/1 EC 15200: 1996(E) ISO/IEC 15200:1996(E)
2、 Contents Foreword IntlWdUCtiOn 1 Scope 2 Conformance 3 Normative Reference 4 Definitions 4.1 Compressed Data Stream 4.2 Copy Pointer 4.3 Current Address 4.4 Data Byte. 4.5 Displacement Field. 4.6 End Marker 4.7 History Buffer. 4.8 Literal. 4.9 Matching String 4.10 Match Count 4.11 Match Count Field
3、. 4.12 Pad Bits 5 Conventions and Notations 5.1 Representation of numbers 5.2 Names 6 ALDC compression algorithm 6.1 Encoding description for a 5 1Zbyte History Buffer 6.2 Description of the Compressed Data Stream Annexes A - ALDC encoding format B - ALDC Overview C - ALDC Encoding Flow Chart D - Bi
4、bliography . . . lu iv 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 3 5 7 8 12 0 ISO/IEC 1996 All rights reserved. Unless otherwise specified, no part of this publication may be repro- duced or utilized in any form or by any means, electronrc or mechanrcal, rncluding photo- copying and microfilm, witho
5、ut permission In writing from the publisher. ISO/IEC Copyright Office l Case postale 56 l CH-1211 Geneve 20 l Switzerland Printed in Switzerland ii OISO/IEC ISO/IEC15200:1996(E) Foreword IS0 (the International Organization for Standardization) and IEC (the International Electrotechnical Commission)
6、form the specialized system for worldwide standardization. National bodies that are members of IS0 or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. IS0 and IE
7、C technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with IS0 and IEC, also take part in the work. In the field of information technology, IS0 and IEC have established a joint technical committee, ISO/IEC JT
8、C 1. Draft International Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as an International Standard requires approval by at least 75% of the national bodies casting a vote. International Standard ISO/IEC DIS 15200 was prepared by ECMA (a
9、s ECMA-222) and was adopted, under a special “fast- track procedure”, by Joint Technical Committee ISO/IEC JTC 1, Inform&on technology, in parallel with its approval by national bodies of IS0 and IEC. Annexes A to D are for information only. . . . 111 ISO/IEC 15200:19!M (E) OISO/IEC Introduction In
10、the past decades ISO/IEC have published numerous International Standards for magnetic tapes, magnetic tape cassettes and cartridges, as well as for optical disk cartridges. Those media developed recently have a very high physical recording density. In order to make optimal use of the resulting data
11、capacity, lossless compression algorithms have been designed which allow a reduction of the number of bits required for the representation of user data. These compression algorithms are registered by ECMA, the International Registration Authority established by ISO/IEC. The registration consists in
12、allocating to each registered algorithm a numerical identifier which will be recorded on the medium and, thus, indicate which compression algorithm(s) has been used. This International Standard is the third one for lossless compression algorithms. The two previous International Standards are: ISO/IE
13、C 11558:1992 Information technology - Data compression for information interchange - Adaptive coding with embedded dictionary - DCLZ algorithm. ISOLIEC 12042: 1993 Information technology - Data compression for information interchange - Binary arithmetic coding algorithm. iv INTERNATIONAL STANDARD 0
14、ISO/JIX ISO/IEC 15200:1996 (E) Information technology - Adaptive Lossless Data Compression algorithm (ALDC) 1 Scope This International Standard specifies a lossless compression algorithm to reduce the number of bytes required to represent data. The algorithm is known as Adaptive Lossless Data Compre
15、ssion algorithm (ALDC). The numerical identifiers according to ISO/IEC 11576 allocated to this algorithm are: ALDC 512-Byte History Buffer: 3 ALDC 1024-Byte History Buffer: 4 ALDC 2048-Byte History Buffer: 5 2 Conformance A compression algorithm shall be in conformance with this International Standa
16、rd if its output data stream satisfies the requirements of this International Standard. 3 Normative Reference The following standard contains provisions which, through reference in this text, constitute provisions of this International Standard. At the time of publication, the edition indicated was
17、valid. All standards are subject to revision, and parties to agreements based on this International Standard are encouraged to investigate the possibility of applying the most recent edition of the standard indicated below. Members of IEC and IS0 maintain registers of currently valid International S
18、tandards. ISOIIEC 11576: 1993, Information technology - Procedure for the registration of algorithms for the lossless compression of data 4 Definitions For the purposes of this International Standard, the following definitions apply. 4.1 4.2 Compressed Data Stream: The output stream after encoding.
19、Copy Pointer: A part of the Compressed Data Stream which represents a group of two or more consecutive bytes for which there already exists an identical group in the History Buffer. It comprises a Length Code Field and a Displacement Field. 4.3 4.4 4.5 Current Address: The location within the Histor
20、y Buffer where the Data Byte is written. Data Byte: The current byte of incoming data which is written into the History Buffer and is compared to all data bytes previously written into the History Buffer. Displacement Field: That part of the Copy Pointer which specifies the location within the Histo
21、ry Buffer of the first byte of a Matching String. 4.6 4.7 End Marker: A string of 12 ONES indicating the end of the Compressed Data Stream. History Buffer: A data structure where incoming data bytes are stored for use in the compression and decompression process. 4.8 4.9 Literal: A Data Byte for whi
22、ch no match was found in the History Buffer. Matching String: A sequence of bytes in the incoming data which is identical with a sequence of bytes in the History Buffer. 4.10 4.11 4.12 Match Count: The number of bytes in a Matching String. Match Count Field: That part of the Copy Pointer which speci
23、fies the number of consecutive bytes for which a match was found in the History Buffer. Pad Bits: Bits set to ZERO and included in the Compressed Data Stream, as required, to maintain an g-bit byte boundary. 5 Conventions and Notations 5.1 Representation of numbers The following conventions and nota
24、tions apply in this International Standard, unless otherwise stated. ISOIIEC 152m1996 (E) 01s0m3c - The setting of binary bits is denoted by ZERO and ONE. - Numbers in binary notation and bit combinations are represented by ZEROS and ONES with the most significant bit to the left. - All other number
25、s shall be in decimal form. 5.2 Names The names of entities are given with a capital initial letter. 6 ALDC compression algorithm An overview of the ALDC encoding process is given in annex B, and a flow chart is contained in annex C. 6.1 Encoding description for a 512-byte History Buffer At the star
26、t of encoding, all bytes of the History Buffer shall be reset to all ZEROS. Data bytes shall be stored in sequence in the History Buffer, starting with a Current Address of 0. The encoder processes the incoming data stream one byte at a time. The current byte being processed is referred to as the Da
27、ta Byte. When a Data Byte is received from the input data stream, it shall be written into the History Buffer at the Current Address. Then the Current Address shall be incremented by 1. If it exceeds the maximum address, which is 5 11 for a History Buffer size of 5 12 bytes, it shall be reset to 0.
28、Step 1 The Data Byte shall be compared with each byte previously written into the History Buffer to identify any identical bytes. Step 2 If the Data Byte does not match any byte in the History Buffer, the process shall continue at Step 6. - If the Data Byte matches one or more bytes in the History B
29、uffer, for every matching byte it shall be noted whether this matching byte is a continuation of a previous sequence of matching bytes or not. - If it is not a continuation, the Displacement Field of the matching byte shall be noted and recorded as having a Match Count of one byte. - If the matching
30、 byte is a continuation of a previous string, the Match Count for that string shall be incremented by 1. Step 3 If a Match Count equals 271, the corresponding bytes shall be identified by a Copy Pointer, which shall be added to the Compressed Data Stream. Its Match Count Field and Displacement Field
31、 shall be specified as defined in 6.2. The next Data Byte shall then be read and the process shall continue at Step 1. If there is no more data to be read, the process shall continue at Step 7. Note: The value of 271 was chosen for implementation reasons. Step 4 If the Match Count has not reached 27
32、1, any pending Matching Strings shall be checked to see if any are continued by the Data Byte. - If none of the previous Matching Strings is continued and if any of the previous Matching Strings consists of two or more bytes, that Matching String having the lowest Displacement Field shall be identif
33、ied by a Copy Pointer, which shall be added to the Compressed Data Stream. The next Data Byte shall then be read. If there is no more data to be read, the process shall continue at Step 7. - If no previous Matching Strings are continued and the previous matches were only l-byte matches, the previous
34、 l-byte match shall be identified as a Literal and shall be added to the Compressed Data Stream as defined in 6.2. The next Data Byte shall then be read. If there is no more data to be read, the process shall continue at Step 7. Step 5 If there are no Matching Strings with a Match Count of 27 1 and
35、there is the continuation of at least 1 previous Matching String, the next Data Byte shall be read. The process shall continue at Step 1. 2 QISO/IEC ISO/IEC 15200:1996 (E) If there is no more data to be read, the pending Matching String shall be identified as a Copy Pointer, which shall be added to
36、the Compressed Data Stream. The process shall then continue at Step 7. Step 6 If the Data Byte does not match any bytes of the History Buffer, a check shall be made for any previous Matching Strings that may be pending. If there are any previous Matching Strings of two or more bytes, they shall be i
37、dentified by a Copy Pointer, which shall be added to the Compressed Data Stream. Then the Data Byte that did not match shall be added to the Compressed Data Stream as a Literal. The next Data Byte shall be read and the process shall continue at Step 1. If there is no more data to be read, the proces
38、s shall continue at Step 7. If there are no pending Matching Strings of two or more bytes and there are any pending l-byte matches, the byte preceding the Data Byte shall be identified as a Literal, which shall be added to the Compressed Data Stream. Then the Data Byte that did not find a match shal
39、l be identified as a Literal, which shall be added to the Compressed Data Stream. The next Data Byte shall be read and the process shall continue at Step 1. If there is no more data to be read, the process shall continue at Step 7. Step 7 An End Marker shall be added to the Compressed Data Stream an
40、d Pad Bits shall be included as required. This ends the encoding process. 6.2 Description of the Compressed Data Stream As described above, the processing of the input data generates as its output the Compressed Data Stream. The completed Compressed Data Stream shall consist of: - Literals, each pre
41、ceded by a bit set to ZERO. - Copy Pointers, each preceded by a bit set to ONE. - An End Marker preceded by a bit set to ONE. - Pad Bits. Once all data has been read, the Compressed Data Stream shall be terminated by a bit set to ONE, followed by an End Marker, followed by Pad Bits. During the encod
42、ing, if more than one Matching String of the same Match Count is found, the Copy Pointer with the lowest Displacement Field shall be used. The Match Count Field shall consist of 2, 4, 6, 8, or 12 bits, identifying Match Counts as specified in table 1. The length of the Displacement Field shall be 9
43、bits, 10 bits, or 11 bits for History Buffer sizes of 512 bytes, 1024 bytes, or 2048 bytes, respectively. ISO/IEC 15200: 1996 (E) OISO/IEC Table l-Match Count Field Setting of Match Count Field 00 01 1000 : : 10 11 110 000 : : 110 111 1110 0000 : : 1110 1111 111100000000 : : 111111101110 11111110111
44、1 Number of bytes : : 15 16 : : 31 32 : : 270 271 OISO/IEC ISO/IEC 15200: 1996 (E) Annex A (informative) ALDC encoding format A.1 Nomenclature The Compressed Data Stream encoding format is described using BNF-like language, the symbols being defined as follows : Symbol Definition .- .- the non-termi
45、nal on the left side of the “:=I can be replaced by the expression on the right side 1 I ( 071 non-terminal expression the expression inside the square brackets may occur 0 or more times the logical disjunction OR the text enclosed within the parentheses is a comment only, added for clarity the term
46、inal binary digits ZERO or ONE ISO/IEC 15200:1996 (E) OISO/IEC A.2 ALDC-1 Algorithm - encoding format (512-byte History Buffer) := OcLitera.l OcLiteral 1 1 := cbcbcbcbcbcbcbcb (g-bit byte data) cb := 0 1 1 := cDisplacement Field := (the length can be 2,4,6, 8 or 12 bits) Setting of Match Count Field
47、 00 01 1000 10 11 7 110000 8 110 111 15 11100000 16 1110 1111 31 111100000000 32 111111101110 270 111111101111 271 cDisplacement Field := cbcb (9-bit) := number of bits sets to ZERO required to maintain an 8-bit boundary := 1111 1111 1111 A.3 ALDC-2 Algorithm - encoding format (1024-byte History Buf
48、fer) (identical to ALDC-1 definition except for a lo-bit Displacement Field) := cbcbcbcb (lo-bit) A.4 ALDCQ Algorithm - encoding format (2048-byte History Buffer) (identical to ALDC-1 definition except for an 11 -bit Displacement Field) cDisplacement Field := cbcbcbcb (11 -bit) Match Count in bytes
49、2 3 4 6 01s0/IBc ISO/IEC 152l-Kk1996 (E) Annex B (informative) ALDC Overview The ALDC algorithm accepts input in g-bit data bytes and outputs a bit stream representing data in compressed form. The ALDC algorithm is one implementation of the Lempel-Ziv 1 (LZl) class of data compression algorithms. LZl algorithms achieve compression using a data structure called a History Buffer where incoming data is stored and compared to previous data in the same History Buffer. An LZl encoding process and an LZl decoding