TECHNICAL REPORT ISO/TR 12033
First edition 2009-12-01
Reference number ISO/TR 12033:2009(E)

Document management - Electronic imaging - Guidance for the selection of document image compression methods

Gestion de documents - Imagerie électronique - Directives pour le choix des méthodes de compression d'image

COPYRIGHT PROTECTED DOCUMENT

© ISO 2009

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.

ISO copyright office
Case postale 56, CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland

Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 General
5 Type of document and digitization parameters
5.1 General
5.2 Type of documents
5.3 Document classification and digitization
6 Compression methods and standards
6.1 LZW compression (Lempel-Ziv-Welch)
6.2 RLE compression (run-length encoding)
6.3 ITU-T algorithms
6.4 JBIG compression
6.5 JBIG2 compression
6.6 Discrete cosine transform (DCT)
6.7 Fractal compression
6.8 Wavelet compression
6.9 JPEG compression
6.10 JPEG 2000
7 Selection of compression parameters
7.1 Pertinence of compression
7.2 Selection of a compression method
7.3 Adjusting JPEG compression
8 Final considerations for the selection of a compression method
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has
been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of technical committees is to prepare International Standards. Draft International Standards adopted by the technical committees are circulated to the member bodies
for voting. Publication as an International Standard requires approval by at least 75 % of the member bodies casting a vote.

In exceptional circumstances, when a technical committee has collected data of a different kind from that which is normally published as an International Standard (“state of the art”, for example), it may decide by a simple majority vote of its participating members to publish a Technical Report. A Technical Report is entirely informative in nature and does not have to be reviewed until the data it provides are considered to be no longer valid or useful.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights.

ISO/TR 12033 was prepared by Technical Committee ISO/TC 171, Document management applications, Subcommittee SC 2, Application issues.

This first edition of ISO/TR 12033 cancels and replaces ISO/TS 12033:2001, which has been technically revised.

Introduction

With respect to the rapid increase of applications using digitization techniques, the role of compression methods has become a factor of growing importance for the management of the volumes of stored data. The effects of the available compression methods vary greatly, depending on the source documents. For example, an electronic image management (EIM) system configured for scanning and storing continuous
tone images will have different image compression requirements from an application involving only text.

Practical methods for analysing user requirements for image compression, in order to select accurate and optimal image compression schemes, are complex. This Technical Report was issued in order to guide users and system developers in their selection of these methods.

1 Scope

This Technical Report gives information
to enable a user or electronic image management (EIM) integrator to make an informed decision on selecting compression methods for digital images of business documents. It provides technical guidance for analysing the type of documents and determining which compression methods are most suitable for particular documents, in order to optimize their storage and use.

For the user, this Technical Report provides information on image compression methods incorporated in hardware or software, in order to help the user during the selection of equipment in which the methods are embedded.

For the equipment or software designer, this Technical Report provides planning information.

This Technical Report is applicable only to still images in bit map mode. It only takes into account compression algorithms based on well-tested mathematical work.

2 Normative references

The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 12651:1999, Electronic imaging - Vocabulary

3 Terms and definitions

For the purposes of this document, the terms and definitions given in ISO 12651 and the following apply.

3.1
compression
process of removing redundancies in digital data to reduce the amount that needs to be stored or transmitted
NOTE Lossless compression removes only enough redundancy so that the original data can be recreated exactly as it was. Lossy compression sacrifices additional data to achieve greater compression. This is typically useful for greyscale or colour image compression, where details that are not perceptible, or are minimally perceptible, to the human eye can be eliminated, normally with a dramatic increase in compression. It is advisable that lossy compression not be used for documents containing textual information and not be used for long-term archival of any type of document.

3.2
resolution
number of pixels per unit of length

3.3
dots per inch
dpi
number of dots that a scanner (printer) can scan (print) per inch, horizontally or vertically

3.4
brightness
visual sensation that enables an observer to detect luminance

3.5
contrast
ratio of on-pixel brightness to off-pixel brightness

3.6
bit level
number of bits used to define a pixel

3.7
luminance
Y
luminous flux emitted from a surface
NOTE The former term was photometric brightness.

3.8
chrominance
Cr Cb
colour portion of the video signal, including hue and saturation but not brightness
NOTE Low chroma means the colour picture looks pale or washed out; high chroma means intense colour; black, grey and white have a chrominance equal to zero.

3.9
ITU-T Group 3 and Group 4
compression algorithm standards defined by the ITU-T in Recommendations T.4 and T.6

3.10
Joint Photographic Experts Group
JPEG
name of the committee that developed the ISO/IEC 10918 series, which shares the same popular name
NOTE The “J” refers to the joint development with the ITU-T.

3.11
Comité Consultatif International Télégraphique et Téléphonique
former name of the International Telecommunication Union (ITU) standardization body

3.12
compression ratio
relationship of the total number of bits used to represent the original to the total number of encoded bits

3.13
Joint Bi-level Image Experts Group
JBIG
name of the subcommittee that developed ISO/IEC 11544
NOTE The joint committee is with ITU-T. JBIG and JPEG are managed by ISO/IEC JTC 1/SC 29/Working Group 1.
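
The definitions of resolution (3.2), bit level (3.6), compression (3.1) and compression ratio (3.12) can be combined into a small worked example. The Python sketch below is illustrative only and is not part of the guidance: the function names, the A4 page dimensions and the use of the standard `zlib` module as a stand-in lossless codec are all assumptions made here.

```python
import zlib


def raw_size_bits(width_in: float, height_in: float, dpi: int, bit_level: int) -> int:
    """Uncompressed size in bits of a page scanned at `dpi` with `bit_level` bits per pixel."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bit_level


def compression_ratio(original_bits: int, encoded_bits: int) -> float:
    """Compression ratio in the sense of 3.12: original bits divided by encoded bits."""
    return original_bits / encoded_bits


# Hypothetical A4 page (8.27 in x 11.69 in) scanned at 300 dpi, bi-level (1 bit per pixel).
page_bits = raw_size_bits(8.27, 11.69, 300, 1)

# Stand-in for a blank bi-level page: all-zero bytes, i.e. maximally redundant data.
raw = bytes(page_bits // 8)
packed = zlib.compress(raw)

# Lossless compression: decompression restores the original data exactly.
assert zlib.decompress(packed) == raw

ratio = compression_ratio(len(raw) * 8, len(packed) * 8)
print(f"raw: {len(raw)} bytes, compressed: {len(packed)} bytes, ratio: {ratio:.0f}:1")
```

A blank page is the most favourable case; a real scanned page contains less redundancy and will give a lower ratio.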
4 General

In a document imaging system, users are concerned about the quality of archived images, for two reasons:

a) it can affect the imaging system's future in the medium or even long term;
b) it is necessary to choose the imaging tools based on an evolving technology.

The digitization process, which by nature transforms an image conveying comprehensible information into a dematerialized one, changes the observer's perception of that image. The observer may consider the image improved, though more frequently considers it degraded.

In fact, images undergo a number of successive transformations at different points during the digitization process. At each of these stages, attempts are made to keep the image within acceptable legibility limits, but also to restrict its size to within acceptable economic limits. The specific role of one of the digitization stages, compression, is to reduce the size of the image.

Some compression methods are reversible in that the decompression algorithm restores the initial digital information. These methods are lossless and have no impact on the quality of the image as it is perceived by the human eye. Other methods are lossy, and may cause
degradation perceptible to the eye. By adjusting certain parameters, the user can bring a lossy method within acceptable limits; the acceptance of a lossy method is, however, a subjective judgement. Any image or document to which computerized processing may later be applied should not be compressed with such a method. This is one of the major reasons not to use lossy compression for long-term archiving, as the future usage of the image or document is unknown.

While numerous compression methods are described in the technical literature, few have been stabilized in industrial standards. These are based on a
limited number of principles: dominance of certain patterns, pattern repetition, and notable mathematical properties. In any individual method, the number of parameters the user can modify is small.

The choice of a method and its compression parameters is in large part determined by two considerations:

a) the characteristics of the document;
b) the period of time the document is to be retained (retention time).

The graphical contents of a document play a key role in determining the method and its parameters. However, other factors characterizing the application context are also very
important (see Table 1).

The graphical content of the document is important to the compression process. A business document that can be copied or faxed as “pure black and pure white” (even if the original was blue ink on yellow paper) is probably best compressed with the technologies developed by
the ITU-T for facsimile. Colour or greyscale photos are probably best compressed using one of the JPEG technologies. But if the photo has been converted to variable-size black dots (like many “half-tone” newspaper photos), then JBIG is a superior compression technology.
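
The rule of thumb in the preceding paragraph can be written as a simple lookup. This is a minimal sketch in Python for illustration only: the `ContentType` enumeration and the `suggest_method` function are hypothetical names introduced here, and the mapping covers only the three cases mentioned in the text, not a complete selection procedure.

```python
from enum import Enum, auto


class ContentType(Enum):
    """Hypothetical content categories, one per case named in the text."""
    BITONAL_TEXT = auto()     # can be copied or faxed as pure black and pure white
    CONTINUOUS_TONE = auto()  # colour or greyscale photograph
    HALFTONE = auto()         # photo rendered as variable-size black dots


# Informal labels for the technology families the text recommends.
SUGGESTED = {
    ContentType.BITONAL_TEXT: "ITU-T facsimile coding (Group 3 / Group 4)",
    ContentType.CONTINUOUS_TONE: "JPEG family",
    ContentType.HALFTONE: "JBIG",
}


def suggest_method(content: ContentType) -> str:
    """Return the compression family suggested for the given content type."""
    return SUGGESTED[content]


for kind in ContentType:
    print(f"{kind.name}: {suggest_method(kind)}")
```

In practice the retention time and the other application factors also influence the choice, so such a lookup is only a starting point.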
5 Type of document and digitization parameters

5.1 General

A document is a set of organized information intended for presentation to a human user. Documents can be a single page or a set of pages, and can contain arbitrary content types, such as character content, graphical
content, and various types of image content. The following document content may be found in various types of documents. The classification below is somewhat arbitrary, but for a given application, these distinctions may be used to understand how to handle a given document.

5.2 Type of documents

This clause focuses only on those documents that are most likely to be archived electronically. These documents include:

- black text on white background, or more technically, dark text on light background (even if the ink happens to be blue or red or another single colour, on whatever colour paper);
- photographs, i.e. black and white or colour;
- mixed documents containing both text and photographs reproduced by a printing process, i.e. black and white or colour.

5.3 Document classification and digitization

5.3.1 General

For the purpose of determining a compression scheme, documents may be described in the following five ways. For each type of document, digitization methods are briefly described.

5.3.2 Black and white documents

Digitizing pages printed in black and white, or more generally in bi-tonal mode (primarily text with a single foreground colour on a single background colour), generates bi-level
images where each pixel is represented by one bit. The most important digitization parameter is resolution. Resolution should be determined according to visual perception needs and the limits of the complete imaging process. The human eye will not see noticeable differences between documents digitized at more than 300 dpi, which is the resolution most commonly used to keep quality unaltered. Any resolution under 300 dpi will have visible effects on the digitized document. A resolution over 300 dpi may be needed when computerized processing is performed on the document. Also, 300 dpi is the resolution limit
of the human eye and should be considered as the resolution needed at the viewing size, i.e. if the zoom factor for viewing is 4, a resolution of 1 200 dpi at the original size will provide 300 dpi at the viewing size.

There are also other parameters, related to image processing, which vary according to the kind of image. If, for example, the images to be digitized are text, then it is advisable to produce black characters that are sharply defined against a white background. The brightness (adjusting the colour