Information technology
High efficiency coding and media delivery in heterogeneous environments
Part 3: 3D audio
AMENDMENT 1: MPEG-H, 3D audio profile and levels

INTERNATIONAL STANDARD ISO/IEC 23008-3
First edition 2015-10-15
AMENDMENT 1 2016-08-01
Reference number ISO/IEC 23008-3:2015/Amd.1:2016(E)
© ISO/IEC 2016

COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2016, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO's member body in the country of the requester.
ISO copyright office
Ch. de Blandonnet 8, CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for the different types of ISO documents should be noted.
This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation on the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see the following URL: www.iso.org/iso/foreword.html.

Amendment 1 to ISO/IEC 23008-3:2015 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.

Page 346
Add the following section after Clause 18.

19 MPEG-H 3D Audio Profile Definition

19.1 Profile: Main Profile

The Main Profile for MPEG-H 3D Audio contains all normative bitstream elements and normative decoder tools defined in the MPEG-H 3D Audio specification. This means that the following tools are included in Main Profile decoders:

- MPEG-H 3D Audio Core Decoder
- HOA Rendering
- SAOC 3D Renderer
- Static object metadata (MAE) and rendering
- Dynamic object metadata (OAM) and rendering
- Generic Loudspeaker Rendering/Format Conversion
- Immersive Loudspeaker Rendering/Format Conversion
- Binaural Rendering Time Domain and/or Frequency Domain
- H2B Binaural Rendering
- Loudness Metadata
- DRC processing

The following table specifies the levels of the Main Profile. The Level column gives the value of Mpegh3daProfileLevelIndication.

Level  Applicable notes  Max. core channels  Max. core sampling rate (Hz)  Max. loudspeaker output channels  Max. PCU in wMOPS (a)  Max. RCU
1                        8                   48000                         8                                 138
2                        16                  48000                         16                                265
3      1) 2) 3)          32                  48000                         24                                448
4      1) 2) 3)          64                  48000                         24                                830
5      1) 2) 3)          128                 96000                         64                                3223

General restrictions for all levels:

- HOA: The number of active predictions must not be larger than (NumActivePred in Table 127, Syntax of HOAPredictionInfo(DirSigChannelIds, NumOfDirSigs)). N is the HOA order. For the definition of global HOA parameters refer to 12.4.1.1.
- The HOA order must not be larger than 3 for Level 1, 4 for Level 2, 5 for Level 3, 6 for Level 4 and 7 for Level 5 (see HoaOrder in Table 119, Syntax of HOAConfig()).
- The number of input objects (for SAOC encoding) must not be larger than 2 times the maximum number of core coder channels.
- The number of predominant sounds of HOA must not be larger than 8 for Level 1, 10 for Level 2, 12 for Level 3, 14 for Level 4, and 16 for Level 5.

Restrictions for specific levels:

1) SAOC: The maximum number of SAOC downmix channels is 32. SAOC objects must be grouped, i.e. a set of SAOC objects is mixed into a group of at most 8 downmix channels and not into any other downmix channel. IOCs must not be transmitted between SAOC objects of different groups.
2) The maximum number of channels in each group with SignalGroupTypeChannels is 24; multiple such groups can exist.
3) For DRC-1 and DRC-3 the maximum number of channel groups for each is 16.

Note: It is also assumed that both Binaural Renderers (TD and FD) are implemented. The total complexity may increase if only a single Binaural Renderer is available. The numbers for binaural processing are calculated on the basis of BRIR filters of 1 second length measured in a BS.1116 compliant room.

(a) The maximum PCU numbers are based on theoretical calculations and estimations of the number of operations. They represent worst case total complexity numbers. All PCU figures are provided as informative data.

19.1.1 Examples for Level 1 of Main Profile

Example 1: Eight input channels forming a 7.1 mix are carried as channels and coded at a low bitrate. In the decoder, a downmix to 5.1 channels is performed. Finally, multi-band dynamic range compression is applied to the 6 loudspeaker output signals.

Decoder building block  Description                     PCU in wMOPS
Core coder channels     8 (incl. all tools) = 4 CPEs    46
Rendering               8 ch to 6 ch                    5
Domain switch           6 ch FD to TD                   9
DRC                     multi-band DRC 2                2.2
Post-processing         none
Total                                                   62

Example 2:
A 2nd-order HOA signal is carried in 8 core coder channels and is decoded to produce 9 HOA components. H2B binaural processing is applied to render the signal for headphone output. Single-band dynamic range compression is applied to the output.

Decoder building block  Description                                                 PCU in wMOPS
Core coder channels     8 (including all tools) = 4 SCE + 2 CPE                     12.6 + 21.6 = 34.2
Rendering               4 Amb + 4 PS (HOA rendering matrix 9x8 not applied)         15
Domain switch           8 ch FD to TD (if SBR, otherwise not applied)               12/0
DRC                     DRC 2 full band                                             0.5
Post-processing         H2B-Binaural Rendering of 9 HOA components                  21
Total                                                                               82/70

19.1.2 Examples for Level 2 of Main Profile

Example 1: A 4th-order Higher Order Ambisonics (HOA) signal is coded at about 500 kbit/s, so no SBR is applied. The output domain of the core decoder is the time domain, so no domain switch is necessary. The HOA spatial decoder reproduces a 4th-order HOA
signal which is rendered to an 11.1 loudspeaker setup.

Decoder building block  Description                                                 PCU in wMOPS
Core coder channels     8 (including all tools) = 2 CPE + 4 SCE                     8 + 19.4 = 27.4
Rendering               4 Amb + 4 PS (HOA decoding + rendering to 11 speakers)      24 + 13 = 37
Domain switch           -                                                           0
DRC                     DRC 2 full band                                             0.5
Post-processing                                                                     0
Total                                                                               65

Example 2: A 4th-order Higher Order Ambisonics (HOA) signal is coded at about 250 kbit/s, so SBR is applied. The output domain of the core decoder is the frequency domain and a domain switch is necessary. The HOA
spatial decoder reproduces a 4th-order HOA signal which is rendered to an 11.1 loudspeaker setup. Additionally, 2 dialogue objects accompany the HOA scene.

Decoder building block  Description                                                             PCU in wMOPS
Core coder channels     8 (HOA) = 2 CPE + 4 SCE plus 2 (objects) = 2 SCE                        12.6 + 21.6 + 6.3 = 40.5
Rendering               4 Amb + 4 PS (HOA decoding + rendering to 11 speakers) + 2 objects      24 + 13 + 2 = 39
Domain switch           10 ch FD to TD                                                          15
DRC                     DRC 2 full band                                                         0.5
Post-processing                                                                                 0
Total                                                                                           95

19.1.3 Examples for Level 3 of Main Profile

Example 1: A 4th-order Higher Order Ambisonics (HOA) signal is coded at about 250 kbit/s, so SBR is applied. The output domain of the core decoder is the frequency domain and a domain switch on the core coder transport channels is necessary. The HOA spatial decoder reproduces a 4th-order HOA signal which is rendered to a 22.2 loudspeaker setup.

Decoder building block  Description                                                 PCU in wMOPS
Core coder channels     8 (HOA) = 2 CPE + 4 SCE                                     12.6 + 21.6 = 34.2
Rendering               4 Amb + 4 PS (HOA decoding + rendering to 22 speakers)      24 + 26 = 50
Domain switch           8 ch FD to TD                                               12
DRC                     DRC 2 full band                                             1
Post-processing                                                                     0
Total                                                                               97

ICS 35.040
Price based on 4 pages
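
The level limits in 19.1 can be read as a simple lookup: a stream fits a level if none of its parameters exceeds that level's maxima. The following is a minimal, non-normative sketch of that reading in Python. The names LevelLimits, MAIN_PROFILE_LEVELS and lowest_main_profile_level are hypothetical and do not appear in the standard. The sketch covers only the channel, sampling rate, HOA order and predominant sound limits; it ignores the level-specific notes 1) to 3), the SAOC object limit, the active prediction limit and all PCU figures. The sampling rate of 48000 Hz and the count of 24 output channels for a 22.2 setup used in the usage line are assumptions, since the examples do not state them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LevelLimits:
    """Per-level maxima (hypothetical container, not defined by the standard)."""
    level: int
    max_core_channels: int
    max_sampling_rate_hz: int
    max_output_channels: int
    max_hoa_order: int
    max_predominant_sounds: int


# Values transcribed from the Main Profile level table and the
# general restrictions on HOA order and predominant sounds.
MAIN_PROFILE_LEVELS = (
    LevelLimits(1, 8, 48000, 8, 3, 8),
    LevelLimits(2, 16, 48000, 16, 4, 10),
    LevelLimits(3, 32, 48000, 24, 5, 12),
    LevelLimits(4, 64, 48000, 24, 6, 14),
    LevelLimits(5, 128, 96000, 64, 7, 16),
)


def lowest_main_profile_level(
    core_channels: int,
    sampling_rate_hz: int,
    output_channels: int,
    hoa_order: int = 0,
    predominant_sounds: int = 0,
) -> Optional[int]:
    """Return the lowest level whose maxima cover the given parameters.

    Returns None if no Main Profile level covers them. This is a
    simplified check and does not establish conformance.
    """
    for limits in MAIN_PROFILE_LEVELS:
        if (
            core_channels <= limits.max_core_channels
            and sampling_rate_hz <= limits.max_sampling_rate_hz
            and output_channels <= limits.max_output_channels
            and hoa_order <= limits.max_hoa_order
            and predominant_sounds <= limits.max_predominant_sounds
        ):
            return limits.level
    return None


if __name__ == "__main__":
    # Level 3 example: 8 core channels, 4th-order HOA with 4 predominant
    # sounds, rendered to 22.2 (assumed here to be 24 output channels).
    print(lowest_main_profile_level(8, 48000, 24, hoa_order=4, predominant_sounds=4))
```

Under these assumptions, the loudspeaker layout is what moves the 4th-order HOA examples from Level 2 (11.1 output) to Level 3 (22.2 output), since the core channel count and HOA order are the same in both.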