INTERNATIONAL STANDARD ISO/IEC 23008-3
First edition 2015-10-15
AMENDMENT 3 2017-01

Information technology
High efficiency coding and media delivery in heterogeneous environments
Part 3: 3D audio
AMENDMENT 3: MPEG-H 3D Audio Phase 2

Technologies de l'information
Codage à haute efficacité et livraison des médias dans des environnements hétérogènes
Partie 3: Audio 3D
AMENDEMENT 3: Phase 2 de l'audio 3D MPEG-H

Reference number ISO/IEC 23008-3:2015/Amd.3:2017(E)
© ISO/IEC 2017

COPYRIGHT PROTECTED DOCUMENT

© ISO/IEC 2017, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO's member body in the country of the requester.

ISO copyright office
Ch. de Blandonnet 8, CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org

Contents
Foreword
Introduction
1 Profiles and Levels
2 Technical Overview - Update
3 MPEG Surround
4 3D Audio Phase II HOA (Subband Directional Prediction, Parametric Ambiance Replication, Phase-based Decorrelation, HOA Layered Coding)
5 Optimizations and Improvements for Low Bitrate Coding
6 Joint Channels for Low Bitrate Coding
7 Discrete Multi-Channel Coding Tool
8 Updates to MHAS
9 Metadata Updates
9.1 Update of mae_Data() syntax and semantics
9.2 Update of OAM data transmission and processing
9.2.1 OAM syntax and semantics
9.2.2 2D spread rendering
9.2.3 Informative distance and depth spread rendering
9.3 Signaling and Processing of Scene Displacement Angles for CO content
9.4 Extension of screen-related processing for off-centered screens
9.5 Update of closest speaker playout for the conditioned case
9.6 Processing of excluded sectors
9.7 Interface for channel-based, object-based, and HOA metadata and audio
9.8 Diffuseness Rendering
9.8.1 Diffuseness Processing
9.8.2 Informative decorrelation filtering for diffuseness processing
9.9 Updates of the element metadata preprocessor
9.10 Review of Metadata
9.11 References
10 Improvements for use in broadcast ecosystems
10.1 Order of elements in mpegh3daDecoderConfig() and mpegh3daFrame()
10.2 Overall delay alignment and constant decoder delay
10.3 Broadcast Contribution Mode Operation of MPEG-H
10.4 Audio Pre-Roll
10.5 Multi-stream Handling
11 SAOC signaling update
12 Tool for Advanced Loudness Control
13 Frequency-Domain Prediction and Time-Domain Post-Filtering
14 Sample Rate Converter
15 Low Complexity Downmix
16 Tonal Component Coding
17 Internal Channel on MPS212 for Low Complexity Format Conversion
18 Low Complexity HOA Spatial Decoding and Rendering
19 High Resolution Envelope Processing (HREP)
20 Signaling of IGF start and stop bands
21 Consolidated Tables for Configuration Extensions, mpegh3daConfigExtension(), usacConfigExtType
22 Consolidated Tables for Extensions Element Configuration and Payload, mpegh3daExtElementConfig(), usacExtElementType
23 Consolidated Tables for MAE Data Types, mae_data(), mae_dataType
24 Consolidated Table for tcx_coding()
25 Peak Limiter
26 Informative Annex on screen-related adaptation of HOA content in complexity constrained implementations
27 Further Changes, Not Categorized
28 Retaining original file length with MPEG-H 3D Audio
AMD.OFL.1 General
AMD.OFL.2 Avoiding Leading Zero Samples
AMD.OFL.3 Avoiding Trailing Zero Samples
29 Enhanced Noise Filling
30 Scope
31 Main Profile

Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of document should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation of the meaning of ISO-specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see the following URL: www.iso.org/iso/foreword.html.

Amendment 3 to ISO/IEC 23008-3:2015 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.

Introduction

The following text describes Amendment 3 to the specification ISO/IEC 23008-3:2015, MPEG-H 3D Audio, in "amendment" style, i.e. in "Replace A with B" style. It includes additions and changes that serve a number of purposes:
- improving the coding efficiency, especially for low-bitrate coding modes (for scene-based as well as for object-based and multichannel-based content);
- adding descriptive metadata;
- updating the MHAS description;
- improvements for the usage of MPEG-H 3D Audio in broadcasting applications;
- a tool for Advanced Loudness Control;
- a layered coding mode for coding of scene-based content.

It is envisioned that this amendment will be merged with the current version of the MPEG-H 3D Audio specification, resulting in a second edition of the standard. Text with yellow highlight shall be adjusted by the editor of a new edition of ISO/IEC 23008-3.

For ease of review, the document is structured by clauses, each of which reflects an approved set of changes. New clauses, tables and figures are typically labelled "AMDX.Y", where X is the number of the clause in which they appear in this document and Y is an increasing integer counter.

1 Profiles and Levels

Add the following definition of profiles and levels to clause 4, Technical Overview:

4.X MPEG-H 3D Audio profiles and levels

4.X.1 Introduction

This subclause defines profiles and their levels for MPEG-H 3D Audio. Complexity units are defined to give an approximation of the decoder complexity in terms of the processing power required for the decoding process. The approximated processing power is given in "Processor Complexity Units" (PCU), specified in millions of operations per second (MOPS).

4.X.2 Profiles

The following Audio Profiles are defined:
1. The Main Profile of MPEG-H 3D Audio provides a complete set of features for low-bitrate and high-quality coding, and rendering for all playback scenarios, exclusively based on the first edition of the MPEG-H 3D Audio specification, ISO/IEC 23008-3:2015 (3D audio).
2. The High Profile of MPEG-H 3D Audio provides a complete set of features for low-bitrate and high-quality
   coding, and rendering for all playback scenarios. The High Profile is a superset of the Low Complexity Profile.
3. The Low Complexity Profile provides features for broadcasting and streaming with reduced decoder complexity.

Table P1: Summary of the location of, and normative reference to, the definitions of MPEG-H 3D Audio profiles. USAC and MPEG-H 3DA Main Profile are provided for information only.

Tool / Module | Defined in ISO/IEC | Subclause | USAC 23003-3 / MPEG-H 3DA Main Profile / MPEG-H 3DA High Profile / MPEG-H 3DA Low Complexity Profile
block switching | 14496-3 | 4.6.11 | X X X X
window shapes: AAC based | 14496-3 | 4.6.11 | X X X X
window shapes: additional windows | 23003-3 | 6.2.9.3 | X X X X
filter bank: AAC based | 14496-3 | 4.6.11 | X X X X
filter bank: additional USAC | 23003-3 | 7.9 | X X X X
TNS | 14496-3 | 4.6.9 | X X X X
intensity | 14496-3 | 4.6.8.2 |
coupling | 14496-3 | 4.6.8.3 |
perceptual noise synthesis (PNS) | 14496-3 | 4.6.13 |
noise filling | 23003-3 | 7.2 | X X X X
MS: basic mid/side coding | 14496-3 | 4.6.8.1 | X X X X
MS: MDCT based complex prediction | 23003-3 | 7.7.2 | X X X X
quantization: non-uniform | 14496-3 | 4.6.1 | X X X X
quantization: uniform | 23003-3 | 7.1 | X X X X
entropy coding: Huffman | 14496-3 | 4.6.3 |
entropy coding: context adaptive arithmetic coding | 23003-3 | 7.4 | X X X X
SBR: base | 14496-3 | 4.6.18 | X X X
SBR: enhanced | 23003-3 | 7.5 | X X X
parametric stereo extension: Parametric Stereo | 14496-3 | 8.6.4 / 8.A |
parametric stereo extension: MPEG Surround 2-1-2 (incl. residual coding) | 23003-3 | 6.2.13 | X X X
Quad Channel Element | 23008-3 | 5.5 | X X
ACELP | 23003-3 | 7.14 | X X X X
frequency domain noise shaping: scale factor based | 14496-3 | 4.6.2 | X X X X
frequency domain noise shaping: LPC based | 23003-3 | | X X X X
Intelligent Gap Filling: IGF for FD | 23008-3 | | X X X
Improved LPD coding: IGF for TCX and TBE in ACELP | 23008-3 Amd3 | | X X
LPD stereo | 23008-3 Amd3 | | X X
Predictors for FD and TCX: frequency-domain prediction and time-domain post-filtering | 23008-3 Amd3 | | X X
Discrete multi-channel coding: MCT | 23008-3 Amd3 | | X X
Format Converter: generic downmix | 23008-3 | 10, Amd3.1 | X X X (Note 4)
Immersive Rendering: immersive rendering within format converter | 23008-3 | 11, Amd3.2 | X X X (Note 4)
Static metadata: Metadata Audio Elements (MAE) and Audio Scene Information (ASI) Decoder and Renderer | 23008-3 | 15 | X X X
Dynamic object metadata: Object Audio Metadata (OAM) Decoder and Renderer | 23008-3 | 7, 8 | X X X
MPEG Surround Extension | 23003-1 Amd 3 | 9 | X
SAOC-3D Decoder and Renderer | 23008-3 | 9 | X X
HOA Decoder and Renderer | 23008-3 and Amd3 | 12 | X X X (Note 5)
Near Field Compensation | 23008-3 | | X X X (Note 1)
Subband Directional Prediction | 23008-3 Amd3 | | X
Parametric Ambiance Replication (PAR) | 23008-3 Amd3 | | X
Phase-based decorrelation | 23008-3 Amd3 | | X
Binaural: FD-binaural, TD-binaural | 23008-3 | 13 | X X X (Note 2)
HOA2Binaural (H2B) | 23008-3 | | X X X (Note 2)
DRC: DRC-1 | 23003-4 | | X X X (Note 3)
DRC: DRC-2 (single band) | 23003-4 | | X X X
DRC: DRC-2 (multi band) | 23003-4 | |
DRC: DRC-3 (single band) | 23003-4 | | X X X
Sample Rate Converter | 23008-3 Amd3 | Amd3.3 | X X
Peak Limiter: unguided clipping prevention | 23008-3, 23003-4 | D | X X
Loudness: loudness metadata and handling | 23003-4 | 6 | X X X
Loudness: loudness compensation | 23008-3 Amd3 | | X X
MHAS: MPEG-H 3D audio stream | 23008-3 | 14 | X X X
MHAS: truncation message and CRC packet type, ASI packet type | 23008-3 Amd3 | | X X
File Format: carriage of MPEG-H 3D Audio in ISO base media file format | 23008-3 Amd2 | | (Note 6)
Interfaces and processing: interfaces and processing for interaction data and local setup info | 23008-3 | 17, 18 | X X X
Carriage of system data: carriage of system data for the interaction with the System Engine | 23008-3 Amd4 | | X X
TCC: Tonal Component Coding | 23008-3 Amd3 | | X
IC: Internal Channel | 23008-3 Amd3 | | X
HREP: High Resolution Envelope Processing | 23008-3 Amd3 | | X

Note 1: Restrictions apply depending on the level.
Note 2: Implementation of binaural rendering is only mandated if headphone reproduction is supported.
Note 3: Multi-band DRC-1 shall be applied in the STFT domain of the TD format converter.
Note 4: The TD format converter downmix shall be applied for downmixing.
Note 5: In order to achieve the target complexity for the LC profile at a given level, implementers should study Annex G.
Note 6: File format encapsulation is independent of the profile that is used for the bitstream. A profile level indicator is part of the file format specification (see XXX).

4.X.2.1 Levels of the Low Complexity Profile

Table P2: Levels and their corresponding restrictions for the Low Complexity Profile

Level | Max. sampling rate (Hz) | Max. no. of core ch. in compressed data stream | Max. no. of decoder-processed core ch. | Max. no. of loudspeaker output ch. | Example of max. loudspeaker configuration | Max. no. of decoded objects | Example of a max. config (C+O) | Max. HOA order | Example of max. HOA order + O
1 | 48000 | 10 | 5 | 2 | 2.0 | 5 | 2 ch. + 3 static obj. (NOTE 1) | 2 | 2nd order + 3 static obj. (NOTE 1)
2 | 48000 | 18 | 9 | 8 | 7.1 | 9 | 6 ch. + 3 static obj. (NOTE 1) | 4 | 4th order + 3 static obj. (NOTE 1)
3 | 48000 | 32 | 16 | 12 | 11.1 | 16 | 12 ch. + 4 obj. | 6 | 6th order + 4 obj.
4 | 48000 | 56 | 28 | 24 | 22.2 | 28 | 24 ch. + 4 obj. | 6 | 6th order + 4 obj.
5 | 96000 | 56 | 28 | 24 | 22.2 | 28 | 24 ch. + 4 obj. | 6 | 6th order + 4 obj.

NOTE 1: In this context, "static objects" are understood as channel-based signals without accompanying OAM data that are not also associated with a channel bed.

The use of switch groups determines which subset of the core channels in the bitstream shall be decoded. If mae_AudioSceneInfo() contains switch groups (mae_numSwitchGroups > 0), then the elementLengthPresent flag shall be 1. The number of channels of the signaled referenceLayout shall not exceed the maximum number of loudspeaker output channels defined for the respective level in Table P2.

Table P3: Approximated worst-case processing power (PCU) of the decoder modules and of the whole decoder for the different levels of the Low Complexity Profile, given in MOPS

Level | Core | LC Format Converter | Object Renderer | HOA (NOTE 2) | Objects-only Renderer | DRC | Limiter | Binaural (NOTE 1) | Worst-case PCU
1 | 33 | 3 | 0 | 3 | 9 | 6 | 4 | 7 | 58
2 | 59 | 10 | 0 | 17 | 16 | 18 | 5 | 19 | 118
3 | 106 | 36 | 7 | 36 | 29 | 24 | 6 | 27 | 206
4 | 186 | 113 | 7 | 93 | 50 | 30 | 9 | 46 | 392
5 | 373 | 226 | 14 | 186 | 50 | 34 | 19 | 92 | 758

NOTE 1: The complexity numbers for binaural processing