ECMA-407
1st Edition / June 2014

Scalable Sparse Spatial Sound System (S5)
Base S5 Coding

COPYRIGHT PROTECTED DOCUMENT
© Ecma International 2014

Contents

1 Scope
2 Conformance
3 Normative references
4 Terms, definitions and acronyms
5 S5 Overview
6 Inverse Coding
7 Configuration Data
7.1 Syntax of Configuration Data (S5Config)
7.2 Configuration Identifier (S5ConfigID)
7.3 Window Size for the Calculation of Synchronization Tags (S5SyncTagWindow)
7.4 Accuracy of the Calculation of Synchronization Tags (S5SyncTagAccuracy)
7.5 Downmix Configuration (S5DownmixConfig)
7.6 Output Channel Configuration (S5ChannelConfig)
7.7 Upmix Configuration (S5UpmixConfig)
8 Inverse Coding Parameter Data
8.1 Syntax of Inverse Coding Parameter Data (S5InvCodeData)
8.2 Synchronization Elements (S5SyncTag, S5SyncTag-1, S5SyncTag-2)
8.3 Number of Parameter Sets (S5ParameterSetCount)
8.4 Inverse Coding Parameter Data Set ID (S5ParameterSetID)
8.5 Parameter Data Set Type (S5ParameterSetType)
8.6 Inverse Coding Parameter Data Set (S5ParameterSet)
9 Downmix
10 Upmix
10.1 Synchronization of Inverse Coding Parameter Data
10.2 Expanding of S5AbrParameterSet
10.3 Default Values of Inverse Coding Parameter Data
10.4 Default values of S5UpmixConfig
Annex A (normative) Channel Positions and Configurations
Annex B (normative) Syntax for S5UpmixConfig
Annex C (informative) Channel Configuration and Position Tables
Annex D (informative) Loudness Adjustment
Annex E (informative) Multiplexing

Introduction

S5 denotes a scalable multichannel coding system for spatial audio data compression, which can be applied to provide a 3D audio experience with little overhead. Such a system may incorporate a wide range of state-of-the-art audio codecs. By using an audio codec that offers encapsulation capacity for external data, S5 data may be carried
within the audio coder stream with little overhead while maintaining a compatible bitstream syntax.

This Standard specifies the base S5 encoder and decoder in terms of configuration data, downmix, inverse coding parameter data and upmix. It provides reference and guidance on how to incorporate further
components to form a scalable multichannel coding system for audio data compression.

The base S5 codec achieves data compression of multichannel audio information by mapping the audio information onto a downmix signal and onto sparse spatial data, that is, the parameter values of a mathematical model used to reconstruct localization and ambiance. A specific method, denoted as inverse coding, is used for upmixing from the audio downmix and its associated parameter values. Compressing the downmix audio with a state-of-the-art audio codec further increases the coding efficiency of S5.

An overview is given on how the base S5 encoder/decoder may be extended by incorporation of an audio codec and other components; however, the components themselves and their interfaces are not specified. Such specific S5 codecs are subject to separate standards, which share the base S5 coding standard as their common basis.

This Ecma Standard has been adopted by the General Assembly of June 2014.

“COPYRIGHT NOTICE

© 2014 Ecma International

This document may be copied, published and distributed to others, and certain derivative works of it may be prepared, copied, published, and distributed, in whole or in part, provided that the above copyright notice and this Copyright License and Disclaimer are included on all such copies and derivative works. The only derivative works that are permissible under this Copyright License and Disclaimer are: (i) works which incorporate all or portion of this document for the purpose of providing commentary or explanation (such as an annotated version of the document), (ii) works which incorporate all or portion of this document for the purpose of incorporating features that provide accessibility, (iii) translations of this document
into languages other than English and into different formats and (iv) works by making use of this specification in standard conformant products by implementing (e.g. by copy and paste wholly or partly) the functionality therein.

However, the content of this document itself may not be modified in any
way, including by removing the copyright notice or references to Ecma International, except as required to translate it into languages other than English or into a different format.

The official version of an Ecma International document is the English language version on the Ecma International website. In the event of discrepancies between a translated version and the official version, the official version shall govern.

The limited permissions granted above are perpetual and will not be revoked by Ecma International or its successors or assigns.

This document and the information contained herein is provided on an “AS IS” basis and ECMA INTERNATIONAL DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY OWNERSHIP RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.”

Scalable Sparse Spatial Sound System (S5)
Base S5 Coding

1 Scope

This Standard specifies the base S5 encoder and decoder in terms of configuration data, downmix, inverse coding parameter data and upmix. In addition, it provides reference and guidance on how to incorporate further components to form a scalable multichannel coding system for audio data compression.

2 Conformance

Conformant base S5 encoders generate dataflows as specified in Clauses 7, 8, and 9. Conformant base S5 decoders generate the upmix as specified in Clause 10 by
processing dataflows as specified in Clauses 7, 8, and 9.

3 Normative references

ISO/IEC 23001-8, Information technology - MPEG systems technologies - Part 8: Coding-independent code points

IETF RFC 5234, Augmented BNF for Syntax Specifications: ABNF

4 Terms, definitions and acronyms

For the purposes of this document, the following terms and definitions apply.

4.1 downmix
reduced number of audio channels from an input signal

4.2 upmix
increased number of audio channels from a downmix

4.3 base S5 encoder
encoding unit providing the downmix, the inverse coding parameter data and the upmix configuration

4.4 base S5 decoder
decoding unit providing the upmix based on the downmix, the inverse coding parameter data and the upmix configuration

4.5 base audio codec
audio codec component providing lossless or lossy compression and decompression of the downmix

4.6 loudness
perceived level of an audio programme

4.7 Mid (M) signal
non-directional input signal to a Mid-Side (MS) decoder

4.8 Side (S) signal
directional input signal to a Mid-Side (MS) decoder

4.9 uimsbf
unsigned integer, most significant bit first

4.10 Q format
fixed point binary format for fractional numbers, where the number of fractional bits and the number of integer bits are specified

4.11 uqmsbf
unsigned Q format, most significant bit first

NOTE This Standard uses for uqmsbf the Q format notation Qm.n, where “m” designates the number of bits of the integer part and “n” denotes the number of bits
of the fractional portion to the right of the binary point. The width “w” of the corresponding bitfield is $w = m + n$ bits. The value range covers $0$ to $2^m - 2^{-n}$ with a constant resolution of $2^{-n}$. To convert a number from unsigned Q format to a decimal number, take the Q bitfield as an integer and multiply it by $2^{-n}$. For example, the Q1.3 bitfield 1010 is the integer 10 and represents $10 \cdot 2^{-3} = 1.25$.

4.12 sqmsbf
signed Q format, most significant bit first

NOTE This Standard uses for sqmsbf the 2's complement representation with Q format notation Qm.n, where “m” designates the number of bits of the integer part without the sign bit and “n” the number of bits of the fractional portion to the right
of the binary point. The width “w” of the corresponding bitfield is $w = m + n + 1$ bits, which includes the sign bit as the most significant bit. The value range covers $-2^m$ to $2^m - 2^{-n}$ with a constant resolution of $2^{-n}$. To convert a number from signed Q format to a decimal number, take the Q bitfield as
a 2's complement integer and multiply it by $2^{-n}$.

5 S5 Overview

S5 denotes a scalable multichannel coding system for spatial audio data compression, which can be applied to provide a 3D audio experience with little overhead. Such a system may incorporate a wide range of state-of-the-art audio codecs. By using an audio codec that offers encapsulation capacity for external data, S5 data may be carried within the base audio coder stream with little overhead while maintaining a compatible bitstream syntax.

The system of an S5 codec can be described by the functional block diagrams of the S5 encoder, as depicted in Figure 1, and of the S5 decoder, as depicted in Figure 2. An S5 encoder shall at least consist of a base S5 encoder, and an S5 decoder shall at least consist of a base S5 decoder.

The base S5 encoder shall achieve compression of multichannel audio information by downmixing the f-channel signal to g channels and shall produce sparse spatial data, which is a parametric encoding of a mathematical model used to reconstruct from the downmix an upmix whose localization and ambiance approach those of the original signal. A specific
method, denoted as inverse coding (see Clause 6), is used to construct an upmix of h channels from the audio downmix and its associated spatial data. Compressing the downmix audio with a state-of-the-art base audio coder can further increase the coding efficiency of S5.

The various bitstreams produced by the functional units of an S5 encoder may be encapsulated into a single bitstream by the functional unit Multiplexer (see Annex E). Ancillary data may be conveyed from the S5 encoder to the S5 decoder and may be used to encapsulate data other than coding parameters, for example, loudness parameters, which may be used to adjust the perceived level of audio signals. For the loudness parameters, see Annex D.

This Standard specifies the base S5 encoder/decoder and their interfaces only. All other components and their interfaces are not specified. Specific S5 codecs that incorporate such components are subject to separate standards. As the base S5 encoder/decoder is agnostic to the other system components, the base S5 coding standard shall represent the common base for all S5 specific standards.

Figure 1 Functional block diagram of the S5 encoder

Figure 2 Functional block diagram of the S5
decoder

The subsequent clauses of this Standard specify the syntax of data streams using Augmented Backus-Naur Form (ABNF) as defined in IETF RFC 5234. In addition to this notation, the encoding of the data stream elements is denoted by the format and the length of their
bit fields. Note that the syntax and the final encoding of a data stream are strictly separated. For the same syntax of a data stream, an external encoding, e.g. by a multiplexer, may vary according to the constraints of the storage or transmission environment. Examples are byte alignment and error protection. However, external encoding details are beyond the scope of this Standard and are subject to specific S5 standards or other specifications.

6 Inverse Coding

Inverse coding denotes a mathematical method for upmixing a channel-based audio signal while preserving to a high degree the localization and
ambiance information of the audio source.

Inverse coding is based on the spatial representation of a left and a right signal by a real-valued composite signal, the mid (M) signal, and a real-valued differential signal, the side (S) signal. A mid-side (MS) decoder maps the samples of the MS signals onto the left and the right channel without information loss. The mapping follows the equations below:

$$\mathrm{Left} = (M + S) \cdot \frac{1}{\sqrt{2}}$$

$$\mathrm{Right} = (M - S) \cdot \frac{1}{\sqrt{2}}$$

Inverse coding assumes that the S signal can be approximated by processing the M signal with two specific gains $P_\alpha$, $P_\beta$ and two specific delays $L_\alpha$, $L_\beta$. Figure 4 depicts inverse coding as a signal processing unit. The corresponding functions $L_\alpha$, $L_\beta$, $P_\alpha$, $P_\beta$ and their parameters are given in Table 1:

Table 1 Formulae of inverse coding gains and delays

Delay $L_\alpha$: $L_\alpha = \rho \left( -\frac{f(\alpha)}{2\sin\alpha} + \sqrt{\frac{f^2(\alpha)}{4\sin^2\alpha} + f^2(\varphi) - f(\alpha)\, f(\varphi)\, \frac{\sin\varphi}{\sin\alpha}} \right)$

Delay $L_\beta$: $L_\beta = \rho \left( -\frac{f(\beta)}{2\sin\beta} + \sqrt{\frac{f^2(\beta)}{4\sin^2\beta} + f^2(\varphi) + f(\beta)\, f(\varphi)\, \frac{\sin\varphi}{\sin\beta}} \right)$

Gain $P_\alpha$: $P_\alpha = \frac{f^2(\alpha)}{4\sin^2\alpha} + f^2(\varphi) - f(\alpha)\, f(\varphi)\, \frac{\sin\varphi}{\sin\alpha}$

Gain $P_\beta$: $P_\beta = \frac{f^2(\beta)}{4\sin^2\beta} + f^2(\varphi) + f(\beta)\, f(\varphi)\, \frac{\sin\varphi}{\sin\beta}$

The discriminant relationship of these gains and delays induces sound source separation, even for sound sources at the same frequency. It relies on an inverse problem solution, which takes into account the directivity pattern together with angular assumptions with regard to the main angle of incidence $\varphi$, a left opening angle $\alpha$ and a right opening angle $\beta$. The parameters of this discriminant relationship define the overall sound stage of the resulting MS signal. Figure 3 depicts this relationship for an M signal showing a cardioid polar pattern.

Figure 3 Angular assumptions and directivity pattern with inverse coding

The following inverse coding parameters are applied:

- the directivity pattern $f(\psi)$ is the ascertained polar diagram with $f(\psi) = 1 - \frac{n}{2} + \frac{n}{2}\sin\psi$, with $0 \le \psi \le 2\pi$ and $0 \le n \le 2$
- the ascertained main angle of incidence $\varphi$ between the sound source and the polar main axis of the directivity pattern (such polar axis corresponding with a microphone's main axis), with $-\pi/2 \le \varphi \le \pi/2$
- the stipulated left opening angle $\alpha$ adjoining the polar main axis of the directivity pattern on the left, with $0 \le \alpha \le \pi/2$. For a
positive main angle of incidence $\varphi$, the condition $\varphi \le \alpha$ shall be satisfied.
- the stipulated right opening angle $\beta$ adjoining the polar main axis of the directivity pattern on the right, with $0 \le \beta \le \pi/2$. For a negative main angle of incidence $\varphi$, the condition $|\varphi| \le \beta$ shall be satisfied.
- the time scaling factor $\rho$ for generating an S signal, with $0.029\,\mathrm{s} \le \rho \le 0.146\,\mathrm{s}$
- the side signal ratio gain $\lambda$ to control the S signal level, with $0 \le \lambda \le 1$, leading to signals that may be seamlessly varied between a degree of correlation of $-1$ and $+1$

These parameters are applied for inverse coding as shown by the signal processing circuit of Figure 4:

Figure 4 Functional block diagram for an inverse coding function

According to Figure 4, the samples of the left and right channels shall be derived from the previous equations as given in Table 2.

Table 2 Formulae of inverse coding functions

Left = 12 (M + (P delay(M,L ) + P delay(M,L ) Right = 12 (M (P delay