TIA/EIA/IS-641-A, Dated: June 1998

[...] with $\omega_i$ being the line spectral frequencies (LSF) and they satisfy the ordering property $0$ [...]

[...] $0.85\,R_{i+1}$. This procedure of dividing the delay range into 3 sections and favoring the lower sections is used to avoid choosing pitch multiples.

2.4 Impulse response computation

The impulse response, $h(n)$, of the weighted synthesis filter $H(z)W(z) = A(z/\gamma_1) / [\hat{A}(z) A(z/\gamma_2)]$ is computed each subframe. This impulse response is needed for the search of adaptive and fixed codebooks. The impulse response $h(n)$ is computed by filtering the vector of coefficients of the filter $A(z/\gamma_1)$, extended by zeros, through the two filters $1/\hat{A}(z)$ and $1/A(z/\gamma_2)$.

2.5 Target signal computation

The target signal for adaptive codebook search is usually computed by subtracting the zero-input response of the weighted synthesis filter $H(z)W(z) = A(z/\gamma_1) / [\hat{A}(z) A(z/\gamma_2)]$ from the weighted speech signal $s_w(n)$. This is performed on a subframe basis.

An equivalent procedure for computing the target signal, which is used in this codec, is the filtering of the LP residual signal $r(n)$ through the combination of the synthesis filter $1/\hat{A}(z)$ and the weighting filter $A(z/\gamma_1)/A(z/\gamma_2)$. After determining the excitation for the subframe, the initial states of these filters are updated by filtering the difference between the LP residual and the excitation. The memory update of these filters is explained in Section 2.9.

The residual signal $r(n)$, which is needed for finding the target vector, is also used in the adaptive codebook search to extend the past excitation buffer. This simplifies the adaptive codebook search procedure for delays less than the subframe size of 40, as will be explained in the next section. The LP residual is
given by

$$r(n) = s(n) + \sum_{i=1}^{10} \hat{a}_i\, s(n-i), \qquad n = 0, \dots, 39. \tag{2.20}$$

2.6 Adaptive codebook search

Adaptive codebook search is performed on a subframe basis. It consists of performing a closed-loop pitch search, and then computing the adaptive codevector by interpolating the past excitation
at the selected fractional pitch lag.

The adaptive codebook parameters (or pitch parameters) are the delay and gain of the pitch filter. In the adaptive codebook approach for implementing the pitch filter, the excitation is repeated for delays less than the subframe length. In the
search stage, the excitation is extended by the LP residual to simplify the closed-loop search.

In the first and third subframes, a fractional pitch delay is used with resolution 1/3 in the range [...] and integers only in the range [85, 143]. For the second and fourth subframes, a pitch resolution of 1/3 is always used in the range [...], where $T_1$ is the nearest integer to the fractional pitch lag of the previous (1st or 3rd) subframe.

Closed-loop pitch analysis is performed around the open-loop pitch estimates on a subframe basis. In the first (and third) subframe the range $T_{op} \pm 3$, bounded by 20...143, is searched. For the other subframes, closed-loop pitch analysis is performed around the integer pitch selected in the previous subframe, as described above. The pitch delay is encoded with 8 bits in the first and third subframes
and the relative delay of the other subframes is encoded with 5 bits.

The closed-loop pitch search is performed by minimizing the mean-square weighted error between the original and synthesized speech. This is achieved by maximizing the term

$$R(k) = \frac{\sum_{n=0}^{39} x(n)\, y_k(n)}{\sqrt{\sum_{n=0}^{39} y_k(n)\, y_k(n)}}, \tag{2.21}$$

where $x(n)$ is the target signal and $y_k(n)$ is the past filtered excitation at delay $k$ (past excitation convolved with $h(n)$). Note that the search range is limited around the open-loop pitch as explained earlier.

The convolution $y_k(n)$ is computed for the first delay in the searched range, and for the other delays, it is updated using the recursive
relation

$$y_k(n) = y_{k-1}(n-1) + u(-k)\, h(n), \tag{2.22}$$

where $u(n)$, $n = -(143 + 11), \dots, 39$, is the excitation buffer. Note that in the search stage, the samples $u(n)$, $n = 0, \dots, 39$, are not known, and they are needed for pitch delays less than 40. To simplify the search, the LP residual is copied to $u(n)$ in order to make the relation in Equation (2.22) valid for all delays.

Once the optimum integer pitch delay is determined, the fractions from [...] to [...] with a step of 1/3 around that integer are tested. The fractional pitch search is performed by interpolating the normalized correlation in Equation (2.21) and searching for its maximum. Once the fractional pitch lag is determined, the adaptive codebook vector $v(n)$ is computed by interpolating the past excitation signal $u(n)$ at the given phase (fraction). The interpolation is performed using two FIR filters (Hamming windowed sinc functions): one for interpolating the term in Equation (2.21), with the sinc truncated at $\pm 11$, and the other for interpolating the past excitation, with the sinc truncated at $\pm 29$. The filters have their cut-off frequency (-3 dB) at 3600 Hz in the oversampled domain.

The adaptive codebook gain is then found by

$$g_p = \frac{\sum_{n=0}^{39} x(n)\, y(n)}{\sum_{n=0}^{39} y(n)\, y(n)}, \tag{2.23}$$

where $y(n) = v(n) * h(n)$ is the filtered adaptive codebook vector (zero-state response of $H(z)W(z)$ to $v(n)$).

2.7 Algebraic codebook structure and search

The codebook structure is based on interleaved single-pulse permutation (ISPP)
design. In this codebook, the innovation vector contains 4 non-zero pulses. All pulses can have the amplitudes +1 or -1. The 40 positions in a subframe are divided into 4 tracks, where each track contains one pulse, as shown in Table 2.1.

Table 2.1: Potential positions of individual pulses in the algebraic codebook.

Track   Positions
1       0, 5, 10, 15, 20, 25, 30, 35
2       1, 6, 11, 16, 21, 26, 31, 36
3       2, 7, 12, 17, 22, 27, 32, 37
4       3, 8, 13, 18, 23, 28, 33, 38
        4, 9, 14, 19, 24, 29, 34, 39

The first three pulse positions are coded with 3 bits and the fourth pulse position with 4 bits, and the sign of each pulse is
encoded with 1 bit. This makes a total of 17 bits per subframe.

The algebraic codebook is searched by minimizing the mean square error between the weighted input speech and the weighted synthesis speech. The target signal used in the closed-loop pitch search is updated by subtracting the adaptive codebook contribution. That is,

$$x_2(n) = x(n) - g_p\, y(n), \qquad n = 0, \dots, 39, \tag{2.24}$$

where $y(n) = v(n) * h(n)$ is the filtered adaptive codebook vector and $g_p$ is the unquantized adaptive codebook gain.

The matrix $\mathbf{H}$ is defined as the lower triangular Toeplitz convolution matrix with diagonal $h(0)$ and lower diagonals $h(1), \dots, h(39)$, and $\mathbf{d} = \mathbf{H}^T \mathbf{x}_2$ is the correlation between the target signal $x_2(n)$ and the impulse response $h(n)$, and $\mathbf{\Phi} = \mathbf{H}^T \mathbf{H}$ is the matrix of correlations of $h(n)$.

The elements of the vector $\mathbf{d}$ are computed by

$$d(n) = \sum_{i=n}^{39} x_2(i)\, h(i-n), \qquad n = 0, \dots, 39, \tag{2.25}$$

and the elements
of the symmetric matrix $\mathbf{\Phi}$ are computed by

$$\phi(i,j) = \sum_{n=j}^{39} h(n-i)\, h(n-j), \qquad i = 0, \dots, 39, \quad j = i, \dots, 39. \tag{2.26}$$

If $\mathbf{c}_k$ is the algebraic codevector at index $k$, then the algebraic codebook is searched by maximizing the term

$$\frac{C^2}{E_D} = \frac{(\mathbf{d}^T \mathbf{c}_k)^2}{\mathbf{c}_k^T \mathbf{\Phi}\, \mathbf{c}_k}. \tag{2.27}$$

The vector $\mathbf{d}$ and the matrix $\mathbf{\Phi}$ are computed prior to the codebook search.

The algebraic structure of the codebooks allows for very fast search procedures since the innovation vector $\mathbf{c}_k$ contains only a few nonzero pulses. The correlation in the numerator of Equation (2.27) is given by

$$C = \sum_{i=0}^{N_p-1} \vartheta_i\, d(m_i), \tag{2.28}$$

where $m_i$ is the position of the $i$th pulse, $\vartheta_i$ is its amplitude, and $N_p = 4$ is the number of pulses. The energy in the denominator of Equation (2.27) is given by

$$E_D = \sum_{i=0}^{N_p-1} \phi(m_i, m_i) + 2 \sum_{i=0}^{N_p-2} \sum_{j=i+1}^{N_p-1} \vartheta_i \vartheta_j\, \phi(m_i, m_j). \tag{2.29}$$

To simplify the search procedure, the pulse amplitudes are predetermined by quantizing the signal $d(n)$. This is done by setting the amplitude of a pulse at
a certain position equal to the sign of $d(n)$ at that position. Before the codebook search, the following steps are done. First, the signal $d(n)$ is decomposed into two parts: its absolute value $|d(n)|$ and its sign $\mathrm{sign}[d(n)]$. Second, the matrix $\mathbf{\Phi}$ is modified by including the sign information; that is,

$$\phi'(i,j) = \mathrm{sign}[d(i)]\, \mathrm{sign}[d(j)]\, \phi(i,j), \qquad i = 0, \dots, 39, \quad j = i+1, \dots, 39. \tag{2.30}$$

The correlation in Equation (2.28) is now given by

$$C = \sum_{i=0}^{N_p-1} |d(m_i)| \tag{2.31}$$

and the energy in Equation (2.29) is given by

$$E_D = \sum_{i=0}^{N_p-1} \phi(m_i, m_i) + 2 \sum_{i=0}^{N_p-2} \sum_{j=i+1}^{N_p-1} \phi'(m_i, m_j). \tag{2.32}$$

Having preset the pulse amplitudes, as explained above, the optimal pulse positions are determined using an efficient non-exhaustive analysis-by-synthesis search technique. In this technique, the term in Equation (2.27) is tested for a small percentage of position combinations.

A special feature incorporated in the codebook is that the selected codevector is filtered through an adaptive prefilter $F(z)$ which enhances special spectral components in order to improve the synthesis speech quality. Here the filter $F(z) = 1/(1 - \beta z^{-T})$ is used, where $T$ is the integer part of the pitch lag and $\beta$ is a pitch gain. $\beta$ is given by the quantized pitch gain, $\hat{g}_p$, from the previous subframe, bounded by [0.0, 0.8]. Note that prior to the codebook search, the impulse response $h(n)$ must include the prefilter $F(z)$. That is, $h(n) \leftarrow h(n) + \beta\, h(n - T)$.

2.8 Quantization of the gains

The adaptive codebook gain (pitch gain) and the fixed (algebraic) codebook gain are vector quantized using a 7-bit codebook.

The fixed codebook gain quantization is performed using MA prediction with fixed coefficients. The 4th order MA prediction is performed on the innovation energy as follows. Let $E(n)$ be the mean-removed innovation
energy (in dB) at subframe $n$, given by

$$E(n) = 10 \log\left( \frac{1}{N}\, g_c^2 \sum_{i=0}^{N-1} c^2(i) \right) - \bar{E}, \tag{2.33}$$

where $N = 40$ is the subframe size, $c(i)$ is the fixed codebook excitation, and $\bar{E} = 36$ dB is the mean of the innovation energy. The predicted energy is given by

$$\tilde{E}(n) = \sum_{i=1}^{4} b_i\, \hat{R}(n-i), \tag{2.34}$$

where $[b_1\ b_2\ b_3\ b_4] = [0.68\ 0.58\ 0.34\ 0.19]$ are the MA prediction coefficients, and $\hat{R}(k)$ is the quantized energy prediction error at subframe $k$. The predicted energy is used to compute a predicted fixed-codebook gain $g'_c$ as in Equation (2.33) (by substituting $E(n)$ by $\tilde{E}(n)$ and $g_c$ by $g'_c$). This is done as follows. First, the mean innovation energy is found by

$$E_I = 10 \log\left( \frac{1}{N} \sum_{i=0}^{N-1} c^2(i) \right) \tag{2.35}$$

and then the predicted gain $g'_c$ is found by

$$g'_c = 10^{\,0.05\,(\tilde{E}(n) + \bar{E} - E_I)}. \tag{2.36}$$

A correction factor between the gain $g_c$ and the estimated one $g'_c$ is given by

$$\gamma = g_c / g'_c. \tag{2.37}$$

Note that the prediction error is given by

$$R(n) = E(n) - \tilde{E}(n) = 20 \log(\gamma). \tag{2.38}$$

The pitch
gain, $g_p$, and correction factor $\gamma$ are jointly vector quantized using a 7-bit codebook. The gain codebook search is performed by minimizing the mean-square of the weighted error between original and reconstructed speech, which is given by

$$E = \mathbf{x}^T\mathbf{x} + g_p^2\, \mathbf{y}^T\mathbf{y} + g_c^2\, \mathbf{z}^T\mathbf{z} - 2 g_p\, \mathbf{x}^T\mathbf{y} - 2 g_c\, \mathbf{x}^T\mathbf{z} + 2 g_p g_c\, \mathbf{y}^T\mathbf{z}, \tag{2.39}$$

where $\mathbf{x}$ is the target vector, $\mathbf{y}$ is the filtered adaptive codebook vector, and $\mathbf{z}$ is the filtered fixed codebook vector. Each gain vector in the codebook also has an element representing the quantized energy prediction error. The one associated with the chosen gains is used to update $\hat{R}(n)$. $\hat{R}(n)$ is related to
the variable past_qua_en in the C code.

2.9 Memory update

An update of the states of the synthesis and weighting filters is needed in order to compute the target signal in the next subframe.

After the two gains have been quantized, the excitation signal, $u(n)$, in the present subframe is found by

$$u(n) = \hat{g}_p\, v(n) + \hat{g}_c\, c(n), \qquad n = 0, \dots, 39, \tag{2.40}$$

where $\hat{g}_p$ and $\hat{g}_c$ are the quantized adaptive and fixed codebook gains, respectively, $v(n)$ is the adaptive codebook vector (interpolated past excitation), and $c(n)$ is the fixed codebook vector (algebraic code including pitch sharpening). The states of the filters can be updated by filtering the signal $r(n) - u(n)$ (difference between residual and excitation) through the filters $1/\hat{A}(z)$ and $A(z/\gamma_1)/A(z/\gamma_2)$ for the 40-sample subframe and saving the
states of the filters. This would require 3 filterings. A simpler approach, which requires only one filtering, is as follows. The local synthesis speech, $\hat{s}(n)$, is computed by filtering the excitation signal through $1/\hat{A}(z)$. The output of the filter due to the input $r(n) - u(n)$ is equivalent to $e(n) = s(n) - \hat{s}(n)$. So the states of the synthesis filter $1/\hat{A}(z)$ are given by $e(n)$, $n = 30, \dots, 39$. Updating the states of the filter $A(z/\gamma_1)/A(z/\gamma_2)$ can be done by filtering the error signal $e(n)$ through this filter to find the perceptually weighted error $e_w(n)$. However, the signal $e_w(n)$ can be equivalently found by

$$e_w(n) = x(n) - \hat{g}_p\, y(n) - \hat{g}_c\, z(n). \tag{2.41}$$

Since the signals $x(n)$, $y(n)$, and $z(n)$ are available, the states of the weighting filter are updated by computing $e_w(n)$ as in Equation (2.41) for $n = 30, \dots, 39$. This saves two filterings.
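The memory update of Section 2.9 can be sketched in a few lines of Python. This is an illustrative sketch only and is not taken from the reference C code: the function names `excitation`, `synthesize`, and `filter_states` are hypothetical, floating point stands in for the codec's fixed-point arithmetic, and Equation (2.41) is assumed to be $e_w(n) = x(n) - \hat{g}_p y(n) - \hat{g}_c z(n)$, the weighted error whose energy is minimized in Section 2.8. The LP order of 10 is taken from Equation (2.20).

```python
SUBFRAME = 40  # subframe length in samples
ORDER = 10     # LP order, so each filter keeps 10 state samples


def excitation(v, c, gp_hat, gc_hat):
    """Equation (2.40): u(n) = gp_hat * v(n) + gc_hat * c(n)."""
    return [gp_hat * vn + gc_hat * cn for vn, cn in zip(v, c)]


def synthesize(u, a_hat, mem):
    """Filter u(n) through 1/A_hat(z), with A_hat(z) = 1 + sum_i a_hat[i-1] z^-i.

    mem holds the previous ORDER output samples, oldest first.
    """
    hist = list(mem)
    out = []
    for un in u:
        # s_hat(n) = u(n) - sum_{i=1..ORDER} a_i * s_hat(n - i)
        sn = un - sum(a_hat[i] * hist[-1 - i] for i in range(ORDER))
        out.append(sn)
        hist.append(sn)
    return out


def filter_states(s, s_hat, x, y, z, gp_hat, gc_hat):
    """Return the new states of 1/A_hat(z) and of the weighting filter.

    Only the last ORDER samples (n = 30..39) are kept as filter memory.
    """
    tail = range(SUBFRAME - ORDER, SUBFRAME)
    # Synthesis filter state: e(n) = s(n) - s_hat(n)
    e = [s[n] - s_hat[n] for n in tail]
    # Weighting filter state: assumed form of Equation (2.41)
    e_w = [x[n] - gp_hat * y[n] - gc_hat * z[n] for n in tail]
    return e, e_w
```

In this sketch the call to `synthesize` is the only filtering operation. Both state vectors then come from sample-wise subtractions, which is the saving of two filterings that the text describes.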
3. Speech decoding

The function of the decoder consists of decoding the transmitted parameters (LP parameters, adaptive codebook vector, adaptive codebook gain, fixed codebook vector, fixed codebook gain) and performing synthesis to obtain the reconstructed speech. The reconstructed speech is then postfiltered and upscaled. The signal flow at the decoder is shown in Figure 3.

3.1 Decoding and speech synthesis

The decoding process is performed in the following order:

Decoding of LP filter parameters: The received indices of LSP quantization are used to reconstruct the quantized LSP vector. The interpolation described in Section 2.2.6 is performed to obtain 4 interpolated LSP vectors (corresponding to 4 subframes). For each subframe, the interpolated LSP vector is converted to LP filter coefficient domain $\hat{a}_k$, which is used for synthesizing the reconstructed speech in the subframe.

The following steps are repeated for each subframe:

1. Decoding of the adaptive codebook vector: The received pitch index (adaptive codebook index) is used to find the integer and fractional parts of the pitch lag. The adaptive codebook vector $v(n)$ is found by interpolating the past excitation $u(n)$ (at the pitch delay) using the FIR filter described in Section 2.6.

2. Decoding of the innovative vector: The received algebraic codebook index is used to extract the positions and am