ETSI ES 202 212-2005 Speech Processing Transmission and Quality Aspects (STQ) Distributed speech recognition Extended advanced front-end feature extraction algorithm Compression al.pdf
ETSI ES 202 212 V1.1.2 (2005-11)
ETSI Standard

Speech Processing, Transmission and Quality Aspects (STQ);
Distributed speech recognition;
Extended advanced front-end feature extraction algorithm;
Compression algorithms;
Back-end speech reconstruction algorithm

Reference
RES/STQ-00084a

Keywords
performance, speech, transmission

ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE
Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16
Siret N° 348 623 562 00017 - NAF 742 C
Association à but non lucratif enregistrée à la Sous-Préfecture de Grasse (06) N° 7803/88

Important notice
Individual copies of the present document can be downloaded from: http://www.etsi.org
The present document may be made available in more than one electronic version or in print. In any case of existing or perceived difference in contents between such versions, the reference version is the Portable Document Format (PDF). In case of dispute, the reference shall be the printing on ETSI printers of the PDF version kept on a specific network drive within ETSI Secretariat.
Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at http://portal.etsi.org/tb/status/status.asp
If you find errors in the present document, please send your comment to one of the following services: http://portal.etsi.org/chaircor/ETSI_support.asp

Copyright Notification
No part may be reproduced except as authorized by written permission. The copyright and the foregoing restriction extend to reproduction in all media.
© European Telecommunications Standards Institute 2005. All rights reserved.
DECT™, PLUGTESTS™ and UMTS™ are Trade Marks of ETSI registered
for the benefit of its Members. TIPHON™ and the TIPHON logo are Trade Marks currently being registered by ETSI for the benefit of its Members. 3GPP™ is a Trade Mark of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners.

Contents
Intellectual Property Rights
Foreword
Introduction
1 Scope
2 References
3 Definitions, symbols and abbreviations
3.1 Definitions
3.2 Symbols
3.3 Abbreviations
4 System overview
5 Feature extraction description
5.1 Noise reduction
5.1.1 Two stage mel-warped Wiener filter approach
5.1.2 Buffering
5.1.3 Spectrum estimation
5.1.4 Power spectral density mean
5.1.5 Wiener filter design
5.1.6 VAD for noise estimation (VADNest)
5.1.7 Mel filter-bank
5.1.8 Gain factorization
5.1.9 Mel IDCT
5.1.10 Apply filter
5.1.11 Offset compensation
5.2 Waveform Processing
5.3 Cepstrum Calculation
5.3.1 Log energy calculation
5.3.2 Pre-emphasis (PE)
5.3.3 Windowing (W)
5.3.4 Fourier transform (FFT) and power spectrum estimation
5.3.5 Mel Filtering (MEL-FB)
5.3.6 Non-linear transformation (Log)
5.3.7 Cepstral coefficients (DCT)
5.3.8 Cepstrum calculation output
5.4 Blind equalization
5.5 Extension to 11 kHz and 16 kHz sampling frequencies
5.5.1 FFT-based spectrum estimation
5.5.2 Mel Filter-Bank
5.5.3 High-frequency band coding and decoding
5.5.4 VAD for noise estimation and spectral subtraction in high-frequency bands
5.5.5 Merging spectral subtraction bands with decoded bands
5.5.6 Log energy calculation for 16 kHz
5.6 Pitch and class estimation
5.6.1 Spectrum and energy computation
5.6.2 Voice Activity Detection for Voicing Classification (VADVC)
5.6.3 Low-band noise detection
5.6.4 Pre-Processing for pitch and class estimation
5.6.5 Pitch estimation
5.6.5.1 Dirichlet interpolation
5.6.5.2 Non-speech and low-energy frames
5.6.5.3 Search ranges specification and processing
5.6.5.4 Spectral peaks determination
5.6.5.5 F0 Candidates generation
5.6.5.6 Computing correlation scores
5.6.5.7 Pitch estimate selection
5.6.5.8 History information update
5.6.5.9 Output pitch value
5.6.6 Classification
6 Feature compression
6.1 Introduction
6.2 Compression algorithm description
6.2.1 Input
6.2.2 Vector quantization
6.2.3 Pitch and class quantization
6.2.3.1 Class quantization
6.2.3.2 Pitch quantization
7 Framing, bit-stream formatting and error protection
7.1 Introduction
7.2 Algorithm description
7.2.1 Multiframe format
7.2.2 Synchronization sequence
7.2.3 Header field
7.2.4 Frame packet stream
8 Bit-stream decoding and error mitigation
8.1 Introduction
8.2 Algorithm description
8.2.1 Synchronization sequence detection
8.2.2 Header decoding
8.2.3 Feature decompression
8.2.4 Error mitigation
8.2.4.1 Detection of frames received with errors
8.2.4.2 Substitution of parameter values for frames received with errors
8.2.4.3 Modification of parameter values for frames received with errors
9 Server feature processing
9.1 lnE and c(0) combination
9.2 Derivatives calculation
9.3 Feature vector selection
10 Server side speech reconstruction
10.1 Introduction
10.2 Algorithm description
10.2.1 Speech reconstruction block diagram
10.2.2 Pitch Tracking and Smoothing
10.2.2.1 First stage - gross pitch error correction
10.2.2.2 Second stage - voiced/unvoiced decision and other corrections
10.2.2.3 Third stage - smoothing
10.2.2.4 Voicing class correction
10.2.3 Harmonic Structure Initialization
10.2.4 Unvoiced phase synthesis
10.2.5 Cepstra de-equalization
10.2.6 Transformation of features extracted at 16 kHz
10.2.7 Harmonic magnitudes reconstruction
10.2.7.1 High order cepstra recovery
10.2.7.2 Solving front-end equation
10.2.7.3 Cepstra to magnitudes transformation
10.2.7.4 Combined magnitudes estimate calculation
10.2.7.4.1 Combined magnitude estimate for unvoiced harmonics
10.2.7.4.2 Combined magnitude estimate for voiced harmonics
10.2.8 All-pole spectral envelope modelling
10.2.9 Postfiltering
10.2.10 Voiced phase synthesis
10.2.11 Line spectrum to time-domain transformation
10.2.11.1 Mixed-voiced frames processing
10.2.11.2 Filtering very high-frequency harmonics
10.2.11.3 Energy normalization
10.2.11.4 STFT spectrum synthesis
10.2.11.5 Inverse FFT
10.2.12 Overlap-Add
Annex A (informative): Voice Activity Detection (VAD)
A.1 Introduction
A.2 Stage 1 - Detection
A.3 Stage 2 - VAD Logic
Annex B (informative): Bibliography
History

Intellectual Property Rights
IPRs essential or potentially essential to the present document may have been declared to ETSI. The information pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found
in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web server (http://webapp.etsi.org/IPR/home.asp). Pursuant to the