INTERNATIONAL TELECOMMUNICATION UNION

ITU-T
TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU

P.56 (03/93)

TELEPHONE TRANSMISSION QUALITY
OBJECTIVE MEASURING APPARATUS

OBJECTIVE MEASUREMENT OF ACTIVE SPEECH LEVEL

ITU-T Recommendation P.56
(Previously "CCITT Recommendation")

FOREWORD

The ITU Telecommunication Standardization Sector (ITU-T) is a permanent organ of the International Telecommunication Union. The ITU-T is responsible for studying technical, operating and tariff questions and issuing Recommendations on them with a view to standardizing telecommunications on a worldwide basis.

The World Telecommunication Standardization Conference (WTSC), which meets every four years, established the topics for study by the ITU-T Study Groups which, in their turn, produce Recommendations on these topics.

ITU-T Recommendation P.56 was revised by the ITU-T Study Group XII (1988-1993) and was approved by the WTSC (Helsinki, March 1-12, 1993).

NOTES

1 As a consequence of a reform process within the International Telecommunication Union (ITU), the CCITT ceased to exist as of 28 February 1993. In its place, the ITU Telecommunication Standardization Sector (ITU-T) was created as of 1 March 1993. Similarly, in this reform process, the CCIR and the IFRB have been replaced by the Radiocommunication Sector.

In order not to delay publication of this Recommendation, no change has been made in the text to references containing the acronyms "CCITT, CCIR or IFRB" or their associated entities such as Plenary Assembly, Secretariat, etc. Future editions of this Recommendation will contain the proper terminology related to the new ITU structure.

2 In this Recommendation, the expression "Administration" is used for conciseness to indicate both a telecommunication administration and a recognized operating agency.

© ITU 1994

All rights reserved. No part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from the ITU.

CONTENTS

1 Introduction
2 Terminology
3 General
4 Method A - Immediate indication of speech volume for real-time applications
5 Method B - Active speech level for other applications than those mentioned in method A
6 Approximate equivalents of method B
7 Specification
8 Routine calibration of method-B meter
Annex A - A method using a speech voltmeter complying with method B in network conditions

Recommendation P.56

OBJECTIVE MEASUREMENT OF ACTIVE SPEECH LEVEL

(Melbourne, 1988; amended at Helsinki, 1993)

1 Introduction

The CCITT considers it important that there should be a standardized method of objectively measuring speech level, so that measurements made by different Administrations may be directly comparable.
Requirements of such a meter are that it should measure active speech level and should be independent of operator interpretation.

In this Recommendation, a meter is a complete unit that includes the input circuitry, filter (if necessary), processor and display. The processor includes the algorithm of the detection method.

In its present form, this meter can safely be used for laboratory experiments or can be used with care on operational circuits. Further study is continuing on:

a) how the meter can be used on 2-wire and 4-wire circuits to determine who is talking and whether it is an echo; and
b) how such an instrument can discriminate between speech and signalling, for example.

The method described herein maintains maximum comparability and continuity with past work, provided suitable monitoring is used, e.g. an operator performing the monitoring function. In particular, the new method
yields data and conclusions compatible with those that have established the conventional value (22 microwatts) of speech power at the input to the 4-wire point of the international circuit according to Recommendation G.223. A method using operator monitoring can be found in Annex A.

This Recommendation describes a method that can be easily implemented using current technology. It also acts as a reference against which other methods can be compared. The purpose of this Recommendation is not to exclude any other method but to ensure that different methods give the same result.

Active speech level shall be measured and reported in decibels relative to a stated reference according to the methods described below, namely:

- Method A - Measuring a quantity called speech volume, used for the purpose of real-time control of speech level (see clause 4);
- Method B - Measuring a quantity called active speech level, used for other purposes (see clause 5).

Comparison of readings given by meters of methods A and B can be found in the Handbook on Telephonometry.

NOTE - This meter cannot be used to determine peak levels, but sufficient information exists [1] giving the instantaneous peak/r.m.s. ratio, provided the signal has not been restricted or modified in any way, e.g. by peak clipping.

2 Terminology

The recommended terminology is as follows:

speech volume: Until now used interchangeably with speech level, should in future be used exclusively to denote a value obtained by method A;

active speech level: Should be used exclusively to denote a value obtained by method B;

speech level: Should be used as a general term to denote a value obtained by any method yielding a value expressed in decibels relative to a stated reference.

The definitions of these terms [2], and other related terms
such as those for the meters themselves [3], should be adjusted accordingly.

3 General

3.1 Electrical, acoustic and other levels

This Recommendation deals primarily with electrical measurements yielding results expressed in terms of electrical units, generally decibels relative to an appropriate reference value such as one volt. However, if the calibration and linearity of the transmission system in which the measurement takes place are assured, it is possible to refer the result backwards or forwards from the measurement point to any other point in the system, where the signal may exist in some nonelectrical form (e.g. acoustical). Power is proportional to squared voltage in the electrical domain, squared sound pressure in the acoustical domain, or the digital equivalent of either of these in the numerical domain, and the reference value must be of the appropriate kind (1 volt, 1 pascal, reference acoustic pressure equal to 20 micropascals, or any other stated unit, as the case may be).

3.2 Universal requirements

For speech-level measurements of all types, the information reported should include: the designation of the measuring system, the method used (A, B, or B-equivalent as explained in clause 6, or other specified method), the quantity observed, the units, and other relevant information such as the margin value (explained below) where applicable.

All the relevant conditions of measurement should also be stated, such as bandwidth, position of the measuring instrument in the communication circuit, and presence or absence of a terminating impedance.

Apart from the stated band limitation intended to exclude spurious signals, no frequency weighting should be introduced in the measurement path (as distinct from the transmission path).

3.3 Averaging

Where an average of several readings is reported, the method of averaging should be stated. The mean level (mean speech volume or mean active speech level), formed by taking the mean of a number of decibel values, should be distinguished from the mean power, formed by converting a number of decibel values to units of power, taking the mean of these, and then optionally restoring the result to decibels.

Any correction that has been applied should be mentioned, together with the facts or assumptions on which any such correction is based. For example, in loading calculations, when the active levels or durations of the individually measured portions of speech differ widely, 0.115 σ² (where σ is the standard deviation of the levels in dB) is commonly added to the median or mean level in order to estimate the mean power, on the grounds that the distribution of mean active speech levels (dB values) is approximately Gaussian.

4 Method A - Immediate indication of speech volume for real-time applications

Measurement of speech volume for rapid real-time control or adjustment of level by a human observer should be accomplished in the traditional manner by means of one of the devices listed in Recommendation P.52. The choice of meter and the method of interpreting the pointer deflexions should be appropriate to the application, as in Table 1.

Values obtained by method A should be reported as speech volume; the meter employed, the quantity observed, and the units in which the result is expressed, should be stated.

TABLE 1/P.56

Application | Meter | Quantity observed
Control of vocal level in live-speech loudness balances | ARAEN volume meter (SV3) | Level exceeded in 3 s (excluding most extreme)
Avoidance of peak limiting | Peak programme meter | Highest reading
Maintenance of optimum level in making magnetic tape recording | VU meter | Average of peaks

5 Method B - Active speech level for other applications than those mentioned in method A

5.1 Principle of measurement

Active speech level is measured by integrating a quantity proportional to instantaneous power over the aggregate of time during which the speech in question is present (called the active time), and then expressing the quotient, proportional to total energy divided by active time, in decibels relative to the appropriate reference.

The mean power of a speech signal when known to be present can be estimated with high precision from samples taken at a rate far below the Nyquist rate. However, the all-important question is what criterion should be used to determine when speech is present.

Ideally, the criterion should indicate the presence of speech for the same proportion of time as it appears to be present to a human listener, excluding noise that is not part of the speech (such as impulses, echoes, and steady noise during periods of silence), but including those brief periods of low or zero power that are not perceived as interruptions in the flow of speech [4].

It is not essential that the detector should operate exactly in synchronism with the beginnings and ends of utterances as perceived: there may be a delay in both operating and releasing, provided that the total active time is measured correctly. For this reason, complex real-time voice-activity
detectors depending on sampling at the Nyquist rate, such as those that have been successfully used in digital speech interpolation, are not necessarily the most suitable for this application. Their function is to indicate when a channel is available for transmission of information: this state does not always coincide with the absence of speech; on the one hand, it may occur during short intervals that ought to be considered part of the speech, and on the other hand, it may be delayed long after the end of an utterance (for reasons of convenience in the allocation of channels, for example).

This Recommendation describes the detection method that meets the requirements. The method involves applying a signal-dependent threshold which cannot be specified in advance, so that accurate results cannot be guaranteed while the measurement is actually in progress; despite that, by accumulating sufficient information during the process, it is possible to apply the correct threshold retrospectively, and hence to output a correct result almost as soon as the measurement finishes.

Continuous adaptation of the threshold level in real time appears to yield similar results in simple cases, but further study is needed to find out how far this conclusion can be generalized.

5.2 Details of realization

The algorithm for method B is as follows.

Let the speech signal be sampled at a rate not less than f samples per second, and quantized uniformly into a range of at least 2^12 quantizing intervals (i.e. using 12 bits per sample including the sign).

NOTE - This requirement ensures that the dynamic range for instantaneous voltage is at least 66 dB, but two factors combine to make the range of measurable active speech levels about 30 dB less than this:

1) Allowance must be made for the ratio of peak power to mean power in speech, namely about 18 dB where the probability of exceeding that value is 0.001.
2) Envelope values down to at least 16 dB below the mean active level must be calculated: these values may be fractional, but will not be accurate enough if computed from a quantizing interval much exceeding twice the sample value; that is to say, it should not be expected that an active speech level less than about 10 dB above the quantizing interval would be measurable.

Let the successive sample values be denoted by x_i where i = 1, 2, 3, ...

Let the time interval between consecutive samples be t = 1/f seconds.

Other constants required are:

V                (Volts/unit) scale factor of the analogue-digital converter;
T                Time constant of smoothing in seconds;
g = exp(-t/T)    Coefficient of smoothing;
H                Hangover time in seconds;
I = H/t          Rounded up to next integer;
M                Margin in dB, difference between threshold and active speech level.

Let the input samples be subjected to two distinct processes, 1 and 2.

Process 1

Accumulate the number of samples n, the sum s, and the sum of squares, sq:

    n_i = n_{i-1} + 1
    s_i = s_{i-1} + x_i
    sq_i = sq_{i-1} + x_i^2

where s_0, sq_0 and n_0 (initial values) are zero.

Process 2

Perform two-stage exponential averaging on the rectified signal values:

    p_i = g · p_{i-1} + (1 - g) · |x_i|
    q_i = g · q_{i-1} + (1 - g) · p_i

where p_0 and q_0 (initial values) are zero.

The sequence q_i is called the envelope; p_i denotes intermediate quantities.

Let a series of fixed threshold voltages c_j be applied to the envelope. These should be spaced in geometric progression, at intervals of not more than 2:1 (6.02 dB), from a value equal to about half the maximum code down to a value equal to one quantizing interval or lower. Let a corresponding series of activity counts a_j and a corresponding series of hangover counts h_j be maintained. For each value of j in turn:

    if q_i ≥ c_j, then add 1 to a_j and set h_j to 0;
    if q_i < c_j and h_j < I, then add 1 to a_j and add 1 to h_j;
    if q_i < c_j and h_j = I, then do nothing.

In the first case, the envelope is at or above the jth threshold, so that the speech is active as judged by that threshold level. In the second case, the envelope is below the threshold, but the speech is still considered active because the corresponding hangover has not yet expired. In the third case, the speech is inactive as judged by the threshold level in question.

Initially, all the a_j values are set equal to zero, and the h_j values set equal to I.

It should be noted that the suffix i in all the above cases is needed only to distinguish current values from previous values of accumulated quantities;
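
The accumulation steps of clause 5.2 can be illustrated by the following Python sketch. It is an informal illustration only and not part of this Recommendation. The function name accumulate_p56, its argument names and the numerical values in the example are assumptions chosen for illustration. The recursions for p and q are the usual form of two-stage exponential averaging with coefficient g, and the three branches correspond to the three activity cases described in the text.

```python
import math


def accumulate_p56(samples, fs, T, H, thresholds):
    """Sketch of processes 1 and 2 and the activity counting of clause 5.2.

    samples    -- iterable of sample values x_i (converter units)
    fs         -- sampling rate in samples per second (t = 1/fs)
    T          -- smoothing time constant in seconds
    H          -- hangover time in seconds
    thresholds -- fixed threshold values c_j (same units as the samples)
    Returns (n, s, sq, a): sample count, sum, sum of squares, activity counts.
    """
    g = math.exp(-1.0 / (fs * T))        # g = exp(-t/T)
    # I = H/t rounded up; the small offset guards against float round-off.
    hang = math.ceil(H * fs - 1e-9)

    n, s, sq = 0, 0.0, 0.0               # process 1 accumulators, start at zero
    p, q = 0.0, 0.0                      # process 2 state, start at zero
    a = [0] * len(thresholds)            # activity counts a_j, start at zero
    h = [hang] * len(thresholds)         # hangover counts h_j, start at I

    for x in samples:
        # Process 1: count, sum and sum of squares.
        n += 1
        s += x
        sq += x * x

        # Process 2: two-stage exponential averaging of the rectified signal.
        p = g * p + (1.0 - g) * abs(x)
        q = g * q + (1.0 - g) * p        # q is the envelope

        for j, c in enumerate(thresholds):
            if q >= c:                   # envelope at or above threshold
                a[j] += 1
                h[j] = 0
            elif h[j] < hang:            # below threshold, hangover running
                a[j] += 1
                h[j] += 1
            # otherwise: inactive for this threshold, do nothing

    return n, s, sq, a


if __name__ == "__main__":
    # Hypothetical 12-bit converter: thresholds from half the maximum code
    # (1024) down to one quantizing interval, in 2:1 steps.
    c = [2.0 ** k for k in range(10, -1, -1)]
    # Illustrative values only: 1 s burst of constant amplitude, then silence.
    x = [300.0] * 1000 + [0.0] * 1000
    print(accumulate_p56(x, fs=1000, T=0.03, H=0.2, thresholds=c))
```

Keeping one activity count per threshold is what allows the threshold to be chosen retrospectively, as described in 5.1: all candidate thresholds are evaluated in a single pass, and the selection can be deferred until the measurement has finished.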