ITU-R BS.1657 (2003): Procedure for the performance testing of automated audio identification systems (Question ITU-R 8/6)
Rec. ITU-R BS.1657

RECOMMENDATION ITU-R BS.1657

Procedure for the performance testing of automated audio identification systems

(Question ITU-R 8/6)

(2003)

The ITU Radiocommunication Assembly,

considering

a) that metadata will be accompanying most audio broadcast transmissions in the future;
b) that the automatic generation of metadata will be necessary to offer a complete cost-efficient service in future;
c) that automatic identification of audio items enables tracking of transmitted programme material;
d) that different schemes for extraction of audio metadata are developed today;
e) that ISO/IEC JTC 1/SC 29/WG 11 is currently finalizing schemes for the coding of metadata for multimedia data;
f) that no quality assessment procedures for audio metadata extraction schemes have been standardized until now,

recommends

1 that the procedure described in Annex 1 should be used to evaluate the performance of automated audio identification systems.

Annex 1

Procedure for the performance testing of automated audio identification systems

1 Introduction

In a time of ever-increasing databases filled with musical content, be it genuine audio material or associated metadata (“data about data”), the demand for tools to maintain this mass of data is also growing more urgent day by day. This desire is voiced not only by professionals, but also by the common Internet user and music-lover, who searches the web on numerous errands for her or his preferred musical style. In order to facilitate the retrieval of the desired material, two different levels of abstraction are discerned here:

- The search for metadata that can more or less be extracted automatically from the audio content, such as instrumentation, melodic theme, rhythm, etc. Example applications are a query-by-humming system or the classification into genres, which is commonly used in recommendation engines.
- Automatic identification of titles, where only insufficient, unreliable or no metadata at all is available. An “essence” of the audio data is distilled and compared to a database of known material, thus creating a link to relevant metadata such as artist, song name, etc.

While the first class contributes mainly to the human interaction interface, the second also finds application in the protection of rights by tracking radio programmes and Internet transactions. It is foremost in this latter context that algorithms fitting that profile are referred to as “fingerprinting” techniques.

2 Motivation

To meet the demands of the music business, the recognition rate of the applied fingerprinting technology must be high and withstand common alterations and modifications of the original audio content. For this purpose, the music business has acknowledged the need for quality assurance of audio identification systems by recently formulating a request for information on audio fingerprinting technologies. The severity and urgency of this problem is also underlined by the fact that a number of different, often proprietary, solutions have appeared recently. All methods, however, face the same problems regarding their robustness to modification and deterioration of the original material. Although the original material may have been changed by a number of processing steps or degradations, it nevertheless shall be recognized as the intellectual property of the artist and composer. This leads to the proposition that automated music identification should ideally be as precise and as tolerant to signal modification as human perception and recognition.

Beyond robustness to signal alterations, a good fingerprinting system should exhibit a small fingerprint size (considering that certain applications might require the storage of millions of fingerprints), fast fingerprint extraction and recognition, and further desirable properties. It should be noted that robustness to signal alterations and compactness of fingerprint representation are two conflicting requirements which have to be reconciled by systems.

Consequently, in order to assess the quality of an automated audio identification system, a test environment has to be defined that covers different types of signal degradation in multiple degrees of severity and describes how to determine other essential system parameters. To allow the objective evaluation of identification systems, a unified test procedure is needed.

3 Quality parameters

For audio identification systems the following quality parameters have to be considered:

- Segment size of the audio material to be identified: What portion of an item is necessary for the identification?
- Size of the fingerprint: How much data (bytes) per item has to be stored in the database? Is the size of the fingerprint constant or variable (with respect to the length of the item)?
- Size of the database: How many items can be handled simultaneously by the system?
- Mode of identification: Does the system allow identification of randomly chosen subsets of audio material (continuous fingerprinting) or is identification tied to short fingerprinted segments? If the latter: What is the segment size?
- Identification speed: How long does it take to identify an item? How does this scale with the number of items in the database?
- Identification performance with original and altered material: How much distortion can be introduced without significantly affecting the recognition rate? How does this scale with the number of items in the database and the amount of distortion?
- Fingerprint generation speed: How fast can a fingerprint be generated on a given platform? How many resources are necessary to generate a fingerprint (e.g. central processing unit (CPU) speed, amount of random access memory (RAM), whether a floating-point unit (FPU) is necessary)?
- Training speed: How long does it take to add items to the database? How does this scale with the number of items already in the database?

To assess these properties in a sensible fashion and thereby to show the suitability of a system for real-world application, a test environment must exhibit constant boundary conditions regarding the characteristics under test. Relevant test conditions are:

- the size and content of the reference database,
- the size (referring to the playing duration) and number of the test items,
- exact modification rules for the test items, and
- the computing platform, which includes specification of the CPU, memory, and operating system.

A number of control titles that are not contained in the reference database should also be included with the set of test items in order to properly test the rejection behaviour of the system under test.

4 Selection of test material and size of database

All different musical styles and genres should be present in the reference database, with the most heard genres prevailing in number. A database size of 10 000 to 100 000 pieces is suggested for a realistic evaluation.

Definition of terms:

An item is called a duplicate item with respect to another audio item if it consists of the same recording as the original one, with the exception that it might have a certain amount of zero-valued leading or trailing samples added. This