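Automatic phone boundary detection using sequential text-independent segmentation | National Tsing Hua University Theses and Dissertations Full-Text Image System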


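Author (Chinese):	陸勁逢
Author (English):	Lu, Ching-Feng
Title (Chinese):	使用循序式文本不特定分段法之自動音素邊界點偵測
Title (English):	Automatic phone boundary detection using sequential text-independent segmentation
Advisor (Chinese):	王小川
Advisor (English):	Wang, Hsiao-Chuan
Degree:	Master's
Institution:	National Tsing Hua University
Department:	Department of Electrical Engineering
Student ID:	9761589
Year of publication (ROC calendar):	99 (2010)
Academic year of graduation (ROC calendar):	98
Language:	Chinese
Number of pages:	90
Keywords (Chinese):	音素分段、音素邊界偵測、小波參數、頻譜變異函式、貝式資訊修正準則
Keywords (English):	phone segmentation, phone boundary detection, wavelet parameter, spectral variation function (SVF), Bayesian information criterion corrected (BICC)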

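This thesis aims to build an automatic phone segmentation system that uses no prior knowledge and combines parameters that give cues to the rate of spectral change in speech. It proposes a text-independent sequential phone segmentation method that moves along the time axis, finds one candidate phone boundary at a time, and verifies each candidate as soon as it is found. Only a verified candidate counts as a detected phone boundary and triggers segmentation. Candidate boundaries are detected with wavelet feature parameters computed over frames of variable length. Each candidate is then verified under two joint conditions: a score (Delta BICC) obtained by fitting single-Gaussian probability models to Mel-frequency cepstral coefficients (MFCCs) and applying the Bayesian information criterion corrected (BICC), and a normalized spectral variation function (SVF) that takes wavelet feature parameters as input. The system was evaluated on the TIMIT corpus. At a 20 ms tolerance, 422 of the 640 test utterances reach an F-value above 70%, and the average F-value over those 422 utterances is 76%; the average over all 640 utterances is 72%. Measured by the R-value, another measure of segmentation quality, the 640 utterances average 75% at a ±20 ms tolerance. The thesis also reports detection rates (hit rates) by boundary class. Detection rates are relatively high for stop-to-vowel, vowel-to-stop, fricative-to-vowel, vowel-to-fricative, vowel-to-nasal, nasal-to-vowel, and stop-to-semivowel/glide boundaries, whereas detection of nasal-to-nasal, semivowel-to-silence, nasal-to-silence, and nasal-to-stop boundaries still needs improvement.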
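The verification step relies on a Delta BICC score computed from single-Gaussian models of MFCC frames. This record does not give the BICC formula, so the following Python sketch substitutes the standard (uncorrected) BIC and assumes diagonal covariances. The names `delta_bic` and `_log_det_diag` and the `penalty_weight` parameter are illustrative and do not come from the thesis. A positive score favors modeling the two sides of a candidate boundary with separate Gaussians.

```python
import math


def _log_det_diag(frames):
    """Log-determinant of the ML diagonal covariance of a list of feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for j in range(dim):
        mean = sum(f[j] for f in frames) / n
        var = sum((f[j] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-10))  # variance floor avoids log(0)
    return total


def delta_bic(frames, split, penalty_weight=1.0):
    """Standard Delta BIC for splitting `frames` at index `split`.

    Assumption: plain BIC with diagonal-covariance Gaussians, used here
    as a stand-in for the BICC score named in the abstract.
    """
    left, right = frames[:split], frames[split:]
    n, n1, n2 = len(frames), len(left), len(right)
    if n1 < 2 or n2 < 2:
        raise ValueError("each side of the split needs at least two frames")
    dim = len(frames[0])
    # Log-likelihood gain of two Gaussians over one Gaussian.
    gain = 0.5 * (
        n * _log_det_diag(frames)
        - n1 * _log_det_diag(left)
        - n2 * _log_det_diag(right)
    )
    # The two-model hypothesis adds one extra mean and one extra variance vector.
    extra_params = 2 * dim
    penalty = 0.5 * penalty_weight * extra_params * math.log(n)
    return gain - penalty


if __name__ == "__main__":
    # Toy one-dimensional "features": a level shift at frame 20.
    shifted = [[float(i % 2)] for i in range(20)] + [[10.0 + i % 2] for i in range(20)]
    print(delta_bic(shifted, 20) > 0)
```

In a sequential scheme like the one described, such a score would be evaluated only at each candidate boundary, using the frames on either side of it.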

This thesis proposes a text-independent sequential phone boundary detection algorithm that allows an automatic phone segmentation system to be built without any prior knowledge. The method searches for a candidate phone boundary and then applies a verification process; phone segmentation is performed once a boundary is verified. Candidate phone boundaries are located using wavelet parameters computed over frames of variable length. The Bayesian information criterion corrected (BICC) and the normalized spectral variation function (SVF) are used to verify the candidates. The algorithm was evaluated on the TIMIT corpus, with segmentation performance measured by the F-value. At a 20 ms tolerance, the average F-value over 640 test utterances is 72%, and 422 of those utterances have an F-value above 70%.
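The F-value and R-value scores quoted for this thesis both derive from counting detected boundaries that fall within a tolerance of a reference boundary. The sketch below shows one common way to compute them in Python. It assumes a greedy one-to-one matching of sorted boundaries and the usual definitions of precision, recall, hit rate, and over-segmentation; the thesis's exact matching rule is not stated in this record, and the function names are hypothetical.

```python
import math


def count_hits(reference, detected, tolerance=0.020):
    """Count reference boundaries matched one-to-one by a detection within `tolerance` seconds."""
    ref = sorted(reference)
    det = sorted(detected)
    hits = 0
    j = 0
    for r in ref:
        # Skip detections that are too early to match this reference boundary.
        while j < len(det) and det[j] < r - tolerance:
            j += 1
        if j < len(det) and abs(det[j] - r) <= tolerance:
            hits += 1
            j += 1  # each detection may be used only once
    return hits


def evaluate_segmentation(reference, detected, tolerance=0.020):
    """Return the hit count, F-value, and R-value (both as fractions in [0, 1])."""
    n_ref, n_det = len(reference), len(detected)
    if n_ref == 0:
        raise ValueError("reference must contain at least one boundary")
    hits = count_hits(reference, detected, tolerance)
    precision = hits / n_det if n_det else 0.0
    recall = hits / n_ref  # also called the hit rate
    f_value = 2 * precision * recall / (precision + recall) if hits else 0.0
    over_seg = n_det / n_ref - 1.0  # over-segmentation
    r1 = math.sqrt((1.0 - recall) ** 2 + over_seg ** 2)
    r2 = (-over_seg + recall - 1.0) / math.sqrt(2.0)
    r_value = 1.0 - (abs(r1) + abs(r2)) / 2.0
    return {"hits": hits, "f_value": f_value, "r_value": r_value}


if __name__ == "__main__":
    # Boundary times in seconds; these values are made up for illustration.
    reference = [0.10, 0.25, 0.40, 0.60]
    detected = [0.11, 0.26, 0.45, 0.59, 0.70]
    print(evaluate_segmentation(reference, detected))
```

Unlike the F-value, the R-value penalizes over-segmentation explicitly, so a detector that inserts many spurious boundaries scores lower even when its hit rate is high.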

Abstract (Chinese)
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation
1.2 Automatic Phone Segmentation
1.3 Thesis Organization
Chapter 2 Speech Feature Extraction
2.1 Chapter Overview
2.2 Extraction of Mel-Frequency Cepstral Coefficients
2.3 Extraction of Wavelet Transform Feature Parameters
2.3.1 Wavelet Transform Feature Parameters with Fixed Frame Length (WLP_FL)
2.3.2 Wavelet Transform Feature Parameters with Variable Frame Length (WLP_VL)
Chapter 3 Model Selection Criteria (MSC)
3.1 Chapter Overview
3.2 Delta BICC, Computed with the Bayesian Information Criterion Corrected (BICC), as a Condition for Verifying Candidate Phone Boundaries
3.2.1 Model Construction: Single Gaussian Probability Density Function
3.2.2 Bayesian Information Criterion (BIC) and Bayesian Information Criterion Corrected (BICC)
Chapter 4 Spectral Variation Function (SVF)
4.1 Chapter Overview
4.2 Normalized Spectral Variation Function with Wavelet Feature Parameters (WLP_FL) as Input, as a Condition for Verifying Candidate Phone Boundaries
4.2.1 Mathematical Definitions of the SVF and the Normalized SVF
4.2.2 Comparison: Normalized SVF with Wavelet Feature Parameter (WLP_FL) Input versus MFCC Input
4.2.3 Normalized SVF Values at Candidate Phone Boundaries
Chapter 5 Sequential Phone Segmentation Algorithm
5.1 Detection of Candidate Phone Boundaries
5.1.1 Detection Rules
5.2 Verification of Phone Boundaries
5.3 Phone Segmentation Algorithm
Chapter 6 Experiments and Discussion
6.1 Experimental Corpus: TIMIT
6.2 Experimental Results and Discussion
6.2.1 Evaluation by F-value
6.2.2 Evaluation by R-value
Chapter 7 Conclusions and Future Work
References

