20150529 Speech Recognition Based on VQ
Analysis and Implementation of a VQ-Based Speaker Recognition Algorithm

Abstract: Information processing with networks and computers has become increasingly mature, and speech is one of the forms of information people use most. Researchers have therefore turned to speech-recognition technology to approach speaker identification from a new angle and in greater breadth, with notable results. The problem to be solved is how to extract feature parameters from large amounts of data and to decide by analysis whether the speaker is a known person. Speaker recognition processes the speaker's voice in a machine system and converts it into a signal that can be analyzed. The useful signal is first extracted and classified by its features; the recognized data are then fed into the computer system in a suitable form, analyzed, and computed to produce an output. Using the speech database built inside the system, the system scores the user's utterance, evaluates the average distortion measure, and thereby decides who the speaker is (identification) or whether the speaker is the claimed person (verification), returning the answer to the host.
Keywords: speaker recognition; VQ algorithm; decision

Chapter 1: The VQ-Based Speaker Recognition Algorithm
1.1 Basic Principles of VQ
VQ is essentially an approximation [6]. The idea is similar to rounding: both approximate a number by the nearest representative value. A number less than -2 is approximated by -3, a number between -2 and 0 by -1, a number between 0 and 2 by 1, and a number greater than 2 by 3. In this way any number maps to one of the four values -3, -1, 1, or 3, and only two bits are needed to encode these four values. This is a one-dimensional, 2-bit VQ, with a rate of 2 bits per dimension [7].
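The rounding analogy above can be illustrated with a tiny 1-D, 2-bit quantizer. This is a sketch added for illustration, not code from the original text:

```python
def quantize_1d(x):
    """Map a real number to the nearest of the four levels -3, -1, 1, 3 (2-bit VQ)."""
    levels = [-3, -1, 1, 3]
    return min(levels, key=lambda c: abs(x - c))

# Each input is represented by one of four levels, i.e. 2 bits per dimension.
print([quantize_1d(v) for v in [-2.7, -0.4, 1.9, 5.0]])  # [-3, -1, 1, 3]
```

Real VQ does the same thing in D dimensions: the "levels" become codebook vectors and the absolute difference becomes a vector distance.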
1.2 System Structure of Speaker Recognition
In the decision stage of recognition, feature parameters are extracted from the speech signal and compared with the model data built in the training stage, and a model-classification algorithm produces the decision result [14]. For identification, the extracted feature parameters are scored against the model of every speaker in the system, and the speaker whose model matches best is selected by the decision rule. For verification, the feature parameters of the input speech are compared with the claimed speaker's model: if the distance between them is below a threshold, the claim is accepted; otherwise it is rejected.
Figure 1-4: Speaker recognition flow
1.3 Feature Extraction in Speaker Recognition
Mel-frequency cepstral coefficients (MFCC) are among the most effective features for capturing speaker individuality; experiments show that in most cases MFCC outperform other cepstral coefficients. The extraction and computation proceed as follows: 1) pre-emphasize, frame, and window the raw speech signal to obtain the time-domain signal x(n) of each frame, then apply the discrete Fourier transform (DFT) to obtain the linear spectrum X(k). The DFT of the speech signal is

X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \quad 0 \le k \le N-1

where x(n) is the input speech signal and N is the number of points of the Fourier transform.
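The preprocessing steps just described (pre-emphasis, framing, windowing, DFT) can be sketched as follows. The frame length, frame shift, and pre-emphasis coefficient here are illustrative assumptions, not values from the text:

```python
import numpy as np

def frame_spectra(signal, frame_len=256, shift=128, alpha=0.97):
    """Pre-emphasize, frame, Hamming-window, and DFT a speech signal."""
    # Pre-emphasis: s(n) = x(n) - alpha * x(n-1)
    s = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    win = np.hamming(frame_len)
    frames = [s[i:i + frame_len] * win
              for i in range(0, len(s) - frame_len + 1, shift)]
    # X(k) = sum_n x(n) e^{-j 2 pi n k / N} for each windowed frame
    return np.fft.fft(np.array(frames), n=frame_len, axis=1)

spectra = frame_spectra(np.random.randn(4000))
print(spectra.shape)  # (30, 256): 30 frames, 256 DFT points each
```

In a full MFCC front end, the squared magnitude of each row would then pass through a mel filter bank, a logarithm, and a DCT.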
Research on VQ-Based Speech Recognition Technology

吕晶晶; 陈娟; 张培; 马艳娥
Journal: 《伺服控制》 (Servo Control)
Year (volume), issue: 2011(000)004
Abstract: VQ-based speech recognition achieves high recognition ability in isolated-word recognition systems. This paper uses the LBG algorithm to design a codebook for each speech item to be recognized; each codebook is generated by clustering the MFCC feature vectors extracted from the speaker's training sequences. The accumulated VQ distortion over all frames is computed, and the reference class whose codebook yields the minimum accumulated distortion for the input speech signal is taken as the recognition decision, thereby recognizing a specific person's speech. Compared with typical HMM and NN recognition algorithms, this method has low complexity, consumes few system resources, and achieves a high recognition rate, making it suitable for resource-limited systems such as mobile phones and PDAs.
Pages: 3 (P68-69, 36)
Authors: 吕晶晶; 陈娟; 张培; 马艳娥
Affiliation: National Key Laboratory of Electronic Test Technology, North University of China
Language: Chinese
CLC number: TN912.34
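The scheme in the abstract above, an LBG-trained codebook per class with classification by minimum accumulated VQ distortion, can be sketched roughly as follows. This is a simplified illustration; the splitting perturbation and iteration count are assumptions, not details from the paper:

```python
import numpy as np

def lbg(features, size=4, eps=0.01, iters=20):
    """LBG: grow a codebook by splitting, then refine with Lloyd updates."""
    codebook = features.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        # Split every centroid into two slightly perturbed copies.
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):  # k-means-style refinement
            d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
            idx = d.argmin(axis=1)
            for j in range(len(codebook)):
                if (idx == j).any():
                    codebook[j] = features[idx == j].mean(axis=0)
    return codebook

def total_distortion(frames, codebook):
    """Accumulated VQ distortion of all frames against one codebook."""
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.min(axis=1).sum()

def classify(frames, codebooks):
    """Pick the class whose codebook gives the minimum accumulated distortion."""
    return min(codebooks, key=lambda k: total_distortion(frames, codebooks[k]))
```

In the paper's setting the frame vectors would be MFCC features; any 2-D array of frame vectors works with this sketch.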
Speaker Recognition System Based on AutoEncoder DBN-VQ

刘俊坤; 李燕萍; 凌云志
Journal: 《计算机技术与发展》 (Computer Technology and Development)
Year (volume), issue: 2018(028)002
Pages: 5 (P45-49)
Keywords: speaker recognition; deep belief network; autoencoder; vector quantization
Affiliation: College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China
Language: Chinese
CLC number: TP302
Abstract: The VQ-based speaker recognition algorithm identifies speakers by describing the different distributions of their speech features. When the number of speakers is large and the training speech is short, the recognition rate is low. Moreover, models are generally trained on clean speech, so performance deteriorates sharply when recognition is performed in a real noisy environment. To improve recognition performance, a speaker recognition method combining an autoencoder deep belief network with vector quantization is proposed. The deep belief network learns and mines the speaker's speech data, capturing the speaker's individual characteristics better when the speech is short; at the same time, exploiting the denoising property of autoencoders, an autoencoder deep belief network is constructed so that the model can effectively filter noise from noisy speech data. Experimental results show that the method greatly improves the recognition rate both when the training speech is limited in duration and when recognizing noisy speech.
0 Introduction
Speaker recognition (SR), also called voice identification [1], is an authentication technique that uses the individual characteristics in a person's speech to verify identity.
VQ-SVM Parallel Speaker Recognition System Based on Virtual Instrument Technology

刘祥楼; 吴香艳; 高丙坤
Journal: 《科学技术与工程》 (Science Technology and Engineering)
Year (volume), issue: 2012(012)011
Pages: 4 (P2590-2593)
Keywords: speaker recognition; vector quantization; support vector machine; parallel recognition system
Affiliations: Northeast Petroleum University, Daqing 163318; Heilongjiang Provincial University-Enterprise R&D Center for Test and Measurement Technology and Instrumentation Engineering, Daqing 163318
Language: Chinese
CLC number: TP391.42
Abstract: Hybrid speaker-recognition methods are a current research focus. Building on virtual instrument technology combined with speaker recognition, a method fusing vector quantization (VQ) and the support vector machine (SVM) is proposed: the computation is carried out in MATLAB, while LabVIEW manages multiple tasks and calls MATLAB to perform parallel speaker recognition. Simulation experiments on a small self-built corpus show a recognition rate of 98.54%, a misrecognition rate of 5.28%, and a recognition time of 0.25 s; compared with VQ alone and SVM alone, the recognition rate improves by 3.66% and 1.16%, and the misrecognition rate drops by 6.01% and 4.43%, respectively. As the number of samples grows, the recognition rate of the VQ method trends upward while that of the SVM method trends downward. The two methods are therefore complementary, and running them in parallel improves the overall performance of the system.
Speaker recognition methods mainly include vector quantization, probabilistic-statistical methods, and discriminative classifiers.
Foreign-Language Literature Translation: VQ-Algorithm Speech Recognition

Document information:
Title: Enhanced VQ-based Algorithms for Speech Independent Speaker Identification
Authors: Ningping Fan, Justinian Rosca
Source: Audio- and Video-based Biometric Person Authentication, International Conference, AVBPA, Guildford, UK, June 2003, 2688: 470-477
Word count: 1869 English words, 9708 characters; 3008 Chinese characters

Original text:
Enhanced VQ-based Algorithms for Speech Independent Speaker Identification
Abstract: Weighted distance measure and discriminative training are two different approaches to enhancing VQ-based solutions for speaker identification. To account for the varying importance of the LPC coefficients in SV, the so-called partition normalized distance measure successfully used normalized feature components. This paper introduces an alternative, called heuristic weighted distance, which lifts up higher-order MFCC feature-vector components using a linear formula. It then proposes two new algorithms combining the heuristic weighting and the partition normalized distance measure with group vector quantization discriminative training, to take advantage of both approaches. Experiments using the TIMIT corpus suggest that the new combined approach is superior to current VQ-based solutions (50% error reduction). It also outperforms the Gaussian Mixture Model using the wavelet features tested in a similar setting.
1. Introduction
Vector quantization (VQ) based classification algorithms play an important role in speech-independent speaker identification (SI) systems. Although in baseline form the VQ-based solution is less accurate than the Gaussian Mixture Model (GMM), it offers computational simplicity. For a large database of hundreds or thousands of speakers, both accuracy and speed are important issues. Here we discuss VQ enhancements aimed at accuracy and fast computation.
1.1 VQ-Based Speaker Identification System
Fig. 1 shows the VQ-based speaker identification system.
It contains an offline training sub-system that produces VQ codebooks and an online testing sub-system that generates the identification decision. Both sub-systems contain a preprocessing (feature-extraction) module that converts an audio utterance into a set of feature vectors. Features of interest in the recent literature include Mel-frequency cepstral coefficients (MFCC), line spectral pairs (LSP), wavelet packet parameters (WPP), and PCA and ICA features. Although WPP and ICA have been shown to offer advantages, we use MFCC in this paper to focus attention on the other modules of the system.

Fig. 1. A VQ-based speaker identification system features an online sub-system for identifying the test audio utterance, and an offline training sub-system that uses training audio utterances to generate a codebook for each speaker in the database.

A VQ codebook normally consists of the centroids of partitions over a speaker's feature-vector space. The effect on SI of different partition-clustering algorithms, such as LBG and RLS, has been studied. The average error (distortion) of the feature vectors \{X_t, 1 \le t \le T\} of length T against the codebook of speaker k is

e_k = \frac{1}{T} \sum_{t=1}^{T} \min_{1 \le j \le S} d(X_t, C_{k,j}), \quad 1 \le k \le L \qquad (1)

where d(\cdot,\cdot) is a distance function between two vectors, C_{k,j} = (c_{k,j,1}, \ldots, c_{k,j,D})^T is the j-th code of dimension D, S is the codebook size, and L is the total number of speakers in the database. The baseline VQ algorithm for SI simply uses LBG to generate the codebooks and the square of the Euclidean distance as d(\cdot,\cdot). Many improvements to the baseline VQ algorithm have been published.
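The average distortion of Equation (1), each frame's distance to its nearest code averaged over the utterance, is direct to compute. A sketch assuming the squared Euclidean distance as d(·,·):

```python
import numpy as np

def average_distortion(X, C):
    """e_k = (1/T) * sum_t min_j d(X_t, C_{k,j}), with squared Euclidean d."""
    d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)  # T x S distances
    return d.min(axis=1).mean()

X = np.array([[0.0, 0.0], [2.0, 2.0]])   # T = 2 feature vectors
C = np.array([[0.0, 0.0], [2.0, 1.0]])   # S = 2 codes for speaker k
print(average_distortion(X, C))  # (0 + 1) / 2 = 0.5
```

Identification then picks the speaker k with the smallest e_k over all L codebooks.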
Among them, there are two independent approaches: (1) choose a weighted distance function, such as the F-ratio and IHM weights, the partition normalized distance measure (PNDM), and the Bhattacharyya distance; (2) exploit the discriminative power of inter-speaker characteristics using the entire set of speakers, such as group vector quantization (GVQ) discriminative training and speaker discriminative weighting. Experimentally we have found PNDM and GVQ to be two very effective methods in their respective groups.

1.2 Review of the Partition Normalized Distance Measure
The partition normalized distance measure is defined as the square of a weighted Euclidean distance:

d_p(X, C_{k,j}) = \sum_{i=1}^{D} w_{k,j,i}\,(x_i - c_{k,j,i})^2 \qquad (2)

The weighting coefficients are determined by minimizing the average error over the training utterances of all speakers, subject to the constraint that the geometric mean of the weights of each partition equals 1. Let X_{k,j} = (x_{k,j,1}, \ldots, x_{k,j,D})^T be a random training feature vector of speaker k assigned to partition j by the minimization process in Equation (1). Its mean and variance vectors are

C_{k,j} = E[X_{k,j}], \quad V_{k,j} = E[(X_{k,j} - C_{k,j})(X_{k,j} - C_{k,j})^T] \qquad (3)

The constrained optimization criterion minimized in order to derive the weights is

\xi = \frac{1}{LS} \sum_{k=1}^{L} \sum_{j=1}^{S} \Big\{ E[d_p(X_{k,j}, C_{k,j})] + \lambda_{k,j} \Big(1 - \Big(\prod_{i=1}^{D} w_{k,j,i}\Big)^{1/D}\Big) \Big\}
  = \frac{1}{LS} \sum_{k=1}^{L} \sum_{j=1}^{S} \Big\{ \sum_{i=1}^{D} w_{k,j,i}\, v_{k,j,i} + \lambda_{k,j} \Big(1 - \Big(\prod_{i=1}^{D} w_{k,j,i}\Big)^{1/D}\Big) \Big\} \qquad (4)

where L is the number of speakers and S is the codebook size. Setting

\frac{\partial \xi}{\partial w_{k,j,i}} = 0 \quad \text{and} \quad \frac{\partial \xi}{\partial \lambda_{k,j}} = 0 \qquad (5)

we have

\lambda_{k,j} = \Big(\prod_{i=1}^{D} v_{k,j,i}\Big)^{1/D} \quad \text{and} \quad w_{k,j,i} = \frac{\lambda_{k,j}}{v_{k,j,i}} \qquad (6)

where subscript i is the feature-vector component index, and k and j are the speaker and partition indices respectively.
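The closed form of Equation (6) makes the PNDM weights easy to compute per partition: each weight is the geometric mean of the component variances divided by that component's variance. A sketch, assuming the variances have already been estimated from the vectors assigned to the partition:

```python
import numpy as np

def pndm_weights(v):
    """w_i = (prod_i v_i)^(1/D) / v_i; the geometric mean of the weights is 1."""
    v = np.asarray(v, dtype=float)
    lam = np.exp(np.log(v).mean())  # lambda: geometric mean of variances, eq. (6)
    return lam / v

w = pndm_weights([4.0, 1.0, 0.25])
print(w)  # approx. [0.25, 1.0, 4.0]: low-variance components get large weights
# The geometric mean of the weights is 1, as the constraint requires:
print(np.exp(np.log(w).mean()))  # approx. 1.0
```

Note how the weighting automatically boosts components with small variance, which is exactly what motivates the heuristic weighting introduced next.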
Because k and j appear on both sides of the equations, the weights depend only on the data from one partition of one speaker.

1.3 Review of Group Vector Quantization
Discriminative training uses the data of all speakers to train the codebooks, so that more accurate identification can be achieved by exploiting inter-speaker differences. The GVQ training algorithm is as follows.

Group Vector Quantization Algorithm:
(1) Randomly choose a speaker j.
(2) Select N vectors \{X_{j,t}, 1 \le t \le N\}.
(3) Calculate the error for all codebooks. If the following conditions are satisfied, go to (4):
  a) e_i = \min_k \{e_k\}, but i \ne j;
  b) (e_j - e_i)/e_j < W, where W is a window size;
  otherwise go to (5).
(4) For each X_{j,t}:
  C_{j,m} \Leftarrow (1-\alpha)\,C_{j,m} + \alpha\,X_{j,t}, where C_{j,m} = \arg\min_{C_{j,l}} d(X_{j,t}, C_{j,l});
  C_{i,n} \Leftarrow (1+\alpha)\,C_{i,n} - \alpha\,X_{j,t}, where C_{i,n} = \arg\min_{C_{i,l}} d(X_{j,t}, C_{i,l}).
(5) For each X_{j,t}:
  C_{j,m} \Leftarrow (1-\varepsilon\alpha)\,C_{j,m} + \varepsilon\alpha\,X_{j,t}, where C_{j,m} = \arg\min_{C_{j,l}} d(X_{j,t}, C_{j,l}).

2. Enhancements
We propose the following steps to further enhance the VQ-based solution: (1) a heuristic weighted distance (HWD), (2) a combination of HWD and GVQ, and (3) a combination of PNDM and GVQ.

2.1 Heuristic Weighted Distance
The PNDM weights are inversely proportional to the partition variances of the feature components, as shown in Equation (6). It has been shown that the variances of the cepstral coefficients decrease with increasing index: v_i > v_{i+1}, 1 \le i \le D-1, where i is the vector-element index, which reflects the frequency band. The higher the index, the smaller the feature value and its variance. We considered a heuristic weighted distance

d_h(X, C_{k,j}) = \sum_{i=1}^{D} w(S, D, i)\,(x_i - c_{k,j,i})^2 \qquad (7)

with weights calculated by

w(S, D, i) = 1 + c(S, D)\cdot(i - 1) \qquad (8)

where c(S, D) is a function of both the codebook size S and the feature-vector dimension D. For a given codebook, S and D are fixed, so c(S, D) is a constant.
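Equation (8)'s linear lift of the higher-order components is a one-liner. The value of c(S, D) used below is an arbitrary illustrative constant, since the paper estimates it by search:

```python
import numpy as np

def hwd_weights(D, c):
    """w(S, D, i) = 1 + c(S, D) * (i - 1), for component indices i = 1..D (eq. 8)."""
    return 1.0 + c * np.arange(D)

def hwd_distance(x, code, weights):
    """Heuristic weighted distance of eq. (7)."""
    return float((weights * (x - code) ** 2).sum())

w = hwd_weights(5, 0.5)
print(w)  # [1.  1.5 2.  2.5 3. ]
print(hwd_distance(np.ones(5), np.zeros(5), w))  # sum of the weights = 10.0
```

Unlike PNDM, these weights are shared by all speakers and partitions, which is what makes the measure "heuristic" and cheap.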
The value of c(S, D) is estimated experimentally by an exhaustive search that maximizes the identification rate on a given sample test dataset.

2.2 Combination of HWD and GVQ
HWD and GVQ are combined by simply replacing the original squared Euclidean distance with the HWD of Equation (7), and adjusting the GVQ update parameter \alpha whenever needed.

2.3 Combination of PNDM and GVQ
Combining PNDM with GVQ requires slightly more work, because GVQ alters the partitions and thus their component variances. We used the following algorithm to overcome this problem.

Algorithm to Combine PNDM with GVQ Discriminative Training:
(1) Use the LBG algorithm to generate initial LBG codebooks.
(2) Calculate the PNDM weights using the LBG codebooks, producing PNDM-weighted LBG codebooks, which are LBG codebooks with the PNDM weights appended.
(3) Perform GVQ training with the PNDM distance function, and generate the initial PNDM+GVQ codebooks by replacing the LBG codes with the GVQ codes.
(4) Recalculate the PNDM weights using the PNDM+GVQ codebooks, and produce the final PNDM+GVQ codebooks by replacing the old PNDM weights with the new ones.

3. Experimental Comparison of VQ-Based Algorithms
3.1 Testing Data and Procedures
The 168 speakers in the TEST section of the TIMIT corpus are used for the SI experiment, and 190 speakers from DR1, DR2, and DR3 of the TRAIN section are used for estimating the c(S, D) parameter. Each speaker has 10 good-quality recordings (16 kHz, 16 bits/sample) stored as WAVE files in NIST format. Two of them, SA1.WAV and SA2.WAV, are used for testing, and the rest for training codebooks. We did not perform silence removal on the WAVE files, so that others can reproduce the environment without the additional complication of VAD algorithms and their parameters. An MFCC program converts all the WAVE files in a directory into one feature-vector file, in which every feature vector is indexed by its speaker and recording.
For each value of the feature-vector dimension, D = 30, 40, 50, 60, 70, 80, 90, one training file and one testing file are created. They are used by all the algorithms to train codebooks of size S = 16, 32, 64 and to perform the identification test, respectively.

The MFCC feature vectors are calculated as follows: 1) divide the entire utterance into blocks of 512 samples with 256-sample overlap; 2) perform pre-emphasis filtering with coefficient 0.97; 3) multiply by a Hamming window and perform a short-time FFT; 4) apply the standard mel-frequency triangular filter banks to the squared magnitude of the FFT; 5) apply the logarithm to the sum of all the outputs of each individual filter; 6) apply the DCT to the entire set of data from all filters; 7) drop the zeroth coefficient to produce the cepstral coefficients; 8) after all blocks are processed, calculate the mean over the entire time duration and subtract it from the cepstral coefficients; 9) calculate the first-order time derivatives of the cepstral coefficients and concatenate them after the cepstral coefficients to form a feature vector. For example, a filter bank of size 16 produces 30-dimensional feature vectors.

Due to project time constraints, the HWD parameter c(S, D) was estimated at S = 16, 32, 64 and D = 40, 80, so as to achieve the highest identification rate on the 190-speaker dataset of the TRAIN section. For other values of S and D it was interpolated or extrapolated from the optimized samples. The results are shown in the bottom section of Table 1. The identification experiment was then performed on the 168-speaker dataset from the TEST section. We used different datasets for c(S, D) estimation, codebook training, and identification-rate testing, to produce objective results.

3.2 Testing Results
Table 1 shows the identification rates of the various algorithms. The value of the learning parameter \alpha is displayed after the GVQ title, and the parameter c(S, D) is displayed in the bottom section.
Combinations of algorithms are indicated by a "+" sign between their name abbreviations.

Table 1. Identification rates (%) and parameters for the various VQ-based algorithms tested, where the first row is the feature-vector dimension D and the first column is the codebook size S.

The baseline algorithm performs poorest, as expected. The plain HWD, PNDM, and GVQ all show enhancements over the baseline, and the combined methods further improve on the plain methods. PNDM+GVQ performs best when the codebook size is 16 or 32, while HWD+GVQ is better at codebook size 64. The highest score of the test is 99.7%, corresponding to a single miss in 336 utterances from 168 speakers. It outperforms the reported rate of 98.4% obtained with a GMM using WPP features.

4. Conclusion
A new approach combining the weighted distance measure and discriminative training is proposed to enhance VQ-based solutions for speech-independent speaker identification. An alternative heuristic weighted distance measure was explored, which lifts up higher-order MFCC feature-vector components using a linear formula. Two new algorithms combining the heuristic weighted distance and the partition normalized distance with group vector quantization discriminative training were developed, gathering the power of both the weighted distance measure and discriminative training. Experiments showed that the proposed methods outperform the corresponding single-approach VQ-based algorithms, and even the more powerful GMM-based solutions. Further research on the heuristic weighted distance is being conducted, particularly for small codebook sizes.
Research on a VQ-Based Anti-Imitation Speaker Recognition Re-Verification System Algorithm

周鸣, 景新幸
(1. Guilin Radio Factory No. 1, Guilin 541004, Guangxi, China; 2. Guilin University of Electronic Technology, Guilin 541004, Guangxi, China)

Abstract: Although a person's voice is individual, it can still be imitated. The emergence of voice-imitation technology threatens current information security, which makes it necessary to strengthen the security of current speaker recognition systems and to pursue research on anti-imitation techniques. The article introduces the basic concepts and principles of speaker recognition and the current state of the field.

Awareness of information security has further increased, placing urgent demands on security-assurance work. The emergence of voice-imitation technology threatens current information security, so it is necessary to carry out research on anti-imitation techniques; their contribution to information security is enormous, as they better protect the safety of information and commands. Current speaker identification systems already perform very well, while speaker verification systems perform comparatively worse. The anti-imitation speaker re-verification system exploits precisely this fact, that speaker identification outperforms the corresponding speaker verification.

Vector quantization quantizes whole vectors, compressing the data volume with relatively little loss of information. Because vector quantization effectively exploits the correlation among the elements of a vector, it achieves better compression than scalar quantization. With vector quantization, the goal is to minimize, under the given conditions, the statistical average of the distortion, D = E[d(X, Y)].
Speaker Recognition System Based on the Fusion of VQ-MAP and LS-SVM
Speaker recognition extracts the individual characteristics of a speaker from a segment of speech and, by analyzing and recognizing these characteristics, identifies or verifies the speaker. It falls into two categories: speaker identification and speaker verification. Speaker identification determines which of the persons under consideration an unknown utterance comes from; speaker verification compares a specific reference model with the pattern to be recognized, and the system makes only a binary "yes" or "no" decision [1]. Ville Hautamaki et al. [2] proposed the maximum a posteriori vector quantization (VQ-MAP) procedure, which can be viewed as a special form of GMM-MAP; Suykens et al. [3] proposed the concept of the least-squares support vector machine (LS-SVM); and 志平 et al. [4] applied the least-squares support vector machine to speaker recognition systems with good results. The VQ-MAP procedure first clusters the universal background model (UBM) according to the means only, then applies the VQ-MAP update to the adapted parameters, so that the portions not covered by the training speech can be approximated by the speaker-independent feature distribution of the UBM, reducing the impact of overly short training speech. The resulting set of adapted parameters is used as training samples for the least-squares support vector machine and applied to speaker recognition, with good results. This paper presents the speaker recognition system fusing VQ-MAP and LS-SVM and its application to speaker recognition.
1 The VQ-MAP Procedure
In speaker recognition, a speaker model can be obtained by adapting the parameters of the UBM with the enrollment data in the training set. In GMM maximum a posteriori adaptation (GMM-MAP), three kinds of parameters must be updated: the weights, the mean vectors, and the covariance matrices. The VQ-MAP procedure is a special form of GMM-MAP that derives the new adapted speaker model from the mean vectors alone.
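The mean-only MAP update underlying VQ-MAP can be sketched as follows. The relevance factor r is an illustrative assumption; each UBM mean is interpolated toward the mean of the training frames assigned to it by nearest-mean (VQ) assignment:

```python
import numpy as np

def vq_map_adapt(ubm_means, frames, r=16.0):
    """Mean-only MAP adaptation: m_j <- (n_j * xbar_j + r * m_j) / (n_j + r)."""
    d = ((frames[:, None, :] - ubm_means[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)  # nearest-mean assignment, as in VQ
    adapted = ubm_means.copy()
    for j in range(len(ubm_means)):
        n = (idx == j).sum()
        if n > 0:
            # frames[idx == j].sum(0) equals n_j * xbar_j
            adapted[j] = (frames[idx == j].sum(0) + r * ubm_means[j]) / (n + r)
    return adapted
```

Clusters with no assigned frames keep their UBM means, which is exactly the "uncovered portions fall back to the speaker-independent UBM distribution" behavior described above.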
Using the mean vectors as parameters, the UBM is clustered with the K-means algorithm, yielding a set of mean kernels.

Speaker Recognition System Based on the Fusion of VQ-MAP and LS-SVM*
展领, 景新幸
(School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, Guangxi, China)
Abstract: The traditional least-squares support vector machine (LS-SVM) uses feature vectors as training samples, and its discriminative power is not pronounced when applied in a speaker recognition system.
Speaker Recognition Based on an Improved VQ Algorithm
马静; 李国勇; 王珺
Journal: 《机械工程与自动化》 (Mechanical Engineering & Automation)
Year (volume), issue: 2008(000)004
Abstract: A speaker recognition system based on an improved vector quantization (VQ) method is presented. The system uses Mel-frequency cepstral coefficients (MFCC), which reflect human auditory perception of speech, as feature parameters; makes several improvements to the codebook-generation algorithm used in VQ training; and proposes an optimized empty-cell-removal splitting method. Experiments show that the optimized algorithm reduces the vector-quantization distortion and improves the quantization performance.
Pages: 3 (P74-75, 78)
Authors: 马静; 李国勇; 王珺
Affiliations: College of Information Engineering, Taiyuan University of Technology, Taiyuan 030024, Shanxi; Air Force Representative Office in the Jingfeng Area, Beijing 100074
Language: Chinese
CLC number: TP391
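The abstract above does not specify the details of the empty-cell-removal splitting method, but a common variant of this idea replaces an empty cell by a perturbed copy of the most heavily populated centroid. A hedged sketch of that generic technique, not of the paper's exact algorithm:

```python
import numpy as np

def fix_empty_cells(codebook, assignments, eps=0.01):
    """Replace codes with no assigned vectors by a perturbed copy of the busiest code."""
    counts = np.bincount(assignments, minlength=len(codebook))
    out = codebook.copy()
    for j in np.where(counts == 0)[0]:
        busiest = counts.argmax()
        out[j] = out[busiest] * (1 + eps)  # split the most populated cell
        counts[j] = 1
    return out
```

Running this between Lloyd iterations keeps every cell populated, which is one way such a step can reduce the overall quantization distortion.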
Codebook Design and Speech Recognition Based on VQ with a Genetic Algorithm

WANG She-guo, WEI Yan-na. Codebook design method based on genetic algorithm for speech recognition. Computer Engineering and Applications, 2007, 43(17): 71-73.
Article number: 1002-8331(2007)17-0071-03; Document code: A; CLC number: TP391.42
Keywords: genetic algorithm; vector quantization; GA-L; speech recognition

Abstract: In vector quantization (VQ) codebook design, the classic LBG algorithm converges quickly but easily falls into local optima, and the generation of the initial codebook strongly influences the final codebook. Considering that the genetic algorithm (GA) has a globally optimizing search capability, a GA-L algorithm combining GA with the LBG algorithm is proposed to optimize the codebook. It improves the codebook quality and is applied to continuous Chinese digit speech recognition; the experimental results demonstrate the effectiveness of the GA-L algorithm.
Speaker Recognition System Based on VQ
Authors: 丁艳伟, 戴玉刚 (Northwest University for Nationalities, Lanzhou 730030, China)
Source: 《电脑知识与技术·学术交流》 (Computer Knowledge and Technology, Academic Exchange), 2008, No. 32
CLC number: TP18; Document code: A; Article number: 1009-3044(2008)32-1181-03
Keywords: speaker recognition; VQ; codebook; LPCC; MFCC

Abstract: This paper introduces a speaker recognition algorithm based on the vector quantization (VQ) method. Because of features such as its simple computation procedure, VQ-based speaker recognition is widely applied in the field of speaker recognition. Experiments using different speech parameters show that vector quantization is an effective method for speaker recognition.

Speaker recognition, also called voiceprint recognition, is the technology by which a computer automatically identifies a speaker's identity from the speech feature parameters contained in the speech waveform that reflect the specific speaker's physiological and behavioral characteristics.
Seminar on Speech Signal Processing
040120** ☆☆☆
savylu@

Key points for the PPT presentation (experiments):
Vary the number of cluster centers for each isolated word: kmeans_vq.m
Male/female voice bias, high vs. low frequency, flat vs. retroflex consonants, pitch
Vary the number of training utterances
Rfft.m: FFT of real data. %RFFT FFT of real data Y=(X,N)
Rdct.m: discrete cosine transform of real data. %RDCT Discrete cosine transform of real data Y=(X,N)
Melceps.m: compute the mel cepstrum of a signal. %MELCEPST Calculate the mel cepstrum of a signal C=(S,FS,W,NC,P,N,INC,FL,FH)
Melbankm.m: determine the matrix for a mel-spaced filterbank. %MELBANKM determine matrix for a mel-spaced filterbank [X,MN,MX]=(P,N,FS,FL,FH,W)
Enframe.m: split a signal into (overlapping) frames. %ENFRAME split signal up into (overlapping) frames: one per row. F=(X,WIN,INC)
Disteu.m: %DISTEU Euclidean distance between the columns of two matrices
ABSE4.m: robust speech endpoint detection based on adaptive sub-band spectral entropy
Statremovenan.m
Shibie.m: recognition script
Caiji.m: capture speech in real time
kmeans_vq.m: save the cluster centers of each isolated word, used for recognition
MFCC_TIwuzaosheng.m
Recognize.m: speech recognition.
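The file list above suggests a pipeline of framing, MFCC extraction, k-means codebooks, and Euclidean-distance scoring. A minimal Python equivalent of Disteu.m is easy to state; the column-vector convention is assumed from the MATLAB comment:

```python
import numpy as np

def disteu(x, y):
    """Euclidean distance between each column of x and each column of y."""
    # x: (D, M), y: (D, N)  ->  (M, N) distance matrix
    diff = x[:, :, None] - y[:, None, :]
    return np.sqrt((diff ** 2).sum(axis=0))

x = np.array([[0.0, 3.0], [0.0, 4.0]])   # two 2-D column vectors: (0,0), (3,4)
y = np.array([[0.0], [0.0]])             # one 2-D column vector: (0,0)
print(disteu(x, y))  # [[0.], [5.]]
```

In the recognition script, such a distance matrix between test MFCC frames and each word's saved cluster centers would be accumulated and the word with the smallest total chosen.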