After downloading several pronunciation libraries, I settled on this one with 142,000 words. Download: http://www.verycd.com/topics/133276/
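It is a merge of the 80,000-word Merriam-Webster (MW) pronunciation library and the 100,000-word Hujiang library. Every file has a .wav extension, but there is a small catch in practice: the two sets use different audio formats. The MW files are plain PCM WAV, 8-bit, sampled at 11025 Hz, while the Hujiang files are MP3-compressed WAV at 32 kbps, sampled at 16000 Hz. To read a word list aloud, the formats have to be unified first, or at least the sample rate. Since I trust the MW dictionary more, I decided to resample the Hujiang set to 11025 Hz and save it as 8-bit WAV.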
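To see which files actually need converting, one option is to read the "fmt " chunk of each file. The following Python sketch is not part of the original workflow (GoldWave did the real conversion); the names read_wav_format and needs_conversion are made up for illustration. Format tag 1 is plain PCM and 0x55 is MPEG Layer 3 inside a WAV container.

```python
import struct

PCM_TAG = 1        # WAVE_FORMAT_PCM
MP3_TAG = 0x55     # WAVE_FORMAT_MPEGLAYER3

def read_wav_format(data: bytes) -> dict:
    """Return the basic fields of the first 'fmt ' chunk in a RIFF/WAVE byte string."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, pos)
        pos += 8
        if chunk_id == b"fmt ":
            if chunk_size < 16 or pos + 16 > len(data):
                raise ValueError("truncated fmt chunk")
            tag, channels, rate, _avg, _align, bits = struct.unpack_from("<HHIIHH", data, pos)
            return {"format_tag": tag, "channels": channels,
                    "sample_rate": rate, "bits_per_sample": bits}
        # Skip this chunk; RIFF chunks are padded to an even length.
        pos += chunk_size + (chunk_size & 1)
    raise ValueError("no fmt chunk found")

def needs_conversion(fmt: dict, rate: int = 11025, bits: int = 8) -> bool:
    """True unless the file is already PCM at the target rate and bit depth."""
    return not (fmt["format_tag"] == PCM_TAG
                and fmt["sample_rate"] == rate
                and fmt["bits_per_sample"] == bits)
```

In practice you would call read_wav_format(open(path, "rb").read()) for each file and queue only the ones where needs_conversion returns True.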
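So I pulled out GoldWave, which I have used for seven or eight years, and ran its batch processing feature to convert the files. It looked quite fast at first, 30 to 50 files per second, so 142,000 files should have finished in about an hour. I left it running and went to do other things. When I switched back a while later, it had become painfully slow, stalling for about a second on every file. Why fast at first and slow later? After some thought I fixed my eyes on the ListBox in the conversion window, because a success message is written there for every converted file (in programs I wrote before, appending to the end of a text box also lagged badly once there were many entries).

That was bad news: the GoldWave author never expected this many files. Most people would be stuck here, but I happen to know both programming and how to break other people's programs. I attached OllyDbg (OD), poked around for quite a while, and found that the function that adds one row to the list sits at address 4E170C (GoldWave is written in Pascal, by the way). I opened GoldWave.exe in WinHex and saw that the .text section starts at file offset 0x600 (it is mapped at virtual address 0x401000), so I went to file offset 0xE0D0C (0x4E170C - 0x401000 + 0x600), wrote C3 (the x86 retn) there, and saved the result as a new file. With this patched GoldWave the conversion ran smoothly all the way through.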
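The offset arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation: the function names are made up for illustration, and the patch is applied to an in-memory copy, not to a real executable.

```python
RETN = 0xC3  # x86 "retn" opcode, as used in the text

def va_to_file_offset(va: int, section_va: int, section_file_offset: int) -> int:
    """Map a virtual address inside a section to its offset in the file on disk."""
    if va < section_va:
        raise ValueError("address lies before the start of the section")
    return va - section_va + section_file_offset

def patch_first_byte(image: bytes, offset: int, value: int = RETN) -> bytes:
    """Return a copy of image with the byte at offset replaced by value."""
    if not 0 <= offset < len(image):
        raise ValueError("offset outside the image")
    patched = bytearray(image)
    patched[offset] = value
    return bytes(patched)

# Numbers from the text: function at 0x4E170C, .text mapped at 0x401000,
# section data starting at file offset 0x600.
offset = va_to_file_offset(0x4E170C, 0x401000, 0x600)
print(hex(offset))
```

Overwriting the first byte of a function with retn makes it return immediately, which is only safe when the caller does not depend on the function's result or on it cleaning up stack arguments.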
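The finished set came to 1.48 GB, or 640 MB after RAR compression, slightly larger than the original download, but at least the format is now uniform.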
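The next step is to join the individual words into one file following the word list. The principle is simple: for 8-bit audio, allocate a byte array of 11025 bytes per second, initialize every byte to 0x80 (8-bit audio is stored unsigned, so 0x80 is the zero level), then copy each word's data into place. The list I use for memorizing is the widely circulated "考研大纲词汇44页完美打印版" (a 44-page printable vocabulary list for the postgraduate entrance exam syllabus). I removed the words I already knew, which left 29 pages of 126 words each. I was too lazy to copy and paste page after page into a separate processing program, so I wrote the program directly as an Office macro. Here is the form code: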
Option Explicit

Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (dest As Any, src As Any, ByVal Length As Long)
Private Declare Sub FillMemory Lib "kernel32" Alias "RtlFillMemory" (Destination As Any, ByVal Length As Long, ByVal Fill As Byte)

' Target format: mono, 11025 Hz, 8-bit unsigned PCM
Private Const WAVE_CHANNELS As Integer = 1
Private Const WAVE_SAMPLES As Long = 11025
Private Const WAVE_BITS As Integer = 8

Private Type RIFF_HEADER
    szRiffID As String * 4
    dwRiffSize As Long
    szRiffFormat As String * 4
End Type

Private Type RIFF_BLOCK_HEADER
    szBlockId As String * 4
    dwBlockSize As Long
End Type

Private Type WAVE_FORMAT
    wFormatTag As Integer
    wChannels As Integer
    dwSamplesPerSec As Long
    dwAvgBytesPerSec As Long
    wBlockAlign As Integer
    wBitsPerSample As Integer
End Type

Dim OutputPath As String
Dim VoicePath As String
Dim PauseTime As Integer    ' seconds reserved for each word

' Export one page of the word list as a single WAV file.
Function PageProc(ByVal PageNum As Integer) As Boolean
    PageProc = False

    ' Collect the first cell of every table row on the page
    Dim coll As New Collection
    Dim xml
    Set xml = CreateObject("Microsoft.XMLDOM")
    xml.loadXML ThisDocument.GoTo(wdGoToPage, , , PageNum).GoTo(wdGoToBookmark, , , "\page").xml
    Dim node
    For Each node In xml.SelectNodes("//w:tr/w:tc[0]")
        coll.Add node.Text
    Next

    ' One fixed slot of PauseTime seconds per word, pre-filled with silence (128 = 0x80)
    Dim slotSize As Long
    slotSize = WAVE_BITS / 8 * WAVE_CHANNELS * WAVE_SAMPLES * PauseTime
    Dim voice() As Byte
    ReDim voice(slotSize * coll.Count() - 1)
    FillMemory voice(0), UBound(voice) + 1, 128

    Dim i As Integer
    For i = 0 To coll.Count() - 1
        Dim b() As Byte
        Dim word As String
        Dim path As String
        Dim copyLen As Long
        word = coll.Item(i + 1)
        path = VoicePath & Left(word, 1) & "\" & word & ".wav"
        If GetWaveData(path, b) Then
            ' Truncate clips longer than one slot so they cannot overrun the buffer
            copyLen = UBound(b) + 1
            If copyLen > slotSize Then copyLen = slotSize
            CopyMemory voice(slotSize * i), b(0), copyLen
        Else
            If MsgBox("No audio found for: " & word & ". Continue exporting?", vbExclamation + vbYesNo) = vbNo Then Exit Function
        End If
    Next

    SaveWaveData OutputPath & Format(PageNum, "00") & ".wav", voice
    PageProc = True
End Function

' Read the sample data of a WAV file into b(); fails if the format is not the target format.
Function GetWaveData(ByVal path As String, b() As Byte) As Boolean
    GetWaveData = False
    If Len(Dir(path)) = 0 Then Exit Function

    Open path For Binary As 1
    Dim riff As RIFF_HEADER
    Get 1, , riff
    If riff.szRiffID = "RIFF" And riff.szRiffFormat = "WAVE" Then
        Dim block As RIFF_BLOCK_HEADER
        Do While Seek(1) - 1 < 8 + riff.dwRiffSize
            Get 1, , block
            If block.szBlockId = "fmt " Then
                Dim fmt As WAVE_FORMAT
                Get 1, , fmt
                If fmt.wChannels = WAVE_CHANNELS And fmt.dwSamplesPerSec = WAVE_SAMPLES And fmt.wBitsPerSample = WAVE_BITS Then
                    ' Skip any extra bytes at the end of the fmt chunk
                    Seek 1, Seek(1) + block.dwBlockSize - LenB(fmt)
                Else
                    Exit Do    ' wrong format, give up
                End If
            ElseIf block.szBlockId = "data" Then
                ReDim b(block.dwBlockSize - 1)
                Get 1, , b
                GetWaveData = True
                Exit Do
            Else
                ' Skip unrelated chunks (the original did not, so the next header was read from the wrong place)
                Seek 1, Seek(1) + block.dwBlockSize
            End If
        Loop
    End If
    Close 1
End Function

' Write b() as a PCM WAV file in the target format.
Function SaveWaveData(ByVal path As String, b() As Byte) As Boolean
    SaveWaveData = False
    Open path For Binary As 1

    ' Placeholder RIFF header; the size is filled in at the end
    Dim riff As RIFF_HEADER
    riff.szRiffID = "RIFF"
    riff.dwRiffSize = 0
    riff.szRiffFormat = "WAVE"
    Put 1, , riff

    Dim block As RIFF_BLOCK_HEADER
    Dim fmt As WAVE_FORMAT
    block.szBlockId = "fmt "
    block.dwBlockSize = LenB(fmt)
    Put 1, , block
    fmt.wFormatTag = 1    ' PCM
    fmt.wChannels = WAVE_CHANNELS
    fmt.dwSamplesPerSec = WAVE_SAMPLES
    fmt.dwAvgBytesPerSec = WAVE_SAMPLES * WAVE_CHANNELS * WAVE_BITS / 8
    fmt.wBlockAlign = WAVE_CHANNELS * WAVE_BITS / 8
    fmt.wBitsPerSample = WAVE_BITS
    Put 1, , fmt

    block.szBlockId = "data"
    block.dwBlockSize = UBound(b) + 1
    Put 1, , block
    Put 1, , b

    ' Go back and write the real RIFF size (file length minus the 8-byte RIFF header)
    riff.dwRiffSize = Seek(1) - 1 - 8
    Seek 1, 1
    Put 1, , riff
    Close 1
    SaveWaveData = True
End Function

Private Sub cmdStart_Click()
    VoicePath = txtVoicePath.Text
    PauseTime = txtPauseTime.Text
    OutputPath = ThisDocument.path & "\语音\"    ' output folder; 语音 means "voice"
    ' vbDirectory is needed so that an existing but empty folder is detected
    If Len(Dir(OutputPath, vbDirectory)) = 0 Then MkDir OutputPath

    Dim page As Integer
    For page = txtPageStart.Text To txtPageEnd.Text
        Me.Caption = "Exporting page " & page
        DoEvents
        If PageProc(page) = False Then Exit For
    Next
End Sub

Private Sub UserForm_Initialize()
    txtPageEnd.Text = Selection.Information(wdNumberOfPagesInDocument)
End Sub
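For readers who do not use Office, the slot layout described in the text (one fixed-length slot per word, pre-filled with 0x80 silence) can be sketched in Python with the standard wave module. This is an illustration, not the macro itself: build_page and to_wav_bytes are hypothetical names, clips are assumed to be raw 8-bit 11025 Hz sample data already extracted from the word files, and cutting off clips longer than a slot is my assumption.

```python
import io
import wave

SAMPLE_RATE = 11025   # samples per second; one byte per sample at 8-bit mono
SILENCE = 0x80        # zero level for unsigned 8-bit PCM

def build_page(clips, slot_seconds):
    """Place each clip at the start of its own slot; None means a missing word."""
    slot = SAMPLE_RATE * slot_seconds
    out = bytearray([SILENCE]) * (slot * len(clips))
    for i, clip in enumerate(clips):
        if clip is None:
            continue                      # leave the whole slot silent
        data = clip[:slot]                # assumption: cut clips longer than a slot
        out[i * slot:i * slot + len(data)] = data
    return bytes(out)

def to_wav_bytes(pcm):
    """Wrap raw 8-bit mono samples in a WAV container and return the file bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(1)                 # 1 byte per sample = 8-bit
        w.setframerate(SAMPLE_RATE)
        w.writeframes(pcm)
    return buf.getvalue()
```

Writing to_wav_bytes(build_page(clips, 3)) to a file would give one page with a 3-second slot per word; the 3 is just an example value for the pause time.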
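In the end... I just copied the WAV files straight onto my MP3 player. At about 88 kbps the bitrate is not large to begin with, and re-encoding to MP3 would only add noticeable distortion (if you really must compress, I suggest 32 kbps and no lower).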
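The 88k figure is just sample rate times bit depth. A small Python sketch of that arithmetic follows, with a rough per-page size estimate; the function names are made up for illustration, and the 3-second slot is a hypothetical value, since the actual pause time is whatever gets typed into the form.

```python
def pcm_bitrate(sample_rate, bits_per_sample, channels=1):
    """Bits per second of uncompressed PCM audio."""
    return sample_rate * bits_per_sample * channels

def page_size_bytes(words, slot_seconds, sample_rate=11025, bits_per_sample=8):
    """Size of the sample data for one page with a fixed-length slot per word."""
    bytes_per_second = pcm_bitrate(sample_rate, bits_per_sample) // 8
    return words * slot_seconds * bytes_per_second

print(pcm_bitrate(11025, 8))      # 8-bit mono at 11025 Hz
print(page_size_bytes(126, 3))    # 126 words per page, hypothetical 3 s slots
```

At a few megabytes per page, all 29 pages fit comfortably on a player without any further compression.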
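Whenever I edit the word list later, I can re-export the updated recordings at any time, which is very convenient. Anyone interested can follow the same steps.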