下了好几个语音库,最后挑中这个142000个单词的,下载地址:http://www.verycd.com/topics/133276/
这个是用8W韦氏词典语音库和10W沪江网语音库混合来的,全部是wav格式,但在运用的时候有点小问题:尽管都是wav文件,但两套的音频格式是不同的——MW字典是纯波形wav,8bit,采样率11025Hz,而沪江那套mp3压缩的wav,32kbps,采样率16000Hz。要做单词表朗读必须先统一他们的格式,至少要统一采样率。因为比较信MW字典,故决定将沪江那套按11025Hz重采样并以8bit的wav保存。
于是拿出用了七八年的GoldWave,用批处理功能转文件格式。看上去速度蛮快的,一秒30-50个文件,142000个文件差不多1小时就能转完。就放那儿干其他事去了,结果过一会切回来一看,速度变的奇慢无比,一个文件就要卡死一秒,开始快后来慢这是怎么回事呢,想了一下最后把目光锁定在转换窗口中那个ListBox上,因为每转好一个都会在那里面写一个成功的消息(以前写其他程序文本框末尾添加到最后多了的话也会卡的厉害)。这可糟了,GoldWave作者没想到会有这么多文件。。搁一般人肯定没辙了,不过哥既懂编程也懂破坏别人的程序,于是OD挂上折腾半天,找到添加一行记录的那个函数的地址是4e170c(GoldWave是Pascal写的诶)。用WinHex打开GoldWave.exe,看到text段的RVA是600,定位到e0d0c偏移(4e170c-401000+600),写上C3(x86的retn),另存为。用这个patch过的GoldWave再转,一路顺风~
最后全部弄完是1.48G,RAR再压起来是640M,比原来下的稍稍多一点,不过格式总算统一了。
然后就是按单词表把各个单词连起来做成一个文件,这个原理上比较简单,对于8bit的音频,按每秒11025b的大小开byte数组,全部初始化为80(8bit音频无符号存储,80表示0电平),然后把单词的数据拷贝过去就行。我用来背单词的是网上流传的《考研大纲词汇44页完美打印版》,然后自己把熟悉的单词挖掉,整理出了29页,每页126个词。因为懒每页每页复制粘贴的专门的处理程序中,就直接把程序写成office宏了。。这里贴上窗体代码:
Option Explicit
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (dest As Any, src As Any, ByVal Length As Long)
Private Declare Sub FillMemory Lib "kernel32" Alias "RtlFillMemory" (Destination As Any, ByVal Length As Long, ByVal Fill As Byte)
Private Const WAVE_CHANNELS As Integer = 1
Private Const WAVE_SAMPLES As Long = 11025
Private Const WAVE_BITS As Integer = 8
Private Type RIFF_HEADER
szRiffID As String * 4
dwRiffSize As Long
szRiffFormat As String * 4
End Type
Private Type RIFF_BLOCK_HEADER
szBlockId As String * 4
dwBlockSize As Long
End Type
Private Type WAVE_FORMAT
wFormatTag As Integer
wChannels As Integer
dwSamplesPerSec As Long
dwAvgBytesPerSec As Long
wBlockAlign As Integer
wBitsPerSample As Integer
End Type
Dim OutputPath As String
Dim VoicePath As String
Dim PauseTime As Integer
Function PageProc(ByVal PageNum As Integer) As Boolean
PageProc = False
Dim coll As New Collection
Dim xml
Set xml = CreateObject("Microsoft.XMLDOM")
xml.loadXML ThisDocument.GoTo(wdGoToPage, , , PageNum).GoTo(wdGoToBookmark, , , "\page").xml
Dim node
For Each node In xml.SelectNodes("//w:tr/w:tc[0]")
coll.Add node.Text
Next
Dim voice() As Byte
ReDim voice(WAVE_BITS / 8 * WAVE_CHANNELS * WAVE_SAMPLES * PauseTime * coll.Count() - 1)
FillMemory voice(0), UBound(voice) + 1, 128
Dim i As Integer
For i = 0 To coll.Count() - 1
Dim b() As Byte
Dim word As String
Dim path As String
word = coll.Item(i + 1)
path = VoicePath & Left(word, 1) & "\" & word & ".wav"
If GetWaveData(path, b) Then
CopyMemory voice(WAVE_BITS / 8 * WAVE_CHANNELS * WAVE_SAMPLES * PauseTime * i), b(0), UBound(b) + 1
Else
If MsgBox("无法找到语音:" & word & ",是否继续导出?", vbExclamation + vbYesNo) = vbNo Then Exit Function
End If
Next
SaveWaveData OutputPath & Format(PageNum, "00") & ".wav", voice
PageProc = True
End Function
Function GetWaveData(ByVal path As String, b() As Byte) As Boolean
GetWaveData = False
If Len(Dir(path)) = 0 Then Exit Function
Open path For Binary As 1
Dim riff As RIFF_HEADER
Get 1, , riff
If riff.szRiffID = "RIFF" And riff.szRiffFormat = "WAVE" Then
Dim block As RIFF_BLOCK_HEADER
Do While Seek(1) - 1 < 8 + riff.dwRiffSize
Get 1, , block
If block.szBlockId = "fmt " Then
Dim fmt As WAVE_FORMAT
Get 1, , fmt
If fmt.wChannels = WAVE_CHANNELS And fmt.dwSamplesPerSec = WAVE_SAMPLES And fmt.wBitsPerSample = WAVE_BITS Then
Seek 1, Seek(1) + block.dwBlockSize - LenB(fmt)
Else
Exit Do
End If
End If
If block.szBlockId = "data" Then
ReDim b(block.dwBlockSize - 1)
Get 1, , b
GetWaveData = True
Exit Do
End If
Loop
End If
Close 1
End Function
Function SaveWaveData(ByVal path As String, b() As Byte) As Boolean
SaveWaveData = False
Open path For Binary As 1
Dim riff As RIFF_HEADER
riff.szRiffID = "RIFF"
riff.dwRiffSize = 0
riff.szRiffFormat = "WAVE"
Put 1, , riff
Dim block As RIFF_BLOCK_HEADER
block.szBlockId = "fmt "
block.dwBlockSize = 0
Dim fmt As WAVE_FORMAT
block.dwBlockSize = LenB(fmt)
Put 1, , block
fmt.wFormatTag = 1
fmt.wChannels = WAVE_CHANNELS
fmt.dwSamplesPerSec = WAVE_SAMPLES
fmt.dwAvgBytesPerSec = WAVE_SAMPLES * WAVE_BITS / 8
fmt.wBlockAlign = WAVE_CHANNELS * WAVE_BITS / 8
fmt.wBitsPerSample = WAVE_BITS
Put 1, , fmt
block.szBlockId = "data"
block.dwBlockSize = UBound(b) + 1
Put 1, , block
Put 1, , b
riff.dwRiffSize = Seek(1) - 1 - 8
Seek 1, 1
Put 1, , riff
Close 1
SaveWaveData = True
End Function
Private Sub cmdStart_Click()
VoicePath = txtVoicePath.Text
PauseTime = txtPauseTime.Text
OutputPath = ThisDocument.path & "\语音\"
If Len(Dir(OutputPath)) = 0 Then MkDir OutputPath
Dim page As Integer
For page = txtPageStart.Text To txtPageEnd.Text
Me.Caption = "正在导出第" & page & "页"
DoEvents
If PageProc(page) = False Then Exit For
Next
End Sub
Private Sub UserForm_Initialize()
txtPageEnd.Text = Selection.Information(wdNumberOfPagesInDocument)
End Sub
最后。。最后直接把wav文件拷mp3里了,反正88k的码率本来就不大,再压mp3反而失真厉害(一定要压的话,建议32k码率,不能再低了)
以后如果编辑了单词表,随时可以再导出更新的录音,非常方便。有兴趣的同学可以照做下。