Speech Synthesis:
Melody of “Daisy Bell”
orch /Users/charlesames/Scratch/SpeechOrch.xml
set rate 44100
set bits 16
set norm 1
name Daisy1
ramp 1 1 0.0 96.0 6000 6000 // Amplitude
ramp 1 3 0.0 96.0 414 414 // Formant 1
ramp 1 4 0.0 96.0 1516 1516 // Formant 2
ramp 1 5 0.0 96.0 2500 2500 // Formant 3
// Dai-
ramp 1 2 0.00 2.98 293.7 293.7 // D4
ramp 1 2 2.98 0.02 293.7 246.9 // D4-B3
note 1 1 101 0 0.00 3.00 1 0.03 // Buzz1
// -sy,
ramp 1 2 3.00 3.00 246.9 246.9 // B3
note 2 1 101 1 3.00 2.00 1 0 // Buzz1
note 3 1 119 0 0.00 5.00 0.1 // RMS1
note 4 1 121 0 0.00 5.00 // Mouth1
note 5 1 199 0 0.00 5.00 // Rebalance1
// (rest)
// Dai-
ramp 1 2 6.00 2.98 196 196 // G3
ramp 1 2 8.98 0.02 196 146.8 // G3-D3
note 6 1 101 0 6.00 3.00 1 0.03 // Buzz1
// -sy.
ramp 1 2 9.00 3.00 146.8 146.8 // D3
note 7 1 101 6 9.00 2.00 1 0 // Buzz1
note 8 1 119 0 6.00 5.00 0.1 // RMS1
note 9 1 121 0 6.00 5.00 // Mouth1
note 10 1 199 0 6.00 5.00 // Rebalance1
// (rest)
// Give
ramp 1 2 12.00 1.00 164.8 164.8 // E3
note 11 1 101 0 12.00 0.90 1 0.03 // Buzz1
note 12 1 119 0 12.00 0.90 0.1 // RMS1
note 13 1 121 0 12.00 0.90 // Mouth1
note 14 1 199 0 12.00 0.90 // Rebalance1
// me
ramp 1 2 13.00 1.00 185 185 // F#3
note 15 1 101 0 13.00 0.90 1 0.03 // Buzz1
note 16 1 119 0 13.00 0.90 // RMS1
note 17 1 121 0 13.00 0.90 // Mouth1
note 18 1 199 0 13.00 0.90 // Rebalance1
// your
ramp 1 2 14.00 1.00 196 196 // G3
note 19 1 101 0 14.00 0.90 1 0.03 // Buzz1
note 20 1 119 0 14.00 0.90 0.1 // RMS1
note 21 1 121 0 14.00 0.90 // Mouth1
note 22 1 199 0 14.00 0.90 // Rebalance1
// an-
ramp 1 2 15.00 1.98 164.8 164.8 // E3
ramp 1 2 16.98 0.02 164.8 196 // E3-G3
note 23 1 101 0 15.00 2.00 1 0.03 // Buzz1
// -swer
ramp 1 2 17.00 1.00 196 196 // G3
note 24 1 101 23 17.00 0.90 1 0 // Buzz1
note 25 1 119 0 15.00 2.90 // RMS1
note 26 1 121 0 15.00 2.90 // Mouth1
note 27 1 199 0 15.00 2.90 // Rebalance1
// do.
ramp 1 2 18.00 6.00 146.8 146.8 // D3
note 28 1 101 0 18.00 3.00 1 0.03 // Buzz1
note 29 1 119 0 18.00 3.00 0.1 // RMS1
note 30 1 121 0 18.00 3.00 // Mouth1
note 31 1 199 0 18.00 3.00 // Rebalance1
// (rest)
// I'm
ramp 1 2 24.00 3.00 220 220 // A3
note 32 1 101 0 24.00 2.90 1 0.03 // Buzz1
note 33 1 119 0 24.00 2.90 // RMS1
note 34 1 121 0 24.00 2.90 // Mouth1
note 35 1 199 0 24.00 2.90 // Rebalance1
// half
ramp 1 2 27.00 3.00 293.7 293.7 // D4
note 36 1 101 0 27.00 3.00 1 0.03 // Buzz1
note 37 1 119 0 27.00 3.00 0.1 // RMS1
note 38 1 121 0 27.00 3.00 // Mouth1
note 39 1 199 0 27.00 3.00 // Rebalance1
// cra-
ramp 1 2 30.00 1.98 246.9 246.9 // B3
ramp 1 2 31.98 0.02 246.9 220 // B3-A3
ramp 1 2 32.00 0.98 220 220 // A3
ramp 1 2 32.98 0.02 220 196 // A3-G3
note 40 1 101 0 30.00 3.00 1 0.03 // Buzz1
// -zy,
ramp 1 2 33.00 2.00 196 196 // G3
note 41 1 101 40 33.00 1.90 1 0 // Buzz1
note 42 1 119 0 30.00 4.90 0.1 // RMS1
note 43 1 121 0 30.00 4.90 // Mouth1
note 44 1 199 0 30.00 4.90 // Rebalance1
// and
ramp 1 2 35.00 1.00 185 185 // F#3
note 45 1 101 0 35.00 0.90 1 0.03 // Buzz1
note 46 1 119 0 35.00 0.90 // RMS1
note 47 1 121 0 35.00 0.90 // Mouth1
note 48 1 199 0 35.00 0.90 // Rebalance1
// all
ramp 1 2 36.00 1.00 164.8 164.8 // E3
note 49 1 101 0 36.00 0.90 1 0.03 // Buzz1
note 50 1 119 0 36.00 0.90 0.1 // RMS1
note 51 1 121 0 36.00 0.90 // Mouth1
note 52 1 199 0 36.00 0.90 // Rebalance1
// for
ramp 1 2 37.00 1.00 185 185 // F#3
note 53 1 101 0 37.00 0.90 1 0.1 // Buzz1
note 54 1 119 0 37.00 0.90 0.1 // RMS1
note 55 1 121 0 37.00 0.90 // Mouth1
note 56 1 199 0 37.00 0.90 // Rebalance1
// the
ramp 1 2 38.00 1.00 196 196 // G3
note 57 1 101 0 38.00 0.90 1 0.03 // Buzz1
note 58 1 119 0 38.00 0.90 0.1 // RMS1
note 59 1 121 0 38.00 0.90 // Mouth1
note 60 1 199 0 38.00 0.90 // Rebalance1
// love
ramp 1 2 39.00 2.00 220 220 // A3
note 61 1 101 0 39.00 1.90 1 0.03 // Buzz1
note 62 1 119 0 39.00 1.90 0.1 // RMS1
note 63 1 121 0 39.00 1.90 // Mouth1
note 64 1 199 0 39.00 1.90 // Rebalance1
// of
ramp 1 2 41.00 1.00 246.9 246.9 // B3
note 65 1 101 0 41.00 0.90 1 0.03 // Buzz1
note 66 1 119 0 41.00 0.90 0.1 // RMS1
note 67 1 121 0 41.00 0.90 // Mouth1
note 68 1 199 0 41.00 0.90 // Rebalance1
// you.
ramp 1 2 42.00 5.00 220 220 // A3
note 69 1 101 0 42.00 2.90 1 0.03 // Buzz1
note 70 1 119 0 42.00 2.90 0.1 // RMS1
note 71 1 121 0 42.00 2.90 // Mouth1
note 72 1 199 0 42.00 2.90 // Rebalance1
// (rest)
// It
ramp 1 2 47.00 1.00 246.9 246.9 // B3
note 73 1 101 0 47.00 0.50 1 0.03 // Buzz1
note 74 1 119 0 47.00 0.90 0.1 // RMS1
note 75 1 121 0 47.00 0.90 // Mouth1
note 76 1 199 0 47.00 0.90 // Rebalance1
// won't
ramp 1 2 48.00 1.00 261.6 261.6 // C4
note 77 1 101 0 48.00 0.90 1 0.03 // Buzz1
note 78 1 119 0 48.00 0.90 // RMS1
note 79 1 121 0 48.00 0.90 // Mouth1
note 80 1 199 0 48.00 0.90 // Rebalance1
// be
ramp 1 2 49.00 1.00 246.9 246.9 // B3
note 81 1 101 0 49.00 0.90 1 0.03 // Buzz1
note 82 1 119 0 49.00 0.90 0.1 // RMS1
note 83 1 121 0 49.00 0.90 // Mouth1
note 84 1 199 0 49.00 0.90 // Rebalance1
// a
ramp 1 2 50.00 1.00 220 220 // A3
note 85 1 101 0 50.00 0.90 1 0.03 // Buzz1
note 86 1 119 0 50.00 0.90 0.1 // RMS1
note 87 1 121 0 50.00 0.90 // Mouth1
note 88 1 199 0 50.00 0.90 // Rebalance1
// sty-
ramp 1 2 51.00 1.98 293.7 293.7 // D4
ramp 1 2 52.98 0.02 293.7 246.9 // D4-B3
note 89 1 101 0 51.00 2.00 1 0.03 // Buzz1
// -lish
ramp 1 2 53.00 1.00 246.9 246.9 // B3
note 90 1 101 89 53.00 0.90 1 0 // Buzz1
note 91 1 119 0 51.00 2.90 0.1 // RMS1
note 92 1 121 0 51.00 2.90 // Mouth1
note 93 1 199 0 51.00 2.90 // Rebalance1
// mar-
ramp 1 2 54.00 0.98 220 220 // A3
ramp 1 2 54.98 0.02 220 196 // A3-G3
note 94 1 101 0 54.00 1.00 1 0.03 // Buzz1
// -riage.
ramp 1 2 55.00 4.00 196 196 // G3
note 95 1 101 94 55.00 1.90 1 0 // Buzz1
note 96 1 119 0 54.00 2.90 // RMS1
note 97 1 121 0 54.00 2.90 // Mouth1
note 98 1 199 0 54.00 2.90 // Rebalance1
// (rest)
// I
ramp 1 2 59.00 1.00 220 220 // A3
note 99 1 101 0 59.00 0.90 1 0.03 // Buzz1
note 100 1 119 0 59.00 0.90 0.1 // RMS1
note 101 1 121 0 59.00 0.90 // Mouth1
note 102 1 199 0 59.00 0.90 // Rebalance1
// can't
ramp 1 2 60.00 2.00 246.9 246.9 // B3
note 103 1 101 0 60.00 1.90 1 0.03 // Buzz1
note 104 1 119 0 60.00 1.90 // RMS1
note 105 1 121 0 60.00 1.90 // Mouth1
note 106 1 199 0 60.00 1.90 // Rebalance1
// af-
ramp 1 2 62.00 0.98 196 196 // G3
ramp 1 2 62.98 0.02 196 164.8 // G3-E3
note 107 1 101 0 62.00 1.00 1 0.03 // Buzz1
// -ford
ramp 1 2 63.00 2.00 164.8 164.8 // E3
note 108 1 101 107 63.00 1.90 1 0 // Buzz1
note 109 1 119 0 62.00 2.90 0.1 // RMS1
note 110 1 121 0 62.00 2.90 // Mouth1
note 111 1 199 0 62.00 2.90 // Rebalance1
// a
ramp 1 2 65.00 1.00 196 196 // G3
note 112 1 101 0 65.00 0.90 1 0.03 // Buzz1
note 113 1 119 0 65.00 0.90 0.1 // RMS1
note 114 1 121 0 65.00 0.90 // Mouth1
note 115 1 199 0 65.00 0.90 // Rebalance1
// car-
ramp 1 2 66.00 0.98 164.8 164.8 // E3
ramp 1 2 66.98 0.02 164.8 146.8 // E3-D3
note 116 1 101 0 66.00 1.00 1 0.03 // Buzz1
// -riage.
ramp 1 2 67.00 4.00 146.8 146.8 // D3
note 117 1 101 116 67.00 2.00 1 0 // Buzz1
note 118 1 119 0 66.00 3.00 0.1 // RMS1
note 119 1 121 0 66.00 3.00 // Mouth1
note 120 1 199 0 66.00 3.00 // Rebalance1
// (rest)
// But
ramp 1 2 71.00 1.00 146.8 146.8 // D3
note 121 1 101 0 71.00 0.90 1 0.03 // Buzz1
note 122 1 119 0 71.00 0.90 0.1 // RMS1
note 123 1 121 0 71.00 0.90 // Mouth1
note 124 1 199 0 71.00 0.90 // Rebalance1
// you'll
ramp 1 2 72.00 2.00 196 196 // G3
note 125 1 101 0 72.00 1.90 1 0.03 // Buzz1
note 126 1 119 0 72.00 1.90 0.1 // RMS1
note 127 1 121 0 72.00 1.90 // Mouth1
note 128 1 199 0 72.00 1.90 // Rebalance1
// look
ramp 1 2 74.00 1.00 246.9 246.9 // B3
note 129 1 101 0 74.00 0.90 1 0.03 // Buzz1
note 130 1 119 0 74.00 0.90 0.1 // RMS1
note 131 1 121 0 74.00 0.90 // Mouth1
note 132 1 199 0 74.00 0.90 // Rebalance1
// sweet
ramp 1 2 75.00 2.00 220 220 // A3
note 133 1 101 0 75.00 1.90 1 0.03 // Buzz1
note 134 1 119 0 75.00 1.90 0.1 // RMS1
note 135 1 121 0 75.00 1.90 // Mouth1
note 136 1 199 0 75.00 1.90 // Rebalance1
// u-
ramp 1 2 77.00 0.98 146.8 146.8 // D3
ramp 1 2 77.98 0.02 146.8 196 // D3-G3
note 137 1 101 0 77.00 1.00 1 0.03 // Buzz1
// -pon
ramp 1 2 78.00 2.00 196 196 // G3
note 138 1 101 137 78.00 1.90 1 0 // Buzz1
note 139 1 119 0 77.00 2.90 // RMS1
note 140 1 121 0 77.00 2.90 // Mouth1
note 141 1 199 0 77.00 2.90 // Rebalance1
// the
ramp 1 2 80.00 1.00 246.9 246.9 // B3
note 142 1 101 0 80.00 0.90 1 0.03 // Buzz1
note 143 1 119 0 80.00 0.90 0.1 // RMS1
note 144 1 121 0 80.00 0.90 // Mouth1
note 145 1 199 0 80.00 0.90 // Rebalance1
// seat
ramp 1 2 81.00 1.00 220 220 // A3
note 146 1 101 0 81.00 0.90 1 0.03 // Buzz1
note 147 1 119 0 81.00 0.90 0.1 // RMS1
note 148 1 121 0 81.00 0.90 // Mouth1
note 149 1 199 0 81.00 0.90 // Rebalance1
// of
ramp 1 2 82.00 1.00 246.9 246.9 // B3
note 150 1 101 0 82.00 0.90 1 0.03 // Buzz1
note 151 1 119 0 82.00 0.90 0.1 // RMS1
note 152 1 121 0 82.00 0.90 // Mouth1
note 153 1 199 0 82.00 0.90 // Rebalance1
// a
ramp 1 2 83.00 1.00 261.6 261.6 // C4
note 154 1 101 0 83.00 0.90 1 0.03 // Buzz1
note 155 1 119 0 83.00 0.90 0.1 // RMS1
note 156 1 121 0 83.00 0.90 // Mouth1
note 157 1 199 0 83.00 0.90 // Rebalance1
// bi-
ramp 1 2 84.00 0.98 293.7 293.7 // D4
ramp 1 2 84.98 0.02 293.7 246.9 // D4-B3
note 158 1 101 0 84.00 1.00 1 0.03 // Buzz1
// -cy-
ramp 1 2 85.00 0.98 246.9 246.9 // B3
ramp 1 2 85.98 0.02 246.9 196 // B3-G3
note 159 1 101 158 85.00 1.00 1 0 // Buzz1
// -cle
ramp 1 2 86.00 1.00 196 196 // G3
note 160 1 101 159 86.00 0.90 1 0 // Buzz1
note 161 1 119 0 84.00 2.90 0.1 // RMS1
note 162 1 121 0 84.00 2.90 // Mouth1
note 163 1 199 0 84.00 2.90 // Rebalance1
// built
ramp 1 2 87.00 2.00 220 220 // A3
note 164 1 101 0 87.00 1.90 1 0.03 // Buzz1
note 165 1 119 0 87.00 1.90 0.1 // RMS1
note 166 1 121 0 87.00 1.90 // Mouth1
note 167 1 199 0 87.00 1.90 // Rebalance1
// for
ramp 1 2 89.00 1.00 146.8 146.8 // D3
note 168 1 101 0 89.00 0.90 1 0.03 // Buzz1
note 169 1 119 0 89.00 0.90 0.1 // RMS1
note 170 1 121 0 89.00 0.90 // Mouth1
note 171 1 199 0 89.00 0.90 // Rebalance1
// two
ramp 1 2 90.00 6.00 196 196 // G3
note 172 1 101 0 90.00 3.00 1 0.03 // Buzz1
note 173 1 119 0 90.00 3.00 0.1 // RMS1
note 174 1 121 0 90.00 3.00 // Mouth1
note 175 1 199 0 90.00 3.00 // Rebalance1
// (rest)
end 96.0
Listing 2: Note-list body for “Daisy Bell” by Henry Dacre, Iteration #1, establishing durations and pitches.
Listing 2 presents the first-iteration synthesis of “Daisy Bell”. The tempo is one quarter note per second (60 beats per minute). To clarify how the note list functions, we now explain each statement in turn. Rest assured, however, that this blow-by-blow description extends only through the first word of the song.
The listing begins with the note-list header previously provided in Listing 1.
Notice, however, the additional name statement, which directs the synthesized result to a file named Daisy1.wav. This file will reside in the same directory as the note list.
The listing continues with four ramp statements.
The six parameters of any ramp statement have the following roles:
- Voice identifier,
- Contour number,
- Start time in seconds,
- Duration in seconds,
- Origin value, and
- Goal value.
These particular ramp statements each set their corresponding contour to a value which, for the purposes of this first iteration, holds steady through the duration of the note list.
- The first ramp statement causes Contour #1: Amplitude to hold steady at an amplitude of 6000.
- The second ramp statement causes Contour #3: Formant 1 to hold steady at 414 Hz.
- The third ramp statement causes Contour #4: Formant 2 to hold steady at 1516 Hz.
- The fourth ramp statement causes Contour #5: Formant 3 to hold steady at 2500 Hz.
For the record, the formant frequencies 414, 1516, and 2500 are together characteristic of the neutral vowel,
or schwa.
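To make the six ramp parameters concrete, here is a minimal Python sketch that parses a ramp statement and evaluates a contour at a given time. It assumes that each ramp interpolates linearly from its origin value to its goal value over its duration; the text above does not state the interpolation shape, and the names Ramp, parse_ramp, and contour_value are hypothetical, not part of the Sound engine.

```python
from dataclasses import dataclass


@dataclass
class Ramp:
    """Fields of a ramp statement, in the order listed above."""
    voice: int
    contour: int
    start: float     # seconds
    duration: float  # seconds
    origin: float
    goal: float


def parse_ramp(line: str) -> Ramp:
    """Parse a statement such as 'ramp 1 3 0.0 96.0 414 414 // Formant 1'."""
    fields = line.split("//")[0].split()  # drop the trailing comment
    if len(fields) != 7 or fields[0] != "ramp":
        raise ValueError(f"not a ramp statement: {line!r}")
    return Ramp(int(fields[1]), int(fields[2]), *map(float, fields[3:]))


def contour_value(ramps: list[Ramp], voice: int, contour: int, t: float) -> float:
    """Contour value at time t, ASSUMING linear interpolation within each ramp."""
    for r in ramps:
        if r.voice == voice and r.contour == contour and r.start <= t <= r.start + r.duration:
            frac = (t - r.start) / r.duration if r.duration > 0 else 1.0
            return r.origin + frac * (r.goal - r.origin)
    # Mirrors the requirement that contours be fully described over time.
    raise ValueError(f"contour {contour} of voice {voice} is undefined at t={t}")
```

Applied to the two Contour #2 statements for “Dai-”, this sketch returns 293.7 at t = 1.0 and about 270.3 halfway through the 20 msec. transition.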
Next in the listing come three statements specifically affecting the initial syllable of “Daisy”, which sustains for 3 seconds (a dotted half note).
- Two additional ramp statements reference Contour #2: Frequency.
The first statement holds the frequency steady at 293.7 Hz (D4) for most of the opening syllable.
The second statement then ramps the frequency down from 293.7 Hz to 246.9 Hz (B3) over the last 20 milliseconds of the syllable.
The frequencies corresponding to D4 and B3 were obtained from the table of equal-tempered frequencies in the Tuning Reference.
- Now comes note #1.
The first six parameters of any note statement have the following roles:
  - Unique note identifier,
  - Voice identifier,
  - Instrument number,
  - Slur-from ID,
  - Start time in seconds, and
  - Duration in seconds.
In this instance note parameter #3 references Instrument #101: Buzz1, which generates a pitched tone.
Remember from before that the waveform for Buzz1 contains harmonics 1-16, all at equal amplitude.
This configuration creates a pulse wave, as described in the Waveform Catalog.
It is especially suitable for speech synthesis because it gives the resonating systems, in effect, blank stock to sculpt from.
Also remember that Buzz1 takes its amplitude from Contour #1 and parameter #7, its frequency from Contour #2, and its attack duration from parameter #8.
Remember finally that Buzz1 mixes its output into Voice Signal #1: Audio to await further processing.
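The Tuning Reference table is not reproduced here, but the equal-tempered values used in Listing 2 can be recomputed from the standard formula f = 440 * 2^((n - 69)/12), where n is a MIDI note number. This Python sketch assumes A4 = 440 Hz and the MIDI convention that C4 is note 60; the function names are hypothetical. Rounding to 0.1 Hz reproduces the frequencies in the listing, and confirms that 220 Hz is A3.

```python
# Semitone offset of each pitch class above C (sharps only, as in Listing 2).
NOTE_OFFSETS = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
                "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}


def midi_number(name: str) -> int:
    """Convert a pitch name such as 'F#3' to a MIDI note number (C4 = 60)."""
    pitch, octave = name[:-1], int(name[-1])  # single-digit octaves only
    return 12 * (octave + 1) + NOTE_OFFSETS[pitch]


def equal_tempered_hz(name: str, a4: float = 440.0) -> float:
    """Equal-tempered frequency in Hz, rounded to 0.1 Hz as in Listing 2."""
    return round(a4 * 2 ** ((midi_number(name) - 69) / 12), 1)
```

For example, equal_tempered_hz("D4") gives 293.7 and equal_tempered_hz("B3") gives 246.9, matching the first two frequency ramps.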
The next five statements complete the synthesis of the initial word, “Daisy”:
- A ramp statement referencing Contour #2: Frequency sustains the frequency of 246.9 Hz (B3) from time 3.0 for an additional three seconds.
Understand that only 2 seconds of this duration will be filled with sound; however, the Sound engine requires contours to be fully described over time, regardless of whether the values are actually used by notes.
- Note #2 continues the pitched tone begun by note #1.
Notice that parameter #4 of note #2 references back to note #1; this indicates a slur, about which more will be written shortly.
- Note #3 invokes Instrument #119: RMS1 to capture the power envelope generated by note #1 and note #2 in succession.
Notice that the start time and duration for note #3 cover the combined duration of the two earlier Buzz1 notes.
- Note #4 invokes Instrument #121: Mouth1 to impose the schwa vowel formants upon the combined tone generated by note #1 and note #2. It has the same start time and end time as note #3.
- Note #5 invokes Instrument #199: Rebalance1, which rebalances the output from note #4 using the power envelope captured by note #3, then buffers the resulting signal out to file.
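The text does not spell out how Rebalance1 uses the envelope captured by RMS1. One common approach, sketched below in Python as an assumption rather than as the actual Sound engine code, is to measure the RMS level of the reference signal (the Buzz1 output) and of the filtered signal (the Mouth1 output) over short blocks, then scale each block of the filtered signal so that its RMS matches the reference. The function names and the 64-sample block size are hypothetical.

```python
import math


def block_rms(signal: list[float], start: int, end: int) -> float:
    """Root-mean-square level of signal[start:end]."""
    segment = signal[start:end]
    return math.sqrt(sum(x * x for x in segment) / len(segment)) if segment else 0.0


def rebalance(filtered: list[float], reference: list[float], block: int = 64) -> list[float]:
    """Scale 'filtered' block by block so its RMS tracks that of 'reference'.

    Hypothetical stand-in for the RMS1 + Rebalance1 pair; a practical version
    would likely smooth the gain between blocks to avoid audible steps.
    """
    out: list[float] = []
    for start in range(0, len(filtered), block):
        end = min(start + block, len(filtered))
        target = block_rms(reference, start, end)  # power envelope (RMS1's role)
        actual = block_rms(filtered, start, end)   # level after the formant filters
        gain = target / actual if actual > 0 else 0.0
        out.extend(x * gain for x in filtered[start:end])
    return out
```

Block-wise gain is the simplest design; its weakness is the possibility of small level steps at block boundaries, which is why a real rebalancer would interpolate the gain.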
Notice that Listing 2 always slurs between consecutive syllables of a single word.
This is the convention when setting text to music, and it is a sensible one, since words are single utterances whose syllables generally ought to be connected together. The slur-to notes in Listing 2 are #2 (Dai-sy), #7 (Dai-sy), #24 (an-swer), #41 (cra-zy), #90 (sty-lish), #95 (mar-riage), #108 (af-ford), #117 (car-riage), #138 (u-pon), #159 (bi-cy-cle), and #160 (bi-cy-cle).
A slur is indicated by placing the ID of the slur-from note in parameter #4 of the slur-to note statement.
This causes the amplitude envelope to skip the release phase of the slur-from note and to skip the attack phase of the slur-to note.
It also causes the oscillator of Instrument #101: Buzz1 to carry over its waveform position pointer.
Slurs are only permitted when both notes reference the same instrument and when the end time of the slur-from note matches the start time of the slur-to note.
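The two conditions for a legal slur can be checked mechanically. The Python sketch below is a hypothetical validator, not part of the Sound engine; it represents each note statement by its first six parameters and compares times with a small tolerance, since sums such as 12.00 + 0.90 are not exact in binary floating point.

```python
from dataclasses import dataclass


@dataclass
class Note:
    """First six parameters of a note statement."""
    note_id: int
    voice: int
    instrument: int
    slur_from: int   # 0 means no slur
    start: float     # seconds
    duration: float  # seconds


def slur_errors(notes: list[Note], tol: float = 1e-6) -> list[str]:
    """Describe every slur that breaks the same-instrument or end-meets-start rule."""
    by_id = {n.note_id: n for n in notes}
    errors: list[str] = []
    for n in notes:
        if n.slur_from == 0:
            continue
        prev = by_id.get(n.slur_from)
        if prev is None:
            errors.append(f"note {n.note_id}: slur-from note {n.slur_from} not found")
            continue
        if prev.instrument != n.instrument:
            errors.append(f"note {n.note_id}: instrument differs from note {prev.note_id}")
        if abs(prev.start + prev.duration - n.start) > tol:
            errors.append(f"note {n.note_id}: does not start when note {prev.note_id} ends")
    return errors
```

Fed the Buzz1 notes #1 and #2 from Listing 2, the validator reports no errors, because note #1 ends at 3.00 seconds, exactly where note #2 begins.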
However, it turns out that allowing an oscillator's frequency to change instantaneously between the slur-from note and the slur-to note generates a pop. The pop happens even though, if you view the signal, you'll see no actual discontinuity where the notes change over. The remedy for this in SpeechOrch.xml and in Listing 2 has been to implement the frequency input to Instrument #101: Buzz1 not as a discrete parameter, but rather as a contour that evolves over time. Listing 2 for the most part describes frequencies using steady-state ramp statements, “steady-state” meaning that the origin frequency and the goal frequency are the same. However, when notes are slurred, there is a transitional ramp lasting a very brief 20 msec.
The 20 msec. duration of frequency transitions is too short to be heard as a portamento effect. All it does is eliminate the pop.
The transition becomes audible as portamento if you increase the transition time to 50 msec. and beyond.
I tried 50 msec. for “Daisy Bell” and found the portamenti irritating, at least when applied as a matter of policy.
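The frequency-contour convention just described, steady-state ramps with a 20 msec. transition at the end of a syllable that slurs into a new pitch, is regular enough to generate by program. This Python sketch is a hypothetical helper, not a tool from the Sound distribution; it writes ramp statements for Contour #2 of voice 1 in the same textual layout as Listing 2.

```python
from typing import Optional


def frequency_ramps(start: float, duration: float, freq: float,
                    next_freq: Optional[float] = None, glide: float = 0.02) -> list[str]:
    """Ramp statements for one syllable on Contour #2 (Frequency) of voice 1.

    If next_freq is given, the last `glide` seconds ramp from freq to
    next_freq; otherwise the frequency holds steady for the whole duration.
    """
    def fmt(f: float) -> str:
        return f"{f:g}"  # 196 -> '196', 293.7 -> '293.7'

    if next_freq is None:
        return [f"ramp 1 2 {start:.2f} {duration:.2f} {fmt(freq)} {fmt(freq)}"]
    steady = duration - glide
    return [
        f"ramp 1 2 {start:.2f} {steady:.2f} {fmt(freq)} {fmt(freq)}",
        f"ramp 1 2 {start + steady:.2f} {glide:.2f} {fmt(freq)} {fmt(next_freq)}",
    ]
```

Called as frequency_ramps(0.0, 3.0, 293.7, 246.9), it yields the two statements for “Dai-” at the top of Listing 2; passing glide=0.05 would produce the audible portamento described above.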
I have indulged myself by using audible (100 msec.) portamento on the first syllable of “cra-zy” (note #40).
Even so, sliding by two whole tones proved too much, so I stepped the pitch down by inserting an A3 on the third beat.
Later iterations will transition back and forth between vowel-like sounds and noisy sounds.
Specifically, notes #2, #7, and #41 begin with voiced fricatives (z sounds); notes #24 and #108 begin with unvoiced fricatives (s and f, respectively); notes #138 and #160 begin with stop consonants (p and k). In all of these cases it will be necessary to back out the slurs. For the fricatives the reason is technical: the orchestra in SpeechOrch.xml does not permit slurring between voices. For the stop consonants the reason is practical: such consonants involve actual cessation of sound prior to the “plosive” burst.
A second policy observed in Listing 2 is that if two syllables do not belong to the same word, then there should always be at least 100 msec. of silence separating them.
The converse of this policy is that there should be little or no silence separating syllables within the same word.
For now, and continuing on through Iteration #4, this no-separation policy is masked by the must-slur policy. However, articulation policy will again become an issue in Iteration #5 and Iteration #6.
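The 100 msec. separation policy can likewise be checked against the Buzz1 notes. This Python sketch is hypothetical; it takes the Buzz1 notes in time order as (start, duration, slur-from) tuples and reports any unslurred note that begins less than 100 msec. after its predecessor ends. It has no direct knowledge of word boundaries; it relies on the must-slur policy, treating any pair not joined by a slur as belonging to different words.

```python
def short_gaps(buzz_notes: list[tuple[float, float, int]],
               min_gap: float = 0.1, tol: float = 1e-9) -> list[tuple[int, float]]:
    """Return (index, gap) for each unslurred note that starts too soon after the previous one."""
    problems: list[tuple[int, float]] = []
    for i in range(1, len(buzz_notes)):
        prev_start, prev_dur, _ = buzz_notes[i - 1]
        start, _, slur_from = buzz_notes[i]
        if slur_from:  # slurred: same word, so no gap is expected
            continue
        gap = start - (prev_start + prev_dur)
        if gap < min_gap - tol:  # tolerance absorbs float error such as 13.0 - 12.9
            problems.append((i, round(gap, 3)))
    return problems
```

For “Give me your” (notes #11, #15, and #19, each lasting 0.90 seconds on one-second beats), every gap is exactly 100 msec., so nothing is reported.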
© Charles Ames |
Page created: 2014-02-20 |
Last updated: 2017-06-12 |