Listing 2 presents the first-iteration synthesis of “Daisy Bell”. The tempo employed is one quarter note per second. We now walk through the note list statement by statement. Be reassured, however, that this blow-by-blow description extends only through the first word of the song.
The listing begins with the note-list header previously provided in Listing 1.
Notice, however, the additional name statement, which directs the synthesized result to a file named Daisy1.wav. This file will reside in the same directory as the note list.
The listing continues with four ramp statements. The six parameters of any ramp statement have the following roles:
These four ramp statements each set their corresponding contour to a value which, for the purposes of this first iteration, holds steady through the duration of the note list.
One ramp statement causes Contour #1: Amplitude to hold steady at 6000.
Another ramp statement causes Contour #3: Formant 1 to hold steady at 414 Hz.
Another ramp statement causes Contour #4: Formant 2 to hold steady at 1516 Hz.
The remaining ramp statement causes Contour #5: Formant 3 to hold steady at 2500 Hz.
For the record, the formant frequencies 414, 1516, and 2500 are together characteristic of the neutral vowel, or schwa.
Next in the listing come three statements specifically affecting the initial syllable for “Daisy”, which sustains for 3 seconds (a dotted half note).
Two of these are ramp statements that reference Contour #2: Frequency. The first statement holds the frequency steady at 293.7 Hz (D4) for most of the opening syllable. In the last 20 milliseconds of this duration, however, the frequency ramps down from 293.7 Hz to 246.9 Hz (B3). The frequencies corresponding to D4 and B3 were obtained from the table of equal-tempered frequencies in the Tuning Reference.
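The two frequencies quoted above can be checked against the standard equal-temperament formula. The following is a minimal Python sketch, not part of the note list; it assumes a reference pitch of A4 = 440 Hz and uses MIDI note numbers (62 for D4, 59 for B3) purely as a convenient way to count semitones. The function name `equal_tempered_hz` is hypothetical.

```python
A4_HZ = 440.0   # assumed reference pitch for A4
A4_MIDI = 69    # MIDI note number of A4

def equal_tempered_hz(midi_note: int) -> float:
    """Frequency of a 12-tone equal-tempered pitch, given its MIDI note number."""
    return A4_HZ * 2.0 ** ((midi_note - A4_MIDI) / 12.0)

d4 = equal_tempered_hz(62)  # D4 lies 7 semitones below A4
b3 = equal_tempered_hz(59)  # B3 lies 10 semitones below A4
print(f"D4 = {d4:.1f} Hz, B3 = {b3:.1f} Hz")  # prints "D4 = 293.7 Hz, B3 = 246.9 Hz"
```

Rounded to one decimal place, the results agree with the values used in the ramp statements.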
The remaining one of these three statements is note #1. The first six parameters of any note statement have the following roles:
The next five statements complete the synthesis of the initial word, “Daisy”:
A ramp statement referencing Contour #2: Frequency sustains the frequency of 246.9 Hz (B3) from time 3.0 through an additional three seconds. Only 2 seconds of this duration will be filled with sound; however, the Sound engine requires contours to be fully described over time, regardless of whether the values are actually used by notes.
Note #2 continues the pitched tone begun by note #1. Notice that parameter #4 of note #2 references back to note #1; this indicates a slur, about which more will be written shortly.
Note #3 invokes Instrument #119: RMS1 to capture the power envelope of the tone generated by note #1 and note #2 in succession. Notice that the start time and duration for note #3 cover the combined duration of the two earlier Buzz1 notes.
Note #4 invokes Instrument #121: Mouth1 to impose the schwa vowel formants upon the combined tone generated by note #1 and note #2. It has the same start time and end time as note #3.
Note #5 rebalances the output from note #4 using the power envelope captured by note #3, then buffers the resulting signal out to file.
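The capture-and-rebalance step performed by notes #3 and #5 can be pictured with a minimal Python sketch. This is not the Sound engine's implementation of RMS1 or of the rebalancing instrument. The function names `rms` and `balance`, the block size of 256 samples, the block-by-block gain, and the test signals are all assumptions chosen for illustration.

```python
import math

def rms(samples):
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def balance(signal, reference, block=256):
    """Scale `signal` block by block so each block's RMS matches `reference`."""
    if len(signal) != len(reference):
        raise ValueError("signal and reference must have the same length")
    out = []
    for i in range(0, len(signal), block):
        sig = signal[i:i + block]
        level = rms(sig)
        # Gain that restores the reference power; silent blocks stay silent.
        gain = rms(reference[i:i + block]) / level if level > 0.0 else 0.0
        out.extend(x * gain for x in sig)
    return out

# Hypothetical stand-ins: a 293.7 Hz sine as the source tone, and a copy
# attenuated to one quarter of its level in place of the filtered tone.
RATE = 44100
source = [6000.0 * math.sin(2.0 * math.pi * 293.7 * n / RATE) for n in range(RATE)]
filtered = [0.25 * x for x in source]
restored = balance(filtered, source)
print(round(rms(source)), round(rms(restored)))
```

The point of the sketch is only that a filter can change the overall level of a tone, and that a power envelope captured before filtering gives a target to which the filtered tone can be rescaled.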
Notice that Listing 2 always slurs between consecutive syllables of a single word. This is the convention when setting text to music, and the convention is a sensible one, since words are single utterances which generally ought to be connected together. The slurs in Listing 2 lead into notes #2 (Dai-sy), #7 (Dai-sy), #24 (an-swer), #41 (cra-zy), #90 (sty-lish), #95 (mar-riage), #108 (af-ford), #117 (car-riage), #138 (u-pon), #159 (bi-cy-cle), and #160 (bi-cy-cle).
A slur is indicated by placing a slur-from note id in parameter #4 of the slur-to note statement. This causes the amplitude envelope to skip the release phase of the slur-from note and to skip the attack phase of the slur-to note. It also causes Instrument #101: Buzz's oscillator to carry over its waveform position pointer. Slurs are only permitted when two notes reference the same instrument and when the end time of the slur-from note matches the start time of the slur-to note.
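The two conditions for a permitted slur can be expressed as a small check. The Python sketch below is an illustration only: the `Note` class and its field names are hypothetical and do not reproduce the parameter layout of an actual note statement, and the 5-second duration given to note #3 is inferred from the combined length of notes #1 and #2.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class Note:
    note_id: int
    instrument: int                  # instrument number, e.g. 101
    start: float                     # start time in seconds
    duration: float                  # duration in seconds
    slur_from: Optional[int] = None  # id of the slur-from note, if any

    @property
    def end(self) -> float:
        return self.start + self.duration

def slur_is_permitted(slur_from: Note, slur_to: Note) -> bool:
    """Same instrument, and the slur-from note ends where the slur-to note starts."""
    return (slur_from.instrument == slur_to.instrument
            and math.isclose(slur_from.end, slur_to.start, abs_tol=1e-9))

note1 = Note(1, 101, 0.0, 3.0)
note2 = Note(2, 101, 3.0, 2.0, slur_from=1)
note3 = Note(3, 119, 0.0, 5.0)

print(slur_is_permitted(note1, note2))  # True: same instrument, times abut
print(slur_is_permitted(note2, note3))  # False: different instruments
```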
However, it turns out that allowing an oscillator's frequency to transition instantaneously between slur-from note and slur-to note generates a pop. The pop happens despite the fact that if you view the signal, you'll see no actual discontinuity where the notes change over. The remedy for this in SpeechOrch.xml and in Listing 2 has been to implement the frequency input to Instrument #101: Buzz not as a discrete parameter, but rather as a contour that evolves over time. Listing 2 for the most part describes frequencies using steady-state ramp statements (“steady-state” meaning that the origin frequency and the goal frequency are the same). However, when notes are slurred, there is a transitional ramp lasting a very brief 20 msec.
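The pattern of steady-state ramps joined by brief transitional ramps can be sketched in Python as follows. This is an illustration under stated assumptions, not the note-list syntax: the tuple layout `(start, duration, origin_hz, goal_hz)` and the function names `frequency_segments` and `contour_value` are hypothetical, and linear interpolation within a ramp is assumed.

```python
from typing import List, Tuple

Segment = Tuple[float, float, float, float]  # (start, duration, origin_hz, goal_hz)

def frequency_segments(notes: List[Tuple[float, float, float]],
                       transition: float = 0.020) -> List[Segment]:
    """Build contour segments for consecutive slurred notes.

    `notes` holds (start, duration, hz) triples. Each note but the last is
    split into a steady-state hold followed by a short transitional ramp
    toward the next note's frequency.
    """
    segments: List[Segment] = []
    for i, (start, duration, hz) in enumerate(notes):
        if i == len(notes) - 1:
            segments.append((start, duration, hz, hz))        # steady state
        else:
            next_hz = notes[i + 1][2]
            hold = duration - transition
            segments.append((start, hold, hz, hz))            # steady state
            segments.append((start + hold, transition, hz, next_hz))  # transition
    return segments

def contour_value(segments: List[Segment], t: float) -> float:
    """Linearly interpolated contour value at time t (seconds)."""
    for start, duration, origin, goal in segments:
        if start <= t < start + duration:
            return origin + (goal - origin) * (t - start) / duration
    raise ValueError(f"time {t} lies outside the contour")

# "Dai-sy": D4 for 3 seconds, then a B3 contour described over a further 3 seconds.
daisy = frequency_segments([(0.0, 3.0, 293.7), (3.0, 3.0, 246.9)])
for segment in daisy:
    print(segment)
print(contour_value(daisy, 2.99))  # halfway through the 20 msec transition
```

Raising `transition` to 0.050 or 0.100 lengthens the transitional ramp in the same way.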
The 20 msec duration of frequency transitions is too short to be heard as a portamento effect. All it does is eliminate the pop. The transition becomes audible portamento if you increase the transition time to 50 msec and beyond. I tried 50 msec for “Daisy Bell” and found the portamenti irritating, at least when done as a matter of policy. I have indulged myself by using audible (100 msec) portamento on the first syllable of “cra-zy”. Even so, sliding by two full semitones proved too much, so I stepped the pitch down by inserting an A3 on the third beat.
Later iterations will transition back and forth between vowel-like sounds and noisy sounds. Specifically, notes #2, #7, and #41 begin with voiced fricatives (z sounds); notes #24 and #108 begin with unvoiced fricatives (s and f, respectively); notes #138 and #160 begin with stop consonants (p and k). In all of these cases it will be necessary to back out the slurs. For the fricatives the reason is technical: the orchestra in SpeechOrch.xml does not permit slurring between voices. For the stop consonants the reason is practical: such consonants involve actual cessation of sound prior to the “plosive” burst.
A second policy observed in Listing 2 is that if two syllables do not belong to the same word, then there should always be at least 100 msec of silence separating the syllables. The converse of this policy is that there should be little or no silence separating syllables within the same word. For now, and continuing on through Iteration #4, this no-separation policy is masked by the must-slur policy. However, articulation policy will again become an issue in Iteration #5 and Iteration #6.
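The 100 msec separation policy lends itself to a mechanical check. The Python sketch below is hypothetical: the `(word_index, start, end)` tuple layout and the function name `articulation_violations` are assumptions, and the timing given for the third syllable is made up for the example.

```python
from typing import List, Tuple

# (word_index, start_seconds, end_seconds); a hypothetical layout for illustration
Syllable = Tuple[int, float, float]

def articulation_violations(syllables: List[Syllable],
                            min_gap: float = 0.100) -> List[Tuple[int, int]]:
    """Index pairs of adjacent syllables from different words separated by less than min_gap."""
    violations = []
    for i in range(len(syllables) - 1):
        word_a, _, end_a = syllables[i]
        word_b, start_b, _ = syllables[i + 1]
        gap = start_b - end_a
        # A tiny tolerance keeps rounding error from flagging an exact 100 msec gap.
        if word_a != word_b and gap < min_gap - 1e-9:
            violations.append((i, i + 1))
    return violations

timings = [
    (0, 0.0, 3.0),  # Dai-
    (0, 3.0, 5.0),  # -sy (same word, so no gap is required)
    (1, 6.0, 9.0),  # first syllable of the next word (timing hypothetical)
]
print(articulation_violations(timings))  # prints []
```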
© Charles Ames | Page created: 2014-02-20 | Last updated: 2017-06-12