Five Differences Between Speech and Music for Hearing Aids
Just as there are similarities and differences between speech and music spectra, there are also similarities and differences between the perceptual requirements for speech and for music. Compared with music, speech tends to have a well-controlled spectrum with well-established and predictable perceptual characteristics.
In contrast, musical spectra are highly variable and the perceptual requirements can vary based on the musician and the instrument being played. This is an overview of five salient differences between speech and music that have direct ramifications for hearing aid fittings.
1. Speech vs. music spectra:
Speech, regardless of language, has to be generated by a rather uniform set of tubes and cavities. The human vocal tract is approximately 17 cm from larynx (vocal cords) to lips (Kent and Read, 2002). The vocal tract can be either a single tube, as is the case for oral consonants and vowels, or a pair of parallel tubes when the nasal cavity is open, as in [m] and [n]. Nevertheless, the human vocal tract is governed by several fundamental laws of acoustics which are independent of the language spoken.
For example, the frequencies of the resonances of the vocal tract (called formants) are governed primarily by constrictions in the mouth and the length of the vocal tract tube. Vocal tract lengths cannot change significantly. Consequently, it is understandable that the adult vocal tract generates a rather limited set of outputs. Taken together and measured over a period of time, this can be summarized as the "long-term speech spectrum". In some respects, hearing aid engineers and professionals have sought to re-establish the shape of this spectrum for hearing impaired listeners, via amplification, with hopes of improving speech communication. Indeed, many of the target-based hearing aid fitting formulae are based on the long-term speech spectrum.
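To illustrate how tube length constrains the formants, here is a minimal sketch. It assumes a textbook simplification that the article itself does not spell out: the vocal tract is treated as a uniform tube closed at the larynx and open at the lips (a quarter-wavelength resonator), with the speed of sound taken as 340 m/s. The function name and parameters are hypothetical, chosen only for this illustration.

```python
def tube_formants(length_m, n_formants=3, speed_of_sound=340.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Assumption: quarter-wavelength model, F_n = (2n - 1) * c / (4 * L).
    Real vocal tracts have constrictions that shift these values.
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, n_formants + 1)]


# A 17 cm tube gives resonances near 500, 1500 and 2500 Hz.
adult_formants = [round(f) for f in tube_formants(0.17)]
print(adult_formants)
```

Because the length can change only slightly, these resonances stay within a narrow range, which is consistent with the limited set of outputs described above.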
In contrast to the relatively well-defined human vocal tract output, there is no consistent, well-defined, long-term music spectrum. The outputs of various musical instruments are highly variable, ranging from a low-frequency preponderance to a high-frequency emphasis. In some cases the output spectrum is "speech-like", whereas in others there is no similarity. Unlike speech, music has no single "music-based target" that can be the goal of an optimal hearing aid fitting.
2. Physical output vs. perceptual requirements of the listener:
In speech, there are slight differences between various languages in the proportion of audible cues that are important for speech perception. This has been summarized under "articulation index (AI)" research. Measures such as the AI have been used for decades in the hearing aid industry. Results for AI importance weighting as a function of frequency do vary slightly from language to language, but generally show that, for speech, most of the sounds important for clarity derive from bands above 1000 Hz, whereas most of the perceived loudness of speech comes from bands below 1000 Hz. Clinically, it is accepted that if a client reports unclear or muffled speech, a decrease in low-frequency and an increase in high-frequency sound transmission will generally help alleviate the complaint.
One may say that with speech, the majority of the energy, or the spectrally most intense region, is in the lower frequencies, and the clarity (which has more to do with auditory perception) is derived from the higher frequencies. Using linguistic jargon, speech is phonetically more dominant in the lower frequencies, and is phonemically more important in the higher frequencies. That is, the auditory perception of speech has a significantly different weighting than does the physical output from a speaker's mouth. Despite the differences between the physical output of speech and the frequency requirements for optimal speech understanding, the differences are constant and predictable: low-frequency loudness cues and high-frequency clarity cues. This has ramifications for fitting hearing aids, which must strike an appropriate balance between speech loudness and speech clarity.
Unlike speech, the phonemic spectrum of music is highly variable. Regardless of the physical output of the musical instrument, the perceptual needs of the musician or listener may vary depending on the instrument. A stringed instrument musician needs to be able to hear the exact relationship between the lower frequency fundamental energy and the higher frequency harmonic structure. When a violinist says, "this is a great sounding instrument", they are saying that the relationship between the fundamental and the harmonics has a preferred balance, both in relative intensity and in exact spectral location. One can say that a violinist therefore has a broadband phonemic requirement. Not only does a violinist generate a wide range of frequencies, but the violinist needs to be able to hear those frequencies.
In contrast, a woodwind player such as a clarinetist needs to be able to hear the lower frequency inter-resonant breathiness. When a clarinet player says "that is a good sound", they are saying that the lower frequency noise in between resonances of their instrument has a certain level. High-frequency information is not very important to a clarinet player (other than for loudness perception). One can, therefore, say that a clarinet player has a low-frequency phonemic requirement, despite the fact that the clarinet player can generate as many higher frequency sounds as can the violinist. Setting a hearing aid to provide each of these two musicians with the optimal sound would be two different exercises: the violinist needs a speech-like broadband aided result, whereas the clarinet player would be just as happy with a 1960s hearing aid response.
3. Loudness summation, loudness, and intensity:
The "source" of sound in the human vocal tract is the vibration of the vocal cords. For those who like physics, because of the way the vocal cords are held to the larynx, they function as a one-half wavelength resonator. This simply means that not only is there the fundamental energy (typically 120-130 Hz for men and 180-220 Hz for women), but there are also evenly spaced harmonics at integer multiples of the fundamental. For a man's voice with a fundamental frequency of 125 Hz, there are harmonics at 250 Hz, 375 Hz, 500 Hz, and so on. Rarely is there a fundamental frequency below 100 Hz. Therefore the minimal spacing between harmonics in speech is on the order of at least 100 Hz. In other words, no two harmonics would fall within the same critical band, with the result that there is minimal loudness summation: soft-sounding speech is less intense and loud-sounding speech is more intense. With speech, there is a good correlation between one's perception of loudness and the physical vocal intensity. Setting a hearing aid to re-establish "equal or normal loudness" with speech is therefore a relatively simple task.
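The harmonic arithmetic in the paragraph above can be written out as a short sketch. The function name is hypothetical; the 125 Hz fundamental is the example given in the text.

```python
def harmonic_series(f0_hz, count):
    """Return the first `count` harmonics of a fundamental (the fundamental
    itself is counted as the first harmonic)."""
    return [f0_hz * k for k in range(1, count + 1)]


male_voice = harmonic_series(125, 4)
print(male_voice)  # [125, 250, 375, 500]

# The spacing between neighbouring harmonics always equals the fundamental.
spacings = [b - a for a, b in zip(male_voice, male_voice[1:])]
print(spacings)  # [125, 125, 125]
```

Since the spacing equals the fundamental, a voice whose fundamental rarely drops below 100 Hz never places its harmonics closer together than about 100 Hz.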
Some musical instruments are speech-like in the sense that they generate mid-frequency fundamental energy with evenly spaced harmonics. Oboes, saxophones and violins are in this category. However, many bass stringed instruments, such as the string bass and the cello, are also half wavelength resonator instruments, similar to speech, but tend to be perceived as quite loud, since more than one harmonic can fall within one critical bandwidth, thereby increasing the loudness (because of loudness summation) but not the intensity. That is, for the bass and cello, there is a poor correlation between measured intensity and perceived loudness. A hard of hearing bass or cello player would therefore need less low- and mid-frequency amplification in order to re-establish equal or normal loudness perceptions than those who play other musical instruments. A so-called "music" channel for bass and cello players would need to be set with less low- and mid-frequency gain than for other, treble-oriented instruments.
4. The "crest factor" of speech and music:
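A rough sketch of why low-pitched instruments invite loudness summation follows. It assumes a fixed critical bandwidth of about 100 Hz in the low-frequency region, which is implied by the discussion of speech harmonics but is a simplification (critical bands widen at higher frequencies). The fundamentals of 41 Hz and 65 Hz for a low bass note and a low cello note are illustrative assumptions, not figures from the article, and the function name is hypothetical.

```python
import math

# Assumption: critical bandwidth of roughly 100 Hz at low frequencies.
ASSUMED_CRITICAL_BANDWIDTH_HZ = 100.0


def max_harmonics_per_band(f0_hz, bandwidth_hz=ASSUMED_CRITICAL_BANDWIDTH_HZ):
    """Upper bound on how many harmonics, spaced f0_hz apart, can fall
    inside a single band of the given width."""
    return math.ceil(bandwidth_hz / f0_hz)


print(max_harmonics_per_band(125.0))  # male voice: 1
print(max_harmonics_per_band(65.0))   # illustrative low cello note: 2
print(max_harmonics_per_band(41.0))   # illustrative low bass note: 3
```

Under these assumptions, speech places at most one harmonic in a band, whereas the bass and cello can place two or three, which is the condition for loudness summation described above.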
The crest factor is a measure of the difference in decibels between the peaks of a signal and its average, or RMS (root mean square), value. A typical crest factor for speech is about 12 dB. That is, the peaks of speech are about 12 dB more intense than the average values. This is well known in the hearing aid industry and was the basis for the "reference test gain" measure in older versions of the ANSI standards. One of the physical parameters that leads to the 12 dB crest factor for speech is damping. The human vocal tract is highly damped: the nasal cavity, soft cheeks, soft tongue, lips and saliva all contribute to a highly damped (if you'll excuse the pun) output. The setting of threshold knee-points on hearing aid compression systems is predicated, among other things, on this crest factor. In addition, compression detectors are set to function according to this 12 dB crest factor.
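As a worked illustration of the definition, the crest factor of a sampled signal can be computed as 20 times the base-10 logarithm of the peak-to-RMS amplitude ratio. The sketch below is only a sanity check on synthetic signals (a sine wave and a square wave), not a measurement of speech; the function name is hypothetical.

```python
import math


def crest_factor_db(samples):
    """Peak-to-RMS ratio of a list of amplitude samples, in decibels."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)


# One full cycle of a sine wave: peak is sqrt(2) times RMS, about 3 dB.
sine = [math.sin(2 * math.pi * k / 1000) for k in range(1000)]
print(round(crest_factor_db(sine), 2))

# A square wave has identical peak and RMS values, so 0 dB.
square = [1.0, -1.0] * 500
print(round(crest_factor_db(square), 2))
```

On the same scale, a 12 dB crest factor means peaks of about 4 times the RMS amplitude (10 raised to the power 12/20 is roughly 3.98).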
Musical instruments, however, are not so well damped. Hard walled horns and stiff resonator chambers all yield a physical musical signal with much higher crest factors. Typical crest factors for musical instruments are on the order of 18-20 dB. That is, with musical instruments, peaks tend to be "peakier" than for speech. Both the threshold knee-point of the compression detector and the nature of the detector itself need to be different in order to prevent the hearing aid amplifier from entering compression prematurely. Compression systems that use an RMS detector rather than a peak detector would be a more appropriate choice for music. If a peak detector were to be used, the compression knee-point should be set 5-8 dB higher than for equivalent intensities of speech.
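The effect of detector type and knee-point can be sketched with simple decibel arithmetic. In the sketch below, the 65 dB SPL RMS input level and the 80 dB SPL speech knee-point are hypothetical values chosen only for illustration, as are the function names; the 12 dB and 18 dB crest factors and the 5-8 dB knee-point increase come from the text. A real compressor also has attack and release behaviour that this ignores.

```python
def detector_level_db(rms_level_db, crest_db, detector):
    """Level seen by an idealised detector: an RMS detector reports the
    average level, a peak detector reports average plus crest factor."""
    if detector == "rms":
        return rms_level_db
    if detector == "peak":
        return rms_level_db + crest_db
    raise ValueError("detector must be 'rms' or 'peak'")


def in_compression(level_db, knee_point_db):
    """True when the detected level exceeds the compression knee-point."""
    return level_db > knee_point_db


RMS_LEVEL_DB = 65.0    # hypothetical input level, same for speech and music
SPEECH_KNEE_DB = 80.0  # hypothetical knee-point chosen with speech in mind
KNEE_OFFSET_DB = 6.0   # within the 5-8 dB increase suggested for music

speech_peak = detector_level_db(RMS_LEVEL_DB, 12.0, "peak")  # 77.0
music_peak = detector_level_db(RMS_LEVEL_DB, 18.0, "peak")   # 83.0
music_rms = detector_level_db(RMS_LEVEL_DB, 18.0, "rms")     # 65.0

print(in_compression(speech_peak, SPEECH_KNEE_DB))                  # False
print(in_compression(music_peak, SPEECH_KNEE_DB))                   # True
print(in_compression(music_peak, SPEECH_KNEE_DB + KNEE_OFFSET_DB))  # False
print(in_compression(music_rms, SPEECH_KNEE_DB))                    # False
```

With these assumed numbers, music at the same average level as speech trips a peak detector set for speech, while either raising the knee-point or switching to an RMS detector keeps the amplifier out of compression.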
5. Different intensities for speech and music:
Typical outputs for normal intensity speech can range from 53 dB SPL for the [th] as in 'think' to about 77 dB SPL for the [a] in 'father'. Shouted speech can reach 83 dB SPL. This 24 dB range (+/- 12 dB) is related to the characteristics of the human vocal tract and vocal cords. Music can be on the order of 100 dB SPL with peaks and valleys in the spectrum of +/- 18 dB. In fact, peaks for a 100 dB SPL musical input can cause conventional hearing aid microphones to distort (since the maximum transduction capability is 115 dB SPL).
Clearly, for a given hearing loss, a musical input would require less gain in order to have the same output as a typically less intense speech input. Hearing aids need to be designed with unity gain above a certain input level in order to handle the lower gain requirements with musical input.
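The level arithmetic above can be checked with a few lines. The speech, music and microphone figures are those quoted in the text; the 95 dB SPL target output is a hypothetical value used only to show the gain comparison, and clamping the gain at 0 dB is one simple reading of "unity gain above a certain input level". The function name is hypothetical.

```python
SPEECH_SOFT_DB = 53.0   # [th] as in 'think'
SPEECH_LOUD_DB = 77.0   # [a] as in 'father'
MUSIC_LEVEL_DB = 100.0  # typical music level
MUSIC_SWING_DB = 18.0   # peaks and valleys of +/- 18 dB
MIC_MAX_DB = 115.0      # maximum microphone transduction capability

speech_range_db = SPEECH_LOUD_DB - SPEECH_SOFT_DB        # 24.0
speech_mid_db = (SPEECH_SOFT_DB + SPEECH_LOUD_DB) / 2.0  # 65.0
music_peak_db = MUSIC_LEVEL_DB + MUSIC_SWING_DB          # 118.0
mic_overshoot_db = music_peak_db - MIC_MAX_DB            # 3.0

TARGET_OUTPUT_DB = 95.0  # hypothetical aided output target


def required_gain_db(input_db, target_output_db=TARGET_OUTPUT_DB):
    """Gain needed to reach the target output, never below unity (0 dB)."""
    return max(0.0, target_output_db - input_db)


print(speech_range_db, music_peak_db, mic_overshoot_db)
print(required_gain_db(speech_mid_db))   # 30.0 dB of gain for speech
print(required_gain_db(MUSIC_LEVEL_DB))  # 0.0 dB (unity gain) for music
```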
Looking at this from another point of view, many musical inputs are limited or clipped just after the microphone, prior to any amplification. This "peak input limiting level", which is typically set at about 85 dB SPL, is quite adequate for speech inputs but is too low for typical musical inputs, and would cause the music to be distorted and unclear in a conventional hearing aid. The "peak input limiting level" should be elevated to at least 105 dB SPL so that amplified music may retain its fidelity.
A website has been constructed demonstrating the deleterious effects of too low a peak input limiting level. Please visit www.musicandhearingaids.cjb.net (Chasin, 2003).
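A minimal sketch of the limiting behaviour, working only with nominal levels in dB SPL: the 85 and 105 dB SPL limits and the speech and music levels are the figures quoted in this article, while the function names are hypothetical. Peaks above the nominal music level would need additional headroom, which this simple comparison does not model.

```python
def is_clipped(input_db, limit_db):
    """True when the input level exceeds the peak input limiting level."""
    return input_db > limit_db


def limited_level_db(input_db, limit_db):
    """Level passed on to the amplifier after idealised hard limiting."""
    return min(input_db, limit_db)


CONVENTIONAL_LIMIT_DB = 85.0
ELEVATED_LIMIT_DB = 105.0
SPEECH_LEVELS_DB = [53.0, 77.0, 83.0]  # soft, loud and shouted speech
MUSIC_LEVEL_DB = 100.0

# Speech passes under the conventional limit untouched.
print(any(is_clipped(s, CONVENTIONAL_LIMIT_DB) for s in SPEECH_LEVELS_DB))  # False

# Music at 100 dB SPL is cut back by 15 dB at 85, but passes at 105.
print(MUSIC_LEVEL_DB - limited_level_db(MUSIC_LEVEL_DB, CONVENTIONAL_LIMIT_DB))  # 15.0
print(is_clipped(MUSIC_LEVEL_DB, ELEVATED_LIMIT_DB))  # False
```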
Understandably, hearing aids are designed with speech as the acoustic signal of primary focus. The task of designing a hearing aid that also works well for music may be quite daunting. Music spectra are quite variable and there is no single "articulation index" for music relating the importance of various frequency bands.
Nevertheless, certain parameters can be utilized with conventional hearing aids to allow amplification of a high fidelity musical signal. Increasing compression detector thresholds, as well as altering how a compressor detects a musical signal, may improve the usability of hearing aids for musicians with hearing loss. Hearing aids should have the capability to allow the listener to easily adjust gain (and secondarily output) requirements for listening to music.
Chasin, M. (2003). "Music and Hearing Aids", Hearing Journal, July.
Kent, R.D., & Read, C. (2002). Acoustic Analysis of Speech (2nd ed.). New York: Delmar.