From the Desk of Gus Mueller
Dr. Raymond Carhart, the father (or grandfather) of audiology, obtained his Master of Arts (1934) and Doctor of Philosophy (1936) degrees at Northwestern University. He joined the U.S. Army in the early 1940s and served as Director of the Acoustic Clinic at Deshon General Hospital in Butler, Pennsylvania. He then returned to Northwestern, where he became Professor of Audiology in 1947. In the years that followed, Northwestern University dominated the academic scene in audiology, with many of its graduates going on to form their own programs and become the “who’s who” of audiology for the next generation.
Although the list of superstar audiology graduates from those days at Northwestern is long, one student from the late 1940s whom you do not hear as much about is Harriet Haskins. Like her mentor Carhart, she also served in the military during WWII, working as a WAVE at the Philadelphia Naval Station. Most of her later professional career was spent at Johns Hopkins Hospital in Baltimore. What makes all this relevant for this month’s 20Q is Harriet Haskins’ 1949 unpublished master’s thesis from Northwestern: A phonetically balanced test of speech discrimination for children.
The PBK list developed by Haskins has certainly withstood the test of time, but it represents only a small sample of the speech recognition material available for the evaluation of the pediatric patient. To bring us up to date, we bring in an audiologist who works with these patients on a daily basis, and is involved in research regarding the efficiency of these different speech tests.
Andrea Hillock-Dunn, AuD, PhD, is the Associate Director of Pediatric Audiology and an Assistant Professor of Hearing and Speech Sciences at the Vanderbilt Bill Wilkerson Center. Dr. Hillock-Dunn is also involved in the training of AuD students, and is responsible for program administration and development in pediatric audiology. You’re probably familiar with her papers and publications dealing with early hearing detection and intervention, and auditory and audiovisual speech recognition in children with normal hearing and in those who are deaf or hard of hearing.
It’s been nearly 70 years since Harriet Haskins developed the PBK list, but the importance of valid and reliable speech recognition testing with the pediatric patient is as important now as it was then. This excellent 20Q article by Andrea offers guidance regarding how to conduct the testing correctly, reviews which tests are currently available, and provides a glimpse at which speech-related tests we might be using in the future.
Gus Mueller, PhD
To browse the complete collection of 20Q with Gus Mueller CEU articles, please visit www.audiologyonline.com/20Q
Pediatric Speech Recognition Measures: What’s Now and What’s Next!
Learning Outcomes
- Readers will be able to explain the rationale for conducting speech recognition testing with children.
- Readers will be able to discuss considerations for selecting appropriate speech recognition testing materials for children.
- Readers will be able to discuss how test and procedural variables impact the results and validity of speech recognition testing in children.
1. Why bother doing speech audiometry in kids at all? I can get everything I need from pure-tone thresholds, right?
Pure-tone thresholds are obviously extremely important, but we gain critical information from speech recognition testing as well. For example, speech testing can: a.) provide a cross-check for behavioral pure-tone data; b.) help quantify benefit from amplification and inform programming and audiologic management decisions (e.g., consideration of alternative devices, FM, need for additional classroom supports, etc.); c.) provide a global metric for monitoring performance over time (sequential appointments); and, d.) identify abnormal auditory system function not predicted from the audiogram, including possible retrocochlear involvement.
2. So it may be helpful to do unaided speech testing during an initial hearing evaluation to rule out some particular pathologies that need special attention?
That’s correct! Patients with Auditory Neuropathy Spectrum Disorder (ANSD) provide an excellent example of the potential disconnect between pure-tone detection ability and speech recognition. In a study by Rance and colleagues (1999), researchers found no significant correlation between closed-set speech recognition and behavioral thresholds in a group of ten children with ANSD. Berlin and colleagues (2010) also described pure-tone detection and word recognition abilities of children with ANSD between 4 and 18 years of age. Although pure-tone averages (PTAs) varied widely from near-normal to profound hearing loss, speech recognition ability was generally poor, especially in noise. Whereas 25 of 68 patients were able to complete open-set word recognition testing in quiet, only 5 could be tested in noise due to floor effects. Speech recognition testing may prove useful in detecting ANSD, especially testing in noise, where less overlap exists between the performance of children with ANSD and those with cochlear loss (compared to quiet). If ANSD is suspected, speech scores should be considered in conjunction with other audiologic findings such as OAEs, acoustic reflexes and ABR.
3. What about children with cochlear hearing loss? What does speech audiometry tell me that pure-tone thresholds don’t in those cases?
In addition to identifying relatively rare retrocochlear disorders, speech testing may provide a more accurate prediction of a child’s functional hearing abilities than the audiogram. Some of our recent data suggest that the audiogram, and even the traditional speech recognition measures commonly performed clinically, may not be maximally sensitive to the receptive communication challenges of children with sensorineural hearing loss (SNHL) who use hearing aids (Hillock-Dunn, Taylor, Buss, & Leibold, 2015).
4. Interesting! Can you share some details about that study?
Yes, here are some details. We measured masked speech recognition in 16 school-age hearing aid users using an aided, adaptive spondee recognition task administered in two masker conditions: a.) steady noise, and b.) two-talker speech (Hillock-Dunn et al., 2015). We also pulled data from each child’s most recent clinical evaluation (i.e., PTA, SRT, better-ear unaided Phonetically Balanced Kindergarten [PBK] word score in quiet), and analyzed parent responses on two subscales of the Children’s Abbreviated Profile of Hearing Aid Performance questionnaire (Kopun & Stelmachowicz, 1998). While traditional clinical measures were correlated with speech recognition performance in steady noise, none of the clinical measures were correlated with performance on the aided spondee task in two-talker speech. Moreover, only performance in the two-talker speech masker was correlated with parent-reported speech recognition difficulties in real-world situations. We believe these findings suggest that traditional clinical measures such as the PTA, SRT (quiet) and word recognition (quiet) may underestimate the communication challenges of children with SNHL.
5. Why do you think that it was the SRT with the two-talker competing message that correlated with real world performance?
Not all competing background sounds are created equal! “Energetic” maskers impede speech recognition by causing peripheral interference via overlapping excitation patterns on the basilar membrane. In contrast, “informational” maskers are believed to produce both peripheral and central effects; they are associated with larger decrements in performance than peripheral effects alone and reflect difficulty disentangling the target from the background speech. An example of an energetic masker is fan noise or speech-shaped noise; an example of an informational masker is competing speech produced by a small number of talkers.
In our study, the informational masker consisted of two women reading children’s books and the signal was a spondee. The spectrotemporal characteristics of the signal and masker were highly similar since both were speech, and the content in the masking stream was meaningful, making it especially detrimental to speech recognition. In such situations the listener must segregate the target from the masking speech and selectively attend to the signal while ignoring the background sound. A classic example of this challenge in everyday life is when a child is struggling to hear his or her teacher because of the chatter of other children in the classroom.
6. Classrooms are really rough, especially for younger children. Is a multi-talker babble even worse than the two-talker scenario?
Surprisingly, more talkers do not necessarily make the listening task harder when masker spectra and intensity levels are equated. Studies by Freyman, Balakrishnan, and Helfer (2004) and others show a nonlinear effect of talker number on masker effectiveness. While the SNR required for a given level of sentence recognition accuracy increases from 1 to 2 talkers, masker difficulty generally declines from 2 to 6 talkers and levels off thereafter.
7. Why would it level off?
As you add talkers, the masking speech stream becomes more acoustically similar to noise. At some point, the further addition of talkers has minimal influence because the stimulus is already noise-like, and the perceptual or informational masking effect is negligible.
8. This all sounds very interesting, but complex maskers like that are not available in my clinic. That’s why I do my speech recognition testing in quiet.
There are commercially available speech-in-noise tests that can be used clinically with children, such as the Words-in-Noise test (WIN; Wilson, 2003) and the Bamford-Kowal-Bench Speech-in-Noise test (BKB-SIN; Etymotic Research, 2005; Bench, Kowal, & Bamford, 1979). For a more comprehensive list of pediatric speech-in-noise measures and test considerations, the interested reader is referred to Table 1 of the article, "Speech perception in noise measures for children: A critical review and case studies" (Schafer, 2010).
9. Speaking of speech stimuli, can you talk about which speech tests I should use? How do I know what to use with a particular patient?
Great question! Choosing an appropriate word list is critical, and different tests are normed on different populations. A good starting point would be to consider the target population for which each test was developed and whether there is published data to which you can compare an individual child’s performance.
Also consider patient-specific factors such as expressive speech and language competency, cognitive development, etc. For example, you might choose a closed-set (point-to-picture) task for an older child with oral-motor issues or severe articulation deficits that could impact scoring accuracy on an open-set test requiring a verbal response. Likewise, the material selected should be appropriate for patients’ cognitive as opposed to chronological age. For example, a 16-year-old patient functioning at the developmental level of a 5-year-old should be tested with materials developed for younger children (e.g., PBK-50 words) as opposed to those for adolescents or adults (e.g., Northwestern University Auditory Test No. 6 [NU-6]).
10. Okay, so where can I find a speech materials hierarchy to help me determine which is a good test to start with for a given patient?
To my knowledge, currently there is no published hierarchy with a suggested test progression ordered by level of difficulty. However, Ryan McCreery and his colleagues developed an aided speech recognition battery for use with children 2 – 9 years of age who were enrolled in the Outcomes in Children with Hearing Loss (OCHL) multi-center grant. The test battery was described in a 2013 poster presented at the American Auditory Society meeting in Phoenix, AZ. With the permission of McCreery and colleagues, I’ve provided a sneak peek at their battery below.
OCHL Speech Hierarchy
- Open & Closed Set Task - 2 years (Ertmer, Miller, & Quesenberry, 2004)
- Test of auditory skills involving audiovisual presentation of 10-item lists of words familiar to most 2-year-olds. Contains two parts: 1) Child repeats a word spoken by mom or the examiner, and the utterance is scored at the phoneme and word level, and 2) Child identifies the word spoken from a closed set of 3 pictures.
- Early Speech Perception, Low Verbal – 2-3 years, Standard – 3 years (Moog & Geers, 1990)
- Closed-set, picture pointing task comprised of 1-, 2- or 3-syllable targets used to assess detection, pattern perception and word identification ability. Auditory-only presentation format. Low-verbal and standard options depending on developmental ability of child.
- Lexical Neighborhood Test, Easy/Hard – 4-5 years (Kirk, Pisoni, & Osberger, 1995)
- Fifty-item, open-set test comprised of monosyllabic words selected based on the language of typical 3- to 5-year-olds. Auditory-only presentation format. Items in “Easy” versus “Hard” lists differ according to the frequency with which they occur in the English language and their acoustic-phonetic similarity.
- PBK-50 - 5-9 years (Haskins, 1949)
- Fifty-item, open-set test including phonetically balanced monosyllabic words based on the vocabulary of normal-hearing kindergarten children. Auditory-only presentation format.
- Computer-Assisted Speech Perception Assessment (CASPA) in steady-state noise - 7-9 years (Boothroyd, 1999)
- Ten-item, open-set test comprised of lists of consonant-vowel-consonant words administered in quiet or noise. Separate word and phoneme (consonant and vowel) scoring increases the number of scored items. Although initially developed for adults, the test has been successfully administered to children (McCreery et al., 2010).
Please note that this hierarchy is comprised of only word-level stimuli. Consider implementing sentence-level measures and testing in noise when children begin approaching ceiling on LNT, around 4 or 5 years of age. As I mentioned earlier, the Bamford-Kowal-Bench Speech-in-Noise Test (BKB-SIN, Etymotic Research, 2005) based on the Bamford-Kowal-Bench sentences (Bench, Kowal, & Bamford, 1979) has been normed on children as young as 6 years of age.
11. That’s helpful. So, do I follow the ages to know when to bump a child to a harder test, or do I wait until they get 100% correct on the test I’m using?
It depends on the number of words you administer, but for a 25-word list you should advance the child to a harder test once they achieve an accuracy score of roughly 85% or greater. Critical difference values based on adult data indicate that for 25 words, a score of 84% is not significantly different from 96% (Thornton & Raffin, 1978). Children generally show greater score variability than adults, which results in wider confidence intervals. That means they need an even bigger score difference for there to be a significant change across conditions or over time!
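The 84% vs. 96% comparison can be sanity-checked with a quick calculation. Below is a simplified sketch using an exact one-sided binomial test under an assumed true score; it is not the actual Thornton and Raffin (1978) procedure, which models the sampling variability of both scores, but it illustrates why two scores 12 points apart on a 25-word list may not differ reliably.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance that a listener
    whose true recognition ability is p scores k or more of n words."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# If the child's true score is 84%, how often would a 25-word list
# yield 96% (24/25) or better purely by chance?
p_val = prob_at_least(24, 25, 0.84)
print(f"P(>= 24/25 | true score 84%) = {p_val:.3f}")  # ~0.074, i.e. > 0.05
```

Because that probability exceeds 0.05, a jump from 84% to 96% on a 25-word list is consistent with measurement error alone, which matches the critical difference tables cited above.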
12. Now I’m confused about how to report speech scores. Should I still label scores excellent, good, fair or poor?
That categorization is somewhat arbitrary and should probably be abandoned to avoid confusing patients and providers. As William Brandy notes in the Handbook of Clinical Audiology, there is variability in labeling conventions, and differences across speech materials and test administration (e.g., presentation level) can affect the comparability of scores (Katz, 2002, p. 107). For example, on some scales a left-ear speech recognition score of 72% might be categorized as “fair” and a right-ear score of 84% as “good,” even though they aren’t significantly different from one another (even for a 50-word list).
13. With those categories I was using, 12% is kind of a big deal. Explain again why you’re saying scores of 72% and 84% are equivocal. How are you defining a significant score difference?
The criterion for calling a difference “significant” depends on the number of words or items presented in the speech recognition test and on the test score itself (Raffin & Thornton, 1980; Thornton & Raffin, 1978). Thornton and Raffin (1978, 1980) published reference tables showing critical difference values for speech recognition scores for 10-, 25-, 50- and 100-word lists (assuming a 0.05 significance level, meaning that differences this large or larger would occur by chance only 5 times in 100). Returning to the example above, 72% is not significantly different from 84% for 25- or 50-word lists (p > 0.05). This difference is significant, however, for a 100-word list!
But, it’s not quite that simple. I mentioned that it’s not just about the number of words, but also performance score. So you need a larger critical difference at 50% correct, for example, than near ceiling or floor (i.e., the tails of the distribution) where the variability is reduced.
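That dependence on the score itself falls directly out of modeling the score as a binomial variable (Thornton & Raffin, 1978): the standard deviation of the number of words correct is sqrt(n·p·(1−p)), which peaks at a true score of 50% and shrinks toward ceiling and floor. A quick sketch for a 25-word list:

```python
from math import sqrt

n = 25  # number of words in the list
for p in (0.50, 0.72, 0.90, 0.98):
    sd_words = sqrt(n * p * (1 - p))   # SD in number of words correct
    sd_pct = 100 * sd_words / n        # same SD in percentage points
    print(f"true score {p:.0%}: SD = {sd_words:.2f} words ({sd_pct:.1f} pts)")
```

The spread is largest at 50% correct and smallest near the extremes, which is why a bigger score difference is needed mid-range before it can be called significant.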
14. Wait - 100 words …What are you thinking?! Children can hardly sit still for 10 words per ear!
So, there is a test that you might try with older children and adolescents that could end up being only 10 words per ear. If you simply want to determine whether or not speech understanding is impaired, consider the NU-6 word test ordered by difficulty (Hurley & Sells, 2003). Hurley and Sells (2003) took the hardest NU-6 words and determined that if you administer those first and a patient (adult) gets 9/10 correct, word recognition is expected to be adequate and the test can be terminated. If they get more than 1 word wrong, administer the next 15 words, and so on. Similar abbreviated word recognition screening lists are also available with other materials (e.g., CID W-22 words) (Runge & Hosford-Dunn, 1985). You might try this with children 10 years and older. Remember, however, and this is important—these tests are only valid if you use the exact same recording that was used when the word difficulty was determined by the researchers. For the commonly used NU-6 (used by Hurley and Sells), a recording with the words ordered appropriately is available from Auditec of St. Louis.
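The first stopping rule can be sketched in a few lines. This is a hypothetical helper illustrating only the 10-word screening stage described above; the criteria for the subsequent 15-word stages follow the published protocol and are not reproduced here.

```python
def needs_more_words(first_ten_correct):
    """first_ten_correct: list of 10 booleans (True = word repeated
    correctly) for the 10 hardest items, presented first.
    Per the screening logic described above, missing no more than 1
    of the 10 hardest words ends the test; more than 1 miss means the
    next 15 words should be administered."""
    errors = first_ten_correct.count(False)
    return errors > 1

print(needs_more_words([True] * 9 + [False]))      # False: screening passed
print(needs_more_words([True] * 8 + [False] * 2))  # True: continue testing
```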
15. Okay, thanks. But are most of the tests for younger kids especially long?
There are a number of pediatric speech recognition tests available (some of which are included in the hierarchy above) that are comprised of fewer than 50 (and sometimes even fewer than 25) words. For example, the CASPA test has lists of only 10 words, but they can be scored phonemically, which increases the number of scored items to 30 (3 phonemes per target word). For tests with longer word lists, if abridged (e.g., half) lists are used, it is important to recognize the impact on score interpretation and comparison. But, I can sympathize with you and appreciate the need for speed, especially with the super wiggly ones!
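Phoneme scoring is easy to illustrate. In this hypothetical sketch, each CVC target is represented as a 3-tuple of phoneme strings, so a 10-word list yields 30 scored items; the function and phoneme notation are illustrative, not part of any published test.

```python
def score_cvc_list(items):
    """items: list of (target, response) pairs, each a 3-tuple of
    phoneme strings (consonant, vowel, consonant).
    Returns (word score, phoneme score) as proportions correct."""
    word_correct = sum(t == r for t, r in items)
    phoneme_correct = sum(tp == rp for t, r in items for tp, rp in zip(t, r))
    n = len(items)
    return word_correct / n, phoneme_correct / (3 * n)

# "cat" repeated correctly once, then repeated as "cap":
# the second word is wrong, but 2 of its 3 phonemes are correct.
items = [(("k", "ae", "t"), ("k", "ae", "t")),
         (("k", "ae", "t"), ("k", "ae", "p"))]
word, phoneme = score_cvc_list(items)
print(word, round(phoneme, 2))  # 0.5 0.83
```

The extra scored items are what tighten the confidence interval relative to word-level scoring of the same short list.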
16. Can I use monitored live voice (MLV) to speed things up? That’s much faster than using recorded words.
Well, I hate to tell you, but as stated nicely by Hornsby and Mueller (2013), “the words are not the test.” It shouldn’t take much longer to use recorded materials, and your results will be more accurate and meaningful. Numerous studies have cited the drawbacks of live-voice presentation, which has been shown to be unreliable and to limit cross-score comparisons (e.g., Brandy, 1966; Hood & Poole, 1980). Furthermore, Mendel and Owen (2011) showed that the average increase in test time for administering a 50-word NU-6 list in recorded versus monitored live voice (MLV) format was only 49 seconds per ear! For a 25-word list, the difference would be even smaller!
17. What if I just bypass the carrier phrases to speed things up a bit?
That might not be a good idea. A recent study by Bonino et al. (2013) showed a significant carrier-phrase benefit in 5-10 year old children and adults (18-30 years) on an open-set speech recognition task (PBK-50 words) in multitalker babble and two-talker speech, but not speech-shaped noise. Bonino and colleagues theorized that the carrier phrase provided an auditory grouping cue that facilitated release from masking, subsequently improving speech recognition performance.
So, by removing the carrier phrase, you’re potentially making the test harder and limiting your ability to compare to data collected with a carrier phrase. If you do choose to adapt a test in this way, document any changes to test administration so that the conditions can be recreated later and an individual’s scores compared over time.
18. You’re telling me that I should always use recorded stimuli, present a lot of words, and always use carrier phrases. At least my VRA testing will still be fairly speedy.
Not so fast. Maybe you could add a little more depth to your VRA testing. Have you heard about Visual Reinforcement Infant Speech Discrimination (VRISD) (Eilers, Wilson, & Moore, 1977) or Visual Reinforcement Assessment of the Perception of Speech Pattern Contrasts (VRASPAC) (Eisenberg, Martinez, & Boothroyd, 2004)? These procedures measure discrimination of speech feature contrasts using a VRA-style habituation procedure. Basically, a speech sound (e.g., /a/) is played repeatedly until the infant habituates (stops turning) to it. Then, the baby is trained to respond to a deviant, contrasting speech sound (e.g., /u/) by turning his or her head toward the visual reinforcer in the same fashion as for VRA. This procedure has the potential to provide information about receptive communication ability sooner, and requires no additional audiologic training to administer or score. However, it should be cautioned that not all normal hearing infants can accurately perform the task, and a recent study showed that some contrasts that were differentiated electrophysiologically did not produce a consistent behavioral discrimination response (Cone, 2014).
19. You can measure speech feature discrimination electrophysiologically? How is that possible?
Yes! Speech feature discrimination has been measured with the ABR, ASSR and CAEPs. In a recent paper, Cone (2015) reported vowel discrimination in infants using an oddball paradigm (repeating, frequent stimulus = standard; infrequent stimulus = deviant). The CAEP amplitude was greater for the deviant than the standard vowel, and there were some correlations between CAEP measures (amplitude, latency) and behavioral vowel discrimination; however, the relationship between electrophysiologic and behavioral data remains complex.
20. Do you think we might start seeing increased speech testing of infants and toddlers in clinics before too long?
Although CAEP-based and VRA-style speech discrimination testing is not currently in widespread clinical use, these approaches are receiving growing attention. With additional research on test parameters, administration and interpretation, I suspect we’ll begin to see increased clinical implementation. Hopefully, these tools will provide earlier insight into the speech recognition capabilities of infants and toddlers, particularly those with atypical retrocochlear or neurological functioning.
I also think we’ll see a growing movement toward the use of more realistic speech recognition materials, such as 2- or 3-talker maskers, in clinical assessment. Hopefully, this will result in more measures that are better able to approximate real-world communication ability, better address specific patient needs, and inform management considerations.
Bench, J., Kowal, A., & Bamford, J. (1979). The BKB (Bamford-Kowal-Bench) sentence lists for partially-hearing children. British Journal of Audiology, 13(3), 108-112.
Berlin, C.I., Hood, L.J., Morlet, T., Wilensky, D., Li, L., Mattingly, K.R.,...Frisch, S.A. (2010). Multi-site diagnosis and management of 260 patients with auditory neuropathy/dys-synchrony (auditory neuropathy spectrum disorder). International Journal of Audiology, 49(1), 30-43.
Boothroyd, A. (1999). Computer-assisted speech perception assessment (CASPA): Version.
Brandy, W.T. (1966). Reliability of voice tests of speech discrimination. Journal of Speech, Language, and Hearing Research, 9(3), 461-465.
Cone, B.K. (2014). Infant cortical electrophysiology and perception of vowel contrasts. International Journal of Psychophysiology, 95(2), 65-76.
Eilers, R.E., Wilson, W.R., & Moore, J.M. (1977). Developmental changes in speech discrimination in infants. Journal of Speech, Language, and Hearing Research, 20(4), 766-780.
Eisenberg, L.S., Martinez, A.S., & Boothroyd, A. (2004). Perception of phonetic contrasts in infants: Development of the VRASPAC. International Congress Series, 1273, 364-367.
Ertmer, D., Miller, C., & Quesenberry, J. (2004). The Open and Closed Set Test [Assessment procedure](Unpublished instrument). West Lafayette, IN: Purdue University.
Freyman, R.L., Balakrishnan, U., & Helfer, K.S. (2004). Effect of number of masking talkers and auditory priming on informational masking in speech recognition. The Journal of the Acoustical Society of America, 115(5), 2246-2256.
Haskins, J. (1949). A phonetically balanced test of speech discrimination for children (unpublished master's thesis). Northwestern University, Evanston, IL.
Hillock-Dunn, A., Taylor, C., Buss, E., & Leibold, L.J. (2015). Assessing speech perception in children with hearing loss: What conventional clinical tools may miss. Ear and Hearing, 36(2), e57-e60.
Hood, J.D., & Poole, J.P. (1980). Influence of the speaker and other factors affecting speech intelligibility. International Journal of Audiology, 19(5), 434-455.
Hornsby, B., & Mueller, H.G. (2013, July). Monosyllabic word testing: Five simple steps to improve accuracy and efficiency. AudiologyOnline, Article 11978. Retrieved from www.audiologyonline.com
Hurley, R.M., & Sells, J.P. (2003). An abbreviated word recognition protocol based on item difficulty. Ear and Hearing, 24(2), 111-118.
Katz, J. (Ed.). (2002). Handbook of clinical audiology (5th ed.). Baltimore, MD: Lippincott Williams & Wilkins.
Kirk, K.I., Pisoni, D.B., & Osberger, M.J. (1995). Lexical effects on spoken word recognition by pediatric cochlear implant users. Ear and Hearing, 16(5), 470-481.
Kopun, J.G., & Stelmachowicz, P.G. (1998). Perceived communication difficulties of children with hearing loss. American Journal of Audiology, 7(1), 30-38.
McCreery, R., Ito, R., Spratford, M., Lewis, D., Hoover, B., & Stelmachowicz, P.G. (2010). Performance-intensity functions for normal-hearing adults and children using CASPA. Ear and Hearing, 31(1), 95.
McCreery, R.W., Walker, E., Spratford, M., Hatala, E., & Jacobs, S. (2013). Aided speech recognition in noise for children with hearing loss. Poster presentation at the meeting of the American Auditory Society, Phoenix, AZ.
Mendel, L.L., & Owen, S.R. (2011). A study of recorded versus live voice word recognition. International Journal of Audiology, 50(10), 688-693.
Moog, J.S., & Geers, A.E. (1990). Early speech perception test. St. Louis, MO: Central Institute for the Deaf.
Raffin, M.J., & Thornton, A.R. (1980). Confidence levels for differences between speech-discrimination scores. A research note. Journal of Speech and Hearing Research, 23(1), 5-18.
Rance, G., Beer, D.E., Cone-Wesson, B., Shepherd, R.K., Dowell, R.C., King, A.M.,...Clark, G.M. (1999). Clinical findings for a group of infants and young children with auditory neuropathy. Ear and Hearing, 20(3), 238.
Runge, C.A., & Hosford-Dunn, H. (1985). Word recognition performance with modified CID W-22 word lists. Journal of Speech and Hearing Research, 28(3), 355-362.
Schafer, E. (2010). Speech perception in noise measures for children: A critical review and case studies. Journal of Educational Audiology, 16, 4-15.
Thornton, A.R., & Raffin, M.J. (1978). Speech-discrimination scores modeled as a binomial variable. Journal of Speech and Hearing Research, 21(3), 507-518.
Wilson, R.H. (2003). Development of a speech-in-multitalker-babble paradigm to assess word-recognition performance. Journal of the American Academy of Audiology, 14(9), 453-470.
Cite this Content as:
Hillock-Dunn, A. (2015, September). 20Q: pediatric speech recognition measures - what's now and what's next! AudiologyOnline, Article 14981. Retrieved from http://www.audiologyonline.com.