Automatic rating of hoarseness by text-based cepstral and prosodic evaluation
The standard for the analysis of distorted voices is perceptual rating
of read-out texts or spontaneous speech. Automatic voice evaluation, however,
is usually done on stable sections of sustained vowels. In this paper, text-based
and established vowel-based analysis are compared with respect to their ability
to measure hoarseness and its subclasses. Seventy-three hoarse patients (mean age 48.3 ± 16.8 years)
uttered the vowel /e/ and read the German version of the text “The North Wind
and the Sun”. Five speech therapists and physicians rated roughness, breathiness,
and hoarseness according to the German RBH evaluation scheme. The best
human-machine correlations were obtained for measures based on the Cepstral
Peak Prominence (CPP; up to |r| = 0.73). Support Vector Regression (SVR) on
CPP-based measures and prosodic features further improved the correlation to r ≈ 0.8
and confirmed that automatic voice evaluation should be performed on a text
recording.
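The abstract does not say how the CPP measures were computed. For orientation only, here is a minimal single-frame sketch of CPP in Python under its commonly used definition: the height of the cepstral peak in the expected pitch range above a linear trend line fitted to the cepstrum. This is not the authors' implementation. The function name `cepstral_peak_prominence`, the Hann window, the dB scaling, the 60 to 300 Hz search range and the trend fit starting at 1 ms are all assumptions.

```python
import numpy as np


def cepstral_peak_prominence(frame, fs, f0_min=60.0, f0_max=300.0, fit_from=0.001):
    """Hypothetical sketch: CPP (dB) of one signal frame.

    Assumptions: Hann window, log spectrum and cepstrum both in dB,
    peak searched for pitch periods between 1/f0_max and 1/f0_min,
    linear trend fitted from `fit_from` seconds up to half the frame.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    half = n // 2
    if fs / f0_min >= half:
        raise ValueError("frame too short for the requested f0_min")
    # Log magnitude spectrum (dB) of the windowed frame
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(n)))
    log_spec_db = 20.0 * np.log10(spectrum + 1e-12)
    # Real cepstrum: inverse FFT of the log spectrum, then expressed in dB
    cepstrum = np.fft.irfft(log_spec_db, n=n)
    ceps_db = 20.0 * np.log10(np.abs(cepstrum) + 1e-12)
    quefrency = np.arange(n) / fs  # in seconds
    # Look for the peak only where a plausible pitch period lies
    search = np.where((quefrency >= 1.0 / f0_max) & (quefrency <= 1.0 / f0_min))[0]
    peak = search[np.argmax(ceps_db[search])]
    # Linear regression of cepstral level on quefrency (the trend line)
    fit = np.arange(int(np.ceil(fit_from * fs)), half)
    slope, intercept = np.polyfit(quefrency[fit], ceps_db[fit], 1)
    # Prominence: peak height above the trend line at the peak quefrency
    return float(ceps_db[peak] - (slope * quefrency[peak] + intercept))
```

As a quick check, a strictly periodic pulse train (one pulse every 128 samples at 16 kHz, i.e. 125 Hz) should yield a much higher CPP than white noise, matching the intuition that breathy or rough voices have a less prominent cepstral peak. How per-frame values are aggregated over a whole text recording (for example, by taking their mean) is likewise an assumption here, not something the abstract specifies.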