This study examined whether differences in reverberation time (RT) between typical sound field test rooms used in audiology clinics have an effect on speech recognition in multi-talker environments. Separate groups of participants listened to target speech sentences presented simultaneously with 0-to-3 competing sentences through four spatially-separated loudspeakers in two sound field test rooms having RT = 0:6 sec (Site 1: N = 16) and RT = 0:4 sec (Site 2: N = 12). Speech recognition scores (SRSs) for the Synchronized Sentence Set (S3) test and subjective estimates of perceived task difficulty were recorded. Obtained results indicate that the change in room RT from 0.4 to 0.6 sec did not significantly influence SRSs in quiet or in the presence of one competing sentence. However, this small change in RT affected SRSs when 2 and 3 competing sentences were present, resulting in mean SRSs that were about 8-10% better in the room with RT = 0:4 sec. Perceived task difficulty ratings increased as the complexity of the task increased, with average ratings similar across test sites for each level of sentence competition. These results suggest that site-specific normative data must be collected for sound field rooms if clinicians would like to use two or more directional speech maskers during routine sound field testing.
A phoneme segmentation method based on the analysis of discrete wavelet transform spectra is described. The localization of phoneme boundaries is particularly useful in speech recognition. It enables one to use more accurate acoustic models since the length of phonemes provide more information for parametrization. Our method relies on the values of power envelopes and their first derivatives for six frequency subbands. Specific scenarios that are typical for phoneme boundaries are searched for. Discrete times with such events are noted and graded using a distribution-like event function, which represent the change of the energy distribution in the frequency domain. The exact definition of this method is described in the paper. The final decision on localization of boundaries is taken by analysis of the event function. Boundaries are, therefore, extracted using information from all subbands. The method was developed on a small set of Polish hand segmented words and tested on another large corpus containing 16 425 utterances. A recall and precision measure specifically designed to measure the quality of speech segmentation was adapted by using fuzzy sets. From this, results with F-score equal to 72.49% were obtained.
Laughter is one of the most important paralinguistic events, and it has specific roles in human conversation. The automatic detection of laughter occurrences in human speech can aid automatic speech recognition systems as well as some paralinguistic tasks such as emotion detection. In this study we apply Deep Neural Networks (DNN) for laughter detection, as this technology is nowadays considered state-of-the-art in similar tasks like phoneme identification. We carry out our experiments using two corpora containing spontaneous speech in two languages (Hungarian and English). Also, as we find it reasonable that not all frequency regions are required for efficient laughter detection, we will perform feature selection to find the sufficient feature subset.
Hereby there is given the speaker identification basic system. There is discussed application and usage of the voice interfaces, in particular, speaker voice identification upon robot and human being communication. There is given description of the information system for speaker automatic identification according to the voice to apply to robotic-verbal systems. There is carried out review of algorithms and computer-aided learning libraries and selected the most appropriate, according to the necessary criteria, ALGLIB. There is conducted the research of identification model operation performance assessment at different set of the fundamental voice tone. As the criterion of accuracy there has been used the percentage of improperly classified cases of a speaker identification.