Choral singers are among intensive voice users whose excessive vocal effort puts them at risk of developing voice disorders. The aim of this work was to assess the voice quality of choral singers in the choir of the Polish-Japanese Academy of Information Technology. The evaluation was carried out using acoustic parameters from the COVAREP (Collaborative Voice Analysis Repository for Speech Technologies) repository, and a prototype mobile application was also prepared to compute these parameters. The study group comprised 6 male and 19 female choir singers; the control group consisted of 50 men and 39 women, all healthy non-singers. Voice quality was tested in all participants using auditory-perceptual assessment (the RBH scale) and acoustic analysis. The voice quality of the female choir singers proved normal in comparison with the control group, whereas the male choir singers were found to have tense voices. The parameters that proved most effective for voice evaluation were Peak Slope and the Normalized Amplitude Quotient.
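COVAREP itself is a MATLAB toolbox, and the abstract does not give the parameter formulas; as an illustration only, the Normalized Amplitude Quotient can be sketched in Python from its standard definition (peak-to-peak glottal flow amplitude divided by the product of the maximum negative peak of the flow derivative and the pitch period), assuming the glottal flow for one cycle has already been estimated. The function name and the synthetic pulse below are purely illustrative.

```python
import numpy as np

def normalized_amplitude_quotient(flow, fs, f0):
    """NAQ for one glottal cycle: A_ac / (d_peak * T0), where A_ac is the
    peak-to-peak flow amplitude, d_peak the magnitude of the maximum
    negative peak of the flow derivative, and T0 = 1/f0 the pitch period."""
    t0 = 1.0 / f0                      # pitch period in seconds
    a_ac = flow.max() - flow.min()     # AC amplitude of the glottal flow
    d_flow = np.diff(flow) * fs        # flow derivative (units per second)
    d_peak = -d_flow.min()             # magnitude of the negative peak
    return a_ac / (d_peak * t0)

# Synthetic single glottal cycle (smooth raised-sine pulse), not real data
fs, f0 = 16000, 100.0
n = int(fs / f0)
t = np.arange(n) / n
flow = np.sin(np.pi * t) ** 2
naq = normalized_amplitude_quotient(flow, fs, f0)
```

Because NAQ is normalized by the pitch period, it is dimensionless and comparable across speakers with different F0, which is one reason it is useful for contrasting tense and normal phonation.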
The human voice is one of the basic means of communication, and it also easily conveys the speaker's emotional state. This paper presents experiments on emotion recognition in human speech based on the fundamental frequency. The AGH Emotional Speech Corpus was used; this database consists of audio samples of seven emotions acted by 12 different speakers (6 female and 6 male). We explored phrases of all the emotions, both all together and in various combinations. The Fast Fourier Transform and magnitude-spectrum analysis were applied to extract the fundamental tone from the speech audio samples. After extracting several statistical features of the fundamental frequency, we studied whether they carry information about the emotional state of the speaker by applying different AI methods. The resulting data were analysed with several classifiers: K-Nearest Neighbours with local induction, Random Forest, Bagging, JRip, and the Random Subspace Method, all from the WEKA data-mining toolkit. The results show that the fundamental frequency is a promising choice for further experiments.
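The abstract does not give the exact extraction pipeline, but the FFT/magnitude-spectrum step it names can be sketched as follows: window a voiced frame, take the magnitude spectrum, and pick the strongest peak inside a plausible F0 range. This is a minimal sketch under those assumptions; the function name, window choice, and search band are illustrative, not taken from the paper.

```python
import numpy as np

def estimate_f0(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency of a voiced frame as the
    strongest magnitude-spectrum peak within [fmin, fmax] Hz."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)
    return freqs[band][np.argmax(spectrum[band])]

# A pure 150 Hz tone stands in for a voiced speech frame
fs = 16000
t = np.arange(4096) / fs
frame = np.sin(2 * np.pi * 150.0 * t)
f0 = estimate_f0(frame, fs)
```

Per-utterance statistics of the resulting F0 track (mean, standard deviation, range, and similar) are the kind of features that can then be fed to the classifiers mentioned above.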
This article is devoted to the introduction of voice-signal recognition into a distance-learning system. The results of the conducted research show the promise of neural-network methods for phoneme recognition. It is also shown that the main difficulty in creating a neural-network model for phoneme recognition in a distance-learning system is the uncertain duration of a phoneme-like element. For this reason, the most effective type of neural-network model, the multilayer perceptron, whose number of input parameters is fixed, cannot be used directly for phoneme recognition. To mitigate this shortcoming, a procedure is developed that transforms the non-stationary digitized voice signal into a fixed number of mel-cepstral coefficients, which form the basis for calculating the input parameters of the neural-network model. In contrast to known approaches, the procedure allows linear scaling of phoneme-like elements. A number of computer experiments confirmed that the proposed coding procedure for the input parameters provides acceptable accuracy of neural-network phoneme recognition under near-natural conditions of a distance-learning system. Further research on neural-network phoneme recognition of voice signals in distance-learning systems will focus on increasing the admissible noise level. The adaptation of the proposed procedure to various natural languages, as well as to other applied tasks such as biometric authentication in the banking sector, is also of great interest.
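The abstract does not specify the scaling procedure in detail; a plausible minimal sketch of the core idea (linearly rescaling a variable-length sequence of per-frame mel-cepstral vectors to a fixed number of frames, so a multilayer perceptron with a fixed input size can consume phonemes of any duration) is shown below. The function name, frame counts, and coefficient dimension are assumptions for illustration.

```python
import numpy as np

def to_fixed_length(frames, target_frames):
    """Linearly rescale a (n_frames, n_coeffs) feature sequence along the
    time axis to exactly target_frames frames, interpolating between
    neighbouring frames. The flattened result has a fixed size suitable
    as MLP input regardless of the original phoneme duration."""
    frames = np.asarray(frames, dtype=float)
    n, _ = frames.shape
    src = np.linspace(0.0, n - 1, target_frames)  # fractional source positions
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, n - 1)
    w = (src - lo)[:, None]                       # interpolation weights
    return (1 - w) * frames[lo] + w * frames[hi]

# e.g. 23 frames of 13 mel-cepstral coefficients -> 10 frames of 13
fixed = to_fixed_length(np.random.randn(23, 13), 10)
```

Flattening `fixed` then yields a constant-length input vector (here 10 × 13 = 130 values) for the perceptron, whatever the duration of the phoneme-like element.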