Search results

Number of results: 11

Abstract

The paper presents an analysis of the voicing of the phoneme /v/ in modern spoken Macedonian. The phoneme /v/ in the standard Macedonian language is classified as a fricative, but some of its characteristics separate it from the other phonemes in this group. This is due to the fact that this phoneme was once a sonorant. In a part of the Macedonian dialects this phoneme is pronounced with marked voicing to this day, and this phenomenon is then reflected in the pronunciation of standard Macedonian. Our analysis is based on a selected corpus of examples spoken by speakers of various dialect origins, in order to assess any differences in the pronunciation of the phoneme /v/ when it is placed in different phonemic contexts within the word.
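
As a rough illustration of how the voicing of individual /v/ tokens could be quantified (this is not the authors' procedure), the sketch below uses Praat's pitch tracker via the parselmouth package to compute the fraction of voiced frames inside a hand-annotated /v/ segment; the file name, segment boundaries and frame step are hypothetical.

```python
# Hypothetical sketch: how strongly a /v/ segment is voiced, measured as the
# fraction of pitch frames that Praat marks as voiced. File name, segment
# boundaries and the 5 ms frame step are illustrative assumptions.
import parselmouth

def voiced_fraction(wav_path, t_start, t_end):
    """Fraction of pitch frames inside [t_start, t_end] that are voiced."""
    snd = parselmouth.Sound(wav_path).extract_part(from_time=t_start, to_time=t_end)
    pitch = snd.to_pitch(time_step=0.005)       # 5 ms analysis frames
    f0 = pitch.selected_array["frequency"]      # 0.0 where the frame is unvoiced
    return float((f0 > 0).mean())

# Example: a /v/ token annotated between 1.23 s and 1.31 s in a recording.
print(voiced_fraction("speaker01_utt03.wav", 1.23, 1.31))
```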

Authors and Affiliations

Веселинка Лаброска
Бранислав Геразов

Abstract

Choral singers are among intensive voice users whose excessive vocal effort puts them at risk of developing voice disorders. The aim of the work was to assess the voice quality of choral singers in the choir of the Polish-Japanese Academy of Information Technology. The evaluation was carried out using acoustic parameters from COVAREP (A Collaborative Voice Analysis Repository for Speech Technologies). A prototype of a mobile application was also prepared to allow these parameters to be calculated.

The study group comprised 6 male and 19 female choir singers. The control group consisted of healthy non-singing individuals, 50 men and 39 women. Auditory perceptual assessment (using the RBH scale) as well as acoustic analysis were used to test the voice quality of all the participants. The voice quality of the female choir singers proved to be normal in comparison with the control group.

The male choir singers were found to have tense voices in comparison with the controls. The parameters that proved most effective for voice evaluation were Peak Slope and the Normalized Amplitude Quotient.


Authors and Affiliations

Krzysztof Szklanny

Abstract

Voice production (emission) has aroused interest almost from the beginning of humanity. The first written accounts date back to Egyptian times, 2500–3000 BC. Practically from the early Greek period until the 19th century, studies of the larynx and the speech apparatus kept bringing new facts concerning structure, physiology and clinical practice. Researchers such as Galen, Morgagni, Eustachius and Casserius created milestones for modern laryngology. The authors aim to present some facts on anatomical research into the organs responsible for voice production from a historical perspective.

Authors and Affiliations

Andrzej Żytkowski (1)
Jerzy Walocha (2)

  1. Faculty of Philology, Department of Polish Dialectology and Logopedics, University of Lodz, Poland
  2. Department of Anatomy, Jagiellonian University Medical College, Kraków, Poland

Abstract

The paper presents a basic speaker identification system. The application and use of voice interfaces is discussed, in particular speaker identification by voice in communication between a robot and a human being. An information system for automatic speaker identification by voice, intended for robotic-verbal systems, is described. A review of machine learning algorithms and libraries was carried out, and ALGLIB was selected as the most appropriate according to the necessary criteria. The performance of the identification model was assessed on different sets of the fundamental voice tone. The percentage of incorrectly classified speaker identification cases was used as the accuracy criterion.
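
A minimal sketch of the evaluation criterion described above, the percentage of misclassified speaker identifications, assuming a toy set of fundamental-tone (F0) statistics per utterance. scikit-learn stands in for ALGLIB here purely for illustration, and all data below is synthetic.

```python
# Sketch of the accuracy criterion only: a speaker classifier trained on
# F0 feature vectors and scored by the percentage of misclassified cases.
# The paper uses ALGLIB; scikit-learn is substituted for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in: 3 speakers, 40 utterances each, 4 F0 statistics per
# utterance (mean, std, min, max of the fundamental frequency in Hz).
X = np.vstack([rng.normal(loc=[110 + 40 * s, 20, 80 + 40 * s, 160 + 40 * s],
                          scale=[8, 4, 8, 8], size=(40, 4)) for s in range(3)])
y = np.repeat(np.arange(3), 40)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)

error_rate = 100.0 * (clf.predict(X_te) != y_te).mean()
print(f"misclassified speaker identifications: {error_rate:.1f}%")
```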


Authors and Affiliations

Yedilkhan Amirgaliyev
Timur Musabayev
Didar Yedilkhan
Waldemar Wójcik
Zhazira Amirgaliyeva

Abstract

The paper investigates the interdependence between the perceptual identification of the vocalic quality of six isolated Polish vowels, traditionally defined by the spectral envelope, and the fundamental frequency F0. The stimuli used in the listening experiments were natural female and male voices, modified by changing the F0 values within a ±1 octave range. The results were then compared with the outcome of experiments on fully synthetic voices. Despite the differences in the generation of the investigated stimuli and their technical quality, consistent results were obtained. They confirmed the findings that what is of key importance in the perceptual identification of vowels is not only the position of the formants on the F1 × F2 plane but also their relationship to F0, the connection between the formants and the harmonics, and other factors. The paper presents, in quantitative terms, all possible kinds of perceptual shifts of Polish vowels from one phonetic category to another as a function of voice pitch. An additional perceptual experiment was also conducted to check a broader range of F0 changes and their impact on the identification of vowels in CVC (consonant-vowel-consonant) structures. A mismatch between the formants and the glottal tone value can lead to a change in phonetic category.
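
A hedged sketch of the kind of F0 manipulation described above (not the authors' stimulus-generation procedure): the WORLD vocoder, available in Python via the pyworld package, lets one rescale the F0 contour of a recorded vowel within a ±1 octave range while keeping the spectral envelope, and hence the formant positions, unchanged. The input file name is hypothetical.

```python
# Illustrative sketch: resynthesising an isolated vowel with its F0 scaled by
# factors from 0.5 (-1 octave) to 2.0 (+1 octave) while the spectral envelope
# (formants) and aperiodicity stay fixed. File names are assumptions.
import numpy as np
import soundfile as sf
import pyworld as pw

x, fs = sf.read("vowel_a_male.wav")            # mono recording of an isolated vowel
x = np.ascontiguousarray(x, dtype=np.float64)

f0, sp, ap = pw.wav2world(x, fs)               # F0 track, spectral envelope, aperiodicity

for ratio in (0.5, 0.75, 1.0, 1.5, 2.0):       # -1 octave ... +1 octave
    y = pw.synthesize(f0 * ratio, sp, ap, fs)
    sf.write(f"vowel_a_f0x{ratio:.2f}.wav", y, fs)
```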


Authors and Affiliations

Mariusz Owsianny

Abstract

Dialogue in the Classroom: Teaching Strategies and Their Reception by Students – The paper aims to explore Student Voice research within the academic context in terms of theoretical assumptions and a practical approach to its application in the classroom. In the first part, we focus on three main themes which build the explanatory framework: (1) Italian language teaching at Polish universities, (2) the current teaching methodology implemented in the classroom, and (3) Student Voice as a tool to better plan teaching activities. In the second part, we present the findings of a survey conducted among students learning Italian at the Faculty of Applied Linguistics, and we analyze their value for the teaching and learning process.


Authors and Affiliations

Marta Kaliska

Abstract

Voice acoustic analysis can be a valuable and objective tool supporting the diagnosis of many neurodegenerative diseases, especially in times of remote medical examination during the pandemic. The article compares the application of selected signal processing methods and machine learning algorithms to the taxonomy of acquired speech signals representing the vowel /a/ with prolonged phonation in patients with Parkinson’s disease and in healthy subjects. The study was conducted using three different feature engineering techniques for the generation of speech signal features, as well as a deep learning approach based on the processing of images in the form of spectrograms of different time and frequency resolutions. The research utilized real recordings acquired in the Department of Neurology at the Medical University of Warsaw, Poland. The discriminatory ability of the feature vectors was evaluated using the SVM technique. The spectrograms were processed by the popular AlexNet convolutional neural network, adapted to the binary classification task according to the strategy of transfer learning. The results of the numerical experiments have shown that the examined approaches differ in effectiveness; however, the sensitivity of the best test, based on features selected with respect to the biological grounds of voice articulation, reached 97% with a specificity no worse than 93%. The results could be slightly improved further by combining the selected deep learning and feature engineering algorithms in one stacked ensemble model.
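
A minimal sketch of the transfer-learning strategy mentioned above, assuming a PyTorch/torchvision setup: a pretrained AlexNet has its final fully connected layer replaced for the binary Parkinson's-vs-healthy task, the convolutional features are frozen, and one illustrative training step is run on dummy spectrogram tensors. The hyperparameters and the freezing policy are assumptions, not the authors' settings.

```python
# Hedged sketch of AlexNet transfer learning for binary spectrogram classification.
import torch
import torch.nn as nn
from torchvision import models

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

for p in model.features.parameters():        # keep the convolutional features fixed
    p.requires_grad = False

model.classifier[6] = nn.Linear(4096, 2)     # two classes: Parkinson's disease / healthy

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)

# One illustrative training step on a dummy batch of 8 spectrogram "images"
# (3 x 224 x 224, as expected by AlexNet after resizing and normalisation).
spectrograms = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))
optimizer.zero_grad()
loss = criterion(model(spectrograms), labels)
loss.backward()
optimizer.step()
```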

Authors and Affiliations

Ewelina Majda-Zdancewicz (1)
Anna Potulska-Chromik (2)
Jacek Jakubowski (1)
Monika Nojszewska (2)
Anna Kostera-Pruszczyk (2)

  1. Faculty of Electronics, Military University of Technology, ul. Gen. Sylwestra Kaliskiego 2, 00-908 Warsaw, Poland
  2. Department of Neurology, Medical University of Warsaw, ul. Banacha 1a, 02-097 Warsaw, Poland

Abstract

The human voice is one of the basic means of communication, thanks to which one can also easily convey one’s emotional state. This paper presents experiments on emotion recognition in human speech based on the fundamental frequency. The AGH Emotional Speech Corpus was used. This database consists of audio samples of seven emotions acted by 12 different speakers (6 female and 6 male). We explored phrases of all the emotions, all together and in various combinations. The fast Fourier transform and magnitude spectrum analysis were applied to extract the fundamental tone from the speech audio samples. After extracting several statistical features of the fundamental frequency, we studied whether they carry information about the emotional state of the speaker by applying different AI methods. Analysis of the outcome data was conducted with classifiers from the WEKA data mining toolkit: K-Nearest Neighbours with local induction, Random Forest, Bagging, JRip, and the Random Subspace Method. The results show that the fundamental frequency is a promising choice for further experiments.
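
A rough sketch of the F0-statistics idea (not the authors' exact procedure): frame the signal, take the FFT magnitude spectrum, pick the strongest peak within an assumed 60–400 Hz band as the fundamental, and summarise the resulting track with a few statistics that could be fed to a classifier. The corpus file name, band limits and energy gate are hypothetical.

```python
# Hedged sketch: per-frame F0 from the FFT magnitude spectrum, then simple
# statistics over the F0 track. All thresholds are illustrative assumptions.
import numpy as np
import soundfile as sf

def f0_statistics(wav_path, frame_len=2048, hop=512, fmin=60.0, fmax=400.0):
    x, fs = sf.read(wav_path)
    if x.ndim > 1:                                      # mix down to mono if needed
        x = x.mean(axis=1)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)
    track = []
    for start in range(0, len(x) - frame_len, hop):
        frame = x[start:start + frame_len] * np.hanning(frame_len)
        mag = np.abs(np.fft.rfft(frame))
        if mag[band].max() > 1e-3:                      # crude energy gate for voiced frames
            track.append(freqs[band][np.argmax(mag[band])])
    track = np.array(track)
    return {"mean": track.mean(), "std": track.std(),
            "min": track.min(), "max": track.max(),
            "range": track.max() - track.min()}

print(f0_statistics("anger_speaker01.wav"))             # hypothetical corpus file
```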


Authors and Affiliations

Teodora Dimitrova-Grekow
Aneta Klis
Magdalena Igras-Cybulska

Abstract

The goal of this article is to present and compare recent approaches which use speech and voice analysis as biomarkers for screening tests and the monitoring of certain diseases. The article takes into account metabolic, respiratory, cardiovascular, endocrine, and nervous system disorders. A selection of articles was performed to identify studies that quantitatively assess voice features in selected disorders by acoustic and linguistic voice analysis. Information was extracted from each paper in order to compare various aspects of the datasets, speech parameters, methods of analysis applied and results obtained. 110 research papers were reviewed and 47 databases were summarized. Speech analysis is a promising method for the early diagnosis of certain disorders. Advanced computer voice analysis with machine learning algorithms, combined with the widespread availability of smartphones, allows diagnostic analysis to be conducted during the patient’s visit to the doctor or at the patient’s home during a telephone conversation. Speech analysis is a simple, low-cost, non-invasive and easy-to-provide method of medical diagnosis. These are remarkable advantages, but there are also disadvantages. The effectiveness of disease diagnosis varies from 65% up to 99%. For that reason it should be treated as a medical screening test and as an indication of the need for classic medical tests.

Authors and Affiliations

Magdalena Igras-Cybulska (1, 2)
Daria Hemmerling (1, 2)
Mariusz Ziółko (1)
Wojciech Datka (3, 4)
Ewa Stogowska (3)
Michał Kucharski (1)
Rafał Rzepka (5)
Bartosz Ziółko (1, 5)

  1. Techmo sp. z o.o., Kraków, Poland
  2. AGH University of Science and Technology, Kraków, Poland
  3. Medical University of Bialystok, Białystok, Poland
  4. Faculty of Medicine, Jagiellonian University, Kraków, Poland
  5. Hokkaido University, Kita Ward, Sapporo, Hokkaido, Japan

Abstract

This work is complementary to Bogusław Wolniewicz’s text Elzenberg about Milosz. The circumstances surrounding the discovery of Czesław Milosz’s article Duty and Henryk Elzenberg’s polemic with it are portrayed here. Moreover, in the second part we attempt to evaluate Joseph Conrad’s novel The Rover.


Authors and Affiliations

Bogusław Wolniewicz
Jan Zubelewicz

Abstract

Speech emotion recognition (SER) is a complicated and challenging task in human-computer interaction because it is difficult to find a feature set that fully discriminates the emotional state. The FFT is commonly applied to the raw signal when extracting low-level descriptor features such as short-time energy, fundamental frequency, formants, MFCCs (mel frequency cepstral coefficients) and so on. However, these features are built in the frequency domain and ignore information from the temporal domain. In this paper, we propose a novel framework that combines a multi-layer wavelet sequence set obtained from wavelet packet reconstruction (WPR) with a conventional feature set into a mixed feature set, and performs emotion recognition with recurrent neural networks (RNN) based on the attention mechanism. In addition, since silent frames have a detrimental effect on SER, we adopt voice activity detection based on the autocorrelation function to eliminate emotionally irrelevant frames. We show that the proposed algorithm significantly outperforms traditional feature sets in the prediction of spontaneous emotional states on the IEMOCAP corpus and the EMODB database, and that it achieves better classification in both speaker-independent and speaker-dependent experiments. Notably, we obtain accuracies of 62.52% and 77.57% in the speaker-independent (SI) experiments, and 66.90% and 82.26% in the speaker-dependent (SD) experiments, on IEMOCAP and EMODB respectively.
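
A simplified sketch of the wavelet packet reconstruction (WPR) step, using the PyWavelets package: each node of the packet tree at a chosen level is reconstructed into its own time sequence, which together form a multi-layer wavelet sequence set for one frame. The wavelet, decomposition level and the random test frame are assumptions, not the paper's settings.

```python
# Hedged sketch: decompose a speech frame into a wavelet packet tree and
# reconstruct a separate time sequence from each sub-band node at one level.
import numpy as np
import pywt

def wpr_sequences(frame, wavelet="db4", level=3):
    wp = pywt.WaveletPacket(data=frame, wavelet=wavelet, mode="symmetric", maxlevel=level)
    sequences = []
    for node in wp.get_level(level, order="freq"):      # sub-band nodes, low to high
        new_wp = pywt.WaveletPacket(data=None, wavelet=wavelet, mode="symmetric", maxlevel=level)
        new_wp[node.path] = node.data                   # keep only this sub-band
        sequences.append(new_wp.reconstruct(update=False)[: len(frame)])
    return np.stack(sequences)                          # shape: (2**level, frame_length)

frame = np.random.randn(1024)                           # stand-in for one speech frame
print(wpr_sequences(frame).shape)                       # (8, 1024)
```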

Authors and Affiliations

Hao Meng (1)
Tianhao Yan (1)
Hongwei Wei (1)
Xun Ji (2)

  1. Key Laboratory of Intelligent Technology and Application of Marine Equipment (Harbin Engineering University), Ministry of Education, Harbin, 150001, China
  2. College of Marine Electrical Engineering, Dalian Maritime University, Dalian, 116026, China
