Search results

Number of results: 7

Abstract

Snoring is a typical and readily observed symptom of obstructive sleep apnea-hypopnea syndrome (OSAHS), a sleep-related respiratory disorder that adversely affects patients' lives. Detecting snoring sounds in whole-night recordings is the first and most important step in the snoring analysis of OSAHS. This paper proposes an automatic snoring detection system based on the wavelet packet transform (WPT) with an eXtreme Gradient Boosting (XGBoost) classifier, which recognizes snoring sounds in episodes enhanced by a generalized subspace noise-reduction algorithm. Feature selection based on correlation analysis is applied to choose the most discriminative WPT features. The selected features yield a sensitivity of 97.27% and a precision of 96.48% on the test set. This recognition performance demonstrates that the WPT is effective for analyzing snoring and non-snoring sounds, and that their differences are exhibited much more comprehensively by sub-bands with narrower frequency ranges. The energy of snoring sounds is distributed mainly over the low and middle frequencies, but there is also an evident difference between snoring and non-snoring sounds in the high-frequency part.
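As a concrete illustration of the pipeline this abstract describes, here is a minimal Python sketch: WPT sub-band log-energies as features, a correlation-based filter for feature selection, and an XGBoost classifier. It assumes episodes that have already been noise-reduced; the db4 wavelet, 4-level decomposition, correlation threshold, and classifier settings are illustrative assumptions, not the authors' reported configuration.

```python
# Minimal sketch: WPT sub-band log-energies + correlation-based feature
# selection + XGBoost. All parameter choices here are assumptions.
import numpy as np
import pywt
from xgboost import XGBClassifier

def wpt_subband_energies(signal, wavelet="db4", level=4):
    """Log-energy of each terminal WPT node (2**level sub-bands)."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
    nodes = wp.get_level(level, order="freq")      # low -> high frequency
    return np.array([np.log(np.sum(n.data ** 2) + 1e-12) for n in nodes])

def select_by_correlation(X, y, threshold=0.3):
    """Keep features whose |Pearson correlation| with the label is high."""
    corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.where(np.abs(corr) > threshold)[0]

def train_snore_detector(episodes, labels):
    """episodes: list of 1-D arrays (already noise-reduced); labels: 0/1."""
    X = np.vstack([wpt_subband_energies(e) for e in episodes])
    y = np.asarray(labels)
    keep = select_by_correlation(X, y)
    clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
    clf.fit(X[:, keep], y)
    return clf, keep
```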

Authors and Affiliations

Li Ding 1
Jianxin Peng 1
Xiaowen Zhang 2
Lijuan Song 2

  1. School of Physics and Optoelectronics, South China University of Technology, Guangzhou, China
  2. State Key Laboratory of Respiratory Disease, Department of Otolaryngology-Head and Neck Surgery Laboratory of ENT-HNS Disease, First Affiliated Hospital, Guangzhou Medical University, Guangzhou, China

Abstract

Buzz, squeak and rattle (BSR) noise has become apparent in vehicles owing to significant reductions in engine and road noise. BSR often occurs under driving conditions with many interfering signals, so automatic BSR detection remains a challenge for vehicle engineers. In this paper, a rattle-signal denoising and enhancement method is proposed to extract rattle components from in-vehicle background noise. The proposed method combines the advantages of wavelet packet decomposition and the mathematical morphology filter. The critical frequency band and information entropy are introduced to improve wavelet packet threshold denoising. A rattle-component enhancement method based on a multi-scale compound morphological filter is proposed, with kurtosis values used to determine the best filter parameters. To examine the feasibility of the proposed algorithm, synthetic brake-caliper rattle signals with various SNRs were prepared for verification. In the validation analysis, the proposed method removes the disturbing background noise well and extracts the rattle components with good SNRs. It is believed that the algorithm discussed in this paper can be further applied to facilitate the detection of vehicle rattle noise in industry.
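The following sketch shows one way the two stages described here could fit together, using PyWavelets for wavelet packet threshold denoising and SciPy's grey-scale morphology for the compound filter, with kurtosis selecting the structuring-element scale. The universal-threshold rule, the residual-based extraction of the impulsive part, and the scale set are assumptions borrowed from common condition-monitoring practice, not the paper's exact design.

```python
# Sketch: wavelet packet threshold denoising, then a multi-scale compound
# morphological filter with kurtosis-based scale selection (assumed form).
import numpy as np
import pywt
from scipy.ndimage import grey_closing, grey_opening
from scipy.stats import kurtosis

def wp_threshold_denoise(x, wavelet="db8", level=4):
    """Soft-threshold every terminal WPT node with a universal threshold."""
    wp = pywt.WaveletPacket(x, wavelet=wavelet, maxlevel=level)
    for node in wp.get_level(level, order="freq"):
        sigma = np.median(np.abs(node.data)) / 0.6745   # robust noise scale
        thr = sigma * np.sqrt(2 * np.log(len(node.data)))
        node.data = pywt.threshold(node.data, thr, mode="soft")
    return wp.reconstruct(update=False)[: len(x)]

def compound_morph_filter(x, size):
    """Average of opening-closing and closing-opening (a smooth baseline)."""
    oc = grey_closing(grey_opening(x, size=size), size=size)
    co = grey_opening(grey_closing(x, size=size), size=size)
    return 0.5 * (oc + co)

def enhance_rattle(x, scales=(3, 5, 7, 9)):
    """Take the residual (impulsive part) at the scale that maximizes
    kurtosis, i.e. the most impulsive candidate."""
    den = wp_threshold_denoise(x)
    candidates = [den - compound_morph_filter(den, s) for s in scales]
    return max(candidates, key=kurtosis)
```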

Authors and Affiliations

Linyuan Liang 1, 2
Shuming Chen 1, 2
Peiran Li 1

  1. State Key Laboratory of Vehicle NVH and Safety Technology, Chongqing 401122, China
  2. State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130022, China

Abstract

Nonnegative matrix factorization (NMF) is one of the most popular machine learning tools for speech enhancement (SE). However, two problems reduce the performance of traditional NMF-based SE algorithms. One is the overlap-and-add operation used in short-time Fourier transform (STFT) based signal reconstruction, and the other is the Euclidean distance commonly used as an objective function; both can cause distortion in the SE process. To overcome these shortcomings, we propose a novel SE joint framework that combines the discrete wavelet packet transform (DWPT) with Itakura-Saito nonnegative matrix factorization (ISNMF). In this approach, the speech signal is first split into a series of sub-band signals using the DWPT. Then, ISNMF is used to enhance the speech in each sub-band. Finally, the inverse DWPT is used to reconstruct the enhanced sub-band signals. The experimental results show that the proposed joint framework effectively improves speech enhancement performance and performs better in the unseen-noise case than traditional NMF methods.
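A minimal sketch of this joint framework is given below. It relies on the fact that scikit-learn's NMF supports beta_loss='itakura-saito' with the multiplicative-update solver; the framing scheme, wavelet, decomposition level, and rank are illustrative assumptions, and the step that separates speech bases from noise bases is only indicated by a comment.

```python
# Sketch of the joint framework: DWPT analysis, per-sub-band IS-NMF,
# inverse DWPT. Parameters are illustrative assumptions.
import numpy as np
import pywt
from sklearn.decomposition import NMF

def frames(x, win=256, hop=128):
    idx = range(0, len(x) - win + 1, hop)
    return np.stack([x[i:i + win] for i in idx], axis=1)   # win x n_frames

def isnmf_subband(subband, rank=8):
    V = np.abs(frames(subband)) + 1e-12     # nonnegative, no exact zeros
    nmf = NMF(n_components=rank, solver="mu", beta_loss="itakura-saito",
              init="random", max_iter=300, random_state=0)
    W = nmf.fit_transform(V)                # bases
    H = nmf.components_                     # activations
    return W, H

x = np.random.randn(32768)                  # stand-in for a noisy utterance
wp = pywt.WaveletPacket(x, wavelet="db4", maxlevel=3)
for node in wp.get_level(3, order="freq"):  # 8 sub-band signals
    W, H = isnmf_subband(node.data)
    # ...keep the speech bases of W, resynthesise, and write the enhanced
    # samples back into node.data here...
xhat = wp.reconstruct(update=False)[: len(x)]   # inverse DWPT
```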


Authors and Affiliations

Houguang Liu
Wenbo Wang
Lin Xue
Jianhua Yang
Zhihua Wang
Chunli Hua

Abstract

In this paper, a modified sound quality evaluation (SQE) model is developed based on a combination of an optimized artificial neural network (ANN) and the wavelet packet transform (WPT). The presented SQE model is a signal-processing technique that can be implemented in current microphones to predict sound quality. The proposed method extracts objective psychoacoustic metrics, including loudness, sharpness, roughness, and tonality, from sound samples by using a special selection of multi-level WPT nodes combined with a trained ANN. The model is optimized using the particle swarm optimization (PSO) and backpropagation (BP) algorithms. The obtained results reveal that the proposed model shows the lowest mean square error and the highest correlation with human perception, while having the lowest computational cost, compared with the other models and software.
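The PSO-plus-BP training idea can be sketched as follows for a small one-hidden-layer network mapping the four psychoacoustic metrics to a quality score: PSO searches the flattened weight vector globally, then plain batch backpropagation refines the best particle. Network size, swarm settings, and learning rate are illustrative assumptions, not the paper's values.

```python
# Sketch of PSO + BP training for a one-hidden-layer ANN (assumed sizes).
import numpy as np

N_IN, N_HID = 4, 8
DIM = N_IN * N_HID + N_HID + N_HID + 1          # total weight count (49)
rng = np.random.default_rng(0)

def unpack(theta):
    i = N_IN * N_HID
    W1 = theta[:i].reshape(N_IN, N_HID)
    b1 = theta[i:i + N_HID]
    W2 = theta[i + N_HID:i + 2 * N_HID].reshape(N_HID, 1)
    b2 = theta[-1:]                              # kept as 1-element array
    return W1, b1, W2, b2

def forward(theta, X):
    W1, b1, W2, b2 = unpack(theta)
    return (np.tanh(X @ W1 + b1) @ W2).ravel() + b2

def mse(theta, X, y):
    return np.mean((forward(theta, X) - y) ** 2)

def pso(X, y, n_particles=30, iters=100, w=0.7, c1=1.5, c2=1.5):
    """Global-best PSO over the flattened weight vector."""
    pos = rng.normal(scale=0.5, size=(n_particles, DIM))
    vel = np.zeros_like(pos)
    pbest, pbest_f = pos.copy(), np.array([mse(p, X, y) for p in pos])
    gbest = pbest[np.argmin(pbest_f)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, DIM))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        f = np.array([mse(p, X, y) for p in pos])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = pos[improved], f[improved]
        gbest = pbest[np.argmin(pbest_f)].copy()
    return gbest

def bp_refine(theta, X, y, lr=0.01, steps=500):
    """Batch backpropagation refinement of the PSO solution."""
    W1, b1, W2, b2 = [a.copy() for a in unpack(theta)]
    for _ in range(steps):
        h = np.tanh(X @ W1 + b1)
        err = ((h @ W2).ravel() + b2 - y)[:, None]      # dL/dy_hat
        dh = (err @ W2.T) * (1 - h ** 2)                # through tanh
        W2 -= lr * h.T @ err / len(y);  b2 -= lr * err.mean()
        W1 -= lr * X.T @ dh / len(y);   b1 -= lr * dh.mean(axis=0)
    return np.concatenate([W1.ravel(), b1, W2.ravel(), b2])

# Usage: theta = bp_refine(pso(X, y), X, y)  # X: (N, 4) metrics, y: scores
```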


Authors and Affiliations

Mehdi Pourseiedrezaei
Ali Loghmani
Mehdi Keshmiri

Abstract

Speech emotion recognition (SER) is a complicated and challenging task in human-computer interaction because it is difficult to find a feature set that fully discriminates emotional states. The FFT is conventionally used to process the raw signal when extracting low-level descriptor features such as short-time energy, fundamental frequency, formants, and mel-frequency cepstral coefficients (MFCCs). However, these features are built in the frequency domain and ignore information from the time domain. In this paper, we propose a novel framework that combines a multi-layer wavelet sequence set, obtained by wavelet packet reconstruction (WPR), with a conventional feature set to form a mixed feature set for emotion recognition using recurrent neural networks (RNNs) with an attention mechanism. In addition, since silent frames have an adverse effect on SER, voice activity detection based on the autocorrelation function is adopted to eliminate emotionally irrelevant frames. We show that the proposed algorithm significantly outperforms the traditional feature set in predicting spontaneous emotional states on the IEMOCAP corpus and the EMODB database, achieving better classification in both speaker-independent and speaker-dependent experiments. Notably, we obtain accuracies of 62.52% and 77.57% in the speaker-independent (SI) setting, and 66.90% and 82.26% in the speaker-dependent (SD) setting, on the two databases respectively.
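Below is a hedged sketch of the two pre-processing steps named in this abstract: autocorrelation-based voice activity detection and wavelet packet reconstruction of individual sub-band signals. Frame sizes, the periodicity threshold, and the wavelet are assumptions, and the attention-based RNN itself is not reproduced.

```python
# Sketch of the pre-processing: (1) autocorrelation VAD to drop silent
# frames, (2) WPR of each terminal node. Parameters are assumptions.
import numpy as np
import pywt

def vad_autocorr(x, fs, win_ms=25, hop_ms=10, thresh=0.3):
    """Keep frames whose peak normalized autocorrelation (off lag 0) is
    high: a simple periodicity cue that discards silent/irrelevant frames."""
    win, hop = int(fs * win_ms / 1000), int(fs * hop_ms / 1000)
    kept = []
    for start in range(0, len(x) - win, hop):
        f = x[start:start + win] - x[start:start + win].mean()
        ac = np.correlate(f, f, mode="full")[win - 1:]
        if ac[0] > 0 and ac[1:].max() / ac[0] > thresh:
            kept.append(f)
    return kept

def wpr_subbands(x, wavelet="db4", level=3):
    """Reconstruct a time-domain signal from each terminal WPT node."""
    wp = pywt.WaveletPacket(x, wavelet=wavelet, maxlevel=level)
    subbands = []
    for node in wp.get_level(level, order="freq"):
        solo = pywt.WaveletPacket(None, wavelet=wavelet, maxlevel=level)
        solo[node.path] = node.data              # copy a single node
        subbands.append(solo.reconstruct(update=False)[: len(x)])
    return subbands                              # one signal per sub-band
```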


Authors and Affiliations

Hao Meng 1
Tianhao Yan 1
Hongwei Wei 1
Xun Ji 2

  1. Key Laboratory of Intelligent Technology and Application of Marine Equipment (Harbin Engineering University), Ministry of Education, Harbin, 150001, China
  2. College of Marine Electrical Engineering, Dalian Maritime University, Dalian, 116026, China

Abstract

The main objective of this paper is to provide an applications-oriented review of infrared techniques and devices. First, the fundamentals of infrared systems are presented, with emphasis on thermal emission, scene radiation and contrast, cooling techniques, and optics. Special attention is given to night vision and thermal imaging concepts. The next section concentrates briefly on selected infrared systems, arranged in order of increasing complexity: from image-intensifier systems, through thermal imaging systems, to space-based systems. Active and passive smart-weapon seekers are also described in this section. Finally, other important infrared techniques and devices are briefly described, among them non-contact thermometers, radiometers, LIDAR, and infrared gas sensors.


Authors and Affiliations

A. Rogalski
K. Chrzanowski

Abstract

Traditional frequency analysis is not appropriate for observing the properties of non-stationary signals, because time resolution is not defined in the Fourier spectrum. Thus, there is a need for methods implementing joint time-frequency (t/f) analysis. Practical aspects of some representative methods of time-frequency analysis, including the Short-Time Fourier Transform, the Gabor Transform, the Wigner-Ville Transform, and the Cone-Shaped Transform, are described in this paper. Unfortunately, in t/f analysis the width of the time-frequency window is not tied to its frequency content. The wavelet transform does not share this limitation. A wavelet is a wave-like oscillation that forms its own "wavelet window": compressing the wavelet narrows the window, and vice versa. Individual wavelet functions are therefore well localized in time and, simultaneously, in scale (the equivalent of frequency). Wavelet analysis owes its effectiveness to the pyramid algorithm described by Mallat, which enables fast decomposition of a signal into wavelet components.
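A short demonstration of Mallat's pyramid algorithm, as implemented by PyWavelets' wavedec, makes this time-scale trade-off concrete; the test signal and wavelet are arbitrary choices for illustration.

```python
# Mallat's pyramid algorithm via pywt.wavedec: each level halves the
# analysed band, so deeper levels are finer in scale but coarser in time.
import numpy as np
import pywt

fs = 1000
t = np.arange(0, 1, 1 / fs)
# Non-stationary test signal: a 50 Hz burst followed by a 200 Hz burst.
x = np.where(t < 0.5, np.sin(2 * np.pi * 50 * t), np.sin(2 * np.pi * 200 * t))

coeffs = pywt.wavedec(x, "db4", level=4)        # [cA4, cD4, cD3, cD2, cD1]
bands = ["A4 (0-31 Hz)", "D4 (31-63 Hz)", "D3 (63-125 Hz)",
         "D2 (125-250 Hz)", "D1 (250-500 Hz)"]
for name, c in zip(bands, coeffs):
    print(f"{name:16s} {len(c):4d} coeffs, energy {np.sum(c ** 2):8.2f}")
# The 50 Hz burst shows up in D4 and the 200 Hz burst in D2, while the
# coefficient counts halve from D1 down to D4 (coarser time resolution).
```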


Authors and Affiliations

Andrzej Majkowski
Marcin Kołodziej
Remigiusz J. Rak
