Search results

Number of results: 3

Abstract

This paper describes a hybrid of a Deep Belief Neural Network (DBNN) and a Bidirectional Long Short-Term Memory (BLSTM) network used as an acoustic model for speech recognition. Many independent researchers have demonstrated that DBNNs achieve superior speech recognition accuracy compared with other known machine learning frameworks, a superiority that stems from their deep learning architecture. However, a trained DBNN is simply a feed-forward network with no internal memory, unlike Recurrent Neural Networks (RNNs), which are Turing complete and do possess internal memory, allowing them to make use of longer context. In this paper, an experiment is performed in which a DBNN is combined with an advanced bidirectional RNN that processes its output. Results show that using the new DBNN-BLSTM hybrid as the acoustic model for Large Vocabulary Continuous Speech Recognition (LVCSR) increases word recognition accuracy. However, the new model has many parameters and may in some cases suffer performance issues in real-time applications.
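To make the architecture concrete, the sketch below illustrates the general DBNN-BLSTM idea in PyTorch. It is not the authors' implementation: the layer sizes, feature dimension and phone inventory are illustrative assumptions. A trained DBNN is modelled as a plain feed-forward sigmoid stack (its weights would normally come from pre-trained RBMs), and its per-frame outputs are post-processed by a bidirectional LSTM that adds temporal context before per-frame phone posteriors are computed.

```python
# Minimal sketch of a DBNN-BLSTM hybrid acoustic model.
# All sizes (39 features, 512/256 hidden units, 40 phones) are assumptions.
import torch
import torch.nn as nn

class DBNNBLSTM(nn.Module):
    def __init__(self, n_features=39, n_hidden=512, n_lstm=256, n_phones=40):
        super().__init__()
        # Feed-forward DBN-style sigmoid stack: frame-wise, no internal memory.
        self.dbnn = nn.Sequential(
            nn.Linear(n_features, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, n_hidden), nn.Sigmoid(),
        )
        # Bidirectional LSTM reads the DBNN outputs in both directions,
        # giving the model access to longer context.
        self.blstm = nn.LSTM(n_hidden, n_lstm, batch_first=True,
                             bidirectional=True)
        self.out = nn.Linear(2 * n_lstm, n_phones)  # per-frame posteriors

    def forward(self, x):           # x: (batch, frames, n_features)
        h = self.dbnn(x)            # frame-wise feed-forward pass
        h, _ = self.blstm(h)        # adds bidirectional temporal context
        return self.out(h)          # (batch, frames, n_phones)

# Example: 2 utterances of 100 frames with 39 MFCC-like features each.
logits = DBNNBLSTM()(torch.randn(2, 100, 39))
print(logits.shape)  # torch.Size([2, 100, 40])
```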

Authors and Affiliations

Łukasz Brocki
Krzysztof Marasek

Abstract

Variation in powertrain parameters caused by dimensioning, manufacturing and assembly inaccuracies may prevent model-based virtual sensors from representing physical powertrains accurately. Data-driven virtual sensors employing machine learning models offer a way to account for these parameter variations, which can be included efficiently in the training of the virtual sensor through simulation. The trained model can then, in theory, be applied to real systems via transfer learning, allowing a data-driven virtual sensor to be trained without the notoriously labour-intensive step of gathering data from a real powertrain. This research presents a training procedure for a data-driven virtual sensor, developed for a powertrain consisting of multiple shafts, couplings and gears. The procedure generalizes the virtual sensor over a single powertrain with variations corresponding to the aforementioned inaccuracies, using parameter randomization and random excitation. That is, the virtual sensor was trained on data from multiple different powertrain instances that all represent roughly the same powertrain. The virtual sensor trained on multiple instances of a simulated powertrain accurately estimated the rotating speeds and torque of the loaded shaft of multiple simulated test powertrains, with the estimates computed from the rotating speeds and torque at the motor shaft. This research gives excellent grounds for further studies towards simulation-to-reality transfer learning, in which a virtual sensor is trained with simulated data and then applied to a real system.
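The parameter-randomization idea can be illustrated with a small simulation sketch. The two-inertia torsional model, the nominal parameter values and the ±10% spread below are assumptions chosen for illustration, not the paper's simulator: each training instance is a perturbed copy of "roughly the same" powertrain driven by random excitation torque, and the virtual-sensor targets are the load-side speed and shaft torque given signals measured at the motor shaft.

```python
# Sketch of data generation with parameter randomization and random
# excitation for a virtual sensor; model and values are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def simulate_instance(T=2.0, dt=1e-3):
    # Randomize parameters to mimic dimensioning, manufacturing and
    # assembly inaccuracies (nominal values and +/-10% spread assumed).
    J1 = 0.01 * rng.uniform(0.9, 1.1)    # motor-side inertia [kg m^2]
    J2 = 0.02 * rng.uniform(0.9, 1.1)    # load-side inertia  [kg m^2]
    k  = 100.0 * rng.uniform(0.9, 1.1)   # shaft stiffness    [N m/rad]
    c  = 0.05 * rng.uniform(0.9, 1.1)    # shaft damping      [N m s/rad]

    th1 = th2 = w1 = w2 = 0.0
    X, y = [], []
    for _ in range(int(T / dt)):
        tau_in = rng.normal(0.0, 1.0)            # random excitation torque
        tau_s = k * (th1 - th2) + c * (w1 - w2)  # shaft torque
        w1 += dt * (tau_in - tau_s) / J1         # semi-implicit Euler step
        w2 += dt * tau_s / J2
        th1 += dt * w1
        th2 += dt * w2
        X.append([w1, tau_in])   # measured at the motor shaft
        y.append([w2, tau_s])    # virtual-sensor targets at the load
    return np.array(X), np.array(y)

# Pool data from many randomized instances of "roughly the same"
# powertrain; a regressor trained on this pool must generalize
# across the parameter variations.
data = [simulate_instance() for _ in range(20)]
X = np.concatenate([d[0] for d in data])
y = np.concatenate([d[1] for d in data])
print(X.shape, y.shape)  # (40000, 2) (40000, 2)
```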

Authors and Affiliations

Aku Karhinen 1
Aleksanteri Hämäläinen 1
Mikael Manngard 2
Jesse Miettinen 1
Raine Viitala 1

  1. Department of Mechanical Engineering, Aalto University, 02150, Espoo, Finland
  2. Novia University of Applied Sciences, Juhana Herttuan puistokatu 21, 20100 Turku, Finland

Abstract

Safety and security are a prime priority in people's lives, and a surveillance system at home helps keep people and their property secure. In this paper, an audio surveillance system is proposed that performs both detection and localization of audio or sound events. This combined task is known as Sound Event Localization and Detection (SELD). SELD is carried out here with a Convolutional Recurrent Neural Network (CRNN) architecture: a stack of convolutional neural network (CNN), recurrent neural network (RNN) and fully connected neural network (FNN) layers. The CRNN takes multichannel audio as input, extracts features, and performs detection and localization of the input audio events in parallel. The SELD results obtained by the CRNN with a gated recurrent unit (GRU) and with a long short-term memory (LSTM) unit are compared and discussed in this paper. The CRNN with the LSTM unit achieves a 75% F1 score and 82.8% frame recall for one overlapping sound. The proposed audio surveillance system using the LSTM unit therefore gives better detection and overall performance for one overlapping sound.
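The CRNN described above follows the common SELD pattern of convolutional feature extraction, a recurrent layer and parallel fully connected output branches. Below is a minimal PyTorch sketch of such an architecture; the layer sizes, pooling scheme and class count are assumptions, and the rnn_type switch mirrors the paper's GRU-versus-LSTM comparison rather than reproducing its exact network.

```python
# Sketch of a CRNN for SELD: CNN -> RNN (GRU or LSTM) -> parallel
# detection (SED) and localization (DOA) branches. Sizes are assumptions.
import torch
import torch.nn as nn

class SELDSketch(nn.Module):
    def __init__(self, n_channels=4, n_classes=11, rnn_type="lstm"):
        super().__init__()
        # CNN extracts features from multichannel spectrograms, pooling
        # only along frequency so per-frame time resolution is kept.
        self.cnn = nn.Sequential(
            nn.Conv2d(n_channels, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 8)),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 8)),   # 64 mel bins -> 1 after two pools
        )
        # RNN models temporal context; GRU and LSTM are the two
        # variants compared in the paper.
        rnn_cls = nn.LSTM if rnn_type == "lstm" else nn.GRU
        self.rnn = rnn_cls(64, 128, batch_first=True, bidirectional=True)
        # FNN: two parallel branches for detection and localization.
        self.fc_sed = nn.Linear(256, n_classes)      # class activity
        self.fc_doa = nn.Linear(256, 3 * n_classes)  # (x, y, z) per class

    def forward(self, x):                  # x: (batch, channels, frames, mels)
        h = self.cnn(x)                    # (batch, 64, frames, 1)
        h = h.squeeze(-1).transpose(1, 2)  # (batch, frames, 64)
        h, _ = self.rnn(h)                 # (batch, frames, 256)
        sed = torch.sigmoid(self.fc_sed(h))  # per-frame event activity
        doa = torch.tanh(self.fc_doa(h))     # per-frame direction estimates
        return sed, doa

# Example: batch of 2 four-channel inputs, 100 frames, 64 mel bins.
sed, doa = SELDSketch()(torch.randn(2, 4, 100, 64))
print(sed.shape, doa.shape)  # (2, 100, 11) and (2, 100, 33)
```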


Authors and Affiliations

V. S. Suruthhi 1
V. Smita 1
Rolant Gini J. 1
K.I. Ramachandran 2

  1. Department of Electronics and Communication Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, India
  2. Centre for Computational Engineering & Networking (CEN), Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, India
