Search results

Number of results: 7

Abstract

The aim of the article is to analyze Russian words transcribed into the Polish alphabet extracted from the texts of a Polish conservative-liberal author, S. Michalkiewicz, from the years 2003−2015. The lists of both correctly and incorrectly transcribed units are presented and the mistranscribed words are examined. The categories of transcription errors are provided along with the examples of words in which they occur. The results of the analysis may serve as a point of reference in further studies concerning adherence to the transcription rules of Russian performed on a larger number of texts written by a greater variety of authors.

Authors and Affiliations

Daniel Dzienisiewicz

Abstract

This paper describes the research behind a Large-Vocabulary Continuous Speech Recognition (LVCSR) system for the transcription of Senate speeches in the Polish language. The system utilizes several components: a phonetic transcription system, language and acoustic model training systems, a Voice Activity Detector (VAD), an LVCSR decoder, and a subtitle generator and presentation system. Some of the modules relied on already available tools and some had to be built from scratch, but the authors ensured that they used the most advanced techniques available to them at the time. Finally, several experiments were performed to compare the performance of more modern and more conventional technologies.

Authors and Affiliations

Krzysztof Marasek
Danijel Koržinek
Łukasz Brocki

Abstract

In the paper, various approaches to automatic music audio summarization are discussed. The project described in detail is the realization of a method for extracting a music thumbnail: a continuous fragment of music of a given duration that is most similar to the entire piece. The results of a subjective assessment of the thumbnail choice are presented, in which four parameters were taken into account: clarity (representation of the essence of the piece of music), conciseness (the motifs are not repeated in the summary), coherence of music structure, and overall quality of summary usefulness.
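
The thumbnail idea above can be illustrated with a minimal sketch. The abstract does not say which features or similarity measure the authors used, so everything here is an assumption: each frame is a hypothetical feature vector, a window is summarized by its mean vector, and similarity to the whole piece is measured by cosine similarity. The function names are made up for illustration.

```python
import math


def mean_vector(frames):
    """Average a list of equal-length feature vectors component-wise."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_thumbnail(frames, length):
    """Return (start_index, score) of the continuous window of `length`
    frames whose mean feature vector is most similar to the mean of the
    whole piece. This is an assumed criterion, not the authors' method."""
    whole = mean_vector(frames)
    best_start, best_score = 0, -1.0
    for start in range(len(frames) - length + 1):
        window = frames[start:start + length]
        score = cosine(mean_vector(window), whole)
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


# Hypothetical two-dimensional features for six frames.
toy_frames = [[1, 0], [1, 0], [1, 1], [1, 1], [0, 1], [0, 1]]
start, score = pick_thumbnail(toy_frames, length=2)
print(start, round(score, 3))
```

A real system would work on audio features and would likely also consider repetition structure; the sketch only shows the "window most similar to the whole" selection step.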

Authors and Affiliations

Jakub Głaczyński
Ewa Łukasik

Abstract

Orthographic-To-Phonetic (O2P) Transcription is the process of learning the relationship between the written word and its phonetic transcription. It is a necessary part of Text-To-Speech (TTS) systems and it plays an important role in handling Out-Of-Vocabulary (OOV) words in Automatic Speech Recognition systems. O2P is a complex task because, for many languages, the correspondence between the orthography and its phonetic transcription is not completely consistent. Over time, the techniques used to tackle this problem have evolved from earlier rule-based systems to the current, more sophisticated machine learning approaches. In this paper, we propose an approach for Arabic O2P Conversion based on a probabilistic method: Conditional Random Fields (CRF). We discuss the experiments and results of applying this method to a pronunciation dictionary of the Most Commonly used Arabic Words, a database that we call MCAW-Dic. MCAW-Dic contains over 35 000 words in Modern Standard Arabic (MSA) and their pronunciations, and we developed it ourselves with the assistance of phoneticians and linguists from the University of Tlemcen. The results achieved are very satisfactory and point the way towards future innovations. Indeed, in all our tests, the Phoneme Error Rate (the error rate on the transcription of phonemes) was between 11% and 15%. We could improve this result by including a larger context, but in this case we encountered memory limitations and calculation difficulties.
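
As an illustration of the metric quoted above, the following minimal sketch computes a Phoneme Error Rate using its common definition: the total edit distance (substitutions, insertions and deletions) between predicted and reference phoneme sequences, divided by the total number of reference phonemes. The abstract does not spell out the exact formula the authors used, so this definition is an assumption, and the phoneme strings in the example are invented for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """Total edit distance divided by total number of reference phonemes."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return errors / total


# Invented example: two words, one substituted phoneme in the first.
refs = [["k", "i", "t", "a:", "b"], ["q", "a", "l", "a", "m"]]
hyps = [["k", "i", "t", "a", "b"], ["q", "a", "l", "a", "m"]]
print(phoneme_error_rate(refs, hyps))
```

With this definition, a rate between 11% and 15% means roughly 11 to 15 phoneme edits per 100 reference phonemes.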

Authors and Affiliations

El-Hadi Cherifi (1)
Mhania Guerti (1)

  1. Department of Electronics, Signal and Communications Laboratory, National Polytechnic School, El-Harrach 16200, Algiers, Algeria

Abstract

Onion yellow dwarf virus (OYDV), an aphid-borne potyvirus, is one of the major viral pathogens of garlic, causing significant yield losses worldwide. It is found almost everywhere in the world where Allium species are grown. The aim of this study was to test for the presence of OYDV infection in garlic from Ethiopia. The presence of the virus was tested by reverse transcription polymerase chain reaction (RT-PCR). Direct sequencing of the PCR product produced a sequence of 296 bp. Sequence analysis showed 89.27% sequence homology with an isolate from Australia (HQ258894) and 89.29% with an isolate from Spain (JX429964). A phylogenetic tree constructed with MEGA 7.0 revealed high levels of homology with various isolates of OYDV from all over the world and thus further confirmed the identity of the virus.

Authors and Affiliations

Yohanis Kebede
Jyoti Singh
Shahana Majumder

Abstract

In the experiments performed, insoluble polyvinylpolypyrrolidone (PVPP) was used as an additive to the extraction buffer for the isolation of total nucleic acids from hop plants and grapevine, in order to obtain templates useful for the detection of HLVd and HSVd by means of RT-PCR. The addition of 2% PVPP to the original GTC buffer (Chomczynski and Sacchi, 1987) appeared to be the most favorable. With the PVPP addition, the nucleic acid extraction protocol was simplified: isolation time was shortened and expenses were reduced. However, application of the simplified method for obtaining templates that guaranteed full repeatability of test results was limited to the spring and early summer season.

Authors and Affiliations

Mieczysław Cajza
Wojciech Folkman

Abstract

In the spring of 2019, many plants, mainly winter wheat, were observed to have dwarfism and leaf yellowing symptoms. These plants were collected from several regions of Poland and sent to the Plant Disease Clinic of the Institute of Plant Protection – National Research Institute in Poznań to be tested for the presence of viral diseases. Double antibody sandwich enzyme-linked immunosorbent assay (DAS-ELISA) results showed numerous cases of Wheat dwarf virus (WDV) and a few cases of plant infections caused by Barley yellow dwarf viruses (BYDVs). WDV was detected in 163 out of 236 tested winter wheat plants (69.1%), in 10 out of 27 tested winter barley plants (37%) and in 6 out of 7 triticale plants (85.7%), while BYDVs were found in 9.7% (23 out of 236) of tested winter wheat plants and in 18.5% (5 out of 27) of tested winter barley plants. Infected plants came mainly from the regions of Lower Silesia and Greater Poland. Furthermore, individual cases of infection were also confirmed in the following districts: Lubusz, Opole, Silesia, Kuyavia-Pomerania and Warmia-Masuria. Results of Duplex-immunocapture-polymerase chain reaction (Duplex-IC-PCR) indicated the dominance of the WDV-W form in wheat and the WDV-B form in barley plants. Moreover, results of reverse transcription – polymerase chain reaction (RT-PCR) combined with restriction fragment length polymorphism (RFLP) analysis, performed for 17 BYDV samples, revealed 8 BYDV-PAS, 4 BYDV-MAV and 2 BYDV-PAV infections, as well as two mixed infections of BYDV-MAV/-PAS and one case of BYDV-MAV/-PAV. Next, RT-PCR reactions confirmed a single BYDV-GAV infection and the common presence of BYDV-SGV. To the best of our knowledge, in 2020 the viruses were not a big threat to cereal crops in Poland.
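
The detection rates quoted above follow directly from the reported counts. The short sketch below recomputes them; the counts are taken from the abstract, while the dictionary labels and the helper name `percent` are only illustrative.

```python
def percent(positive, tested):
    """Share of positive samples, in percent, rounded to one decimal place."""
    return round(100 * positive / tested, 1)


# (positive, tested) pairs as reported in the abstract.
counts = {
    "WDV, winter wheat": (163, 236),
    "WDV, winter barley": (10, 27),
    "WDV, triticale": (6, 7),
    "BYDVs, winter wheat": (23, 236),
    "BYDVs, winter barley": (5, 27),
}

for label, (positive, tested) in counts.items():
    print(f"{label}: {positive}/{tested} = {percent(positive, tested)}%")
```

The recomputed values agree with the percentages in the abstract (the 37% figure for winter barley corresponds to 37.0% at one decimal place).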

Authors and Affiliations

Katarzyna Trzmiel