A common observation of everyday life reveals the growing importance of data science methods, which are becoming an increasingly important part of the mainstream knowledge-generation process. Digital technologies, with their potential for data collection and processing, have given rise to the fourth paradigm of science, based on Big Data. Key to these transformations are datafication and data mining, which allow knowledge to be discovered from contaminated data. The main purpose of the considerations presented here is to describe the phenomena that make up these processes and to indicate their possible epistemological consequences. It is assumed that growing datafication tendencies may lead to a data-centric perception of all aspects of reality, making data and the methods of processing them a kind of higher instance shaping human thinking about the world. The research is theoretical in nature. The process of datafication and data science are analyzed with a focus on the areas that raise doubts about the validity of this form of cognition.
Power big data contains a large amount of information related to equipment faults, and analyzing and processing these data makes fault diagnosis possible. This study analyzed the application of association rules to power big data processing. First, association rules and the Apriori algorithm were introduced. Then, to address the shortcomings of the Apriori algorithm, an IM-Apriori algorithm was designed and evaluated in a simulation experiment. The results showed that the IM-Apriori algorithm had a significantly shorter running time than the Apriori algorithm: with 100 000 transactions, it ran 38.42% faster. The IM-Apriori algorithm was little affected by the value of the minimum support threshold (supportmin). Compared with the Extreme Learning Machine (ELM), the IM-Apriori algorithm achieved better accuracy. The experimental results demonstrate the effectiveness of the IM-Apriori algorithm in fault diagnosis, and it can be further promoted and applied to power grid equipment.
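The classic Apriori algorithm that IM-Apriori starts from can be sketched in a few lines. This is a minimal pure-Python illustration of standard level-wise Apriori (candidate join, subset pruning, support counting) followed by confidence-based rule extraction; it does not reproduce the IM-Apriori modifications, which are not detailed here. The fault-record transactions and event names are hypothetical, and `min_support` plays the role of supportmin.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Support is the fraction of transactions that contain the itemset.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    singletons = {frozenset([item]) for b in baskets for item in b}
    current = {s: support(s) for s in singletons}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates and
        # prune any candidate with an infrequent (k-1)-subset (Apriori property).
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count step: keep only candidates that reach the support threshold.
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
    return frequent

def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples for qualifying rules."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                lhs = frozenset(antecedent)
                confidence = sup / frequent[lhs]  # lhs is frequent by downward closure
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, confidence))
    return rules

# Hypothetical fault records: each transaction lists events observed together.
fault_log = [
    {"overheat", "high_load", "trip"},
    {"overheat", "trip"},
    {"high_load", "trip"},
    {"overheat", "high_load", "trip"},
    {"low_voltage"},
]
frequent = apriori(fault_log, min_support=0.4)
rules = association_rules(frequent, min_confidence=0.9)
```

With these toy records, `low_voltage` falls below the support threshold, while a rule such as overheat → trip reaches confidence 1.0. Scanning every transaction for every candidate is exactly the cost that faster Apriori variants try to cut.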
The aim of the study was to evaluate the applicability of different data mining methods to modelling the inflow of sewage into a municipal sewage treatment plant. Prediction models were developed using support vector machines (SVM), random forests (RF), k-nearest neighbours (k-NN) and kernel regression (K). The data consisted of time series of daily rainfall, water level measurements in the receiver of the clarified sewage, and wastewater inflow into the Rzeszow city plant. The results indicate that among models with one input delayed by 1 day, the best were obtained with the k-NN method and the worst with the K method. For models with two input variables and one explanatory variable, the smallest errors were obtained when the model inputs were sewage inflow and rainfall delayed by 1 day; here the RF method provided the best fit and the K method the worst. For models with three inputs and two explanatory variables, the best results were obtained with SVM and the worst with the K method. In most modelling runs, the smallest prediction errors were obtained with the SVM method and the largest with the K method. For the simplest model, with one input delayed by 1 day, the k-NN method gave the best results, and among the models with two inputs the RF method proved best in two modelling runs.
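To make the "one input delayed by 1 day" setup concrete, the sketch below builds lag-1 input/target pairs from a daily series and compares two of the four methods, k-NN regression and Gaussian-kernel (Nadaraya-Watson) regression, by mean absolute error. It is a minimal pure-Python illustration under stated assumptions: the inflow values are made up (not the Rzeszow data), and `k=3` and the bandwidth are arbitrary choices, not the study's settings. SVM and RF would normally come from a machine-learning library and are omitted.

```python
import math

def lagged_pairs(series, lag=1):
    """(input, target) pairs for a one-input model: x[t - lag] -> x[t]."""
    return [(series[t - lag], series[t]) for t in range(lag, len(series))]

def knn_predict(train, x, k=3):
    """k-NN regression: mean target of the k training inputs closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def kernel_predict(train, x, bandwidth=1.0):
    """Nadaraya-Watson kernel regression with a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((xi - x) / bandwidth) ** 2) for xi, _ in train]
    return sum(w * y for w, (_, y) in zip(weights, train)) / sum(weights)

def mean_absolute_error(test, predict):
    return sum(abs(predict(x) - y) for x, y in test) / len(test)

# Hypothetical daily inflow values (illustrative only).
inflow = [52.0, 55.0, 61.0, 58.0, 54.0, 70.0, 66.0, 59.0, 56.0, 60.0]
pairs = lagged_pairs(inflow, lag=1)
train, test = pairs[:7], pairs[7:]
print("k-NN MAE:", mean_absolute_error(test, lambda x: knn_predict(train, x)))
print("kernel MAE:", mean_absolute_error(test, lambda x: kernel_predict(train, x, bandwidth=3.0)))
```

Adding rainfall as a second input would turn each input into a tuple and the distance into a multivariate one; the lag construction stays the same.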
This article presents a methodology for the exploratory analysis of data from microstructural studies of compacted graphite iron, aimed at gaining
knowledge about the factors that favour the formation of ausferrite. The studies led to the development of rules for evaluating the ausferrite
content on the basis of chemical composition. Data mining methods were used to generate regression models such as boosted trees,
random forests and piecewise regression models. Developing a stepwise regression modelling process on iteratively limited
data sets made it possible both to improve forecasting precision and to gain deeper knowledge about
ausferrite formation. Repeated examination of the significance of various factors in different regression models allowed
identification of the variables with the greatest influence on the ausferrite content in different ranges of parameter variability.
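Piecewise regression captures the idea that a factor can act differently in different ranges of its variability. The sketch below fits the simplest such model, two ordinary least-squares lines joined at a single breakpoint chosen by exhaustive search over split positions. It is a generic illustration, not the article's stepwise procedure on iteratively limited sets; `min_points` and the synthetic data are assumptions.

```python
def ols(xs, ys):
    """Simple least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - slope * mx, slope

def sse(xs, ys, intercept, slope):
    """Sum of squared residuals of a line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def piecewise_fit(xs, ys, min_points=3):
    """Try every split of the x-sorted data (at least `min_points` per side) and
    keep the one whose two separate lines give the lowest total squared error."""
    pts = sorted(zip(xs, ys))
    best = None
    for i in range(min_points, len(pts) - min_points + 1):
        lx, ly = zip(*pts[:i])
        rx, ry = zip(*pts[i:])
        left, right = ols(lx, ly), ols(rx, ry)
        err = sse(lx, ly, *left) + sse(rx, ry, *right)
        if best is None or err < best["sse"]:
            best = {"sse": err, "breakpoint": rx[0], "left": left, "right": right}
    return best

# Synthetic data whose slope and level change at x = 5 (illustrative only).
xs = list(range(10))
ys = [x if x < 5 else 10 + 3 * x for x in xs]
model = piecewise_fit(xs, ys)
```

Here the search recovers the breakpoint at x = 5 with two exact segment fits. Refitting on a restricted subset of the data and repeating is one way to approximate the iterative narrowing that the abstract describes.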
The paper analyses distorted data from an electronic nose used to recognize bio-based gasoline additives. Various data mining tools are applied, including data clustering, principal component analysis, wavelet transformation, support vector machines and random forests of decision trees. Special emphasis is placed on the robustness of the signal processing system to the noise distorting the registered sensor signals. A denoising procedure based on the discrete wavelet transformation is proposed, which significantly reduces the recognition error rate. The numerical results of experiments on the recognition of different gasoline blends show the superiority of the support vector machine in a noisy measurement environment.
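Discrete-wavelet denoising generally follows three steps: decompose the signal, shrink the small detail coefficients that mostly carry noise, and reconstruct. The sketch below shows a textbook variant (orthonormal Haar wavelet, noise level estimated from the median absolute finest-level detail, universal threshold, soft thresholding). It is a generic illustration, not the paper's specific procedure, and the wavelet choice, number of levels and toy sensor signal are assumptions.

```python
import math
import statistics

SQRT_HALF = 1 / math.sqrt(2)

def haar_dwt(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    approx = [(signal[i] + signal[i + 1]) * SQRT_HALF for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * SQRT_HALF for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * SQRT_HALF, (a - d) * SQRT_HALF])
    return out

def soft_threshold(c, t):
    """Shrink coefficient c towards zero by t."""
    return math.copysign(max(abs(c) - t, 0.0), c)

def denoise(signal, levels=2):
    """Multilevel Haar decomposition, soft thresholding of all details, reconstruction."""
    assert len(signal) % (2 ** levels) == 0, "length must be divisible by 2**levels"
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)  # details[0] is the finest level
    # Noise estimate from the finest level: sigma = median(|d|) / 0.6745.
    sigma = statistics.median([abs(c) for c in details[0]]) / 0.6745
    t = sigma * math.sqrt(2 * math.log(len(signal)))  # universal threshold
    for d in reversed(details):  # reconstruct from the coarsest level upwards
        approx = haar_idwt(approx, [soft_threshold(c, t) for c in d])
    return approx

# Toy sensor response: a step plus alternating measurement noise (illustrative only).
clean = [0.0] * 8 + [4.0] * 8
noisy = [c + (0.1 if i % 2 == 0 else -0.1) for i, c in enumerate(clean)]
restored = denoise(noisy, levels=2)
```

On this toy signal, all of the noise lands in the finest detail coefficients, so thresholding removes it and the step is recovered. Real sensor noise is rarely that clean, which is why the choice of wavelet and threshold matters in practice.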
Decision-making processes, including those related to ill-structured problems, are of considerable significance in construction projects. Computer-aided inference under such conditions requires specific (non-algorithmic) methods and tools, of which expert systems are the best recognized and most successfully used in practice. The knowledge that such systems need for inference is most frequently acquired directly from experts (through a dialogue between a domain expert and a knowledge engineer) and from various source documents. Little is known, however, about the possibility of automating knowledge acquisition in this area, and as a result it is scarcely ever used in practice. It should be noted that in numerous areas of management, more and more attention is paid to acquiring knowledge from available data, and various methods and tools for doing so are known and successfully employed in supporting decision-making. The paper attempts to select methods for knowledge discovery in data and presents possible ways of representing the acquired knowledge, as well as sample tools (including programming tools) that allow this knowledge to be used in the area under consideration.
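One common way to represent discovered knowledge for an expert system is as IF-THEN rules applied by an inference engine. The sketch below shows a minimal forward-chaining engine in Python. The rules and fact names are hypothetical examples of what rule induction from construction project records might yield; they are not taken from the paper, and the engine is a generic illustration rather than one of the paper's tools.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """IF all `conditions` hold THEN assert `conclusion`."""
    conditions: frozenset
    conclusion: str

def forward_chain(facts, rules):
    """Fire rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.conditions <= known and rule.conclusion not in known:
                known.add(rule.conclusion)
                changed = True
    return known

# Hypothetical rules, e.g. as they might be induced from project records.
rules = [
    Rule(frozenset({"subcontractor_delay", "tight_schedule"}), "high_delay_risk"),
    Rule(frozenset({"high_delay_risk"}), "revise_schedule"),
    Rule(frozenset({"winter_season", "concrete_works"}), "plan_heating"),
]
derived = forward_chain({"subcontractor_delay", "tight_schedule"}, rules)
```

Chaining lets one induced rule feed another, so here `revise_schedule` is derived in two steps, while `plan_heating` is not, because its conditions are absent.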
The application of the 5S methodology to warehouse management represents an important
step for all manufacturing companies, especially for managing products that consist of
a large number of components. Moreover, from a lean production point of view, inventory
management requires reducing inventory waste in terms of costs, quantities and time
spent on non-value-added tasks. Moving towards an Industry 4.0 environment requires a deeper understanding
of the data provided by production processes and supply chain operations,
and the application of Data Mining techniques can provide valuable support in pursuing this objective.
In this context, a procedure is proposed that aims to reduce the number and duration of picking
processes in an Automated Storage and Retrieval System (AS/RS). Association Rule Mining is applied
to reduce the time wasted during the storage and retrieval of components
and finished products, pursuing the space and material management philosophy expressed
by the 5S methodology. The first step of the proposed procedure requires the evaluation
of the picking frequency for each component. Historical data are analyzed to extract the
association rules describing the sets of components frequently belonging to the same order.
Then, the allocation of items in the Automated Storage and Retrieval System is performed
considering (a) the association degree, i.e., the confidence of the rule, between the components
under analysis and (b) spatial availability. The main contribution of this work is
the development of a versatile procedure for eliminating wasted time in the picking processes
of an AS/RS. A real-life example from a manufacturing company is also presented to illustrate
the proposed procedure, and further research directions worth investigating are outlined.
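The two ingredients named above, picking frequency and rule confidence between components, can be combined into a simple allocation heuristic. The sketch below computes both from historical orders and then places items greedily: the most frequently picked item takes the slot nearest the I/O point, and each next item is the unplaced one most strongly associated with an already placed item. This greedy rule, the one-dimensional slot ordering and the order data are hypothetical assumptions, not the paper's procedure.

```python
from collections import Counter
from itertools import permutations

def pick_frequency(orders):
    """Number of orders in which each component appears."""
    return Counter(item for order in orders for item in set(order))

def pair_confidence(orders):
    """conf(a -> b) = (orders containing a and b) / (orders containing a)."""
    freq = pick_frequency(orders)
    together = Counter()
    for order in orders:
        for a, b in permutations(set(order), 2):
            together[(a, b)] += 1
    return {(a, b): n / freq[a] for (a, b), n in together.items()}

def allocate(orders, slots):
    """Hypothetical greedy slot assignment.

    `slots` lists free slot ids from nearest to farthest from the I/O point.
    Each next item is the unplaced one with the highest confidence from any
    placed item (ties broken by pick frequency); it takes the nearest free slot.
    """
    freq = pick_frequency(orders)
    conf = pair_confidence(orders)
    free = list(slots)
    remaining = sorted(freq, key=lambda i: (-freq[i], i))
    placement = {}
    while remaining and free:
        if placement:
            item = max(
                remaining,
                key=lambda i: (max(conf.get((p, i), 0.0) for p in placement), freq[i]),
            )
        else:
            item = remaining[0]  # most frequently picked item goes first
        remaining.remove(item)
        placement[item] = free.pop(0)
    return placement

# Hypothetical order history (component ids per order).
orders = [["A", "B"], ["A", "B", "C"], ["A", "D"], ["C", "D"], ["A", "B"]]
layout = allocate(orders, ["S1", "S2", "S3", "S4"])
```

In this toy history, B always appears with A (confidence 1.0), so it is placed right next to A. A real AS/RS would also constrain slots by item size and bay capacity, which is the role of spatial availability in the procedure.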
The paper presents a key-finding algorithm based on the music signature concept. The proposed music signature is a set of 2-D vectors that can be treated as a compressed representation of musical content in 2-D space. Each vector represents a different pitch class. Its direction is determined by the position of the corresponding major key in the circle of fifths, and its length reflects the multiplicity (i.e., the number of occurrences) of the pitch class in a musical piece or fragment. The paper presents the theoretical background, examples explaining the essence of the idea, and the results of tests confirming the effectiveness of the proposed algorithm for finding the key from the music signature. The developed method was compared with key-finding algorithms using the Krumhansl-Kessler, Temperley and Albrecht-Shanahan profiles. The experiments were performed on a set of Bach preludes, Bach fugues and Chopin preludes.
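The signature construction described above translates directly into code. In the sketch below, each pitch class gets a vector whose angle is its circle-of-fifths position times 30 degrees and whose length is its number of occurrences. The vectors are then summed into a resultant. Measuring angles clockwise from 12 o'clock and using sharp spellings are assumed conventions, and the paper's own rule for turning the signature into a key is not reproduced. `nearest_fifths_position` only reports where the resultant points.

```python
import math

# Pitch classes in circle-of-fifths order (sharp spellings assumed); index = position.
CIRCLE_OF_FIFTHS = ["C", "G", "D", "A", "E", "B", "F#", "C#", "G#", "D#", "A#", "F"]
STEP = 2 * math.pi / 12  # 30 degrees between neighbouring keys

def music_signature(pitch_classes):
    """Return {pitch class: (x, y)}: direction from the circle-of-fifths position
    (measured clockwise from 12 o'clock, an assumed convention), length = count."""
    counts = {pc: 0 for pc in CIRCLE_OF_FIFTHS}
    for pc in pitch_classes:
        counts[pc] += 1
    return {
        pc: (counts[pc] * math.sin(pos * STEP), counts[pc] * math.cos(pos * STEP))
        for pos, pc in enumerate(CIRCLE_OF_FIFTHS)
    }

def resultant(signature):
    """Vector sum of all pitch-class vectors."""
    return (sum(v[0] for v in signature.values()), sum(v[1] for v in signature.values()))

def nearest_fifths_position(vector):
    """Circle-of-fifths label closest to the direction of `vector`."""
    angle = math.atan2(vector[0], vector[1]) % (2 * math.pi)  # clockwise from 12 o'clock
    return CIRCLE_OF_FIFTHS[round(angle / STEP) % 12]

c_major_scale = list("CDEFGAB")
direction = nearest_fifths_position(resultant(music_signature(c_major_scale)))
```

Under this convention, the C-major scale occupies the arc from F to B, so its resultant points at D, two fifths clockwise of the tonic. The direction alone is therefore not the key; some calibrated decision step is still needed on top of the signature.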
The use of quantitative methods, including stochastic and exploratory techniques, in environmental studies appears insufficient in practice. There is neither a comprehensive analytical system dedicated to this issue nor research on the subject. The aim of this study is to present the Eco Data Miner system: its idea, its construction and the possibility of implementing it in existing environmental information systems. Methodologically, the emphasis is on assessing the quality of one-dimensional data with the proposed QAAH1 method, which combines a harmonic model and robust estimators with classical outlier tests and their iterative extensions. The results demonstrate both that the proposed solution complements the classical methods and that it significantly extends their range of application. Its practical usefulness is also considerable, owing to its high effectiveness, numerical efficiency and ease of use.
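The general idea of pairing a harmonic model with robust estimators can be sketched as follows: fit one seasonal harmonic, then screen the residuals with a median/MAD-based modified z-score instead of a mean/standard-deviation test. This is a generic illustration, not the QAAH1 method itself, whose details are not given here. The period, the 3.5 cut-off (the usual Iglewicz-Hoaglin choice) and the synthetic series are assumptions.

```python
import math
import statistics

def fit_harmonic(values, period):
    """Least-squares fit of y_t = a0 + a1*cos(w t) + b1*sin(w t), w = 2*pi/period.

    The closed-form Fourier coefficients used here are exact least-squares
    estimates when the series covers a whole number of periods (period >= 3).
    """
    n = len(values)
    if period < 3 or n % period:
        raise ValueError("series must cover a whole number of periods (period >= 3)")
    w = 2 * math.pi / period
    a0 = sum(values) / n
    a1 = 2 / n * sum(y * math.cos(w * t) for t, y in enumerate(values))
    b1 = 2 / n * sum(y * math.sin(w * t) for t, y in enumerate(values))
    return [a0 + a1 * math.cos(w * t) + b1 * math.sin(w * t) for t in range(n)]

def robust_outliers(values, period, cutoff=3.5):
    """Indices whose harmonic-model residual has a modified z-score above `cutoff`."""
    fitted = fit_harmonic(values, period)
    resid = [y - f for y, f in zip(values, fitted)]
    med = statistics.median(resid)
    mad = statistics.median([abs(r - med) for r in resid])
    if mad == 0:
        return []
    return [t for t, r in enumerate(resid) if abs(0.6745 * (r - med) / mad) > cutoff]

# Synthetic monthly series over two years with one gross error (illustrative only).
series = [10 + 5 * math.cos(2 * math.pi * t / 12) + 0.1 * (-1) ** t for t in range(24)]
series[7] += 20
flagged = robust_outliers(series, period=12)
```

The spike distorts the harmonic fit somewhat, but the median and MAD are barely affected by a single extreme value, so only the spike is flagged. A classical 3-sigma screen on the same residuals is less reliable, because the outlier inflates the standard deviation it is measured against.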