Data mining
Introduction
Data mining is the process of discovering patterns in large data sets using methods at the intersection of machine learning, statistics, and database systems. It is an interdisciplinary subfield of computer science whose overall goal is to extract information from a data set with intelligent methods and transform that information into a comprehensible structure for further use.
History
The history of data mining dates back to the 1960s, although the term 'data mining' was only introduced in the 1990s. Data mining uses a variety of data analysis tools to discover patterns and relationships in data that may be used to make valid predictions. The newest generation of data mining tools is simpler to use than the tools of the past and can be used by novice computer users.
Process
The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown, interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection), and dependencies (association rule mining, sequential pattern mining). This usually involves using database techniques such as spatial indices. These patterns can then be seen as a kind of summary of the input data, and may be used in further analysis or, for example, in machine learning and predictive analytics.
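As an illustration of the dependency-mining task described above, the following minimal sketch derives simple association rules between pairs of items. The function name `pair_rules`, the thresholds, and the shopping-basket data are assumptions made for this example; practical systems use more scalable algorithms and consider larger item sets.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) rules for item pairs.

    support    = fraction of transactions containing both items
    confidence = fraction of transactions with the antecedent that also
                 contain the consequent
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore duplicates within a transaction
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is too rare to be interesting
        # Each frequent pair can yield a rule in either direction.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


# Hypothetical shopping baskets, invented for this illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk"],
]
rules = pair_rules(baskets)
print(rules)  # [('butter', 'bread', 0.5, 1.0)]
```

With these thresholds, the only rule that survives is "butter implies bread": the pair occurs in half of the baskets, and every basket containing butter also contains bread.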
Techniques
Several major data mining techniques have been developed and used in recent data mining projects, including association, classification, clustering, prediction, sequential patterns, decision trees, neural networks, and text mining. Data mining techniques are used in many research areas, including mathematics, cybernetics, genetics, and marketing.
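To make one of the listed techniques concrete, the sketch below shows classification with a k-nearest-neighbour vote. This is only one of many possible classification methods; the function name `knn_predict`, the two-dimensional points, and the labels "low" and "high" are all assumptions chosen for the example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs, where each point is a tuple
    of numbers with the same length as `query`.
    """
    # Sort the training rows by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled records, invented for this illustration.
train = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.5), "low"),
    ((8.0, 8.0), "high"),
    ((8.5, 9.0), "high"),
    ((9.0, 8.5), "high"),
]

pred_a = knn_predict(train, (2.0, 2.0))
pred_b = knn_predict(train, (8.0, 9.0))
print(pred_a, pred_b)  # low high
```

The example uses an odd value of k with two classes so that the vote cannot end in a tie; with more classes, a tie-breaking rule would be needed.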
Applications
Data mining is used today wherever digital data is available. Notable examples can be found throughout business, medicine, science, and surveillance. While data mining can uncover hidden patterns in data, these patterns have no meaning until they are validated or can be used to support decision-making processes.
Privacy concerns and ethics
With the advent of data mining, concerns have been raised about whether privacy, confidentiality, and personal security are being threatened. There is growing public concern that personal information is being abused by both private and public sector entities. Ethical implications involve the nature of the data being mined, the way the data is stored and used, and the impact of data mining on individuals and societies.