To appear in the 12th International Conference on Data Engineering (ICDE'96), New Orleans, Louisiana, February/March 1996.
Knowledge Discovery from Telecommunication
Network Alarm Databases
K. Hätönen M. Klemettinen H. Mannila
P. Ronkainen H. Toivonen
University of Helsinki, Department of Computer Science
P.O. Box 26, FIN-00014 Helsinki, Finland
e-mail: [email protected]
A telecommunication network produces large amounts of alarm data every day. The data contains valuable hidden knowledge about the behavior of the network. This knowledge can be used for filtering redundant alarms, locating problems in the network, and possibly predicting severe faults. We describe the TASA (Telecommunication Network Alarm Sequence Analyzer) system for discovering and browsing knowledge from large alarm databases.
The system is built on a view of knowledge discovery as an interactive and iterative process that includes data collection, pattern discovery, rule postprocessing, etc. The system uses a novel framework for locating frequently occurring episodes from sequential data.
The TASA system offers a variety of selection and ordering criteria for episodes, and supports iterative retrieval from the discovered knowledge. This means that a large part of the iterative nature of the KDD process can be replaced by iteration in the rule postprocessing stage. The user interface is based on dynamically generated HTML. The system is in experimental use, and the results are encouraging: some of the discovered knowledge is being integrated into the alarm handling software of telecommunication operators.
Knowledge discovery in databases (KDD) has recently attracted a lot of interest from researchers and users of database systems; see [5, 13] for overviews. KDD combines methods and tools from machine learning, statistics, and databases. It can be loosely defined as the task of obtaining useful and interesting knowledge from large collections of data.
KDD is an iterative and interactive process [3, 4]. At the core of the KDD process are the algorithms for discovering different types of patterns (rules, trends, etc.) from data. However, the inference of patterns is only a small part of the whole problem of obtaining useful knowledge. Among the other tasks are the following:
1. data collection and cleaning (what types of data can be used, how errors in the data are handled, what is to be done with missing data, etc.); identification of the necessary background knowledge;
2. choice of pattern discovery methods (what types of knowledge are to be discovered, parameter selection, etc.);
3. discovery of patterns (data mining);¹
4. postprocessing the discovered knowledge (selection of truly interesting patterns, presentation of patterns, etc.);
5. putting the discovered knowledge into use.
In this paper we describe the TASA (Telecommunication Network Alarm Sequence Analyzer) system for discovering knowledge from telecommunication network alarm databases. The system incorporates components for two parts of the KDD process: pattern discovery (or data mining) and postprocessing.
The knowledge discovered in TASA is expressed in terms of rules. The system is based on a novel framework for locating frequently occurring episodes from sequential data. The algorithms we use are based on methods presented in , and they have a completeness property: they find from the data all rules having certain properties.
We use these algorithms to efficiently find large sets of patterns in the data (typically thousands of rules). Once a large collection of rules has been found, a large part of the iterative nature of the KDD process can be replaced by iteration in the rule postprocessing stage.
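To make the idea of frequently occurring episodes with a completeness property more concrete, the following is a minimal brute-force sketch, not the TASA algorithms themselves. It rests on several assumptions that are not stated in this section: an alarm is a pair (integer time, alarm type), an episode is an ordered pair of alarm types, and the frequency of an episode is the fraction of sliding time windows of a given width in which the two alarm types occur in that order. The names `sliding_windows`, `occurs_in_order`, and `frequent_serial_pairs` are hypothetical.

```python
from itertools import permutations


def sliding_windows(events, width):
    """Return the events seen through each window [start, start + width).

    `events` is a list of (integer_time, alarm_type) pairs. Windows that only
    partly overlap the sequence are included, so every event falls into
    exactly `width` windows. (Assumed definition, for illustration only.)
    """
    if not events:
        return []
    first = min(t for t, _ in events)
    last = max(t for t, _ in events)
    return [
        [(t, a) for t, a in events if start <= t < start + width]
        for start in range(first - width + 1, last + 1)
    ]


def occurs_in_order(window, episode):
    """True if the alarm types in `episode` appear in `window` in that order."""
    position = 0
    # Stable sort by time; events with equal times keep their input order.
    for _, alarm_type in sorted(window, key=lambda event: event[0]):
        if alarm_type == episode[position]:
            position += 1
            if position == len(episode):
                return True
    return False


def frequent_serial_pairs(events, width, min_frequency):
    """Return every ordered pair of distinct alarm types whose window
    frequency is at least `min_frequency`.

    All candidate pairs are enumerated, so no qualifying pair is missed;
    this illustrates completeness, not efficiency.
    """
    windows = sliding_windows(events, width)
    if not windows:
        return {}
    alarm_types = sorted({a for _, a in events})
    result = {}
    for episode in permutations(alarm_types, 2):
        hits = sum(occurs_in_order(w, episode) for w in windows)
        frequency = hits / len(windows)
        if frequency >= min_frequency:
            result[episode] = frequency
    return result


# Hypothetical alarm sequence: alarm type A is twice followed closely by B.
example_events = [(1, "A"), (2, "B"), (5, "A"), (6, "B"), (9, "C")]
example_result = frequent_serial_pairs(example_events, width=3, min_frequency=0.3)
```

Exhaustive enumeration is practical only for very small inputs; it is used here solely because it makes the completeness property easy to see.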
The TASA system offers a variety of selection and ordering criteria, and supports iterative retrieval from the discovered knowledge. The users can manipulate the collection of discovered rules using selection and ordering operations, as well as more complex operations for including or excluding certain classes of rules.
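The selection, exclusion, and ordering operations described above can be sketched as simple functions over a rule collection. This is an illustrative sketch only: the `Rule` fields (`antecedent`, `consequent`, `confidence`, `frequency`), the function names, and the alarm names are assumptions introduced here, and they do not describe the actual TASA data model or interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record: if `antecedent` alarms occur, `consequent` follows."""
    antecedent: tuple
    consequent: str
    confidence: float
    frequency: float


def select(rules, predicate):
    """Keep only the rules for which `predicate` holds."""
    return [rule for rule in rules if predicate(rule)]


def exclude(rules, predicate):
    """Drop the rules for which `predicate` holds."""
    return [rule for rule in rules if not predicate(rule)]


def order_by(rules, key, descending=True):
    """Return the rules sorted by `key`, largest first by default."""
    return sorted(rules, key=key, reverse=descending)


# Made-up rules with invented alarm names and measures.
example_rules = [
    Rule(("link_failure",), "high_error_rate", 0.9, 0.05),
    Rule(("link_failure", "power_loss"), "node_down", 0.6, 0.01),
    Rule(("door_open",), "temperature_alarm", 0.3, 0.20),
]

# Iterative retrieval: each step narrows or reorders the previous result
# without rerunning pattern discovery.
confident = select(example_rules, lambda r: r.confidence >= 0.5)
without_power = exclude(confident, lambda r: "power_loss" in r.antecedent)
by_frequency = order_by(example_rules, key=lambda r: r.frequency)
```

Because each operation returns a new list and leaves its input unchanged, the user can back up to an earlier result and try a different criterion, which matches the iterative style of retrieval described in the text.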
¹ We use here the terminology of Fayyad et al. , where "data mining" refers to the pattern extraction part of the KDD process.