page 1  (11 pages) 2

Further Efficient Algorithms for Generating Antibody Strings

Patrik D?haeseleer
Department of Computer Science
University of New Mexico
Albuquerque, New Mexico, 87131
[email protected]
http://www.cs.unm.edu/~patrik

Technical Report No. CS95-3, 11/1/95

Abstract

In this report we analyze the algorithm described by Helman et al. in [3], intended to improve the performance of the change-detection method developed by Forrest et al. [1, 2]. We present a greedy algorithm that attempts to reduce the size of the detector set based on a counting recurrence similar to that of Helman et al. In addition to counting the number of strings unmatched by self, we can also count the number of ?holes?, i.e. strings for which no detector can be constructed.

1. Introduction

The change-detection algorithm described in [1, 2] is based on the following ideas:

- We are given a set S of ?self? strings of length l, over an alphabet S of m symbols. For example, this set S might be obtained by splitting up a file into l-bit strings, characterizing the normal behavior pattern of a process, etc.

- Given a matching rule between detectors and self strings, generate a set R of detectors, also called antibodies in [1, 2, 3], such that each detector fails to match any string in S. In this paper, detectors are represented as strings of the same length as the strings of S.

- Monitor S for changes by continually matching the detectors in R against the strings in S. If any of the detectors ever matches a string in S, a change is known to have occurred.

As in [3], we restrict our analysis to the case of binary strings (m=2). The algorithm presented here can easily be extended to a higher-order alphabet. The detectors themselves are binary strings of the same length (l), and the matching rule used is ?r-contiguous-bits?: two strings s and s? match if and only if they are identical on at least r contiguous positions. Parameter r is chosen in advance and determines the matching probability (i.e. the probability of a match between two random strings). In general, an algorithm to generate detectors will depend on how string matching is defined, so different matching rules will require different algorithm to generate detectorss.

In [1] and [2], candidate detectors are generated randomly and tested for matches with strings in S. If a match is found, the candidate is rejected. This generation method does have the advantage that it caan be