
Further Efficient Algorithms for Generating Antibody Strings
Patrik D?haeseleer
Department of Computer Science
University of New Mexico
Albuquerque, New Mexico, 87131
[email protected]
http://www.cs.unm.edu/~patrik
Technical Report No. CS953, 11/1/95
Abstract
In this report we analyze the algorithm described by Helman et al. in [3], intended to improve the performance of the changedetection method developed by Forrest et al. [1, 2]. We present a greedy algorithm that attempts to reduce the size of the detector set based on a counting recurrence similar to that of Helman et al. In addition to counting the number of strings unmatched by self, we can also count the number of ?holes?, i.e. strings for which no detector can be constructed.
1. Introduction
The changedetection algorithm described in [1, 2] is based on the following ideas:
 We are given a set S of ?self? strings of length l, over an alphabet S of m symbols. For example, this set S might be obtained by splitting up a file into lbit strings, characterizing the normal behavior pattern of a process, etc.
 Given a matching rule between detectors and self strings, generate a set R of detectors, also called antibodies in [1, 2, 3], such that each detector fails to match any string in S. In this paper, detectors are represented as strings of the same length as the strings of S.
 Monitor S for changes by continually matching the detectors in R against the strings in S. If any of the detectors ever matches a string in S, a change is known to have occurred.
As in [3], we restrict our analysis to the case of binary strings (m=2). The algorithm presented here can easily be extended to a higherorder alphabet. The detectors themselves are binary strings of the same length (l), and the matching rule used is ?rcontiguousbits?: two strings s and s? match if and only if they are identical on at least r contiguous positions. Parameter r is chosen in advance and determines the matching probability (i.e. the probability of a match between two random strings). In general, an algorithm to generate detectors will depend on how string matching is defined, so different matching rules will require different algorithm to generate detectorss.
In [1] and [2], candidate detectors are generated randomly and tested for matches with strings in S. If a match is found, the candidate is rejected. This generation method does have the advantage that it caan be