## General to Specific Beam Search

### Introduction

The Learn-One-Rule procedure used in CN2 is normally implemented as a general-to-specific beam search. This is not the only possible search strategy, but it is reported to be among the most efficient. The search starts from the most general complexes, then greedily specializes them until the desired performance is reached.

### Terminology

A complex is a conjunction of attribute-value specifications. It forms the condition part of a rule of the form "if condition then predict class".
For the examples on the sequential covering page, the most general complexes are:
 Size=Big, Size=Small, Colour=Red, Colour=Green, Colour=Blue, Shape=Square, Shape=Triangle, Weight=Light, Weight=Heavy

Specializing a complex means conjoining it with one additional attribute-value pair. For example:
 Colour=Green & Shape=Square (specializing Colour=Green or Shape=Square)
 Colour=Blue & Weight=Heavy (specializing Colour=Blue or Weight=Heavy)
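These two notions can be sketched in code. The attribute dictionary below reuses the example attributes from above; representing a complex as a Python dict of attribute-value pairs is an illustrative choice, not part of the original CN2 formulation.

```python
# Attributes and their possible values, taken from the example above.
ATTRIBUTES = {
    "Size": ["Big", "Small"],
    "Colour": ["Red", "Green", "Blue"],
    "Shape": ["Square", "Triangle"],
    "Weight": ["Light", "Heavy"],
}

def most_general_complexes(attributes):
    """One single-condition complex per attribute-value pair."""
    return [{attr: val} for attr, vals in attributes.items() for val in vals]

def specialize(complex_, attributes):
    """Conjoin the complex with one attribute-value pair it does not mention yet."""
    return [{**complex_, attr: val}
            for attr, vals in attributes.items() if attr not in complex_
            for val in vals]

print(len(most_general_complexes(ATTRIBUTES)))        # → 9
print(specialize({"Colour": "Green"}, ATTRIBUTES))    # 6 specializations, e.g. Colour=Green & Size=Big
```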

### Algorithm

1. Initialize a set of most general complexes.
2. Evaluate the performance of each complex over the example set:
• Count how many positive and negative examples it covers.
• Compute its score with the evaluation function.
3. Sort the complexes by their performance.
4. If the best complex satisfies some threshold, form the hypothesis and return.
5. Otherwise, keep the k best-performing complexes for the next generation.
6. Specialize all k complexes to obtain a new set of less general complexes.
7. Go to step 2.
The number k is the beam factor (beam width) of the search: the maximum number of complexes kept for specialization at each step.
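The steps above can be sketched as follows. This is a minimal illustration, not a faithful CN2 implementation: examples are assumed to be (attribute-dict, is-positive) pairs, and Laplace accuracy (one of the evaluation functions discussed later) is hard-coded as the score.

```python
def learn_one_rule(examples, attributes, beam_width=2, threshold=0.9):
    """Sketch of Learn-One-Rule as a general-to-specific beam search.

    `examples` is a list of (attribute_dict, is_positive) pairs.
    Laplace accuracy serves as the evaluation function."""
    def covers(cpx, ex):
        return all(ex.get(a) == v for a, v in cpx.items())

    def score(cpx):
        pos = sum(1 for ex, lbl in examples if covers(cpx, ex) and lbl)
        neg = sum(1 for ex, lbl in examples if covers(cpx, ex) and not lbl)
        return (pos + 1) / (pos + neg + 2)

    def specialize(cpx):
        return [{**cpx, a: v}
                for a, vals in attributes.items() if a not in cpx
                for v in vals]

    # Step 1: the most general complexes (one attribute-value pair each).
    beam = [{a: v} for a, vals in attributes.items() for v in vals]
    best = max(beam, key=score)
    while beam:
        beam.sort(key=score, reverse=True)               # steps 2-3
        if score(beam[0]) > score(best):
            best = beam[0]
        if score(best) >= threshold:                     # step 4: good enough
            break
        beam = beam[:beam_width]                         # step 5: keep k best
        beam = [s for c in beam for s in specialize(c)]  # step 6: specialize
    return best

examples = [({"Colour": "Green", "Size": "Big"}, True),
            ({"Colour": "Green", "Size": "Small"}, True),
            ({"Colour": "Red", "Size": "Big"}, False)]
rule = learn_one_rule(examples,
                      {"Colour": ["Red", "Green"], "Size": ["Big", "Small"]},
                      beam_width=2, threshold=0.7)
print(rule)  # → {'Colour': 'Green'}
```

When all attributes of a complex are used up, `specialize` returns nothing, so the beam empties and the search stops even if the threshold was never met.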

### Visualization

Below is an illustrative figure of beam search. In the first step, the 2 best complexes found are A=A1 and B=B2. Neither satisfies the threshold, so the next level of complexes is expanded, yielding the 2 best complexes A=A1 & D=D1 and B=B2 & A=A2. The procedure continues until a complex satisfying the threshold is found.

### Evaluation functions for beam search

There are two options for the evaluation function:

1. Entropy. This is the original option. The evaluation is based on the entropy of the set of examples covered by the complex. For example, for a hypothesis covering 8 positive and 2 negative examples:

 p1 = P(positive) = 8/(8+2) = 0.8
 p2 = P(negative) = 2/(8+2) = 0.2
 Entropy = - p1 * log2(p1) - p2 * log2(p2) ≈ 0.72

 In this function, the smaller the entropy, the better the complex. In other words, an accuracy function can be defined as (1 - Entropy).

2. Laplace accuracy. Suppose that a complex covers y positive and n negative examples. Then its accuracy for positive-class prediction is estimated as:

 LaplaceAccuracy = (y + 1) / (y + n + 2)

 The advantage of Laplace accuracy over entropy shows when comparing two hypotheses: one covering 100 positive and 1 negative example, the other covering 3 positive and 0 negative examples. By entropy, the latter (with entropy = 0) is considered better. However, the Laplace accuracy of (100+, 1-) is 101/103 ≈ 98%, while that of (3+, 0-) is 4/5 = 80%, so the former is judged better. Using Laplace accuracy, a complex is evaluated as a hypothesis whose predicted class is the majority class among the examples it covers.
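Both evaluation functions are easy to compute directly; the sketch below reproduces the worked numbers from this section (two-class case, entropy in base 2).

```python
import math

def entropy(pos, neg):
    """Two-class entropy (base 2) of the example set covered by a complex."""
    total = pos + neg
    e = 0.0
    for p in (pos / total, neg / total):
        if p > 0:                # 0 * log(0) is taken as 0
            e -= p * math.log2(p)
    return e

def laplace_accuracy(pos, neg):
    """Laplace-corrected accuracy for the positive class (two classes)."""
    return (pos + 1) / (pos + neg + 2)

print(round(entropy(8, 2), 2))             # → 0.72
print(round(laplace_accuracy(100, 1), 2))  # → 0.98
print(round(laplace_accuracy(3, 0), 2))    # → 0.8
```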