General to Specific Beam Search


The Learn-One-Rule procedure used in CN2 is normally implemented as a general-to-specific beam search. This is not the only way, but it is reported to be the most efficient search technique. It generates complexes starting from the most general ones, then greedily specializes those complexes until the desired performance is reached.


A complex is a conjunction of attribute-value specifications. It forms the condition part of a rule, of the form "if condition then predict class".
For the examples on the sequential covering page, the most general complexes are: 
Size=Big Size=Small Colour=Red
Colour=Green Colour=Blue Shape=Square
Shape=Triangle Weight=Light Weight=Heavy

Specializing a complex means forming a conjunction of the complex with one more attribute-value pair. For example:
Colour=Green & Shape=Square (specializing Colour=Green or Shape=Square)
Colour=Blue & Weight=Heavy   (specializing Colour=Blue or Weight=Heavy)
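The specialization step above can be sketched in Python. This is a minimal illustration, not CN2's actual implementation; the representation of a complex as a dict and the attribute table are assumptions for the example.

```python
# A complex is represented here as a dict {attribute: value};
# the attribute table matches the example domain above.
ATTRIBUTES = {
    "Size": ["Big", "Small"],
    "Colour": ["Red", "Green", "Blue"],
    "Shape": ["Square", "Triangle"],
    "Weight": ["Light", "Heavy"],
}

def specialize(complex_):
    """Yield every conjunction of complex_ with one more attribute-value pair."""
    for attr, values in ATTRIBUTES.items():
        if attr in complex_:          # each attribute may appear only once
            continue
        for value in values:
            yield {**complex_, attr: value}

# Specializing Colour=Green yields, among others, Colour=Green & Shape=Square.
children = list(specialize({"Colour": "Green"}))
```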


  1. Initialize a set of most general complexes.
  2. Evaluate the performance of each complex over the example set: 
    • Count how many positive and negative examples it covers. 
    • Compute a performance score from these counts. 
  3. Sort complexes according to their performances. 
  4. If the best complex satisfies some threshold, form the hypothesis and return. 
  5. Otherwise, pick k best performing complexes for the next generation. 
  6. Specialize all k complexes to obtain a new set of less general complexes.
  7. Go to step 2.
The number k is the beam factor of the search, meaning the maximum number of complexes to be specialized.
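The steps above can be sketched as a short Python loop. The data set, the performance function (here simply the fraction of covered examples that are positive), and the stopping parameters are all illustrative assumptions, not CN2's originals.

```python
# Tiny illustrative example set: (attribute-value dict, class label).
EXAMPLES = [
    ({"Colour": "Green", "Shape": "Square"}, "+"),
    ({"Colour": "Green", "Shape": "Triangle"}, "+"),
    ({"Colour": "Red", "Shape": "Square"}, "-"),
    ({"Colour": "Blue", "Shape": "Triangle"}, "-"),
]
ATTRIBUTES = {"Colour": ["Red", "Green", "Blue"], "Shape": ["Square", "Triangle"]}

def covers(complex_, attrs):
    return all(attrs.get(a) == v for a, v in complex_.items())

def performance(complex_):
    """Fraction of covered examples that are positive (0 if none covered)."""
    covered = [label for attrs, label in EXAMPLES if covers(complex_, attrs)]
    return covered.count("+") / len(covered) if covered else 0.0

def specialize(complex_):
    for attr, values in ATTRIBUTES.items():
        if attr not in complex_:
            for value in values:
                yield {**complex_, attr: value}

def learn_one_rule(k=2, threshold=1.0, max_depth=3):
    # Step 1: initialize with the most general complexes.
    beam = [{a: v} for a, vs in ATTRIBUTES.items() for v in vs]
    for _ in range(max_depth):
        # Steps 2-3: evaluate and sort by performance.
        beam.sort(key=performance, reverse=True)
        # Step 4: return the best complex if it reaches the threshold.
        if performance(beam[0]) >= threshold:
            return beam[0]
        # Steps 5-6: keep the k best complexes and specialize them.
        beam = [s for c in beam[:k] for s in specialize(c)]
    return None
```

On this toy data the search stops at the first level, since Colour=Green already covers only positive examples.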


Below is an illustrative figure of beam search. In the first step, the 2 best complexes are found, namely A=A1 and B=B2. Neither satisfies the threshold, so the next level of complexes is expanded and the 2 best are found, e.g. A=A1 & D=D1 and B=B2 & A=A2. The procedure continues until a complex satisfying the threshold is found.

Performance evaluation functions for beam search

  There are two options for the evaluation function:

1. Using Entropy. This is the original option. The evaluation is based on the entropy of the set of examples covered by the complex. Here is an example of a hypothesis covering 8 positive and 2 negative examples.

p1 = P(positive) = 8/(2+8) = 0.8;
p2 = P(negative) = 2/(2+8) = 0.2;
Entropy = - p1 * log2(p1) - p2 * log2(p2) = 0.72.
In this function, the smaller the entropy, the better the complex. In other words, an accuracy function can be defined as (1 - Entropy).
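The calculation above can be checked with a small helper; the function name is illustrative. Base-2 logarithms are assumed, which reproduces the 0.72 figure in the text.

```python
import math

def entropy(p, n):
    """Entropy (base 2) of a set with p positive and n negative examples."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:                      # 0 * log(0) is taken as 0
            q = count / total
            result -= q * math.log2(q)
    return result

e = entropy(8, 2)   # 8 positive, 2 negative, as in the text: about 0.72
```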

2. Using Laplace Accuracy. Suppose that a complex covers y positive and n negative examples. Then its accuracy for positive-class prediction is estimated as: 

Accuracy = (y + 1) / (y + n + 2)
The advantage of the Laplace Accuracy function over Entropy shows when comparing two hypotheses: one covering 100 positive and 1 negative example, the other covering 3 positive and 0 negative examples. Using Entropy, the latter (with entropy 0) is considered better. However, the Laplace accuracy of (100+, 1-) is about 98%, while that of (3+, 0-) is 80%, so the former is judged better.

Using Laplace Accuracy, a complex's performance is evaluated as that of a hypothesis predicting the majority class among the examples the complex covers.
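The comparison between the two evaluation functions can be reproduced directly. The two-class Laplace estimate (y + 1) / (y + n + 2) is assumed here; it matches the 80% figure given in the text for the (3+, 0-) hypothesis.

```python
import math

def entropy(p, n):
    """Entropy (base 2) of a set with p positive and n negative examples."""
    total = p + n
    return -sum((c / total) * math.log2(c / total) for c in (p, n) if c)

def laplace_accuracy(y, n):
    """Two-class Laplace estimate of positive-prediction accuracy."""
    return (y + 1) / (y + n + 2)

# Entropy prefers (3+, 0-), since its entropy is exactly 0,
# while Laplace accuracy prefers (100+, 1-): about 0.98 versus 0.80.
```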


Beam search tries to find the optimal complex heuristically. It has 2 advantages: it is far cheaper than an exhaustive search through all complexes, and by keeping the k best candidates instead of only one, it is less likely than a purely greedy search to discard a promising complex too early.
Developed by Canh Hao Nguyen and Hong Chung Nguyen.