Fuzzy Decision Trees

Introduction

A fuzzy decision tree is a collection of branches in which the end points correspond to outcomes and the branches correspond to the paths that lead to those outcomes. Unless you already know some AI, the title doesn't mean much on its own, so I shall split the explanation into two parts: the principles of a decision tree in machine learning, and fuzzy decision trees. A ridiculously simple example follows.

Aircraft Landing – A Simplified Decision Tree Example

In this table the decision to be made is whether the aircraft should apply its airbrakes. The attributes to consider are the aircraft speed, which can be slow, fast or very fast, and the runway size, which can be short, medium or long. The conclusion is whether the aircraft should apply its air brakes.

Table 1.

Speed       Runway Size   Apply Air Brakes
Slow        Short         Yes
Slow        Medium        No
Slow        Long          No
Fast        Short         Yes
Fast        Medium        Yes
Fast        Long          No
Very Fast   Short         Yes
Very Fast   Medium        Yes
Very Fast   Long          Yes

Figure 1 – Aircraft Decision Tree

Explanation of the Decision Tree

The decision tree in Figure 1 can be used to program rules into a computer. In my case the programming language was Fril, created at the University of Bristol's AI department and the brainchild of Professor Jim Baldwin.

Some rules read from branches of the diagram are as follows:

If (speed = slow) AND (runway = short): Apply airbrakes.

If (speed = fast) AND (runway = short OR runway = medium): Apply airbrakes.

If (speed = very fast): Apply airbrakes.

If (speed = fast) AND (runway = long): Don't apply airbrakes.
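
Read as code, the same branches become a short function. Below is a minimal sketch in Python rather than Fril; the function and argument names are just my own illustration.

def apply_airbrakes(speed, runway):
    """Decide whether to apply airbrakes, following the rules above."""
    if speed == "very fast":
        return True                      # very fast: always brake
    if speed == "fast":
        return runway in ("short", "medium")
    if speed == "slow":
        return runway == "short"         # slow only needs brakes on a short runway
    raise ValueError("unknown speed: " + speed)

print(apply_airbrakes("fast", "long"))   # False: don't apply airbrakes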

This raises the following question: how do we construct a decision tree with the minimal number of branches, so that the fewest lines of code are needed to program the computer?

The answer lies in calculating which node produces the most information; the result of such a calculation is called the information gain. The attribute with the highest information gain forms the first branch, the next highest forms the next, and this process is repeated for all branches until all the data is classified.
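
As a rough illustration, the standard entropy-based version of this calculation can be written as follows in Python; the rows/labels layout is my own and not part of ID3 itself.

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy, in bits, of a list of class labels.
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(rows, attribute, labels):
    # Entropy of the whole table minus the weighted entropy of the
    # subsets produced by splitting on one attribute.
    total = len(labels)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder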

An algorithm called ID3 was used to create code that flew and landed an aircraft from one destination to another, safely.

Table Minimization with ID3

The following example is for the classification of cats. The resulting decision tree is formed from calculations based on the information gain.

Table 2 Cat or Not

Diet    Size    Colour    Habitat   Species    Class “cat”
Meat    Large   Striped   Jungle    Tiger      Yes
Meat    Large   Tawny     Jungle    Lion       Yes
Meat    Small   Striped   House     Tabby      Yes
Meat    Small   Brown     Jungle    Weasel     No
Grass   Large   Striped   Plains    Zebra      No
Grass   Small   Grey      Plains    Rabbit     No
Grass   Large   Tawny     Plains    Antelope   No

I will not go into the detailed maths of the information gain, but what should be noted is that the table is formed of four attributes (diet, size, colour and habitat), a label (species) and a classification, which is whether the animal is a cat or not.
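
Using the information_gain sketch from earlier, Table 2 can be encoded and the attributes ranked. The dictionary layout below is just one possible encoding; the species label is left out because it names individual rows rather than generalising.

# Table 2 as data (relies on information_gain defined above).
rows = [
    {"diet": "meat",  "size": "large", "colour": "striped", "habitat": "jungle"},   # tiger
    {"diet": "meat",  "size": "large", "colour": "tawny",   "habitat": "jungle"},   # lion
    {"diet": "meat",  "size": "small", "colour": "striped", "habitat": "house"},    # tabby
    {"diet": "meat",  "size": "small", "colour": "brown",   "habitat": "jungle"},   # weasel
    {"diet": "grass", "size": "large", "colour": "striped", "habitat": "plains"},   # zebra
    {"diet": "grass", "size": "small", "colour": "grey",    "habitat": "plains"},   # rabbit
    {"diet": "grass", "size": "large", "colour": "tawny",   "habitat": "plains"},   # antelope
]
labels = ["yes", "yes", "yes", "no", "no", "no", "no"]

# Rank the attributes; the one with the highest gain becomes the root node.
for attr in ("diet", "size", "colour", "habitat"):
    print(attr, round(information_gain(rows, attr, labels), 3))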

Figure 2 – Cat Decision Tree

The resulting decision tree is shown in Figure 2. I have applied the information gain calculation at each branch in the decision tree to determine which branch comes first, second and third. The result is that once the two attributes with the highest information gain have been selected, an optimal or near optimal decision tree is formed and the table is simplified to just a few rules.
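
The branch-ordering idea can also be written as a short recursive sketch of ID3, again using the information_gain function and the rows/labels from the sketches above. The stopping rules shown are the usual ones, not necessarily the exact variant I used.

from collections import Counter

def build_tree(rows, labels, attributes):
    # Leaf: every remaining row already has the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left to split on: fall back to the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Otherwise split on the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(rows, a, labels))
    branches = {}
    for value in set(row[best] for row in rows):
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in keep],
                                     [labels[i] for i in keep],
                                     [a for a in attributes if a != best])
    return {best: branches}

tree = build_tree(rows, labels, ["diet", "size", "colour", "habitat"])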

Fuzzy Decision Trees

Instead of a value being simply true or false, fuzzy logic measures truth with a probability, and it is up to the user of the data to enforce the boundaries. For example, if the data was said to be true with a probability of 0.59, we could say that the data is uncertain, provided we have set the uncertainty boundaries at, say, 0.4 to 0.6. If it was true with a probability of 0.76, we could say that the data is true, assuming the truth boundary is anything upwards of 0.60.

The fuzzy decision tree algorithm uses a tree formation algorithm called mass assignment ID3. A classification is not determined by a yes or no answer but by the probability of the classification being true, together with the boundaries for true, uncertain and false. The branches of the tree are formed by positioning those with the highest information gain at the front of the tree. More often than not, this produces near optimal decision trees from data whose attributes are not clearly defined.
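
A sketch of just the boundary step described above; the 0.4 and 0.6 thresholds are the example values from the text, not fixed constants of the algorithm.

def truth_category(probability, lower=0.4, upper=0.6):
    # Map a probability of truth onto true / uncertain / false.
    if probability >= upper:
        return "true"
    if probability <= lower:
        return "false"
    return "uncertain"

print(truth_category(0.59))   # uncertain
print(truth_category(0.76))   # true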