Multi Expression Programming

horizontal rule

Home
What's new?
People
MEP description
Papers
Source Code
Software
Links

Content

MEP representation

Decoding MEP chromosome and the fitness assignment process

Why encoding multiple solutions within a chromosome?

How to encode multiple solutions within a chromosome?

 

 

MEP representation

MEP genes are (represented by) substrings of a variable length. The number of genes per chromosome is constant. This number defines the length of the chromosome. Each gene encodes a terminal or a function symbol. A gene that encodes a function includes pointers towards the function arguments. Function arguments always have indices of lower values than the position of the function itself in the chromosome.

The proposed representation ensures that no cycle arises while the chromosome is decoded (phenotypically transcripted). According to the proposed representation scheme, the first symbol of the chromosome must be a terminal symbol. In this way, only syntactically correct programs (MEP individuals) are obtained.

Example

Consider a representation where the numbers on the left positions stand for gene labels. Labels do not belong to the chromosome, as they are provided only for explanation purposes.

For this example we use the set of functions

F = {+, *},

and the set of terminals

T = {a, b, c, d}.

An example of chromosome using the sets F and T is given below:

1: a

2: b

3: + 1, 2

4: c

5: d

6: + 4, 5

7: * 3, 6

 

The maximum number of symbols in MEP chromosome is given by the formula:

Number_of_Symbols = (n + 1) * (Number_of_Genes – 1) + 1,

where n is the number of arguments of the function with the greatest number of arguments.

The maximum number of effective symbols is achieved when each gene (excepting the first one) encodes a function symbol with the highest number of arguments. The minimum number of effective symbols is equal to the number of genes and it is achieved when all genes encode terminal symbols only.

 

Decoding MEP chromosome and the fitness assignment process

Now we are ready to describe how MEP individuals are translated into computer programs. This translation represents the phenotypic transcription of the MEP chromosomes.

Phenotypic translation is obtained by parsing the chromosome top-down. A terminal symbol specifies a simple expression. A function symbol specifies a complex expression obtained by connecting the operands specified by the argument positions with the current function symbol.

For instance, genes 1, 2, 4 and 5 in the previous example encode simple expressions formed by a single terminal symbol. These expressions are:

E1 = a,

E2 = b,

E4 = c,

E5 = d,

Gene 3 indicates the operation + on the operands located at positions 1 and 2 of the chromosome. Therefore gene 3 encodes the expression:

E3 = a + b.

Gene 6 indicates the operation + on the operands located at positions 4 and 5. Therefore gene 6 encodes the expression:

E6 = c + d.

Gene 7 indicates the operation * on the operands located at position 3 and 6. Therefore gene 7 encodes the expression:

E7 = (a + b) * (c + d).

E7 is the expression encoded by the whole chromosome.

There is neither practical nor theoretical evidence that one of these expressions is better than the others. This is why each MEP chromosome is allowed to encode a number of expressions equal to the chromosome length (number of genes). The chromosome described above encodes the following expressions:

E1 = a,

E2 = b,

E3 = a + b,

E4 = c,

E5 = d,

E6 = c + d,

E7 = (a + b) * (c + d).

The value of these expressions may be computed by reading the chromosome top down. Partial results are computed by dynamic programming and are stored in a conventional manner.

Due to its multi expression representation, each MEP chromosome may be viewed as a forest of trees rather than as a single tree, which is the case of Genetic Programming.

As MEP chromosome encodes more than one problem solution, it is interesting to see how the fitness is assigned.

The chromosome fitness is usually defined as the fitness of the best expression encoded by that chromosome.

For instance, if we want to solve symbolic regression problems, the fitness of each sub-expression Ei may be computed using the formula:

where ok,i is the result obtained by the expression Ei for the fitness case k and wk is the targeted result for the fitness case k. In this case the fitness needs to be minimized.

The fitness of an individual is set to be equal to the lowest fitness of the expressions encoded in the chromosome:

When we have to deal with other problems, we compute the fitness of each sub-expression encoded in the MEP chromosome. Thus, the fitness of the entire individual is supplied by the fitness of the best expression encoded in that chromosome.

 

Why encoding multiple solutions within a chromosome?

When you compute the value of an expression encoded as a GP tree you have the compute the value of all subtrees. This means that all GP subtrees can be viewed as a potential solution of the problem being solved. Most of the GP techniques considers only the one tree while ignoring all the other subtrees. However, the value/fitness for all subtrees is computed by GP. The biggest difference between MEP and other GP techniques is that MEP outputs the best subtree encoded in a chromosome. Note that the complexity (roughly speaking - the running time) is the same for MEP and other GP techniques encoding 1 solution/chromosome.

The second reason for the this question is motivated by the No Free Lunch Theorems for Search. There is neither practical nor theoretical evidence that one of the solutions encoded in a chromosome is better than the others. More than that, Wolpert and McReady proved that we cannot use the search algorithm's behavior so far for a particular test function to predict its future behavior on that function.

 

How to encode multiple solutions within a chromosome?

The second question is more difficult than the first one. There is no general prescription on how to encode multiple solutions in the same chromosome. In most cases this involves creativity and imagination. However, some general suggestions can be given.

In order to obtain some benefits from encoding more than one solution in a chromosome we have to spend a similar effort (computer time, memory etc) as in the case of encoding a single solution in a chromosome. For instance, if we have 10 solutions encoded in chromosome and the time needed to extract and decode these solutions is 10 times larger than the time needed to extract and process one solution we got nothing. In this case we can not talk about an useful encoding.

We have to be careful when we want to encode a multiple solutions in a variable length chromosome (for instance in a standard GP chromosome), because this kind of chromosome will tend to increase its size in order to encode more and more solutions. And this could lead to bloat.

Usually encoding multiple solutions in a chromosome might require storing the partial results. Sometimes this can be achieved by using the Dynamic Programming technique. This kind of model is employed by the Multi Expression Programming.

horizontal rule

Home | What's new? | People | MEP description | Papers | Source Code | Software | Links

 
contact:
Last updated: miercuri mai 03, 2006.