Finding f(x) with Genetic Algorithms
Robert Young and Jordan Chin
Vancouver Technical Secondary
Floor Location : S 192 E

A genetic algorithm (GA) is a computer search technique involving concepts from evolutionary biology. Basically, random solutions are initially generated to represent individuals in a virtual population. Each solution (individual) is given a fitness rating based on how well it solves the problem. Through selection of these solutions (those with the highest fitness), these solutions undergo reproduction. Crossovers (and mutation) of the surviving solutions are used to create the next generation of solutions. Thus, every generation is an evolution towards the final solution. Since their conception, GA's have been used as adaptive algorithms for solving practical problems by many scientists, engineers, and software programmers.

Therefore, we wondered how effective it would be to use a GA to approximate functions (function approximation software often requires advanced math). Through the project, we pondered questions such as, "What is the fewest points necessary for good approximations?", "How fast can the GA be?", and "Can it output practical functions?". This project aims to create a computer program that, given a set of points, can approximate a mathematical function using a GA, without prior knowledge of what function the data may resemble.

With this program, users are able to perform quick approximations with any two-variable data sets (for example, from a laboratory experiment). Furthermore, since the program outputs the top 10 functions that it has fitted, along with their percentage of error from the data, it may be able to give users insight into other ways their data may relate. Finally, this program can be an open-source alternative to expensive programs available. The program was first programmed in C++ and prepared to be tested for its optimal parameters (the variables that play a role in GA's, including mutation rate, crossover rate, and number of species). To accomplish this, another GA was coded that would swap different parameter values and would thus search for the optimum values. Each combination of parameters was rated out of 50 for the amount of time it took, and another 50 for the accuracy in finding a solution. We ran this GA with various data sets created to ensure a globally maximal solution. From there, the optimal values were used in the remainder of the testing.

Now, the program was tested for performance (for example when a different number of points are used and the number of generations it would take for the percentage error of the solution to plateau). In this test, 5, 10, and 20 data points were taken each from 5 known functions and were used on the program. Finally, the program was compared to other function approximation programs currently available against criteria that we designed to suit our purpose.

Based on the data collected and analyzed, the program managed to fit data well to functions within reasonable time, although results often had complex expressions. Still, portions of the expressions could often easily be simplified by hand. Therefore, it is safe to conclude that the program is for the most part accurate and practical for human use.