Natural Code

Code, science and politics.

Data streams, part 1

American Scientist has an article in their latest issue that talks about the algorithmic difficulties of processing streams of data. It begins by talking about how search engines identify the most popular people (“The Britney Spears problem”), and then goes on to use simple number streams to explain the subject.

I was making progress in reading the article, when I got sidetracked by this :

Although identifying the most frequent item in a stream is hard in general, there is an ingenious way of doing it in one special case—namely, when the most common item is so popular that it accounts for a majority of the stream entries (more than half the elements).

The algorithm that accomplishes this task requires just two registers, and it runs in a constant amount of time per stream element. (Before reading on you might want to try constructing such an algorithm for yourself.)

This is where I put down the magazine, opened SciTE, and wrote this :

require 'stream_algorithm'

reader = StreamReader.new
reader.check :total_amount, :biggest_element

stream = NumberStream.new( reader ).start

The code for the required ’stream_algorithm’ file is at the end of this post. When I run this, I get output that looks like this :

Starting streamreader...
Stream running!
Reading : 8
Reading : 8
Reading : 6
Reading : 4
Reading : 5
Reading : 8
Reading : 2
Reading : 6
Reading : 2
Reading : 8
Stream stopped.
Total amount : 10
Biggest element : 8

Cool. Now I’ve got a simple way of testing and playing with the concepts in the article. Next I’ll try to make the stream reader keep track of the most frequent number using only two numerical variables.

To be continued.

Here’s the code for the NumberStream and StreamReader classes.

#stream_algorithm.rb
def force string
  puts string
  STDOUT.flush
end

class NumberStream
  attr_reader :length, :reader

  def initialize reader, length=10
    @reader = reader
    @length = length
  end

  def start
    force "Stream running!"

    1.upto(length) { next_number }; stop
  end

  def stop
    force "Stream stopped."
    reader.display_results
  end

  def next_number
    reader.send rand( 10 )
    sleep 0.1
  end
end

class StreamReader
  attr_accessor :to_check

  def initialize
    force "Starting streamreader..."
  end

  def send element
    force "Reading : #{element}"
    process element
  end

  def check *args
    @to_check = args
    @to_check.each { |thing| eval "@#{thing} = 0" }
  end

  def process element
    if to_check.include? :total_amount
      @total_amount += 1
    end

    if to_check.include? :biggest_element
      @biggest_element = element if @biggest_element < element
    end
  end

  def display_results
    to_check.each do |thing|
      display_name = thing.to_s.capitalize.gsub( /_/, ' ' )
      variable = eval "@#{thing}"

      force "#{display_name} : #{variable}"
    end
  end
end

July 6, 2008 Posted by naturalcode | Uncategorized | , , , | 1 Comment

A basic genetic algorithm (part 3)

I’ve posted the updated code here, on Refactor :my => ‘code’, a cool little site I just found.

I’ll go refactor someone else’s code on there later, could help me with my coding too.

The program works now, although it’s not too efficient. I need to add sexual reproduction, ’cause right now they’re asexual and that’s slower.

EDIT: Okay, I’ll just leave it up on the other site then, because WordPress was throwing HTML in my Ruby, like this :

def initialize(copy_genome=‘off’ <img src=“http://s.wordpress.com/wp-includes/images/smilies/icon_wink.gif” alt=“;)” class=“wp-smiley”>

June 8, 2008 Posted by naturalcode | Uncategorized | , , , , | No Comments Yet

A basic genetic algorithm (part 2)

Okay, so I’ve broken this up into two files :

equation.rb

class Array
  def random
    self[rand(self.length)]
  end
end

class Genome
  attr_accessor :code

  @@decode = {   '0000' => '0.0', '0001' => '1.0',
  '0010' => '2.0', '0011' => '3.0', '0100' => '4.0',
  '0101' => '5.0', '0110' => '6.0', '0111' => '7.0',
  '1000' => '8.0', '1001' => '9.0', '1010' => '+',
  '1011' => '-', '1100' => '*', '1101' => '/' }
  @@operators = %w[+ - * /]
  @@numbers = %w[0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0]

  def initialize(min_length=4, max_length=32)
    length = min_length + rand(1 + max_length - min_length)
    @code = generate_code(length)
  end

  def generate_code(length)
    code = ''
    1.upto(length) do
      code += ['0', '1'].random
    end
    code
  end

  def decode
    decoded = ''
    expected = :number

    1.upto(@code.length) do |i|
      if i % 4 == 0
        coded = @code.slice((i-4)..(i-1))
        symbol = @@decode[coded].to_s
        puts symbol
        if expected == :number && @@numbers.include?(symbol)
          decoded += symbol
          expected = :o perator
        elsif expected == :o perator &&
        @@operators.include?(symbol)
          decoded += symbol
          expected = :number
        else
          puts "#{expected} expected, #{symbol} found."
        end
      end
    end

    if expected == :number
      decoded.chop!
    end

    return decoded
  end
end

class Equation
  attr_accessor :genome, :phenotype

  def initialize
    @genome = Genome.new(4, 64)
    @phenotype = @genome.decode
  end
end

formula_ga.rb

require 'equation'

class Population
  def initialize(size=50, target_number=11)
    @equations = []
    @size = size
    @target_number = target_number
    1.upto(size) { @equations << Equation.new }
  end

  def members
    @equations
  end

  def evaluate_fitness(equation)
    answer = eval(equation.phenotype)
    deviation = @target_number - answer.to_f
    fitness = 1 / deviation
  end

  def sort_by_fitness(equation_array)

  end

  def next_generation
    reproduction_pool = []
    1.upto(@size / 2) do
      reproduction_pool << @equations.random
    end
  end
end

pop = Population.new
pop.members.each do |member|
  puts member.genome.code
  puts member.phenotype
  x = eval(member.phenotype)
  puts x.to_s
  puts "Fitness : #{pop.evaluate_fitness(member)}"
  puts ""
end

June 7, 2008 Posted by naturalcode | Uncategorized | , , , , | No Comments Yet

A basic genetic algorithm (part 1)

This here is an intro to genetic algorithms with a nice little biological analogy and everything. It starts by explaining the evolution of blind, clumsy, algae-eating, cave-dwelling creatures called Hooters into light-seeing, eagle-dodging, moss-eating machines.

It then goes on to explain how this is relevant to computing, and gives a simple example of a problem that can be solved with a genetic algorithm. In this case it involves evolving equations that give a number you’re looking for.

I’m already a little familiar with GAs, I’ve written one that generated words out of random characters, but it was slightly ugly. Especially the fitness function. I’ll to do this one properly.

I’m going to solve the equation problem and post my code later (it’ll be in Ruby).

UPDATE : Here’s the code so far. I’m off to play DOTA.


class Array
  def random
    self[rand(self.length)]
  end
end

class Genome
  attr_accessor :code

  def initialize(min_length=4, max_length=32)
    length = min_length + rand(1 + max_length - min_length)
    @code = generate_code(length)
  end

  def generate_code(length)
    code = ''
    1.upto(length) do
      code += ['0', '1'].random
    end
    code
  end
end

class Equation
  def initialize
    @genome = Genome.new
  end

  def genome
    @genome.code
  end
end

class Population
  def initialize(size=50)
    @equations = []
    1.upto(size) { @equations << Equation.new }
  end

  def members
    @equations
  end
end

Population.new.members.each { |member| puts member.genome}

Anyone know of a better way to easily manipulate a series of bits than by storing it in a string? It’s fine if I just have a few but I like to make these things scale, if possible, and I know from experience that this kind of thing tends toward total memory consumption.

June 3, 2008 Posted by naturalcode | Technology | , , , , | No Comments Yet