Ruby 1.9 and Unicode: The BOM Will Fuck Your Shit Up
So I’ve been playing around with the things mentioned in the title, and I found out something unfortunate when I moved a UTF-8 encoded file from a Ruby 1.9 machine to a Ruby 1.8 machine.
There’s this thing called a Byte Order Mark (BOM) that some text editors put at the start of UTF-8 files, apparently to mark them as UTF-8. I’m pretty sure it’s useless, because UTF-8 doesn’t actually have a variable byte order to keep track of, but there you go.
Basically, it’s 3 bytes (EF BB BF) that the text editor inserts at the beginning of a text file and then hides from you. It might look like a plain text file, but it’s actually got 3 hidden bytes for no good reason. When you try to run it through the Ruby 1.8 interpreter, the interpreter sees 3 invalid characters on line 1 and throws an error right away.
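For the curious, here is a small sketch (Ruby 1.9 or later) that prints the bytes of the BOM character, U+FEFF, in a few encodings. It shows why the BOM really matters for UTF-16, where the two byte orders differ, while in UTF-8 it is always the same three bytes and can only act as a signature saying "this file is UTF-8". `bom_bytes` is a name made up for this example.

```ruby
# The BOM is the single character U+FEFF; only its byte form changes.
BOM = "\uFEFF"

# Return the BOM's bytes in the given encoding as a hex string.
def bom_bytes(encoding)
  BOM.encode(encoding).bytes.map { |b| format('%02X', b) }.join(' ')
end

%w[UTF-8 UTF-16BE UTF-16LE].each do |enc|
  puts "#{enc}: #{bom_bytes(enc)}"
end
# UTF-8: EF BB BF
# UTF-16BE: FE FF
# UTF-16LE: FF FE
```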
This sort of error message is pretty unhelpful, especially when you appear to have nothing at all on Line 1. You might enable visible whitespace: still nothing. You might try opening it in another text editor or IDE: you will likely still not see the problem, as the only program I’ve tried so far that doesn’t hide the BOM is NetBeans.
SciTE has two different UTF-8 encoding settings: UTF-8 and UTF-8 Cookie. In theory, the plain UTF-8 setting writes a Byte Order Mark, while the UTF-8 Cookie setting doesn’t. In practice, the choice doesn’t seem to affect whether or not the Ruby interpreter chokes on the file, at least not with Ruby 1.8.
With 1.9 I’ve still had problems once or twice, but they were the kind that could be fixed by closing the text editor, opening the file in NetBeans, removing the BOM, and restarting the text editor.
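Since editors hide it, it may be easier to check for the BOM and strip it with a short script than to round-trip through NetBeans. A minimal sketch, assuming Ruby 1.9.3 or later (for `File.binwrite`); `bom?` and `strip_bom!` are hypothetical helper names:

```ruby
# The UTF-8 BOM as raw (binary) bytes.
UTF8_BOM = [0xEF, 0xBB, 0xBF].pack('C*')

# True if the file at path starts with a UTF-8 BOM.
def bom?(path)
  File.binread(path, 3) == UTF8_BOM
end

# Rewrite the file without its leading BOM.
# Returns true if a BOM was removed, false if there was none.
def strip_bom!(path)
  data = File.binread(path)
  return false unless data.start_with?(UTF8_BOM)
  File.binwrite(path, data[3..-1])  # binary string, so indexes are bytes
  true
end
```

For example, `strip_bom!('my_script.rb')` rewrites the file in place. Newer Ruby versions can also skip a BOM while reading, via the `'r:BOM|UTF-8'` open mode.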
It’s not perfect, but at least it works now, even if it’s very slightly buggy.
Ruby 1.9 and Code Generation: How I Learned to Stop Worrying and Love Unicode
So I was working on this Ruby-based tool for generating NetBeans-compliant Swing app projects. Basically, I create a file that looks like this:
require 'java_swing'
Swing.app 'Project03AK', :subtitle => 'Laptop lending tracker',
          :desc => 'This program keeps track of laptops borrowed by students.' do
  # Insert code here
end
I run this script, and it generates a NetBeans project whose main class is a Swing window, automatically centered and titled, and the project and the window all get nice, clean, standardized names. Everything was going great until I got to the part where I started inserting comments into the generated Java code.
Basically, I have this Ruby script that inserts the arguments passed to Swing.app into a bunch of templates and uses the resulting text to generate both the Java code and the related NetBeans project files. The problem is that both Ruby and SciTE, my text editor, treat text as ASCII by default, whereas NetBeans reads text as UTF-8.
That’s fine as long as Ruby is only generating code that uses the 26 English letters and regular English punctuation, but as soon as you start using things like àccéntêd characters, NetBeans interprets them as gibberish. I go to a French school, and my professors do not accept gibberish from me (except for VB code), so this is a problem.
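The template step can be sketched with ERB from Ruby’s standard library. This is only an illustration: the template text, `MAIN_TEMPLATE`, and `render_main_class` are hypothetical, since the real java_swing templates aren’t shown here. Whatever bytes the argument strings contain are copied straight into the generated Java, which is why the encoding on the Ruby side matters.

```ruby
require 'erb'

# Hypothetical template for the generated main class.
MAIN_TEMPLATE = <<'JAVA'
/*
 * <%= name %>: <%= subtitle %>
 * <%= desc %>
 */
public class <%= name %> extends javax.swing.JFrame {
    public <%= name %>() {
        setTitle("<%= name %> - <%= subtitle %>");
        setSize(400, 300);
        setLocationRelativeTo(null); // center the window on screen
    }
}
JAVA

# Fill the template with the same arguments Swing.app receives.
# ERB reads name, subtitle and desc from this method's binding.
def render_main_class(name, options = {})
  subtitle = options[:subtitle]
  desc = options[:desc]
  ERB.new(MAIN_TEMPLATE).result(binding)
end

puts render_main_class('Project03AK',
                       :subtitle => 'Laptop lending tracker',
                       :desc => 'This program keeps track of laptops borrowed by students.')
```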
If you don’t know or care about any of these encoding schemes or non-English characters, you need to read this. I did a few hours ago, and it helped me figure all of this out.
Basically, the solution is to install Ruby 1.9, which has Unicode support, and then go to File->Encoding->UTF-8 in SciTE. An é in the text editor will then be written to the generated Java files as a UTF-8 é, which NetBeans will then correctly interpret as an é.
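A minimal sketch of the Ruby side, assuming Ruby 1.9 or later: it writes generated text with an explicit UTF-8 external encoding (`'w:UTF-8'`) and checks that é ends up as the two bytes C3 A9. The accented text uses `\u00e9` escapes to keep the example’s own source ASCII-only; in Ruby 1.9, a source file containing literal accented characters also needs a `# encoding: utf-8` comment on its first line (Ruby 2.0 and later default to UTF-8 source). `write_utf8` and the file name are made up for the example.

```ruby
require 'tmpdir'

# Write text to path using an explicit UTF-8 external encoding.
def write_utf8(path, text)
  File.open(path, 'w:UTF-8') { |f| f.write(text) }
end

# "\u00e9" is é; the escape keeps this file's own source ASCII-only.
comment = "// R\u00e9sum\u00e9: laptop lending tracker\n"
path = File.join(Dir.tmpdir, 'Main.java')  # hypothetical generated file
write_utf8(path, comment)

# Read the raw bytes back: each é should be the two bytes C3 A9.
raw = File.binread(path)
puts raw.include?([0xC3, 0xA9].pack('C*'))  # prints true
```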
Data streams, part 2
In a previous post I mentioned I was working on a program to test the data stream concepts from this article. I finished it and implemented the method for finding the most frequent element in a stream.
It keeps track of the most frequent element in a stream of letters or numbers using only two variables: a candidate and a counter. It’s a constant-space algorithm that can work on a stream of indefinite length, but it’s only guaranteed to give the right answer when the most frequent element makes up more than half of the elements in the stream; otherwise the result can be wrong.
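The behavior described here, and the candidate/counter columns in the output further down, match the Boyer-Moore majority vote algorithm. A minimal sketch of that algorithm, assuming it is what stream_algorithm does internally (its source isn’t shown here); `MajorityVote` is a hypothetical name:

```ruby
# Boyer-Moore majority vote: one candidate plus one counter.
class MajorityVote
  attr_reader :candidate, :count

  def initialize
    @candidate = nil
    @count = 0
  end

  # Feed one element from the stream.
  def read(element)
    if @count == 0
      @candidate = element   # nothing in hand, so adopt this element
      @count = 1
    elsif element == @candidate
      @count += 1            # a vote for the current candidate
    else
      @count -= 1            # a vote against it
    end
    self
  end
end

vote = MajorityVote.new
[3, 1, 2, 7, 2, 7, 2, 2, 2, 2].each do |x|
  vote.read(x)
  puts "Reading : #{x} #{vote.candidate} x #{vote.count}"
end
puts "Most frequent : #{vote.candidate}"
```

Fed the same ten numbers as the NumberStream run later in this post, it prints the same candidate and counter on every line and ends with 2.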
It’s more a small library than a program, actually. I just use it like this:
require 'stream_algorithm'
That line loads this file. Next, I make a StreamReader, which I store in a variable…
reader = StreamReader.new.check :biggest_element, :most_frequent
>Starting StreamReader...
…and tell it to check for the biggest element in the stream (which takes only one variable to track), as well as the most frequent element in the stream (which takes only two variables, but is only reliable when one element makes up more than half of the stream).
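The single-variable checks are even simpler. A hypothetical sketch (`TinyReader` is not the real StreamReader, whose code isn’t shown) where `check` picks the metrics and resets all state, so the same reader can be reused on a new stream:

```ruby
# Hypothetical reader with one variable per tracker.
class TinyReader
  # Choose what to track and reset all state.
  def check(*metrics)
    @metrics = metrics
    @biggest = nil  # :biggest_element needs one variable
    @total = 0      # :total_amount needs one counter
    self
  end

  # Update every tracker with one element from the stream.
  def read(element)
    @biggest = element if @biggest.nil? || element > @biggest
    @total += 1
  end

  # Report only the metrics requested in #check.
  def results
    all = { :biggest_element => @biggest, :total_amount => @total }
    all.select { |metric, _| @metrics.include?(metric) }
  end
end

reader = TinyReader.new.check(:biggest_element, :total_amount)
[3, 1, 2, 7, 2].each { |x| reader.read(x) }
puts reader.results[:biggest_element]  # prints 7
```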
NumberStream.new( reader, :length => 10,
:skew => {:toward => 2, :percent => 55} ).start
This creates a NumberStream, hands it the StreamReader, gives the stream a length of 10 numbers, and skews the odds so that each number has a 55% chance of being 2 instead of a random number. Then the stream is started.
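A guess at how the skew could work, as a sketch: `skewed_numbers` is hypothetical, and the 0..9 range for the random numbers is an assumption based on the output shown. Note that the random branch can also produce 2, so 2 actually appears a bit more than 55% of the time (about 59.5% with a 0..9 range).

```ruby
# Hypothetical skewed stream: each element is `toward` with probability
# percent/100, otherwise a random number from 0 to 9.
def skewed_numbers(length, toward, percent, rng = Random.new)
  Array.new(length) do
    rng.rand(100) < percent ? toward : rng.rand(10)
  end
end

# A fixed seed makes the run repeatable.
p skewed_numbers(10, 2, 55, Random.new(42))
```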
The output from this part of the program looks like this:
>NumberStream running!
>Reading : 3 3 x 1
>Reading : 1 3 x 0
>Reading : 2 2 x 1
>Reading : 7 2 x 0
>Reading : 2 2 x 1
>Reading : 7 2 x 0
>Reading : 2 2 x 1
>Reading : 2 2 x 2
>Reading : 2 2 x 3
>Reading : 2 2 x 4
>Stream stopped.
>Biggest element : 7
>Most frequent : 2
I can do the same with letters:
reader.check :total_amount, :most_frequent
stream = LetterStream.new( reader, :length => 10,
:skew => {:toward => 'i', :percent => 45} ).start
The Reader’s check method resets its instance variables, so it’s ready to be used on a new stream. I create a new LetterStream, and it gives me something like this:
>LetterStream running!
>Reading : h h x 1
>Reading : i h x 0
>Reading : i i x 1
>Reading : z i x 0
>Reading : i i x 1
>Reading : i i x 2
>Reading : i i x 3
>Reading : i i x 4
>Reading : t i x 3
>Reading : c i x 2
>Stream stopped.
>Total amount : 10
>Most frequent : i
And finally, I can do something like…
10.times { stream.start }
…and it’ll run the stream to its full length ten times, preserving state between runs.
I thought it was fun, anyways.
From the ShoesFest
Yesterday was the first installment of the ShoesFest, a day of testing and learning why the lucky stiff’s Shoes toolkit for making cross-platform windowed Ruby apps.
I was on the IRC channel for most of the day. Someone there asked if there would be regular events, and _why seemed open to the idea.
I finally read Nobody Knows Shoes, and I threw together this little program. It’s a fake login window that only reacts to one particular SQL injection string. I’ll make it better and more interesting.
Shoes.app :title => 'HTSL Login', :width => 400, :height => 300, :resizable => false do
  def login username, password
    if username == "' or 1=1; DROP TABLE users; --"
      alert 'Oh shit!'
    else
      alert 'Invalid username or password.'
    end
  end
  background gradient( rgb(150,150,255), rgb(255,255,255) )
  flow :width => '100%', :height => '70%' do
    stack :width => '60%', :margin => 50 do
      para "User name : \n\n", 'Password : '
      @username = edit_line :top => 3, :left => 105
      @password = edit_line :top => 46, :left => 105
      button( 'Log in', :top => 100, :left => 110 ) { login @username.text, @password.text }
    end
  end
  flow :width => '100%', :height => '30%', :margin => 18 do
    para "Users currently logged in : #{logged_in = 1 + rand(2000)}\n",
         "Total users : #{logged_in + rand(1000)}"
  end
end
A basic genetic algorithm (part 3)
I’ve posted the updated code here, on Refactor :my => ‘code’, a cool little site I just found.
I’ll go refactor someone else’s code on there later; it could help my own coding too.
The program works now, although it’s not too efficient. I need to add sexual reproduction, ’cause right now they’re asexual and that’s slower.
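Sexual reproduction in a GA usually means crossover: building a child from pieces of two parents. A minimal single-point crossover sketch for the bit-string genomes used here, with the cut placed on a 4-bit boundary so no encoded symbol gets split; `crossover` and its `codon` parameter are hypothetical, not part of the posted code.

```ruby
# Single-point crossover for bit-string genomes like Genome#code.
# The child gets parent_a's bits before the cut and parent_b's bits
# after it, so it ends up the same length as parent_b.
def crossover(parent_a, parent_b, rng = Random.new, codon = 4)
  shorter = [parent_a.length, parent_b.length].min
  # Cut on a codon boundary (0, 4, 8, ...) so no 4-bit symbol is split.
  cut = codon * rng.rand(shorter / codon + 1)
  parent_a[0, cut] + parent_b[cut..-1]
end

child = crossover('0001101000100011', '0101110001101001', Random.new(1))
puts child
```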
EDIT: Okay, I’ll just leave it up on the other site, then, because WordPress was throwing HTML into my Ruby, like this:
def initialize(copy_genome=‘off’ <img src=“http://s.wordpress.com/wp-includes/images/smilies/icon_wink.gif” alt=“;)” class=“wp-smiley”>
A basic genetic algorithm (part 2)
Okay, so I’ve broken this up into two files:
equation.rb
class Array
  def random
    self[rand(self.length)]
  end
end
class Genome
  attr_accessor :code
  @@decode = { '0000' => '0.0', '0001' => '1.0',
               '0010' => '2.0', '0011' => '3.0', '0100' => '4.0',
               '0101' => '5.0', '0110' => '6.0', '0111' => '7.0',
               '1000' => '8.0', '1001' => '9.0', '1010' => '+',
               '1011' => '-', '1100' => '*', '1101' => '/' }
  @@operators = %w[+ - * /]
  @@numbers = %w[0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0]
  def initialize(min_length=4, max_length=32)
    length = min_length + rand(1 + max_length - min_length)
    @code = generate_code(length)
  end
  def generate_code(length)
    code = ''
    1.upto(length) do
      code += ['0', '1'].random
    end
    code
  end
  def decode
    decoded = ''
    expected = :number
    1.upto(@code.length) do |i|
      if i % 4 == 0
        coded = @code.slice((i-4)..(i-1))
        symbol = @@decode[coded].to_s
        puts symbol
        if expected == :number && @@numbers.include?(symbol)
          decoded += symbol
          expected = :operator
        elsif expected == :operator && @@operators.include?(symbol)
          decoded += symbol
          expected = :number
        else
          puts "#{expected} expected, #{symbol} found."
        end
      end
    end
    if expected == :number
      decoded.chop!
    end
    return decoded
  end
end
class Equation
  attr_accessor :genome, :phenotype
  def initialize
    @genome = Genome.new(4, 64)
    @phenotype = @genome.decode
  end
end
formula_ga.rb
# Load equation.rb from this file's directory (Ruby 1.9.2+ no longer puts '.' on the load path).
require File.expand_path('equation', File.dirname(__FILE__))
class Population
  def initialize(size=50, target_number=11)
    @equations = []
    @size = size
    @target_number = target_number
    1.upto(size) { @equations << Equation.new }
  end
  def members
    @equations
  end
  def evaluate_fitness(equation)
    answer = eval(equation.phenotype)
    # Absolute distance from the target, so overshooting doesn't produce a
    # negative fitness; an exact hit gives 1.0 / 0.0 = Infinity.
    deviation = (@target_number - answer.to_f).abs
    fitness = 1.0 / deviation
  end
  def sort_by_fitness(equation_array)
  end
  def next_generation
    reproduction_pool = []
    1.upto(@size / 2) do
      reproduction_pool << @equations.random
    end
  end
end
pop = Population.new
pop.members.each do |member|
  puts member.genome.code
  puts member.phenotype
  x = eval(member.phenotype)
  puts x.to_s
  puts "Fitness : #{pop.evaluate_fitness(member)}"
  puts ""
end
A basic genetic algorithm (part 1)
This here is an intro to genetic algorithms with a nice little biological analogy and everything. It starts by explaining the evolution of blind, clumsy, algae-eating, cave-dwelling creatures called Hooters into light-seeing, eagle-dodging, moss-eating machines.
It then goes on to explain how this is relevant to computing, and gives a simple example of a problem that can be solved with a genetic algorithm. In this case it involves evolving equations that give a number you’re looking for.
I’m already a little familiar with GAs: I’ve written one that generated words out of random characters, but it was slightly ugly, especially the fitness function. I’ll try to do this one properly.
I’m going to solve the equation problem and post my code later (it’ll be in Ruby).
UPDATE: Here’s the code so far. I’m off to play DOTA.
class Array
  def random
    self[rand(self.length)]
  end
end
class Genome
  attr_accessor :code
  def initialize(min_length=4, max_length=32)
    length = min_length + rand(1 + max_length - min_length)
    @code = generate_code(length)
  end
  def generate_code(length)
    code = ''
    1.upto(length) do
      code += ['0', '1'].random
    end
    code
  end
end
class Equation
  def initialize
    @genome = Genome.new
  end
  def genome
    @genome.code
  end
end
class Population
  def initialize(size=50)
    @equations = []
    1.upto(size) { @equations << Equation.new }
  end
  def members
    @equations
  end
end
Population.new.members.each { |member| puts member.genome }
Does anyone know of a better way to manipulate a series of bits than storing it in a string? It’s fine if I only have a few, but I like to make these things scale if possible, and I know from experience that this kind of thing tends toward eating all available memory.
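One option is to store the bits in a single Ruby Integer, which grows as needed, instead of a String with one character per bit. A minimal sketch, assuming genome lengths are multiples of 4; `BitGenome` and its methods are hypothetical, not part of the code above.

```ruby
# A genome stored as one Integer plus its length (bit 0 = leftmost).
class BitGenome
  attr_reader :bits, :length

  def initialize(length, bits)
    @length = length
    @bits = bits
  end

  # Random genome of the given length.
  def self.random(length, rng = Random.new)
    new(length, rng.rand(2**length))
  end

  # Build from a '0'/'1' string like Genome#code.
  def self.from_string(code)
    new(code.length, code.to_i(2))
  end

  # Bit i counted from the left, like code[i] on the string version.
  def [](i)
    @bits[@length - 1 - i]
  end

  # The 4-bit codon at position index (0 = leftmost codon).
  def codon(index)
    (@bits >> (@length - 4 * (index + 1))) & 0b1111
  end

  # Flip bit i (counted from the left), e.g. for mutation.
  def flip!(i)
    @bits ^= 1 << (@length - 1 - i)
    self
  end

  def to_s
    @bits.to_s(2).rjust(@length, '0')
  end
end

g = BitGenome.from_string('00011010')
puts g.codon(1)  # prints 10, i.e. 0b1010, which decodes to '+'
```

`Integer#[]` reads a single bit, and shifts and masks pull out the 4-bit codons that `Genome#decode` currently slices out of the string.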