Natural Code

Code, science and politics.

Ruby 1.9 and Unicode: The BOM Will Fuck Your Shit Up

So I’ve been playing around with the things mentioned in the title, and I found out something unfortunate when I moved a UTF-8 encoded file from a Ruby 1.9 machine to a Ruby 1.8 machine.

There’s this thing called a Byte Order Mark (BOM) that text editors use, apparently to remind themselves of the file’s UTF-8 encoding. I’m pretty sure it’s useless, because UTF-8 doesn’t actually have a variable byte order to keep track of, but there you go.

Basically, it’s 3 bytes that the text editor inserts at the beginning of a text file, and then hides from you. It might look like a plain text file, but it’s actually got 3 hidden bytes for no good reason. When you try to run it through the Ruby 1.8 interpreter, it’ll see 3 invalid characters on Line 1 and throw an error right away.
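For the record, the three bytes are EF BB BF in hex, and checking for them by hand is easy. Here is a minimal sketch, assuming a Ruby recent enough to have String#b (2.0 or later, so not the 1.8 machine in question). The constant and method names are made up for illustration.

```ruby
# The UTF-8 BOM as raw bytes: EF BB BF.
UTF8_BOM = "\xEF\xBB\xBF".b

# True if the given raw bytes start with a UTF-8 BOM.
def bom?(bytes)
  bytes.b.start_with?(UTF8_BOM)
end

# Returns a binary copy of the bytes with a leading UTF-8 BOM removed, if present.
def strip_bom(bytes)
  raw = bytes.b
  bom?(raw) ? raw[UTF8_BOM.bytesize..-1] : raw
end
```

Reading a script with File.binread, passing the result through strip_bom, and writing it back with File.binwrite would clean the file in place. Newer Rubies (1.9.2 onward, as far as I know) can also skip the mark while reading if the file is opened with the mode "r:BOM|UTF-8".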

This sort of error message is pretty unhelpful, especially when you appear to have nothing at all on Line 1. You might enable visible whitespace: still nothing. You might try opening it in another text editor or IDE: you will likely still not see the problem, as the only program I’ve tried so far that doesn’t hide the BOM is NetBeans.

SciTE has two different UTF-8 encoding settings: UTF-8 and UTF-8 Cookie. In theory, the plain UTF-8 setting uses a Byte Order Mark, while the UTF-8 Cookie setting doesn’t. In practice, the choice doesn’t seem to affect whether or not the Ruby interpreter chokes on the file, at least not with Ruby 1.8.

With 1.9 I’ve still had problems one or two times, but of the kind that could be fixed by closing the text editor, opening the file in NetBeans, removing the BOM, and restarting the text editor.

It’s not perfect, but at least it works now, even if it’s very slightly buggy.

August 30, 2008 Posted by naturalcode | Technology | 3 Comments

Ruby 1.9 and Code Generation: How I Learned to Stop Worrying and Love Unicode

So I was working on this Ruby-based tool for generating NetBeans-compliant Swing app projects. Basically, I create a file that looks like this:

require 'java_swing'

Swing.app 'Project03AK', :subtitle => 'Laptop lending tracker',
:desc => 'This program keeps track of laptops borrowed by students.' do
  # Insert code here
end

I run this script, and it generates a NetBeans project with a main class that’s a Swing window, automatically centered and titled. The project and the window all have nice, clean, standardized names. Everything was going great until I got to the part where I started inserting comments in the generated Java code.

Basically, I have this Ruby script that inserts the arguments passed to Swing.app into a bunch of templates, and uses the resulting text to generate both the Java code and the related NetBeans project files. The problem here is that both Ruby and SciTE, my text editor, treat text as ASCII (or a single-byte extension of it) by default, whereas NetBeans encodes text in UTF-8.
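The java_swing library itself isn’t shown in this post, so what follows is only a sketch of the template idea, using ERB from Ruby’s standard library. JAVA_TEMPLATE and render_java are hypothetical names, and the Java inside the template is a placeholder rather than what the real tool emits.

```ruby
require 'erb'

# Placeholder template: the real tool's templates are not shown in the post.
JAVA_TEMPLATE = <<~JAVA
  /** <%= desc %> */
  public class <%= name %> extends javax.swing.JFrame {
      public <%= name %>() {
          setTitle("<%= name %> - <%= subtitle %>");
      }
  }
JAVA

# Fills the template with the given arguments and returns the Java source as a string.
def render_java(name, subtitle, desc)
  ERB.new(JAVA_TEMPLATE).result(binding)
end

puts render_java('Project03AK', 'Laptop lending tracker',
                 'This program keeps track of laptops borrowed by students.')
```

Whatever encoding the template and the arguments are in is the encoding the generated file ends up in, which is where the trouble starts.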

That’s fine as long as Ruby is only generating code that uses the 26 English letters and regular English punctuation, but as soon as you start using things like àccéntêd characters, NetBeans interprets it as gibberish. I go to a French school, and my professors do not accept me handing in gibberish (except for VB code), so this is a problem.

If you don’t know/care about any of these encoding schemes or non-English characters, you need to read this. I did a few hours ago, and it helped me figure all of this out.

Basically, the solution is to install Ruby 1.9, which has Unicode support, and then go to File->Encoding->UTF-8 in SciTE. An é in the text editor will then be written to the generated Java files as a UTF-8 é, which will then be correctly interpreted as an é by NetBeans.
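To make the mismatch concrete, here is a small sketch of what an é looks like at the byte level. It assumes Ruby 1.9 or later (String#encode and friends don’t exist in 1.8) and uses ISO-8859-1 as a stand-in for whatever single-byte encoding the editor was really using. The helper names are invented for the example.

```ruby
# Bytes of the given text once converted to the named encoding.
def bytes_in(text, encoding)
  text.encode(encoding).bytes.to_a
end

# Takes raw bytes and labels them as UTF-8 without converting anything,
# which is roughly what a UTF-8 editor does when it opens the file.
def misread_as_utf8(bytes)
  bytes.pack('C*').force_encoding('UTF-8')
end

e_acute = "\u00E9" # é, written as an escape so this file's own encoding doesn't matter

utf8   = bytes_in(e_acute, 'UTF-8')       # two bytes: 0xC3 0xA9
latin1 = bytes_in(e_acute, 'ISO-8859-1')  # one byte:  0xE9

puts utf8.map { |b| format('%02X', b) }.join(' ')
puts latin1.map { |b| format('%02X', b) }.join(' ')
puts misread_as_utf8(latin1).valid_encoding? # false: a lone 0xE9 is not valid UTF-8
```

On the writing side, opening the output file with a mode such as "w:UTF-8" makes the intended encoding explicit instead of relying on defaults.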

August 24, 2008 Posted by naturalcode | Technology | 4 Comments

Sorry, ISPs. You’ll have to deliver what you promise.

Unless you own a telecommunications company with a business model based on selling more bandwidth than you can deliver and then punishing your customers, this should be good news.

The FCC ruled against Comcast on Friday, saying that their interference with customers’ file transfers was a violation of federal policy. While it’s far from a guarantee of future network neutrality, it’s probably as good a precedent as we could have hoped for from this case.

Comcast has been given until the end of the year to get their act together and stop undermining the proper function of their service. While that is a lot later than the deadline I would have given them, the good news is that the Electronic Frontier Foundation has released the Switzerland Network Testing Tool. This means that Comcast’s customers should now be able to easily watch their ISP to make sure that they’re complying with the ruling.

August 3, 2008 Posted by naturalcode | Law and Rights, Technology | No Comments Yet

Let’s save this planet. It’ll be fun. Really.

From Bob Park’s What’s New :

UNCOOL: LOT OF HEAT FROM GLOBAL-WARMING DENIERS.
Suppose, I asked myself, that the deniers are right and the CO2 thing is a mistake? What will happen if the world takes the CO2 thing seriously, adopting common sense measures to counter anthropogenic warming and there never was any warming in the first place? 1) there will be more non-renewable resources to leave to our progeny; 2) we will breathe cleaner air and see the stars again, the way we saw them half a century ago; 3) we could stop paving over the planet, and 4) cut down on the number of billionaires. If we’re wrong we could have a party. We could have a party either way.

See, it’s not so bad, is it? Really, we’ll be OK if we get started on this right away, and we might even make money and invent some cool shit in the process.

Saving the environment is a reasonable, obvious thing to do, even if Stephen Harper thinks it’s a communist plot to destroy the economy of oil-producing nations.

July 31, 2008 Posted by naturalcode | Politics, Science, Technology | No Comments Yet

Alaska : Lose The Clown

Sure, he thinks that the Internet is a series of tubes, and yet assumes himself knowledgeable enough to regulate said Internet, but at least he’s an honest, hardworking guy, right?

Not so much. It seems that corruption is never far behind incompetence : Senator Ted “An Internet Was Sent By My Staff” Stevens accepts bribes and lies about it, according to a federal grand jury that just indicted him on seven (7) felonies. He apparently accepted $250,000 in bribe money from an oil and construction services company between 2001 and 2006.

Maybe the company in question paid him to issue his utterly ignorant statements about the tubular nature of the Internet in an effort to push their pro-pipeline oil agenda?

It would explain why the company in question is now out of business. The tube remarks didn’t go over so well, and we all got to see the proof of his ignorance thanks to one of the few Tubes that actually does make up part of the Internet.

Seriously though, Alaska. Lose the clown.

July 29, 2008 Posted by naturalcode | Politics, Technology | No Comments Yet

No secret software for public voting or security!

Another blogger has an article up about Christine Peterson’s talk at the O’Reilly Open Source Convention. She argues that privacy and security are compatible, and she’s right.

She predicts that Washington’s technologically clueless will use top-down, individual-surveillance methods when they have access to next-generation technology like high-precision chemical detectors, and that they will do it using secret procedures and secret software just as they have done with electronic voting.

Unlike what happened with electronic voting, she says, we need to see this coming, head it off, and make it clear that software secrecy and individual surveillance are bad security measures. It can’t be framed as a debate around open source software. It needs to be a security issue, and those of us who understand that the Internet isn’t a series of tubes need to explain it clearly.

Here’s my shot at it:

There is only one way of proving that a program is secure : getting as many people as possible to test it and examine the code. Anything short of this is a half-measure.

Any idiot can design a security system that he can’t figure out how to break. Diebold, the makers of the American electronic voting machines, may claim that their system is secure. All this means is that they haven’t spotted any flaws.

There are a lot of smart hackers out there. Few, if any, work for Diebold. Many of them may be hostile to your country’s government or people.

Don’t assume you have the best hackers, or that secrecy will protect you. The Germans tried that in World War II, and their supposedly unbreakable Enigma cipher machine was defeated. Their secret communications were intercepted.

If we want to assure our physical security, we need to make our security systems open to inspection to make sure that they actually work.

We also need to use the tools we have to go after the threats that exist. As far as I know, there is no machine that detects terrorists. There are, or will be soon, machines that allow us to test for individual particles of specific substances. The obvious use that almost every politician will find for this? Drug testing. Cracking down a little bit more on what you get to do with your body. Taking away your freedom.

A sensible and effective security policy would be to use these detectors to find things like anthrax and plutonium. Something tells me that if you find the guys smuggling WMDs, you’ll find the terrorists.

Unless we do something about it, we will instead get a security policy based on wiretapping citizens and drug testing, using secret systems.

Telephone conversations are not a threat. Marijuana is not a threat. The real threats are natural and economic disasters, WMDs, private companies with exclusive control over the democratic process, and politicians who don’t understand security or technology.

July 28, 2008 Posted by naturalcode | Technology | 2 Comments

Randy Pausch on brick walls

Others have remembered him much more eloquently than I could have, so I’m going to keep this short and quote the Professor himself :

“So that was a bit of a setback. But remember, the brick walls are there for a reason. The brick walls are not there to keep us out. The brick walls are there to give us a chance to show how badly we want something. Because the brick walls are there to stop the people who don’t want it badly enough. They’re there to stop the other people.”

Whenever I need motivation to learn something new or accomplish something difficult, it helps to remember those words. There are no obstacles that can’t be overcome : there are only those which we put the necessary effort into overcoming, and those which we don’t want badly enough.

He will be missed, and his legacy will live on.

July 27, 2008 Posted by naturalcode | Technology | 1 Comment

Data streams, part 2

In a previous post I mentioned I was working on a program to test the data stream concepts from this article. I finished it and implemented the method for finding the most frequent element in a stream.

It keeps track of the most frequent element in a stream of letters or numbers using only two integer variables. It’s not guaranteed to be accurate, but it’s a constant-space algorithm that can work on a stream of indefinite length, and gives good results as long as the most frequent element makes up more than half of the total elements in the stream.
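The description matches what is usually called the Boyer-Moore majority vote algorithm: one variable for the current candidate, one for a counter. Here is a minimal sketch of that technique, not the stream_algorithm library itself; MajorityTracker is a made-up name.

```ruby
# Tracks a majority candidate in constant space (Boyer-Moore majority vote).
# The answer is only guaranteed when one element makes up more than half the stream.
class MajorityTracker
  attr_reader :candidate, :count

  def initialize
    @candidate = nil
    @count = 0
  end

  # Feed one element from the stream; returns self so calls can be chained.
  def read(element)
    if @count.zero?
      @candidate = element   # no current candidate: adopt this element
      @count = 1
    elsif element == @candidate
      @count += 1            # another vote for the candidate
    else
      @count -= 1            # a vote against the candidate
    end
    self
  end
end

tracker = MajorityTracker.new
[3, 1, 2, 7, 2, 7, 2, 2, 2, 2].each { |n| tracker.read(n) }
puts "Most frequent : #{tracker.candidate}" # Most frequent : 2
```

The counter is not a tally of occurrences. It only records how far the candidate is ahead, which is why the result can be wrong when no element holds a true majority.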

It’s more a small library than a program, actually. I just use it like this :

require 'stream_algorithm'

Which calls up this file. Next, I make a Stream Reader which I store in a variable…

reader = StreamReader.new.check :biggest_element,
:most_frequent
>Starting StreamReader...

…and tell it to check for the biggest element in the stream (takes only one integer variable to track), as well as the most frequent element in the stream (takes only two integer/string variables but isn’t always reliable).

NumberStream.new( reader, :length => 10,
:skew => {:toward => 2, :percent => 55} ).start

This creates a Number Stream, assigns it the Stream Reader, gives the stream a length of 10 numbers, and skews the odds so that 55% of the time, the stream will contain 2 instead of a random number. The stream is then started.
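One way to get that kind of skew, sketched under the assumption that the non-skewed values are digits from 0 to 9 (the real range NumberStream draws from isn’t stated here). skewed_numbers is a hypothetical helper, not part of the library.

```ruby
# Returns an array of `length` numbers. Each one is `toward` with probability
# percent/100, and otherwise a random digit from 0 to 9 (assumed range).
def skewed_numbers(length, toward, percent, rng = Random.new)
  Array.new(length) do
    rng.rand(100) < percent ? toward : rng.rand(10)
  end
end

puts skewed_numbers(10, 2, 55).join(' ')
```

Since the random fallback can also land on the skewed value, its real share of the stream ends up a little above the stated percentage.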

The output from this part of the program looks like this :

 >NumberStream running!
 >Reading : 3  3 x 1
 >Reading : 1  3 x 0
 >Reading : 2  2 x 1
 >Reading : 7  2 x 0
 >Reading : 2  2 x 1
 >Reading : 7  2 x 0
 >Reading : 2  2 x 1
 >Reading : 2  2 x 2
 >Reading : 2  2 x 3
 >Reading : 2  2 x 4
 >Stream stopped.
 >Biggest element : 7
 >Most frequent : 2
 

I can do the same with letters :

reader.check :total_amount, :most_frequent

stream = LetterStream.new( reader, :length => 10,
:skew => {:toward => 'i', :percent => 45} ).start

The Reader’s check method resets its instance variables, so it’s ready to be used on a new stream. I create a new Letter Stream, and it gives me something like this :

>LetterStream running!
>Reading : h  h x 1
>Reading : i  h x 0
>Reading : i  i x 1
>Reading : z  i x 0
>Reading : i  i x 1
>Reading : i  i x 2
>Reading : i  i x 3
>Reading : i  i x 4
>Reading : t  i x 3
>Reading : c  i x 2
>Stream stopped.
>Total amount : 10
>Most frequent : i

And finally, I can do something like…

10.times { stream.start }

…and it’ll make the stream run to its length ten times while conserving state between iterations.

I thought it was fun, anyways.

July 15, 2008 Posted by naturalcode | Technology | No Comments Yet

From the ShoesFest

Yesterday was the first installment of the ShoesFest, a day of testing / learning why the lucky stiff’s Shoes toolkit for making multi-platform windowed Ruby apps.

I was on the IRC channel for most of the day. Someone on there asked if there would be regular events, and _why seemed open to the idea.

I finally read Nobody Knows Shoes, and I threw together this little program. It’s a fake login window that responds only to malformed SQL. I’ll make it better and more interesting.

Shoes.app :title => 'HTSL Login', :width => 400, :height => 300, :resizable => false do

  def login username, password
    if username == "' or 1=1; DROP TABLE users; --"
      alert 'Oh shit!'
    else
      alert 'Invalid username or password.'
    end
  end

  background gradient( rgb(150,150,255), rgb(255,255,255) )

  flow :width => '100%', :height => '70%' do
    stack :width => '60%', :margin => 50 do

      para "User name : \n\n", 'Password : '

      @username = edit_line :top => 3, :left => 105
      @password = edit_line :top => 46, :left => 105 

      button( 'Log in', :top => 100, :left => 110 ) { login @username.text, @password.text }

    end
  end

  flow :width => '100%', :height => '30%', :margin => 18 do
    para "Users currently logged in : #{logged_in = 1 + rand(2000)}\n",
    "Total users : #{logged_in + rand(1000)}"
  end

end

July 12, 2008 Posted by naturalcode | Technology | No Comments Yet

A basic genetic algorithm (part 1)

This here is an intro to genetic algorithms with a nice little biological analogy and everything. It starts by explaining the evolution of blind, clumsy, algae-eating, cave-dwelling creatures called Hooters into light-seeing, eagle-dodging, moss-eating machines.

It then goes on to explain how this is relevant to computing, and gives a simple example of a problem that can be solved with a genetic algorithm. In this case it involves evolving equations that give a number you’re looking for.

I’m already a little familiar with GAs. I’ve written one that generated words out of random characters, but it was slightly ugly, especially the fitness function. I’ll try to do this one properly.

I’m going to solve the equation problem and post my code later (it’ll be in Ruby).

UPDATE : Here’s the code so far. I’m off to play DOTA.


class Array
  def random
    self[rand(self.length)]
  end
end

class Genome
  attr_accessor :code

  def initialize(min_length=4, max_length=32)
    length = min_length + rand(1 + max_length - min_length)
    @code = generate_code(length)
  end

  # Builds a random string of '0' and '1' characters of the given length.
  def generate_code(length)
    Array.new(length) { ['0', '1'].random }.join
  end
end

class Equation
  def initialize
    @genome = Genome.new
  end

  def genome
    @genome.code
  end
end

class Population
  def initialize(size=50)
    @equations = []
    1.upto(size) { @equations << Equation.new }
  end

  def members
    @equations
  end
end

Population.new.members.each { |member| puts member.genome}

Anyone know of a better way to easily manipulate a series of bits than by storing it in a string? It’s fine if I just have a few but I like to make these things scale, if possible, and I know from experience that this kind of thing tends toward total memory consumption.
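One possible answer, sketched under the assumption that a genome keeps a fixed length once created: Ruby integers are arbitrary-precision, so a single Integer can hold the whole series of bits, with Integer#[] reading a bit and the usual bitwise operators doing the rest. BitGenome and its methods are hypothetical, not part of the code above.

```ruby
# A fixed-length series of bits stored in one Integer instead of a String.
class BitGenome
  attr_reader :bits, :length

  def initialize(bits, length)
    @length = length
    @bits = bits & ((1 << length) - 1) # drop anything beyond `length` bits
  end

  # A genome of `length` random bits.
  def self.random(length, rng = Random.new)
    new(rng.rand(1 << length), length)
  end

  # Bit at position i, where 0 is the least significant (rightmost) bit.
  def [](i)
    @bits[i]
  end

  # New genome with bit i inverted (a point mutation).
  def flip(i)
    BitGenome.new(@bits ^ (1 << i), @length)
  end

  # Single-point crossover: the low `point` bits come from self,
  # the remaining high bits from `other` (assumed to have the same length).
  def crossover(other, point)
    low_mask = (1 << point) - 1
    BitGenome.new((@bits & low_mask) | (other.bits & ~low_mask), @length)
  end

  # The same '0'/'1' text the string version prints, most significant bit first.
  def to_s
    @bits.to_s(2).rjust(@length, '0')
  end
end

a = BitGenome.new(0b11110000, 8)
b = BitGenome.new(0b00001111, 8)
puts a.flip(0)          # 11110001
puts b.crossover(a, 4)  # 11111111
```

This costs roughly one bit of storage per bit rather than one byte per character, and mutation or crossover becomes a handful of integer operations instead of string slicing.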

June 3, 2008 Posted by naturalcode | Technology | No Comments Yet