snax

ruby performance

This is like a totally secret Rails opportunity for you. A friend whose name I must not disclose (at least before Mike Arrington does) got his startup accepted into Y Combinator. His company is making a sort of enterprise thing that also involves a magic device.

Well, they need another Rails person. Email ftravis@gmail.com and you'll be put in touch. I guess you would get cash money and equity and all that business.

I know the last jobs I posted about didn't work out due to the CNET layoffs, so, you know, no promises.

postscript

Hey! I'm happy to announce that I've joined the staff of Twitter, Inc. Expect great things.

May 09, 2008

Automatically tag your music collection with metadata from Last.fm.

what it is

A while back Last.fm released a command line tool to retrieve metadata for an arbitrary mp3 from their new fingerprint database. I tried it yesterday and it seemed way better than MusicBrainz. So, as a person with a lot of random mp3s, I cooked up a script for retagging entire folders of songs.

Some neat things used in the script:

  • id3lib-ruby for handling mp3 tags
  • Text for calculating Levenshtein distance to the nearest correct genre name—amatch is a compiled version of the same thing, but not Windows-compatible
  • the incredibly comprehensive Last.fm API
  • XSD::Mapping for parsing the XML responses (better than Hpricot for small, well-formed documents)
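The genre matching is simple enough to sketch in plain Ruby. The script itself uses the Text gem for the distance calculation. The version below is only an illustration: `KNOWN_GENRES` and `nearest_genre` are hypothetical names, and the genre list is made up for the example.

```ruby
# Plain-Ruby Levenshtein distance, compared case-insensitively.
# (Illustrative only; the script relies on the Text gem instead.)
def levenshtein(a, b)
  a = a.downcase
  b = b.downcase
  # previous[j] is the distance between the processed prefix of a
  # and the first j characters of b.
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,        # deletion
        current[j - 1] + 1,     # insertion
        previous[j - 1] + cost  # substitution
      ].min
    end
    previous = current
  end
  previous.last
end

# Hypothetical list of accepted genre names (an assumption for this sketch).
KNOWN_GENRES = ["Rock", "Psychedelic", "Jazz", "Hip-Hop"]

# Pick the accepted genre with the smallest edit distance to the tag.
def nearest_genre(tag, genres = KNOWN_GENRES)
  genres.min_by { |genre| levenshtein(tag, genre) }
end

puts nearest_genre("hiphop")
```

With this list, a free-form tag like "hiphop" is one edit away from "Hip-Hop", so that is the name that gets picked.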

A handy feature in the script is the ability to add the top 10 tagged genres to the comment field, so you can use iTunes or Foobar smart playlists for fancier multi-genre sorting. This is similar to lastfmtagger, but not Mac-specific.

demo

Before running sweeper --genre:

$ id3info 1_001.mp3
*** Tag information for 1_001.mp3
*** mp3 info
MPEG1/layer III
Bitrate: 128KBps
Frequency: 44KHz

After:

$ id3info 1_001.mp3
*** Tag information for 1_001.mp3
=== TPE1 (Lead performer(s)/Soloist(s)): Photon Band
=== TIT2 (Title/songname/content description): To Sing For You
=== WORS (Official internet radio station homepage): http://www.last.fm/music/Photon+Band/_/To+Sing+For+You
=== TCON (Content type): Psychadelic
=== COMM (Comments): ()[]: rock, psychedelic, mod, Philly
*** mp3 info
MPEG1/layer III
Bitrate: 128KBps
Frequency: 44KHz

quickstart

Documentation is here, but for OS X:

sudo port install id3lib
sudo gem install sweeper
sweeper --help

Linux is similar to the above, depending on your distribution.

On Windows, you can just download a zipfile from the Rubyforge page and extract sweeper.exe to somewhere in your path.

I expect this to be eventually replaced by an official Last.fm tool, but for now, patches are welcome. It would be especially nice if someone could write a tutorial to help non-Ruby people install the script.

If you are going to contribute some code, grab the SVN checkout from Fauna, since the gem doesn't ship with the test mp3s.

SVN, I know—how embarrassing!

April 13, 2008

BleakHouse 4 came to life this weekend.

new implementation

BleakHouse now tracks the spawn points of every object on the heap, somewhat like Valgrind and somewhat like Dike.

This means there is no framing necessary, and the analysis task runs in seconds instead of hours. The pure-C instrumentation also means it's fast enough to run in production, won't introduce new leaks in your app, and can track T_NODE and other Ruby internals.

sample

After exactly 2000 requests:

$ bleak /tmp/bleak.13795.0.dump 
1334329 total objects
Final heap size 1334329 filled, 1132647 free
Displaying top 100 most common line/class pairs
408149 __null__:__null__:__node__
273858 (eval):3:String
135304 __null__:__null__:String
29998 /opt/local/lib/ruby/gems/1.8/gems/mongrel-1.1.4/lib/mongrel.rb:122:String
14000 /rails/activesupport/lib/active_support/core_ext/hash/keys.rb:8:String
11825 /rails/actionpack/lib/action_controller/base.rb:1215:String
7022 /opt/local/lib/ruby/site_ruby/1.8/rubygems/specification.rb:557:Array
5995 /rails/actionpack/lib/action_controller/session/cookie_store.rb:145:String
4524 /opt/local/lib/ruby/gems/1.8/specifications/gettext-1.90.0.gemspec:14:String
4000 /opt/local/lib/ruby/1.8/cgi/session.rb:299:Array
4000 /rails/actionpack/lib/action_controller/response.rb:10:Array
...

Somebody's got an eval leak, for sure. And those session.rb counts are pretty suspicious.
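If you want to slice the report differently, the lines are easy to post-process. Here is a rough sketch that totals counts per class. It assumes every data line looks like `COUNT LOCATION:LINE:CLASS`, as in the sample above. `counts_by_class` is a hypothetical helper, not part of BleakHouse.

```ruby
# Matches "COUNT LOCATION:LINE:CLASS"; header lines will not match.
LINE_PATTERN = /\A(\d+) (.+):([^:]+):([^:]+)\z/

# Sum object counts per class, largest first.
def counts_by_class(report)
  totals = Hash.new(0)
  report.each_line do |line|
    match = LINE_PATTERN.match(line.strip)
    next unless match # skip headers, "..." and anything else
    totals[match[4]] += match[1].to_i
  end
  totals.sort_by { |_klass, count| -count }
end

# A few lines copied from the sample output above.
sample = <<~REPORT
  Displaying top 100 most common line/class pairs
  408149 __null__:__null__:__node__
  273858 (eval):3:String
  135304 __null__:__null__:String
  4000 /opt/local/lib/ruby/1.8/cgi/session.rb:299:Array
  4000 /rails/actionpack/lib/action_controller/response.rb:10:Array
REPORT

counts_by_class(sample).each { |klass, count| puts "#{count} #{klass}" }
```

Grouping by `match[2]` instead of `match[4]` would give totals per file, which is probably the more useful view when hunting a specific leak.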

The BleakHouse docs are here. The codebase is very solid and I look forward to adding some neat things in 4.1 and 4.2.

credit where it's due

Part of the development of BleakHouse 4 was sponsored by a Rails company you have definitely heard of.

April 06, 2008

I put together some benchmarks for the three main Rails fulltext search solutions: Sphinx/Ultrasphinx, Ferret/acts_as_ferret, and Solr/acts_as_solr. The book Advanced Rails Recipes was a big help in getting Ferret and Solr running quickly.

dataset

The dataset is the entire KJV Bible, indexed by verse and also by book. This gives us 31,102 smallish records and 66 large ones. Ferret and Solr both use a Ruby method for loading the per-book contents (since they traverse a Rails association), while Sphinx (with Ultrasphinx) uses :concatenate to generate a GROUP_CONCAT MySQL query.
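For a sense of what the per-book loading amounts to, here is a plain-Ruby sketch of concatenating verse text by book. The `Verse` struct, the `book_contents` helper, and the space separator are assumptions for illustration. The benchmark app's actual models and methods are not shown here.

```ruby
# Hypothetical stand-in for a verse record.
Verse = Struct.new(:book, :text)

# Build one large document per book by joining its verses in order,
# roughly what a GROUP_CONCAT over the verses table would produce.
def book_contents(verses)
  verses.group_by(&:book).transform_values do |book_verses|
    book_verses.map(&:text).join(" ")
  end
end

verses = [
  Verse.new("Genesis", "In the beginning God created the heaven and the earth."),
  Verse.new("Genesis", "And the earth was without form, and void;"),
  Verse.new("Exodus", "Now these are the names of the children of Israel,")
]

book_contents(verses).each { |book, text| puts "#{book}: #{text.length} chars" }
```

The Ruby route builds the string in the application, while the GROUP_CONCAT route pushes that work into the database query.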

You can check out or browse the benchmark app yourself from here. Especially note the model configurations. The app should be runnable; the migrations include the dataset load. ...

March 17, 2008

Ahead of schedule, Ultrasphinx 1.9 is out with delta indexing, ERB support in the .base files, and official compatibility with Sphinx 0.9.8-rc1.

what it is

Delta indexing speeds up your updates by not reindexing the entire dataset every time. Instead, it keeps a main index which is updated rarely, and a delta index, which is updated frequently and only contains recently changed records.

Of course, your records need timestamps for this to work.
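The main/delta split is easy to picture in a few lines of Ruby. This is a conceptual sketch only, not how Ultrasphinx or Sphinx implement it. `Record`, `delta_records`, and `main_indexed_at` are hypothetical names.

```ruby
# Hypothetical record with the timestamp that delta indexing depends on.
Record = Struct.new(:id, :updated_at)

# Anything changed since the last full (main) index build goes in the delta.
def delta_records(records, main_indexed_at)
  records.select { |record| record.updated_at > main_indexed_at }
end

main_indexed_at = Time.utc(2008, 3, 1)
records = [
  Record.new(1, Time.utc(2008, 2, 20)), # unchanged since the main build
  Record.new(2, Time.utc(2008, 3, 10)), # changed afterwards
  Record.new(3, Time.utc(2008, 3, 15))  # changed afterwards
]

puts delta_records(records, main_indexed_at).map(&:id).inspect
```

Only records 2 and 3 would need reindexing here, which is why the frequent delta runs stay cheap until the next full rebuild resets the cutoff.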

See the documentation for more details. There is also an explanation of the implementation on the forum.

gotchas

Note that there are some gotchas surrounding Sphinx and index merges, mainly that facet counts and text sorting may not be perfectly accurate. In an append-rich environment (most web apps) these tend not to matter.

March 08, 2008

Spokeo, a crappy social network aggregator service, spammed my entire address book without my consent—almost 1000 contacts. My apologies if you got hit. Pain...

Once you sign up, if you click any 'friend' email address, instead of seeing a detail page about them, you get the following dialog:

Notice that it insinuates you are a loser unless you click 'yes'. I've never been peer-pressured by a web app before.

All the checkboxes were marked by default. On the original page where you input your webmail details, the site specifically says "we will not send emails to your contacts," so I kinda thought I was covered:

Of course the message itself says that I gave my "explicit approval."

So now people are getting fake invite codes (because you don't actually need an invite to sign up for the site), and my own inbox is full of bounce replies. Seriously... what a disaster.

Plus it seems like a clear violation of CAN-SPAM.

irony

Spokeo is a Rails site:

$ curl --head http://www.spokeo.com
...
Server: Mongrel 1.1.3

They used my own codes against me.

March 06, 2008