snax

ruby performance

xapian search plugin

Francis Irving sent me a note about his work on a new Rails search plugin, acts_as_xapian. It uses the Xapian engine, which is a C++ indexer similar to Lucene. A particularly neat feature is built-in spellcheck.

I still plan to benchmark all these plugins on the Wikipedia dataset, but it's been delayed by the new job. If anyone has a big piece of iron I could use for a couple of weeks, I would appreciate it (16 GB RAM, hundreds of GB of free disk space, no production load).

May 26, 2008

5 comments

grosser says (May 27, 2008):

If you do a new benchmark, please don't leave out acts_as_searchable/HyperEstraier.

At the moment, an updated and better-commented version can be found in my branch. Install instructions are here.

grempe says (May 27, 2008):

"If anyone has a big piece of iron I could use for a couple weeks I would appreciate it."

Try an EC2 X-Large instance: $0.80 per hour (15 GB RAM, 4 cores).

evan says (May 27, 2008):

I thought about that, but I'd rather not spend the $268 it takes to keep it running for two weeks, since I can't finish the whole task at once.

Alexey Kovyrin says (July 16, 2008):

Any results so far? We're really curious here about the results... :-)

Nahum Wild says (August 11, 2008):

I've used HyperEstraier before on a small dataset (fewer than 100,000 entries) and it was fine. I did some complex filtering with it too, which worked a treat. The only problem is that _after_ I implemented it, I came across a number of comments saying it had scaling issues, especially as it approached 1 million entries: slowness, frequent long reindexing runs, etc. I didn't experience this myself, as our dataset never got anywhere near that large. I'd be interested to see how it performs with Wikipedia. I don't have any iron for you, though, sorry :-(
