Lucene on CPAN
Posted in perl Tue, 22 Aug 2006 01:15:00 GMT
Lucene is a Java-based fulltext indexing and search solution run under the Apache Foundation. It is arguably the most popular of the many fulltext solutions available now, but its use of Java makes it a secondary choice for many non-Java projects. To improve the speed of Lucene, there are several C/C++ ports, including CLucene, Lucene4c and Lucy.
For a long time, Perl users who wanted fulltext search capabilities chose between Plucene (a Perl port of Lucene), Xapian (a C++ library with many language bindings) and several others. Plucene has a variety of performance problems, so Xapian became the choice for many projects. Recently, the Lucene module was added to CPAN as a wrapper around CLucene. The naming gets funny with all the ports: the Lucene Perl module uses CLucene, not Lucene, and the CLucene project doesn't seem to be run under the Lucene project at the Apache Foundation, which runs the Lucene4c and Lucy C ports.
I've been using Xapian and the Search::Xapian Perl bindings for a while now and will continue to do so. It will be interesting to see if there's a shakeout among the various C/C++ Lucene ports and whether any of them gains popularity. It will also be interesting to see if any of the C/C++ ports grow beyond being just a port of Lucene.
You forgot KinoSearch
Thanks for mentioning KinoSearch. I updated the article to say Perl users could choose Plucene, Xapian and “several others.” I chose not to explicitly mention KinoSearch in the article b/c: (a) effort on KinoSearch may have moved to Lucy, according to the Lucy page; (b) Plucene and Xapian are the only search solutions I’ve seen mentioned on the #catalyst channel (granted, that’s a small subset of Perl users) and the only ones that seem to have a Catalyst model; and (c) I didn’t want too many search engine names in the blog article. I did, however, list it in the referenced Search Libraries wiki page along with Namazu, Swish-e and others when I posted this. I also just added Ferret to that page. Ferret is the Ruby port of Lucene whose author is now also working on Lucy.
Howdy… KinoSearch’s author here… It’s funny that people in the catalyst channel mention Plucene but not KinoSearch. Plucene is no longer actively developed, and at least one of its primary authors (Tony Bowden) has been using KinoSearch. (shout-out to Plucene here). FYI, KS is being actively developed and will continue to be so until such time as Lucy is sufficiently feature-rich and mature to supersede it.
Thanks for posting, Marvin. I should mention that Plucene is usually discussed on #catalyst as something not to use ;) That being said, I’ve only heard Plucene and Xapian come up. I didn’t know Plucene isn’t being developed anymore; it’s good to know and the link is a good read.
What will happen after Lucy supersedes KS? Will there be a Perl binding using the KS API, or will method calls change when porting from KS to Lucy?
BTW, if you want to add some coolness factor for KS, make a Catalyst model: Catalyst::Model::KinoSearch ;)
I expect that Lucy will have a different API than KS. WRT Catalyst::Model::KinoSearch, I have my hands full! Somebody else will have to write it and support it.