Sat, 23 May 2009 07:31:00 GMT
I recently had a discussion on taxonomies and ontologies for formal classification. These are often used to organize websites, databases, etc.; however, I haven't run into any real-world applications where they are shared across multiple projects and software platforms using a develop-once, use-many approach. Looking around, I found a few open standards, listed below, but I wonder if and how these are used in the wild. If anyone knows of successful, compelling implementations of these or other standards, please let me know.
- SKOS - Simple Knowledge Organization System: "a family of formal languages designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary. SKOS is built upon RDF and RDFS, and its main objective is to enable easy publication of controlled structured vocabularies for the Semantic Web" according to Wikipedia. This standard has been developed for some time and seems to have some traction. There is some activity at the Library of Congress and Apache Forrest project. The W3C Glossary and Dictionary Project also put together a set of SKOS formatted glossaries.
- OWL - Web Ontology Language: according to Wikipedia, "a family of knowledge representation languages for authoring ontologies, and is endorsed by the World Wide Web Consortium. This family of languages is based on two (largely, but not entirely, compatible) semantics: OWL DL and OWL Lite semantics are based on Description Logics, which have attractive and well-understood computational properties, while OWL Full uses a novel semantic model intended to provide compatibility with RDF Schema. OWL ontologies are most commonly serialized using RDF/XML syntax. OWL is considered one of the fundamental technologies underpinning the Semantic Web, and has attracted both academic and commercial interest."
- TCS - Taxonomic Concept Transfer Schema: This is supported by the Drupal Taxonomy project.
Given the interest in the semantic web and semantic web standards, are we getting to a point where real life classification projects have started to use the XML standards, or are people still mostly using non-standard approaches (e.g. text files and spreadsheets) to manage this information?
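To make the spreadsheet-versus-standard question a bit more concrete, here is a minimal sketch of how rows from a spreadsheet-style taxonomy could be written out as SKOS in Turtle syntax. The function name to_skos_turtle, the base URI http://example.org/taxonomy/, and the sample rows are all hypothetical; only the SKOS namespace and the skos:Concept, skos:prefLabel and skos:broader terms come from the standard. A real project would more likely use an RDF library than string formatting.

```python
# Sketch only: emits SKOS concepts as Turtle text using the standard library.
# The base URI and the row layout (id, label, parent id) are assumptions.

SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"


def to_skos_turtle(base_uri, rows):
    """Convert (concept_id, label, parent_id or None) rows to Turtle text."""
    out = [SKOS_PREFIX]
    for concept_id, label, parent_id in rows:
        # Escape backslashes and double quotes so the label is a valid literal.
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines = [
            f"<{base_uri}{concept_id}> a skos:Concept ;",
            f'    skos:prefLabel "{escaped}"@en',
        ]
        if parent_id is not None:
            # skos:broader points from the narrower concept to its parent.
            lines[-1] += " ;"
            lines.append(f"    skos:broader <{base_uri}{parent_id}>")
        lines[-1] += " ."
        out.append("\n".join(lines) + "\n")
    return "\n".join(out)


if __name__ == "__main__":
    sample_rows = [
        ("animals", "Animals", None),
        ("mammals", "Mammals", "animals"),
    ]
    print(to_skos_turtle("http://example.org/taxonomy/", sample_rows))
```

The point of the sketch is that a flat list with a parent column already carries everything needed for a basic SKOS hierarchy, so moving off a spreadsheet is mostly a question of tooling and shared identifiers.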
Posted in xapian
Mon, 26 Jan 2009 05:53:00 GMT
A few years ago I picked Xapian after evaluating a number of solutions. More recently, the popularity surge of Lucene made me curious to learn about it. I needed to do a rip and replace of MySQL fulltext search due to scaling issues, so I decided to check out clucene. I quickly found that its API was not as up to date as Lucene's (a fast-moving target) and that the mailing list had seen only 4 posts in the last year or so. That led me to move away from clucene. After that, I was told to check out Solr as an easy way to use Lucene without needing to write Java. I replaced MySQL with Xapian but still had Solr in the back of my mind to check out.
Recently, an email from Jonathan Drake, Senior Developer at YouSport.com, came across the xapian-discuss mailing list that said:
We were using Solr before but it was constantly causing headaches in terms of scalability and complexity. I gave Xapian a go and so far I'm blown away by how awesome it is. Its incredibly lightweight, its scaled a 100 times better and everyone involved is happier.
I'm curious to hear what scaling and complexity problems they faced, but it's good to hear a strong endorsement of Xapian from a former Solr developer. That, and a quick check of the current users page listing del.icio.us with over 100 million documents, seems to indicate that Xapian remains a strong contender in the search space. That being said, I work with very scalable Lucene-based solutions as well, just in Java projects.
Posted in perl
Sun, 25 Jan 2009 20:53:00 GMT
Byrne Reese, the product manager behind MT 4.0, and Aaron Stone, both formerly of Six Apart, recently discussed their project, mod_perlite, with chromatic in the article CGI is Dead; mod_perlite is Alive! mod_perlite is designed to bring some of the ease of use of mod_php to Perl. For where this can help, think of WordPress and their Famous 5-Minute Install. Now imagine having the same thing for MovableType and other Perl apps. To catch up on mod_perlite, follow the article comments and the PerlMonks thread.
Posted in perl
Sun, 25 Jan 2009 08:51:00 GMT
Continuing the thread of useful projects for Perl, I think it would be very beneficial to enhance Perl's relatively unstructured documentation system (POD) to bring it up to par with other languages today, e.g. Java and possibly others. Structured documentation should enable several benefits that speed development and that I've long been a fan of, including the following:
- ability to generate standardized docs listing params, results, exceptions, etc. like Javadoc,
- ability to auto-generate lists of files, classes and methods like rdoc, and
- ability to auto-display required parameters like in Visual Studio 2008.
I recently used C# and Visual Studio 2008 for the first time and the auto-display of parameters in the IDE made development much more efficient with a new and unfamiliar language / API. It's something I'd like to see integrated into a Perl IDE, e.g. Eclipse/EPIC, Padre, ActiveState Komodo, etc.
A new documentation system like this for Perl may be easier to build off of Moose which could make it attractive as a Tim Toady Bicarbonate project. Some of the benefits of Javadoc and lessons from C# XML doc can be seen in this thread: C# documentation comments: useless?
Ideally, this effort could be headed up by a TPF or EPO working group consisting of multiple people and organizations including people with interest in documentation and IDEs, e.g. developers or product managers from the various Perl IDE products.
Sat, 24 Jan 2009 17:04:00 GMT
I was recently researching home NAS solutions. Synology and QNAP, both of which are based on Linux, came up as some of the leading contenders. HP's MediaSmart which is based on Windows Home Server (a derivative of Windows Server 2003 SP2) also seems popular. A post in this thread raised some interesting issues by claiming QNAP and Synology are the closed solutions while Windows Home Server is the open one. This is especially interesting given the GPL v2 licensing of Linux and how it is used in embedded solutions.
The main thing to consider when evaluating this choice is how much flexibility you require. NAS boxes such as the QNAP or ReadyNAS are, in fact, specialized servers, built on a proprietary embedded OS. Bug fixes and increased functionality are provided, occasionally, via firmware updates; additionally a limited number of "add-ons" (essentially, applications written for the particular box) are available to provide other capabilities.
By contrast, an open server platform such as WHS should be much more extensible over time. WHS is based on Microsoft's enterprise-class server products (presently Server 2003, but future versions are rumored to be based on Server 2008), so it in fact is built on an extremely reliable and stable core.
So, in practice, the question comes down to who is actually providing their source code.
- Microsoft WHS: For the foreseeable future, we can rest assured the source will not be available for WHS.
- QNAP: QNAP offers source code but a quick check of Wikipedia's QNAP TS-101 page shows that people have been porting SqueezeBox's GPL SqueezeServer (formerly SlimServer) software to QNAP. It would be interesting to find out why people are porting SlimServer.
- Synology: Access to Synology source code was very easy to find here on their GPL page, http://www.synology.com/enu/gpl/, and in fact, their source code is posted to SourceForge.net.
Considering that Synology and QNAP make their source code available and HP MediaSmart / WHS does not, it's clear that the former are more open than the latter. With HP MediaSmart and Windows, it's possible to end up with a solution where you can no longer upgrade your software as with my recent discovery that VB 6 is alive and well.
Posted in perl
Sat, 24 Jan 2009 16:45:00 GMT
I decided to see if there was any EPO chatter on use.perl.org after Larard mentioned it here. A quick search pulled up chromatic's Jan 16, 2009 article on a new book proposal for modern Perl programming. I was curious to see if chromatic was going to endorse the Enlightened Perl Organization and focus on their module list for his book but it doesn't seem to be the case. After reading the article, I couldn't help but get the feeling that it would be "Yet Another Way To Do It" which raised questions about how it would be received by the Perl community, one that seems to resist coalescing around new ideas. TIMTOWTDI is often cited as a strength but, taken to an extreme, it is also seen as a weakness by many.
I'd like to see chromatic's book become a reality, but I'd also like to see it in conjunction with a larger effort on recommended Perl practices, either through EPO or a new TPF working group working with some of the same ideas as EPO. I've been participating in some technology working groups and I think working groups are a good way to move ideas forward. Getting more participants and participating organizations involved, so the recommendations are vetted by more people and carry some authority, can hopefully generate more adoption. I think leveraging these types of groups would be a better way to move Perl coding forward, with more widespread and visible endorsements, than having more individual, isolated books and articles on programming styles.
Thu, 15 Jan 2009 13:07:00 GMT
TC recently reported Morten Lund's personal bankruptcy over his investment in Nyhedsavisen, a free daily newspaper:
Yesterday he was declared personally bankrupt by the Copenhagen Maritime and Commercial Court after losing 10M Krona in an investment into a Danish newspaper, Nyhedsavisen, which went badly wrong.
The TC article doesn't mention what went wrong, but Lund has always been brutally honest, and his recent blog article and apology, The day’s when I fucked up - (Gaza, Moral, Dickhead(s)), gives an idea: it revolves around his personal guarantee of salary and compensation to Nyhedsavisen's former chairman Svenn Dam and CEO Morten Nissen Nielsen.
- Jan/Feb 2008 I did a deal with my TOP management - I personally underwrote a 1 year salary compensation and some super-warrants (In case the venture should fail) - because I believed in them. I gave them my word....
- my x-lawfirm told me that the contract with TOP MANAGEMENT had never been signed - I made one of the worst mistakes of my life (confused, afraid and full of self pity) - I told Morten and Svenn that the contract was not signed and that I would not honour it. I broke my own rule number one I RAN FROM MY WORD - and of course they got insanely mad. I would have gone ballistic as well. Eepecially as the signed contract was at my x-lawfirms other office (Svenn used them). The rest is well know - I was accused of everything possible - and had to defend myself - (of course I had done nothing even remotely illegal - but I was blamed and hurt like very few Danish entrepreneurs my age - actually I felt as if I had lost my arms and legs) - as a guy who runs from his word deserves.
The following articles provide a more complete timeline of events:
- Jan 2, 2008, Kristine Lowe: Skype-investor takes control of bet noire Nyhedsavisen
- Sep 8, 2008, Kristine Lowe: The Danish freesheet war ends: Nyhedsavisen folds
- Sep 10, 2008, Morten Lund: The Day I Woke Up Without Arms And Legs
- Jan 8, 2009, Morten Lund: The day’s when I fucked up - (Gaza, Moral, Dickhead(s))
- Jan 14, 2009, Kristine Lowe: Business angel goes bust
Thu, 15 Jan 2009 11:26:00 GMT
Journal email provides the capability to capture information that is used in email delivery but not captured in the actual email message. Two areas where this makes an impact are BCCs and distribution list expansion, which are not typically captured in an RFC 2822 email. To handle this, different email servers have created their own solutions. One example is Microsoft's Envelope Journaling, which creates a new email message that contains the BCC and distribution list information while embedding the original email as an attachment.
Given the increasing importance of email capture and the expanding number of enterprise email servers, it seems time to add a journaling format to existing email standards (or to a new standard).
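As a rough illustration of the envelope approach described above, the sketch below wraps an original message in a new report message that lists the envelope sender and recipients and carries the original as a message/rfc822 attachment. It uses Python's standard email package. The function name journal_wrap, the "Sender:" and "Recipient:" body lines, and the example addresses are assumptions made for illustration; this is not Microsoft's actual journal report format.

```python
# Sketch only: a generic envelope wrapper, not any vendor's journal format.
from email.message import EmailMessage


def journal_wrap(original, envelope_sender, envelope_recipients, journal_address):
    """Return a report message that records envelope data and embeds the original."""
    report = EmailMessage()
    report["From"] = journal_address
    report["To"] = journal_address
    report["Subject"] = "Journal report: " + str(original.get("Subject", ""))

    # Envelope data (including BCC and expanded list members) lives in the
    # report body because it never appears in the original message headers.
    lines = ["Sender: " + envelope_sender]
    lines += ["Recipient: " + recipient for recipient in envelope_recipients]
    report.set_content("\n".join(lines) + "\n")

    # Attaching a message object produces a message/rfc822 part, so the
    # original headers and body are preserved inside the report.
    report.add_attachment(original)
    return report


if __name__ == "__main__":
    original = EmailMessage()
    original["From"] = "alice@example.com"
    original["To"] = "team@example.com"
    original["Subject"] = "Quarterly numbers"
    original.set_content("See attached figures.")

    report = journal_wrap(
        original,
        "alice@example.com",
        ["bob@example.com", "carol@example.com"],  # expanded list plus a BCC
        "journal@example.com",
    )
    print(report.as_string())
```

A shared journaling standard would mainly need to pin down the part this sketch invents: the field names and structure used to record the envelope data.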
Thu, 15 Jan 2009 11:19:00 GMT
Given that many, but not all, web browsers support compression, it seems to make sense to store static HTML pages compressed and only inflate them (or inflate and recompress using a different algorithm) if the web browser does not understand the algorithm used for storage. One way this could work is to use a file extension (or another method) to let the webserver know the file is compressed and which algorithm was used. For example, a file with the .htz extension (similar to .tgz) could indicate to Apache that the file is Gzipped. If the client understands Gzip, the file can be delivered as is. If the client does not understand Gzip, the file would be inflated before delivery.
A mod_inflate like this would provide two benefits: (a) lower on-disk storage requirements and (b) lower webserver overhead when many clients understand the compression being used. Is there a project like this? If not, should there be one?
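The decision logic is small enough to sketch. The Python below is not an Apache module; it only illustrates the idea under simple assumptions: the page is stored Gzipped, the client's Accept-Encoding header is available as a string, and the header parsing is deliberately simplified (it ignores the "*" wildcard). The function names accepts_gzip and respond are hypothetical.

```python
# Sketch only: serve stored gzip bytes as is when the client accepts gzip,
# otherwise inflate them on the fly.
import gzip


def accepts_gzip(accept_encoding):
    """Return True if the Accept-Encoding value allows gzip (simplified parse)."""
    for item in accept_encoding.split(","):
        token, _, params = item.strip().partition(";")
        if token.strip().lower() not in ("gzip", "x-gzip"):
            continue
        quality = params.strip().lower().replace(" ", "")
        if quality.startswith("q="):
            try:
                if float(quality[2:]) == 0:
                    continue  # q=0 means the client explicitly refuses gzip
            except ValueError:
                pass  # malformed q-value: treat the encoding as acceptable
        return True
    return False


def respond(stored_gzip_bytes, accept_encoding):
    """Return (headers, body) for a page that is stored gzip-compressed."""
    headers = {"Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        # Deliver the stored bytes untouched and label them as gzip.
        headers["Content-Encoding"] = "gzip"
        return headers, stored_gzip_bytes
    # Client cannot handle gzip: inflate before sending.
    return headers, gzip.decompress(stored_gzip_bytes)


if __name__ == "__main__":
    stored = gzip.compress(b"<html><body>Hello</body></html>")
    for header in ("gzip, deflate", "identity"):
        response_headers, body = respond(stored, header)
        print(header, "->", response_headers, len(body), "bytes")
```

The common case (a client that accepts Gzip) costs no CPU beyond reading the file, which is where the overhead savings would come from.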
Posted in perl
Wed, 14 Jan 2009 07:21:00 GMT
A number of people I know through the Perl community have come together to form The Enlightened Perl Organization (EPO). The goal is to modernize Perl 5 and make it competitive with new developments in programming languages, given that it's unknown when Christmas (the delivery date for Perl 6) will arrive.
My take on this is that while other organizations focus on ongoing development of Perl 6, EPO will seek to enhance Perl 5 and take it out of "maintenance mode." Enhancing Perl 5 will hopefully bring much-needed modernization to the Perl 5 core that people can use sooner rather than later. One of the most exciting developments in the Perl community, which addresses some of the core criticism of Perl 5, is Moose, an object system that modernizes Perl 5. Unlike previous efforts to enhance Perl 5's object system, this one seems to have gained a lot of traction, with 136 current logins on the #moose IRC channel. Moose is different enough that some have even claimed that it is not Perl; however, this is clearly not the case, as Moose and non-Moose objects can be freely intermingled within Perl projects. For some information on Moose, check out this article by Jon Rockway. In addition to Moose, check out KiokuDB, an interface for schema-less databases like Amazon SimpleDB and CouchDB as well as more traditional DBI for RDBMSs. In addition to supporting projects, ideally Perl 5's core module list can be modernized so more people will be able to take advantage of and feel comfortable recommending modern approaches to Perl development.
At the same time, I'd also like to see them tackle a few more persistent issues, the most important of which is CPAN usability. There is no doubt the Perl community and the CPAN are very compelling; however, installing CPAN dependencies is more difficult than it needs to be. Installation often requires many interactive prompts and can take a long time for applications with many dependencies. There are typically no 5-minute installs like those that exist for WordPress, phpBB, and MediaWiki. Some exceptions include qpsmtpd and Catalyst using Matt Trout's cat-install script.
I welcome EPO as another organization in the Perl community to keep Perl modern and vibrant.