<?xml version="1.0" encoding="UTF-8"?>
<feed xml:lang="en-US" xmlns="http://www.w3.org/2005/Atom">
  <title>Dev411 Blog: Category scalability</title>
  <subtitle type="html">John Wang on Technology</subtitle>
  <id>tag:www.dev411.com,2005:Typo</id>
  <generator uri="http://www.typosphere.org" version="4.0">Typo</generator>
  <link href="http://www.dev411.com/blog/xml/atom/category/feed.xml" rel="self" type="application/atom+xml"/>
  <link href="http://www.dev411.com/blog/tag/scalability" rel="alternate" type="text/html"/>
  <updated>2007-06-16T12:30:25-05:00</updated>
  <entry>
    <author>
      <name>John Wang</name>
    </author>
    <id>urn:uuid:f3906b2c-8ae7-4b4c-a5b6-bc1bd25f1746</id>
    <published>2007-02-05T18:28:00-06:00</published>
    <updated>2007-06-16T12:30:25-05:00</updated>
    <title type="html">Displaying Dates and Times Using JavaScript</title>
    <link href="http://www.dev411.com/blog/2007/02/05/displaying-dates-and-times-using-javascript" rel="alternate" type="text/html"/>
    <category term="scalability" scheme="http://www.dev411.com/blog/tag/scalability" label="scalability"/>
    <category term="typo" scheme="http://www.dev411.com/blog/tag/typo" label="typo"/>
    <category term="javascript" scheme="http://www.dev411.com/blog/tag/javascript" label="javascript"/>
    <category term="dhtml" scheme="http://www.dev411.com/blog/tag/dhtml" label="dhtml"/>
    <category term="datetime" scheme="http://www.dev411.com/blog/tag/datetime" label="datetime"/>
    <summary type="html">&lt;p&gt;Some considerations when displaying dates and times on a website include showing delta times, customized timezones and caching. Often it's nice to show a delta time like "10 minutes ago" or "5 days ago" to give readers a frame of reference instead of an absolute date. When the date is far enough in the past and an absolute date becomes desired, customizing the date to the user's timezone is useful. And if your site grows large enough that caching becomes useful, finding a way to display customized deltas and timezone information in a cacheable static page becomes an ideal solution.&lt;/p&gt;</summary>
    <content type="html">&lt;p&gt;Some considerations when displaying dates and times on a website include showing delta times, customized timezones and caching. Often it's nice to show a delta time like "10 minutes ago" or "5 days ago" to give readers a frame of reference instead of an absolute date. When the date is far enough in the past and an absolute date becomes desired, customizing the date to the user's timezone is useful. And if your site grows large enough that caching becomes useful, finding a way to display customized deltas and timezone information in a cacheable static page becomes an ideal solution.&lt;/p&gt;

&lt;p&gt;JavaScript is an ideal solution for all three issues. With JavaScript you can place an absolute date in the web page and have the JS dynamically update it when the page is loaded. This can be used to calculate delta times and accommodate timezones as well. The result is that the page can embed the same date every time and thus becomes more cache-friendly.&lt;/p&gt;

&lt;p&gt;The &lt;a href="http://typosphere.org"&gt;Typo&lt;/a&gt; blog engine (which runs this blog) comes with useful MIT-licensed JavaScript in its &lt;a href="http://trac.typosphere.org/browser/trunk/public/javascripts/typo.js"&gt;typo.js&lt;/a&gt; script. Just copy three of the JS date/time functions, wrap your dates with spans (using the appropriate class name and absolute date in the span title) and then call &lt;span class="fix"&gt;show_dates_as_local_time()&lt;/span&gt; when your page is finished loading. The two other functions you'll need are &lt;span class="fix"&gt;get_local_time_for_date(time)&lt;/span&gt; and &lt;span class="fix"&gt;distance_of_time_in_words(minutes)&lt;/span&gt;. This is what I did for &lt;a href="http://planet.catalystframework.org/"&gt;Planet Catalyst&lt;/a&gt;'s &lt;a href="http://plagger.org"&gt;Plagger&lt;/a&gt; theme a while back.&lt;/p&gt;
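
&lt;p&gt;As a rough illustration of the approach (a minimal sketch, not the actual typo.js code), the delta calculation could look something like the following. The function names, the &lt;code&gt;relative-date&lt;/code&gt; class name and the assumption that the span title holds an ISO 8601 date are all hypothetical choices made for this example.&lt;/p&gt;

```javascript
// Hypothetical helper, not the Typo implementation: turns a whole number of
// minutes into a rough phrase such as "10 minutes ago".
function minutesToWords(minutes) {
  if (minutes >= 2880) return Math.floor(minutes / 1440) + " days ago";
  if (minutes >= 1440) return "1 day ago";
  if (minutes >= 120) return Math.floor(minutes / 60) + " hours ago";
  if (minutes >= 60) return "1 hour ago";
  if (minutes >= 2) return minutes + " minutes ago";
  if (minutes >= 1) return "1 minute ago";
  return "less than a minute ago";
}

// Parses the absolute date (assumed here to be an ISO 8601 string) and
// compares it with "now". Passing now as a parameter keeps this testable.
function deltaInWords(isoDate, now) {
  var then = new Date(isoDate);
  var minutes = Math.floor((now.getTime() - then.getTime()) / 60000);
  return minutesToWords(Math.max(minutes, 0));
}

// In a browser, each span carrying the date in its title could be rewritten
// after the page loads. The class name "relative-date" is an assumption.
function showRelativeDates(doc, now) {
  var spans = doc.getElementsByTagName("span");
  for (var i = 0; spans.length > i; i++) {
    if (spans[i].className === "relative-date") {
      spans[i].innerHTML = deltaInWords(spans[i].title, now);
    }
  }
}
```

&lt;p&gt;For example, &lt;code&gt;deltaInWords("2007-02-05T18:28:00-06:00", new Date("2007-02-05T18:38:00-06:00"))&lt;/code&gt; gives "10 minutes ago". Because the page itself only ever contains the absolute date, the same cached copy works for every reader.&lt;/p&gt;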

&lt;p&gt;Although it's pretty easy to accommodate timezones, the Typo script doesn't do that. I've done this for some projects and might post some code in the future.&lt;/p&gt;
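
&lt;p&gt;In the meantime, here is a minimal sketch of one way timezone handling might be done. It is not taken from Typo or from any of my projects; the function names and the "YYYY-MM-DD HH:MM" layout are assumptions made for illustration. The first formatter relies on the browser's own timezone, the second on an explicit offset such as a user preference.&lt;/p&gt;

```javascript
// Zero-pads a number to two digits, e.g. 5 -> "05".
function pad2(n) {
  return (n > 9 ? "" : "0") + n;
}

// Formats a Date as "YYYY-MM-DD HH:MM" using the local-time getters, which
// apply the timezone of whatever machine runs the script.
function formatLocal(date) {
  return date.getFullYear() + "-" + pad2(date.getMonth() + 1) + "-" +
    pad2(date.getDate()) + " " + pad2(date.getHours()) + ":" +
    pad2(date.getMinutes());
}

// Same layout, but shifted by an explicit offset in minutes east of UTC,
// for cases where the offset comes from a stored user preference instead.
function formatWithOffset(date, offsetMinutes) {
  var shifted = new Date(date.getTime() + offsetMinutes * 60000);
  return shifted.getUTCFullYear() + "-" + pad2(shifted.getUTCMonth() + 1) + "-" +
    pad2(shifted.getUTCDate()) + " " + pad2(shifted.getUTCHours()) + ":" +
    pad2(shifted.getUTCMinutes());
}
```

&lt;p&gt;For example, &lt;code&gt;formatWithOffset(new Date("2007-02-05T18:28:00-06:00"), 540)&lt;/code&gt; gives "2007-02-06 09:28", the same instant as seen from a UTC+9 timezone.&lt;/p&gt;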

&lt;p&gt;Customization and cacheability: two great advantages of using JavaScript to handle dates and times.&lt;/p&gt;</content>
  </entry>
  <entry>
    <author>
      <name>John Wang</name>
    </author>
    <id>urn:uuid:70f5b48b-2255-4114-9908-b96502a45017</id>
    <published>2006-10-24T11:56:00-05:00</published>
    <updated>2007-06-16T12:30:25-05:00</updated>
    <title type="html">Planning Ahead for Open Source Storage Scaling</title>
    <link href="http://www.dev411.com/blog/2006/10/24/planning-ahead-for-open-source-storage-scaling" rel="alternate" type="text/html"/>
    <category term="scalability" scheme="http://www.dev411.com/blog/tag/scalability" label="scalability"/>
    <summary type="html">&lt;p&gt;Recently eWeek ran an article on &lt;a href="http://www.eweek.com/article2/0,1895,2024696,00.asp"&gt;eHarmony's storage scaling solution choice&lt;/a&gt; which discussed how they chose to go with proprietary solutions from &lt;a href=""&gt;3PAR&lt;/a&gt; and &lt;a href="http://www.eweek.com/article2/0,1895,2024696,00.asp"&gt;ONStor&lt;/a&gt;. I was hoping to learn something interesting about their deployment architecture but the most interesting things I learned were that eHarmony has 8+ million users and 9+ million photos, and which proprietary vendors they chose. Some interesting quotes from Mark Douglas, eHarmony's VP of Technology:&lt;/p&gt;</summary>
    <content type="html">&lt;p&gt;Recently eWeek ran an article on &lt;a href="http://www.eweek.com/article2/0,1895,2024696,00.asp"&gt;eHarmony's storage scaling solution choice&lt;/a&gt; which discussed how they chose to go with proprietary solutions from &lt;a href=""&gt;3PAR&lt;/a&gt; and &lt;a href="http://www.eweek.com/article2/0,1895,2024696,00.asp"&gt;ONStor&lt;/a&gt;. I was hoping to learn something interesting about their deployment architecture but the most interesting things I learned were that eHarmony has 8+ million users and 9+ million photos, and which proprietary vendors they chose. Some interesting quotes from Mark Douglas, eHarmony's VP of Technology:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"We find ourselves having to buy storage about every 90 days."&lt;/li&gt;
&lt;li&gt;"The other solutions we considered had a learning curve and a level of complexity that we just didn't want to undertake."&lt;/li&gt;
&lt;li&gt;"There was going to be a lot of hands-on work to do with our six years' worth of data. We wanted a more automated system, for sure."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It seems like what happened is that they didn't plan for growth and by the time it hit them they were too busy and didn't want to deal with it. Going with proprietary solutions seemed like the easy way out. However, one has to wonder if relying on proprietary solutions is a good decision for further scaling needs. In Steve Bryant's article "&lt;a href="http://googlewatch.eweek.com/blogs/google_watch/archive/2006/10/03/13557.aspx"&gt;Top 10 Reasons It's Almost Impossible to Compete with Google&lt;/a&gt;" he lists distributed infrastructure as the very first reason:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Huge, Distributed Infrastructure -- The obvious advantage is Google's huge infrastructure, which is distributed across 450,000+ servers across the globe. By distributing its infrastructure, Google decreases router and switch delays and delivers faster performance to its worldwide users. Not only is search faster, but products work better too.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One thing that has been popularized about Google is their massive use of cheap, commodity hardware, not large proprietary systems like those that 3PAR and ONStor seem to build. While Google uses the closed-source &lt;a href="http://en.wikipedia.org/wiki/Googlefs"&gt;GoogleFS&lt;/a&gt;, there are some similar FOSS solutions, namely SixApart/Danga's &lt;a href="http://danga.com/mogilefs/"&gt;MogileFS&lt;/a&gt;. MogileFS was built for LiveJournal because the alternatives were, according to &lt;a href="http://danga.com/words/2005_oscon/oscon-2005.pdf"&gt;Brad Fitzpatrick's 2005 OSCON presentation (pdf)&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;closed, non-existent, expensive, in development, complicated, ...&lt;/li&gt;
	&lt;li&gt;&lt;i&gt;scary/impossible when it came to data recovery&lt;/i&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The PDF presentation is a bit long at 80 slides because it covers all of LiveJournal. I've &lt;a href="http://www.dev411.com/slides/livejournal_20050804_mogilefs.pdf"&gt;extracted the MogileFS slides&lt;/a&gt;, which come to just 13 slides, to give you an overview. If you are interested, it does make sense to read the full presentation because it also goes over Perlbal and memcached. One great thing about MogileFS is that its ability to automatically make multiple backup copies means that RAID and tape backup are not required. That is a big difference compared to the "big iron" solutions that 3PAR and ONStor seem to provide. SixApart continues to support MogileFS as a FOSS project and recently held a &lt;a href="http://mogilefs.schtuff.com/mogilesummit"&gt;MogileFS Users/Developers Summit&lt;/a&gt; at their San Francisco headquarters. Although MogileFS is the primary, stable and proven FOSS DFS at the moment, there are others in development, including the &lt;a href="http://www.dev411.com/blog/2006/08/23/hadoop-distributed-file-system"&gt;Hadoop DFS&lt;/a&gt; which is part of the Apache Lucene project.&lt;/p&gt;

&lt;p&gt;The interesting thing about eHarmony's choice is that MogileFS is a free open-source solution that can more than fulfill their needs. LiveJournal has 60+ million images comprising 6-7TB of information stored in their MogileFS. A little forward planning can help your site scale storage without having to rely on proprietary solutions.&lt;/p&gt;</content>
  </entry>
  <entry>
    <author>
      <name>John Wang</name>
    </author>
    <id>urn:uuid:f26c236f-b200-4c9c-aea9-992c5154d261</id>
    <published>2006-10-05T22:07:00-05:00</published>
    <updated>2007-06-16T12:30:25-05:00</updated>
    <title type="html">MySQL Deployment Presentations</title>
    <link href="http://www.dev411.com/blog/2006/10/05/mysql-deployment-presentations" rel="alternate" type="text/html"/>
    <category term="scalability" scheme="http://www.dev411.com/blog/tag/scalability" label="scalability"/>
    <category term="mysql" scheme="http://www.dev411.com/blog/tag/mysql" label="mysql"/>
    <summary type="html">&lt;p&gt;I just ran across the &lt;a href="http://www.mysql.com/industry/web/"&gt;MySQL Web 2.0 page&lt;/a&gt; which lists a number of their users including the following:&lt;/p&gt;

&lt;div style="text-align:center"&gt;&lt;a href="http://www.mysql.com/industry/web/"&gt;&lt;img src="http://www.dev411.com/images/articles/200610/blog_mysql_web20.png" alt="MySQL Web2.0" /&gt;&lt;/a&gt;&lt;/div&gt;

&lt;p&gt;The most interesting thing from that page, however, is links to various presentations given by those sites on how they architected their sites to scale with MySQL, some of them scaling up to hundreds of MySQL servers.&lt;/p&gt;</summary>
    <content type="html">&lt;p&gt;I just ran across the &lt;a href="http://www.mysql.com/industry/web/"&gt;MySQL Web 2.0 page&lt;/a&gt; which lists a number of their users including the following:&lt;/p&gt;

&lt;div style="text-align:center"&gt;&lt;a href="http://www.mysql.com/industry/web/"&gt;&lt;img src="http://www.dev411.com/images/articles/200610/blog_mysql_web20.png" alt="MySQL Web2.0" /&gt;&lt;/a&gt;&lt;/div&gt;

&lt;p&gt;The most interesting thing from that page, however, is links to various presentations given by those sites on how they architected their sites to scale with MySQL, some of them scaling up to hundreds of MySQL servers.&lt;/p&gt;

&lt;p&gt;I've included a list of the presentations below because you have to do some page hopping to find which customer pages actually have presentations. The presentations are in PDF or PPT format, which I've indicated for each one. None of the presentations are in XUL, nor do any use the Takahashi method, both of which are becoming more popular.&lt;/p&gt;

&lt;ul&gt;

&lt;li&gt;&lt;a href="http://mysqluc.com/presentations/mysql05/morelock_phillip.pdf"&gt;eVite 2005-04-19 pdf&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="http://www.ludicorp.com/flickr/zend-talk.ppt#264,26,Hardware%20Layouts%20for%20LAMP%20Installations" target="_new"&gt;Flickr 2005-10-18 ppt&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.niallkennedy.com/blog/uploads/flickr_php.pdf" target="_new"&gt;Flickr 2006-03-14 pdf&lt;/a&gt;: ~25,000 db transactions/second peak&lt;/li&gt;

&lt;li&gt;&lt;a href="http://www.danga.com/words/2004_mysqlcon/mysql-slides.pdf#search=%22livejournal%20presentation%22"&gt;LiveJournal 2004-04-01 pdf&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://danga.com/words/2005_oscon/oscon-2005.pdf#search=%22livejournal%20presentation%22"&gt;LiveJournal 2005-08-04 pdf&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="http://mysqluc.com/presentations/mysql06/mixi_update.pdf" target="_new"&gt;Mixi 2006-04-26 pdf&lt;/a&gt;: More than 100 MySQL servers; Add more than 10 servers/month&lt;/li&gt;

&lt;li&gt;&lt;a href="http://mysqluc.com/presentations/mysql06/carroll_dorion.ppt" target="_new"&gt;Technorati 2006-04-26 ppt&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="http://mysqluc.com/presentations/mysql05/presz_ed.pdf" target="_new"&gt;TicketMaster 2005-09-05 pdf&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="http://mysqluc.com/presentations/mysql06/mituzas_wikipedia.pdf" target="_new"&gt;Wikipedia 2006-04-26 pdf&lt;/a&gt;: &amp;gt;25000 SQL requests per second&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wonder if there's anything similar for PostgreSQL deployments at high-traffic sites. Let me know if there are other deployment presentations for MySQL, PostgreSQL or other databases at well-known, high-traffic sites.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; Lukas Kahwe Smith gave me a link to OmniTI's PostgreSQL deployment presentation which discusses their migration from Oracle.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://images.omniti.net/omniti.com/~jesus/misc/BBPostgres.pdf"&gt;OmniTI pdf&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Update 2:&lt;/strong&gt; Some additional information on MySQL and UUIDs is available in articles by &lt;a href="http://feedlounge.com/blog/2005/11/20/switched-to-postgresql/"&gt;FeedLounge&lt;/a&gt; and &lt;a href="http://brad.livejournal.com/2173718.html"&gt;Brad Fitzpatrick&lt;/a&gt; (chief architect for SixApart). They include an interesting discussion on how InnoDB clusters by PK.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update 3:&lt;/strong&gt; Robert Ficcaglia gave me this link to the &lt;a href="http://meta.wikimedia.org/wiki/Wikimedia_servers"&gt;Wikimedia Servers page&lt;/a&gt; at meta.wikimedia.org. Looks like good reading!&lt;/p&gt;</content>
  </entry>
  <entry>
    <author>
      <name>John Wang</name>
    </author>
    <id>urn:uuid:4b737820-e1f9-4c71-b40a-02c51f89f602</id>
    <published>2006-08-23T00:49:00-05:00</published>
    <updated>2007-06-16T12:30:24-05:00</updated>
    <title type="html">Hadoop Distributed File System</title>
    <link href="http://www.dev411.com/blog/2006/08/23/hadoop-distributed-file-system" rel="alternate" type="text/html"/>
    <category term="scalability" scheme="http://www.dev411.com/blog/tag/scalability" label="scalability"/>
    <summary type="html">&lt;p&gt;I just ran across the &lt;a href="http://wiki.apache.org/lucene-hadoop/"&gt;Hadoop DFS&lt;/a&gt; which is an open source alternative to distributed file systems such as &lt;a href="http://en.wikipedia.org/wiki/Google_File_System"&gt;GoogleFS&lt;/a&gt;, &lt;a href="http://www.isilon.com/products/index.php?page=onefs"&gt;OneFS&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/List_of_file_systems#Distributed_parallel_fault_tolerant_file_systems"&gt;and others&lt;/a&gt;. GoogleFS and OneFS are both proprietary so it's nice to finally have a FOSS solution. MySpace uses OneFS. From the &lt;a href="http://wiki.apache.org/lucene-hadoop/"&gt;Hadoop Wiki&lt;/a&gt;:&lt;/p&gt;

&lt;div class="quote_simple"&gt;Hadoop's Distributed File System is designed to reliably store very large files across machines in a large cluster. It is inspired by the &lt;a href="http://labs.google.com/papers/gfs.html"&gt;Google File System&lt;/a&gt;. Hadoop DFS stores each file as a sequence of blocks, all blocks in a file except the last block are the same size. Blocks belonging to a file are replicated for fault tolerance. The block size and replication factor are configurable per file. Files in HDFS are "write once" and have strictly one writer at any time.&lt;/div&gt;

&lt;p&gt;Until now, I had only been aware of &lt;a href="http://www.danga.com/mogilefs/"&gt;MogileFS&lt;/a&gt; for FOSS solutions. However, MogileFS is designed for smaller files such as images and the others are designed for very large files. It will be interesting to see how much traction Hadoop DFS gets since it could be very useful and a good FOSS complement to MogileFS. Hadoop is part of the Apache Lucene project.&lt;/p&gt;</summary>
    <content type="html">&lt;p&gt;I just ran across the &lt;a href="http://wiki.apache.org/lucene-hadoop/"&gt;Hadoop DFS&lt;/a&gt; which is an open source alternative to distributed file systems such as &lt;a href="http://en.wikipedia.org/wiki/Google_File_System"&gt;GoogleFS&lt;/a&gt;, &lt;a href="http://www.isilon.com/products/index.php?page=onefs"&gt;OneFS&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/List_of_file_systems#Distributed_parallel_fault_tolerant_file_systems"&gt;and others&lt;/a&gt;. GoogleFS and OneFS are both proprietary so it's nice to finally have a FOSS solution. MySpace uses OneFS. From the &lt;a href="http://wiki.apache.org/lucene-hadoop/"&gt;Hadoop Wiki&lt;/a&gt;:&lt;/p&gt;

&lt;div class="quote_simple"&gt;Hadoop's Distributed File System is designed to reliably store very large files across machines in a large cluster. It is inspired by the &lt;a href="http://labs.google.com/papers/gfs.html"&gt;Google File System&lt;/a&gt;. Hadoop DFS stores each file as a sequence of blocks, all blocks in a file except the last block are the same size. Blocks belonging to a file are replicated for fault tolerance. The block size and replication factor are configurable per file. Files in HDFS are "write once" and have strictly one writer at any time.&lt;/div&gt;
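
&lt;p&gt;To make the block layout described in that quote concrete, here is a small sketch of the arithmetic. It is not Hadoop code; the function names and the sizes used in the example are assumptions chosen purely for illustration.&lt;/p&gt;

```javascript
// Returns the sizes of the blocks a file of fileSize bytes would occupy when
// cut into blockSize-byte pieces; only the last block may be shorter.
function blockSizes(fileSize, blockSize) {
  var sizes = [];
  var remaining = fileSize;
  while (remaining > 0) {
    var size = Math.min(blockSize, remaining);
    sizes.push(size);
    remaining -= size;
  }
  return sizes;
}

// Total bytes stored across the cluster when every block is kept
// replicationFactor times.
function storedBytes(fileSize, replicationFactor) {
  return fileSize * replicationFactor;
}
```

&lt;p&gt;For example, &lt;code&gt;blockSizes(150, 64)&lt;/code&gt; returns [64, 64, 22], and with a replication factor of 3 the same 150-byte file takes up &lt;code&gt;storedBytes(150, 3)&lt;/code&gt; = 450 bytes across the cluster.&lt;/p&gt;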

&lt;p&gt;Until now, I had only been aware of &lt;a href="http://www.danga.com/mogilefs/"&gt;MogileFS&lt;/a&gt; for FOSS solutions. However, MogileFS is designed for smaller files such as images and the others are designed for very large files. It will be interesting to see how much traction Hadoop DFS gets since it could be very useful and a good FOSS complement to MogileFS. Hadoop is part of the Apache Lucene project.&lt;/p&gt;

</content>
  </entry>
</feed>

