03.25.07

200 Terabytes Served in 81 Days = 2.47TB per day!

Posted in Geek Bits, General at 10:46 pm

We interrupt our regular programming with a bit of geek braggery. What? We don’t have regular programming here? Sheesh, what kind of blog is this? It’s not a blog? It’s a “Web Log”? Whatever… back to the topic at hand…

A year and a half ago the company where I was contracting decided to build a video site for gamers. It was all the rage, and they didn’t want to fall behind. The programming team was tasked with building the application, and I was tasked with designing the download systems. The manager at that time was very interested in reducing costs, and was more than happy to listen to my suggestions for a cluster of inexpensive commodity boxes that would each have a copy of the video repository on a large RAID volume and be kept in sync with rsync. I am a fan of the Google approach to colo hardware, where you build it cheap and easy to replace.
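For readers who want to try the same layout, here is a minimal Python sketch of the rsync side. It assumes a push model: one master box holds the canonical video repository and copies it out to every mirror. The post doesn't say whether the real script pushed or pulled. The hostnames (`video1` through `video6`), the `/videos/` paths, and the helper names `build_rsync_commands`/`sync_all` are hypothetical placeholders, not the actual script.

```python
import subprocess
from typing import List, Sequence

# Hypothetical values for illustration only; the post gives no real paths or hostnames.
MASTER_DIR = "/videos/"  # trailing slash: copy the directory's contents, not the directory itself
MIRRORS = ["video1", "video2", "video3", "video4", "video5", "video6"]
REMOTE_DIR = "/videos/"


def build_rsync_commands(src: str, hosts: Sequence[str], dest: str) -> List[List[str]]:
    """Return one rsync argv per mirror host.

    -a        archive mode: recurse and preserve permissions, times, and symlinks
    --partial keep partially transferred files so a large video can resume
    """
    return [["rsync", "-a", "--partial", src, f"{host}:{dest}"] for host in hosts]


def sync_all(src: str, hosts: Sequence[str], dest: str, execute: bool = False) -> List[int]:
    """Push the master copy to every mirror and return each rsync exit code.

    With execute=False the commands are only printed (a dry run), and 0 is recorded.
    """
    codes = []
    for cmd in build_rsync_commands(src, hosts, dest):
        if execute:
            codes.append(subprocess.run(cmd).returncode)
        else:
            print(" ".join(cmd))
            codes.append(0)
    return codes


if __name__ == "__main__":
    sync_all(MASTER_DIR, MIRRORS, REMOTE_DIR)
```

Run it with `execute=True` once the printed commands look right. `--delete` is left out on purpose. Add it only if you want removals on the master to propagate to the mirrors.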

While I was out of town dealing with my father’s funeral, my manager’s manager trashed all my ideas, declared that we were a Dell shop and were going to buy Dell gear, and ordered two Dell servers and a cheap EMC SAN that, he dictated, would be our video download system. Managers who don’t know their tech should stay out of the server room. He spent $25k on an entry-level SAN, and a mere two months after we went live it couldn’t handle the sustained load from our video download servers. I wish he hadn’t moved on before then, because I am the sort of person to remind a manager of my original proposal and ask them to explain again why it was rejected.

Anyway… With the ‘big man’ out of the picture, and the system he mandated collapsing miserably under the load, I was able to pitch my idea again. We did use Dell machines for the commodity boxes, but that’s because the Dell PowerEdge 1800 is a very un-Dell-like machine. Well, it was, before Dell dropped it. For around $1400 you could get a PowerEdge 1800 with dual processors, dual power supplies, and a CERC SATA RAID controller, all with a three-year on-site warranty. It was a sweet deal, and unlike most Dell boxes the PowerEdge 1800 didn’t use special drive rails, so we could pack it fairly cheaply with six Seagate 500GB SATA drives. Configured as RAID-5, that gave us a little under 2.5TB per server for storing videos.

A simple rsync script keeps the video archives in sync after user and editor uploads. Apache 2.2 configured with the worker MPM handles about 600 simultaneous connections per machine before I start running out of memory. Six of these boxes behind an HAProxy load balancer can completely saturate a 1Gbps Cogent fiber drop with plenty of CPU cycles to spare, and we’ve held that level of bandwidth for amazingly long sustained periods.
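The storage and connection numbers above are easy to sanity-check. This short Python sketch assumes decimal units for the 500GB drives (the way drive vendors count them) and assumes all 3,600 connections (six boxes × 600) share the 1Gbps uplink evenly, which is a simplification. `raid5_usable_bytes` and `per_connection_kbps` are illustrative helper names.

```python
def raid5_usable_bytes(drives: int, drive_bytes: int) -> int:
    """RAID-5 gives up one drive's worth of space to parity: usable = (n - 1) * size."""
    if drives < 3:
        raise ValueError("RAID-5 needs at least three drives")
    return (drives - 1) * drive_bytes


def per_connection_kbps(uplink_bps: float, servers: int, conns_per_server: int) -> float:
    """Average share of the uplink per connection, in kbit/s.

    Assumes every connection slot is busy and bandwidth is split evenly.
    """
    return uplink_bps / (servers * conns_per_server) / 1000


if __name__ == "__main__":
    usable = raid5_usable_bytes(6, 500 * 10**9)
    print(usable / 10**12)           # 2.5  (decimal TB, as on the drive labels)
    print(round(usable / 2**40, 2))  # 2.27 (binary TiB, as many OS tools report)
    print(round(per_connection_kbps(1e9, 6, 600), 1))  # 277.8
```

Units and filesystem overhead both shave a bit off the raw 2.5TB, and roughly 278 kbit/s per connection is only a rough average, not a per-stream guarantee.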

For the HAProxy box I used a Dell PowerEdge 2850 with PCIe slots. To make the most of our bandwidth, I used an Intel PCIe x4 Gigabit fiber card for the Cogent drop and an Intel PCIe x4 Gigabit copper card for the server side. The load balancer and the six servers were connected through a Cisco Gig-E switch.
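HAProxy supports several balancing algorithms (`roundrobin` and `leastconn` among them), and the post doesn't say which one this setup used. As a hedged illustration, here is a toy Python model of least-connections selection. It is one plausible fit for long-running video downloads because it steers new requests away from the busiest boxes. This is a simplified model of the idea, not HAProxy's code, and the `LeastConnBalancer` class and backend names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LeastConnBalancer:
    """Toy least-connections balancer (a simplified model, not HAProxy's implementation)."""

    backends: List[str]
    active: Dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Start every backend with zero open connections.
        for b in self.backends:
            self.active.setdefault(b, 0)

    def acquire(self) -> str:
        # Pick the backend with the fewest open connections;
        # min() returns the first minimum, so ties go to the earliest backend in the list.
        choice = min(self.backends, key=lambda b: self.active[b])
        self.active[choice] += 1
        return choice

    def release(self, backend: str) -> None:
        # Call when a download finishes so the backend's count drops again.
        if self.active[backend] == 0:
            raise ValueError(f"{backend} has no open connections")
        self.active[backend] -= 1


if __name__ == "__main__":
    lb = LeastConnBalancer(["video1", "video2", "video3"])
    first = [lb.acquire() for _ in range(3)]  # one connection per box
    lb.release("video2")                      # a download on video2 finishes
    print(first, lb.acquire())                # the next request goes to video2
```

Round-robin would ignore how long each download runs, so a box that happened to get several huge files could fall behind. Least-connections accounts for that, at the cost of tracking per-backend state.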

In the past 81 days we have served 200 Terabytes from this little server cluster that I designed and built on a tight budget. I think that is cool. :)
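For the curious, here is the arithmetic behind the headline number as a small Python sketch, assuming decimal terabytes (10^12 bytes). It also converts the total into an average bit rate and compares it with what a fully saturated 1Gbps drop could move in a day. The helper names are made up for this example.

```python
TB = 10**12  # decimal terabyte (an assumption; the post doesn't specify)
SECONDS_PER_DAY = 86_400


def tb_per_day(total_tb: float, days: int) -> float:
    """Average terabytes served per day."""
    return total_tb / days


def average_mbps(total_tb: float, days: int) -> float:
    """Average bit rate over the whole period, in megabits per second."""
    return total_tb * TB * 8 / (days * SECONDS_PER_DAY) / 1e6


def link_tb_per_day(link_bps: float) -> float:
    """Terabytes a link could move in one day if it ran flat out."""
    return link_bps / 8 * SECONDS_PER_DAY / TB


if __name__ == "__main__":
    print(round(tb_per_day(200, 81), 2))   # 2.47
    print(round(average_mbps(200, 81)))    # 229
    print(round(link_tb_per_day(1e9), 1))  # 10.8
```

So the 81-day average is roughly a quarter of the 1Gbps drop's capacity. The saturation described above would have happened during peak periods rather than around the clock.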

-Chris Knight

[ This page has been linked from the HAProxy website, so I will be revisiting it shortly and posting more technical details, including config files. 2007/06/20 ]

5 Comments

  1. Jaime Nebrera said,

    06.20.07 at 6:34 am

    Hi Chris,

    If you are serving static content (videos), it might be better to use the Cherokee web server instead of Apache. I suspect its peak performance is much better. Of course you can’t do many fancy things with it, but those could be handled by some Apaches running the PHP (or whatever) side, with Cherokees serving the videos.

    Just my 2 cents

    Regards

  2. Dan Podeanu said,

    10.06.07 at 2:01 am

    Hey Chris,

    Actually, any single-threaded web server (thttpd, mathopd) would totally outperform Apache when it comes to purely static content (as I understand is your case), and I’m talking more than 10 times faster under heavy load.

    A single thttpd can saturate a gigabit connection if files are big enough (such as yours).

    Of course, you would keep HAProxy for load balancing.

    Regards

  3. Jeff said,

    12.28.07 at 3:49 pm

    There are quite a few other open source load balancing packages. Have you guys tried LVS (Layer 3/4 IP load balancing)?

  4. john said,

    02.14.08 at 1:55 am

    Do you have a how-to? Please, thx.

  5. Jan Miczaika said,

    05.23.08 at 4:39 am

    Just to throw another hat in the ring, I would suggest you check out Lighttpd. A number of large sites (including Wikipedia, YouTube and Meebo) use it for serving static content. We have had it in production for almost 1.5 years and have never had a problem.

