200 Terabytes Served in 81 Days = 2.47TB per day!

We interrupt our regular programming with a bit of geek braggery. What? We don’t have regular programming here? Sheesh, what kind of blog is this? It’s not a blog? It’s a “Web Log”? Whatever… back to the topic at hand…

A year and a half ago, the company where I was contracting decided to build a video site for gamers. It was all the rage, and they didn’t want to fall behind. The programming team was tasked with building the application, and I was tasked with designing the download systems. The manager at that time was very interested in reducing costs and was more than happy to listen to my proposal: a cluster of inexpensive commodity boxes, each holding a copy of the video repository on a large RAID volume, kept in sync with rsync. I am a fan of the Google approach to colo hardware: build it cheap and make it easy to replace.

While I was out of town dealing with my father’s funeral, my manager’s manager trashed all my ideas, declared that we were a Dell shop that would buy Dell gear, and ordered two Dell servers and a cheap EMC SAN that he dictated would be our video download system. Managers who don’t know their tech should stay out of the server room. He spent $25k on an entry-level SAN that couldn’t handle the sustained load from our video download servers a mere two months after we went live. I wish he hadn’t moved on before then, because I am the sort of person to remind a manager of my original proposal and ask them to explain again why it was rejected.

Anyway… With the ‘big man’ out of the picture, and the system he mandated collapsing miserably under the load, I was able to pitch my idea again. We did use Dell machines for the commodity boxes, but that’s because the Dell PowerEdge 1800 is a very un-Dell-like machine. Well, it was, before Dell dropped it. For around $1400 you could get a PowerEdge 1800 with dual processors, dual power supplies, and a CERC SATA RAID controller, all with a three-year on-site warranty. It was a sweet deal, and unlike most Dell boxes the PowerEdge 1800 didn’t use special drive rails, so we could pack it with six Seagate 500GB SATA drives fairly cheaply. Configured as RAID-5, that gave us a little under 2.5TB per server for storing videos. A simple rsync script keeps the video archives in sync after user and editor uploads. Apache 2.2 configured with the worker MPM lets me handle about 600 simultaneous connections per machine before I start running out of memory. Six of these boxes behind an HAProxy load balancer can completely saturate a 1Gbps Cogent fiber drop with plenty of CPU cycles to spare, and we’ve held that level of bandwidth for amazingly long stretches.
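The per-server storage figure is RAID-5 arithmetic: one drive's worth of space in the array holds parity, so six 500GB drives leave five drives' worth for data. A minimal Python sketch, assuming the drives are sized in the manufacturer's decimal gigabytes:

```python
# Usable capacity of a RAID-5 array: one drive's worth of space goes to parity,
# so n drives of size s give (n - 1) * s bytes before filesystem overhead.


def raid5_usable_bytes(drive_count: int, drive_bytes: int) -> int:
    """Return raw usable bytes for RAID-5 (ignores filesystem/formatting overhead)."""
    if drive_count < 3:
        raise ValueError("RAID-5 needs at least three drives")
    return (drive_count - 1) * drive_bytes


DRIVE_BYTES = 500 * 10**9  # assumption: a "500GB" drive in decimal gigabytes
usable = raid5_usable_bytes(6, DRIVE_BYTES)

print(f"{usable / 10**12:.2f} TB (decimal)")  # 2.50 TB (decimal)
print(f"{usable / 2**40:.2f} TiB (binary)")   # 2.27 TiB (binary)
```

Which number shows up depends on whether your tools report decimal or binary units, and the filesystem trims a bit more on top of that.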
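The sync script itself isn't shown here. As a rough illustration only, this Python sketch pushes the video directory from the node that received an upload to its peers with rsync over SSH. The hostnames (`video2` through `video6`) and the path `/srv/videos/` are hypothetical placeholders; `-a` and `--partial` are standard rsync options.

```python
import subprocess

# Hypothetical layout: uploads land on one node, which pushes to its peers.
VIDEO_ROOT = "/srv/videos/"  # trailing slash: sync the directory's contents
PEERS = ("video2", "video3", "video4", "video5", "video6")


def rsync_command(src: str, host: str, dest: str) -> list[str]:
    """Build an rsync push (over SSH, rsync's default remote shell) for one peer."""
    return [
        "rsync",
        "-a",         # archive mode: recurse, keep times, permissions, symlinks
        "--partial",  # keep partial files so an interrupted large video can resume
        src,
        f"{host}:{dest}",
    ]


def sync_all(src: str = VIDEO_ROOT, peers: tuple[str, ...] = PEERS) -> dict[str, int]:
    """Push src to every peer; return each peer's rsync exit code (0 = success)."""
    results = {}
    for host in peers:
        proc = subprocess.run(rsync_command(src, host, src))
        results[host] = proc.returncode
    return results
```

Leaving out `--delete` is a deliberate choice in this sketch: a mistake on the source node can't wipe the peers, at the cost of deletions not propagating automatically. Something like `sync_all()` could be run from cron or from a post-upload hook.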
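The exact Apache settings aren't listed, but a ceiling of about 600 connections maps onto the worker MPM's limits: each child process runs `ThreadsPerChild` threads, and `MaxClients` can't exceed `ServerLimit` times `ThreadsPerChild`. This small Python sketch derives those directives for a target, assuming Apache 2.2's default of 25 threads per child. The values it prints are illustrative, not the production config.

```python
import math

# Assumption: Apache 2.2 worker MPM, default ThreadsPerChild of 25.


def worker_mpm_settings(target_clients: int, threads_per_child: int = 25) -> dict[str, int]:
    """Size worker MPM limits so MaxClients covers target_clients."""
    server_limit = math.ceil(target_clients / threads_per_child)
    return {
        "ServerLimit": server_limit,
        "ThreadsPerChild": threads_per_child,
        # MaxClients rounded to a whole number of child processes.
        "MaxClients": server_limit * threads_per_child,
    }


def to_directives(settings: dict[str, int]) -> str:
    """Render the settings as httpd.conf directive lines."""
    return "\n".join(f"{name} {value}" for name, value in settings.items())


settings = worker_mpm_settings(600)
print(to_directives(settings))
# ServerLimit 24
# ThreadsPerChild 25
# MaxClients 600

# Six servers sharing a saturated 1Gbps link, split evenly:
cluster_connections = 6 * settings["MaxClients"]         # 3600
per_connection_kbps = 1e9 / cluster_connections / 1000   # about 278 kbit/s
```

Because the worker MPM's default `ServerLimit` in Apache 2.2 is 16, reaching 600 connections at 25 threads per child means raising it to 24. Whether the memory holds out at that point depends on the modules loaded and the box, which is the limit described above.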

For the HAProxy box I used a Dell PowerEdge 2850 with PCIe slots. To make the most of our bandwidth, I used an Intel PCIe x4 Gigabit fiber card for the Cogent drop and an Intel PCIe x4 Gigabit copper card for the server side. The load balancer and the six servers were connected through a Cisco Gig-E switch.
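The balancing algorithm HAProxy was configured with isn't stated. HAProxy's `balance` directive supports several, including `roundrobin` and `leastconn`, and its documentation recommends `leastconn` where very long sessions are expected, which fits large video downloads. The Python class below is a toy model of least-connections balancing across six hypothetical backends (`video1` through `video6`), not HAProxy's implementation:

```python
from collections import Counter


class LeastConnectionsBalancer:
    """Toy model of least-connections balancing across a fixed server pool.

    Each new download goes to the server with the fewest active connections;
    ties go to the earliest server in the list.
    """

    def __init__(self, servers: list[str]) -> None:
        self.active = {name: 0 for name in servers}

    def acquire(self) -> str:
        # min() scans keys in insertion order, so ties resolve to the first server.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        if self.active[server] <= 0:
            raise ValueError(f"{server} has no active connections")
        self.active[server] -= 1


pool = LeastConnectionsBalancer([f"video{i}" for i in range(1, 7)])
picks = [pool.acquire() for _ in range(12)]
print(Counter(picks))  # each of the six servers gets 2

pool.release("video3")       # one download on video3 finishes
next_server = pool.acquire()
print(next_server)           # video3
```

Round-robin would spread new connections just as evenly here; least-connections only pulls ahead when downloads finish at different times and some servers end up carrying more long transfers than others.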

In the past 81 days we have served 200 terabytes from this little server cluster that I designed and built on a tight budget. I think that is cool. 🙂
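The headline figure is 200 TB divided by 81 days. Converting it to a line rate, assuming decimal terabytes, shows what the 81-day average looks like against the 1Gbps drop:

```python
SECONDS_PER_DAY = 86_400


def average_throughput(total_bytes: float, days: float) -> tuple[float, float]:
    """Return (terabytes per day, average megabits per second), decimal units."""
    tb_per_day = total_bytes / days / 10**12
    mbps = total_bytes * 8 / (days * SECONDS_PER_DAY) / 10**6
    return tb_per_day, mbps


tb_day, mbps = average_throughput(200 * 10**12, 81)
print(f"{tb_day:.2f} TB/day")         # 2.47 TB/day
print(f"{mbps:.0f} Mbit/s average")   # 229 Mbit/s average
```

That is a round-the-clock average of a little under a quarter of the link, so the peak periods that saturate the 1Gbps drop sit well above it.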

-Chris Knight