Tumblelog by Soup.io
It's pretty basic ... Database is a Postgres 9.1 on a 16c DL385g7 w/40GB, 900GB OCZ RevoDrive PCIe SSD, memcache for fragment caching, sessions and worker updates, beanstalk as backend job queue (imports and other such), nginx as reverse proxy to rails w/passenger & apache running on DL120 12GB machines, GlusterFS for the assets on Dell R410 w/ off the shelf disks, 2x Dell R410 w/32GB Varnish as caching proxies for the assets. There is more stuff going on on other machines, for monitoring (icinga, munin), backup (bacula), git, puppet, firewalling, ... the usual. We make heavy use of OpenVZ to compartmentalize configurations and thus problems.

Soup.me will also feature redis, twisted/python and make use of the PostGIS extension.

As you can see, the setup is really nothing special, but there are a few problems that might be interesting:

 - With our usage pattern (files written on one host that are promptly requested on multiple others), GlusterFS causes some problems: either complete client system hang-ups when using the FUSE module, or slow but steady server-side memory leaking with NFS. For assets, we probably should move to a NAS or a cloud service, but it's a sticky issue, and now that the kinks are figured out, the bi-weekly restart doesn't matter that much.

 - Postgres is a mighty and magnificent beast, but as with any SQL database, it's a bit unpredictable. For example, a few months ago we gained massive performance improvements just by re-adjusting the estimates of the execution planner (cpu_tuple_cost = 0.3 ftw), and everything was just dandy. But sometimes it becomes clear that certain spikes, e.g. from a vacuum, while necessary, eat all the reserves and lead to various detrimental locking scenarios, so while munin might tell us that we have reserves, we really don't, especially not at night.
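
To make the cpu_tuple_cost remark a bit more concrete, here is a minimal sketch of the sequential-scan cost formula described in the PostgreSQL documentation (pages read times seq_page_cost, plus rows scanned times cpu_tuple_cost). The table size below is made up for illustration and is not one of Soup's tables; the real planner also weighs operator costs, index costs and statistics, so treat this only as a rough picture of why raising the setting makes tuple-heavy plans look more expensive.

```python
# Simplified sequential-scan cost estimate, following the formula in the
# PostgreSQL docs: pages * seq_page_cost + rows * cpu_tuple_cost.
# This ignores filters, operators and everything else the planner considers.

SEQ_PAGE_COST = 1.0  # PostgreSQL default


def seq_scan_cost(pages, rows, cpu_tuple_cost, seq_page_cost=SEQ_PAGE_COST):
    """Return the estimated cost (in planner units) of scanning a whole table."""
    return pages * seq_page_cost + rows * cpu_tuple_cost


# Hypothetical table: 358 pages holding 10,000 rows (an assumption, not Soup data).
pages, rows = 358, 10_000

default_cost = seq_scan_cost(pages, rows, cpu_tuple_cost=0.01)  # PostgreSQL default
tuned_cost = seq_scan_cost(pages, rows, cpu_tuple_cost=0.3)     # value from the post

print(f"default: {default_cost:.0f}, tuned: {tuned_cost:.0f}")
print(f"per-tuple share with tuning: {rows * 0.3 / tuned_cost:.0%}")
```

With the default setting the page reads dominate the estimate; with 0.3 the per-tuple term dominates, which nudges the planner toward plans that touch fewer rows. Whether that is a win depends entirely on the workload and the storage underneath.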

At this point I gotta tip my hat towards OCZ with one hand and shake an angry fist at them with the other. While their PCIe products seem to be really worth their money, they create artificial compatibility issues within their product lines. For example, today we had to install a certain version of the Ubuntu server kernel on the database host, which runs squeeze, to be able to install the non-DKMS proprietary Linux driver for the new SSD, called "RevoDrive 3 X2". Yet the product we are currently using, the "RevoDrive X2", works like a charm with the stock squeeze kernel package. Very irritating. Especially the non-DKMS packaging. If you've got to have a proprietary driver (I can tolerate that, I really can, especially for the performance I'm getting), then for fuck's sake, at least use DKMS. Growl. A tip of the hat goes to OCZ for the performance and reliability though. I wouldn't have expected that. That's speaking for the RevoDrive X2 only, though; the 3 X2 is going into production tonight/tomorrow morning.

Ok, I've explained that the postgres machine isn't operating in the green zone anymore, and now there is the question of how to mitigate that. That would be done either by purchasing a very expensive Intel-based DL380, which reportedly performs better than the DL385 (Opteron-based, but don't get me wrong: awesome machines. Inexpensive if you need a lot of cores and massive throughput.), or by buying another DL385 for way less money and with more cores.

This is when we start playing the streaming replication game and utilize pg_pool, which comes with its own bag of cats. I like the thought of playing with that more when I don't remember my MySQL replication days.

Also, with a database size of 400GB, setting up replication without taking a day of downtime is another quandary, but supposedly, postgres today can even do that. We'll see.
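
For a feel of the raw copy time involved, here is a back-of-the-envelope sketch. It assumes the initial copy goes over a 1 Gbit/s link and that only part of the line rate is usable; both the link speed between the database hosts and the 70% efficiency figure are assumptions for illustration, not measurements, and disk throughput or WAL catch-up may well be the real bottleneck.

```python
# Rough estimate of how long it takes to ship a base copy of the database
# to a new replica. All inputs except the 400 GB size are assumptions.


def copy_time_hours(size_gb, link_gbit_per_s, efficiency=0.7):
    """Hours needed to move size_gb (decimal gigabytes) over the given link."""
    bits = size_gb * 8e9                                   # GB -> bits
    usable_bits_per_s = link_gbit_per_s * 1e9 * efficiency
    return bits / usable_bits_per_s / 3600


best_case = copy_time_hours(400, 1.0, efficiency=1.0)  # full line rate
likely = copy_time_hours(400, 1.0)                     # assumed 70% efficiency

print(f"{best_case:.2f} h at line rate, {likely:.2f} h at 70% efficiency")
```

Under these assumptions the copy itself takes on the order of an hour or two, so a day of downtime would come from the procedure rather than from moving the bytes.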

Soup's setup is hosted and connected (with a gbit port) by Nessus, whom I highly recommend. They also make easyname, which is where the domains are hosted. If you need a place for your domains: I have yet to find a better DNS interface.

Ok, now I wrote more than I intended. Feel free to ask questions.
Reposted from elpollodiablo
