Location-based microblogging service Foursquare is having a few problems at the moment, celebrating the repair of a database failure that caused 11 hours of downtime with the launch of a status blog - and yet more downtime.
The problems started on Monday, when a database shard at the site - which, for the uninitiated, allows users to 'check in' to locations such as restaurants, coffee shops, and cinemas, letting their friends know where they are - started receiving far more traffic than it was supposed to.
In a posting to its blog, the company stated that "one of these shards was performing poorly because a disproportionate share of check-ins were being written to it." Various measures intended to alleviate the problem failed before a last-ditch attempt to fix the performance problem by adding an entirely new shard to share the load solved the issue in an innovative way: by taking down the entire site.
The screaming and shouting that followed saw Foursquare engineers spending "the next five hours trying different approaches to migrating data to the new shard and then restarting the site," but each time the site came back up the original problem shard would become overloaded again - stopping the site as soon as it started.
Thankfully, Foursquare reports that it is working with its database providers to add graceful degradation features that should prevent this sort of issue in future - but another partial outage took the site down again on Tuesday for around six hours.
The issues faced by Foursquare are similar to those that took social networking site Facebook down for several hours late last month - and while Facebook is confident that its database problems are now resolved, it looks like Foursquare still has some way to go.
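For readers unfamiliar with sharding, the overloaded shard the company describes can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the article does not say how Foursquare assigns check-ins to shards, so the range-based `shard_for` function, the two-shard setup, and the workload numbers are all hypothetical.

```python
from collections import Counter

NUM_SHARDS = 2  # hypothetical shard count, not taken from Foursquare's post


def shard_for(user_id: int, users_per_shard: int = 1000) -> int:
    """Range-based placement (an assumed scheme): blocks of user IDs share a shard."""
    return (user_id // users_per_shard) % NUM_SHARDS


def write_share(checkins: list[int]) -> dict[int, float]:
    """Return the fraction of all check-in writes that land on each shard."""
    counts = Counter(shard_for(uid) for uid in checkins)
    total = len(checkins)
    return {shard: counts[shard] / total for shard in range(NUM_SHARDS)}


# Made-up workload: users 0-999 check in three times each, users 1000-1999 once.
checkins = [uid for uid in range(1000) for _ in range(3)] + list(range(1000, 2000))
shares = write_share(checkins)
print(shares)
```

Both shards hold the same number of users here, yet the first takes three quarters of the writes because its users are more active. That is the kind of imbalance in which one shard receives "a disproportionate share of check-ins".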