PLEASE BOOKMARK (usually control-D) THIS PAGE NOW SO YOU CAN FIND IT AGAIN IN CASE OF AN EMERGENCY!


If you are experiencing a problem not reported here, check our web panel for more information.
(Please remember, posting in the comments here IS NOT an official way to contact DreamHost.)

Rock being moved to new hardware

Posted 1 day, 10 hours ago (May 14th, 2008 at 1:37 pm PST) by JamesH

Severity: Low   Resolved: Yes

Webserver rock is having trouble booting, so we’re going to be moving it to new hardware. This process should take about thirty minutes.

Update: This has been finished and all sites hosted on rock should be up and running. If you have one that’s not, please contact support.

New Feature: Passenger (mod_rails)

Posted 2 days, 11 hours ago (May 13th, 2008 at 12:02 pm PST) by Dallas

As of today, all DreamHost customers can enable Passenger (mod_rails) for Ruby on Rails applications.

Briefly, all you do is enable the Ruby on Rails Passenger (mod_rails) option for any existing or new web domain in the DreamHost web control panel. Then point that domain’s web directory to the public directory of an existing Ruby on Rails application, and it will work automatically. For more detailed information, check out our Passenger wiki page. You may also want to read additional details in the DreamHost blog post.

Brief outage for MySQL servers stimpy, mimsey, toonces, roary and snowball

Posted 2 days, 12 hours ago (May 13th, 2008 at 11:24 am PST) by JamesH

Severity: Low   Resolved: Yes

We need to do some maintenance on the MySQL servers stimpy, mimsey, toonces, roary and snowball, so they will be unavailable briefly.

We’ve finished this maintenance and all of these servers are up.

Clifford blew a Drive!

Posted 2 days, 17 hours ago (May 13th, 2008 at 6:14 am PST) by Sandon

Severity: Medium   Resolved: No

The MySQL server ‘clifford’ has blown a hard drive. I have replaced the drive and the array is rebuilding as we speak. This should take approximately 30 minutes, but the server will be down/inaccessible in the meantime. This should only affect people with ‘clifford’ as their MySQL server. You can find this out from Goodies -> Manage MySQL, where your server is listed in the format ‘servername:service’. If your ‘servername’ is ‘clifford’, then this is affecting you.
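If you would rather script this check than read it off the panel, here is a minimal sketch in Python. It assumes the panel entry is a plain string in the ‘servername:service’ format described above. The function names and the sample entries are made up for illustration and are not part of any DreamHost tool.

```python
def mysql_server_name(entry: str) -> str:
    """Return the server part of a 'servername:service' entry, lowercased."""
    # partition() splits on the first colon only; the service part is ignored.
    server, _, _service = entry.partition(":")
    return server.strip().lower()


def is_affected(entry: str, down_server: str = "clifford") -> bool:
    """True if the entry's server name matches the server that is down."""
    return mysql_server_name(entry) == down_server


# Hypothetical sample entries, not real account data.
for sample in ("clifford:mysql", "stimpy:mysql"):
    print(sample, "affected" if is_affected(sample) else "not affected")
```

The comparison is case-insensitive, so it matches however the name is capitalized in the panel.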

From the Happy DreamHost NightCrew!

Update: 8:18 PDT.

Clifford is having problems rebuilding the array, so one of our senior admins has been contacted and is working on resolving the issue.

Update: 11:25 PDT.

We are still working on fixing clifford and apologize for the continued outage. There’s currently no ETA on when it will be up, but our admin team is hard at work to get it going as soon as possible.

Update: 12:11 PDT.

We’re now in the process of moving clifford to new hardware and restoring from backups. Databases hosted on clifford should start coming online soon, though the entire process may take a few hours.

Bitters down for the count!

Posted 4 days ago (May 11th, 2008 at 11:13 pm PST) by Brian

Severity: Low   Resolved: Yes

Bitters seems to have lost its local drive, so we’re moving it to new hardware. Domains on it should be working within 30-45 minutes or so.

We apologize for the problems this causes!

Everything should be back up and running. If you’re still having problems, please let support know!

Blingy Slowness Issues

Posted 5 days, 12 hours ago (May 10th, 2008 at 11:56 am PST) by mir

Severity: Low   Resolved: Yes

We’ve continued to offload the problematic file server, and loads across the cluster have actually looked quite good (almost all servers under a load of 5, many around 2-3, and only one ever at or above 10), so we had hoped that we could restore snapshot backups for users on that server. Unfortunately, the restore increased the disk usage on that file server again to the point that it has been displaying the same symptoms as before. The good news is that the fix is simply a matter of disabling the backup snapshots again and then dropping that data. We’ve already done the disabling and are in the process of dropping the data. This is resulting in some very inflated loads across the cluster, but as soon as we have completed this I’ll issue soft reboots (easier on the hardware) that will fix the loads you’re seeing, and I’ll add an update here. This should eliminate the performance issues that have been reported (we’ll just have to get that particular file server’s usage even lower before we restore those snapshots again). My apologies for this and for the concerns that it has caused.

Update: The utilization is already down from 93% to 92% but this is multiple TB of data at issue so it may take a few hours to complete the dropping of the data. I’ll be keeping an eye on it and rebooting any servers that appear to need extra attention (changes like this often result in that extra step being needed to stabilize things). There will be another update once we’re satisfied that things are fixed.

Update: We’re now back down to 89%. I am still seeing load issues, so we’re not out of the woods yet, but once the process completes we’ll know whether that did it or not (I will continue to update).

Update: Utilization is at 87% and loads have improved so I am updating the severity to medium but not marking it resolved as we still need to see the loads and performance back to what they were a week ago.

Update Tuesday, May 13th: Utilization crept back up. It appears that there were still some rules on the file server leading to creation of the snapshots we deleted; these have been removed and the deletion process is underway again. The admin team is also looking into completely removing this file server from the system (for now we intend to keep offloading it non-stop to ensure that no more issues crop up).

Update Thursday, May 15th: The snapshot data is gone and loads have dropped back to what they should be but we’re going to be moving people off of that file server (we have new hardware coming in that should allow us to completely offload it and eventually scrap it).

Failing over Milk

Posted 5 days, 15 hours ago (May 10th, 2008 at 8:06 am PST) by Sandon

Severity: Medium   Resolved: Yes

The web server ‘Milk’ is being failed over because bad hardware is causing it to crash a couple of minutes after it boots. It is being moved to new hardware and the move should be completed within 30 minutes. This should only affect people with ‘milk’ as their web server. You can find out if milk is your server by clicking “account status” in the panel, where it will be listed as “Your web server”.

Edit: Turns out this was a bad site on the server causing it to crash in strange ways. This site has been disabled and your sites should be back online!

Emergency OS patch on file server

Posted 6 days, 15 hours ago (May 9th, 2008 at 8:27 am PST) by Kelly

Severity: Low   Resolved: Yes

Per an open ticket with Sun we need to apply two patches to one of our file servers. This is to hopefully fix a degraded zpool which will not finish a parity rebuild. There are exactly 78 users in the frisky cluster on this file server. This will bring your email and web services offline for the time being. The patch should only require about 15 minutes of downtime. I apologize for doing this patch during peak hours, but we really need to get this data back up to full integrity.

Tech nerd details: The raid array is a raidz2 operating with one failed disk. It has been rebuilding off of a hot spare for a day or two and reset the rebuilding process itself after getting to 99%. We contacted Sun and after analyzing troubleshooting information believe a kernel + ZFS patch should resolve the problem. Fortunately this is a raidz2 so it can sustain a disk failure and still be fault tolerant.
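For the curious, the fault-tolerance arithmetic behind that last sentence can be sketched in a few lines of Python. This is a deliberately simplified model: it only counts failed disks against the two parity disks' worth of redundancy that raidz2 provides, and it ignores the hot spare and the in-progress rebuild. The function name is made up for illustration.

```python
def remaining_fault_tolerance(parity_disks: int, failed_disks: int) -> int:
    """How many more disk failures the array can absorb before losing data."""
    if failed_disks > parity_disks:
        raise ValueError("more failed disks than parity: data is already lost")
    return parity_disks - failed_disks


RAIDZ2_PARITY = 2  # raidz2 is double parity: it survives two failed disks

# One disk has failed, so one more failure can still be tolerated.
print(remaining_fault_tolerance(RAIDZ2_PARITY, failed_disks=1))  # prints 1
```

With single-parity raidz the same call would return 0, meaning the array would have no redundancy left until the rebuild finished.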

Update: Well, one of the two patches was installed. It seems SunSolve is rejecting our service contract to download a specific patch in the dependency tree. We’ve updated our case with Sun. The system is back online and serving files!

DingDong and Pizarro down

Posted 1 week ago (May 8th, 2008 at 7:14 pm PST) by glen

Severity: Low   Resolved: Yes

The HTTP servers DingDong and Pizarro are both currently unresponsive to our reboot efforts. We are working on getting manual reboots done in their respective data centers and evaluating whether moving to new hardware is necessary. Estimated downtime is up to 1 hour.

Update 7:44p

Pizarro is back up and seemingly stable on new hardware.
DingDong is still waiting for a tech to reach its data center.

Update 8:15p

DingDong has been manually rebooted and is up. Tests of sites show them responding. If you have any further issues related to either of these 2 servers, please contact support through our panel.

Central database crash

Posted 1 week, 1 day ago (May 7th, 2008 at 4:45 pm PST) by Kelly

Our central database server crashed and restarted itself. It is currently replaying transaction logs and should be back in under an hour. This should not affect your websites, email, etc, but the user control panel (https://panel.dreamhost.com) and similar services are down until it comes online.

We are monitoring the situation and will report back here when it comes online!

Edit: And webmail! I forgot those two were tied together. Regular IMAP/POP3 email access should continue to work.

Edit (5:28 PM Pacific): It looks like we’re back in business! The user control panel and webmail are working. We will continue to check the rest of our central services and update this if we find anything else still broken! If you are having problems, please contact technical support.