Hard drive failure on our primary database server
Author | Message |
---|---|
Joined: 20 Apr 11 Posts: 388 Credit: 822,356,221 RAC: 0 |
Our shiny new primary database server, which has been responsible for the nice performance lately, decided that things had been way too stable. So this Sunday morning at about 6:38 EET* the server killed its primary hard drive, bringing everything to a grinding halt. :( I've switched to using our replica DB until data center staff can replace the failed hard drive and/or server. I'm currently bringing the services back online slowly to let things catch up. Note that things might be slower until the first onslaught of reconnecting clients is over. The good news is that there shouldn't be more than a few seconds of DB changes lost, because our database is replicated to the secondary server. Please do tell if you see something strange. The bad news is that there's going to be a maintenance break in the near future when I switch the primary DB back to the resurrected server (maybe next weekend, if things run fine with only one DB server). * = That's 5:38 CET or Sat 20:38 PST; for other timezones, please see http://www.timeanddate.com/worldclock/fixedtime.html?iso=20120212T0635&p1=101&sort=1. |
Joined: 11 Feb 12 Posts: 3 Credit: 2,382,042 RAC: 0 |
I joined up to find this. My PC has a horde of ATI WUs but won't run them. Do I have to complete a number of CPU WUs to start the process? And good luck with the restart... dunx |
Joined: 2 Oct 11 Posts: 238 Credit: 386,574,742 RAC: 11,602 |
Thanks for the update Teemu, it is appreciated. I iz also got icons! |
Joined: 22 Jun 11 Posts: 2080 Credit: 1,844,407,912 RAC: 3,717 |
"I joined up to find this. My PC has a horde of ATI WUs but won't run them. Do I have to complete a number of CPU WUs to start the process?" No, they should just start up on their own. Please take this to the Number Crunching forum and you should get lots of good ideas, though probably lots of questions first. |
Joined: 20 Apr 11 Posts: 388 Credit: 822,356,221 RAC: 0 |
Hello, Just to let you know that I'm aware that the validator and assimilator are lagging. This also affects work generation somewhat, so the scheduler keeps running out of work. :( I'll try to help them perform better, but the final fix might be to get our primary DB back online. -w |
Joined: 23 Jun 11 Posts: 87 Credit: 798,452,366 RAC: 0 |
I have some old server hardware I'd be willing to donate if you are in need of it. |
Joined: 23 Jun 11 Posts: 87 Credit: 798,452,366 RAC: 0 |
Manner..... PM me. |
Joined: 20 Apr 11 Posts: 388 Credit: 822,356,221 RAC: 0 |
Hello, Okay, we are now back on the primary DB server and the scheduler is a lot speedier now. Hopefully it will last a lot longer this time round. :) Thanks, DarkRyder, for the offer, but I lease my servers so we don't lack anything hardware-wise. I'm sure there are some other projects that'll welcome any and all hardware donations, though. -w |
Joined: 23 Jun 11 Posts: 87 Credit: 798,452,366 RAC: 0 |
np man, just wanted to do my part. :) Good luck with the new server back online :) |
Joined: 23 Jun 11 Posts: 87 Credit: 798,452,366 RAC: 0 |
Is the server still having hardware problems? Seems like the site has gone down 3 times in the past 2 weeks.... |
Joined: 23 Jun 11 Posts: 87 Credit: 798,452,366 RAC: 0 |
bueller ? |
Joined: 20 Apr 11 Posts: 388 Credit: 822,356,221 RAC: 0 |
Hi, It shouldn't be, though I'm a bit worried about its new drive as well. The drives might break soon too, since they seem to be quite old. :( There have been some Apache-related crashes in the last few weeks where it fails to reload itself correctly after log maintenance. The backend has been running during those, but obviously the BOINC client can't connect to upload and fetch work while Apache is down, so that doesn't matter. :) The crashes seem to be related to mod_fcgid, so if it keeps happening I'll probably just disable that. |
Joined: 23 Jun 11 Posts: 87 Credit: 798,452,366 RAC: 0 |
What kind of drives are you using? I might have some extras I can send ya if needed. |
Joined: 20 Apr 11 Posts: 388 Credit: 822,356,221 RAC: 0 |
Hi, Thanks for the offer, but the drives come with the server service as well and I'm not really allowed to change or touch them myself. :) Looks like they are using some Seagate drives on those servers. As it's a service, I can have the whole server replaced pretty much anytime I want. This includes upgrading to faster and better hardware (which costs a bit more, but still). |