Hard drive failure on our primary database server

Message boards : News : Hard drive failure on our primary database server

Author Message
Teemu Mannermaa
Project administrator
Project developer
Project tester
Joined: 20 Apr 11
Posts: 356
Credit: 734,903,353
RAC: 191,107
Message 2620 - Posted: 12 Feb 2012, 18:33:28 UTC

Our shiny new primary database server, which has been responsible for the nice performance lately, decided that things had been way too stable. So this Sunday morning at about 6:38 EET* the server killed its primary hard drive, bringing everything to a grinding halt. :(

I've switched to using our replica DB until the data center staff can replace the failed hard drive and/or server. I'm currently bringing the services back online slowly so that things can catch up. Note that things might be slower until the first onslaught of reconnecting clients is over.

The good news is that there shouldn't be more than a few seconds of DB changes lost, because our database is replicated to the secondary server. Please do tell if you see something strange. The bad news is that there's going to be a maintenance break in the near future when I switch the primary DB back to the resurrected server (maybe next weekend, if things run fine with only one DB server).

*=That's 5:38 CET or Sat 20:38 PST and for other timezones, please see http://www.timeanddate.com/worldclock/fixedtime.html?iso=20120212T0635&p1=101&sort=1.

BONC
Joined: 11 Feb 12
Posts: 3
Credit: 2,382,042
RAC: 0
Message 2622 - Posted: 12 Feb 2012, 19:01:46 UTC

I joined up to find this. My PC has a horde of ATI WUs but won't run them. Do I have to complete a number of CPU WUs to start the process?

And good luck with the re-start...

dunx

Chris S
Project donor
Joined: 2 Oct 11
Posts: 232
Credit: 375,070,201
RAC: 151
Message 2625 - Posted: 12 Feb 2012, 22:51:55 UTC

Thanks for the update Teemu, it is appreciated.
____________
I iz also got icons!



mikey
Joined: 22 Jun 11
Posts: 1969
Credit: 1,000,866,048
RAC: 0
Message 2630 - Posted: 13 Feb 2012, 12:29:51 UTC - in response to Message 2622.

I joined up to find this. My PC has a horde of ATI WUs but won't run them. Do I have to complete a number of CPU WUs to start the process?

And good luck with the re-start...

dunx


No, they should just start up on their own. Please take this to the Number Crunching forum and you should get lots of good ideas, though probably lots of questions first.

Teemu Mannermaa
Project administrator
Project developer
Project tester
Joined: 20 Apr 11
Posts: 356
Credit: 734,903,353
RAC: 191,107
Message 2633 - Posted: 13 Feb 2012, 15:51:19 UTC

Hello,

Just to let you know that I'm aware that the validator and assimilator are lagging. This also affects work generation somewhat, so the scheduler keeps running out of work. :( I'll try to help them perform better, but the final fix might be to get our primary DB back online.

-w

DarkRyder
Project donor
Joined: 23 Jun 11
Posts: 87
Credit: 793,991,635
RAC: 5
Message 2653 - Posted: 14 Feb 2012, 18:59:39 UTC

I have some old server hardware I'd be willing to donate if you are in need of it..

DarkRyder
Project donor
Joined: 23 Jun 11
Posts: 87
Credit: 793,991,635
RAC: 5
Message 2654 - Posted: 14 Feb 2012, 19:02:48 UTC

Manner..... PM me.

Teemu Mannermaa
Project administrator
Project developer
Project tester
Joined: 20 Apr 11
Posts: 356
Credit: 734,903,353
RAC: 191,107
Message 2700 - Posted: 18 Feb 2012, 15:52:48 UTC

Hello,

Okay, we are now back to using the primary DB server and the scheduler is a lot speedier now. Hopefully it will last a lot longer this time round. :)

Thanks, DarkRyder, for the offer, but I lease my servers so we don't lack anything hardware-wise. I'm sure there are some other projects that'll welcome any and all hardware donations, though.

-w

DarkRyder
Project donor
Joined: 23 Jun 11
Posts: 87
Credit: 793,991,635
RAC: 5
Message 2706 - Posted: 19 Feb 2012, 7:07:58 UTC

np man, just wanted to do my part. :) Good luck with the new server back online :)

DarkRyder
Project donor
Joined: 23 Jun 11
Posts: 87
Credit: 793,991,635
RAC: 5
Message 2739 - Posted: 23 Feb 2012, 5:12:04 UTC

is the server still having hardware problems? seems like the site has gone down 3 times in the past 2 weeks....

DarkRyder
Project donor
Joined: 23 Jun 11
Posts: 87
Credit: 793,991,635
RAC: 5
Message 2784 - Posted: 28 Feb 2012, 22:23:11 UTC

bueller ?

Teemu Mannermaa
Project administrator
Project developer
Project tester
Joined: 20 Apr 11
Posts: 356
Credit: 734,903,353
RAC: 191,107
Message 2835 - Posted: 6 Mar 2012, 5:28:57 UTC

Hi,

It shouldn't be, even though I'm a bit worried about its new drive as well. The drives might break soon too, since they seem to be quite old. :(

There have been some Apache-related crashes in the last few weeks where it fails to reload itself correctly after log maintenance. The backend has been running during those, but obviously the BOINC client can't connect to upload and fetch work while Apache is down, so that doesn't matter. :) The crashes seem to be related to mod_fcgid, so if it keeps happening I'll probably just disable that.

DarkRyder
Project donor
Joined: 23 Jun 11
Posts: 87
Credit: 793,991,635
RAC: 5
Message 2846 - Posted: 6 Mar 2012, 22:58:03 UTC - in response to Message 2835.

what kind of drives are you using? I might have some extra i can send ya if needed.

Teemu Mannermaa
Project administrator
Project developer
Project tester
Joined: 20 Apr 11
Posts: 356
Credit: 734,903,353
RAC: 191,107
Message 2851 - Posted: 7 Mar 2012, 16:02:43 UTC

Hi,

Thanks for the offer, but the drives come with the server service as well and I'm not really allowed to change/touch them myself. :) Looks like they are using some Seagate drives on those servers.

As it's a service, I can have the whole server replaced pretty much anytime I want. This includes upgrading to faster and better hw (which costs a bit more, but still).



Copyright © 2011-2017 Moo! Wrapper Project