Hard drive failure on our primary database server


Teemu Mannermaa
Project administrator
Project developer
Project tester

Joined: 20 Apr 11
Posts: 388
Credit: 822,356,221
RAC: 0
Message 2620 - Posted: 12 Feb 2012, 18:33:28 UTC

Our shiny new primary database server, which has been responsible for the nice performance lately, decided that things had been way too stable. So this Sunday morning at about 6:38 EET* the server killed its primary hard drive, bringing everything to a grinding halt. :(

I've switched to using our replica DB until the data center staff can replace the failed hard drive and/or server. I'm currently bringing the services back online slowly so things can catch up. Note that things might be slower until the first onslaught of reconnecting clients is over.

The good news is that there shouldn't be more than a few seconds of DB changes lost, because our database is replicated to the secondary server. Please do tell me if you see something strange. The bad news is that there's going to be a maintenance break in the near future when I switch the primary DB back to the resurrected server (maybe next weekend, if things run fine with only one DB server).

* That's 5:38 CET or Sat 20:38 PST; for other timezones, please see http://www.timeanddate.com/worldclock/fixedtime.html?iso=20120212T0635&p1=101&sort=1.
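
For anyone who wants to double-check the conversion, here is a minimal Python sketch. It assumes fixed standard-time offsets (EET = UTC+2, CET = UTC+1, PST = UTC-8), which hold in February; the `show` helper and the variable names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed standard-time offsets, valid in February (no DST).
EET = timezone(timedelta(hours=2), "EET")
CET = timezone(timedelta(hours=1), "CET")
PST = timezone(timedelta(hours=-8), "PST")

# Approximate failure time from the post: Sunday 12 Feb 2012, 6:38 EET.
failure = datetime(2012, 2, 12, 6, 38, tzinfo=EET)


def show(dt, tz):
    """Format dt in the given zone as 'Day HH:MM ZONE' (hypothetical helper)."""
    return dt.astimezone(tz).strftime("%a %H:%M %Z")


for tz in (EET, CET, PST, timezone.utc):
    print(show(failure, tz))
```

With the default locale this prints the failure time for each zone in turn, including the Saturday evening time for PST.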
ID: 2620 · Rating: 0
BONC

Joined: 11 Feb 12
Posts: 3
Credit: 2,382,042
RAC: 0
Message 2622 - Posted: 12 Feb 2012, 19:01:46 UTC

I joined up only to find this. My PC has a horde of ATI WUs but won't run them. Do I have to complete a number of CPU WUs to start the process?

And good luck with the re-start...

dunx
ID: 2622 · Rating: 0
Chris S
Joined: 2 Oct 11
Posts: 238
Credit: 386,580,598
RAC: 11,603
Message 2625 - Posted: 12 Feb 2012, 22:51:55 UTC

Thanks for the update Teemu, it is appreciated.
I iz also got icons!



ID: 2625 · Rating: 0
mikey
Joined: 22 Jun 11
Posts: 2080
Credit: 1,844,407,912
RAC: 3,236
Message 2630 - Posted: 13 Feb 2012, 12:29:51 UTC - in response to Message 2622.  

I joined up only to find this. My PC has a horde of ATI WUs but won't run them. Do I have to complete a number of CPU WUs to start the process?

And good luck with the re-start...

dunx


No, they should just start up on their own. Please take this to the Number Crunching forum and you should get lots of good ideas, though probably lots of questions first.
ID: 2630 · Rating: 0
Teemu Mannermaa
Project administrator
Project developer
Project tester

Joined: 20 Apr 11
Posts: 388
Credit: 822,356,221
RAC: 0
Message 2633 - Posted: 13 Feb 2012, 15:51:19 UTC

Hello,

Just to let you know that I'm aware that the validator and assimilator are lagging. This also affects work generation somewhat, so the scheduler keeps running out of work. :( I'll try to help them perform better, but the final fix might be to get our primary DB back online.

-w
ID: 2633 · Rating: 0
DarkRyder

Joined: 23 Jun 11
Posts: 87
Credit: 798,452,366
RAC: 0
Message 2653 - Posted: 14 Feb 2012, 18:59:39 UTC

I have some old server hardware I'd be willing to donate if you are in need of it.
ID: 2653 · Rating: 0
DarkRyder

Joined: 23 Jun 11
Posts: 87
Credit: 798,452,366
RAC: 0
Message 2654 - Posted: 14 Feb 2012, 19:02:48 UTC

Manner..... PM me.
ID: 2654 · Rating: 0
Teemu Mannermaa
Project administrator
Project developer
Project tester

Joined: 20 Apr 11
Posts: 388
Credit: 822,356,221
RAC: 0
Message 2700 - Posted: 18 Feb 2012, 15:52:48 UTC

Hello,

Okay, we are now back on the primary DB server and the scheduler is a lot speedier now. Hopefully it will last a lot longer this time round. :)

Thanks, DarkRyder, for the offer, but I lease my servers so we don't lack anything hardware-wise. I'm sure there are some other projects that'll welcome any and all hardware donations, though.

-w
ID: 2700 · Rating: 0
DarkRyder

Joined: 23 Jun 11
Posts: 87
Credit: 798,452,366
RAC: 0
Message 2706 - Posted: 19 Feb 2012, 7:07:58 UTC

np man, just wanted to do my part. :) Good luck with the new server back online :)
ID: 2706 · Rating: 0
DarkRyder

Joined: 23 Jun 11
Posts: 87
Credit: 798,452,366
RAC: 0
Message 2739 - Posted: 23 Feb 2012, 5:12:04 UTC

Is the server still having hardware problems? Seems like the site has gone down 3 times in the past 2 weeks....
ID: 2739 · Rating: 0
DarkRyder

Joined: 23 Jun 11
Posts: 87
Credit: 798,452,366
RAC: 0
Message 2784 - Posted: 28 Feb 2012, 22:23:11 UTC

bueller ?
ID: 2784 · Rating: 0
Teemu Mannermaa
Project administrator
Project developer
Project tester

Joined: 20 Apr 11
Posts: 388
Credit: 822,356,221
RAC: 0
Message 2835 - Posted: 6 Mar 2012, 5:28:57 UTC

Hi,

It shouldn't be, even though I'm a bit worried about its new drive as well. The drives might break soon too, since they seem to be quite old. :(

There have been some Apache-related crashes in the last few weeks, where it fails to reload itself correctly after log maintenance. The backend has been running during those, but obviously the BOINC client can't connect to upload and fetch work while Apache is down, so that doesn't matter. :) The crashes seem to be related to mod_fcgid, so if it keeps happening I'll probably just disable that.
ID: 2835 · Rating: 0
DarkRyder

Joined: 23 Jun 11
Posts: 87
Credit: 798,452,366
RAC: 0
Message 2846 - Posted: 6 Mar 2012, 22:58:03 UTC - in response to Message 2835.  

What kind of drives are you using? I might have some extras I can send ya if needed.
ID: 2846 · Rating: 0
Teemu Mannermaa
Project administrator
Project developer
Project tester

Joined: 20 Apr 11
Posts: 388
Credit: 822,356,221
RAC: 0
Message 2851 - Posted: 7 Mar 2012, 16:02:43 UTC

Hi,

Thanks for the offer, but the drives come with the server service as well and I'm not really allowed to change/touch them myself. :) Looks like they are using some Seagate drives on those servers.

As it's a service, I can have the whole server replaced pretty much anytime I want. This includes upgrading to faster and better hardware (which costs a bit more, but still).
ID: 2851 · Rating: 0



 
Copyright © 2011-2024 Moo! Wrapper Project