News on Aqua Outage


Message boards : Number crunching : News on Aqua Outage
Zydor
Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 692 - Posted: 25 Jun 2011, 11:23:05 UTC
Last modified: 25 Jun 2011, 11:46:07 UTC

For Aqua Fans out there....

Peter over on the BOINC Dev Forum posted this yesterday as a result of a speculative "you still alive?" email he sent to "an" Aqua address:

==================
Hi Peter!

I'm Neo on the AQUA@home forum. Unfortunately, I ran a maintenance script in the wrong directory on the AQUA@home server (on Monday, I think), which then deleted a few critical BOINC config files instead of the old files it was supposed to delete. After that, pretty much everything slowly stopped working, and I didn't figure out what had happened until the next day. To avoid more problems, we've turned the server off until Boinc Admin gets back from vacation on Monday. He'd probably (hopefully) know enough of what was in those configuration files that we can start from a very old backup and make the necessary updates to those. If not, there's a very small chance that we can recover the deleted files.

Morals of the story:
* Don't assume that there are recent backups
* Scripts that delete files en masse should require the user to specify the directory or use an absolute address instead of assuming the current directory

Hopefully we can recover, and hopefully people aren't too angry at us. Sorry for the inconvenience.

Sincerely,
Neil
=================
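
For anyone who wants to act on Neil's second moral, here is a minimal sketch in Python of a cleanup script that refuses to guess where it is running. It is not the script used on the AQUA@home server; the function names (find_old_files, delete_old_files) and the dry_run default are assumptions made up for illustration.

```python
import time
from pathlib import Path


def find_old_files(directory, max_age_days, now=None):
    """Return files directly inside `directory` older than `max_age_days`.

    Hypothetical helper: the directory must be passed explicitly and must
    be an absolute path, so the current working directory is never used.
    """
    root = Path(directory)
    if not root.is_absolute():
        raise ValueError(f"refusing relative path: {directory!r}")
    if not root.is_dir():
        raise NotADirectoryError(f"not a directory: {directory!r}")
    reference = time.time() if now is None else now
    cutoff = reference - max_age_days * 86400  # seconds per day
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def delete_old_files(directory, max_age_days, dry_run=True, now=None):
    """Delete old files, but only when dry_run is explicitly turned off.

    Returns the list of files that were (or would have been) deleted.
    """
    victims = find_old_files(directory, max_age_days, now=now)
    if not dry_run:
        for path in victims:
            path.unlink()
    return victims
```

Calling delete_old_files with an absolute directory and the default dry_run=True only lists the candidates, so the list can be checked before anything is removed. A relative path raises ValueError instead of silently falling back to wherever the shell happens to be.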

Looks like more news Monday/Tuesday next week

Regards
Zy
Teemu Mannermaa
Project administrator
Project developer
Project tester

Joined: 20 Apr 11
Posts: 388
Credit: 822,356,221
RAC: 0
Message 715 - Posted: 27 Jun 2011, 5:46:13 UTC - in response to Message 692.  

Unfortunately, I ran a maintenance script in the wrong directory on the AQUA@home server (on Monday, I think), which then deleted a few critical BOINC config files instead of the old files it was supposed to delete. After that,


Oops, that's scary. I've certainly made my share of such mistakes in my sysadmin career so I can (at least somewhat) understand how that Aqua admin is feeling right about now. :( I'm sure they can recover from it, though.

But that reminds me that I need to make sure MY backups are going through successfully.. :)

-w
BarryAZ

Joined: 20 Jun 11
Posts: 34
Credit: 6,294,097,710
RAC: 6,843,101
Message 781 - Posted: 9 Jul 2011, 3:52:24 UTC - in response to Message 715.  

Aqua recovered from that problem but encountered a major credit dysfunction with the controlled 'credits from BOINC central' award system. As a result wildly varying credits were being awarded.

My understanding is that you folks tested the 'new credit scheme' (is that the Cromwell New Model Army?) -- found it seriously flawed and were able to opt out while running BOINC server code.

Personally, I don't think it is a good idea to compel projects to use an externally derived credit system, as I honestly don't expect the code from a central developing group to be able to handle (or even understand) all the variables presented by millions of users, scores of projects, and thousands of different types of work units -- even as a snapshot, let alone the very much moving target presented by the BOINC distributed processing environment.




 
Copyright © 2011-2024 Moo! Wrapper Project