132k sec. to finish a wu?

Message boards : Number crunching : 132k sec. to finish a wu?

Mike029.SETI.USA [BlackOps]
Joined: 2 May 11
Posts: 7
Credit: 309,232,131
RAC: 0
Message 2716 - Posted: 20 Feb 2012, 0:41:40 UTC

Not sure what happened here, but I found my box had worked 2 full days to finish this one WU.

http://moowrap.net/workunit.php?wuid=7059864

Can you take a look at it and perhaps figure out why it took so long? I do not run CPU projects on this box. It has a 5970 and a 5870 running Moo only. Is anyone else seeing WUs like this?
juice3
Joined: 6 Dec 11
Posts: 60
Credit: 306,719,331
RAC: 0
Message 2723 - Posted: 20 Feb 2012, 22:30:29 UTC - in response to Message 2716.  

I'd have aborted it.

I don't find ATI projects to be set-it-and-forget-it. I micro-manage my crunchers a little too much, I guess.
Mike029.SETI.USA [BlackOps]
Joined: 2 May 11
Posts: 7
Credit: 309,232,131
RAC: 0
Message 2728 - Posted: 21 Feb 2012, 14:15:01 UTC

I guess. I've never seen this happen before, that's all.
DarkRyder
Joined: 23 Jun 11
Posts: 87
Credit: 798,452,366
RAC: 0
Message 2731 - Posted: 21 Feb 2012, 15:59:51 UTC

I've seen a few that ran too long (9-10 hrs) and had to cancel them. It hasn't happened since.
juice3
Joined: 6 Dec 11
Posts: 60
Credit: 306,719,331
RAC: 0
Message 2734 - Posted: 21 Feb 2012, 18:36:51 UTC

The other thing I've seen is that WUs will get up to 80+% complete and then a new WU will start with High Priority, even though they are due at the same time.

It was a driver issue in the end.
DarkRyder
Joined: 23 Jun 11
Posts: 87
Credit: 798,452,366
RAC: 0
Message 2738 - Posted: 21 Feb 2012, 22:27:23 UTC

I just gave my machine a good kick, and it hasn't acted up since. lol
Mike029.SETI.USA [BlackOps]
Joined: 2 May 11
Posts: 7
Credit: 309,232,131
RAC: 0
Message 2748 - Posted: 23 Feb 2012, 15:49:38 UTC - in response to Message 2728.  

Just caught another one running for 7 hours. This last WU was stuck at 98%.

I'm going to set up BoincTasks to suspend any WU that takes over 20 minutes, to try to avoid this in the future.

I've never had problems like this with Moo running on this machine. No other users seem to be experiencing this, so I have to believe the problem is on my end.

valterc
Joined: 10 May 11
Posts: 17
Credit: 8,115,591,523
RAC: 21,908,846
Message 2749 - Posted: 23 Feb 2012, 15:54:49 UTC - in response to Message 2738.  
Last modified: 23 Feb 2012, 15:56:57 UTC

I also got a few of them, maybe 2 or 3 in the last two months. It seems that, for as yet unknown reasons, the computation gets directed to the CPU instead of the GPU. I don't know how to solve this. Meanwhile, I wrote a little script that simply suspends a task if its CPU time is greater than some user-defined limit.

Here it is:

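use strict;
use warnings;
use POSIX qw(strftime);

my $path  = "C:\\Program Files\\boinc";  # folder that contains boinccmd
my $url   = 'http://moowrap.net/';       # project URL
my $task  = 'dnetc';                     # only watch tasks whose name contains this
my $limit = 5*60;                        # limit in seconds (cpu time)

my @tasks = `"$path\\boinccmd" --get_results`;
for my $i (0..$#tasks) {
  next unless $tasks[$i] =~ /^\d+\)/;            # "1) ----" starts a task block
  my $n = $i + 1;                                # the name line follows it
  next unless defined $tasks[$n + 16];           # output ended early
  next unless $tasks[$n] =~ /$task/ && $tasks[$n] =~ /name: (.*)/;
  my $name = $1;
  # the +12 and +16 offsets depend on the field order boinccmd prints
  next unless $tasks[$n + 12] =~ /state:\s*1\s*$/;   # check if running
  next unless $tasks[$n + 16] =~ /time: (.*)/;
  my $cputime = $1;
  print $name, " ", $cputime;
  if ($cputime > $limit) {
    print " (suspending)\n";
    `"$path\\boinccmd" --result $url $name suspend`;
    if (open(my $log, '>>', 'cpu_watch.log')) {
      my $now = strftime "%Y-%m-%d %H:%M:%S - ", localtime;
      print {$log} $now, ' ', $name, "\n";       # one log entry per line
      close($log);
    }
  } else {
    print "\n";
  }
}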


It's written in Perl (get it from ActiveState). Schedule it to run every 10 minutes or so. It should be self-explanatory; just drop me a note if you need help.

Edit: aarghh, the code tag doesn't preserve indentation spaces...
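
The script above finds the state and CPU-time lines by counting a fixed number of lines down from the task name, so it can break if a different boinccmd version prints its fields in another order. A minimal sketch of looking the fields up by label instead is below. The labels `active_task_state` and `current CPU time`, and the idea that a value of 1 or EXECUTING means "running", are assumptions rather than something confirmed in this thread, so check them against what your own boinccmd prints. The helper names `parse_tasks` and `over_limit` are made up for illustration.

```perl
use strict;
use warnings;

# Split boinccmd task output into one hash per task, keyed by field label.
# Assumes each task starts with a line like "1) -----------" followed by
# "label: value" lines, which is what the regexes in the script above imply.
sub parse_tasks {
    my @lines = @_;
    my (@tasks, $cur);
    for my $line (@lines) {
        if ($line =~ /^\s*\d+\)/) {                       # start of a new task block
            $cur = {};
            push @tasks, $cur;
        } elsif ($cur && $line =~ /^\s*([^:]+?):\s*(.*?)\s*$/) {
            $cur->{$1} = $2;                              # e.g. name => dnetc_...
        }
    }
    return @tasks;
}

# Names of running tasks that match $pattern and exceed $limit seconds of CPU time.
# ASSUMPTION: the field labels and the "1 or EXECUTING" running state.
sub over_limit {
    my ($pattern, $limit, @tasks) = @_;
    return map  { $_->{name} }
           grep { defined $_->{name} && $_->{name} =~ /$pattern/
                  && defined $_->{'active_task_state'}
                  && $_->{'active_task_state'} =~ /^(?:1|EXECUTING)$/
                  && ($_->{'current CPU time'} // 0) > $limit }
           @tasks;
}

# Usage sketch, with the same variables as the script above:
#   my @stuck = over_limit($task, $limit,
#                          parse_tasks(`"$path\\boinccmd" --get_results`));
```

Each name returned could then be passed to the same suspend command the script already uses.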
DarkRyder
Joined: 23 Jun 11
Posts: 87
Credit: 798,452,366
RAC: 0
Message 2751 - Posted: 23 Feb 2012, 16:53:24 UTC

I've had probably 3 total since January.
mikey
Joined: 22 Jun 11
Posts: 2080
Credit: 1,844,401,288
RAC: 3,256
Message 2754 - Posted: 23 Feb 2012, 17:41:47 UTC - in response to Message 2749.  

I also got a few of them, maybe 2 or 3 in the last two months. It seems that, for as yet unknown reasons, the computation gets directed to the CPU instead of the GPU. I don't know how to solve this.


This was a problem last year, but I have forgotten the fix. It could be a BOINC problem, though; there have been waaay too many issues between then and now to remember everything!
juice3
Joined: 6 Dec 11
Posts: 60
Credit: 306,719,331
RAC: 0
Message 2759 - Posted: 24 Feb 2012, 5:36:42 UTC - in response to Message 2754.  

Help a non-script guy on Windows with that script! ;)
valterc
Joined: 10 May 11
Posts: 17
Credit: 8,115,591,523
RAC: 21,908,846
Message 2761 - Posted: 24 Feb 2012, 11:05:32 UTC - in response to Message 2759.  

More instructions about the script:

a) Install Perl. It's a scripting language, similar to PHP. Get it from the ActiveState site; the community edition is free. Download the 32-bit or 64-bit version to match your system.
b) Paste the code into Notepad and save it somewhere under a name like cpu_watch.pl.
c) Using Notepad, modify the variables at the top of the script if needed: check the BOINC location and the limit (currently 5 minutes of CPU time).
d) You can try it by double-clicking it...
e) Schedule it to run at a regular interval (like every 10 minutes) using the Windows Task Scheduler or the Windows at command from a cmd shell.

If you run the script every 10 minutes with the 5-minute limit, you will lose at worst ~15 minutes of GPU crunching.
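
The ~15 minute figure comes from a task crossing the limit just after one check, so it is only caught at the next check a full interval later. A small sketch of that arithmetic follows. The helper name `worst_case_minutes` is hypothetical, and it assumes a stuck task accumulates CPU time at roughly wall-clock rate.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of cpu_watch.pl: worst-case time a stuck
# task can run before a scheduled check suspends it.
sub worst_case_minutes {
    my ($limit_min, $interval_min) = @_;
    # The task may cross the limit just after one check and is only
    # caught by the following check, one full interval later.
    return $limit_min + $interval_min;
}

print worst_case_minutes(5, 10), " minutes\n";   # 5-minute limit, 10-minute schedule
```

Running the script more often, or lowering the limit, shrinks that window.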

You will find the log file (cpu_watch.log) in the same place you put the script.

If the script suspends a wu you can later decide if you want to abort or continue it.

Hope this helps. Tell me if you run into problems.


 
Copyright © 2011-2024 Moo! Wrapper Project