CPU units getting errors


Questions and Answers : Unix/Linux : CPU units getting errors

Clod Patry
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 2 May 11
Posts: 65
Credit: 242,754,987
RAC: 0
Message 491 - Posted: 27 May 2011, 1:41:25 UTC

When running the CPU app, the CPU units seem to fail to run properly.
I'm getting:

Thu 26 May 2011 07:17:52 PM EDT | Moo! Wrapper | Starting dnetc_r72_1306448940_12_12_0
Thu 26 May 2011 07:17:52 PM EDT | Moo! Wrapper | Starting task dnetc_r72_1306448940_12_12_0 using dnetc version 102
Thu 26 May 2011 07:17:54 PM EDT | Moo! Wrapper | Computation for task dnetc_r72_1306448940_12_12_0 finished
Thu 26 May 2011 07:17:54 PM EDT | Moo! Wrapper | Output file dnetc_r72_1306448940_12_12_0_0 for task dnetc_r72_1306448940_12_12_0 absent


This is the stderr output:
http://moowrap.net/result.php?resultid=357454
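
For anyone comparing against their own event log: the excerpt above shows the task finishing two seconds after it started, which is far too short for real work. Here is a minimal Python sketch that measures that gap. It assumes the `timestamp | project | message` layout seen in the excerpt and English day and month names; the function names are made up for illustration and are not part of BOINC.

```python
from datetime import datetime

# Event log lines copied from the post above.
LOG = """\
Thu 26 May 2011 07:17:52 PM EDT | Moo! Wrapper | Starting dnetc_r72_1306448940_12_12_0
Thu 26 May 2011 07:17:52 PM EDT | Moo! Wrapper | Starting task dnetc_r72_1306448940_12_12_0 using dnetc version 102
Thu 26 May 2011 07:17:54 PM EDT | Moo! Wrapper | Computation for task dnetc_r72_1306448940_12_12_0 finished
Thu 26 May 2011 07:17:54 PM EDT | Moo! Wrapper | Output file dnetc_r72_1306448940_12_12_0_0 for task dnetc_r72_1306448940_12_12_0 absent
"""


def parse_timestamp(field):
    """Parse the first log column, ignoring the time-zone abbreviation."""
    # Both lines carry the same zone ("EDT"), so dropping it does not
    # change the difference between the two timestamps.
    stamp = field.strip().rsplit(" ", 1)[0]
    return datetime.strptime(stamp, "%a %d %b %Y %I:%M:%S %p")


def task_runtime_seconds(log_text):
    """Return seconds between 'Starting task' and 'finished', or None."""
    started = finished = None
    for line in log_text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 3:
            continue  # not a "timestamp | project | message" line
        when, _project, message = parts
        if message.startswith("Starting task"):
            started = parse_timestamp(when)
        elif message.startswith("Computation for task") and message.endswith("finished"):
            finished = parse_timestamp(when)
    if started is None or finished is None:
        return None
    return (finished - started).total_seconds()


print(task_runtime_seconds(LOG))
```

A runtime of a couple of seconds together with an "Output file ... absent" line suggests the task died right after launch rather than computing anything.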
ebahapo
Joined: 2 May 11
Posts: 8
Credit: 503,560
RAC: 0
Message 552 - Posted: 31 May 2011, 16:26:30 UTC - in response to Message 491.  
Last modified: 31 May 2011, 16:27:49 UTC

My Linux host is failing all WUs with the same status error.

Please advise.

TIA
Teemu Mannermaa
Project administrator
Project developer
Project tester

Joined: 20 Apr 11
Posts: 388
Credit: 822,356,221
RAC: 0
Message 562 - Posted: 1 Jun 2011, 19:24:31 UTC - in response to Message 552.  

My Linux host is failing all WUs with the same status error.


Looks like another signal 11 (SIGSEGV, i.e. a segmentation fault), which indicates a bug in the wrapper. I need to investigate and fix it for the next app version. Until that's done, any hosts that fail constantly should crunch on something else. :( (This even affects my own Linux CPU crunchers.)

-w
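
As background on the "signal 11" diagnosis: on Linux, a parent process can tell that a child was killed by a signal rather than exiting normally. A minimal Python sketch of that check is below. It is not the Moo! Wrapper code (which is not shown in this thread), `describe_exit` is a hypothetical helper, and the child here deliberately sends itself SIGSEGV to imitate a crash.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description (POSIX)."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as the negated signal number.
        sig = signal.Signals(-returncode)
        return f"killed by signal {sig.value} ({sig.name})"
    return f"exited with status {returncode}"


# Stand-in child process that sends SIGSEGV to itself to simulate a crash.
crash_script = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
child = subprocess.run([sys.executable, "-c", crash_script])

print(describe_exit(child.returncode))
```

On Linux, SIGSEGV is signal number 11, which is where the "signal 11" wording in result pages comes from.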
A.M.

Joined: 20 Jun 11
Posts: 2
Credit: 4,199,383
RAC: 0
Message 672 - Posted: 20 Jun 2011, 5:55:17 UTC

For what it's worth (maybe nothing) I seem to be having the same issue. The first five failed at exactly 575.31 seconds of runtime, and the rest within two seconds after reaching one hour of runtime.

On the other hand, the ATI workunits seem to be running quite happily so far.
schnupsi

Joined: 29 Jun 11
Posts: 1
Credit: 0
RAC: 0
Message 790 - Posted: 13 Jul 2011, 12:55:19 UTC - in response to Message 491.  

Same here! Is this going to be fixed any time soon?
bzaborow

Joined: 24 Sep 11
Posts: 3
Credit: 34,704,414
RAC: 0
Message 1087 - Posted: 25 Sep 2011, 9:41:13 UTC

I get similar errors on CPU units, but only, and always, when I suspend the tasks. Nvidia units suspend correctly. The tasks do not seem to respond to some signals from BOINC: after a suspend request they keep running and consuming CPU while boincmgr shows a "computation error" status, and I had to kill them manually.

My BOINC version: 6.12.33 (Linux); dnetc version: 1.02 (downloaded automagically by BOINC)
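
To illustrate what a working suspend and shutdown path can look like on Linux, here is a small Python sketch of a parent that pauses, resumes, and then stops a child process with POSIX signals. This shows the general technique only and is an assumption, not a description of how the BOINC client or the Moo! wrapper actually talks to dnetc. The child is a stand-in that just sleeps, and `stop_child` is a hypothetical helper.

```python
import signal
import subprocess
import sys


def stop_child(child, grace_seconds=5.0):
    """Ask the child to exit; force-kill it if it ignores the request."""
    child.terminate()  # sends SIGTERM
    try:
        child.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        child.kill()  # sends SIGKILL, which cannot be ignored
        child.wait()
    return child.returncode


# Stand-in for a long-running cruncher process.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

child.send_signal(signal.SIGSTOP)  # suspend: the kernel stops scheduling the child
child.send_signal(signal.SIGCONT)  # resume: the child continues where it left off

exit_code = stop_child(child)
print(exit_code)
```

If a supervisor crashes before it can forward signals like these, the child is left running on its own, which would match the symptom of a task that keeps using CPU after a suspend request.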
ebahapo
Joined: 2 May 11
Posts: 8
Credit: 503,560
RAC: 0
Message 1119 - Posted: 1 Oct 2011, 14:41:35 UTC

FWIW,

Here's what I see in stderr.txt for this WU:

[Sep 30 08:18:55 UTC] RC5-72: Completed CF:D90F55B6:00000000 (1.00 stats units)
                      0.00:13:35.35 - [5,267,571 keys/s]
[Sep 30 08:18:55 UTC] Shutdown - packet limit exceeded.
[Sep 30 08:18:55 UTC] RC5-72: Summary: 1 packet (1.00 stats units)
                      0.00:13:35.35 - [5,267,571 keys/s]
[Sep 30 08:18:55 UTC] RC5-72: 0 packets remain in in.r72
[Sep 30 08:18:55 UTC] RC5-72: 1 packet (1.00 stats units) is in out.r72
[Sep 30 08:18:55 UTC] *Break* Shutting down...
[Sep 30 08:18:55 UTC] Shutdown complete.


HTH
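
For anyone who wants to pull numbers out of output like this, here is a minimal Python sketch that extracts the elapsed time and key rate from the indented continuation lines. It assumes the duration field reads days.hours:minutes:seconds, which is a guess from the excerpt rather than something confirmed by the dnetc documentation; `parse_rates` is a hypothetical helper.

```python
import re

# Part of the stderr.txt excerpt from the post above.
STDERR = """\
[Sep 30 08:18:55 UTC] RC5-72: Completed CF:D90F55B6:00000000 (1.00 stats units)
                      0.00:13:35.35 - [5,267,571 keys/s]
[Sep 30 08:18:55 UTC] Shutdown - packet limit exceeded.
[Sep 30 08:18:55 UTC] RC5-72: Summary: 1 packet (1.00 stats units)
                      0.00:13:35.35 - [5,267,571 keys/s]
"""

# Matches e.g. "0.00:13:35.35 - [5,267,571 keys/s]".
RATE_LINE = re.compile(r"(\d+)\.(\d+):(\d+):(\d+\.\d+) - \[([\d,]+) keys/s\]")


def parse_rates(text):
    """Return a list of (elapsed_seconds, keys_per_second) tuples."""
    results = []
    for match in RATE_LINE.finditer(text):
        days, hours, minutes = (int(match.group(i)) for i in (1, 2, 3))
        seconds = float(match.group(4))
        # Assumed field order: days.hours:minutes:seconds
        elapsed = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
        keys_per_second = int(match.group(5).replace(",", ""))
        results.append((elapsed, keys_per_second))
    return results


for elapsed, rate in parse_rates(STDERR):
    print(f"{elapsed:.2f} s at {rate} keys/s")
```

Under that assumption the packet took a little over 13 and a half minutes, and the log shows the client itself shutting down cleanly after completing it.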
Matthias Lehmkuhl

Joined: 22 Oct 11
Posts: 4
Credit: 3,145,103
RAC: 1
Message 1243 - Posted: 25 Oct 2011, 13:55:34 UTC
Last modified: 25 Oct 2011, 13:58:52 UTC

Same for me (sample result):
the process got signal 11 after the result had finished 7 of 8 packets.
http://moowrap.net/result.php?resultid=3509487

Host (Ubuntu 11.10 64-bit):
http://moowrap.net/results.php?hostid=4968

On another host (also Ubuntu 11.10 64-bit):
http://moowrap.net/results.php?hostid=4972
the wrapper crashed while the dnetc program kept calculating the result.
I could only terminate the dnetc program by shutting down the machine.
Edit: Error message: 1 cruncher has been started.
Matthias
thebestjaspreet

Joined: 17 Jun 11
Posts: 4
Credit: 89,874,328
RAC: 33,099
Message 1653 - Posted: 9 Dec 2011, 0:08:17 UTC - in response to Message 1087.  

I get similar errors on CPU units, but only, and always, when I suspend the tasks. Nvidia units suspend correctly. The tasks do not seem to respond to some signals from BOINC: after a suspend request they keep running and consuming CPU while boincmgr shows a "computation error" status, and I had to kill them manually.

My BOINC version: 6.12.33 (Linux); dnetc version: 1.02 (downloaded automagically by BOINC)

I get the same error on Ubuntu CPU tasks.
My BOINC version: 6.12.33 (Linux); dnetc version: 1.02 (downloaded automagically by BOINC)

"I always win law permanently"

Copyright © 2011-2024 Moo! Wrapper Project