CUDA Failure

Questions and Answers : Windows : CUDA Failure
Nflight

Joined: 2 May 11
Posts: 4
Credit: 104,017,532
RAC: 0
Message 34 - Posted: 2 May 2011, 17:56:07 UTC

I have a Windows XP 64, AMD 5400+ running a 9800 GTX cuda GPU. I am seeing consistent failures with no actual valid work units completed.
frankhagen

Joined: 2 May 11
Posts: 27
Credit: 586,012
RAC: 4
Message 35 - Posted: 2 May 2011, 18:13:48 UTC - in response to Message 34.  

Same over here. I bet they crash because the maximum runtime is too low.
[AF>EDLS] Polynesia

Joined: 1 May 11
Posts: 23
Credit: 1,574,433
RAC: 0
Message 38 - Posted: 2 May 2011, 19:30:38 UTC

At least you are receiving work units; I am not getting any ...

Config: i7 860 2.8 GHz, 8 GB RAM, BOINC: 6.12.26, GPU: GTX 470 Zotac Amp Edition 1280 MB GDDR5
Clod Patry
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 2 May 11
Posts: 65
Credit: 242,754,987
RAC: 0
Message 41 - Posted: 2 May 2011, 21:10:28 UTC - in response to Message 34.  
Last modified: 2 May 2011, 21:13:45 UTC

I have a Windows XP 64, AMD 5400+ running a 9800 GTX cuda GPU. I am seeing consistent failures with no actual valid work units completed.

What's the error in your Messages?
Does it stop right away after it started?
What type of card do you have?
Which driver version?



I can see:

dnetc v2.9109-518-CTR-10092921 for CUDA 3.1 on Win32 (WindowsNT 5.1).
Using email address (distributed.net ID) 'EMAIL@yahoo.com'

[May 02 17:55:20 UTC] nvcuda.dll Version: 6.14.11.9621
[May 02 17:55:20 UTC] Unable to create CUDA stream
[May 02 17:55:20 UTC] Unable to initialize CUDA.
[May 02 17:55:20 UTC] *Break* Shutting down...
13:55:21 (6096): input buffer 0 packets (1074790400 bytes), checkpoint file 0 packets (1082589184 bytes), output buffer 1952257862 packets (-1077459503 bytes)
13:55:21 (6096): premature exit detected, app exit status: 0xfffffffd
13:55:21 (6096): wrapper: running dnetc518-win32-x86-cuda31.exe (-ini dnetc.ini -runoffline -multiok=1 -e EMAIL@yahoo.com)


Based on:
[May 02 17:55:20 UTC] Unable to create CUDA stream
[May 02 17:55:20 UTC] Unable to initialize CUDA.
Are you sure your CUDA is properly installed?
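The wrapper line in the log above prints the application's exit status as an unsigned 32-bit hex value (0xfffffffd). A minimal Python sketch, assuming the status is a 32-bit two's-complement integer (the function name is made up for illustration), shows how to read it as a signed number:

```python
import re

# Matches the "app exit status: 0x..." fragment printed by the wrapper.
EXIT_STATUS_RE = re.compile(r"app exit status: (0x[0-9a-fA-F]+)")


def signed_exit_status(log_line):
    """Return the exit status from a wrapper log line as a signed 32-bit int.

    Returns None when the line carries no exit status.
    """
    match = EXIT_STATUS_RE.search(log_line)
    if match is None:
        return None
    value = int(match.group(1), 16)
    # Values with the top bit set are negative in two's complement.
    if value >= 1 << 31:
        value -= 1 << 32
    return value


line = "13:55:21 (6096): premature exit detected, app exit status: 0xfffffffd"
print(signed_exit_status(line))
```

For the line quoted in the post this gives -3. What that code means to the Dnet client is not stated in this thread, so the CUDA messages printed just before it remain the better clue.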
Nflight

Joined: 2 May 11
Posts: 4
Credit: 104,017,532
RAC: 0
Message 51 - Posted: 3 May 2011, 0:19:38 UTC - in response to Message 41.  
Last modified: 3 May 2011, 0:21:05 UTC

@Clod Patry - The work unit finishes, but then, while waiting for validation, it ends up as "Error while computing". No other messages are available.
I am running the project PrimeGrid with no problems.
Clod Patry
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 2 May 11
Posts: 65
Credit: 242,754,987
RAC: 0
Message 52 - Posted: 3 May 2011, 0:23:50 UTC - in response to Message 51.  
Last modified: 3 May 2011, 0:26:06 UTC

We'll have to dig a little into why this is causing a problem.


Which driver version?
Teemu Mannermaa
Project administrator
Project developer
Project tester

Joined: 20 Apr 11
Posts: 381
Credit: 822,356,221
RAC: 0
Message 60 - Posted: 3 May 2011, 9:23:49 UTC - in response to Message 52.  

Which driver version?


Looking at the output, it says

[May 02 17:55:20 UTC] nvcuda.dll Version: 6.14.11.9621


which means it's driver v196.21. (Sure, it's also in http://moowrap.net/show_host_detail.php?hostid=115 but that's too easy. :) ) Unfortunately, this is too old and could explain the CUDA stream creation errors in the log. Looks like v256 is the minimum needed for CUDA 3.1 (Dnet Client was compiled for that version).

-w
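The step from "nvcuda.dll Version: 6.14.11.9621" to driver v196.21 can be sketched in a few lines of Python. This assumes the usual Windows convention that the last five digits of the DLL version encode the driver version. The function names are hypothetical, and the minimum of 256 is simply the figure given in the post above, not an independently verified requirement.

```python
from typing import Tuple

# Minimum driver major version for CUDA 3.1, as stated in the post.
MIN_DRIVER_MAJOR = 256


def driver_version(nvcuda_version: str) -> Tuple[int, int]:
    """Derive (major, minor) of the NVIDIA driver from an nvcuda.dll version.

    Assumption: the driver version is the last digit of the third field
    followed by the fourth field padded to four digits.
    """
    parts = nvcuda_version.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError("expected a version like 6.14.11.9621")
    digits = parts[2][-1] + parts[3].zfill(4)  # "1" + "9621" -> "19621"
    return int(digits[:3]), int(digits[3:])


def meets_minimum(nvcuda_version: str) -> bool:
    """True if the derived driver major version is at least the minimum."""
    return driver_version(nvcuda_version)[0] >= MIN_DRIVER_MAJOR


major, minor = driver_version("6.14.11.9621")
print(f"driver v{major}.{minor:02d}, new enough: {meets_minimum('6.14.11.9621')}")
```

For the version string in the log this reports driver v196.21 and flags it as older than the stated minimum.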
Bok

Joined: 2 May 11
Posts: 3
Credit: 255,799
RAC: 0
Message 61 - Posted: 3 May 2011, 9:45:09 UTC

I'm getting the same thing, even though I have the latest drivers.

Take a look at this result: http://moowrap.net/result.php?resultid=8422

It actually crunched through 3 of the packets OK before it hit a maximum disk usage error.

I have it set to use 99% of disk space, so I'm unsure why it gets that message.

Bok
Teemu Mannermaa
Project administrator
Project developer
Project tester

Joined: 20 Apr 11
Posts: 381
Credit: 822,356,221
RAC: 0
Message 65 - Posted: 3 May 2011, 12:06:23 UTC - in response to Message 61.  

Take a look at this result: http://moowrap.net/result.php?resultid=8422
I have it set to use 99% of disk space, so I'm unsure why it gets that message.


That's a setting in the WU and is set at server side. It was originally set too low for these systems that generate so much stderr output due to a wrapper bug and missing Dnet ID.

I've already bumped the value for any newly generated work so these should go away.

-w
frankhagen

Joined: 2 May 11
Posts: 27
Credit: 586,012
RAC: 4
Message 97 - Posted: 4 May 2011, 13:02:15 UTC - in response to Message 65.  

ok, seems to work now.

next thing needed is a way to get rid of automatic core selection which costs a lot of performance for many of us.
philip-in-hongkong

Joined: 2 May 11
Posts: 4
Credit: 8,685,685
RAC: 6,170
Message 102 - Posted: 4 May 2011, 23:01:36 UTC - in response to Message 65.  

That's a setting in the WU and is set at server side. It was originally set too low for these systems that generate so much stderr output due to a wrapper bug and missing Dnet ID.

I've already bumped the value for any newly generated work so these should go away.

-w


The problem seems to persist. One of this morning's WUs logged the following error:

2011/5/5 03:12:02 AM Moo! Wrapper Aborting task dnetc_r72_1304316070_0: exceeded disk limit: 0.48MB > 0.48MB
2011/5/5 03:12:09 AM Moo! Wrapper Computation for task dnetc_r72_1304316070_0 finished
2011/5/5 03:12:11 AM Moo! Wrapper Started upload of dnetc_r72_1304316070_0_0
2011/5/5 03:12:18 AM Moo! Wrapper Finished upload of dnetc_r72_1304316070_0_0

Philip
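The odd-looking "exceeded disk limit: 0.48MB > 0.48MB" in the log above is most likely a rounding effect: two different byte counts can print as the same two-decimal megabyte figure. The Python sketch below illustrates this under the assumption that the client compares raw byte counts and only rounds when printing. The byte values are hypothetical and were chosen only to reproduce the message; they are not taken from the project.

```python
MEGA = 1024 * 1024  # bytes per megabyte, as assumed for the log output


def format_mb(n_bytes: int) -> str:
    """Format a byte count the way the log line shows it (two decimals)."""
    return f"{n_bytes / MEGA:.2f}MB"


def exceeds_disk_limit(usage_bytes: int, bound_bytes: int) -> bool:
    """The abort decision is made on exact byte counts, not rounded ones."""
    return usage_bytes > bound_bytes


bound = 500_000   # hypothetical per-workunit disk bound in bytes
usage = 500_512   # hypothetical usage once stderr has grown a little

if exceeds_disk_limit(usage, bound):
    print(f"exceeded disk limit: {format_mb(usage)} > {format_mb(bound)}")
```

With these numbers the task is only 512 bytes over the bound, yet both sides round to 0.48MB. That fits the explanation given earlier in the thread that a long stderr pushes the task just past a limit that was set too low.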
Bok

Joined: 2 May 11
Posts: 3
Credit: 255,799
RAC: 0
Message 105 - Posted: 5 May 2011, 2:34:10 UTC - in response to Message 102.  

Looking at that WU name, it looks like one of the old ones from before the admin's change to correct it.
philip-in-hongkong

Joined: 2 May 11
Posts: 4
Credit: 8,685,685
RAC: 6,170
Message 106 - Posted: 5 May 2011, 3:07:03 UTC - in response to Message 105.  

I did a reset to clear the old WUs prior to downloading some new ones. Perhaps there are still some old WUs in the queue and I should wait a little longer before downloading WUs again. I shall try again when I return home this evening.
philip-in-hongkong

Joined: 2 May 11
Posts: 4
Credit: 8,685,685
RAC: 6,170
Message 114 - Posted: 5 May 2011, 16:14:34 UTC - in response to Message 106.  

I did a reset to clear the old WUs prior to downloading some new ones. Perhaps there are still some old WUs in the queue and I should wait a little longer before downloading WUs again. I shall try again when I return home this evening.


Downloaded three WUs; two look like old ones and one seems new. Crunched the new one successfully. I think I will abort the two old ones and wait a while before trying again.
Teemu Mannermaa
Project administrator
Project developer
Project tester

Joined: 20 Apr 11
Posts: 381
Credit: 822,356,221
RAC: 0
Message 115 - Posted: 5 May 2011, 17:26:48 UTC - in response to Message 114.  

Downloaded three WUs, two look like old ones and one seems new. Crunched the new one successfully.


Unfortunately, the two workunits (4599 and 4871) that you have "in progress" are most likely going to end in error. You should try to get one of the new ones (that have longer names), or indeed wait either for the next application version, which should fix the long stderr that's causing problems, or for somebody else to crunch the older ones.

-w
philip-in-hongkong

Joined: 2 May 11
Posts: 4
Credit: 8,685,685
RAC: 6,170
Message 124 - Posted: 6 May 2011, 0:07:13 UTC - in response to Message 115.  

Downloaded three WUs, two look like old ones and one seems new. Crunched the new one successfully.


Unfortunately, the two workunits (4599 and 4871) that you have "in progress" are most likely going to end in error. You should try to get one of the new ones (that have longer names), or indeed wait either for the next application version, which should fix the long stderr that's causing problems, or for somebody else to crunch the older ones.

-w


I tried downloading again this morning but still got two WUs (client 1.00) with short names. So I think I will wait until the next application version is released. Thanks.

Philip
[AF>EDLS] Polynesia

Joined: 1 May 11
Posts: 23
Credit: 1,574,433
RAC: 0
Message 136 - Posted: 6 May 2011, 17:51:39 UTC
Last modified: 6 May 2011, 17:52:49 UTC

I don't have the impression that replacing the line:

<command_line>-ini dnetc.ini -runoffline -multiok=1</command_line>

with

<command_line>-ini dnetc.ini -runoffline -multiok=1 -c 10</command_line>

reduces the run time of CUDA units, because the last two took about 8000 sec each ... unless these units are very long compared to the previous ones ....

Config: i7 860 2.8 GHz, 8 GB RAM, BOINC: 6.12.26, GPU: GTX 470 Zotac Amp Edition 1280 MB GDDR5
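The command-line change described in the post above can also be scripted instead of edited by hand. The Python sketch below makes two assumptions: that the wrapper's job file uses a job_desc/task/command_line layout, and that the application name matches the one in the log earlier in this thread. The helper name pin_core is made up. The "-c 10" value is the one from the post; which core number is fastest depends on the card.

```python
import xml.etree.ElementTree as ET

# Assumed shape of the wrapper's job file; only <command_line> matters here.
JOB_XML = """<job_desc>
    <task>
        <application>dnetc518-win32-x86-cuda31.exe</application>
        <command_line>-ini dnetc.ini -runoffline -multiok=1</command_line>
    </task>
</job_desc>"""


def pin_core(job_xml: str, core: int) -> str:
    """Return job_xml with '-c <core>' set on every <command_line>."""
    root = ET.fromstring(job_xml)
    for cmd in root.iter("command_line"):
        args = (cmd.text or "").split()
        if "-c" in args:
            # Replace the value after an existing -c (or add it if missing).
            i = args.index("-c")
            args[i + 1:i + 2] = [str(core)]
        else:
            args += ["-c", str(core)]
        cmd.text = " ".join(args)
    return ET.tostring(root, encoding="unicode")


print(pin_core(JOB_XML, 10))
```

Running it on the sample prints the job description with "-ini dnetc.ini -runoffline -multiok=1 -c 10" as the command line. Calling it again with another number replaces the value rather than appending a second flag.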
frankhagen

Joined: 2 May 11
Posts: 27
Credit: 586,012
RAC: 4
Message 139 - Posted: 6 May 2011, 19:34:39 UTC - in response to Message 136.  

reduces the run time of CUDA units, because the last two took about 8000 sec each ... unless these units are very long compared to the previous ones ....


check what's in stderr!

there are packets of different sizes. for my hosts it's about 20% faster..

Copyright © 2011-2020 Moo! Wrapper Project