CUDA app 1.01 crashes when a core is manually selected

Microcruncher*
Joined: 8 May 11
Posts: 11
Credit: 1,075,941
RAC: 0
Message 167 - Posted: 8 May 2011, 18:40:02 UTC
Last modified: 8 May 2011, 18:44:32 UTC

1. I manually selected a core (#9) and the CUDA app crashed.
2. The app crunched with the default -1 setting (autoselect) and everything worked.
3. I selected core #10 and the CUDA app crashed.

Here is the output from the first crash:

<core_client_version>6.10.60</core_client_version>
<![CDATA[
<message>
 - exit code 195 (0xc3)
</message>
<stderr_txt>
19:03:59 (4048): wrapper: starting
19:03:59 (4048): device: GeForce GTX 470 (driver version 26658, CUDA version 3020, compute capability 2.0, 1248MB, 1344 GFLOPS peak)
19:03:59 (4048): checkpoint interval: 39 min (task 2800000 GFLOPS, 35 min)
19:03:59 (4048): wrapper: running dnetc518-win32-x86-cuda31.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 1/10
19:04:00 (4048): input buffer 7 packets (1264 bytes), checkpoint file 0 packets (0 bytes), output buffer 0 packets (0 bytes)
19:04:00 (4048): premature exit detected, app exit status: 0x0
19:04:00 (4048): wrapper: running dnetc518-win32-x86-cuda31.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 2/10
19:04:01 (4048): input buffer 7 packets (1264 bytes), checkpoint file 0 packets (0 bytes), output buffer 0 packets (0 bytes)
19:04:01 (4048): premature exit detected, app exit status: 0x0
19:04:01 (4048): no progress detected during last retry
19:04:01 (4048): wrapper: running dnetc518-win32-x86-cuda31.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 3/10
19:04:02 (4048): input buffer 7 packets (1264 bytes), checkpoint file 0 packets (0 bytes), output buffer 0 packets (0 bytes)
19:04:02 (4048): premature exit detected, app exit status: 0x0
19:04:02 (4048): no progress detected during last retry
19:04:02 (4048): wrapper: running dnetc518-win32-x86-cuda31.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 4/10
19:04:03 (4048): input buffer 7 packets (1264 bytes), checkpoint file 0 packets (0 bytes), output buffer 0 packets (0 bytes)
19:04:03 (4048): premature exit detected, app exit status: 0x0
19:04:03 (4048): no progress detected during last retry
19:04:03 (4048): wrapper: running dnetc518-win32-x86-cuda31.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 5/10
19:04:04 (4048): input buffer 7 packets (1264 bytes), checkpoint file 0 packets (0 bytes), output buffer 0 packets (0 bytes)
19:04:04 (4048): premature exit detected, app exit status: 0x0
19:04:04 (4048): no progress detected during last retry
19:04:04 (4048): wrapper: running dnetc518-win32-x86-cuda31.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 6/10
19:04:05 (4048): input buffer 7 packets (1264 bytes), checkpoint file 0 packets (0 bytes), output buffer 0 packets (0 bytes)
19:04:05 (4048): premature exit detected, app exit status: 0x0
19:04:05 (4048): no progress detected during last retry
19:04:05 (4048): wrapper: running dnetc518-win32-x86-cuda31.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 7/10
19:04:06 (4048): input buffer 7 packets (1264 bytes), checkpoint file 0 packets (0 bytes), output buffer 0 packets (0 bytes)
19:04:06 (4048): premature exit detected, app exit status: 0x0
19:04:06 (4048): no progress detected during last retry
19:04:06 (4048): wrapper: running dnetc518-win32-x86-cuda31.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 8/10
19:04:07 (4048): input buffer 7 packets (1264 bytes), checkpoint file 0 packets (0 bytes), output buffer 0 packets (0 bytes)
19:04:07 (4048): premature exit detected, app exit status: 0x0
19:04:07 (4048): no progress detected during last retry
19:04:07 (4048): wrapper: running dnetc518-win32-x86-cuda31.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 9/10
19:04:08 (4048): input buffer 7 packets (1264 bytes), checkpoint file 0 packets (0 bytes), output buffer 0 packets (0 bytes)
19:04:08 (4048): premature exit detected, app exit status: 0x0
19:04:08 (4048): no progress detected during last retry
19:04:08 (4048): wrapper: running dnetc518-win32-x86-cuda31.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 10/10
19:04:09 (4048): input buffer 7 packets (1264 bytes), checkpoint file 0 packets (0 bytes), output buffer 0 packets (0 bytes)
19:04:09 (4048): premature exit detected, app exit status: 0x0
19:04:09 (4048): no progress detected during last retry
19:04:09 (4048): too many retries (max 10), cancelling
19:04:09 (4048): called boinc_finish

</stderr_txt>
]]>
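The pattern in the log above (ten launches, each ending within about a second with exit status 0x0) is easier to see when summarized. The following is a minimal sketch, not part of the wrapper itself: the function name `summarize_wrapper_log` and the regular expressions are assumptions based only on the line formats visible in this output.

```python
import re

# Patterns derived from the stderr lines shown in this thread (assumed stable).
ATTEMPT_RE = re.compile(r"attempt (\d+)/(\d+)")
EXIT_RE = re.compile(r"premature exit detected, app exit status: (0x[0-9a-fA-F]+)")


def summarize_wrapper_log(text):
    """Count launch attempts and premature exits in wrapper stderr text."""
    attempts = 0
    max_attempts = None
    exit_statuses = []
    for line in text.splitlines():
        m = ATTEMPT_RE.search(line)
        if m:
            attempts = max(attempts, int(m.group(1)))
            max_attempts = int(m.group(2))
        m = EXIT_RE.search(line)
        if m:
            exit_statuses.append(int(m.group(1), 16))
    return {
        "attempts": attempts,
        "max_attempts": max_attempts,
        "premature_exits": len(exit_statuses),
        "exit_statuses": sorted(set(exit_statuses)),
        "gave_up": "too many retries" in text,
    }


# Shortened sample in the same format as the log above.
sample = """\
19:04:08 (4048): wrapper: running dnetc518-win32-x86-cuda31.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 9/10
19:04:08 (4048): premature exit detected, app exit status: 0x0
19:04:08 (4048): wrapper: running dnetc518-win32-x86-cuda31.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 10/10
19:04:09 (4048): premature exit detected, app exit status: 0x0
19:04:09 (4048): too many retries (max 10), cancelling
"""

summary = summarize_wrapper_log(sample)
print(summary)
```

Run against the full output, this would show that every attempt ended with status 0x0, i.e. the client exited cleanly each time without processing any packets, until the wrapper gave up.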
ID: 167 · Rating: 0
frankhagen

Joined: 2 May 11
Posts: 27
Credit: 1,151,788
RAC: 0
Message 168 - Posted: 8 May 2011, 18:44:24 UTC - in response to Message 167.  

1. I selected a core manually (#9) and the CUDA app crashed.
2. I crunched with the default -1 setting (autoselect) and everything worked.
3. I selected core #10 and the CUDA app crashed.


hi ralf.. ;)


this usually happens when a WU is restarted after changing something - the next one should do fine..
ID: 168 · Rating: 0
Microcruncher*
Joined: 8 May 11
Posts: 11
Credit: 1,075,941
RAC: 0
Message 169 - Posted: 8 May 2011, 18:49:30 UTC - in response to Message 168.  
Last modified: 8 May 2011, 18:52:31 UTC

1. I selected a core manually (#9) and the CUDA app crashed.
2. I crunched with the default -1 setting (autoselect) and everything worked.
3. I selected core #10 and the CUDA app crashed.


hi ralf.. ;)


this usually happens when a WU is restarted after changing something - the next one should do fine..

Hi!

Just tested it. Result: It does not work.
ID: 169 · Rating: 0
frankhagen

Joined: 2 May 11
Posts: 27
Credit: 1,151,788
RAC: 0
Message 170 - Posted: 8 May 2011, 19:27:57 UTC - in response to Message 169.  

Just tested it. Result: It does not work.


had that on 2 hosts after a change. what helped there: aborting the tasks and a reboot...
ID: 170 · Rating: 0
Clod Patry
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 2 May 11
Posts: 65
Credit: 242,754,987
RAC: 0
Message 171 - Posted: 8 May 2011, 19:39:21 UTC

Microcruncher: we'll look into that bug.
Do you have the same behavior if you choose any other core or just this one?

Thanks for this report.
ID: 171 · Rating: 0
Microcruncher*
Joined: 8 May 11
Posts: 11
Credit: 1,075,941
RAC: 0
Message 173 - Posted: 8 May 2011, 20:10:38 UTC - in response to Message 171.  
Last modified: 8 May 2011, 20:24:56 UTC

Microcruncher: we'll look into that bug.
Do you have the same behavior if you choose any other core or just this one?

Thanks for this report.

I tried #9 and #10 based on my own DNETC experience and the information I read in another thread. It looks like the wrapper has problems starting the application when a non-default core is selected (core != -1).

FYI: Benchmarking with the client worked fine but exposed another oddity:

Stock clocks (608 MHz):

[May 08 19:59:55 UTC] nvcuda.dll Version: 8.17.12.6658
[May 08 19:59:55 UTC] RC5-72: using core #0 (CUDA 1-pipe 64-thd).
[May 08 20:00:07 UTC] RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd)
                      0.00:00:09.00 [482,940,286 keys/sec]
[May 08 20:00:07 UTC] RC5-72: using core #1 (CUDA 1-pipe 128-thd).
[May 08 20:00:26 UTC] RC5-72: Benchmark for core #1 (CUDA 1-pipe 128-thd)
                      0.00:00:16.70 [271,027,727 keys/sec]
[May 08 20:00:26 UTC] RC5-72: using core #2 (CUDA 1-pipe 256-thd).
[May 08 20:00:45 UTC] RC5-72: Benchmark for core #2 (CUDA 1-pipe 256-thd)
                      0.00:00:16.24 [272,199,342 keys/sec]
[May 08 20:00:45 UTC] RC5-72: using core #3 (CUDA 2-pipe 64-thd).
[May 08 20:01:04 UTC] RC5-72: Benchmark for core #3 (CUDA 2-pipe 64-thd)
                      0.00:00:16.41 [267,968,215 keys/sec]
[May 08 20:01:04 UTC] RC5-72: using core #4 (CUDA 2-pipe 128-thd).
[May 08 20:01:22 UTC] RC5-72: Benchmark for core #4 (CUDA 2-pipe 128-thd)
                      0.00:00:16.06 [276,352,944 keys/sec]
[May 08 20:01:22 UTC] RC5-72: using core #5 (CUDA 2-pipe 256-thd).
[May 08 20:01:42 UTC] RC5-72: Benchmark for core #5 (CUDA 2-pipe 256-thd)
                      0.00:00:16.66 [279,636,065 keys/sec]
[May 08 20:01:42 UTC] RC5-72: using core #6 (CUDA 4-pipe 64-thd).
[May 08 20:02:00 UTC] RC5-72: Benchmark for core #6 (CUDA 4-pipe 64-thd)
                      0.00:00:16.27 [271,926,870 keys/sec]
[May 08 20:02:00 UTC] RC5-72: using core #7 (CUDA 4-pipe 128-thd).
[May 08 20:02:19 UTC] RC5-72: Benchmark for core #7 (CUDA 4-pipe 128-thd)
                      0.00:00:15.97 [279,151,098 keys/sec]
[May 08 20:02:19 UTC] RC5-72: using core #8 (CUDA 4-pipe 256-thd).
[May 08 20:02:38 UTC] RC5-72: Benchmark for core #8 (CUDA 4-pipe 256-thd)
                      0.00:00:16.17 [287,450,082 keys/sec]
[May 08 20:02:38 UTC] RC5-72: using core #9 (CUDA 1-pipe 64-thd busy wait).
[May 08 20:02:50 UTC] RC5-72: Benchmark for core #9 (CUDA 1-pipe 64-thd busy wait)
                      0.00:00:08.98 [482,568,096 keys/sec]
[May 08 20:02:50 UTC] RC5-72: using core #10 (CUDA 1-pipe 64-thd sleep 100us).
[May 08 20:03:08 UTC] RC5-72: Benchmark for core #10 (CUDA 1-pipe 64-thd sleep 100us)
                      0.00:00:15.49 [279,471,354 keys/sec]
[May 08 20:03:08 UTC] RC5-72: using core #11 (CUDA 1-pipe 64-thd sleep dynamic).
[May 08 20:03:19 UTC] RC5-72: Benchmark for core #11 (CUDA 1-pipe 64-thd sleep dynamic)
                      0.00:00:08.98 [480,847,615 keys/sec]
[May 08 20:03:19 UTC] RC5-72 benchmark summary :
                      Default core : #0 (CUDA 1-pipe 64-thd)
                      Fastest core : #0 (CUDA 1-pipe 64-thd)


Overclocked (the card works fine at MUCH higher speeds) to 700 MHz:


dnetc v2.9109-518-GTR-10092921 for CUDA 3.1 on Win32 (WindowsNT 6.0).

[May 08 20:03:38 UTC] nvcuda.dll Version: 8.17.12.6658
[May 08 20:03:38 UTC] RC5-72: using core #0 (CUDA 1-pipe 64-thd).
[May 08 20:03:49 UTC] RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd)
                      0.00:00:08.72 [496,838,646 keys/sec]
[May 08 20:03:49 UTC] RC5-72: using core #1 (CUDA 1-pipe 128-thd).
[May 08 20:03:59 UTC] RC5-72: Benchmark for core #1 (CUDA 1-pipe 128-thd)
                      0.00:00:07.50 [577,351,008 keys/sec]
[May 08 20:03:59 UTC] RC5-72: using core #2 (CUDA 1-pipe 256-thd).
[May 08 20:04:14 UTC] RC5-72: Benchmark for core #2 (CUDA 1-pipe 256-thd)
                      0.00:00:12.26 [351,353,726 keys/sec]
[May 08 20:04:14 UTC] RC5-72: using core #3 (CUDA 2-pipe 64-thd).
[May 08 20:04:24 UTC] RC5-72: Benchmark for core #3 (CUDA 2-pipe 64-thd)
                      0.00:00:07.64 [573,982,213 keys/sec]
[May 08 20:04:24 UTC] RC5-72: using core #4 (CUDA 2-pipe 128-thd).
[May 08 20:04:39 UTC] RC5-72: Benchmark for core #4 (CUDA 2-pipe 128-thd)
                      0.00:00:11.24 [390,686,746 keys/sec]
[May 08 20:04:39 UTC] RC5-72: using core #5 (CUDA 2-pipe 256-thd).
[May 08 20:04:55 UTC] RC5-72: Benchmark for core #5 (CUDA 2-pipe 256-thd)
                      0.00:00:13.85 [336,736,012 keys/sec]
[May 08 20:04:55 UTC] RC5-72: using core #6 (CUDA 4-pipe 64-thd).
[May 08 20:05:10 UTC] RC5-72: Benchmark for core #6 (CUDA 4-pipe 64-thd)
                      0.00:00:11.87 [386,138,010 keys/sec]
[May 08 20:05:10 UTC] RC5-72: using core #7 (CUDA 4-pipe 128-thd).
[May 08 20:05:25 UTC] RC5-72: Benchmark for core #7 (CUDA 4-pipe 128-thd)
                      0.00:00:13.05 [330,056,350 keys/sec]
[May 08 20:05:25 UTC] RC5-72: using core #8 (CUDA 4-pipe 256-thd).
[May 08 20:05:43 UTC] RC5-72: Benchmark for core #8 (CUDA 4-pipe 256-thd)
                      0.00:00:13.94 [324,903,932 keys/sec]
[May 08 20:05:43 UTC] RC5-72: using core #9 (CUDA 1-pipe 64-thd busy wait).
[May 08 20:05:53 UTC] RC5-72: Benchmark for core #9 (CUDA 1-pipe 64-thd busy wait)
                      0.00:00:07.91 [555,824,000 keys/sec]
[May 08 20:05:53 UTC] RC5-72: using core #10 (CUDA 1-pipe 64-thd sleep 100us).
[May 08 20:06:11 UTC] RC5-72: Benchmark for core #10 (CUDA 1-pipe 64-thd sleep 100us)
                      0.00:00:15.55 [279,783,119 keys/sec]
[May 08 20:06:11 UTC] RC5-72: using core #11 (CUDA 1-pipe 64-thd sleep dynamic).
[May 08 20:06:21 UTC] RC5-72: Benchmark for core #11 (CUDA 1-pipe 64-thd sleep dynamic)
                      0.00:00:07.84 [550,046,661 keys/sec]
[May 08 20:06:21 UTC] RC5-72 benchmark summary :
                      Default core : #0 (CUDA 1-pipe 64-thd)
                      Fastest core : #1 (CUDA 1-pipe 128-thd)
[May 08 20:06:21 UTC] Core #1 is significantly faster than the default core.
                      The GPU core selection has been made as a tradeoff between core speed
                      and responsiveness of the graphical desktop.
                      Please file a bug report along with the output of -cpuinfo
                      only if the the faster core selection does not degrade graphics performance.


Hint: compare the performance of core #1 and core #3 at stock clocks and when overclocked. More than 100% performance increase with 15% higher clocks? By the way, the app should be recompiled with the 3.2 SDK. On several occasions, CUDA apps compiled with the 3.1 SDK didn't work correctly. For example, PrimeGrid's tpsieve fails to find a single factor (on a test range with 173 factors) and runs 50% slower when compiled with the 3.1 SDK for Linux.
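To put numbers on that hint, here is a minimal sketch that compares the keys/sec figures from the two benchmark runs above against the clock increase. The figures are copied from the logs; the helper name `percent_gain` is only illustrative, and core #0 is included as a reference point because it barely changes.

```python
# GPU core clocks for the two benchmark runs above.
STOCK_MHZ = 608
OVERCLOCKED_MHZ = 700

# keys/sec per dnetc core number, copied from the benchmark output above.
stock = {0: 482_940_286, 1: 271_027_727, 3: 267_968_215}
overclocked = {0: 496_838_646, 1: 577_351_008, 3: 573_982_213}


def percent_gain(before, after):
    """Relative increase from before to after, in percent."""
    return (after / before - 1.0) * 100.0


clock_gain = percent_gain(STOCK_MHZ, OVERCLOCKED_MHZ)
gains = {core: percent_gain(stock[core], overclocked[core]) for core in stock}

for core in sorted(gains):
    print(f"core #{core}: {gains[core]:+.1f}% keys/sec "
          f"for a {clock_gain:+.1f}% clock increase")
```

Cores #1 and #3 gain roughly 113% and 114% for a clock increase of about 15%, while core #0 gains only about 3%. Throughput that more than doubles cannot be explained by clock scaling alone, which is the oddity being pointed out.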
ID: 173 · Rating: 0
frankhagen

Joined: 2 May 11
Posts: 27
Credit: 1,151,788
RAC: 0
Message 174 - Posted: 8 May 2011, 20:27:57 UTC - in response to Message 173.  

Hint: compare the performance of core #1 and core #3 at stock clocks and when overclocked. More than 100% performance increase with 15% higher clocks? By the way, the app should be recompiled with the 3.2 SDK. On several occasions, CUDA apps compiled with the 3.1 SDK didn't work correctly. For example, PrimeGrid's tpsieve fails to find a single factor (on a test range with 173 factors) and runs 50% slower when compiled with the 3.1 SDK for Linux.



NOW we're talking.. ;)
ID: 174 · Rating: 0

Copyright © 2011-2024 Moo! Wrapper Project