CUDA app 1.01 crashes when a core is manually selected
    
| Author | Message | 
|---|---|
| Joined: 8 May 11 Posts: 11 Credit: 1,075,941 RAC: 0 | 
 1. I manually selected a core (#9) and the CUDA app crashed.
 2. The app crunched with the default -1 setting (autoselect) and everything worked.
 3. I selected core #10 and the CUDA app crashed.
 Here is the output from the first crash:
<core_client_version>6.10.60</core_client_version>
<![CDATA[
<message>
- exit code 195 (0xc3)
</message>
<stderr_txt>
19:03:59 (4048): wrapper: starting
19:03:59 (4048): device: GeForce GTX 470 (driver version 26658, CUDA version 3020, compute capability 2.0, 1248MB, 1344 GFLOPS peak)
19:03:59 (4048): checkpoint interval: 39 min (task 2800000 GFLOPS, 35 min)
19:03:59 (4048): wrapper: running dnetc518-win32-x86-cuda31.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 1/10
19:04:00 (4048): input buffer 7 packets (1264 bytes), checkpoint file 0 packets (0 bytes), output buffer 0 packets (0 bytes)
19:04:00 (4048): premature exit detected, app exit status: 0x0
19:04:00 (4048): wrapper: running dnetc518-win32-x86-cuda31.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 2/10
19:04:01 (4048): input buffer 7 packets (1264 bytes), checkpoint file 0 packets (0 bytes), output buffer 0 packets (0 bytes)
19:04:01 (4048): premature exit detected, app exit status: 0x0
19:04:01 (4048): no progress detected during last retry
19:04:01 (4048): wrapper: running dnetc518-win32-x86-cuda31.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 3/10
19:04:02 (4048): input buffer 7 packets (1264 bytes), checkpoint file 0 packets (0 bytes), output buffer 0 packets (0 bytes)
19:04:02 (4048): premature exit detected, app exit status: 0x0
19:04:02 (4048): no progress detected during last retry
19:04:02 (4048): wrapper: running dnetc518-win32-x86-cuda31.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 4/10
19:04:03 (4048): input buffer 7 packets (1264 bytes), checkpoint file 0 packets (0 bytes), output buffer 0 packets (0 bytes)
19:04:03 (4048): premature exit detected, app exit status: 0x0
19:04:03 (4048): no progress detected during last retry
19:04:03 (4048): wrapper: running dnetc518-win32-x86-cuda31.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 5/10
19:04:04 (4048): input buffer 7 packets (1264 bytes), checkpoint file 0 packets (0 bytes), output buffer 0 packets (0 bytes)
19:04:04 (4048): premature exit detected, app exit status: 0x0
19:04:04 (4048): no progress detected during last retry
19:04:04 (4048): wrapper: running dnetc518-win32-x86-cuda31.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 6/10
19:04:05 (4048): input buffer 7 packets (1264 bytes), checkpoint file 0 packets (0 bytes), output buffer 0 packets (0 bytes)
19:04:05 (4048): premature exit detected, app exit status: 0x0
19:04:05 (4048): no progress detected during last retry
19:04:05 (4048): wrapper: running dnetc518-win32-x86-cuda31.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 7/10
19:04:06 (4048): input buffer 7 packets (1264 bytes), checkpoint file 0 packets (0 bytes), output buffer 0 packets (0 bytes)
19:04:06 (4048): premature exit detected, app exit status: 0x0
19:04:06 (4048): no progress detected during last retry
19:04:06 (4048): wrapper: running dnetc518-win32-x86-cuda31.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 8/10
19:04:07 (4048): input buffer 7 packets (1264 bytes), checkpoint file 0 packets (0 bytes), output buffer 0 packets (0 bytes)
19:04:07 (4048): premature exit detected, app exit status: 0x0
19:04:07 (4048): no progress detected during last retry
19:04:07 (4048): wrapper: running dnetc518-win32-x86-cuda31.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 9/10
19:04:08 (4048): input buffer 7 packets (1264 bytes), checkpoint file 0 packets (0 bytes), output buffer 0 packets (0 bytes)
19:04:08 (4048): premature exit detected, app exit status: 0x0
19:04:08 (4048): no progress detected during last retry
19:04:08 (4048): wrapper: running dnetc518-win32-x86-cuda31.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 10/10
19:04:09 (4048): input buffer 7 packets (1264 bytes), checkpoint file 0 packets (0 bytes), output buffer 0 packets (0 bytes)
19:04:09 (4048): premature exit detected, app exit status: 0x0
19:04:09 (4048): no progress detected during last retry
19:04:09 (4048): too many retries (max 10), cancelling
19:04:09 (4048): called boinc_finish
</stderr_txt>
]]> | 
| Joined: 2 May 11 Posts: 27 Credit: 1,151,788 RAC: 0 | 
 "1. I selected a core manually (#9) and the CUDA app crashed." hi ralf.. ;) this usually happens when a WU is restarted after changing something - the next one should do fine.. | 
| Joined: 8 May 11 Posts: 11 Credit: 1,075,941 RAC: 0 | 
 "1. I selected a core manually (#9) and the CUDA app crashed." Hi! Just tested it. Result: It does not work. | 
| Joined: 2 May 11 Posts: 27 Credit: 1,151,788 RAC: 0 | 
 "Just tested it. Result: It does not work." had that on 2 hosts after a change. aborting tasks and a reboot... | 
| Joined: 2 May 11 Posts: 65 Credit: 242,754,987 RAC: 0 | 
 Microcruncher: we'll look into that bug. Do you have the same behavior if you choose any other core or just this one? Thanks for this report. | 
| Joined: 8 May 11 Posts: 11 Credit: 1,075,941 RAC: 0 | 
 "Microcruncher: we'll look into that bug." I tried #9 and #10 based on my own DNETC experience and the information I read in another thread. It looks like the wrapper has problems starting the application when a non-default core is selected (core != -1). FYI: Benchmarking with the client worked fine but exposed another oddity: 
Stock clocks (608 MHz): 
[May 08 19:59:55 UTC] nvcuda.dll Version: 8.17.12.6658
[May 08 19:59:55 UTC] RC5-72: using core #0 (CUDA 1-pipe 64-thd).
[May 08 20:00:07 UTC] RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd)
                      0.00:00:09.00 [482,940,286 keys/sec]
[May 08 20:00:07 UTC] RC5-72: using core #1 (CUDA 1-pipe 128-thd).
[May 08 20:00:26 UTC] RC5-72: Benchmark for core #1 (CUDA 1-pipe 128-thd)
                      0.00:00:16.70 [271,027,727 keys/sec]
[May 08 20:00:26 UTC] RC5-72: using core #2 (CUDA 1-pipe 256-thd).
[May 08 20:00:45 UTC] RC5-72: Benchmark for core #2 (CUDA 1-pipe 256-thd)
                      0.00:00:16.24 [272,199,342 keys/sec]
[May 08 20:00:45 UTC] RC5-72: using core #3 (CUDA 2-pipe 64-thd).
[May 08 20:01:04 UTC] RC5-72: Benchmark for core #3 (CUDA 2-pipe 64-thd)
                      0.00:00:16.41 [267,968,215 keys/sec]
[May 08 20:01:04 UTC] RC5-72: using core #4 (CUDA 2-pipe 128-thd).
[May 08 20:01:22 UTC] RC5-72: Benchmark for core #4 (CUDA 2-pipe 128-thd)
                      0.00:00:16.06 [276,352,944 keys/sec]
[May 08 20:01:22 UTC] RC5-72: using core #5 (CUDA 2-pipe 256-thd).
[May 08 20:01:42 UTC] RC5-72: Benchmark for core #5 (CUDA 2-pipe 256-thd)
                      0.00:00:16.66 [279,636,065 keys/sec]
[May 08 20:01:42 UTC] RC5-72: using core #6 (CUDA 4-pipe 64-thd).
[May 08 20:02:00 UTC] RC5-72: Benchmark for core #6 (CUDA 4-pipe 64-thd)
                      0.00:00:16.27 [271,926,870 keys/sec]
[May 08 20:02:00 UTC] RC5-72: using core #7 (CUDA 4-pipe 128-thd).
[May 08 20:02:19 UTC] RC5-72: Benchmark for core #7 (CUDA 4-pipe 128-thd)
                      0.00:00:15.97 [279,151,098 keys/sec]
[May 08 20:02:19 UTC] RC5-72: using core #8 (CUDA 4-pipe 256-thd).
[May 08 20:02:38 UTC] RC5-72: Benchmark for core #8 (CUDA 4-pipe 256-thd)
                      0.00:00:16.17 [287,450,082 keys/sec]
[May 08 20:02:38 UTC] RC5-72: using core #9 (CUDA 1-pipe 64-thd busy wait).
[May 08 20:02:50 UTC] RC5-72: Benchmark for core #9 (CUDA 1-pipe 64-thd busy wait)
                      0.00:00:08.98 [482,568,096 keys/sec]
[May 08 20:02:50 UTC] RC5-72: using core #10 (CUDA 1-pipe 64-thd sleep 100us).
[May 08 20:03:08 UTC] RC5-72: Benchmark for core #10 (CUDA 1-pipe 64-thd sleep 100us)
                      0.00:00:15.49 [279,471,354 keys/sec]
[May 08 20:03:08 UTC] RC5-72: using core #11 (CUDA 1-pipe 64-thd sleep dynamic).
[May 08 20:03:19 UTC] RC5-72: Benchmark for core #11 (CUDA 1-pipe 64-thd sleep dynamic)
                      0.00:00:08.98 [480,847,615 keys/sec]
[May 08 20:03:19 UTC] RC5-72 benchmark summary :
                      Default core : #0 (CUDA 1-pipe 64-thd)
                      Fastest core : #0 (CUDA 1-pipe 64-thd)
Overclocked (the card works fine at MUCH higher speeds) to 700 MHz: 
dnetc v2.9109-518-GTR-10092921 for CUDA 3.1 on Win32 (WindowsNT 6.0).
[May 08 20:03:38 UTC] nvcuda.dll Version: 8.17.12.6658
[May 08 20:03:38 UTC] RC5-72: using core #0 (CUDA 1-pipe 64-thd).
[May 08 20:03:49 UTC] RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd)
                      0.00:00:08.72 [496,838,646 keys/sec]
[May 08 20:03:49 UTC] RC5-72: using core #1 (CUDA 1-pipe 128-thd).
[May 08 20:03:59 UTC] RC5-72: Benchmark for core #1 (CUDA 1-pipe 128-thd)
                      0.00:00:07.50 [577,351,008 keys/sec]
[May 08 20:03:59 UTC] RC5-72: using core #2 (CUDA 1-pipe 256-thd).
[May 08 20:04:14 UTC] RC5-72: Benchmark for core #2 (CUDA 1-pipe 256-thd)
                      0.00:00:12.26 [351,353,726 keys/sec]
[May 08 20:04:14 UTC] RC5-72: using core #3 (CUDA 2-pipe 64-thd).
[May 08 20:04:24 UTC] RC5-72: Benchmark for core #3 (CUDA 2-pipe 64-thd)
                      0.00:00:07.64 [573,982,213 keys/sec]
[May 08 20:04:24 UTC] RC5-72: using core #4 (CUDA 2-pipe 128-thd).
[May 08 20:04:39 UTC] RC5-72: Benchmark for core #4 (CUDA 2-pipe 128-thd)
                      0.00:00:11.24 [390,686,746 keys/sec]
[May 08 20:04:39 UTC] RC5-72: using core #5 (CUDA 2-pipe 256-thd).
[May 08 20:04:55 UTC] RC5-72: Benchmark for core #5 (CUDA 2-pipe 256-thd)
                      0.00:00:13.85 [336,736,012 keys/sec]
[May 08 20:04:55 UTC] RC5-72: using core #6 (CUDA 4-pipe 64-thd).
[May 08 20:05:10 UTC] RC5-72: Benchmark for core #6 (CUDA 4-pipe 64-thd)
                      0.00:00:11.87 [386,138,010 keys/sec]
[May 08 20:05:10 UTC] RC5-72: using core #7 (CUDA 4-pipe 128-thd).
[May 08 20:05:25 UTC] RC5-72: Benchmark for core #7 (CUDA 4-pipe 128-thd)
                      0.00:00:13.05 [330,056,350 keys/sec]
[May 08 20:05:25 UTC] RC5-72: using core #8 (CUDA 4-pipe 256-thd).
[May 08 20:05:43 UTC] RC5-72: Benchmark for core #8 (CUDA 4-pipe 256-thd)
                      0.00:00:13.94 [324,903,932 keys/sec]
[May 08 20:05:43 UTC] RC5-72: using core #9 (CUDA 1-pipe 64-thd busy wait).
[May 08 20:05:53 UTC] RC5-72: Benchmark for core #9 (CUDA 1-pipe 64-thd busy wait)
                      0.00:00:07.91 [555,824,000 keys/sec]
[May 08 20:05:53 UTC] RC5-72: using core #10 (CUDA 1-pipe 64-thd sleep 100us).
[May 08 20:06:11 UTC] RC5-72: Benchmark for core #10 (CUDA 1-pipe 64-thd sleep 100us)
                      0.00:00:15.55 [279,783,119 keys/sec]
[May 08 20:06:11 UTC] RC5-72: using core #11 (CUDA 1-pipe 64-thd sleep dynamic).
[May 08 20:06:21 UTC] RC5-72: Benchmark for core #11 (CUDA 1-pipe 64-thd sleep dynamic)
                      0.00:00:07.84 [550,046,661 keys/sec]
[May 08 20:06:21 UTC] RC5-72 benchmark summary :
                      Default core : #0 (CUDA 1-pipe 64-thd)
                      Fastest core : #1 (CUDA 1-pipe 128-thd)
[May 08 20:06:21 UTC] Core #1 is significantly faster than the default core.
                      The GPU core selection has been made as a tradeoff between core speed
                      and responsiveness of the graphical desktop.
                      Please file a bug report along with the output of -cpuinfo
                      only if the the faster core selection does not degrade graphics performance.
Hint: compare the performance of core #1 and core #3 at stock clocks and when overclocked. More than 100% performance increase with 15% higher clocks? By the way: the app should be recompiled with the 3.2 SDK. On several occasions, CUDA apps compiled with the 3.1 SDK didn't work correctly. For example, PrimeGrid's tpsieve refuses to find a single factor (on a test range with 173 factors) and runs 50% slower when compiled with the SDK 3.1 for Linux. | 
| Joined: 2 May 11 Posts: 27 Credit: 1,151,788 RAC: 0 | 
 "Hint compare the performance of Core #1 and Core #3 at stock clocks / when overclocked. More than 100% performance increase with 15% higher clocks? By the way: The app should be recompiled with the 3.2 SDK. On several occasions CUDA apps compiled with the 3.1 SDK didn't work correctly: For example PrimeGrid's tpsieve refuses to find a single factor (on a test range with 173 factors) and runs 50% slower when compiled with the SDK 3.1 for Linux." NOW we're talking.. ;) |
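
The core #1 / core #3 comparison raised in the thread can be checked directly from the posted benchmark output. The following is only a rough sketch: the helper names (`parse_benchmarks`, `percent_gain`) and the regular expression are assumptions based on the log layout shown in the posts, not part of the dnetc client or the BOINC wrapper, and only the lines for cores #0, #1 and #3 are copied in.

```python
import re

# Assumed layout (taken from the logs in this thread): a "Benchmark for core #N (...)"
# line followed by a line holding the elapsed time and "[<rate> keys/sec]".
BENCH_RE = re.compile(
    r"Benchmark for core #(\d+) \(([^)]*)\)\s+[\d.:]+ \[([\d,]+) keys/sec\]"
)


def parse_benchmarks(log_text):
    """Return {core_number: keys_per_sec} for each benchmark result found."""
    return {
        int(core): int(rate.replace(",", ""))
        for core, _name, rate in BENCH_RE.findall(log_text)
    }


def percent_gain(before, after):
    """Relative increase from before to after, in percent."""
    return (after - before) / before * 100.0


# Excerpts copied from the two benchmark runs posted above.
STOCK_LOG = """\
[May 08 20:00:07 UTC] RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd)
                      0.00:00:09.00 [482,940,286 keys/sec]
[May 08 20:00:26 UTC] RC5-72: Benchmark for core #1 (CUDA 1-pipe 128-thd)
                      0.00:00:16.70 [271,027,727 keys/sec]
[May 08 20:01:04 UTC] RC5-72: Benchmark for core #3 (CUDA 2-pipe 64-thd)
                      0.00:00:16.41 [267,968,215 keys/sec]
"""

OVERCLOCKED_LOG = """\
[May 08 20:03:49 UTC] RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd)
                      0.00:00:08.72 [496,838,646 keys/sec]
[May 08 20:03:59 UTC] RC5-72: Benchmark for core #1 (CUDA 1-pipe 128-thd)
                      0.00:00:07.50 [577,351,008 keys/sec]
[May 08 20:04:24 UTC] RC5-72: Benchmark for core #3 (CUDA 2-pipe 64-thd)
                      0.00:00:07.64 [573,982,213 keys/sec]
"""

stock = parse_benchmarks(STOCK_LOG)
overclocked = parse_benchmarks(OVERCLOCKED_LOG)

# Clock speeds as stated in the post: 608 MHz stock, 700 MHz overclocked.
clock_gain = percent_gain(608, 700)
gains = {core: percent_gain(stock[core], overclocked[core]) for core in stock}

print(f"clock: +{clock_gain:.1f}%")
for core in sorted(gains):
    print(f"core #{core}: +{gains[core]:.1f}% keys/sec")
```

With these figures, cores #1 and #3 gain a little over 110% for a roughly 15% clock increase, while core #0 gains only a few percent, which is the oddity the post points out.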