Questions and Answers :
Windows :
Multi-GPU task takes longer than single-gpu task
Author | Message |
---|---|
Send message Joined: 5 May 11 Posts: 17 Credit: 103,092,604 RAC: 0 |
Ok, I figured it out. I did the check for core performance. It came back saying core 3 was best. I tried changing the config using the local app, but that didn't work. When I set my core 3 in my Moo Wrapper preferences, my times for dual 5870s went to 9 minutes. Thanks, Dan |
Send message Joined: 3 May 11 Posts: 8 Credit: 73,794,244 RAC: 0 |
Ok, I figured it out. I did the check for core performance. It came back saying core 3 was best. I tried changing the config using the local app, but that didn't work. When I set my core 3 in my Moo Wrapper preferences, my times for dual 5870s went to 9 minutes. Sounds like it's working. Now the only question I have is: how many points are you being awarded for each task? |
Send message Joined: 5 May 11 Posts: 17 Credit: 103,092,604 RAC: 0 |
Ok, I figured it out. I did the check for core performance. It came back saying core 3 was best. I tried changing the config using the local app, but that didn't work. When I set my core 3 in my Moo Wrapper preferences, my times for dual 5870s went to 9 minutes. 12 to 13K. But it's running real hot, 87C. I dropped the clocks to stock. |
Send message Joined: 3 May 11 Posts: 8 Credit: 73,794,244 RAC: 0 |
Sounds good. And yes it would run hot OC'ed :) I notice that OC of the card is not as important as the OC of the CPU for this project. |
Send message Joined: 2 May 11 Posts: 47 Credit: 319,540,306 RAC: 1 |
Okay, I finally had time to run the -bench test. Here are the results for the two machines in question:
Single 5870:
[May 12 01:47:17 UTC] RC5-72: using core #0 (IL 4-pipe c).
[May 12 01:47:21 UTC] RC5-72: Benchmark for core #0 (IL 4-pipe c) 0.00:00:02.18 [2,030,898,067 keys/sec]
[May 12 01:47:21 UTC] RC5-72: using core #1 (IL 4-pipe c alt).
[May 12 01:47:28 UTC] RC5-72: Benchmark for core #1 (IL 4-pipe c alt) 0.00:00:03.76 [1,158,838,017 keys/sec]
[May 12 01:47:28 UTC] RC5-72: using core #2 (IL 4-pipe 2 threads).
[May 12 01:47:34 UTC] RC5-72: Benchmark for core #2 (IL 4-pipe 2 threads) 0.00:00:03.98 [1,101,613,569 keys/sec]
[May 12 01:47:34 UTC] RC5-72: using core #3 (IL 4-pipe cs-1).
[May 12 01:47:38 UTC] RC5-72: Benchmark for core #3 (IL 4-pipe cs-1) 0.00:00:02.18 [2,022,288,665 keys/sec]
[May 12 01:47:38 UTC] RC5-72 benchmark summary :
Default core : #0 (IL 4-pipe c)
Fastest core : #0 (IL 4-pipe
Dual 5870:
[May 12 01:57:38 UTC] RC5-72: using core #0 (IL 4-pipe c).
[May 12 01:57:46 UTC] RC5-72: Benchmark for core #0 (IL 4-pipe c) 0.00:00:05.75 [755,039,695 keys/sec]
[May 12 01:57:46 UTC] RC5-72: using core #1 (IL 4-pipe c alt).
[May 12 01:57:52 UTC] RC5-72: Benchmark for core #1 (IL 4-pipe c alt) 0.00:00:03.71 [1,173,790,362 keys/sec]
[May 12 01:57:52 UTC] RC5-72: using core #2 (IL 4-pipe 2 threads).
[May 12 01:57:58 UTC] RC5-72: Benchmark for core #2 (IL 4-pipe 2 threads) 0.00:00:03.34 [1,306,783,130 keys/sec]
[May 12 01:57:58 UTC] RC5-72: using core #3 (IL 4-pipe cs-1).
[May 12 01:58:03 UTC] RC5-72: Benchmark for core #3 (IL 4-pipe cs-1) 0.00:00:02.17 [2,044,294,794 keys/sec]
[May 12 01:58:03 UTC] RC5-72 benchmark summary :
Default core : #0 (IL 4-pipe c)
Fastest core : #3 (IL 4-pipe cs-1)
[May 12 01:58:03 UTC] Core #3 is significantly faster than the default core. The GPU core selection has been made as a tradeoff be ... and responsiveness of the graphical desktop. Please file a bug report along with the output of -cp ... only if the the faster core selection does not degrad ...
Observations:
1) The possible options are 0, 1, 2, or 3.
2) Option 3 seems to be the way to go. Clearly for the dual, toss-up with 0 for the single.
3) With option 1, the single card benchmarks WAY faster than the dual.
4) With option 3, the benchmarks are almost identical. How can this be?
A question: Does crossfire on/off make any difference? For those few of you that do NOT have this problem, do you have crossfire turned on or off?
Reno, NV Team SETI.USA |
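For anyone who wants to compare cores without reading the log by eye, here is a minimal sketch that pulls the keys/sec figures out of -bench output and reports the fastest core. It assumes the log lines look exactly like the ones posted above; the names `parse_bench` and `fastest_core` are made up for this example and are not part of the dnetc client or the Moo Wrapper.

```python
import re

# Matches lines such as:
# [May 12 01:57:46 UTC] RC5-72: Benchmark for core #0 (IL 4-pipe c) 0.00:00:05.75 [755,039,695 keys/sec]
# The pattern is an assumption based only on the log format shown in this thread.
BENCH_LINE = re.compile(
    r"Benchmark for core #(\d+) \(([^)]*)\)\s+\S+\s+\[([\d,]+) keys/sec\]"
)


def parse_bench(log_text):
    """Return {core_number: (core_name, keys_per_sec)} for every benchmark line."""
    results = {}
    for match in BENCH_LINE.finditer(log_text):
        core = int(match.group(1))
        name = match.group(2)
        rate = int(match.group(3).replace(",", ""))
        results[core] = (name, rate)
    return results


def fastest_core(results):
    """Return the core number with the highest keys/sec."""
    return max(results, key=lambda core: results[core][1])


# The dual 5870 benchmark lines quoted in the post above.
DUAL_5870_LOG = """
[May 12 01:57:46 UTC] RC5-72: Benchmark for core #0 (IL 4-pipe c) 0.00:00:05.75 [755,039,695 keys/sec]
[May 12 01:57:52 UTC] RC5-72: Benchmark for core #1 (IL 4-pipe c alt) 0.00:00:03.71 [1,173,790,362 keys/sec]
[May 12 01:57:58 UTC] RC5-72: Benchmark for core #2 (IL 4-pipe 2 threads) 0.00:00:03.34 [1,306,783,130 keys/sec]
[May 12 01:58:03 UTC] RC5-72: Benchmark for core #3 (IL 4-pipe cs-1) 0.00:00:02.17 [2,044,294,794 keys/sec]
"""

results = parse_bench(DUAL_5870_LOG)
best = fastest_core(results)
for core in sorted(results):
    name, rate = results[core]
    marker = " <- fastest" if core == best else ""
    print(f"core #{core} ({name}): {rate:,} keys/sec{marker}")
```

Running it over several -bench runs and comparing the per-core numbers is probably more reliable than trusting a single run, since the results can vary.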
Send message Joined: 3 May 11 Posts: 8 Credit: 73,794,244 RAC: 0 |
Okay, I finally had time to run the -bench test. Here are the results for the two machines in question: If core 0 is slow, then setting the core to 3 will help with speed, but pay attention to the credit granted. With the dual setup, the wrapper on your bench will use core 0 for its base, and if you set the core to 3 you will get lower credit because it thinks it was a mini rather than a full task. Also, I run all my dual + setups without crossfire / sli. At least that is what I have found. |
Send message Joined: 6 May 11 Posts: 2 Credit: 433,431,506 RAC: 995,299 |
Well, when I run -bench repeatedly, I get wild variations in the results per core. Does the system need to be idle to get an accurate -bench result? Should I not be running BOINC during the test? |
Send message Joined: 2 May 11 Posts: 47 Credit: 319,540,306 RAC: 1 |
4) With option 3, the benchmarks are almost identical. How can this be? I added the jumper and turned on crossfire. No change to benchmarks at all. FWIW, I am seeing only ~21% load on one card, and ~54% load on the other. Reno, NV Team SETI.USA |
Send message Joined: 2 May 11 Posts: 27 Credit: 1,151,788 RAC: 0 |
Well, when I run -bench repeatedly, I get wild variations in the results per core. Does the system need to be idle to get an accurate -bench result? Should I not be running BOINC during the test? For sure. You'll want to disable everything that could interfere. |
Send message Joined: 2 May 11 Posts: 47 Credit: 319,540,306 RAC: 1 |
FWIW, I am seeing only ~21% load on one card, and ~54% load on the other. Looks like it might be a HW issue, that never exposed itself on DNETC, but does here. I will report back in several days. Reno, NV Team SETI.USA |
Send message Joined: 2 May 11 Posts: 53 Credit: 255,380,797 RAC: 8,740 |
Okay, I finally had time to run the -bench test. Here are the results for the two machines in question: On my dual 5870 card machine I do not have crossfire activated and I am having no trouble with that machine. |
Send message Joined: 5 May 11 Posts: 17 Credit: 103,092,604 RAC: 0 |
I tried with CF on and CF off. Even took one card out. Until I set the core to 3 in Moo Wrapper preferences here, it always took one hour+. Dan |
Send message Joined: 2 May 11 Posts: 65 Credit: 242,754,987 RAC: 0 |
Conan: the -bench that you're running is per cruncher. So if you have 2x5870, it will launch 2 crunchers, which will result in doubling the speed. |
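To put a number on the per-cruncher point, here is a rough sketch of the arithmetic. It assumes each cruncher sustains the benchmarked rate and that the two cards scale linearly, which the uneven GPU loads reported earlier in the thread suggest may not hold in practice. The function name `estimated_total_rate` is made up for this example.

```python
def estimated_total_rate(per_cruncher_keys_per_sec, cruncher_count):
    """Best-case combined rate, assuming every cruncher hits the benchmarked speed."""
    return per_cruncher_keys_per_sec * cruncher_count


# Core #3 figure from the dual 5870 -bench output posted above.
per_cruncher = 2_044_294_794
total = estimated_total_rate(per_cruncher, 2)
print(f"{total:,} keys/sec")
```

Treat the result as an upper bound rather than what a real task will achieve.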
Send message Joined: 3 May 11 Posts: 11 Credit: 992,778,237 RAC: 535,148 |
Vista64 with 1xHD5870 and 1xHD5850 (Catalyst 11.5) running on an AMD Phenom II x4 955. After running the -bench command I switched to the recommended core 3, but the load of the cards is still not over 90% on both of them. dnetc v2.9109-518-GTR-10092921 for ATI Stream on Win32 (WindowsNT 6.0). Then I suspended the BOINC CPU tasks and the load on both GPUs went up to over 95%, but the CPU load for dnetc518-win32-x86-stream.exe went to 50%, meaning it used 2 full cores. This was never the case at DNETC. Something is still not right there. I also noticed that the 1 packet on each GPU finished within a couple of seconds, at least according to the load decreases visible with MSI Afterburner and the percentage display within BOINC ??? Also, the GPU load indicates that the single packets might have varying runtimes on the same GPU ??? With the BOINC CPU tasks suspended the runtime also improved, as expected, since the load on the GPUs increased. DNETC used "distributed.net v2.9108-517 client for ATI Stream on Win32" (see bench result below) with the following job.xml file for ATI cards: <job_desc> dnetc v2.9108-517-GTR-10021520 for ATI Stream on Win32 (WindowsNT 6.0). Is there a reason why you use "distributed.net v2.9109-518" as opposed to "distributed.net v2.9108-517" here? |
Send message Joined: 20 Apr 11 Posts: 388 Credit: 822,356,221 RAC: 0 |
Is there a reason why you use "distributed.net v2.9109-518" opposed to "distributed.net v2.9108-517" here? I used it because that's the latest release version (non-alpha). It also has some additional CPU/GPU detection and a new RC5-72 core that should help with GUI lag, according to the release notes. -w |
Send message Joined: 18 May 11 Posts: 46 Credit: 1,254,302,893 RAC: 0 |
As someone mentioned, we would DEARLY love to have the project crunch on individual GPUs rather than use all in the machine. It is more efficient for the computers and more work would get done for the project. Teemu, I hadn't seen this before, so please excuse my posts requesting a single GPU client. It's too bad though, as like others, my dual ATI boxes also don't work well. |