Multi-GPU task takes longer than single-gpu task

Dan
Joined: 5 May 11
Posts: 17
Credit: 103,092,604
RAC: 0
Message 240 - Posted: 11 May 2011, 22:44:30 UTC

Ok, I figured it out. I ran the check for core performance, and it came back saying core 3 was best. I tried changing the config using the local app, but that didn't work. When I set the core to 3 in my Moo Wrapper preferences, my times for dual 5870s went to 9 minutes.

Thanks,

Dan

Sitarow
Joined: 3 May 11
Posts: 8
Credit: 73,794,244
RAC: 0
Message 241 - Posted: 11 May 2011, 23:20:27 UTC - in response to Message 240.  

Sounds like it's working.

Now the only question I have is: how many points are you getting awarded for each task?

Dan
Joined: 5 May 11
Posts: 17
Credit: 103,092,604
RAC: 0
Message 242 - Posted: 11 May 2011, 23:36:35 UTC - in response to Message 241.  

12 to 13K. But it's running real hot, 87C. I dropped the clocks to stock.

Sitarow
Joined: 3 May 11
Posts: 8
Credit: 73,794,244
RAC: 0
Message 243 - Posted: 11 May 2011, 23:40:41 UTC - in response to Message 242.  

Sounds good. And yes, it would run hot OC'ed :) I've noticed that the OC of the card is not as important as the OC of the CPU for this project.

zombie67 [MM]
Joined: 2 May 11
Posts: 47
Credit: 319,540,306
RAC: 1
Message 245 - Posted: 12 May 2011, 2:11:18 UTC

Okay, I finally had time to run the -bench test. Here are the results for the two machines in question:

Single 5870:

[May 12 01:47:17 UTC] RC5-72: using core #0 (IL 4-pipe c).
[May 12 01:47:21 UTC] RC5-72: Benchmark for core #0 (IL 4-pipe c)
0.00:00:02.18 [2,030,898,067 keys/sec]
[May 12 01:47:21 UTC] RC5-72: using core #1 (IL 4-pipe c alt).
[May 12 01:47:28 UTC] RC5-72: Benchmark for core #1 (IL 4-pipe c alt)
0.00:00:03.76 [1,158,838,017 keys/sec]
[May 12 01:47:28 UTC] RC5-72: using core #2 (IL 4-pipe 2 threads).
[May 12 01:47:34 UTC] RC5-72: Benchmark for core #2 (IL 4-pipe 2 threads)
0.00:00:03.98 [1,101,613,569 keys/sec]
[May 12 01:47:34 UTC] RC5-72: using core #3 (IL 4-pipe cs-1).
[May 12 01:47:38 UTC] RC5-72: Benchmark for core #3 (IL 4-pipe cs-1)
0.00:00:02.18 [2,022,288,665 keys/sec]
[May 12 01:47:38 UTC] RC5-72 benchmark summary :
Default core : #0 (IL 4-pipe c)
Fastest core : #0 (IL 4-pipe

Dual 5870:


[May 12 01:57:38 UTC] RC5-72: using core #0 (IL 4-pipe c).
[May 12 01:57:46 UTC] RC5-72: Benchmark for core #0 (IL 4-pipe c)
0.00:00:05.75 [755,039,695 keys/sec]
[May 12 01:57:46 UTC] RC5-72: using core #1 (IL 4-pipe c alt).
[May 12 01:57:52 UTC] RC5-72: Benchmark for core #1 (IL 4-pipe c alt)
0.00:00:03.71 [1,173,790,362 keys/sec]
[May 12 01:57:52 UTC] RC5-72: using core #2 (IL 4-pipe 2 threads).
[May 12 01:57:58 UTC] RC5-72: Benchmark for core #2 (IL 4-pipe 2 threads)
0.00:00:03.34 [1,306,783,130 keys/sec]
[May 12 01:57:58 UTC] RC5-72: using core #3 (IL 4-pipe cs-1).
[May 12 01:58:03 UTC] RC5-72: Benchmark for core #3 (IL 4-pipe cs-1)
0.00:00:02.17 [2,044,294,794 keys/sec]
[May 12 01:58:03 UTC] RC5-72 benchmark summary :
Default core : #0 (IL 4-pipe c)
Fastest core : #3 (IL 4-pipe cs-1)
[May 12 01:58:03 UTC] Core #3 is significantly faster than the default core.
The GPU core selection has been made as a tradeoff be ...
and responsiveness of the graphical desktop.
Please file a bug report along with the output of -cp ...
only if the the faster core selection does not degrad ...

Observations:

1) The possible options are 0, 1, 2, or 3.

2) Option 3 seems to be the way to go: clearly for the dual, and a toss-up with 0 for the single.

3) With option 0, the single card benchmarks WAY faster than the dual.

4) With option 3, the benchmarks are almost identical. How can this be?
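
The comparison behind these observations can be scripted. Below is a minimal Python sketch, assuming the -bench output keeps the two-line format shown above (a "Benchmark for core #N" line followed by a timing line). The names parse_bench and fastest_core are made up for this illustration and are not part of the dnetc client.

```python
import re

# Matches a benchmark header line followed by its timing line, e.g.
#   [May 12 01:57:46 UTC] RC5-72: Benchmark for core #0 (IL 4-pipe c)
#   0.00:00:05.75 [755,039,695 keys/sec]
BENCH_RE = re.compile(
    r"Benchmark for core #(\d+) \(([^)]*)\)\s*\n"
    r"\s*[\d.:]+ \[([\d,]+) keys/sec\]"
)


def parse_bench(log_text):
    """Return {core_number: keys_per_sec} parsed from -bench output."""
    rates = {}
    for core, _name, rate in BENCH_RE.findall(log_text):
        rates[int(core)] = int(rate.replace(",", ""))
    return rates


def fastest_core(rates):
    """Return the core number with the highest keys/sec."""
    return max(rates, key=rates.get)


# Benchmark lines copied from the dual-5870 log above.
DUAL_5870_LOG = """\
[May 12 01:57:46 UTC] RC5-72: Benchmark for core #0 (IL 4-pipe c)
0.00:00:05.75 [755,039,695 keys/sec]
[May 12 01:57:52 UTC] RC5-72: Benchmark for core #1 (IL 4-pipe c alt)
0.00:00:03.71 [1,173,790,362 keys/sec]
[May 12 01:57:58 UTC] RC5-72: Benchmark for core #2 (IL 4-pipe 2 threads)
0.00:00:03.34 [1,306,783,130 keys/sec]
[May 12 01:58:03 UTC] RC5-72: Benchmark for core #3 (IL 4-pipe cs-1)
0.00:00:02.17 [2,044,294,794 keys/sec]
"""

rates = parse_bench(DUAL_5870_LOG)
best = fastest_core(rates)
ratio = rates[best] / rates[0]  # speed-up relative to the default core #0
print(f"fastest core: #{best}, {ratio:.1f}x the default core")
```

On the dual-5870 log this picks core #3, at roughly 2.7 times the core #0 rate.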

A question: Does crossfire on/off make any difference? For those few of you that do NOT have this problem, do you have crossfire turned on or off?


Reno, NV
Team SETI.USA

Sitarow
Joined: 3 May 11
Posts: 8
Credit: 73,794,244
RAC: 0
Message 246 - Posted: 12 May 2011, 2:20:37 UTC - in response to Message 245.  

If core 0 is slow, then setting the core to 3 will help with speed, but pay attention to the credit granted.

With the dual setup the wrapper on your bench will use core 0 for its base, and if you set the core to 3 you will get lower credit because it thinks it was a mini rather than a full task.

Also, I run all my dual (and larger) setups without CrossFire / SLI.
At least that is what I have found.

Mumps [MM]
Joined: 6 May 11
Posts: 2
Credit: 433,431,506
RAC: 995,299
Message 248 - Posted: 12 May 2011, 2:53:11 UTC

Well, when I run -bench repeatedly, I get wild variations in the results per core. Does the system need to be idle to get an accurate -bench result? Should BOINC not be running during the test?

zombie67 [MM]
Joined: 2 May 11
Posts: 47
Credit: 319,540,306
RAC: 1
Message 249 - Posted: 12 May 2011, 2:55:01 UTC - in response to Message 245.  
Last modified: 12 May 2011, 2:58:36 UTC

4) With option 3, the benchmarks are almost identical. How can this be?

A question: Does crossfire on/off make any difference? For those few of you that do NOT have this problem, do you have crossfire turned on or off?


I added the jumper and turned on crossfire. No change to benchmarks at all.

FWIW, I am seeing only ~21% load on one card, and ~54% load on the other.
Reno, NV
Team SETI.USA

frankhagen
Joined: 2 May 11
Posts: 27
Credit: 1,151,788
RAC: 0
Message 250 - Posted: 12 May 2011, 4:32:35 UTC - in response to Message 248.  

Well, when I run -bench repeatedly, I get wild variations in the results per core. Does the system need to be idled to get an accurate -bench result? Shouldn't be running BOINC during the test?


For sure - you'll want to disable everything that could interfere.
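
One way to tell whether repeated -bench runs agree well enough to trust is to look at their relative spread. The sketch below is only an illustration: the helper name spread and the 5% threshold are assumptions, and the sample numbers are the core #2 figures quoted elsewhere in this thread (from different machines and client versions), used purely as input.

```python
from statistics import mean, stdev


def spread(samples):
    """Return (mean, coefficient of variation) for keys/sec samples."""
    avg = mean(samples)
    return avg, stdev(samples) / avg


# Core #2 keys/sec figures quoted in this thread. They come from different
# machines and client versions, so they only serve as sample input here.
core2_samples = [1_101_613_569, 1_306_783_130, 249_835_586, 1_182_694_097]

avg, cv = spread(core2_samples)
print(f"mean {avg:,.0f} keys/sec, relative spread {cv:.0%}")

# The 5% cut-off is an arbitrary assumption, not a dnetc recommendation.
if cv > 0.05:
    print("results vary too much; idle the machine and re-run -bench")
```

For a real check, the samples should come from several runs of the same core on the same machine with BOINC suspended.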

zombie67 [MM]
Joined: 2 May 11
Posts: 47
Credit: 319,540,306
RAC: 1
Message 251 - Posted: 12 May 2011, 6:42:40 UTC - in response to Message 249.  

FWIW, I am seeing only ~21% load on one card, and ~54% load on the other.


Looks like it might be a HW issue, that never exposed itself on DNETC, but does here. I will report back in several days.
Reno, NV
Team SETI.USA

Conan
Joined: 2 May 11
Posts: 53
Credit: 255,380,797
RAC: 8,740
Message 252 - Posted: 12 May 2011, 7:41:16 UTC - in response to Message 245.  

A question: Does crossfire on/off make any difference? For those few of you that do NOT have this problem, do you have crossfire turned on or off?



On my dual 5870 card machine I do not have crossfire activated and I am having no trouble with that machine.

Dan
Joined: 5 May 11
Posts: 17
Credit: 103,092,604
RAC: 0
Message 253 - Posted: 12 May 2011, 8:35:34 UTC - in response to Message 251.  

I tried with CF on and CF off. I even took one card out. Until I set the core to 3 in my Moo Wrapper preferences here, it always took an hour or more.

Dan

Clod Patry
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 2 May 11
Posts: 65
Credit: 242,754,987
RAC: 0
Message 259 - Posted: 12 May 2011, 14:10:20 UTC

Conan: the -bench that you're running is per cruncher.
So if you have 2x5870, it will launch 2 crunchers, which will result in doubling the speed.
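
As a rough worked example of that doubling, assuming each cruncher really sustains the benchmarked rate (the thread suggests this is not always the case in practice); aggregate_rate is a hypothetical helper, not a client function:

```python
def aggregate_rate(per_cruncher_keys_per_sec, crunchers):
    """Estimate total throughput, assuming every cruncher matches the benchmark."""
    return per_cruncher_keys_per_sec * crunchers


# Core #3 figure from the dual-5870 benchmark posted earlier in the thread.
per_cruncher = 2_044_294_794
total = aggregate_rate(per_cruncher, crunchers=2)
print(f"{total:,} keys/sec")  # 4,088,589,588 keys/sec
```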

Marty
Joined: 3 May 11
Posts: 11
Credit: 992,745,689
RAC: 535,216
Message 334 - Posted: 16 May 2011, 14:55:17 UTC
Last modified: 16 May 2011, 15:39:07 UTC

Vista64 with 1xHD5870 and 1xHD5850 (Catalyst 11.5) running on an AMD Phenom II X4 955

After running the -bench command I switched to the recommended core 3, but the load on the cards is still not over 90% on both of them.

dnetc v2.9109-518-GTR-10092921 for ATI Stream on Win32 (WindowsNT 6.0).
Please provide the *entire* version descriptor when submitting bug reports.
The distributed.net bug report pages are at http://bugs.distributed.net/

[May 16 15:23:57 UTC] RC5-72: using core #0 (IL 4-pipe c).
[May 16 15:24:02 UTC] RC5-72: Benchmark for core #0 (IL 4-pipe c)
0.00:00:02.26 [1,944,551,487 keys/sec]
[May 16 15:24:02 UTC] RC5-72: using core #1 (IL 4-pipe c alt).
[May 16 15:24:08 UTC] RC5-72: Benchmark for core #1 (IL 4-pipe c alt)
0.00:00:03.86 [1,128,436,111 keys/sec]
[May 16 15:24:08 UTC] RC5-72: using core #2 (IL 4-pipe 2 threads).
[May 16 15:24:29 UTC] RC5-72: Benchmark for core #2 (IL 4-pipe 2 threads)
0.00:00:17.23 [249,835,586 keys/sec]
[May 16 15:24:29 UTC] RC5-72: using core #3 (IL 4-pipe cs-1).
[May 16 15:24:33 UTC] RC5-72: Benchmark for core #3 (IL 4-pipe cs-1)
0.00:00:02.23 [1,998,692,544 keys/sec]
[May 16 15:24:33 UTC] RC5-72 benchmark summary :
Default core : #0 (IL 4-pipe c)
Fastest core : #3 (IL 4-pipe cs-1)
[May 16 15:24:33 UTC] Core #3 is marginally faster than the default core.
Testing variability might lead to pick one or the other.


Then I suspended the BOINC CPU tasks and the load on both GPUs went up to over 95%, but the CPU load for dnetc518-win32-x86-stream.exe went to 50%, meaning it used 2 full cores. This was never the case at DNETC. Something is still not right there.
I also noticed that the 1 packet on each GPU finished within a couple of seconds, at least according to the load decreases visible with MSI Afterburner and the percentage display within BOINC ??? The GPU load also indicates that the single packets might have varying runtimes on the same GPU ???

With the BOINC CPU tasks suspended the runtime also improved, as expected, since the load on the GPUs increased.
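
The "2 full cores" reading follows from simple arithmetic, assuming the 50% is the process's share of total CPU capacity on the four-core Phenom II X4. A tiny sketch, with cores_in_use as a made-up helper name:

```python
def cores_in_use(process_cpu_fraction, core_count):
    """Convert a share of total CPU capacity into an equivalent number of cores."""
    return process_cpu_fraction * core_count


# 50% of a four-core CPU, as reported for dnetc518-win32-x86-stream.exe.
print(cores_in_use(0.50, 4))  # 2.0
```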

DNETC used "distributed.net v2.9108-517 client for ATI Stream on Win32" (see bench result below) with the following job.xml file for ATI cards:
<job_desc>
<task>
<application>dnetc_1.31_windows_intelx86__ati14.exe</application>
<command_line>-runoffline -multiok=1 -ckpoint chkpoint -pausefile pause -exitfile exit -inbase in -outbase out -priority 5 -n -1 -runbuffers -l stderr.txt</command_line>
</task>
</job_desc>
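
To see exactly which switches that job.xml passes to the client, the file can be read with Python's standard library. This is only a sketch: the XML is the snippet above pasted into a string, and nothing here is part of the BOINC wrapper itself.

```python
import shlex
import xml.etree.ElementTree as ET

# The DNETC job.xml quoted above, pasted in as a string for illustration.
JOB_XML = """<job_desc>
<task>
<application>dnetc_1.31_windows_intelx86__ati14.exe</application>
<command_line>-runoffline -multiok=1 -ckpoint chkpoint -pausefile pause -exitfile exit -inbase in -outbase out -priority 5 -n -1 -runbuffers -l stderr.txt</command_line>
</task>
</job_desc>"""

task = ET.fromstring(JOB_XML).find("task")
application = task.findtext("application")
# Split the command line into individual arguments, shell-style.
args = shlex.split(task.findtext("command_line"))

print(application)
for arg in args:
    print(" ", arg)
```

Comparing this list against the job.xml shipped with the Moo! Wrapper application would show whether the two projects start the client with different switches.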


dnetc v2.9108-517-GTR-10021520 for ATI Stream on Win32 (WindowsNT 6.0).
Please provide the *entire* version descriptor when submitting bug reports.
The distributed.net bug report pages are at http://bugs.distributed.net/

[May 16 15:38:05 UTC] RC5-72: using core #0 (IL 4-pipe c).
[May 16 15:38:10 UTC] RC5-72: Benchmark for core #0 (IL 4-pipe c)
0.00:00:02.38 [1,835,759,074 keys/sec]
[May 16 15:38:10 UTC] RC5-72: using core #1 (IL 4-pipe c alt).
[May 16 15:38:16 UTC] RC5-72: Benchmark for core #1 (IL 4-pipe c alt)
0.00:00:03.90 [1,119,097,905 keys/sec]
[May 16 15:38:16 UTC] RC5-72: using core #2 (IL 4-pipe 2 threads).
[May 16 15:38:23 UTC] RC5-72: Benchmark for core #2 (IL 4-pipe 2 threads)
0.00:00:03.72 [1,182,694,097 keys/sec]
[May 16 15:38:23 UTC] RC5-72 benchmark summary :
Default core : #0 (IL 4-pipe c)
Fastest core : #0 (IL 4-pipe c)


Is there a reason why you use "distributed.net v2.9109-518" as opposed to "distributed.net v2.9108-517" here?

Teemu Mannermaa
Project administrator
Project developer
Project tester
Joined: 20 Apr 11
Posts: 388
Credit: 822,356,221
RAC: 0
Message 373 - Posted: 19 May 2011, 0:51:27 UTC - in response to Message 334.  

Is there a reason why you use "distributed.net v2.9109-518" as opposed to "distributed.net v2.9108-517" here?


I used it because that's the latest release version (non-alpha). It also has some additional CPU/GPU detection and a new RC5-72 core that should help with GUI lag, according to the release notes.

-w

Beyond
Joined: 18 May 11
Posts: 46
Credit: 1,254,302,893
RAC: 0
Message 468 - Posted: 24 May 2011, 23:04:51 UTC - in response to Message 99.  

As someone mentioned, we would DEARLY love to have the project crunch on individual GPUs rather than use all in the machine. It is more efficient for the computers and more work would get done for the project.

Hi,

I understand that request but unfortunately this is not up to me. Distributed.net Client doesn't allow selecting which GPU they use and they detect and use all of them. Until they support this switch, there's nothing much I can do. Sorry. :(

-w

Teemu, I hadn't seen this before, so please excuse my posts requesting a single-GPU client. It's too bad though, as my dual ATI boxes, like others', also don't work well.
Copyright © 2011-2024 Moo! Wrapper Project