Low CPU utilization with differing GPU models

Message boards : Number crunching : Low CPU utilization with differing GPU models

Alyx
Joined: 15 May 11
Posts: 3
Credit: 10,445,835
RAC: 0
Message 1983 - Posted: 30 Dec 2011, 3:55:22 UTC

I'm unsure if having differing GPU models is causing this, but I've got a GTX460 and a GTX570 in a machine. The WUs are attempting to use both GPUs, but I'm getting 0% utilization with occasional spikes.

mikey
Joined: 22 Jun 11
Posts: 2080
Credit: 1,826,336,240
RAC: 3,658
Message 1988 - Posted: 30 Dec 2011, 12:45:21 UTC - in response to Message 1983.  

By design, Moo! uses ALL GPUs in the machine to work on a single workunit. I would suggest you take one of your GPUs out and see if you can crunch a unit; if so, put it back in, take out the other GPU and see if that works. If both work independently, then it could be as simple as: have you loaded the Nvidia driver software twice? Does BOINC see both cards? Look in the Messages near the top and see if both cards are found. Two cards that are not identical can be very problematic to get working! But others have come up with the right tweak, so don't give up yet!

Alyx
Joined: 15 May 11
Posts: 3
Credit: 10,445,835
RAC: 0
Message 1999 - Posted: 30 Dec 2011, 23:46:36 UTC

Thanks for the reply. It's good to hear that some are having success. I'll keep fiddling with it.

Teemu Mannermaa
Project administrator
Project developer
Project tester
Joined: 20 Apr 11
Posts: 388
Credit: 822,356,221
RAC: 0
Message 2044 - Posted: 2 Jan 2012, 0:47:22 UTC - in response to Message 1983.  

Hmm, that spiking doesn't sound good. If everything is optimal, the cards should be constantly loaded, if not at 100% then close to it. (The exception: occasionally your faster card will finish all its work before the slower one, so the faster card will flatline until the slower card finishes the last packet.)

This might be a driver issue, or it could be due to not having enough CPU power available for the D.Net Client to keep feeding work to your GPUs. It's also possible that the cards are overheating, or that the OS or some other application is interfering to the point of stopping our workload.

-w

Beyond
Joined: 18 May 11
Posts: 46
Credit: 1,254,302,893
RAC: 0
Message 2053 - Posted: 2 Jan 2012, 16:08:57 UTC - in response to Message 2044.  
Last modified: 2 Jan 2012, 16:50:48 UTC

My Moo! machines are all ATI, and I'm also seeing this spiking on all 7. Drivers are 11.9 and 10.12; both run about the same here. It doesn't matter whether there's any other CPU load or not: Afterburner shows the GPU load jumping from around 40% to as high as 99% (sometimes staying at 40-80% for most of the WU). A very few WUs will run at ~99% all the way. Comparing the WU completion times for these boxes with similar machines on the stats pages, they seem to be as fast as anything out there, so I assume this is normal.

Tried running 4 dual-ATI-GPU boxes again. One (2 x 5850 cards) ran OK, but now and then a WU would just stop processing until I intervened. Not good. Two others ran miserably slow, and I mean SLOW, so they got pulled quickly. The 4th is running OK so far. Interestingly, it has 2 x 4770 cards and 10.12 drivers. Go figure. The BOINC clients are 6.12.41-43 on everything. All are Win7-64.

Edit: Also, the dual-GPU machine uses about 40% of a CPU core for Moo!, while the single-GPU machines all use < 1%.

Zydor
Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 2055 - Posted: 2 Jan 2012, 17:11:21 UTC

.....If everything is optimal, the cards should be constantly loaded, if not 100% but close.....


That 100% loading stopped as soon as the fragmented units arrived; utilisation has been jumping around ever since. The closer the second-to-last number in the file name gets to 12, the higher the utilisation. As crude examples:

.... _400_768 takes around 9+ minutes, with utilisation anywhere in the 65-99% band

.... _12_768 is pretty well on the money: about 8 minutes, with utilisation around 99%

Most WUs lie somewhere between the two, as the parameters and contents of the WU change.

The fragmented units are a real pain. I've started suspending individual WUs of less than 100 groups, batching them, then doing a run of the collected ones with different CCC settings. Otherwise it's hard to keep temperatures and maximum utilisation constant: there is too wide a variation, and without batching like this the CCC setting has to be way low to keep temperatures acceptable.
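
The sorting step of that suspend-and-batch routine could be scripted. The following is a hypothetical Python sketch, not a Moo! or BOINC tool: it assumes WU names end in `..._<n>_<m>` like the `_400_768` and `_12_768` examples above, and the helper names `fragment_number` and `batch_work_units` and the threshold of 100 are illustrative assumptions. It only sorts names into batches; the actual suspending and resuming would still be done by hand in BOINC Manager.

```python
import re
from typing import Dict, List, Optional

# ASSUMPTION: WU names end with "..._<n>_<m>", where <n> is the
# second-to-last number discussed above (e.g. 400 in "_400_768").
_NUMBER_TOKEN = re.compile(r"_(\d+)(?=_|$)")


def fragment_number(wu_name: str) -> Optional[int]:
    """Return the second-to-last underscore-separated number in a WU name,
    or None if the name contains fewer than two numeric tokens."""
    numbers = _NUMBER_TOKEN.findall(wu_name)
    if len(numbers) < 2:
        return None
    return int(numbers[-2])


def batch_work_units(names: List[str], threshold: int) -> Dict[str, List[str]]:
    """Split WU names into a 'dense' batch (number <= threshold) and a
    'fragmented' batch (number > threshold). Names that cannot be parsed
    go to 'unknown'. Each parsed batch is sorted by its number so units
    with similar utilisation profiles end up next to each other."""
    batches: Dict[str, List[str]] = {"dense": [], "fragmented": [], "unknown": []}
    for name in names:
        n = fragment_number(name)
        if n is None:
            batches["unknown"].append(name)
        elif n <= threshold:
            batches["dense"].append(name)
        else:
            batches["fragmented"].append(name)
    for key in ("dense", "fragmented"):
        batches[key].sort(key=lambda wu: fragment_number(wu))
    return batches


if __name__ == "__main__":
    # Hypothetical names, made up for illustration only.
    names = ["wu_example_400_768", "wu_example_12_768",
             "wu_example_150_768", "garbled"]
    print(batch_work_units(names, threshold=100))
```

With those made-up names and a threshold of 100, `..._12_768` lands in the dense batch and `..._150_768` and `..._400_768` land, in that order, in the fragmented batch, which could then be run together with its own CCC settings.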

I hope Upstream get a grip on this soon and clear out the fragmented ones; it's getting really irritating babysitting them.

Regards
Zy

Beyond
Joined: 18 May 11
Posts: 46
Credit: 1,254,302,893
RAC: 0
Message 2058 - Posted: 2 Jan 2012, 17:50:15 UTC - in response to Message 2055.  

Thanks, Zydor, for the explanation. Looking at the WUs, that's exactly what's happening here. The ones with the second-to-last number closer to 12 are using 97-99%, while the others are jumping all over the place. Completion times aren't that bad, though, even on the fragmented WUs. Are things running worse on your 4-GPU machine, and is that on the 12.1 preview driver? Is it better for you than 11.9?

Zydor
Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 2059 - Posted: 2 Jan 2012, 18:22:57 UTC - in response to Message 2058.  

I am on 12.1. I changed when I rebuilt both my machines, which may have complicated my perception of events at this end, as the fragmented units started whilst I was away doing the rebuilds.

I'll change to 11.9 and see what happens. Send out a search party if I'm not back in 30 mins, rofl :)

Regards
Zy

Beyond
Joined: 18 May 11
Posts: 46
Credit: 1,254,302,893
RAC: 0
Message 2066 - Posted: 2 Jan 2012, 22:02:54 UTC - in response to Message 2059.  

Rescue dogs sent, single malt scotch is in the flask around the neck (Guinness is in the backpack)...

Zydor
Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 2067 - Posted: 2 Jan 2012, 22:03:38 UTC - in response to Message 2066.  

*hic*

Ta :)

Regards
Zy

Beyond
Joined: 18 May 11
Posts: 46
Credit: 1,254,302,893
RAC: 0
Message 2082 - Posted: 3 Jan 2012, 16:43:38 UTC - in response to Message 2067.  

After recovering did you gain any insights on drivers?

Zydor
Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 2083 - Posted: 3 Jan 2012, 17:47:32 UTC
Last modified: 3 Jan 2012, 17:48:49 UTC

More observation than insight, I guess. Both boxes were OK (or seemed to be) after the rebuilds, but then seemed to decline and fall over.

I switched my No2 box from 12.1 to 11.9, no change or difference, so left it there on 11.9. It has a single 5850 in it, and the driver hassles are mainly multi-GPU. With this box, it turns out the main problem was the "non-obvious" system drivers, the background ones. I ran a Driver Updater through it and it picked up a dozen candidates, and I let it go for the lot. Worked a treat. Been stable ever since.

My main box (twin 5970) was on 12.1. I switched it to 11.9, and if anything it was worse. Doing that also lost me the multi-GPU improvements in 12.1 (the fixed CPU usage bug, correct elapsed-time reporting on WUs, etc.). So I switched back to 12.1 and got it going OK in the end by doing a really clean, hard sweep with Driver Cleaner and a Registry Cleaner; it's stable now.

So... was it 12.1 all along? In truth, I don't think so. The main box turned out to have a fragmented graphics driver that needed cleaning out (again), and the No2 box needed some system driver updates. It's easy to point fingers at the driver and say "not me guv", but in truth I think this time it was my fault. We'll see how it goes, but both boxes have been stable for hours now, so I think all is well again.

Regards
Zy

Beyond
Joined: 18 May 11
Posts: 46
Credit: 1,254,302,893
RAC: 0
Message 2084 - Posted: 3 Jan 2012, 18:31:34 UTC - in response to Message 2083.  

Well, I'm confused now. On the Moo! site your single-GPU box shows as 1.4.1607, which I think is 11.11:

Catalyst 11.11 WHQL is an OpenGL 4.1 and OpenCL 1.1 driver:
- OpenCL: CL_PLATFORM_VERSION: OpenCL 1.1 AMD-APP-SDK-v2.5 (793.1)
- OpenCL: CL_DRIVER_VERSION: CAL 1.4.1607 (VM)
- OpenGL version: 4.1.11251

http://www.geeks3d.com/20111116/amd-catalyst-11-11-whql/

All my 11.9 machines are reporting 1.4.1546.
 
Copyright © 2011-2024 Moo! Wrapper Project