Low CPU utilization with differing GPU models
Author | Message |
---|---|
Joined: 15 May 11 Posts: 3 Credit: 10,445,835 RAC: 0 |
I'm unsure if having differing GPU models is causing this, but I've got a GTX460 and a GTX570 in a machine. The WUs are attempting to use both GPUs but I'm getting 0% utilization with occasional spikes. |
Joined: 22 Jun 11 Posts: 2080 Credit: 1,844,401,288 RAC: 3,256 |
"I'm unsure if having differing GPU models is causing this, but I've got a GTX460 and a GTX570 in a machine. The WUs are attempting to use both GPUs but I'm getting 0% utilization with occasional spikes." By design, Moo uses ALL GPUs in the machine to work on a single workunit. I would suggest you take one of your GPUs out and see if you can crunch a unit; if so, put it back in, take out the other GPU, and see if that works. If both work independently, then it could be as simple as this: have you loaded the Nvidia driver software twice? Does BOINC see both cards? Look in the Messages near the top and see if both cards are found. Two cards that are not identical can be very problematic to get working together! But others have come up with the right tweak, so don't give up yet! |
Joined: 15 May 11 Posts: 3 Credit: 10,445,835 RAC: 0 |
Thanks for the reply. It's good to hear that some are having success. I'll keep fiddling with it. |
Joined: 20 Apr 11 Posts: 388 Credit: 822,356,221 RAC: 0 |
"I'm unsure if having differing GPU models is causing this, but I've got a GTX460 and a GTX570 in a machine. The WUs are attempting to use both GPUs but I'm getting 0% utilization with occasional spikes." Hmm, that spiking doesn't sound good. If everything is optimal, the cards should be constantly loaded, if not at 100% then close to it. (Except that occasionally your faster card will finish all its work before the slower one, so the faster card flatlines until the slower card finishes the last packet.) This might be a driver issue, or it could be due to not having enough CPU power available for the D.Net Client to keep feeding work to your GPUs. It's also possible that the cards are overheating, or that the OS or some other app is interfering to the point of stopping our workload. -w |
Joined: 18 May 11 Posts: 46 Credit: 1,254,302,893 RAC: 0 |
"I'm unsure if having differing GPU models is causing this, but I've got a GTX460 and a GTX570 in a machine. The WUs are attempting to use both GPUs but I'm getting 0% utilization with occasional spikes." My Moo! machines are all ATI and I'm seeing this spiking on all 7 as well. Drivers are 11.9 and 10.12; both run about the same here. It doesn't matter whether there's any other CPU load or not: Afterburner shows the GPU load jumping from around 40% to as high as 99% (sometimes staying at 40-80% for most of the WU). A very few WUs will run at ~99% all the way. Looking at the WU completion times for these boxes compared to similar machines on the stats pages, they seem to be as fast as anything out there, so I assume this is normal. Tried running 4 dual ATI GPU boxes again. One (2 x 5850 cards) ran OK, but now and then a WU would just stop processing until I intervened. Not good. Two others ran miserably slow, and I mean SLOW, so they got pulled quickly. The 4th is running OK so far. Interestingly, it has 2 x 4770 cards and 10.12 drivers. Go figure. The BOINC clients are 6.12.41-43 on everything. All are Win7-64. Edit: Also, the dual GPU machine uses about 40% of a CPU core for Moo! while the single GPU machines all use < 1%. |
Joined: 5 May 11 Posts: 233 Credit: 351,414,150 RAC: 0 |
".....If everything is optimal, the cards should be constantly loaded, if not 100% but close....." That 100% loading stopped as soon as the fragmented units arrived - it's been jumping around on utilisation ever since. The closer the second-to-last number in the file name gets to 12, the higher the utilisation. As crude examples: .... _400_768 will take around 9 min+, with utilisation anywhere in the 65-99% band; .... _12_768 will be pretty well on the money at 8 min, with utilisation around 99%. The truth for most WUs lies in the middle of the two, as the parameters and contents of the WU change. The fragmented units are a real pain ..... I've started suspending individual WUs of less than 100 groups, batching them, then doing a run of the collected ones with different CCC settings. Otherwise it's hard to keep temperatures and maximum utilisation constant: there is too wide a variation, and without batching like this the CCC setting has to be way low to keep temperatures acceptable. I hope Upstream get a grip on this soon and clear out the fragmented ones; it's getting really irritating babysitting them. Regards Zy |
Joined: 18 May 11 Posts: 46 Credit: 1,254,302,893 RAC: 0 |
"That 100% loading stopped as soon as the fragmented units arrived - it's been jumping around on utilisation ever since. The closer the second-to-last number in the file name gets to 12, the higher the utilisation. As crude examples ..." Thanks, Zydor, for the explanation. Looking at the WUs, that's exactly what's happening here. The ones with the second-to-last number closer to 12 are using 97-99% while the others are jumping all over the place. Completion times aren't that bad though, even on the fragmented WUs. Are things running worse with your 4 GPU machine, and is that the 12.1 preview driver? Better for you than 11.9? |
Joined: 5 May 11 Posts: 233 Credit: 351,414,150 RAC: 0 |
I am on 12.1 - I changed when I did the rebuilds of both my machines. That may have complicated things at this end in terms of my perception of events, as the fragmented units started whilst I was away doing the rebuilds. I'll change to 11.9 and see what happens - send out a search party if I am not back in 30 mins rofl :) Regards Zy |
Joined: 18 May 11 Posts: 46 Credit: 1,254,302,893 RAC: 0 |
"I'll change to 11.9 see what happens - send out a search party if I am not back in 30 mins rofl :) Regards Zy" Rescue dogs sent, single malt scotch is in the flask around the neck (Guinness is in the backpack)... |
Joined: 5 May 11 Posts: 233 Credit: 351,414,150 RAC: 0 |
*hic* Ta :) Regards Zy |
Joined: 18 May 11 Posts: 46 Credit: 1,254,302,893 RAC: 0 |
"*hic* Ta :) Zy" After recovering, did you gain any insights on drivers? |
Joined: 5 May 11 Posts: 233 Credit: 351,414,150 RAC: 0 |
More observation than insight, I guess. Both boxes were OK (or seemed to be) after the rebuilds, but then seemed to decline and fall over. I switched my No2 box from 12.1 to 11.9: no change or difference, so I left it there on 11.9. It has a single 5850 in it, and the driver hassles are mainly multi-GPU. With this box, it turns out the main problem was the "non-obvious" system drivers, the background ones. I ran a Driver Updater through it and it picked up a dozen candidates, and I let it go for the lot. Worked a treat. Been stable ever since. My main box (twin 5970) was on 12.1; I switched to 11.9, and if anything it was worse. I also lost the multi-GPU improvements in 12.1 by doing that (fixed CPU usage bug, correct elapsed time reporting on WUs, etc). So I switched back to 12.1 and got it going OK in the end by doing a real clean hard sweep with Driver Cleaner and a Registry Cleaner; it's stable now. So.... was it 12.1 all along? In truth, I don't think so. Main Box turned out to be a fragmented graphics driver that needed cleaning out (again), and No2 box needed some system driver updates. It's easy to point fingers at the driver, saying "not me guv", but in truth this time I think it was my fault. See how it goes, but both boxes have been stable now for hours, so I think all is well again. Regards Zy |
Joined: 18 May 11 Posts: 46 Credit: 1,254,302,893 RAC: 0 |
Well, I'm confused now. On the Moo site your single GPU box shows as 1.4.1607, which I think is 11.11. From http://www.geeks3d.com/20111116/amd-catalyst-11-11-whql/ : "Catalyst 11.11 WHQL is an OpenGL 4.1 and OpenCL 1.1 driver: OpenCL CL_PLATFORM_VERSION: OpenCL 1.1 AMD-APP-SDK-v2.5 (793.1); OpenCL CL_DRIVER_VERSION: CAL 1.4.1607 (VM); OpenGL version: 4.1.11251." All my 11.9 machines are reporting 1.4.1546. |
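
For anyone wanting to do the same cross-check on their own hosts, here is a minimal Python sketch of the lookup described in the last post. It only knows the two CAL-to-Catalyst pairs mentioned in this thread (1.4.1546 for 11.9 and 1.4.1607 for 11.11); the function name, the table name, and the idea of parsing a `CL_DRIVER_VERSION`-style string are my own assumptions, not anything Moo! or BOINC provides.

```python
import re

# Pairs taken from this thread only; any other CAL version is simply unknown here.
CAL_TO_CATALYST = {
    "1.4.1546": "11.9",
    "1.4.1607": "11.11",
}

# Matches the version number in strings such as "CAL 1.4.1607 (VM)".
CAL_VERSION_RE = re.compile(r"CAL\s+(\d+\.\d+\.\d+)")


def catalyst_from_driver_string(driver_string):
    """Return the Catalyst release for a CAL driver string, or None if unknown.

    Hypothetical helper for illustration; not part of any BOINC or AMD tool.
    """
    match = CAL_VERSION_RE.search(driver_string)
    if match is None:
        return None
    return CAL_TO_CATALYST.get(match.group(1))


if __name__ == "__main__":
    for reported in ("CAL 1.4.1607 (VM)", "CAL 1.4.1546"):
        print(reported, "->", catalyst_from_driver_string(reported))
```

Anything not in the table (the 12.1 preview, for instance) comes back as `None` rather than a guess, so the table would need extending by hand as more versions are confirmed.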