Multi-GPU task takes longer than single-GPU task

Author	Message
zombie67 [MM] Joined: 2 May 11 Posts: 47 Credit: 319,540,306 RAC: 0	Message 47 - Posted: 2 May 2011, 21:58:36 UTC Last modified: 2 May 2011, 21:58:43 UTC
I have two machines, one with a single 5870, and one with two 5870s. A task on the single GPU machine takes 500-900 seconds. A task on the dual GPU machine takes 2600-3400 seconds. The dual GPU machine takes 4-5 times longer than a single GPU? It should be less, not more. Looks like the app may be broken.
Single 5870: http://moowrap.net/results.php?hostid=116&offset=0&show_names=0&state=3&appid=
Dual 5870: http://moowrap.net/results.php?hostid=118&offset=0&show_names=0&state=3&appid=
Reno, NV Team SETI.USA

Marty Joined: 3 May 11 Posts: 11 Credit: 1,023,372,603 RAC: 0	Message 66 - Posted: 3 May 2011, 12:41:09 UTC - in response to Message 47.
Question for the project admin: assuming this project is built on DNETC sources, was something changed in the ATI application? On my multi-GPU system (1x 5870 and 1x 5850) it behaves differently than on DNETC. One of the two GPUs only gets back up to 50% GPU usage after it has finished the first segment (block of work). Previously both GPUs were running at over 90% GPU usage the whole time, except at the end, when the faster GPU has to wait for the slower one to finish its last segment.

Sabroe_SMC Joined: 3 May 11 Posts: 5 Credit: 1,001,872,600 RAC: 0	Message 69 - Posted: 3 May 2011, 14:50:43 UTC
When MooWrapper is running on my dual-GPU rig (GTX570 + 480), only one of the GPUs is used. The other one is sleeping, but the app still claims both of them. Not really good. Bye bye

Teemu Mannermaa Project administrator Project developer Project tester Joined: 20 Apr 11 Posts: 389 Credit: 822,556,349 RAC: 0	Message 80 - Posted: 3 May 2011, 23:42:54 UTC - in response to Message 66.
> Assuming this project is built on DNETC sources, was something changed in the ATI application?
I'm using the Distributed.net Client without any changes, just as it is on their download site. I don't have their sources, so I can't build a custom client (there's also no need for such a thing).
> One of the two GPUs only gets back up to 50% GPU usage after it has finished the first segment (block of work). Previously both GPUs were running at over 90% GPU usage the whole time, except at the end, when the faster GPU has to wait for the slower one to finish its last segment.
Interesting, as it should behave like the latter part. That's also how it works on my own system (and on many others too, I would hope). This could be a bug in the dnet client, but there are also some changes I can make in my wrapper code to see if things improve. I believe this is also something dnetc@home battled with in some earlier versions of their apps. Not sure if they ever got it working, since I obviously didn't have that problem. :)
-w

[SETI.USA]Tank_Master Joined: 2 May 11 Posts: 4 Credit: 28,716,459 RAC: 0	Message 81 - Posted: 4 May 2011, 1:53:01 UTC
I am getting the same thing on my nvidia cards. I have a GeForce 295 (dual GPU) and a GeForce 430 in the same machine. Currently, only one of the 295 GPUs is 50% active and the other 2 are completely idle. I too have completed a few WUs on my nvidia card. The single ATI 6970 card in the same computer is working as one would expect. Might I suggest running one WU on each GPU core instead of one WU for all GPUs? This may minimise issues for systems with a mix of varying card types as well.

zombie67 [MM] Joined: 2 May 11 Posts: 47 Credit: 319,540,306 RAC: 0	Message 82 - Posted: 4 May 2011, 2:28:31 UTC - in response to Message 80.
> I believe this is also something dnetc@home battled with some earlier versions of their apps. Not sure if they ever got it working since I obviously didn't have that problem. :)
It is clearly not working on most multi-GPU machines. Even when the GPUs are identical. What is going to be done to address it? Anything?
Reno, NV Team SETI.USA

Maxwell [MM] Joined: 2 May 11 Posts: 5 Credit: 100,847,975 RAC: 0	Message 83 - Posted: 4 May 2011, 2:34:23 UTC
This is a solvable problem - the old DNETC project did it. A 2x5870 machine should complete a WU in ~55% of the time it takes a single 5870 machine. If that's not happening, something is clearly wrong.
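
The ~55% figure can be read through a simple Amdahl-style model, in which a fraction of each work unit does not speed up with extra GPUs. The sketch below is only an illustration under that assumption; the function names are hypothetical, and the 500-900 second range is the single-5870 run time reported in Message 47.

```python
def serial_fraction(relative_time: float, n_gpus: int) -> float:
    """Solve relative_time = s + (1 - s) / n_gpus for the non-parallel share s."""
    return (relative_time - 1.0 / n_gpus) / (1.0 - 1.0 / n_gpus)


def expected_time(single_gpu_seconds: float, n_gpus: int, s: float) -> float:
    """Expected run time on n_gpus identical GPUs under the same model."""
    return single_gpu_seconds * (s + (1.0 - s) / n_gpus)


# Assumption: the ~55% estimate for two identical GPUs (Message 83).
s = serial_fraction(0.55, 2)

# Single-5870 run times reported in Message 47.
for single in (500, 900):
    print(f"single GPU {single} s -> expected dual GPU {expected_time(single, 2, s):.0f} s")
```

Under these assumptions the dual-5870 host would be expected to finish in a few hundred seconds, far below the 2600-3400 seconds actually reported.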

Shadow.SETI.USA [TopGun] Joined: 2 May 11 Posts: 8 Credit: 157,289,433 RAC: 0	Message 84 - Posted: 4 May 2011, 3:33:20 UTC - in response to Message 80. Last modified: 4 May 2011, 3:34:04 UTC
>> Assuming this project is built on DNETC sources, was something changed in the ATI application?
> I'm using the Distributed.net Client without any changes, just as it is on their download site. I don't have their sources, so I can't build a custom client (there's also no need for such a thing).
>> One of the two GPUs only gets back up to 50% GPU usage after it has finished the first segment (block of work). Previously both GPUs were running at over 90% GPU usage the whole time, except at the end, when the faster GPU has to wait for the slower one to finish its last segment.
> Interesting, as it should behave like the latter part. That's also how it works on my own system (and on many others too, I would hope). This could be a bug in the dnet client, but there are also some changes I can make in my wrapper code to see if things improve. I believe this is also something dnetc@home battled with in some earlier versions of their apps. Not sure if they ever got it working, since I obviously didn't have that problem. :) -w
I think the problem with Dnetc was that some work units had an odd number of packets, so one GPU would still be working while the other had nothing to do. Then it took a few work units for the other one to kick back in again. So a WU with 12 packets would run normally with 2 GPUs, yet one with 11 or 13 packets would cause at least one (if not both) GPUs to hang.
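
The odd-packet effect described above can be sketched with a toy scheduler. This is an assumed model, not the actual Distributed.net client logic: each packet goes to whichever GPU becomes free first, packet costs and GPU speeds are made-up relative numbers, and `simulate` is a hypothetical helper. It only covers the idle time at the end of a work unit, not the reported delay before the idle GPU picks up work again.

```python
import heapq


def simulate(packet_costs, gpu_speeds):
    """Assign each packet to the GPU that becomes free first.

    Returns (makespan, idle), where idle[i] is how long GPU i sits
    unused at the end while waiting for the last GPU to finish.
    """
    # Heap of (time the GPU becomes free, GPU index).
    free_at = [(0.0, i) for i in range(len(gpu_speeds))]
    heapq.heapify(free_at)
    finish = [0.0] * len(gpu_speeds)
    for cost in packet_costs:
        t, i = heapq.heappop(free_at)
        t += cost / gpu_speeds[i]
        finish[i] = t
        heapq.heappush(free_at, (t, i))
    makespan = max(finish)
    return makespan, [makespan - f for f in finish]


# Two identical GPUs, equal-size packets (assumed values).
for n_packets in (11, 12, 13):
    makespan, idle = simulate([1.0] * n_packets, [1.0, 1.0])
    print(n_packets, "packets: makespan", makespan, "idle at end", idle)
```

With an even packet count both GPUs finish together; with an odd count one GPU waits for a full packet at the end. That tail is a small share of the run, so on its own it would not account for a 4-5x slowdown.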

Marty Joined: 3 May 11 Posts: 11 Credit: 1,023,372,603 RAC: 0	Message 85 - Posted: 4 May 2011, 6:19:46 UTC - in response to Message 84. Last modified: 4 May 2011, 6:22:36 UTC
> I think the problem with Dnetc was that some work units had an odd number of packets, so one GPU would still be working while the other had nothing to do. Then it took a few work units for the other one to kick back in again. So a WU with 12 packets would run normally with 2 GPUs, yet one with 11 or 13 packets would cause at least one (if not both) GPUs to hang.
The case of the uneven packets explains the idle GPU at the end, or the faster GPU idling with an even number of packets. What I see is one GPU running at 50% through most or all of its packets while the other one runs at 100%. It all might be connected with the high CPU issue described in the thread. I haven't had time to test this by disabling CPU computation and running Moowrap exclusively.
Edit: OK, for this kind of credit I can live with idling GPUs :D
Task 14379 (work unit 12212): sent 3 May 2011 12:13:55 UTC, reported 3 May 2011 16:18:37 UTC, completed and validated, run time 681.87 s, CPU time 62.29 s, credit 8,590.69, Distributed.net Client v1.00 (ati14)
Task 14183 (work unit 12017): sent 3 May 2011 12:13:55 UTC, reported 4 May 2011 6:10:40 UTC, completed and validated, run time 720.32 s, CPU time 5.48 s, credit 9,075.10, Distributed.net Client v1.00 (ati14)
Task 14179 (work unit 12013): sent 3 May 2011 12:13:55 UTC, reported 4 May 2011 6:10:40 UTC, completed and validated, run time 697.41 s, CPU time 19.08 s, credit 8,786.55, Distributed.net Client v1.00 (ati14)

Bryan Joined: 2 May 11 Posts: 15 Credit: 370,678,308 RAC: 0	Message 98 - Posted: 4 May 2011, 18:01:25 UTC
As someone mentioned, we would DEARLY love to have the project crunch on individual GPUs rather than use all of them in the machine. It is more efficient for the computers, and more work would get done for the project. DNETC blew us off and refused to address this. It is frustrating to see a GPU sit idle because there is an odd number of packets to process.

Teemu Mannermaa Project administrator Project developer Project tester Joined: 20 Apr 11 Posts: 389 Credit: 822,556,349 RAC: 0	Message 99 - Posted: 4 May 2011, 20:25:09 UTC - in response to Message 98.
> As someone mentioned, we would DEARLY love to have the project crunch on individual GPUs rather than use all of them in the machine. It is more efficient for the computers, and more work would get done for the project.
Hi, I understand the request, but unfortunately this is not up to me. The Distributed.net Client doesn't allow selecting which GPU it uses; it detects and uses all of them. Until they support such a switch, there's not much I can do. Sorry. :(
-w

Bryan Joined: 2 May 11 Posts: 15 Credit: 370,678,308 RAC: 0	Message 108 - Posted: 5 May 2011, 6:04:45 UTC - in response to Message 99.
>> As someone mentioned, we would DEARLY love to have the project crunch on individual GPUs rather than use all of them in the machine. It is more efficient for the computers, and more work would get done for the project.
> Hi, I understand the request, but unfortunately this is not up to me. The Distributed.net Client doesn't allow selecting which GPU it uses; it detects and uses all of them. Until they support such a switch, there's not much I can do. Sorry. :( -w
Thank you for an honest reply :) DNETC basically blew us off and never addressed the request/problem.

Shadow.SETI.USA [TopGun] Joined: 2 May 11 Posts: 8 Credit: 157,289,433 RAC: 0	Message 112 - Posted: 5 May 2011, 13:41:28 UTC - in response to Message 99.
>> As someone mentioned, we would DEARLY love to have the project crunch on individual GPUs rather than use all of them in the machine. It is more efficient for the computers, and more work would get done for the project.
> Hi, I understand the request, but unfortunately this is not up to me. The Distributed.net Client doesn't allow selecting which GPU it uses; it detects and uses all of them. Until they support such a switch, there's not much I can do. Sorry. :( -w
Is there a way to change the number of packets in each work unit? If they all had an even number of packets, it would at least eliminate one of the GPUs idling in a multi-GPU setup.

frankhagen Joined: 2 May 11 Posts: 27 Credit: 1,151,788 RAC: 0	Message 113 - Posted: 5 May 2011, 13:57:11 UTC - in response to Message 112.
> Is there a way to change the number of packets in each work unit? If they all had an even number of packets, it would at least eliminate one of the GPUs idling in a multi-GPU setup.
at least some of them - but there are also packets with fewer than 64 units. of course it would be ideal to have an even number of packets, each containing the same number of units.

Teemu Mannermaa Project administrator Project developer Project tester Joined: 20 Apr 11 Posts: 389 Credit: 822,556,349 RAC: 0	Message 118 - Posted: 5 May 2011, 17:38:37 UTC - in response to Message 112.
Due to the way I'm generating the work, it would be hard to make exact packages all the time. There will be some variance in the amount of work. There are some things I could change, though, and one solution might be to eventually allocate work to hosts based on the number of packets in a WU and the number of devices (which can be even or odd) on a host. I will keep these problems in mind, but I think the best thing at the moment is to make the bulk of the run work better for more people. After that we can think about optimizing the relatively short end of the run. :)
-w
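
The allocation idea mentioned above (matching the packet count of a WU to the number of devices on a host) could look roughly like the sketch below. It is a hypothetical illustration, not the project's scheduler: `pick_work_unit` is an invented name, and it assumes equal-size packets and identical devices, which Message 113 notes is not always the case.

```python
def pick_work_unit(packet_counts, n_devices):
    """Return the index of the work unit that leaves the fewest devices
    idle in its final round of packets (hypothetical helper)."""

    def idle_slots(packets):
        # Devices left without a packet in the last round.
        remainder = packets % n_devices
        return 0 if remainder == 0 else n_devices - remainder

    # Ties are broken by the lowest index, i.e. the oldest queued WU.
    return min(range(len(packet_counts)),
               key=lambda i: (idle_slots(packet_counts[i]), i))


queued = [11, 12, 13]  # assumed packet counts of queued work units
print(pick_work_unit(queued, 2))  # host with 2 devices
print(pick_work_unit(queued, 3))  # host with 3 devices
```

Both example hosts would be handed the 12-packet WU here, since 12 divides evenly across 2 or 3 devices.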

kashi Joined: 5 May 11 Posts: 7 Credit: 13,680,807 RAC: 0	Message 125 - Posted: 6 May 2011, 0:14:38 UTC
Tasks take one hour on my HD 5970 @ 770/600. Use 2 full CPU cores. Stderr output shows application is restarting every 5 minutes. With no CPU projects running GPU usage is 99% but temperature of core and VRMs is low and power draw is also low. The GPU engines are revving at full speed but have almost no torque.
Xeon W3520 @ 3.1 GHz, Win 7 64, BOINC 6.12.22. Cat 11.4 Preview (1.4.1353).

3327 Joined: 5 May 11 Posts: 3 Credit: 585,243 RAC: 0	Message 148 - Posted: 7 May 2011, 10:41:57 UTC - in response to Message 125.
> Tasks take one hour on my HD 5970 @ 770/600. Use 2 full CPU cores. Stderr output shows application is restarting every 5 minutes. With no CPU projects running GPU usage is 99% but temperature of core and VRMs is low and power draw is also low. The GPU engines are revving at full speed but have almost no torque. Xeon W3520 @ 3.1 GHz, Win 7 64, BOINC 6.12.22. Cat 11.4 Preview (1.4.1353).
I made my first attempt with moowrap last night on 2x5870s, hesitant because of the infamous hanging cpu problem some of us experienced with dnetc. every WU I tried (4 of them) immediately jumped to 14.814% with 25% cpu (2 threads, or 1 core, of an i7). the first errored after I suspended and resumed, and I aborted the others at ~15 min, still at 14.814%, given that I see numerous machines with 2x5870s completing WUs in ~600 secs.
kashi, I have the same indications: low core and vrm temps, yet both gpus (950/1350) supposedly at 100%.
I hope I am wrong, but there is no reason to believe this is not the same mess that plagued some of us trying to crunch dnetc. the only differences as I see them are that I have two threads (an entire core of an i7) at 100%, and that I have yet to get a WU that crunched or that I could tolerate until the end. the dnetc problem was sporadic and the WUs took at most 13 min, though most of the hanging with the cpu occurred after the gpus had crunched the WU. the dnetc crew said there was nothing they could do about it, since it is just a wrapper, and that distributed.net was not very helpful because so few had the issue, or at least not enough complained about it.
if you or anyone has ideas regarding the mysterious aspect of some dual 5870 machines that triggers the issue, please pass them along.
win7 64b, driver 11.4, 2x5870s (oc'd, but I eliminated that as the issue at dnetc), boinc 10.58, i7 950 (not oc'd)

[AF>Amis des Lapins] Nabz37 Joined: 7 May 11 Posts: 1 Credit: 864,812,102 RAC: 0	Message 149 - Posted: 7 May 2011, 13:41:30 UTC
Hi, I just tested the project on my two ATI tri-GPU rigs (3x5870 and 5970+5870), both under Win7-64 with the 11-3 driver. I got no computation errors, but completion times are quite awful, between 5500 and 5900 s. What's wrong?

zombie67 [MM] Joined: 2 May 11 Posts: 47 Credit: 319,540,306 RAC: 0	Message 150 - Posted: 7 May 2011, 13:54:12 UTC
Yep. All tasks on my dual 5870 machine take ~1-2 hours. On my single 5870, they take ~15 minutes.
Reno, NV Team SETI.USA

frankhagen Joined: 2 May 11 Posts: 27 Credit: 1,151,788 RAC: 0	Message 151 - Posted: 7 May 2011, 14:43:39 UTC - in response to Message 150.
> Yep. All tasks on my dual 5870 machine take ~1-2 hours. On my single 5870, they take ~15 minutes.
really strange! but DA's bright new credit-system has decided to compensate for that and is granting a freaky load of credits for those WU's now..