2 WU on cuda cards?
Author | Message |
---|---|
Send message Joined: 14 Aug 11 Posts: 7 Credit: 20,109,913 RAC: 0 |
Is there a way to run 2 WUs at once on a CUDA card? I modified dnetc-1.00.ini (added max-threads=2), but dnetc518-win32-x86-cuda31.exe crashed after a few minutes. Is it possible to make an app_info.xml with <coproc> <type>CUDA</type> <count>0.50</count> </coproc>? Thanks in advance. |
Send message Joined: 22 Jun 11 Posts: 2080 Credit: 1,844,407,912 RAC: 3,717 |
"Is there a way to run 2 wu at once on cuda card?" No, Moo and Dnetc are designed to use all the GPU power on one unit. |
Send message Joined: 14 Aug 11 Posts: 7 Credit: 20,109,913 RAC: 0 |
Anyway, I would like to see a working app_info, if only for testing. |
Send message Joined: 14 Aug 11 Posts: 7 Credit: 20,109,913 RAC: 0 |
Does anybody know how to make an app_info for Moo? |
Send message Joined: 20 Apr 11 Posts: 388 Credit: 822,356,221 RAC: 0 |
Hi, Sure, I can make you one. Is there a particular reason you need one? You can limit Moo to use only the first card found if you want to run something else on the other one, but running two at the same time, like mikey said, is unfortunately not possible. -w |
Send message Joined: 14 Aug 11 Posts: 7 Credit: 20,109,913 RAC: 0 |
Why is it not possible? Is it a hardware or software limitation? I have compared a GTX 465 and a GTX 470. They have the same GPU, memory, and shader speeds, but the GTX 470 has 25% more ROPs and 27% more shaders. The time to crunch 1 WU is not 25% shorter on the GTX 470. If Moo uses all the GPU power, why can't I get a 25% gain? I am interested in trying to crunch 2 WUs at the same time to see whether there is any speed improvement. Sorry for my English, it is not my native language. Best regards. |
Send message Joined: 20 Apr 11 Posts: 388 Credit: 822,356,221 RAC: 0 |
"Why is not possible? Hardware or software limitation?" We use the D.net Client, which detects cards by itself and uses all it finds by default. It has a software limitation: it can't be told to use only a specific card. There's a workaround that allows limiting the number of cards it uses (only in the order it sees them). -w |
Send message Joined: 20 Apr 11 Posts: 388 Credit: 822,356,221 RAC: 0 |
Hi, There's an example 32-bit Windows ATI cruncher app_info.xml available at http://moowrap.net/download/app_info-win32-ati14-example.xml. It might be usable as is except ATI card Also -w |
Send message Joined: 14 Aug 11 Posts: 7 Credit: 20,109,913 RAC: 0 |
Thanks for trying, but it does not work. I get an error after 17 s of crunching. It is also missing cudart32_31_9.dll. I tried adding <file_info> <name>cudart32_31_9.dll</name> </file_info>. Even so, dnetc518-win32-x86-cuda31.exe does not start, only dnetc_1.02_windows_intelx86__cuda31.exe. Here is the app_info: <app_info> <app> <name>dnetc</name> <user_friendly_name>Distributed.net Client</user_friendly_name> </app> <file_info> <name>dnetc_1.02_windows_intelx86__cuda31.exe</name> <executable/> </file_info> <file_info> <name>dnetc518-win32-x86-cuda31.exe</name> <executable/> </file_info> <file_info> <name>cudart32_31_9.dll</name> </file_info> <file_info> <name>dnetc-1.00.ini</name> </file_info> <file_info> <name>job-cuda31-1.00.xml</name> </file_info> <app_version> <app_name>dnetc</app_name> <version_num>102</version_num> <platform>windows_intelx86</platform> <avg_ncpus>0.050000</avg_ncpus> <max_ncpus>0.895864</max_ncpus> <plan_class>cuda31</plan_class> <api_version>6.13.0</api_version> <file_ref> <file_name>dnetc_1.02_windows_intelx86__cuda31.exe</file_name> <main_program/> </file_ref> <file_ref> <file_name>dnetc518-win32-x86-cuda31.exe</file_name> <copy_file/> </file_ref> <file_ref> <file_name>dnetc-1.00.ini</file_name> <open_name>dnetc.ini</open_name> <copy_file/> </file_ref> <file_ref> <file_name>job-cuda31-1.00.xml</file_name> <open_name>job.xml</open_name> <copy_file/> </file_ref> <coproc> <type>CUDA</type> <count>1.000000</count> </coproc> <gpu_ram>262144000.000000</gpu_ram> </app_version> </app_info> |
Send message Joined: 20 Apr 11 Posts: 388 Credit: 822,356,221 RAC: 0 |
Hi, Oops, for some reason I thought you wanted one for ATI but clearly you talked about CUDA cards there. Sorry about that, but looks like you got it working after all. :) We use 0.20 for I did have one example for CUDAs up at http://moowrap.net/download/app_info-example.xml, which has the old -w |
Send message Joined: 14 Aug 11 Posts: 7 Credit: 20,109,913 RAC: 0 |
Thanks a lot. It works after a little modification. 4 short WUs at once (screenshot not preserved). |
Send message Joined: 30 Sep 11 Posts: 3 Credit: 43,358,783 RAC: 0 |
Hi :-). I'd like to run 2 WUs on my ATI card. To do that, I tried this app_info for my ATI card and it worked fine. After that, I tried to change the <coproc> count from 1 to 0.5 like this: <coproc> <type>ATI</type> <count>0.500000</count> </coproc> but it does not work :-(. I receive an "HTTP Error" from the server when I do an update. Can you please help me? Bye. |
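The edit described in the post above (changing the `<count>` inside `<coproc>`) can be sketched as a small script. This is only an illustration under assumptions: the `APP_INFO` string is a trimmed-down stand-in rather than a complete app_info.xml, and `set_coproc_count` is a hypothetical helper name. It changes the file only; as reported above, the project server answered with an "HTTP Error" for a fractional ATI count, so a valid file does not mean the scheduler accepts it.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for an app_info.xml (assumption: a real file also
# needs the <app>, <file_info> and <file_ref> entries shown in the thread).
APP_INFO = """<app_info>
  <app_version>
    <app_name>dnetc</app_name>
    <coproc>
      <type>ATI</type>
      <count>1.000000</count>
    </coproc>
  </app_version>
</app_info>"""


def set_coproc_count(xml_text: str, gpu_type: str, count: float) -> str:
    """Return xml_text with <count> rewritten in every <coproc> of gpu_type."""
    root = ET.fromstring(xml_text)
    for coproc in root.iter("coproc"):
        count_node = coproc.find("count")
        if coproc.findtext("type") == gpu_type and count_node is not None:
            # Same six-decimal style as the files posted in the thread.
            count_node.text = f"{count:.6f}"
    return ET.tostring(root, encoding="unicode")


updated = set_coproc_count(APP_INFO, "ATI", 0.5)
print(updated)
```

A count of 0.5 asks the BOINC client to schedule two tasks per GPU; whether those two tasks then run any faster is a separate question discussed in the replies.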
Send message Joined: 20 Apr 11 Posts: 388 Credit: 822,356,221 RAC: 0 |
Hi, I don't think the scheduler expects people to try to use half an ATI card, and it might actually crash. I'll have to take a look so that it at least won't crash. Why are you trying to run two tasks on one card? I highly doubt it'll be twice as fast, since both of them will use the same card (at full speed), and the D.Net Client might even get so tangled that it can't finish correctly. -w |
Send message Joined: 30 Sep 11 Posts: 3 Credit: 43,358,783 RAC: 0 |
Hi :-). I'm trying to run two WUs on a single ATI card because the card's load often swings between 88% and 98%. With two WUs running, the card would always be at 100%. I use this setup on other projects and the performance improvement is about 5% - 10%, so I would be happy if I could use it for Moo :-). Thank you. |
Send message Joined: 1 Jan 12 Posts: 13 Credit: 21,324,276 RAC: 0 |
I want in on that also. Moo works fine, but POEM uses 25-35% of each GPU. I should be able to double, or maybe triple, up and see huge gains. |
Send message Joined: 22 Jun 11 Posts: 2080 Credit: 1,844,407,912 RAC: 3,717 |
"I want in on that also, Moo works fine, but poem uses 25-35% of each gpu. I should be able to double, or maybe triple up and see huge gains." As Teemu said, it sounds good but doesn't really work in practice. The problem is the way GPU memory works, how much of it is on each card, and how the project app uses that GPU memory. Unfortunately, GPU memory is not suited for 'side by side' calculations, meaning one thing at a time, i.e. NO multitasking! So whether you run one unit or 10 units at once, each unit beyond the first must wait for the memory to be released and is therefore not faster. This releasing and capturing, then releasing and capturing again, by multiple units is slower than just one unit running through. Moo uses ALL the GPUs in your system to do its thing; it is just designed that way. For example, on an AMD 5770 it usually takes about an hour to finish one workunit; putting two 5770s in one box cuts that time in half to about 30 minutes, which is about as long as an AMD 5870 takes! You are talking about going the OTHER WAY here! |
Send message Joined: 5 May 11 Posts: 233 Credit: 351,414,150 RAC: 0 |
"... I should be able to double, or maybe triple up and see huge gains ..." Not happening. A GPU is only designed to execute one program at a time; it cannot multitask like a multi-threaded CPU. That will change with AMD's new architecture, which will give that ability; however, on anything less than 7XXX, it will not. What happens is this: it will load 2 or 3 WUs, whatever you give it (it will choke on four), and then divide time equally between the loaded WUs. The net result is no gain: for (say) two WUs, each takes twice as long to crunch, so the time per WU works out the same. With very small WUs (such as Milkyway), you can get a gain of about 2-3% by loading two per GPU, as you save time in the load/unload/windup of each WU. It can be a hassle as well, so it's horses for courses as to whether or not it's worth it even at MW. But here, not a chance. Don't waste time trying; it will not work the way you anticipate. Regards Zy |
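The time-slicing argument in the post above can be checked with a little arithmetic. This is a rough sketch under stated assumptions, not a measurement: the function names and the example timings (180 s and 3600 s of crunching, 5 s of load/unload) are made up for illustration, and the model simply assumes that with two or more WUs loaded, one WU's load/unload is fully hidden behind another WU's crunching.

```python
def elapsed_per_wu(n_tasks: int, crunch_s: float) -> float:
    """Wall-clock seconds per WU when n_tasks take equal turns on one GPU."""
    return n_tasks * crunch_s


def wus_per_hour(n_tasks: int, crunch_s: float, overhead_s: float) -> float:
    """Completed WUs per hour on one GPU under the simple time-slicing model."""
    if n_tasks == 1:
        # With a single WU the GPU sits idle during load/unload.
        return 3600.0 / (crunch_s + overhead_s)
    # Assumption: with several WUs loaded, the GPU never idles.
    return 3600.0 / crunch_s


def gain_from_two(crunch_s: float, overhead_s: float) -> float:
    """Fractional throughput gain of running 2 WUs instead of 1."""
    one = wus_per_hour(1, crunch_s, overhead_s)
    two = wus_per_hour(2, crunch_s, overhead_s)
    return two / one - 1.0


# Hypothetical timings: a short WU and a long WU, both with 5 s overhead.
print(f"short WU gain: {gain_from_two(180.0, 5.0):.2%}")
print(f"long WU gain:  {gain_from_two(3600.0, 5.0):.2%}")
print(f"elapsed per WU with 2 tasks: {elapsed_per_wu(2, 3600.0):.0f} s")
```

In this model the only gain comes from hiding the per-WU overhead, so it shrinks as WUs get longer, which is consistent with the small gain reported for very short WUs and none expected here.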
Send message Joined: 30 Sep 11 Posts: 3 Credit: 43,358,783 RAC: 0 |
"... I should be able to double, or maybe triple up and see huge gains ..." Hi. I use this setup on another project (two WUs running on an Nvidia card) and I get a good increase (I recently bought an AMD card and I would like to try it on Moo). Did you mean that AMD works differently, or were you talking about all GPUs in general? Bye and thank you :-) |
Send message Joined: 5 May 11 Posts: 233 Credit: 351,414,150 RAC: 0 |
GPUs can only run one thread; that's all they are physically designed for, so they can only do one task at a time. If two are present, the GPU shares time between both WUs; it physically can't do anything else. I used to run NVidia in the 9800GTX+ days, going back to when they first started, and it didn't work then either. I moved to AMD after the final Fermi farce. It may be that something in the current NVidia architecture gives some room, but it will not be crunching; it can only crunch one at a time. The other major factor is that, as pointed out above, Moo is a multi-threaded GPU app designed to use up all GPU space as it becomes available. There is an issue with part of that at present, as Upstream are feeding us fragmented WUs, and the usual allocation tailored to card type is hard to impossible until "clean" WUs start again. That's why Teemu gave an extra 20% as a temporary measure whilst we battled through the fragged ones. That special case may give some temporary space for gain; you won't know until you try, I guess, but you will need to use a special app_info to prevent the Moo WU from grabbing all GPUs as it's designed to do. Bear in mind, if this is for NVidia cards, that NVidia works sloooow here. It's not the best of ideas to run them at Moo unless there is a particular personal reason. Regards Zy |
Send message Joined: 1 Jan 12 Posts: 13 Credit: 21,324,276 RAC: 0 |
I don't intend to double up on anything that uses 90%+ of the GPU. I was referring to POEM@Home, which is running at between 25% and 35% of each GPU. I am looking more for the low-hanging fruit here than to squeeze every last bit out of the GPUs. I have 2 ATI 6990s, and my eyes are rolling back in my head trying to find out how to do this (if it doesn't work, so be it). Can anyone point me in the right direction? |