App_info..xml
Author | Message |
---|---|
Send message Joined: 2 Oct 11 Posts: 238 Credit: 386,587,798 RAC: 11,603 |
Can anyone post a simple App_info.xml file that will allow an ATI card to run two workunits simultaneously? I just used one at Milkyway with a 12% increase in throughput :-)) Very many thanks. I iz also got icons! |
Send message Joined: 27 Jul 11 Posts: 342 Credit: 252,653,488 RAC: 0 |
Not sure of my ground here, but Moo!, like DNETC, only allows 1 WU to crunch at a time using all available GPUs simultaneously. Is that correct Zydor? |
Send message Joined: 2 Oct 11 Posts: 238 Credit: 386,587,798 RAC: 11,603 |
Hi John, I'm fairly certain you are correct but I thought I'd ask and get it confirmed one way or the other anyway. |
Send message Joined: 5 May 11 Posts: 233 Credit: 351,414,150 RAC: 0 |
"Not sure of my ground here, but Moo!, like DNETC, only allows 1 WU to crunch at a time using all available GPUs simultaneously." Yup, that's by design, as the core DNETC app is multi-threaded. There is not much point trying to negate the benefits of multi-threading by doing it yourself with multiple apps; it ends up a Heath Robinson affair for little gain. Sometimes forcing (say) one app per GPU will work and you get an increase .... well, fine ... but in the long run, with multi-threaded apps, leave it to the application. It is much better at it than you or I :) Regards Zy |
Send message Joined: 25 Oct 11 Posts: 5 Credit: 5,329,105 RAC: 0 |
This may not be the right place to ask this, but I seem to be having a problem with tasks not running. I have 4 GTX295s and have run them with no problem before, but now the tasks will not start, and I get this message in BM: "1/15/2012 11:49:11 AM | Moo! Wrapper | Server can't open database". I am running POEM at the same time and it uses 5 of my 6 CPU cores. I usually try to hold back 1 of my 295 cores for general computing to keep the video from lagging, but when I run Moo! I let it use all of them. I am using BOINC version 7.0.3. What you do today you will have to live with tonight |
Send message Joined: 25 Oct 11 Posts: 5 Credit: 5,329,105 RAC: 0 |
This may not be the right place to ask this, but I am still pulling new tasks for the GPU and none of them are starting. I have suspended all tasks until this can be worked out. What you do today you will have to live with tonight |
Send message Joined: 5 May 11 Posts: 233 Credit: 351,414,150 RAC: 0 |
Set your cache to zero, then do a project detach/reattach (not a reset, it must be detach/reattach), which will download a fresh set of project files. See how the first ones run; if they are OK, set your cache back to your normal value and you should be fine. Downloaded WUs seem to be running fine, and the server is up and solid. There can be database issues on the server, but more likely something has gone wrong with your project files, and the detach/reattach should resolve that. If it doesn't, post again. Regards Zy |
Send message Joined: 1 Jan 12 Posts: 13 Credit: 21,324,276 RAC: 0 |
App_info: The only way I could get Moo! to run on BOINC 7.x was an app_info file. I have 2 6990s (a total of 4 GPUs). The problem I am running into is that if I set the "<count>1.0</count>" to anything larger than 1, the units won't start, so I am stuck with 4 running at a time. On the plus side, the GPUs are bouncing between 97 and 99% the ENTIRE time, with no drop-offs. If I pause all the moo units, and let what is running finish, I can run one at a time, with many large drops below 70%, so I can see where running 2 at a time would help. If you are going to run more than one, stagger their start times by 5 minutes or so; then one running unit will cover the low GPU utilization at the end of another unit, especially if you have more than one GPU. I would use (total # of GPUs) / (<count> value) = running tasks. (Use the part between the rows of asterisks, but not the asterisks themselves.)
************
<app_info>
  <app>
    <name>dnetc</name>
    <user_friendly_name>Distributed.net Client</user_friendly_name>
  </app>
  <file_info>
    <name>dnetc_1.02_windows_intelx86__ati14.exe</name>
    <executable/>
  </file_info>
  <file_info>
    <name>dnetc518-win32-x86-stream.exe</name>
    <executable/>
  </file_info>
  <file_info>
    <name>dnetc-1.00.ini</name>
  </file_info>
  <file_info>
    <name>job-ati14-1.00.xml</name>
  </file_info>
  <app_version>
    <app_name>dnetc</app_name>
    <version_num>102</version_num>
    <platform>windows_intelx86</platform>
    <avg_ncpus>0.250</avg_ncpus>
    <max_ncpus>1.0</max_ncpus>
    <plan_class>ati14</plan_class>
    <flops>1157115231469.729200</flops>
    <api_version>6.13.0</api_version>
    <file_ref>
      <file_name>dnetc_1.02_windows_intelx86__ati14.exe</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>dnetc518-win32-x86-stream.exe</file_name>
      <copy_file/>
    </file_ref>
    <file_ref>
      <file_name>dnetc-1.00.ini</file_name>
      <open_name>dnetc.ini</open_name>
      <copy_file/>
    </file_ref>
    <file_ref>
      <file_name>job-ati14-1.00.xml</file_name>
      <open_name>job.xml</open_name>
      <copy_file/>
    </file_ref>
    <coproc>
      <type>ATI</type>
      <count>1.0</count>
    </coproc>
  </app_version>
</app_info>
****************** |
Send message Joined: 5 May 11 Posts: 233 Credit: 351,414,150 RAC: 0 |
Interesting result :) I suspect the improvement is coming from the last stage of a WU, where it sits at around 99.7% complete and then takes time to collate the result; on mine that takes 2 mins or so. "If I pause all the moo units, and let what is running finish, I can run one at a time, with many large drops below 70%, so I can see where running 2 at a time would help." The large drops below 70% are a result of the Stat Unit fragmentation; the standard Moo WU with 12 Stat Unit Groups runs at around 98-99%. That changed when the fragmented units started coming through. It's still the same multi-threaded, multi-GPU application, but it struggles with fragmented files. In the normal course of events an app_info has very minor effect, as the Moo WU has very high utilisation. It's looking like an app_info is taking up the slack caused by fragmentation. Your result of 97-99% is back to what the Moo app always provided on multiple GPUs. So step slowly when putting the next bit into play .... If you want more than one WU per GPU, change the lines .... <type>ATI</type> <count>1.0</count> to read ... <type>ATI</type> <count>0.5</count> and that should start up 2 per GPU. Watch your heat though; Moo is normally above average in heat output, albeit at acceptable levels. It's looking like the app_info with one per GPU is getting round the fragmentation effect and returning to the normal behaviour of 98-99% utilisation, so it's *possible* that two per GPU is going to make the GPU work really hard for little gain and much increased heat. We will only know by trying two per GPU, but watch temps, I would hate to see 6990s burn out ... I'll run down my cache and give it a whirl on one per GPU to see how it goes. I have only got a 2.5hr cache set, as I was running it down getting ready for the new Moo app, so I should get a result by midday at the latest with one per GPU. Regards Zy |
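For reference, the change described in this post can be sketched as the `<coproc>` block from the app_info.xml posted earlier in the thread, with only the `<count>` value altered. This is a fragment rather than a complete file: it assumes the rest of the `<app_version>` section stays exactly as posted, and the thread does not confirm that Moo! actually starts two tasks per GPU with it.

```xml
<!-- Fragment of <app_version> in app_info.xml; all other elements unchanged. -->
<coproc>
  <type>ATI</type>
  <!-- Each task claims half a GPU, so BOINC may schedule 2 tasks per GPU. -->
  <count>0.5</count>
</coproc>
```

BOINC reads app_info.xml only at startup, so an edit like this would take effect after the client is restarted.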
Send message Joined: 22 Jun 11 Posts: 2080 Credit: 1,844,407,912 RAC: 3,236 |
"If you want more than one WU per GPU change the lines ...." I think this tells the GPU to use only 1/2 of the GPU memory for each unit, not the way you were thinking, Wizzo, of putting a 2.0 in there and running more than one unit. Zy, can you put even smaller numbers in there, such as 0.25 or even 0.125? The Moo on one of my 5870s is taking about 24.5K for the two files to run while crunching. I have 1GB cards, and I even have a set of two 5770 1GB cards in one machine; it is taking about 33K or so for one process, times 2, and then 1K for another process. That is LOTS of memory left over!! |
Send message Joined: 5 May 11 Posts: 233 Credit: 351,414,150 RAC: 0 |
Usually that switch in the app_info controls how many tasks run on each GPU. Normally .... it's used like this .... 0.5 would equal 2 per GPU, 0.33 would equal 3 per GPU, 0.25 would equal 4 per GPU. Do the maths, you'll get the idea. There is usually no point whatsoever going beyond 2 ...... in extreme circumstances maybe three, but that is very rare, and above three is just silly. Usually 2 does the job. I also found it would not load two for some reason; it threw an HTTP error back at me. Maybe coincidence, don't know. Anyway it's running with one per GPU at present, about halfway through, so far so good. I'll post when done. Regards Zy |
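The arithmetic behind those values is just a reciprocal. As a rough sketch, writing $c$ for the `<count>` value (a symbol introduced here only for illustration):

```latex
\[
\text{tasks per GPU} = \frac{1}{c}:
\qquad \frac{1}{0.5} = 2, \quad \frac{1}{0.33} \approx 3, \quad \frac{1}{0.25} = 4 .
\]
```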
Send message Joined: 5 May 11 Posts: 233 Credit: 351,414,150 RAC: 0 |
..... and the result of the Hampshire Jury is ..... a rematch :) Total recorded CPU time was 8856, which equates to 2214 for the batch of four; that makes it 36.9 mins, or an average of 9.22 mins per WU. That's pretty well what I am doing them at now with fragmented WUs. Usually I am at between 8 mins 45 and 9 mins 20 fragmented (7 mins 45 secs unfragmented with 12 Stat Unit Groups), with mega-fragmented WUs and lots of stats units going to 10 mins plus. So the figure to hang the hat on was an average of 9.22 mins, little change from now. All had slightly varying stats units and differing fragmentation levels, so averaging is as good a way as any. I'm doing the next batch separated by 5 mins per WU. That will take time to settle down, so I probably will not know the results for about an hour, realistically. So far it's no great increase, but let's see what the staggered start does. Meanwhile, anyone wanting to try it will not lose anything by doing so. Keep your eye on temperatures; mine behaved ... but you never know, I have not run many like this so far. Once the app_info is made it has to go inside the Moo project directory. Close down BOINC Manager completely, stopping running apps, then reopen it and it should run. Make sure your cache is run down first, as an app_info will, at startup, clear out all existing pre-app_info WUs without fear or favour and mark them as communication errors - it's a bit possessive rofl :) Back in an hour or so :) Regards Zy |
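The timing figures in this post can be checked as follows. This is only a sketch of the arithmetic, and it assumes the recorded CPU time is in seconds and that the four WUs ran concurrently, neither of which the post states outright:

```latex
\[
\frac{8856\ \text{s}}{4} = 2214\ \text{s} = \frac{2214}{60}\ \text{min} = 36.9\ \text{min},
\qquad
\frac{36.9\ \text{min}}{4} \approx 9.22\ \text{min per WU}.
\]
```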
Send message Joined: 5 May 11 Posts: 233 Credit: 351,414,150 RAC: 0 |
It's looking like a runner ..... it's very hard to quantify as there are so many factors. However, I reckon it's about a 10% gain as an overall average. I'm still doubtful in some respects, but it does look OK .... so it's worth trying for longer. I'll give it an extended go for another 2 or 3 hours and see how it goes. For those running twin 5970s, watch the fourth GPU like a hawk. It is running about 4 degrees or so above norm; in the end I downclocked the cards' GPUs by 10MHz to allow for it. It's fine, but be wary of the fourth GPU as always; the heat design bug is still lurking as an issue to bear in mind. Don't let that put you off, just keep an eye open for the first hour or so until you are happy it has settled down. Regards Zy |
Send message Joined: 5 May 11 Posts: 233 Credit: 351,414,150 RAC: 0 |
Suspending the test of this until I see what 1.03 can do. There is no point going ahead if 1.03 solves the ills, although the change log doesn't seem to address these issues (not surprisingly). Regards Zy |
Send message Joined: 2 Oct 11 Posts: 238 Credit: 386,587,798 RAC: 11,603 |
Thanks for the updates there Zy, it's all looking quite interesting .... I iz also got icons! |
Send message Joined: 1 Jan 12 Posts: 13 Credit: 21,324,276 RAC: 0 |
I wanted to increase the "<count>1.0</count>" to 2.0 because at 1 with 4 GPUs I am running 4 units at once; if it worked at 2 with 4 GPUs, I should be running 2 units. On POEM I run .2 to have 5 at a time per GPU, and only use one GPU per card; the other 2 GPUs run either Collatz or Milkyway. It seems the count is the number of GPUs each task uses (as far as BOINC knows), so telling BOINC that Moo uses 2 GPUs should, in a 4 GPU system, start 2 units. |
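The reasoning in this post can be written out as follows, taking $N$ as the number of GPUs BOINC sees and $c$ as the `<count>` value (both symbols are introduced here only for illustration). This is just the scheduling arithmetic as the poster describes it; earlier in the thread the same poster reports that Moo! tasks did not start with a count above 1.

```latex
\[
\text{running tasks} = \frac{N}{c}:
\qquad \frac{4}{1.0} = 4, \quad \frac{4}{2.0} = 2,
\qquad \text{and on POEM, per GPU: } \frac{1}{0.2} = 5 .
\]
```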
Send message Joined: 25 Oct 11 Posts: 5 Credit: 5,329,105 RAC: 0 |
"Set your cache to zero, then do a project detach/reattach (not reset - must be detach/reattach) which will download a fresh set of Project files. See how the first ones run, if ok, set your cache to your normal value, and you should be fine." Zydor, thanks, I tried that. When I reattached I got both CPU and GPU tasks. The CPU tasks are running fine but the GPU tasks will not start. My GPUs have been idle for over 10 hours now. I was wondering whether an app_info would work for my nVidia cards if I tried one? What you do today you will have to live with tonight |
Send message Joined: 5 May 11 Posts: 233 Credit: 351,414,150 RAC: 0 |
"I was wanting to increase the <count>1.0</count> to 2.0 because at 1 @ 4gpu's I am running 4 units at once, If it would work at 2 @ 4gpu's I should be running 2 units. On Poem I run .2 to have 5 at a time per gpu, and only use one gpu per card, the other 2 gpu's either run collatz or milkyway. It seems the count is the number of gpu's each task uses (as far as boinc knows), so telling boinc moo uses 2 gpus - in a 4 gpu system it should start 2 units." The Moo app is designed to be multi-threaded and will use all the GPUs it finds; it is designed to do that. If you want to try one or more Moo WUs per GPU, then the app_info switch is the mechanism, as outlined above. It is not designed to mix and match WUs from different projects at the same time. If it does so, great, all is well; if it doesn't, that's that, it wasn't designed to do it in the first place. Any other usage, perception or wish list will have unpredictable effects, as it's not designed for it. You pays your money and you takes your choice, as they say :) Regards Zy |
Send message Joined: 2 May 11 Posts: 47 Credit: 319,540,306 RAC: 1 |
"Zydor, Thanks, I tried that. when I reattached I got both cpu and gpu tasks. The cpu tasks are running fine but the gpu tasks will not start. My gpus have been idle for over 10 hours now." Same here. I tried detaching and reattaching. GPU tasks download, but refuse to run. Edit: If it matters, the tasks are 1.02. This is a Win7 64 machine with ATI cards. Reno, NV Team SETI.USA |
Send message Joined: 1 Jan 12 Posts: 13 Credit: 21,324,276 RAC: 0 |
The only way I could get the units to run under 7.x was the app_info file. Take the example I posted a few posts ago and try it. Abort all work from the project first, copy the text and paste it into a text file, rename the text file app_info.xml and put it in the Moo directory. Restart BOINC (sometimes a reboot is easier than Task Manager). If it works, great; if not, delete it and reboot again. |