Message boards : Number crunching : App_info.xml
Joined: 6 Dec 11 · Posts: 60 · Credit: 306,719,331 · RAC: 0
We need an updated app_info for the 1.3 application. Mine's not working:

<app_info>
  <app>
    <name>dnetc</name>
    <user_friendly_name>Distributed.net Client</user_friendly_name>
  </app>
  <file_info>
    <name>dnetc_wrapper_1.3_windows_intelx86__ati14.exe</name>
    <executable/>
  </file_info>
  <file_info>
    <name>dnetc518-win32-x86-stream.exe</name>
    <executable/>
  </file_info>
  <file_info>
    <name>dnetc-gpu-1.3.ini</name>
  </file_info>
  <file_info>
    <name>job-ati14-1.00.xml</name>
  </file_info>
  <app_version>
    <app_name>dnetc</app_name>
    <version_num>130</version_num>
    <platform>windows_intelx86</platform>
    <avg_ncpus>0.500000</avg_ncpus>
    <max_ncpus>0.895864</max_ncpus>
    <plan_class>ati14</plan_class>
    <flops>1157115231469.729200</flops>
    <api_version>7.0.0</api_version>
    <file_ref>
      <file_name>dnetc_wrapper_1.3_windows_intelx86__ati14.exe</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>dnetc518-win32-x86-stream.exe</file_name>
      <copy_file/>
    </file_ref>
    <file_ref>
      <file_name>dnetc-gpu-1.3.ini</file_name>
      <open_name>dnetc.ini</open_name>
      <copy_file/>
    </file_ref>
    <file_ref>
      <file_name>job-ati14-1.00.xml</file_name>
      <open_name>job.xml</open_name>
      <copy_file/>
    </file_ref>
    <coproc>
      <type>ATI</type>
      <count>0.5</count>
    </coproc>
    <gpu_ram>262144000.000000</gpu_ram>
  </app_version>
</app_info>
Joined: 11 May 11 · Posts: 44 · Credit: 291,412,341 · RAC: 0
> We need an updated app_info for the 1.3 application.

Agreed. Someone please help; XML is above me. I was left behind when BASIC faded out with DOS.
Joined: 5 May 11 · Posts: 233 · Credit: 351,414,150 · RAC: 0
The application number is shown inside the Stderr for completed WUs as:

Distributed.net Client v1.03 (ati14)

Try changing:

<version_num>130</version_num>

to:

<version_num>103</version_num>

(dnetc_wrapper_1.3_windows_intelx86__ati14.exe is correct, don't change that.)

Regards
Zy
Joined: 11 May 11 · Posts: 44 · Credit: 291,412,341 · RAC: 0
Thanx Zydor! That did the trick. I'm testing it on an XP box with a single HD5950.

With <count>0.5</count> it's running 2 WUs using 99% GPU power, but I get an HTTP error when Moo requests new tasks. Now the funny part: when I stop the BOINC manager, change the count to 1.0 and restart BOINC, it reports the completed tasks without an error and gets new work. I stopped and restarted to change the setting twice today to get new work, and all the WUs passed without an error.

Run time per WU is now between 58 and 65 minutes when running 2 together, which works out to about the same 29-32 min per WU I get with a single WU running. I used to get that before the bad WUs started early in December.

One change I plan to make in the app_info is to set max_ncpus back to 1, because I'm running CAL 11.4, which does not hog the CPU.
Joined: 20 Apr 11 · Posts: 388 · Credit: 822,356,221 · RAC: 0
> With <count>0.5</count> it's running 2 WUs using 99% GPU power but HTTP error when Moo request new tasks

Ah, right! Our scheduler probably doesn't like a count with fractions. I'll see if I can fix that so that you can use such an interesting setup. This might be the reason for some of the crashes I've seen on our scheduler. :)

-w
Joined: 11 May 11 · Posts: 44 · Credit: 291,412,341 · RAC: 0
> I'll see if I can fix that so that you can use such an interesting setup.

Thanks, Teemu. In the meantime I have "no new tasks" on while running 2 WUs together.
Joined: 6 Dec 11 · Posts: 60 · Credit: 306,719,331 · RAC: 0
> Thanx Zydor!

That's the shit! I lost two days' worth of work when I completely removed the app_info file from the folder and did an update. Good to know.
Joined: 20 Apr 11 · Posts: 388 · Credit: 822,356,221 · RAC: 0
> I'll see if I can fix that so that you can use such an interesting setup.

Okay, scheduler crashes for non-integral card counts should be fixed now. Do let me know if you are still seeing strangeness.

-w
Joined: 22 Jun 11 · Posts: 2080 · Credit: 1,844,407,912 · RAC: 3,236
> Thanx Zydor! [...] I lost two days' worth of work when I completely removed the app_info file from the folder and did an update.

Sorry for your loss, but it just means the rest of us have more units to crunch!
Joined: 5 May 11 · Posts: 233 · Credit: 351,414,150 · RAC: 0
> Sorry for your loss but it just means the rest of us have more units to crunch!

rofl :) The main event is assured, it's only the manner of your death that's in doubt :)

Regards
Zy
Joined: 5 May 11 · Posts: 233 · Credit: 351,414,150 · RAC: 0
I had suspended this test until I saw what 1.03 could do; no point going ahead if 1.03 solved the ills, albeit the change log doesn't seem to get to these issues (not surprisingly).

V1.3 is solid on the PC, so I restarted the test with twin 5970s. It's worth doing with 5970s. The mileage you'll get - as always with infinite variation in individual setups - will vary, probably considerably, given two GPUs on one card. Overall it looks like around 30 secs to a min saved per WU. It's hard to be specific: some saw greater savings, and the fragmented nature of the WUs means it's hard to estimate without any kind of reliable pole to revolve around. Other cards will likely see a better increase as they will not have to fight the VRM heat problem. But, yup, it's worth doing with 5970s. Use the app_info above in this thread (don't forget to change the version statement from 130 to 103).

Card temperatures will definitely rise, so if the card(s) were already tuned to produce the maximum with single WUs, expect to have to reduce GPU clocks by circa 30 MHz. This is to keep the VRMs under control, as they are now doing - crudely - twice the work. As always you are limited by the 5970 VRM design fault, so watch the second GPU (or fourth, if twin 5970s) like a hawk with GPU-Z. You'll find the max temps occur around 60-80% done, so keep an eye on the percent done and run GPU-Z around those times - probably best to keep it open until the WU finishes, at least until you are sure you have the new VRM levels nailed.

So ..... good to go .... just watch the VRM temps like a hawk for the first half dozen runs until you are sure you have them under control. I suggest you set the cache to 0.1 days until you are sure all is well and you want to continue with it longer term, to save trashed WUs.

Regards
Zy
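For anyone unsure where that cache setting lives: besides the Manager's computing preferences, one way is a global_prefs_override.xml in the BOINC data directory (the file name and tag are standard BOINC; the exact directory varies by install). A minimal sketch:

<global_preferences>
  <work_buf_min_days>0.1</work_buf_min_days>
</global_preferences>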
Joined: 11 May 11 · Posts: 44 · Credit: 291,412,341 · RAC: 0
The modifications Teemu made stopped the HTTP error with 0.5 for 2 tasks. OC'd to an 840 MHz core, the GPUs are running cool, below 70 deg C at 99% usage, with the memory clock dropped to 700 MHz.

The graphs showed a nice upswing in performance on a single 6950 in one box and a 5850 in another, running 2 WUs together. But now Moo sees them as slower cards due to the increased run time and started sending me "tiny" 190s WUs. Result: performance lower than before :(

Back to square one.
Joined: 5 May 11 · Posts: 233 · Credit: 351,414,150 · RAC: 0
Keep it there for a while - the mechanism used for sizing WUs is adaptive. It will (or should ...) end up detecting the card's capability and go back to previous sizes; it just takes time to adapt.

Additional note to my post on 5970s: 5970 cards should stick to one WU per GPU - two per GPU is usually too much for them. You might get away with two per GPU on a single 5970, but those running twin 5970s will grind to a halt with 8 WUs running. One WU per GPU on single 5970 cards is fine and alleviates nearly all the problem.

Regards
Zy
Joined: 20 Apr 11 · Posts: 388 · Credit: 822,356,221 · RAC: 0
> But now Moo sees it as slower cards due to the increased run-time & start sending me "tiny" 190s WUs.

That's not "tiny", it's the normal WU. The tiny work is now reserved for CPUs, and GPUs get small, normal or huge tasks. :)

I'm thinking/planning to switch to using peak flops to determine what size of work to send to GPUs. That should be more stable, especially when deploying new app versions (which reset the stats, so it takes some time to measure things again), but I first need to gather some stats about the work and GPU device spread to optimize this determination. I might also tweak the actual size of this work, as I aim for one task to take around 1-2 hours to crunch. A project pref to set the size preference is not out of the question either, except that pref probably needs to be host-specific, somehow..

-w
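Purely to illustrate the idea (this is a sketch, not the real scheduler code; the per-unit cost and target runtime are invented numbers):

# Sketch of peak-flops-based size selection. The size classes and their
# stat-unit counts follow the 32/192/768 naming discussed below; the
# flops-per-unit figure and target are illustrative assumptions.
def pick_size(peak_flops, target_seconds=5400):
    sizes = [("small", 32), ("normal", 192), ("huge", 768)]
    flops_per_unit = 6.0e12  # hypothetical work per stat unit
    # Pick the class whose estimated runtime lands closest to the target.
    best = min(sizes,
               key=lambda s: abs(s[1] * flops_per_unit / peak_flops - target_seconds))
    return best[0]

print(pick_size(1.36e12))  # a ~5770-class card -> "huge" with these made-up numbers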
Joined: 6 Dec 11 · Posts: 60 · Credit: 306,719,331 · RAC: 0
> That's not "tiny", it's the normal WU. The tiny work is now reserved for CPUs, and GPUs get small, normal or huge tasks. :)

Teemu, can I volunteer to help get some standard run times for each size of WU on various GPUs? There is so much talk about what size WU a person is running, how to know what size it is, and why times got longer. When someone starts working with moowrap they get smaller WUs, and then the system starts sending them larger WUs as they finish those. Folks new to moo/dnet might not know this, and to them it would look like their run times are increasing.
Joined: 22 Jun 11 · Posts: 2080 · Credit: 1,844,407,912 · RAC: 3,236
> But now Moo sees it as slower cards due to the increased run-time & start sending me "tiny" 190s WUs.

So the 'regular' units are the 192, 193, 194, 195 and 204 versions? I am doing them in about 15 minutes on a 5770.
Joined: 20 Apr 11 · Posts: 388 · Credit: 822,356,221 · RAC: 0
> So the 'regular' units are the 192, 193, 194, 195 and 204 versions? I am doing them in about 15 minutes on a 5770.

Normal units are those with at least 192 as the second number from the end of the result name. Huge ones have at least 768 in that same place, and small ones at least 32. The actual numbers vary slightly due to fragmentation and the way the work generator works. Tiny CPU work contains 9 stat units, and usually that's exact.

-w
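In script form, assuming a result name whose underscore-separated fields end with the unit count followed by one more field (the example name below is made up; check your own task names for the real layout):

# Classify a result by the second-to-last underscore-separated field,
# using the thresholds above. The sample name format is an assumption.
def wu_size(result_name):
    units = int(result_name.split("_")[-2])
    if units >= 768:
        return "huge"
    if units >= 192:
        return "normal"
    if units >= 32:
        return "small"
    return "tiny"  # CPU work, ~9 stat units

print(wu_size("dnetc_r72_1234567_192_0"))  # -> "normal" (hypothetical name)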
Joined: 18 May 11 · Posts: 46 · Credit: 1,254,302,893 · RAC: 0
> The application number is shown inside the Stderr for completed WUs as:
> Distributed.net Client v1.03 (ati14)

Just to make things clearer: the first app_info.xml listed in this thread is all messed up; the second one is much better. I modified things a bit, and this is what I'm using (no api_version or gpu_ram statements, ncpu statements changed):

<app_info>
  <app>
    <name>dnetc</name>
    <user_friendly_name>Distributed.net Client</user_friendly_name>
  </app>
  <file_info>
    <name>dnetc_wrapper_1.3_windows_intelx86__ati14.exe</name>
    <executable/>
  </file_info>
  <file_info>
    <name>dnetc518-win32-x86-stream.exe</name>
    <executable/>
  </file_info>
  <file_info>
    <name>dnetc-gpu-1.3.ini</name>
  </file_info>
  <file_info>
    <name>job-ati14-1.00.xml</name>
  </file_info>
  <app_version>
    <app_name>dnetc</app_name>
    <version_num>103</version_num>
    <platform>windows_intelx86</platform>
    <avg_ncpus>0.15</avg_ncpus>
    <max_ncpus>1.0</max_ncpus>
    <plan_class>ati14</plan_class>
    <flops>1157115231469.729200</flops>
    <file_ref>
      <file_name>dnetc_wrapper_1.3_windows_intelx86__ati14.exe</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>dnetc518-win32-x86-stream.exe</file_name>
      <copy_file/>
    </file_ref>
    <file_ref>
      <file_name>dnetc-gpu-1.3.ini</file_name>
      <open_name>dnetc.ini</open_name>
      <copy_file/>
    </file_ref>
    <file_ref>
      <file_name>job-ati14-1.00.xml</file_name>
      <open_name>job.xml</open_name>
      <copy_file/>
    </file_ref>
    <coproc>
      <type>ATI</type>
      <count>0.5</count>
    </coproc>
  </app_version>
</app_info>

Like you say, about a 10% overall increase in production, GPU usage running at 97%+, and increased temps (3-5 C).
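A quick sanity check on what those numbers mean, as I understand BOINC's scheduling (verify on your own host): <count>0.5</count> tells the client each task uses half a GPU, so it runs 1 / 0.5 = 2 tasks per GPU, and with <avg_ncpus>0.15</avg_ncpus> the pair together reserves 2 x 0.15 = 0.3 of a CPU core.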
Joined: 11 May 11 · Posts: 44 · Credit: 291,412,341 · RAC: 0
After 6 days I'm still receiving normal (192s) WUs on the 6950 & 6850. Could it be something in the app_info file causing this? They take about 15 min each on average, whereas a normal non-fragmented task takes about 8 to 10 minutes when running 2 tasks on a single GPU. I noticed some very bad ones running up to 38 min.

With the large tasks (768s) I get much better performance. It would be appreciated if you could make this option available when you get a break. Thanks again for the support we have all received in the past.
Joined: 20 Apr 11 · Posts: 388 · Credit: 822,356,221 · RAC: 0
> After 6 days I'm still receiving normal (192s) WUs on the 6950 & 6850.

It could be the flops setting in the app_info file that affects this for anonymous-platform users. However, the scheduler will still measure the actual speed and adjust things if the flops value provided by your BOINC client/app_info looks strange, so it could be hard to convince it otherwise even by tweaking that value.

I've noticed that the normal units are being handed out excessively, so it seems the scheduler prefers them, and this doesn't seem to change even after stats have been gathered. I do plan to look into fixing this soonish, as it would be better to send out more huge ones; that would also help with the server/traffic load.

-w
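For background, and as a rough worked example (the fpops figure is made up): for anonymous-platform hosts, BOINC estimates a task's duration as rsc_fpops_est / flops, so the <flops> line is the first thing the scheduler learns about your card's speed. With the flops = 1157115231469.7292 from the app_info above, a task with rsc_fpops_est = 1.0e15 would come out at roughly 1.0e15 / 1.157e12 ≈ 864 seconds of estimated run time. Whether the size selection ends up keying off that value is, as noted above, still being worked out.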