Posts by NeoMetal*

1) Message boards : News : Happy Holidays! (Message 1904)
Posted 25 Dec 2011 by NeoMetal*
Post:
Thank you Teemu. You really do care. ;)

And here's to hoping that Santa's Christmas spirit lasts until the new year.
What a guy. :)

Merry Christmas everyone.
NeoMetal
2) Message boards : Number crunching : Faulty WUs (Message 1809)
Posted 18 Dec 2011 by NeoMetal*
Post:
Teemu, Teemu, are you trying to avoid this credit issue? Or are you on vacation (holiday)? Reread this thread and you'll see everybody's talking about the lower credit-to-run-time problem with these different WUs. And please don't just say "well these will hopefully pass soon". We are losing credit NOW. These cobblestone credits, worthless outside of Boinc, actually do mean a lot to some of us in the 'Boinc world'. It's so easy to change with a few keystrokes. We will all be very grateful, even though no one else has had the balls to ask you directly like this so far (most therapists say to attack a problem head on and not be passive-aggressive and dismissive about it). Maybe even some compensation for all the wear & tear the bad WUs put on our computer hardware, not to mention the wasted electricity (money). Thank you for listening.
3) Message boards : Number crunching : To All With Long Or Inconsistent Run Times (Message 1802)
Posted 18 Dec 2011 by NeoMetal*
Post:
I've used Process Lasso before but haven't in a while. Which process do I raise the priority of, the DNET Client or the Wrapper? I've raised the Wrapper for now to see what happens ...

I just set them both to Above Normal, but the DNET client is the important one. And set the I/O priority to High as well for both.


I set the Client & Wrapper to Above Normal on all 4 boxes, but it seems to have only slightly increased the speed of the WUs on 1 of the boxes ...

Try setting to High and keep I/O at High as well. Set Power Profile (for Moo files) to High Performance. Do you have 2 or more GPUs on any boxes? If so, this may only help a little because of other issues that a lot of other ppl have with Moo. This isn't a cure-all because everybody's setup is different. Another thing: I find that ATI/AMD drivers 11-6, 11-11c and the 12-1 preview driver (which I currently use) work best. Some older ones work well too. Never use 11-7 & 11-8, and avoid 11-9 & 11-10 even though they may work OK for some. Did you get better times with 1 core open? If not, try that again with these settings. It could be that some may never get good consistent times with all cores running. May be just "one of those things".
Do you use Throttle? Setting the max CPU % on it to 99%, 98%, 97% or even 95% can help, as well as reduce some crashes if you have those with any kind of regularity. This sets the CPU load %, not the core % (like in Boinc settings), so it keeps the CPU from being pegged at max all the time and gives your computer some headroom for load spikes (the cause of some crashes, especially on OCed systems running Primegrid LLR WUs). Let me know how things go.

To Teemu, thanks for the clarification on the Dnet client priority scale and its complex workings.

NeoMetal
4) Message boards : Number crunching : Faulty WUs (Message 1791)
Posted 17 Dec 2011 by NeoMetal*
Post:
Teemu, please give us our run time credit back for these old Dnet resends. You have the power with a few keystrokes to make it happen. 500 per WU is a lot to us. You can always change it back once the regular WUs are back (or not). We will love you for it.
5) Message boards : Number crunching : To All With Long Or Inconsistent Run Times (Message 1783)
Posted 17 Dec 2011 by NeoMetal*
Post:
I've used Process Lasso before but haven't in a while. Which process do I raise the priority of, the DNET Client or the Wrapper? I've raised the Wrapper for now to see what happens ...

I just set them both to Above Normal, but the DNET client is the important one. And set the I/O priority to High as well for both.
6) Message boards : Number crunching : To All With Long Or Inconsistent Run Times (Message 1779)
Posted 17 Dec 2011 by NeoMetal*
Post:
Something else I should've mentioned. I came across a long thread somewhere else on the internet a while back and learned a lot about priority and coding. There are 2 different levels of priority. There is a base priority, coded at a deep level, that the higher surface-level priorities are not allowed to go below. In other words, it overrides the less stringent high-level priority settings (such as Task Manager). The dnetc-1.00.ini config file only goes up to 9, which is still below normal on the priority # scale, so the client still gets overridden by most other processes on your computer. To get full continuous access to the CPU you need to raise the base priority above that of the other CPU processes (not just Boinc projects). Then only system processes and a few other above normal (or high) processes set to the same base level can override MOO, so it gets nearly continuous access to the CPU. There's also the I/O priority; raising it will help when different processes have the same base priority. I hope I remembered all this right. DO NOT set Boinc CPU projects themselves above normal or your system will become unresponsive and lock up. Even at normal your computer will be very sluggish. This is because CPU-only projects use 100% (or near) of CPU resources. Setting GPU apps to that higher setting works because they use such a small percentage of CPU resources for their needs, yet they still get prioritized over CPU-only project processes. That's what Process Lasso (as well as the Priority 1.2 app) will do: change the deep, low-level base priority. So give it a try if nothing else works (or try it first) before going nuts over long run times.
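
For anyone who wants to see the base priority idea in numbers, here is a rough Python sketch. The base values are the ones Windows documents for a normal-priority thread in each priority class. The process names and the classes assigned to them are made up for illustration, and "highest base priority runs first" is a simplification of what the real scheduler does (it also applies boosts and time slices).

```python
# Windows priority classes and the base priority a normal-priority thread
# gets in each class (documented Windows scheduling values).
BASE_PRIORITY = {
    "Idle": 4,
    "Below Normal": 6,
    "Normal": 8,
    "Above Normal": 10,
    "High": 13,
    "Realtime": 24,
}

def runs_first(processes):
    """Order ready processes by base priority, highest first (simplified)."""
    return sorted(processes, key=lambda p: BASE_PRIORITY[p[1]], reverse=True)

# Hypothetical setup: names and assigned classes are illustrative only.
ready = [
    ("cpu_project_task", "Idle"),
    ("dnet_client", "Above Normal"),
    ("browser", "Normal"),
]

for name, cls in runs_first(ready):
    print(f"{name:18} {cls:13} base={BASE_PRIORITY[cls]}")
```

In this toy setup the GPU feeder process at Above Normal sorts ahead of both the browser and the CPU project task, which is the effect a base priority app is meant to give you.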
7) Message boards : Number crunching : To All With Long Or Inconsistent Run Times (Message 1778)
Posted 17 Dec 2011 by NeoMetal*
Post:
I read so many posts about long or inconsistent GPU run times, along with other problems, when crunching on all CPU cores at the same time as GPU crunching. When you have to leave a CPU core open just to get faster times, it's usually a CPU or I/O priority thing. CPU WUs from other projects tend to hog resources and CPU time. There are different ways to help correct this or at least improve it, but the best and easiest so far is an app called Process Lasso. Get it here: http://bitsum.com/prolasso.php. I used to use an app called Priority from http://www.efmer.eu/boinc/boinc_tasks/index.html (the site also has Throttle and BoincTasks, which are must-haves if using Boinc; Priority is on the Throttle download page and used to be part of Throttle). Priority works fine itself, but Process Lasso does SO MUCH MORE, including setting I/O priorities and affinity on any process on your computer. You should be able to run all CPU cores at the same time now as well. It would take many paragraphs to tell you what it can do, so just go to the web site and see for yourself.

I used to get inconsistent times before learning that other CPU tasks weren't allowing continuous access to the CPU (due to low priority). Ever since I started using a base priority app and setting the GPU process apps (including Milky Way, Collatz, Primegrid, etc.) to above normal (some may need to set theirs to high), all times have been consistent for weeks (usually within seconds for WUs of the same stats-unit count and size). As for 2 or more GPUs on the same system, this will help too, but the other bugs involved may still keep you from your best times. I even ran a 5830 and a 4850 together with improvement a while back.

So give this a try first before making drastic changes to settings or other things. It MIGHT save you some hassle. Report back to this thread with any improvements (if any), so others can see how well it works (or doesn't) for you. Hope this helps a lot of people.

NeoMetal
8) Message boards : Number crunching : Faulty WUs (Message 1767)
Posted 16 Dec 2011 by NeoMetal*
Post:
Seems the WUs are taking a longer time to finish though ...


Since there are more packets in work units due to fragmentation, that means more context switches for the GPU when the D.net client moves through the packets. This can slow down performance for obvious reasons, and I wouldn't be surprised to see higher CPU usage as well. Just like having the client interrupted all the time can increase CPU usage due to increased GPU traffic.

I'm not sure I can do anything about this other than hope we blow through these old fragmented areas quickly so we can get to fresh blocks. Block fetching for the work generation is entirely handled by non-opensource upstream code so I'm kinda at mercy of that code. :(

That said, I'll try to see if there's more going on to affect our RACs, as I find that concerning. Could be it's just an effect of not catching this problem sooner. It has been going on since at least the 10th of December. :(

-w

Since these WUs are taking 10%-15% longer to complete, how about raising the credit 10%-15% to compensate, then going back to the old credit when they are finished (or not)? I know some of the other WUs are mixed in, but those would just be bonuses during this time. A simple short-term fix, I would think.
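
To put rough numbers on that, here is a small Python sketch of the arithmetic. The 500 credit and 1 hour figures are assumptions picked only for illustration, not actual project values. A WU that runs 10%-15% longer at fixed credit earns roughly 9% to 13% less per hour, and raising the credit by the same percentage as the slowdown brings the hourly rate back to where it was.

```python
def credit_rate(credit, hours):
    """Credit earned per hour of run time."""
    return credit / hours

BASE_CREDIT = 500.0  # assumed per-WU credit, for illustration only
BASE_HOURS = 1.0     # assumed normal run time, for illustration only

for slowdown in (0.10, 0.15):
    slow_hours = BASE_HOURS * (1 + slowdown)
    normal_rate = credit_rate(BASE_CREDIT, BASE_HOURS)
    # Fraction of hourly credit lost when the WU runs longer at fixed credit.
    loss = 1 - credit_rate(BASE_CREDIT, slow_hours) / normal_rate
    # Credit raised by the same percentage as the slowdown.
    bumped = BASE_CREDIT * (1 + slowdown)
    restored = credit_rate(bumped, slow_hours)
    print(f"{slowdown:.0%} longer: rate drops {loss:.1%}, "
          f"credit {bumped:.0f} gives {restored:.0f} per hour")
```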

NeoMetal
9) Message boards : Number crunching : Faulty WUs (Message 1734)
Posted 14 Dec 2011 by NeoMetal*
Post:

EDIT:: (I was looking at my screen after closing a web session and caught a MOO work unit get a computation error.
"Output File dnetc_r72_1323592031_308_768_4_0 for task dnetc_r72_1323592031_308_768_4 Exceeds Size Limit
FileSize: 54240.000000 Bytes. Limit: 51200.000000 Bytes".
So this could be the problem, it shows as a Failed Upload and File Transfer Error when reported)

Conan.


Another thing is a packet limit error. Here's the end of one of my WU stderrs:


[Dec 14 08:03:11 UTC] RC5-72: 1 packet (2.00 stats units) remains in in.r72
Projected ideal time to completion: 0.00:00:04.00
[Dec 14 08:03:11 UTC] RC5-72: 387 packets (764.00 stats units) are in
out.r72
[Dec 14 08:03:20 UTC] RC5-72: Completed CA:5BBA5EB5:00000000 (3.00 stats units)
0.00:00:08.45 - [1,523,938,721 keys/s]
[Dec 14 08:03:20 UTC] RC5-72: Loaded CA:5BBA5ECC:00000000:2*2^32
[Dec 14 08:03:20 UTC] RC5-72: Summary: 388 packets (767.00 stats units)
0.00:35:59.60 - [1,525.39 Mkeys/s]
[Dec 14 08:03:20 UTC] RC5-72: 0 packets remain in in.r72
[Dec 14 08:03:20 UTC] RC5-72: 388 packets (767.00 stats units) are in
out.r72
[Dec 14 08:03:27 UTC] RC5-72: Completed CA:5BBA5ECC:00000000 (2.00 stats units)
0.00:00:05.66 - [1,516,852,305 keys/s]
[Dec 14 08:03:27 UTC] Shutdown - packet limit exceeded.
[Dec 14 08:03:27 UTC] RC5-72: Summary: 389 packets (769.00 stats units)
0.00:36:05.26 - [1,525.37 Mkeys/s]
[Dec 14 08:03:27 UTC] RC5-72: 0 packets remain in in.r72
[Dec 14 08:03:27 UTC] RC5-72: 389 packets (769.00 stats units) are in
out.r72
[Dec 14 08:03:27 UTC] *Break* Shutting down...
[Dec 14 08:03:27 UTC] Shutdown complete.
00:03:27 (1444): called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>dnetc_r72_1323738425_389_769_1_0</file_name>
<error_code>-131</error_code>
</file_xfer_error>


It happens at varying lengths, so it must have something to do with the file size as well. It looks like it's adding extra computation results to the out file from ghost packets.
I've been lucky so far. Only 3 bad WUs in the last 2 days (all on 1 box). Another thing: the in and out files in the slot folders are always garbled (or encrypted?) chars, but the checkpoint file is that way only sometimes and clear text at other times. This is on 3 different boxes, 1 with a 4850 and 2 with 5830s. Any thoughts?
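
If anyone wants to check their own stderr output for the same pattern, here is a minimal Python sketch. scan_stderr is a hypothetical helper I made up for this, not part of Boinc or the D.net client, and the line formats it matches are taken only from the excerpt above, so other client versions may print things differently.

```python
import re

# Matches lines like: "RC5-72: Summary: 389 packets (769.00 stats units)"
SUMMARY = re.compile(r"Summary: (\d+) packets \(([\d.]+) stats units\)")

def scan_stderr(text):
    """Return (last packet count, last stats units, packet_limit_hit)."""
    packets, units = None, None
    for m in SUMMARY.finditer(text):
        packets, units = int(m.group(1)), float(m.group(2))
    limit_hit = "packet limit exceeded" in text
    return packets, units, limit_hit

# Sample lines copied from the stderr excerpt above.
sample = """\
[Dec 14 08:03:20 UTC] RC5-72: Summary: 388 packets (767.00 stats units)
[Dec 14 08:03:27 UTC] Shutdown - packet limit exceeded.
[Dec 14 08:03:27 UTC] RC5-72: Summary: 389 packets (769.00 stats units)
"""

print(scan_stderr(sample))
```

Comparing the final packet count across good and bad WUs might show whether the failures line up with the extra packets.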

NeoMetal
10) Message boards : News : Read-only replica DB deployed (Message 1564)
Posted 1 Dec 2011 by NeoMetal*
Post:
Seems that Teemu forgot to re-enable the cron task that sends the XML updates to the stat sites after he changed things.

Last updates at BoincStats:

Last update user XML 2011-11-30 11:30:07 GMT
Last update host XML 2011-11-30 11:30:23 GMT
Last update team XML 2011-11-30 11:30:23 GMT





 