Running 14 Packets, Issued 12 packets ???
Author | Message |
---|---|
Joined: 2 May 11 Posts: 52 Credit: 254,961,757 RAC: 7,378 |
Why is it that a work unit initialises with 12 packets but nearly always processes 14 packets and then gets the error that it has exceeded the packet limit? Could this be the reason that some of my work units often run for 2,500 to 3,100 seconds when they should be running for 850 to 1,400 seconds? (I have one computer whose RAC is dropping very fast because of these random, very long-running work units.) Conan |
Joined: 5 May 11 Posts: 233 Credit: 351,414,150 RAC: 0 |
"Why is it that a work unit initialises with 12 Packets but nearly always processes 14 Packets": I couldn't find one. Can you point to one that has done that?

"Could this be the reason that some of my work units often run for 2,500 to 3,100 seconds...": Can you point at one? I couldn't find one.

".... and gets the error that it has exceeded Packet Limit....": All WUs will have that "error" printed in the stderr, no matter the length. Two examples:

A 12-packet WU has crunched 11 packets and starts the 12th packet. It completes the 12th packet, and the next line is "exceeded packet limit ...". It has, because it crunched 11, started and finished the 12th, and, when it tried to find a 13th (because it just works on a continuous cycle), it compared that quest for a 13th to the total it's supposed to find at the top of the file (12), decided correctly that the 13th iteration exceeds the task set, and (correctly) exits.

Now work through that logic with 14 packets. The file sets the total it's crunching at the top (14). As it cycles through them one by one, it decrements the total it has to crunch by one. It gets to the 13th, finishes it OK, starts a 14th (which is fine as the total to crunch is 14), finishes it, decrements by one so there are zero left to do, cycles to the start, checks the total to crunch (14), decides (correctly) that the 15th iteration exceeds its task of 14, and exits.

The same logic can therefore be used for a WU of any packet count, irrespective of the size of the data to be crunched, which is great. Technically it is an "error" because on the last iteration it always decides that the last one it tried to start exceeds the task set (the 13th and the 15th in the examples), hence it will always get the "error". It's a useful thing to have in there however, because if, due to a genuine error, it tries to keep cycling through more packets, which now don't exist, when it shouldn't, then we want to know, hence the blanket error message ... and we want it to exit in all cases, else it will keep polling in a circle forever and never finish.

There are significantly varying times for the same size WUs on both PCs however. Are you running CPU WUs at the same time that take up all the theoretical CPU allocation, leaving minimal CPU time for the GPU app? That will cause - every time - wildly varying GPU WU times. The extent of the variance will depend on how much CPU time is taken by the CPU app, and therefore the time left to service GPU requests. You can easily check the latter by suspending all CPU projects and noting the subsequent GPU WU times for consistency - but whilst doing that just use a mild overclock on the GPU, else the overclock may introduce another factor that will complicate deductions. Regards Zy |
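A minimal Python sketch of the loop logic described in the post above. This is not the actual DNETC client or wrapper code; the function name `run_work_unit` and the "Completed packet N" messages are made up for illustration. It only shows why the "packet limit exceeded" line appears once at the end of every WU, whatever its size.

```python
def run_work_unit(packet_limit):
    """Simulate the crunch cycle for a WU of `packet_limit` packets.

    Hypothetical model: the client keeps cycling and only stops when the
    packet it is about to start would exceed the limit set for the WU.
    """
    log = []
    done = 0
    while True:
        attempt = done + 1                # the packet we are about to start
        if attempt > packet_limit:        # e.g. the 13th attempt in a 12-packet WU
            log.append("Shutdown - packet limit exceeded.")
            break
        log.append(f"Completed packet {attempt}")
        done += 1
    return done, log


done_12, log_12 = run_work_unit(12)
done_14, log_14 = run_work_unit(14)
print(done_12, log_12[-1])   # 12 Shutdown - packet limit exceeded.
print(done_14, log_14[-1])   # 14 Shutdown - packet limit exceeded.
```

In both cases the number of packets actually crunched equals the limit; the shutdown line marks the attempt that was refused, not an extra packet that was processed.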
Joined: 2 May 11 Posts: 52 Credit: 254,961,757 RAC: 7,378 |
It would appear that ALL my work units behave this way. The WU processes and loads packets into r72 and says that "10 (or 12) packets loaded into r72". It then continues on to 11 (or 13) and then 12 (or 14), saying that the packet limit has been exceeded after the first extra packet has been processed, but it still does another one anyway. So this appears to be normal behaviour, but it still seems odd that extra packets are processed when only a set number of packets have been loaded at the start of the WU. As they weren't originally loaded, are the extra ones actually processed? Yes, I run CPU projects on all cores. Both Windows computers are running the same mix of projects, which very often changes. At the moment they are running Moo!Wrapper, Einstein, DistributedDataMining, Ralph (when available), WUProp and Surveill. The last two are not CPU intensive. Running 32 bit with only 3 to 4 GB of available memory, I think I am pushing the limits of my computers, as memory is often a problem. Thanks for the reply Zy. Conan. |
Joined: 5 May 11 Posts: 233 Credit: 351,414,150 RAC: 0 |
We need to put to bed the sequence in the stderr, else it will overhang future thoughts. I think a misinterpretation has occurred about what is happening in the stderr. The same thing threw me the first time I crunched at DNETC. I'll cut & paste two blocks from a validated WU stderr output and step through the lines, one block from the start, and the final block. Hopefully it will make it clearer .... and anyone else reading this, check me out here, in case I've got it wrong.

{All bolding and colours are mine for clarity, my comments in italics for clarity, it is a 12 packet WU, I will use the incorrect terms "inbox" and "outbox" in the same way as a mailbox}

[Jul 27 16:29:02 UTC] Automatic processor detection found 2 processors.
*Two cards found, and each card loaded with one packet, coloured red and blue*

[Jul 27 16:29:02 UTC] RC5-72: 10 packets (640.00 stats units) remain in
*10 packets remain in the inbox, one other is loaded as A200, and another is loaded as A240, total 12 packets*

[Jul 27 16:29:02 UTC] 2 crunchers ('a' and 'b') have been started.
*Packet A200 has now completed and is in the outbox, nine packets remain in the inbox*

[Jul 27 16:32:11 UTC] RC5-72 #b: Completed CF:6259A240:00000000 (64.00 stats units)
*A240 is completed and in the outbox along with A200. Packet A2C0 is loaded, A280 still crunching, 8 packets left in the inbox. Total 12 packets*

Now it cycles through to the end. The last few actions are:

[Jul 27 16:46:15 UTC] RC5-72: 0 packets remain in in.r72
*No packets in the inbox, 10 packets completed and in the outbox, two still crunching on cards - total 12 packets*

[Jul 27 16:46:31 UTC] RC5-72 #a: Completed CF:6259A480:00000000 (64.00 stats units)
*11th packet (A480) completes, nil packets in inbox, 11 packets in outbox, one still crunching*

[Jul 27 16:49:41 UTC] RC5-72 #b: Completed CF:6259A4C0:00000000 (64.00 stats units)
*Last packet (12th) completes. The programme cycles back to the top for a 13th loop, checks and finds the WU only needs 12 loops, and ....*

[Jul 27 16:49:41 UTC] Shutdown - packet limit exceeded.
*..... exits as all loops needed are complete, and records an "error" as it started to look for a 13th*

[Jul 27 16:49:41 UTC] RC5-72: 12 packets (768.00 stats units) are in out.r72
*12 packets now in the outbox, programme break and shutdown, calls boinc_finish and the BOINC Client picks up the completed WU*

It cycles to try and start an extra packet (as designed), discovers it doesn't need it, calls it an error, and closes. WU completed. That finishing process will always happen for every WU - it will try for another loop, test whether that's needed, discover it's not and break off. No matter the size of the WU, and even in a case that would otherwise loop forever, the programme stops and exits correctly.

If we can establish that bit, we can then move on to why you are getting varied results and try to find solutions. Regards Zy |
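For anyone who wants to check a stderr output without stepping through it by hand, here is a rough Python sketch that counts the "Completed" lines and notes the inbox counts and the shutdown line. The regular expressions are assumptions based only on the log lines quoted in the post above, and `summarise` is a made-up helper name; the real client may print lines in other forms. The sample contains only the lines quoted in the post, so it shows 3 completions rather than the full 12.

```python
import re

# Patterns derived only from the log lines quoted in the post (assumption).
COMPLETED = re.compile(r"RC5-72 #(\w+): Completed (\S+)")
REMAIN = re.compile(r"RC5-72: (\d+) packets? .*remain in")
SHUTDOWN = "Shutdown - packet limit exceeded."


def summarise(stderr_text):
    """Collect completed packet ids, inbox counts, and the shutdown marker."""
    completed, remaining, shutdown_seen = [], [], False
    for line in stderr_text.splitlines():
        done = COMPLETED.search(line)
        left = REMAIN.search(line)
        if done:
            completed.append(done.group(2))        # e.g. CF:6259A240:00000000
        elif left:
            remaining.append(int(left.group(1)))   # packets still in the inbox
        elif SHUTDOWN in line:
            shutdown_seen = True
    return {"completed": completed, "remaining": remaining,
            "shutdown_seen": shutdown_seen}


SAMPLE = """\
[Jul 27 16:29:02 UTC] RC5-72: 10 packets (640.00 stats units) remain in
[Jul 27 16:32:11 UTC] RC5-72 #b: Completed CF:6259A240:00000000 (64.00 stats units)
[Jul 27 16:46:15 UTC] RC5-72: 0 packets remain in in.r72
[Jul 27 16:46:31 UTC] RC5-72 #a: Completed CF:6259A480:00000000 (64.00 stats units)
[Jul 27 16:49:41 UTC] RC5-72 #b: Completed CF:6259A4C0:00000000 (64.00 stats units)
[Jul 27 16:49:41 UTC] Shutdown - packet limit exceeded.
"""

summary = summarise(SAMPLE)
print(len(summary["completed"]), summary["remaining"], summary["shutdown_seen"])
# prints: 3 [10, 0] True
```

Run against a full stderr from a 12-packet WU, the count of completed ids should come to 12, with the shutdown line seen exactly once at the end.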
Joined: 2 May 11 Posts: 52 Credit: 254,961,757 RAC: 7,378 |
Thanks Zy, that makes sense. I will work on the slow processing times on one machine in particular. Doing a defrag now as the poor hard drive was very fragmented and I will defrag the other machine for good measure even though it does not need it. Conan |
Joined: 5 May 11 Posts: 233 Credit: 351,414,150 RAC: 0 |
Okie Doke. For the future, if the dosh is available, I very strongly recommend Diskeeper. It is very, very good, and widely used on company and corporate networks (always a good sign if those guys trust a utility like this). You literally do not notice it is there at all, and you will never - ever - have the slightest fragmentation ever again; it's brilliant. It defrags in the background using spare CPU cycles, and it's extremely well behaved, breaking off every time another programme needs CPU service. So much so that it will pick up BOINC requests to the CPU and back off; remember that BOINC itself is designed to back off, yet Diskeeper keeps out of the way of even that logic. Very clever stuff. I recommend getting the Pro Premier version; you can try it on a 30 day trial for free. It will do other things as well, so it's worth reading the blurb and documentation. BOINC is very heavy on fragmentation, it's always "at it", and Diskeeper is a neat solution to that permanent BOINC issue (that will never go away, it's the nature of the beast, BOINC will always be the cause of heavy fragmentation - unless you are using Diskeeper[!]). Diskeeper Home Page Regards Zy |
Joined: 20 Apr 11 Posts: 388 Credit: 822,356,221 RAC: 0 |
Hi, Yep, Zydor explained it correctly. Thanks for that! :) The message "Shutdown - packet limit exceeded." is indeed normal, since our wrapper process gives the client process an input buffer and asks it to crunch until there are no packets left, at which point it should shut itself down. We do this by setting a "packet limit" setting. The other important part was the start message, which indeed indicates the packets that remain in the input buffer after the crunchers have loaded work from it. So this remaining number plus the number of crunchers (usually one per processor, or per GPU in this case) should equal the count of packets in a work unit (recorded as the third number from the end of the task name). -w |
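The accounting rule in the post above (packets remaining in the input buffer plus the number of crunchers equals the packet count in the task name) can be sketched in Python as below. The task name used here is entirely made up, and the exact task name format is an assumption; the only detail taken from the post is that the packet count is the third number from the end of the name. The helper names are hypothetical.

```python
import re


def packets_in_task_name(task_name):
    """Return the third number from the end of the task name.

    Assumption: numbers are read as runs of digits; the real task name
    format is not given in the thread.
    """
    numbers = re.findall(r"\d+", task_name)
    if len(numbers) < 3:
        raise ValueError(f"not enough numbers in task name: {task_name!r}")
    return int(numbers[-3])


def start_message_is_consistent(remaining_in_buffer, crunchers, task_name):
    """Check: remaining packets + crunchers == packet count of the WU."""
    return remaining_in_buffer + crunchers == packets_in_task_name(task_name)


# Made-up task name for illustration only; 12 is the third number from the end.
example_name = "dnetc_r72_1311776942_12_768_0"

# Start message from the thread: 10 packets remain, 2 crunchers started.
print(start_message_is_consistent(10, 2, example_name))   # True
```

With the figures from the thread (10 packets remaining and 2 crunchers), the check gives 12, which matches a 12-packet WU.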