Running 14 Packets, Issued 12 packets ???


Conan
Joined: 2 May 11
Posts: 53
Credit: 255,675,997
RAC: 5,860
Message 838 - Posted: 25 Jul 2011, 22:22:05 UTC

Why is it that a work unit initialises with 12 Packets but nearly always processes 14 Packets and gets the error that it has exceeded Packet Limit?

Could this be the reason that some of my work units often run for 2,500 to 3,100 seconds when they should be running for 850 to 1,400 seconds?
(I have one computer whose RAC is dropping very fast because of the random, very long-running work units.)

Conan
Zydor
Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 841 - Posted: 26 Jul 2011, 16:35:48 UTC - in response to Message 838.  
Last modified: 26 Jul 2011, 16:40:09 UTC

Why is it that a work unit initialises with 12 Packets but nearly always processes 14 Packets

Couldn't find one - can you point to one that has done that?

Could this be the reason that some of my work units often run for 2,500 to 3,100 seconds...

Can you point at one? I couldn't find one.

.... and gets the error that it has exceeded Packet Limit....

All WUs will have that "error" printed in the stderr, no matter the length. Two examples:
Crunched 11 packets, starts the 12th packet, completes the 12th packet, next line "exceeded packet limit ...". It has, because it crunched 11, started and finished the 12th, and then tried to find a 13th (it just works on a continuous cycle). It compared that request for a 13th with the total it's supposed to crunch, set at the top of the file (12), decided correctly that a 13th iteration exceeds the task set, and (correctly) exited.

Now work through that logic with 14 packets. The file sets the total it's crunching at the top (14). As it cycles through them one by one, it decrements the number left to crunch by one. It gets to the 13th, finishes it OK, starts a 14th (which is fine, as the total to crunch is 14), finishes it, and decrements by one so there are zero left to do. It cycles to the start, checks the total to crunch (14), decides (correctly) that a 15th iteration exceeds its task of 14, and exits.

The same logic can therefore be used for any Packet sized WU, irrespective of the size of the data to be crunched, which is great. Technically it is an "error" because on the last iteration it always decides that the last one it tried to start exceeds the task set (the 13th and the 15th in the examples), hence it will always get the "error".

It's a useful thing to have in there however. If, due to a genuine error, it tries to keep cycling through more packets, which now don't exist, when it shouldn't, then we want to know, hence the blanket error message ... and we want it to exit in all cases, else it will keep polling in a circle forever and never finish.
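The cycle described above can be sketched in a few lines. This is only a toy model of the behaviour as explained in this post, not the actual client code; the function name `crunch_work_unit` and the "Completed packet N" log text are made up for illustration.

```python
def crunch_work_unit(packet_limit):
    """Toy model of the crunch cycle (hypothetical, not the real client)."""
    completed = 0
    log = []
    while True:
        # Every pass through the loop tries to start one more packet.
        attempt = completed + 1
        if attempt > packet_limit:
            # Normal end of every work unit: the extra pass is refused.
            log.append("Shutdown - packet limit exceeded.")
            break
        log.append(f"Completed packet {attempt}")
        completed += 1
    return completed, log


completed, log = crunch_work_unit(12)
print(completed)  # 12
print(log[-1])    # Shutdown - packet limit exceeded.
```

With a limit of 14 the same loop completes 14 packets and refuses the 15th, so the "error" line appears at the end of every run regardless of WU size.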

There are significantly varying times for the same size WUs on both PCs however. Are you running CPU WUs at the same time that take up all the theoretical CPU allocation, leaving minimal time for the GPU app? That will cause - every time - wildly varying GPU WU times. The extent of the variance will depend on how much CPU time is taken by the CPU apps, and therefore the time left to service GPU requests.

You can easily check the latter by suspending all CPU Projects and note subsequent GPU WU times for consistency - but whilst doing that just use a mild overclock on the GPU, else the latter may introduce another factor that will complicate deductions.

Regards
Zy
Conan
Joined: 2 May 11
Posts: 53
Credit: 255,675,997
RAC: 5,860
Message 844 - Posted: 27 Jul 2011, 11:38:20 UTC

It would appear that ALL my work units behave this way.

It processes and loads packets into r72 and says that "10 (or 12) packets loaded into r72".
Then it continues on to 11 (or 13) and then 12 (or 14), saying that the packet limit has been exceeded after the first extra packet has been processed, but it still does another one anyway.

So it appears to be normal behaviour, but it still seems odd when only a set number of packets have been loaded at the start of the WU and yet extra packets are processed.

As they weren't originally loaded, are the extra ones actually processed?

Yes I run CPU projects on all cores. Both Windows computers are running the same mix of projects which very often changes.
At the moment they are running Moo!Wrapper, Einstein, DistributedDataMining, Ralph (when available), WUProp and Surveill.
The last two are Non CPU intensive.
Running 32-bit with only 3 to 4 GB of available memory, I think I am pushing the limits of my computers, as memory is often a problem.

Thanks for the reply Zy.

Conan.
Zydor
Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 845 - Posted: 27 Jul 2011, 19:21:03 UTC

We need to put to bed the sequence in the stderr, else it will overhang future thoughts. I think a misinterpretation has occurred of what is happening in the stderr. The same thing threw me the first time I crunched at DNETC. I'll cut & paste two blocks from a validated WU stderr output and step through the lines: one block from the start, and the final block. Hopefully it will make it clearer .... and anyone else reading this, check me out here, in case I've got it wrong. {All bolding and colours are mine for clarity, my comments are in italics for clarity, it is a 12 packet WU, and I will use the incorrect terms "inbox" and "outbox" in the same way as a mailbox}

[Jul 27 16:29:02 UTC] Automatic processor detection found 2 processors.
[Jul 27 16:29:02 UTC] Loading crunchers with work...
[Jul 27 16:29:02 UTC] RC5-72: using core #0 (IL 4-pipe c).
[Jul 27 16:29:02 UTC] RC5-72 #a: Loaded CF:6259A200:00000000:64*2^32
[Jul 27 16:29:02 UTC] RC5-72 #b: Loaded CF:6259A240:00000000:64*2^32

Two cards found, and each card loaded with one packet (A200 and A240), coloured red and blue

[Jul 27 16:29:02 UTC] RC5-72: 10 packets (640.00 stats units) remain in
in.r72
[Jul 27 16:29:02 UTC] RC5-72: 0 packets are in out.r72

10 packets remain in the inbox, one other is loaded as A200, and another is loaded as A240, total 12 packets

[Jul 27 16:29:02 UTC] 2 crunchers ('a' and 'b') have been started.
[Jul 27 16:31:53 UTC] RC5-72 #a: Completed CF:6259A200:00000000 (64.00 stats units)
0.00:02:50.48 - [1,612,328,984 keys/s]
[Jul 27 16:31:53 UTC] RC5-72 #a: Loaded CF:6259A280:00000000:64*2^32
[Jul 27 16:31:53 UTC] RC5-72: Summary: 1 packet (64.00 stats units)
0.00:02:50.48 - [1,612.33 Mkeys/s]
[Jul 27 16:31:53 UTC] RC5-72: 9 packets (576.00 stats units) remain in
in.r72
Projected ideal time to completion: 0.00:09:36.00
[Jul 27 16:31:53 UTC] RC5-72: 1 packet (64.00 stats units) is in out.r72

Packet A200 has now completed and in the outbox, nine packets remain in the inbox

[Jul 27 16:32:11 UTC] RC5-72 #b: Completed CF:6259A240:00000000 (64.00 stats units)
0.00:03:08.57 - [1,457,635,073 keys/s]
[Jul 27 16:32:11 UTC] RC5-72 #b: Loaded CF:6259A2C0:00000000:64*2^32
[Jul 27 16:32:11 UTC] RC5-72: Summary: 2 packets (128.00 stats units)
0.00:03:10.25 - [2,889.65 Mkeys/s]
[Jul 27 16:32:11 UTC] RC5-72: 8 packets (512.00 stats units) remain in
in.r72
Projected ideal time to completion: 0.00:08:32.00
[Jul 27 16:32:11 UTC] RC5-72: 2 packets (128.00 stats units) are in out.r72

A240 is completed and in the outbox along with A200. Packet A2C0 is loaded, A280 still crunching, 8 Packets left in the inbox. Total 12 Packets

Now cycles through to the end, last few actions are:

[Jul 27 16:46:15 UTC] RC5-72: 0 packets remain in in.r72
[Jul 27 16:46:15 UTC] RC5-72: 10 packets (640.00 stats units) are in out.r72

No packets in the inbox, 10 packets completed and in the outbox, two still crunching on cards - total 12 packets

[Jul 27 16:46:31 UTC] RC5-72 #a: Completed CF:6259A480:00000000 (64.00 stats units)
0.00:03:35.78 - [1,273,874,469 keys/s]
[Jul 27 16:46:31 UTC] RC5-72: Summary: 11 packets (704.00 stats units)
0.00:17:30.06 - [2,879.50 Mkeys/s]
[Jul 27 16:46:31 UTC] RC5-72: 0 packets remain in in.r72
[Jul 27 16:46:31 UTC] RC5-72: 11 packets (704.00 stats units) are in out.r72

11th packet (A480) completes, nil packets in inbox, 11 packets in outbox, one still crunching

[Jul 27 16:49:41 UTC] RC5-72 #b: Completed CF:6259A4C0:00000000 (64.00 stats units)
0.00:03:25.79 - [1,335,674,995 keys/s]

Last packet (12th) completes. The programme cycles back to the top for a 13th loop, checks and finds WU only needs 12 loops, and ....
[Jul 27 16:49:41 UTC] Shutdown - packet limit exceeded.
[Jul 27 16:49:41 UTC] RC5-72: Summary: 12 packets (768.00 stats units)
0.00:20:40.65 - [2,658.70 Mkeys/s]
[Jul 27 16:49:41 UTC] RC5-72: 0 packets remain in in.r72

..... exits as all loops needed are complete, and records an "error" as it started to look for a 13th
[Jul 27 16:49:41 UTC] RC5-72: 12 packets (768.00 stats units) are in out.r72
[Jul 27 16:49:41 UTC] *Break* Shutting down...
[Jul 27 16:49:42 UTC] Shutdown complete.
02:49:43 (2544): called boinc_finish

12 Packets now in the outbox, programme break and shutdown, calls boinc_finish and BOINC Client picks up the completed WU

It cycles to try and start an extra one (as designed), discovers it doesn't need it, calls it an error, and closes. WU completed. That finishing process will always happen at the end of each WU - it will try for another loop and test whether that's needed, discover it's not, and break off. No matter the size of the WU, and even if a genuine error would otherwise send it into an infinite loop, the programme stops and exits correctly.
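For anyone who wants to check their own stderr the same way, here is a rough Python sketch. It assumes the log lines look like the ones quoted above (the "Loaded", "remain in" and "Summary" lines); the function name `packet_counts` and the regular expressions are my own and may need adjusting for other client versions.

```python
import re

# Patterns written against the log lines quoted in this thread (assumed format).
LOADED = re.compile(r"RC5-72 #\w+: Loaded ")
REMAIN = re.compile(r"RC5-72: (\d+) packets? (?:\([^)]*\) )?remain")
SUMMARY = re.compile(r"RC5-72: Summary: (\d+) packets?")


def packet_counts(stderr_text):
    """Return (packets issued, packets completed) as read from the log."""
    issued = None
    loaded_so_far = 0
    completed = 0
    for line in stderr_text.splitlines():
        # Packets loaded onto crunchers before the first "remain in" line.
        if issued is None and LOADED.search(line):
            loaded_so_far += 1
        match = REMAIN.search(line)
        if issued is None and match:
            # Issued = already on the crunchers + still in the inbox.
            issued = loaded_so_far + int(match.group(1))
        match = SUMMARY.search(line)
        if match:
            # The last Summary line carries the final completed count.
            completed = int(match.group(1))
    return issued, completed


sample = """\
[Jul 27 16:29:02 UTC] RC5-72 #a: Loaded CF:6259A200:00000000:64*2^32
[Jul 27 16:29:02 UTC] RC5-72 #b: Loaded CF:6259A240:00000000:64*2^32
[Jul 27 16:29:02 UTC] RC5-72: 10 packets (640.00 stats units) remain in
 in.r72
[Jul 27 16:49:41 UTC] Shutdown - packet limit exceeded.
[Jul 27 16:49:41 UTC] RC5-72: Summary: 12 packets (768.00 stats units)
"""
print(packet_counts(sample))  # (12, 12)
```

If the two numbers match, the WU crunched exactly what it was issued, and the "packet limit exceeded" line is just the normal exit.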

If we can establish that bit, we can then move onto why you are getting varied results and try to find solutions.

Regards
Zy
Conan
Joined: 2 May 11
Posts: 53
Credit: 255,675,997
RAC: 5,860
Message 849 - Posted: 28 Jul 2011, 3:40:18 UTC

Thanks Zy, that makes sense.

I will work on the slow processing times on one machine in particular. Doing a defrag now as the poor hard drive was very fragmented and I will defrag the other machine for good measure even though it does not need it.

Conan
Zydor
Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 850 - Posted: 28 Jul 2011, 8:21:27 UTC
Last modified: 28 Jul 2011, 8:23:25 UTC

Okie Doke. For the future if the dosh is available, I very strongly recommend Diskeeper. It is very very good, and widely used on Company & Corporate Networks (always a good sign if those guys trust a utility like this).

You literally do not notice it is there at all, and you will never - ever - have the slightest fragmentation ever again, it's brilliant.

It defrags in the background taking up spare CPU cycles, and it's extremely well behaved, breaking off every time another programme needs CPU service. So much so that it will pick up BOINC requests to the CPU and back off. Remember BOINC itself is designed to back off, yet Diskeeper keeps out of the way of even that logic. Very clever stuff. I recommend getting the Pro Premier version; you can try it on a 30 day trial for free.

It will do other things as well, so it's worth reading the blurb and documentation. BOINC is very heavy on fragmentation, it's always "at it", and Diskeeper is a neat solution to that permanent BOINC issue (that will never go away, it's the nature of the beast, BOINC will always be the cause of heavy fragmentation - unless you are using Diskeeper[!]).

Diskeeper Home Page

Regards
Zy
Teemu Mannermaa
Project administrator
Project developer
Project tester
Joined: 20 Apr 11
Posts: 388
Credit: 822,356,221
RAC: 0
Message 859 - Posted: 31 Jul 2011, 17:20:58 UTC

Hi,

Yep, Zydor explained it correctly. Thanks for that! :)

The message "Shutdown - packet limit exceeded." is indeed normal, since our wrapper process gives the client process an input buffer and asks it to crunch until there are no packets left, at which point it should shut itself down. We do this by setting a "packet limit" setting.

The other important part was the start message, which indeed indicates the packets that remain in the input buffer after the crunchers have loaded work from it. So this remaining number plus the number of crunchers (usually one per processor, or per GPU in this case) should equal the count of packets in a work unit (recorded as the third number from the end of the task name).
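As a quick illustration of that check, here is a small Python sketch. The task name used below is a made-up example, and the helper names `packets_in_task_name` and `start_is_consistent` are hypothetical; the only rule taken from this post is that the packet count is the third number from the end of the task name.

```python
import re


def packets_in_task_name(task_name):
    """Read the packet count: the third number from the end of the name."""
    numbers = re.findall(r"\d+", task_name)
    return int(numbers[-3])


def start_is_consistent(task_name, remaining_in_buffer, crunchers):
    """Check that remaining packets + crunchers equals the issued count."""
    return remaining_in_buffer + crunchers == packets_in_task_name(task_name)


# Hypothetical task name, for illustration only.
name = "dnetc_r72_1311781234_12_768_0"
print(packets_in_task_name(name))         # 12
print(start_is_consistent(name, 10, 2))   # True
```

Here 10 packets remaining in the input buffer plus 2 crunchers gives 12, which matches the count in the example name.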

-w

Copyright © 2011-2024 Moo! Wrapper Project