CPU app only - minor issue with checkpointing
Author | Message |
---|---|
Joined: 11 May 11 Posts: 26 Credit: 50,059,517 RAC: 0 |
I just noticed that every CPU WU that restarts from a checkpoint ends up being marked as invalid, although some credit is still granted. (Example) It looks like 1 of the 9 packets that make up a CPU WU is lost during checkpointing. The WU is marked as invalid, but after a wingman has crunched it, credit is granted for 8 packets. |
Joined: 5 May 11 Posts: 233 Credit: 351,414,150 RAC: 0 |
One aspect of this: if you are going to close down BAM for any reason, suspend Moo first (make sure you have "suspended in memory" enabled), then shut down BAM. That avoids a failed WU when BAM restarts. There are some known issues with checkpointing, and Teemu is working on another Moo client version. Meanwhile, suspending to memory before closing BAM avoids a large share of the checkpoint failures. Regards Zy |
Joined: 20 Apr 11 Posts: 388 Credit: 822,356,221 RAC: 0 |
Hi, Yup, this should be fixed in the new application v1.3, which I'm going to deploy today or tomorrow, depending on how the coding of one last change goes. :) This was due to an upstream bug we inherited in the wrapper code (and I didn't spot it either). Basically, the wrapper deletes the checkpoint file every time it starts the D.Net Client. That's obviously not a good idea if the last shutdown was abnormal and some of the packets are still in the checkpoint file, waiting to be rescued. :) -w |
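
For anyone curious what that kind of bug looks like, here is a rough sketch in Python. It is not the actual wrapper code: the function names, the one-packet-per-line checkpoint file and the file name are all made up for illustration. It only contrasts "always delete the checkpoint on start" with "keep it and recover what is inside".

```python
import os
import tempfile


def start_client_buggy(checkpoint_path):
    """Hypothetical startup that mimics the bug: the checkpoint is always deleted."""
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return []  # nothing is left to recover


def start_client_fixed(checkpoint_path):
    """Hypothetical startup that keeps the checkpoint and recovers its packets."""
    if not os.path.exists(checkpoint_path):
        return []
    # Assumed format: one unfinished packet per line.
    with open(checkpoint_path) as f:
        return [line.strip() for line in f if line.strip()]


def demo():
    """Simulate a restart after an abnormal shutdown with one packet checkpointed."""
    results = {}
    starters = (("buggy", start_client_buggy), ("fixed", start_client_fixed))
    for name, starter in starters:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "checkpoint.txt")
            with open(path, "w") as f:
                f.write("packet-7\n")
            results[name] = starter(path)
    return results
```

In this sketch the buggy startup comes back with no packets while the fixed one still has the checkpointed packet, which would match the "1 out of 9 packets lost" symptom reported above.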