CPU app only - minor issue with checkpointing

Message boards : Number crunching : CPU app only - minor issue with checkpointing

Senilix

Joined: 11 May 11
Posts: 26
Credit: 50,059,517
RAC: 0
Message 2180 - Posted: 10 Jan 2012, 16:29:11 UTC

I just noticed that every CPU-WU that restarts from a checkpoint ends up being marked as invalid, but is granted some credit nonetheless.

Example

It looks like 1 of the 9 packets that make up a CPU-WU is lost at the checkpoint. The WU is marked as invalid, but once a wingman has crunched it, credit is granted for 8 packets.

Zydor

Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 2181 - Posted: 10 Jan 2012, 18:52:11 UTC - in response to Message 2180.  
Last modified: 10 Jan 2012, 18:52:52 UTC

One aspect of this: if you are going to close down BAM for any reason, suspend Moo first (make sure you have "suspended in memory" enabled), then shut down BAM.

This avoids a failed WU when BAM restarts.

There are some outstanding issues with checkpointing, and Teemu is working on another Moo client version. Meanwhile, suspending to memory before closing BAM avoids a large share of the checkpoint failures.

Regards
Zy

Teemu Mannermaa
Project administrator
Project developer
Project tester

Joined: 20 Apr 11
Posts: 388
Credit: 822,356,221
RAC: 0
Message 2224 - Posted: 14 Jan 2012, 9:14:12 UTC

Hi,

Yup, this should be fixed in the new application v1.3, which I'm going to deploy today or tomorrow, depending on how the coding of one last change goes. :)

This was due to an upstream bug we inherited in the wrapper code (and I didn't spot it either). Basically, the wrapper deletes the checkpoint file every time it starts the D.Net Client. That's obviously not a good idea if the last shutdown was abnormal and some of the packets are still in the checkpoint file, waiting to be rescued. :)
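To make the failure mode concrete, here is a minimal Python sketch of the start-up logic described above. It assumes a single checkpoint file in the task's working directory; the file name `client_checkpoint.dat` and the functions `prepare_start_buggy`, `prepare_start_fixed` and `finish_cleanly` are hypothetical and are not taken from the actual wrapper or from the v1.3 change. The point is only the ordering: a checkpoint is safe to delete after the work is finished, not at every start.

```python
import os
import tempfile

# Hypothetical file name; the real wrapper's checkpoint file is not named in this thread.
CHECKPOINT_FILE = "client_checkpoint.dat"


def prepare_start_buggy(workdir: str) -> None:
    """Inherited behaviour as described: wipe the checkpoint on every client start."""
    path = os.path.join(workdir, CHECKPOINT_FILE)
    if os.path.exists(path):
        os.remove(path)  # packets left behind by an abnormal shutdown are lost here


def prepare_start_fixed(workdir: str) -> bool:
    """Leave any existing checkpoint in place so the client can rescue its packets.

    Returns True if a checkpoint was found (i.e. this start is a resume).
    """
    return os.path.exists(os.path.join(workdir, CHECKPOINT_FILE))


def finish_cleanly(workdir: str) -> None:
    """Delete the checkpoint only once all packets are done."""
    path = os.path.join(workdir, CHECKPOINT_FILE)
    if os.path.exists(path):
        os.remove(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Simulate an abnormal shutdown that left an unfinished packet in the checkpoint.
        with open(os.path.join(d, CHECKPOINT_FILE), "w") as f:
            f.write("unfinished packet\n")
        print(prepare_start_fixed(d))  # True: checkpoint kept for recovery
        prepare_start_buggy(d)
        print(os.path.exists(os.path.join(d, CHECKPOINT_FILE)))  # False: packet lost
```

Running it prints `True` and then `False`: the fixed start keeps the leftover packet, while the inherited behaviour throws it away, which matches the "8 of 9 packets" symptom reported at the top of the thread.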

-w

Copyright © 2011-2024 Moo! Wrapper Project