Ticket #40 (closed defect: fixed)

Opened 12 years ago

Last modified 12 years ago

stopmodule doesn't work

Reported by: stefan
Owned by: somebody
Priority: major
Milestone: Windows
Component: startstop
Version: 7.4
Keywords:
Cc:


Using the new startstop on Windows, I used 'stopmodule'.

The module was killed correctly, and its status was correctly listed as "Stop". But after about a minute, the module was started back up again.

This appears to be reproducible.

Change History

comment:1 Changed 12 years ago by stefan

Yeah Paul, it's pretty much exactly like you say, I'll add the following to stopmodule ovr:

stopmodule sends a TYPE_STOP message to the first RING listed in startstop_*.d, with
the process ID of the module to stop as the message payload.

startstop sees the TYPE_STOP message and runs StopChild. This should kill the child
if it isn't already marked internally with a "Stopped" status.
If successful, it sets the module's status to "Stopped".

statmgr sees the TYPE_STOP message and sets the module's restart status to STOPPED.
Until statmgr sees a TYPE_RESTART message for a stopped module, it
will not request a restart of the module.

So, I'm not sure why we're seeing the problem we're seeing, where a module gets restarted when it should be stopped. We might have a situation where startstop is waiting and waiting for a module to stop before it marks its status as stopped, while statmgr gets another heartbeat from the module. But as far as I can tell, getting a heartbeat from a module shouldn't erase the STOPPED status, so statmgr shouldn't try to restart it anyway.
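
For illustration, the restart bookkeeping described above can be sketched roughly as follows. This is a toy model in Python, not the actual statmgr code: the class names, the `timeout` parameter, and the `TYPE_HEARTBEAT` constant are assumptions made for the example. It only shows the intended rule that a late heartbeat never clears STOPPED, and that only TYPE_RESTART re-enables restart requests.

```python
from dataclasses import dataclass, field

# Hypothetical message-type names; TYPE_STOP and TYPE_RESTART come from the
# ticket, TYPE_HEARTBEAT is an assumed name for the heartbeat message.
TYPE_HEARTBEAT = "TYPE_HEARTBEAT"
TYPE_STOP = "TYPE_STOP"
TYPE_RESTART = "TYPE_RESTART"


@dataclass
class ModuleState:
    pid: int
    last_heartbeat: float = 0.0
    stopped: bool = False


@dataclass
class StatusMonitor:
    """Toy stand-in for statmgr's per-module bookkeeping (not the real code)."""
    timeout: float  # assumed: seconds without a heartbeat before a restart
    modules: dict = field(default_factory=dict)

    def handle(self, msg_type, pid, now):
        state = self.modules.setdefault(pid, ModuleState(pid, last_heartbeat=now))
        if msg_type == TYPE_HEARTBEAT:
            # A heartbeat refreshes the timer but must not clear STOPPED.
            state.last_heartbeat = now
        elif msg_type == TYPE_STOP:
            state.stopped = True
        elif msg_type == TYPE_RESTART:
            # Only an explicit restart makes the module eligible again.
            state.stopped = False
            state.last_heartbeat = now

    def needs_restart(self, pid, now):
        state = self.modules[pid]
        return (not state.stopped) and (now - state.last_heartbeat > self.timeout)
```

In this model, a monitor that never receives the TYPE_STOP message sees only a module whose heartbeats have ceased, and it requests a restart once the timeout expires, which matches the symptom reported in this ticket.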

Paul Friberg wrote:
> Hmmm, scratch that. I am seeing a module I stopped get suddenly
> restarted on my test system at Caltech.
> Let me confer with Stefan on the whole stop process. I thought
> that a TYPE_STOP message was received by both statmgr and by
> startstop.
> Statmgr was supposed to see a TYPE_STOP message and stop monitoring
> the module till a new heartbeat started for it?
> and Startstop was supposed to see it and kill the module and flag
> it as Stop (not dead).
> Maybe I am confusing the process here. Stefan can clear it up.
> Paul

comment:2 Changed 12 years ago by paulf

So, it has to be that statmgr is not getting the TYPE_STOP message properly. Let me check that on the test system and report back. I'll look in the code and see if statmgr registers this as a log message.

comment:3 Changed 12 years ago by paulf

  • Status changed from new to closed
  • Resolution set to fixed

Okay, this is resolved. copystatus was never modified to transfer TYPE_STOP and TYPE_RESTART messages (it now does, and the change is checked into CVS).
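
A rough sketch of the fix, assuming copystatus forwards messages between rings based on a fixed set of message types. The function, the set names, and the `TYPE_HEARTBEAT` and `TYPE_ERROR` entries are hypothetical and are not taken from the copystatus source. The point is only that a type missing from the forwarding set never reaches a consumer on the destination ring.

```python
# Hypothetical message-type names. TYPE_STOP and TYPE_RESTART come from the
# ticket; TYPE_HEARTBEAT and TYPE_ERROR are assumed for illustration.
TYPE_HEARTBEAT = "TYPE_HEARTBEAT"
TYPE_ERROR = "TYPE_ERROR"
TYPE_STOP = "TYPE_STOP"
TYPE_RESTART = "TYPE_RESTART"

# Assumed forwarding sets before and after the fix.
FORWARDED_BEFORE_FIX = frozenset({TYPE_HEARTBEAT, TYPE_ERROR})
FORWARDED_AFTER_FIX = FORWARDED_BEFORE_FIX | {TYPE_STOP, TYPE_RESTART}


def copy_status(messages, forwarded_types):
    """Return the (msg_type, pid) messages that would be copied to the
    destination ring, keeping their original order."""
    return [msg for msg in messages if msg[0] in forwarded_types]


if __name__ == "__main__":
    ring = [(TYPE_HEARTBEAT, 1234), (TYPE_STOP, 1234), (TYPE_RESTART, 1234)]
    print(copy_status(ring, FORWARDED_BEFORE_FIX))
    print(copy_status(ring, FORWARDED_AFTER_FIX))
```

With the pre-fix set, the TYPE_STOP message is dropped at the ring boundary, so a statmgr reading the destination ring would never mark the module STOPPED.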
