Ticket #260 (closed defect: fixed)

Opened 8 years ago

Last modified 5 years ago

heli_ewII doesn't beat heart at startup (not restarted if failure on startup)

Reported by: paulf Owned by: et
Priority: major Milestone: All Platforms
Component: heli_ewII Version: 7.7
Keywords: Cc:

Description

Philip Crotwell and Branden Christensen have both seen instances at EW startup where the wave server (winston or wsv) is not available and heli_ewII dies immediately because it cannot find a WS. This is probably a heartbeat issue and should be fixed. Statmgr never restarts it, probably because the code doesn't beat its heart before this error, so statmgr never knows that it started or died.

Change History

comment:1 Changed 8 years ago by paulf

Okay, not quite sure what is going on here. The module's code looks like it is beating its heart at least once at startup (before the Wave Server check). This should ensure that statmgr will see it and, if configured properly, restart heli_ewII. Are you all sure that you have statmgr configured to restart the heli_ewII module? Could the problem be that statmgr is starting too late and misses this heartbeat? Just checking.

I guess another option is to try looking for wave server menus more than just one time, and after N tries, then die, beating the heart all the while. This should be an easy fix.
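
The retry idea above can be sketched roughly as follows. This is only an illustration of the approach, not the actual heli_ewII.c code: the callback types, the function name connect_with_heartbeats, and the sim_* stand-ins are all hypothetical, and a real module would plug in its own heartbeat and wave server menu request routines.

```c
/* Hypothetical callbacks; a real module would supply its own
 * heartbeat, wave server menu request and sleep routines. */
typedef void (*heartbeat_fn)(void *ctx);
typedef int  (*connect_fn)(void *ctx);          /* nonzero on success */
typedef void (*sleep_fn)(void *ctx, int seconds);

/* Try to connect up to max_tries times, beating the heart before
 * every attempt so the status manager can see the module is alive.
 * Returns the attempt number that succeeded, or 0 if all failed. */
int connect_with_heartbeats(int max_tries, int wait_sec,
                            heartbeat_fn beat, connect_fn try_connect,
                            sleep_fn nap, void *ctx)
{
    int attempt;

    for (attempt = 1; attempt <= max_tries; attempt++) {
        beat(ctx);                      /* heartbeat precedes each try */
        if (try_connect(ctx))
            return attempt;             /* connected: stop retrying */
        if (attempt < max_tries)
            nap(ctx, wait_sec);         /* wait before the next try */
    }
    return 0;                           /* caller decides whether to exit */
}

/* Simulated environment (assumption, for illustration only):
 * the "server" becomes available on call number succeed_on,
 * or never if succeed_on is 0. */
struct sim {
    int beats;       /* heartbeats issued */
    int calls;       /* connection attempts made */
    int succeed_on;  /* attempt on which the connection succeeds */
    int slept;       /* total seconds "slept" */
};

void sim_beat(void *ctx)
{
    ((struct sim *)ctx)->beats++;
}

int sim_connect(void *ctx)
{
    struct sim *s = (struct sim *)ctx;
    s->calls++;
    return s->succeed_on != 0 && s->calls >= s->succeed_on;
}

void sim_sleep(void *ctx, int seconds)
{
    ((struct sim *)ctx)->slept += seconds;  /* record instead of blocking */
}
```

Passing the heartbeat and connect steps in as callbacks keeps the loop testable without a running wave server. The wait between tries is a free parameter here; nothing in this ticket says what interval the module should use.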

comment:2 Changed 6 years ago by et

  • Owner changed from somebody to et
  • Status changed from new to assigned

comment:3 Changed 6 years ago by et

Added retries (with heartbeats) to initial connection; issue should be fixed now.

heli_ewII.c: Modified to retry initial connection to server(s), up to 10 times; updated version to '1.0.5 2015-03-17'; changed tabs to spaces and fixed up indentation.

--ET

comment:4 Changed 5 years ago by et

  • Status changed from assigned to closed
  • Resolution set to fixed