Ticket #524 (closed defect: fixed)

Opened 6 years ago

Last modified 5 years ago

startstop occasionally hangs on Linux in semop() wait state in tport_getflag()

Reported by: paulf
Owned by: scott
Priority: blocker
Milestone: Linux
Component: startstop
Version: 7.8
Keywords:
Cc:

Description

Symptom: startstop becomes unresponsive to status calls (or any other messages from the command line).

This might be a race condition around semop(), as described in this Stack Overflow question:

http://stackoverflow.com/questions/9579158/semop-failed-with-errno-4-dose-semop-support-threads-race-inside-a-proces
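The linked question is about semop() failing with errno 4, which is EINTR on Linux: the call was interrupted by a signal before the semaphore operation completed. A common defensive pattern is to repeat the call when that happens. The sketch below is only an illustration of that pattern. The helper name semop_retry is hypothetical and is not taken from Earthworm's transport.c, and it is not known from this ticket whether EINTR handling is the actual cause of the hang.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/*
 * Hypothetical helper (an assumption, not Earthworm code).
 * Calls semop() and repeats the call for as long as it fails with
 * EINTR, i.e. it was interrupted by a signal rather than rejected.
 * Returns 0 on success, or -1 with errno set for any other failure.
 */
static int semop_retry(int semid, struct sembuf *sops, size_t nsops)
{
    int rc;

    assert(sops != NULL && nsops > 0);

    do {
        rc = semop(semid, sops, nsops);
    } while (rc == -1 && errno == EINTR);

    return rc;
}
```

A caller would use semop_retry(semid, &op, 1) wherever it currently calls semop() directly and treats every -1 return as fatal. Retrying does not help if the semaphore is never released, so it addresses interrupted calls only, not a lost wakeup.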

Observed twice on a Red Hat Linux system. Below is a gdb session attached to the hung (but still running) startstop process by process ID (the -p option).

{aqms@plume:params} gdb `which startstop` -p 19008

GNU gdb (GDB) Red Hat Enterprise Linux (7.2-75.el6)

Copyright (C) 2010 Free Software Foundation, Inc.

License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software: you are free to change and redistribute it.

There is NO WARRANTY, to the extent permitted by law.  Type "show copying"

and "show warranty" for details.

This GDB was configured as "x86_64-redhat-linux-gnu".

For bug reporting instructions, please see:

<http://www.gnu.org/software/gdb/bugs/>...

Reading symbols from /app/proj/earthworm/earthworm_svn/bin/startstop...done.

Attaching to program: /app/proj/earthworm/earthworm_svn/bin/startstop, process 19008

Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.

[Thread debugging using libthread_db enabled]

Loaded symbols for /lib/libpthread.so.0

Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.

Loaded symbols for /lib/libc.so.6

Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.

Loaded symbols for /lib/ld-linux.so.2

Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.

Loaded symbols for /lib/libnss_files.so.2

Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.

Loaded symbols for /lib/libgcc_s.so.1

0x00253430 in __kernel_vsyscall ()

Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.149.el6_6.5.i686 libgcc-4.4.7-11.el6.i686

(gdb) where

#0  0x00253430 in __kernel_vsyscall ()

#1  0x00694bdb in semop () from /lib/libc.so.6

#2  0x08054cf3 in tport_doFlagOp (region=0x0, pid=19008, op=4) at transport.c:972

#3  0x0805519a in tport_getflag (region=0x0) at transport.c:1153

#4  0x0804a634 in RunEarthworm (argc=1, argv=0xff80c384) at startstop_unix_generic.c:366

#5  0x08049c8f in main ()

(gdb) quit

A debugging session is active.



Inferior 1 [process 19008] will be detached.



Quit anyway? (y or n) y

Detaching from program: /app/proj/earthworm/earthworm_svn/bin/startstop, process 19008

{aqms@plume:params} 

Note that after gdb detaches from the process, the startstop in question exits and leaves all of its child processes running unattended.
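The backtrace shows startstop blocked inside semop(), which waits indefinitely if the semaphore is never released. One way to keep a caller responsive in that situation is a bounded wait with semtimedop(), a Linux/glibc extension. The following is a minimal sketch under that assumption. The helper name semop_bounded and its return convention are hypothetical, and this is not presented as the change that went into startstop 7.9a.

```c
#define _GNU_SOURCE             /* semtimedop() is a Linux/glibc extension */
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <time.h>

/*
 * Hypothetical helper (an assumption, not Earthworm code).
 * Performs the semaphore operation but gives up after timeout_ms
 * milliseconds instead of blocking forever.
 * Returns  0 if the operation completed,
 *          1 if it could not complete in time (errno == EAGAIN; this is
 *            also what an IPC_NOWAIT operation reports when it would block),
 *         -1 on any other error, with errno set.
 * An EINTR interruption restarts the wait with the full timeout.
 */
static int semop_bounded(int semid, struct sembuf *sops, size_t nsops,
                         long timeout_ms)
{
    struct timespec ts;

    ts.tv_sec  = timeout_ms / 1000;
    ts.tv_nsec = (timeout_ms % 1000) * 1000000L;

    for (;;) {
        if (semtimedop(semid, sops, nsops, &ts) == 0)
            return 0;
        if (errno == EAGAIN)
            return 1;
        if (errno != EINTR)
            return -1;
    }
}
```

With a wrapper like this, a caller could log a warning and continue its main loop when the return value is 1, rather than becoming unresponsive to status requests.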

Change History

comment:1 Changed 5 years ago by paulf

  • Status changed from new to closed
  • Resolution set to fixed

Okay, closing this one as we believe we have this case licked with the 7.9a version of startstop for Unix systems.
