Ticket #666 (new defect)

Opened 3 months ago

Last modified 3 months ago

The Earthworm threading API is invalid on 64-bit Linux and Mac OS X, and possibly other 64-bit systems

Reported by: baker Owned by: somebody
Priority: major Milestone: All Platforms
Component: ALL modules Version: 7.9
Keywords: libsrc util thread_ew threading Cc:

Description

Compiling src/libsrc/unix/threads_ew.c on 64-bit Mac OS X give the warning:

threads_ew.c:251:26: warning: cast to 'pthread_t' (aka 'struct _opaque_pthread_t *') from smaller integer type 'unsigned int' [-Wint-to-pointer-cast]
   return( pthread_kill( (pthread_t) tid, SIGUSR1 ) );
                         ^

This is because the thread ID variable, tid, is a 32-bit int, but pthread_t is a 64-bit pointer.

The Earthworm threading functions are declared in include/earthworm_complex_funcs.h. They all use unsigned (with or without int) for the thread ID data type:

int  WaitThread( unsigned * );              /* threads_ew.c system-dependent */
int  KillThread( unsigned int );            /* threads_ew.c system-dependent */
int  KillSelfThread( void );                /* threads_ew.c system-dependent */
int  StartThread( thr_ret fun(void *), unsigned stack_size, unsigned *thread_id );
int  StartThreadWithArg( thr_ret fun(void *), void *arg, unsigned stack_size, unsigned *thread_id );

On Linux and Mac OS X THIS IS WRONG! (I have no idea whether this is also an error on 64-bit Solaris or Windows systems.)

Linux and Mac OS X use POSIX Threads (pthreads). The POSIX thread ID is a pthread_t type. pthread_t is an opaque type. See the NOTES section on the Linux man page for pthread_self(), http://man7.org/linux/man-pages/man3/pthread_self.3.html:

NOTES

       POSIX.1 allows an implementation wide freedom in choosing the type
       used to represent a thread ID; for example, representation using
       either an arithmetic type or a structure is permitted.  Therefore,
       variables of type pthread_t can't portably be compared using the C
       equality operator (==); use pthread_equal(3) instead.

       Thread identifiers should be considered opaque: any attempt to use a
       thread ID other than in pthreads calls is nonportable and can lead to
       unspecified results.

       Thread IDs are guaranteed to be unique only within a process.  A
       thread ID may be reused after a terminated thread has been joined, or
       a detached thread has terminated.

       The thread ID returned by pthread_self() is not the same thing as the
       kernel thread ID returned by a call to gettid(2).

On Linux, pthread_t is typedef'd as a long int in /usr/include/bits/pthreadtypes.h:

typedef unsigned long int pthread_t;

On Mac OS X, pthread_t is typedef'd as a pointer to an opaque structure in /usr/include/sys/_pthread/_pthread_t.h:

typedef __darwin_pthread_t pthread_t;

and /usr/include/sys/_pthread/_pthread_types.h:

typedef struct _opaque_pthread_t *__darwin_pthread_t;

In neither case is an int (signed or unsigned) an appropriate variable to store the thread ID.

This is a doozy. An int cannot reliably store addresses on a 64-bit system. This should have been caught when Earthworm support for 64-bit systems was added. Actually, it should have been caught when POSIX Threads support was added. Until this is fixed, I recommend not using the 64-bit Earthworm on Linux or Mac OS X. On Solaris and Windows, use 64-bit Earthworm at your own risk.

Change History

comment:1 Changed 3 months ago by baker

The good news is that the only way on Linux and Mac OS X for an invalid thread ID to get used is in KillThread(), which I bet does not get called very often.

I plan to add an assert() in the two thread creation functions in src/libsrc/unix/threads_ew.c, StartThread() and StartThreadWithArg(), to compare the pthread_t value from the pthread_create() call with the unsigned int value returned to the caller, such as:

   *thread_id = (unsigned)tid;

   assert( *thread_id == tid );

   return 0;

My hope is this will catch a pointer value on a 64-bit system that cannot be represented in 32 bits. Whether this will be accepted by the compilers, I'll find out.

comment:2 Changed 3 months ago by baker

From what I see in the Windows version of threads_ew.c, it has the same problem. The thread ID variable returned by the Windows _beginthread() function is an uintptr_t type. That is an integer large enough to hold a pointer value—definitely not an unsigned int on a 64-bit system.

My proposed assert() causes more warnings on Mac OS X, but the generated code does the desired comparison. I'll have a go at adding the same assert() calls to all three variants of threads_ew.c.

After that, I'll think about a solution. Maybe a thread ID data type, ew_thread_t, typedef'd in include/earthworm_complex_funcs.h depending on the _SOLARIS, _LINUX, _MACOSX, and _WINNT compiler preprocessor definitions.

Anyone have a better idea?

comment:3 Changed 3 months ago by paulf

Hi Larry,

Thanks for the catch. I think this hasn't effected anyone because the KillThread?() is probably not used much if at all...I really only see this call being used extensively in the import_* and export_* calls and no-where else.

I think your solution about a ew_thread_t that is typedef'ed works fine for this issue.

Paul

comment:4 Changed 3 months ago by baker

Paul,

I have started on the fix in r7508, r7508, and r7509. I cleaned up the declarations and definitions of the Earthworm threading API. In preparation for the proper fix, I moved the API declarations from include/earthworm_complex_funcs.h to a new file, include/ew_threads.h. I edited all three thread_ew.c implementations to include an assertion that the value being returned will fit in an unsigned int:

   *thread_id = (unsigned) tid;

   assert( *thread_id == tid );		/* Critical on 64-bit systems */

   return 0;

These changes should not affect the behavior of calls to Earthworm threading API functions, except that if the assertion fails, the Earthworm module will fail.

I hope you can test this code to verify my claims. Even if you cannot, I will work on the solution next. There could be many changes in modules that call the Earthworm APIs, depending on what they do with the thread IDs. For example, I saw some code that assumes the thread ID value can be tested like an integer. That is wrong. The code is not even correct on 32-bit systems, because it assumes if the OS threading call fails, the returned thread ID will be zero. For POSIX Threads at least, if pthread_create() fails, the value of the returned thread ID is undefined. It is opaque anyway. Thus, the caller cannot assume anything about what its value might be.

Larry

comment:5 Changed 3 months ago by baker

The assert() definitely fails on 64-bit systems, which validates the bug. Unfortunately, this causes runs of 64-bit Earthworm to fail until this bug is completely fixed. Until then, r7538 changes the assert() to a logit() warning message.

I have fixed this bug in a working copy of the Earthworm SVN. About 100 files changed. Rather than apply all these changes at once, I propose a stepwise approach:

• Alter the Earthworm threading API to permit a NULL *thread_id argument to StartThread() or StartThreadWithArg(). That will eliminate the return of an invalid thread ID when the caller does not even use it. That covers most cases.
• Turn the *thread_id argument into a simple thread_id in WaitThread(), as is done in KillThread().
• Alter the Earthworm threading API to use ew_thread_t type everywhere a thread ID is declared (except inside the O/S-specific code, which uses the native threading API type).

comment:6 Changed 3 months ago by paulf

Thanks for the modification to turn this into a logit() call.

The new approach sounds good to me. Thanks for tackling this one.

Paul

Note: See TracTickets for help on using tickets.