Ticket #724 (new defect)

Opened 11 months ago

Last modified 11 months ago

export_generic crashes against Winston 1.3.13

Reported by: paulf Owned by: somebody
Priority: major Milestone:
Component: export_generic Version:
Keywords: Cc:


I still need to debug this, but captured a stack trace dump here:

(base) [aqms@silverdragon params]$ gdb `which export_generic`
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
Reading symbols from /home/aqms/build/earthworm_svn/bin/export_generic...done.
(gdb) run wws/export_gen_wws.d
Starting program: /home/aqms/build/earthworm_svn/bin/export_generic wws/export_gen_wws.d
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
20200330_UTC_18:21:17 /home/aqms/build/earthworm_svn/bin/export_generic(MOD_EXPORT_GENERIC): Read command file <wws/export_gen_wws.d>
 Version 2.0 2018.05.03
20200330_UTC_18:21:17 /home/aqms/build/earthworm_svn/bin/export_generic(MOD_EXPORT_GENERIC): Waiting for new connection.
20200330_UTC_18:21:17 /home/aqms/build/earthworm_svn/bin/export_generic(MOD_EXPORT_GENERIC): Connection accepted from IP address
[New Thread 0x7ffff7106700 (LWP 18374)]
[New Thread 0x7ffff6905700 (LWP 18375)]
[New Thread 0x7ffff6104700 (LWP 18376)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7106700 (LWP 18374)]
0x0000000000404f24 in enqueuering (q=0x649d20 <OutQueue>, x=0x654ff0 "", size=932, userLogo=..., ringKey=-1, seq=0 '\000')
    at mem_circ_queue.c:270
270	  memcpy( pLQE->d, x, size);
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64
(gdb) where
#0  0x0000000000404f24 in enqueuering (q=0x649d20 <OutQueue>, x=0x654ff0 "", size=932, userLogo=..., ringKey=-1, seq=0 '\000')
    at mem_circ_queue.c:270
#1  0x0000000000404eb3 in enqueue (q=0x649d20 <OutQueue>, x=0x654ff0 "", size=932, userLogo=...) at mem_circ_queue.c:242
#2  0x000000000040394b in MessageStacker (dummy=0x0) at export.c:1079
#3  0x00007ffff78c4e65 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ffff75ed88d in clone () from /lib64/libc.so.6

Change History

comment:1 Changed 11 months ago by paulf

Okay, I discovered that I had the export_generic RingSize set to a very large value (300K), and that was what caused this. The MessageSize setting was 4096, so something in the mem_circ_queue.c code doesn't like large internal rings in the arithmetic used to compute addressing.

When I set the value to 100K, it worked fine.

We need to add a check that prevents this memory overflow, which causes the segfault.
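
For scale: assuming "300K" means 300,000 queue entries and that each entry reserves the full 4096-byte MessageSize (neither is confirmed in this ticket), the payload buffers alone need about 1.2 GB, versus about 410 MB at 100K. A minimal sketch of an overflow-checked size computation follows; queue_bytes_needed() is a hypothetical helper, not an existing Earthworm function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of mem_circ_queue.c.
 * Stores n_elems * elem_size in *total.
 * Returns 0 on success, -1 if an argument is invalid or the
 * product would not fit in a size_t. */
int queue_bytes_needed(size_t n_elems, size_t elem_size, size_t *total)
{
    if (total == NULL || n_elems == 0 || elem_size == 0)
        return -1;
    if (n_elems > SIZE_MAX / elem_size)   /* multiplication would wrap */
        return -1;
    *total = n_elems * elem_size;
    return 0;
}
```

A caller could compare the result against a configured ceiling and refuse to start, rather than finding out at the first memcpy().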

comment:2 Changed 11 months ago by quintiliani


If a memory allocation problem occurs, the function initqueue() returns a nonzero value.

(ref. http://earthworm.isti.com/trac/earthworm/browser/trunk/src/libsrc/util/mem_circ_queue.c#L44)

Looking at export.c line 436 (http://earthworm.isti.com/trac/earthworm/browser/trunk/src/data_exchange/export/export.c#L434), the function initqueue() is called without checking the return code.

Maybe nobody has noticed the problem until today because small values have always been used, and the necessary resources have always been sufficient.

While this may not be the cause of your problem, in general it would be best to check the function's return value.
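
The caller-side pattern could look roughly like the sketch below. It is self-contained and uses hypothetical stand-ins (demo_queue, demo_initqueue(), demo_startup()) rather than the real QUEUE type or the real initqueue() signature; the only property borrowed from the description above is "nonzero means failure".

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified queue: one contiguous payload buffer. */
typedef struct {
    char  *buf;
    size_t n_elems;
    size_t elem_size;
} demo_queue;

/* Hypothetical initializer: returns 0 on success, nonzero on failure. */
int demo_initqueue(demo_queue *q, size_t n_elems, size_t elem_size)
{
    if (q == NULL || n_elems == 0 || elem_size == 0)
        return -1;
    if (n_elems > (size_t)-1 / elem_size)   /* product would overflow */
        return -1;
    q->buf = malloc(n_elems * elem_size);
    if (q->buf == NULL)
        return -1;
    q->n_elems = n_elems;
    q->elem_size = elem_size;
    return 0;
}

/* Caller-side check: report and bail out if the queue was not built. */
int demo_startup(demo_queue *q, size_t ring_size, size_t max_msg_size)
{
    int rc = demo_initqueue(q, ring_size, max_msg_size);
    if (rc != 0) {
        fprintf(stderr, "queue init failed (rc=%d) for %lu x %lu bytes\n",
                rc, (unsigned long)ring_size, (unsigned long)max_msg_size);
        return rc;
    }
    return 0;
}
```

In a real module the failure branch would log through the module's own logging routine and exit before any worker threads are started.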

comment:3 Changed 11 months ago by paulf

Thanks Matteo.

So, I modified export.c to check the initqueue() return value in r8173 and tested it just now, but the module did not stop on startup. I think the circular queue code itself still needs better logic checks to detect this case.
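
As a rough idea of what such checks might look like, here is a sketch of an enqueue that validates the slot pointer and the message size before the memcpy(), the call where the trace above faulted. The types and names (demo_entry, demo_ring, demo_enqueue()) are hypothetical and do not mirror the real structures in mem_circ_queue.c; the overwrite-oldest policy of a real ring is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified queue entry and ring. */
typedef struct {
    char  *d;        /* payload buffer; NULL if it was never allocated */
    size_t length;   /* bytes currently stored */
} demo_entry;

typedef struct {
    demo_entry *entries;
    size_t      n_entries;
    size_t      elem_max;   /* capacity of each payload buffer */
    size_t      next;       /* index of the next slot to write */
} demo_ring;

/* Returns 0 on success, -1 for bad arguments or an unusable slot,
 * -2 if the message is larger than one slot. */
int demo_enqueue(demo_ring *q, const char *x, size_t size)
{
    demo_entry *e;

    if (q == NULL || x == NULL || q->entries == NULL || q->n_entries == 0)
        return -1;
    if (size > q->elem_max)
        return -2;

    e = &q->entries[q->next];
    if (e->d == NULL)           /* slot was never allocated: refuse to copy */
        return -1;

    memcpy(e->d, x, size);
    e->length = size;
    q->next = (q->next + 1) % q->n_entries;
    return 0;
}
```

With checks like these, a bad slot turns into an error code that the stacker thread can log, instead of a SIGSEGV.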
