Ticket #724 (new defect)
export_generic crashes against Winston 1.3.13
Reported by: | paulf | Owned by: | somebody |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | export_generic | Version: | |
Keywords: | | Cc: | |
Description
I still need to debug this, but captured a stack trace dump here:
(base) [aqms@silverdragon params]$ gdb `which export_generic`
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/aqms/build/earthworm_svn/bin/export_generic...done.
(gdb) run wws/export_gen_wws.d
Starting program: /home/aqms/build/earthworm_svn/bin/export_generic wws/export_gen_wws.d
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
20200330_UTC_18:21:17 /home/aqms/build/earthworm_svn/bin/export_generic(MOD_EXPORT_GENERIC): Read command file <wws/export_gen_wws.d> Version 2.0 2018.05.03
20200330_UTC_18:21:17 /home/aqms/build/earthworm_svn/bin/export_generic(MOD_EXPORT_GENERIC): Waiting for new connection.
20200330_UTC_18:21:17 /home/aqms/build/earthworm_svn/bin/export_generic(MOD_EXPORT_GENERIC): Connection accepted from IP address 127.0.0.1
[New Thread 0x7ffff7106700 (LWP 18374)]
[New Thread 0x7ffff6905700 (LWP 18375)]
[New Thread 0x7ffff6104700 (LWP 18376)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7106700 (LWP 18374)]
0x0000000000404f24 in enqueuering (q=0x649d20 <OutQueue>, x=0x654ff0 "", size=932, userLogo=..., ringKey=-1, seq=0 '\000') at mem_circ_queue.c:270
270 memcpy( pLQE->d, x, size);
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64
(gdb) where
#0 0x0000000000404f24 in enqueuering (q=0x649d20 <OutQueue>, x=0x654ff0 "", size=932, userLogo=..., ringKey=-1, seq=0 '\000') at mem_circ_queue.c:270
#1 0x0000000000404eb3 in enqueue (q=0x649d20 <OutQueue>, x=0x654ff0 "", size=932, userLogo=...) at mem_circ_queue.c:242
#2 0x000000000040394b in MessageStacker (dummy=0x0) at export.c:1079
#3 0x00007ffff78c4e65 in start_thread () from /lib64/libpthread.so.0
#4 0x00007ffff75ed88d in clone () from /lib64/libc.so.6
(gdb)
Change History
comment:2 Changed 11 months ago by quintiliani
Paul,
if a memory allocation problem occurs, the function initqueue() returns a nonzero value.
(ref. http://earthworm.isti.com/trac/earthworm/browser/trunk/src/libsrc/util/mem_circ_queue.c#L44)
Looking at export.c line 436 (http://earthworm.isti.com/trac/earthworm/browser/trunk/src/data_exchange/export/export.c#L434), the function initqueue() is called without checking the return code.
Maybe nobody has noticed the problem until today because small values have always been used, and the necessary resources have always been sufficient.
While this may not be the cause of your problem, in general it would be best to check the function's return value.
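To illustrate the kind of return-value check meant here, below is a minimal, self-contained sketch. It does not use the real Earthworm code: DemoQueue, demo_initqueue() and demo_setup_queue() are hypothetical stand-ins, and the only behavior taken from this ticket is that the init function returns a nonzero value when allocation fails.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the queue type; NOT the Earthworm QUEUE struct. */
typedef struct {
    char  *buf;           /* backing storage for all elements */
    size_t n_elements;    /* number of slots in the queue */
    size_t element_size;  /* maximum size of one element, in bytes */
} DemoQueue;

/* Hypothetical stand-in for initqueue(): returns 0 on success and a
 * nonzero value if the arguments are unusable or the allocation fails. */
int demo_initqueue(DemoQueue *q, size_t n_elements, size_t element_size)
{
    assert(q != NULL);
    q->buf = NULL;
    q->n_elements = 0;
    q->element_size = 0;

    if (n_elements == 0 || element_size == 0)
        return -1;

    q->buf = calloc(n_elements, element_size);
    if (q->buf == NULL)
        return -1;  /* allocation failed */

    q->n_elements = n_elements;
    q->element_size = element_size;
    return 0;
}

/* Caller-side pattern: never use the queue unless the init call succeeded. */
int demo_setup_queue(DemoQueue *q, size_t ring_size, size_t message_size)
{
    int rc = demo_initqueue(q, ring_size, message_size);
    if (rc != 0) {
        fprintf(stderr, "queue init failed (rc=%d) for %zu elements of %zu bytes\n",
                rc, ring_size, message_size);
        return rc;  /* the caller should exit or refuse to start its threads */
    }
    return 0;
}
```

In a module like export_generic the failure branch would presumably log the error and exit rather than continue with an unusable queue, but that choice is an assumption and not something taken from the source.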
Okay, I discovered that I had the export_generic RingSize set to a very large value (300K), and that was what was causing this. The MessageSize setting was 4096, so something in the mem_circ_queue.c code doesn't like large internal rings in the maths used to compute addressing!
When I set the value to 100K, it worked fine.
We need to add a check that prevents this memory overflow, which is what causes the segfault.
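A sketch of what such a check could look like, written as a standalone helper and not as a patch to mem_circ_queue.c. The function name demo_check_queue_size() and the 1 GiB cap DEMO_MAX_QUEUE_BYTES are hypothetical and chosen only for illustration; the real limit, if any, would have to be decided by the maintainers. Assuming "300K" means 300,000 elements, 300,000 x 4096 comes to 1,228,800,000 bytes for the message payloads alone, against 409,600,000 bytes for 100,000 elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap on total payload memory; NOT an Earthworm setting. */
#define DEMO_MAX_QUEUE_BYTES ((size_t)1 << 30)  /* 1 GiB */

/* Validate the configured ring size (element count) and message size
 * before any allocation is attempted.
 * Returns  0 and stores the byte total in *total if the request is acceptable,
 *         -1 if an argument is zero or the product would overflow size_t,
 *         -2 if the product exceeds the (hypothetical) cap. */
int demo_check_queue_size(size_t ring_size, size_t message_size, size_t *total)
{
    assert(total != NULL);

    if (ring_size == 0 || message_size == 0)
        return -1;

    /* Overflow test done by division, so the multiplication never wraps. */
    if (message_size > SIZE_MAX / ring_size)
        return -1;

    *total = ring_size * message_size;
    if (*total > DEMO_MAX_QUEUE_BYTES)
        return -2;

    return 0;
}
```

With these illustrative numbers the 100K configuration passes and the 300K one is rejected, but that only reflects the arbitrary cap chosen here. It does not establish where the real failure in the queue code comes from.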