Today I looked into a Python program that crashed and left behind a core dump...

I never found the real root cause, but here is a short record of what I tried.

 

1. Use gdb to debug the core dump

Running gdb /usr/bin/python coredump gives the following output:

root@localhost /tmp/ccpp # gdb /usr/bin/python coredump 

Core was generated by `python -u -m /tmp/testd'.
Program terminated with signal 6, Aborted.
#0  0x00007f4ed2c195d7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install python-2.7.5-16.el7.x86_64

 

It looks like the process died from signal 6, which is SIGABRT,

and gdb suggests installing the debug symbols for python.
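
As a quick sanity check on the signal number, here is a minimal sketch that runs a child interpreter, lets it call os.abort(), and maps the negative return code back to a signal name. It assumes a POSIX system and Python 3 (newer than the Python 2.7.5 in this dump), and describe_exit is a hypothetical helper name, not something from the original program:

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, a negative return code means "killed by signal -returncode".
        return signal.Signals(-returncode).name
    return "exit code %d" % returncode


# The child calls abort(), which raises SIGABRT in its own process.
child = subprocess.run(
    [sys.executable, "-c", "import os; os.abort()"],
    stderr=subprocess.DEVNULL,
)
abort_description = describe_exit(child.returncode)
print(abort_description)
```

On Linux this should report SIGABRT, which is the same signal 6 that gdb shows for the core dump.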

 

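2. Install the Python debug symbols

Since gdb says the debug symbols are missing and even hands us the command (debuginfo-install python-2.7.5-16.el7.x86_64), just run it.

If the installation runs into problems, see my other post on installing debug symbols with yum.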

 

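3. Debug the core dump with gdb again

Run gdb /usr/bin/python coredump again.

Oddly, the py-bt command does nothing useful on this dump and does not show the Python-level function calls,

but at least the call stack printed by bt now includes more detailed argument information: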

Core was generated by `python -u -m /tmp/testd'.
Program terminated with signal 6, Aborted.
#0  0x00007f4ed2c195d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56    return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
Missing separate debuginfos, use: debuginfo-install postgresql93-libs-9.3.6-2PGDG.rhel7.x86_64 python-crypto-2.6.1-1.el7.x86_64

(gdb) bt
#0  0x00007f4ed2c195d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f4ed2c1acc8 in __GI_abort () at abort.c:90
#2  0x00007f4ed2c59e07 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f4ed2d628c8 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007f4ed2c5fc67 in malloc_printerr (action=<optimized out>, str=0x7f4ed2d5ffb7 "corrupted double-linked list", ptr=<optimized out>) at malloc.c:4972
#4  0x00007f4ed2c6330c in _int_malloc (av=av@entry=0x7f4ed2f9e760 <main_arena>, bytes=bytes@entry=529) at malloc.c:3667
#5  0x00007f4ed2c639bc in _int_realloc (av=av@entry=0x7f4ed2f9e760 <main_arena>, oldp=oldp@entry=0x25828f0, oldsize=oldsize@entry=464, nb=nb@entry=544)
    at malloc.c:4247
#6  0x00007f4ed2c64702 in __GI___libc_realloc (oldmem=0x2582900, bytes=536) at malloc.c:2998
#7  0x00007f4ed39da2e9 in _PyObject_GC_Resize (op=0xbf65, nitems=49041) at /usr/src/debug/Python-2.7.5/Modules/gcmodule.c:1614
#8  0x00007f4ed39389b5 in PyFrame_New (tstate=<optimized out>, code=0x7f4ed3ce8db0, 
    globals={'_copy_with_copy_method': <s not handle Suites at remote 0x7f4ed3d02cf8>, '_deepcopy_atomic': <s not handle Suites at remote 0x7f4ed3d02e60>, '_reconstruct': <s not handle Suites at remote 0x7f4ed3cfc230>, '_deepcopy_tuple': <s not handle Suites at remote 0x7f4ed3d02f50>, '_deepcopy_dict': <s not handle Suites at remote 0x7f4ed3cfc050>, 'deepcopy': <s not handle Suites at remote 0x7f4ed3d02de8>, 'dispatch_table': {<GeneratedProtocolMessageType(__metaclass__=<me type at remote 0x226a590>, MergeFromString=<s not handle Suites at remote 0x231daa0>, ByteSize=<s not handle Suites at remote 0x231d8c0>, __str__=<s not handle Suites at remote 0x231d758>, SerializeToString=<s not handle Suites at remote 0x231d938>, _SetListener=<s not handle Suites at remote 0x231d848>, SetInParent=<s not handle Suites at remote 0x231dcf8>, _cached_byte_size_dirty=<er_descriptor at remote 0x231b1b8>, TYPE_FIELD_NUMBER=1, HasField=<s not handle Suites at remote 0x231d578>, _Modified=<s not hand...(truncated), locals=0x0)
    at /usr/src/debug/Python-2.7.5/Objects/frameobject.c:728
#9  0x00007f4ed39aba16 in PyEval_EvalCodeEx (co=0xbf65, globals=<unknown at remote 0xbf91>, locals=<unknown at remote 0x6>, args=0xffffffffffffffff, 
    argcount=0, kws=0x27, kwcount=0, defs=0x7f4ed3d7af08, defcount=2, closure=0x0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:3099
#10 0x00007f4ed39aa83f in fast_function (nk=0, na=2, n=2, pp_stack=0x7f4e93ffdef0, func=<s not handle Suites at remote 0x7f4ed3d02de8>)
    at /usr/src/debug/Python-2.7.5/Python/ceval.c:4194
#11 call_function (oparg=<optimized out>, pp_stack=0x7f4e93ffdef0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:4119
#12 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at /usr/src/debug/Python-2.7.5/Python/ceval.c:2740

 

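Reading this against the program's Python source, I noticed a call to copy.deepcopy(),

which lines up with frame #8: the globals there are those of the copy module, and the func in frame #10 sits at 0x7f4ed3d02de8, the same address as the 'deepcopy' entry in those globals.

I am not sure why, but from here PyFrame_New -> _PyObject_GC_Resize goes through realloc all the way down to _int_malloc,

and then frame #3 shows malloc_printerr about to print the string "corrupted double-linked list",

after which abort() is called and the program ends.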
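
To make the "corrupted double-linked list" message a bit more concrete, here is a toy model in Python. It is only a sketch of the idea behind the consistency check that glibc's malloc performs when it takes a free chunk off a doubly linked list (each neighbour must still point back at the chunk); it is not glibc's code, and Chunk, HeapCorruption, insert_after and unlink are all hypothetical names made up for illustration:

```python
class Chunk:
    """A toy stand-in for a free heap chunk with forward/backward links."""

    def __init__(self, name):
        self.name = name
        self.fd = self  # forward link
        self.bk = self  # backward link


class HeapCorruption(Exception):
    """Raised where the real allocator would abort the process."""


def insert_after(head, chunk):
    """Link chunk into the circular list right after head."""
    chunk.fd = head.fd
    chunk.bk = head
    head.fd.bk = chunk
    head.fd = chunk


def unlink(chunk):
    """Remove chunk from the list, verifying its neighbours point back at it."""
    if chunk.fd.bk is not chunk or chunk.bk.fd is not chunk:
        raise HeapCorruption("corrupted double-linked list")
    chunk.fd.bk = chunk.bk
    chunk.bk.fd = chunk.fd


# Build the circular list: bin_head -> b -> a -> bin_head
bin_head = Chunk("bin")
a, b = Chunk("a"), Chunk("b")
insert_after(bin_head, a)
insert_after(bin_head, b)

# Simulate an earlier stray write that clobbers a's backward pointer.
a.bk = Chunk("garbage")

try:
    unlink(b)  # b itself is intact, but its neighbour a no longer points back
    outcome = "ok"
except HeapCorruption as exc:
    outcome = str(exc)
print(outcome)
```

In this model the damage is done when a.bk is overwritten, but the error only surfaces later, when an unrelated unlink of b walks into the broken link.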

 

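My guess is that something the program did earlier corrupted memory, and this deepcopy was just unlucky:

while allocating a new object it ran into the damaged memory (the allocator's own bookkeeping, judging by the message), so malloc() detected the error...

Finding out what did the corrupting from this core dump alone seems difficult, though,

so for now all I can do is watch and see whether it happens again...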
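
One thing that might help while waiting for a recurrence is the faulthandler module, which dumps the Python-level traceback when the process receives SIGABRT (among other fatal signals). The sketch below assumes Python 3.3 or later, where faulthandler is in the standard library; the program here runs on Python 2.7.5, so this would only apply after an upgrade or with a backport, which I have not tried. CHILD_SOURCE and crash_here are hypothetical stand-ins for the real program:

```python
import subprocess
import sys
import textwrap

# Hypothetical stand-in for the real program: enable faulthandler first,
# then crash with abort() from inside a Python function.
CHILD_SOURCE = textwrap.dedent(
    """
    import faulthandler
    import os

    faulthandler.enable()  # dump Python tracebacks on SIGSEGV, SIGABRT, ...

    def crash_here():
        os.abort()

    crash_here()
    """
)

child = subprocess.run(
    [sys.executable, "-c", CHILD_SOURCE],
    stderr=subprocess.PIPE,
    universal_newlines=True,
)
crash_report = child.stderr
print(crash_report)
```

The report names the Python function that was running when the abort happened. That would only identify the victim (like the deepcopy here), not whatever corrupted the memory earlier, but it would at least recover the Python stack that py-bt failed to show.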

 

References:

Stack Overflow: What does 'corrupted double-linked list' mean

 

 


Posted by ephrain on PIXNET