27 Dec 2011

network hangup

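A few days ago I needed a simple service-testing script for work and wrote one with Python's httplib library. It turned up an amusing problem.
Here is the setup:

  1. The client: the server that runs the script
  2. The server: the server that the script accesses

If the server suddenly crashes, the client script can hang indefinitely (it does not happen 100% of the time).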

Here is what I found on the affected machine:

    [root@lvs1_s ~]# strace -p 18235    # 18235 is the script's PID
    Process 18235 attached - interrupt to quit
    read(8, ^C <unfinished ...>
    Process 18235 detached

The script is stuck in the read system call. What is file descriptor 8? Let's check:

    [root@lvs1_s ~]# lsof -p 18235|grep 8u
    python  18235 root    8u  IPv4 23796661      0t0      TCP 58.xxx.xx.116:36217->58.xxx.xx.85:https (ESTABLISHED)
    python  18235 root   88u  IPv4 23795354      0t0      TCP self:49292->10.176.22.42:https (ESTABLISHED)

As you can see, it is a socket. A packet capture shows no traffic on it at all:

    [root@lvs1_s ~]# tcpdump -i eth1 -n 'ip host 58.xxx.xx.85 and port 443'
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
    ^C
    0 packets captured
    0 packets received by filter
    0 packets dropped by kernel
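
The symptom above, a read() that never returns while no packets appear on the wire, is what a blocking socket without a timeout looks like once the peer goes silent. As a minimal sketch (written for Python 3, unlike the Python 2 listings in this post, and using a local socketpair as a stand-in for the real client and server), setting a timeout turns the endless wait into an exception. The helper name `recv_with_timeout` is made up for illustration.

```python
import socket


def recv_with_timeout(sock, timeout_seconds, nbytes=1024):
    """Return received bytes, or None if nothing arrived within the timeout."""
    sock.settimeout(timeout_seconds)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None


# A connected pair of sockets stands in for the client and the server.
client, server = socket.socketpair()

# The "server" sends nothing: without a timeout this recv would block forever.
silent = recv_with_timeout(client, 0.2)

# Once the "server" answers, the same call returns the data.
server.sendall(b"pong")
answered = recv_with_timeout(client, 0.2)

client.close()
server.close()
```

Here `silent` ends up as None because the wait was cut short, while `answered` holds the bytes that were sent.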

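There are several ways to work around this, for example:

  1. Use a thread
  2. Use a child process
  3. Skip httplib and use socket directly, which gives access to more I/O features

With option 1, a hung thread is a problem in its own right.
Option 2 does not look elegant.
With option 3, I am too lazy to handle the HTTP protocol myself, especially over HTTPS!

Testing showed that only HTTPS has this problem. Here is how to reproduce it:

Client side: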

    import os
    import traceback
    import sys
    import threading
    import httplib
    import ssl

    connection = httplib.HTTPSConnection(host="10.241.31.44", port=8082, timeout=7)
    connection.request("GET", "/", headers={})
    response = connection.getresponse()
    print response.status, response.read()

Server side:

    import os
    import sys
    import traceback
    import socket
    import select
    import time


    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(('0.0.0.0', 8082))
    sock.listen(10)

    while True:
        try:
            s = sock.accept()[0]
            fd = s.fileno()
            fd_list = [fd]

            write_list = select.select([], fd_list, [])[1]
            for fd in write_list:
                # Sleep to make the problem easy to reproduce: unplug the
                # network cable during these 50 seconds and the client hangs.
                time.sleep(50)
                os.close(fd)
                fd_list.remove(fd)
        except:
            traceback.print_exc()
            print 'error!'

        time.sleep(5)

To track down the cause, I read the HTTPSConnection class in httplib and found that it does the equivalent of the following code (which, of course, triggers the same hang):

    import socket
    import ssl

    timeout = 10
    socket.setdefaulttimeout(timeout)

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    sock.connect(("10.241.31.44", 8082))
    ssl_sock = ssl.wrap_socket(sock, None, None)

Clearly the problem is in the ssl library. Digging into it, you find this code:

            # yes, create the SSL object
            self._sslobj = _ssl.sslwrap(self._sock, server_side,
                                        keyfile, certfile,
                                        cert_reqs, ssl_version, ca_certs)
            if do_handshake_on_connect:
                timeout = self.gettimeout()
                try:
                    self.settimeout(None)
                    self.do_handshake()
                finally:
                    self.settimeout(timeout)

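Damn! Because of the do_handshake_on_connect parameter, the timeout passed when instantiating HTTPSConnection is cleared right before do_handshake() runs: settimeout(None) puts the socket into blocking mode, so the handshake can wait forever if the server goes silent.

So it looks like the only option is to override the connect method of the HTTPSConnection class.

The original HTTPSConnection code looks like this: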

    class HTTPSConnection(HTTPConnection):
        "This class allows communication via SSL."

        default_port = HTTPS_PORT

        def __init__(self, host, port=None, key_file=None, cert_file=None,
                     strict=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
            HTTPConnection.__init__(self, host, port, strict, timeout)
            self.key_file = key_file
            self.cert_file = cert_file

        def connect(self):
            "Connect to a host on a given (SSL) port."

            sock = socket.create_connection((self.host, self.port), self.timeout)
            if self._tunnel_host:
                self.sock = sock
                self._tunnel()
            self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)

Overriding the connect method:

    import socket
    import os
    import traceback
    import sys
    import threading
    import httplib
    import ssl

    from ssl import CERT_NONE, PROTOCOL_SSLv23

    class MyHTTPSConnection(httplib.HTTPSConnection):

        def connect(self):
            "Connect to a host on a given (SSL) port."

            sock = socket.create_connection((self.host, self.port), self.timeout)
            if self._tunnel_host:
                self.sock = sock
                self._tunnel()
            # Original call:
            # self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)
            self.sock = ssl.wrap_socket(sock, keyfile=self.key_file,
                            certfile=self.cert_file,
                            server_side=False, cert_reqs=CERT_NONE,
                            ssl_version=PROTOCOL_SSLv23, ca_certs=None,
                            do_handshake_on_connect=False,
                            suppress_ragged_eofs=True)
            # Call do_handshake() while the timeout is still in effect
            self.sock.do_handshake()

    connection = MyHTTPSConnection(host="10.241.31.44", port=8082, timeout=7)
    connection.request("GET", "/", headers={})
    response = connection.getresponse()
    print response.status, response.read()

Ha, with the override in place, MyHTTPSConnection works fine.
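
For comparison, here is a rough Python 3 sketch of the same idea: wrap the socket with do_handshake_on_connect=False and call do_handshake() explicitly so that the socket timeout applies to it. The helper name `handshake_with_timeout` and the local listener that never speaks TLS are assumptions made for the demo. Certificate checks are switched off only to keep it self-contained, and the sketch does not test whether any particular Python version still has the original bug.

```python
import socket
import ssl


def handshake_with_timeout(host, port, timeout_seconds):
    """Connect, then run the TLS handshake while the timeout is still set.

    Returns the wrapped socket, or None if the handshake timed out.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.check_hostname = False        # demo only: no certificate checks
    context.verify_mode = ssl.CERT_NONE
    raw = socket.create_connection((host, port), timeout_seconds)
    # Defer the handshake so it is not run inside wrap_socket().
    tls = context.wrap_socket(raw, do_handshake_on_connect=False)
    try:
        tls.do_handshake()                # subject to the socket timeout
        return tls
    except socket.timeout:
        tls.close()
        return None


# A listener that completes TCP connections but never answers the handshake,
# standing in for a server that went silent.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

result = handshake_with_timeout("127.0.0.1", port, 0.5)
listener.close()
```

With the silent listener, `result` is None after roughly half a second instead of the call hanging.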


15 Oct 2011

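Windows text editors

To use the company VPN and internal services, I have no choice but to use Windows.
Since I sometimes need to edit files on Windows, I had better get familiar with the environment.
First, the line-ending problem:

It is best to make the editor display newline characters: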
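
When no editor is at hand, the same check can be done in a few lines of Python. This is only a sketch; the function name `count_line_endings` is made up for illustration, and it works on raw bytes so that no decoding gets in the way.

```python
def count_line_endings(data):
    """Count CRLF, bare LF and bare CR line endings in a bytes object."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf    # LF not preceded by CR
    cr = data.count(b"\r") - crlf    # CR not followed by LF
    return {"crlf": crlf, "lf": lf, "cr": cr}


# Two Windows-style lines and one Unix-style line.
mixed = b"first\r\nsecond\nthird\r\n"
counts = count_line_endings(mixed)
```

A file saved by a Windows editor would typically show a non-zero "crlf" count; a mix of "crlf" and "lf" usually means the file was edited on both systems.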

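And then there is the wretched BOM!!

What I hate most is an editor adding a BOM; I have often spent ages debugging because of these stupid bytes. Here is a comparison of a file with a BOM and one without: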
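
As a small sketch of the difference at the byte level: a UTF-8 file with a BOM simply starts with three extra bytes, which Python exposes as `codecs.BOM_UTF8`. The helper names and the sample shell-script content below are made up for illustration.

```python
import codecs


def has_utf8_bom(data):
    """Return True if the bytes start with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data):
    """Return the bytes without a leading UTF-8 BOM, if there was one."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


without_bom = b"#!/bin/sh\necho hello\n"
with_bom = codecs.BOM_UTF8 + without_bom

# The BOM is three bytes in front of otherwise identical content.
extra_bytes = with_bom[:len(with_bom) - len(without_bom)]
cleaned = strip_utf8_bom(with_bom)
```

Decoding with the "utf-8-sig" codec is another way to drop the BOM when the file is read as text.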


15 Oct 2011

arp announcement

Once again I forgot that an ARP announcement is just a gratuitous ARP (old blog post: http://jessinio.blogspot.com/2010/12/ethernet.html)

A refresher on what an ARP announcement is for:

  • http://en.wikipedia.org/wiki/Address_Resolution_Protocol#ARP_announcements

Linux kernel 2.6.x uses a dedicated flag (arp_announce) to handle this:

  • http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.arp_problem.html#2_6_arp_announce

By the way, the ipvs documentation (http://www.linuxvirtualserver.org/docs/arp.html) uses the hidden flag, which was removed in the 2.6.x kernel era.
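
To check how a machine is configured, the per-interface ARP settings can be read from /proc. The sketch below assumes the flag in question is arp_announce (as the anchor in the LVS-HOWTO link suggests) and that it lives under /proc/sys/net/ipv4/conf/<interface>/. The function name `read_arp_flag` is made up, and the demo runs against a throwaway directory so it does not depend on the host.

```python
import os
import tempfile


def read_arp_flag(interface, flag="arp_announce",
                  conf_root="/proc/sys/net/ipv4/conf"):
    """Return the integer value of an ARP setting, or None if unavailable."""
    path = os.path.join(conf_root, interface, flag)
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (IOError, OSError, ValueError):
        return None


# Demo against a temporary directory laid out like /proc/sys/net/ipv4/conf.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "eth0"))
with open(os.path.join(demo_root, "eth0", "arp_announce"), "w") as handle:
    handle.write("2\n")

demo_value = read_arp_flag("eth0", conf_root=demo_root)
missing_value = read_arp_flag("eth9", conf_root=demo_root)
```

On a real Linux host, calling `read_arp_flag("all")` with the default conf_root would read the system-wide setting.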


26 Sep 2011

static route rule

CentOS

The contents of this script:

    FILES="/etc/sysconfig/network-scripts/route-$1"
    if [ -n "$2" -a "$2" != "$1" ]; then
        FILES="$FILES /etc/sysconfig/network-scripts/route-$2"
    fi

    for file in $FILES; do
        if [ -f "$file" ]; then
            if egrep -q '^[[:space:]]*ADDRESS[0-9]+=' $file ; then
                # new format
                handle_file $file ${1%:*}
            else
                # older format
                { cat "$file" ; echo ; } | while read line; do
                    if [[ ! "$line" =~ '^[[:space:]]*(\#.*)?$' ]]; then
                        /sbin/ip route add $line
                    fi
                done
            fi
        fi
    done
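
The older-format branch above boils down to this: skip blank lines and comment lines, and hand every other line to /sbin/ip route add. Below is a Python sketch of that logic, useful for previewing what a route file would do without touching the routing table. The function name `route_commands` and the sample routes are hypothetical, and the sketch only builds the commands; it does not run them.

```python
import re

# Blank lines and lines holding only a comment are ignored.
BLANK_OR_COMMENT = re.compile(r"^\s*(#.*)?$")


def route_commands(route_file_text):
    """Return the `ip route add` argument lists for an older-format route file."""
    commands = []
    for line in route_file_text.splitlines():
        if BLANK_OR_COMMENT.match(line):
            continue
        commands.append(["/sbin/ip", "route", "add"] + line.split())
    return commands


# Hypothetical contents of a route-eth0 file in the older format.
sample = (
    "# office network\n"
    "10.0.0.0/8 via 192.168.1.1 dev eth0\n"
    "\n"
    "172.16.0.0/12 via 192.168.1.254\n"
)
commands = route_commands(sample)
```

Each entry in `commands` is an argument list that could be passed to subprocess if actually applying the routes were desired.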

Ubuntu

