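Today I installed connlimit from Netfilter patch-o-matic on Linux 2.6.11.10, intending to use it to limit the number of connections each IP can open to a TCP service. Installation went fine, but as soon as connlimit actually started matching, it caused a kernel panic: the moment the number of TCP connections exceeded the --connlimit-above limit, the system crashed. The problem had to be in connlimit itself.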

The rule I used:

iptables -A INPUT -p tcp --syn --dport 22 -m connlimit --connlimit-above 1 -j REJECT

This rule limits each IP to a single connection to port 22. Any connection beyond that is rejected.

Now test it with more than one connection:

Run telnet localhost 22, then open a second terminal and run telnet localhost 22 again. The system crashes immediately, and the console shows something like this:

.....
Code: 43 17 89 d9 8d 04 ........
 <0>Kernel panic - not syncing: Fatal exception in interrupt

A quick Google search turned up the answer:

The problem is in ipt_connlimit.c(line 67):

  found = ip_conntrack_find_get(&conn->tuple,ct);
  if (0 == memcmp(&conn->tuple,&tuple,sizeof(tuple)) &&
    found != NULL && (found_ct = tuplehash_to_ctrack(found)) != NULL &&
    found_ct->proto.tcp.state != TCP_CONNTRACK_TIME_WAIT) {
      /* Just to be sure we have it only once in the list.
         We shouldn't see tuples twice unless someone hooks this
         into a table without "-p tcp --syn" */
      addit = 0;
  }

The problem is that in the usual case "found" is not NULL, but the memcmp()
result is not 0 either. Because && short-circuits, tuplehash_to_ctrack(found)
is then never called, so "found_ct" stays NULL. Later in the function
"found_ct" is dereferenced while it is still NULL, which causes the kernel
panic. The operations need to be reordered to guarantee that whenever
"found" != NULL, tuplehash_to_ctrack() is always run.

Basically it needs to be changed to:

  if (found != NULL && (found_ct = tuplehash_to_ctrack(found)) != NULL &&
    0 == memcmp(&conn->tuple,&tuple,sizeof(tuple)) &&
    found_ct->proto.tcp.state != TCP_CONNTRACK_TIME_WAIT) {
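To make the short-circuit problem concrete, here is a small user-space sketch in C. It is a simplified illustration under stated assumptions, not kernel code. The types `struct tuple`, `struct ctrack` and `struct tuplehash`, the helper `to_ctrack()` (standing in for tuplehash_to_ctrack()), and the functions `check_original()` and `check_reordered()` are all hypothetical. Only the shape of the `if` condition comes from ipt_connlimit.c. Each function returns the value `found_ct` ends up holding after the condition is evaluated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel types; NOT the real
 * ip_conntrack structures. They keep only the fields the condition uses. */
struct tuple {
    unsigned int src_ip, dst_ip;
    unsigned short src_port, dst_port;   /* 12 bytes, no padding, so memcmp is safe here */
};

enum tcp_state { STATE_ESTABLISHED, STATE_TIME_WAIT };

struct ctrack {
    enum tcp_state tcp_state;
};

struct tuplehash {
    struct ctrack *ct;
};

/* Hypothetical stand-in for tuplehash_to_ctrack(). */
static struct ctrack *to_ctrack(struct tuplehash *h)
{
    return h->ct;
}

/* Original order: memcmp() first. */
static struct ctrack *check_original(const struct tuple *stored,
                                     const struct tuple *incoming,
                                     struct tuplehash *found, int *addit)
{
    struct ctrack *found_ct = NULL;

    *addit = 1;
    /* If memcmp() != 0, && stops here and found_ct is never assigned,
     * even when found != NULL. */
    if (0 == memcmp(stored, incoming, sizeof(*incoming)) &&
        found != NULL && (found_ct = to_ctrack(found)) != NULL &&
        found_ct->tcp_state != STATE_TIME_WAIT)
        *addit = 0;
    return found_ct;
}

/* Reordered: found_ct is assigned whenever found != NULL. */
static struct ctrack *check_reordered(const struct tuple *stored,
                                      const struct tuple *incoming,
                                      struct tuplehash *found, int *addit)
{
    struct ctrack *found_ct = NULL;

    *addit = 1;
    if (found != NULL && (found_ct = to_ctrack(found)) != NULL &&
        0 == memcmp(stored, incoming, sizeof(*incoming)) &&
        found_ct->tcp_state != STATE_TIME_WAIT)
        *addit = 0;
    /* Invariant that later code relies on before dereferencing found_ct. */
    assert(found == NULL || found_ct == to_ctrack(found));
    return found_ct;
}
```

Take a `found` that points at a live entry, and an `incoming` tuple that differs from `stored` only in source port (a second connection from the same IP). In that case `check_original()` returns NULL, which is the value the kernel later dereferenced. `check_reordered()` returns the entry.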

So the problem is the evaluation order of the condition. In /usr/src/linux/net/ipv4/netfilter/ipt_connlimit.c, change:

  if (0 == memcmp(&conn->tuple,&tuple,sizeof(tuple)) &&
    found != NULL && (found_ct = tuplehash_to_ctrack(found)) != NULL &&
    found_ct->proto.tcp.state != TCP_CONNTRACK_TIME_WAIT) {

to:

  if (found != NULL && (found_ct = tuplehash_to_ctrack(found)) != NULL &&
    0 == memcmp(&conn->tuple,&tuple,sizeof(tuple)) &&
    found_ct->proto.tcp.state != TCP_CONNTRACK_TIME_WAIT) {

After recompiling the kernel, the panic no longer occurred.