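OpenBSD Kernel PPPoE Patch
A while ago I set up a freshly installed OpenBSD (3.7 Stable) box as my home NAT gateway, using kernel-mode PPPoE to connect to China Telecom's ADSL service. Under heavy traffic the kernel would crash. After several days of debugging I finally tracked down the bug; the problem is now fixed and the system has been running well ever since.
This is the information printed at the time of the crash: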
uvm_fault(0xd0291580, 0xdee80000, 0, 1) -> e
kernel: page fault trap, code = 0
stopped at bcopy+0x1a: repe movsl (%esi), %es:(%edi)
ddb>
After a lot of debugging I found that the problem is in /sys/net/if_spppsubr.c. The relevant code, from sppp_input(struct ifnet *ifp, struct mbuf *m), is:
struct ppp_header *h, ht;
if (sp->pp_flags & PP_NOFRAMING) {
    memcpy(&ht.protocol, mtod(m, void *), 2);
    m_adj(m, 2);
    ht.control = PPP_UI;
    ht.address = PPP_ALLSTATIONS;
    h = &ht;
} else {
    /* Get PPP header. */
    h = mtod (m, struct ppp_header*);
    m_adj (m, PPP_HEADER_LEN);
}
.....
switch (ntohs (h->protocol)) {
default:
    if (sp->state[IDX_LCP] == STATE_OPENED)
        sppp_cp_send (sp, PPP_LCP, PROTO_REJ,
            ++sp->pp_seq, m->m_pkthdr.len + 2,
            &h->protocol);
The sppp_cp_send function contains a bcopy call:
if (len)
    bcopy (data, lh+1, len);
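To make the role of len concrete, the copy step can be modeled in user space. This is only a sketch, not the kernel code: struct lcp_hdr_sketch and cp_build_sketch are hypothetical names, and the header layout (code, identifier, 16-bit big-endian length) is the LCP packet format from RFC 1661. Like the bcopy above, it simply trusts the caller that data really has len readable bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the LCP header that precedes the payload. */
struct lcp_hdr_sketch {
    uint8_t type;
    uint8_t ident;
    uint8_t len_be[2];  /* total length, big-endian, header included */
};

/*
 * Hypothetical user-space model of the copy step in sppp_cp_send().
 * Writes header + payload into out and returns the number of bytes
 * written, or 0 if the result does not fit.
 * It cannot check that `data` really holds `len` bytes: that is the
 * caller's responsibility, exactly as with the kernel's bcopy.
 */
static size_t
cp_build_sketch(uint8_t *out, size_t out_size, uint8_t type, uint8_t ident,
    const void *data, uint16_t len)
{
    struct lcp_hdr_sketch lh;
    size_t total = sizeof(lh) + len;

    assert(out != NULL && (len == 0 || data != NULL));
    if (total > out_size || total > 0xffff)
        return 0;
    lh.type = type;
    lh.ident = ident;
    lh.len_be[0] = (uint8_t)(total >> 8);
    lh.len_be[1] = (uint8_t)(total & 0xff);
    memcpy(out, &lh, sizeof(lh));
    if (len)
        memcpy(out + sizeof(lh), data, len);  /* reads len bytes from data */
    return total;
}
```

If data points at a 2-byte object but len is larger, the final memcpy reads whatever happens to follow that object in memory.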
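When (sp->pp_flags & PP_NOFRAMING) is true, h points to ht (a struct ppp_header local to sppp_input). The subsequent call sppp_cp_send(sp, PPP_LCP, PROTO_REJ, ++sp->pp_seq, m->m_pkthdr.len + 2, &h->protocol) then results in a bad bcopy call, namely:
bcopy(&h->protocol, lh+1, m->m_pkthdr.len + 2)
At this point only 2 bytes are valid starting at &h->protocol (struct ppp_header occupies just 4 bytes, and protocol is its last 2-byte field), while m->m_pkthdr.len + 2 is always >= 2. Whenever the packet still carries any payload, bcopy therefore reads past the end of ht. That is an invalid memory access, and it can crash the kernel.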
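The arithmetic can be checked with a small user-space sketch. struct ppp_header_sketch is a hypothetical copy of the 4-byte layout described above (address, control, protocol), and the two helper functions are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the 4-byte PPP header layout described in the text. */
struct ppp_header_sketch {
    uint8_t  address;
    uint8_t  control;
    uint16_t protocol;
};

static_assert(sizeof(struct ppp_header_sketch) == 4,
    "PPP header is expected to be 4 bytes");

/* Bytes that are actually valid when reading from &ht.protocol. */
static size_t
valid_from_protocol(void)
{
    return sizeof(struct ppp_header_sketch) -
        offsetof(struct ppp_header_sketch, protocol);
}

/*
 * Bytes the copy would read past the end of ht, where pkt_len plays
 * the role of m->m_pkthdr.len after the protocol field was trimmed.
 */
static size_t
overread_bytes(size_t pkt_len)
{
    size_t copy_len = pkt_len + 2;  /* the len passed to sppp_cp_send */
    size_t valid = valid_from_protocol();

    return copy_len > valid ? copy_len - valid : 0;
}
```

In this model the over-read equals the remaining payload length, so only a packet with an empty payload is copied safely.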
Fixing the bug only requires passing 2 instead of m->m_pkthdr.len + 2 when PP_NOFRAMING is set. Here is that simple fix:
Change:
if (sp->state[IDX_LCP] == STATE_OPENED)
    sppp_cp_send (sp, PPP_LCP, PROTO_REJ,
        ++sp->pp_seq, m->m_pkthdr.len + 2,
        &h->protocol);
to:
if (sp->state[IDX_LCP] == STATE_OPENED)
{
    if (sp->pp_flags & PP_NOFRAMING)
        sppp_cp_send (sp, PPP_LCP, PROTO_REJ,
            ++sp->pp_seq, 2,
            &h->protocol);
    else
        sppp_cp_send (sp, PPP_LCP, PROTO_REJ,
            ++sp->pp_seq, m->m_pkthdr.len + 2,
            &h->protocol);
}
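The length choice made by this fix can be isolated in a tiny helper, which also makes its trade-off visible. This is a sketch only: proto_rej_len is a hypothetical name, and PP_NOFRAMING_SKETCH uses a made-up flag value rather than the real PP_NOFRAMING constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PP_NOFRAMING_SKETCH 0x1u  /* hypothetical value, illustration only */

/*
 * Length to pass to sppp_cp_send() for a Protocol-Reject, following
 * the simple fix: with PP_NOFRAMING only the 2-byte protocol copy in
 * ht is valid, otherwise the header still sits in front of the payload
 * in the packet data.
 */
static size_t
proto_rej_len(unsigned int pp_flags, size_t pkt_len)
{
    assert(pkt_len <= SIZE_MAX - 2);
    if (pp_flags & PP_NOFRAMING_SKETCH)
        return 2;
    return pkt_len + 2;
}
```

With PP_NOFRAMING set, the reject then carries only the rejected protocol number and none of the rejected packet's contents. RFC 1661 allows the Rejected-Information field to be empty, so the packet is still well formed, but the peer gets less diagnostic data.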
In addition, I took a look at NetBSD's kernel-level PPPoE implementation, and it appears to have the same problem. The code is likewise in /sys/net/if_spppsubr.c (Revision: 1.85):
u_int16_t prot = htons(protocol);
sppp_cp_send(sp, PPP_LCP, PROTO_REJ,
    ++sp->pp_seq[IDX_LCP], m->m_pkthdr.len + 2,
    &prot);
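However, I don't have a NetBSD test environment, so I can't confirm the problem there. I hope someone who has one can help test it.
I reported the problem to the OpenBSD developers, and Can Erkin Acar provided a source patch. After extended testing on my machine, no further crashes occurred with the patch applied. Can Erkin Acar has committed it to OpenBSD-current (if_spppsubr.c revision 1.36) ^_^. His patch is attached below; I hope you find it useful: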
Index: if_spppsubr.c
===================================================================
RCS file: /cvs/src/sys/net/if_spppsubr.c,v
retrieving revision 1.35
diff -u -p -u -p -r1.35 if_spppsubr.c
--- if_spppsubr.c 3 Aug 2005 21:50:21 -0000 1.35
+++ if_spppsubr.c 9 Aug 2005 06:09:20 -0000
@@ -458,6 +458,7 @@ sppp_input(struct ifnet *ifp, struct mbu
     struct ifqueue *inq = 0;
     struct sppp *sp = (struct sppp *)ifp;
     struct timeval tv;
+    void *prej;
     int debug = ifp->if_flags & IFF_DEBUG;
     int s;
@@ -483,7 +484,8 @@ sppp_input(struct ifnet *ifp, struct mbu
     }
     if (sp->pp_flags & PP_NOFRAMING) {
-        memcpy(&ht.protocol, mtod(m, void *), 2);
+        prej = mtod(m, void *);
+        memcpy(&ht.protocol, prej, sizeof(ht.protocol));
         m_adj(m, 2);
         ht.control = PPP_UI;
         ht.address = PPP_ALLSTATIONS;
@@ -491,6 +493,7 @@ sppp_input(struct ifnet *ifp, struct mbu
     } else {
         /* Get PPP header. */
         h = mtod (m, struct ppp_header*);
+        prej = &h->protocol;
         m_adj (m, PPP_HEADER_LEN);
     }
@@ -511,8 +514,7 @@ sppp_input(struct ifnet *ifp, struct mbu
     default:
         if (sp->state[IDX_LCP] == STATE_OPENED)
             sppp_cp_send (sp, PPP_LCP, PROTO_REJ,
-                ++sp->pp_seq, m->m_pkthdr.len + 2,
-                &h->protocol);
+                ++sp->pp_seq, m->m_pkthdr.len + 2, prej);
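The idea behind this patch is to remember where the protocol field sits inside the packet data before it is trimmed, so that the later copy starts in the packet itself rather than in the 4-byte local header. The user-space sketch below illustrates that under a strong assumption: the whole packet lives in one contiguous buffer, whereas real mbufs can be chained. struct buf_sketch, buf_adj and strip_protocol are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a single contiguous packet buffer. */
struct buf_sketch {
    const uint8_t *data;  /* current start of the data */
    size_t len;           /* bytes remaining from data */
};

/* Drop n bytes from the front; the bytes themselves stay in memory. */
static void
buf_adj(struct buf_sketch *b, size_t n)
{
    assert(n <= b->len);
    b->data += n;
    b->len -= n;
}

/*
 * Mirrors the PP_NOFRAMING path after the patch: save a pointer to the
 * protocol field inside the packet (prej) before trimming it off.
 * proto_out receives the 2-byte protocol copy; prej is returned.
 */
static const void *
strip_protocol(struct buf_sketch *b, uint8_t proto_out[2])
{
    const void *prej;

    assert(b->len >= 2);
    prej = b->data;
    memcpy(proto_out, prej, 2);
    buf_adj(b, 2);
    return prej;
}
```

After strip_protocol, copying b->len + 2 bytes from prej stays inside the original packet: the 2 protocol bytes are followed directly by the payload. That is why the patched call can keep the length m->m_pkthdr.len + 2.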
Related links:
if_spppsubr.c Revision 1.36
http://www.openbsd.org/cgi-bin/cvsweb/src/sys/net/if_spppsubr.c?rev=1.36&content-type=text/x-cvsweb-markup
Bug Report PR# 4305:
From mj2@openbsd.org Sun Jul 24 00:13:50 2005
Message-Id: <35cc614e0507232314354703e5@mail.gmail.com>
Date: Sat, 23 Jul 2005 23:14:52 -0700
From: xiangbo <xiangbo3@gmail.com>
Reply-To: xiangbo <xiangbo3@gmail.com>
To: bugs@openbsd.org
Cc: gnats@openbsd.org
Subject: kernel get crash when using the kernel mode PPPoE
>Number: 4305
>Category: kernel
>Synopsis: kernel get crash when using the kernel mode PPPoE
>Confidential: yes
>Severity: serious
>Priority: medium
>Responsible: bugs
>State: closed
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Jul 24 06:20:02 GMT 2005
>Closed-Date: Fri Aug 12 15:41:18 MDT 2005
>Last-Modified: Fri Aug 12 15:41:18 MDT 2005
>Originator: matthew
>Release: openbsd-3.7-stable 20050717
>Organization:
CNFUG.org
net
>Environment:
System : OpenBSD 3.7
Architecture: OpenBSD.i386
Machine : i386
>Description:
When using kernel-mode PPPoE + PF to provide NAT service for clients, the
kernel crashes within 20 minutes.
>How-To-Repeat:
Compile a kernel without INET6 support, start up a PPPoE connection and PF
NAT service for some clients, then run ftp downloads on these clients.
If the kernel doesn't crash, try running some tasks on the OpenBSD box (for
example: build a kernel or nmap some box).
>Fix:
crash info and trace:
=====================
uvm_fault(0xd0291580, 0xdee80000, 0, 1) -> e
kernel: page fault trap, code = 0
stopped at bcopy+0x1a: repe movsl (%esi), %es:(%edi)
ddb> trace
bcopy(d06bc00,c021,8,1d,5d6,dee89f42,d069d84c,d4a57800) at bcopy+0x1a
sppp_input(d06bc00,d4a57800,1,d069d800,dee89f98) at sppp_input+0x308
pppoeintr(d4a57800,d0681380,44,3c01840a) at pppoeintr+0x841
pppoeintr(27,27,27,27,3c01840a) at pppoeintr+0xd9
Bad frame pointer: 0xdee89fa0
ddb> print $esi
dee89ffe
ddb> print $edi
d4a57de6
ddb> print $ecx
6
ddb> x/ex $esp,20
0xdeeb6ecc: 2 d4920700 d018fd64 deeb6f42 d492072a
.........
ddb> x/i 0xd018fd64
sppp_cp_send + 0x264: addl $0x10,%esp
Finally I found that the function causing the crash is sppp_input(struct ifnet
*ifp, struct mbuf *m) in /sys/net/if_spppsubr.c:
struct ppp_header *h, ht;
if (sp->pp_flags & PP_NOFRAMING) {
    memcpy(&ht.protocol, mtod(m, void *), 2);
    m_adj(m, 2);
    ht.control = PPP_UI;
    ht.address = PPP_ALLSTATIONS;
    h = &ht;
} else {
    /* Get PPP header. */
    h = mtod (m, struct ppp_header*);
    m_adj (m, PPP_HEADER_LEN);
}
.....
switch (ntohs (h->protocol)) {
default:
    if (sp->state[IDX_LCP] == STATE_OPENED)
        sppp_cp_send (sp, PPP_LCP, PROTO_REJ,
            ++sp->pp_seq, m->m_pkthdr.len + 2,
            &h->protocol);
and the bcopy call in function sppp_cp_send(struct sppp *sp, u_short proto,
u_char type, u_char ident, u_short len, void *data):
if (len)
    bcopy (data, lh+1, len);
When (sp->pp_flags & PP_NOFRAMING) is true, h points to ht (a struct
ppp_header), and the following call sppp_cp_send(sp, PPP_LCP, PROTO_REJ,
++sp->pp_seq, m->m_pkthdr.len + 2, &h->protocol) makes a bad bcopy call:
bcopy(&h->protocol, lh+1, m->m_pkthdr.len + 2) /* h->protocol holds only
2 bytes, but (m->m_pkthdr.len + 2) is always >= 2, so this may be an
invalid memory access */
BTW: it seems that NetBSD's /sys/net/if_spppsubr.c also has this problem.
NetBSD's code is (Revision: 1.85):
u_int16_t prot = htons(protocol);
sppp_cp_send(sp, PPP_LCP, PROTO_REJ,
    ++sp->pp_seq[IDX_LCP], m->m_pkthdr.len + 2,
    &prot);
but I don't have an environment in which to prove that.
The following is a simple diff for /sys/net/if_spppsubr.c
-------------------- CUT HERE ------------------------------------------
--- if_spppsubr.c Thu Jul 21 07:51:23 2005
+++ if_spppsubr.c.origin Thu Jul 21 07:49:43 2005
@@ -506,17 +506,10 @@
     }
     switch (ntohs (h->protocol)) {
     default:
-        if (sp->state[IDX_LCP] == STATE_OPENED) {
-            if (sp->pp_flags & PP_NOFRAMING)
-                sppp_cp_send (sp, PPP_LCP, PROTO_REJ,
-                    ++sp->pp_seq, 2,
-                    &h->protocol);
-            else
-                sppp_cp_send (sp, PPP_LCP, PROTO_REJ,
-                    ++sp->pp_seq, m->m_pkthdr.len + 2,
-                    &h->protocol);
-        }
-
+        if (sp->state[IDX_LCP] == STATE_OPENED)
+            sppp_cp_send (sp, PPP_LCP, PROTO_REJ,
+                ++sp->pp_seq, m->m_pkthdr.len + 2,
+                &h->protocol);
         if (debug)
             log(LOG_DEBUG,
                 SPP_FMT "invalid input protocol "
-------------------- CUT HERE ------------------------------------------
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->closed
State-Changed-By: canacar
State-Changed-When: Fri Aug 12 15:39:55 MDT 2005
State-Changed-Why:
Fixed in -current, fix tested by the submitter, thanks.
>Unformatted:
To: gnats@openbsd.org
Subject: kernel get crash when using the kernel mode PPPoE
From: matthew@cnfug.org
Cc:
Reply-To: matthew@cnfug.org
X-sendbug-version: 3.97