Something about the OOM killer

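We run into the OOM killer all the time, but what does it actually do internally? When does it kick in? Which PID does it pick, and on what basis? I want to dig into these questions in some detail. Below is the kernel log from an OOM kill on a production machine: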

[Thu Dec 7 00:29:05 2023] dnf cpuset=kubeNodeAgent.service mems_allowed=0-1
[Thu Dec 7 00:29:05 2023] CPU: 32 PID: 95251 Comm: dnf Kdump: loaded Tainted: GF W OE K 4.19.91-xxx.xxxx.x86_64 #1
[Thu Dec 7 00:29:05 2023] Hardware name: xxxxxx ECS/xxxx BIOS 3.3.44 01/08/2021
[Thu Dec 7 00:29:05 2023] Call Trace:
[Thu Dec 7 00:29:05 2023] dump_stack+0x66/0x8b
[Thu Dec 7 00:29:05 2023] dump_memcg_header+0x12/0x40
[Thu Dec 7 00:29:05 2023] oom_kill_process+0x201/0x2f0
[Thu Dec 7 00:29:05 2023] out_of_memory+0x12f/0x510
[Thu Dec 7 00:29:05 2023] mem_cgroup_out_of_memory+0xdd/0x100
[Thu Dec 7 00:29:05 2023] try_charge+0x847/0x870
[Thu Dec 7 00:29:05 2023] ? __ext4_journal_get_write_access+0x36/0x70 [ext4]
[Thu Dec 7 00:29:05 2023] mem_cgroup_charge+0xe2/0x220
[Thu Dec 7 00:29:05 2023] ? ext4_mark_iloc_dirty+0x5e/0x80 [ext4]
[Thu Dec 7 00:29:05 2023] __add_to_page_cache_locked+0x5f/0x220
[Thu Dec 7 00:29:05 2023] add_to_page_cache_lru+0x4a/0xc0
[Thu Dec 7 00:29:05 2023] pagecache_get_page+0xfc/0x310
[Thu Dec 7 00:29:05 2023] grab_cache_page_write_begin+0x1f/0x40
[Thu Dec 7 00:29:05 2023] ext4_da_write_begin+0xdc/0x490 [ext4]
[Thu Dec 7 00:29:05 2023] generic_perform_write+0xba/0x1b0
[Thu Dec 7 00:29:05 2023] ext4_buffered_write_iter+0x94/0x120 [ext4]
[Thu Dec 7 00:29:06 2023] ext4_file_write_iter+0x6c/0x6d0 [ext4]
[Thu Dec 7 00:29:06 2023] ? try_to_release_page+0x60/0x60
[Thu Dec 7 00:29:06 2023] new_sync_write+0xeb/0x150
[Thu Dec 7 00:29:06 2023] vfs_write+0xb0/0x190
[Thu Dec 7 00:29:06 2023] ksys_write+0x5a/0xd0
[Thu Dec 7 00:29:06 2023] ? get_vtime_delta+0x13/0xb0
[Thu Dec 7 00:29:06 2023] do_syscall_64+0x7b/0x200
[Thu Dec 7 00:29:06 2023] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Thu Dec 7 00:29:06 2023] RIP: 0033:0x7f5f528376fd
[Thu Dec 7 00:29:06 2023] Code: cd 20 00 00 75 10 b8 01 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 4e fd ff ff 48 89 04 24 b8 01 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 97 fd ff ff 48 89 d0 48 83 c4 08 48 3d 01
[Thu Dec 7 00:29:06 2023] RSP: 002b:00007ffdfef524b0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[Thu Dec 7 00:29:06 2023] RAX: ffffffffffffffda RBX: 0000000000008000 RCX: 00007f5f528376fd
[Thu Dec 7 00:29:06 2023] RDX: 0000000000008000 RSI: 00007ffdfef52550 RDI: 000000000000001c
[Thu Dec 7 00:29:06 2023] RBP: 0000000000fbc600 R08: 00000000968d04df R09: 000000006570a150
[Thu Dec 7 00:29:06 2023] R10: 00007ffdfef524a0 R11: 0000000000000293 R12: 0000000000923be0
[Thu Dec 7 00:29:06 2023] R13: 00007ffdfef52550 R14: 00007f5f417b53a0 R15: 0000000000008000
[Thu Dec 7 00:29:06 2023] Task in /infra.slice/kubeNodeAgent.service killed as a result of limit of /infra.slice/kubeNodeAgent.service
[Thu Dec 7 00:29:06 2023] memory: usage 1048576kB, limit 1048576kB, failcnt 77296
[Thu Dec 7 00:29:06 2023] memory+swap: usage 1048576kB, limit 9007199254740988kB, failcnt 0
[Thu Dec 7 00:29:06 2023] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[Thu Dec 7 00:29:06 2023] Memory cgroup stats for /infra.slice/kubeNodeAgent.service: cache:1188KB rss:1052304KB rss_huge:0KB shmem:0KB mapped_file:396KB dirty:1584KB writeback:0KB swap:0KB workingset_refault_anon:0KB workingset_refault_file:396528KB workingset_activate_anon:0KB workingset_activate_file:15840KB workingset_restore_anon:0KB workingset_restore_file:4092KB workingset_nodereclaim:0KB inactive_anon:1051908KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[Thu Dec 7 00:29:06 2023] Tasks state (memory values in pages):
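The two summary lines near the end of the log (`memory: usage ...` and `Memory cgroup stats for ...`) already say a lot about why this OOM happened. Below is a small Python sketch that pulls the numbers out of them. The input strings are copied from this log (the stats line is trimmed to the fields used); the helper names `parse_usage` and `parse_stats` are only illustrative.

```python
from __future__ import annotations

import re

# Inputs copied from the log above; the stats line is trimmed to the fields used here.
USAGE_LINE = "memory: usage 1048576kB, limit 1048576kB, failcnt 77296"
STATS_LINE = (
    "Memory cgroup stats for /infra.slice/kubeNodeAgent.service: "
    "cache:1188KB rss:1052304KB shmem:0KB swap:0KB "
    "inactive_anon:1051908KB active_anon:0KB inactive_file:0KB active_file:0KB"
)


def parse_usage(line: str) -> tuple[int, int, int]:
    """Return (usage_kb, limit_kb, failcnt) from a 'memory: usage ...' line."""
    m = re.search(r"usage (\d+)kB, limit (\d+)kB, failcnt (\d+)", line)
    if m is None:
        raise ValueError(f"not a memcg usage line: {line!r}")
    usage, limit, fails = (int(g) for g in m.groups())
    return usage, limit, fails


def parse_stats(line: str) -> dict[str, int]:
    """Return {counter: kB} for every 'name:NNNKB' pair in a memcg stats line."""
    return {name: int(kb) for name, kb in re.findall(r"(\w+):(\d+)KB", line)}


usage_kb, limit_kb, failcnt = parse_usage(USAGE_LINE)
stats = parse_stats(STATS_LINE)
at_limit = usage_kb >= limit_kb
file_kb = stats["inactive_file"] + stats["active_file"]  # page cache reclaim could drop
anon_kb = stats["inactive_anon"] + stats["active_anon"]  # only reclaimable by swapping

print(f"usage {usage_kb} kB / limit {limit_kb} kB, failcnt {failcnt}, at limit: {at_limit}")
# usage 1048576 kB / limit 1048576 kB, failcnt 77296, at limit: True
print(f"file LRU {file_kb} kB, anon LRU {anon_kb} kB, swap {stats['swap']} kB")
# file LRU 0 kB, anon LRU 1051908 kB, swap 0 kB
```

Read this way, the cgroup sits exactly at its 1 GiB limit (in cgroup v1, `failcnt` counts how many times a charge hit the limit), the file LRU counters are at zero with only about 1 MB of page cache, and no swap is in use, so reclaim has little or nothing it can free. That fits the call trace: a buffered `write()` from `dnf` needed a new page-cache page, the charge in `try_charge` could not be satisfied, and the kernel fell through to `mem_cgroup_out_of_memory`.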


When does the OOM killer get triggered?
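For the memory-cgroup case, the call trace in the log already answers part of this: the `write()` path tried to charge a new page-cache page to the cgroup (`mem_cgroup_charge` → `try_charge`), and when that charge could not be satisfied it called `mem_cgroup_out_of_memory` → `out_of_memory` → `oom_kill_process`. The toy model below only illustrates that shape: charge against a limit, retry reclaim a few times, and fall back to the OOM killer when reclaim cannot make room. `ToyMemcg`, `MAX_RECLAIM_RETRIES` and all page counts are hypothetical; the real `try_charge` also drains per-CPU charge caches, honors GFP flags and handles fatal signals, none of which is modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAX_RECLAIM_RETRIES = 5  # hypothetical retry budget for this toy model


@dataclass
class ToyMemcg:
    """Toy memory cgroup, sizes in pages. Illustrative only, not kernel code."""
    limit: int
    usage: int = 0
    reclaimable: int = 0  # part of `usage` that reclaim is able to free
    oom_events: list[int] = field(default_factory=list)

    def reclaim(self, want: int) -> int:
        """Free up to `want` reclaimable pages and return how many were freed."""
        freed = min(want, self.reclaimable)
        self.reclaimable -= freed
        self.usage -= freed
        return freed

    def try_charge(self, nr_pages: int) -> bool:
        """Charge nr_pages; reclaim while over the limit; report OOM when retries run out."""
        retries = MAX_RECLAIM_RETRIES
        while self.usage + nr_pages > self.limit:
            if retries == 0:
                # In the real kernel this is roughly where the charge path
                # ends up in mem_cgroup_out_of_memory().
                self.oom_events.append(nr_pages)
                return False
            retries -= 1
            self.reclaim(self.usage + nr_pages - self.limit)
        self.usage += nr_pages
        return True


# Log-like case, assuming nothing is reclaimable: 1 GiB limit = 262144 pages of 4 KiB.
stuck = ToyMemcg(limit=262144, usage=262144, reclaimable=0)
print(stuck.try_charge(1), stuck.oom_events)  # False [1]

# Same limit, but with 100 pages of clean page cache that reclaim can drop.
roomy = ToyMemcg(limit=262144, usage=262144, reclaimable=100)
print(roomy.try_charge(8), roomy.usage)  # True 262144
```

The point of the second case is that hitting the limit alone is not enough: the OOM path is only reached once reclaim stops making room for the charge.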

How does the OOM killer choose which process to kill?
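The `Tasks state (memory values in pages)` table, whose rows are cut off in the log above, lists per-task resident pages, page-table bytes, swap entries and `oom_score_adj`, which are essentially the inputs of the victim score. For a memcg OOM like this one, only tasks inside the offending cgroup are candidates. Below is a simplified Python model written from my reading of `oom_badness()` in `mm/oom_kill.c` around v4.19: score = resident pages + swap entries + page-table pages, plus `oom_score_adj` scaled by `totalpages / 1000`; `oom_score_adj = -1000` exempts a task, and the highest score is chosen. It skips the unkillable-task, `MMF_OOM_SKIP`, vfork and in-progress-victim checks. Assumptions: 4 KiB pages, `totalpages` equal to the memcg limit in pages (262144 for 1 GiB, no swap), and the task names and numbers are made up.

```python
from __future__ import annotations

from dataclasses import dataclass

PAGE_SIZE = 4096            # assumption: 4 KiB pages (x86_64 default)
OOM_SCORE_ADJ_MIN = -1000   # oom_score_adj value that exempts a task


@dataclass
class Task:
    """Hypothetical task record; fields mirror the 'Tasks state' columns."""
    pid: int
    name: str
    rss: int              # resident pages
    swapents: int         # swapped-out pages
    pgtables_bytes: int   # page-table memory in bytes
    oom_score_adj: int = 0


def badness(task: Task, totalpages: int) -> int:
    """Simplified model of oom_badness() (~v4.19); 0 means 'never pick'."""
    if task.oom_score_adj == OOM_SCORE_ADJ_MIN:
        return 0
    points = task.rss + task.swapents + task.pgtables_bytes // PAGE_SIZE
    # oom_score_adj is expressed in 1/1000ths of the memory available to this OOM context.
    points += task.oom_score_adj * (totalpages // 1000)
    return points if points > 0 else 1  # an eligible task never scores 0


def select_victim(tasks: list[Task], totalpages: int) -> Task | None:
    """Pick the task with the highest non-zero score (ties keep the first seen)."""
    best, best_points = None, 0
    for t in tasks:
        p = badness(t, totalpages)
        if p > best_points:
            best, best_points = t, p
    return best


TOTALPAGES = 262144  # assumption: 1 GiB memcg limit in 4 KiB pages, no swap
tasks = [
    Task(1001, "agent", rss=150_000, swapents=0, pgtables_bytes=1_000_000),
    Task(1002, "dnf", rss=30_000, swapents=0, pgtables_bytes=200_000),
    Task(1003, "protected", rss=60_000, swapents=0, pgtables_bytes=0, oom_score_adj=-1000),
    Task(1004, "sidecar", rss=10_000, swapents=0, pgtables_bytes=40_960, oom_score_adj=1000),
]
for t in tasks:
    print(t.name, badness(t, TOTALPAGES))
print("victim:", select_victim(tasks, TOTALPAGES).name)
```

With these made-up numbers `agent` scores 150244, `dnf` 30048, `protected` 0 and `sidecar` 272010, so `sidecar` is picked even though it uses the least memory: at `oom_score_adj = 1000` the bonus (1000 × 262 pages) outweighs everyone's actual footprint. That is why `/proc/<pid>/oom_score_adj` is the usual knob for steering victim selection.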