Automatic Worker Adjustment

1. Feature Description

By dynamically loading a module .so in the OpenNJet control plane, a configuration directive can automatically adjust the number of worker processes when CPU usage crosses configured thresholds: when NJet's CPU usage is low the worker count is reduced, and when it is high the worker count is increased.

CPU usage: the average CPU usage across all OpenNJet worker processes, sampled from /proc/{pid}/stat.

System CPU usage: the combined usage of all CPU cores on the system, sampled from /proc/stat.
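The two metrics can be reproduced with a short script. This is a simplified illustration, not the module's actual implementation; the field layout follows proc(5), and the reduced form of the /proc/stat calculation (work = usr + nice + sys, total = work + idle) is inferred from the log fields shown in section 5.

```python
# Sketch of the CPU sampling described above (illustration only,
# not the module's actual code; field layout per proc(5)).

def system_cpu_times(stat_text: str) -> tuple[int, int]:
    """Parse the aggregate 'cpu' line of /proc/stat.
    The logged fields in section 5 (usr/nice/sys/idle, with
    work = usr + nice + sys and total = work + idle) suggest this
    reduced form; real /proc/stat has more columns."""
    usr, nice, sys_, idle = (int(x) for x in stat_text.split()[1:5])
    work = usr + nice + sys_
    return work, work + idle

def process_cpu_time(pid_stat_text: str) -> int:
    """Sum utime + stime (fields 14 and 15, 1-based) of /proc/{pid}/stat.
    The command name may contain spaces, so split after the closing ')'."""
    rest = pid_stat_text.rsplit(')', 1)[1].split()
    return int(rest[11]) + int(rest[12])

# CPU usage over one interval is computed from two samples:
#   usage_pct = 100 * (work_now - work_prev) // (total_now - total_prev)
```

Feeding the values logged in section 5 through these helpers reproduces the logged numbers, e.g. pid 14095 with utime:364 and stime:276 gives work:640.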

2. Dependent Modules

njt_http_sendmsg_module (loaded via load_module in the sample configuration below)

3. Directive Description

sysguard_cpu interval=1 low_threshold=11  high_threshold=20 worker_step=2 min_worker=3 max_worker=5  sys_high_threshold=120;
| Parameter | Type | Default | Min | Max | Description |
|---|---|---|---|---|---|
| interval | int | 1 (unit: minutes) | 1 | - | Check interval; one check runs every interval minutes |
| low_threshold | int | 10 (meaning 10%) | 10 | - | Lower CPU-usage bound; below it, worker_step workers are removed, bounded by min_worker |
| high_threshold | int | 70 (meaning 70%) | 10 | - | Upper CPU-usage bound; above it, worker_step workers are added, bounded by max_worker |
| worker_step | int | 1 | 1 | - | Number of workers added or removed per adjustment, bounded by min_worker and max_worker |
| min_worker | int | 1 | 1 | - | Lower bound on the worker count |
| max_worker | int | ncpu (number of system CPU cores) | 1 | - | Upper bound on the worker count |
| sys_high_threshold | int | 80 (meaning 80%) | 10 | - | System-wide CPU usage; when exceeded, no workers are added |
  • low_threshold <= high_threshold
  • min_worker <= max_worker
  • How do min_worker and max_worker relate to the statically configured worker count?

When no load-triggered adjustment has occurred, the actual number of running workers is the initial statically configured count.

  • With a sensible configuration the behavior matches expectations: workers are added when CPU usage is high and removed when it is low.
  • With an unreasonable configuration (for example, the static initial count is below min_worker or above max_worker, or the worker count was changed dynamically via the API), the running count may fall outside the [min_worker, max_worker] range:
    • if the current worker count is already below min_worker when a scale-down triggers, the count stays unchanged;
    • if the current worker count is already above max_worker when a scale-up triggers, the count likewise stays unchanged.
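The rules above — thresholds, step, clamping, and the no-op when the current count is already outside [min_worker, max_worker] — can be sketched as follows. This is an illustration of the documented semantics, not the module's source; whether the threshold comparisons are strict is an assumption consistent with the log in section 5.

```python
def next_worker_count(cur, avg_cpu, sys_cpu, *,
                      low=10, high=70, step=1,
                      min_w=1, max_w=4, sys_high=80):
    """Return the worker count after one check interval, following the
    directive semantics described above (sketch, assumed strict compares)."""
    if avg_cpu > high and sys_cpu <= sys_high:
        if cur >= max_w:
            # Already at or above max_worker (misconfiguration): no change.
            return cur
        return min(cur + step, max_w)   # scale up, capped at max_worker
    if avg_cpu < low:
        if cur <= min_w:
            # Already at or below min_worker: no change.
            return cur
        return max(cur - step, min_w)   # scale down, floored at min_worker
    return cur
```

With the sample directive (step=2, min_worker=3, max_worker=5), 3 workers at 31% average usage scale to 5, and 5 workers at 10% scale back to 3, matching the log in section 5.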

4. Sample Configuration

njet_ctrl.conf:

load_module modules/njt_http_sendmsg_module.so;       # required dependency
load_module modules/njt_ctrl_config_api_module.so;
load_module modules/njt_doc_module.so;
load_module modules/njt_sysguard_cpu_module.so;       # sysguard_cpu module


user nobody;
sysguard_cpu interval=1 low_threshold=11  high_threshold=20 worker_step=2 min_worker=3 max_worker=5  sys_high_threshold=120;

events {
    worker_connections  1024;
}
error_log         logs/error_ctrl.log info;

http {
    dyn_sendmsg_conf  conf/iot-ctrl.conf;
    access_log        logs/access_ctrl.log combined;

    include           mime.types;

    server {
        listen       8081;

        location /api {
             dyn_module_api;
        }

        location /doc {
            doc_api;
        }

        
    }

}


cluster_name helper;
node_name node1;

5. Invocation Example

Run wrk continuously to drive up CPU usage:

wrk -t 1 -c 100 -d 1000s -L http://192.168.40.90:8001/
# this log line shows total CPU usage at 100%
2023/09/14 16:45:21 [info] 14016#0:  total cpu usage:100  usr:18459855  nice:2938  sys:5048704 idle:416296648 work:23511497  prev_work:23510134 total:439808145  pre_total:439806782 work-:1363 total-:1363

# this line shows three worker processes in total (14095, 14096, 14004)
2023/09/14 16:45:21 [info] 14016#0: get all pids:14095_14096_14004_
# per-process CPU usage
2023/09/14 16:45:21 [info] 14016#0:  get process:14095 cpu_usage:31 utime:364 stime:276 cutime:0 cstime:0 work:640 pre_work:211 diff_work:429 diff_total:1363
2023/09/14 16:45:21 [info] 14016#0:  get process:14096 cpu_usage:32 utime:379 stime:277 cutime:0 cstime:0 work:656 pre_work:218 diff_work:438 diff_total:1363
2023/09/14 16:45:21 [info] 14016#0:  get process:14004 cpu_usage:32 utime:1167 stime:753 cutime:0 cstime:0 work:1920 pre_work:1481 diff_work:439 diff_total:1363
2023/09/14 16:45:21 [info] 14016#0:  old pids:14095_14096_14004_ new pids:14095_14096_14004_

# average usage
2023/09/14 16:45:21 [info] 14016#0:  average cpu usage:31
# the rule triggers and the worker count is adjusted from 3 to 5
2023/09/14 16:45:21 [info] 14016#0:  adjust worker num from 3 to 5
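The percentages in these log lines can be checked by hand: each per-process usage is diff_work * 100 / diff_total with integer truncation, and the average is taken over all workers (whether the module averages the integer percentages or the raw tick counts is not visible from the log; both give 31 here). A quick check with the numbers from the log above:

```python
# Numbers taken from the log lines above (integer arithmetic throughout).
diff_total = 1363                      # total- from the 'total cpu usage' line
diff_work = [429, 438, 439]            # diff_work for pids 14095/14096/14004

usages = [w * 100 // diff_total for w in diff_work]
print(usages)                          # [31, 32, 32] -- matches the log
print(sum(usages) // len(usages))      # 31 -> "average cpu usage:31"
```

Since 31 exceeds high_threshold=20 and the system threshold is not exceeded, worker_step=2 workers are added: 3 + 2 = 5, capped at max_worker=5.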

# logs after the worker count becomes 5
2023/09/14 16:45:36 [info] 14016#0:  total cpu usage:54  usr:18460323  nice:2938  sys:5049015 idle:416297294 work:23512276  prev_work:23511497 total:439809570  pre_total:439808145 work-:779 total-:1425
2023/09/14 16:45:36 [info] 14016#0: get all pids:14095_14096_14215_14004_14216_
2023/09/14 16:45:36 [info] 14016#0:  get process:14095 cpu_usage:11 utime:464 stime:343 cutime:0 cstime:0 work:807 pre_work:640 diff_work:167 diff_total:1425
2023/09/14 16:45:36 [info] 14016#0:  get process:14096 cpu_usage:12 utime:485 stime:344 cutime:0 cstime:0 work:829 pre_work:656 diff_work:173 diff_total:1425
2023/09/14 16:45:36 [info] 14016#0:  get process:14215 cpu_usage:7 utime:56 stime:55 cutime:0 cstime:0 work:111 pre_work:0 diff_work:111 diff_total:1425
2023/09/14 16:45:36 [info] 14016#0:  get process:14004 cpu_usage:12 utime:1269 stime:822 cutime:0 cstime:0 work:2091 pre_work:1920 diff_work:171 diff_total:1425
2023/09/14 16:45:36 [info] 14016#0:  get process:14216 cpu_usage:8 utime:60 stime:54 cutime:0 cstime:0 work:114 pre_work:0 diff_work:114 diff_total:1425
2023/09/14 16:45:36 [info] 14016#0:  old pids:14095_14096_14004_ new pids:14095_14096_14215_14004_14216_


# as CPU usage drops, the worker count is adjusted back from 5 to 3
2023/09/14 16:45:36 [info] 14016#0:  average cpu usage:10
2023/09/14 16:45:36 [info] 14016#0:  adjust worker num from 5 to 3
2023/09/14 16:45:51 [info] 14016#0:  total cpu usage:1  usr:18460332  nice:2938  sys:5049031 idle:416298764 work:23512301  prev_work:23512276 total:439811065  pre_total:439809570 work-:25 total-:1495
2023/09/14 16:45:51 [info] 14016#0: get all pids:14095_14096_14215_
2023/09/14 16:45:51 [info] 14016#0:  get process:14095 cpu_usage:0 utime:465 stime:344 cutime:0 cstime:0 work:809 pre_work:807 diff_work:2 diff_total:1495
2023/09/14 16:45:51 [info] 14016#0:  get process:14096 cpu_usage:0 utime:485 stime:345 cutime:0 cstime:0 work:830 pre_work:829 diff_work:1 diff_total:1495
2023/09/14 16:45:51 [info] 14016#0:  get process:14215 cpu_usage:0 utime:57 stime:56 cutime:0 cstime:0 work:113 pre_work:111 diff_work:2 diff_total:1495
2023/09/14 16:45:51 [info] 14016#0:  old pids:14095_14096_14215_14004_14216_ new pids:14095_14096_14215_
2023/09/14 16:45:51 [info] 14016#0:  old pid:14004 need remove
2023/09/14 16:45:51 [info] 14016#0:  old pid:14004 remove success
2023/09/14 16:45:51 [info] 14016#0:  old pid:14216 need remove
2023/09/14 16:45:51 [info] 14016#0:  old pid:14216 remove success
2023/09/14 16:45:51 [info] 14016#0:  average cpu usage:0
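The "old pids ... new pids" lines show how the module decides which processes to reap after scaling down: any pid present in the old list but absent from the new one is removed. A minimal sketch of that diff, using the pid strings from the log above:

```python
# pid lists as they appear in the 'old pids' / 'new pids' log line
old = "14095_14096_14215_14004_14216_".strip('_').split('_')
new = set("14095_14096_14215_".strip('_').split('_'))

to_remove = [pid for pid in old if pid not in new]
print(to_remove)       # ['14004', '14216'] -- the two pids reaped in the log
```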