tasks: fix update failed notification
https://forum.cloudron.io/topic/13408/update-to-cloudron-8.3-error

The update notification reports "Task xx crashed with code null". The crux of the issue is that we use KillMode=control-group, which sends SIGTERM to the box code and to all the sudo processes in parallel. The box code then sees a sudo process die and records the task as failed.

To fix this, we switch to KillMode=mixed. This gives the box code a chance to handle SIGTERM first: it cleans out its task list and then kills all the sudo processes itself.
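As a rough illustration of the change described above, the setting might look like the fragment below. This is only a sketch: it assumes KillMode is set in the [Service] section of the unit that runs the box code (the script comments call it box.service). Only the KillMode values come from the commit message; the surrounding layout is hypothetical.

```ini
# Hypothetical excerpt of box.service (assumed location of the setting).
[Service]
# Before: KillMode=control-group (the systemd default) delivers SIGTERM to
# every process in the unit's control group at the same time, so the sudo
# children die while the box code is still running.
# After: SIGTERM goes to the main process only. Processes still left in the
# control group are sent SIGKILL afterwards.
KillMode=mixed
```

With mixed, the main process is the only one that receives SIGTERM, so it can clear its task list before any child exits.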
@@ -35,13 +35,16 @@ systemctl reset-failed "${service_name}" 2>/dev/null || true
options="-p TimeoutStopSec=10s -p MemoryMax=${memory_limit_mb}M -p OOMScoreAdjust=${oom_score_adjust} --wait"

# systemd-run is used to create resource limited tasks. the tasks are in separate cgroup and won't get affected by box start/stop
# 1. tasks should stop when box code is stopped. in this state, dashboard is unreachable and we don't want things running in the background.
# 2. if tasks continue running, box code needs some reconciliation code to track tasks. systemd has no mechanism to handle both stop and restart.
# when using BindsTo=box.service, the tasks restart with systemctl restart. This defeats any point of tasks running in background if they start afresh.
# this design is because tasks should not run when box code is stopped. if dashboard is down, nothing should run in the background.
# besides, if tasks continue running, box code needs complex reconciliation code on start up.
# To achieve above design, we could use BindsTo=box.service. While this stops all tasks on systemctl stop, it restarts tasks on systemctl restart.
# Another approach was to use --scope but this is incompatible with --wait (and then we have to start polling status to get exit code)

# it seems systemd-run does not return the exit status of the process despite --wait
if ! systemd-run --unit "${service_name}" --nice "${nice}" --uid=${id} --gid=${id} ${options} --setenv HOME=${HOME} --setenv USER=${SUDO_USER} \
    --setenv DEBUG=box:* --setenv BOX_ENV=${BOX_ENV} --setenv NODE_ENV=production "${task_worker}" "${task_id}" "${logfile}"; then
# it seems systemd-run does not return the exit status of the process despite --wait but at least it waits
if ! systemd-run --unit "${service_name}" --wait --uid=${id} --gid=${id} \
    -p TimeoutStopSec=2s -p MemoryMax=${memory_limit_mb}M -p OOMScoreAdjust=${oom_score_adjust} --nice "${nice}" \
    --setenv HOME=${HOME} --setenv USER=${SUDO_USER} --setenv DEBUG=box:* --setenv BOX_ENV=${BOX_ENV} --setenv NODE_ENV=production \
    "${task_worker}" "${task_id}" "${logfile}"; then
    echo "Service ${service_name} failed to run" # this only happens if the path to task worker itself is wrong
fi