tasks: fix update failed notification

https://forum.cloudron.io/topic/13408/update-to-cloudron-8.3-error

After an update, the notification reads "Task xx crashed with code null".

The crux of the issue is that we use KillMode=control-group. On stop, this
sends SIGTERM to the box code and all the sudo processes in parallel. The box
code then sees the sudo processes die and records the task as failed.

To fix, we switch to KillMode=mixed. This gives the box code a chance to handle
SIGTERM first. It cleans out its task list and kills all the sudo processes itself.
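
The ordering that KillMode=mixed allows can be tried outside systemd. What follows is a minimal sketch, not the box code: run_supervisor is a hypothetical stand-in for the box process, the two sleep commands stand in for the sudo children, and demo_mixed plays the role of systemd by signalling only the main process.

```shell
#!/usr/bin/env bash
# Sketch only: run_supervisor is a hypothetical stand-in for the box code,
# and the sleep processes stand in for its sudo children.

run_supervisor() {
    pids=()
    sleep 30 & pids+=("$!")
    sleep 30 & pids+=("$!")
    # On SIGTERM: report, stop the children ourselves, then exit cleanly.
    trap 'echo "cleanup: stopping ${#pids[@]} children"; kill "${pids[@]}" 2>/dev/null; exit 0' TERM
    echo "ready"
    wait    # interrupted by SIGTERM, after which the trap runs
}

demo_mixed() {
    local out sup status
    out=$(mktemp)
    run_supervisor >"$out" &
    sup=$!
    # Wait until the supervisor has installed its handler.
    until grep -q '^ready$' "$out"; do sleep 0.1; done
    kill -TERM "$sup"    # only the main process is signalled, as with KillMode=mixed
    wait "$sup"
    status=$?
    cat "$out"
    rm -f "$out"
    return "$status"
}

demo_mixed
```

Because the supervisor is the only process that receives SIGTERM, it decides when its children stop, so it never observes them dying unexpectedly.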
Girish Ramakrishnan
2025-06-17 22:30:34 +02:00
parent ca25c6075b
commit fb39aa32bb
7 changed files with 35 additions and 34 deletions
+9 -6
@@ -35,13 +35,16 @@ systemctl reset-failed "${service_name}" 2>/dev/null || true
 options="-p TimeoutStopSec=10s -p MemoryMax=${memory_limit_mb}M -p OOMScoreAdjust=${oom_score_adjust} --wait"
 # systemd-run is used to create resource limited tasks. the tasks are in separate cgroup and won't get affected by box start/stop
-# 1. tasks should stop when box code is stopped. in this state, dashboard us unreachable and don't want things in background.
-# 2. if tasks continue running, box code needs some reconcilation code to track tasks. systemd has no mechanism to handle both stop and restart.
-# when using BindsTo=box.service , the tasks restart with systemctl restart. This defeats any point of tasks running in background if they start afresh.
+# this design is because tasks should not run when box code is stopped. if dashboard is down, nothing should run -in background.
+# besides, if tasks continue running, box code needs complex reconcilation code on start up.
+# To achieve above design, we could use BindsTo=box.service. While this stops all tasks on systemctl stop, it restarts tasks on systemctl restart.
 # Another approach was to use --scope but this is incompatible with --wait (and then we have to start polling status to get exit code)
-# it seems systemd-run does not return the exit status of the process despite --wait
-if ! systemd-run --unit "${service_name}" --nice "${nice}" --uid=${id} --gid=${id} ${options} --setenv HOME=${HOME} --setenv USER=${SUDO_USER} \
-    --setenv DEBUG=box:* --setenv BOX_ENV=${BOX_ENV} --setenv NODE_ENV=production "${task_worker}" "${task_id}" "${logfile}"; then
+# it seems systemd-run does not return the exit status of the process despite --wait but atleast it waits
+if ! systemd-run --unit "${service_name}" --wait --uid=${id} --gid=${id} \
+    -p TimeoutStopSec=2s -p MemoryMax=${memory_limit_mb}M -p OOMScoreAdjust=${oom_score_adjust} --nice "${nice}" \
+    --setenv HOME=${HOME} --setenv USER=${SUDO_USER} --setenv DEBUG=box:* --setenv BOX_ENV=${BOX_ENV} --setenv NODE_ENV=production \
+    "${task_worker}" "${task_id}" "${logfile}"; then
     echo "Service ${service_name} failed to run" # this only happens if the path to task worker itself is wrong
 fi
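
The systemd-run invocation in this hunk passes many options on one command line. One way to inspect such a command before running it is to assemble it as a bash array and print it. This is a sketch under assumptions: build_task_cmd, the unit name box-task-42 and the paths are hypothetical and not part of the script, and only a subset of the options from the hunk is shown.

```shell
#!/usr/bin/env bash
# Sketch only: build_task_cmd is a hypothetical helper that prints a
# systemd-run command line instead of executing it (dry run).

build_task_cmd() {
    local service_name="$1" id="$2" memory_limit_mb="$3"
    local task_worker="$4" task_id="$5" logfile="$6"
    local cmd=(
        systemd-run --unit "${service_name}" --wait "--uid=${id}" "--gid=${id}"
        -p TimeoutStopSec=2s -p "MemoryMax=${memory_limit_mb}M"
        --setenv "DEBUG=box:*"    # quoted so the shell never glob-expands the '*'
        "${task_worker}" "${task_id}" "${logfile}"
    )
    # %q prints each argument in a form that is safe to paste back into a shell.
    printf '%q ' "${cmd[@]}"
    printf '\n'
}

# Hypothetical values, for illustration only.
build_task_cmd box-task-42 1000 400 /path/to/taskworker.js 42 /tmp/task-42.log
```

Keeping the arguments in an array means each option stays a single word regardless of its content, and replacing the printf calls with "${cmd[@]}" would run the command.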
+10 -15
@@ -30,21 +30,16 @@ readonly installer_path="${source_dir}/scripts/installer.sh"
 log "updating Cloudron with ${source_dir}"
-# StandardError will follow StandardOutput in default inherit mode. https://www.freedesktop.org/software/systemd/man/systemd.exec.html
 systemctl reset-failed "${updater_service}" 2>/dev/null || true
-if ! systemd-run --property=OOMScoreAdjust=-1000 --unit "${updater_service}" -p StandardOutput=append:${log_file} ${installer_path}; then
-    log "Failed to install cloudron"
-    exit 1
+# this script is invoked as a task. installer.sh will systemctl stop at some point and that will stop all the tasks
+# installer.sh is thus run as a separate unit name so that it doesn't get killed during the stoptask.sh
+# StandardError will follow StandardOutput in default inherit mode. https://www.freedesktop.org/software/systemd/man/systemd.exec.html
+if ! systemd-run --property=OOMScoreAdjust=-1000 --unit "${updater_service}" --wait -p StandardOutput=append:${log_file} ${installer_path}; then
+    echo "${updater_service} failed to run" # this only happens if the path to installer is wrong
 fi
-while true; do
-    if systemctl is-failed "${updater_service}" >/dev/null 2>&1; then
-        log "${updater_service} failed"
-        exit 1
-    fi
-    log "${updater_service} is still active. will check in 5 seconds"
-    sleep 5
-    # this loop will stop once the update process stopped the box unit and thus terminating this child process
-done
+# if the install script succeeded, the following code is never run since this script will get killed
+exit_code=$(systemctl show "${service_name}" -p ExecMainCode | sed 's/ExecMainCode=//g')
+echo "${updater_service} finished with exit code ${exit_code}"
+exit "${exit_code}"
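
The last lines of this hunk read a property of the finished unit from systemctl show. A minimal sketch of that parsing step follows, assuming the usual KEY=VALUE output of systemctl show -p; prop_value is a hypothetical helper. The sketch reads ExecMainStatus, which systemd describes as the exit status of the main process, while ExecMainCode describes how the process ended. Treating ExecMainStatus of ${updater_service} as the intended value is an assumption, not something the commit states.

```shell
#!/usr/bin/env bash
# Sketch only: prop_value is a hypothetical helper. It prints the value of
# one property from `systemctl show`-style KEY=VALUE lines read on stdin.

prop_value() {
    local key="$1" line
    while IFS= read -r line; do
        if [[ "$line" == "$key="* ]]; then
            printf '%s\n' "${line#"$key="}"
            return 0
        fi
    done
    return 1    # property not present in the input
}

# Canned input standing in for real systemctl output.
printf 'ExecMainCode=1\nExecMainStatus=3\n' | prop_value ExecMainStatus
```

Against a real unit the input would come from systemctl show "${updater_service}" -p ExecMainStatus, and systemctl show also accepts --value to print the bare value directly.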