shell: add timeout logic and rework error handling

what's important:

* if task code ran, it exits with 0. this exit code is used regardless of the task's (error, result)
  * when it exits cleanly, we get the values from the database

* if task timed out, the box code kills it and tracks this with a timedOut flag. we can
  ignore the exit code in this case.

* if task code was stopped, box code sends SIGTERM, which the task ideally handles and exits with 70.

* if task code crashed and it caught the exception, it will return 50

* if task code crashed and node nuked us, it will exit with 1

* if task code was killed with some unhandleable signal, taskworker.sh will return the signal number (9=SIGKILL)
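
The cases above can be sketched as a small lookup. This is only an illustration, not code from this commit: the function name `classify_task_exit`, its arguments, and the labels it prints are hypothetical. The numeric values are the ones listed in this message.

```shell
#!/bin/bash

# Hypothetical helper (not part of this commit): classify how a task ended.
# $1 = timed-out flag (1 if the box code killed the task on timeout, else 0)
# $2 = exit status reported for the task
classify_task_exit() {
    local timed_out="$1" status="$2"

    # On timeout the exit status is ignored entirely.
    if [ "${timed_out}" = "1" ]; then
        echo "timed out"
        return
    fi

    case "${status}" in
        0)  echo "ran" ;;                 # (error, result) are read from the database
        70) echo "stopped" ;;             # SIGTERM was handled by the task
        50) echo "crashed (caught)" ;;    # task caught the exception itself
        1)  echo "crashed (uncaught)" ;;  # node terminated the process
        9)  echo "killed by SIGKILL" ;;   # unhandleable signal
        *)  echo "unknown (${status})" ;; # anything not listed in this message
    esac
}

classify_task_exit 0 50
```

Checking the timed-out flag first mirrors the rule that the exit code can be ignored in that case.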
Girish Ramakrishnan
2025-07-17 09:53:29 +02:00
parent 5e1c32b606
commit 7047ee9391
7 changed files with 96 additions and 78 deletions


@@ -50,7 +50,9 @@ if ! systemd-run --unit "${service_name}" --wait --uid=${id} --gid=${id} \
 echo "Service ${service_name} failed to run" # this only happens if the path to task worker itself is wrong
 fi
+# ExecMainCode=0 means killed by signal in ExecMainStatus. ExecMainCode=1 means exited cleanly with code in ExecMainStatus
 exit_code=$(systemctl show "${service_name}" -p ExecMainCode | sed 's/ExecMainCode=//g')
+exit_status=$(systemctl show "${service_name}" -p ExecMainStatus | sed 's/ExecMainStatus=//g')
-echo "Service ${service_name} finished with exit code ${exit_code}"
-exit "${exit_code}"
+echo "Service ${service_name} finished with exit code ${exit_code} and status ${exit_status}"
+exit "${exit_status}"
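
As a rough illustration of the property extraction in this hunk, the sketch below pulls both values out of `Key=Value` text. The helper `get_property` and the sample text are assumptions made for the example. In the real script the input comes from `systemctl show`, which is replaced here by a fixed string so the snippet runs without systemd.

```shell
#!/bin/bash

# Hypothetical helper (not part of this commit): print the value of one
# "Key=Value" property read from stdin, and nothing for other lines.
get_property() {
    local name="$1"
    sed -n "s/^${name}=//p"
}

# Sample text standing in for the output of `systemctl show` for the two
# properties; the values are made up for this example.
sample_output=$'ExecMainCode=1\nExecMainStatus=50'

exit_code=$(get_property ExecMainCode <<< "${sample_output}")
exit_status=$(get_property ExecMainStatus <<< "${sample_output}")

echo "finished with exit code ${exit_code} and status ${exit_status}"
```

Using `sed -n` with the `p` flag prints only the matching line, so both properties can be read from one captured output instead of two separate calls.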