what's important:
* if task code ran, it exits with 0. this code is regardless of (error, result)
* when it exited cleanly, we will get the values from the database
* if task timed out, the box code kills it and it has a flag tracking timedOut. we can
ignore exit code in this case.
* if task code was stopped, box code will send SIGTERM which ideally it will handle and end with 70.
* if task code crashed and it caught the exception, it will return 50
* if task code crashed and node nuked us, it will exit with 1
* if task code was killed with some unhandleabe signal, taskworker.sh will return the signal (9=SIGKILL)
https://forum.cloudron.io/topic/13408/update-to-cloudron-8.3-error
We get a Task xx crashed with code null in the notification.
The crux of the issue is that we use KillMode=control-group. This ends
up sending SIGTERM signal to box code and all the sudo in parallel. The box
code then sees the sudo die and records the task as failed.
To fix, we switch to KillMode=mixed. This gives box code a chance to handle SIGTERM
first. It cleans out its task list and kills all the sudo.
the exception is when sudo calls backupupload.js which already has timestamped
output because it uses node
an alternative idea is to maybe not use this flag at all and always parse the output.
this is a bit complicated since we have to look for a timestamp in a stream.
shell.sudo logs output to stdout/stderr intentionally. It is not meant
for scripts that generate much output (basically scripts/* files).
core of the issue is that none of the log commands require to use sudo.
they can just use normal tail. only app logs requires sudo because of the
logPaths directive in the manifest.
sudo forks and execs the program. sudo also hangs around as the parent of the program waiting on the program and also forwarding signals.
sudo does not forward signals when the originator comes from the same process group. recently, there has been a change where it will
forward signals as long as sudo or the command is not the group leader (https://www.sudo.ws/repos/sudo/rev/d1bf60eac57f)
for us, this means that calling kill from this node process doesn't work since it's in the same group (and ubuntu 22 doesn't have the above fix).
the workaround is to invoke a kill from a different process group and this is done by starting detached
another idea is: use "ps --pid cp.pid -o pid=" to get the pid of the command and then send it signal directly
see also: https://dxuuu.xyz/sudo.html