we export using the old addon containers, which has a bug that it crashes
when db is missing. so, we have to skip them already. the crash then causes
future exports to also fail because it is restarting.
this happens because we have some bug in sftp container causing uninstall(s) to
fail. the database of those apps are gone but the export logic then tries to export
them and it all fails.
s3 api never return NotFound or ENOENT - https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html
Sadly, DO/OVH etc just return NotFound instead of NoSuchKey. And we cannot
distinguish easily if we are talking to some s3 server or some random server.
This is applicable for things like say minio where maybe there is something
apache now just giving out 404 / NotFound.
the start.sh script does a "systemctl restart systemd-resolved". this
ends up restarting the box code prematurely! and then later when mysql
restarts, the box code loses connection and bad things happen (tm)
especially during a platform update.
we don't log to journald anymore, so not sure if EPIPE is still an issue
the platform.start can take forever. this means that we start the
clock to include platform.start and this sends a lot of spurious
up/down notifications.
also, bump the down threshold to 20 mins.
not 100% sure because of missing log timestamps, but mysql restarts after the box
has started up. As seen from logs below, we try to mark the apps for restart on
platform update. But this failed because mysql was restarting at that time.
This ended up with e2e test failing.
box:apps restartAppsUsingAddons: marking nc4801.autoupdatetest.domain.io for restart
box:apps restartAppsUsingAddons: error marking nc4801.autoupdatetest.domain.io for restart: {"name":"BoxError","reason":"Database Error","details":{"fatal":true,"code":"PROTOCOL_CONNECTION_LOST"},"message":"Connection lost: The server closed the connection.","nestedError":{"fatal":true,"code":"PROTOCOL_CONNECTION_LOST"}}
box:apps restartAppsUsingAddons: marking wekan1398.autoupdatetest.domain.io for restart
box:database Connection 51 error: Connection lost: The server closed the connection. PROTOCOL_CONNECTION_LOST
box:database Connection 52 error: Connection lost: The server closed the connection. PROTOCOL_CONNECTION_LOST
Box GET /api/v1/cloudron/status 500 Internal Server Error connect ECONNREFUSED 127.0.0.1:3306 41.251 ms - 217
during update, we stop the box code which ends up trying to stop all tasks.
this gives warning like below:
box:shell stopTask (stdout): shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
box:shell stopTask (stdout): job-working-directory: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
box:shell stopTask (stdout): box-task-8.service loaded active running /home/yellowtent/box/src/scripts/../taskworker.js 8 /home/yellowtent/platformdata/logs/tasks/8.log
box:shell stopTask (stdout): job-working-directory: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
box:shell stopTask (stdout): job-working-directory: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
box:shell stopTask (stdout): job-working-directory: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory