Commit Graph

280 Commits

Author SHA1 Message Date
Girish Ramakrishnan
7047ee9391 shell: add timeout logic and rework error handling
what's important:

* if task code ran, it exits with 0. this code is regardless of (error, result)
  * when it exited cleanly, we will get the values from the database

* if task timed out, the box code kills it and it has a flag tracking timedOut. we can
  ignore exit code in this case.

* if task code was stopped, box code will send SIGTERM which ideally it will handle and end with 70.

* if task code crashed and it caught the exception, it will return 50

* if task code crashed and node nuked us, it will exit with 1

* if task code was killed with some unhandleabe signal, taskworker.sh will return the signal (9=SIGKILL)
2025-07-17 12:44:24 +02:00
Girish Ramakrishnan
6364ea8e57 grammar 2025-07-17 08:30:51 +02:00
Girish Ramakrishnan
e03beba9bc sudo: add kill-child.sh
ultimately, a non-previlieged child cannot kill previlieged parent.
all the notes and research in shell.js are not useful.
2025-07-16 21:26:57 +02:00
Girish Ramakrishnan
7214ce2ede support: remove ssh manipulation routes
this is now moved entirely to cloudron-support --enable-remote-access.

this emphasizes more that users have to get ssh access to the server before
we can do anything about it. it's far too simple for people to click this
button.

we have now also added clear terms to understand what remote access entails.
(what happens if support personnel makes a mistake. who is liable? etc)
2025-07-16 17:53:19 +02:00
Girish Ramakrishnan
fb39aa32bb tasks: fix update failed notification
https://forum.cloudron.io/topic/13408/update-to-cloudron-8.3-error

We get a Task xx crashed with code null in the notification.

The crux of the issue is that we use KillMode=control-group. This ends
up sending SIGTERM signal to box code and all the sudo in parallel. The box
code then sees the sudo die and records the task as failed.

To fix, we switch to KillMode=mixed. This gives box code a chance to handle SIGTERM
first. It cleans out its task list and kills all the sudo.
2025-06-17 23:47:04 +02:00
Girish Ramakrishnan
72d29030d9 tasks: remove redundant BindsTo= 2025-06-16 18:19:30 +02:00
Girish Ramakrishnan
5cf98922fb tasks: add note on why we use systemd-run 2025-06-16 18:14:21 +02:00
Girish Ramakrishnan
6a2f2b4efe task: do not use --pipe
--pipe make the spawned task inherit the systemd-run's stdio. if/when
system-run is killed, it might take out the spawned task as well with EPIPE.
2025-06-16 14:20:07 +02:00
Girish Ramakrishnan
69cb8c5a0a task: can use env -S in ubuntu 20 and above 2025-06-16 14:18:49 +02:00
Girish Ramakrishnan
0ebda97b03 remove various 18.04 specific code 2025-06-15 13:58:28 +02:00
Girish Ramakrishnan
39cbfb84ae refactor: move moveDataDir into services 2025-06-14 21:18:56 +02:00
Girish Ramakrishnan
e7b11a7ceb typo 2025-02-13 10:31:20 +01:00
Girish Ramakrishnan
a138425298 storage: start migration of s3 api 2025-02-12 23:04:37 +01:00
Girish Ramakrishnan
d0fbf68d90 typo 2025-01-20 13:56:26 +01:00
Girish Ramakrishnan
e34e479c33 services: separate volume clear and rm 2025-01-12 18:08:53 +01:00
Girish Ramakrishnan
19c744b17d unbound-anchor is now part of ExecStartPre
it seems unbound-anchor is not a dep of unbound in ubuntu 24. some
installations are thus missing this package.

in any case, ignore unbound-anchor exit status
2024-09-20 10:00:01 +02:00
Girish Ramakrishnan
ea72cef7f9 storage: remove getProviderStatus 2024-09-09 17:36:51 +02:00
Girish Ramakrishnan
7391af6f08 tail does not support doubledash it seems 2024-08-10 11:13:07 +02:00
Girish Ramakrishnan
d5ea99603f backups: give is a low oomScoreAdjust to not get killed 2024-07-19 13:05:09 +02:00
Girish Ramakrishnan
d0dc104ede logs: make logPaths work
we have to tail via sudo script

Fixes #811
2024-02-23 17:46:22 +01:00
Girish Ramakrishnan
450dd70ea2 backups: up min memory limit to 1GB 2024-02-19 17:02:14 +01:00
Girish Ramakrishnan
f7a53e1b15 also flush the ipv6 blocklist 2023-12-06 22:20:25 +01:00
Girish Ramakrishnan
943325baa3 better sudoers configuration check 2023-12-03 17:50:50 +01:00
Girish Ramakrishnan
64381e2a04 backups: remove validation mount point after testing it
this also moves out the attempt validation logic from mounts code
into volumes. mounts.tryAddMount is also used in backup code
2023-09-29 08:01:58 +05:30
Girish Ramakrishnan
eee49a8291 move dashboard setting into dashboard.js 2023-08-11 21:04:10 +05:30
Girish Ramakrishnan
ec23c7d2b8 Suppress aws sdk warning
https://github.com/aws/aws-sdk-js/issues/4354#issuecomment-1664694545
2023-08-04 09:21:48 +05:30
Girish Ramakrishnan
f83295372b updater: combine installer logs into the task file 2023-05-15 19:09:40 +02:00
Girish Ramakrishnan
e6506d9458 updater: use log 2023-05-15 19:05:39 +02:00
Girish Ramakrishnan
ff539e2669 remove crashnotifier
it's not really used
2023-05-15 11:08:00 +02:00
Girish Ramakrishnan
c4f4f3e914 logs: use %o to format error
otherwise, they are printed as multi-line and this messes up tail+date formatting
2023-04-16 10:49:59 +02:00
Girish Ramakrishnan
91a1cbac3e logs: files can be missing 2023-03-27 18:53:47 +02:00
Girish Ramakrishnan
b63d6c87ce logs: order existing logs by date 2023-03-27 11:56:51 +02:00
Girish Ramakrishnan
603f92251e refactor tail invokation into logtail.sh 2023-03-27 11:39:34 +02:00
Girish Ramakrishnan
e856681b3a typo 2023-02-01 21:52:15 +01:00
Girish Ramakrishnan
c07c8b5bb8 ubuntu 18: systemd kill ends up killing the script itself
This is because KillMode=control-group by default
2023-02-01 18:50:45 +01:00
Girish Ramakrishnan
7bbc7c2306 ubuntu 18: ExecReload does not work 2023-02-01 17:28:05 +01:00
Johannes Zellner
10e07fa300 Add disk speeds to disk usage data 2023-01-27 21:05:25 +01:00
Girish Ramakrishnan
817e950d47 Fix upstreamUri verification 2022-11-23 12:58:17 +01:00
Girish Ramakrishnan
e3642f4278 reverse proxy: rebuild configs on provider change 2022-11-16 12:42:06 +01:00
Girish Ramakrishnan
9c8f78a059 reverseproxy: simplify certificate renewal
An issue was that mail container was not getting refreshed with the up to
date certs. The root cause is that it is refreshed only in the renewCerts()
cron job. If cert renewal was caused by an app task, then the cron job will
skip the restart (since cert is fresh).

The other issue is that we keep hitting 0 length certs when we run out of disk
space. The root cause is that when out of disk space, a cert renewal will
cause cert to be written but since it has no space it is 0 length. Then, when
the user tries to restart the server, the box code does not write the cert again.

This change fixes the above two including:
* To simplify, we use the fallback cert only if we failed to get a LE cert. Expired LE certs
  will continue to be used. nginx is fine with this.

* restart directory as well on renewal
2022-11-13 11:55:12 +01:00
Girish Ramakrishnan
aae52ec795 backups: remove periodic dumping of heap info
this has not been as useful as I expected
2022-11-05 08:32:38 +01:00
Girish Ramakrishnan
8bc3b832e7 detect oom in tasks correctly 2022-11-02 22:39:25 +01:00
Girish Ramakrishnan
80a3ca0f46 remove 16.04 related task logic 2022-11-02 21:22:42 +01:00
Girish Ramakrishnan
0f0a98f7ac Add TimeoutStopSec=10s for systemctl kill to work faster 2022-11-02 18:46:20 +01:00
Girish Ramakrishnan
4fe0402735 box data is separate from mail data already 2022-10-12 11:59:28 +02:00
Girish Ramakrishnan
656f3fcc13 add system.du 2022-10-11 23:06:54 +02:00
Girish Ramakrishnan
3caffdb4e1 Rework app stats
Previously, the du plugin was collecting data every 20 seconds but
carbon was configured to only keep data every 12 hours causing much
confusion.

In the process of reworking this, it was determined:

* No need to collect disk usage info over time. Not sure how that is useful
* Instead, collect CPU/Network/Block info over time. We get this now from docker stats
* We also collect info about the services (addon containers)
* No need to reconfigure collectd for each app change anymore since there is no per
app collectd configuration anymore.
2022-10-10 21:13:26 +02:00
Girish Ramakrishnan
9f788c2c57 backup: reduce memory logs 2022-10-01 20:16:08 +02:00
Girish Ramakrishnan
a32166bc9d data dir: allow sameness of old and new dir
this makes it easy to migrate to a new volume setup
2022-06-09 17:49:33 -07:00
Girish Ramakrishnan
037f4195da guard against two level subdir moves
this has never worked since the -wholename check only works for
one level deep
2022-06-08 12:24:11 -07:00