103 Commits

Author SHA1 Message Date
Girish Ramakrishnan
d2c702f890 eventlog: always use AuditSource objects as source field 2023-08-28 08:13:56 +05:30
Girish Ramakrishnan
6259849958 apphealth: timeout is already in msecs 2023-06-22 18:24:59 +05:30
Girish Ramakrishnan
c4f4f3e914 logs: use %o to format error
otherwise, they are printed as multi-line and this messes up tail+date formatting
2023-04-16 10:49:59 +02:00
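A minimal sketch of the idea behind c4f4f3e914, assuming the logger behaves like the `debug` package, whose `%o` formatter prints an object on a single line. The `formatSingleLine` helper is hypothetical and only imitates that behaviour with Node's built-in `util.inspect`; it is not the project's code.

```javascript
'use strict';

const util = require('util');

// Hypothetical helper: inspect a value, then collapse the result onto one line
// so that each log entry stays on a single line for tail + date prefixes.
function formatSingleLine(value) {
    return util.inspect(value)
        .split('\n')
        .map((line) => line.trim())
        .join(' ');
}

const error = new Error('health check failed');

// The inspected form of an Error includes its stack trace, which spans several lines.
console.log(formatSingleLine(error).includes('\n')); // false
```

Collapsing the stack trace this way keeps one log entry per line, which is what line-oriented tools that prepend a date to every line expect.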
Girish Ramakrishnan
2a660fa59d change terminology to running and unresponsive 2022-11-30 14:41:48 +01:00
Girish Ramakrishnan
e942b8fe7e better debugs 2022-11-30 13:08:05 +01:00
Girish Ramakrishnan
84ba333aa1 app proxy: disable TLS check in app health monitor 2022-10-01 11:47:52 +02:00
Girish Ramakrishnan
9bd9b72e5d apphealthmonitor: Fix crash 2022-06-10 11:09:41 -07:00
Johannes Zellner
f382b8f1f5 Set real upstreamUri for healthcheck 2022-06-09 15:04:09 +02:00
Johannes Zellner
fbc7fcf04b Put healthcheck errors in app logs 2022-06-09 14:56:40 +02:00
Johannes Zellner
923a9f6560 Rename RELAY_APPSTORE_ID to PROXY_APP_APPSTORE_ID 2022-06-09 13:57:57 +02:00
Johannes Zellner
a955457ee7 Support proxy app 2022-06-09 10:48:54 +02:00
Girish Ramakrishnan
a3e097d541 add missing awaits for eventlog.add 2022-02-24 20:04:46 -08:00
Girish Ramakrishnan
8cda287838 fix crash when there are multiple quick oom events 2021-10-19 12:25:25 -07:00
Girish Ramakrishnan
445c83c8b9 make auditsource a class
this allows us to use AuditSource for the class and auditSource for
the instances!
2021-09-30 10:13:36 -07:00
Girish Ramakrishnan
50c68cd499 notifications: better oom message for redis
fixes #795
2021-09-19 17:34:41 -07:00
Girish Ramakrishnan
42774eac8c docker.js and services.js: async'ify 2021-08-26 18:23:31 -07:00
Girish Ramakrishnan
77f5cb183b merge appdb.js into apps.js 2021-08-23 15:35:38 -07:00
Girish Ramakrishnan
ebab671f68 remove slash from container name 2021-06-23 17:20:11 -07:00
Girish Ramakrishnan
8da4eaf4a3 fix tests 2021-06-03 16:08:39 -07:00
Girish Ramakrishnan
73917e95c9 rework notifications
notifications are now system level instead of user level.

To clarify the use of events/notifications/email:
* eventlog - everything that is happening on the server
* notifications - specific important events (alerts)
* email - these are really urgent things that require immediate attention. this is for
  the case where an admin does not visit the dashboard often. can also be alerts like
  bad backup config or reboot required which are not events per se.

Notes on notifications
* oom - notification only
* appUpdated - notification only
* cert renewal failure - only raise when < 10 days to go. also send email thereafter (todo).
* Backup failure - only if last 5 backups failed (todo).
* Box update - notification only. we send a newsletter anyway.
* box update available - we raise a notification. no email.
* app update available - we already have update indicator on dashboard. so, no notification or email.

Alerts:
* backup config
* disk space
* mail status
* reboot
* box updated
* ubuntu update required
2021-05-28 15:29:53 -07:00
Girish Ramakrishnan
f211de1ff4 apphealthmonitor: 403 is ok 2021-03-30 11:57:30 -07:00
Girish Ramakrishnan
1724607433 apphealth: clamp health time to first run
the platform.start can take forever. this means that we start the
clock to include platform.start and this sends a lot of spurious
up/down notifications.

also, bump the down threshold to 20 mins.
2021-03-04 15:03:08 -08:00
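The clamping described in 1724607433 can be sketched as follows. This is an illustration under stated assumptions, not the actual apphealthmonitor code: `isAppDown`, `firstRunTime` and the argument shapes are hypothetical, and only the 20 minute threshold comes from the commit message.

```javascript
'use strict';

const DOWN_THRESHOLD_MS = 20 * 60 * 1000; // 20 mins, per the commit message

// Hypothetical helper. All arguments are timestamps in milliseconds.
// lastHealthyTime: when the app was last seen healthy.
// firstRunTime: when the health monitor first ran after startup.
function isAppDown(lastHealthyTime, firstRunTime, now) {
    // Clamp: never count time that elapsed before the monitor's first run,
    // so a slow platform start does not make every app look down.
    const effectiveTime = Math.max(lastHealthyTime, firstRunTime);
    return now - effectiveTime > DOWN_THRESHOLD_MS;
}

const MINUTE = 60 * 1000;
console.log(isAppDown(0, 30 * MINUTE, 40 * MINUTE)); // false: only 10 mins since first run
console.log(isAppDown(0, 30 * MINUTE, 51 * MINUTE)); // true: 21 mins since first run
```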
Girish Ramakrishnan
10ca889de0 apphealthmonitor: better debugs 2021-03-04 11:42:43 -08:00
Girish Ramakrishnan
aedc8e8087 do not send flurry of down notification on box restart 2021-01-16 11:27:19 -08:00
Girish Ramakrishnan
294413b798 Fix comment 2021-01-02 12:12:08 -08:00
Girish Ramakrishnan
c0b0029935 statically allocate app container IPs
We removed httpPort with the assumption that docker allocated IPs
and kept them as long as the container is around. This turned out
not to be true because the IP changes even on a container restart.

So we now allocate IPs statically. The iprange makes sure we don't
overlap with addons and other CI app or JupyterHub apps.

https://github.com/moby/moby/issues/6743
https://github.com/moby/moby/pull/19001
2020-11-20 16:19:59 -08:00
Girish Ramakrishnan
d703d1cd13 remove httpPort
we can just use container IP instead of all this httpPort exporting magic.
this is also required for exposing httpPaths feature (we have to otherwise
have multiple httpPorts).
2020-11-19 00:38:52 -08:00
Girish Ramakrishnan
86916a94de allow 401 and 403 errors to pass health check
way too many WP sites use some plugin to block health check routes.
maybe some day we will have dynamic health check route settable by user.
2020-11-10 16:50:36 -08:00
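A minimal sketch of the rule in 86916a94de. Only the treatment of 401 and 403 comes from the commit message; the `isHealthyStatus` name and the assumption that 2xx and 3xx responses count as healthy are illustrative.

```javascript
'use strict';

// Status codes that show the app is up even though access was refused,
// e.g. a WordPress plugin blocking the health check route.
const TOLERATED_STATUS_CODES = new Set([401, 403]);

// Hypothetical helper, not the project's actual function.
function isHealthyStatus(statusCode) {
    if (TOLERATED_STATUS_CODES.has(statusCode)) return true;
    return statusCode >= 200 && statusCode < 400; // assumption: 2xx/3xx are healthy
}

console.log(isHealthyStatus(403)); // true
console.log(isHealthyStatus(500)); // false
```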
Girish Ramakrishnan
f2489c0845 some logs for tracking the cron issue 2020-10-07 14:47:51 -07:00
Girish Ramakrishnan
252aedda25 remove verbose logs 2020-08-18 12:46:55 -07:00
Girish Ramakrishnan
50dcf827a5 remove console.error use in many places
the backtraces just flood the logs

apphealthtask: remove console.error
remove spurious console.dir
cleanup scheduler error logging
2020-06-04 11:21:56 -07:00
Girish Ramakrishnan
d2cd78c5cb more debug() removal 2020-05-24 12:30:48 -07:00
Girish Ramakrishnan
d000719fa2 app health monitor is too verbose 2020-05-24 11:43:17 -07:00
Girish Ramakrishnan
1ad0cff28e Use app.fqdn in output 2019-12-24 11:07:53 -08:00
Girish Ramakrishnan
d255466417 manifest.id is optional for custom apps 2019-11-15 17:28:54 -08:00
Girish Ramakrishnan
a017af41c5 Start moving db code to use BoxError as well 2019-10-24 14:09:53 -07:00
Girish Ramakrishnan
dd0fb8292c Move state enums to the model code 2019-08-30 13:21:51 -07:00
Girish Ramakrishnan
e29d224a92 Be a bit more specific 2019-07-31 15:45:25 -07:00
Girish Ramakrishnan
bb48ffb01f Fixup UA for easier detection (other than IP) 2019-07-31 15:43:15 -07:00
Girish Ramakrishnan
d752c68790 re-factor all the audit source objects 2019-03-25 15:15:39 -07:00
Girish Ramakrishnan
8d7f7cb438 rename the constant 2019-03-06 15:55:07 -08:00
Girish Ramakrishnan
b5a4121574 Better OOM notification messages 2019-03-06 14:47:24 -08:00
Girish Ramakrishnan
59ff3998bc do not send up mails immediately on installation 2019-02-13 14:44:02 -08:00
Girish Ramakrishnan
9471dc27e0 App can also be dead/error 2019-02-12 17:01:45 -08:00
Girish Ramakrishnan
5980ab9b69 Add healthTime in the database
this is currently an internal field and not returned in API
2019-02-12 16:33:28 -08:00
Girish Ramakrishnan
70e5daf8c6 Fix usage of audit source 2019-02-11 14:41:12 -08:00
Girish Ramakrishnan
2236e07722 Send app up notification
Fixes #438
2019-02-11 12:58:33 -08:00
Girish Ramakrishnan
c0b929035f lint 2019-01-23 21:00:26 -08:00
Johannes Zellner
701024cf80 Send app down notification through eventlog 2019-01-17 17:26:58 +01:00
Johannes Zellner
4ecb0d82e7 Handle oom notification through eventlog 2019-01-17 15:31:34 +01:00