Commit Graph

101 Commits

Author SHA1 Message Date
Girish Ramakrishnan c4f4f3e914 logs: use %o to format error
otherwise, they are printed as multi-line and this messes up tail+date formatting
2023-04-16 10:49:59 +02:00
Girish Ramakrishnan 2a660fa59d change terminology to running and unresponsive 2022-11-30 14:41:48 +01:00
Girish Ramakrishnan e942b8fe7e better debugs 2022-11-30 13:08:05 +01:00
Girish Ramakrishnan 84ba333aa1 app proxy: disable TLS check in app health monitor 2022-10-01 11:47:52 +02:00
Girish Ramakrishnan 9bd9b72e5d apphealthmonitor: Fix crash 2022-06-10 11:09:41 -07:00
Johannes Zellner f382b8f1f5 Set real upstreamUri for healthcheck 2022-06-09 15:04:09 +02:00
Johannes Zellner fbc7fcf04b Put healthcheck errors in app logs 2022-06-09 14:56:40 +02:00
Johannes Zellner 923a9f6560 Rename RELAY_APPSTORE_ID to PROXY_APP_APPSTORE_ID 2022-06-09 13:57:57 +02:00
Johannes Zellner a955457ee7 Support proxy app 2022-06-09 10:48:54 +02:00
Girish Ramakrishnan a3e097d541 add missing awaits for eventlog.add 2022-02-24 20:04:46 -08:00
Girish Ramakrishnan 8cda287838 fix crash when there are multiple quick oom events 2021-10-19 12:25:25 -07:00
Girish Ramakrishnan 445c83c8b9 make auditsource a class
this allows us to use AuditSource for the class and auditSource for
the instances!
2021-09-30 10:13:36 -07:00
Girish Ramakrishnan 50c68cd499 notifications: better oom message for redis
fixes #795
2021-09-19 17:34:41 -07:00
Girish Ramakrishnan 42774eac8c docker.js and services.js: async'ify 2021-08-26 18:23:31 -07:00
Girish Ramakrishnan 77f5cb183b merge appdb.js into apps.js 2021-08-23 15:35:38 -07:00
Girish Ramakrishnan ebab671f68 remove slash from container name 2021-06-23 17:20:11 -07:00
Girish Ramakrishnan 8da4eaf4a3 fix tests 2021-06-03 16:08:39 -07:00
Girish Ramakrishnan 73917e95c9 rework notifications
notifications are now system level instead of user level.

To clarify the use events/notifications/email:
* eventlog - everything that is happenning on server
* notifications - specific important events (alerts)
* email - these are really urgent things that require immediate attention. this is for
  the case where an admin does not visit the dashboard often. can also be alerts like
  bad backup config or reboot required which are not events per-se.

Notes on notifications
* oom - notification only
* appUpdated - notification only
* cert renewal failure - only raise when < 10 days to go. also send email thereafter (todo).
* Backup failure - only if last 5 backups failed (todo).
* Box update - notification only. we anyway send newsletter.
* box update available - we raise a notification. no email.
* app update available - we already have update indicator on dashboard. so, no notification or email.

Alerts:
* backup config
* disk space
* mail status
* reboot
* box updated
* ubuntu update required
2021-05-28 15:29:53 -07:00
Girish Ramakrishnan f211de1ff4 apphealthmonitor: 403 is ok 2021-03-30 11:57:30 -07:00
Girish Ramakrishnan 1724607433 apphealth: clamp health time to first run
the platform.start can take forever. this means that we start the
clock to include platform.start and this sends a lot of spurious
up/down notifications.

also, bump the down threshold to 20 mins.
2021-03-04 15:03:08 -08:00
Girish Ramakrishnan 10ca889de0 apphealthmonitor: better debugs 2021-03-04 11:42:43 -08:00
Girish Ramakrishnan aedc8e8087 do not send flurry of down notification on box restart 2021-01-16 11:27:19 -08:00
Girish Ramakrishnan 294413b798 Fix comment 2021-01-02 12:12:08 -08:00
Girish Ramakrishnan c0b0029935 statically allocate app container IPs
We removed httpPort with the assumption that docker allocated IPs
and kept them as long as the container is around. This turned out
to be not true because the IP changes on even container restart.

So we now allocate IPs statically. The iprange makes sure we don't
overlap with addons and other CI app or JupyterHub apps.

https://github.com/moby/moby/issues/6743
https://github.com/moby/moby/pull/19001
2020-11-20 16:19:59 -08:00
Girish Ramakrishnan d703d1cd13 remove httpPort
we can just use container IP instead of all this httpPort exporting magic.
this is also required for exposing httpPaths feature (we have to otherwise
have multiple httpPorts).
2020-11-19 00:38:52 -08:00
Girish Ramakrishnan 86916a94de allow 401 and 403 errors to pass health check
way too many WP sites use some plugin to block health check routes.
maybe some day we will have dynamic health check route settable by user.
2020-11-10 16:50:36 -08:00
Girish Ramakrishnan f2489c0845 some logs for tracking the cron issue 2020-10-07 14:47:51 -07:00
Girish Ramakrishnan 252aedda25 remove verbose logs 2020-08-18 12:46:55 -07:00
Girish Ramakrishnan 50dcf827a5 remove console.error use in many places
the backtraces just flood the logs

apphealthtask: remove console.error
remove spurious console.dir
cleanup scheduler error logging
2020-06-04 11:21:56 -07:00
Girish Ramakrishnan d2cd78c5cb more debug() removal 2020-05-24 12:30:48 -07:00
Girish Ramakrishnan d000719fa2 app health monitor is too verbose 2020-05-24 11:43:17 -07:00
Girish Ramakrishnan 1ad0cff28e Use app.fqdn in output 2019-12-24 11:07:53 -08:00
Girish Ramakrishnan d255466417 manifest.id is optional for custom apps 2019-11-15 17:28:54 -08:00
Girish Ramakrishnan a017af41c5 Start moving db code to use BoxError as well 2019-10-24 14:09:53 -07:00
Girish Ramakrishnan dd0fb8292c Move state enums to the model code 2019-08-30 13:21:51 -07:00
Girish Ramakrishnan e29d224a92 Be a bit more specific 2019-07-31 15:45:25 -07:00
Girish Ramakrishnan bb48ffb01f Fixup UA for easier detection (other than IP) 2019-07-31 15:43:15 -07:00
Girish Ramakrishnan d752c68790 re-factor all the audit source objects 2019-03-25 15:15:39 -07:00
Girish Ramakrishnan 8d7f7cb438 rename the constant 2019-03-06 15:55:07 -08:00
Girish Ramakrishnan b5a4121574 Better OOM notification messages 2019-03-06 14:47:24 -08:00
Girish Ramakrishnan 59ff3998bc do not send up mails immediately on installation 2019-02-13 14:44:02 -08:00
Girish Ramakrishnan 9471dc27e0 App can also be dead/error 2019-02-12 17:01:45 -08:00
Girish Ramakrishnan 5980ab9b69 Add healthTime in the database
this is currently an internal field and not returned in API
2019-02-12 16:33:28 -08:00
Girish Ramakrishnan 70e5daf8c6 Fix usage of audit source 2019-02-11 14:41:12 -08:00
Girish Ramakrishnan 2236e07722 Send app up notification
Fixes #438
2019-02-11 12:58:33 -08:00
Girish Ramakrishnan c0b929035f lint 2019-01-23 21:00:26 -08:00
Johannes Zellner 701024cf80 Send app down notification through eventlog 2019-01-17 17:26:58 +01:00
Johannes Zellner 4ecb0d82e7 Handle oom notification through eventlog 2019-01-17 15:31:34 +01:00
Johannes Zellner 85ea9b3255 Rework the oom notification 2019-01-08 14:37:58 +01:00
Johannes Zellner 5f71f6987c Create notifications for app down event 2019-01-07 13:01:27 +01:00