Commit Graph

85 Commits

Author SHA1 Message Date
Girish Ramakrishnan
8da4eaf4a3 fix tests 2021-06-03 16:08:39 -07:00
Girish Ramakrishnan
73917e95c9 rework notifications
notifications are now system level instead of user level.

To clarify the use events/notifications/email:
* eventlog - everything that is happenning on server
* notifications - specific important events (alerts)
* email - these are really urgent things that require immediate attention. this is for
  the case where an admin does not visit the dashboard often. can also be alerts like
  bad backup config or reboot required which are not events per-se.

Notes on notifications
* oom - notification only
* appUpdated - notification only
* cert renewal failure - only raise when < 10 days to go. also send email thereafter (todo).
* Backup failure - only if last 5 backups failed (todo).
* Box update - notification only. we anyway send newsletter.
* box update available - we raise a notification. no email.
* app update available - we already have update indicator on dashboard. so, no notification or email.

Alerts:
* backup config
* disk space
* mail status
* reboot
* box updated
* ubuntu update required
2021-05-28 15:29:53 -07:00
Girish Ramakrishnan
f211de1ff4 apphealthmonitor: 403 is ok 2021-03-30 11:57:30 -07:00
Girish Ramakrishnan
1724607433 apphealth: clamp health time to first run
the platform.start can take forever. this means that we start the
clock to include platform.start and this sends a lot of spurious
up/down notifications.

also, bump the down threshold to 20 mins.
2021-03-04 15:03:08 -08:00
Girish Ramakrishnan
10ca889de0 apphealthmonitor: better debugs 2021-03-04 11:42:43 -08:00
Girish Ramakrishnan
aedc8e8087 do not send flurry of down notification on box restart 2021-01-16 11:27:19 -08:00
Girish Ramakrishnan
294413b798 Fix comment 2021-01-02 12:12:08 -08:00
Girish Ramakrishnan
c0b0029935 statically allocate app container IPs
We removed httpPort with the assumption that docker allocated IPs
and kept them as long as the container is around. This turned out
to be not true because the IP changes on even container restart.

So we now allocate IPs statically. The iprange makes sure we don't
overlap with addons and other CI app or JupyterHub apps.

https://github.com/moby/moby/issues/6743
https://github.com/moby/moby/pull/19001
2020-11-20 16:19:59 -08:00
Girish Ramakrishnan
d703d1cd13 remove httpPort
we can just use container IP instead of all this httpPort exporting magic.
this is also required for exposing httpPaths feature (we have to otherwise
have multiple httpPorts).
2020-11-19 00:38:52 -08:00
Girish Ramakrishnan
86916a94de allow 401 and 403 errors to pass health check
way too many WP sites use some plugin to block health check routes.
maybe some day we will have dynamic health check route settable by user.
2020-11-10 16:50:36 -08:00
Girish Ramakrishnan
f2489c0845 some logs for tracking the cron issue 2020-10-07 14:47:51 -07:00
Girish Ramakrishnan
252aedda25 remove verbose logs 2020-08-18 12:46:55 -07:00
Girish Ramakrishnan
50dcf827a5 remove console.error use in many places
the backtraces just flood the logs

apphealthtask: remove console.error
remove spurious console.dir
cleanup scheduler error logging
2020-06-04 11:21:56 -07:00
Girish Ramakrishnan
d2cd78c5cb more debug() removal 2020-05-24 12:30:48 -07:00
Girish Ramakrishnan
d000719fa2 app health monitor is too verbose 2020-05-24 11:43:17 -07:00
Girish Ramakrishnan
1ad0cff28e Use app.fqdn in output 2019-12-24 11:07:53 -08:00
Girish Ramakrishnan
d255466417 manifest.id is optional for custom apps 2019-11-15 17:28:54 -08:00
Girish Ramakrishnan
a017af41c5 Start moving db code to use BoxError as well 2019-10-24 14:09:53 -07:00
Girish Ramakrishnan
dd0fb8292c Move state enums to the model code 2019-08-30 13:21:51 -07:00
Girish Ramakrishnan
e29d224a92 Be a bit more specific 2019-07-31 15:45:25 -07:00
Girish Ramakrishnan
bb48ffb01f Fixup UA for easier detection (other than IP) 2019-07-31 15:43:15 -07:00
Girish Ramakrishnan
d752c68790 re-factor all the audit source objects 2019-03-25 15:15:39 -07:00
Girish Ramakrishnan
8d7f7cb438 rename the constant 2019-03-06 15:55:07 -08:00
Girish Ramakrishnan
b5a4121574 Better OOM notification messages 2019-03-06 14:47:24 -08:00
Girish Ramakrishnan
59ff3998bc do not send up mails immediately on installation 2019-02-13 14:44:02 -08:00
Girish Ramakrishnan
9471dc27e0 App can also be dead/error 2019-02-12 17:01:45 -08:00
Girish Ramakrishnan
5980ab9b69 Add healthTime in the database
this is currently an internal field and not returned in API
2019-02-12 16:33:28 -08:00
Girish Ramakrishnan
70e5daf8c6 Fix usage of audit source 2019-02-11 14:41:12 -08:00
Girish Ramakrishnan
2236e07722 Send app up notification
Fixes #438
2019-02-11 12:58:33 -08:00
Girish Ramakrishnan
c0b929035f lint 2019-01-23 21:00:26 -08:00
Johannes Zellner
701024cf80 Send app down notification through eventlog 2019-01-17 17:26:58 +01:00
Johannes Zellner
4ecb0d82e7 Handle oom notification through eventlog 2019-01-17 15:31:34 +01:00
Johannes Zellner
85ea9b3255 Rework the oom notification 2019-01-08 14:37:58 +01:00
Johannes Zellner
5f71f6987c Create notifications for app down event 2019-01-07 13:01:27 +01:00
Johannes Zellner
86dbb1bdcf Create notification for oom events 2019-01-07 12:57:57 +01:00
Girish Ramakrishnan
a536e9fc4b track last oom time using a global variable
because it was a local variable, we were just sending out oom mails
like crazy

also, fixes an issue that if docker.getEvents gets stuck because
docker does not respond then we do not do any health monitoring.
i guess this can happen if the docker API gets stuck.
2018-12-16 20:52:42 -08:00
Girish Ramakrishnan
a49969f2be Move apphealthmonitor into a cron job
This makes sure that it only runs post activation

See also a9c1af50f7
2018-10-22 20:08:49 -07:00
Girish Ramakrishnan
630fbb373c healthCheckPath is optional for non-appstore apps 2018-10-11 13:20:31 -07:00
Johannes Zellner
b6384d5025 Remove intrinsicFqdn 2018-02-08 15:07:49 +01:00
Johannes Zellner
8f74cacfd0 Remove unused require 2018-02-05 20:45:53 +01:00
Girish Ramakrishnan
efc0a3b68d Remove usage of config.appFqdn() 2018-01-10 13:58:05 -08:00
Johannes Zellner
e43e904622 Refactor all app.location usages to config.appFqdn(app) 2017-11-20 20:01:50 +01:00
Girish Ramakrishnan
afed3f3725 Remove duplicate debug 2017-10-04 15:08:26 -07:00
Dennis Schwerdel
e3f3241966 Added user agent to health checks 2017-10-04 13:05:00 +02:00
Girish Ramakrishnan
5afef14760 Actually send emails for responsive apps 2017-03-14 13:42:28 -07:00
Girish Ramakrishnan
9b6c6dc709 doc: base image 0.10.0 2017-02-16 09:20:27 -08:00
Girish Ramakrishnan
e35dbd522f More debugMode fixes 2017-01-20 09:56:44 -08:00
Girish Ramakrishnan
a71323f8b3 Add developmentMode flag to appdb
Part of #171
2017-01-19 15:57:24 -08:00
Johannes Zellner
d392293b50 Remove unused require 2017-01-17 16:32:22 +01:00
Johannes Zellner
16371d4528 Use the apps.js layer instead of the raw appdb in apphealthmonitor.js 2017-01-17 16:32:12 +01:00