Commit Graph

23 Commits

Author SHA1 Message Date
Girish Ramakrishnan
1872cea763 graphs: do not average cpu use
Show like htop/top: cpu core count * 100
2022-10-13 22:36:20 +02:00
Johannes Zellner
cbaf86b8c7 Use counter values for docker stats in collectd and grafana queries 2022-10-11 19:06:40 +02:00
Johannes Zellner
ad29f51833 Fixup typo guage -> gauge in docker-stats.py 2022-10-11 10:54:53 +02:00
Girish Ramakrishnan
3caffdb4e1 Rework app stats
Previously, the du plugin was collecting data every 20 seconds but
carbon was configured to only keep data every 12 hours causing much
confusion.

In the process of reworking this, it was determined:

* No need to collect disk usage info over time. Not sure how that is useful
* Instead, collect CPU/Network/Block info over time. We get this now from docker stats
* We also collect info about the services (addon containers)
* No need to reconfigure collectd for each app change anymore since there is no per
app collectd configuration anymore.
2022-10-10 21:13:26 +02:00
Girish Ramakrishnan
534c8f9c3f collectd: on one system, localhost was missing in /etc/hosts 2022-05-27 16:10:38 -07:00
Girish Ramakrishnan
5ee9feb0d2 If disk name has '.', replace with '_'
graphite uses . as the separator between different metric parts

see #348
2022-05-27 16:00:08 -07:00
Girish Ramakrishnan
3adf8b5176 collectd: FQDNLookup causes collectd install to fail
this is on ubuntu 20

https://forum.cloudron.io/topic/7091/aws-ubuntu-20-04-installation-issue
2022-05-25 15:10:55 -07:00
Girish Ramakrishnan
c1ee3dcbd4 collectd: cache du values and send it every Interval (20)
collectd plugin ordering matters. the write_graphite plugin establishes
a TCP connection but there is a race between that and the df/du values that
get reported. du is especially problematic since we report this only every 12 hours.

so, instead we cache the values and report it every 20 seconds. on the carbon side,
it will just retain every 12 hours (since that is the whisper retention period).

there is also FlushInterval which I am not 100% sure has any effect. by default, the
write_graphite plugin waits for 1428 bytes to be accumulated. (https://manpages.debian.org/unstable/collectd-core/collectd.conf.5.en.html)

https://github.com/collectd/collectd/issues/2672
https://github.com/collectd/collectd/pull/1044

I found this syntax hidden deep inside https://www.cisco.com/c/en/us/td/docs/net_mgmt/virtual_topology_system/2_6_3/user_guide/Cisco_VTS_2_6_3_User_Guide/Cisco_VTS_2_6_1_User_Guide_chapter_01111.pdf
2021-03-26 00:21:38 -07:00
Girish Ramakrishnan
c1b61bc56b add note 2021-03-24 20:30:02 -07:00
Girish Ramakrishnan
6810d823f5 collectd(df): convert byte string to string
this makes the graphs work
2020-12-04 12:10:59 -08:00
Girish Ramakrishnan
eb47476c83 collectd: remove nginx status collection
we don't use this at all
2020-09-23 16:09:46 -07:00
Girish Ramakrishnan
d113cfc0ba add comment on how often du value is stored 2020-05-22 20:06:45 -07:00
Girish Ramakrishnan
38d4f2c27b Add note on what df output is 2020-04-01 15:59:48 -07:00
Girish Ramakrishnan
552e2a036c Use block size instead of apparent size in du
https://stackoverflow.com/questions/5694741/why-is-the-output-of-du-often-so-different-from-du-b

df uses superblock info to get consumed blocks/disk size. du with -b
prints actual file size instead of the disk space used by the files.
2020-04-01 15:24:53 -07:00
Girish Ramakrishnan
037440034b Move collectd logs to platformdata and rotate it 2020-02-18 20:36:50 -08:00
Girish Ramakrishnan
1aa7eb4478 Collect and aggregate du information twice a day 2019-08-21 13:45:52 -07:00
Girish Ramakrishnan
fd6dd1ea18 Add timestamp to the logs 2019-08-21 10:16:57 -07:00
Girish Ramakrishnan
9d3b4ba816 store docker df output as well 2019-08-19 16:15:31 -07:00
Girish Ramakrishnan
2b484c0382 collect maildata size separately 2019-08-19 13:23:31 -07:00
Girish Ramakrishnan
0d7a3f43c4 Collect du information 2019-08-18 21:52:41 -07:00
Girish Ramakrishnan
9e558924bb df plugin replaces with _ and not -
Part of #348
2017-08-15 09:32:42 -07:00
Girish Ramakrishnan
57891c64b5 use check_output instead
Aug 14 19:10:46 collectd[12651]: close failed in file object destructor:
Aug 14 19:10:46 collectd[12651]: IOError: [Errno 10] No child processes
2017-08-14 12:31:58 -07:00
Girish Ramakrishnan
5fe73c5a46 Replace df plugin with custom df plugin
The built-in df plugin cannot do the following:
    * if we choose by type ext4, we want to skip devicemapper (on scaleway)
    * the MountPoint of the appsdata directory is not possible to know at install time

Fixes #398
2017-08-11 01:39:51 -07:00