with our new retagging approach, the Digest ID remains <null> because
docker only sets it when the image is actually fetched from the registry.
this means that the redis container always gets removed...
Currently, we allocate 50% of the memory limit as RAM and 50% as swap. The
manifest is usually quite conservative on memory values. This means that we set
up a system where the app applies memory pressure almost immediately.
The kernel then swaps things out randomly and cpu usage increases (kswapd shows
up in the profile).
To rethink the whole situation: we should not cap apps with a swap limit at all.
The memory hard limit is what is important. By redefining memoryLimit, we are
doubling every container's memory, and it's fine that we overallocate this.
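
A rough sketch of the change in terms of the Docker HostConfig fields Memory and
MemorySwap (MemorySwap is the total of memory plus swap, and -1 means unlimited
swap). The function names are hypothetical, and the exact values the old code
passed are an assumption based on the 50/50 split described above:

```python
def old_limits(memory_limit: int) -> dict:
    """Assumed old behaviour: half of memoryLimit is RAM, the other half is swap."""
    ram = memory_limit // 2
    # MemorySwap is RAM + swap combined, so the total stays at memory_limit.
    return {"Memory": ram, "MemorySwap": memory_limit}


def new_limits(memory_limit: int) -> dict:
    """Sketch of the new behaviour: memoryLimit is the RAM hard limit, swap is uncapped."""
    return {"Memory": memory_limit, "MemorySwap": -1}


if __name__ == "__main__":
    limit = 256 * 1024 * 1024  # a 256 MiB manifest value, chosen for illustration
    print(old_limits(limit))
    print(new_limits(limit))
```

With the same manifest value, the container's RAM hard limit is twice what it was before.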
docker is using an extra udp port for every container. when there are
a lot of containers, a lot of random udp ports get used up. this causes
problems when installing apps that require contiguous port ranges.
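
To illustrate why scattered ports hurt, here is a small sketch that checks a
requested contiguous range against a set of udp ports already in use. The
function name and the port numbers are made up for the example:

```python
def conflicting_ports(used_udp_ports, start, count):
    """Return the used ports that fall inside [start, start + count)."""
    wanted = range(start, start + count)
    return sorted(port for port in set(used_udp_ports) if port in wanted)


if __name__ == "__main__":
    # One stray port in the middle is enough to block the whole range.
    print(conflicting_ports({10005, 32768, 40001}, start=10000, count=100))
```

A single randomly allocated port inside the range makes the whole range unusable for the app.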
this tries to solve two issues:
* the current approach mounts the data directories of apps/volumes individually.
this causes a problem with volume mounts that mount after the container is started (i.e. not
because of network time/delay but because of systemd ordering). With CIFS, the mount source is a
hostname. This requires unbound to be running but unbound can only start after docker because it
wants to bind to the docker network. one way to fix this is to not start sftp automatically and only
start the sftp container in the box code. Without that, the sftp container attaches itself to the
directory before it is mounted and the directory appears empty. (on the host, the directory will
appear to have mount data!)
* every time apptask runs we keep rebuilding this sftp container. this results in many races.
the fix is: mount the parent directory of apps and volumes. in addition, any specialized appdata
paths and volume paths are mounted individually. this greatly minimizes rebuilding. also, since we
don't rely on binding to the mount point itself, the child directories can mount at leisure. this
limits the race issue to only no-op volume mounts.
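
a minimal sketch of the mount planning described above, assuming that "specialized"
means a path that lives outside the two parent directories. the function names and
all paths are hypothetical:

```python
import posixpath


def is_inside(path, root):
    """True if path is root itself or lies below it (both must be absolute)."""
    return posixpath.commonpath([root, path]) == root


def plan_sftp_mounts(apps_root, volumes_root, app_data_dirs, volume_paths):
    """Always mount the two parents; add individual mounts only for paths outside them."""
    roots = [posixpath.normpath(apps_root), posixpath.normpath(volumes_root)]
    mounts = list(roots)
    for raw in list(app_data_dirs) + list(volume_paths):
        path = posixpath.normpath(raw)
        covered = any(is_inside(path, root) for root in roots)
        if not covered and path not in mounts:
            mounts.append(path)
    return mounts


if __name__ == "__main__":
    print(plan_sftp_mounts(
        "/data/apps", "/mnt/volumes",
        ["/data/apps/app1", "/srv/custom/app2"],
        ["/mnt/volumes/media", "/media/disk"],
    ))
```

the mount list only changes when a specialized path is added or removed, so most
app installs and volume additions would not need a container rebuild.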
part of #789
when multiple apptasks are scheduled, we end up with a sequence like this:
- task1 finishes
- task2 (uninstall) removes appdata directory
- sftp rebuild (from task1 finish)
- task2 fails because the sftp rebuild created an empty appdata directory
a fix is to delay the sftp rebuild until all tasks are done. of course,
the same race is still there if a user initiates another task immediately,
but this seems unlikely. if that happens often, we can further add an sftpRebuildInProgress
flag inside apptaskmanager.
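
a small sketch of the deferred rebuild idea. the class and method names are
hypothetical and this is not the actual apptaskmanager code; it only shows the
rebuild being held back until the last scheduled task has finished:

```python
class AppTaskManager:
    """Counts scheduled tasks and runs one sftp rebuild once all of them are done."""

    def __init__(self, rebuild_sftp):
        self._outstanding = 0          # tasks scheduled but not yet finished
        self._rebuild_pending = False  # some finished task asked for a rebuild
        self._rebuild_sftp = rebuild_sftp

    def schedule_task(self):
        self._outstanding += 1

    def finish_task(self, needs_sftp_rebuild=True):
        self._outstanding -= 1
        self._rebuild_pending = self._rebuild_pending or needs_sftp_rebuild
        # Rebuild only when nothing else is queued or running.
        if self._outstanding == 0 and self._rebuild_pending:
            self._rebuild_pending = False
            self._rebuild_sftp()


if __name__ == "__main__":
    manager = AppTaskManager(lambda: print("rebuilding sftp"))
    manager.schedule_task()  # task1
    manager.schedule_task()  # task2 (uninstall)
    manager.finish_task()    # task1 done, rebuild is deferred
    manager.finish_task()    # task2 done, rebuild runs once
```

a task scheduled right after the counter reaches zero can still race with the
rebuild, which is the case the sftpRebuildInProgress flag would cover.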
restarting the service does not rebuild automatically; we should add a route
for that. we need to figure out where to scale services etc. if we randomly
create containers like that.
bd9c664b1a tried to remove it and use
the system resolver. However, we found that debian has a quirk where it
adds the fqdn to /etc/hosts as 127.0.1.1. This means that the docker containers
resolve the my.example.com domain to that address and can't connect.
This affects any apps doing a TURN test (CLOUDRON_TURN/STUN_SERVER)
and also apps like SOGo which use the mail server hostname directly (since
they require proper certs).
https://www.debian.org/doc/manuals/debian-reference/ch05.en.html#_the_hostname_resolution
So, the solution is to go back to unbound, now that port 53 binding is specially
handled anyway in docker.js
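
For illustration, a small sketch that finds the names a hosts file maps to
127.0.1.1, which is how the quirk above can be spotted on a server. The function
name and the sample hosts content are made up for the example:

```python
def names_on_debian_loopback(hosts_text):
    """Return every hostname that the given hosts file content maps to 127.0.1.1."""
    names = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *hostnames = line.split()
        if address == "127.0.1.1":
            names.extend(hostnames)
    return names


if __name__ == "__main__":
    sample = """
    127.0.0.1 localhost
    127.0.1.1 my.example.com my   # added by the installer
    """
    print(names_on_debian_loopback(sample))
```

Any name returned here resolves to the loopback address through the system
resolver, so a container looking it up would end up talking to itself.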
It's all very complicated.
Approach 1: Simply move unbound to not listen on 0.0.0.0 and only on the internal
interfaces. However, docker has no way to bind only to the "public" interface.
Approach 2: Move the internal unbound to some other port. This required a PR
for haraka - https://github.com/haraka/Haraka/pull/2863 . This works and we use
systemd-resolved by default. However, it turns out systemd-resolved will hog the
lo interface and thus docker cannot bind again to port 53.
Approach 3: Get rid of systemd-resolved and try to put the dns server list in
/etc/resolv.conf. This is surprisingly hard because the DNS listing can come from
DHCP or netplan or wherever. We can hardcode some public DNS servers but this seems
not a good idea for privacy.
Approach 4: So maybe we don't move unbound away to a different port after all.
However, all the work for approach 2 is done and it's quite nice that the default
resolver is used with the default dns server of the network (probably a caching
server + also maybe has some home network firewalled dns).
So, the final solution is to make docker bind to the IP explicitly.
It's unclear what will happen if the IP changes; maybe it needs a restart.
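
A rough sketch of what an explicit bind looks like in the PortBindings section of
a Docker container create request, where HostIp pins the published port to one
address instead of 0.0.0.0. The function name is hypothetical, the address is a
documentation IP, and this is not the actual docker.js code:

```python
import ipaddress


def port_53_bindings(host_ip):
    """Build HostConfig.PortBindings that publish port 53 on one explicit host IP."""
    address = ipaddress.ip_address(host_ip)  # raises ValueError on a malformed IP
    if address.is_unspecified:
        # 0.0.0.0 or :: would bind every interface, which is what we want to avoid.
        raise ValueError("an explicit IP is required, not a wildcard address")
    return {
        "53/tcp": [{"HostIp": host_ip, "HostPort": "53"}],
        "53/udp": [{"HostIp": host_ip, "HostPort": "53"}],
    }


if __name__ == "__main__":
    print(port_53_bindings("203.0.113.10"))
```

Because the IP is baked into the container configuration, a change of the host IP
would presumably need the container to be recreated with new bindings.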