Currently, the update/apptask/fullbackup/platformstart tasks take a
global lock and cannot run in parallel. This causes situations where
a user triggering an apptask sees "waiting for backup to finish..."
and so on.
The solution is to let them run in parallel. We need a lock at the
app level, since operations on the same app running in parallel would
be bad (tm). In addition, the update task needs a lock just for the
update part.
The locks must also work across processes, since running tasks as
separate processes is core to our "kill" strategy.
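As a rough sketch of the intended usage (the locks module, the lock name
format, and the acquire/release API here are hypothetical, not the actual
implementation):

    // hypothetical usage sketch; the locks module and lock names are assumptions
    const locks = require('./locks.js');

    async function runAppTask(appId, operation) {
        await locks.acquire(`app/${appId}`);     // per-app lock: serializes tasks of one app
        try {
            await operation();                   // tasks of other apps can run in parallel
        } finally {
            await locks.release(`app/${appId}`); // release even if the task throws
        }
    }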
Various inter-process locking mechanisms were explored:
* node's IPC mechanism with process.send(). But this only works for direct node.js
children; taskworker is run via sudo, so the IPC channel does not work.
* File locks using O_EXCL, i.e. atomically creating lock files. While file creation
can be done atomically, it becomes complicated to clean up lock files when
a task crashes; we need a way to know which locks the crashing task held.
Also, flock() and friends are not built into node.js.
* sqlite/redis were options but would introduce additional dependencies.
* Settled on MySQL-based locking. The initial plan was to have row locks
or table locks, with each row representing a kind of lock. While implementing, it turned
out that we need many types of locks (and not just the update lock and app locks). For example,
we need a lock per task type, so that only one task of a given type is active at a time.
* Instead of rows, we can just lock the table and keep a JSON blob in it. This hit a road
block: LOCK TABLES is per session and our db layer cannot handle this easily! i.e.
when issuing two db.query() calls, they might use two different connections from the pool. We would
have to expose the connection, release the connection etc.
* The next idea was an atomic compare-and-swap update of the blob, checking that the old blob
was unchanged. This approach was finally refined into a version field (see the sketch below).
Phew!
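For reference, a minimal sketch of the version-field idea, assuming a
single-row locks table (id, locksJson, version) and a promise-style
db.query() that returns rows for a SELECT and an { affectedRows } result
for an UPDATE; none of these names are from the actual code:

    // optimistic compare-and-swap on a JSON blob guarded by a version column.
    // each statement is a single query, so it works fine with a connection
    // pool (unlike LOCK TABLES, which is tied to one session/connection).
    // the table is assumed to be seeded with one row: (1, '{}', 0)
    async function tryAcquireLock(db, lockName) {
        const rows = await db.query('SELECT locksJson, version FROM locks WHERE id = 1');
        const held = JSON.parse(rows[0].locksJson);
        if (held[lockName]) return false; // lock already taken by another task

        held[lockName] = { pid: process.pid, acquiredAt: Date.now() };
        const result = await db.query(
            'UPDATE locks SET locksJson = ?, version = version + 1 WHERE id = 1 AND version = ?',
            [ JSON.stringify(held), rows[0].version ]);

        // affectedRows === 0 means another process changed the blob in between;
        // the caller re-reads and retries
        return result.affectedRows === 1;
    }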
this tries to solve two issues:
* the current approach mounts the data directories of apps/volumes individually.
this causes a problem with volume mounts that only become mounted after the container
is started, i.e. not because of network time/delay but because of systemd ordering.
With CIFS, the mount target is a hostname. resolving it requires unbound to be
running, but unbound can only start after docker because it wants to bind to the
docker network. one way to fix this is to not start sftp automatically and only
start the sftp container from the box code. but this results in the sftp container
attaching itself to the directory before it is mounted, so the directory appears
empty inside the container. (on the host, the directory will appear to have the
mount data!)
* every time an apptask runs, we keep rebuilding this sftp container. this results in many races.
the fix is: mount the parent directory of apps and volumes. in addition, any
specialized appdata paths and volume paths are then mounted individually. this
greatly minimizes rebuilding. also, since we don't rely on binding to the mount
point itself, the child directories can mount at leisure. this limits the race
issue to only no-op volume mounts.
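roughly, the bind setup changes like this (dockerode-style Binds; the paths
and the rslave propagation flag are illustrative assumptions, not the actual
box code):

    // before: one bind per app/volume; any app change forces a container rebuild
    const oldBinds = [
        '/home/yellowtent/appsdata/app1:/app1:rw',
        '/home/yellowtent/appsdata/app2:/app2:rw'
    ];

    // after: bind the parent directory once; child directories can mount later
    // without touching the container. rslave propagation is assumed here so
    // that mounts made on the host after container start are visible inside.
    const newBinds = [
        '/home/yellowtent/appsdata:/appsdata:rw,rslave'
        // specialized appdata/volume paths are still added individually
    ];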
part of #789
when multiple apptasks are scheduled, we end up with a sequence like this:
- task1 finishes
- task2 (uninstall) removes appdata directory
- sftp rebuild (from task1 finish)
- task2 fails because sftp rebuild created empty appdata directory
a fix is to delay the sftp rebuild until all tasks are done. of course,
the same race is still there if a user initiates another task immediately,
but this seems unlikely. if that happens often, we can further add an
sftpRebuildInProgress flag inside apptaskmanager.
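a minimal sketch of the delayed rebuild (the counter and function names are
hypothetical, not the actual apptaskmanager code):

    // rebuild the sftp container only when the last pending apptask finishes
    let pendingTasks = 0;

    function onTaskStart() {
        ++pendingTasks;
    }

    async function onTaskFinish() {
        if (--pendingTasks !== 0) return; // more tasks pending, hold off

        // the queue has drained; a task started at exactly this point can
        // still race (the unlikely case above), which an sftpRebuildInProgress
        // flag would address if needed
        await rebuildSftpContainer(); // hypothetical box code call
    }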
restarting the service does not rebuild it automatically; we should add a route
for that. we also need to figure out where to scale services etc. if we randomly
create containers like that.