SSH Fan Out
Coming Soon
This feature will be added for Jet's Tech Preview 2 release. The notes here are incomplete and are here to give readers a view into upcoming features. Everything is subject to change.
SSH Fan Out extends jet beyond 1:N SSH by having jetp reach out to any number of designated remote worker machines, and having those machines in turn invoke jetp asynchronously to each configure a subset of a playbook's designated inventory targets.
In short, this is a multi-tier distributed push system designed for multi-region, multi-AZ, and even multi-cloud topologies. By adding more worker nodes, there is effectively no limit to the number of total open connections or threads used by the distributed system.
[Figure: SSH Fan Out architecture]
Most users will not need this feature, because jetp's threading is already very efficient. Further, CI/CD systems rarely need to configure more than several thousand hosts simultaneously, as tasks usually deploy a single service at a time.
Compared to Planetary Scale capabilities, SSH Fan Out in Jet rapidly expands the parallelism and reach of jet without having to maintain or supervise any extra infrastructure components. To further clarify, while Planetary Scale is essentially a push/pull or pure pull configuration, SSH Fan Out remains a pure 'push' method.
SSH Fan Out mode may be ideal with cloud spot instances serving as ephemeral workers, as the feature requires no permanent infrastructure to enable remote configuration.
Usage to invoke jetp against many worker machines may look like this:
$ jetp --playbook pb1.yml --inventory ~/private_inventory \
    --roles ~/roles --threads 50 \
    --worker-config workers.yml
workers.yml on the workers machine would look something like this:
workers:
  map:
    - { from: us_east_1, to: us_east_1_workers }
    - { from: us_west_1, to: us_west_1_workers }
    - { from: stage, to: stage_workers }
  threads: 200
  jetp: /usr/bin/jetp
  batch_size: 999999
  sudo_user: opsteam
Rather than explaining each parameter in reference terms, it is probably easiest to walk through an example.
In the above configuration, jet uses the worker map to find the first available match for every host that is in a "from" group, and then uses the mapped "to" worker group, which contains one or more machines, to actually configure that host.
Let us say a host xyz.example.com was a member of the group "us_west_1".
If the group us_west_1 contained 2000 machines and the group us_west_1_workers contained 5 machines, each worker would be given 2000/5 = 400 hosts to configure. For this first execution from the control machine, jetp automatically constructs an in-memory playbook and uses its own internal code to invoke jetp again on the workers. The workers then deploy the final automation configuration against the managed machines in inventory.
To use this feature, jetp must be preinstalled and found in the path of the remote hosts themselves, as jetp does not transfer itself to worker hosts. The path of jetp can also be modified and prefixed, for instance to run jetp using environment-based secrets tools.
Running the first jetp command will print a GUID that can be used with jetp and remote logs to check remote status on all of the workers, or to associate distributed logs in logfile aggregation systems.
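The walkthrough above can be sketched in a few lines of Python. This is only an illustration, not jetp's implementation: the function names, the generated host names, and the round-robin assignment strategy are all assumptions, since the notes do not say how jetp actually divides hosts among workers.

```python
from collections import defaultdict

# Hypothetical in-memory copy of the "map" section of workers.yml.
WORKER_MAP = [
    {"from": "us_east_1", "to": "us_east_1_workers"},
    {"from": "us_west_1", "to": "us_west_1_workers"},
    {"from": "stage", "to": "stage_workers"},
]


def worker_group_for(host_groups, worker_map):
    """Return the 'to' group of the first map entry whose 'from' group
    is one of the host's groups, or None when nothing matches."""
    for entry in worker_map:
        if entry["from"] in host_groups:
            return entry["to"]
    return None


def split_hosts(hosts, workers):
    """Assign hosts to workers round-robin (an assumed strategy)."""
    assignments = defaultdict(list)
    for index, host in enumerate(hosts):
        assignments[workers[index % len(workers)]].append(host)
    return dict(assignments)


# Made-up names standing in for the 2000 hosts and 5 workers in the example.
hosts = [f"host{n}.example.com" for n in range(2000)]
workers = [f"worker{n}.example.com" for n in range(5)]

group = worker_group_for({"us_west_1"}, WORKER_MAP)
assignments = split_hosts(hosts, workers)

print(group)
print({worker: len(assigned) for worker, assigned in assignments.items()})
```

With 2000 hosts and 5 workers, each worker in this sketch ends up with 400 hosts, matching the arithmetic in the text.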
This GUID can also be found in the logs of the machine that kicked off the jetp delegation command.
Delegation does not simply work using jump hosts. Jump hosts would not provide much performance improvement.
Access to remote nodes via fan-out (when using SSH delegation features only) uses SSH agent forwarding, and then jetp runs again on the worker machines. Because every worker opens its own connections and runs its own threads, total capacity multiplies with the number of workers.
The general concern with SSH agent forwarding is that root access on a worker allows someone on that system to use keys flowing through that system for other nefarious purposes. Basically we want to disallow root access on those machines, which is pretty simple, but we really should limit access to workers further. This is the one rule:
Only the security team should have root access on jetp worker nodes.
To encourage good practices, jetp is designed to refuse to log in to worker nodes as root when using fan-out features.
A few other practices are suggested:
- Have the CI/CD system be the gateway for jobs to use worker nodes. Do not give individual developers or service teams direct access to worker nodes.
- Use different SSH keys to access the workers than teams use for production login access of any kind, so that giving users access to one system does not also let them log in to worker nodes.
- Treat worker nodes as single-purpose machines that are not used for any other task.
Multiple executions of jetp fan-out commands will not wait for the previous commands to finish and will run in parallel. This is considered a feature *mostly*. If you have lots of jobs all trying to use the same workers and they are overtaxed, consider adding more workers. Monitoring workers for resource consumption would be appropriate.
To decide how many worker nodes to allocate to jetp, first decide how long configuration of a single machine will take.
If this is, for example, 5 minutes and we want all jobs to complete in 5 minutes, we need the total number of threads on all workers to equal the total number of hosts to configure for any particular worker group.
If we ran 150 threads per worker and had 900 hosts, we might want 6 workers for that group. If we ran more threads or were willing to wait longer for configurations, we would need fewer.
If we are OK with all jobs completing in 20 minutes when a standalone configuration would normally take 5 minutes, we only need one total thread for every 4 machines in inventory. With 900 hosts, this would move from 6 workers down to 1.5 (so, rounding up, we'd need 2).
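The sizing reasoning above can be written as a small calculation. This is a rough sketch under the same simplifying assumptions as the text (every host takes the same time to configure, and each thread handles one host at a time); the function name and parameters are made up for illustration and are not part of jetp.

```python
import math


def workers_needed(hosts, threads_per_worker, minutes_per_host, target_minutes):
    """Estimate how many workers one worker group needs.

    Each thread can configure (target_minutes // minutes_per_host) hosts,
    one after another, within the target window.
    """
    rounds = target_minutes // minutes_per_host
    if rounds < 1:
        raise ValueError("target time is shorter than a single configuration")
    threads_required = math.ceil(hosts / rounds)
    return math.ceil(threads_required / threads_per_worker)


# 900 hosts, 150 threads per worker, 5 minutes per host.
same_window = workers_needed(900, 150, minutes_per_host=5, target_minutes=5)
relaxed_window = workers_needed(900, 150, minutes_per_host=5, target_minutes=20)

print(same_window)     # 6
print(relaxed_window)  # 2
```

Relaxing the target from 5 to 20 minutes drops the requirement from 900 threads to 225, which is 1.5 workers at 150 threads each and therefore rounds up to 2.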
Using more workers may reduce load on each worker machine, but care should be taken not to flood the network or effectively DDoS a package mirror.
batch_size can also be used on each worker to limit the number of hosts to configure at one time, just as with normal jetp, to enable rolling update deployments and prevent the entire system from being updated at once. When using smaller batch sizes, fewer workers are also needed.
With this in mind, you can weigh computing costs versus efficiency and make more informed choices about how many workers you may want to have. Fewer workers are always an option even for very large scale configurations; it just means waiting longer for everything to complete.
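To make the effect of batch_size concrete, here is a minimal sketch of how a worker's share of hosts might be walked through in batches. The helper name and host names are hypothetical, and how jetp actually schedules batches on a worker is not described in these notes.

```python
def batches(hosts, batch_size):
    """Yield consecutive slices of at most batch_size hosts."""
    for start in range(0, len(hosts), batch_size):
        yield hosts[start:start + batch_size]


# One worker's share from the earlier example: 400 hosts (made-up names).
worker_hosts = [f"host{n}.example.com" for n in range(400)]

rolling = list(batches(worker_hosts, 100))
print(len(rolling))     # 4
print(len(rolling[0]))  # 100
```

A very large value such as the 999999 shown in workers.yml would put every host in a single batch, so no rolling behavior would occur in this sketch.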