`ssh-keyscan` Is the Fast Way to Build `known_hosts` Files but the Wrong Way to Skip Thinking About Verification
A practical guide to `ssh-keyscan` for developers who need to collect SSH host keys for automation, CI, or deployment scripts without pretending raw collection is the same thing as trust.
Why this command matters
SSH automation breaks in two opposite ways. Some teams do everything manually forever. Others automate host trust so casually that they stop thinking about verification at all.
ssh-keyscan sits right in the middle of that tension.
What ssh-keyscan does
The OpenSSH manual describes ssh-keyscan as a utility for gathering the public SSH host keys of a number of hosts. It was designed to help build and verify ssh_known_hosts files, and it provides a minimal interface suitable for use from shell and Perl scripts.
That means the command is not primarily about logging in. It is about collecting host key material efficiently.
A basic example:
ssh-keyscan github.com
That returns host key lines you can place into a known_hosts file.
Why this command is so useful in automation
Modern deployment flows often need non-interactive SSH:
- CI pulling private dependencies
- deploy scripts connecting to servers
- build machines cloning private repos
- orchestration jobs touching many hosts
In those environments, the classic interactive SSH trust prompt is not a usable workflow. ssh-keyscan helps you prebuild the trust file so automation does not stall waiting for a human.
The part people misuse
The same man page also contains the warning many teams mentally skip: if an ssh_known_hosts file is constructed using ssh-keyscan without verifying the keys, users become vulnerable to man-in-the-middle attacks.
That sentence matters more than most blog posts admit.
ssh-keyscan gathers keys. It does not prove the network path is honest. It does not prove you contacted the right machine. It does not replace out-of-band trust.
That is the core mental model:
- collection is not verification
- automation is not trust
A practical CI example
Many teams do something like this:
mkdir -p ~/.ssh
ssh-keyscan github.com >> ~/.ssh/known_hosts
That can be acceptable when the trust assumptions are already understood and documented. It becomes sloppy when people paste it into scripts as a magic incantation without knowing what risk they are accepting.
The right question is not “does the command work?” The right question is “why am I comfortable trusting this key source in this environment?”
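One way to make that question concrete is to gate the append on a comparison with keys you already verified out of band. The sketch below is only an illustration under stated assumptions: `all_keys_pinned` and the `ci/pinned_known_hosts` path are hypothetical names, and both files are assumed to use the plain, unhashed `host keytype key` line format.

```shell
# Succeed only if every "host keytype key" entry in the scanned file also
# appears in the pinned file. Comment lines and lines with fewer than
# three fields are ignored. An empty scan counts as a failure so a failed
# collection can never pass as "nothing to object to".
all_keys_pinned() {
  pinned_file=$1
  scanned_file=$2
  awk '
    /^#/ || NF < 3 { next }
    { entry = $1 " " $2 " " $3 }
    FILENAME == ARGV[1] { pinned[entry] = 1; next }
    { seen++ }
    !(entry in pinned) { print "unverified key: " $1 " " $2; bad = 1 }
    END { if (bad || !seen) exit 1 }
  ' "$pinned_file" "$scanned_file" >&2
}

# Hypothetical CI usage (needs network access, so it is left commented out):
#   scan=$(mktemp)
#   ssh-keyscan -T 10 -t ed25519 github.com > "$scan"
#   all_keys_pinned ci/pinned_known_hosts "$scan" && cat "$scan" >> ~/.ssh/known_hosts
```

The design choice here is that the scan never writes to the trusted file directly. It writes to a candidate file, and only a match against pinned material promotes it.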
Useful flags worth knowing
The manual highlights a few options that matter in real workflows:
- `-t` to select key types such as rsa, ecdsa, or ed25519
- `-p` to connect to a non-default port
- `-H` to hash hostnames and addresses in output
- `-T` to set a connection timeout
- `-f` to read many hosts from a file
These are not edge-case flags. They are what make the tool practical in real deployment environments.
For example:
ssh-keyscan -T 10 -t ed25519 my-server.example.com
That is clearer and tighter than fetching every possible thing with default assumptions.
Why the scaling story matters
The manual also notes that ssh-keyscan uses non-blocking socket I/O and can contact many hosts in parallel efficiently. That is exactly why the command remains valuable. If you need keys from dozens or hundreds of hosts, interactive SSH acceptance is not a strategy.
Speed is the point.
But speed without verification discipline is where teams create quiet risk.
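For the many-host case, a minimal sketch might wrap the `-f`, `-T`, and `-t` flags in a small function. The function name `scan_host_list`, the `hosts.txt` file, and the `KEYSCAN` variable are assumptions made for illustration. `KEYSCAN` in particular is not an OpenSSH feature, just a hook added here so the call can be dry-run.

```shell
# Scan every host listed in a file (one host per line) for ed25519 keys.
# KEYSCAN is a hypothetical override hook: setting KEYSCAN=echo prints the
# arguments that would be passed instead of touching the network.
scan_host_list() {
  host_file=$1
  timeout_secs=${2:-10}
  "${KEYSCAN:-ssh-keyscan}" -T "$timeout_secs" -t ed25519 -f "$host_file"
}

# Hypothetical usage: collect into a candidate file for later verification,
# never straight into a trusted known_hosts file.
#   scan_host_list hosts.txt 5 > candidate_known_hosts
```

Running `KEYSCAN=echo scan_host_list hosts.txt 5` shows the exact flags without opening any connections, which is a cheap way to review what a script would do before it runs against real hosts.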
The right way to think about it
Use ssh-keyscan as a collection tool inside a larger trust process. Maybe that process is:
- compare keys against provider documentation
- validate fingerprints out of band
- store trusted keys in versioned infrastructure config
- alert when host keys change unexpectedly
That is real operational maturity. Blindly appending keys in every script is not.
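The "alert when host keys change" step from the list above could be sketched as a comparison between the trusted file kept in version control and a fresh scan. This is an illustration, not a complete monitoring tool: `report_key_drift` is a hypothetical name, both files are assumed to be unhashed (output produced with `-H` cannot be compared by hostname this way), and each host is assumed to have at most one key per key type.

```shell
# Compare a fresh scan against the trusted file kept in version control.
# Prints "CHANGED host keytype" when a recorded host/keytype pair now has
# a different key, and "NEW host keytype" when the pair was never recorded.
# Returns non-zero only when a recorded key changed, so a scheduled job
# can alert on it.
report_key_drift() {
  trusted_file=$1
  fresh_file=$2
  awk '
    /^#/ || NF < 3 { next }
    { pair = $1 " " $2 }
    FILENAME == ARGV[1] { known[pair] = $3; next }
    !(pair in known)  { print "NEW " pair; next }
    known[pair] != $3 { print "CHANGED " pair; changed = 1 }
    END { if (changed) exit 1 }
  ' "$trusted_file" "$fresh_file"
}

# Hypothetical usage in a scheduled job (needs network access):
#   ssh-keyscan -T 10 -t ed25519 -f hosts.txt > fresh_scan
#   report_key_drift infra/known_hosts fresh_scan || echo "host key drift detected" >&2
```

A CHANGED line is not proof of an attack, since legitimate key rotation looks the same from here. It is a prompt to verify the new key out of band before updating the trusted file.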
Final recommendation
Use ssh-keyscan when you need fast, scriptable host key collection. Just keep the important boundary in your head: it is a data-gathering tool, not a magical trust oracle. Fast automation is useful. Verified automation is better.