SSH is the de facto standard for remote access to GNU/Linux systems (and to all modern Unix systems). When you use SSH in a corporate environment, in order to comply with security standards like ISO 27001, PCI DSS, or SOC2, you need an SSH Bastion to manage remote access to your infrastructure.
I’m sharing with you what I expect from this type of service and what features I want to have. This list comes from many years of managing SSH Bastions and also a lot of security audits from third parties.
Fine-grained access control is the primary feature I look for when I evaluate an SSH Bastion. I classified my needs into these three categories:
The system should be connected to my organization’s SSO (with OIDC or SAML):
ℹ️ Note: this integration creates an external dependency and could have an impact in case of an incident (an outage of the SSO must not prevent you from connecting to your infrastructure). You need to take this into account.
The ability to quickly revoke access (usually provided by the SSO integration, but some systems aren’t synchronized, so your Bastion has to do the job instead). Quickly removing access is a key feature for a lot of use cases:
Logs are the second key feature for an SSH Bastion. There are two parts to a logging capability.
The first one is about connection logs. I classified my needs into four categories:
The second one is about log storage. Logs have to be securely stored:
Bonus: I like when I can replay the logs (using the script command, for example); see the sketch below.
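For illustration, here is a minimal sketch of what replaying looks like with the script and scriptreplay tools from util-linux (the file names are arbitrary):
$ script -t 2> timing.log session.log   # record the terminal session, timing data goes to stderr
$ exit                                  # end the recording
$ scriptreplay timing.log session.log   # replay the session at its original speed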
Because my bastion is a critical resource in my infrastructure, I want a state-of-the-art OS hardening on my bastion. I also want to be up-to-date with the most recent innovations and best practices on this theme.
Next, I want my OS to be updated quickly after a patch is released. Furthermore, I want to avoid breaking the service and minimize the outage each time an update is deployed.
Patch Management isn’t the only issue to tackle. I also need to be able to identify the latest security vulnerabilities (like zero-days) applicable to my system and deploy the workaround, if one is available.
Most of the time, a bastion needs to be accessed from anywhere, anytime. A best practice is to put the bastion behind a VPN or something limiting access.
While it’s an excellent practice (reducing the exposure of your service reduces the attack surface), it isn’t always possible, for reasons like the cost of managing a VPN endpoint, the additional time needed to maintain all the solutions, etc. (A meshed VPN with a service like Tailscale could be an affordable and elegant solution in that case; maybe the content for another article!)
So, secure access to your bastion is tightly linked to the previous security topics (OS hardening, patching, and vulnerability management).
Because it’s a critical component, I need a high level of service, and I can’t rely on a single instance for the administration entry point of my infrastructure. It would be unacceptable to lose access to my systems during an incident because my SSH Bastions are unavailable.
I want to be able to easily monitor my bastion (with an API, for example) and to detect any issue with the service provided (remote connection).
Even if these are my personal expectations for an SSH Bastion, I think they reflect a pretty common need, and you could use this list as a template for your own checklist while reviewing software or service capabilities. Don’t forget to look at the security standards you are trying to meet; you may find specific requirements to add.
Let’s have a look at how AWS Systems Manager matches this list.
AWS Systems Manager is a service that lets you manage your compute resources, like Amazon EC2 instances or EKS nodes, without needing an SSH connection to your target host. You can use the AWS CLI or the console to connect to your host.
With a classic SSH Bastion, you generally have to deal with SSH keys or SSH certificates, expose your SSH server, and manage firewall rules to filter access.
With AWS Systems Manager, you don’t have to deal with any of this! No SSH keys are required, and you don’t have to expose an SSH server on the target host (in fact, sshd doesn’t even need to run; there is no magic involved). With AWS Systems Manager, there are no more rules for TCP 22 in your Security Groups. Your auditor won’t believe you.
Moreover, Systems Manager will securely log everything to S3, from connection logs to session logs.
Let's see in detail how it fits our needs:
- Authentication, user management, and user permissions
- Audit logs
- OS hardening and Patch Management
- Secure access to the bastion
- High Availability of the bastion
- Monitoring
Now we know that AWS Systems Manager fits all my needs, let's look at how to set up and use it.
You don’t need much to start using Systems Manager: an S3 bucket to store the logs, the right IAM role on your instances, and the SSM Agent installed on your hosts.
I strongly recommend having at least a dedicated AWS account for the logs. It’s a common best practice, and your security auditor or your CISO will be happy (we all want those guys to be happy, am I right?). A dedicated account strengthens the security of the logs, which are highly sensitive: the activity of your people, the content of sessions, and in some cases secrets all need to be stored with care.
You certainly already have an AWS Organization with a few accounts, maybe managed by AWS Control Tower. In that case, creating a new account won’t be an issue, and in some cases, it already exists (by default it’s named Log Archive). Use it!
You will send logs to this account from other accounts, so you need a little setup on the bucket(s). Here is a template to apply as a Bucket Policy on every bucket you want to use to store your logs (you may want a bucket per account, per environment, etc.).
To edit the Bucket Policy of your bucket, go to the AWS Console, choose S3, click on your bucket and go to the Permissions tab.
Customize the template below:
{
"Version": "2012-10-17",
"Id": "Policy1625564666394",
"Statement": [
{
"Sid": "SSMPutLogs",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::{{ SRC_ACCOUNT_ID }}:root"
},
"Action": [
"s3:PutObject",
"s3:PutObjectAcl"
],
"Resource": "arn:aws:s3:::{{ MY BUCKET }}/*"
},
{
"Sid": "2",
"Effect": "SSMCheckBucket",
"Principal": {
"AWS": "arn:aws:iam::{{ SRC_ACCOUNT_ID }}:root"
},
"Action": [
"s3:GetBucketAcl",
"s3:GetEncryptionConfiguration"
],
"Resource": "arn:aws:s3:::{{ MY BUCKET }}"
}
]
}
Replace {{ SRC_ACCOUNT_ID }} with the Account ID the logs come from, and {{ MY BUCKET }} with the name of your bucket.
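If you prefer the CLI to the console for this step, you can apply the policy with a standard call (a sketch, assuming the template above is saved as bucket-policy.json with the placeholders filled in; my-ssm-logs is an example bucket name):
$ aws s3api put-bucket-policy --bucket my-ssm-logs --policy file://bucket-policy.json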
Once you have created your bucket(s) with the correct settings, enable AWS Systems Manager. Do this on each account where you want SSM to connect to your resources.
On the console, select the service Systems Manager and select Session Manager in the left menu:
Next, click on the Preferences tab to begin the setup.
Now, click on Edit and complete the S3 configuration:
Since your bucket is in another account, choose Enter a bucket name in the text box and enter the name of your bucket, not its ARN.
Save your changes.
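If you want to script this step instead of clicking through the console, these preferences live in an SSM document named SSM-SessionManagerRunShell, which you can update from the CLI once it exists (it is created the first time you save the preferences in the console). A sketch, assuming prefs.json contains the same JSON as the Terraform content shown later in this article:
$ aws ssm update-document --name "SSM-SessionManagerRunShell" \
    --content file://prefs.json --document-version '$LATEST'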
The last step is to create a role (or update an existing one) to allow the AWS Systems Manager Agent to work. There are three things to do:
1) Allow the agent to check if the bucket is encrypted
2) Allow the agent to write logs to the bucket
Here is the IAM policy you need to add to your role (for example as an inline policy) for the first two steps:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "SSMLoggingEnc",
"Effect": "Allow",
"Action": "s3:GetEncryptionConfiguration",
"Resource": "arn:aws:s3:::{{ YOUR_BUCKET }}"
},
{
"Sid": "SSMLogging",
"Effect": "Allow",
"Action": [
"s3:PutObjectAcl",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::{{ YOUR_BUCKET }}/*"
}
]
}
Replace {{ YOUR_BUCKET }} with the name of your bucket.
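To attach it as an inline policy from the CLI instead of the console (a sketch; the role name my-ssm-instance-role is an example, and ssm-logging.json is assumed to contain the JSON above):
$ aws iam put-role-policy --role-name my-ssm-instance-role \
    --policy-name SSMLogging --policy-document file://ssm-logging.json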
3) Allow the agent to interact with the Systems Manager service
This part is easy: simply attach the existing managed policy AmazonSSMManagedInstanceCore to your role through the console.
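The same attachment can be done from the CLI (the role name is an example):
$ aws iam attach-role-policy --role-name my-ssm-instance-role \
    --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore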
If you are using the Amazon Linux 2 AMI, the agent is already installed. With the previous steps, you have finalized the setup and you can jump to the next section.
If you don’t already have the Systems Manager Agent installed, you can follow the official documentation to install it manually.
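For example, on an RPM-based distribution, the documented installation looks roughly like this (a sketch; replace {{ REGION }} with your AWS region, and check the documentation for your exact distribution):
$ sudo yum install -y https://s3.{{ REGION }}.amazonaws.com/amazon-ssm-{{ REGION }}/latest/linux_amd64/amazon-ssm-agent.rpm
$ sudo systemctl enable --now amazon-ssm-agent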
Now you can connect to any host that has the SSM Agent installed and the role attached.
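You can check which hosts are registered and reachable before connecting (a sketch using a standard CLI call):
$ aws ssm describe-instance-information \
    --query 'InstanceInformationList[].[InstanceId,PingStatus,PlatformName]' \
    --output table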
You can use either the CLI or the console. I will only cover the CLI, my favorite method. Of course, replace INSTANCE_ID with the ID of your target.
$ aws ssm start-session --target INSTANCE_ID
Starting session with SessionId: xxxxxxxx
sh-4.2$ whoami
ssm-user
As you can see, we are connected as ssm-user. Now you can sudo to get root access and do more privileged work.
Once disconnected (there is a default timeout of 20 minutes, a great feature too), if you look inside the bucket set for the logs, you will find a file xxxxxxxx.log with the content of your session:
Script started on 2021-08-07 20:09:07+0000
[?1034hsh-4.2$
[Ksh-4.2$ whoami
ssm-user
sh-4.2$ sudo) [K[K -i
]0;root@ip-10-250-0-13:~[?1034h[root@ip-10-250-0-13 ~]# logout
sh-4.2$ exit
Script done on 2021-08-07 20:09:07+0000
AWS uses something like the script command to record the session, and you can replay the session easily. It’s very cool that AWS kept this feature: script is an old tool, unknown to most people, but very useful for auditing a system, for example.
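To pull a session log for review, a plain aws s3 cp is enough (the bucket and file names are examples):
$ aws s3 cp s3://my-ssm-logs/xxxxxxxx.log ./session.log
$ cat -v session.log   # inspect the raw escape sequences recorded by script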
I imagine you are using Terraform to manage all your resources on AWS. Congratulations, you are doing it well.
You can easily set up everything we talked about with Terraform. However, there is currently an issue if you try to set up the logging configuration with Terraform after you have visited the Session Manager page in the console: the resource already exists, and you will encounter this error:
Error: Error creating SSM document: DocumentAlreadyExists: Document with same name SSM-SessionManagerRunShell already exists
Delete it with the CLI before applying your setting with Terraform:
$ aws ssm delete-document --name SSM-SessionManagerRunShell
As a bonus, because it’s not easy to quickly find how to do it with Terraform, here is the resource to use to set up the logging configuration for SSM:
resource "aws_ssm_document" "ssm_logging" {
name = "SSM-SessionManagerRunShell"
document_type = "Session"
document_format = "JSON"
content = <<EOF
{
"schemaVersion": "1.0",
"description": "Document to hold regional settings for Systems Manager",
"sessionType": "Standard_Stream",
"inputs": {
"s3BucketName": "",
"s3KeyPrefix": "",
"s3EncryptionEnabled": true
}
}
EOF
}
Replace {{ YOUR_BUCKET }} with the name of your bucket.
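Alternatively, instead of deleting the console-created document, you can import it into your Terraform state and let the resource above take over (using the resource address from the block above):
$ terraform import aws_ssm_document.ssm_logging SSM-SessionManagerRunShell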
If you are using EKS, the managed Kubernetes service from AWS, you can even use Systems Manager for your EKS nodes. It works for node groups (not Fargate).
Use the same configuration workflow as for an EC2 instance:
For the AMI, I recommend using Amazon EKS optimized Amazon Linux AMIs where SSM Agent is installed by default since AMI release v20210621.
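To find the instance ID behind a given node (the NODE_INSTANCE_ID used below), you can read the providerID field exposed by the Kubernetes API (a sketch with kubectl):
$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.providerID}{"\n"}{end}'
# each providerID has the form aws:///<availability-zone>/<instance-id>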
$ aws ssm start-session --target NODE_INSTANCE_ID
Starting session with SessionId: xxxxxxxxx
sh-4.2$ kubelet --version
Kubernetes v1.20.4-eks-6b7464
AWS Systems Manager is almost the perfect solution to replace your old EC2 SSH Bastion. It brings you everything you could expect from a “state of the art” SSH Bastion, easily … and for free (except the cost of S3 for logging)!
Of course, it only fits if you are using AWS services. For other Cloud Providers and for On-Premise architecture, you have to consider other solutions. The old SSH Bastion on a GNU/Linux system isn’t dead. You could also consider specific solutions like Wallix, Teleport, or Boundary.
AWS Systems Manager also pushes you to treat your instances like cattle instead of pets, but this is already a common pattern in modern architectures.
I hope this article convinced you that you don’t have to deal with complex systems to manage your instances and EKS nodes. AWS Systems Manager has more to offer; feel free to discover all its capabilities by reading the documentation.
If you have any questions, or if you think I missed something in my SSH Bastion requirements, feel free to contact me; I will be happy to continue the conversation.