AWS Systems Manager to replace your SSH Bastion | Padok

Written by Guillaume Leccese | 26-Aug-2021 12:31:27

What do I expect from an SSH Bastion?

SSH is the de facto standard for remote access to GNU/Linux systems (and to all modern Unix systems). When you use SSH in a corporate environment, you need an SSH Bastion to manage remote access to your infrastructure in order to comply with security standards like ISO 27001, PCI DSS, or SOC 2.

I’m sharing with you what I expect from this type of service and what features I want to have. This list comes from many years of managing SSH Bastions and also a lot of security audits from third parties.

Authentication, user management, and user permissions


Fine-grained access control is the primary feature I look for when I evaluate an SSH Bastion. I classified my needs into three categories:

  • who: control who can access the bastion
  • when: let me set a time range for a user or group
  • where (from/to): control the source and the destination of the connection

The system should be connected to my organization's SSO (with OIDC or SAML):

  • I can define most of my ACLs in my SSO to centralize the information (and share it with other tools)

ℹ️ Note: this integration creates an external dependency and can have an impact during an incident (an outage of the SSO must not prevent you from connecting to your infrastructure). You need to take this into account.

The ability to quickly revoke access also matters (it is usually provided by the SSO integration, but some systems aren't synchronized, so your Bastion has to do the job instead). Quickly removing access is a key feature in a lot of situations:

  • an employee leaving your organization
  • removing access from an employee because of an internal issue that puts your organization at risk
  • revoking access for a compromised account
  • etc.

Audit logs


Logs are the second key feature of an SSH Bastion. There are two parts to this logging capability.

The first one is about connection logs. I classified my needs into four categories:

  • who: who is using the bastion
  • when: login and logout time
  • where: the source and the destination of the connection
  • what (session logs): both input and output (pay attention to secrets appearing in the logs)

The second one is about log storage. Logs have to be securely stored:

  • access control to the logs, because they are confidential AND can contain a lot of secrets (typed-in secrets, secrets displayed from configuration files, etc.)
  • immutability of the logs: my logs can't be overwritten or altered
  • a preference for remote storage, because you can more easily control access and restrict permissions; local copies are the hardest to protect, and therefore to trust

Bonus: I like it when I can replay the logs (using the script command, for example).

OS hardening and Patch Management


Because my bastion is a critical resource in my infrastructure, I want state-of-the-art OS hardening on it. I also want to stay up to date with the most recent innovations and best practices on this topic.

Next, I want my OS to be updated quickly after a patch is released. Furthermore, I want to avoid breaking the service and minimize the outage each time an update is deployed.

Patch Management isn't the only issue to tackle. I need to be able to identify the latest security vulnerabilities (like zero-days) applicable to my system and deploy a workaround if one is available.

Secure access to the bastion


Most of the time, a bastion needs to be accessed from anywhere, anytime. A best practice is to put the bastion behind a VPN or something limiting access.

While it's an excellent practice (reducing the exposure of your service reduces the potential attack surface), it isn't always possible, for reasons like the cost of managing a VPN endpoint, the additional time needed to maintain yet another solution, etc. (a meshed VPN with a service like Tailscale could be an affordable and elegant answer in that case, maybe a topic for another article!).

So, secure access to your bastion is tightly linked to the previous security topics (OS hardening, patching, and vulnerability management).

High Availability of the bastion


Because it's a critical component, I need a high level of service, and I can't rely on a single instance for the administration entry point of my infrastructure. It would be unacceptable to lose access to my systems during an incident because my SSH Bastions are unavailable.

Monitoring


I want to be able to easily monitor my bastion (with an API for example) and I want to detect if there is an issue with the service provided (remote connection).

Even if these are my personal expectations for an SSH Bastion, I think they reflect a pretty common need, and you could use this list as a template for your own checklist when reviewing a software or service's capabilities. Don't forget to look at the security standards you are trying to comply with; you may find specific requirements to add.

Let’s have a look at how AWS Systems Manager matches this list.

What is AWS Systems Manager?

AWS Systems Manager is a service that lets you manage your compute resources, like Amazon EC2 instances or EKS nodes, without needing an SSH connection to the target host. You can use the AWS CLI or the console to connect to your host.

With a classic SSH Bastion, you generally have to deal with SSH keys or SSH certificates, expose your SSH server, and manage firewall rules to filter access.

With AWS Systems Manager, you don't have to deal with all this stuff! No SSH keys are required, and you don't have to expose an SSH server on the target host (the session goes through the SSM Agent and the AWS API, no magic involved). With AWS Systems Manager, there are no more rules on TCP 22 in your Security Groups. Your auditor won't believe you.

Moreover, Systems Manager will securely log everything to S3, from connection logs to session logs.

Let's see in detail how it fits our needs:

Authentication, user management, and user permissions

  • AWS Systems Manager relies on IAM, so you have all the power of IAM policies to grant access to IAM users and roles
  • You can use AWS SSO with your IdP
  • With IAM policies, you can set a time range, which is very useful when outsourcing on-call or when restricting access to a short period of time (debug session, audit, etc.); see the sketch after this list
  • You benefit from CloudTrail logs whenever someone logs in or uses the API
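
Here is a minimal sketch of what such a time-boxed permission could look like. This is an assumption on my part, not an official recipe: the Sid, role name, dates, and file name are placeholders, and the policy simply combines the ssm:StartSession action with the aws:CurrentTime condition key:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "TimeBoundSSMSession",
            "Effect": "Allow",
            "Action": "ssm:StartSession",
            "Resource": "arn:aws:ec2:*:*:instance/*",
            "Condition": {
                "DateGreaterThan": { "aws:CurrentTime": "2021-08-23T08:00:00Z" },
                "DateLessThan": { "aws:CurrentTime": "2021-08-27T18:00:00Z" }
            }
        }
    ]
}

Saved as time-bound-ssm.json, it can be attached as an inline policy to the role used during the debug or audit period:

$ aws iam put-role-policy --role-name oncall-debug-role --policy-name time-bound-ssm --policy-document file://time-bound-ssm.json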

Audit logs

  • AWS Session Manager and CloudTrail generate logs for each connection: usage and content of the session. So you have a complete trace of who, when, where, and what (see the query sketch after this list).
  • Logging relies on S3, so you have all the power of S3: availability, storage redundancy, replication, encryption, versioning, lifecycle policies, security policies, etc. S3 and IAM also provide fine-grained access control on the bucket and its content. You can easily achieve log immutability with the right policy.
  • There is no local vs. remote copy question: the only copy lives on S3, so you have a single source of truth (and you can secure it easily).
  • CloudTrail also records every login and API call
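
To illustrate the "who and when" part, CloudTrail can be queried directly from the CLI. A quick sketch, assuming CloudTrail is enabled in the region, that lists the most recent StartSession calls:

$ aws cloudtrail lookup-events \
    --lookup-attributes AttributeKey=EventName,AttributeValue=StartSession \
    --max-results 20 \
    --query 'Events[].{user:Username,time:EventTime}'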

OS hardening and Patch Management

  • Nothing to do here! A managed service means no work on your side, no patching, and no hardening. From an Ops point of view, that's more time for valuable work instead of scheduling patch sessions and maintenance windows.

Secure access to the bastion

  • No system to manage means nothing to secure. Access policies are managed through AWS IAM and access goes through the AWS API. You only have to configure what each user of your system can do

High Availability of the bastion

  • By using AWS Systems Manager you get a de facto highly available and very resilient service, built on components like the AWS API, IAM, and S3. These are certainly among the most robust services on AWS, and even on the internet.

Monitoring

  • This topic is harder to tackle because you can't easily monitor AWS Systems Manager itself. In practice, you don't really have to: since it relies on strategic AWS components (the AWS API, S3, IAM), the chances of an incident impacting you exactly when you need AWS Systems Manager are very, very low. The availability will be much higher than anything you could deploy and manage yourself.

Now that we know AWS Systems Manager fits all my needs, let's look at how to set it up and use it.

How to use AWS Systems Manager

You don’t need a lot of things to start using Systems Manager:

  • A bucket to store the logs
  • AWS Systems Manager logging enabled
  • An IAM role that allows SSM on your resources and sends the logs to your bucket
  • An EC2 instance with the SSM Agent installed

The logs


I strongly recommend having at least a dedicated AWS account for the logs. It's a common best practice and your security auditor or your CISO will be happy (we all want those guys to be happy, am I right?). By having this dedicated account for logs, you will strengthen their security, since they are highly sensitive: the activity of your people, the content of their sessions, and in some cases secrets, all need to be stored with care.

You certainly already have an AWS Organization with a few accounts, maybe managed by AWS Control Tower. In that case, creating a new account won’t be an issue, and in some cases, it already exists (by default it’s named Log Archive). Use it!

You will send logs to this account from other accounts, so there is a little setup to do on the bucket(s). Here is a template to apply as a Bucket Policy on every bucket you want to use to store your logs (maybe you want one bucket per account, per environment, etc.).

To edit the Bucket Policy of your bucket, go to the AWS Console, choose S3, click on your bucket and go to the Permissions tab.

Customize the template below:

{
    "Version": "2012-10-17",
    "Id": "Policy1625564666394",
    "Statement": [
        {
            "Sid": "SSMPutLogs",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::{{ SRC_ACCOUNT_ID }}:root"
            },
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": "arn:aws:s3:::{{ MY BUCKET }}/*"
        },
        {
            "Sid": "2",
            "Effect": "SSMCheckBucket",
            "Principal": {
                "AWS": "arn:aws:iam::{{ SRC_ACCOUNT_ID }}:root"
            },
            "Action": [
                "s3:GetBucketAcl",
                "s3:GetEncryptionConfiguration"
            ],
            "Resource": "arn:aws:s3:::{{ MY BUCKET }}"
        }
    ]
}

Replace {{ SRC_ACCOUNT_ID }} with the Account ID where the logs come from.

Replace {{ MY BUCKET }} with the name of your bucket.
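
If you prefer the CLI to the console, the same policy can be applied with put-bucket-policy (the bucket name and the file name below are placeholders):

$ aws s3api put-bucket-policy \
    --bucket my-ssm-session-logs \
    --policy file://bucket-policy.json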

AWS Systems Manager logging


Once you have created your bucket(s) with the correct settings, enable AWS Systems Manager logging. Do this in each account where you want SSM to connect to your resources.

On the console, open the Systems Manager service and select Session Manager in the left menu.

Next, click on the Preferences tab to start configuring.

Now, click on Edit and complete the S3 configuration.

Since your bucket is in another account, choose Enter a bucket name in the text box and type the name of your bucket, not the ARN.

Save your changes.
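
Under the hood, these preferences are stored in an SSM document named SSM-SessionManagerRunShell (we will meet it again in the Terraform section). If you want to double-check what the console wrote, you can read it back from the CLI:

$ aws ssm get-document \
    --name SSM-SessionManagerRunShell \
    --query 'Content' --output text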

IAM role

The last step is creating a role (or updating an existing one) to allow the AWS Systems Manager Agent to work. There are three things to do:

1) Allow the agent to check whether the bucket is encrypted

2) Allow the agent to write logs to the bucket

Here is the IAM policy you need to add to your role (for example as an inline policy) for the first two steps:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SSMLoggingEnc",
            "Effect": "Allow",
            "Action": "s3:GetEncryptionConfiguration",
            "Resource": "arn:aws:s3:::{{ YOUR_BUCKET }}"
        },
        {
            "Sid": "SSMLogging",
            "Effect": "Allow",
            "Action": [
                "s3:PutObjectAcl",
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::{{ YOUR_BUCKET }}/*"
        }
    ]
}

Replace {{ YOUR_BUCKET }} with the name of your bucket.

3) Allow the agent to interact with the Systems Manager service


This part is easy: simply attach the existing AmazonSSMManagedInstanceCore managed policy to your role through the console.
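
If you would rather script it, the equivalent CLI call is a one-liner (the role name is a placeholder):

$ aws iam attach-role-policy \
    --role-name my-ssm-instance-role \
    --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore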

AWS SSM Agent


If you are using an Amazon Linux 2 AMI, the agent is already installed. With the previous steps, the setup is complete and you can jump to the next section.

If you don’t already have Systems Manager Agent installed, you can follow the official documentation to manually install the agent.
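
As a reference point, here is what the installation usually looks like on two common distributions. These commands are adapted from the official documentation, so double-check them against it for your exact OS version:

# Ubuntu: the agent is distributed as a snap
$ sudo snap install amazon-ssm-agent --classic
$ sudo snap start amazon-ssm-agent

# Amazon Linux 2: install it from the built-in repository if it's missing
$ sudo yum install -y amazon-ssm-agent
$ sudo systemctl enable --now amazon-ssm-agent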

Connect to your host


Now you can connect to any host that has the SSM Agent installed and the role attached.

You can use either the CLI or the console. I will only cover the CLI, my favorite method. Of course, replace INSTANCE_ID with the ID of your target.
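
One prerequisite worth mentioning: the CLI path relies on the Session Manager plugin for the AWS CLI. If I remember correctly, running the plugin binary with no arguments prints a confirmation when it is correctly installed:

$ session-manager-plugin

The Session Manager plugin was installed successfully. Use the AWS CLI to start a session.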

$ aws ssm start-session --target INSTANCE_ID 

Starting session with SessionId: xxxxxxxx
sh-4.2$ whoami
ssm-user

As you can see, we are connected as ssm-user. Now you can sudo to get root access and do more privileged work.

Once disconnected (there is a default timeout of 20 minutes, another great feature), if you look inside the bucket configured for the logs, you will find a file xxxxxxxx.log with the content of your session:

Script started on 2021-08-07 20:09:07+0000
[?1034hsh-4.2$ 
[Ksh-4.2$ whoami 

ssm-user

sh-4.2$ sudo) [K[K -i

]0;root@ip-10-250-0-13:~[?1034h[root@ip-10-250-0-13 ~]# logout

sh-4.2$ exit

Script done on 2021-08-07 20:09:07+0000

AWS uses something like the script command to record the session, so you can replay it easily. It's very cool that AWS kept this behavior: script is an old tool, unknown to most people, but very useful for auditing a system, for example.

Terraform everything


I imagine you are using Terraform to manage all your resources on AWS. Congratulations, you are doing it well.

You can easily set up everything we talked about with Terraform. However, there is a catch if you try to configure SSM logging with Terraform after having visited the page in the console: you will get an error because the resource already exists:

Error: Error creating SSM document: DocumentAlreadyExists: Document with same name SSM-SessionManagerRunShell already exists

Delete it with the CLI before applying your setting with Terraform:

$ aws ssm delete-document --name SSM-SessionManagerRunShell

Bonus: because it's not easy to quickly find how to do this with Terraform, here is the resource to use to set up the SSM logging configuration:

resource "aws_ssm_document" "ssm_logging" {
  name            = "SSM-SessionManagerRunShell"
  document_type   = "Session"
  document_format = "JSON"

  content = <<EOF
{
    "schemaVersion": "1.0",
    "description": "Document to hold regional settings for Systems Manager",
    "sessionType": "Standard_Stream",
    "inputs": {
        "s3BucketName": "",
        "s3KeyPrefix": "",
        "s3EncryptionEnabled": true
    }
}
EOF
}

Replace {{ YOUR_BUCKET }} with the name of your bucket.
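
If you would rather not delete the existing document, an alternative is to import it into your Terraform state instead (assuming your version of the AWS provider supports importing aws_ssm_document by name, which recent versions do):

$ terraform import aws_ssm_document.ssm_logging SSM-SessionManagerRunShell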

Even for Kubernetes nodes!


If you are using EKS, the managed Kubernetes service from AWS, you can even use Systems Manager for your EKS nodes. It works for node groups (not Fargate).

Use the same configuration workflow as for an EC2 instance:

  • A bucket to store the logs
  • AWS Systems Manager logging enabled
  • An IAM role that allows SSM on your EKS nodes and sends the logs to your bucket
  • An AMI with the SSM Agent installed

For the AMI, I recommend using the Amazon EKS optimized Amazon Linux AMIs, where the SSM Agent has been installed by default since AMI release v20210621.

$ aws ssm start-session --target NODE_INSTANCE_ID

Starting session with SessionId: xxxxxxxxx
sh-4.2$ kubelet --version
Kubernetes v1.20.4-eks-6b7464
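
To find NODE_INSTANCE_ID, you can read it off each node's providerID, whose last segment is the EC2 instance ID. A kubectl one-liner does the trick, assuming kubectl is already configured against your cluster:

$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.providerID}{"\n"}{end}'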

AWS Systems Manager is almost the perfect solution to replace your old EC2 SSH Bastion. It brings you everything you could expect from a state-of-the-art SSH Bastion, easily … and for free (except the cost of S3 for logging)!

Of course, it only fits if you are using AWS services. For other Cloud Providers and for On-Premise architecture, you have to consider other solutions. The old SSH Bastion on a GNU/Linux system isn’t dead. You could also consider specific solutions like Wallix, Teleport, or Boundary.

AWS Systems Manager also pushes you to treat your instances as cattle instead of pets, but this is standard for modern architectures.

I hope this article convinces you that you don't have to deal with complex systems to manage your instances and EKS nodes. AWS Systems Manager has more to offer; feel free to discover all its capabilities by reading the documentation.

If you have any questions, or if you think I missed something in my SSH Bastion requirements, feel free to contact me; I will be happy to continue the conversation.