Ramkumar Krishnan: September 2021

In a container environment, the moment we start thinking about protecting containers with the right security practices, the first buzzword that comes to mind is "isolation".

You are right! In container security, the real buzzword is "isolation". The more you isolate the container runtime from the container host, and one container from another, the closer you are to a secure setup. To provide this isolation, Docker supports several isolation mechanisms by default, such as:

Docker namespaces
Control groups (cgroups)
Kernel capabilities

Docker namespaces provide isolation by separating the "process", "mount", "network stack" and other namespaces. For example, with the PID (process) namespace, processes in the container are isolated from processes on the host. The same process has one process ID (PID) as seen from the host and a different PID inside the container. Processes running on the host (or in other containers) are not visible from inside the container. This way Docker ensures that one container does not interfere with other containers or with the host.
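To see the two PIDs side by side, here is a minimal Python sketch. It assumes Linux 4.1 or later, where /proc/<pid>/status has an `NSpid` line that lists a process's PID in each nested PID namespace, starting from the namespace that owns the /proc mount. Run it on the host against a container process's host PID, and the line typically shows two numbers: the host PID followed by the PID inside the container. The helper name `pids_by_namespace` is hypothetical.

```python
from __future__ import annotations

import sys
from pathlib import Path


def pids_by_namespace(status_text: str) -> list[int]:
    """Return the PIDs on the NSpid line of a /proc/<pid>/status file.

    Hypothetical helper. The first value is the PID seen from the namespace
    that owns this /proc mount (e.g. the host); the last value is the PID in
    the innermost namespace (e.g. the container).
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    raise ValueError("no NSpid line found (kernel older than 4.1?)")


if __name__ == "__main__":
    # Pass a host PID, e.g. from: docker inspect --format '{{.State.Pid}}' <container>
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    try:
        print(pids_by_namespace(Path(f"/proc/{pid}/status").read_text()))
    except (OSError, ValueError) as err:
        print(f"could not read NSpid: {err}")
```

Inside the container, the same file shows only one value, because the container's /proc belongs to its own PID namespace.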

Control groups (cgroups) are another key component that supports isolation in Docker. They implement resource accounting and limiting. They provide many useful metrics, and they also help ensure that each container gets its fair share of memory, CPU and disk I/O. More importantly, they ensure that a single container cannot bring the system down by exhausting one of those resources.
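As an illustration of cgroup limiting, here is a minimal Python sketch that reads the memory limit applied to a container, for example with `docker run --memory`. It assumes cgroup v2 (the unified hierarchy) is mounted at /sys/fs/cgroup inside the container, as on most recent distributions. In cgroup v2, memory.max holds either a byte count or the literal `max`, which means no limit. cgroup v1 uses different files and is not handled here. The function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional


def parse_memory_max(text: str) -> Optional[int]:
    """Parse a cgroup v2 memory.max file: bytes, or None for 'max' (unlimited)."""
    value = text.strip()
    if value == "max":
        return None
    return int(value)


def container_memory_limit(cgroup_root: str = "/sys/fs/cgroup") -> Optional[int]:
    """Read memory.max from the (assumed) cgroup v2 mount point."""
    return parse_memory_max((Path(cgroup_root) / "memory.max").read_text())


if __name__ == "__main__":
    try:
        limit = container_memory_limit()
    except OSError as err:
        print(f"memory.max not readable (cgroup v1 or host root cgroup?): {err}")
    else:
        print("no memory limit set" if limit is None else f"limit: {limit} bytes")
```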

As for kernel capabilities, Docker by default restricts the set of kernel capabilities within the container. For example, the root user inside a Docker container does NOT have all the capabilities that root has on the Docker host.

Along with the isolation mechanisms above, we will look at some of the security practices that Docker and the Linux kernel support.

Below is the list of container security practices we will discuss in this article. In my view, to learn how to protect something, we should first know how to break it. So let's learn by trying some easy exploits against a few container security weaknesses in a Docker environment.

Let's start with the Docker architecture to understand why we say "isolation" is important in Docker security. In the diagram below, compare how the kernel is positioned in the Docker architecture with a traditional VM architecture. In a VM architecture, each VM runs its own dedicated kernel, but in the Docker architecture this is not the case: all containers on a host share that host's kernel.

This is one of the reasons why "isolation" matters in Docker security. For example, if an attacker compromises one container with arbitrary code, the vulnerability could eventually break out of the container to the host kernel. Because the kernel is shared by all containers and the Docker engine sits on top of the host kernel, the attack surface extends to the other containers on the same host as well. This is the risk the Docker architecture poses by sharing the host kernel across container processes.

Rootless containers

Run your containers as "rootless containers". This means running the entire container runtime, as well as the containers themselves, without root privileges.

Normally, when the Docker engine spins up a new container process, the container runs with "root" privilege by default. Docker's default isolation practices limit the root user's capabilities within the container, but the container still runs as the root user. If such a container is compromised, the attacker has maximum impact inside the container. If the vulnerability also breaks out, it gains access to the Docker engine and the host machine's kernel.

Also, if we really look at whether containers need to run in root mode: in my estimate, in about 90% of cases there is NO need to run a container as root.

Below are the potential threats of running a container in root mode.

Within the container

A compromised container running in root context can perform any action inside the container, including installing new software, editing files, mounting file systems and modifying permissions.

Outside the container

In a compromised container, an attacker could:

Break out of the container and escalate privileges to the host.
Break out of the container to damage other containers.
Break out to the Docker engine and make requests to the Docker API server.

How to exploit the root containers

Here I will show you how a container running in root mode can be exploited in simple ways.

I've used Katacoda as a testing environment.

As a first step, verify which user the container is running as.

In the container below, I verified that it is running as root by running the "whoami" command inside the container.
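The same check can be scripted. Here is a minimal Python sketch that parses the `Uid:` line of /proc/self/status, which holds the real, effective, saved and filesystem UIDs, and reports whether the effective UID is 0. That is what `whoami` printing `root` indicates. The helper names are hypothetical. For the current process, `os.geteuid()` gives the same answer.

```python
from __future__ import annotations

from pathlib import Path


def effective_uid(status_text: str) -> int:
    """Return the effective UID from the 'Uid:' line of /proc/<pid>/status.

    Hypothetical helper. The line holds: real, effective, saved-set, filesystem UID.
    """
    for line in status_text.splitlines():
        if line.startswith("Uid:"):
            return int(line.split()[2])
    raise ValueError("no Uid line found")


def runs_as_root(status_text: str) -> bool:
    return effective_uid(status_text) == 0


if __name__ == "__main__":
    try:
        text = Path("/proc/self/status").read_text()
    except OSError as err:
        print(f"cannot read /proc/self/status: {err}")
    else:
        print("running as root" if runs_as_root(text) else "running as non-root")
```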

Privilege escalation to host machine

In the steps below, I show how privilege escalation happens from the Docker container to the Docker host.

To simulate this, I mounted the host machine's filesystem as a volume at /host inside the container and then ran the command "cat /host/etc/shadow". The output lists the user account details of the host machine, including password hashes.

Small DoS attack within the container

In the step below, I'll show a simple DoS attack within the Docker container.

Here the container is running as the root user, so it has the privilege to install any software within the container. Taking advantage of that, I installed the Debian package "stress" and used it to put a heavy load on the container's memory, driving the container into the "OOMKilled" state. The DoS exploit succeeded.

How to run as a "Rootless container"

Here are some basic steps for running your container as a "rootless container":

1. If you use Kubernetes, update the securityContext section of your pod YAML file:

securityContext:
  runAsNonRoot: true
  runAsUser: 1000

2. Add a new non-root user in your Dockerfile:

RUN groupadd --gid 1000 NONROOTUser && useradd --uid 1000 --gid 1000 --home-dir /usr/share/NONROOTUser --no-create-home NONROOTUser

USER NONROOTUser

3. If your container listens on a privileged port (anything below 1024, for example port 80), change it to an unprivileged port (1024 or above), for example port 5000.
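These three steps can be checked automatically before deployment. Here is a minimal Python sketch. It assumes one container entry from the pod spec (an item under spec.containers) has already been loaded from YAML into a plain dict, for example with a YAML parser not shown here. The function name and messages are hypothetical, and a pod-level securityContext is not considered.

```python
from __future__ import annotations


def rootless_findings(container: dict) -> list[str]:
    """Hypothetical pre-deployment check of one container entry of a pod spec."""
    findings = []
    ctx = container.get("securityContext", {})
    if ctx.get("runAsNonRoot") is not True:
        findings.append("securityContext.runAsNonRoot is not true")
    # Strict choice: require an explicit non-zero UID rather than trusting the image's USER.
    if ctx.get("runAsUser", 0) == 0:
        findings.append("securityContext.runAsUser is unset or 0 (root)")
    for port in container.get("ports", []):
        number = port.get("containerPort")
        if number is not None and number < 1024:
            findings.append(f"containerPort {number} is privileged (<1024)")
    return findings


if __name__ == "__main__":
    example = {
        "name": "web",
        "image": "example/web:1.0",  # hypothetical image
        "ports": [{"containerPort": 80}],
        "securityContext": {"runAsNonRoot": True, "runAsUser": 1000},
    }
    for finding in rootless_findings(example):
        print(finding)
```

For the example spec, only the port 80 finding is reported.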

Rootless Docker Engine

Run the Docker engine (daemon) in a non-root user context.

The previous section covered "rootless containers". Another secure practice is to run the Docker engine itself in rootless mode.

Docker introduced "rootless mode" for the Docker engine as an experimental feature in Docker 19.03, and it graduated from experimental status in Docker 20.10. Docker recommends running in rootless mode where possible; however, the feature is not yet widely adopted.

With the command below, you can check whether your Docker engine is running in root mode or rootless mode.
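One way to script this check is shown in the minimal Python sketch below. It assumes the `docker` CLI is on PATH. In rootless mode, `docker info` lists a `name=rootless` entry among its security options, so the sketch requests `{{json .SecurityOptions}}` and looks for that entry. The function names are hypothetical.

```python
from __future__ import annotations

import json
import subprocess


def is_rootless(security_options: list[str]) -> bool:
    """True if any entry (e.g. 'name=seccomp,profile=default') has name=rootless."""
    for entry in security_options:
        fields = dict(part.split("=", 1) for part in entry.split(",") if "=" in part)
        if fields.get("name") == "rootless":
            return True
    return False


def engine_security_options() -> list[str]:
    """Ask the local Docker engine for its security options (needs the docker CLI)."""
    out = subprocess.run(
        ["docker", "info", "--format", "{{json .SecurityOptions}}"],
        capture_output=True, text=True, check=True, timeout=10,
    ).stdout
    return json.loads(out) or []


if __name__ == "__main__":
    try:
        options = engine_security_options()
    except (OSError, subprocess.SubprocessError, ValueError) as err:
        print(f"could not query docker: {err}")
    else:
        print("rootless engine" if is_rootless(options) else "engine running as root")
```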

Docker Seccomp Profile

Secure computing mode (seccomp) is a Linux kernel feature.

Seccomp acts like a firewall for system calls (syscalls) from the container to the host kernel.
Examples of well-known syscalls: mkdir, reboot, mount, kill, write.
Docker's default seccomp profile disables around 44 dangerous system calls out of the 313 available on 64-bit Linux systems.
Judging by the list of Docker-related CVEs, most Docker incidents involve privileged syscalls.
Most of the syscalls allowed by Docker's default profile are NOT needed by a given product. It is recommended to use a product-specific custom seccomp profile that allowlists only the syscalls your container actually uses.
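To make the allowlist idea concrete, here is a minimal Python sketch that emits a custom profile in the JSON format Docker's seccomp profiles use. `defaultAction` is `SCMP_ACT_ERRNO`, so any syscall that is not listed fails with an error, and a single rule allows the named syscalls. The syscall list and the x86-64 architecture are hypothetical placeholders. A real allowlist must come from tracing what your container actually calls, and a list this short will not even start a typical container. The profile is applied with `docker run --security-opt seccomp=<file>`.

```python
from __future__ import annotations

import json


def allowlist_profile(syscalls: list[str]) -> dict:
    """Build a seccomp profile that denies every syscall except `syscalls`."""
    return {
        # Syscalls not matched by any rule below return an error instead of running.
        "defaultAction": "SCMP_ACT_ERRNO",
        # Assumption: 64-bit x86 hosts; adjust for other architectures.
        "architectures": ["SCMP_ARCH_X86_64"],
        "syscalls": [
            {"names": sorted(set(syscalls)), "action": "SCMP_ACT_ALLOW"},
        ],
    }


if __name__ == "__main__":
    # Hypothetical, incomplete list for illustration only.
    needed = ["read", "write", "openat", "close", "exit_group", "futex"]
    # Redirect to a file, then: docker run --security-opt seccomp=custom-seccomp.json <image>
    print(json.dumps(allowlist_profile(needed), indent=2))
```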

How to check Container Seccomp Profile

You can verify whether your container runtime has the default seccomp profile enabled. Open a shell inside your container and run grep Seccomp /proc/$$/status (as shown below).

Seccomp value 2 means it is ENABLED (filter mode)

Seccomp value 0 means it is NOT enabled
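The same check can be done in a minimal Python sketch that reads the `Seccomp:` field of /proc/self/status. Besides 0 (disabled) and 2 (filter mode, which Docker's profiles use), the kernel also reports 1 for strict mode. Matching on `Seccomp:` with the colon skips the separate `Seccomp_filters:` line that newer kernels print. The helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text: str) -> str:
    """Map the 'Seccomp:' value in /proc/<pid>/status to a mode name (hypothetical helper)."""
    for line in status_text.splitlines():
        if line.startswith("Seccomp:"):
            return SECCOMP_MODES.get(int(line.split()[1]), "unknown")
    raise ValueError("no Seccomp line found (kernel built without seccomp?)")


if __name__ == "__main__":
    try:
        print(seccomp_mode(Path("/proc/self/status").read_text()))
    except (OSError, ValueError) as err:
        print(f"cannot determine seccomp mode: {err}")
```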

Docker Limited Kernel capabilities

By default, Docker starts containers with a restricted set of capabilities, which provides greater security within the container environment.

This means that even though your container's process runs as root, the kernel capabilities available within the container are limited: Docker allows only a restricted set of capabilities that the user process can use. However, this default protection can be overridden if you run your container in "privileged" mode.

To illustrate: if you log in to your Linux host machine as the root user, Linux kernel capabilities such as the following are available:

CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH, CAP_FOWNER, CAP_FSETID, CAP_KILL, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_LINUX_IMMUTABLE, CAP_NET_BIND_SERVICE, CAP_NET_BROADCAST, CAP_NET_ADMIN, CAP_NET_RAW, CAP_IPC_LOCK, CAP_IPC_OWNER, CAP_SYS_MODULE, CAP_SYS_RAWIO, CAP_SYS_CHROOT, CAP_SYS_PTRACE, CAP_SYS_PACCT, CAP_SYS_ADMIN, CAP_SYS_BOOT, CAP_SYS_NICE, CAP_SYS_RESOURCE, CAP_SYS_TIME, CAP_SYS_TTY_CONFIG, CAP_MKNOD, CAP_LEASE, CAP_AUDIT_WRITE, CAP_AUDIT_CONTROL, CAP_SETFCAP, CAP_MAC_OVERRIDE, CAP_MAC_ADMIN, CAP_SYSLOG

But when the same root user enters a Docker container, most of the above capabilities are dropped and only the following restricted set is allowed:

CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FOWNER, CAP_FSETID, CAP_KILL, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_NET_RAW, CAP_SYS_CHROOT, CAP_MKNOD, CAP_AUDIT_WRITE, CAP_SETFCAP
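You can see which capabilities a process actually holds by decoding the `CapEff` bitmask in /proc/self/status. Here is a minimal Python sketch. The bit positions follow the kernel header linux/capability.h for capabilities 0 to 34 (the ones named in the host list). Any higher bit is shown by number only. The mask 0xa80425fb used in the usage line corresponds to Docker's default capability set. From a shell, libcap's `capsh --decode=<hex>` does the same job. The helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Bit positions 0..34 from linux/capability.h.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG",
]


def decode_caps(mask: int) -> list[str]:
    """Return capability names for every bit set in `mask`, lowest bit first."""
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"cap_{bit}")
        bit += 1
    return names


def effective_caps(status_text: str) -> list[str]:
    """Decode the hex CapEff line of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return decode_caps(int(line.split()[1], 16))
    raise ValueError("no CapEff line found")


if __name__ == "__main__":
    print(decode_caps(0x00000000A80425FB))  # Docker's default set
    try:
        print(effective_caps(Path("/proc/self/status").read_text()))
    except (OSError, ValueError) as err:
        print(f"cannot read capabilities: {err}")
```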

DO NOT RUN CONTAINERS IN --PRIVILEGED MODE !!

A privileged container can do almost everything that the host can do.

The --privileged flag gives all capabilities to the container, and it also lifts all the limitations enforced by the device cgroup controller.

Using the command below, you can verify whether your container is running in privileged mode or normal mode.

If the command returns TRUE, the container is running in privileged mode.
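One way to script a privileged-mode check from the Docker host is shown in the minimal Python sketch below. It assumes the `docker` CLI is available. `docker inspect` exposes the setting as `.HostConfig.Privileged`, which prints `true` or `false`. The function names and the default container name are hypothetical.

```python
from __future__ import annotations

import subprocess
import sys


def parse_privileged(output: str) -> bool:
    """Interpret: docker inspect --format '{{.HostConfig.Privileged}}' <container>"""
    value = output.strip().lower()
    if value not in ("true", "false"):
        raise ValueError(f"unexpected output: {output!r}")
    return value == "true"


def is_privileged(container: str) -> bool:
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{.HostConfig.Privileged}}", container],
        capture_output=True, text=True, check=True, timeout=10,
    ).stdout
    return parse_privileged(out)


if __name__ == "__main__":
    name = sys.argv[1] if len(sys.argv) > 1 else "my-container"  # hypothetical name
    try:
        print("PRIVILEGED" if is_privileged(name) else "not privileged")
    except (OSError, subprocess.SubprocessError, ValueError) as err:
        print(f"could not inspect {name}: {err}")
```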

Run container with limited or NO Kernel capabilities

In most normal scenarios, microservices running in a container do NOT need all the kernel capabilities that Docker provides.

Hence, the best practice is to DROP all capabilities and add back only the required ones.

This can be done in the securityContext configuration of your Kubernetes pod YAML file. In your securityContext, either drop all capabilities, for example:

securityContext => capabilities => drop: ["ALL"]

or add back only the required capabilities, for example:

securityContext => capabilities => add: ["NET_ADMIN", "SYS_TIME"]
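A pre-deployment check for this practice can be sketched in a few lines of Python. This minimal sketch assumes the container's securityContext has already been loaded from YAML into a dict. `ALLOWED_ADDS` is a hypothetical per-service allowlist, not a Kubernetes setting, and the function name is also hypothetical.

```python
from __future__ import annotations

# Hypothetical allowlist: capabilities this particular service is approved to add back.
ALLOWED_ADDS = {"NET_BIND_SERVICE"}


def capability_findings(security_context: dict, allowed_adds: set[str] = ALLOWED_ADDS) -> list[str]:
    """Flag a container securityContext that keeps default caps, adds unapproved ones, or is privileged."""
    caps = security_context.get("capabilities", {})
    findings = []
    if "ALL" not in caps.get("drop", []):
        findings.append('capabilities.drop does not include "ALL"')
    for cap in caps.get("add", []):
        if cap not in allowed_adds:
            findings.append(f"capability {cap} is added but not on the allowlist")
    if security_context.get("privileged"):
        findings.append("privileged: true grants every capability")
    return findings


if __name__ == "__main__":
    ctx = {"capabilities": {"drop": ["ALL"], "add": ["NET_ADMIN", "SYS_TIME"]}}
    for finding in capability_findings(ctx):
        print(finding)
```

Keeping the allowlist per service, rather than global, makes every added capability an explicit, reviewable decision.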

Docker SELinux Protection

With SELinux, Docker controls container processes' access based on their type and level. Docker offers two forms of SELinux protection: type enforcement and multi-category security (MCS) separation.

SELinux is a LABELING system.
Every process has a LABEL. Every file, directory and system object has a LABEL.
SELinux policy rules control access between labeled processes and labeled objects.

!! To enable SELinux in a container, your Linux host machine must have SELinux enabled and running !!
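To see what these labels look like, here is a minimal Python sketch that splits an SELinux context string into its four fields: user:role:type:level. For containers, type enforcement works on the type (commonly `container_t` on Fedora/RHEL-family hosts). The MCS categories in the level, for example `s0:c123,c456`, keep containers apart from each other. The level can itself contain a colon, so only the first three colons are split. On an SELinux-enabled host, a process's own label can be read from /proc/self/attr/current. The example label and the helper names are illustrative.

```python
from __future__ import annotations

from pathlib import Path
from typing import NamedTuple


class SELinuxContext(NamedTuple):
    user: str
    role: str
    type: str
    level: str


def parse_context(label: str) -> SELinuxContext:
    """Split 'user:role:type:level'; the level part may contain ':' itself."""
    parts = label.strip("\x00\n\t ").split(":", 3)
    if len(parts) != 4:
        raise ValueError(f"not a user:role:type:level label: {label!r}")
    return SELinuxContext(*parts)


def categories(ctx: SELinuxContext) -> set[str]:
    """MCS category tokens from a level like 's0:c123,c456' (empty if none)."""
    _, _, cats = ctx.level.partition(":")
    return set(cats.split(",")) if cats else set()


if __name__ == "__main__":
    example = parse_context("system_u:system_r:container_t:s0:c123,c456")  # illustrative
    print(example.type, sorted(categories(example)))
    try:
        print(parse_context(Path("/proc/self/attr/current").read_text()))
    except (OSError, ValueError) as err:
        print(f"no SELinux label available: {err}")
```

Two containers with different category pairs get different labels, so policy denies one access to the other's files even though both share the same type.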

Docker UNIX Socket (/var/run/docker.sock) usage

Some developers implement container management functionality by mounting the Docker UNIX socket inside a container and using it to, for example, collect logs from all containers, create containers or stop containers.

BE CAUTIOUS WHEN YOU MOUNT THE DOCKER UNIX SOCKET INSIDE YOUR CONTAINER!

The combination of root context, privileged mode and a mounted Docker UNIX socket is especially dangerous: any process that can talk to the socket can send commands to the Docker engine, which effectively gives it control over the host.

Below is a sample scenario that mounts the Docker UNIX socket inside the container for log management of all the containers run by the Docker engine.
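From the container's side, there is a quick way to tell whether the Docker socket has been mounted in. This minimal Python sketch checks whether /var/run/docker.sock exists and is a UNIX domain socket. The helper name is hypothetical, and the check looks only at the default path, so a socket mounted somewhere else would go unnoticed.

```python
from __future__ import annotations

import os
import stat


def is_unix_socket(path: str) -> bool:
    """True if `path` exists and is a UNIX domain socket (hypothetical helper)."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    sock = "/var/run/docker.sock"
    if is_unix_socket(sock):
        print(f"WARNING: {sock} is mounted; anything that can write to it controls the Docker engine")
    else:
        print(f"{sock} not present")
```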

Docker Network security

Be cautious about how you expose the services inside the container to clients outside the cluster.

Do NOT expose the container with an external IP (unless there is an explicit need).
When you must expose it with an external IP, ensure that the inbound connection is encrypted and listens on port 443.
Whenever possible, expose your services only as ClusterIP.
If you need to expose a service as NodePort, ensure that the inbound connection is encrypted and listens on port 443.

Ingress and Egress rules:

Control traffic to your services with Ingress and Egress network policies.

With strict ingress rules, supported by Kubernetes network policies, you can restrict inbound connections to your containers.
With strict egress rules, you can restrict outbound connections from your containers to other networks.
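A common starting point for strict rules is a default-deny policy, and this minimal Python sketch emits one as a NetworkPolicy. An empty `podSelector` selects every pod in the namespace. Listing both `Ingress` and `Egress` in `policyTypes` without any allow rules blocks all traffic until narrower policies open specific paths. JSON is valid YAML, so the output can be applied with `kubectl apply -f`. The policy name and namespace are hypothetical, and NetworkPolicies are only enforced if the cluster's network plugin supports them.

```python
from __future__ import annotations

import json


def default_deny_policy(namespace: str, name: str = "default-deny-all") -> dict:
    """Build a NetworkPolicy denying all ingress and egress for every pod in `namespace`."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector: all pods in the namespace
            "policyTypes": ["Ingress", "Egress"],  # no rules listed, so nothing is allowed
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace; save the output and run: kubectl apply -f deny-all.json
    print(json.dumps(default_deny_policy("my-app"), indent=2))
```

Because egress is denied as well, pods also lose DNS. A follow-up policy usually allows egress to the cluster DNS service.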

Other Docker Security Practices

Mount volumes as read-only.
Ensure SSHD does not run within containers.
Ensure the Linux host's network interface is not shared with containers.
Set a memory limit on containers: with no limit, a single container under a DoS attack can easily make the whole system unstable.
Don't mount system-relevant host directories (e.g. /etc, /dev, ...) into the container instance, so that an attacker cannot compromise the entire system rather than just the container instance.
If the Docker daemon is reachable remotely over a TCP port, ensure the communication uses TLS.
Consider a read-only filesystem for containers.
Use a secrets store or wallet instead of environment variables for sensitive data inside a Docker container.
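Several of these points (read-only volumes, read-only filesystem) can be spot-checked from inside a running container. This minimal Python sketch parses /proc/self/mounts, where each line holds source, mount point, type, options, dump and pass, and lists read-write mounts outside a set of kernel pseudo filesystems. The `PSEUDO_FS` set is a hypothetical starting point. Docker normally bind-mounts a few writable files itself (such as /etc/hosts), so the findings need review rather than blind action.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical ignore list: kernel pseudo filesystems that are expected to be writable.
PSEUDO_FS = {"proc", "sysfs", "tmpfs", "devpts", "mqueue", "cgroup", "cgroup2"}


def writable_mounts(mounts_text: str, ignore_types: set[str] = PSEUDO_FS) -> list[tuple[str, str]]:
    """Return (mount point, fs type) for every read-write mount not in ignore_types.

    Spaces in paths appear escaped as \\040 in /proc/<pid>/mounts and are left as-is.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _, mountpoint, fstype, options = fields[:4]
        if "rw" in options.split(",") and fstype not in ignore_types:
            found.append((mountpoint, fstype))
    return found


if __name__ == "__main__":
    try:
        text = Path("/proc/self/mounts").read_text()
    except OSError as err:
        print(f"cannot read mounts: {err}")
    else:
        for mountpoint, fstype in writable_mounts(text):
            print(f"writable: {mountpoint} ({fstype})")
```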

Sunday, September 26, 2021

Container Security - Learn with exploiting the weakness