In Docker security, the key word is "isolation". The more you isolate the container runtime from the Docker host, and the more you isolate one container from another, the closer you are to a secure setup. To provide this isolation, Docker as a framework supports several isolation mechanisms by default, such as:
- Namespaces
- Cgroups
- Kernel capabilities
Cgroups (control groups) are another key component that supports isolation in Docker. They implement resource accounting and limiting. They provide many useful metrics, but they also help ensure that each container gets its fair share of memory, CPU and disk I/O and, more importantly, that a single container cannot bring the system down by exhausting one of those resources.
As for kernel capabilities, Docker by default restricts the set of kernel capabilities available within a container. For example, a process running as root inside a container does NOT have all the capabilities that the root user has on the Docker host.
Along with the aforementioned isolation mechanisms, we will look at some of the security practices that Docker and the Linux kernel support.
The sections below cover the Docker security practices discussed in this article. Also, in my view, if we want to protect something, we should also know how to break it. So let's learn by exploiting the weakness that each security practice addresses in a Docker environment.
Let's start with the Docker architecture to understand why we say "isolation" is important in Docker security. The diagram below shows how the kernel is positioned in the Docker architecture compared with a traditional VM architecture. In a VM architecture, each VM has its own dedicated kernel, but that is not the case in the Docker architecture: every container process shares the kernel of the host it runs on.
This is one of the reasons why "isolation" is important in Docker security. For example, if one container is compromised by an attacker's arbitrary code, there is a possibility that the attack breaks out of the container to the host kernel. Because the kernel is shared across containers and the Docker engine sits on top of the host kernel, the attack surface then extends to the other containers running on that host as well. This is the risk that the Docker architecture poses by sharing the host kernel across container processes.
Rootless containers
Running your containers as "rootless containers" means running the entire container runtime, as well as the containers themselves, without root privileges.
In a normal scenario, when the Docker engine starts a new container process, the container runs with "root" privileges by default. Although Docker's default isolation mechanisms limit the capabilities of the root user within the container, the container process still runs as the root user. If such a container is compromised, the attacker gets the maximum possible impact inside the container, and if the attack breaks out, it gains access to the Docker engine and the host kernel.
Also, consider whether there is any real need to run a container in root mode. In roughly 90% of cases, there is NO need to run a container in root mode.
Below are the potential threats of running a container in root mode.
Within the container
With root context, a compromised container process can perform any action inside the container, including installing new software, editing files, mounting file systems and modifying permissions.
Outside the container
From a compromised container, an attacker could:
- Break out of the container and escalate privileges on the host.
- Break out of the container to damage other containers.
- Break out to the Docker engine and make requests to the Docker API server.
How to exploit root containers
Here I will show how a container running in root mode can be exploited in simple ways.
I've used Katacoda as the testing environment.
As a first step, verify which user the container is running as. Here I verified it by running the "whoami" command inside the container.
Privilege escalation to host machine
The steps below show how privilege escalation happens from the Docker container to the Docker host.
To simulate it, I mounted the host machine's filesystem as a volume into the container and then ran the command "cat /host/etc/shadow". The output lists the user account details of the host machine, including their password hashes.
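A mount like this can be spotted before it is abused by auditing what "docker inspect" reports for a container. The following is a minimal Python sketch, assuming the JSON entry for the container has already been loaded (for example from the output of "docker inspect CONTAINER"). The function name risky_host_mounts and the list of sensitive paths are illustrative choices, not something Docker provides.

```python
import json

# Host paths that should normally never be bind-mounted into a container.
# This list is an illustrative assumption; extend it for your environment.
SENSITIVE_HOST_PATHS = ("/etc", "/dev", "/proc", "/sys", "/root", "/var/run")


def risky_host_mounts(inspect_entry):
    """Return (source, destination) pairs for bind mounts of sensitive host paths.

    `inspect_entry` is one element of the JSON array printed by `docker inspect`.
    """
    risky = []
    for mount in inspect_entry.get("Mounts", []):
        if mount.get("Type") != "bind":
            continue  # named volumes are managed by Docker, not raw host paths
        source = mount.get("Source", "")
        is_host_root = source == "/"
        is_sensitive = any(
            source == path or source.startswith(path + "/")
            for path in SENSITIVE_HOST_PATHS
        )
        if is_host_root or is_sensitive:
            risky.append((source, mount.get("Destination", "")))
    return risky


# Trimmed, made-up sample resembling `docker inspect` output for the scenario
# above, where the host root filesystem is mounted at /host.
SAMPLE = json.loads("""
{
  "Mounts": [
    {"Type": "bind", "Source": "/", "Destination": "/host", "RW": true},
    {"Type": "volume", "Source": "/var/lib/docker/volumes/data/_data",
     "Destination": "/data", "RW": true}
  ]
}
""")

print(risky_host_mounts(SAMPLE))
```

Only the bind mount of the host root is reported; the named volume is skipped because it does not expose an arbitrary host path.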
Small DoS attack within the container
The step below shows a simple DoS attack within the Docker container.
Here the container is running as the root user, so it has the privilege to install any software within the container. Taking advantage of that, I installed the Debian package "stress" and used it to put a heavy load on the container's memory, which brought the container down with the "OOMKilled" status. The DoS exploit succeeded.
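Whether a container has a memory limit, and whether it was killed for running out of memory, can both be read from "docker inspect". The Python sketch below assumes the inspect JSON is already loaded and looks at two fields Docker reports: HostConfig.Memory (0 means no limit) and State.OOMKilled. The function name and the sample data are made up for illustration.

```python
def memory_status(inspect_entry):
    """Summarise the memory limit and OOM state of one `docker inspect` entry."""
    limit = inspect_entry.get("HostConfig", {}).get("Memory", 0)
    oom_killed = inspect_entry.get("State", {}).get("OOMKilled", False)
    return {
        "has_memory_limit": limit > 0,  # Docker reports 0 when no limit is set
        "limit_bytes": limit,
        "oom_killed": bool(oom_killed),
    }


# Made-up sample: a container limited to 256 MiB that was OOM-killed.
SAMPLE = {
    "HostConfig": {"Memory": 256 * 1024 * 1024},
    "State": {"OOMKilled": True},
}

print(memory_status(SAMPLE))
```

A container that reports no limit is the more worrying case, because the same load would then compete with everything else on the host.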
How to run as a "rootless container"
Here are some basic steps to consider when running your container as a "rootless container":
1. Update your YAML file (if using K8s) and set the following in the securityContext section:
runAsNonRoot: true
runAsUser: 1000
2. Add a new non-root user in your Dockerfile:
RUN groupadd --gid 1000 NONROOTUser && useradd --uid 1000 --gid 1000 --home-dir /usr/share/NONROOTUser --no-create-home NONROOTUser
USER NONROOTUser
3. In case your container listens on a privileged port (anything below 1024, for example port 80), change it to listen on an unprivileged port (1024 or above, for example port 5000).
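Step 2 is easy to forget, so it can help to check Dockerfiles automatically. Below is a minimal Python sketch of such a check, under simplifying assumptions: it only looks at FROM and USER instructions, it does not handle line continuations or build arguments, and it conservatively treats a missing USER as root. The function names and the base image in the sample are hypothetical.

```python
def final_user(dockerfile_text):
    """Return the user set by the last USER instruction of the final build stage.

    Returns None when the final stage has no USER instruction.
    """
    user = None
    for raw_line in dockerfile_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        instruction = parts[0].upper()
        if instruction == "FROM":
            user = None  # each build stage starts from its base image's user
        elif instruction == "USER" and len(parts) == 2:
            user = parts[1].strip()
    return user


def runs_as_root(dockerfile_text):
    """Return True if the final image is expected to run as root."""
    user = final_user(dockerfile_text)
    if user is None:
        return True  # conservative assumption: base images usually default to root
    name = user.split(":", 1)[0]  # USER may be written as user:group
    return name in ("root", "0")


# Sample Dockerfile using the instructions from step 2 (base image is hypothetical).
SAMPLE = """\
FROM debian:bookworm-slim
RUN groupadd --gid 1000 NONROOTUser && useradd --uid 1000 --gid 1000 --home-dir /usr/share/NONROOTUser --no-create-home NONROOTUser
USER NONROOTUser
"""

print(final_user(SAMPLE), runs_as_root(SAMPLE))
```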
Rootless Docker Engine
This means running the Docker engine (daemon) in a NON-ROOT user context.
In the section above we saw the "rootless container". The other security practice is to run the Docker engine itself in rootless mode.
Docker introduced the "rootless Docker engine" as an experimental feature in Docker version 19.03, and it graduated from experimental in Docker Engine 20.10. Docker recommends running your containers in rootless mode; however, the feature is not yet widely used.
You can check whether your Docker engine is running in root mode or rootless mode with "docker info": in rootless mode, "rootless" is listed under Security Options.
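The check can also be scripted. The Python sketch below assumes the daemon reports its security options as a list of strings such as "name=seccomp,profile=default" and "name=rootless", which is how "docker info" exposes them in its SecurityOptions field. The helper names are made up for this example, and docker_security_options needs a working Docker CLI, so it is defined here but not called.

```python
import json
import subprocess


def is_rootless(security_options):
    """Return True if any entry names the 'rootless' security option."""
    for option in security_options:
        # Entries look like "name=seccomp,profile=default" or "name=rootless".
        fields = dict(
            field.split("=", 1) for field in option.split(",") if "=" in field
        )
        if fields.get("name") == "rootless":
            return True
    return False


def docker_security_options():
    """Ask the local Docker CLI for the daemon's security options."""
    output = subprocess.run(
        ["docker", "info", "--format", "{{json .SecurityOptions}}"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(output)


# Made-up samples of what a rootless and a rootful daemon might report.
print(is_rootless(["name=seccomp,profile=default", "name=rootless"]))
print(is_rootless(["name=apparmor", "name=seccomp,profile=default"]))
```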
Docker Seccomp Profile
Secure computing mode (seccomp) is a Linux kernel feature.
- Seccomp acts like a firewall for system calls (syscalls) from the container to the host kernel.
- Some well-known syscalls: mkdir(), reboot(), mount(), kill(), write().
- Docker's default seccomp profile disables around 44 dangerous system calls out of the 313 available on 64-bit Linux systems.
- According to the list of Docker-related CVEs, most Docker incidents are due to privileged syscalls.
- Most of the syscalls allowed by Docker's default seccomp profile are NOT needed by a typical product. It is recommended to create a product-specific custom seccomp profile that allows only the syscalls used by your container.
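As a rough illustration of the last point, the Python sketch below generates a custom profile in the JSON format Docker's seccomp profiles use: a default action that rejects every syscall, plus one rule that allows a named set. The function name and the example syscall list are assumptions for illustration only; a real service needs a much longer list, typically collected by tracing the application.

```python
import json


def build_allowlist_profile(allowed_syscalls):
    """Build a seccomp profile that denies everything except the given syscalls."""
    return {
        # Any syscall not matched by a rule below fails with an error.
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": ["SCMP_ARCH_X86_64"],
        "syscalls": [
            {
                "names": sorted(set(allowed_syscalls)),
                "action": "SCMP_ACT_ALLOW",
            }
        ],
    }


# Hypothetical and deliberately incomplete allowlist, for illustration only.
EXAMPLE_SYSCALLS = ["read", "write", "openat", "close", "exit_group", "futex", "mmap"]

print(json.dumps(build_allowlist_profile(EXAMPLE_SYSCALLS), indent=2))
```

The printed JSON can be saved to a file and passed to a container with "docker run --security-opt seccomp=/path/to/profile.json".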
How to check the container seccomp profile
You can verify whether your container is running with seccomp protection enabled. Open a terminal inside your container and run the command "grep Seccomp /proc/$$/status".
A Seccomp value of 2 means it is ENABLED (filter mode).
A Seccomp value of 0 means it is NOT enabled.
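The same check can be done from code running inside the container. Here is a minimal Python sketch that parses the "Seccomp:" line of a /proc status file. The function name is made up, and the sample text only imitates the relevant lines of a real status file.

```python
import os

# Meaning of the Seccomp field in /proc/PID/status.
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text):
    """Return the numeric Seccomp mode from /proc status text, or None if absent."""
    for line in status_text.splitlines():
        # Match "Seccomp:" exactly so that "Seccomp_filters:" is not picked up.
        if line.startswith("Seccomp:"):
            return int(line.split(":", 1)[1].strip())
    return None


# Sample imitating the relevant lines of a status file inside a container.
SAMPLE_STATUS = "Name:\tsh\nSeccomp:\t2\nSeccomp_filters:\t1\n"
print(SECCOMP_MODES[seccomp_mode(SAMPLE_STATUS)])

# On Linux, also report the mode of the current process.
if os.path.exists("/proc/self/status"):
    with open("/proc/self/status") as status_file:
        mode = seccomp_mode(status_file.read())
    print("current process:", SECCOMP_MODES.get(mode, "unknown"))
```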
Docker Limited Kernel capabilities
By default, Docker starts containers with a restricted set of capabilities. This provides greater security within the container environment.
It means that even though your container process is running as root, the kernel capabilities within the container are limited: Docker allows only a limited set of capabilities that a process in the container can use. However, this default protection can be overridden if you run your container in "privileged" mode.
To understand this better: if you log in to your Linux host machine as the root user, you have the full set of Linux kernel capabilities, including:
CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH, CAP_FOWNER, CAP_FSETID, CAP_KILL, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_LINUX_IMMUTABLE, CAP_NET_BIND_SERVICE, CAP_NET_BROADCAST, CAP_NET_ADMIN, CAP_NET_RAW, CAP_IPC_LOCK, CAP_IPC_OWNER, CAP_SYS_MODULE, CAP_SYS_RAWIO, CAP_SYS_CHROOT, CAP_SYS_PTRACE, CAP_SYS_PACCT, CAP_SYS_ADMIN, CAP_SYS_BOOT, CAP_SYS_NICE, CAP_SYS_RESOURCE, CAP_SYS_TIME, CAP_SYS_TTY_CONFIG, CAP_MKNOD, CAP_LEASE, CAP_AUDIT_WRITE, CAP_AUDIT_CONTROL, CAP_SETFCAP, CAP_MAC_OVERRIDE, CAP_MAC_ADMIN, CAP_SYSLOG
But when the same root user runs inside a Docker container, most of the above kernel capabilities are dropped and only the restricted list below is allowed.
CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FOWNER, CAP_FSETID, CAP_KILL, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_NET_RAW, CAP_SYS_CHROOT, CAP_MKNOD, CAP_AUDIT_WRITE, CAP_SETFCAP
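The capabilities a process actually holds are exposed as a hexadecimal bitmask in the CapEff field of /proc/PID/status. The Python sketch below decodes such a mask into names, using the bit positions defined in the kernel's linux/capability.h. The function name is made up, and the sample mask 00000000a80425fb is the value typically seen for root in a default, non-privileged Docker container; treat it as an example rather than a guarantee for every Docker version.

```python
# Capability names indexed by bit position, as defined in linux/capability.h.
CAPABILITY_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_capabilities(hex_mask):
    """Translate a CapEff-style hex bitmask into capability names, in bit order."""
    mask = int(hex_mask, 16)
    return [
        name for bit, name in enumerate(CAPABILITY_NAMES) if mask & (1 << bit)
    ]


# Example mask, as typically reported for root in a default Docker container.
DEFAULT_CONTAINER_MASK = "00000000a80425fb"

for capability in decode_capabilities(DEFAULT_CONTAINER_MASK):
    print(capability)
```

Inside a container, the mask to decode can be read with "grep CapEff /proc/$$/status".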
DO NOT RUN A CONTAINER IN --PRIVILEGED MODE!
A privileged container can do almost everything that the host can do.
The --privileged flag gives all capabilities to the container, and it also lifts all the limitations enforced by the device cgroup controller.
Using a command such as "docker inspect --format='{{.HostConfig.Privileged}}' CONTAINER_NAME", you can verify whether your container is running in privileged mode or normal mode.
If the command returns true, the container is running in privileged mode.
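The same information is available in the JSON that "docker inspect" prints, which is convenient when auditing many containers at once. A minimal Python sketch, assuming the JSON entry is already loaded; the function names and sample entries are illustrative only.

```python
def is_privileged(inspect_entry):
    """Return True if the container was started with --privileged."""
    return bool(inspect_entry.get("HostConfig", {}).get("Privileged", False))


def added_capabilities(inspect_entry):
    """Return capabilities added on top of the defaults (CapAdd may be null)."""
    return list(inspect_entry.get("HostConfig", {}).get("CapAdd") or [])


# Made-up sample entries resembling trimmed `docker inspect` output.
PRIVILEGED = {"HostConfig": {"Privileged": True, "CapAdd": None}}
NORMAL = {"HostConfig": {"Privileged": False, "CapAdd": ["NET_ADMIN"]}}

print(is_privileged(PRIVILEGED), added_capabilities(PRIVILEGED))
print(is_privileged(NORMAL), added_capabilities(NORMAL))
```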
Run containers with limited or NO kernel capabilities
In normal scenarios, most microservices running in a container do NOT need all the kernel capabilities that Docker provides by default.
Hence, the best practice is to DROP all capabilities and add back only the required ones.
This can be done in the security context configuration of your Kubernetes YAML file. In your security context, either drop all capabilities. Example:
securityContext => capabilities => drop: ["ALL"]
Or add only the required capabilities. Example:
securityContext => capabilities => add: ["NET_ADMIN", "SYS_TIME"]
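The two settings can also be combined: drop everything, then add back what the service needs. The Python sketch below builds such a container-level securityContext fragment as a dictionary and prints it as JSON, which Kubernetes accepts in place of YAML. The function name is hypothetical, and the capability names are the ones used in the example above.

```python
import json


def container_security_context(required_capabilities=()):
    """Build a securityContext fragment that drops ALL capabilities,
    then adds back only the ones listed in required_capabilities."""
    capabilities = {"drop": ["ALL"]}
    if required_capabilities:
        # Kubernetes capability names are written without the CAP_ prefix.
        capabilities["add"] = sorted(required_capabilities)
    return {"capabilities": capabilities}


print(json.dumps(container_security_context(), indent=2))
print(json.dumps(container_security_context(["SYS_TIME", "NET_ADMIN"]), indent=2))
```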
Docker SELinux Protection
With SELinux enabled, Docker controls container processes' access by type and level. Docker offers two forms of SELinux protection: type enforcement and multi-category security (MCS) separation.
Docker UNIX Socket (/var/run/docker.sock) usage
To implement container management functionality, some developers mount the Docker UNIX socket inside a container and use it to, for example, collect logs from all containers, create a container or stop a container.
BE CAUTIOUS WHEN YOU MOUNT THE DOCKER UNIX SOCKET INSIDE YOUR CONTAINER!
The combination of root context, privileged mode and a mounted UNIX socket is especially dangerous.
Below is a sample scenario that mounts the Docker UNIX socket inside a container for log management of all the containers run by the Docker engine.
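To make the risk concrete: any process that can open the socket can talk to the Docker Engine API with a few lines of standard-library code. The Python sketch below lists container names through the API's "GET /containers/json" endpoint. The class and function names are made up for this example, and it assumes the socket is at the default path and readable by the current user. A log collector would use the same kind of access, but so could an attacker who compromises that container.

```python
import http.client
import json
import socket

DOCKER_SOCKET = "/var/run/docker.sock"


class UnixSocketHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks over a UNIX domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        # The host name is only used for the Host header, not for routing.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def list_container_names(socket_path=DOCKER_SOCKET):
    """Return the names of running containers reported by the Docker Engine API."""
    conn = UnixSocketHTTPConnection(socket_path)
    try:
        conn.request("GET", "/containers/json")
        containers = json.loads(conn.getresponse().read())
    finally:
        conn.close()
    return [name for container in containers for name in container.get("Names", [])]
```

Calling list_container_names() from inside a container that has the socket mounted returns the names of every container on the host, not just its own.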
Docker Network security
Be cautious about how you expose the services inside the container to the outside of the cluster.
- Do NOT expose the container with an External IP (unless there is an explicit need for one).
- When there is a need to expose a service with an External IP, ensure that inbound connections are encrypted and that the service listens on port 443.
- Always try to expose your services only in ClusterIP mode.
- If there is a need to expose a service with a NodePort, ensure that inbound connections are encrypted and that the service listens on port 443.
Ingress and Egress rules:
Control traffic to your services with ingress and egress network policies.
- With strict ingress rules supported by Kubernetes, you can restrict the inbound connections to your containers.
- With strict egress rules supported by Kubernetes, you can restrict the outbound connections from your containers to other networks.
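A common starting point for strict rules is a "default deny" policy, on top of which specific traffic is then allowed. The Python sketch below builds such a Kubernetes NetworkPolicy as a dictionary and prints it as JSON. The function name, policy name and namespace are placeholders, and the policy only takes effect if the cluster's network plugin enforces NetworkPolicy.

```python
import json


def default_deny_policy(namespace, name="default-deny-all"):
    """Build a NetworkPolicy that blocks all ingress and egress in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # An empty podSelector selects every pod in the namespace.
            "podSelector": {},
            # Listing both types with no rules denies traffic in both directions.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


# "my-namespace" is a placeholder namespace name.
print(json.dumps(default_deny_policy("my-namespace"), indent=2))
```

The printed JSON can be saved to a file and applied with "kubectl apply -f"; additional policies then open only the connections each service needs.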
Other Docker Security Practices
- Mount volumes as read-only.
- Ensure SSHD does not run within the containers.
- Ensure the Linux host network interface is not shared with containers.
- Set a limit on container memory usage. Without one, a single container can easily make the whole system unstable if a DoS attack happens.
- Don't mount system-relevant volumes (e.g. /etc, /dev, ...) of the underlying host into the container instance, so that an attacker cannot compromise the entire system and not just the container instance.
- If the Docker daemon is available remotely over a TCP port, ensure TLS authentication is enabled.
- Consider a read-only filesystem for the containers.
- Use a secrets store/wallet instead of environment variables to store sensitive data inside a Docker container.