Kubernetes (K8s) is an open-source system for automating deployment, scaling, and management of containerized applications. It has recently seen rapid adoption across enterprise environments.
Many environments rely on managed Kubernetes services such as Google Kubernetes Engine (GKE) and Amazon Elastic Kubernetes Service (EKS) to take advantage of the benefits of both containerization and the cloud. Google Cloud Platform, for example, offers advanced cluster management features such as automatic upgrades, load balancing, and Stackdriver logging. Amazon Web Services (AWS) provides Kubernetes role-based access control (RBAC) integration with AWS IAM Authenticator and logging via CloudTrail and CloudWatch.
However, adoption of these cloud-managed services can introduce new challenges to your monitoring and detection capabilities, such as:
- understanding the shared responsibility model
- lack of normalization in logging and monitoring
The purpose of this post and associated release is to help fill in the detection gaps between Kubernetes, EKS, and GKE, and provide normalized detection strategies that include multi-cloud or hybrid cloud environments.
With their advanced features, cloud-managed services introduce potential new attack vectors as well as room for error or confusion.
Kubernetes itself is highly complex; Trail of Bits recently published a 241-page security assessment of the Kubernetes platform. The intent of the audit was to identify and document existing risks and vulnerabilities of the platform, ultimately providing recommendations to improve its security posture.
The assessment team found configuration and deployment of Kubernetes to be non-trivial, with certain components having “confusing default settings, missing operational controls, and implicitly defined security controls”. This presents a need to understand the vulnerabilities and limitations of the platform itself, in addition to passing a high technical barrier of entry.
Each managed cloud service may remediate these issues differently (or not at all). According to Google, “The report also calls out Kubernetes’ default settings. In GKE, we’ve been actively changing these over time, including turning off ABAC and basic authentication by default, to make sure new clusters you create are more secure.” ABAC is attribute-based access control.
Insecure defaults may have inconsistent fixes between managed platforms, and new protections may not apply to earlier versions. A well-intentioned administrator may also re-enable insecure settings to restore previous functionality or compatibility.
Because of this, our new detections focus not only on potential attacks but also on environmental awareness, with an emphasis on multi-cloud support.
Understanding the shared responsibility model
Managed services such as Amazon Elastic Kubernetes Service (EKS) and Google Kubernetes Engine (GKE) use a shared responsibility model, which distinguishes security of the cloud from security in the cloud (see: the GKE shared responsibility model versus EKS).
The good news is that this model clearly defines things that aren’t your responsibility - like the underlying infrastructure, the etcd database, and control plane nodes. The bad news is that you also need to understand what you are responsible for in order to implement the proper security monitoring and controls, and this includes platform-specific implementations such as defining and protecting your security groups or IAM roles.
API server flags, security control defaults, and the Kubernetes patch version may also fall within the provider's purview for new systems, while any modifications to those defaults, or continued use of legacy systems, become your responsibility. Correlation rules help you track these security regressions.
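As a rough illustration of tracking such regressions, the sketch below inspects a GKE cluster description (for example, the JSON produced by describing a cluster) for two settings mentioned earlier: legacy ABAC and basic authentication. It assumes the description exposes `legacyAbac.enabled` and `masterAuth.username`; the function name and finding strings are hypothetical and are not part of any released rule.

```python
from typing import Any, Dict, List


def find_security_regressions(cluster: Dict[str, Any]) -> List[str]:
    """Return insecure settings found in a GKE cluster description.

    Assumption: `cluster` is a dict parsed from the cluster's JSON
    description, with optional `legacyAbac` and `masterAuth` sections.
    """
    findings: List[str] = []

    # legacyAbac.enabled is expected to be true when ABAC is turned on.
    if (cluster.get("legacyAbac") or {}).get("enabled", False):
        findings.append("legacy ABAC enabled")

    # A non-empty masterAuth.username suggests basic authentication is on.
    if (cluster.get("masterAuth") or {}).get("username"):
        findings.append("basic authentication enabled")

    return findings


if __name__ == "__main__":
    example = {
        "name": "demo-cluster",  # hypothetical cluster
        "legacyAbac": {"enabled": True},
        "masterAuth": {"username": "admin"},
    }
    for finding in find_security_regressions(example):
        print(finding)
```

Running a check like this on a schedule and alerting when the list is non-empty is one simple way to notice when a secure default has been reverted.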
Logging, monitoring, and normalization
Logs are the foundation of security monitoring and the key to understanding who is doing what on your systems. USM Anywhere supports log collection for AWS, GCP, and Azure, which significantly expedites the collection and normalization of log data.
In addition to the Kubernetes control plane logs, remember to monitor the API logs for the cloud services themselves, such as API calls to EKS or GKE.
For cloud-managed Kubernetes logging you’ll need to understand which logs you need, where to send them, and any limitations that may influence your decisions. For example, GKE container logs are removed when their host Pod is removed, when the disk on which they are stored runs out of space, or when logs are periodically rotated. Once exported to an external destination such as Stackdriver, though, retention is only affected by the policy (if any) set on that platform, and the events can be easily ingested into the SIEM.
See the following tips to maximize your log coverage:
In GKE, Kubernetes Audit Logging is integrated with Cloud Audit Logs and Stackdriver Logging.
- Use Kubernetes Audit Policy to define which log entries are exported by the Kubernetes API server, whether they are sent to the Admin Activity log or Data Access log, and what data they should contain. Each GCP project potentially contains the following logs:
  - Admin Activity log: contains API calls for administrative, state-changing actions on resources; enabled by default, no cost
  - Data Access log: contains API calls for read actions on resources and user-driven API calls on user-provided resource data; disabled by default, can be expensive
- Keep data retention requirements in mind when writing your audit policy.
- Read more about GKE audit logging here.
Amazon EKS integrates with CloudTrail to log EKS API calls by users, roles, or services as events. Amazon EKS control plane logs are delivered to CloudWatch. The control plane log streams include Kubernetes API server component logs (api), Audit (audit), Authenticator (authenticator), Controller manager (controllerManager), and Scheduler (scheduler).
- Ensure that a CloudTrail trail is set up to ingest EKS API logs.
- Enable control plane logging per cluster.
- Control plane logs are disabled by default.
- Log verbosity is set to a default of 2, but may be changed depending on your requirements.
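Since control plane logs are disabled by default, enabling them per cluster can be scripted. The following is a minimal sketch, assuming `eks_client` is a boto3 EKS client (for example, `boto3.client("eks")`) and using its `update_cluster_config` call; the helper names and the cluster name are hypothetical.

```python
from typing import Any, Dict, Iterable

# The five control plane log types listed above.
LOG_TYPES = ("api", "audit", "authenticator", "controllerManager", "scheduler")


def build_logging_config(types: Iterable[str] = LOG_TYPES) -> Dict[str, Any]:
    """Build the `logging` argument that enables the given log types."""
    return {"clusterLogging": [{"types": list(types), "enabled": True}]}


def enable_control_plane_logging(eks_client: Any, cluster_name: str) -> Dict[str, Any]:
    """Request that all control plane log types be enabled for one cluster.

    Assumption: `eks_client` behaves like a boto3 EKS client. The update
    is asynchronous, so the response describes an in-progress update.
    """
    return eks_client.update_cluster_config(
        name=cluster_name,
        logging=build_logging_config(),
    )


if __name__ == "__main__":
    # Print the payload only; calling AWS requires credentials and a real cluster.
    print(build_logging_config())
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub before pointing it at a real account.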
Once these logs are in place, they are ready to be ingested by your security analytics or log management platform.
From here, you’d typically have to ask yourself how to monitor these logs. How does the log syntax for unauthenticated requests differ between CloudWatch and Stackdriver? What interesting platform-specific API events might exist, such as association of AWS IAM with a Kubernetes service account? Fortunately, this part is already done for you in USM Anywhere.
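To give a sense of the normalization involved, here is a simplified sketch that maps an EKS audit event (the Kubernetes audit event format delivered to CloudWatch) and a GKE Cloud Audit Logs entry onto one common shape, then checks for an allowed anonymous request. It is not how USM Anywhere implements its rules. The field choices (`user.username` and the `authorization.k8s.io/decision` annotation on the EKS side, `protoPayload.authenticationInfo.principalEmail` and `protoPayload.authorizationInfo[].granted` on the GKE side) are assumptions about typical log entries, and the function names are hypothetical.

```python
from typing import Any, Dict


def normalize_eks(event: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize a Kubernetes audit event as found in the EKS audit stream."""
    annotations = event.get("annotations") or {}
    return {
        "platform": "eks",
        "user": (event.get("user") or {}).get("username", ""),
        "action": event.get("verb", ""),
        "resource": event.get("requestURI", ""),
        "allowed": annotations.get("authorization.k8s.io/decision") == "allow",
    }


def normalize_gke(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize a Cloud Audit Logs entry for a GKE API server request."""
    payload = entry.get("protoPayload") or {}
    auth_info = payload.get("authorizationInfo") or []
    return {
        "platform": "gke",
        "user": (payload.get("authenticationInfo") or {}).get("principalEmail", ""),
        "action": payload.get("methodName", ""),
        "resource": payload.get("resourceName", ""),
        "allowed": any(item.get("granted", False) for item in auth_info),
    }


def is_unauthenticated_allowed(normalized: Dict[str, Any]) -> bool:
    """True when an anonymous request was authorized on either platform."""
    return normalized["user"] == "system:anonymous" and normalized["allowed"]


if __name__ == "__main__":
    eks_event = {
        "user": {"username": "system:anonymous"},
        "verb": "list",
        "requestURI": "/api/v1/pods",
        "annotations": {"authorization.k8s.io/decision": "allow"},
    }
    print(is_unauthenticated_allowed(normalize_eks(eks_event)))
```

Once both sources share one shape, a single detection function can serve both platforms, which is the point of normalizing before writing rules.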
We have released Cloud Kubernetes Correlation Rules that support both EKS and GKE, including:
- Known crypto mining image
- Potentially dangerous container command
- Known malicious Kubernetes pod
- New cluster role with exec permissions
- Kubernetes unauthenticated request allowed
- New privileged cluster role
- A user attached to a Pod
- New host network Pod
- Kubernetes service exposing resources
- New privileged container in a Pod
- New Pod using a sensitive volume
- Unauthenticated command execution in a Pod
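For readers curious what detections like "New privileged container in a Pod" or "New host network Pod" look for, the sketch below checks a Kubernetes audit event for a Pod creation whose spec requests host networking or a privileged container. It is an illustration only, not the logic of the released rules. It assumes the audit policy records the request body (`requestObject`) for Pod creations, and the function name is hypothetical.

```python
from typing import Any, Dict, List


def pod_findings(audit_event: Dict[str, Any]) -> List[str]:
    """Return risky settings requested by a Pod creation audit event."""
    ref = audit_event.get("objectRef") or {}

    # Only consider creation of Pods themselves, not subresources such as exec.
    if (
        audit_event.get("verb") != "create"
        or ref.get("resource") != "pods"
        or ref.get("subresource")
    ):
        return []

    spec = (audit_event.get("requestObject") or {}).get("spec") or {}
    findings: List[str] = []

    # hostNetwork places the Pod in the node's network namespace.
    if spec.get("hostNetwork"):
        findings.append("host network pod")

    # A privileged container is given broad access to the host.
    for container in spec.get("containers") or []:
        if (container.get("securityContext") or {}).get("privileged"):
            findings.append("privileged container: " + container.get("name", "?"))

    return findings


if __name__ == "__main__":
    event = {
        "verb": "create",
        "objectRef": {"resource": "pods", "namespace": "default"},
        "requestObject": {
            "spec": {
                "hostNetwork": True,
                "containers": [
                    {"name": "app", "securityContext": {"privileged": True}},
                    {"name": "sidecar"},
                ],
            }
        },
    }
    for finding in pod_findings(event):
        print(finding)
```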