Observability and Visibility in DevSecOps

October 1, 2018  |  Noah Beddome

To celebrate AllDayDevOps, coming up on October 17, here's an on-topic blog.


Companies often turn to software when they need to solve a problem, whether it's to automate or enhance a task or to gain valuable information in an easily consumable form. The same is true for security teams on both sides of the red and blue line. Security professionals build tools to automate exploitation, detect attacks, or process large amounts of data into a usable form. By ensuring that staff members understand how these software solutions behave in live environments, security teams can avoid common pitfalls and increase the overall value they receive from these tools.

When discussing software design, the word “visibility” gets tossed around a lot. People may use the word to describe the benefits provided by the software. They may use it to describe a quality of the software’s operation. They may even use it to describe how easy it is to gain an understanding of how the software was designed (i.e. open source). This has led me to believe that when we are talking about visibility, we are really talking about three specific concepts that form this bigger idea:

  • Insight - the valuable data received due to the software’s function
  • Transparency - being able to see how software is designed to function
  • Observability - the ability to view the actual actions software takes and its performance while taking those actions

For consumers of software, insight is the big focus, mostly because it is perceived as relating directly to value. As the roles of security teams evolve, both offensive and defensive, these teams have realized that they can't just be consumers. Security teams need to be builders, maintainers, and providers. Security processes, procedures, and software need to be consumable by the greater organization. While good insight and consumable data are requisites for quality software, what increases buy-in, improves impact in the org, and ultimately makes security software successful are the observability and transparency aspects.

Transparency in Security

In modern agile and DevOps style software development organizations, everything is in source (other than secrets), and every service has mandatory levels of documentation. Engineering teams operate this way in order to foster inter- and intra-team operability of services, to streamline troubleshooting in the event of an outage, and to increase the understanding of how individual services interact with other environment or application components.

Breaking Down Barriers to Collaboration in DevSecOps

For security teams that solve problems by writing code, or who actively work with code written by other teams, conforming to this pattern goes a long way. The similarity in process helps break down barriers to collaboration. Removing any disparity in quality between the systems being secured and the systems doing the securing helps normalize the idea that security is just one quality of the system. Leveraging a transparent approach fosters a greater degree of understanding between the security organization and the rest of the enterprise.

This idea of transparency might cause some shudders on the red side of security: historically, notions of operational security and stealth have permeated red practitioners' methods. These notions are indeed good things when conducting adversarial simulation or incident response, but there is no reason to conceal the function or performance of security software from the teams that have to interact with it outside of these specific scenarios. It is almost a cliché now that security teams are secretive, insular, and operate as a “black box.” Transparency allows security teams to educate and build engagement with other teams while reducing operational issues.

Observability in DevSecOps

Observability is, in my opinion, the most important quality for security teams to adopt as part of their development and general operations. This is best demonstrated in a scenario.

Imagine a service on a server. Its only job is to ship and rotate log files so that they can be stored off-server in a central location for compliance. The service is configured to run and be left unattended. Six months later, there is a security event, and only the first two months of logs are present. It turns out that the service failed two months in and was not restarted.

If we were monitoring the used space in the central storage volume and logging a metric off-server for each invocation of the service, we would have known when the service failed to write and corrected the issue before it caused real harm.
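One way to get that per-invocation signal is to emit a statsd-format metric over UDP each time the shipper runs. The sketch below assumes a local statsd/DogStatsD-style agent listening on port 8125 and uses a placeholder `ship_logs()` in place of the real rotate-and-upload step:

```python
import socket
import time

STATSD_ADDR = ("127.0.0.1", 8125)  # assumed local statsd/DogStatsD agent

def emit_metric(name: str, value: int = 1, metric_type: str = "c") -> None:
    """Fire-and-forget a statsd-format metric over UDP (e.g. 'name:1|c')."""
    payload = f"{name}:{value}|{metric_type}".encode()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(payload, STATSD_ADDR)
    sock.close()

def ship_logs() -> bool:
    """Placeholder for the real rotate-and-ship step."""
    return True  # pretend the upload succeeded

def run_once() -> str:
    """One shipper invocation, instrumented so that silence means failure."""
    start = time.monotonic()
    ok = ship_logs()
    emit_metric("logshipper.invocation")  # proves the service is still alive
    emit_metric("logshipper.success" if ok else "logshipper.failure")
    emit_metric("logshipper.duration_ms",
                int((time.monotonic() - start) * 1000), "ms")
    return "success" if ok else "failure"
```

Because the invocation metric fires on every run, a simple monitor on its absence catches the dead-service case long before an incident does.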

This type of critical failure is why performance monitoring is a prerequisite for production software. If the failed service had been a production service that customers were paying for, this event would have been catastrophic for an early-stage organization.

Now, let's take another scenario:

Imagine a piece of software that watches for openings of firewall and security group rules. When it sees a rule that opens ports and services to the internet that didn't go through the change management process, it removes the rule.
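The core loop of such a tool is simple. In this sketch, `list_open_rules()`, `revoke_rule()`, and the approved-rule set are hypothetical stand-ins for your cloud provider's real API and your change-management records:

```python
# Hypothetical sketch: list_open_rules(), revoke_rule(), and
# APPROVED_RULE_IDS stand in for the real provider API and CM data.

APPROVED_RULE_IDS = {"sg-rule-001", "sg-rule-002"}  # rules with a change ticket

def list_open_rules():
    """Stand-in for an API call listing rules open to 0.0.0.0/0."""
    return [
        {"id": "sg-rule-001", "port": 443},
        {"id": "sg-rule-999", "port": 22},  # never went through change management
    ]

def revoke_rule(rule_id: str) -> None:
    """Stand-in for the API call that deletes a rule."""
    pass

def enforce() -> list:
    """Remove every internet-facing rule that was never approved."""
    removed = []
    for rule in list_open_rules():
        if rule["id"] not in APPROVED_RULE_IDS:
            revoke_rule(rule["id"])
            removed.append(rule["id"])
    return removed
```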


From a security standpoint, this practice is really valuable: dangerous behaviors happen, and an automated process that remediates them immediately is a powerful control. But we are missing some pieces. In this scenario, when a developer performs this behavior, there is no indication of why the rule was removed or what happened. There also isn't a visible way for security teams to monitor what the function is doing. We have created a black box, which will inevitably lead to wasted hours of troubleshooting and frustrated teams. And if this function fails, we have no real way to know.

So, let's revise the scenario. We will use the same software—but any time it removes a rule, it messages the user who made the rule and logs this output to a Slack channel. There is also a web service with a dashboard that shows invocations and removed rules. Additionally, the solution has a heartbeat that logs a metric to the dashboard and triggers a page or other message when the heartbeat or health checks fail.
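Wiring those pieces in changes the shape of the loop only slightly. Here, `notify_user()`, `post_to_slack()`, and `emit_heartbeat()` are hypothetical stand-ins for your chat integration and metrics pipeline; events are collected in a list purely for illustration:

```python
# Same enforcement loop, but every action is announced and the loop
# itself emits a heartbeat. The notify/post/heartbeat helpers are
# hypothetical stand-ins for a real Slack and metrics integration.

events = []  # collected here for illustration; really these go to Slack/metrics

def notify_user(user: str, rule_id: str) -> None:
    events.append(f"DM to {user}: rule {rule_id} removed (no change ticket)")

def post_to_slack(channel: str, text: str) -> None:
    events.append(f"#{channel}: {text}")

def emit_heartbeat() -> None:
    events.append("metric: rule_enforcer.heartbeat 1")

def enforce_observably(open_rules, approved_ids):
    """Remove unapproved rules, telling the rule's owner and the org why."""
    removed = []
    for rule in open_rules:
        if rule["id"] not in approved_ids:
            removed.append(rule["id"])
            notify_user(rule["owner"], rule["id"])
            post_to_slack("security-enforcement",
                          f"Removed {rule['id']} opened by {rule['owner']}")
    emit_heartbeat()  # fires even when nothing was removed, so silence means failure
    return removed
```

The heartbeat runs on every invocation, so a dashboard monitor on its absence distinguishes "nothing to remove" from "the enforcer is down."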

Observable Enforcement Systems in DevSecOps

What we have now is essentially an enforcement system that is observable. But what about the transparency? We get that easily by managing the code in source control and allowing users to make pull requests to a repository with a controlled branch in order to propose changes to the whitelist for the enforcement. The addition of observability and transparency to the implementation makes the function more consumable by our consumers (the rest of the organization).

These same principles can be applied to offensive tooling. Imagine a team of offensive security professionals running a suite of tests, gathering data, standing up servers, and otherwise conducting normal Red Team operations. The team stands up a web server hosting their phishing site, along with their mail mechanism and other components. During the test, the web server fails, and over the course of several hours all of their targets navigate to a broken URL. This failure is pretty devastating: the Red Team has now wasted a lot of effort and opportunity.

Trust but Verify in DevSecOps

We could avoid all this disaster and lost opportunity by working like an engineering team and listening to the sage security advice we all like to toss around when we are being paranoid: “trust but verify”. We can use common performance monitoring technology to observe the health and operation of our campaign components. We can even get fancy and have a set of functions that monitors the health of the web server, writes a metric, and triggers redirects or pauses mail delivery for the phishing exercise when a process fails a health check.

We can take the Red Team use case another step. We can set up a trigger: whenever the team gets a shell or a slave joins their botnet collective, a metric is sent to a tool like Datadog, and a notification is sent by email or other means to the team. This kind of observability during an engagement allows for much more timely activity and enhances collaboration within the team. The function can even publish the gathered metrics to the client or to the rest of the organization as insights.
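That trigger can be a small hook on the C2 side. In this sketch, `on_new_session()` and both senders are hypothetical; notifications are collected in a list where a real implementation would call DogStatsD and SMTP or a chat webhook:

```python
# Hypothetical hook run whenever a new session calls back to the team's
# infrastructure; on_new_session() and the senders are assumptions, not
# a real C2 framework API.

notifications = []

def send_metric(name: str, tags: dict) -> None:
    notifications.append(("metric", name, tags))  # in practice: DogStatsD, etc.

def send_email(to: str, subject: str) -> None:
    notifications.append(("email", to, subject))  # in practice: SMTP or a webhook

def on_new_session(host: str, user: str) -> None:
    """Announce a new session to the team's metrics and inbox."""
    send_metric("redteam.sessions.new", {"host": host})
    send_email("redteam@example.com", f"New session: {user}@{host}")

on_new_session("web-01", "svc-deploy")
```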

At Datadog, we use our own product, Datadog, for all the observability pieces. It lets us easily publish metrics and is the standard within our org, so it’s more consumable for other teams. Ultimately, you should evaluate what the right solution is given your use case: use what is most effective for you and consumable by your organization or team.

Conclusion

All security software is production software, whether we believe it or not. Offensive teams use their software to get to production environments, and defensive teams use their software to protect critical resources. Given this knowledge, we should strive to develop software to the same standards of quality and visibility as product teams. It inevitably makes us better operators and improves the quality of our product and the relationship we have with our consumer, whether that consumer is a customer, another internal team, or a fellow Red Teamer taking over to mind the shells at 3 am.
