DevOps Security at Scale

This is the fourth blog post in a series discussing how high-performing DevOps teams build secure systems at scale.

Software teams are usually responsive to serious bugs, bad designs, or terrible performance of their systems. That’s because those are acute, obvious, in-your-face-bad kinds of issues that demand immediate attention.

On the opposite side of the coin are latent issues that, left untouched, will become bigger problems in the future. These are items that get addressed through preventative work before they become major visible problems. For example, technical debt is well understood by most software teams, and it usually gets periodic investment to prevent it from spiraling out of control.

A sophisticated DevOps team will regard security issues as acute, immediate problems and not file them away in a growing backlog of To-Dos, never to be addressed. They elevate security from being a secondary concern to being a first-class citizen.

Less sophisticated teams think of vulnerabilities and security flaws the same way as technical debt, and try to chip away at them to prevent them from becoming a problem in the future. But unlike technical debt, vulnerabilities are problems right now; they just haven’t been exploited yet. And once they’re exploited, it’s too late.

When a team puts security as a first-class citizen, they naturally adopt many good security practices as part of their development and release processes. Let’s examine some of those practices in detail.

Secure management of secrets

All secrets for production systems (passwords, private keys, or any sensitive information that an attacker could use) are stored in a secure, highly-available vault and accessed at runtime only by the systems that have been granted access to them. The secrets are automatically changed on a frequent schedule, thereby limiting the amount of time they are valid even if they are breached. Human operators of the vault are not permitted access to the secret values held within.
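
To make the idea concrete, here is a minimal sketch of the access pattern described above. It is not how any particular vault product works: `ToyVault`, `SecretEntry`, the client names, and the one-hour rotation window are all hypothetical, and the storage is a plain in-memory dictionary. It only illustrates three properties: secrets are fetched at runtime, only explicitly granted clients can read them, and values are replaced once they pass a maximum age.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass
class SecretEntry:
    value: str                  # current secret value
    rotated_at: float           # timestamp of the last rotation
    allowed_clients: frozenset  # client IDs granted read access


class ToyVault:
    """In-memory stand-in for a real secrets vault (illustration only)."""

    def __init__(self, max_age_seconds):
        self.max_age_seconds = max_age_seconds
        self._entries = {}

    def store(self, name, allowed_clients, now=None):
        """Create a secret with a generated value; no caller ever supplies it."""
        now = time.time() if now is None else now
        self._entries[name] = SecretEntry(
            secrets.token_urlsafe(32), now, frozenset(allowed_clients)
        )

    def fetch(self, name, client_id, now=None):
        """Return the secret to a granted client, rotating it first if stale."""
        now = time.time() if now is None else now
        entry = self._entries[name]
        if client_id not in entry.allowed_clients:
            raise PermissionError(f"{client_id} may not read {name}")
        if now - entry.rotated_at >= self.max_age_seconds:
            # The old value stops being handed out once the window expires.
            entry.value = secrets.token_urlsafe(32)
            entry.rotated_at = now
        return entry.value


if __name__ == "__main__":
    vault = ToyVault(max_age_seconds=3600)
    vault.store("db-password", {"billing-service"})
    password = vault.fetch("db-password", "billing-service")
    print("fetched a secret of length", len(password))
```

A real deployment would also rotate the credential in the target system (for example, the database itself) and authenticate clients cryptographically rather than by a plain identifier; the sketch leaves both out.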

Good Security Hygiene

There are a variety of “good hygiene” practices which security-focused teams will always take care to perform:

The principle of least privilege is applied everywhere possible: machines and humans only get access to the resources they absolutely need, and their permissions are promptly revoked once they no longer need them.
Vulnerability scans are automatically run on all 3rd-party libraries the application depends on. Vulnerable libraries are upgraded immediately if fixes are available. Upgrading 3rd-party libraries to newer compatible versions is not a complex process.
3rd-party libraries are kept up to date, even if vulnerabilities aren’t yet detected in them. This helps avoid major compatibility issues if a library needs to be updated immediately to address a vulnerability, but the patch is only available for the latest version of that library.
Penetration tests are conducted very regularly to scan the application (and its CI/CD environment) for potential attack vectors. These tests are sometimes even done as part of a CD pipeline using automated tools, and augmented with regular human-driven, white-hat ethical hacks.
All developers are trained in application and information security as part of their new hire programs, and in their ongoing education.
The development workflow for product code changes includes explicit checks for information exposure, attack vectors, and security policy violations. For example, if a team uses GitHub PRs, a PR template would be used that codified these expectations for the submitter and reviewer(s).
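
As a rough illustration of the automated dependency scanning mentioned in the list above, the sketch below compares pinned library versions against a list of advisories and reports anything older than the first fixed version. Everything in it is an assumption made for the example: the `Advisory` shape, the package names, and the simple dotted-number version format. Real scanners draw on maintained advisory databases and handle far richer version schemes.

```python
from typing import NamedTuple


class Advisory(NamedTuple):
    package: str     # hypothetical package name
    fixed_in: tuple  # first version containing the fix, e.g. (1, 5, 0)


def parse_version(text):
    """Turn '1.4.2' into (1, 4, 2); only plain dotted integers are supported."""
    return tuple(int(part) for part in text.split("."))


def find_vulnerable(pinned, advisories):
    """Return (package, installed, fixed_in) for each pinned version below the fix."""
    findings = []
    for advisory in advisories:
        installed = pinned.get(advisory.package)
        if installed is None:
            continue  # the application does not depend on this package
        if parse_version(installed) < advisory.fixed_in:
            fixed = ".".join(str(part) for part in advisory.fixed_in)
            findings.append((advisory.package, installed, fixed))
    return findings


if __name__ == "__main__":
    pinned = {"examplelib": "1.4.2", "otherlib": "2.0.0"}
    advisories = [
        Advisory("examplelib", (1, 5, 0)),
        Advisory("otherlib", (1, 9, 3)),
    ]
    for package, installed, fixed in find_vulnerable(pinned, advisories):
        print(f"{package} {installed} is vulnerable; upgrade to {fixed} or later")
```

Run as a step in a CI pipeline, a check like this could fail the build whenever the findings list is non-empty, which keeps the "upgrade immediately" practice from depending on someone remembering to look.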

All team members work to enable secure velocity

Security teams in some companies have earned a negative reputation as “the people in the way of progress”. Typically this happens because the security team hears about a new system or initiative too late in its development cycle. Regardless of how free-flowing and speedy the development of that component was, the security team still has the job of ensuring that the business isn’t put at risk by the deployment of that new technology.

Security approvals can take time (especially with lots of concurrent projects going on), and can lead to the security team getting a reputation as a blocker. Discontent mounts, emotions rise, and efforts are made to work around the team’s governance with “waivers” and “escalations”. In dysfunctional organizations, developers might even work to subvert the security team’s authority by covertly deploying applications without its approval, making matters even worse.

A high-performing organization will recognize that the root cause of the above problems is that the security team sits in the wrong position in the development workflow.

Shifting Security Left

Sophisticated teams will directly involve security in the early stages of developing new applications, have the security team review architectures and technology choices as early as possible, and make an effort to “move security left” in the workflow. Then if the security team finds serious issues, they can be addressed much earlier, perhaps even before much code has been written. This leads to a much less contentious discussion between developers and security, and a far healthier relationship between those groups.

For teams that haven’t yet pushed security to the left, it’s probably natural for them to think that such a move would slow down their development process. If viewed within a short enough horizon, doing security reviews of early versions of new applications would seem like a costly and burdensome choice.

But if the entire development lifecycle is considered, those early security reviews prevent more costly rework later on, when major architectural choices turn out to have vulnerabilities and have to be redone. Early security reviews actually de-risk development projects and make much more economic sense than doing them as a later step in the release lifecycle.

Summary

Teams that work to put security concerns high on their list of priorities quickly learn to optimize around them. Instead of applying security fixes as band-aids when vulnerabilities are discovered, they modify their development flow to incorporate process and tools that help them avoid security gaps in the first place. This mindset leads these teams to be proactive in dealing with security issues, not reactive – and that is what it means to make security a first-class citizen in a DevOps organization.

Brian Kelly

Brian Kelly is Head of Conjur Engineering at CyberArk, where he focuses on creating products that add much-needed security and access management to the landscape of DevOps tools and cloud systems. Brian is passionate about building teams, cybersecurity, and DevOps. Find him on Twitter at @brikelly.