June 9 2021 | 7 min read
Finding exposed secrets in code sounds simple, doesn’t it? Just look for field names like “password”, “token”, or “API_Key”. Maybe dig a little deeper to search for commonly used passwords or look for randomly generated strings of specific lengths.
Unfortunately, there’s a lot more to it than that. Understanding the impact of exposed secrets in code, detecting them in the first place, and avoiding being overwhelmed by endless false positives all involve a lot of nuance and complexity. Here are the most common questions we get; we hope our answers help you better understand why secrets in code are such an important issue and what you can do about it.
Software development has changed! Engineers no longer write code in isolation on desktops or laptops, where an attacker compromising a device could only access locally-stored files. Cloud-based development has changed the security model so developers often have expanded access to the entire application. With the rise of DevOps, the same developers (and developer identities) have the ability to make changes to production environments. A single compromised identity can now have a catastrophic impact on the security of the entire application and infrastructure.
It’s easy to say that developers should be more careful and follow best practices, but the truth is that developers are under increasing pressure to deliver. Hard-coding a token or password may start as a temporary hack, with a better solution planned for later … one that conveniently gets forgotten as the next priority comes along.
In addition, developers don’t always have visibility into where their code is deployed, so they lack an end-to-end view of the risk. Old code can also be deployed in new ways that the original developer never anticipated. And it is common for secrets that were never meant to leave the development environment to make their way into production.
You can find secrets in many places, including:
Secrets of each type can show up in multiple environments, from staging to production. The challenge is to find these locations automatically and to weigh the risk of secrets in production source code against the risk of secrets in test code, staging, and other environments.
There are many types of “secrets” that your developers can put into your code. These can be:
Don’t assume that an attacker needs access to your source code in order to find a secret. A skilled attacker may compromise a server hosting your source code (yes, even in the cloud), but an even more skilled attacker can reconstruct an approximation of the source code by reverse engineering the binary. Hard-coded string literals typically survive compilation, so they can often be read straight out of the shipped file. This applies when your B2B software is deployed on-premises at a customer, and when it’s consumer-facing, such as a Windows .exe or even an iOS or Android app.
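To illustrate why a shipped binary is enough, here is a minimal Python sketch that does what the Unix `strings` utility does at its simplest: it pulls out every run of printable ASCII bytes. The byte string below is a toy stand-in for a compiled program (assumption: the literals are stored as plain ASCII, with no obfuscation or UTF-16 encoding), and `extract_strings` is a hypothetical helper name. `AKIAIOSFODNN7EXAMPLE` is the placeholder key ID from the AWS documentation, not a real credential.

```python
import re


def extract_strings(blob: bytes, min_len: int = 8) -> list[str]:
    """Return runs of at least `min_len` printable ASCII characters found in `blob`.

    This mimics the basic behaviour of the Unix `strings` utility.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(blob)]


# A toy stand-in for a compiled binary: header bytes, some machine-code-like
# bytes, and two string literals that a compiler would have stored verbatim.
fake_binary = (
    b"\x7fELF\x02\x01\x01\x00" + b"\x00" * 8
    + b"\x48\x89\xe5\x90"
    + b"AKIAIOSFODNN7EXAMPLE\x00"
    + b"\xff\xfe"
    + b"https://api.example.com/v1\x00"
)

print(extract_strings(fake_binary))
# ['AKIAIOSFODNN7EXAMPLE', 'https://api.example.com/v1']
```

No source code or decompiler is involved: the hard-coded key falls out of the raw bytes, which is why removing the source from reach does not remove the exposure.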
The most sophisticated attacks are multi-step, and a single secret can be the launch point for further command and control. If an attacker finds a hard-coded token, they gain whatever access that token grants. With the right token, an attacker can impersonate a valid user or service, then use other means to escalate privileges or move laterally (“jump”) to other systems that use that token.
The main issue with having secrets in your code is that it short-circuits many of your defenses. For example, even if you have a SaaS product and your cloud infrastructure is secure, an attacker could use social engineering or other methods to gain access to a developer account, then search the code for secrets to exploit. And if you have 1,000 developers, the attacker needs access to only one of their accounts!
Detecting secrets is hard mainly because there are many types of them. Some are “structured” (not in the database sense): certificates and access tokens are generally formatted in standard ways, so they are easy to find with few false positives or false negatives.
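To make “structured” concrete, here is a minimal Python sketch of pattern-based detection. The three regular expressions are widely used community patterns (a PEM private-key header, the `AKIA` prefix of AWS access key IDs, and GitHub’s prefixed token format); they are assumptions for illustration, not an exhaustive or vendor-specific rule set, and `find_structured_secrets` is a hypothetical helper name.

```python
import re

# Illustrative patterns for well-known, structured secret formats.
# These are common community patterns, not an authoritative list.
STRUCTURED_PATTERNS = {
    "pem_private_key": re.compile(r"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36}\b"),
}


def find_structured_secrets(text: str) -> list[tuple[str, int, str]]:
    """Return (pattern_name, line_number, matched_text) for every hit in `text`."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in STRUCTURED_PATTERNS.items():
            for m in pattern.finditer(line):
                hits.append((name, lineno, m.group()))
    return hits


sample = "\n".join([
    "import boto3",
    'AWS_KEY = "AKIAIOSFODNN7EXAMPLE"',   # placeholder key ID from AWS docs
    "-----BEGIN OPENSSH PRIVATE KEY-----",
    'password = "hunter2"',               # unstructured: no pattern can match it
])

for hit in find_structured_secrets(sample):
    print(hit)
```

The `password = "hunter2"` line is not flagged: nothing about its format marks it as a secret, which is exactly the unstructured case that pattern matching alone cannot handle.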
Other types of secrets are less structured: long strings of seemingly random characters, which could just as easily be encrypted or hashed values. The problem is that you can’t tell what you’re looking at if you don’t understand the code! If you find 100 seemingly “random” strings, some will be in test files, some will come from binary files, and some will be encrypted values or hashes, which are generally fine to have in code. A good tool should be able to tell these apart from real secrets.
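One way to picture that context step is a triage function that labels each random-looking candidate before a human sees it. This is a minimal Python sketch under loudly stated assumptions: the path hints, the binary extensions, and the digest lengths (hex-encoded MD5, SHA-1, and SHA-256) are illustrative choices, and `triage` is a hypothetical helper. A 32-character hex string could still be a real API key, so “likely-hash” means “lower priority”, not “safe”.

```python
import re

# Illustrative heuristics only; every hint and threshold here is an assumption.
TEST_PATH = re.compile(r"(^|[\\/_.-])(tests?|spec|fixtures?|mocks?)([\\/_.-]|$)")
BINARY_EXTENSIONS = (".png", ".jpg", ".gif", ".ico", ".zip", ".jar", ".pdf")
HEX_STRING = re.compile(r"[0-9a-fA-F]+")
DIGEST_HEX_LENGTHS = {32, 40, 64}  # hex-encoded MD5, SHA-1, SHA-256


def triage(candidate: str, path: str) -> str:
    """Label a random-looking string so reviewers can focus on likely real secrets."""
    lowered = path.lower()
    if TEST_PATH.search(lowered):
        return "test-file"
    if lowered.endswith(BINARY_EXTENSIONS):
        return "binary-file"
    if len(candidate) in DIGEST_HEX_LENGTHS and HEX_STRING.fullmatch(candidate):
        return "likely-hash"
    return "needs-review"


print(triage("s3cr3tValue!", "tests/test_login.py"))                # test-file
print(triage("anything", "assets/logo.png"))                        # binary-file
print(triage("d41d8cd98f00b204e9800998ecf8427e", "src/app.py"))     # likely-hash
print(triage("xK9#mQ2$vL7p", "src/latest/config.py"))               # needs-review
```

The path regex requires a delimiter around the hint, so a directory like `latest/` is not mistaken for a test folder.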
Cryptographic keys are usually long enough that you can use statistics to determine whether a string is random enough to be a key, but for many other strings it isn’t clear-cut, and the question becomes: how many false positives can you accept? GitHub recently changed the format of all its API authentication tokens so that they are easy for scanners to detect.
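The statistic most scanners reach for is Shannon entropy: the average number of bits per character implied by the character frequencies in a string. Here is a minimal Python sketch; `looks_random` and its `min_len` and `threshold` defaults are hypothetical and exist only to show the trade-off. Raising the threshold cuts false positives but misses more real keys, and a minimum length is needed because a string of n characters can never score above log2(n) bits per character. Tokens with a fixed, recognizable prefix, like GitHub’s new format, sidestep this guesswork entirely because they can be matched by pattern.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Estimate bits of entropy per character from the character frequencies in `s`."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def looks_random(s: str, min_len: int = 20, threshold: float = 4.0) -> bool:
    """Flag strings that are long enough and random-looking enough to be a key.

    Both defaults are illustrative assumptions, not recommended values.
    """
    return len(s) >= min_len and shannon_entropy(s) >= threshold


print(shannon_entropy("abcd"))                   # 2.0
print(looks_random("password123password123"))   # False (about 3.28 bits/char)
# Placeholder secret access key from the AWS documentation:
print(looks_random("wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"))  # True (about 4.66 bits/char)
```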
The other challenge is the speed of development. Any organization can run an ad hoc code review and find a good number of secrets through a detailed manual search, but that approach doesn’t scale.
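A common way to keep pace is to scan only what each change adds, for example on the output of `git diff --cached` in a pre-commit hook or on a pull request’s diff in CI. The Python sketch below assumes well-formed unified-diff input and uses a single hypothetical rule (AWS access key IDs); `scan_added_lines` is a hypothetical name, and a real scanner would plug in a full rule set.

```python
import re

HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")
# Hypothetical single-rule set for this sketch: AWS access key IDs only.
SECRET_PATTERN = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def scan_added_lines(diff_text: str) -> list[tuple[str, int, str]]:
    """Scan only the lines a unified diff adds; return (path, new_line_no, match)."""
    findings = []
    path, new_lineno, prev = None, 0, ""
    for line in diff_text.splitlines():
        if line.startswith("+++ ") and prev.startswith("--- "):
            # File header for the new side of the diff, e.g. "+++ b/config.py".
            path = line[4:]
            if path.startswith("b/"):
                path = path[2:]
        elif m := HUNK_HEADER.match(line):
            new_lineno = int(m.group(1))  # first line number in the new file
        elif line.startswith("+"):
            for hit in SECRET_PATTERN.finditer(line[1:]):
                findings.append((path, new_lineno, hit.group()))
            new_lineno += 1
        elif line.startswith(" "):
            new_lineno += 1  # context line: present in both old and new file
        # "-" lines exist only in the old file, so they don't advance new_lineno
        prev = line
    return findings


sample_diff = "\n".join([
    "diff --git a/config.py b/config.py",
    "--- a/config.py",
    "+++ b/config.py",
    "@@ -1,2 +1,3 @@",
    " import os",
    '-KEY = os.environ["AWS_KEY"]',
    '+KEY = "AKIAIOSFODNN7EXAMPLE"',
    '+REGION = "us-east-1"',
])

print(scan_added_lines(sample_diff))  # [('config.py', 2, 'AKIAIOSFODNN7EXAMPLE')]
```

Scanning only added lines keeps each check fast enough to run on every commit; the whole history still needs a one-time scan, since a secret removed in a later commit remains in earlier ones.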
There are multiple things you can do to identify secrets, remediate issues, and reduce your risk:
Understanding and remediating the risk of secrets in code cannot be done in isolation! Finding a secret in a low-business-impact application deployed on-premises carries very different risk from finding a similar secret in a high-business-impact application that stores PII. Risk is multidimensional, and secrets in code are only one part of the larger application-risk picture.
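To show what “not in isolation” can mean in practice, here is a minimal Python sketch that ranks findings by combining each secret with context about where it lives. The attributes, the weights, and the example file paths are all hypothetical; the point is only that identical secrets can end up with very different priorities once business impact, PII, and production-versus-test location are taken into account.

```python
from dataclasses import dataclass

# Illustrative weights only; not a standard scale.
IMPACT_WEIGHT = {"low": 1, "medium": 2, "high": 3}


@dataclass
class SecretFinding:
    # Hypothetical context attributes attached to each detected secret.
    location: str
    business_impact: str   # "low" | "medium" | "high"
    stores_pii: bool
    in_production: bool    # production code vs. test/staging code


def risk_score(f: SecretFinding) -> int:
    """Combine a finding with its application context into one sortable number."""
    score = IMPACT_WEIGHT[f.business_impact]
    if f.stores_pii:
        score *= 2
    if f.in_production:
        score *= 2
    return score


findings = [
    SecretFinding("internal-reporting/settings.py", "low", False, True),
    SecretFinding("payments-api/config.py", "high", True, True),
    SecretFinding("payments-api/tests/fixtures.py", "high", True, False),
]

# Highest-risk findings first, so remediation effort goes where it matters most.
for f in sorted(findings, key=risk_score, reverse=True):
    print(risk_score(f), f.location)
```

Multiplying factors rather than adding them means a finding must score high on several dimensions at once to reach the top of the list.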
There is an entire industry around detecting exposed secrets in code, but there are a few ways in which many existing solutions fall short:
Apiiro uses a variety of techniques to identify exposed secrets in code. We use the latest algorithms for entropy detection of crypto keys and leverage our deep understanding of the code to examine its context, and we do this across the entire history of your code. Apiiro also provides continuous detection of secrets, with automated workflows so you can manage your code and your risks as new secrets are introduced. Finally, Apiiro understands which key management systems are already in place and can guide developers on how to remediate, rather than only raising alerts.