Part 1: What we learned about AppSec programs from the Twitch code leak

Technical | October 7 2021 | 3 min read

On Wednesday, Oct. 6, 2021, an anonymous 4chan user claimed to have posted 125 GB of data from 6,000 internal Git repositories. Twitch confirmed the massive data leak, which included source code and creator earnings, and stated that the breach was due to a “server configuration change”.

While this breach will have many negative repercussions, it also provides a trove of raw data that we can use to better understand the secure software development life cycle (SSDLC) of a typical, security-aware company, and to see what other organizations can learn from this failure.

Specifically, we used our Code Risk Platform to analyze the source code and other data that is now freely accessible on the Internet, and we gained critical insights into the Twitch application security program. Our findings illustrate that Application Security is hard and that code risk is multidimensional.

Key examples

AWS secrets in code

The Twitch Entitlement Service is written in Go, the language of choice for many of the packages in the leak. The package is extensive and appears to support crucial operations, yet it is not free of hardcoded AWS secrets. We have censored the actual keys because the leak is still fresh.
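As a rough illustration of how a scanner flags this class of finding, the sketch below searches source text for strings shaped like AWS access key IDs (the `AKIA` prefix followed by 16 uppercase letters or digits). The function name, the sample Go snippet, and the redaction style are assumptions made for this example, and the key shown is AWS's public documentation placeholder, not anything from the leak.

```python
import re

# Long-term AWS access key IDs start with "AKIA" followed by
# 16 uppercase letters or digits (20 characters in total).
ACCESS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_access_key_ids(source: str) -> list[tuple[int, str]]:
    """Return (line_number, redacted_key) for each candidate key ID."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            key = match.group(1)
            # Keep only the prefix so the report does not re-leak the key.
            hits.append((lineno, key[:4] + "*" * 16))
    return hits


# Hypothetical Go file; the key is AWS's documented example value.
SAMPLE = '''package main

const awsAccessKey = "AKIAIOSFODNN7EXAMPLE"
'''

if __name__ == "__main__":
    for lineno, redacted in find_access_key_ids(SAMPLE):
        print(f"line {lineno}: possible AWS access key ID {redacted}")
```

A pattern match like this only yields a candidate. Whether the key is live, and what it can reach, still has to be established separately.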

Twitch secrets in a dedicated secrets file

Test files are often overlooked because they are handled with less care and less rigorous security practices. In addition, threat intelligence indicates that different actors are already investigating these files.

Private Keys

A package that dates back to Justin.tv, the predecessor brand of Twitch from before 2014, contains a private key. We cannot tell whether it has since been revoked, but in any case private keys should never be included in code repositories in any form.
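Private keys in PEM form are comparatively easy to spot because of their header line. The following is a minimal sketch of such a check; the function name is hypothetical, and keys stored in other encodings would not be caught.

```python
import re

# PEM private keys begin with a header such as
# "-----BEGIN RSA PRIVATE KEY-----" or "-----BEGIN PRIVATE KEY-----".
PEM_PRIVATE_KEY = re.compile(r"-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----")


def contains_private_key(text: str) -> bool:
    """Return True if the text contains a PEM private key header."""
    return PEM_PRIVATE_KEY.search(text) is not None


if __name__ == "__main__":
    print(contains_private_key("-----BEGIN RSA PRIVATE KEY-----"))
    print(contains_private_key("-----BEGIN PUBLIC KEY-----"))
```

A check like this can run as a pre-commit hook or a CI step, so that a key is rejected before it ever enters the repository history.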

Google API hardcoded key

One of the most important packages for video content, and the largest of them all in size, involves many contributors. It included many alleged secrets in code. Every code scanner will surface these, but without context the results can easily turn into alert fatigue: the team can no longer prioritize the findings, and slips like this one happen.

Facebook keys, database passwords on django/python

Developer awareness is the backbone of any application security program. We saw some cases in which this awareness was consciously bypassed, such as this one:

Terraform includes DB password

Terraform .tf files, by definition, contain a declarative representation of the infrastructure to be provisioned, which makes cloud operations easier to manage. In several cases we saw these mechanisms misused to embed hard-coded secrets.
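To make the distinction concrete, the sketch below separates a literal password in a Terraform file from one supplied through a variable reference. It is a simplified line-based check rather than a real HCL parser. The list of sensitive attribute names, the function name, and the sample resources are all assumptions for illustration.

```python
import re

# Matches attributes such as:  password = "literal"
# The attribute names treated as sensitive are an illustrative assumption.
ASSIGNMENT = re.compile(
    r'^\s*(?P<name>\w*(?:password|secret|token)\w*)\s*=\s*"(?P<value>[^"]*)"',
    re.IGNORECASE,
)


def hardcoded_secrets(tf_source: str) -> list[tuple[int, str]]:
    """Return (line_number, attribute_name) for literal secret values."""
    findings = []
    for lineno, line in enumerate(tf_source.splitlines(), start=1):
        match = ASSIGNMENT.match(line)
        if match is None:
            continue
        value = match.group("value")
        # Empty strings and "${...}" interpolations are not literals.
        if value == "" or value.startswith("${"):
            continue
        findings.append((lineno, match.group("name")))
    return findings


# Hypothetical Terraform source: the first resource hard-codes the
# password, the second reads it from a variable.
SAMPLE = '''resource "aws_db_instance" "example" {
  username = "app"
  password = "example-not-a-real-password"
}

resource "aws_db_instance" "better" {
  username = "app"
  password = var.db_password
}
'''

if __name__ == "__main__":
    for lineno, name in hardcoded_secrets(SAMPLE):
        print(f"line {lineno}: literal value assigned to '{name}'")
```

Only the first resource is reported. The second passes the value in from outside the file, so the secret can live in a secrets manager or a protected variable store instead of the repository.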

Network Device BGP block Password

Network device configuration repositories are also part of the leaked trove. Because these configuration formats have been pretty much standard for the last 20+ years, some of their faults exist for legacy reasons. In this case, a ‘password 7’ statement declares the encoded form of the key used for BGP MD5 authentication on the Arista platform that the file targets.

The block configures a BGP neighbor, and many files contain the same malpractice.

 

Closing Thoughts

Our Code Risk Platform analysis automatically identified multiple key insights:

  • Unnecessary False Positives. Many findings from vulnerability scanning tools like SAST and SCA can be lowered in risk once context is considered (e.g., tests, very old code, example snippets).
  • Number of Incidents Rises with Activity. Once a repository becomes very active, a growing number of incidents per repository becomes inevitable. Organizations need to apply context to these incidents in order to prioritize them and gain proper visibility into risk (whether lower or higher).
  • Risk is Often Underestimated. Many true positives deserve a higher risk rating when considered in context (for example, a relational connection to a core infrastructure platform, or code related to authentication and authorization mechanisms).
  • Code drift and legacy code are often a problem. There is significant usage of code that was written a long time ago (some indications point to 2014 and even suggest code from 2011) that is still being used in crucial parts of the software (video, stream content management). No one bothers to review it and prioritize its existing security issues.
  • Everything should be treated as code. Organizations need to evaluate configurations, such as BGP settings for network devices, as code. This enables risk detection before those settings are put into production.
  • Secrets in Code Detection is missing context. Secret exposure is hard to tackle without proper visibility into the code and its history. Finding so many secrets in the Twitch code that a simple secrets scanning tool would detect indicates that the AppSec team was unable to see and properly prioritize these secrets.
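One way to picture context-based prioritization is a small scoring function that lowers or raises a finding's priority depending on where it sits. The sketch below is purely illustrative: the `Finding` fields, path markers, and weights are assumptions for this example and do not describe how our Code Risk Platform computes risk.

```python
from dataclasses import dataclass

# Path fragments treated as "test or example" code (illustrative only).
LOW_CONTEXT_MARKERS = ("_test.", "/tests/", "/examples/")


@dataclass
class Finding:
    path: str
    kind: str                          # e.g. "aws_key", "private_key"
    touches_auth: bool = False         # code related to authn/authz
    core_infrastructure: bool = False  # linked to a core platform


def priority(finding: Finding) -> int:
    """Return a 1-10 priority; all weights are arbitrary assumptions."""
    score = 5
    if any(marker in finding.path for marker in LOW_CONTEXT_MARKERS):
        score -= 2  # tests and examples usually carry less risk
    if finding.touches_auth:
        score += 3  # authentication and authorization raise the stakes
    if finding.core_infrastructure:
        score += 2  # so does a link to core infrastructure
    return max(1, min(10, score))


if __name__ == "__main__":
    findings = [
        Finding("video/transcode/handler_test.go", "aws_key"),
        Finding("entitlements/auth/token.go", "aws_key", touches_auth=True),
    ]
    for f in sorted(findings, key=priority, reverse=True):
        print(priority(f), f.path)
```

Even a crude ranking like this puts the authentication-related key ahead of the one in a test file, which is the kind of ordering a flat list of scanner alerts does not provide.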

Twitch is a large, security-aware organization. Even so, finding secrets in code and other security risks there is not in the least bit surprising. Application Security is HARD. It involves complex processes, tools, and highly technical skill sets. What we can learn is that organizations need to build a full Application Security program. Identifying vulnerabilities and looking for security issues in silos isn’t enough. It’s essential to understand the history of your code bases and identify risks in the context of the entire application and its infrastructure.