October 7 2021 | 7 min read

Part 1: What We Learned from the Twitch Code Leak about Application Security Programs


On Wednesday, Oct. 6, 2021, an anonymous 4chan user claimed to have posted 125 GB of Twitch data from 6,000 internal Git repositories. Twitch confirmed the massive data leak, which included source code and creator earnings, and stated that the breach was due to a “server configuration change”.

While this breach will have many negative repercussions, it also provides a trove of raw data that we can use to better understand the secure software development life cycle (SSDLC) of a typical, security-aware company, and to see what other organizations can learn from this failure.

Specifically, we used our Code Risk Platform to analyze the source code and other data that is now freely accessible on the Internet, and we gained critical insights into the Twitch application security program. Our findings illustrate that application security is hard and that risk is multidimensional.

Key examples

AWS secrets in code

The Twitch Entitlement Service is written in Go, the language of choice for many of the packages in the leak. The package is rather extensive and appears to be part of crucial operations, yet it is not free of mishaps: it contains hardcoded AWS secrets. We have censored the actual keys because the leak is still fresh.

import (
        // ... (imports not shown in this excerpt)
)

const (
        awsID     = "AKIAJQT6A3xxxxxxxx"
        awsSecret = "HwlMcT0u/s8GWBRA4J95WgP3xxxxxxxxxxxxxxxx"
        awsToken  = ""

        hostSpecific = false
        marketplace  = "us-west-2"
        serviceName  = "TwitchEntitlementService"
        client       = ""
        environment  = "TwitchEntitlementService/NA"
        hostname     = "Visage"
        partitionID  = ""
        customerID   = ""
        // ... (remainder not shown in this excerpt)
)
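
A hardcoded key like the one above can often be caught by a pattern check in a pre-commit hook or CI step. The following is a minimal sketch of that idea, not the method used in our analysis. It assumes that long-term AWS access key IDs have the commonly documented shape of `AKIA` followed by 16 uppercase letters or digits, and the function name `find_aws_key_ids` is hypothetical. The sample value is the placeholder key that AWS uses in its own documentation.

```python
import re

# Assumed shape of a long-term AWS access key ID: "AKIA" plus 16 uppercase
# letters or digits. Real scanners cover many more credential formats.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_aws_key_ids(source: str) -> list[tuple[int, str]]:
    """Return (line_number, match) pairs for strings that look like AWS key IDs."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in AWS_KEY_ID.finditer(line):
            hits.append((number, match.group()))
    return hits


# Placeholder key from AWS documentation, not a real credential.
sample = 'const (\n        awsID = "AKIAIOSFODNN7EXAMPLE"\n)'
print(find_aws_key_ids(sample))
```

A check this simple produces findings without any context, which is exactly why the prioritization questions discussed in this post matter.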

Twitch secrets in a dedicated secrets file

Test files are often overlooked because they are handled with less care and less rigorous security practices. In addition, threat intelligence indicates that different actors are already investigating these files.

//  TWTestTwitchKitSecrets.m
//  Pods
//  Created by Borders, Heath on 10/3/16.

#import "TWTestTwitchKitSecrets.h"

NSString * _Nonnull const TWTestTwitchKitClientID = @"85lcqzxpb9bqu9z6ga1ol55du";
NSString * _Nonnull const TWTestTwitchKitClientID2 = @"p9lhq6azjkdl72hs5xnt3amqu7vv8k2";
NSString * _Nonnull const TWTestTwitchKitClientSecret = @"sc0qn64ihq2m3f47zl2x626jdd1sm6x";
NSString * _Nonnull const TWTestTwitchKitClientSecret2 = @"jwdzcotxzyftca5w8m1w9ib8jpo6lto";
NSString * _Nonnull const TWTestTwitchKitClientIdentifer = @"126250792";

// iostest:test$1234
NSString * _Nonnull const TWTestTwitchKitRefreshToken = @"zjjifl3utnrdl9nty2rl1u1r27f7cx4qahtbxnduls53crma78";
// ios_1088:test$1234
NSString * _Nonnull const TWTestTwitchKitBitsUserRefreshToken = @"eyJfaWQiOiIiLCJfdXVpZCI6ImM5NmYzYzBiLTQxOWItNDI5NS1hNzg5LTJkOTYzODQyYjVmYyJ9%SNk/aF6wrjCKLfkAubPwAM3yMsTX/Nl2Hi61sXYFNjQ=";

Private Keys

A package that dates back to Twitch's predecessor brand, from before 2014, contains a private key. We cannot tell whether it has since been revoked, but in any case the practice should be to never include private keys in code repositories in any form.


Google API hardcoded key

One of the most important packages for video content, and the largest of them all in size, involves many contributors. This package included many alleged secrets in code. Any code scanner will surface them, but without context this easily turns into alert fatigue, where findings can no longer be prioritized and slips like this one happen.

namespace {
const uint32_t kIdPrefixOffset = 3;
const std::string kURLPrefix = "";
const std::string kURLPagePrefix = "";
const std::string kParamId = "?id=";
const std::string kURLSuffix = "&part=snippet%2CcontentDetails%2Cstatistics";
}  // namespace

It is concatenated in the ConstructURL function later in the same file:

std::string ConstructURL(const std::string& video_id) {
  std::string url = kURLPrefix + kParamId + video_id.substr(kIdPrefixOffset)
          + kURLSuffix;
  return url;
}

That function is, in turn, called by a video-download function:

bool DownloaderYoutubeVideoInfo::YTVideoInfoImpl::Download(
                        const std::string& video_id) {
  std::string url = ConstructURL(video_id);

  if (!curl::HTTPDownloaderBasic::Download(url)) {
    TLOG(ERROR) << "Download failed for " << url;
    return false;
  }

  if (!ParseInternal()) {
    TLOG(ERROR) << "Parsing failed for " << url;
    return false;
  }
  // ... (rest of the function not shown in this excerpt)
}
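
A common alternative to embedding a key in a URL constant is to inject it at runtime. The sketch below, written in Python for brevity, mirrors the shape of the URL construction above under several assumptions: the endpoint `https://api.example.invalid/videos`, the environment variable `VIDEO_API_KEY`, and the `key` query parameter are all hypothetical, since the real URL prefix is not shown in the excerpt.

```python
import os
from urllib.parse import urlencode

# Hypothetical endpoint; the real URL prefix is not shown in the excerpt.
BASE_URL = "https://api.example.invalid/videos"
ID_PREFIX_OFFSET = 3  # mirrors kIdPrefixOffset in the C++ snippet


def construct_url(video_id: str, env=os.environ) -> str:
    """Build the request URL with the API key taken from the environment."""
    api_key = env.get("VIDEO_API_KEY")  # hypothetical variable name
    if not api_key:
        raise RuntimeError("VIDEO_API_KEY is not set")
    query = urlencode({
        "id": video_id[ID_PREFIX_OFFSET:],
        "part": "snippet,contentDetails,statistics",
        "key": api_key,  # hypothetical parameter name
    })
    return f"{BASE_URL}?{query}"


# A dict stands in for the environment so the example runs anywhere.
print(construct_url("yt:abc123", env={"VIDEO_API_KEY": "dummy"}))
```

With this layout the repository never contains the key, so a leak of the source code does not also leak the credential.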

Facebook keys, database passwords on Django/Python

Developer awareness is the backbone of any application security program. We have seen cases where this awareness was knowingly bypassed, such as this one:

# Make this unique, and don't share it with anybody.
SECRET_KEY = 'r-+!3_et@czju98^v=hprrrzqzibo!4w4&dy9p^9d3li49t=$9'
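
For a Django settings module, one widely used way to follow the comment's own advice is to read the value from the environment and refuse to start without it. This is a minimal sketch under assumptions: the variable name `DJANGO_SECRET_KEY` and the helper `load_secret_key` are hypothetical, and it is not taken from the leaked code.

```python
import os


def load_secret_key(env=os.environ) -> str:
    """Fetch the Django SECRET_KEY from the environment, failing loudly if absent."""
    key = env.get("DJANGO_SECRET_KEY")  # hypothetical variable name
    if not key:
        raise RuntimeError("DJANGO_SECRET_KEY must be set; refusing to start")
    return key


# In settings.py this would read: SECRET_KEY = load_secret_key()
# Here a dict stands in for the environment so the example runs anywhere.
print(load_secret_key({"DJANGO_SECRET_KEY": "example-only-value"}))
```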

Terraform includes DB password

Terraform .tf files contain, by definition, a declarative representation of the infrastructure to be provisioned, which is meant to improve cloud operations. In several cases we saw these mechanisms misused to embed hardcoded credentials.

resource "aws_db_instance" "db" {
    identifier = "${var.project_name}-${var.environment}"
    auto_minor_version_upgrade = "false"
    engine = "postgres"
    multi_az = "true"
    instance_class = "db.t2.large"
    allocated_storage = 10
    apply_immediately = "false"
    backup_retention_period = 30
    name = "metabase"
    username = "metabase"
    password = "pBKr2pkTVva"
    publicly_accessible = "false"
    db_subnet_group_name = "${}"
    vpc_security_group_ids = ["${data.terraform_remote_state.remote_state.twitch_subnets_sg}"]
}

Network Device BGP block Password

Network device configuration repositories are also part of the leaked trove. Because these formats have been largely standard for more than 20 years, some of their faults exist for legacy reasons. In this case, ‘password 7’ indicates that the string that follows is the stored, type 7 encoded form of the MD5 authentication password for the BGP session on the Arista platform the file is intended for.

The block sets up BGP neighbors, and many other files contain the same malpractice.

router bgp 64516
   distance bgp 20 200 20
   graceful-restart stalepath-time 30
   maximum-paths 32 ecmp 32
   neighbor MA peer-group
   neighbor MA remote-as 64516
   neighbor MA fall-over bfd
   neighbor MA allowas-in 3
   neighbor MA password 7 fB4XWf3mJFQdgqwjstGhoQ==
   neighbor MA send-community
   neighbor MA maximum-routes 12000
   neighbor MCS peer-group
   neighbor MCS remote-as 64514
   neighbor MCS fall-over bfd
   neighbor MCS allowas-in 3
   neighbor MCS route-map mcs-out out
   neighbor MCS send-community
   neighbor MCS maximum-routes 12000
   neighbor MDS peer-group
   neighbor MDS remote-as 64515
   neighbor MDS fall-over bfd
   neighbor MDS allowas-in 3
   neighbor MDS route-map mds-out out
   neighbor MDS password 7 k2aRtzsXB1d9lmz/Tv5FyQ==
   neighbor MDS send-community
   neighbor MDS maximum-routes 12000
   neighbor peer-group MA
   neighbor description INTERNAL-MA:r708-ma01.pdx05:1:64516::
   neighbor peer-group MDS
   neighbor description INTERNAL-MDS:r700-mds02.pdx05:1:64515::
   redistribute connected route-map connected-to-bgp
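
Treating device configuration as code means it can be checked like code. As a rough illustration, the sketch below flags `neighbor ... password` lines in a configuration file before it is committed. The function name `bgp_password_lines` is hypothetical, the pattern covers only the line shape seen in the excerpt above, and the sample uses a placeholder string rather than a real credential.

```python
import re

# Matches e.g. "neighbor MA password 7 <string>". The capture groups are the
# neighbor or peer-group name and the encoding type digit.
BGP_PASSWORD = re.compile(r"^\s*neighbor\s+(\S+)\s+password\s+(\d)\s+\S+")


def bgp_password_lines(config: str) -> list[tuple[int, str, str]]:
    """Return (line_number, neighbor, encoding_type) for embedded BGP passwords."""
    hits = []
    for number, line in enumerate(config.splitlines(), start=1):
        match = BGP_PASSWORD.match(line)
        if match:
            hits.append((number, match.group(1), match.group(2)))
    return hits


config = """router bgp 64516
   neighbor MA peer-group
   neighbor MA password 7 PLACEHOLDER==
   neighbor MA send-community
"""
print(bgp_password_lines(config))
```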


Closing Thoughts

Our Code Risk Platform analysis automatically identified multiple key insights:

  • Unnecessary False Positives. Many findings from vulnerability scanning tools such as SAST and SCA can be lowered in risk once context is considered (e.g., tests, very old code, example snippets).
  • # of Incidents Rises with Activity. Once a repository becomes very active, incidents in it become more and more inevitable. Organizations need to apply context to these incidents in order to prioritize them and gain proper visibility into risk (either lower or higher).
  • Risk is Often Underestimated. Many true positives deserve a higher risk rating when considered in context (for example, a connection to a core infrastructure platform, or code related to authentication and authorization mechanisms).
  • Code drift and legacy code are often a problem. There is significant usage of code that was written a long time ago (some indications point to 2014, and some even suggest code from 2011) and that is still used in crucial parts of the software (video, stream content management). No one bothers to review it and prioritize its existing security issues.
  • Everything should be treated as code. Organizations need to treat configurations such as BGP settings for network devices as code. This enables risk detection before those settings are put into production.
  • Secrets in Code Detection is missing context. Secret exposure is hard to tackle without proper visibility into the code and its history. Finding so many secrets in the Twitch code that a simple secrets scanning tool would detect suggests that the AppSec team was unable to see and properly prioritize them.
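
To make the idea of context concrete, the sketch below adjusts a scanner's severity using the path of each finding. It is only an illustration of the principle and not how our platform works: the `Finding` type, the rule table, the weights, and the sample paths are all hypothetical, and substring matching on paths is deliberately crude.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    path: str
    kind: str           # e.g. "aws-key", "private-key"
    base_severity: int  # 1 (low) to 10 (high), as reported by a scanner


# Hypothetical context rules: (path fragment, severity adjustment).
CONTEXT_RULES = [
    ("test", -3),     # test fixtures are usually lower risk
    ("example", -3),  # sample snippets
    ("auth", +3),     # authentication and authorization code
    ("infra", +2),    # core infrastructure
]


def contextual_severity(finding: Finding) -> int:
    """Adjust a scanner's severity using path context, clamped to 1..10."""
    score = finding.base_severity
    lowered = finding.path.lower()
    for fragment, delta in CONTEXT_RULES:
        if fragment in lowered:
            score += delta
    return max(1, min(10, score))


findings = [
    Finding("ios/Tests/TestSecrets.m", "client-secret", 6),
    Finding("services/auth/token.go", "aws-key", 8),
    Finding("docs/readme.md", "api-key", 5),
]
# Highest contextual severity first.
for f in sorted(findings, key=contextual_severity, reverse=True):
    print(contextual_severity(f), f.path)
```

Under these assumed rules the finding in authentication code rises to the top while the test fixture drops to the bottom, which is the kind of reordering that context is meant to provide.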

Twitch is a large, security-aware organization. Even so, finding secrets in code and other security risks there is not in the least bit surprising. Application Security is HARD. It involves complex processes, tools, and highly technical skill sets. What we can learn is that organizations need to build an Application Security program. Identifying vulnerabilities and looking for security issues in silos isn’t enough. It’s essential to understand the history of your code bases and identify risks in the context of the entire application and its infrastructure.

Moshe Zioni

VP of Security Research