Part 1: What We Learned from the Twitch Code Leak about Application Security Programs

Technical | October 7 2021 | 7 min read

On Wednesday, Oct. 6, 2021, an anonymous 4chan user claimed to have posted 125 GB of data from 6,000 internal Git repositories. Twitch confirmed the massive data leak, which included source code and creator earnings, and stated that the breach was due to a “server configuration change”.

While this breach will have many negative repercussions, it also provides a trove of raw data that we can use to better understand the secure software development lifecycle (SSDLC) of a typical, security-aware company, and to see what other organizations can learn from this failure.

Specifically, using our Code Risk Platform, we analyzed the source code and other data that is now freely accessible on the Internet and gained critical insights into the Twitch application security program. Our findings show that application security is hard and that risk is multidimensional.

Key examples

AWS secrets in code

The Twitch Entitlement Service is written in Go, the language of choice for many of the packages in the leak. The package is rather extensive and appears to be part of crucial operations, yet it contains hardcoded AWS secrets. We have censored the actual keys because the leak is still fresh.

import (
        "time"
        "github.com/aws/aws-sdk-go/aws/credentials"
        "code.justin.tv/commerce/AmazonMWSGoClient/mws"
)

const (
        awsID     = "AKIAJQT6A3xxxxxxxx"
        awsSecret = "HwlMcT0u/s8GWBRA4J95WgP3xxxxxxxxxxxxxxxx"
        awsToken  = ""

        hostSpecific = false
        marketplace  = "us-west-2"
        serviceName  = "TwitchEntitlementService"
        client       = ""
        environment  = "TwitchEntitlementService/NA"
        hostname     = "Visage"
        partitionID  = ""
        customerID   = ""
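A minimal sketch of how a scanner could flag a hardcoded key like the one above. It assumes the well-known shape of long-term AWS access key IDs ("AKIA" followed by 16 uppercase letters or digits). The function name and the sample key are hypothetical, and this is not how our platform or any specific scanner is implemented.

```python
import re

# Long-term AWS access key IDs start with "AKIA" followed by
# 16 uppercase letters or digits (20 characters in total).
AWS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_aws_key_ids(source: str) -> list[tuple[int, str]]:
    """Return (line_number, masked_key_id) for each candidate key ID."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in AWS_KEY_ID.finditer(line):
            key_id = match.group(1)
            # Keep only the first 8 characters so the report does not leak the key.
            hits.append((line_no, key_id[:8] + "*" * 12))
    return hits


# Hypothetical sample input; the key ID below is made up.
sample = 'const (\n    awsID = "AKIAABCDEFGHIJKLMNOP"\n)'
print(find_aws_key_ids(sample))
```

A pattern match like this only produces a candidate. Whether it matters depends on where the file lives and what the key can reach.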

Twitch secrets in a dedicated secrets file

Test files are often overlooked because they are handled with less care and less rigorous security practices. Threat intelligence also indicates that different actors are investigating these files.

//
//  TWTestTwitchKitSecrets.m
//  Pods
//
//  Created by Borders, Heath on 10/3/16.
//
//

#import "TWTestTwitchKitSecrets.h"

NSString * _Nonnull const TWTestTwitchKitClientID = @"85lcqzxpb9bqu9z6ga1ol55du";
NSString * _Nonnull const TWTestTwitchKitClientID2 = @"p9lhq6azjkdl72hs5xnt3amqu7vv8k2";
NSString * _Nonnull const TWTestTwitchKitClientSecret = @"sc0qn64ihq2m3f47zl2x626jdd1sm6x";
NSString * _Nonnull const TWTestTwitchKitClientSecret2 = @"jwdzcotxzyftca5w8m1w9ib8jpo6lto";
NSString * _Nonnull const TWTestTwitchKitClientIdentifer = @"126250792";

// iostest:test$1234
NSString * _Nonnull const TWTestTwitchKitRefreshToken = @"zjjifl3utnrdl9nty2rl1u1r27f7cx4qahtbxnduls53crma78";
// ios_1088:test$1234
NSString * _Nonnull const TWTestTwitchKitBitsUserRefreshToken = @"eyJfaWQiOiIiLCJfdXVpZCI6ImM5NmYzYzBiLTQxOWItNDI5NS1hNzg5LTJkOTYzODQyYjVmYyJ9%SNk/aF6wrjCKLfkAubPwAM3yMsTX/Nl2Hi61sXYFNjQ=";

Private Keys

A package that dates back to Justin.tv, Twitch’s predecessor brand from before 2014, contains a private key. We cannot tell whether the key has since been revoked, but in any case the practice should be to never include private keys, in any form, in code repositories.

-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEA8emj4kzS0BP7U9ixEX8vxe9Df5QzXTIpc+289EH9VJbcR4QJ
q1FTlA7MLE3WKXSMCfn5wU3fMJeCz1S8u+qcOhbpohW1KxRD937C/YtZK7EUhzyF
1EGNDH0w+3tec5gU/wWGx565WlJvNwxeFkUYb6lGbXWVNRBWxdSRWVJP2aRHKT3N
C1gENCoYFoPN051moQviIsLlCCFR9SLJtFXc6NDBMbndzFjLMRRMfGca7bTuuNnl
rGdqb7TJp7ETgD2wiWSP62hsv83LgpIb23JiV24H9le/SIF75Y7T57A4NsdhMlL9
oMO2WwmYNvgePEqsyd/U7TPxkR6CvGMiXxrn2wIDAQABAoIBAHLjyI6Qd8qUwtcm
Yanyoqi5om/z3ZUUXrWNIiFLOdozr7hTUBhKDoyRnowoB182189hJimVJzu3qUt4
bg49NSctfJYbAyjLfiAL1uV9icMDXcGAj/qniyp0RpAZHll9z/LyF/m0O0lXPzSA
riqbdCiL10PjBRLniJ55/vHR8tRkkXok+hD2wt6H6XlJkPmSiro5tCbMkpHffWdt
vy6fAlrb2TI15XA/J9wg4IGhyb/L6GRg5baL4BX1tw08j4qSIzDXOWMWAnNh0v6i
OmpnApOaDC7l6NNgxyFLM8KaV00ej5HwXfMOEUz0mPmAva1KhUotNaklD4uW2CQp
uuD2w/ECgYEA+X97SbEFlwXjvYbwzTa4KEn45zXrC9jb9L0AEc0CWsc49UxyTE0l
FopWD/1wqJSkX6jmk5EPXt2+tjicOuUn2HSFB74myA4yNJLHpzHXbDYSUqWtFd/e
Zpj8RbHC7Cq6NUAaqKZaaq1KyVBNUbx9KfJ3NCc9eRUJJp5kVpLSHVMCgYEA+DeN
qH0dXnjklZbcdPHYod0DWYpPpjP7u4jtdQYgezLfLg8vRl6V2j65l52MdR1pjPqj
KxeruFobikzxbF1zWMCF/mpW6JQxeiEbAo2lN9NlwsebLTGmDFdzh97wh71XHbaG
NMppINGMlmlfFDQhz8FFeeQqudnRxRM2qMVJslkCgYEA7K//aZ1BzE+OCVJmRofO
lIn4Un9YB9kmcTqLQlfWEABHDI4FMFVPBd8eXfT0VzkL5qP4ea13g2uhbISv0T9r
WXDQctP1PnwZLL7CIN6rmsCBCV6aoNHLzlD7obJNVHYESFgT8kI+LE1RUUGY2B2U
L6MRaqx/KMrH75b7YRXPtnkCgYEAoq92jzoBp8vAtjK8p4FjpSNAcM1wStTDZzTl
vc+YNmcvU/br20lfGj4GUlMWniP67EXR8AqBqECW0FyB166gTUlSCWAVOjb2/r73
/wJriV1q0vEUydhCptAijqkWKUF1+amJ6MvJf5MYe/TwNkO87XgVW0CqqEkVbf+b
0Z4NIXECgYBfe99o+/TgLqcLXFfT/cL2coCmwhXmsoechDUJhhorJHjCnoUBH3Dr
4myZEkJNsiwgPJyhn2d9zwqlWyfpPXPPBVtOiPZPxeOmO9KzagLLAspvzjX9r/JB
NddFcdfSrPTLmFOyTm9WTKURP41H4DuCaoBFahNouKfiehoD36LTYg==
-----END RSA PRIVATE KEY-----

Google API hardcoded key

One of the most important packages for video content, and the largest of them all in size, involves many contributors. It contains many alleged secrets in code. Any code scanner will surface them, but without context the results can easily turn into alert fatigue, where findings can no longer be prioritized and slips like this one happen.

namespace {
const uint32_t kIdPrefixOffset = 3;
const std::string kURLPrefix = "https://www.googleapis.com/youtube/v3/videos";
const std::string kURLPagePrefix = "www.youtube.com/watch?v=";
const std::string kParamId = "?id=";
const std::string kURLSuffix = "&part=snippet%2CcontentDetails%2Cstatistics"
                       "&key=AIzaSyCGyZFpJnjHl8Bj1fTgcyq5hBUp-0wASRo";

It is concatenated into the request URL by the ConstructURL function later in the same file:

std::string ConstructURL(const std::string& video_id) {
  std::string url = kURLPrefix + kParamId + video_id.substr(kIdPrefixOffset)
          + kURLSuffix;
  return url;
}

This function, in turn, is called by a video-download function:

bool DownloaderYoutubeVideoInfo::YTVideoInfoImpl::Download(
                        const std::string& video_id) {
  std::string url = ConstructURL(video_id);

  if (!curl::HTTPDownloaderBasic::Download(url)) {
    TLOG(ERROR) << "Download failed for " << url;
    return false;
  }

  info_creator_->set_video_id(video_id);

  if (!ParseInternal()) {
    TLOG(ERROR) << "Parsing failed for " << url;
    return false;
  }
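For contrast, here is a minimal sketch of the same URL construction with the key supplied by the caller instead of compiled into the source. The function name is hypothetical, and the 3-character ID prefix stripping from the original is omitted for brevity.

```python
from urllib.parse import urlencode

API_URL = "https://www.googleapis.com/youtube/v3/videos"


def construct_url(video_id: str, api_key: str) -> str:
    """Build the videos request URL; the key is an argument, not a constant."""
    query = urlencode({
        "id": video_id,
        "part": "snippet,contentDetails,statistics",
        "key": api_key,
    })
    return f"{API_URL}?{query}"


# Dummy values for illustration only.
print(construct_url("abc123", "dummy-key"))
```

A caller could then read the key from the process environment or a secrets manager at startup, which keeps it out of the repository and its history.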

Facebook keys, database passwords in Django/Python

Developer awareness is the backbone of any application security program. We have seen cases where this awareness was consciously ignored, such as this one:

# Make this unique, and don't share it with anybody.
SECRET_KEY = 'r-+!3_et@czju98^v=hprrrzqzibo!4w4&dy9p^9d3li49t=$9'
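One common alternative, sketched here under the assumption that the deployment injects the key through an environment variable, is to load the value at startup and fail loudly when it is missing. The variable name DJANGO_SECRET_KEY and the helper are hypothetical.

```python
import os


def load_secret_key(env=os.environ) -> str:
    """Read the Django secret key from the environment, failing if absent."""
    key = env.get("DJANGO_SECRET_KEY")
    if not key:
        raise RuntimeError("DJANGO_SECRET_KEY must be set in the environment")
    return key


# In settings.py this would read:
#   SECRET_KEY = load_secret_key()
```

Failing at startup is a deliberate choice here: a missing key should stop the deployment rather than fall back to a default committed to the repository.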

Terraform includes DB password

Terraform .tf files, by definition, contain a declarative representation of the infrastructure to be provisioned, which makes cloud operations easier to manage. In several cases we saw this mechanism abused to hard-code credentials.

resource "aws_db_instance" "db" {
    identifier = "${var.project_name}-${var.environment}"
    auto_minor_version_upgrade = "false"
    engine = "postgres"
    multi_az = "true"
    instance_class = "db.t2.large"
    allocated_storage = 10
    apply_immediately = "false"
    backup_retention_period = 30
    name = "metabase"
    username = "metabase"
    password = "pBKr2pkTVva"
    publicly_accessible = "false"
    db_subnet_group_name = "${aws_db_subnet_group.db_subnet_group.name}"
    vpc_security_group_ids = ["${data.terraform_remote_state.remote_state.twitch_subnets_sg}"]
}
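A check for this pattern can be sketched in a few lines. The version below is a simplification that assumes one assignment per line. It flags quoted literal passwords and skips values that defer to a variable through "${...}" interpolation. The function name and sample are hypothetical.

```python
import re

# Matches `password = "..."` at the start of a line and captures the value.
PASSWORD_ASSIGNMENT = re.compile(
    r'^[ \t]*password[ \t]*=[ \t]*"([^"]*)"', re.MULTILINE
)


def hardcoded_passwords(tf_source: str) -> list[int]:
    """Return line numbers whose password value is a literal string."""
    flagged = []
    for match in PASSWORD_ASSIGNMENT.finditer(tf_source):
        value = match.group(1)
        # "${var.db_password}" style interpolation defers to a variable.
        if value.startswith("${") and value.endswith("}"):
            continue
        flagged.append(tf_source.count("\n", 0, match.start(1)) + 1)
    return flagged


# Hypothetical sample; the literal below is made up.
sample = '''resource "aws_db_instance" "db" {
    username = "metabase"
    password = "example-literal"
}
resource "aws_db_instance" "ok" {
    password = "${var.db_password}"
}'''
print(hardcoded_passwords(sample))
```

Unquoted references such as `password = var.db_password` do not match the pattern, so they are not flagged either.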

Network Device BGP block Password

Network device configuration repositories are also part of the leaked trove. Because these formats have been pretty much standard for the last 20+ years, some of their faults persist for legacy reasons. In this case, ‘password 7’ marks the BGP neighbor password (used for MD5 session authentication) as stored in type 7 form, a reversible encoding rather than a one-way hash, on the Arista platform that the file targets.

The block configures BGP neighbors, and many files contain the same malpractice.

router bgp 64516
   distance bgp 20 200 20
   graceful-restart stalepath-time 30
   maximum-paths 32 ecmp 32
   neighbor MA peer-group
   neighbor MA remote-as 64516
   neighbor MA fall-over bfd
   neighbor MA allowas-in 3
   neighbor MA password 7 fB4XWf3mJFQdgqwjstGhoQ==
   neighbor MA send-community
   neighbor MA maximum-routes 12000
   neighbor MCS peer-group
   neighbor MCS remote-as 64514
   neighbor MCS fall-over bfd
   neighbor MCS allowas-in 3
   neighbor MCS route-map mcs-out out
   neighbor MCS send-community
   neighbor MCS maximum-routes 12000
   neighbor MDS peer-group
   neighbor MDS remote-as 64515
   neighbor MDS fall-over bfd
   neighbor MDS allowas-in 3
   neighbor MDS route-map mds-out out
   neighbor MDS password 7 k2aRtzsXB1d9lmz/Tv5FyQ==
   neighbor MDS send-community
   neighbor MDS maximum-routes 12000
   neighbor 10.36.98.238 peer-group MA
   neighbor 10.36.98.238 description INTERNAL-MA:r708-ma01.pdx05:1:64516::
   neighbor 10.36.113.33 peer-group MDS
   neighbor 10.36.113.33 description INTERNAL-MDS:r700-mds02.pdx05:1:64515::
   redistribute connected route-map connected-to-bgp
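Treating such files as code means they can be linted like code. The sketch below assumes the line format shown above and lists every neighbor that carries an inline password. The function name and sample values are hypothetical.

```python
import re

# Matches lines such as: neighbor MA password 7 <encoded-value>
NEIGHBOR_PASSWORD = re.compile(
    r"^[ \t]*neighbor[ \t]+(\S+)[ \t]+password[ \t]+(\d+)[ \t]+\S+",
    re.MULTILINE,
)


def neighbors_with_inline_password(config: str) -> list[tuple[str, str]]:
    """Return (neighbor, password_type) for each inline neighbor password."""
    return [(m.group(1), m.group(2)) for m in NEIGHBOR_PASSWORD.finditer(config)]


# Hypothetical sample; the encoded values are placeholders.
sample = """router bgp 64516
   neighbor MA peer-group
   neighbor MA password 7 EXAMPLEONLY==
   neighbor MCS peer-group
   neighbor MDS password 7 EXAMPLEONLY==
"""
print(neighbors_with_inline_password(sample))
```

Run in a pre-merge check, a rule like this would surface the finding before the configuration reaches a device.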

 

Closing Thoughts

Our Code Risk Platform analysis automatically identified multiple key insights:

  • Unnecessary False Positives. Many findings from vulnerability scanning tools such as SAST and SCA can be lowered in risk once context is considered (e.g., tests, very old code, example snippets).
  • Number of Incidents Rises with Activity. Once a repository becomes very active, incidents become more and more inevitable. Organizations need to apply context to these incidents in order to prioritize them and gain proper visibility into risk (whether lower or higher).
  • Risk is Often Underestimated. Many true positives deserve a higher risk rating when considered in context (a relational connection to a core infrastructure platform, or code related to authentication and authorization mechanisms).
  • Code drift and legacy code are often a problem. There is significant usage of code that was written a long time ago (some indications point to 2014 and even suggest code from 2011) that is still being used in crucial parts of the software (video, stream content management). No one bothers to look at it and prioritize existing security issues.
  • Everything should be treated as code. Organizations need to treat configurations, such as BGP settings for network devices, as code. This enables risk detection before those settings are put into production.
  • Secrets in Code Detection is missing context. Secret exposure is hard to tackle without proper visibility into the code and its history. Finding so many secrets in the Twitch code that a simple secrets scanning tool would detect indicates that the AppSec team was unable to see and properly prioritize these secrets.
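The idea of adjusting risk with context can be illustrated with a toy model. Everything in this sketch is an assumption made for illustration (the rules, the weights, the 1 to 10 scale, and the file paths). It is not a description of how our Code Risk Platform scores findings.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    path: str
    kind: str        # e.g. "aws_key", "private_key", "api_key"
    base_score: int  # scanner-assigned severity, 1 (low) to 10 (high)


# Hypothetical context rules: (path fragment, score adjustment).
CONTEXT_RULES = [
    ("test", -3),     # test fixtures
    ("example", -3),  # example snippets
    ("auth", +3),     # authentication and authorization code
    ("infra", +2),    # core infrastructure definitions
]


def contextual_score(finding: Finding) -> int:
    """Adjust the base score using path context, clamped to the 1..10 range."""
    score = finding.base_score
    path = finding.path.lower()
    for fragment, delta in CONTEXT_RULES:
        if fragment in path:
            score += delta
    return max(1, min(10, score))


# Hypothetical findings with identical scanner severity.
findings = [
    Finding("ios/Tests/TWTestTwitchKitSecrets.m", "api_key", 6),
    Finding("services/auth/session.go", "aws_key", 6),
]
ranked = sorted(findings, key=contextual_score, reverse=True)
for f in ranked:
    print(contextual_score(f), f.path)
```

Both findings start with the same scanner severity, but the one in authentication code rises to the top while the test fixture drops. Priority comes from context, not from the raw match.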

Twitch is a large, security-aware organization, yet finding secrets in code and other security risks there is not in the least bit surprising. Application Security is HARD. It involves complex processes, tools, and highly technical skill sets. What we can learn is that organizations need to build an Application Security program. Identifying vulnerabilities and looking for security issues in silos isn’t enough. It’s essential to understand the history of your code bases and identify risks in the context of the entire application and its infrastructure.

Moshe Zioni

VP of Security Research