Code Provenance

What Is Code Provenance?

Code provenance is the verifiable record of where a piece of software came from, how it was built, and what transformations it underwent from source code to deployable artifact. It answers a fundamental supply chain question: can you prove that what you are running was built from the source you trust, by a system you control?

Software provenance has moved from an academic concept to an operational requirement. Attacks like SolarWinds demonstrated that compromising the build process can inject malicious code into artifacts that appear legitimate at every other layer of verification. Signing confirms that an artifact was not tampered with after it was signed; provenance addresses the integrity of the creation process itself.

Organizations embedding provenance into their software supply chain security programs gain the ability to trace any deployed artifact back to its exact source commit, build system, and pipeline configuration.

Why Code Provenance Matters for Supply Chain Security

Code provenance security addresses the trust gap between source code and running software. Multiple stages in the build and delivery pipeline can introduce unauthorized modifications: compromised CI/CD runners, injected build dependencies, modified build scripts, or tampered intermediate artifacts.

Without provenance, organizations rely on implicit trust in their build infrastructure. They assume that if the source code is reviewed and the artifact is signed, the output is trustworthy. That assumption fails when the build system itself is the attack vector.

Provenance tracking across the supply chain provides explicit, verifiable evidence at each stage. It records which source commit triggered the build, which builder produced the artifact, what dependencies were resolved, and which configuration was used. This evidence chain makes tampering detectable because any unauthorized modification creates a mismatch between the provenance record and the actual build inputs.
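The mismatch check described above can be sketched in a few lines of Python. This is a minimal illustration, not a real verifier: the `BuildRecord` type, its field names, and the example values are all hypothetical, and a production system would compare fields taken from a signed attestation rather than hand-built records.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BuildRecord:
    """Hypothetical, simplified view of the inputs to one build."""
    source_commit: str
    builder_id: str
    config_digest: str
    dependencies: tuple[str, ...]


def find_mismatches(recorded: BuildRecord, observed: BuildRecord) -> dict[str, tuple]:
    """Return each field whose recorded value differs from the observed one."""
    rec, obs = asdict(recorded), asdict(observed)
    return {name: (rec[name], obs[name]) for name in rec if rec[name] != obs[name]}


# Example values are made up for illustration.
recorded = BuildRecord(
    source_commit="a1b2c3d",
    builder_id="https://ci.example.com/builder",
    config_digest="sha256:aaaa",
    dependencies=("libfoo==1.2.0",),
)
observed = BuildRecord(
    source_commit="a1b2c3d",
    builder_id="https://ci.example.com/builder",
    config_digest="sha256:aaaa",
    dependencies=("libfoo==1.2.0", "libextra==0.0.1"),  # injected dependency
)

mismatches = find_mismatches(recorded, observed)
for field_name, (expected, actual) in mismatches.items():
    print(f"{field_name}: recorded {expected!r}, observed {actual!r}")
```

Here only the `dependencies` field differs, so the injected dependency is the single item flagged.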

For compliance purposes, provenance data supports audit requirements by providing a machine-readable trail that connects deployed software to its trusted origin. CI/CD security controls that generate and verify provenance at the pipeline level turn this trail into an automated, continuous check.

How Provenance Is Tracked Across the Build Pipeline

Provenance tracking instruments the build pipeline to capture metadata at each stage and package it into a signed attestation that travels with the artifact.

The core data captured in a provenance attestation typically includes:

Source identity: The repository URL, branch, and commit hash that triggered the build.
Builder identity: The build system or CI/CD platform that executed the build, including its version and configuration.
Build instructions: The entry point script, Dockerfile, or build manifest that defined how the artifact was produced.
Dependency inputs: The resolved dependency versions and their sources at build time.
Output identity: The digest or hash of the resulting artifact, binding the attestation to a specific output.
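As a rough sketch of how the five fields above might map onto an attestation, the Python below assembles an in-toto Statement carrying a SLSA v1 provenance predicate. The function name `build_statement`, the keys inside `externalParameters` (which each builder defines for itself), and every example value are assumptions made for illustration; real attestations are produced by the build platform, not by hand.

```python
import hashlib
import json


def build_statement(artifact_name, artifact_bytes, repo_url, commit,
                    builder_id, build_type, entry_point, dependencies):
    """Assemble an unsigned in-toto Statement with a SLSA v1 predicate."""
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    return {
        "_type": "https://in-toto.io/Statement/v1",
        # Output identity: binds the statement to one artifact digest.
        "subject": [{"name": artifact_name, "digest": {"sha256": digest}}],
        "predicateType": "https://slsa.dev/provenance/v1",
        "predicate": {
            "buildDefinition": {
                "buildType": build_type,
                # Build instructions; these key names are illustrative only.
                "externalParameters": {
                    "repository": repo_url,
                    "entryPoint": entry_point,
                },
                # Source identity first, then other dependency inputs.
                "resolvedDependencies": [
                    {"uri": f"git+{repo_url}", "digest": {"gitCommit": commit}},
                    *dependencies,
                ],
            },
            # Builder identity.
            "runDetails": {"builder": {"id": builder_id}},
        },
    }


statement = build_statement(
    artifact_name="app.tar.gz",
    artifact_bytes=b"example artifact contents",
    repo_url="https://git.example.com/acme/app",
    commit="0123456789abcdef0123456789abcdef01234567",
    builder_id="https://ci.example.com/builder/v1",
    build_type="https://ci.example.com/build-types/container/v1",
    entry_point="Dockerfile",
    dependencies=[{"uri": "pkg:pypi/requests@2.32.3"}],
)
print(json.dumps(statement, indent=2))
```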

This attestation is signed (typically using keyless signing via Sigstore) and stored alongside the artifact in a registry or transparency log. At deployment time, verification tools confirm that the attestation is valid and that the artifact’s digest matches the recorded output.
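The digest comparison at deployment time can be illustrated with a short Python sketch. It covers only the digest-matching half of verification: it assumes the attestation's signature has already been checked by a real tool, and the function name `artifact_matches_statement` and the sample data are hypothetical.

```python
import hashlib


def artifact_matches_statement(artifact_bytes: bytes, statement: dict) -> bool:
    """Check whether the artifact's SHA-256 digest appears among the subjects.

    Assumes the statement's signature was verified beforehand; this function
    does NOT perform any signature or certificate checks.
    """
    actual = hashlib.sha256(artifact_bytes).hexdigest()
    for subject in statement.get("subject", []):
        recorded = subject.get("digest", {}).get("sha256", "")
        if recorded.lower() == actual:
            return True
    return False


# Made-up attestation covering one artifact.
attested = {
    "subject": [
        {
            "name": "app.tar.gz",
            "digest": {"sha256": hashlib.sha256(b"release build").hexdigest()},
        }
    ]
}

print(artifact_matches_statement(b"release build", attested))
print(artifact_matches_statement(b"release build with backdoor", attested))
```

The first call prints `True` and the second prints `False`, because any change to the artifact bytes changes its digest.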

The goal is complete provenance: a verifiable record that leaves no gap between source and artifact.

Code Provenance Standards: SLSA, in-toto, and Sigstore

Three projects form the backbone of modern software build provenance practices: a framework (SLSA), a metadata specification (in-toto), and a signing toolchain (Sigstore).

SLSA (Supply-chain Levels for Software Artifacts): A graduated framework for build integrity. The original v0.1 draft defined four levels; the current v1.0 specification defines a Build track with Levels 1 through 3. Build Level 1 requires that provenance exists describing how the artifact was built. Level 2 requires signed provenance generated by a hosted build platform. Level 3 requires a hardened build platform that resists tampering with the build and forging of provenance. The hermetic, reproducible builds that v0.1 associated with Level 4 are not part of v1.0.
in-toto: An open standard for supply chain integrity verification. It defines a metadata format for recording each step in the supply chain, including the actors, materials (inputs), and products (outputs) at each step. SLSA provenance attestations are expressed as in-toto statements.
Sigstore: Provides the signing and transparency infrastructure for provenance attestations. Cosign signs the attestations, Fulcio issues short-lived certificates, and Rekor maintains a public transparency log. Sigstore makes keyless provenance signing practical at scale.

Together, these standards enable organizations to generate, sign, and verify provenance without building custom infrastructure. An SBOM provides a component inventory; provenance proves how those components were assembled.
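To make the in-toto format more concrete: signed in-toto statements are commonly carried in a DSSE envelope, where the signature covers a "pre-authentication encoding" of the payload type and payload rather than the raw JSON. The Python below is a minimal sketch of that wrapping. The `wrap`, `unwrap`, `demo_sign`, and `demo_verify` helpers are hypothetical names, and the HMAC signer is a stand-in used only so the example runs without extra dependencies; Sigstore uses short-lived certificates and asymmetric signatures, not a shared key.

```python
import base64
import hashlib
import hmac
import json

IN_TOTO_PAYLOAD_TYPE = "application/vnd.in-toto+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the exact bytes that get signed."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


def wrap(statement: dict, sign) -> dict:
    """Serialize a statement and place it in a DSSE-style envelope."""
    payload = json.dumps(statement, sort_keys=True).encode("utf-8")
    signature = sign(pae(IN_TOTO_PAYLOAD_TYPE, payload))
    return {
        "payloadType": IN_TOTO_PAYLOAD_TYPE,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"sig": base64.b64encode(signature).decode("ascii")}],
    }


def unwrap(envelope: dict, verify) -> dict:
    """Return the statement only if at least one signature verifies."""
    if envelope["payloadType"] != IN_TOTO_PAYLOAD_TYPE:
        raise ValueError("not an in-toto statement")
    payload = base64.b64decode(envelope["payload"])
    signed_bytes = pae(envelope["payloadType"], payload)
    sigs = [base64.b64decode(s["sig"]) for s in envelope["signatures"]]
    if not any(verify(signed_bytes, sig) for sig in sigs):
        raise ValueError("no valid signature")
    return json.loads(payload)


# Stand-in signer for this demo ONLY (assumption: shared-key HMAC).
DEMO_KEY = b"demo-only-key"


def demo_sign(data: bytes) -> bytes:
    return hmac.new(DEMO_KEY, data, hashlib.sha256).digest()


def demo_verify(data: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(demo_sign(data), signature)


statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [{"name": "app.tar.gz", "digest": {"sha256": "0" * 64}}],
    "predicateType": "https://slsa.dev/provenance/v1",
    "predicate": {},
}
envelope = wrap(statement, demo_sign)
recovered = unwrap(envelope, demo_verify)
```

Signing the encoded type and payload together means an attacker cannot reinterpret a validly signed payload under a different payload type.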

How to Implement Provenance Tracking in Modern CI/CD

Implementing code provenance tracking starts with instrumenting the build pipeline and enforcing verification at deployment.

First, enable provenance generation in the CI/CD system. GitHub Actions, Google Cloud Build, and GitLab CI all support SLSA provenance generation natively or through community actions. These generate signed attestations automatically as part of the build.

Second, store provenance alongside artifacts. Container registries that support OCI artifacts (like GitHub Container Registry or Google Artifact Registry) can store attestations attached to the image manifest. For non-container artifacts, store attestations in a dedicated attestation store or transparency log.

Third, enforce verification at deployment. Kubernetes admission controllers (like Sigstore’s policy-controller or Kyverno) can reject any workload that lacks a valid provenance attestation meeting a defined SLSA level. This turns provenance from a passive record into an active gate.

Fourth, integrate provenance data into your broader security program. Dependency management tools, SBOM generators, and ASPM platforms can consume provenance attestations to enrich their risk models with verified build-origin data.

Start at SLSA Level 2, which requires signed provenance from a hosted build service. This level makes tampering after the build detectable and raises the cost of forging provenance, and most modern CI/CD systems can reach it with relatively little setup.
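The deployment gate from the third step can be sketched as a small policy function. This is not the API of Sigstore's policy-controller or Kyverno, which express policies declaratively in their own resource formats; the `Policy` type, the `evaluate` function, and all example URLs are hypothetical. The sketch also assumes the attestation's signature has already been verified, and it uses a trusted-builder allowlist as a stand-in for "meets the required SLSA level".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical admission policy for illustration."""
    trusted_builders: frozenset[str]
    allowed_source_prefix: str


def evaluate(statement: dict | None, image_digest: str, policy: Policy) -> tuple[bool, str]:
    """Decide whether a workload may run, returning (allowed, reason)."""
    if statement is None:
        return False, "no provenance attestation"
    if statement.get("predicateType") != "https://slsa.dev/provenance/v1":
        return False, "unexpected predicate type"

    digests = {s.get("digest", {}).get("sha256") for s in statement.get("subject", [])}
    if image_digest not in digests:
        return False, "attestation does not cover this image"

    predicate = statement.get("predicate", {})
    builder = predicate.get("runDetails", {}).get("builder", {}).get("id")
    if builder not in policy.trusted_builders:
        return False, "untrusted builder"

    deps = predicate.get("buildDefinition", {}).get("resolvedDependencies", [])
    if not any(d.get("uri", "").startswith(policy.allowed_source_prefix) for d in deps):
        return False, "source repository not allowed"
    return True, "ok"


policy = Policy(
    trusted_builders=frozenset({"https://ci.example.com/builder/v1"}),
    allowed_source_prefix="git+https://git.example.com/acme/",
)
good = {
    "predicateType": "https://slsa.dev/provenance/v1",
    "subject": [{"name": "app", "digest": {"sha256": "ab" * 32}}],
    "predicate": {
        "buildDefinition": {
            "resolvedDependencies": [{"uri": "git+https://git.example.com/acme/app"}]
        },
        "runDetails": {"builder": {"id": "https://ci.example.com/builder/v1"}},
    },
}

print(evaluate(good, "ab" * 32, policy))
print(evaluate(None, "ab" * 32, policy))
print(evaluate(good, "cd" * 32, policy))
```

Returning a reason alongside the decision keeps rejections explainable, which matters when the gate blocks a deployment.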

FAQs

What is the difference between code provenance and an SBOM?

An SBOM lists the components in a software release. Code provenance verifies how those components were assembled, by which build system, from which source.

How does SLSA provenance differ from general build metadata?

SLSA provenance follows a standardized, signed format (in-toto) with defined completeness requirements. General build metadata is typically unstructured and unsigned, making it easy to forge.

Can provenance tracking detect a compromised build environment?

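In many cases. If an artifact is built by an unauthorized system or modified after the build, the attestation will not carry the expected builder identity, or the artifact digest will differ from the attested output. A build platform that is itself fully compromised can still emit provenance that looks valid, which is why higher SLSA levels require a hardened build platform that isolates provenance generation from the build steps.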

Which compliance frameworks require provenance tracking?

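NIST SSDF, Executive Order 14028, and FedRAMP increasingly require or recommend verifiable software build provenance as part of supply chain risk management. SLSA is not a compliance mandate itself, but it provides a common framework for meeting these expectations.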

How does AI-generated code complicate provenance tracking?

AI-generated code may originate from multiple training sources, making authorship attribution harder. Software provenance tracks the build and delivery chain but does not solve source-level attribution for AI-written code.
