# Code Property Graph

## What Is a Code Property Graph?

A code property graph (CPG) is a unified data structure that merges three fundamental code representations into a single, queryable graph: the abstract syntax tree (AST), the control flow graph (CFG), and the program dependence graph (PDG), which captures data and control dependencies. By combining these representations, a CPG provides a joint structural and semantic model of source code that supports advanced security analysis, vulnerability discovery, and code understanding.

The concept was introduced in a 2014 academic paper by Fabian Yamaguchi and colleagues and popularized by the open-source tool Joern and by ShiftLeft. A code property graph enables analysts and automated tools to ask complex questions about code that no single representation can answer alone, such as "show me every path where user input reaches a database query without passing through a sanitizer."

## How Code Property Graphs Combine AST, CFG, and Data Flow

Each component of a code property graph captures a different dimension of program behavior:

- Abstract syntax tree (AST): Represents the syntactic structure of the code: functions, statements, expressions, operators, and literals organized hierarchically. The AST captures what the code says.
- Control flow graph (CFG): Represents the possible execution paths through the program. Nodes are statements or basic blocks, and edges represent branches, loops, and conditional jumps. The CFG captures the order in which statements can execute.
- Program dependence graph (PDG): Represents data flow and control dependencies between statements. It tracks how values propagate through assignments, function parameters, and return values. The PDG captures how data moves through the code.

A CPG merges these three graphs by giving them shared nodes: each code element appears once and carries edges from every representation. A function node in the AST links to its entry point in the CFG and to the data flow edges in the PDG that trace values through its parameters.
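
As a rough illustration of shared nodes, the sketch below stores one toy function as a single set of nodes with three kinds of labeled edges. The `CodePropertyGraph` class, the edge labels `AST`, `CFG`, and `DATA_FLOW`, and the `handler` example are all hypothetical names chosen for this sketch; real tools such as Joern define their own schemas.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CodePropertyGraph:
    """Toy property graph: each node exists once, edges carry a label."""
    nodes: dict = field(default_factory=dict)
    # edges[label][source_id] -> list of target ids
    edges: dict = field(
        default_factory=lambda: defaultdict(lambda: defaultdict(list))
    )

    def add_node(self, node_id, kind, code):
        self.nodes[node_id] = {"kind": kind, "code": code}

    def add_edge(self, label, src, dst):
        self.edges[label][src].append(dst)

    def out(self, label, node_id):
        """Targets reachable from node_id over edges with this label."""
        return list(self.edges[label][node_id])


def build_example():
    # Hypothetical source being modeled:
    #   def handler(req):
    #       q = req.param
    #       run(q)
    cpg = CodePropertyGraph()
    cpg.add_node("handler", "METHOD", "def handler(req)")
    cpg.add_node("assign_q", "ASSIGNMENT", "q = req.param")
    cpg.add_node("call_run", "CALL", "run(q)")

    # Syntax: both statements are children of the method.
    cpg.add_edge("AST", "handler", "assign_q")
    cpg.add_edge("AST", "handler", "call_run")
    # Control flow: method entry, then the assignment, then the call.
    cpg.add_edge("CFG", "handler", "assign_q")
    cpg.add_edge("CFG", "assign_q", "call_run")
    # Data flow: the value assigned to q reaches the call argument.
    cpg.add_edge("DATA_FLOW", "assign_q", "call_run")
    return cpg


def calls_fed_by_assignment(cpg, method_id):
    """Find (assignment, call) pairs inside one method where the call
    runs directly after the assignment and receives its value."""
    hits = []
    for stmt in cpg.out("AST", method_id):  # syntax layer
        if cpg.nodes[stmt]["kind"] != "ASSIGNMENT":
            continue
        for nxt in cpg.out("CFG", stmt):  # control flow layer
            is_call = cpg.nodes[nxt]["kind"] == "CALL"
            if is_call and nxt in cpg.out("DATA_FLOW", stmt):  # data flow layer
                hits.append((stmt, nxt))
    return hits


if __name__ == "__main__":
    print(calls_fed_by_assignment(build_example(), "handler"))
```

Because the assignment and the call each exist once, the query can check the enclosing function (AST), execution order (CFG), and value propagation (data flow) inside the same loop.
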
This unified code graph lets analysts traverse syntactic structure, execution paths, and data flow in a single query, without switching between separate tool outputs.

## Storing and Querying Code Property Graphs in Graph Databases

Code property graphs are naturally suited to graph databases because they consist of nodes (representing code elements) and edges (representing relationships like "calls," "flows to," or "controls"). Graph databases like Neo4j, TinkerGraph, and OverflowDB store CPGs efficiently and support traversal queries that would be expensive or impractical in relational databases.

Query languages like Gremlin, Cypher, or tool-specific DSLs (such as Joern's Scala-based query language) allow analysts to express complex code patterns as graph traversals. For example, a query might find all functions that accept HTTP request parameters, trace data flow through intermediate variables, and check whether the data reaches a SQL execution sink without passing through a parameterized query builder.

This query-driven approach is what makes CPG code analysis powerful for security research. Analysts can write custom queries tailored to their application's architecture, coding patterns, and risk profile, rather than relying only on the fixed rule sets of traditional scanners.

## Key Use Cases: Vulnerability Discovery, Code Mining, and Refactoring

Code property graphs support several high-value use cases across security and development:

- Vulnerability discovery: CPGs enable precise taint analysis and pattern matching across data flow, control flow, and syntax simultaneously. Security researchers use CPG queries to find injection vulnerabilities, authentication bypasses, and insecure data handling patterns that span multiple functions or files. Teams evaluating the best SAST tools for deep analysis should consider whether the tool's detection engine uses CPG-based representations.
- Variant analysis: After discovering a vulnerability, analysts query the CPG for structurally similar patterns elsewhere in the codebase. This catches variants of the same flaw that might exist in different modules or services.
- Code mining and understanding: CPGs enable large-scale queries about codebase structure: which functions handle sensitive data, which APIs lack input validation, or how authentication patterns are implemented across services. This supports AI-driven software composition analysis and architectural review at scale.
- Refactoring support: Developers use CPG queries to identify dead code, unused dependencies, and tightly coupled components. The graph's combined view of structure and data flow makes it easier to assess the impact of proposed changes before implementing them.

## Code Property Graphs vs Traditional Static Analysis Representations

Traditional [static application security testing](/glossary/static-application-security-testing) tools typically operate on individual representations: an AST for pattern matching, a CFG for path analysis, or a data flow graph for taint tracking. Each representation answers a subset of questions about the code, and findings from one representation cannot easily reference another.

A code property graph eliminates this separation. Because all three representations share a unified graph, a single query can combine syntactic patterns, control flow conditions, and data flow paths. This can produce more precise results with fewer false positives, because the analysis applies constraints from all three dimensions at once.

The tradeoff is cost. Building and storing a CPG for a large codebase requires more processing time and memory than generating a single AST or CFG. For smaller codebases or narrow scanning requirements, traditional representations may be sufficient.
For deep security research, variant analysis, and complex vulnerability discovery across large codebases, the extra cost buys a unified model that can answer questions no single representation can.

## FAQs

### What is a code property graph used for in practice?

Security researchers and tools use CPGs to find complex vulnerabilities, perform variant analysis, mine codebases for insecure patterns, and support refactoring decisions through structural queries.

### How is a code property graph different from a normal AST or CFG?

A CPG merges the AST, CFG, and program dependence graph into a single unified structure. Individual representations capture only syntax, control flow, or data flow in isolation.

### Which tools can generate or analyze code using code property graphs?

Joern is the most widely known open-source CPG tool. Commercial platforms such as ShiftLeft (Qwiet AI) also use CPG-based representations for vulnerability detection.

### How are code property graphs usually stored and queried?

CPGs are stored in graph databases like OverflowDB, Neo4j, or TinkerGraph and queried using graph traversal languages such as Gremlin, Cypher, or tool-specific query DSLs.

### Does a code property graph replace other static analysis techniques?

No. CPGs complement traditional techniques. Lightweight linters and pattern matchers remain valuable for fast, simple checks. CPGs excel at deep, cross-cutting analysis that requires combining syntax, control flow, and data flow.
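
As a rough illustration of the kind of cross-cutting question a CPG is built to answer, the sketch below searches for data flow paths from user input to a database call that skip a sanitizer. It assumes the data flow edges have already been extracted into a plain dictionary; `unsanitized_paths` and every node name in `FLOWS` are hypothetical and do not correspond to any particular tool's API.

```python
def unsanitized_paths(flows, sources, sinks, sanitizers):
    """Return every data flow path from a source to a sink
    that does not pass through a sanitizer node."""
    found = []

    def walk(node, path):
        if node in sanitizers:
            return  # the value is cleaned here, so this path is not reported
        if node in sinks:
            found.append(path)
            return
        for nxt in flows.get(node, []):
            if nxt not in path:  # do not loop forever on cyclic flows
                walk(nxt, path + [nxt])

    for src in sorted(sources):
        walk(src, [src])
    return found


# Hypothetical data flow edges: "value at key flows to each listed node".
FLOWS = {
    "http_param": ["user_id"],
    "user_id": ["escape", "raw_sql"],
    "escape": ["safe_sql"],
    "raw_sql": ["db_execute"],
    "safe_sql": ["db_execute"],
}

if __name__ == "__main__":
    paths = unsanitized_paths(
        FLOWS,
        sources={"http_param"},
        sinks={"db_execute"},
        sanitizers={"escape"},
    )
    for path in paths:
        print(" -> ".join(path))
```

In this toy graph only the route through `raw_sql` is reported; the route through `escape` is dropped as soon as the sanitizer is reached. A production CPG query would add syntactic and control flow constraints on top of this traversal.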