Traversal Basics

The problem, which I am told is widely known, is as follows: in Königsberg in Prussia there is an island A, called the Kneiphof; the river which surrounds it is divided into two branches [...] and these branches are crossed by seven bridges...
Leonhard Euler, 1741

Ocular helps you discover security vulnerabilities by executing graph traversals on the Code Property Graph. A traversal is formulated as an CPGQL Query, or query for short. In this article, you will learn about the different components that make up queries.

The Anatomy of a CPGQL Query

A query consists of the following components:

  1. A Root Object, which is the reference to the Code Property Graph being queried
  2. Zero or more Node-Type Steps, which are atomic traversals to all nodes of a given type
  3. Zero or more Core Steps, Filter Steps or Repeat Steps
  4. Zero or more Property Directives, which reference the properties of nodes in a traversal
  5. Zero or more Execution Directives, which execute a traversal and return results in a specific format
  6. Zero or more Augmentation Directives, which extend a Code Property Graph with new nodes, properties, or edges
  7. Zero or more Helper Directives, which provide functionality that is not related to traversals

Finally, components 2-7 can be combined into Complex Steps in the same way basic expressions of a programming language can be combined into complex expressions.

As an example, the query

cpg.method.name.toList

returns all names of methods present in a Code Property Graph and can be dissected as follows: cpg is the root object, method is a node-type step which references all METHOD nodes, name is a property directive which references the NAME property of those METHOD nodes, and toList is an execution directive which executes the traversal and returns the result as a list.

Importing a Sample Project

Before we go into the details of these components, let us import a sample program. Clone the following git repository which contains the Java implementation of a simple program named X42:

$ git clone git@github.com:ShiftLeftSecurity/x42.git

Start Ocular and specify a 4GB JVM heap size using the JAVA_OPTS environment variable:

$ JAVA_OPTS='-Xmx4g' sl ocular

Then create a Project from the JAR file of the X42 program using the importCode top-level command:

ocular> importCode("./x42/java/X42.jar", "x42-java")
Creating project `x42-java` for code at `x42/java/X42.jar`
... output omitted
res0: Option[Cpg] = Some(io.shiftleft.codepropertygraph.Cpg@31ed46c5)

A First Traversal

For our first traversal, our objective is to determine the LANGUAGE property of the METADATA node in the Code Property Graph of the X42 program. At the Ocular prompt, type cpg and press ENTER:

ocular> cpg
res1: Cpg = io.shiftleft.codepropertygraph.Cpg@ab90fdab

The executed query consists only of the root object cpg. The return value of that execution is a reference to the Code Property Graph and the reference itself is suffixed by a hexadecimal string (in this case @ab90fdab) that uniquely identifies it. We proceed to add the metaData node-type step to the query. This step represents a traversal to all nodes of type METADATA (of which there is only one):

ocular> cpg.metaData
res2: Traversal[MetaData] = Traversal

When evaluating this new query, the result is not the content of the METADATA node, but a traversal that visits the METADATA node. This behaviour points to the ephemeral nature of queries: each query is separate from the other, and Ocular holds distinct in-memory representations for them. The only object shared between queries is the root object (cpg).

Traversals are executed - as opposed to just to being held in memory - using execution directives such as toList. The directive toList executes the traversal and returns results in a list:

ocular> cpg.metaData.toList
res3: List[MetaData] = List(
MetaData(
id -> 1L,
language -> "JAVA",
version -> "0.1",
overlays -> List("semanticcpg", "dataflow", "tagging"),
policyDirectories -> List(),
spid -> None
)
)

In the result of this query, we already see the LANGUAGE field and that it is JAVA. For the sake of completeness and to illustrate property directives, let us add the property directive language to the query. Property directives provide access to individual node properties. Each node-type step can be combined with different property directives, and they usually match the properties defined on the node type represented by the node-type step:

ocular> cpg.metaData.language.toList
res4: List[AnyRef] = "JAVA"

With this last query, we have achieved our goal of executing a traversal which returns the LANGUAGE property of the METADATA node of the X42 program.

Node-Type Steps

Node-Type Steps are atomic traversals that represent traversals to nodes of a given type. Each node-type step comes with distinct Property Directives to access the properties of the nodes they represent. Let us revisit the source code of the x42 program to illustrate node-type steps.

public class X42 {
public static void main(String[] args) {
if (args.length > 0 && args[0].equals("42")) {
System.err.println("It depends!");
System.exit(42);
}
System.out.println("What is the meaning of life?");
System.exit(0);
}
}

A commonly used node-type step is method, which represents a traversal to all nodes of type METHOD. METHOD nodes represent declarations of methods, functions or procedures in programs, and one of their properties is FULL_NAME. All full names of all method nodes can thus be determined as follows:

ocular> cpg.method.fullName.toList
res55: List[String] = List(
"<operator>.fieldAccess",
"<operator>.sizeOf",
"java.lang.System.exit:void(int)",
"<operator>.equals",
"java.lang.Object.<init>:void()",
"X42.<init>:void()",
"<operator>.lessEqualsThan",
"<operator>.assignment",
"java.lang.String.equals:boolean(java.lang.Object)",
"<operator>.indexAccess",
"X42.main:void(java.lang.String[])",
"java.io.PrintStream.println:void(java.lang.String)"
)

The number of METHOD nodes may surprise you, given that the program only defines the single method main explicitly. The Code Property Graph, however, also includes method nodes for all methods invoked by the code. Moreover, built-in operators are modeled as methods, one decision made to enable language-agnostic analysis.

We will look into the details of node-type steps in a different article. For now, it is sufficient to know that Ocular offers these steps for all common node types: method, call, argument, parameter, metaData, local, literal, types, returns, identifier, namespace, namespaceBlock, methodReturn, typeDecl, member, methodRef, file, comment, tag and the generic all.

Filter Steps

Filter Steps are atomic traversals that filter nodes according to a given criteria. A common filter step is aptly-named filter, which continues the traversal in the step it suffixes for all nodes which pass its criterion. Its criterion is represented by a lamba function which has access to the node of the previous step and returns a boolean. Continuing with the previous example, let us execute a query which returns all METHOD nodes of the Code Property Graph for X42, but only if their IS_EXTERNAL property is set to false:

ocular> cpg.method.filter(_.isExternal == false).fullName.toList
res11: List[String] = List("X42.<init>:void()", "X42.main:void(java.lang.String[])")

Disecting this query, we have cpg as the root object, a node-type step method which returns all nodes of type METHOD, a filter step filter(_.isExternal == false) which continues the traversal only for nodes which have their IS_EXTERNAL property set to false (with _ referencing the individual nodes, and isExternal a property directive which accesses their IS_EXTERNAL property), followed by a property directive fullName which returns the values of the FULL_NAME property of the nodes that passed the Filter Step, and finally an Execution Directive toList which executes the traversal and returns the results in a list.

A shorter version of a query which returns the same results as the one above can be written using a Property-Filter Step. Property-filter steps are Filter Steps which continue the traversal only for nodes which have a specific value in the property the Property Filter Step refers to:

ocular> cpg.method.isExternal(false).fullName.toList
res11: List[String] = List("X42.<init>:void()", "X42.main:void(java.lang.String[])")

Disecting the query again, cpg is the root object, method is a node-type step, isExternal(false) is a property-filter step that filters for nodes which have false as the value of their IS_EXTERNAL property, fullName is a property directive, and toList is the execution directive you are already familiar with.

note

Be careful not to mix up property directives with property-filter steps, they look awfully similar. Consider that:

a) cpg.method.isExternal(true).fullName.toList returns all METHOD nodes which have the IS_EXTERNAL property set to true (in this case, 10 results)

b) cpg.method.isExternal.toList returns the value of the IS_EXTERNAL property for all METHOD nodes in the graph (12 results)

c) cpg.method.isExternal.fullName.toList is an invalid query which will not execute

A final Filter Step we will look at is where. Just like filter, where is a Filter Step which continues the traversal it suffixes for all nodes which pass its criterion. Different from filter however is the criterion, which also has access to the node of the previous step, but differs in that it returns another Step instead of a boolean. The previous query that used a Property Filter Step can be re-written using where like so:

ocular> cpg.method.where(_.isExternal(false)).fullName.toList
res24: List[String] = List("X42.<init>:void()", "X42.main:void(java.lang.String[])")

Maybe not particulary useful-seeming given this specific example, but keep it in the back of your head, because filter is a handy tool to have in the toolbox. Next up, Core Steps.

Core Steps

Core Steps are steps that can change the traversal of all other types of steps. The most important Core Step is map, which maps a set of nodes into a different form given a function. For example, say you'd like to return both the IS_EXTERNAL and the FULL_NAME properties of all METHOD nodes in X42's Code Property Graph. You can achieve that with the following query:

ocular> cpg.method.map(node => (node.isExternal, node.fullName)).toList
res108: List[(java.lang.Boolean, String)] = List(
(true, "<operator>.fieldAccess"),
(true, "<operator>.sizeOf"),
(true, "java.lang.System.exit:void(int)"),
(true, "<operator>.equals"),
(true, "java.lang.Object.<init>:void()"),
(false, "X42.<init>:void()"),
(true, "<operator>.lessEqualsThan"),
(true, "<operator>.assignment"),
(true, "java.lang.String.equals:boolean(java.lang.Object)"),
(true, "<operator>.indexAccess"),
(false, "X42.main:void(java.lang.String[])"),
(true, "java.io.PrintStream.println:void(java.lang.String)")
)

Don't be intimidated by the syntax used in the map Step above. If you examine map(node => (node.isExternal, node.fullName)) for a bit, you might be able to infer that the first node simply defines the variable that represents the node which preceeds the map Step, that the ASCII arrow => is just syntax that preceeds the body of a lambda function, and that (node.isExternal, node.fullName) means that the return value of the lambda is a list which contains the value of the isExternal and fullName Property Directives for each of the nodes matched in the previous step and also passed into the lambda. In most cases in which you need map, you can simply follow the pattern above. But should you ever feel constrained by the common pattern shown, remember that the function for the map step is written in the Scala programming language, a fact which opens up a wide range of possibilities if you invest a little time learning the language.

Complex Steps

Another useful CPGQL Query Component is the Complex Step. Complex Steps combine many simpler steps into one in order to make your queries more concise. There are a number of them available, all with different behaviours, and one good example is isConstructor. Before we use it in a query, here is the X42 program again:

public class X42 {
public static void main(String[] args) {
if (args.length > 0 && args[0].equals("42")) {
System.err.println("It depends!");
System.exit(42);
}
System.out.println("What is the meaning of life?");
System.exit(0);
}
}

Earlier we queried it for all METHOD nodes which had their IS_EXTERNAL property set to false using the isExternal(false) Property Filter Step. Two results came up, even though only one method is explicitly defined:

ocular> cpg.method.isExternal(false).fullName.l
res103: List[String] = List("X42.<init>:void()", "X42.main:void(java.lang.String[])")

You've probably already guessed it, one of the METHOD nodes represents an implicitly defined method. The METHOD node with the FULL_NAME property set to X42.<init>:void() represents the implicitly defined constructor for the X42 class.

note

This node for the implicitly defined constructor would not appear in Code Property Graphs of programs written in languages that do not have the same semantics as Java. The Code Property Graph for the same X42 program written in C, for example, would only return one result for the query above.

Say you'd like to filter for the node which represents the constructor. One way of doing it would be to use the where Step which, in its criterion, proceeds the traversal with the modifier Node-Type Step, constrained with the filter Filter Step which filters for MODIFIER nodes that have ModifierTypes.CONSTRUCTOR as the value of their modifierType property:

ocular> cpg.method.isExternal(false).where(_.modifier.filter(_.modifierType == ModifierTypes.CONSTRUCTOR)).fullName.toList
res55: List[String] = List("X42.<init>:void()")

A mouthful of a query which you can simplify using the isConstructor Complex Step:

ocular> cpg.method.isExternal(false).isConstructor.fullName.l
res46: List[String] = List("X42.<init>:void()")

Much better. Two other useful Complex Steps are astParent and astChildren, which allow you to steer your traversals through the abstract syntax tree of the program under analysis. Say you'd like to have a query that returns the CODE property for all nodes of type LITERAL which are AST child nodes of CALL nodes that have println in their NAME property:

ocular> cpg.call.name("println").astChildren.isLiteral.code.l
res87: List[String] = List("\"What is the meaning of life?\"", "\"It depends!\"")

Or taken the other way around, a query which returns the property of all CALL nodes which have AST parent nodes of type LITERAL that have their CODE property set to \"What is the meaning of life?\":

ocular> cpg.literal.filter(_.code == "\"What is the meaning of life?\"").astParent.isCall.name.toList
res100: List[String] = List("println")

Describing queries in human language tends to sound peculiar. But so would descriptions of bash one-liners, or basic regular expressions if you'd try out that exercise. As long as you understand the individual components of a query, it won't be hard to construct them correctly and understand clearly what they do.

One final CPGQL Component we'll examine in this article is the Repeat Step.

Repeat Steps

There are cases in which queries have repetitions in them and become too long. In order to make them more readable, you can use Repeat Steps. Repeat Steps are traversals that repeat other traversals a number of times. For example, say you'd like to find nodes of type LITERAL in X42's Code Property Graph that are directly reachable from the node which represents the main method. One way of doing that is by using five astChildren Complex Steps in combination with isLiteral, which is another Complex Step that filters for AST nodes of type LITERAL and maps them to actual LITERAL nodes, plus the code("42") Property Filter Step:

ocular> cpg.method.name("main").astChildren.astChildren.astChildren.astChildren.astChildren.isLiteral.code("42").toList
res137: List[Literal] = List(
Literal(
id -> 1000000050L,
code -> "42",
order -> 1,
argumentIndex -> 1,
typeFullName -> "int",
dynamicTypeHintFullName -> List(),
lineNumber -> Some(5),
columnNumber -> None,
depthFirstOrder -> None,
internalFlags -> None
)
)

The query might do the job, but it is hard to read and change. repeat-times makes the query clearer:

ocular> cpg.method.name("main").repeat(_.astChildren)(_.times(5)).isLiteral.code("42").l
res138: List[Literal] = List(
Literal(
id -> 1000000050L,
code -> "42",
order -> 1,
argumentIndex -> 1,
typeFullName -> "int",
dynamicTypeHintFullName -> List(),
lineNumber -> Some(5),
columnNumber -> None,
depthFirstOrder -> None,
internalFlags -> None
)
)

And even better is another variant of repeat, namely repeat-until:

ocular> cpg.method.name("main").repeat(_.astChildren)(_.until(_.isLiteral.code("42"))).l
res139: List[AstNode] = List(
Literal(
id -> 1000000050L,
code -> "42",
order -> 1,
argumentIndex -> 1,
typeFullName -> "int",
dynamicTypeHintFullName -> List(),
lineNumber -> Some(5),
columnNumber -> None,
depthFirstOrder -> None,
internalFlags -> None
)
)

We won't go any further in this article. If you've read so far, you already have a good overview of the most important CPGQL Components. And while the examples you've seen focused on the simple X42 program, rest assured, queries that find serious vulnerabilities in programs with millions of lines of code are not much different. Master these basics and you will already have found a strong tool in your code analysis arsenal. As soon as you feel ready, explore the more advanced walkthroughs for a level-up. Have fun!