Filter Steps

Filter Steps are CPGQL Steps which filter nodes in a traversal according to a criterion. Ocular supports four Generic Filter Steps, which can be added to any other step, and a number of specific filter steps called Property Filter Steps, which can be used on nodes of a certain type. The Generic Filter Steps are filter, filterNot, where and whereNot. The Property Filter Steps for each node type correspond to the Property Directives it has defined.

We will look at the behaviour of each of these steps while analyzing a simple program named X42:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
  if (argc > 1 && strcmp(argv[1], "42") == 0) {
    fprintf(stderr, "It depends!\n");
    exit(42);
  }
  printf("What is the meaning of life?\n");
  exit(0);
}

Property Filter Steps

Property Filter Steps are Filter Steps which continue a traversal if the properties of the nodes they point to pass a certain criterion. For example, to query the Code Property Graph for all CALL nodes which have the string exit as the value for their NAME property, and return their CODE property:

ocular> cpg.call.name("exit").code.l
res0: List[String] = List("exit(0)", "exit(42)")

The criterion is specific to each Property Filter Step. In the case of the name Property Filter Step of the call Node-Type Step, the criterion is a string that represents a regular expression, so all of the following queries deliver the same result for the X42 program:

ocular> cpg.call.name("exit").code.l
res0: List[String] = List("exit(0)", "exit(42)")
ocular> cpg.call.name("[eE]xit").code.l
res1: List[String] = List("exit(0)", "exit(42)")
ocular> cpg.call.name("ex.*").code.l
res2: List[String] = List("exit(0)", "exit(42)")
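
The way a regular-expression criterion selects nodes can be illustrated outside Ocular with plain Scala collections. The following is only a sketch, not Ocular code: Call, calls and byName are hypothetical names, and it assumes that the pattern has to match the whole NAME value, which is consistent with the three queries above.

```scala
object PropertyFilterSketch {
  // Hypothetical stand-in for a CALL node; not an Ocular type.
  final case class Call(name: String, code: String)

  val calls: List[Call] = List(
    Call("exit", "exit(0)"),
    Call("exit", "exit(42)"),
    Call("strcmp", """strcmp(argv[1], "42")""")
  )

  // Keep the calls whose name matches the pattern in full (assumed semantics).
  def byName(pattern: String): List[Call] =
    calls.filter(call => call.name.matches(pattern))

  val exact: List[String]     = byName("exit").map(_.code)
  val charClass: List[String] = byName("[eE]xit").map(_.code)
  val prefix: List[String]    = byName("ex.*").map(_.code)
}
```

All three values hold the code of the two exit calls. Under the full-match assumption, a partial pattern such as "ex" selects nothing in this sketch.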

Just like all other Filter Steps, Property Filter Steps can be chained together:

ocular> cpg.call.name("exit").code(".*0.*").code.l
res0: List[String] = List("exit(0)")

A node type usually has more Property Filter Steps than it has properties. For example, most Property Filter Steps also have a negated version:

ocular> cpg.call.name("exit").codeNot(".*0.*").code.l
res0: List[String] = List("exit(42)")
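
Chaining a Property Filter Step with its negated counterpart can be pictured as successive filtering of a list. The sketch below uses plain Scala collections; CallOps and its name, code and codeNot methods are hypothetical helpers written for this illustration, not Ocular's implementation, and they assume full-string regular-expression matching.

```scala
object NegatedFilterSketch {
  // Hypothetical stand-in for a CALL node; not an Ocular type.
  final case class Call(name: String, code: String)

  val calls: List[Call] = List(
    Call("exit", "exit(0)"),
    Call("exit", "exit(42)"),
    Call("strcmp", """strcmp(argv[1], "42")""")
  )

  // Hypothetical helpers that mimic name, code and codeNot on a plain List.
  implicit class CallOps(xs: List[Call]) {
    def name(pattern: String): List[Call]    = xs.filter(_.name.matches(pattern))
    def code(pattern: String): List[Call]    = xs.filter(_.code.matches(pattern))
    def codeNot(pattern: String): List[Call] = xs.filterNot(_.code.matches(pattern))
  }

  // Each step narrows the result of the previous one.
  val withZero: List[String]    = calls.name("exit").code(".*0.*").map(_.code)
  val withoutZero: List[String] = calls.name("exit").codeNot(".*0.*").map(_.code)
}
```

Here withZero contains only "exit(0)" and withoutZero contains only "exit(42)", mirroring the two queries above.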

where

The where Step is a Filter Step which continues a traversal for all nodes which pass its criterion. The criterion of the where step is represented by an expression which has one argument, a variable that points to the node matched in the previous step, and which returns a boolean. For example, suppose you'd like to query the Code Property Graph of the X42 program for all CALL nodes which have exit as the value of their NAME property, and return their CODE property in a list:

ocular> cpg.call.where(node => node.name == "exit").code.l
res0: List[String] = List("exit(0)", "exit(42)")

where steps can be chained together:

ocular> cpg.call.where(node => node.name == "exit").where(node => node.code.contains("42")).code.l
res0: List[String] = List("exit(42)")

Their expression can also combine any number of boolean expressions:

ocular> cpg.call.where(node => node.name == "exit" && node.code.contains("42")).code.l
res0: List[String] = List("exit(42)")
// equivalent in logic to the query above
ocular> cpg.call.where(node => true && 1 == 1 && node.name == "exit" && node.code.contains("42")).code.l
res0: List[String] = List("exit(42)")

A helpful trick is to use Scala's _ placeholder in where expressions. It stands for the single argument passed into the expression, that is, the node:

// long form
ocular> cpg.call.where(node => node.name == "exit").code.l
res0: List[String] = List("exit(0)", "exit(42)")
// short form
ocular> cpg.call.where(_.name == "exit").code.l
res1: List[String] = List("exit(0)", "exit(42)")
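
The criterion passed to where is an ordinary function from a node to a boolean, so its behaviour can be approximated with the filter method of Scala's standard List. This is a sketch under that assumption only: Call, isExit and mentions42 are hypothetical names, and List.filter here stands in for Ocular's where step, not for Ocular's filter step.

```scala
object WhereSketch {
  // Hypothetical stand-in for a CALL node; not an Ocular type.
  final case class Call(name: String, code: String)

  val calls: List[Call] = List(
    Call("exit", "exit(0)"),
    Call("exit", "exit(42)"),
    Call("strcmp", """strcmp(argv[1], "42")""")
  )

  // A where-style criterion: one node in, one Boolean out.
  val isExit: Call => Boolean     = node => node.name == "exit"
  val mentions42: Call => Boolean = _.code.contains("42") // short form with _

  // Two chained predicate filters ...
  val chained: List[String] = calls.filter(isExit).filter(mentions42).map(_.code)

  // ... keep the same nodes as a single filter with a conjunction.
  val combined: List[String] =
    calls.filter(node => isExit(node) && mentions42(node)).map(_.code)
}
```

Both values contain only "exit(42)". The strcmp call mentions 42 as well, but it is rejected because its name is not exit.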

filter

Just like where, the filter Step is a Filter Step (ಠ~ಠ) which continues a traversal for all nodes which pass its criterion. The expression representing the criterion takes in one argument, a variable that represents the traversal of the previous step, and returns another traversal. Say you'd like to query the Code Property Graph of the X42 program again for all CALL nodes which have exit as the value of their NAME property, and return the value of their CODE property in a list:

ocular> cpg.call.filter(node => node.name("exit")).code.l
res0: List[String] = List("exit(0)", "exit(42)")
// or using the `_` shorthand:
ocular> cpg.call.filter(_.name("exit")).code.l
res1: List[String] = List("exit(0)", "exit(42)")

filter's expression supports traversals of any length:

ocular> cpg.call.filter(_.name("exit").argument.code("42")).code.l
res0: List[String] = List("exit(42)")

And just like where, it supports chaining:

// equivalent to the previous query
ocular> cpg.call.filter(_.name("exit")).filter(_.argument.code("42")).code.l
res0: List[String] = List("exit(42)")
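
One way to think about a traversal-valued criterion is that a node is kept when the sub-traversal started from that node produces at least one result. The sketch below models this with plain Scala collections. The keep-if-non-empty rule is an assumption made for illustration, and Call, args and keepIf are hypothetical names, not part of Ocular.

```scala
object TraversalFilterSketch {
  // Hypothetical stand-in for a CALL node and the code of its arguments.
  final case class Call(name: String, code: String, args: List[String])

  val calls: List[Call] = List(
    Call("exit", "exit(0)", List("0")),
    Call("exit", "exit(42)", List("42")),
    Call("strcmp", """strcmp(argv[1], "42")""", List("argv[1]", "\"42\""))
  )

  // Assumed semantics: keep a node when the sub-traversal started from it
  // yields at least one result.
  def keepIf[A](nodes: List[Call])(sub: List[Call] => List[A]): List[Call] =
    nodes.filter(node => sub(List(node)).nonEmpty)

  // Analogue of filter(_.name("exit").argument.code("42"))
  val single: List[String] =
    keepIf(calls)(t => t.filter(_.name == "exit").flatMap(_.args).filter(_ == "42"))
      .map(_.code)

  // Analogue of filter(_.name("exit")).filter(_.argument.code("42"))
  val chained: List[String] = {
    val exits = keepIf(calls)(t => t.filter(_.name == "exit"))
    keepIf(exits)(t => t.flatMap(_.args).filter(_ == "42")).map(_.code)
  }
}
```

In this model both forms select only "exit(42)", which matches the equivalence of the two queries above.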

filterNot

filterNot is the inverse operation of the filter Step: it continues a traversal for all nodes which do not pass its criterion.

ocular> cpg.call.filterNot(_.name("exit")).code.l
res0: List[String] = List(
  "printf(\"What is the meaning of life?\\n\")",
  "fprintf(stderr, \"It depends!\\n\")",
  "argv[1]",
  "strcmp(argv[1], \"42\")",
  "strcmp(argv[1], \"42\") == 0",
  "argc > 1",
  "argc > 1 && strcmp(argv[1], \"42\") == 0"
)

It supports chaining as well:

ocular> cpg.call.filterNot(_.name("exit")).filterNot(_.name("strcmp")).code.l
res0: List[String] = List(
  "printf(\"What is the meaning of life?\\n\")",
  "fprintf(stderr, \"It depends!\\n\")",
  "argv[1]",
  "strcmp(argv[1], \"42\") == 0",
  "argc > 1",
  "argc > 1 && strcmp(argv[1], \"42\") == 0"
)

And traversals of any length:

ocular> cpg.call.filterNot(_.name("exit").argument.code("42")).filterNot(_.name("strcmp")).code.l
res0: List[String] = List(
  "exit(0)",
  "printf(\"What is the meaning of life?\\n\")",
  "fprintf(stderr, \"It depends!\\n\")",
  "argv[1]",
  "strcmp(argv[1], \"42\") == 0",
  "argc > 1",
  "argc > 1 && strcmp(argv[1], \"42\") == 0"
)
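
The last result keeps exit(0) because only nodes for which the whole sub-traversal produces a result are dropped. A minimal sketch of that reading with plain Scala collections follows. The drop-if-non-empty rule is an assumption made for illustration, and Call, args and dropIf are hypothetical names, not part of Ocular.

```scala
object FilterNotSketch {
  // Hypothetical stand-in for a CALL node and the code of its arguments.
  final case class Call(name: String, code: String, args: List[String])

  val calls: List[Call] = List(
    Call("exit", "exit(0)", List("0")),
    Call("exit", "exit(42)", List("42")),
    Call("printf", """printf("What is the meaning of life?\n")""",
      List("\"What is the meaning of life?\\n\""))
  )

  // Assumed semantics: keep a node only when the sub-traversal started
  // from it yields no result.
  def dropIf[A](nodes: List[Call])(sub: List[Call] => List[A]): List[Call] =
    nodes.filter(node => sub(List(node)).isEmpty)

  // Analogue of filterNot(_.name("exit")): every exit call is dropped.
  val notExit: List[String] =
    dropIf(calls)(t => t.filter(_.name == "exit")).map(_.code)

  // Analogue of filterNot(_.name("exit").argument.code("42")):
  // only the exit call whose argument is 42 is dropped.
  val notExit42: List[String] =
    dropIf(calls)(t => t.filter(_.name == "exit").flatMap(_.args).filter(_ == "42"))
      .map(_.code)
}
```

In this model notExit contains only the printf call, while notExit42 contains exit(0) followed by the printf call.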