This article shows how to create Code Property Graphs (CPGs) from LLVM bitcode.
Example: Generating a Sample CPG
Before you generate a CPG for your project, this example will show you how to do so for a sample project.
llvm2cpg takes LLVM bitcode as an input. Create an LLVM IR file called main.ll with the following content:
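The original listing for main.ll is not reproduced here. As a stand-in, the sketch below writes a minimal, hypothetical LLVM IR module containing a single main function that returns 0. Any valid IR file should work equally well at this step.

```shell
# Write a minimal LLVM IR module to main.ll.
# The contents are a hypothetical stand-in, not the article's original listing.
cat > main.ll <<'EOF'
define i32 @main() {
entry:
  ret i32 0
}
EOF
```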
Run llvm2cpg against your newly-created file:
You should see output that's similar to the following:
Start Ocular by running
At this point, you can run a query against the CPG you just created using the Ocular command line interface:
After running the final command, you should see output similar to the following:
If so, you can proceed with the next steps, which show you how to generate a CPG for your project.
Obtaining Bitcode from Source Code
As mentioned previously, llvm2cpg takes LLVM bitcode as an input. That bitcode can come in any of the following forms:
- IR (a human-readable representation)
- Bitcode (a bitstream representation)
- Embedded bitcode (a bitstream representation embedded into a binary)
There are several ways for you to get LLVM bitcode out of high-level source code.
For the remainder of this article, we will be using this program as a sample:
To emit an IR for the sample project, run:
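A minimal sketch of this step, assuming Clang is installed and the sample is saved as main.c. The one-line main.c created here is a hypothetical stand-in used only when no main.c exists. The -S and -emit-llvm flags together ask Clang for textual LLVM IR instead of native assembly.

```shell
# Hypothetical stand-in for the sample program, created only if main.c is missing.
[ -f main.c ] || printf 'int main(void) { return 0; }\n' > main.c

if command -v clang >/dev/null 2>&1; then
  # -S stops after code generation; -emit-llvm switches the output to LLVM IR.
  clang -S -emit-llvm main.c -o main.ll
else
  echo "clang not found; skipping IR generation"
fi
```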
The resulting file main.ll can then be passed to llvm2cpg:
There are two ways to get the bitcode. The first way is to run:
Alternatively, you can run the same command, but with LTO in Full mode:
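Both routes can be sketched together, assuming Clang is installed and the sample is saved as main.c (the one-line main.c below is a hypothetical stand-in used only when the file is missing). The first command writes a bitcode file directly. The second compiles with full LTO, which makes the object file itself carry bitcode.

```shell
# Hypothetical stand-in for the sample program, created only if main.c is missing.
[ -f main.c ] || printf 'int main(void) { return 0; }\n' > main.c

if command -v clang >/dev/null 2>&1; then
  # Route 1: -c stops before linking, -emit-llvm makes the output LLVM bitcode.
  clang -c -emit-llvm main.c -o main.bc

  # Route 2: with full LTO enabled, the "object" file holds bitcode instead of machine code.
  clang -c -flto=full main.c -o main.o
else
  echo "clang not found; skipping bitcode generation"
fi
```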
Regardless of whether your output is main.bc or main.o, both contain bitcode. You can verify as follows:
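The original verification command is not reproduced here. One possible check, sketched below, inspects the first four bytes of each file: raw LLVM bitcode starts with the bytes 'B', 'C', 0xC0, 0xDE, and the bitcode wrapper format used on some platforms starts with 0xDE, 0xC0, 0x17, 0x0B. The helper names magic and is_bitcode are hypothetical. On most systems the file utility should report the same information.

```shell
# Print the first four bytes of a file as a hex string without separators.
magic() { head -c 4 "$1" | od -An -tx1 | tr -d ' \n'; }

# Succeed if the file starts with the raw bitcode magic or the wrapper magic.
is_bitcode() {
  case "$(magic "$1")" in
    4243c0de|dec0170b) return 0 ;;
    *) return 1 ;;
  esac
}

# Report on whichever of the two outputs exist in the current directory.
for f in main.bc main.o; do
  if [ -f "$f" ]; then
    if is_bitcode "$f"; then
      echo "$f: LLVM bitcode"
    else
      echo "$f: not LLVM bitcode"
    fi
  fi
done
```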
Both types of files can be passed to llvm2cpg (e.g., llvm2cpg main.bc or llvm2cpg main.o).
Finally, you can obtain the embedded bitcode from your source code:
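A minimal sketch, assuming Clang is installed and the sample is saved as main.c (the one-line main.c below is a hypothetical stand-in used only when the file is missing). The -fembed-bitcode flag asks Clang to store the bitcode alongside the machine code, so the build still yields a normal binary. Some newer toolchains warn about or ignore this flag, which is why the command's failure is tolerated here.

```shell
# Hypothetical stand-in for the sample program, created only if main.c is missing.
[ -f main.c ] || printf 'int main(void) { return 0; }\n' > main.c

if command -v clang >/dev/null 2>&1; then
  # Compile and link as usual, embedding the bitcode in the resulting binary.
  clang -fembed-bitcode main.c -o main \
    || echo "this toolchain rejected -fembed-bitcode"
else
  echo "clang not found; skipping embedded-bitcode build"
fi
```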
You can then pass main to llvm2cpg:
You'll see output that is similar to the following:
Using embedded bitcode is ideal, since it results in the most straightforward integration and can be added to an existing build system without affecting the resulting software.
Obtaining Bitcode for Your Source Code
Getting bitcode for your own projects can be less straightforward than for our sample project, especially given the variety of build systems in use. The key point is that you'll need to inject one of the following flags into your build system:
|-emit-llvm|The build doesn't finish (linking fails because no object files are produced), but all bitcode files are available|
|-flto|The build completes, and all of the intermediate object files it creates contain bitcode|
|-fembed-bitcode|The build completes, and the resulting binary contains bitcode|
For example, if you're building your project with CMake, you'd run:
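The original command is not reproduced here. One way to do this, sketched below, is to pass the flag through CMake's standard CMAKE_C_FLAGS and CMAKE_CXX_FLAGS variables. The choice of -fembed-bitcode and the build directory name are assumptions, and the -S/-B options require CMake 3.13 or newer.

```shell
# Flag to inject; -fembed-bitcode is assumed here, any of the flags above could be used.
BITCODE_FLAG="-fembed-bitcode"
# Hypothetical out-of-source build directory.
BUILD_DIR="build"

if command -v cmake >/dev/null 2>&1 && [ -f CMakeLists.txt ]; then
  # Configure with the flag applied to both C and C++ compilation, then build.
  cmake -S . -B "$BUILD_DIR" \
    -DCMAKE_C_FLAGS="$BITCODE_FLAG" \
    -DCMAKE_CXX_FLAGS="$BITCODE_FLAG"
  cmake --build "$BUILD_DIR"
else
  echo "cmake or CMakeLists.txt not found; skipping build"
fi
```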
If you're using Xcode, add the flag to both
If you're using xcodebuild, then you'd use:
You may find it helpful to include the following debugging-related flags in your build command as well:
|Disables the special handling and optimization of standard library functions like |
|Disables function inlining to ease debugging|
|Disables the generation of debug info for macros|
We recommend that users of other build systems look into using whole-program-llvm.
- The -fembed-bitcode flag may not work on macOS if a project links to a static library that wasn't compiled with embedded bitcode support
- If you combine -fembed-bitcode with -flto, no bitcode will be embedded into the binary
- In some cases, llvm2cpg can't read debug information emitted by Xcode's version of Clang. Everything will work normally, but the debug information won't be taken into account.
- When working with
[anyObject data] in Objective-C, please note that the compiler doesn’t typecheck the two (nor are they checked at runtime). The two object types are therefore treated as the same, and there is no effect on the optimized machine code that’s emitted.
Getting the CPG out of a Project
Once you've obtained the bitcode for your project, you can get the CPG. The command required to do so depends on the method you used to obtain your bitcode.