Modify the sensitive data dictionary
This article will show you how to modify the sensitive data dictionary preZero uses when analyzing your code.
Detecting sensitive data leaks is a subscription-only feature; please reach out to Qwiet for more information.
Step 1: Create a new file to hold new definitions
Use the Qwiet command-line interface to create a file to hold your newly defined dictionary. For this exercise, you'll create a file named filepath/my-app-dictionary.policy that uses the no-dictionary template provided by Qwiet:
sl policy create no-dictionary filepath/my-app-dictionary.policy
Make sure to change filepath to reflect the location where you want to save your policy file.
Step 2: Define your dictionary
Once you've created my-app-dictionary.policy, you can begin adding your sensitive-data directives to the file. Such directives have the following form:
DATA $group = VAR $term1, ..., $term_n
Parameter | Description |
---|---|
$group | The name of the sensitive data group, e.g., internalSecrets |
$term1 ... $term_n | Keywords to search for |
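If you keep your term lists in a script or spreadsheet export, the directive form above can be generated rather than typed by hand. The following is a minimal sketch: build_directive is a hypothetical helper, not part of the Qwiet CLI, and it only formats text according to the DATA ... = VAR ... pattern shown above.

```python
def build_directive(group, terms):
    """Return one directive of the form: DATA <group> = VAR <term1>, ..., <term_n>."""
    # Drop empty entries and surrounding whitespace so the output stays tidy.
    cleaned = [term.strip() for term in terms if term.strip()]
    if not group.strip() or not cleaned:
        raise ValueError("a directive needs a group name and at least one term")
    return "DATA {} = VAR {}".format(group.strip(), ", ".join(cleaned))


# Illustrative group name and terms only.
print(build_directive("internalSecrets", ["master key", "encrypt key"]))
# prints: DATA internalSecrets = VAR master key, encrypt key
```

The resulting line can be appended to your policy file as is.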
Example
preZero's default policy contains the following directive to characterize highly-sensitive data:
DATA highlySensitive = VAR master key, cvv num, cvv, cvc num, cvc, encrypt key, crypt key
This directive instructs preZero to look for exact matches to the specified terms, as well as variations and combinations of those terms. You can add terms to (or remove terms from) a directive to control which sensitive variables preZero looks for in your code.
Let's say that you want preZero to report only a limited set of Personally Identifiable Information (PII) categories: names, email addresses, and phone numbers, ignoring all other categories. To do so, you can append the following directives to the my-app-dictionary.policy file:
DATA pii = VAR first name, last name, middle name, middle initials, full name, maiden name, player name, family name
DATA pii = VAR email, email addr, email address, alternate email
DATA pii = VAR phone number, phone, mobile, landline number, home phone number, home phone num, office phone number, office phone num, alternate phone num, alternate phone number, phone number extension
The resulting policy file therefore reads:
IMPORT io.shiftleft/default
# PII
DATA pii = VAR first name, last name, middle name, middle initials, full name, maiden name, player name, family name
DATA pii = VAR email, email addr, email address, alternate email
DATA pii = VAR phone number, phone, mobile, landline number, home phone number, home phone num, office phone number, office phone num, alternate phone num, alternate phone number, phone number extension
Please note that:
- Variables are case-insensitive
- Terms with spacing will match alternative forms (e.g., first name will also find first_name, first-name, etc.)
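To get a feel for what these two rules imply, the sketch below imitates them with a regular expression. It is only an illustration of the idea, not preZero's actual matching logic: the helper names are hypothetical, and the set of separators (space, underscore, hyphen, or nothing) is an assumption.

```python
import re


def term_to_pattern(term):
    """Compile a case-insensitive pattern for a space-separated term.

    Assumption: words may be joined directly or by one space, underscore, or hyphen.
    """
    words = [re.escape(word) for word in term.split()]
    return re.compile(r"[ _-]?".join(words), re.IGNORECASE)


def matches(term, identifier):
    """Return True if the identifier contains the term in one of the assumed forms."""
    return term_to_pattern(term).search(identifier) is not None


for name in ["first_name", "first-name", "FirstName", "surname"]:
    print(name, matches("first name", name))
```

Under these assumptions the first three identifiers match the term first name and surname does not, which is useful to keep in mind when choosing terms: a short term such as phone will also be found inside longer variable names.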
Step 3: Validate your dictionary file
Once you've defined your dictionary, validate the file by running the following command:
sl policy validate my-app-dictionary.policy
If your policy has syntax or semantic issues, the command exits with a non-zero status code.
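Because validation reports failure through the exit status, it is easy to gate a build script on it. The following sketch wraps the command shown above; policy_is_valid is a hypothetical helper, and the run parameter exists only so the function can be exercised without the sl binary installed.

```python
import subprocess


def policy_is_valid(policy_path, run=subprocess.run):
    """Run `sl policy validate <policy_path>` and report whether it exited with status 0."""
    # No check=True here: a non-zero status is an expected outcome, not an exception.
    result = run(["sl", "policy", "validate", policy_path])
    return result.returncode == 0
```

Calling policy_is_valid("my-app-dictionary.policy") requires sl to be on your PATH; a script can then stop before the upload step whenever the function returns False.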
Step 4: Upload the dictionary to Qwiet's repository
Before you can use your dictionary, you must upload it to the Qwiet repository using sl policy push <policyLabel> <filepath>:
sl policy push myNewDictionary my-app-dictionary.policy
If your upload is successful, the command returns the full name under which the dictionary file is available, e.g., ebad68...ff7e/myNewDictionary:latest. Note that:
- ebad68...ff7e is your Qwiet organization ID
- myNewDictionary is the policy label
- latest is the tag Qwiet assigned to the policy by default
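If you record the returned name in a script, it can help to split it into its three parts. This sketch assumes the name always has the shape organization ID, slash, policy label, and an optional colon followed by the tag, as in the example above; parse_policy_name and PolicyName are hypothetical names, not part of any Qwiet tooling.

```python
from typing import NamedTuple, Optional


class PolicyName(NamedTuple):
    org_id: str
    label: str
    tag: Optional[str]  # None when the name carries no ":<tag>" suffix


def parse_policy_name(full_name):
    """Split '<orgID>/<label>[:<tag>]' into its parts (assumed format)."""
    org_id, slash, rest = full_name.partition("/")
    label, colon, tag = rest.partition(":")
    if not slash or not org_id or not label:
        raise ValueError("expected '<orgID>/<label>[:<tag>]', got {!r}".format(full_name))
    return PolicyName(org_id, label, tag if colon and tag else None)


# The organization ID below is the elided placeholder from the example above.
print(parse_policy_name("ebad68...ff7e/myNewDictionary:latest"))
```

The tag is returned as None when it is absent, rather than being filled in, so the caller decides how to treat untagged names.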
Step 5: Assign the dictionary
Assigning the dictionary to your application ensures that preZero uses it the next time it analyzes the app's code:
sl analyze --policy <policyLabel> --app <name>
Using the sample dictionary created in this article, and passing the path to the application's code as the final argument, the command is therefore:
sl analyze --policy ebad68...ff7e/myNewDictionary --app myApp ~/path/to/app
At this point, you are ready to proceed with your next code analysis. A modified dictionary generally contains fewer sensitive-data categories than the default dictionary, so you will likely see fewer results.
Conclusion
In this tutorial, you learned how to create a custom dictionary that identifies the sensitive data variables you provide and how to assign it for use with preZero.