Modify the sensitive data dictionary

This article will show you how to modify the sensitive data dictionary preZero uses when analyzing your code.

Detecting sensitive data leaks is a subscription-only feature; please reach out to Qwiet for more information.

Step 1: Create a new file to hold new definitions

Use the Qwiet command-line interface to create a file to hold your newly defined dictionary. For this exercise, you'll create a file named filepath/my-app-dictionary.policy that uses the no-dictionary template provided by Qwiet:

sl policy create no-dictionary filepath/my-app-dictionary.policy

Replace filepath with the location where you want to save your policy file.

Step 2: Define your dictionary

Once you've created my-app-dictionary.policy, you can begin adding your sensitive-data directives to the file. Such directives have the following form:

DATA $group = VAR $term1, ..., $term_n
Parameter            Description
$group               The name of the sensitive data group, e.g., internalSecrets
$term1 ... $term_n   Keywords to search for
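
As a rough illustration of the directive syntax, the following shell sketch assembles a DATA line from a group name and a list of terms. The function name make_directive and the terms api token and signing key are hypothetical, chosen only for this example; they are not part of the Qwiet CLI or the default policy.

```shell
# Hypothetical helper (not part of the sl CLI): build one DATA directive.
# Usage: make_directive <group> <term> [<term> ...]
make_directive() {
  group=$1
  shift
  # Join the remaining arguments with ", " and drop the trailing separator.
  terms=$(printf '%s, ' "$@")
  printf 'DATA %s = VAR %s\n' "$group" "${terms%, }"
}

# Example terms are illustrative only.
make_directive internalSecrets "api token" "signing key"
# prints: DATA internalSecrets = VAR api token, signing key
```

The printed line can then be pasted or appended into your policy file.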

Example

preZero's default policy contains the following directive to characterize highly-sensitive data:

DATA highlySensitive = VAR master key, cvv num, cvv, cvc num, cvc, encrypt key, crypt key

This directive instructs preZero to look for exact matches of the specified terms, as well as variations and combinations of those terms. You can add terms to, or remove terms from, the directive to control which sensitive variables preZero looks for in your code.

Let's say that you want to report only a limited number of Personally Identifiable Information (PII) categories: names, email addresses, and phone numbers, ignoring all other categories. To do so, you can append the following directives to the my-app-dictionary.policy file:

DATA pii = VAR first name, last name, middle name, middle initials, full name, maiden name, player name, family name
DATA pii = VAR email, email addr, email address, alternate email
DATA pii = VAR phone number, phone, mobile, landline number, home phone number, home phone num, office phone number, office phone num, alternate phone num, alternate phone number, phone number extension

The resulting policy file is, therefore:

IMPORT io.shiftleft/default

# PII
DATA pii = VAR first name, last name, middle name, middle initials, full name, maiden name, player name, family name
DATA pii = VAR email, email addr, email address, alternate email
DATA pii = VAR phone number, phone, mobile, landline number, home phone number, home phone num, office phone number, office phone num, alternate phone num, alternate phone number, phone number extension

Please note that:

  • Variable names are matched case-insensitively
  • Terms with spacing will match alternative forms (e.g., first name will also find first_name, first-name, etc.)
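
To get a feel for these two rules, the sketch below approximates them with grep. It is not preZero's matcher, only an illustration: each space in a term becomes an optional separator (space, underscore, or hyphen), and the comparison ignores case. Treating the separator as optional (so that firstName also matches) is an assumption, and the function names term_to_regex and matches_term are hypothetical.

```shell
# Approximation only: preZero's real matching logic is not shown in this article.

# Turn a dictionary term into an extended regex in which every space
# may be a space, an underscore, a hyphen, or absent (assumption).
term_to_regex() {
  printf '%s' "$1" | sed 's/ /[_ -]?/g'
}

# Succeed if the variable name ($2) matches the term ($1), ignoring case.
matches_term() {
  printf '%s\n' "$2" | grep -Eiq -- "^$(term_to_regex "$1")\$"
}

# Try a few variable names against the term "first name".
for name in first_name First-Name firstName last_name; do
  if matches_term "first name" "$name"; then
    echo "$name: match"
  else
    echo "$name: no match"
  fi
done
```

Running a handful of your own variable names through a check like this can help you decide which terms to list before you validate and push the policy.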

Step 3: Validate your dictionary file

Once you've defined your dictionary, validate the file by running the following command:

sl policy validate my-app-dictionary.policy

If your policy has syntax or semantic issues, the command exits with a non-zero status code.
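
Because validation reports problems through its exit status, a script can branch on it. The following is a minimal sketch; check_policy is a hypothetical wrapper, not part of the sl CLI, and it runs whatever command it is given so the same pattern works for other steps.

```shell
# Hypothetical wrapper: run a command and report based on its exit status.
check_policy() {
  if "$@"; then
    echo "policy OK"
  else
    rc=$?   # exit status of the command that just failed
    echo "policy invalid (exit status $rc)"
    return "$rc"
  fi
}

# Usage, assuming sl is installed and on your PATH:
# check_policy sl policy validate my-app-dictionary.policy
```

In a CI job, the non-zero return value stops the pipeline before an invalid policy is pushed.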

Step 4: Upload the dictionary to Qwiet's repository

Before you can use your dictionary, you must upload it to the Qwiet repository using sl policy push <policyLabel> <filepath>:

sl policy push myNewDictionary my-app-dictionary.policy

If the upload succeeds, the command returns the full name under which the dictionary file is available, e.g., ebad68...ff7e/myNewDictionary:latest. Note that:

  • ebad68...ff7e is your Qwiet organization ID
  • myNewDictionary is the policy label
  • latest is the tag Qwiet assigned to the policy by default
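
The three parts of the full name can be separated with plain shell parameter expansion, as in the sketch below. The function name split_policy_name is hypothetical, and the sketch assumes the name always has the organization/label:tag shape shown above. The elided organization ID is used exactly as it appears in this article.

```shell
# Hypothetical helper: split "<orgID>/<label>:<tag>" into its parts.
split_policy_name() {
  full_name=$1
  org_id=${full_name%%/*}   # text before the first "/"
  rest=${full_name#*/}      # text after the first "/"
  label=${rest%%:*}         # text before the first ":"
  tag=${rest#*:}            # text after the first ":"
  printf 'org=%s label=%s tag=%s\n' "$org_id" "$label" "$tag"
}

split_policy_name 'ebad68...ff7e/myNewDictionary:latest'
# prints: org=ebad68...ff7e label=myNewDictionary tag=latest
```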

Step 5: Assign the dictionary

Assigning the dictionary to your application ensures that preZero uses it the next time it analyzes the app's code:

sl analyze --policy <policyLabel> --app <name>

For the sample dictionary created in this article, the command is, therefore:

sl analyze --policy ebad68...ff7e/myNewDictionary --app myApp ~/path/to/app
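
If you repeat these steps often, they can be chained in a small script. The sketch below only uses the sl subcommands shown in this article; the function name publish_and_analyze and the SL_CMD variable are hypothetical conveniences, and the full policy name still has to be copied from the output of the push step.

```shell
# Hypothetical wrapper around the commands from Steps 3 to 5.
# SL_CMD defaults to sl; override it (e.g., SL_CMD=echo) for a dry run.
SL_CMD=${SL_CMD:-sl}

# Usage: publish_and_analyze <policy-file> <label> <full-policy-name> <app-name> <app-path>
publish_and_analyze() {
  policy_file=$1
  label=$2
  full_name=$3
  app=$4
  app_path=$5

  # Stop at the first failing step.
  "$SL_CMD" policy validate "$policy_file" || return 1
  "$SL_CMD" policy push "$label" "$policy_file" || return 1
  "$SL_CMD" analyze --policy "$full_name" --app "$app" "$app_path"
}

# Example call, using the values from this article:
# publish_and_analyze my-app-dictionary.policy myNewDictionary \
#   'ebad68...ff7e/myNewDictionary' myApp ~/path/to/app
```

Setting SL_CMD=echo prints the three commands instead of running them, which is a quick way to check the arguments first.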

At this point, you are ready to proceed with your next code analysis. In general, a modified dictionary contains fewer sensitive-data categories than the default dictionary, so you will likely see fewer results.

Conclusion

In this tutorial, you learned how to create a custom dictionary that identifies the sensitive data variables you provide and how to assign it for use with preZero.