A regex scan (short for regular expression scan) is the process of searching a string or document for text that matches a particular regular expression. What is a regular expression, you ask? It is simply a pattern that describes a set of strings. Regular expressions are commonly used in computer programming, text editors, and other applications to search, manipulate, and validate text data. In cybersecurity, they are instrumental in helping prevent the loss of sensitive data within the IT estate.
During a regex scan, the regular expression is applied to the text data, and the matching substrings are identified and extracted. This gives IT and Security teams an efficient, flexible way to pull specific patterns out of text as part of a security policy or workflow.
Every organization is different, so the files each one deems sensitive will vary greatly. Each identity also carries a different level of risk. For example, the VP of Finance will have (or at least should have!) a completely different set of access entitlements to sensitive financial data than a lower-level Marketing associate, whose files and data are much less sensitive. Regardless, organizations need to be able to scan files within their environment as a preventative control to mitigate the risk of sensitive data loss.
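To make the apply-and-extract step concrete, here is a minimal Python sketch using the standard `re` module. The pattern (a dollar amount), the `scan_text` function name, and the sample sentence are all assumptions chosen for illustration; they are not taken from any DoControl workflow.

```python
import re
from typing import List

# Hypothetical pattern: a dollar amount such as $85,000 or $5,000.50.
AMOUNT_PATTERN = re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?")

def scan_text(text: str) -> List[str]:
    """Return every substring of `text` that matches the pattern."""
    return AMOUNT_PATTERN.findall(text)

sample = "Offer letter: base salary $85,000, signing bonus $5,000.50."
print(scan_text(sample))  # ['$85,000', '$5,000.50']
```

The same idea scales from a single string to a whole document: the pattern is compiled once and then applied to each piece of text that needs checking.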
Common use cases for a regex scan include finding references to salary, internal documents, company executives, or specific confidential topics. An example of a regex string is as follows:
The results should return any email address from these domains: yahoo.com, hotmail.com, and gmail.com, such as johndoe@gmail.com.
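As a hedged illustration, a pattern with the behavior described above could look like the following Python sketch. The exact pattern, the `find_personal_emails` name, and the sample text are assumptions for illustration only, and the regex syntax accepted by a given scanning tool may differ from Python's.

```python
import re
from typing import List

# Assumed pattern: a local part, then "@", then one of the three
# personal-mail domains named in the text. Matching is case-insensitive.
EMAIL_PATTERN = re.compile(
    r"\b[A-Za-z0-9._%+-]+@(?:yahoo|hotmail|gmail)\.com\b",
    re.IGNORECASE,
)

def find_personal_emails(text: str) -> List[str]:
    """Return all yahoo.com, hotmail.com, and gmail.com addresses in `text`."""
    return EMAIL_PATTERN.findall(text)

sample = "Shared with johndoe@gmail.com and jane@corp.example.org, cc bob@Yahoo.com"
print(find_personal_emails(sample))  # ['johndoe@gmail.com', 'bob@Yahoo.com']
```

Addresses on any other domain, such as the `corp.example.org` one in the sample, are left out of the results.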
Organizations leveraging business-critical SaaS applications should use a regex scan when they know the textual structure, or pattern, of the keywords they're looking for and want clear, unambiguous results. A regex scan is also ideal when looking for proprietary sensitive information that is not considered personally identifiable information (PII). This could include sensitive internal company items, budget data, or unique customer data. Again, every organization's definition of sensitive data can extend well beyond the standard keywords and tags (e.g., credit card details, social security numbers, and dates of birth).
To mitigate the risk of SaaS data loss, your Security team can implement regex scans in our Security Workflows across all Google Drive assets and Slack attachments. You can notify the appropriate parties (e.g., individual users, managers, or SecOps) about any sensitive keywords found in shared Google Drive or Slack assets, and then remediate accordingly.
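For proprietary, non-PII data, the patterns are whatever your organization's own naming conventions dictate. The sketch below assumes two made-up conventions, a project code of the form `PRJ-1234` and budget phrases such as "budget FY25"; the `PATTERNS` table and `scan_document` function are hypothetical names used only to show the idea.

```python
import re
from typing import Dict, List

# Hypothetical internal conventions; replace with your organization's own.
PATTERNS = {
    "project_code": re.compile(r"\bPRJ-\d{4}\b"),
    "budget_term": re.compile(r"\b(?:budget|forecast)\s+FY\d{2}\b", re.IGNORECASE),
}

def scan_document(text: str) -> Dict[str, List[str]]:
    """Map each pattern name to the matches found in `text` (hits only)."""
    hits = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits

sample = "Draft budget FY25 for PRJ-1042 and PRJ-77."
print(scan_document(sample))
# {'project_code': ['PRJ-1042'], 'budget_term': ['budget FY25']}
```

Because the structure of each identifier is known in advance, near misses such as `PRJ-77` are excluded, which keeps the results free of ambiguous hits.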
To define a workflow and run a regex scan in DoControl, the following steps are required:
And that's it! Easy peasy. For existing customers, please reach out to your DoControl account team if you have any questions about how to leverage regex in your workflows. If you're interested in learning more, please check out the demo video below to see how easily you can configure a scan: