Apr 17, 2023

Preventing Sensitive SaaS Data Loss through RegEx Scans

A regex scan (short for regular expression scan) is the process of searching a string or document for text that matches a particular regular expression. What is a regular expression, you ask? It is simply a pattern that describes a set of strings. Regular expressions are commonly used in programming, text editors, and other applications to search, manipulate, and validate text data. In cybersecurity, they are instrumental in helping prevent the loss of sensitive data within the IT estate.
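
To make those three uses concrete, here is a minimal Python sketch using the standard `re` module. The "SSN-like" pattern (three digits, two digits, four digits) and the sample text are illustrative assumptions, not a production-grade detector.

```python
import re

# A regular expression describing a set of strings: "SSN-like" values
# of the form ddd-dd-dddd (illustrative only, not a complete SSN validator).
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

text = "Employee record: 123-45-6789, badge 42."

# Search: find the first match anywhere in the text.
found = SSN_LIKE.search(text)
print(found.group())  # 123-45-6789

# Manipulate: redact every match.
print(SSN_LIKE.sub("[REDACTED]", text))  # Employee record: [REDACTED], badge 42.

# Validate: check whether an entire string fits the pattern.
print(bool(SSN_LIKE.fullmatch("123-45-6789")))  # True
print(bool(SSN_LIKE.fullmatch("12-345-6789")))  # False
```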

What Happens in a RegEx Scan?

During a regex scan, the regular expression is applied to the text data, and any matching substrings are identified and extracted. This gives IT and Security teams an efficient, flexible way to pull specific data out of text as part of a security policy or workflow, which is what makes a regex scan such a powerful tool for finding and extracting specific patterns.
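
As a rough illustration of "apply, identify, extract", the Python sketch below runs a set of named patterns over a piece of text and records each matching substring with its position. The `regex_scan` function, the `Finding` type, and the two rules are hypothetical examples for this post, not part of any product API.

```python
import re
from typing import NamedTuple


class Finding(NamedTuple):
    """One matching substring extracted by the scan (hypothetical type)."""
    rule: str
    value: str
    start: int
    end: int


def regex_scan(text: str, rules: dict[str, str]) -> list[Finding]:
    """Apply each named pattern to the text and extract every match."""
    findings = []
    for name, pattern in rules.items():
        # finditer yields non-overlapping matches from left to right
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            findings.append(Finding(name, m.group(), m.start(), m.end()))
    return findings


# Hypothetical rules, for illustration only
rules = {
    "salary": r"\bsalar(?:y|ies)\b",
    "confidential": r"\b(?:confidential|internal only)\b",
}
doc = "CONFIDENTIAL: 2023 salary bands for the Finance team."
for f in regex_scan(doc, rules):
    print(f)
```

Recording the offsets alongside the matched text lets a follow-up step (a notification or a redaction) point at exactly where in the file each match was found.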

Every organization is different, so the files each one deems sensitive will vary greatly. Each identity also carries its own level of risk. For example, the VP of Finance will have (or at least should have!) a completely different set of access entitlements to sensitive financial data than a lower-level Marketing associate, whose files and data are far less sensitive. Regardless, organizations need to be able to scan files within their environment as a preventative control to mitigate the risk of sensitive data loss.

Common use cases for running a regex scan include detecting references to salary, internal documents, company executives, or specific confidential topics. Here is an example of a regex string that matches email addresses at popular free webmail providers:

       (\W|^)[\w.-]{0,25}@(yahoo|hotmail|gmail)\.com(\W|$)

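The scan should return any email address from the yahoo.com, hotmail.com, and gmail.com domains, such as johndoe@gmail.com or jane@hotmail.com. Note that the dot before "com" is escaped (\.); an unescaped dot matches any character, so the pattern would also accept strings such as "gmailxcom".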
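
Here is a minimal sketch of running that pattern with Python's standard `re` module. The sample string is made up for illustration. Because the pattern also consumes the boundary characters captured in groups 1 and 3, the code slices the text between those groups to extract just the address.

```python
import re

# The pattern from the example above, with the dot before "com" escaped
# so it matches a literal "." instead of any character.
FREE_MAIL = re.compile(r"(\W|^)[\w.-]{0,25}@(yahoo|hotmail|gmail)\.com(\W|$)")

sample = "Contact johndoe@gmail.com or jane@hotmail.com; not ops@gmailxcom or bob@corp.com."

for m in FREE_MAIL.finditer(sample):
    # Groups 1 and 3 hold the surrounding boundary characters (or an empty
    # match at the start/end of the string); slice between them.
    print(sample[m.end(1):m.start(3)])
```

On this sample it prints only johndoe@gmail.com and jane@hotmail.com. Both ops@gmailxcom (which an unescaped dot would have accepted) and the corporate address are skipped.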

Organizations that rely on business-critical SaaS applications should use a regex scan when they know the textual structure, or pattern, of the keywords they're looking for and want to view the results clearly, without discrepancies. A regex scan is also ideal for finding proprietary sensitive information that is not considered personally identifiable information (PII), such as sensitive internal company items, budget data, or unique customer data. Again, every organization's definition of sensitive data can extend well beyond the standard keywords and tags (e.g. credit card details, Social Security numbers, dates of birth).
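
Organization-specific terms often start life as a plain keyword list. The Python sketch below shows one way to turn such a list into a single pattern and combine it with a structural pattern. The codenames and the `CUST-` plus six digits customer ID format are hypothetical examples, and `lexical_list_to_regex` is a helper written for this illustration only.

```python
import re


def lexical_list_to_regex(terms: list[str]) -> re.Pattern:
    """Compile a list of literal keywords into one case-insensitive pattern."""
    # re.escape keeps punctuation in a term (e.g. "Q3-budget") literal;
    # longer terms go first so they win over any term that is their prefix.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


# Hypothetical, organization-specific definitions of "sensitive"
codenames = lexical_list_to_regex(["Project Falcon", "Falcon", "Q3-budget"])
# Hypothetical internal customer ID format: CUST- followed by six digits
customer_id = re.compile(r"\bCUST-\d{6}\b")

text = "Attached: project falcon Q3-budget draft for CUST-004217."
print(codenames.findall(text))
print(customer_id.findall(text))
```

Neither of these values is PII, which is exactly why a generic detector would miss them and a custom pattern is needed.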

How to Run a RegEx Scan in DoControl

To mitigate the risk of SaaS data loss, your Security team can implement regex scans in our Security Workflows across all Google Drive assets and Slack attachments. You can notify the appropriate parties (e.g. individual users, managers, SecOps) about any sensitive keywords found in shared Google Drive or Slack assets, and then remediate accordingly.

To define a workflow and run a regex scan in DoControl, the following steps are required:

  1. From the Event dropdown, select a Google Drive or Slack trigger.
  2. Define the conditions.
  3. Select Query > Regex scan.
  4. Select Flow control > Conditional to add a conditional step that uses the Has regex matches macro to determine whether the asset matches the regex pattern.
  5. For the asset ID, enter the Get asset ID macro from the event patterns or strings.
  6. Select a lexical list or regex pattern from the dropdown or insert a regular expression. You can insert multiple patterns.
  7. Link the YES branch of the conditional step (regex matches found) to the relevant action you would like to take.
  8. To send an automatic notification when matches are found, select Notify and choose to notify in Slack or by email.
  9. You can choose to remediate with any relevant remediation action for Google Drive or Slack, such as delete file or remove public sharing.
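
To show how the pieces of such a workflow fit together (trigger, conditions, scan, conditional branch, then notification and remediation), here is a small Python model. It is a hypothetical sketch of the control flow only: the `Event` and `RegexScanWorkflow` classes, the "publicly shared" condition, and the callbacks are assumptions for illustration, not DoControl's actual API or configuration format.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    """Hypothetical trigger event, e.g. a Google Drive file being shared."""
    asset_id: str
    text: str
    shared_publicly: bool


@dataclass
class RegexScanWorkflow:
    """Illustrative model of the steps above; not DoControl's actual API."""
    patterns: list[str]
    notify: Callable[[str], None]
    remediate: Callable[[str], None]

    def run(self, event: Event) -> bool:
        # Step 2: conditions (here, hypothetically: only publicly shared assets)
        if not event.shared_publicly:
            return False
        # Steps 3, 5, 6: regex scan of the asset identified by the event
        matches = [m.group() for p in self.patterns
                   for m in re.finditer(p, event.text)]
        # Step 4: conditional, playing the role of a "has regex matches" check
        if not matches:
            return False
        # Steps 7-9: YES branch -> notify, then remediate
        self.notify(f"{event.asset_id}: {len(matches)} sensitive match(es)")
        self.remediate(event.asset_id)
        return True


# Usage: collect notifications and remediations in lists instead of
# calling real Slack, email, or Google Drive APIs.
sent, fixed = [], []
wf = RegexScanWorkflow([r"\bsalary\b"], notify=sent.append, remediate=fixed.append)
wf.run(Event("file-123", "2023 salary bands", shared_publicly=True))
print(sent, fixed)
```

Passing the notify and remediate actions in as callables mirrors how the YES branch can be linked to whichever action fits, such as a Slack message, an email, deleting the file, or removing public sharing.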

And that's it! Easy peasy. Existing customers can reach out to their DoControl account team with any questions about how to leverage regex in their workflows. If you're interested in learning more, check out the demo video below to see how easily you can configure a scan.

Leya Ptiha is a seasoned Customer Success professional with over 10 years of experience in various industries. She has a passion for helping companies deliver exceptional customer experiences and drive long-term success.
