Hi. Today, we will learn how to use Regular Expressions (REGEX) in Google Data Studio. Regular expressions refer to an expression of characters that you can use to make a query on values (Strings or text). In simple words, Regular Expressions are the commands that you write to tell a software or system about your complex queries. For instance, to tell a computer machine to return the following results:
- Return all the words that start with “R” and end with “A”
- Return all sentences that have more than 5 words
- Return all keywords return “data/Data” without any case sensitivity.
- Find all the occurrences of “Youtube”, and change them to “youtube.com”
There are so many use cases of Regular Expressions that can be used in Data Studio, we already have learned, these are also known as the REGEXP family of functions, we had already learned all REGEXP functions in Google Sheets, here you can find the REGEXMATCH, and here you can find the REGEXREPLACE and REGEXEXTRACT, check them out after this one.
Table of Contents
Why Use Regular Expressions (RegEX) in Google Data Studio?
Many times, we have unique requirements, and we don’t have any direct function or method or even filter to perform it, so just a little more advanced to case statements, we have REGEXP in Google Data Studio.
Google Data Studio has four kinds of REGEXP functions that can be used for several tasks, we can do so many things using regular expressions, and that’s why we must know how to use regular expressions in Google Data Studio.
How to Use Regular Expressions (RegEX) in Google Data Studio?
So, we have four functions of REGEXP in Google Data Studio, in sheets, we had only three, here we have four and we will learn all of them, let’s start with the list of REGEXP functions and their usage.
Regular Expressions (REGEX) in Google Data Studio – REGEXP CONTAINS
This is the first function in the REGEXP family in Google Data Studio. The REGEXP CONTAINS is mainly used to check if a given string or its pattern, if yes it returns true, else it returns false. So, it’s a Boolean function that returns true or false based on the regular expressions we specified.
Syntax
REGEXP_CONTAINS(X, regular_expression)
Here,
- x: It is a field or “dimension” to evaluate
- regular expressions: A valid regular expression
We can use this REGEXP function to validate if a field has a specific string in it, it will make a new field with the function name and will show true or false adjacent to all the fields.
For this example, I am using a table chart in Data Studio
Step 1
Create a table
Step 2
Here I am using the page title as a dimension
Step 3
Now I am using new users as the only metric
Note: Now, we can search for any string or substring in this field, and can show true and false based on that part of the string contained by field or not.
Step 4
Go to dimension, and click on create a field
Step 5
Now write the simple REGEXP CONTAIN regular expressions for searching a home keyword in the page title field
Step 6
Click on apply to save changes
Step 7
You can see that a new field is added and showing TRUE and FASLE based on the regular expression pattern.
Regular Expressions (REGEX) in Google Data Studio – REGEXP EXTRACT
This is another useful function in the REGEXP family. REGEXP EXTRACT is used to return the first match of a string or a substring from the specified pattern in the regular expression we have given in the function. This is mainly used for returning some values not true or false but some real data values from a large data set.
Syntax
REGEXP_EXTRACT(X, regular_expression)
Here,
- x: It is a field or “dimension” that includes a field.
- regular expressions: a regular expression that extracts a portion of field_expression. regular_expression must have an extracted pattern.
Here we will see an example where we will extract the top-level directory from the URL fields
Step 1
Table chart with Destination URL as dimension and users as the metric
Step 2
Go to dimensions section > Add dimension > create field
Step 3
Name your regular expression (It will become the new column name)
Step 4
Write the regular expression for extracting the top-level directory in a URL
Step 5
Click on apply, and you can see we have got the text as returned where the regular expression matched.
This is how we use REGEXP-EXTRACT in Google Data Studio.
Regular Expressions (REGEX) in Google Data Studio – REGEXP MATCH
This is the most common and most used type of REGEXP function. REGEXP MATCH is used like a vlookup function, it looks around for a match based on the pattern or values we have specified in the regular expression and returns true if the value exists that matches the criteria or the pattern of the regular expression. If not found it will return false.
In this section, we will see how to use regular expressions in Google Data Studio REGEXP-MATCH, here I will use an example where I have pages as dimensions and sessions as a metric, I want to categorize the page’s URL based on some keywords that occurred.
Syntax
REGEXP_MATCH(X, regular_expression)
Here,
- x: It is a field or “dimension” to evaluate.
- regular expressions: a valid regular expression.
Step 1
Create a table chart, select page as dimension, and sessions as a metric
Step 2
Go to dimension and create a field for regular expression
Step 3
Give it a name, and type the entire regular expression
Step 4
Click on Apply
Step 5
Here you can see we have got all the categories based on our keywords.
This is how we use REGEXP-MATCH in Google Data Studio.
Regular Expressions (REGEX) in Google Data Studio – REGEXP REPLACE
This is the last type of REGEXP function in the REGEX family. The REGEXP EXTRACT is used to find the matching values in the data and then replace those values with the new values that we “users” specify in the replace section of this formula. This can be like a find and replace method, but have some excellent features we will see below as examples.
Syntax
REGEXP_REPLACE(X, regular_expression, replacement)
Here,
- x: It is a field or “dimension” that includes a field
- regular expressions: a valid regular expression that can match at least one part of field_expressions
- replacement: the text to replace the matched part of field_expression.
We will use an example where the page list and sessions are as dimension and metric respectively. I will write a simple regular expression, that will look for a matching word I pass, and will replace this word with a specified keyword I pass as a replacement.
All steps are going to be the same, I am changing here a keyword of the URL with another keyword (replacing store with shop)
Only Step
The Regular expression for this section will be like below:
This is how you can use REGEXP-REPLACE in Google Data Studio.
Regular Expressions (REGEX) in Google Data Studio – Character Lists
As we learned in Google sheets, there are some characters used with REGEX to precise our goal, for instance, how do we tell a computer that we want to extract or match the string from the beginning or end?
Characters | Function |
. | Matches a single character, can be a letter, number, or symbol |
? | Matches a former character 0/1 times |
+ | Matches a former character 1 or more times |
* | Matches a former character 0 or more times |
| | Creates OR match
Can not be used at the end of an expression |
Anchors | Function |
^ | Matches a character at the beginning of a string value |
$ | Matches a character at the end of a string value |
Groups | Function |
(
) |
Matches a character that is exactly enclosed in these brackets anywhere within a string.
Also used for the grouping of other expressions |
[
] |
Matches a character that is exactly enclosed in these brackets anywhere within a string. |
– | Creates a range of all the characters within the brackets to match within a string |
Character Clauses | Function |
\d | digits (≡ [0-9]) |
\D | not digits (≡ [^0-9]) |
\s | whitespace (≡ [\t\n\f\r ]) |
\S | no whitespace (≡ [^\t\n\f\r ]) |
\w | word characters (≡ [0-9A-Za-z_]) |
\W | not word characters (≡ [^0-9A-Za-z_]) |
Frequently Asked Questions
Can I Combine CASE WHEN Statement with Regular Expressions in Google Data Studio?
Yes, you can achieve complex conditional logic by using case when statement in conjunction with regular expressions in Google Data Studio. This powerful combination allows you to manipulate and analyze data efficiently. When you encounter intricate conditions, incorporating regular expressions into your CASE WHEN statement can provide the flexibility and accuracy you need for your data analysis.
How to use REGEXP-CONTAINS in Google Data Studio?
REGEXP_CONTAINS is a newly added function in the REGEXP family of Google Data Studio, in sheets we dont have REGEXP_CONTAINS. It is used to evaluate a string or a substring within a string using a pattern of regular expressions, as we did in the first section above. It returns true if the string or substring matched the pattern and false otherwise.
What are regular expressions?
Regular expressions are the combination of instruction using some characters with standardized operations in mathematics and computer science. We use regular expressions in many dynamic applications such as Google Data Studio, and Google Sheets. These expressions help us to create custom functions to instruct computers to perform a very specific task.
Conclusion
Wrapping up how to use Regular Expressions in Google Data Studio, we have learned about REGEXP functions in Data Studio, and have seen all the four functions of the REGEXP family, they are used to create customized functions and to perform precise operations using a pattern of the regular expression, and a custom valid logic following the syntactical rules of Google Data Studio.
I hope you find the above article helpful, to learn more; I would recommend you to subscribe to our blog, we have a complete upcoming series of Google Data Studio, Google Docs, Google Sheets, and Google Slides. Thank you. Keep learning with Office Demy.