Image source: Unsplash
String functions are text related functions. These functions perform an operation on a string input value and return a string or numeric value. This can be used for manipulating and examining various type of queries independently or within a table.
First, lets create a customer table.
CREATE TABLE customer_payment_detail(
customer_id SERIAL PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL,
movie_title VARCHAR(100) NOT NULL,
rental_id INT,
amount FLOAT,
payment_date date)
INSERT INTO customer_payment_detail
VALUES (1, 'PATRICIA', 'JOHNSON', 'patricia.johnson@sakilacustomer.org', '----ACE GOLDFINGER', 107, 1.99, '2020-01-24'),
(2, 'ELIZABETH', 'BROWN', 'elizabeth.brown@sakilacustomer.org', 'ADAPTATION HOLES', 142, 0.99, '2020-02-29'),
(3, 'JENNIFER', 'DAVIS', 'jennifer.davis@sakilacustomer.org', '----AIRPLANE SIERRA-----', 88, 6.99, '2019-12-15'),
(4,'MARIA', 'MILLER', 'maria.miller@sakilacustomer.org', 'ALIEN CENTER----', 52, 0.99, '2019-10-08'),
(5, 'SUSAN', 'WILSON','susan.wilson@sakilacustomer.org', '-----AMERICAN CIRCUS', 405, 4.99, '2020-05-31'),
(6, 'MARGARET', 'MOORE', 'margaret.moore@sakilacustomer.org', '-----ANGELS LIFE-----', 71, 8.99, '2019-11-13'),
(7, 'DOROTHY', 'TAYLOR', 'dorothy.taylor@sakilacustomer.org', '-----ARIZONA BANG' , 93, 9.99, '2020-12-31'),
(8, 'NANCY', 'THOMAS', 'nancy.thomas@sakilacustomer.org', 'BABY HALL-----', 327,3.99, '2020-04-27'),
(9, 'SANDRA', 'MARTIN', 'sandra.martin@sakilacustomer.org', 'CHICAGO NORTH-----', 64, 2.99, '2019-10-15'),
(10, 'DEBORAH', 'WLAKER', 'deborah.walker@sakilacustomer.org', '-----COMA HEAD----', 259, 4.99,'2020-02-14')
SELECT * FROM customer_payment_detail
Now, let us work on some of the string functions in the above table.
UPPER() :
This function converts all the characters in the string to upper case. This is mainly used to standarde text data, to ensure consistent formatting.
LOWER():
This function converts all the characterst in the given string to lower case. This is mainly useful for case-insensitive comparisons.
LENGTH():
This function returns the number of characters in the given string.
Syntax:
UPPER/LOWER/LENGTH(string/column_name)
Query:
SELECT
UPPER(email) AS email_upper,
LOWER(email) AS email_lower,
LENGTH(email) AS email_length
FROM customer_payment_detail
LEFT():
It is used to extract the n number of characters specified in the argument from the left of a given string. When the value of n is negative, it will extract from the left except the last n characters.
RIGHT():
It is used to extract the n number of characters specified in the argument from the right of a given string. When the value of n is negative, it will extract from the right except the first n characters.
Syntax:
LEFT/RIGHT(column_name, n)
Here n represents the number of characters to be extracted.
Query:
SELECT
first_name,
LEFT(first_name, 2),
RIGHT(first_name, 3)
FROM customer_payment_detail
Query: (for negative n)
SELECT
first_name,
LEFT(first_name,-3),
RIGHT(first_name, -2)
FROM customer_payment_detail
Here, the LEFT() function extracted all the characters from the left side except the 3 right characters and the RIGHT() function extracted all the characters from the right side except the 2 left characters, as the value of n is negative.
CONCAT():
It is used to combine multiple string values into one value.
Syntax:
CONCAT(column_name1, column_name2, ...)
Query:
SELECT
first_name,
last_name,
CONCAT(first_name, last_name) AS full_name
FROM customer_payment_detail
In this output, we see that there is no space between the names in full name. To add some space or separators (like ‘,’ or ‘.’) , we can use CONCAT_WS().
CONCAT_WS():
This function combines strings into one separated by a given separator. WS stands for With Separator.
Syntax:
CONCAT_WS(separator, column_name1, column_name 2,...)
Query:
SELECT
first_name,
last_name,
CONCAT_WS(' ',first_name, last_name) AS full_name
FROM customer_payment_detail
Another method to concatenate strings is by using ‘||’ operator. Now, let us form initials from the given first name and last name by using this operator.
Query:
SELECT
first_name,
last_name,
LEFT(first_name, 1) ||'.'|| LEFT(last_name,1) || '.' AS Initials
FROM customer_payment_detail
POSITION():
This function returns the location of specified substring in a string.
Syntax:
POSITION(substring IN string/column_name)
Here substring refers to the string that you want to locate in the main string or column.
Query:
SELECT
email,
POSITION('.' IN email)
FROM customer_payment_detail
SUBSTRING():
This function is used to extract a substring from a particular position of the given string.
Syntax:
SUBSTRING(string from start [for length])
Here string refers to the required string/column that we want to extract from, start refers to the position from where to start from and length refers to the number of characters to be extracted from that position. If length is not mentioned, it will extract till the end of the string .
Query:
SELECT
email,
SUBSTRING(email from 5 for 4)
FROM customer_payment_detail
INITCAP():
This converts the first letter of each word to upper case and the rest to lower case. This function is useful for formatting text data, to ensure consistent capitalization.
Syntax:
INITCAP(Text/column_name)
Query:
SELECT
first_name|| ' ' ||last_name AS full_name,
INITCAP(first_name|| ' ' ||last_name) AS full_name_initcap
FROM customer_payment_detail
REPEAT():
This function repeats a string the specified number of times.
Syntax:
REPEAT(Text/column_name, n)
Here, n refers to the number of times the text needs to be repeated.
Query:
SELECT
REPEAT(first_name,4)
FROM customer_payment_detail
REVERSE():
This function returns the characters in the string in the reverse order.
Syntax:
REVERSE(Text/column_name)
Query:
SELECT
last_name,
REVERSE(last_name)
FROM customer_payment_detail
REPLACE():
This function is used to replace all occurrences of the old string with the new string.
Syntax:
REPLACE(column_name, old_string, new_string)
Query:
SELECT
email,
REPLACE(email, '@', '#')
FROM customer_payment_detail
REGEXP_REPLACE():
This function is used to replace the first occurrence of any character in the string.
Syntax:
REGEXP_REPLACE(string/column_name, character, new_string)
Query:
SELECT
email,
REGEXP_REPLACE(email, 'a', '%%')
FROM customer_payment_detail
Here, only the first occurrence of ‘a’ is changed to ‘%%’. Other ‘a’ remain unchanged.
To replace the first character in a string, '^.' is used with REGEXP_REPLACE() function.
Query:
SELECT
first_name,
REGEXP_REPLACE(first_name, '^.', '***')
FROM customer_payment_detail
To replace the last character in a string, '.$' is used with REGEXP_REPLACE() function.
Query:
SELECT
first_name,
REGEXP_REPLACE(first_name, '.$', '###')
FROM customer_payment_detail
REGEXP_MATCH() :
This function allows you to extract substrings from a string based on a regular expression pattern.
Syntax:
REGEXP_MATCH(source_string/column_name, pattern)
Query:
SELECT
first_name,
REGEXP_MATCH(first_name, 'AR|OR')
FROM customer_payment_detail
Here, REGEX_MATCH() will list all the names. If there is a match('AR' or 'OR'), it will list the pattern that is present in the name; otherwise, it will be empty.
So, if we want to list only those names that have matches, we need to used REGEXP_MATCHES().
REGEXP_MATCHES():
Query:
SELECT
first_name,
REGEXP_MATCHES(first_name, 'AR|OR')
FROM customer_payment_detail
TRANSLATE():
This function is used to translate any character in he given string with the character in the replace string. The characters in the replace string are corresponding to the characters in the given string. When the character in the given string matches with the character in the matching string, translate occurs. This is used to standardize data by replacing unwanted characters.
Syntax:
TRANSLATE(string/column_name, matching_string, replace_string)
Query:
SELECT
first_name,
TRANSLATE(first_name, 'AREI', '1234')
FROM customer_payment_detail
Here '1' is replaced in all the places of first name where 'A' is present. Simultaneously, '2', '3', '4' are replaced for 'R', 'E', 'I'.
OVERLAY():
This function is used to replace specified string in the specified position of the given string. This can used to modify specific parts of a string without changing the entire string.
Syntax:
OVERLAY(string/column_name PLACING replace_string FROM starting_position FOR number_ of_characters)
Query:
SELECT
first_name,
OVERLAY(first_name PLACING '###' FROM 2 FOR 3)
FROM customer_payment_detail
TRIM():
TRIM() function removes the character/spaces from the given string. There are 3 different types of TRIM(). They are:
1. BOTH (BTRIM)
2. LEADING (LTRIM)
3. TRAILING (RTRIM)
Syntax:
TRIM([LEADING /TRAILING / BOTH] [characters] FROM string/column_name) (or)
BTRIM/LTRIM/RTRIM(string/column_name, character)
BOTH(BTRIM):
This function will remove the character/spaces from both the end of the given string. This can be done using 2 different methods.
Query1:
SELECT
movie_title,
TRIM(BOTH '-' FROM movie_title)
FROM customer_payment_detail
Query2:
SELECT
movie_title,
BTRIM(movie_title, '-')
FROM customer_payment_detail
Both these queries will give the same output.
LEADING TRIM(LTRIM):
This function will remove the character/spaces from the beginning of the given string.
Query1:
SELECT
movie_title,
TRIM(LEADING '-' FROM movie_title)
FROM customer_payment_detail
Query 2:
SELECT
movie_title,
LTRIM(movie_title, '-')
FROM customer_payment_detail
TRAILING TRIM(RTRIM):
This function will remove the character/spaces from the end of the given string.
Query1:
SELECT
movie_title,
TRIM(TRAILING '-' FROM movie_title)
FROM customer_payment_detail
Query2:
SELECT
movie_title,
RTRIM(movie_title, '-')
FROM customer_payment_detail
SPLIT_PART():
SPLIT_PART() function is used to split a string on a specified delimiter and returns the nth substring.
Syntax:
SPLIT_PART(string, delimiter, position)
Query:
SELECT
SPLIT_PART(payment_date::TEXT, '-', 2) AS month_of_payment,
SPLIT_PART(payment_date::TEXT, '-', 1) AS year_of_payment,
payment_date,
amount
FROM customer_payment_detail
Here, ‘payment_date::TEXT’ converts the column ‘payment_date’ to a text string.
LPAD():
The LPAD() function fills up the string/column on the left with the given characters/text to the specified length. If the string is already longer than the length specified, then it is truncated (on the right). This is used to ensure that the data fields have a consistent length for reporting purposes.
Syntax:
LPAD(string/column_name, length, fill)
Here length refers to the total length of the resulting string after padding and fill refers to the character to pad the string with.
Query:
SELECT
first_name,
LPAD(first_name, 15, '00')
FROM customer_payment_detail
We can pad the numeric column by typecasting the numeric/integer column to character column.
Query:
SELECT
customer_id,
LPAD(customer_id::TEXT, 3, '00')
FROM customer_payment_detail
RPAD():
The RPAD() function fills up the string/column on the right with the given characters/text to the specified length. If the string is already longer than the length specified, then it is truncated (on the left). It is used in aligning data in reports by padding strings to uniform lengths.
Syntax:
RPAD(string/column_name, length, fill)
Query:
SELECT first_name,
RPAD(first_name, 15, '##')
FROM customer_payment_detail
The numeric column can be padded by typecasting the numeric/integer column to character column.
Query:
SELECT customer_id,
RPAD(customer_id::TEXT, 3, 'xy')
FROM customer_payment_detail
CONCULSION:
We have seen here some of the string functions of PostgreSQL. All these string functions help to clean the data, standardize the text format and improve the readability of the data in the final report.
Comments