
Regex: Search, Edit and Validate Text


Why we need Regex:

Regular expressions, or regex for short, are supported in Java through an API for defining string patterns that can be used for searching, manipulating, and editing text. The search pattern can be anything from a single character or a fixed string to a complex expression containing special characters that describe the pattern.

A few points to note:

  1. The regex is applied to the text from left to right.

  2. Email and password validation are common areas where regex is used to define constraints on strings.

  3. Regular expressions come in many different flavors, such as those of Java, Python, PHP and many more. This means that a regular expression that works in one programming language may not work in another.

  4. To use regular expressions in Java, we don't need any special setup. The JDK contains a package, java.util.regex, dedicated to regex operations. We only need to import it into our code.

  5. Once a source character has been used in a match, it cannot be reused.

For example, consider the string “xyzyxyzzyx” and the regex ‘xyz’. Applying the regex from left to right, it matches the string at two places, marked here as “xyz_xyz___”. The characters consumed by the first match are not considered again when looking for the second.
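The left-to-right, non-overlapping behaviour described above can be checked with a small sketch using java.util.regex. The class name MatchPositions, the helper matchStarts() and the second sample string "aaaa" are assumptions made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchPositions {

    // Returns the start index of every match found while scanning text from left to right.
    static List<Integer> matchStarts(String regex, String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        // Each call to find() resumes after the end of the previous match,
        // so characters that were already consumed are not matched again.
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(matchStarts("xyz", "xyzyxyzzyx")); // prints [0, 4]
        System.out.println(matchStarts("aa", "aaaa"));        // prints [0, 2], not [0, 1, 2]
    }
}
```

The second call shows point 5 in isolation: "aa" could start at index 1 as well, but that character was already used by the first match.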



Types of Regular Expressions:

The following description is an overview of available meta characters which can be used in regular expressions.


Common matching Symbols include:


Regular Expression Description

. Matches any character (except line terminators, by default)

^regex Finds regex that must match at the beginning of the line.

regex$ Finds regex that must match at the end of the line.

[abc] Set definition, can match the letter a or b or c.

[abc][vz] Set definition, can match a or b or c followed by either v or z.

[^abc] When a caret appears as the first character inside square brackets, it negates the pattern. This pattern matches any character except a or b or c.

[a-d1-7] Ranges: matches a single letter between a and d or a single digit from 1 to 7, but not the two-character sequence d1.

X|Z Finds X or Z.

XZ Finds X directly followed by Z.

$ Checks if a line end follows.
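As a rough illustration of the symbols in the table, the following sketch tries a few of them with Matcher.find(), which looks for a match anywhere in the text. The class name SymbolDemo, the helper found() and all sample strings are assumptions chosen for this example.

```java
import java.util.regex.Pattern;

class SymbolDemo {

    // True if regex matches somewhere inside text (not necessarily the whole text).
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found("h.t", "hot"));         // true: . stands for any one character
        System.out.println(found("^cat", "catalog"));    // true: cat at the beginning
        System.out.println(found("^cat", "bobcat"));     // false: cat is not at the beginning
        System.out.println(found("cat$", "bobcat"));     // true: cat at the end
        System.out.println(found("[abc][vz]", "xbz"));   // true: b followed by z
        System.out.println(found("[^abc]", "abc"));      // false: every character is a, b or c
        System.out.println(found("[^abc]", "abcd"));     // true: d is not a, b or c
        System.out.println(found("[a-d1-7]", "x5"));     // true: 5 lies in the range 1-7
        System.out.println(found("[a-d1-7]", "x9"));     // false: neither x nor 9 is in a range
        System.out.println(found("X|Z", "Zebra"));       // true: Z is one of the alternatives
    }
}
```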


Meta characters

The following meta characters have a pre-defined meaning and make certain common patterns easier to use. For example, you can use \d as a shorthand for [0-9].

Regular Expression Description

\d Any digit, short for [0-9]

\D A non-digit, short for [^0-9]

\s A whitespace character, short for [ \t\n\x0b\r\f]

\S A non-whitespace character, short for [^\s]

\w A word character, short for [a-zA-Z_0-9]

\W A non-word character, short for [^\w]

\S+ One or more non-whitespace characters

\b Matches a word boundary where a word character is [a-zA-Z0-9_]
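A minimal sketch of how these meta characters behave in practice is shown here. Note that inside a Java string literal each backslash has to be doubled, so \d is written as "\\d". The class MetaCharDemo, its helper methods and the sample inputs are assumptions made up for illustration.

```java
import java.util.Arrays;

class MetaCharDemo {

    // Replaces every digit (\d) with '#'.
    static String maskDigits(String s) {
        return s.replaceAll("\\d", "#");
    }

    // Removes every non-digit (\D), keeping only the digits.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // Splits on runs of whitespace (\s+): spaces, tabs, newlines and so on.
    static String[] words(String s) {
        return s.split("\\s+");
    }

    // Replaces "cat" only where it stands as a whole word (\b on both sides).
    static String catToDog(String s) {
        return s.replaceAll("\\bcat\\b", "dog");
    }

    // Removes every non-word character (\W).
    static String wordCharsOnly(String s) {
        return s.replaceAll("\\W", "");
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("A1 b22 c333"));               // A# b## c###
        System.out.println(digitsOnly("A1 b22 c333"));               // 122333
        System.out.println(Arrays.toString(words("a \t b\nc")));     // [a, b, c]
        System.out.println(catToDog("cat concat cat"));              // dog concat dog
        System.out.println(wordCharsOnly("user_name-42"));           // user_name42
    }
}
```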


Quantifiers


A quantifier defines how often an element can occur. The symbols ?, *, + and {} are quantifiers.

Regular Expression Examples

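* Occurs zero or more times. X* finds zero or more occurrences of the letter X; .* finds any character sequence.

+ Occurs one or more times. X+ finds one or more occurrences of the letter X.

? Occurs zero or one time. X? finds zero or exactly one letter X.

{X} Occurs exactly X times. \d{3} searches for three digits, .{10} for any character sequence of length 10.

{X,Y} Occurs between X and Y times. \d{1,4} means \d must occur at least once and at most four times.

*? A ? after a quantifier makes it reluctant (lazy): it tries to find the smallest match, so the regular expression stops at the first possible match.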
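The difference between a greedy quantifier and its reluctant form is easiest to see on a concrete input. The sketch below is only an illustration: the class QuantifierDemo, the helper firstMatch() and the sample strings are assumptions, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {

    // Returns the text of the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b> and <i>italic</i>";

        // Greedy: .* grabs as much as possible, so the match runs to the last '>'.
        System.out.println(firstMatch("<.*>", html));   // <b>bold</b> and <i>italic</i>

        // Reluctant: .*? grabs as little as possible, so the match ends at the first '>'.
        System.out.println(firstMatch("<.*?>", html));  // <b>

        System.out.println(firstMatch("X*", "XXXY"));             // XXX
        System.out.println(firstMatch("X+", "YYY"));              // null: at least one X is required
        System.out.println(firstMatch("colou?r", "color"));       // color: the u is optional
        System.out.println(firstMatch("\\d{3}", "order 12345"));  // 123
        System.out.println(firstMatch("\\d{1,4}", "order 12345")); // 1234
    }
}
```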


Using Regular Expressions with String Methods

Strings in Java have built-in support for regular expressions through four methods: matches(), split(), replaceFirst() and replaceAll().

The replace() method does NOT support regular expressions.


Method Description

s.matches("regex") Evaluates if "regex" matches s. Returns true only if the WHOLE string can be matched.

s.split("regex") Creates an array with substrings of s divided at each occurrence of "regex". "regex" is not included in the result.

s.replaceFirst("regex", "replacement") Replaces the first occurrence of "regex" with "replacement".

s.replaceAll("regex", "replacement") Replaces all occurrences of "regex" with "replacement".
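The four methods, and the contrast with replace(), can be tried out in a short sketch like the following. The class name StringRegexDemo, the helper splitList() and the sample strings are assumptions introduced only for this example.

```java
import java.util.Arrays;

class StringRegexDemo {

    // Splits on a comma or semicolon plus any whitespace that follows it.
    static String[] splitList(String s) {
        return s.split("[,;]\\s*");
    }

    public static void main(String[] args) {
        String s = "one, two;three";
        System.out.println(s.matches("one"));               // false: only part of s matches
        System.out.println(s.matches("one.*"));             // true: the whole string is covered
        System.out.println(Arrays.toString(splitList(s)));  // [one, two, three]

        String dotted = "a.b.c";
        System.out.println(dotted.replaceFirst("\\.", "-")); // a-b.c
        System.out.println(dotted.replaceAll("\\.", "-"));   // a-b-c
        System.out.println(dotted.replace(".", "-"));        // a-b-c (literal text, no regex)
        System.out.println(dotted.replaceAll(".", "-"));     // ----- (unescaped . matches any character)
    }
}
```

The last two lines show why the distinction matters: replace() treats "." as plain text, while replaceAll() treats it as the "any character" symbol unless it is escaped.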


Java Regex Examples:

Program 1:

Write a regular expression which matches a text line if this text line contains either the word "Hari" or the word "Papa" or both.


Source Code:


import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class EitherOrCheck
{
    @Test
    public void testSimpleTrue()
    {
        // matches() must cover the whole string, so the alternation is wrapped in .*
        String s = "welcome Hari";
        assertTrue(s.matches(".*(Hari|Papa).*"));

        s = "Welcome Marry";
        assertFalse(s.matches(".*(Hari|Papa).*"));

        s = "Welcome Papa";
        assertTrue(s.matches(".*(Hari|Papa).*"));

        s = "Welcome Hari Papa";
        assertTrue(s.matches(".*(Hari|Papa).*"));
    }
}


Program 2:

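Write a regular expression which matches a simple phone number. In this example a phone number consists either of seven digits in a row, or of three digits, a whitespace or a comma, and then four digits.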


Source Code:


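import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class CheckPhone
{
    @Test
    public void testSimpleTrue()
    {
        // three digits, an optional comma or whitespace, then four digits
        String pattern = "\\d\\d\\d([,\\s])?\\d\\d\\d\\d";

        // ten digits: too long for the pattern
        String s = "1233323322";
        assertFalse(s.matches(pattern));

        s = "1233323";
        assertTrue(s.matches(pattern));

        s = "123 3323";
        assertTrue(s.matches(pattern));
    }
}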
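The repeated \d in Program 2 can also be written with the {X} quantifier described earlier. The following is a sketch of that alternative, not a replacement for the test above: the class PhoneCheck, the constant PHONE and the method isPhone() are hypothetical names, and the pattern is precompiled with Pattern.compile() so that it is not recompiled on every call.

```java
import java.util.regex.Pattern;

class PhoneCheck {

    // Same rule as Program 2: three digits, an optional comma or whitespace, four digits.
    private static final Pattern PHONE = Pattern.compile("\\d{3}[,\\s]?\\d{4}");

    // True only if the whole input fits the pattern.
    static boolean isPhone(String s) {
        return PHONE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPhone("1233323322")); // false
        System.out.println(isPhone("1233323"));    // true
        System.out.println(isPhone("123 3323"));   // true
        System.out.println(isPhone("123,3323"));   // true
    }
}
```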


Program 3:

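We can also validate an email id (address) with regex. The program below uses String.matches(), which gives the same result as calling java.util.regex.Pattern.matches() with the regex and the string. It matches the given email id against the regex and returns true if the email is valid.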


Source code:


public class EmailDemo
{
    static boolean isValidemail(String email)
    {
        // regex to validate email
        String regex = "^[\\w-_\\.+]*[\\w-_\\.]\\@([\\w]+\\.)+[\\w]+[\\w]$";

        // match email id with regex and return the result
        return email.matches(regex);
    }

    public static void main(String[] args)
    {
        String email = "ssthva@gmail.com";
        System.out.println("The Email ID is: " + email);
        System.out.println("Email ID valid? " + isValidemail(email));

        email = "@sth@gmail.com";
        System.out.println("The Email ID is: " + email);
        System.out.println("Email ID valid? " + isValidemail(email));
    }
}


Output:

The Email ID is: ssthva@gmail.com
Email ID valid? true
The Email ID is: @sth@gmail.com
Email ID valid? false


As the output shows, the first email id is valid. The second id starts directly with @, so the regex does not match it and the id is reported as invalid.
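For comparison, here is a minimal sketch of the same kind of check written directly against java.util.regex.Pattern. It assumes a simplified pattern that is not the regex from Program 3 and does not cover every valid address. The class EmailPatternSketch and its members are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

class EmailPatternSketch {

    // Simplified pattern (an assumption for this sketch): a local part of word characters,
    // dots, plus or minus signs, then @, then dot-separated labels of word characters.
    static final String EMAIL_REGEX = "[\\w.+-]+@(\\w+\\.)+\\w{2,}";

    // Compiled once so that repeated checks do not recompile the regex.
    private static final Pattern EMAIL = Pattern.compile(EMAIL_REGEX);

    // One-off check: Pattern.matches() compiles the regex on every call.
    static boolean isValidOneOff(String email) {
        return Pattern.matches(EMAIL_REGEX, email);
    }

    // Reusable check based on the precompiled pattern.
    static boolean isValid(String email) {
        return EMAIL.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidOneOff("ssthva@gmail.com")); // true
        System.out.println(isValid("ssthva@gmail.com"));       // true
        System.out.println(isValid("@sth@gmail.com"));         // false
    }
}
```

Both methods require the whole string to match, just like String.matches(); the precompiled form is mainly worth it when many addresses are checked.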


Conclusion:

That’s all for regular expressions in Java. In this blog we discussed meta characters, some common symbols and quantifiers in regex, with a few examples. Java regex seems hard at first, but after working with it for some time it becomes easy to learn and use.

