Java Regular Expression: part 4 - Checking common formats: phone numbers, emails

Published Jan 18, 2019. Last updated Apr 05, 2019.
In this session, we are going to use regular expressions to validate some common inputs: phone numbers and email addresses.

Case 1: Checking phone numbers

Phone number formats vary from country to country, so the appropriate regular expression for validating them depends on your use case.
Let’s say I want to check phone numbers with the following format:

Country code (3 digits) – area code (2 digits) – individual phone number (7 digits)

With that in mind, users are required to input:

  • Country code: must input 3 digits, followed by a dash (-) character
  • Area code: must input 2 digits, followed by a dash (-) character
  • Individual phone number: must input 7 digits

Let’s see the following code:

import java.util.Scanner;
public class Demo {
    public static void main(String[] args) {
        boolean flag;
        Scanner sc = new Scanner(System.in);
        do {
            String phonePattern = "\\d{3}-\\d{2}-\\d{7}";
            System.out.print("Input your phone(xxx-xx-xxxxxxx): ");
            String input = sc.next();
            flag = input.matches(phonePattern);
            if (!flag) System.out.println("Invalid data!");
        } while (!flag);
        System.out.println("Valid data");
    }
}

Based on the required format, I have defined the pattern as follows:

String phonePattern = "\\d{3}-\\d{2}-\\d{7}";
  • The first part is the country code, which requires 3 digits, so I have used \d{3}
  • The second part is the area code, which contains 2 digits, so \d{2} is applied
  • The last part is the individual phone number, which contains 7 digits, hence \d{7}

Now run and test your program:

Input your phone(xxx-xx-xxxxxxx): 084-888-1234567
Invalid data!
Input your phone(xxx-xx-xxxxxxx): 084-88-123456
Invalid data!
Input your phone(xxx-xx-xxxxxxx): 084-38-1234567
Valid data

084-888-1234567: invalid because the area code contained 3 digits while this part requires only 2 digits
084-88-123456: invalid because the individual phone number contained only 6 digits, while this one should be 7 digits
084-38-1234567: completely matched the pattern
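
To re-check these three inputs without typing them each time, here is a minimal sketch that runs them through the same pattern. The class name PhoneCheck and the helper isValidPhone are my own names for illustration; they are not part of the program above.

```java
import java.util.regex.Pattern;

class PhoneCheck {
    // Same pattern as above, compiled once so it can be reused for every input.
    private static final Pattern PHONE = Pattern.compile("\\d{3}-\\d{2}-\\d{7}");

    // Hypothetical helper: true only when the whole input matches the pattern.
    static boolean isValidPhone(String input) {
        return PHONE.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"084-888-1234567", "084-88-123456", "084-38-1234567"};
        for (String s : samples) {
            System.out.println(s + " -> " + (isValidPhone(s) ? "Valid data" : "Invalid data!"));
        }
    }
}
```

Like String.matches, Matcher.matches only succeeds when the entire input fits the pattern, so extra digits at either end are rejected.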

Case 2: Checking phone number format with optional part using grouping technique

Usually, when we make a phone call inside a country (at least in my country), we do not need to dial the country code. Therefore, it is convenient for users if they can optionally input the country code.
To achieve this, we can use a technique called grouping. Grouping is a mechanism by which a sequence of regular expression characters is treated as a single unit. By placing certain pattern characters in a group, we can allow users either to input the whole group or to omit it entirely.
Therefore, to make the country code optional, we place the country code part in a group followed by the appropriate quantifier.
Let’s check out the following code:

import java.util.Scanner;
public class Demo {
    public static void main(String[] args) {
        boolean flag;
        Scanner sc = new Scanner(System.in);
        do {
            String phonePattern = "(\\d{3}-)?\\d{2}-\\d{7}";
            System.out.print("Input your phone(xxx-xx-xxxxxxx): ");
            String input = sc.next();
            flag = input.matches(phonePattern);
            if (!flag) System.out.println("Invalid data!");
        } while (!flag);
        System.out.println("Valid data");
    }
}

I have defined the pattern as follows:

String phonePattern = "(\\d{3}-)?\\d{2}-\\d{7}";

Basically, it is the same pattern as before, but I have placed the first part (\d{3}-) into a group using parentheses.
Every character placed inside a pair of parentheses belongs to a group. You can have as many groups as you see fit in a pattern.
Following our group is the question mark (?). This is where the optional part comes in.
The question mark (?) means 0 or 1 occurrences: users can omit the entire group, or they can input the whole group exactly once. That makes sense because we don’t want the country code to appear more than once.
It’s time to run the program:

Input your phone(xxx-xx-xxxxxxx): 123-1234567
Invalid data!
Input your phone(xxx-xx-xxxxxxx): 12-12-1234567
Invalid data!
Input your phone(xxx-xx-xxxxxxx): 084-38-1234567
Valid data

123-1234567: invalid because, with the country code omitted, the first part (123) had 3 digits while the area code requires 2
12-12-1234567: invalid as well because the country code had only 2 digits. Note that, since we put the country code in a group followed by the question mark, users can ignore the group. But if users choose to input, the entire group must be provided
084-38-1234567: completely matched the pattern
Let’s run the program again:

Input your phone(xxx-xx-xxxxxxx): 38-1234567
Valid data

38-1234567: also valid because we could skip the country code, the other parts matched the pattern
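
A side effect of grouping is that the matched text of a group can be read back afterwards. The sketch below, which assumes the same pattern, uses Matcher.group(1) to tell whether the country code was supplied. The class name OptionalGroupCheck and the helper describe are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalGroupCheck {
    // Same pattern as above; the parentheses make the country code part group 1.
    private static final Pattern PHONE = Pattern.compile("(\\d{3}-)?\\d{2}-\\d{7}");

    // Hypothetical helper: reports whether the input matched and, if so,
    // whether the optional country code group took part in the match.
    static String describe(String input) {
        Matcher m = PHONE.matcher(input);
        if (!m.matches()) {
            return "no match";
        }
        String countryPart = m.group(1); // null when the optional group was skipped
        return countryPart == null ? "no country code" : "country code part: " + countryPart;
    }

    public static void main(String[] args) {
        String[] samples = {"123-1234567", "12-12-1234567", "084-38-1234567", "38-1234567"};
        for (String s : samples) {
            System.out.println(s + " -> " + describe(s));
        }
    }
}
```

Note that the captured text includes the trailing dash, because the dash sits inside the parentheses in the pattern.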

Case 3: Checking email formats

In real-life applications, different software providers accept different email formats. There is no single pattern that can be used to validate all of them, so the appropriate pattern depends on the rules of each case.
Let’s start with a simple email format:

email@address.com

We can split this simple email sample into 5 parts:

  • The first part is the user name (email): this part can contain alphabetical characters and digits with min of 3 and max of 15 characters allowed
  • The second part is the at (@) sign
  • The third part is the domain name (address): this part can contain alphabetical characters and digits with min of 3 and max of 15 characters allowed
  • The fourth part is the dot (.) character
  • The fifth part is the domain extension (com): this part can contain alphabetical characters only with min of 2 and max of 5 characters allowed

The program:

import java.util.Scanner;
public class Demo {
    public static void main(String[] args) {
        boolean flag;
        Scanner sc = new Scanner(System.in);
        do {
            String emailPattern = "[a-zA-Z0-9]{3,15}@[a-zA-Z0-9]{3,15}[.][a-zA-Z]{2,5}";
            System.out.print("Input your email(email@address.com): ");
            String input = sc.next();
            flag = input.matches(emailPattern);
            if (!flag) System.out.println("Invalid data!");
        } while (!flag);
        System.out.println("Valid data");
    }
}

In the program, I used the pattern:

String emailPattern = "[a-zA-Z0-9]{3,15}@[a-zA-Z0-9]{3,15}[.][a-zA-Z]{2,5}";
  • The first part can contain letters and digits so [a-zA-Z0-9] is applied. Note I did not use \w because \w includes underscore (_) as well
  • The second part is the at (@) sign, so I just placed it there
  • The third part is the domain name. This one is similar to the first part
  • The fourth part is the dot (.) character. Pay attention to this one. In a regular expression, the dot stands for any character. But in this part, we want users to input an actual dot, not any character, so we need to place it in square brackets to treat it as a normal character. Another way to achieve this is to escape the dot with a backslash: \. (written as \\. inside a Java string literal)
  • The fifth part is the domain extension which allows only alphabetical characters. So, the pattern [a-zA-Z] is enough.
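
To see why the fourth part matters, the sketch below compares three variants of the pattern: one with a bare dot, one with the dot in square brackets, and one with the dot escaped. The class name DotCheck, the constant names, and the sample input email@gmail#com are made up for this illustration.

```java
class DotCheck {
    // Bare dot: matches ANY single character in that position.
    static final String UNESCAPED = "[a-zA-Z0-9]{3,15}@[a-zA-Z0-9]{3,15}.[a-zA-Z]{2,5}";
    // Dot inside a character class: matches a literal dot only.
    static final String BRACKETED = "[a-zA-Z0-9]{3,15}@[a-zA-Z0-9]{3,15}[.][a-zA-Z]{2,5}";
    // Escaped dot: also a literal dot; the backslash is doubled in a Java string.
    static final String ESCAPED = "[a-zA-Z0-9]{3,15}@[a-zA-Z0-9]{3,15}\\.[a-zA-Z]{2,5}";

    public static void main(String[] args) {
        String[] inputs = {"email@gmail#com", "email@gmail.com"};
        for (String input : inputs) {
            System.out.println(input
                    + " unescaped=" + input.matches(UNESCAPED)
                    + " bracketed=" + input.matches(BRACKETED)
                    + " escaped=" + input.matches(ESCAPED));
        }
    }
}
```

The bare-dot variant accepts email@gmail#com because the # is taken as "any character", while the bracketed and escaped variants reject it.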

Now let’s run the program:

Input your email(email@address.com): email@gmail.
Invalid data!
Input your email(email@address.com): emailgmail.com
Invalid data!
Input your email(email@address.com): email@gmail.com
Valid data

email@gmail.: invalid because the domain extension was missing after the dot
emailgmail.com: invalid because there was no @ sign
email@gmail.com: completely matched the pattern.

Now let’s upgrade our email pattern a little bit.
Some people have email addresses like:

email@gmail.com.us

As you can see, this email address has a second domain extension (.us), and that raises another requirement: we should allow users to optionally input another domain extension. But remember, users must either input the whole second domain extension or omit it entirely.
Now it’s time to apply the grouping technique again.
There are 2 ways to achieve this using grouping.
The first, longer way:

String emailPattern = "[a-zA-Z0-9]{3,15}@[a-zA-Z0-9]{3,15}[.][a-zA-Z]{2,5}([.][a-zA-Z]{2,5})?";

The second, shorter way:

String emailPattern = "[a-zA-Z0-9]{3,15}@[a-zA-Z0-9]{3,15}([.][a-zA-Z]{2,5}){1,2}";

I’ll leave you to test the program yourself here.
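
As a starting point for that testing, here is a minimal sketch that runs a few inputs through both patterns and checks that they give the same answer. The class name EmailExtensionCheck, the helper bothAgree, and the sample inputs are my own choices for illustration.

```java
class EmailExtensionCheck {
    // The long way: one required extension plus an optional second one.
    static final String LONG_FORM =
            "[a-zA-Z0-9]{3,15}@[a-zA-Z0-9]{3,15}[.][a-zA-Z]{2,5}([.][a-zA-Z]{2,5})?";
    // The short way: the extension group repeated one or two times.
    static final String SHORT_FORM =
            "[a-zA-Z0-9]{3,15}@[a-zA-Z0-9]{3,15}([.][a-zA-Z]{2,5}){1,2}";

    // Hypothetical helper: true when both patterns accept or both reject the input.
    static boolean bothAgree(String input) {
        return input.matches(LONG_FORM) == input.matches(SHORT_FORM);
    }

    public static void main(String[] args) {
        String[] samples = {
            "email@gmail.com",       // one extension
            "email@gmail.com.us",    // two extensions
            "email@gmail",           // no extension
            "email@gmail.com.us.uk", // three extensions
            "email@gmail.com."       // trailing dot without an extension
        };
        for (String s : samples) {
            System.out.println(s + " long=" + s.matches(LONG_FORM)
                    + " short=" + s.matches(SHORT_FORM)
                    + " agree=" + bothAgree(s));
        }
    }
}
```

Only the first two samples are accepted, and both patterns agree on every sample in this list.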

--

Visit learnbyproject.net for free Regular Expression courses and other free courses.

Discover and read more posts from Sera.Ng