Perl Weekly Challenge 110: Valid Phone Numbers

by Abigail

Challenge

You are given a text file.

Write a script to display all valid phone numbers in the given text file.

Acceptable Phone Number Formats

+nn  nnnnnnnnnn
(nn) nnnnnnnnnn
nnnn nnnnnnnnnn

Example

Input File

0044 1148820341
 +44 1148820341
  44-11-4882-0341
(44) 1148820341
  00 1148820341

Output

0044 1148820341
 +44 1148820341
(44) 1148820341

Discussion

Our eye is drawn to the first acceptable format, and the second entry in the input file. The acceptable format has no leading space, a plus sign, two digits, two spaces, and then ten digits.

However, the entry in the input file, and which, according to the output, is valid has one leading space, then a plus sign, two digits, one space, and then ten digits.

Our conclusion is that white space does not matter in the input, and hence, we will ignore any white space in the input.

So, we will match phone numbers consisting of (after remove all white space):

Solution

Our solution strategy is simple. Reading in the input line by line, for each line read, we start off by removing all white space. Then, we replace a leading plus sign with two zeros (00). And if the string starts with an opening parenthesis, two numbers, and a closing parenthesis, we replace those four characters with 0000.

The input contains a valid phone number if, and only if, we now have a line consisting of exactly fourteen digits. If so, we print the original, unmodified, line.

Perl

while (<>) {
    print if s{\s+}           {}gr     # Remove white space
          =~ s{^\+}           {00}r    # Replace leading + with 00
          =~ s{^\([0-9]{2}\)} {0000}r  # Replace leading (NN) with 0000
          =~  /^[0-9]{14}$/            # Check if 14 digits are left
}

Here, we are making use the the /r modifier of a substitution. This leaves the string matched against unmodified — instead, it causes the s/// to return a copy of the original string, with the modifications applied.

This means we can chain our modifications, and check if the result contains exactly fourteen digits.

Find the full program on GitHub.

AWK

{
    line = $0;
    gsub (/ */, "", line)                 # Remove spaces
    sub (/^\+/, "00", line)               # Replace leading + with 00
    sub (/^\([0-9]{2}\)/, "0000", line)   # Replace leading (NN) with 0000
}

In our AWK solution, we start off by making the same modification as we did in our Perl solution. We then print the number if and only we have fourteen digits left:

match (line, /^[0-9]{14}$/) {             # Match exactly 14 digits
    print
}

print without arguments prints $0, which is the original input line.

Find the full program on GitHub.

Bash

By default, if we read a line of input in bash, it splits the line into words on white space. Which may be reconcatenated with a single space if the read command has less variables as arguments than there are words on the input line. This is not good enough for us, as that means a sequence of more than one space is lost.

But we can disable this by setting the IFS variable to an empty string. Bash uses the characters in IFS to split on — but if there are no characters, the input line is left as is.

Bash doesn't pattern matching in the sense that is returns a true/false value. It does only substitution. So, instead of checking whether we're left with fourteen digits, we remove fourteen digits, and check whether we're left with an empty string.

IFS=""  # This way, we keep the spaces as is.

valid="[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]"

while read line
do    raw=${line// }                   # Remove spaces
      raw=${raw/#+/00}                 # Replace leading + with 00
      raw=${raw/#([0-9][0-9])/0000}    # Replace leading (NN) with 0000
      left=${raw/$valid}               # Remove 14 digits
      if [ "X$left" == "X" ]           # If nothing left, the input is valid
      then echo $line                  # Print it
      fi
done

Substitution in bash looks a bit different. The syntax is ${variable/pattern/replacement}. If the pattern starts with a /, it's a global replacement. If the replacement is empty, we may leave off the second /. So, ${line// } is not replacing an empty string with a space; instead, it removes all spaces.

Substitution does not modify the variable, instead, it returns the modified string (just as in Perl if the /r modifier is used).

To anchor a pattern at the beginning, Bash uses #. Parenthesis are not special, so they do not have to be escaped. Bash does not have a {NN} quantifier, so we have to spell out "fourteen times a digit".

Find the full program on GitHub.

C

As usual, C makes us work hard. For each line of input, we make a copy — sort off. We copy the line character by character. but we skip white space. And, if the first character is a +, we write two 0s instead. After copying, we check whether the first character is a (, and the fourth a ')'. If so, we replace both those characters with a 0.

char *  raw     = NULL;
size_t  len     = 0;
size_t  str_len;
while ((str_len = getline (&raw, &len, stdin)) != -1) {
    /*
     * Make a copy of line, but without the spaces
     */
    char * line;
    if ((line = (char *) malloc ((str_len + 2) * sizeof (char))) == NULL) {
        perror ("Malloc failed");
        exit (1);
    }
    char * raw_ptr  = raw;
    char * line_ptr = line;
    while (* raw_ptr) {
        /*
         * Skip white space
         */
        if (isspace (* raw_ptr)) {
            raw_ptr ++;
            continue;
        }

        /*
         * If the first character is a '+', write two 0s.
         */
        if (line_ptr == line && * raw_ptr == '+') {
            * line_ptr ++ = '0';
            * line_ptr ++ = '0';
            raw_ptr ++;
            continue;
        }
        * line_ptr ++ = * raw_ptr ++;
    }
    * line_ptr = '\0';
    /*
     * If the first character is a '(', and the fourth is a ')',
     * replace both of them with 0
     */
    if (line [0] == '(' && line [3] == ')') {
        line [0] = '0';
        line [3] = '0';
    }

Now all we need to do is check whether we have exactly fourteen digits; we print the input line if this is the case.

    bool valid = true;
    for (size_t i = 0; i < 14 && valid; i ++) {
        if (!isdigit (line [i])) {
            valid = false;
        }
    }
    if (valid && line [14] == '\0') {
        printf ("%s", raw);
    }

    free (line);
}
free (raw);

Find the full program on GitHub.

Lua

Our Lua solution is slightly different. We do remove spaces first, but then we just do a pattern match for the three cases. Note that lua regular expressions are a bit bare bones: there is no {NN} quantifier, so we have lots of [0-9] character classes.

for line in io . lines () do
    local number = line : gsub (" ", "")   -- Remove spaces
    if                   -- nnnn nnnnnnnnnn
          number : find ("^[0-9][0-9][0-9][0-9][0-9][0-9][0-9]"   ..
                          "[0-9][0-9][0-9][0-9][0-9][0-9][0-9]$") or
                         -- +nn  nnnnnnnnnn
          number : find ("^%+[0-9][0-9][0-9][0-9][0-9]"           ..
                          "[0-9][0-9][0-9][0-9][0-9][0-9][0-9]$") or
                         -- (nn) nnnnnnnnnn
          number : find ("^%([0-9][0-9]%)[0-9][0-9][0-9]"         ..
                          "[0-9][0-9][0-9][0-9][0-9][0-9][0-9]$")
    then  print (line)
    end
end

Find the full program on GitHub.

Node.js

For our Node.js solution, we're back to the original method: remove whitespace, replace leading +, replace leading (NN):

require ('readline')
. createInterface ({input: process . stdin})   
. on ('line', _ => {
    if (_ . replace (/\s+/g,      "")             // Remove spaces
          . replace (/^\+/,       "00")           // Replace leading '+' with 00
          . replace (/^\([0-9][0-9]\)/, "0000")   // Replace leading '(NN)'
                                                  // with 0000
          . match   (/^[0-9]{14}$/)) {            // Match 14 digits
        console . log (_)
    }
})

Find the full program on GitHub.

Python

As often is the case, Python makes for the most compact program, due to its clever significant white space.

import fileinput
import re

for line in fileinput . input ():
    plain = re . sub (r'\s+',             "",     line)  # Strip whitespace
    plain = re . sub (r'^\+',             "00",   plain) # Replace leading +
    plain = re . sub (r'^\([0-9][0-9]\)', "0000", plain) # Replace leading (NN)
    if re . search (r'^[0-9]{14}$', plain):              # Chech for 14 digits
        print (line, end = '')

Find the full program on GitHub.

Ruby

Same algorithm, this in in Ruby:

ARGF . each_line do |_|
     if _ .  gsub(/\s+/,           "")     # Remove white space
          .   sub(/^\+/,           "00")   # Replace leading + with 00
          .   sub(/^\([0-9]{2}\)/, "0000") # Replace leading (NN) with 0000
          . match /^[0-9]{14}$/            # Exactly 14 digits should be left
     then print (_)
     end
end

Find the full program on GitHub.


Please leave any comments as a GitHub issue.