# Perl Weekly Challenge 110: Valid Phone Numbers

by Abigail ## Challenge

You are given a text file.

Write a script to display all valid phone numbers in the given text file.

### Acceptable Phone Number Formats

+nn  nnnnnnnnnn
(nn) nnnnnnnnnn
nnnn nnnnnnnnnn


## Example

### Input File

0044 1148820341
+44 1148820341
44-11-4882-0341
(44) 1148820341
00 1148820341


### Output

0044 1148820341
+44 1148820341
(44) 1148820341


## Discussion

Our eye is drawn to the first acceptable format, and the second entry in the input file. The acceptable format has no leading space, a plus sign, two digits, two spaces, and then ten digits.

However, the entry in the input file, and which, according to the output, is valid has one leading space, then a plus sign, two digits, one space, and then ten digits.

Our conclusion is that white space does not matter in the input, and hence, we will ignore any white space in the input.

So, we will match phone numbers consisting of (after remove all white space):

• A plus sign (+) followed by twelve digits.
• Opening paren ((), two digits, closing paren ()), then ten digits.
• Fourteen digits.

## Solution

Our solution strategy is simple. Reading in the input line by line, for each line read, we start off by removing all white space. Then, we replace a leading plus sign with two zeros (00). And if the string starts with an opening parenthesis, two numbers, and a closing parenthesis, we replace those four characters with 0000.

The input contains a valid phone number if, and only if, we now have a line consisting of exactly fourteen digits. If so, we print the original, unmodified, line.

### Perl

while (<>) {
print if s{\s+}           {}gr     # Remove white space
=~ s{^\+}           {00}r    # Replace leading + with 00
=~ s{^$$[0-9]{2}$$} {0000}r  # Replace leading (NN) with 0000
=~  /^[0-9]{14}$/ # Check if 14 digits are left }  Here, we are making use the the /r modifier of a substitution. This leaves the string matched against unmodified — instead, it causes the s/// to return a copy of the original string, with the modifications applied. This means we can chain our modifications, and check if the result contains exactly fourteen digits. Find the full program on GitHub. ### AWK { line =$0;
gsub (/ */, "", line)                 # Remove spaces
sub (/^\+/, "00", line)               # Replace leading + with 00
sub (/^$$[0-9]{2}$$/, "0000", line)   # Replace leading (NN) with 0000
}


In our AWK solution, we start off by making the same modification as we did in our Perl solution. We then print the number if and only we have fourteen digits left:

match (line, /^[0-9]{14}$/) { # Match exactly 14 digits print }  print without arguments prints $0, which is the original input line.

Find the full program on GitHub.

### Bash

By default, if we read a line of input in bash, it splits the line into words on white space. Which may be reconcatenated with a single space if the read command has less variables as arguments than there are words on the input line. This is not good enough for us, as that means a sequence of more than one space is lost.

But we can disable this by setting the IFS variable to an empty string. Bash uses the characters in IFS to split on — but if there are no characters, the input line is left as is.

Bash doesn't pattern matching in the sense that is returns a true/false value. It does only substitution. So, instead of checking whether we're left with fourteen digits, we remove fourteen digits, and check whether we're left with an empty string.

IFS=""  # This way, we keep the spaces as is.

valid="[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]"

do    raw=${line// } # Remove spaces raw=${raw/#+/00}                 # Replace leading + with 00
raw=${raw/#([0-9][0-9])/0000} # Replace leading (NN) with 0000 left=${raw/$valid} # Remove 14 digits if [ "X$left" == "X" ]           # If nothing left, the input is valid
then echo $line # Print it fi done  Substitution in bash looks a bit different. The syntax is ${variable/pattern/replacement}. If the pattern starts with a /, it's a global replacement. If the replacement is empty, we may leave off the second /. So, ${line// } is not replacing an empty string with a space; instead, it removes all spaces. Substitution does not modify the variable, instead, it returns the modified string (just as in Perl if the /r modifier is used). To anchor a pattern at the beginning, Bash uses #. Parenthesis are not special, so they do not have to be escaped. Bash does not have a {NN} quantifier, so we have to spell out "fourteen times a digit". Find the full program on GitHub. ### C As usual, C makes us work hard. For each line of input, we make a copy — sort off. We copy the line character by character. but we skip white space. And, if the first character is a +, we write two 0s instead. After copying, we check whether the first character is a (, and the fourth a ')'. If so, we replace both those characters with a 0. char * raw = NULL; size_t len = 0; size_t str_len; while ((str_len = getline (&raw, &len, stdin)) != -1) { /* * Make a copy of line, but without the spaces */ char * line; if ((line = (char *) malloc ((str_len + 2) * sizeof (char))) == NULL) { perror ("Malloc failed"); exit (1); } char * raw_ptr = raw; char * line_ptr = line; while (* raw_ptr) { /* * Skip white space */ if (isspace (* raw_ptr)) { raw_ptr ++; continue; } /* * If the first character is a '+', write two 0s. */ if (line_ptr == line && * raw_ptr == '+') { * line_ptr ++ = '0'; * line_ptr ++ = '0'; raw_ptr ++; continue; } * line_ptr ++ = * raw_ptr ++; } * line_ptr = '\0'; /* * If the first character is a '(', and the fourth is a ')', * replace both of them with 0 */ if (line  == '(' && line  == ')') { line  = '0'; line  = '0'; }  Now all we need to do is check whether we have exactly fourteen digits; we print the input line if this is the case.  bool valid = true; for (size_t i = 0; i < 14 && valid; i ++) { if (!isdigit (line [i])) { valid = false; } } if (valid && line  == '\0') { printf ("%s", raw); } free (line); } free (raw);  Find the full program on GitHub. ### Lua Our Lua solution is slightly different. We do remove spaces first, but then we just do a pattern match for the three cases. Note that lua regular expressions are a bit bare bones: there is no {NN} quantifier, so we have lots of [0-9] character classes. for line in io . lines () do local number = line : gsub (" ", "") -- Remove spaces if -- nnnn nnnnnnnnnn number : find ("^[0-9][0-9][0-9][0-9][0-9][0-9][0-9]" .. "[0-9][0-9][0-9][0-9][0-9][0-9][0-9]$") or
-- +nn  nnnnnnnnnn
number : find ("^%+[0-9][0-9][0-9][0-9][0-9]"           ..
"[0-9][0-9][0-9][0-9][0-9][0-9][0-9]$") or -- (nn) nnnnnnnnnn number : find ("^%([0-9][0-9]%)[0-9][0-9][0-9]" .. "[0-9][0-9][0-9][0-9][0-9][0-9][0-9]$")
then  print (line)
end
end


Find the full program on GitHub.

### Node.js

For our Node.js solution, we're back to the original method: remove whitespace, replace leading +, replace leading (NN):

require ('readline')
. createInterface ({input: process . stdin})
. on ('line', _ => {
if (_ . replace (/\s+/g,      "")             // Remove spaces
. replace (/^\+/,       "00")           // Replace leading '+' with 00
. replace (/^$$[0-9][0-9]$$/, "0000")   // Replace leading '(NN)'
// with 0000
. match   (/^[0-9]{14}$/)) { // Match 14 digits console . log (_) } })  Find the full program on GitHub. ### Python As often is the case, Python makes for the most compact program, due to its clever significant white space. import fileinput import re for line in fileinput . input (): plain = re . sub (r'\s+', "", line) # Strip whitespace plain = re . sub (r'^\+', "00", plain) # Replace leading + plain = re . sub (r'^$$[0-9][0-9]$$', "0000", plain) # Replace leading (NN) if re . search (r'^[0-9]{14}$', plain):              # Chech for 14 digits
print (line, end = '')


Find the full program on GitHub.

### Ruby

Same algorithm, this in in Ruby:

ARGF . each_line do |_|
if _ .  gsub(/\s+/,           "")     # Remove white space
.   sub(/^\+/,           "00")   # Replace leading + with 00
.   sub(/^$$[0-9]{2}$$/, "0000") # Replace leading (NN) with 0000
. match /^[0-9]{14}\$/            # Exactly 14 digits should be left
then print (_)
end
end


Find the full program on GitHub.