You are given a text file.
Write a script to display all valid phone numbers in the given text file.
Acceptable formats:
+nn  nnnnnnnnnn
(nn) nnnnnnnnnn
nnnn nnnnnnnnnn
Input file:
0044 1148820341
 +44 1148820341
44-11-4882-0341
(44) 1148820341
00 1148820341
Output:
0044 1148820341
 +44 1148820341
(44) 1148820341
Our eye is drawn to the first acceptable format, and the second entry in the input file. The acceptable format has no leading space, a plus sign, two digits, two spaces, and then ten digits.
However, the corresponding entry in the input file, which is valid according to the output, has one leading space, then a plus sign, two digits, one space, and then ten digits.
Our conclusion is that white space does not matter in the input, and hence we will ignore any white space in the input.
So, after removing all white space, we will match phone numbers consisting of one of:
Fourteen digits.
A plus sign (+) followed by twelve digits.
An opening paren ((), two digits, a closing paren ()), then ten digits.
Our solution strategy is simple. Reading the input line by line, we start off by removing all white space from each line. Then we replace a leading plus sign with two zeros (00). And if the string starts with an opening parenthesis, two digits, and a closing parenthesis, we replace those four characters with 0000.
The input contains a valid phone number if, and only if, we now have a line consisting of exactly fourteen digits. If so, we print the original, unmodified, line.
while (<>) {
    print if  s{\s+} {}gr                  # Remove white space
           =~ s{^\+} {00}r                 # Replace leading + with 00
           =~ s{^\([0-9]{2}\)} {0000}r     # Replace leading (NN) with 0000
           =~ /^[0-9]{14}$/                # Check if 14 digits are left
}
Here, we are making use of the /r modifier of a substitution. This leaves the string matched against unmodified; instead, it causes the s/// to return a copy of the original string, with the modifications applied.
This means we can chain our modifications, and check if the result contains exactly fourteen digits.
Find the full program on GitHub.
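To trace the strategy on the sample input, here is a minimal Python sketch that prints the intermediate, normalised string for each entry. The helper names normalise and is_valid are made up for this illustration, and the list of samples is simply the input file from above.

```python
import re

def normalise(entry: str) -> str:
    """Apply the three rewriting steps of the strategy to one input line."""
    plain = re.sub(r"\s+", "", entry)                # Remove all white space
    plain = re.sub(r"^\+", "00", plain)              # Leading + becomes 00
    plain = re.sub(r"^\([0-9]{2}\)", "0000", plain)  # Leading (NN) becomes 0000
    return plain

def is_valid(entry: str) -> bool:
    """Valid if, and only if, exactly fourteen digits are left."""
    return re.fullmatch(r"[0-9]{14}", normalise(entry)) is not None

# The five entries of the sample input file.
samples = [
    "0044 1148820341",
    " +44 1148820341",
    "44-11-4882-0341",
    "(44) 1148820341",
    "00 1148820341",
]

for entry in samples:
    print(f"{entry!r:20} -> {normalise(entry)!r:20} valid={is_valid(entry)}")
```

The first, second and fourth entries all end up as fourteen digits; the entry with dashes keeps its dashes, and the last one is left with only twelve digits.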
{
    line = $0;
    gsub (/ */,             "",     line)    # Remove spaces
    sub  (/^\+/,            "00",   line)    # Replace leading + with 00
    sub  (/^\([0-9]{2}\)/,  "0000", line)    # Replace leading (NN) with 0000
}
In our AWK solution, we start off by making the same modifications as we did in our Perl solution. We then print the number if and only if we have fourteen digits left:
match (line, /^[0-9]{14}$/) {    # Match exactly 14 digits
    print
}
print without arguments prints $0, which is the original input line.
Find the full program on GitHub.
By default, if we read a line of input in Bash, it splits the line into words on white space. If the read command has fewer variables as arguments than there are words on the input line, the last variable receives the remaining words with their intervening white space, but leading and trailing white space is still stripped. This is not good enough for us, as we want to print the input line exactly as it was.
But we can disable this by setting the IFS variable to an empty string. Bash uses the characters in IFS to split on; if there are no characters, the input line is left as is.
Bash's parameter expansion doesn't do pattern matching in the sense that it returns a true/false value; it only does substitution. So, instead of checking whether we're left with fourteen digits, we remove fourteen digits, and check whether we're left with an empty string.
IFS=""     # This way, we keep the spaces as is.
valid="[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]"
while read line
do  raw=${line// }                       # Remove spaces
    raw=${raw/#+/00}                     # Replace leading + with 00
    raw=${raw/#([0-9][0-9])/0000}        # Replace leading (NN) with 0000
    left=${raw/$valid}                   # Remove 14 digits
    # The input is valid if we had something, and nothing is left.
    # (Without the first test, blank lines would be printed as well.)
    if   [ "X$raw" != "X" ] && [ "X$left" == "X" ]
    then echo "$line"                    # Print it, quoted to avoid globbing
    fi
done
Substitution in Bash looks a bit different. The syntax is ${variable/pattern/replacement}. If the pattern starts with a /, it's a global replacement. If the replacement is empty, we may leave off the second /. So, ${line// } is not replacing an empty string with a space; instead, it removes all spaces.
Substitution does not modify the variable; instead, it returns the modified string (just as in Perl if the /r modifier is used).
To anchor a pattern at the beginning, Bash uses #. Parentheses are not special, so they do not have to be escaped. Bash does not have a {NN} quantifier, so we have to spell out "fourteen times a digit".
Find the full program on GitHub.
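The "remove fourteen digits and see whether anything is left" check is not specific to Bash. As a rough illustration, here is the same idea sketched in Python; the function name only_fourteen_digits is hypothetical, and the sketch adds one guard of its own so that an empty string is not accepted.

```python
import re

def only_fourteen_digits(raw: str) -> bool:
    # Remove the first run of fourteen digits, at most once; this plays
    # the role of ${raw/$valid} in the Bash code.
    left = re.sub(r"[0-9]{14}", "", raw, count=1)
    # Extra guard (an assumption of this sketch): an empty input would
    # also leave nothing behind, so insist that there was something.
    return raw != "" and left == ""

print(only_fourteen_digits("00441148820341"))    # Fourteen digits
print(only_fourteen_digits("004411488203411"))   # Fifteen digits: one is left
print(only_fourteen_digits("001148820341"))      # Twelve digits: nothing removed
```

Because only a single run of fourteen digits is removed, any extra digit or other character survives the substitution and makes the check fail.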
As usual, C makes us work hard. For each line of input, we make a copy, sort of. We copy the line character by character, but we skip white space. And, if the first character is a +, we write two 0s instead.
After copying, we check whether the first character is a (, and the fourth a ). If so, we replace both those characters with a 0.
char *  raw = NULL;
size_t  len = 0;
ssize_t str_len;    /* getline () returns a signed value: -1 on EOF or error */

while ((str_len = getline (&raw, &len, stdin)) != -1) {
    /*
     * Make a copy of line, but without the spaces
     */
    char * line;
    if ((line = (char *) malloc ((str_len + 2) * sizeof (char))) == NULL) {
        perror ("Malloc failed");
        exit (1);
    }
    char * raw_ptr  = raw;
    char * line_ptr = line;
    while (* raw_ptr) {
        /*
         * Skip white space
         */
        if (isspace ((unsigned char) * raw_ptr)) {
            raw_ptr ++;
            continue;
        }
        /*
         * If the first character is a '+', write two 0s.
         */
        if (line_ptr == line && * raw_ptr == '+') {
            * line_ptr ++ = '0';
            * line_ptr ++ = '0';
            raw_ptr ++;
            continue;
        }
        * line_ptr ++ = * raw_ptr ++;
    }
    * line_ptr = '\0';

    /*
     * If the first character is a '(', and the fourth is a ')',
     * replace both of them with 0. The length check keeps us from
     * reading past the end of a very short line.
     */
    if (line_ptr - line > 3 && line [0] == '(' && line [3] == ')') {
        line [0] = '0';
        line [3] = '0';
    }
Now all we need to do is check whether we have exactly fourteen digits; we print the input line if this is the case.
    bool valid = true;
    for (size_t i = 0; i < 14 && valid; i ++) {
        if (!isdigit ((unsigned char) line [i])) {
            valid = false;
        }
    }
    if (valid && line [14] == '\0') {
        printf ("%s", raw);
    }
    free (line);
}
free (raw);
Find the full program on GitHub.
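For readers who find the pointer juggling hard to follow, the same single pass can be sketched without regular expressions in Python. This is only an illustration of the approach taken in C; the function name is_valid_scan is made up, and the sketch builds a list of characters where the C code fills a buffer.

```python
def is_valid_scan(raw: str) -> bool:
    out = []
    for ch in raw:
        if ch.isspace():              # Skip white space
            continue
        if not out and ch == "+":     # A leading '+' becomes two 0s
            out.extend("00")
            continue
        out.append(ch)
    # If the first character is a '(' and the fourth a ')',
    # replace both of them with a 0.
    if len(out) > 3 and out[0] == "(" and out[3] == ")":
        out[0] = out[3] = "0"
    # Exactly fourteen ASCII digits must be left.
    return len(out) == 14 and all("0" <= c <= "9" for c in out)

for entry in ["0044 1148820341", " +44 1148820341", "44-11-4882-0341",
              "(44) 1148820341", "00 1148820341"]:
    if is_valid_scan(entry):
        print(entry)
```

Replacing only the two parentheses is enough: if the characters between them are not digits, the final check rejects the line anyway.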
Our Lua solution is slightly different. We do remove spaces first, but then we just do a pattern match for the three cases. Note that Lua patterns are a bit bare bones: there is no {NN} quantifier, so we have lots of [0-9] character classes.
for line in io . lines () do
    local number = line : gsub (" ", "")    -- Remove spaces
    if     -- nnnn nnnnnnnnnn
           number : find ("^[0-9][0-9][0-9][0-9][0-9][0-9][0-9]" ..
                           "[0-9][0-9][0-9][0-9][0-9][0-9][0-9]$") or
           -- +nn nnnnnnnnnn
           number : find ("^%+[0-9][0-9][0-9][0-9][0-9]" ..
                           "[0-9][0-9][0-9][0-9][0-9][0-9][0-9]$") or
           -- (nn) nnnnnnnnnn
           number : find ("^%([0-9][0-9]%)[0-9][0-9][0-9]" ..
                           "[0-9][0-9][0-9][0-9][0-9][0-9][0-9]$")
    then   print (line)
    end
end
Find the full program on GitHub.
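The three-case matching used in the Lua solution can be written as a single alternation in a language which does have a {NN} quantifier. Here is a minimal Python sketch of that variant; the names PHONE and is_valid_direct are made up for the illustration, and the line is assumed to arrive without its trailing newline, as it does from io.lines in Lua.

```python
import re

# One alternative per acceptable format, after the spaces are removed:
#   nnnn nnnnnnnnnn  ->  fourteen digits
#   +nn nnnnnnnnnn   ->  a plus sign and twelve digits
#   (nn) nnnnnnnnnn  ->  two digits in parens, then ten digits
PHONE = re.compile(r"[0-9]{14}|\+[0-9]{12}|\([0-9]{2}\)[0-9]{10}")

def is_valid_direct(line: str) -> bool:
    number = line.replace(" ", "")     # Remove spaces
    return PHONE.fullmatch(number) is not None

for entry in ["0044 1148820341", " +44 1148820341", "44-11-4882-0341",
              "(44) 1148820341", "00 1148820341"]:
    if is_valid_direct(entry):
        print(entry)
```

Using fullmatch anchors every alternative at both ends, which is what the ^ and $ do in each of the Lua patterns.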
For our Node.js solution, we're back to the original method: remove whitespace, replace a leading +, replace a leading (NN):
  require ('readline')
. createInterface ({input: process . stdin})
. on ('line', _ => {
    if (_ . replace (/\s+/g, "")                     // Remove white space
          . replace (/^\+/, "00")                    // Replace leading '+' with 00
          . replace (/^\([0-9][0-9]\)/, "0000")      // Replace leading '(NN)'
                                                     // with 0000
          . match   (/^[0-9]{14}$/)) {               // Match 14 digits
        console . log (_)
    }
})
Find the full program on GitHub.
As is often the case, Python makes for the most compact program, due to its clever significant white space.
import fileinput
import re
for line in fileinput . input ():
    plain = re . sub (r'\s+',             "",     line)     # Strip whitespace
    plain = re . sub (r'^\+',             "00",   plain)    # Replace leading +
    plain = re . sub (r'^\([0-9][0-9]\)', "0000", plain)    # Replace leading (NN)
    if re . search (r'^[0-9]{14}$', plain):                 # Check for 14 digits
        print (line, end = '')
Find the full program on GitHub.
Same algorithm, this time in Ruby:
ARGF . each_line do |_|
    if   _ . gsub(/\s+/, "")                 # Remove white space
           . sub(/^\+/, "00")                # Replace leading + with 00
           . sub(/^\([0-9]{2}\)/, "0000")    # Replace leading (NN) with 0000
           . match /^[0-9]{14}$/             # Exactly 14 digits should be left
    then print (_)
    end
end
Find the full program on GitHub.