Perl Weekly Challenge 111: Ordered Letters

by Abigail

Challenge

Given a word, you can sort its letters alphabetically (case insensitive). For example, 'beekeeper' becomes 'beeeeekpr' and 'dictionary' becomes 'acdiinorty'. Write a script to find the longest English words that don't change when their letters are sorted.

Discussion

The English language isn't well defined. There's no authority which dedices which words belong to the English language, and which don't. We therefore resort to a word list, which we assume to be on standard input (one word per line).

This has a pro and a con. The pro is that our solutions are language agnostic — feed it words from a different language, and it finds the longest word with the given constraint (of course, the language needs to be written using ASCII letters).

The con is that the answer depends on the given word list. And even for some word lists, the answer may not be unique: there can be multiple words with the same length.

For instance, the public domain ENABLE word list has two words with seven letters where the letters are in alphabetical order: beefily and billowy, and no longer words.

The file /usr/share/dict/words on my OS six such words. In addition to the two words from the ENABLE list, it also has Adelops, alloquy, begorry, and egilops.

The wordlist originally compiled by infochimps.com has an eight letter word: aegilops.

Solution

We will only consider words containing ASCII letters; any word containing anything else, hyphens, digits, accented letters, we will ignore.

We will consider the input words to be caseless. That is, 'a' sorts before 'B', which sorts before 'c'.

Three different approaches spring to mind.

  1. Split the word into letters, sort the letters, join the letters back into a word, and check whether the word is unmodified.
  2. Iterate over the characters of a word, and check we don't have any two consecutive letters where the second letter comes before the first in the alphabet.
  3. Use a regular expression: /^a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*$/i matches any word where the letters are in alphabetical order.

For our solutions below, if the language supports regular expressions, we use the third option. Else, we use the second option.

Perl

my $pat = join "" => map {"$_*"} 'a' .. 'z';

my $longest = "";

while (<>) {
    $longest = $_ if /^$pat$/i && length ($_) > length ($longest)
}

print $longest; 

There's not much to it. Of all the input words matching the pattern (so its letters are in sorted order), remember the longest.

Find the full program on GitHub.

GNU AWK

GNU AWK support IGNORECASE, to do case insensitive matching. First, we create a pattern, by iterating over the ASCII values:

BEGIN {
    IGNORECASE = 1
    longest = ""  
    #
    # Create a pattern /^a*b*...z*$/
    #
    pat = "^"
    for (i = 97; i <= 122; i ++) {
        pat = pat sprintf ("%c*", i)
    }
    pat = pat "$"
}

We can then use the pattern to remember the longest word:

match ($0, pat) && length ($0) > length (longest) {
    longest = $0
}

In an END block, we print the longest word:

END {
    print longest
}

Find the full program on GitHub.

Bash

In Bash, sub patterns can have quantifiers, but the syntax uses a prefix notation instead of a postfix. And it needs to be enabled with shopt:

shopt -s extglob  

pat1="*(a)*(b)*(c)*(d)*(e)*(f)*(g)*(h)*(i)*(j)*(k)*(l)*(m)"
pat2="*(n)*(o)*(p)*(q)*(r)*(s)*(t)*(u)*(v)*(w)*(x)*(y)*(z)"

Bash doesn't have pattern matching — but it does have substitution of a pattern. So, we apply the pattern to the word, replace what was matched by an empty string, and then check whether we're left with an empty string. Those are the matching words.

Note also the unique way of lowercasing a string: ${word,,}.

while read line
do    lower=${line,,}                 # Lower case input
      left=${lower/#$pat1$pat2/}      # Remove pattern
                                      # Test whether nothing left,
                                      # and string larger longest found
      if   [ "X$left" == "X" -a ${#line} -gt ${#longest} ]
      then longest=$line
      fi
done

Find the full program on GitHub.

C

No regular expressions in C. Instead, we iterate over the read string, and check whether all its characters are in alphabetical order:

char *  line         = NULL;
size_t  len          =    0;

while (getline (&line, &len, stdin) != -1) {
    size_t i      = 0;
    bool in_order = 1;
    while (line [i] != '\n') {
        if ((i && tolower (line [i]) < tolower (line [i - 1])) ||
            !isalpha (line [i])) {
            in_order = false;
            break;
        }
        i ++;
    }

Of the words in alphabetical order, we have to remember the longest:

char *  longest      = NULL;
size_t  longest_size =    0;

if (in_order && i > longest_size) {
    longest_size = i;
    if ((longest = (char *) realloc (longest, (i + 1) * sizeof (char)))
         == NULL) {
        perror ("Realloc failed");
        exit (1);
    }
    for (size_t j = 0; j <= i; j ++) {  /* Copies the newline as well */
        longest [j] = line [j];
    }
    longest [i + 1] = '\0';
}

Find the full program on GitHub.

Lua

A short Lua program, using a regular expression. Lua doesn't have case insensitive matching, so we match against a lower case copy of the input:

local longest = ""

for line in io . lines () do
    if   line : lower ()
              : find  ("^a*b*c*d*e*f*g*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*$")
         and line : len () > longest : len ()
    then longest = line
    end
end

print (longest)

Find the full program on GitHub.

Node.js

Node.js supports regular expressions, leading to:

let longest = ""  

require ('readline')
. createInterface ({input: process . stdin})
. on ('line', _ => {
    if (_ . match (/^a*b*c*d*e*f*g*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*$/i) &&
        _ . length > longest . length) {
        longest = _
    }
})
. on ('close', _ => {  
    console . log (longest)
}) 

Find the full program on GitHub.

Pascal

Pascal doesn't have regular expressions, so we iterate over the characters, comparing each character with the previous one.

uses sysutils;    

var
    line, longest: string;
    ch, prev_ch:   char;
    valid:         boolean;

begin
    longest := '';
    while not eof () do
    begin
        readln (line);
        valid   := true;
        prev_ch := ' ';   (* Any char less than 'a' will do *)

        for ch in lowercase (line) do
        begin
            if (ch < 'a') or (ch > 'z') or (ch < prev_ch) then
            begin
                valid := false;
                break;
            end;
            prev_ch := ch;
        end;
        if valid and (length (line) > length (longest)) then
        begin
            longest := line;
        end;
    end;
    writeln (longest);
end.

Find the full program on GitHub.

Python

First, we create a pattern:

pat = "^"
for x in list (string . ascii_lowercase):
    pat = pat + x + "*"
pat += "$"

Then we match, and remember the longest:

longest = ""
for line in fileinput . input ():
    line = line . strip ()
    if re . match (pat, line . lower ()) and len (line) > len (longest):
        longest = line

print (longest) 

Find the full program on GitHub.

Ruby

Creating the pattern:

pat = "^"
("a" .. "z") . each do
    |letter|
    pat += letter + "*"
end
pat += "$"

Matching:

longest = ""
ARGF . each_line do   
    |line|
    if line . downcase =~ /#{pat}/ && line . length > longest . length
        longest = line
    end
end

puts (longest) 

Find the full program on GitHub.


Please leave any comments as a GitHub issue.