You are given a string, paragraph, and an array of banned words.
Write a script to return the most popular word that is not banned. It is guaranteed there is at least one word that is not banned, and that the answer is unique. The words in paragraph are case-insensitive and the answer should be in lowercase. The words cannot contain punctuation symbols.
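To make the task concrete, here is a minimal sketch in Python (the function name and the sample input are ours, for illustration only; the solutions below are in Perl, Tcl, and C):

```python
import re
from collections import Counter

def most_common_word(paragraph, banned):
    # Lower case the paragraph and extract runs of letters,
    # so punctuation is stripped from the words.
    words = re.findall(r"[a-z]+", paragraph.lower())
    counts = Counter(w for w in words if w not in set(banned))
    return counts.most_common(1)[0][0]

print(most_common_word(
    "Bob hit a ball, the hit BALL flew far after it was hit.",
    ["hit"]))  # ball
```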
The solution is pretty straightforward:
We will be reading input from standard input, each line being a separate
test case. Each line contains a //, with the paragraph preceding it,
and the list of banned words following it.
We will start off by parsing the input, separating the paragraph and the
banned words, and putting the latter into a hash (%banned):
my ($paragraph, $banned) = split m!\s*//\s*!;
my %banned = map {$_ => 1} lc ($banned) =~ /\pL+/g;
Splitting the list of banned words on white space would have been an option,
but here we extract them by searching for sequences of letters: /\pL+/g.
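The difference matters when punctuation touches a word. A small illustration in Python (using [a-z]+ after lower casing as an ASCII stand-in for Perl's \pL+):

```python
import re

text = "Bob hit a ball."
# Splitting on white space keeps punctuation attached to the word:
assert text.lower().split() == ["bob", "hit", "a", "ball."]
# Extracting runs of letters strips it:
assert re.findall(r"[a-z]+", text.lower()) == ["bob", "hit", "a", "ball"]
```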
We then initialize the hash (%ok) which we will use to tally the
unbanned words. We seed it with a sentinel entry, keyed by the empty
string (which we also store in $max), with value 0, to avoid needing
an additional variable later on:
my %ok = ((my $max = "") => 0);
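The point of the sentinel is that the final maximum scan can compare every count against $ok {$max} right away, without first checking whether a maximum has been seen. The same idea expressed in Python (illustration only, not the author's code):

```python
counts = {"": 0}                 # sentinel entry: empty string, count 0
for word in ["ball", "bob", "ball"]:
    counts[word] = counts.get(word, 0) + 1

best = ""                        # start at the sentinel key
for word in counts:
    if counts[word] > counts[best]:
        best = word
print(best)  # ball
```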
We can now extract the words from the paragraph, again using /\pL+/g,
check whether they are banned, and if not, tally them in the %ok hash.
This is a one liner:
$ok {$_} ++ for grep {!$banned {$_}} lc ($paragraph) =~ /\pL+/g;
Finally, we find the most frequently used word by iterating over the
%ok hash, and keeping track of the best seen so far:
$ok {$_} > $ok {$max} && ($max = $_) for (keys %ok);
This leaves the answer in $max.
Find the full program on GitHub.
Tcl code looks very different from Perl code, but we still use almost the
same algorithm. Here, we also start with parsing the input, but we need
a trick. Tcl does have a split command, but it can only split on single
characters, not on multi-character strings. So, we first replace the //
in the input by a NUL character (\0), then split on the NUL character:
lassign [split [string map {// \0} $line] \0] paragraph banned_str
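The same replace-then-split workaround, expressed in Python for comparison (Python's split handles multi-character separators natively, so this merely mirrors the Tcl trick):

```python
line = "Bob hit a ball // hit ball"
# Replace the two-character separator by a character that cannot
# occur in the input, then split on that single character:
paragraph, banned_str = line.replace("//", "\0").split("\0")
print(paragraph.strip())   # Bob hit a ball
print(banned_str.strip())  # hit ball
```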
In Tcl, we don't even need a split on white space: foreach treats a
string as a white-space separated list, so we can iterate over it
directly to populate the banned dictionary:
set banned [dict create]
foreach ban [string tolower $banned_str] {
dict set banned $ban 1
}
We now replace all sequences of non-letters in the paragraph by a single
space, so we're left with just the words separated by white space:
regsub -all {[^[:alpha:]]+} [string tolower $paragraph] { } paragraph
We initialize a dictionary, and populate it with a special key (the empty
string) and a count of 0, like we did in the Perl solution:
set ok [dict create {} 0]
Counting the unbanned words:
foreach word $paragraph {
if {![dict exists $banned $word]} {
dict incr ok $word
}
}
Finally, find the most frequently occurring word:
set max {}
dict for {word count} $ok {
if {$count > [dict get $ok $max]} {
set max $word
}
}
Find the full program on GitHub.
C doesn't have hashes, nor does it have much support for string
manipulation; strings are just NUL terminated arrays of chars.
We first parse the input. We will use two arrays, words and banned,
holding the lower cased words of the paragraph and the banned words,
respectively. We will use a trick: the line just read is pointed to by
the char * pointer line. Instead of copying parts of the string,
we'll just insert NUL characters after the end of each word, and put
pointers to the first characters of the words in the words and banned
arrays.
We first need to allocate memory for the words and banned arrays,
initially empty, growing them when necessary. (Note that malloc (0) may
legally return NULL, in which case this check would wrongly report a
failure; allocating a small nonzero initial size would avoid that.)
if ((words = (char **) malloc (0 * sizeof (char *))) == NULL) {
perror ("malloc failed");
exit (1);
}
if ((banned = (char **) malloc (0 * sizeof (char *))) == NULL) {
perror ("malloc failed");
exit (1);
}
We then iterate over the input, and for each character, do the following:
1. Lower case the character
2. If the current character is a letter, but the previous one wasn't,
this is the start of a word. Put it either in the words or banned
arrays, after increasing their size.
3. If the character is a /, and the next character is a / as well,
switch from parsing the paragraph to parsing the banned words.
4. If the character isn't a letter, make it a NUL character. This makes
that all the words found are proper C strings.
size_t words_c = 0;
size_t banned_c = 0;
bool parse_banned = false;
bool prev_is_letter = false;
for (char * ptr = line; * ptr; ptr ++) {
* ptr = tolower (* ptr);
if (isalpha (* ptr)) {
if (!prev_is_letter) {
if (parse_banned) {
if ((banned = (char **) realloc (banned,
++ banned_c * sizeof (char *))) == NULL) {
perror ("Realloc failed");
exit (1);
}
banned [banned_c - 1] = ptr;
}
else {
if ((words = (char **) realloc (words,
++ words_c * sizeof (char *))) == NULL) {
perror ("Realloc failed");
exit (1);
}
words [words_c - 1] = ptr;
}
}
prev_is_letter = true;
}
else {
if (!parse_banned) {
if (* ptr == '/' && * (ptr + 1) == '/') {
parse_banned = true;
}
}
* ptr = '\0';
prev_is_letter = false;
}
}
The next step is to eliminate the banned words from the words array.
Banned words are replaced by NULL. For each word, we iterate over
the banned array to see whether the word is banned.
for (size_t i = 0; i < words_c; i ++) {
for (size_t j = 0; j < banned_c; j ++) {
if (!strcmp (words [i], banned [j])) {
words [i] = NULL;
break;
}
}
}
Finally, we count how often each word occurs, keeping track of the most
frequently occurring word, using a double loop over the words array
(quadratic in the number of words, which is fine for inputs of this size).
char * max = NULL;
size_t max_count = 0;
for (size_t i = 0; i < words_c; i ++) {
size_t count = 0;
if (words [i] == NULL) {
continue;
}
for (size_t j = i; j < words_c; j ++) {
if (words [j] != NULL) {
if (!strcmp (words [i], words [j])) {
count ++;
}
}
}
if (count > max_count) {
max = words [i];
max_count = count;
}
}
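For comparison, the same quadratic skip-over-NULL scan sketched in Python, with banned words replaced by None (an illustration, not the author's C code):

```python
words = ["bob", "hit", "a", "ball", "hit", "ball"]
banned = {"hit"}
# Banned words are replaced by None, mirroring the C code:
words = [w if w not in banned else None for w in words]

best, best_count = None, 0
for i, w in enumerate(words):
    if w is None:
        continue
    # Count occurrences from position i onward; a later occurrence of
    # the same word yields a smaller count, so it never overtakes the
    # count taken at the first occurrence.
    count = sum(1 for x in words[i:] if x == w)
    if count > best_count:
        best, best_count = w, count
print(best)  # ball
```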
Find the full program on GitHub.
We also have solutions in AWK, Bash, Go, Lua, Node.js, Python, R, and Ruby.