Perl Weekly Challenge 369: Valid Tag

The examples baffled me initially. In particular, the-last-of-the-mohicans doesn't result in any capital letters (so - isn't seen as a word boundary), yet 24-Hour results in the H being capitalized, suggesting that - does break words.

In order to make sense of the examples, we don't follow the instructions as given, but use a different order:

First remove all the non-letter, non-space characters.
And only then sort out the capitalization.
Finally, join everything together, and trim the result to at most 100 characters.
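To see why this order explains the examples, here is a minimal C sketch that performs the three steps as separate passes. The name make_tag, the constant TAG_MAX and the second sample input mentioned below are assumptions made up for illustration; they come neither from the challenge nor from the solutions discussed later.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

#define TAG_MAX 100   /* assumed limit, counting the leading '#' */

/*
 * Hypothetical helper: build a tag from `in` and store it in `out`.
 * The caller must make sure `out` has room for strlen (in) + 2 bytes.
 */
void make_tag (const char * in, char * out) {
    char * dst = out + 1;          /* leave room for the leading '#' */
    size_t n   = 0;

    /* Step 1: keep only letters and white space */
    for (const char * p = in; * p; p ++) {
        unsigned char ch = (unsigned char) * p;
        if (isalpha (ch) || isspace (ch)) {
            dst [n ++] = (char) ch;
        }
    }
    dst [n] = '\0';

    /* Step 2: lower case everything, except the first letter of
     *         each word after the first, which is upper cased  */
    bool in_word   = false;
    bool seen_word = false;
    for (size_t i = 0; i < n; i ++) {
        unsigned char ch = (unsigned char) dst [i];
        if (isspace (ch)) {
            in_word = false;
            continue;
        }
        dst [i]   = (char) (!in_word && seen_word ? toupper (ch)
                                                  : tolower (ch));
        in_word   = true;
        seen_word = true;
    }

    /* Step 3: add the '#', drop the white space, and truncate */
    size_t len = 0;
    out [len ++] = '#';
    for (size_t i = 0; i < n; i ++) {
        if (!isspace ((unsigned char) dst [i])) {
            out [len ++] = dst [i];
        }
    }
    if (len > TAG_MAX) {
        len = TAG_MAX;
    }
    out [len] = '\0';
}
```

With this sketch, the-last-of-the-mohicans becomes #thelastofthemohicans, because the hyphens are gone before any word boundaries are looked at. The made-up input big 24-Hour sale becomes #bigHourSale: the H is capitalized because of the space in front of 24, not because of the hyphen.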

Perl

For our Perl solution, we do all the modifications using regular expressions. We chain them using the /r modifier, which makes s/// return the modified string instead of changing its target.

We will do the following steps, in order:

Remove all non-space, non-letter characters (s/[^\s\pL]+//gr)
Remove leading white space (s/^\s+//r)
Remove trailing white space (s/\s+$//r)
Lower case the entire string (s/(.*)/\L$1/r)
Capitalize each letter which follows white space, and delete that white space (s/\s+(\pL)/\u$1/gr)
Add a leading # (s/^/#/r)
Keep the first 100 characters, and delete the rest (s/.{100}\K.*//r).

This leads to the following program, where $_ contains the line we're processing:

say s/[^\s\pL]+//gr      =~   # Remove non-letters, but keep the spaces
    s/^\s+//r            =~   # Remove leading spaces
    s/\s+$//r            =~   # Remove trailing spaces
    s/(.*)/\L$1/r        =~   # All lower case
    s/\s+(\pL)/\u$1/gr   =~   # Capitalize first letter of each word,
                              # and delete the space preceding it
    s/^/#/r              =~   # Add leading '#'
    s/.{100}\K.*//r           # Remove all characters after the 100th.

Find the full program on GitHub.

Tcl

Our solutions in most other languages follow the same steps as our Perl solution, but many don't have an equivalent to s/\s+(\pL)/\u$1/gr. Tcl does have a string totitle command, which is similar to Perl's ucfirst.

This leads to the following program, where $input contains the input:

# * Remove non-letters, but keep space
# * Remove leading white space
# * Remove trailing white space
#
regsub -all          {[^[:alpha:]\s]+} $input {}               input
regsub               {^\s+}            $input {}               input
regsub               {\s+$}            $input {}               input

#
# Lower case entire string
#
set input [string tolower $input]

#
# * For each sequence of letters (a word), upper case its first letter.
#   This upper cases the first letter of the string, hence
# * Lower case the first letter
# * Remove all white space
#
regsub -all -command {[[:alpha:]]+}    $input {string totitle} input
regsub      -command {^[[:alpha:]]}    $input {string tolower} input
regsub -all          {\s+}             $input {}               input

#
# Add leading '#', and print at most 100 characters
#
puts [string range "#${input}" 0 99]

The -command switch means the substitution part is treated as a command and executed (like the /e modifier in Perl), with the matched text implicitly appended as an argument.
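For readers more at home in C, the idea behind -command is roughly that of passing a function which gets called on every match. The sketch below is only an analogy: for_each_word and title_case are hypothetical names, it handles plain letters as classified by the C locale, and it says nothing about how Tcl implements regsub.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical callback type: receives one matched run of letters */
typedef void (* word_fn) (char * word, size_t len);

/*
 * Call fn on every maximal run of letters in s; the callback
 * modifies the run in place.
 */
void for_each_word (char * s, word_fn fn) {
    while (* s) {
        if (!isalpha ((unsigned char) * s)) {
            s ++;
            continue;
        }
        char * start = s;
        while (isalpha ((unsigned char) * s)) {
            s ++;
        }
        fn (start, (size_t) (s - start));
    }
}

/*
 * Rough analogue of what `string totitle` does to a word:
 * upper case the first letter, lower case the rest.
 */
void title_case (char * word, size_t len) {
    if (len == 0) {
        return;
    }
    word [0] = (char) toupper ((unsigned char) word [0]);
    for (size_t i = 1; i < len; i ++) {
        word [i] = (char) tolower ((unsigned char) word [i]);
    }
}
```

Calling for_each_word (s, title_case) on a writable copy of "the last mohican" turns it into "The Last Mohican", which mirrors the effect of the first regsub -all -command line above.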

Find the full program on GitHub.

C

C doesn't come with much functionality for manipulating strings. So, we iterate over the string, skipping the non-letters, and keeping track of whether we saw a space.

Given the input in line, we use the following program:

# include <stdlib.h>
# include <stdio.h>
# include <stdbool.h>
# include <ctype.h>

/*
 * Maximum length of the tag, including the leading #
 */
# define MAX_CHARS 100

char * ptr     = line;
bool saw_space = false;
short chars    = 0;

/*
 * Skip leading spaces, and non letters
 */
while (* ptr && !isalpha (* ptr)) {ptr ++;}

/*
 * Print the leading #, and the first letter, if there is one
 */
printf ("#");
chars ++;
if (* ptr) {
    printf ("%c", tolower (* ptr ++));
    chars ++;
}
while (* ptr && chars < MAX_CHARS) {
    char ch = * ptr ++;
    if (isspace (ch)) {
        /*
         * If we saw a space, skip it, but remember we did see it
         */
        saw_space = true;
        continue;
    }
    if (!isalpha (ch)) {
        /*
         * If it's not a letter, skip it. Don't modify the
         * saw_space status
         */
        continue;
    }
    /*
     * We now have a letter. If we saw a space, print its
     * upper case, else print its lower case.
     * The saw_space status will be turned off
     */
    printf ("%c", saw_space ? toupper (ch) : tolower (ch));
    saw_space = false;
    chars ++;
}
printf ("\n");

Find the full program on GitHub.

Other Languages

We also have solutions in AWK, Bash, Go, Lua, Node.js, Python, R, Ruby and sed, all using more or less the same steps as our Perl and Tcl solutions.
