Perl Weekly Challenge 368: Make It Bigger

by Abigail

Challenge

You are given a given a string number and a character digit.

Write a script to remove exactly one occurrence of the given character digit from the given string number, resulting the decimal form is maximised.

Examples:

    Input: $str = "15456", $char = "5"
    Output: "1546"

    Input: $str = "7332", $char = "3"
    Output: "732"

    Input: $str = "2231", $char = "2"
    Output: "231"

    Input: $str = "543251", $char = "5"
    Output: "54321"

    Input: $str = "1921", $char = "1"
    Output: "921"

Solution

It is not hard to see that we need to remove the left most occurrence of the given character digit which is followed by a larger digit, or, if there is not such an occurrence, we need to remove the right most occurrence of the given character digit.

We will use three different methods to do this, depending on the language:

  1. Using a regular expression based substitution: (Lua, Node.js, Perl, Python, R, Ruby, sed, and Tcl).
  2. Split the string into an array of single character strings; process the array; then print the result: (AWK, bc, and Go).
  3. Find the index of the character we want to remove, then print the string, except the character on the marked index: (Bash, and C).

Our programs will read their input from standard input. Each line consists of an input string, and the character digit, separated by a space. We make the assumption the input does not contain a 0.

Perl

If we want to match a given digit, say $digit, followed by a digit which is larger, one way of doing that is to calculate $ndigit = $digit + 1, then use the pattern /$digit[$ndigit-9]/. And we have to make an exception for $digit == 9, as /9[10-9]/ is not going to match what we want.

But Perl allows use to something more clever. Consider the character classes [$ndigit-9] and [$digit-9] (remember, $ndigit == $digit + 1). The former matches all digits from $ndigit to 9. The latter matches all digits from $digit to 9, so exactly one digit more — this digit being $digit. Using a modern version of Perl, we can make use of that, so we can avoid calculating $ndigit. Consider

    s/$digit(?=(?[[$digit-9] - [$digit]]))//

The constuct (?[ ]) allows use to do arithmetic with character classes. Here, we start with the character class [$digit-9], then subtract the character class [$digit]. Hence, (?[[$digit-9] - [$digit]]) matches any digits from $digit to 9, excluding $digit. So, it matches the same as [$ndigit-9], without having to calculate $ndigit. As an added bonus, it will do 'the right thing' if $digit == 9, that is, it will never match.

So, the pattern above will match a digit $digit, which is followed by a larger digit (the following digit will not be part of the match, due to the use of (?= ), which is a the zero-width lookahead construct). Thus, if we substitute this by the empty string, we have removed the digit $digit, and nothing else.

This takes care of removing the left most occurrence of the character digit which is followed by a larger digit.

Removing the last occurrence of the digit, which we need to do with there is no occurrence of the character digit followed by a larger one, can be done as follows:

    s/.*\K$digit//

The .* part causes the entire pattern the match the last occurrence of the character digit $digit. The \K construct keeps the match to the left of it — thus only the digit $digit gets removed, and the part before it kept.

This leads to the following code:

    my ($input, $digit) = split;
    $input =~ s/$digit(?=(?[[$digit-9] - [$digit]]))// ||
    $input =~ s/.*\K$digit//;
    say $input;

Find the full program on GitHub.

Python

Python regular expressions do have zero-width lookahead, but not the character class arithmetic Perl has. So we need to do a bit more work.

We first try the first possibility, removing the left most character digit which is followed by a larger digit:

    [input, digit] = line . rstrip () . split ()

    n = 0   
    if digit != "9":
        ndigit = str (int (digit) + 1)
        pat = digit + "(?=[" + ndigit + "-9])"
        [ninput, n] = re . subn (pat, "", input, count = 1)

Unlike its Perl equivalence, Python's subn method always returns two values: the string, ninput, after substitution (or the same string if there was no match), and the number of substitutions made (n).

We can use the value of n to determine whether we need to remove the last occurrence of the character digit:

    if n == 0:
        ninput = re . sub ("(.*)" + digit, "\\1", input, count = 1)

Find the full program on GitHub.

Ruby

Ruby's sub! method doesn't return whether the substitution succeeded — it just the modifies the string. But it can take a block as argument, using the result of the block as substition; and we can use this block to signal success:

    n = 0
    if digit != "9"
        pat = Regexp . new digit + "(?=[" + (digit . to_i + 1) . to_s + "-9])"
        input.sub!(pat) {n = 1; ""}
    end

    if n == 0
        pat = Regexp . new "(.*)" + digit
        input.sub! pat, "\\1"
    end

Find the full program on GitHub.

We have implementations in different languages which use similar approaches as to ones above: Lua, Node.js, R, and Tcl.

(GNU) sed

Our implementation in sed uses regular expressions, but in a different way than the implementations above. sed doesn't have variables, just the current line of input. Hence, we can't paste a pattern together.

We need a different approach. We're going to make use of the fact the digit part of the input has only limited set of options: 1 .. 9.

We will start by stripping off the digit from the input, and jumping to a different label for each different possible digit:

    s/ 1$//; tl1
    s/ 2$//; tl2
    s/ 3$//; tl3
    s/ 4$//; tl4   
    s/ 5$//; tl5
    s/ 6$//; tl6
    s/ 7$//; tl7
    s/ 8$//; tl8
    s/ 9$//; tl9

The key here is the t command. If the previous command succeeds, the t jumbs to the label following t. So, if the digit is 1, we just to label l1. If the digit is 2, we jump to label l2, etc. At this moment, we have stripped off the digit and the space before it, meaning we just have the input left.

In the second part, we try the remove the digit:

    :l1; s/1([2-9])/\1/; t; s/(.*)1/\1/; b  # Handle digit == 1
    :l2; s/2([3-9])/\1/; t; s/(.*)2/\1/; b  # Handle digit == 2
    :l3; s/3([4-9])/\1/; t; s/(.*)3/\1/; b  # Handle digit == 3
    :l4; s/4([5-9])/\1/; t; s/(.*)4/\1/; b  # Handle digit == 4       
    :l5; s/5([6-9])/\1/; t; s/(.*)5/\1/; b  # Handle digit == 5
    :l6; s/6([7-9])/\1/; t; s/(.*)6/\1/; b  # Handle digit == 6
    :l7; s/7([8-9])/\1/; t; s/(.*)7/\1/; b  # Handle digit == 7
    :l8; s/8([9-9])/\1/; t; s/(.*)8/\1/; b  # Handle digit == 8
    :l9;                    s/(.*)9/\1/; b  # Handle digit == 9

Consider if, for instance, the digit is 4. We then jump to label l4. We first try the command s/4([5-9])/\1/, which tries to remove the leftmost 4 which is followed by a larger digit. If this succeeds, the t command performs a jump. Since no label follows the t, it jumps to the end of the program, causing sed to print the current (modified) line, and restart the program with the next line of input.

If the substitution fails, that is, there is no 4 followed by a larger digit, we execute s/(.*)4/\1/, which removes the last 4 in the current line of input. The following b is an unconditional jump — and since it doesn't have a label, it jumps to the end of the program, printing out the current line.

Note that if the digit equals 9, we only try to remove the last 9 in the input, as there is no digit which is larger than 9.

Find the full program on GitHub.

(GNU) AWK

In our AWK solution, we're not using regular expressions. Instead, we split the input into an array:

    input = $1
    digit = $2

    n = split (input, chars, "")

chars is now an array with one character strings. n is the number of characters. Note than in AWK, array indexing starts at 0.

A first pass through the array reveals whether the character digit is followed by a larger digit. If so, we'll replace the entry with an empty string, and set matched to a true value.

    matched = 0
    for (i = 1; i < n; i ++) {
        if (chars [i] == digit && chars [i + 1] > digit) {
            chars [i] = ""
            matched   = 1
            break
        }
    }

If we did not find a match, we will remove the last occurrence of the character digit:

    if (matched == 0) {
        for (i = n; i >= 1; i --) {
            if (chars [i] == digit) {
                chars [i] = ""
                break
            }
        }
    }

We can now print the results:

    for (i in chars) {
        printf "%s", chars [i]
    }    
    printf "\n"

Find the full program on GitHub.

Our bc and Go solutions use a similar approach.

Bash

In our Bash solution we won't delete the offending digit from the string; instead, we find the index of the character we'd like to remove, then print the string, omitting the character we want to omit.

First, we try to find the leftmost character digit followed by a larger character:

    ((index = -1))
    for ((i = 0; i < ${#input} - 1; i ++))
    do  ((ch  = ${input:$i:1}))             # Character on index i
        ((nch = ${input:$((i + 1)):1}))     # Character on index i + 1
        if   [[ $ch -eq $digit && $nch -gt $digit ]]
        then ((index = i))
             break
        fi
    done

Bash syntax is a bit unusual. ${#input} is the length of the string in the variable input. ${input:$i:1} gets a substring from input, starting at position $i and one character long. The Perl equivalence is substr ($input, $i, 1). -eq is a numerical equals tests, while -gt is a numerical greater than test. If we find a match, index will be a non-negative number.

If there is no match, we find the index of the last occurrence of the character digit:

     if   [[ $index -eq -1 ]]
     then for ((i = ${#input} - 1; i >= 0; i --))
          do  if   [[ ${input:$i:1} -eq $digit ]]
              then ((index = i))
                   break
              fi
          done      
     fi

We can now print the result

    echo ${input:0:$index}${input:$((index + 1))}

Find the full program on GitHub.

Our C solution uses a similar approach.


Please leave any comments as a GitHub issue.