Perl Weekly Challenge 115: String Chain

We can turn the challenge into a graph problem. Let $\mathcal{G} = (V, E)$ be a directed graph where the set of vertices $V$ is defined by the beginning and end characters of the given strings. For each of the given strings, there will be a directed edge in the set of edges $E$: an edge from the beginning character to the end character of the strings.

So, for the first example, we get $V = (a, c, d)$ and $E = (\langle a, c\rangle, \langle d, a\rangle, \langle c, d\rangle)$. Note since $\mathcal{G}$ is a directed graph, the direction of the edges matter: $\langle a, b \rangle$ is a different edge than $\langle b, a \rangle$.

To determine whether the graph contains a cycle, we will calculate the transitive closure of the graph. The transitive closure of a graph $\mathcal{G} = (V, E)$ is $\mathcal{G}^T = (V, E^T)$, where $\langle v_1, v_2 \rangle \in E^T \iff v_1 \in V \wedge v_2 \in V \wedge$ there is a path from $v_1$ to $v_2$ in $\mathcal{G}$. A graph $\mathcal{G}$ contains a cycle, iff the set $\{v \in V | \langle v, v \rangle \in E^T\}$ is not empty. That is, the transitive closure has at least one edge with the same begin and end point.

To calculate the transitive closure, we make use of a simplified Floyd-Warshall algorithm. This algorithm is used to calculate the length all the shortest paths between any pair of vertices in a graph. However, for the transitive closure, all we care about is that the shortest path isn't $\infty$.

The algorithm takes a graph as a adjacency matrix as input. That is, matrix [x] [y] is true of the graph has a directed edge between x and y, and false otherwise.

In pseudo code, the algorithm is as follows (V is the set of vertices of the graph, matrix is the adjacency matrix of the graph):

for k in V
do  for i in V
    do  for j in V
        do  matrix [i] [j] = true if matrix [i] [k] and matrix [k] [j]
        done
    done
done

At the end of the procedure, we're left with the transitive closure in matrix.

It is easy to see that the running time of this algorithm is $\Theta(|V|^3)$. For some cases, this may not be optimal; running Dijkstra's algorithm for each vertex takes $\mathcal{O} (|E||V| + |V|^2 \log |V|)$, which is better than $\Theta(|V|^3)$ if $|E| \ll |V|^2$. However, Floyd-Warshall is really easy to implement, and if the strings are ASCII strings, $|V|$ is small. So, we go with Floyd-Warshall.

We will be assuming we get a set of strings on each line of input (so, we get one answer per line of input), the string separated by white space.

Perl

There is a module on CPAN which calculates the transitive closure, Algorithm::Graphs::TransitiveClosure, and it uses the Floyd Warshall algorithm. Normally, I wouldn't use a CPAN module doing all or most of the task, except when I authored the module. And I did write and upload Algorithm::Graphs::TransitiveClosure to CPAN back in 1998.

So, all what is needed to do is to read the input and create an adjacency matrix:

foreach my $node (split) {
    $$graph {substr $node, 0, 1} {substr $node, -1} = 1;
}

call the algorithm (which modifies the given matrix):

use Algorithm::Graphs::TransitiveClosure qw [floyd_warshall];
floyd_warshall $graph;

and check for cycles:

say grep ({$$graph {$_} {$_}} keys %$graph) ? 1 : 0;

Find the full program on GitHub.

AWK

First, we read in the strings, and build a graph from it. We keep two datastructures, nodes, which tracks the nodes in our graph (each string contributes its first and last character as a node in the graph), and graph, which is the adjacency matrix of the graph. It's a two dimensional array, which has a true value if there a directed edge between the nodes. AWK has proper multi-dimensional arrays and autovivifies, so we don't have to declare any (sub-)arrays.

delete graph
delete nodes
for (i = 1; i <= NF; i ++) {
    first = substr ($i, 1, 1)
    last  = substr ($i, length ($i))
    graph [first, last] = 1
    nodes [first]       = 1
    nodes [last]        = 1
}

Now, we can apply Floyd-Warshall:

for (k in nodes) {
    for (i in nodes) {
        for (j in nodes) {
            if (graph [k, j] > 0 && graph [i, k] > 0) {
                graph [i, j] = 1
            }
        }
    }
}

All what is left is to find out whether there is a node which is reachable from itself. If so, the answer is 1, else the answer is 0:

out = 0
for (i in nodes) {
    if (graph [i, i] > 0) {
        out = 1
    }
}

print (out)

Find the full program on GitHub.

Lua

The Lua solution is similar to the AWK solution, but Lua doesn't have autovivification, so we have to do a little bit more work. Reading in the data (the line of input is in the variable line), and building the graph:

local graph = {}
local nodes = {}
for s in line : gmatch ("%S+")
do  local first = s : sub ( 1,  1)
    local last  = s : sub (-1, -1)
    if   graph [first] == nil
    then graph [first] =  {}
    end
    graph [first] [last] = 1
    nodes [first] = 1
    nodes [last]  = 1
end

Making sure the right sub arrays (or tables as Lua calls them) exist:

for node1 in pairs (nodes)
do  for node2 in pairs (nodes)
    do  if   graph [node1] == nil
        then graph [node1] = {}
        end
        if   graph [node1] [node2] == nil
        then graph [node1] [node2] = 0
        end
    end
end

Floyd-Warshall:

for k in pairs (nodes)
do  for i in pairs (nodes)
    do  for j in pairs (nodes)
        do  if   graph [i] [j] == 0 and graph [k] [j] == 1 and
                                        graph [i] [k] == 1
            then graph [i] [j] = 1
            end
        end
    end
end

Finding the final answer, and printing it:

local out = 0
for i in pairs (nodes)
do  if   graph [i] [i] == 1
    then out = 1
    end
end

print (out)

Find the full program on GitHub.

There are not hashes or associative arrays build in in C. Therefore, to make life for ourselves easier, we assume the strings consist of lowercase ASCII letters. There are 26 of them, so we have hard coded the size of the datastructure:

# include <stdbool.h>
# define NR_OF_LETTERS ('z' - 'a' + 1)
bool graph [NR_OF_LETTERS] [NR_OF_LETTERS];
for (size_t i = 0; i < NR_OF_LETTERS; i ++) {
    for (size_t j = 0; j < NR_OF_LETTERS; j ++) {
        graph [i] [j] = false;
    }
}

We can now parse the data, and populate the adjacency matrix. Here, line is a pointer to a line from stdin

# include <ctype.h>
char * line_ptr = line;
while (*line_ptr) {
    while (*line_ptr && !islower (*line_ptr)) {
        line_ptr ++;  /* Skip whitespace */
    }
    if (!*line_ptr) {
        break;        /* End of string reached */
    }
    char start = *line_ptr;
    char end   = *line_ptr ++;
    while (*line_ptr && islower (*line_ptr)) {
        end = *line_ptr ++;
    }
    graph [start - 'a'] [end - 'a'] = true;
}

Floyd-Warshall:

for (size_t k = 0; k < NR_OF_LETTERS; k ++) {
    for (size_t i = 0; i < NR_OF_LETTERS; i ++) {
        for (size_t j = 0; j < NR_OF_LETTERS; j ++) {
            graph [i] [j] = graph [i] [j] ||
                           (graph [k] [j] && graph [i] [k]);
        }
    }
}

Finding and printing the answer:

short out = 0;
for (size_t i = 0; i < NR_OF_LETTERS; i ++) {
    if (graph [i] [i]) {
        out = 1;
        break;
    }
}

printf ("%d\n", out);

Find the full program on GitHub.

Bash

Bash does have associative arrays, but no easy way do multidimensional arrays. But we can use a trick. Each of our nodes is exactly one character, so in this case, we can simulate 2-d indices by concatenating both coordinates. And since in Bash, concatenation is done by just sticking tokens together, instead of writing graph [x] [y] or graph [x, y], we will write graph[$x$y].

We have to declare we're using associative arrays:

declare -A nodes
declare -A graph

Reading in a line from stdin, splitting them on whitespace, and then placing the results in an array strings:

while read -a strings

We can now parse the data, and build our graph:

for   string in ${strings[@]}
do    first=${string:0:1}
      last=${string:$((${#string}-1)):1}
      nodes[$first]=1
      nodes[$last]=1
      graph[$first$last]=1
done

Note the way to get substrings in Bash: ${string:P:L} which is the substring of string from position P and length L. To get the length of a string, we use the syntax ${#string}.

${array[@]} expands the array array to a list of tokens, each token a value of the array. ${!array[@]} does the same, but then for the keys of the array. (This is more or less equivalent to values @array (or values %hash) and keys @array (or keys %hash) in Perl.)

We can now apply Floyd-Warshall:

for   k in ${!nodes[@]}
do    for i in ${!nodes[@]}
      do    for   j in ${!nodes[@]}
            do    if [ X${graph[$k$j]} == X1 -a \
                       X${graph[$i$k]} == X1 ]
                  then graph[$i$j]=1
                  fi
            done
      done
done

And finding the answer:

out=0
for   i in ${!nodes[@]}
do    if   [ X${graph[$i$i]} == X1 ]
      then out=1
      fi
done

echo $out

Find the full program on GitHub.

Other languages

We also have implementations in Node.js, Python, and Ruby. They're all similar to the AWK or Lua solution above.

Perl Weekly Challenge 115: String Chain

Challenge

Examples

Discussion

Solution

Perl

AWK

Lua

C

Bash

Other languages