You are given a text file.
Write a script to transpose the contents of the given file.
For example, given the following input:
name,age,sex
Mohammad,45,m
Joe,20,m
Julie,35,f
Cristina,10,f
the output should be:
name,Mohammad,Joe,Julie,Cristina
age,45,20,35,10
sex,m,m,f,f
A straightforward solution would be to read in the input line by line, split each line on commas, and put the fields in a two-dimensional array. We would then transpose the array and print the result row by row, joining the fields with commas.
We won't do that. Mostly because not all languages we will be using have two dimensional arrays (I'm looking at you, Bash).
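For comparison, the two-dimensional approach we are not taking could be sketched in Python as follows. This is only an illustration: the function name transpose_2d is made up for this sketch, and it assumes every line has the same number of fields.

```python
def transpose_2d(lines):
    """Transpose comma-separated lines via a two-dimensional list."""
    # Build the two-dimensional array: one row of fields per input line.
    rows = [line.rstrip("\n").split(",") for line in lines if line.strip()]
    # zip(*rows) groups the n-th field of every row together.
    columns = zip(*rows)
    # Join each transposed row back into a comma-separated line.
    return [",".join(column) for column in columns]
```

Calling transpose_2d on an open file (or on sys.stdin) returns the list of output lines, ready to be printed one per line.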
Instead, we will read in the input line by line, split each line on commas, and use the fields to construct the output lines directly. That way, we only need a single, one-dimensional array. We append the \(n^\text{th}\) field of each input line to the end of the \(n^\text{th}\) output line, each time followed by a comma.
When we have created all the output lines, we print the lines, after chopping off the final comma.
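As a language-neutral illustration, the strategy just described might look like this in Python. It is a minimal sketch, not one of the solutions discussed in this post; the function name transpose_lines is hypothetical, and blank lines are skipped as an extra assumption.

```python
def transpose_lines(lines):
    """Build the output lines in a single one-dimensional list."""
    out = []
    for line in lines:
        if not line.strip():
            continue                  # Assumption: ignore blank lines.
        for n, field in enumerate(line.rstrip("\n").split(",")):
            if n == len(out):
                out.append("")        # First time we see column n.
            out[n] += field + ","     # Append the field, then a comma.
    # Chop the final comma off each output line.
    return [text[:-1] for text in out]
```

The list out plays the role of the one-dimensional array: entry n holds the n-th output line while it is being built.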
Note that we read the input file from standard input. We also assume the input is sane, that is, each line contains the same number of fields.
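None of the solutions below verify that assumption. If one wanted to, a check along the following lines could be run first. This is a sketch only; check_field_counts is a hypothetical helper and not part of any of the programs in this post.

```python
def check_field_counts(lines):
    """Return the common field count, or raise ValueError on a mismatch."""
    expected = None
    for number, line in enumerate(lines, start=1):
        count = len(line.rstrip("\n").split(","))
        if expected is None:
            expected = count          # The first line sets the standard.
        elif count != expected:
            raise ValueError(
                f"line {number} has {count} fields, expected {expected}")
    return expected
```

For empty input the function returns None, since there is no first line to set the expected count.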
map {$- = 0; $; [$- ++] .= "$_," for /[^,\n]+/g} <>;
Here, we use <> to read the input, and we use map to process each line. We're using a regular expression to extract the fields (/[^,\n]+/); we could have used split, but then we would have needed some extra code to get rid of the trailing newline.
The regular expression just matches anything which isn't a comma or a newline. Due to the /g modifier, the match behaves as if the whole pattern were wrapped in capturing parentheses, so in list context it returns all the fields, which we iterate over using for. We store the output lines in the array @; and use the variable $- to keep track of which output line to update: it is set to 0 for each line of input, and incremented for each field.
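To see the difference between matching fields and splitting on commas outside of Perl, here is a small Python comparison. It is only an illustration of the trade-off mentioned above; the variable names are made up for this sketch.

```python
import re

line = "Mohammad,45,m\n"

# Matching runs of characters that are neither commas nor newlines
# yields the fields without the trailing newline.
fields_by_regex = re.findall(r"[^,\n]+", line)

# Splitting on commas leaves the newline attached to the last field,
# so it has to be stripped separately.
fields_by_split = line.split(",")

print(fields_by_regex)   # ['Mohammad', '45', 'm']
print(fields_by_split)   # ['Mohammad', '45', 'm\n']
```

One thing to keep in mind with the matching approach: empty fields are silently skipped, so a line like a,,b yields two fields rather than three. For sane input without empty fields, this makes no difference.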
chop, say for @;
For each line in @; we chop off the final comma, and then we print it.
Find the full program on GitHub.
We could use split to split the input into records, or extract the fields using a regular expression, but why would we in AWK? One of the strengths of AWK is that it is line- and field-oriented. We just have to tell AWK that our fields aren't separated by white space, but by commas:
BEGIN {
    FS = ","
}
This uses the special variable FS, the Field Separator, which controls how AWK splits the input into fields.
We can now just iterate over the input fields (which are available in the special variables $1, $2, etc.), and build our output strings in the array out:
{
    for (i = 1; i <= NF; i ++) {
        out [i] = out [i] "," $i
    }
}
All that is left is to print the output strings, without their leading comma:
END {
    for (i = 1; i <= length (out); i ++) {
        print substr (out [i], 2)
    }
}
Find the full program on GitHub.
Just like AWK, Bash will split the input into fields when reading a line of data using read. We tell Bash to split fields on commas by assigning to the variable IFS, the Internal Field Separator:
IFS=,
Commas are not special to Bash, so they don't have to be escaped.
We're now ready to read in the input. We read each line of input into the array chunks; Bash has neatly split the input into the right fields for us. We then iterate over the array chunks, adding each field to the appropriate output line. The output lines are stored in the array out. Concatenation in Bash is done by just sticking the things you want to concatenate together:
IFS=","
while read -a chunks
do  for ((i = 0; i < ${#chunks[@]}; i ++))
    do  out[$i]=${out[$i]}${chunks[$i]},
    done
done
We can now print the output strings, without their trailing commas:
IFS=""
for ((i = 0; i < ${#out[@]}; i ++))
do  echo "${out[$i]:0:-1}"
done
Find the full program on GitHub.
In Lua, we follow the same strategy we used for most languages. We use the following to read in the data, split the input lines into fields, and to construct the output lines:
local output = {}
for line in io . lines () do
    local i = 1
    for field in line : gmatch ("[^,\n]+")
    do  if   output [i] == nil
        then output [i] = ""
        end
        output [i] = output [i] .. field .. ","
        i = i + 1
    end
end
Things to note are that Lua does not have a split method; instead we use gmatch, a global match. And while Lua autovivifies array elements, we cannot concatenate to undefined (nil) values, so we have to handle that case. Lua does not have a post-increment operator (++) either.
Printing the output lines without their trailing commas is easy:
for _, line in ipairs (output)
do  print (line : sub (1, -2))
end
Find the full program on GitHub.
Our solution in Node.js follows a similar strategy. Just like in Lua, we cannot usefully concatenate strings to undefined values (the result would start with the text "undefined"), so we have to deal with that. In our Node.js solution, however, we don't add a comma after we have added a field to an output string; instead, we first add the comma, then the field. This is purely because in Node.js, we cannot as easily use substr to include all but the last character. On the other hand, using substr to include all but the first character is easy.
let out = []

require ("fs")
    . readFileSync (0)               // Read all.
    . toString ()                    // Turn it into a string.
    . split ("\n")                   // Split on newlines.
    . filter (_ => _ . length)       // Filter out empty lines.
    . map (_ => _ . split (','))     // Split each line on commas.
    . forEach (_ => _ .
        forEach ((field, index) => { // Create output strings.
            if (out [index] == null) {
                out [index] = ""
            }
            out [index] += "," + field
        }))

out . forEach (_ => console . log (_ . substr (1)))
Find the full program on GitHub.
A slight deviation for our Python solution. Python doesn't autovivify array elements, so for the first line of input read, we just copy the fields to the output array. This gives two added bonuses: we don't have to deal with concatenating to undefined strings, and we don't have a problem with a trailing (or leading) comma. With hindsight, we could have used this strategy for the other languages as well, but Python was one of the last languages we implemented the problem in, and it didn't seem worth it to update the other solutions.
import fileinput

outputs = []
for line in fileinput . input ():
    fields = line . rstrip () . split (",")
    if (len (outputs)):
        for i in range (len (fields)):
            outputs [i] = outputs [i] + "," + fields [i]
    else:
        outputs . extend (fields)

for line in outputs:
    print (line)
Find the full program on GitHub.
By now, we know the drill...
output = []
ARGF . each_line do
    |_|
    _ . strip . split(/,/) . each_with_index do
        |_, i|
        output [i] = (output [i] || "") + _ + ","
    end
end

output . each do
    |_|
    puts _ [0 .. -2]
end
Find the full program on GitHub.
In C, we will use a different strategy. We won't be building output strings. Instead, we will read the complete input into a string, and position a pointer at the start of each input line. To generate the output, we visit each pointer (in order), and advance the pointer character by character to the next comma, printing each character while advancing.
First, we read the input, counting lines and columns:
# include <stdio.h>
# include <stdlib.h>
# include <sys/types.h>

int main (void) {
    char *  text = NULL;
    size_t  len  = 0;
    ssize_t str_len;   /* Signed: getdelim () returns -1 on failure. */

    if ((str_len = getdelim (&text, &len, '\0', stdin)) == -1) {
        perror ("Failed to read stdin");
        exit (1);
    }

    /*
     * Count the number of input lines and columns.
     */
    size_t nr_of_lines   = 0;
    size_t nr_of_columns = 0;
    char * ptr = text;
    while (* ptr) {
        if (nr_of_lines == 0) {
            if (* ptr == ',' || * ptr == '\n') {
                nr_of_columns ++;
            }
        }
        if (* ptr ++ == '\n') {
            nr_of_lines ++;
        }
    }
We then position the pointers (which themselves are placed in an array). We also turn the newlines in the input into commas, to make life easier later on:
    char ** outputs;
    if ((outputs = (char **) malloc (nr_of_lines * sizeof (char *)))
                == NULL) {
        perror ("Malloc failed");
        exit (1);
    }
    ptr = text;
    size_t c = 0;
    outputs [0] = ptr;
    while (* ptr) {
        if (* ptr == '\n') {
            * ptr = ',';
            if (* (ptr + 1) != '\0') {
                outputs [++ c] = ptr + 1;
            }
        }
        ptr ++;
    }
Now we can print:
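    for (size_t i = 0; i < nr_of_columns; i ++) {
        for (size_t j = 0; j < nr_of_lines; j ++) {
            if (j) {
                printf (",");
            }
            /* Print the current field of line j, up to the next comma. */
            while (* outputs [j] != ',') {
                printf ("%c", * outputs [j] ++);
            }
            outputs [j] ++;   /* Skip the comma itself. */
        }
        printf ("\n");
    }

    free (outputs);
    free (text);
    return (0);
}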
Find the full program on GitHub.
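For readers who don't speak C, the same idea can be sketched in Python, with integer offsets into one big string taking the place of the pointers. This is an illustration under stated assumptions rather than a translation of the program above: the function name transpose_by_offsets is made up, and unlike the C code the sketch appends a final newline if the input lacks one.

```python
def transpose_by_offsets(text):
    """Transpose comma-separated text by walking one cursor per input line."""
    if not text.endswith("\n"):
        text += "\n"                  # Assumption: normalise the last line.
    # Count the columns on the first line.
    first_newline = text.index("\n")
    nr_of_columns = text[:first_newline].count(",") + 1
    # Record the offset at which each input line starts.
    cursors = [0]
    for pos, char in enumerate(text):
        if char == "\n" and pos + 1 < len(text):
            cursors.append(pos + 1)
    # Turn newlines into commas, so every field ends in a comma.
    text = text.replace("\n", ",")
    result = []
    for _ in range(nr_of_columns):
        fields = []
        for j in range(len(cursors)):
            end = text.index(",", cursors[j])   # Next comma after cursor j.
            fields.append(text[cursors[j]:end])
            cursors[j] = end + 1                # Step past the comma.
        result.append(",".join(fields))
    return result
```

Each pass over the cursors produces one output line, and every cursor ends the pass at the start of the next field on its own input line, just as the pointers do in the C version.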