You are given 7-characters alphanumeric SEDOL.
Write a script to validate the given SEDOL. Print 1 if it is a valid SEDOL otherwise 0.
For more information about SEDOL, please checkout the wikipedia page.
Input: $SEDOL = '2936921'
Output: 1
Input: $SEDOL = '1234567'
Output: 0
Input: $SEDOL = 'B0YBKL9'
Output: 1
The SEDOL Wikipedia page explains SEDOL codes consists of
six alpha numerical characters, followed by a check digit. Although not
explicitly mentioned, all the examples, including the example code,
assume the letters are capitals. Vowels are not used. (Vowels in this
context means the letters A
, E
, I
, O
, and U
; Y
is not a vowel
here.)
To calculate the checksum, we first have to map each character to a value.
Digits map to themselves; letters map consecutively, with A
mapping to 10,
up to Z
mapping to 36. For mapping purposes, we do not skip the vowels.
We then multiply values we got after mapping the characters by different
weights, depending on their position:
A SEDOL code is valid if the sum of the multiplied values is evenly divisible by 10.
Note that since we are only interested in having a check sum which
is divisible by 10, everything will also work out if we map A
to Z
to 0 to 25. We will do this for most solutions.
Each solution will read SEDOL codes from standard input, one code per
line, outputting a 0
or 1
for each code, depending whether the
code is valid or not.
We first start off with creating a hash which we can use to map characters to values:
my %c = do {my $c = 0; map {$_ => $c ++} 0 .. 9, 'A' .. 'Z'};
Given that, the rest is just a (long) one-liner:
say /^((?[[0-9A-Z]-[AEIOU]]))@{["((?1))" x 5]}([0-9])$/x &&
($c {$1} + 3 * $c {$2} + $c {$3} + 7 * $c {$4} + 3 * $c {$5} + 9 * $c {$6} +
$c {$7}) % 10 == 0 || 0 while <>;
Let's have a closer look at the regexp. It starts off with:
/^ # Match at the beginning of the string
( # Start of capture group 1
(?[ # Perform character class arithmetic
[0-9A-Z] # Character class matching a digit, or an uppercase letter
- # ... except ...
[AEIOU] # vowels
]) # End of character class arithmetic
) # End of capture group 1
So this matches the first character of a SEDOL code, and, if the entire
pattern matches, it makes the first character available in $1
.
The next part of the pattern is:
@{["((?1))" x 5]}
What is going on here? First, we have the
baby cart
operator, @{[ ]}
. It evaluates the expression inside the brackets (in
list context), and interpolates the result in the string
(or regexp in this case).
Inside the brackets, we have:
"((?1))" x 5
This is the repetition operator (x
).
It takes the string on the left of it, and repeats that as many times as
the value on its right. So, this results in fives times ((?1))
.
Since this is interpolated in the regexp. Now, what does ((?1))
mean
in a regexp? The outer parenthesis are capturing parethesis. So this
is capturing $2
, $3
, $4
, $5
, and $6
.
But what is (?1)
? This is a recursive subpattern.
It means to repeat the same pattern as in the first capture group.
So, all this means we're capturing the next five characters, as long as they
are valid SEDOL characters (else, the entire match will fail).
The final part of the regexp is simple:
([0-9])$
It matches a digit, followed by the end of the string. The matched digit
is going to be placed in $7
.
The regular expression matches if and only if it is matched against a
syntactical valid SEDOL (that is, exactly seven characters, all of the
allowed alpha numerics). And if it matches, each of its seven characters
are available in $1
to $7
.
We can now easily calculate the checksum:
$c {$1} + 3 * $c {$2} + $c {$3} + 7 * $c {$4} +
3 * $c {$5} + 9 * $c {$6} + $c {$7}
which we then check if it is divisible by 10
: % 10 == 0
.
Find the full program on GitHub.
First we create an array w
with weights, and an array c
which maps
characters to values:
BEGIN {
ord_0 = 48
ord_9 = 57
ord_A = 65
ord_Z = 90
split ("1 3 1 7 3 9 1", w, " ")
for (i = ord_0; i <= ord_9; i ++) {
c [sprintf ("%c", i)] = i - ord_0
}
for (i = ord_A; i <= ord_Z; i ++) {
c [sprintf ("%c", i)] = i - ord_A
}
}
Then we do some validation: each character should be a digit, or an upper case letters excluding vowels, the string should be exactly seven characters long, and the string should end with a digit:
/[^0-9BCDFGHJKLMNPQRSTVWXYZ]/ {print 0; next}
length ($0) != 7 {print 0; next}
/[^0-9]$/ {print 0; next}
Now we calculate the check sum, and verify whether it is divisible by 10:
{
check = 0
for (i = 1; i <= 7; i ++) {
check += w [i] * c [substr ($0, i, 1)]
}
print (check % 10 == 0 ? 1 : 0)
}
Find the full program on GitHub.
First, we create an array with weights, and some constants:
w=(1 3 1 7 3 9 1)
printf -v ord_0 %d "'0"
printf -v ord_A %d "'A"
With the input line in line
, we can first so some basic validation:
if ((${#line} != 7))
then echo 0
elif [[ $line =~ [^0-9BCDFGHJKLMNPQRSTVWXYZ] ]]
then echo 0
elif [[ $line =~ [^0-9]$ ]]
then echo 0
else
If this all validates, we can calculate the check sum, and validate it:
((check = 0))
for ((i = 0; i < 7; i ++))
do char=${line:$i:1}
printf -v ord %d "'$char"
if [[ $char =~ [0-9] ]]
then ((value = ord - ord_0))
else ((value = ord - ord_A))
fi
((check += ${w[i]} * value))
done
if ((check % 10 == 0))
then echo 1
else echo 0
fi
Find the full program on GitHub.
An array of weights:
short w [] = {1, 3, 1, 7, 3, 9, 1};
We won't be using regular expressions in our C solution; instead we scan over the string checking each character.
Here, we have the input line in the variable line
, and its length
in a variable str_len
. The line includes a newline.
char * line_ptr = line;
short valid = 1;
if (str_len == 8) {
int check = 0;
for (size_t i = 0; i < 7 && valid; i ++) {
char first = 0;
if ('0' <= line_ptr [i] && line_ptr [i] <= '9') {
first = '0';
}
else {
if ('B' <= line_ptr [i] && line_ptr [i] <= 'Z' &&
line_ptr [i] != 'E' &&
line_ptr [i] != 'I' &&
line_ptr [i] != 'O' &&
line_ptr [i] != 'U' &&
i < 6) {
first = 'A';
}
else {
valid = 0;
}
}
check += (line_ptr [i] - first) * w [i];
}
if (check % 10 != 0) {
valid = 0;
}
}
else {
valid = 0;
}
printf ("%d\n", valid);
Find the full program on GitHub.
An array of weights:
var w = [] rune {1, 3, 1, 7, 3, 9, 1}
In Go a rune
is a type for (Unicode) character values.
We pick up validation when text
contains the line of input, trimmed
of its newline. First, we check for syntax:
match, _ := regexp . MatchString (
"^[0-9BCDFGHJKLMNPQRSTVWXYZ]{6}[0-9]$", text)
This validates if text
contains 6 alpha-nums (without vowels), followed
by a digit; else we fail and continue:
if !match {
fmt . Print ("0\n")
continue
}
Calculating and validating the check sum:
var check rune = 0
for i, rune := range text {
if rune <= '9' {
rune -= '0'
} else {
rune -= 'A'
}
check += w [i] * rune
}
if check % 10 == 0 {
fmt . Print ("1\n")
} else {
fmt . Print ("0\n")
}
Find the full program on GitHub.
For our scheme solution, we start off with a function which takes a
byte
(in effect, an ASCII value), and which returns the value it
is mapped to:
(define ords (bytevector->u8-list (string->bytevector "09A" "UTF-8")))
(define ord-0 (list-ref ords 0))
(define ord-9 (list-ref ords 1))
(define ord-A (list-ref ords 2))
(define (byte->val b)
(if (<= b ord-9)
(- b ord-0)
(- b ord-A)))
We start off with the three character string 09A
, which we then turn
into a bytevector in UTF-8
encoding. This is a superset of ASCII,
and the three characters in the string are all ASCII characters.
string->bytevector
turns the string into a bytevector, while
bytevector->u8-list
takes this bytevector and turns it into a
list. This list is placed in the variable ords
.
Next is setting three constants, ord-0
, ord-9
, and ord-A
; these
will hold the ASCII values for 0
, 9
, and A
. list-ref
is a function
which takes two arguments, a list and an index. It will return the element
on said index.
Finally, we define a function byte->val
which takes a byte as argument (b
),
and returns the mapping value. If the byte is the value of a digit
(<= b ord-9)
, we subtract the value of the digit 0
: (- b ord-0)
,
else we subtract the value of the letter A
: (- b ord-A)
.
Defining an array of weights is easy:
(define w (list 1 3 1 7 3 9 1))
Next, we need a bunch of variables:
(define pat "^[0-9BCDFGHJKLMNPQRSTVWXYZ]{6}[0-9]$") ;; Pattern to match with
(define is-sedol) ;; Match object
(define valid 0) ;; Is the SEDOL valid?
(define check 0) ;; Checksum
(define values) ;; mapped values of SEDOL
(define sedol (read-line)) ;; Line of input
We start with matching the input against the pattern. If it fails, it can never be a valid SEDOL:
(set! is-sedol (string-match pat sedol))
If the match succeeds, we will take the characters, map them to their
ASCII values, and then, using the byte->val
function above, calculate
their mapped values:
(if (regexp-match? is-sedol)
(begin
(set! values
(map-in-order byte->val
(bytevector->u8-list
(string->bytevector
(match:substring is-sedol 0) "UTF-8"))))
match:substring
returns a capture of the match, with the zeroth capture
being the entire match. (This is how we get rid of the newline which is
in sedol
). string->bytevector
and bytevector->u8-list
gives us
a list of ASCII values (similar to what is explained above).
map-in-order
takes two arguments, a function and a list: it applies
the function to each element of the list. Finally, the set!
assigns
this to values
.
We can now calculate the checksum without an explicit loop:
(set! check
(fold (lambda (weight val sum) (+ sum (* weight val))) 0 w values)
)
lambda
creates an anonymous function. Here we have a function taking
three arguments. It will multiply the first two, and add them to the third.
fold
takes three or more arguments: a function, an initial value,
and a list of lists. It will repeatedly call the function, with the next
elements of the lists as parameters. The last parameter is the return
value of the previous invocation of the function (and the initial value
the first time it is called). The return value of the last invocation
if the final value.
Now that we have the checksum, we can test whether it is divisible by 10
.
If it does, the input is a valid SEDOL:
(if (= 0 (modulo check 10)) (set! valid 1))
Printing the result is easy:
(display valid)(newline)
Find the full program on GitHub.
We also have solutions in other languages, solutions which are very similar to one of the other solutions: Java, Lua, Node.js, Python, Ruby, and Tcl.