You are given a string caption for a video.
Write a script to generate a tag for the given string caption in three steps as mentioned below:
- Format as camelCase: Starting with a lower-case letter and capitalising the first letter of each subsequent word. Merge all words in the caption into a single string starting with a #.
- Sanitise the String: Strip out all characters that are not English letters (a-z or A-Z).
- Enforce Length: If the resulting string exceeds 100 characters, truncate it so it is exactly 100 characters long.
Example 1
Input: $caption = "Cooking with 5 ingredients!"
Output: "#cookingWithIngredients"
Example 2
Input: $caption = "the-last-of-the-mohicans"
Output: "#thelastofthemohicans"
Example 3
Input: $caption = " extra spaces here"
Output: "#extraSpacesHere"
Example 4
Input: $caption = "iPhone 15 Pro Max Review"
Output: "#iphoneProMaxReview"
Example 5
Input: $caption = "Ultimate 24-Hour Challenge: Living in a Smart Home controlled entirely by Artificial Intelligence and Voice Commands in the year 2026!"
Output: "#ultimateHourChallengeLivingInASmartHomeControlledEntirelyByArtificialIntelligenceAndVoiceCommandsIn"
The examples baffled me initially. In particular, the-last-of-the-mohicans
doesn't result in any capital letters (so, - isn't seen as a word
boundary), yet 24-Hour results in the H being capitalized, suggesting
that - does break words.
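To make the conflict concrete, here is a minimal Python sketch of one literal reading of the instructions: camelCase first, sanitise second, truncate last. The function name literal_tag and its hyphen_splits switch are ours, purely for illustration; they are not part of the challenge or of our solutions.

```python
import re


def literal_tag(caption: str, hyphen_splits: bool) -> str:
    """One literal reading: camelCase first, then sanitise, then truncate."""
    # Hypothetical switch: decide whether '-' separates words.
    separator = r"[\s-]+" if hyphen_splits else r"\s+"
    words = [w for w in re.split(separator, caption) if w]
    # camelCase: first word in lower case, later words capitalised.
    camel = "".join(
        w.lower() if i == 0 else w[:1].upper() + w[1:].lower()
        for i, w in enumerate(words)
    )
    # Sanitise (English letters only), add the '#', enforce the length.
    return ("#" + re.sub(r"[^A-Za-z]", "", camel))[:100]


for splits in (False, True):
    print(splits,
          literal_tag("the-last-of-the-mohicans", splits),
          literal_tag("Ultimate 24-Hour Challenge", splits))
```

Under this reading, neither setting of the switch reproduces both Example 2 and the start of Example 5: without splitting on -, the H of Hour stays lower case; with splitting, the-last-of-the-mohicans gains capitals.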
In order to make sense of the examples, we don't follow the instructions as given, but use a different order.
For our Perl solution, we do all the modifications using regular expressions,
and we chain them using the /r modifier.
We will do the following steps, in order:
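- Remove all non-letters, but keep the spaces (s/[^\s\pL]+//gr).
- Remove leading spaces (s/^\s+//r).
- Remove trailing spaces (s/\s+$//r).
- Lower case the entire string (s/(.*)/\L$1/r).
- Capitalize the first letter of each word, and delete the space preceding it (s/\s+(\pL)/\u$1/gr).
- Add a leading # (s/^/#/r).
- Remove all characters after the 100th (s/.{100}\K.*//r).
This leads to the following program, where $_ contains the line
we're processing: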
say s/[^\s\pL]+//gr     =~  # Remove non-letters, but keep the spaces
    s/^\s+//r           =~  # Remove leading spaces
    s/\s+$//r           =~  # Remove trailing spaces
    s/(.*)/\L$1/r       =~  # All lower case
    s/\s+(\pL)/\u$1/gr  =~  # Capitalize first letter of each word,
                            # and delete the space preceding it
    s/^/#/r             =~  # Add leading '#'
    s/.{100}\K.*//r         # Remove all characters after the 100th.
Find the full program on GitHub.
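For readers who don't read Perl, the same chain of steps can be sketched in Python. This is an illustrative port, not the Python solution mentioned at the end of this post; the name make_tag is made up, and we assume English letters only ([A-Za-z]) where the Perl code uses \pL.

```python
import re


def make_tag(caption: str) -> str:
    """Sketch of the regex chain: sanitise first, camelCase afterwards."""
    # Remove non-letters, but keep the white space.
    # Assumption: ASCII letters only, where the Perl version uses \pL.
    text = re.sub(r"[^\sA-Za-z]+", "", caption)
    # Remove leading and trailing white space, then lower case everything.
    text = text.strip().lower()
    # Upper case the first letter of each later word, dropping the
    # white space in front of it.
    text = re.sub(r"\s+([a-z])", lambda m: m.group(1).upper(), text)
    # Add the leading '#', and keep at most 100 characters.
    return ("#" + text)[:100]


print(make_tag("Cooking with 5 ingredients!"))   # "#cookingWithIngredients"
print(make_tag("the-last-of-the-mohicans"))      # "#thelastofthemohicans"
```

Because the non-letters are gone before any capitalisation happens, 24-Hour has already become Hour with a space in front of it, while the-last-of-the-mohicans has become a single word. That is what lets one rule satisfy both examples.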
Our solutions in most other languages follow the same steps as our Perl
solution, but many don't have an equivalent to s/\s+(\pL)/\u$1/gr.
Tcl does have a totitle function, which is similar to Perl's ucfirst.
This leads to the following program, where $input contains the input:
# * Remove non-letters, but keep space
# * Remove leading white space
# * Remove trailing white space
#
regsub -all {[^[:alpha:]\s]+} $input {} input
regsub {^\s+} $input {} input
regsub {\s+$} $input {} input
#
# Lower case entire string
#
set input [string tolower $input]
#
# * For each sequence of letters (a word), upper case its first letter.
# This upper cases the first letter of the string, hence
# * Lower case the first letter
# * Remove all white space
#
regsub -all -command {[[:alpha:]]+} $input {string totitle} input
regsub -command {^[[:alpha:]]} $input {string tolower} input
regsub -all {\s+} $input {} input
#
# Add leading '#', and print at most 100 characters
#
puts [string range "#${input}" 0 99]
The -command switch means the substitution part is seen as a piece
of code, and executed (like the /e modifier in Perl), but it implicitly
passes in the matched part.
Find the full program on GitHub.
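The idea of a substitution that runs code on each match is not specific to Tcl or Perl. As a rough analogue, Python's re.sub accepts a function as the replacement and passes it the match object. The sketch below mirrors the Tcl steps under that assumption; make_tag_titlecase is a made-up name, and this is not the Python solution mentioned at the end of this post.

```python
import re


def make_tag_titlecase(caption: str) -> str:
    """Sketch of the Tcl steps: title case every word, then fix the first."""
    # Remove non-letters (keeping white space), trim, and lower case.
    text = re.sub(r"[^A-Za-z\s]+", "", caption).strip().lower()
    # For each sequence of letters, upper case its first letter. The
    # replacement is a function, which is called with the match object.
    text = re.sub(r"[a-z]+", lambda m: m.group(0).capitalize(), text)
    # That also upper cased the first letter of the string: undo it.
    text = re.sub(r"^[A-Z]", lambda m: m.group(0).lower(), text)
    # Remove all white space.
    text = re.sub(r"\s+", "", text)
    # Add the leading '#', and keep at most 100 characters.
    return ("#" + text)[:100]


print(make_tag_titlecase("iPhone 15 Pro Max Review"))  # "#iphoneProMaxReview"
```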
C doesn't come with much functionality to manipulate strings. So, we will iterate over the string, skipping over the non-letters, and keeping track of whether we saw a space.
Given the input in line, we use the following program:
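# include <stdlib.h>
# include <stdio.h>
# include <stdbool.h>
# include <ctype.h>

/*
 * Maximum length of the tag, including the leading '#',
 * as given in the challenge.
 */
# define MAX_CHARS 100

char * ptr = line;
bool saw_space = false;
short chars = 0;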
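/*
 * Skip leading spaces, and non letters
 */
while (* ptr && !isalpha (* ptr)) {ptr ++;}
/*
 * Print the leading #, and the first letter. If the line contains
 * no letters at all, print only the #, and don't step past the
 * end of the string.
 */
printf ("#");
chars ++;
if (* ptr) {
    printf ("%c", tolower (* ptr ++));
    chars ++;
}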
while (* ptr && chars < MAX_CHARS) {
    char ch = * ptr ++;
    if (isspace (ch)) {
        /*
         * If we saw a space, skip it, but remember we did see it
         */
        saw_space = true;
        continue;
    }
    if (!isalpha (ch)) {
        /*
         * If it's not a letter, skip it. Don't modify the
         * saw_space status
         */
        continue;
    }
    /*
     * We now have a letter. If we saw a space, print its
     * upper case, else print its lower case.
     * The saw_space status will be turned off
     */
    printf ("%c", saw_space ? toupper (ch) : tolower (ch));
    saw_space = false;
    chars ++;
}
printf ("\n");
Find the full program on GitHub.
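The same single pass can be written down compactly in Python, which makes the role of the saw_space flag easy to test in isolation. This is only a sketch of the technique; make_tag_scan and max_chars are illustrative names, and the result is collected in a list instead of being printed character by character.

```python
def make_tag_scan(caption: str, max_chars: int = 100) -> str:
    """Single pass over the caption, remembering whether we saw a space."""
    out = ["#"]          # The tag always starts with '#'.
    saw_space = False
    for ch in caption:
        if len(out) >= max_chars:
            break        # The '#' counts towards the limit.
        if ch.isspace():
            # Skip the space, but remember that we saw it.
            saw_space = True
            continue
        if not (ch.isascii() and ch.isalpha()):
            # Not an English letter: skip it, and leave saw_space alone.
            continue
        if len(out) == 1:
            # The first letter of the tag is always lower case.
            out.append(ch.lower())
        else:
            out.append(ch.upper() if saw_space else ch.lower())
        saw_space = False
    return "".join(out)


print(make_tag_scan("Cooking with 5 ingredients!"))  # "#cookingWithIngredients"
```

Leaving saw_space untouched on non-letters is what turns 24-Hour into Hour with a capital H: the flag set by the space before 24 survives the digits and the hyphen.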
We also have solutions in AWK, Bash, Go, Lua, Node.js, Python, R, Ruby and sed, all using more or less the same steps as our Perl and Tcl solutions.