February 13, 2009

Killer Text Parsing using JavaScript

This Codeocolate (Code + Chocolate = Codeocolate) presents a killer text-parsing JavaScript routine. Our killer JavaScript does the following 7 tasks while parsing the text:

  1. Grabs the text to parse
  2. Converts everything to lowercase
  3. Removes numbers
  4. Removes special characters
  5. Sorts words alphabetically
  6. Removes duplicates
  7. Prints sorted word list with a single word on each line

Let's consider each of the 7 tasks one by one:

Grabbing the text to parse: 

var textToParse = new String(form.textToParse.value);


form.textToParse.value is the text in the HTML textarea control that needs to be parsed.

new String(form.textToParse.value) wraps that text in a String object, which is assigned to the variable textToParse.

Converting everything to lowercase: 

textToParse = textToParse.toLowerCase();


textToParse.toLowerCase() returns a copy of the string with every character converted to lowercase.

Removing numbers and special characters: 

var regex = /[`~!@#$%^&*()_+={}|:";'<>,.?0-9]+/g;

textToParse = textToParse.replace(regex, "");


First, a regex matching runs of digits and the listed special characters is defined. Then every match is replaced with an empty string, which removes those characters from the text.
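To see what this pattern does and does not catch, here is a small sketch on a made-up string. The `sample` value is an assumption for illustration and is not part of the original script. Characters that are not listed in the class, such as the hyphen, are left untouched, and removing a number that stands between two spaces leaves both spaces behind.

```javascript
// Hypothetical sample input, for illustration only.
var sample = "hello, world! 42 times; it's well-known.";

// Same character class as above: the listed punctuation plus the digits 0-9.
var regex = /[`~!@#$%^&*()_+={}|:";'<>,.?0-9]+/g;

// Every match is replaced with the empty string.
var cleaned = sample.replace(regex, "");

console.log(cleaned); // hello world  times its well-known
```

The double space after "world" is harmless here, because the later split step treats any run of whitespace as one separator.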

Splitting the string into individual words

regex = /\s+/;

var arr = textToParse.split(regex);


First, a regex matching one or more whitespace characters is defined. Then we use it to split the string into an array of words, which is stored in arr.
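The task list mentions sorting the words alphabetically, but the walkthrough shows no code for that step. A minimal sketch, assuming the built-in `sort()` method is called on `arr` right after the split; where exactly the original script sorts is an assumption, and the sample text is made up:

```javascript
// Hypothetical cleaned, lowercased input.
var textToParse = "pear apple  fig\napple";

// Split on runs of whitespace (spaces, tabs, newlines).
var arr = textToParse.split(/\s+/);

// With no comparator, sort() orders strings by UTF-16 code units,
// which is alphabetical for lowercase ASCII words.
arr.sort();

console.log(arr.join(",")); // apple,apple,fig,pear
```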

Removing duplicates

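// Blank out every later copy of a word; the empty slots are
// collapsed when the commas are replaced further below.
for ( var i = 0 ; i < arr.length - 1 ; i++ )
    for ( var j = i + 1 ; j < arr.length ; j++ )
        if ( arr[i] === arr[j] )
            arr[j] = '';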
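The nested loops compare every pair of words, so the work grows with the square of the word count. If the array has already been sorted, equal words sit next to each other and a single pass is enough. A sketch of that alternative, not the original script's method; the function name `removeAdjacentDuplicates` is hypothetical:

```javascript
// Hypothetical helper: expects an array that is already sorted.
function removeAdjacentDuplicates(sortedWords) {
    var unique = [];
    for (var i = 0; i < sortedWords.length; i++) {
        // Skip empty strings and any word equal to the last one kept.
        if (sortedWords[i] !== "" && sortedWords[i] !== unique[unique.length - 1]) {
            unique.push(sortedWords[i]);
        }
    }
    return unique;
}

var deduped = removeAdjacentDuplicates(["", "apple", "apple", "fig", "pear", "pear"]);
console.log(deduped.join(",")); // apple,fig,pear
```

This version returns a new array without empty slots instead of blanking entries in place.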

Converting array back to string

// toString() joins the array elements with commas.
var killerOutput = arr.toString();

Putting each word on its own line

regex = /,+/g;
killerOutput = killerOutput.replace(regex,"\n");


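The regex matches each run of one or more commas. Replacing every match with a newline character ('\n') puts each word on its own line and collapses the empty slots left by the removed duplicates.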

Printing the Killer Output! 

form.killerOutput.value = killerOutput;


This sets the text of the form.killerOutput HTML textarea to the killerOutput string.
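The steps can also be chained into one function that works on a plain string instead of the form fields, which makes it possible to try them outside a page. This is only a sketch: the function name `killerParse`, the sample sentence, and the position of the `sort()` call are assumptions, since the walkthrough lists sorting as a task but does not show where it happens.

```javascript
// Hypothetical wrapper around the steps from this post.
function killerParse(rawText) {
    var text = String(rawText).toLowerCase();

    // Remove digits and the listed special characters.
    text = text.replace(/[`~!@#$%^&*()_+={}|:";'<>,.?0-9]+/g, "");

    // Split on whitespace, then sort (the placement of sort is assumed).
    var arr = text.split(/\s+/);
    arr.sort();

    // Blank out later copies of each word.
    for (var i = 0; i < arr.length - 1; i++)
        for (var j = i + 1; j < arr.length; j++)
            if (arr[i] === arr[j])
                arr[j] = '';

    // Join with commas, then turn each run of commas into a newline.
    return arr.toString().replace(/,+/g, "\n");
}

var sampleOutput = killerParse("The cat, the hat & 2 Cats!");
console.log(sampleOutput);
```

For this input the words come out as cat, cats, hat, the, one per line. Because the duplicate "the" is blanked rather than deleted, the result ends with a trailing newline; trimming it is left out to stay close to the steps above.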

The source for the above Codeocolate can be downloaded from here.

Hope you found this killer Codeocolate useful. Cheers.

The Amazing 1234567890 Moment

Time is among those things in the universe that never stop. On Friday, Feb 13, 2009, at exactly 3:31:30 PM Pacific Standard Time, Unix Time will equal 1234567890.

For Indians, this amazing moment will come on an amazing day: Valentine's Day, i.e. Feb 14, at 5:01:30 AM Indian Standard Time.

The moment will work like an additive in the love-enriched environment of Valentine's Day, with Unix lovers and fans joining in the celebration too.

By the way, for folks who do not know what Unix Time is: it is the number of seconds elapsed since midnight Coordinated Universal Time (UTC) on January 1, 1970, not counting leap seconds.
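The times quoted above can be checked with a few lines of JavaScript. This is a sketch that assumes fixed offsets of UTC-8 for Pacific Standard Time and UTC+5:30 for Indian Standard Time; the variable names are made up for the example.

```javascript
// Unix time counts seconds; the JavaScript Date constructor expects milliseconds.
var theMoment = new Date(1234567890 * 1000);

console.log(theMoment.toISOString()); // 2009-02-13T23:31:30.000Z

// Shift the instant by a fixed offset, then read the shifted value as if it
// were UTC to get the local wall-clock time.
var pst = new Date(theMoment.getTime() - 8 * 3600 * 1000);   // UTC-8
var ist = new Date(theMoment.getTime() + 5.5 * 3600 * 1000); // UTC+5:30

console.log(pst.toISOString().slice(0, 19)); // 2009-02-13T15:31:30
console.log(ist.toISOString().slice(0, 19)); // 2009-02-14T05:01:30
```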