February 13, 2009

Killer Text Parsing using JavaScript

This Codeocolate (Code + Chocolate = Codeocolate) presents a killer text-parsing JavaScript snippet. It performs the following 7 tasks while parsing the text:

  1. Grabs the text to parse
  2. Converts everything to lowercase
  3. Removes numbers
  4. Removes special characters
  5. Sorts words alphabetically
  6. Removes duplicates
  7. Prints the sorted word list with a single word on each line

Let's consider each of the 7 tasks one by one:

Grabbing the text to parse: 

var textToParse = new String(form.textToParse.value);

Dissection:

form.textToParse.value is the text contained in the HTML textarea control that needs to be parsed.

new String(form.textToParse.value) creates a String object from form.textToParse.value and assigns it to the variable textToParse.
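
A small sketch of what new String(...) produces compared with using the value directly. The form object here is a hypothetical stand-in for the real HTML form, so the snippet can run outside a browser; the sample text is made up.

```javascript
// Hypothetical stand-in for the page's form (assumption, not from the post).
var form = { textToParse: { value: "Hello World 42" } };

var wrapped = new String(form.textToParse.value); // String object wrapper
var plain = form.textToParse.value;               // string primitive

console.log(typeof wrapped);    // "object"
console.log(typeof plain);      // "string"
console.log(wrapped == plain);  // true  (loose equality unwraps the object)
console.log(wrapped === plain); // false (the types differ)
```

Both forms support the string methods used in the following steps, so a plain assignment would likely work just as well here.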


Converting everything to lowercase: 

textToParse = textToParse.toLowerCase();

Dissection:

textToParse.toLowerCase() returns a copy of the string with all characters converted to lowercase.
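
A quick illustration of why the result is assigned back to the variable: toLowerCase() does not change the string it is called on. The sample text is made up for the example.

```javascript
var original = "Killer TEXT Parsing";
var lowered = original.toLowerCase();

console.log(lowered);  // "killer text parsing"
console.log(original); // "Killer TEXT Parsing" (unchanged)
```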


Removing numbers and special characters: 

var regex = /[`~!@#$%^&*()_+={}|:";'<>,.?0-9]+/g;

textToParse = textToParse.replace(regex, "");

Dissection:

First, a regex matching the listed special characters and the digits 0-9 is defined. Then every match is replaced with an empty string, which removes those characters from the text.
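
To see what this regex does and does not remove, here is a small sketch on a made-up sample string. Note that the hyphen and the square brackets are not part of the character class, so they survive, and that removing the apostrophe turns "it's" into "its".

```javascript
var regex = /[`~!@#$%^&*()_+={}|:";'<>,.?0-9]+/g;

// Made-up sample input for illustration.
var sample = "it's 2009, e-mail me [now]!";
var cleaned = sample.replace(regex, "");

// The removed "2009," leaves two spaces between "its" and "e-mail".
console.log(cleaned); // "its  e-mail me [now]"
```

The leftover double space is harmless here because the next step splits on runs of whitespace.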


Splitting the string into individual words

regex = /\s+/;

var arr = textToParse.split(regex);

Dissection:

First, a regex matching one or more whitespace characters is defined. Then the string is split on that regex into an array of words, which is stored in arr.
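
One detail worth knowing: if the text starts or ends with whitespace, split() produces empty strings at the ends of the array. The sketch below, on a made-up input, shows this and one possible way around it (calling trim() first, which is an addition and not part of the original script).

```javascript
var regex = /\s+/;
var text = "  the quick\n brown\tfox ";

// Splitting as-is: leading and trailing whitespace yield empty entries.
var rawWords = text.split(regex);
console.log(rawWords); // ["", "the", "quick", "brown", "fox", ""]

// Trimming first avoids the empty entries.
var words = text.trim().split(regex);
console.log(words);    // ["the", "quick", "brown", "fox"]
```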


Removing duplicates

// Blank out every later occurrence of a word that has already been seen.
for ( var i = 0 ; i < arr.length - 1 ; i++ )
{
    for ( var j = i + 1 ; j < arr.length ; j++ )
    {
        if ( arr[i] === arr[j] )
        {
            arr[j] = '';
        }
    }
}
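
The task list mentions sorting, but no sorting code appears in this walkthrough. As a sketch of one way to combine the two steps (an alternative technique, not the script's own code): sort the array first with the built-in Array.prototype.sort, then keep only words that differ from their predecessor. The sample array is made up.

```javascript
var arr = ["pear", "apple", "pear", "fig", "apple"];

// Sort first so that equal words end up next to each other.
arr.sort();

// Keep a word only if it differs from the one before it.
var unique = arr.filter(function (word, index) {
    return index === 0 || word !== arr[index - 1];
});

console.log(unique); // ["apple", "fig", "pear"]
```

Unlike the nested loops, this drops duplicates from the array instead of blanking them, so no empty entries are left behind.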

Converting array back to string

// toString() joins the array elements with commas.
var killerOutput = arr.toString();


Putting each word on its own line

regex = /,+/g;
killerOutput = killerOutput.replace(regex, "\n");

Dissection:

Using the regex, we replace each run of one or more commas with a newline character ('\n'), so that each word ends up on its own line.
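
The following sketch traces the last two steps on a made-up array in which duplicates have already been blanked out. It also shows a possible alternative (an assumption, not the script's code) that filters out the empty entries and uses join("\n"), which avoids the trailing newline.

```javascript
// Example array after duplicate removal: blanked entries are empty strings.
var arr = ["apple", "fig", "", "pear", ""];

// The approach from the post: toString() gives "apple,fig,,pear,",
// then each run of commas becomes one newline.
var viaReplace = arr.toString().replace(/[,]+/g, "\n");
console.log(JSON.stringify(viaReplace)); // "apple\nfig\npear\n"

// Alternative sketch: drop empty entries, then join with newlines.
var viaJoin = arr.filter(function (word) {
    return word !== "";
}).join("\n");
console.log(JSON.stringify(viaJoin));    // "apple\nfig\npear"
```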


Printing the Killer Output! 

form.killerOutput.value = killerOutput;

Dissection:

This sets the text of the form.killerOutput HTML textarea to the killerOutput string.
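
Putting the steps together, the whole pipeline could be sketched as a single function. The name parseText is hypothetical, and the trim(), sort(), filter() and join() calls are additions that go beyond the snippets shown in this post, so treat this as one possible arrangement and not as the downloadable source.

```javascript
// Hypothetical helper: takes raw text, returns sorted unique words, one per line.
function parseText(text) {
    var cleaned = String(text)
        .toLowerCase()
        .replace(/[`~!@#$%^&*()_+={}|:";'<>,.?0-9]+/g, "");

    // Split on whitespace; trim first to avoid empty entries at the ends.
    var words = cleaned.trim().split(/\s+/);

    // Sort so duplicates become neighbours, then keep the first of each run.
    words.sort();
    var unique = words.filter(function (word, index) {
        return word !== "" && (index === 0 || word !== words[index - 1]);
    });

    return unique.join("\n");
}

var result = parseText("The cat, the DOG and 3 cats!");
console.log(result);
// and
// cat
// cats
// dog
// the

// In the page itself, this would presumably be wired up as:
// form.killerOutput.value = parseText(form.textToParse.value);
```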


The source for the above Codeocolate can be downloaded from here.

Hope you found this killer Codeocolate useful. Cheers.
