Saturday, December 15, 2012

QIZMT

I have been working mostly on Windows for the past 2.5 years, and I badly wanted to get hands-on experience with a MapReduce framework implementation like Hadoop. For quite some time, I had been thinking of installing and configuring Hadoop on my Windows machine. The introductory tutorials about Hadoop on Windows were really daunting, as every one of them started with a caution that Hadoop is not officially supported on Windows. Still, I wanted to play with some parallel computing framework. Though I knew about a few of them, I was not sure which one would be handy for trying out a few freaky things. Then I came across MySpace's Qizmt (Kiz-Mit). Qizmt is an implementation of the MapReduce framework from MySpace. It is free and open source, licensed under the GNU GPL v3. I tried installing Qizmt and got everything ready in a matter of a few minutes. I ran my first example in 10 minutes or so.

Overall, the getting started experience was easy and smooth. I simply love this kind of toolset and framework that makes your life easy rather than leaving you chasing installation or configuration issues. I ran a simple word count program and everything went fine. Qizmt has a very nice feature (I am not sure if other frameworks readily support it): you can create sample data on the fly and check the correctness of your map and reduce code instantly. All the job definitions are serialized as XML data (even the map and reduce module code). As of now, I am not sure when that code is compiled. The toolset also comes with a decent debugger. It supports viewing the call stack, local variables and debug output. The immediate window and the thread window have been really good. All I have tried so far is a single machine setup; I am yet to do the cluster setup.

Tutorials related to Qizmt can be found here: http://code.google.com/p/qizmt/




Monday, July 30, 2012

The BD


I have always wondered how internet companies like Facebook, Google, Yahoo etc. manage the enormous amount of data that they collect from various parts of their services. Before that, let us consider the amount of data that Google processes every day. Reportedly, the internet giant processes a few petabytes of newly collected data daily. The collected data may include request logs, link logs, crawled web documents etc. The valuable information hidden in this data is huge and can be used to make some influential decisions. For example, people may need to calculate page rank or inverted indices for documents and websites from the mined data. There would be a lot of test automation that relies upon the mined data as well.
Most of these operations seem straightforward. But for God's sake, they are not so facile when the data is really huge (when I say 'really huge' I mean really really huge). For example, consider the task of identifying spam links that have a page rank greater than 0.45 from a data source which is around 5 petabytes in size. It is very appealing to think in terms of distributed primitives to solve this problem. But I swear, when you start writing a distributed system to accomplish this tedious task, you will find yourself in a very bad soup debugging multiple processes. Some processes may hang because of other processes in the cluster that they do not even know about. Developers or analysts generally lose track of their original problem and start running behind crazy problems like the one mentioned above. So is there a way to get out of these messy things and deal with clean interfaces and simple primitives? Some months ago, I started hearing more about the MapReduce framework from Google. I started reading about it at that time but did not find the time to write down my thoughts about it.
The MapReduce paradigm can potentially bring developers and analysts out of this soup of tedious implementation challenges. Hadoop is an implementation of the MapReduce framework that can run on commodity hardware and solve problems using the MapReduce paradigm. All that the MapReduce framework understands is the Map primitive and the Reduce primitive. You can think of map and reduce as functions that do some computing to solve the problem.
Let us take a simple problem and try to address it using the MapReduce framework. Let us assume that your document crawler has crawled over all shared documents in your network share and created a single file which is around 1 TB in size. Now you have to give the word count of the words inside this big file. For example, if the contents of the file are something like "This is an example example statement", the output should be something like: This - 1, is - 1, an - 1, example - 2, statement - 1. Please note the simplicity of the problem when we are dealing with small data. You just need to write a simple function of not more than 20 lines in any language to accomplish this. But this is not what we are interested in. We want to solve the same problem where the file is really huge.
Let us logically break the problem down and get into the MapReduce paradigm. I will state things only at a high level so that people can get the big picture. Hadoop comes with a file system called HDFS (Hadoop Distributed File System). You can put your large file in HDFS, which splits it into chunks (blocks) and takes care of replicating those chunks across nodes for fault tolerance. We just don't need to worry about things like "What will happen if the file was deleted by some random adversary process?" or "Will my file be available for computing all the time?". Now we have logically split the big file into small ones.
Here is the 'how it works': if you can compute the count of each word at each node of the cluster that holds a part of the data, then you can combine those partial counts to get the overall count. There are 3 main phases in the MapReduce framework: Map, Combine (an optional local pre-aggregation) and Reduce. Here is what the Map and Reduce primitives look like:
    map(String key, String value):
        for each word w in value:
            EmitIntermediate(w, "1");
    reduce(String key, Iterator values):
        int result = 0;
        for each v in values:
            result += v.ToInteger();
        Emit(AsString(result));
As you can see, the map function emits a count of 1 for every word that it sees, and the reduce function sums up all the counts emitted for a given word and 'Emits' the answer. It would be a bit difficult to understand right away how to write the Map and Reduce functions, how to configure the Hadoop cluster and things of that nature. In my next post we will discuss those things in detail. I believe that this post gave a good idea of how we can use the MapReduce framework to process BigData. Stay tuned! Lots more to come this month ;)
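The word count pseudocode above can be tried out on a single machine before touching a cluster. What follows is only a minimal sketch in plain Python, not Hadoop or Qizmt code: the names map_words, reduce_counts and run_word_count are made up for illustration, and the "shuffle" step is simulated with an in-memory dictionary.

```python
from collections import defaultdict


def map_words(key, value):
    """Map step: emit (word, 1) for every word in one chunk of text."""
    for word in value.split():
        yield word, 1


def reduce_counts(key, values):
    """Reduce step: sum all the counts emitted for one word."""
    return key, sum(values)


def run_word_count(chunks):
    """Simulate map -> shuffle -> reduce over a list of text chunks.

    Each chunk stands in for one block of the big file (an assumption
    made for illustration; a real framework distributes this work).
    """
    grouped = defaultdict(list)
    for chunk_id, text in enumerate(chunks):
        for word, count in map_words(chunk_id, text):
            grouped[word].append(count)  # shuffle: group values by key
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(run_word_count(["This is an example", "example statement"]))
```

The two chunks here play the role of the split pieces of the large file; the word "example" appears once in each chunk and the reduce step adds the two partial counts together.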

Tuesday, May 17, 2011

From chennai

Chennai.
I am back in Chennai after a long time (5 months) to see my parents. I could see lots of changes in Chennai - new bridges everywhere, new roads and new buses. Wow, that's great. Even people have changed a lot - my brother has started studying properly :P, my mom is not so bothered about me coming home late in the night and my father and I have started talking like never before (Wow, now that's getting really interesting - this is something my friends will be shocked to see :) yes, but true). Lots of things have changed except the summer :(. It is too hot here in Chennai and I pity all those people who ride two-wheelers in the scorching sun. I have been in Hyderabad for most of this summer and Hyderabad is way better compared to Chennai.
I have been relaxing nicely for the last 2-3 days like never before - hardly getting out of the room, with the laptop in my lap. Though I do read my office mails, which make me think that I should go back to office soon!
Had a nice time with my friends in the last 2-3 days. Happy to make a note here that one of my close friends has got an admit from IIM-K, 2 others are joining better firms and the others are seriously preparing for some exam or the other (still a lot more to come). That's good!!
What am I up to?? - I DON'T KNOW :) I don't know what I am bound for. Let's see. I am least worried about this because I have a beautiful bunch of friends who will help me get what I really want.
To be continued...

Tuesday, January 25, 2011

My adventures with comp sci...!

After a very tiring fortnight with some unhappy happenings, I made up my mind to write something about my engineering experiences in college, since I believe that would rejuvenate me :) . So what did not go well in the last 10 days? - One, I lost my new Windows Phone 7 while travelling in an auto. And that is the only reason :) . You might say losing a phone is a very normal happening, but that phone was special to me. I had started developing applications for that phone and it went away from me in no time :( . Okay, anyway, you cannot hold on to anything unless it is written for you.

So today I wanted to share my thoughts about how a school student like me finds out how amazing computer science can be in a good college like CIT (notice that I am not talking about very good colleges like the IITs or a very few NITs). This applies only to colleges like CIT or some other nice colleges that give unimaginable freedom to their students. To start with, my school was a place where you could not apply your own thought. Yes, my school was one of the most popular schools in Chennai, but I hated that place like anything. It never gave me a chance to explore or innovate. I never dreamt of studying in a college like CIT (wrt the freedom :) ). In a place like CIT, where nothing is expected of you, it is very easy to choose your own way of nurturing your skills. CIT is one of the best places that I have dwelt in. You can become the CEO of a very big MNC or a politician or even a third grade criminal or a useless goose studying in CIT. It's all in your hands. I was very new to computer science when I joined CIT and I am new still :) . I had no idea of what computers were doing. But my instinct always told me that I would be a computer science guy.
The first semester had only common subjects for all branches of study and I didn't face any problems then. I was quite happy with my performance. The second semester started and I had my first department specific paper, "C programming theory and practice". Ms.Devi was the one who handled that subject. She was so brilliant (lol) that she used to write "Please Enter the number" inside a scanf statement and ask us to take notes of the same. I should not blame her; at that point I didn't even know that :) . Soon the classes started moving fast and I was not able to write a single program in class, not even a program that would print Hello world. It took me nearly 1 week to successfully compile my first program. Ms.Devi noted that I didn't know anything in the subject and started asking me questions on a daily basis. I still remember the question she used to ask me - "What is function definition and what is function declaration?". Many guys used to finish programs very easily while I was searching for a person who would give me a USB stick with the mere 40 lines of C code loaded. Eventually I scored 14 out of 50 in my first lab exam. I was very sure that I would re-write that paper in October :) . At that time there were some boasters in my class. They would talk as if they were the ones who mentored Dennis to design the language. I was intimidated by them like anything. I tell you one thing, it does not matter how fast you learn a programming language. What matters is only your love for the language. I loved C but I didn't learn it fast. Finally I somehow managed to pass that lab.

Second year started with DSA. The subject had a good head start. I had a flair for that subject as it was clean and had more to do with pure math than with fancy programs (that would not have even a simple application :P ). I still remember that second year, when I used to get dreams in which I would design a finite state automaton which would behave like any machine recognizing what is actually expected (lol)!! I started writing some small programs in C and C++ and started working with algorithms in parallel. My seniors and orkut have played a very crucial role in my engineering career. There are many people who have inspired and influenced me completely in various stages of my life. To mention a few: Dr.Prabhakar, Dr.Naveen, R.K (my brother's friend), Selva, Adith, Abhinand, Felix, and the list goes on. For me and most comp sci students, the pleasure of learning algorithms cannot be matched by any other pleasure (be it even having a lakh rupees in hand and 10 friends surrounding you). Once you start writing good, efficient algorithms, your coding skills will shoot up like anything in a few days. You will start writing some solid code. While others were trying to complete some assignments, I would bunk the classes and participate in discussions in orkut forums. I swear that I have never written a single assignment on my own except for some coding assignments in OS and Algorithms. People even started asking me doubts and I was able to tell them something or the other (but of course not the exact solution :P ). Since then I have decided that I should do my masters and doctorate in something pertaining to algorithms. For me it has been a lifetime ambition. Let's see how it works out :) .

Once you get a glimpse of writing efficient code, comp sci is your home and indeed your world..!! You will start loving each and every aspect of computer science and the people who are passionate about writing code. You will relish it and live it like I and many others do :).
That's all I had for this post. I will soon be back with some interesting stuff about one particular topic that one of my close friends has requested me to write about...!!

Sunday, January 9, 2011

Microsoft interview questions- Jan 2011

Interview 1
1.Code for string reversal and test the code
2.Test cases for a pen
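For the string reversal question, here is a minimal sketch in Python of the kind of answer one could give, followed by a few test cases. It is only an illustration and not what was asked for or written in the interview; the function name reverse_string is made up, and the two-pointer swap is used instead of Python's slicing shortcut because interviewers usually want to see the loop.

```python
def reverse_string(s):
    """Return s reversed, swapping characters from both ends inwards."""
    chars = list(s)  # strings are immutable in Python, so work on a list
    left, right = 0, len(chars) - 1
    while left < right:
        chars[left], chars[right] = chars[right], chars[left]
        left += 1
        right -= 1
    return "".join(chars)


if __name__ == "__main__":
    # Test cases: empty input, single character, even and odd lengths,
    # a palindrome, and a string with spaces.
    assert reverse_string("") == ""
    assert reverse_string("a") == "a"
    assert reverse_string("ab") == "ba"
    assert reverse_string("abc") == "cba"
    assert reverse_string("madam") == "madam"
    assert reverse_string("hello world") == "dlrow olleh"
    print("all tests passed")
```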

Interview 2
1.Given a linked list, we need to write a function that reverses the nodes of a linked list ‘k’ at a time and returns modified linked list.
For Example
Linked List : 1->2->3->4->5->6->7->8->9->10->11
For k = 3
Return value: 3->2->1->6->5->4->9->8->7->10->11
2.Test cases for a vending machine
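The linked list question in Interview 2 can be sketched as follows in Python. This is one possible approach and not necessarily the one expected in the interview. It assumes that a trailing group with fewer than k nodes is left in its original order (the other common variant reverses the leftover nodes too). The Node class and the helper names from_list and to_list are made up for illustration.

```python
class Node:
    """A singly linked list node."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def reverse_in_groups(head, k):
    """Reverse the list k nodes at a time and return the new head.

    Assumption: a final group shorter than k is left untouched.
    """
    if k <= 1:
        return head
    dummy = Node(None, head)
    group_prev = dummy  # node just before the group being reversed
    while True:
        # Find the k-th node of the current group, or stop if it is short.
        kth = group_prev
        for _ in range(k):
            kth = kth.next
            if kth is None:
                return dummy.next
        group_next = kth.next
        # Reverse the pointers inside the group.
        prev, cur = group_next, group_prev.next
        while cur is not group_next:
            nxt = cur.next
            cur.next = prev
            prev = cur
            cur = nxt
        new_tail = group_prev.next  # old head of the group is now its tail
        group_prev.next = kth       # old k-th node is now the group's head
        group_prev = new_tail


def from_list(values):
    """Build a linked list from a Python iterable (test helper)."""
    dummy = Node(None)
    tail = dummy
    for v in values:
        tail.next = Node(v)
        tail = tail.next
    return dummy.next


def to_list(head):
    """Collect the node values into a Python list (test helper)."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


if __name__ == "__main__":
    print(to_list(reverse_in_groups(from_list(range(1, 12)), 3)))
```

Each node is visited a constant number of times, so the sketch runs in time linear in the length of the list and uses no extra memory apart from a few pointers.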

Interview 3
1.How will you load test a web page?
2.Given a function bool func(int x,int y,int piece) where x,y is the destination of a piece on a chess board and piece denotes the chess piece type, write test cases and code to test the function.
3.Write a function which returns a string after removing the duplicate characters from a string.
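The duplicate removal question can be sketched in a few lines of Python. The question as I noted it does not say which occurrence to keep, so this sketch assumes the first occurrence of each character is kept and the original order is preserved; the function name remove_duplicates is made up for illustration.

```python
def remove_duplicates(s):
    """Return s with repeated characters removed, keeping first occurrences."""
    seen = set()  # characters already copied to the output
    out = []
    for ch in s:
        if ch not in seen:
            seen.add(ch)
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(remove_duplicates("programming"))
```

Using a set for the lookup keeps the whole thing linear in the length of the string. It is worth asking the interviewer whether the comparison should be case sensitive; this sketch treats 'A' and 'a' as different characters.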

Interview 4
1.Write a function to take a string as an input and give the first recurring substring. You have to ask various questions and find out what he wants exactly. Sometimes he might ask you to code for two different results he expects.
2.Write test cases and categorise them.
3.Some HR questions

Interview 5
1.Some more HR questions.
2.One puzzle - You are given two magnesium strips which don't burn uniformly. Each takes 1 hour to burn completely. You have to measure 45 minutes with these strips. You are also provided with a lighter.

5 years back on the same day..!!!

Today is a remarkable day in my life. The reason being - I was thrown out of my math tuition center by my tutor Mr.Anna (exactly on this day 5 years back) for not being regular to the classes. We had unusual timings for our classes. The class used to start at 8 in the night and end at around 10. I usually came back from my school in Anna Nagar at around 6.30 pm because of the so called special classes where we used to do nothing special. The classes were meant to be rigorous since we were the first batch to face the new syllabus framed then. I had a bad habit of sleeping for some time after coming back from school. But I used to set my alarms, get up and pedal to my center. I would then come back home when almost everyone had slept off in the entire two storied apartment. The day after I was thrown out, Mr.Anna asked me to bring my mother to the center if I wanted to resume the classes. So my mother and I went to the center and Mr.Anna took me back into the classes after 20 mins of advice. Mr.Anna used to be the unbeaten pro in math in the whole region. I had a lot of respect for him and so I did not want to discontinue the classes. Even then he was not happy with my performance, because earlier in the year I used to crack lots of full scores (and there were some incidents like explaining skew lines in analytical geometry to others) and now my performance was degrading. I knew that all of those were really silly mistakes in my integrations and calculations.
And finally my board exams started. I had no fear except for one thing. Even now I shiver (:P) to say that word..BIOLOGY!! Yes, I was in the biology stream in school and had no idea of computers. So don't ask if I was a pro in Biology. I didn't like the subject and never had any clue about those hi-fi binomial nomenclatures. I had hardly ever scored more than 80 percent in that subject in my school life. I had a companion called Vasanth who was like me in that regard. It was my destiny to go with computers. Somehow we managed to study something in biology and wrote whatever came to our minds. Most of our classmates were serious medical aspirants and we were nowhere near them. I was also in great confusion about whether there would be entrance exams or not, but I didn't bother to study anything much. One of Vasanth's relatives was in DPI. So we planned to find out our board exam results before they were officially announced. Two days before the results were officially announced, I knew my results. I was not able to believe that I had got 93 % in biology. And I had also made a full score in Math. But I knew that many in my center would grab a full score. And after two days the results came out. I went to my center and informed Anna that I had got a full score in Math. I could hardly find any brighties there. And after some time I came to know that I was the only one in the center who had got a full score that year. I was shocked because Anna had produced an amazing 23 centums in a batch of 50 the previous year. Anna was also shocked to the core. Then, after some time, he came near me and said WELL DONE RAM!.. I was not so happy because Anna was not completely happy with the results.
I met Mr.Anna once after that in a get-together of that tuition center. Though I was not his favorite student, I still respect him and owe him...!!!

Friday, January 7, 2011

Product development and problem solving

Hi..,
It has been a long time since I blogged something. Today is a holiday and I hardly have any work at home. I just have to throw my used clothes into the machine. So today I wanted to write something about what the world of product development and problem solving is like. I have just 6 months of experience in product development. But I still think I can share something useful with college graduates (especially those who have had no internship experience in product development - so almost all grads from CIT).

When you are in college (and if you have an interest in writing code), you write a lot of code. Many college grads use Windows, Linux or Unix as their predominant development platform, and technologies like unmanaged code (C, C++), Java and managed code (.NET).
According to me, a good coder should be a master of logic that is not specific to any platform or technology, should have decent awareness of her favorite technology and should write code that is free of bugs (though even world class programmers are prone to bugs) and handles even corner cases. There are subtle differences between the words CODER and DEVELOPER. There are basically two different aspects that make up a good developer. One (the most important one) is your analytical skills, or call it problem solving skills, and the second is your style and methodology of developing code. A coder should have great analytical skills but not necessarily a good style or methodology of developing. A good developer, on the other hand, must possess both. It is up to you to decide what your skill set looks like.
For me (like most college grads), when I was in college, I personally loved being a coder rather than a developer. Now of course I would die to be a good developer rather than a plain coder. The best way to look at it is something like this: until your pre-final year, aspire to be a coder, and once you get out of the shell, better be a developer and not a plain coder. A developer must have good awareness of the technology that he/she primarily develops on. For example, I had no good knowledge of any technology when I was in college except for one or two. And now I am at least trying to get a good knowledge of the technology I work predominantly on (managed code). When you are in college, the code that you write may be used at most by you or your project mates. But imagine developing a product that is used by half of the globe. Certainly a coder cannot shine in this space unless he/she learns good methods and standards of developing code.
Some straightforward tips to cultivate good development practices when you are in college:

1.Write utility classes for almost anything that you can, circulate them among your friends for usage, get their feedback and incorporate any valid suggestions.

2.Save your classes on your system and maintain them as a repository so that you can reuse them when you need to. (I used to write some classes in PHP and post them on a PHP community website, where they were saved and displayed for other users to use.)

3.Try using technologies like Java, .NET, server side scripting etc. (that will expose you to commonly used technologies).

4.If you have the habit of developing code in your leisure time, then practise new things like tracing your application, doing error logging for your application etc.

Above all,
(Notice the bullet number)
0. Improve your problem solving skills by consistently solving problems and practising code on paper or even in your mind :). Interact with analytically strong people, read technical blogs that talk about problem solving, and show great interest in and attitude towards solving problems.
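To make tip 4 a little more concrete, here is a minimal sketch of tracing and error logging using Python's standard logging module. I work mostly with managed code, so take this only as an illustration of the idea; the function safe_divide and the logger name "myapp" are made up for the example.

```python
import logging

# Send everything from DEBUG level upwards to the console with a timestamp.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("myapp")  # hypothetical application name


def safe_divide(a, b):
    """Divide a by b, tracing the call and logging any failure."""
    log.debug("safe_divide called with a=%r b=%r", a, b)  # tracing
    try:
        return a / b
    except ZeroDivisionError:
        # log.exception records the message along with the stack trace.
        log.exception("division failed, returning None")
        return None


if __name__ == "__main__":
    safe_divide(6, 3)
    safe_divide(1, 0)
```

The point is not the division itself but the habit: trace what your code was asked to do, and record errors with enough context that you can debug them later without a debugger attached.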

That's all I had in mind for this post. Happy coding...!!!