Introduction to Hadoop
I am new to Apache Hadoop, and these are brief notes on what I am learning about Hadoop and related technologies. First, I share the key ideas from Hadoop basics.

Life without Hadoop?
For compute- and storage-intensive tasks, a single-processor or even a multi-processor computer is not enough, because it would take too long to process millions of tasks on petabytes of data. The cheapest way to improve performance is to use tens or hundreds of computers connected over a Local Area Network (LAN). But to use computers on a LAN, we have to write a program that transfers the data to them, and we have to copy the program to every computer as well. Then we need to decide which chunk of records should be processed on which computer on the network, which means more programming to manage task distribution. We also have to use socket programming, MPI, RMI, EJBs, or DCOM-style approaches to make the other computers receive and process the data. It is all doable, but it is low-level programming in the sense that it is not the primary task at hand...
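
To get a feel for the bookkeeping involved, here is a minimal sketch of just one of those chores: deciding which chunk of records goes to which computer. The host names and the `assign_chunks` helper are my own hypothetical examples, not part of Hadoop or any library, and the sketch assumes the records can be addressed by a simple index.

```python
def assign_chunks(num_records, hosts):
    """Split record indices 0..num_records-1 into contiguous
    half-open ranges (start, end), one range per host."""
    if not hosts:
        raise ValueError("need at least one host")
    # Every host gets `base` records; the first `extra` hosts get one more.
    base, extra = divmod(num_records, len(hosts))
    plan = {}
    start = 0
    for i, host in enumerate(hosts):
        size = base + (1 if i < extra else 0)
        plan[host] = (start, start + size)
        start += size
    return plan


# Hypothetical machine names on the LAN (an assumption for illustration).
hosts = ["node-a", "node-b", "node-c"]
plan = assign_chunks(10, hosts)
for host, (start, end) in plan.items():
    print(f"{host}: records {start} to {end - 1}")
```

This only covers the decision itself. Shipping the data and the program to each machine, collecting the results, and coping with a machine that fails halfway through would all need more code of the same kind.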