Five things you need to know about Hadoop vs. Apache Spark
Fri, 11 Dec 2015 12:01:36 -0800

Listen in on any conversation about big data, and you'll probably hear mention of Hadoop or Apache Spark. Here's a brief look at what they do and how they compare.

1: They do different things. Hadoop and Apache Spark are both big-data frameworks, but they don't really serve the same purposes. Hadoop is essentially a distributed data infrastructure: it distributes massive data collections across multiple nodes within a cluster of commodity servers, which means you don't need to buy and maintain expensive custom hardware. It also indexes and keeps track of that data, making big-data processing and analytics far more effective than was previously possible. Spark, on the other hand, is a data-processing tool that operates on those distributed data collections; it doesn't do distributed storage.
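To make that division of labor concrete, here is a minimal PySpark sketch (not from the article): Spark supplies the computation, while the file it reads is assumed to already live in HDFS, Hadoop's distributed storage layer. The application name and HDFS path are illustrative placeholders.

```python
from pyspark import SparkContext

# Illustrative placeholder app name; cluster master is taken from the
# environment (e.g. spark-submit).
sc = SparkContext(appName="hdfs-log-scan-sketch")

# Spark reads a file that HDFS has already split into blocks and
# replicated across the cluster's data nodes. The path is a placeholder.
lines = sc.textFile("hdfs:///data/access_logs/2015/part-*")

# A simple distributed computation over that data; Spark does the
# processing, but it never takes over storage duties from HDFS.
error_count = lines.filter(lambda line: "ERROR" in line).count()
print("ERROR lines:", error_count)

sc.stop()
```

The point of the sketch is simply that the `hdfs://` URI belongs to Hadoop's storage layer, while everything after `textFile` is Spark's processing layer; swap the storage backend (S3, Cassandra, a local file) and the Spark code barely changes.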