Laserfiche's Hadoop connector retrieves files from, or sends files to, data directories on Hadoop Distributed File System (HDFS) servers that the system can access.
HDFS is the primary distributed storage system used by Hadoop applications. The Hadoop connector:
- Is built on top of the Apache Hadoop version 2.2.0 API library.
- Works with remote Hadoop cluster resources, version 2.2.0 and higher.
- Works with Cloudera CDH, a distribution that combines Apache Hadoop with other open source projects.
- Interacts with remote Hadoop clusters using Hadoop API libraries. For information about configuring the native IO libraries, see the linked topics.
- Does not hold open connections to remote Hadoop cluster nodes. In addition, the connector does not listen for or accept connections from Hadoop cluster nodes.
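Because the connector uses the standard Hadoop client libraries, it locates the remote cluster through the usual Hadoop client configuration. A minimal illustrative fragment is shown below; the hostname and port are placeholder assumptions, not values from this document, and should be replaced with your cluster's NameNode address:

```xml
<!-- core-site.xml fragment (illustrative only; replace host and port
     with your cluster's actual NameNode address) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
```

The `fs.defaultFS` property is the standard Hadoop setting that tells any Hadoop API client which HDFS instance to use by default.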
Hadoop MapReduce is a technique for processing large data sets by spreading the data across multiple machines that work in parallel on small pieces. A Hadoop JobTracker keeps track of job runs, schedules individual map tasks, monitors individual tasks, and works to complete the entire batch. Typically, the MapReduce framework and HDFS run on the same set of nodes.
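The map, shuffle, and reduce phases described above can be sketched in miniature. The following is a single-process Python illustration of word counting, not Hadoop code: in a real cluster, the framework distributes these same steps across nodes.

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input split.
    for line in lines:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does
    # between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: combine the grouped values for each key (here, sum counts).
    return {word: sum(counts) for word, counts in grouped.items()}

# Two "input splits" standing in for blocks stored on different nodes.
splits = ["big data big", "data big"]
result = reduce_phase(shuffle(map_phase(splits)))
# result == {"big": 3, "data": 2}
```

Each function is independent of the others' internal state, which is what lets a real framework run many map and reduce tasks in parallel on separate machines.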