Integrating Apache Spark and External Data Sources Using Hadoop Interfaces
Abstract
Data-processing requirements are undergoing a profound transition as the volume and variety of application data increase dramatically. In response to these demands, various highly scalable storage systems have been developed for large-scale data sources, and Apache Spark, designed for computation over immense amounts of data, has attracted the attention and excitement of the industry since its release because of its excellent performance. Technologies that combine Spark with external data sources have great potential to produce valuable data analysis platforms that address a wide range of data-handling needs. Moreover, shifting from Hadoop to Spark application development is easy, because the industry has invested heavily in datastore systems for Hadoop in the past and Spark works well with Hadoop-supported systems. This paper examines the integration mechanism of Spark and external data sources, such as NoSQL and relational databases (RDBs), and proposes a reference for tight and efficient integration of the two using Hadoop interfaces.
Keywords
Integration, Apache Spark, Data sources, Hadoop interfaces
Publication Date
DOI
10.12783/dtetr/ssme-ist2016/3990