Integrating Apache Spark and External Data Sources Using Hadoop Interfaces

Yong-Liang LI; Shu-Qiang YANG

doi:10.12783/dtetr/ssme-ist2016/3990

Abstract

Data-processing requirements are undergoing a profound transition as the volume and variety of application data increase dramatically. In response, a range of highly scalable storage systems has been developed for large-scale data sources, and Apache Spark, an engine for computation over very large data sets, has attracted considerable industry attention since its release because of its performance. Combining Spark with external data sources has considerable potential to yield data analysis platforms that address a wide range of data-handling needs. Moving from Hadoop to Spark application development is also straightforward, because the industry has invested heavily in data stores built for Hadoop and Spark works well with Hadoop-supported systems. This paper examines the mechanism for integrating Spark with external data sources, such as NoSQL and relational databases (RDBs), and proposes a reference approach for tight and efficient integration of the two using Hadoop interfaces.

Keywords

Integration, Apache Spark, Data sources, Hadoop interfaces

Publication Date

2016-11-30

ENGINEERING and TECHNOLOGY RESEARCH
