POC - Big Data Ecosystem

A proof of concept (POC) around the Big Data ecosystem: Hadoop and Spark, the NoSQL database Cassandra, REST APIs, and a React.js front end.

TECHNOLOGY

6/19/2019 | 1 min read

Article at Medium.

A polyglot architecture with the following flow. XML files are uploaded, either through a REST API or through the UI of a Spring Boot web app, and stored in Hadoop HDFS. Spark, an in-memory data processing engine, processes these files and then calls a REST API exposed by a Flask web app (Flask is a Python micro web framework) to push the processed information into Cassandra, a NoSQL database. The same Flask web app exposes a GET endpoint that fetches the processed information from the same Cassandra instance so that a React.js front-end app can render it.
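The hand-off between the processing step and the Flask REST API can be illustrated with a minimal sketch in plain Python. It is not the code from the POC: the endpoint URL, the XML element names, and the helper functions `parse_record` and `build_ingest_request` are all hypothetical, and the sketch only builds the HTTP request without sending it.

```python
import json
import xml.etree.ElementTree as ET
from urllib.request import Request

# Hypothetical URL of the Flask ingest endpoint; the original post does not name it.
INGEST_URL = "http://localhost:5000/records"


def parse_record(xml_text: str) -> dict:
    """Flatten one XML document into a dict mapping child tag to its text."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


def build_ingest_request(record: dict) -> Request:
    """Build (but do not send) the POST request carrying one processed record."""
    body = json.dumps(record).encode("utf-8")
    return Request(
        INGEST_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed sample input; real files in the POC would be read from HDFS.
sample = "<order><id>42</id><customer>Ada</customer></order>"
record = parse_record(sample)
request = build_ingest_request(record)
```

In a real job, a function like `parse_record` would presumably run inside a Spark transformation, and the request would be sent with `urllib.request.urlopen(request)` or an HTTP client library once the Flask app is running. Keeping Spark behind a REST call, rather than writing to Cassandra directly, means the write path and the React.js read path share one service.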

Authentication is also added via SAML.