SNAP Overview

SNAP is designed to work on data lakes. It organizes detail-level business data (ad impressions, general ledger, sales, payables, travel and expense, and so on) into a SNAP QuBE, a “Queryable Business Entity” that carries rich metadata about your business data.

SNAP QuBEs can run into hundreds of terabytes and still provide lightning-fast query response times.

SNAP QuBEs are rich lenses into business data, backed by a multi-dimensional file format optimized for sub-second queries.

OLAP is a powerful technology that has served complex BI needs for decades. SNAP QuBEs are multi-dimensional like OLAP cubes, but without the downside of OLAP pre-aggregation. In addition, SNAP QuBEs hold data and indexes in memory and are designed to work with modern distributed computing frameworks.
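The text does not say which downside of pre-aggregation it has in mind. One commonly cited limitation is that a pre-aggregated cube can only answer questions along the dimensions it was built for, while detail-level data can be grouped by any column. The Python sketch below illustrates that general idea only; the rows, column names, and the `aggregate` helper are hypothetical and say nothing about how SNAP stores or queries data.

```python
from collections import defaultdict

# Hypothetical detail-level rows: (region, product, amount).
detail_rows = [
    ("East", "Widget", 100.0),
    ("East", "Gadget", 50.0),
    ("West", "Widget", 75.0),
]


def aggregate(rows, key_index):
    """Sum the amount column (index 2), grouped by the column at key_index."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_index]] += row[2]
    return dict(totals)


# Pre-aggregation: only the totals by region are kept, the detail is discarded.
preaggregated_by_region = aggregate(detail_rows, 0)

# A question by product cannot be answered from preaggregated_by_region,
# because the product column is gone. With detail rows it is one more grouping.
totals_by_product = aggregate(detail_rows, 1)

print(preaggregated_by_region)
print(totals_by_product)
```

Keeping the detail rows trades storage for flexibility, which is why a format that stays fast at the detailed level matters.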

SNAP is “Apache Spark native”: it is deployed on an Apache Spark cluster.

SNAP and SQL Joins/Star Schema

Traditional data warehouses have Fact and Lookup (Dimension) tables. A typical analysis involves joining multiple tables, and joins are expensive. SNAP’s unique architecture lets analysts write their SQL with joins, but behind the scenes the joins are eliminated and the query runs against the QuBE.
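To make the idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not SNAP's implementation: the table names (`sales_fact`, `region_lookup`, `sales_wide`) and the data are hypothetical, and the pre-joined `sales_wide` table is only a stand-in for a dataset that already combines fact and lookup data. It shows that a star-schema join query and a join-free query over pre-joined data return the same answer, which is the equivalence a join-eliminating rewrite relies on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical star schema: one fact table and one lookup (dimension) table,
# plus a pre-joined table standing in for a dataset that needs no join.
cur.executescript("""
CREATE TABLE region_lookup (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE sales_fact (sale_id INTEGER PRIMARY KEY, region_id INTEGER, amount REAL);
INSERT INTO region_lookup VALUES (1, 'East'), (2, 'West');
INSERT INTO sales_fact VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0);

CREATE TABLE sales_wide AS
SELECT f.sale_id, r.region_name, f.amount
FROM sales_fact f JOIN region_lookup r ON f.region_id = r.region_id;
""")

# The query as an analyst would write it, with an explicit join.
join_sql = """
SELECT r.region_name, SUM(f.amount) AS total
FROM sales_fact f JOIN region_lookup r ON f.region_id = r.region_id
GROUP BY r.region_name ORDER BY r.region_name
"""

# The same question asked of the pre-joined data, with no join at query time.
wide_sql = """
SELECT region_name, SUM(amount) AS total
FROM sales_wide
GROUP BY region_name ORDER BY region_name
"""

join_rows = cur.execute(join_sql).fetchall()
wide_rows = cur.execute(wide_sql).fetchall()
print(join_rows == wide_rows)
```

In this sketch the rewrite is done by hand; the text above describes SNAP doing the equivalent step behind the scenes, so analysts keep writing ordinary join SQL.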


SNAP Pipeline

SNAP runs on Apache Spark. Data from Hadoop, S3, and data warehouses is ingested into SNAP. SNAP QuBEs are logical datasets on a specific topic and represent the data for a set of Fact and Lookup tables. Periodically, data for a set of tables is ingested into SNAP using a data pipeline. Once data is ingested into SNAP, it is exposed in memory to any front-end tool. SNAP can connect to any BI visualization tool using JDBC/ODBC.

As new data comes in, the data pipeline can add that data to SNAP as often as every 5 minutes.

SNAP Architecture

SNAP Optimizer

  • An advanced engine and optimizer that takes an incoming query and determines the best, most cost-effective execution path on the SNAP data.


  • A logical layer capturing business metadata on dimensions, measures/metrics, hierarchies, and so on.

SNAP Storage/File Format

  • An optimized file format designed for fast data retrieval.