SNAP Overview

SNAP is a topical datamart. It is designed for business data at the detailed level, such as ad impressions, general ledger, sales, payables, and travel and expense. The data is presented as SNAP Qubes: logical, business-process-centric views of your business data with rich metadata.

SNAP Qubes can run into hundreds of terabytes and still provide lightning-fast query response times.

We call them “Qubes” to differentiate them from traditional OLAP summary cubes, which are pre-aggregated datasets. Unlike OLAP cubes, SNAP Qubes are rich logical lenses into business data, backed by a multi-dimensional file format optimized for sub-second queries.

OLAP is a powerful technology that has served complex B.I needs for decades. SNAP Qubes are multi-dimensional like OLAP cubes, but without the downside of pre-aggregation. In addition, SNAP Qubes hold data and indexes in memory and are designed to work with modern distributed computing frameworks.

SNAP is “Apache Spark native”: it is deployed on an Apache Spark cluster.

SNAP and SQL Joins/Star Schema

Traditional data warehouses have Fact and Lookup (Dimension) tables. A typical analysis involves joining several of these tables, and joins are expensive. SNAP’s architecture lets analysts express their SQL as joins, while behind the scenes the joins are eliminated.
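To make the idea concrete, here is a minimal sketch in Python using the standard sqlite3 module. It does not show how SNAP itself removes joins, which this overview does not describe. It assumes one possible approach, resolving the lookup once at load time into a wide table, and all table and column names (sales_fact, region_lookup, sales_wide) are hypothetical.

```python
import sqlite3

# Hypothetical star schema: one Fact table and one Lookup (Dimension) table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE region_lookup (region_id INTEGER PRIMARY KEY, region_name TEXT);
    CREATE TABLE sales_fact (sale_id INTEGER PRIMARY KEY, region_id INTEGER, amount REAL);
    INSERT INTO region_lookup VALUES (1, 'East'), (2, 'West');
    INSERT INTO sales_fact VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 70.0);

    -- Assumption for illustration: resolve the lookup once, at load time.
    CREATE TABLE sales_wide AS
        SELECT f.sale_id, r.region_name, f.amount
        FROM sales_fact f JOIN region_lookup r ON f.region_id = r.region_id;
""")

# The query as an analyst would write it, with an explicit join.
joined = conn.execute("""
    SELECT r.region_name, SUM(f.amount)
    FROM sales_fact f JOIN region_lookup r ON f.region_id = r.region_id
    GROUP BY r.region_name ORDER BY r.region_name
""").fetchall()

# The same question answered from the pre-joined table, with no join at query time.
join_free = conn.execute("""
    SELECT region_name, SUM(amount)
    FROM sales_wide
    GROUP BY region_name ORDER BY region_name
""").fetchall()

print(joined)
print(join_free)
```

Both queries return the same totals per region. The second one does no join work at query time, which is the kind of saving the paragraph above refers to.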

SNAP’s logical datamart and Qubes include a metadata layer that captures business information about each topic (dimensions, metrics, hierarchies and calculations) so that it can be reused across B.I and A.I workloads.
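As a rough illustration of what such a metadata layer might hold, the Python sketch below models one topic. The class names, fields and the build_query helper are hypothetical and are not SNAP's actual metadata format or API. The point is only that dimensions, metrics and hierarchies defined once can be reused by any consumer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Metric:
    name: str
    expression: str  # a calculation defined once, e.g. a SQL aggregate


@dataclass
class QubeMetadata:
    """Hypothetical business metadata for one topic."""
    topic: str
    dimensions: List[str] = field(default_factory=list)
    metrics: List[Metric] = field(default_factory=list)
    hierarchies: Dict[str, List[str]] = field(default_factory=dict)

    def build_query(self, dimension: str, metric_name: str) -> str:
        """Build a SQL string from the shared definitions."""
        if dimension not in self.dimensions:
            raise KeyError(f"unknown dimension: {dimension}")
        metric = next(m for m in self.metrics if m.name == metric_name)
        return (
            f"SELECT {dimension}, {metric.expression} AS {metric.name} "
            f"FROM {self.topic} GROUP BY {dimension}"
        )


sales = QubeMetadata(
    topic="sales",
    dimensions=["region", "product"],
    metrics=[Metric("total_sales", "SUM(amount)")],
    hierarchies={"geography": ["country", "region", "city"]},
)

print(sales.build_query("region", "total_sales"))
```

A dashboard and a model-training job could both call build_query and get the same definition of total_sales, instead of each re-deriving it.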


SNAP Pipeline

SNAP runs on Apache Spark. Data from Hadoop, S3 and data warehouses is ingested into SNAP. SNAP Qubes are logical datasets on a specific topic, each representing data for a set of Fact and Lookup tables. A data pipeline periodically ingests the data for that set of tables into SNAP. Once ingested, the data is exposed in memory to any front-end tool, and SNAP can connect to any B.I visualization tool using JDBC/ODBC.

As new data comes in, the data pipeline can add that data to SNAP as often as every 5 minutes.
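One common way to implement this kind of periodic, incremental load is a watermark: each cycle appends only the rows that arrived after the previous cycle. The Python sketch below shows the pattern with plain lists. It is an assumption about how such a pipeline could work, not SNAP's actual pipeline, and the names ingest_increment and loaded_at are hypothetical.

```python
from datetime import datetime, timedelta


def ingest_increment(source_rows, store, watermark):
    """Append rows newer than the watermark; return the new watermark."""
    new_rows = [r for r in source_rows if r["loaded_at"] > watermark]
    store.extend(new_rows)
    # If nothing new arrived, the watermark stays where it was.
    return max((r["loaded_at"] for r in new_rows), default=watermark)


start = datetime(2024, 1, 1, 0, 0)
source = [
    {"id": 1, "loaded_at": start + timedelta(minutes=1)},
    {"id": 2, "loaded_at": start + timedelta(minutes=4)},
]
store = []          # stands in for the data already ingested
watermark = start

# First cycle: picks up rows 1 and 2.
watermark = ingest_increment(source, store, watermark)

# A new row arrives at the source before the next cycle.
source.append({"id": 3, "loaded_at": start + timedelta(minutes=7)})

# Second cycle: picks up only row 3, without reloading rows 1 and 2.
watermark = ingest_increment(source, store, watermark)

print([r["id"] for r in store])
```

In a real deployment a scheduler would trigger each cycle at the chosen interval, for example every 5 minutes.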

SNAP Architecture

SNAP Optimizer

  • An advanced engine and optimizer that takes an incoming query and determines the best and most cost-effective execution path on the SNAP data.

SNAP Qubes

  • A logical layer capturing business metadata on dimensions, measures/metrics, hierarchies etc.

SNAP Storage/Fileformat

  • An optimized file format designed for fast data retrieval.