SNAP is a topical datamart. It is designed for business data at the detailed level (ad impressions, general ledger, sales, payables, travel and expense, etc.), presented as SNAP Qubes: logical, business-process-centric views of your business data with rich metadata.
SNAP Qubes can run into hundreds of terabytes and still provide lightning-fast query response times.
We call them “Qubes” to differentiate them from traditional OLAP summary cubes, which are pre-aggregated datasets. Unlike those cubes, SNAP Qubes are rich logical lenses into business data, backed by a multi-dimensional file format optimized for sub-second queries.
OLAP is a powerful technology that has served complex BI needs for decades. SNAP Qubes are multi-dimensional like OLAP cubes, but without the downside of OLAP pre-aggregation. Further, SNAP Qubes combine in-memory data and indexes, and are designed to work with modern distributed computing frameworks.
SNAP is “Apache Spark native”: it is deployed on an Apache Spark cluster.
SNAP and SQL Joins/Star Schema
Traditional data warehouses have fact tables and lookup (dimension) tables. A typical analysis involves joining several of these tables, and joins are expensive. SNAP’s unique architecture allows analysts to express their SQL as joins, but behind the scenes the joins are eliminated.
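To make the idea of eliminating joins concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not SNAP's implementation, which is not described here; it only illustrates one generic way a star-schema query can be answered without a join at query time, by resolving the join once into a denormalized table. The table and column names (fact_sales, dim_product, sales_wide) are hypothetical.

```python
import sqlite3

# In-memory database with a tiny, made-up star schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
    INSERT INTO fact_sales VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 20.0);

    -- Assumption: the join is resolved once, ahead of query time,
    -- into a wide table that carries the dimension attribute.
    CREATE TABLE sales_wide AS
        SELECT f.sale_id, f.amount, d.category
        FROM fact_sales f JOIN dim_product d ON f.product_id = d.product_id;
""")

# The query as an analyst would write it against the star schema.
JOIN_SQL = """
    SELECT d.category, SUM(f.amount)
    FROM fact_sales f JOIN dim_product d ON f.product_id = d.product_id
    GROUP BY d.category ORDER BY d.category
"""

# An equivalent query with the join removed.
WIDE_SQL = """
    SELECT category, SUM(amount)
    FROM sales_wide
    GROUP BY category ORDER BY category
"""

join_result = conn.execute(JOIN_SQL).fetchall()
wide_result = conn.execute(WIDE_SQL).fetchall()
print(join_result == wide_result)
```

Both queries return the same totals per category; the second one simply never performs a join when it runs.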
SNAP’s logical datamart and Qubes include a metadata layer that captures business information about each topic (dimensions, metrics, hierarchies and calculations), so they can all be reused across BI and AI workloads.
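As a rough illustration of what such a metadata layer might hold, the sketch below models one topic with plain Python dataclasses. The class names, fields and the metric_sql helper are all hypothetical and are not SNAP's actual metadata format; the point is only that dimensions, metrics, hierarchies and calculations are defined once and then reused by any consumer.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Metric:
    name: str         # business name, e.g. "revenue"
    column: str       # underlying detail column
    aggregation: str  # SQL aggregate to apply, e.g. "SUM"


@dataclass
class TopicMetadata:
    topic: str
    dimensions: List[str]
    metrics: Dict[str, Metric]
    hierarchies: Dict[str, List[str]]  # hierarchy name -> levels, top to bottom
    calculations: Dict[str, str]       # calculation name -> expression over metrics

    def metric_sql(self, name: str) -> str:
        """Render a metric definition as a SQL select expression."""
        m = self.metrics[name]
        return f"{m.aggregation}({m.column}) AS {m.name}"


# Hypothetical "sales" topic.
sales = TopicMetadata(
    topic="sales",
    dimensions=["region", "product"],
    metrics={"revenue": Metric("revenue", "amount", "SUM")},
    hierarchies={"geography": ["country", "region", "city"]},
    calculations={"avg_order_value": "revenue / order_count"},
)

print(sales.metric_sql("revenue"))  # SUM(amount) AS revenue
```

Because the definition of "revenue" lives in one place, a dashboard query and a feature pipeline would both derive it the same way.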