SNAP is designed to work on data lakes. It organizes detailed business data (ad impressions, general ledger, sales, payables, travel and expense, etc.) as a SNAP QuBE, a “Queryable Business Entity” that carries rich metadata about your business data.
SNAP QuBEs can grow to hundreds of terabytes and still deliver lightning-fast query response times.
SNAP QuBEs are rich lenses into business data, backed by a multi-dimensional file format optimized for sub-second queries.
OLAP is a powerful technology that has served complex BI needs for decades. Like OLAP cubes, SNAP QuBEs are multi-dimensional, but they eliminate OLAP's main downside: pre-aggregation, which typically limits a cube to the dimensions and grain fixed when it was built. Furthermore, SNAP QuBEs hold both data and indexes in memory and are designed to work with modern distributed computing frameworks.
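To make the pre-aggregation trade-off concrete, here is a minimal Python sketch, assuming a toy in-memory table. The names `preaggregate`, `DetailStore` and the sample rows are hypothetical and do not reflect SNAP's actual file format or API. An OLAP-style cube materializes totals only for the dimensions chosen up front, while keeping detail rows plus a per-dimension index lets any combination of filters be aggregated at query time.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical detail-level rows (one per transaction); not SNAP's real format.
ROWS = [
    {"region": "East", "product": "A", "channel": "web", "amount": 100.0},
    {"region": "East", "product": "B", "channel": "store", "amount": 50.0},
    {"region": "West", "product": "A", "channel": "web", "amount": 75.0},
    {"region": "West", "product": "A", "channel": "store", "amount": 25.0},
]


def preaggregate(rows, dims, measure):
    """OLAP-style: materialize a total for every subset of the chosen dims up front."""
    cube = defaultdict(float)
    for r in range(len(dims) + 1):
        for subset in combinations(dims, r):
            for row in rows:
                key = tuple((d, row[d]) for d in subset)
                cube[key] += row[measure]
    return dict(cube)


class DetailStore:
    """Keeps detail rows in memory plus an inverted index (value -> row ids) per dimension."""

    def __init__(self, rows, dims):
        self.rows = rows
        self.index = {d: defaultdict(set) for d in dims}
        for i, row in enumerate(rows):
            for d in dims:
                self.index[d][row[d]].add(i)

    def total(self, measure, **filters):
        """Aggregate at query time: intersect index postings, then sum the measure.

        Every filter must name an indexed dimension.
        """
        ids = set(range(len(self.rows)))
        for dim, value in filters.items():
            ids &= self.index[dim].get(value, set())
        return sum(self.rows[i][measure] for i in ids)


cube = preaggregate(ROWS, ["region", "product"], "amount")
print(cube[(("region", "West"), ("product", "A"))])  # 100.0

store = DetailStore(ROWS, ["region", "product", "channel"])
print(store.total("amount", region="West", channel="web"))  # 75.0
```

A question about `channel` cannot be answered from the cube built over `region` and `product` without rebuilding it, whereas the detail store answers it directly. The price is aggregating the matching detail rows on every query instead of reading a precomputed total.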
SNAP is “Apache Spark native”: it is deployed on an Apache Spark cluster.
SNAP and SQL Joins/Star Schema
Traditional data warehouses have fact tables and lookup (dimension) tables. A typical analysis joins several of these tables, and joins are expensive. SNAP's architecture lets analysts write their SQL with joins, but behind the scenes SNAP eliminates the joins and runs the query against the QuBE.
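The source does not describe how SNAP eliminates joins, so the following is only a minimal sketch of what "eliminating the join" can mean, using Python's built-in `sqlite3`. It assumes one common approach: resolving dimension attributes onto each fact row once, at build time, so that later queries need no joins. The tables, the `sales_entity` name and the hand-written rewrite are hypothetical stand-ins for a QuBE and for SNAP's automatic behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical star schema: one fact table plus two dimension (lookup) tables.
cur.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE products  (product_id  INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE sales     (customer_id INTEGER, product_id INTEGER, amount REAL);

INSERT INTO customers VALUES (1, 'East'), (2, 'West');
INSERT INTO products  VALUES (10, 'Hardware'), (20, 'Software');
INSERT INTO sales VALUES
    (1, 10, 100.0), (1, 20, 50.0), (2, 10, 75.0), (2, 20, 25.0), (2, 10, 30.0);
""")

# What the analyst writes: a conventional star join.
JOIN_SQL = """
SELECT c.region, SUM(s.amount)
FROM sales s
JOIN customers c ON s.customer_id = c.customer_id
JOIN products  p ON s.product_id  = p.product_id
WHERE p.category = 'Hardware'
GROUP BY c.region
ORDER BY c.region
"""

# Build step (done once): attach every dimension attribute to each fact row.
# This flattened table is an assumed stand-in for a QuBE-like entity.
cur.executescript("""
CREATE TABLE sales_entity AS
SELECT s.amount AS amount, c.region AS region, p.category AS category
FROM sales s
JOIN customers c ON s.customer_id = c.customer_id
JOIN products  p ON s.product_id  = p.product_id;
""")

# The same question, answered with no join at query time.
NO_JOIN_SQL = """
SELECT region, SUM(amount)
FROM sales_entity
WHERE category = 'Hardware'
GROUP BY region
ORDER BY region
"""

joined = cur.execute(JOIN_SQL).fetchall()
flat = cur.execute(NO_JOIN_SQL).fetchall()
print(joined)  # [('East', 100.0), ('West', 105.0)]
print(joined == flat)  # True
```

Both queries return the same answer; the flattened entity trades extra storage and a one-time build for join-free queries. Whether SNAP rewrites queries in this particular way is not stated, so treat this as an illustration of the idea rather than its implementation.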