Jul 2016

Terabyte-scale Data Lake analytics on S3 and Hadoop with Spark

In our recent work with customers, there is one constant: the need to make sense of terabytes of fact and time-series data that lands in the data lake (physically S3 or HDFS). Here is a typical process before we get engaged. The first step in this process is organizing data in the data lake. A typical fact table for our customers, such as events of all advertising exposures...



Jan 2016

Interactive analysis on Apache Spark with Druid

NOTE: June 2017 update: Our new product SNAP is built completely on Spark, and our focus is on building SNAP. SNAP combines all the benefits of a fast index like Druid with an advanced optimization engine on Spark for enterprise data warehousing needs: multiway joins, star/snowflake schemas, etc. SNAP also works well with Tableau across billion-row live datasets. SNAP can be deployed on AWS...
