Mar 2017

SNAP – SparklineData Nextgen Analytics Platform

Many of you who have followed us over the past year or two know that we have been heads down on making life easier for those who struggle with the challenges of ad-hoc analysis on modern data lakes. We have seen the frustrations of Tableau users and, in the words of one of those users, “We have a Ferrari in Tableau but using it...



Feb 2017

Fast aggregations/metrics on Spark with Tableau

Ad-hoc queries with sub-second response times are critical for enterprises. Vast amounts of data exist in Hadoop or AWS data lakes, and consuming this data in a scalable, fast manner using existing B.I tools like Tableau is a challenge. Transactions at the lowest grain (hourly, daily, etc.) are stored in fact tables. In order to achieve an acceptable level of performance, companies resort to writing extracts or summary tables...



Feb 2017

Advanced Tableau on Spark/Hadoop

Most benchmarks on data warehouse optimizations and SQL engines stop with simple examples. The real world uses business intelligence tools, where the use cases are not single-user, single-SQL as in a simulated benchmark. Modern B.I on Big Data should satisfy three key requirements. It should be able to respond interactively, in seconds, as a user drills down into data in Hadoop/Spark. While B.I is not about retrieving...



Nov 2016

Optimizing an Enterprise Datawarehouse on Hadoop

As companies move from analytic datamarts and data warehouses built on Teradata, Vertica or even Oracle/MySQL to a Hadoop-based architecture, consumption of data for B.I and Analytics workloads becomes critical. Hadoop has traditionally not been geared for consumption of data, as users of Tableau know very well. Hive queries are slow. Products like Impala and Presto have eased the pain a bit but the challenge...



Sep 2016

Going beyond Data Lakes

We often see customers start to build data lakes on Hadoop or S3 as a way to get their transactional data together with dimensional data in a common place. This data is cleaned and organized in a star schema, as in an enterprise data warehouse. The challenge begins here, since consuming data in a Hadoop data lake is not easy. The first challenge is ad hoc analytics....



Jul 2016

Terabyte scale Data Lake analytics on S3, Hadoop with Spark

In our recent work with customers, there is one constant: the need to make sense of terabytes of fact and time-series data that lands in the data lake (physically S3 or HDFS). Here is a typical process before we get engaged. The first step in this process is organizing data in the data lake. A typical fact table for our customers, such as events of all advertising-exposures...



Jun 2016

Fast B.I on Spark SQL

A typical slice-and-dice query on a database has the following pattern. On large datasets, the response time for such interactive queries has to be on the order of 1 or 2 seconds as users navigate across different Tableau worksheets or choose filters in their web application. A standard in-memory solution may be suboptimal for such slice-and-dice queries. First, caching large amounts of data...



May 2016

Analyzing a billion rows with Tableau

“How do I make Tableau go against a live table with 100+ million rows and perform ad hoc queries on various slices of data?” This is a question we often get from data teams across all industries. With growing data across Hadoop, Oracle, Teradata – whatever the environment – the need to do dimension analysis on the data in an ad-hoc manner with timely responses is...



Apr 2016

Fast BI – A new approach

Business Intelligence is at the heart of all analysis at enterprises. Data that is collected about business events and metrics has to be organized and analyzed to glean business insights, and the industry around data warehousing and visualization of trends and metrics is enormous. But as data has grown, the tools and technologies to analyze the data have not kept pace. While tools like Tableau are...



Feb 2016

Fast B.I on Spark with Tableau

Tableau 9.0 and above come with the driver to connect to Spark. Customers who are used to commercial databases will find using Spark with Tableau a bit cumbersome due to poor join performance and the difficulty of using star schemas. Connecting Tableau live causes performance issues, and many companies resort to building materialized tables and temporary tables or extracts for Tableau to work...

