Segmentation and user profile analysis are key for ad-tech platforms. Multiple user IDs and device IDs are gathered from events, and user behavior analysis is driven by matching that behavior against a user profile store. Analysts may perform ad-hoc queries asking questions such as:
How many users watched Quantico and Person of Interest in the last 6 months?
Analysts like to discover patterns, so it is never one question but a series of questions. The question above may be followed by something like the following (shown here in pseudo language; in practice it would be expressed in Tableau, SQL tools, or a custom web UI):
Create a set of distinct users with profile.device = 1 and event.series.quantico.count > 5 and event.series.person_of_interest.count > 5 between Oct 10 2015 and Jan 5 2016
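To make the intent of that pseudo query concrete, here is a minimal sketch in plain Python of the filter it expresses. The record shapes and field names are hypothetical, not Sparkline's schema; in a real deployment this would run as Spark SQL over the event and profile tables.

```python
from datetime import date

# Hypothetical user profile store: user ID -> profile attributes.
profiles = {
    "u1": {"device": 1},
    "u2": {"device": 2},
    "u3": {"device": 1},
}

def segment(profiles, events, device, series_min_counts, start, end):
    """Return distinct user IDs whose profile matches `device` and whose
    per-series view counts within [start, end] exceed every minimum."""
    counts = {}  # (user, series) -> view count inside the date window
    for e in events:
        if start <= e["ts"] <= end:
            key = (e["user"], e["series"])
            counts[key] = counts.get(key, 0) + 1
    return {
        user for user, prof in profiles.items()
        if prof.get("device") == device
        and all(counts.get((user, series), 0) > n
                for series, n in series_min_counts.items())
    }
```

The result set of IDs is exactly what the follow-on retargeting step would consume.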
The resulting IDs may then be used for retargeting campaigns.
For such interactive, large-volume data analysis (each event data set can run to hundreds of terabytes, and the number of IDs exported from a query can run to tens of millions), the platform should provide think-time response, easy ad-hoc drill-downs across multiple dimensions and filters, and aggregate metrics computed on the fly.
With Sparkline Data you can do all of the above and integrate with existing apps and tools like Tableau.
As enterprises build their data lakes on Hadoop or S3, the need to consume data from these data lakes efficiently and at think-time speed becomes paramount.
Analytics data warehouses were built in the past on Teradata, Greenplum, Netezza, and similar specialized MPP databases, which offered a rich analytics query layer and fast response times. As data volumes and user consumption of data explode, companies find the performance of traditional solutions unacceptable and the cost of scaling prohibitive.
With Spark, enterprises have a rich engine for accessing data, and Spark SQL has provided a way to bring the vast SQL-savvy analyst community to big data.
Sparkline Data Accelerator is the V10 engine built on Spark that powers fast B.I and enables analysts and data scientists to access ALL data in their data lakes.
The ALL data part is important. In the past, querying historical data, say five years of ERP information in a Finance data lake, was prohibitively expensive for a large enterprise. So summary tables were created by quarter, by year, and so on, and specific dimensions were rolled up.
Data was cut, and only a slice was exposed to analysts. Modern data analysis does not work that way: analysts want to look at, and derive patterns from, the lowest grain of data possible.
With Sparkline Data, IT and Engineering do not need to pre-materialize roll-ups by time. Data can be stored and consumed at the lowest grain needed.
This means that an enterprise can, for example, store a daily event store of all its sales data for five years and still allow its analysts to interactively browse and analyze the data using their favorite tools.
Retailers have traditionally been early adopters of business intelligence and large-scale data warehouses.
With mobile, spatial, and geolocation data combined with user transactions and traditional SKU-level data, there is an explosion of user and product data. Retailers have always used specialized tools for store-level comparisons across time and drill-downs into products and SKUs, navigating product, geo, and time hierarchies. Roll-ups and drill-downs need to be fast and easily accessible.
With traditional SQL-on-Hadoop tools, a lot of this “B.I” functionality is simply not available. It is time-consuming and expensive to make even standard queries on large datasets perform at the speed that interactive analysis demands.
With Sparkline’s OLAP indexes and efficient use of Spark’s parallelism, retailers can now let their analysts drill and navigate interactively across terabytes of historical data without sacrificing performance or functionality.
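The roll-up and drill-down pattern those OLAP indexes serve can be illustrated with a small, hedged sketch in plain Python; the dimension and column names are made up, and a real deployment would push this aggregation into Spark over the indexed data.

```python
from collections import defaultdict

# Hypothetical sales rows at the lowest grain: (region, category, sku, units).
rows = [
    ("west", "apparel", "sku-1", 120),
    ("west", "apparel", "sku-2", 80),
    ("west", "grocery", "sku-3", 200),
    ("east", "apparel", "sku-1", 50),
]

DIMENSIONS = ("region", "category", "sku")

def rollup(rows, levels):
    """Aggregate units at the grain named by `levels`: ("region",) is a
    roll-up; ("region", "category") drills one level down the hierarchy."""
    idx = [DIMENSIONS.index(level) for level in levels]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)
```

Because the base rows stay at the lowest grain, any level of the product, geo, or time hierarchy can be answered from the same data; the index's job is to make each such aggregation fast.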
Devices are sensor-equipped and sending data back. What was once called “call-home” data for proactive support is now becoming mainstream. There are several use cases around what is now known as “IoT” analytics.
Analyzing device data for predictive maintenance
- Gather all sensor data into your data lake and use Tableau with Sparkline Data for B.I analysis.
- Using the same mechanism, expose data from Sparkline to your data scientists so they can run models in Spark through Sparkline and understand the underlying patterns in machine failure rates.
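As a toy illustration of that second bullet, one of the simplest patterns a data scientist might look for, failure rate by device model, reduces to a grouped aggregation. The field names below are assumptions for the sketch, not an actual call-home schema.

```python
# Hypothetical call-home records: (device_model, failed_within_period).
readings = [
    ("m100", True), ("m100", False), ("m100", False), ("m100", False),
    ("m200", True), ("m200", True), ("m200", False), ("m200", False),
]

def failure_rate_by_model(readings):
    """Group call-home records by model and compute the failure fraction."""
    totals, failures = {}, {}
    for model, failed in readings:
        totals[model] = totals.get(model, 0) + 1
        failures[model] = failures.get(model, 0) + (1 if failed else 0)
    return {model: failures[model] / totals[model] for model in totals}
```

At scale the same grouping would run in Spark, with the sensor data served out of the data lake.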
Augment device data with Sales and Support data for a complete installed-base analysis.
- Businesses selling to other businesses, directly or through channels, need a “support data warehouse” to fully understand everything about post-sales activity: from contracts to service renewals to support metrics, the support data lake is the foundation of support analytics. As data volumes grow, extracting and analyzing this data becomes a challenge. Use Spark and Sparkline to enhance your business analysts' consumption of this data, with faster response times and deeper insights.
Banking, finance, new-age industrial companies, healthcare providers, and many more services and manufacturing companies are building the next-generation Enterprise Data Warehouse on Hadoop or S3.
Traditional B.I has to transition to Big Data B.I.
While much work has been done to get data into the data lake, consuming massive amounts of data from the data lake through Tableau, Qlik, and similar tools still remains a huge challenge.
The volume of data and the distributed architecture of Hadoop/Spark mean that data consumption has to be rethought. Traditional B.I tools were not built from the ground up on modern data stacks, and SQL-on-Hadoop tools and in-memory engines are not enough.