Unlocking analytics with data lakes and graph analysis


This post was written by David Sullivan, a data scientist at Valkyrie.

Across industries and business types, modern analytics is becoming increasingly personalized and dependent on up-to-date data. It has become clear that single-purpose dashboards, fed by pre-computed analytics stored in a document database, are no longer sufficient to help businesses keep pace with both their competition and their customers' demands. Instead, modern analytics will increasingly be supported by a combination of streaming ingestion tools and centralized data lakes with separate storage and compute layers.
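
To make this concrete, here is a minimal sketch of streaming ingestion into a data lake using PySpark Structured Streaming. The event schema, bucket names, and paths are illustrative placeholders rather than a prescribed setup; the point is that the compute layer (the Spark job) and the storage layer (the object store) scale independently.

```python
# A minimal sketch of streaming ingestion into a data lake with PySpark
# Structured Streaming. The schema and the s3://example-lake/... paths
# are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StringType, TimestampType

spark = SparkSession.builder.appName("lake-ingest").getOrCreate()

event_schema = (
    StructType()
    .add("user_id", StringType())
    .add("event_type", StringType())
    .add("event_time", TimestampType())
)

# Read newly arriving JSON files as a stream (a Kafka source would work the same way).
events = (
    spark.readStream
    .schema(event_schema)
    .json("s3://example-lake/raw/events/")
)

# Append the stream to columnar files in the lake. Storage (the object store)
# and compute (this Spark job) are separate layers that scale independently.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3://example-lake/curated/events/")
    .option("checkpointLocation", "s3://example-lake/_checkpoints/events/")
    .outputMode("append")
    .start()
)
```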

Many recently popular specialized data services were designed, in part, to solve this problem by optimizing for particular data structures and operations; graph databases are a notable example. Graph analysis algorithms require certain operations to be very fast in order to perform well, so an entire ecosystem of tools was developed around them, each with its own languages, toolsets, and nuances to learn. Knowing how to solve a particular graph problem in one implementation does not necessarily help you solve the same problem on another platform.

However, with the high-performance environment provided by tools such as Apache Spark or Snowflake, graph analysis can be run against the exact same data structures as more traditional, tabular data analysis without sacrificing performance or siloing off parts of your data infrastructure. This unification of tools also provides a more general compute platform on which skills transfer readily from one platform and problem to another.
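
As one illustration of this unification, the sketch below assumes the GraphFrames package on top of Spark and runs PageRank directly over the same DataFrames a tabular analysis would read; the table paths and columns are hypothetical.

```python
# A minimal sketch of running a graph algorithm against ordinary Spark
# DataFrames using the GraphFrames package (assumes graphframes is installed;
# the lake paths and column names are hypothetical).
from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.appName("graph-on-lake").getOrCreate()

# The same tables used for tabular reporting...
customers = spark.read.parquet("s3://example-lake/curated/customers/")  # columns: id, segment, ...
referrals = spark.read.parquet("s3://example-lake/curated/referrals/")  # columns: src, dst, referred_at

# ...double as the vertex and edge sets of a graph.
graph = GraphFrame(customers, referrals)

# Rank customers by referral influence with PageRank.
ranks = graph.pageRank(resetProbability=0.15, maxIter=10)

# The result is just another DataFrame, queryable alongside tabular analytics.
ranks.vertices.select("id", "pagerank").orderBy("pagerank", ascending=False).show(10)
```

Because both the inputs and the outputs are plain DataFrames, the graph results can be joined back to the rest of the lake without moving data into a separate graph-specific store.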

In some ways, this is akin to the general computing revolution enabled by the development of programming languages that could run on a variety of underlying architectures. When the same code could be run against machines built for different purposes with different instruction sets, the true power of software was unleashed. Likewise, when a data scientist can apply a spectrum of techniques across the entirety of the available data, they can unlock the power of modern analytics and the insights it provides.

This story originally appeared on www.valkyrie.ai. Copyright 2021.
