ingraph – project proposal

The aim of the ingraph project is to evaluate openCypher graph queries incrementally. Our long-term goal is to provide a horizontally scalable graph query engine that can evaluate complex queries on graphs with more than 100 million elements.

openCypher

The openCypher project is an initiative to standardize the Cypher query language of Neo4j.

Technical goals

Understanding the big data landscape is a challenging task. Even for processing graphs, there are dozens of available technologies. For ingraph, we try to clarify the technical goals of the project, listing both the strengths and limitations of the project.

Data model

ingraph operates on the property graph data model. In computer science, property graphs are also known as typed/heterogeneous attributed graphs.

Typed graphs are also used in other disciplines, usually for analysis with the tools of network theory. Related terms include multidimensional networks (social network analysis), multiplex networks (physics), multi[-]layer[ed] networks (social network analysis and physics) and heterogeneous networks.
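
To make the property graph data model more concrete, here is a minimal in-memory sketch in Python. It is not ingraph's internal representation. The class and method names (Vertex, Edge, PropertyGraph, add_vertex, add_edge, out_edges) are hypothetical and chosen only for illustration. The sketch assumes the usual reading of the model: vertices carry a set of labels, edges carry exactly one type, and both can hold key-value properties.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet, Iterable, Iterator, Optional


@dataclass
class Vertex:
    id: int
    labels: FrozenSet[str]  # a vertex may carry several labels, e.g. {"Person"}
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    source: int  # id of the source vertex
    target: int  # id of the target vertex
    type: str    # an edge has exactly one type, e.g. "KNOWS"
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical container for a directed, typed, attributed graph."""

    def __init__(self) -> None:
        self.vertices: Dict[int, Vertex] = {}
        self.edges: Dict[int, Edge] = {}

    def add_vertex(self, vertex_id: int, labels: Iterable[str], **properties: Any) -> Vertex:
        vertex = Vertex(vertex_id, frozenset(labels), dict(properties))
        self.vertices[vertex_id] = vertex
        return vertex

    def add_edge(self, edge_id: int, source: int, target: int,
                 edge_type: str, **properties: Any) -> Edge:
        # Edges may only connect vertices that are already in the graph.
        if source not in self.vertices or target not in self.vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        edge = Edge(edge_id, source, target, edge_type, dict(properties))
        self.edges[edge_id] = edge
        return edge

    def out_edges(self, vertex_id: int, edge_type: Optional[str] = None) -> Iterator[Edge]:
        # Linear scan; a real engine would keep adjacency indices instead.
        for edge in self.edges.values():
            if edge.source == vertex_id and edge_type in (None, edge.type):
                yield edge


# Usage: two Person vertices connected by a KNOWS edge with a property.
graph = PropertyGraph()
graph.add_vertex(1, ["Person"], name="Alice")
graph.add_vertex(2, ["Person"], name="Bob")
graph.add_edge(10, 1, 2, "KNOWS", since=2016)
```

The "typed" part of the model shows up as vertex labels and edge types, the "attributed" part as the property dictionaries.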

Suited for

ingraph is suited for the following technical challenges:

Standing queries on a continuously changing graph
Evaluating queries over a runtime (live) model
Global analytical graph patterns
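
As an illustration of what a standing query on a continuously changing graph involves, the following Python sketch maintains the result set of one fixed two-hop pattern under edge insertions and deletions. It is a simplified assumption-laden example, not ingraph's algorithm or API: the class TwoHopView and its methods are hypothetical, vertices are plain strings, the graph has at most one edge per ordered vertex pair, and labels and properties are ignored. The pattern roughly corresponds to the openCypher query shown in the comment.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Match = Tuple[str, str, str]


class TwoHopView:
    """Keeps all matches (a, b, c) of the pattern a -> b -> c up to date.

    Roughly: MATCH (a)-[:KNOWS]->(b)-[:KNOWS]->(c) RETURN a, b, c
    Instead of re-running the query after every change, each update only
    computes the matches that involve the inserted or deleted edge.
    """

    def __init__(self) -> None:
        self.out: Dict[str, Set[str]] = defaultdict(set)  # source -> targets
        self.inc: Dict[str, Set[str]] = defaultdict(set)  # target -> sources
        self.matches: Set[Match] = set()

    def _matches_through(self, u: str, v: str) -> Set[Match]:
        # Matches that use edge (u, v) as the first or as the second hop.
        as_first = {(u, v, w) for w in self.out.get(v, ())}
        as_second = {(t, u, v) for t in self.inc.get(u, ())}
        delta = as_first | as_second
        # A self-loop must not be used for both hops of the same match.
        delta.discard((u, u, u))
        return delta

    def insert_edge(self, u: str, v: str) -> Set[Match]:
        """Adds edge (u, v) and returns the matches that appeared."""
        if v in self.out[u]:
            return set()
        self.out[u].add(v)
        self.inc[v].add(u)
        delta = self._matches_through(u, v)
        self.matches |= delta
        return delta

    def delete_edge(self, u: str, v: str) -> Set[Match]:
        """Removes edge (u, v) and returns the matches that disappeared."""
        if v not in self.out[u]:
            return set()
        delta = self._matches_through(u, v)  # computed while the edge still exists
        self.out[u].discard(v)
        self.inc[v].discard(u)
        self.matches -= delta
        return delta


# Usage: the view reports only the change caused by each update.
view = TwoHopView()
view.insert_edge("alice", "bob")
added = view.insert_edge("bob", "carol")    # the match alice -> bob -> carol appears
removed = view.delete_edge("alice", "bob")  # the same match disappears again
```

The cost of each update depends on the neighbourhood of the changed edge rather than on the size of the whole graph, which is why this style of evaluation pays off for queries that are evaluated repeatedly.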

Not suited for

ingraph is not efficient/expressive enough for the following technical challenges:

Queries that are evaluated only once or infrequently (e.g. batch processing, daily analysis)
Graph analytics involving PageRank, community detection, etc.

Currently, the ingraph project is not mature enough for production use. Instead, it should be used in prototypes and performance experiments.

Use cases

Candidates for primary use cases of incremental openCypher queries are:

Model validation (IncQuery-D paper, Train Benchmark paper)
Static analysis of source code repositories (thesis work #1, #2)
Fraud detection (Neo4j white paper)
Model simulation and analysis of runtime models (SDL paper)

Incremental openCypher queries can also be beneficial for:

Recommendation engines (Neo4j white paper)
Stream processing (real-time urban monitoring)

Graphflow also aims to provide “continuous subgraph queries”, based on their extension of the openCypher query language, openCypher++.
Strider supports incremental queries on distributed RDF data.

Publications

Papers

J. Marton, G. Szárnyas, D. Varró: Formalising openCypher Graph Queries in Relational Algebra (ADBIS 2017)
J. Marton, G. Szárnyas, M. Búr: Model-driven engineering of an openCypher engine: using graph queries to compile graph queries (SDL 2017). The corresponding ingraph revision is tagged as ingraph-compiler-based-on-emf-viatra.
G. Szárnyas, B. Izsó, I. Ráth, D. Varró: The Train Benchmark: Cross-Technology Performance Evaluation of Continuous Model Validation (Software and Systems Modeling Journal, 2017)
G. Szárnyas, B. Izsó, I. Ráth, D. Harmath, G. Bergmann, D. Varró: IncQuery-D: A Distributed Incremental Model Query Framework in the Cloud (MODELS 2014)
G. Szárnyas, J. Maginecz, D. Varró: Evaluation of Optimization Strategies for Incremental Graph Queries (Periodica Polytechnica 2017)
J. Maginecz, G. Szárnyas: Sharded Joins for Scalable Incremental Graph Queries (PhD Minisymposium 2016)

Talks

ingraph: Live Queries on Graphs (slides), GraphConnect 2017 lightning talk
The ingraph project and incremental evaluation of Cypher queries, 2nd openCypher Implementers Meeting
Incremental Graph Queries for Cypher, 1st openCypher Implementers Meeting
Incremental Graph Queries with openCypher, FOSDEM 2017, Graph devroom
Social Network Benchmark: Business Intelligence workload, Linked Data Benchmark Council, 10th TUC meeting
The Train Benchmark: Cross-Technology Performance Evaluation of Continuous Model Queries, Linked Data Benchmark Council, 9th TUC meeting

ingraph
Train Benchmark, a benchmark framework for continuous model validation