Breaking Down A Graph Write Query
Published on Fri, 8 Dec 2023|12 minute read

gremlin (1) graph (5) performance (3) aws (1)

In this post, I continue this small series on the challenges faced when writing data to a Graph database. For the previous post in this series, see Redis For Graph Write Latency?, and for the beginning of the series, check out Handling Millions of Graph Writes Per Day.

Wait, what are we doing?

The core problem is this: given a bunch of computers (devices) with software installed on them (software), return a collection of_devices_that have_software_installed.

Ok, so what's the issue?

We have about 10M DeviceSoftwareUpdated events in a given week, and each of those events could have hundreds of Software entries, all of which we need to ensure exist in our graph database.

To write these to the graph, we use a software update query does two things:

Creates vertices for software inside an organization, if missing, and create an edge between them (an organization owns this software).
Creates edges, if missing, from the device to the software (a device owns this software).

So far so good, except that we're observing terrible performance for this query. Consider the duration for two simple examples, with only a few updates:

4 software: between 3 and 4 seconds (!!!)
1 software, still 2700/2800 milliseconds!

With this performance, there's no way we can keep up with the influx of data using the query as it exists!

Expand to see an example of this Gremlin query

Below, check out an example of this query, written for a single software with name "RollerCoaster Tycoon": ``` g. V('80dba177-3863-4868-8ad3-d59f6b0ebf1c'). hasLabel("organization"). not(has("deleted", true)). // Adding software sideEffect( // add software if missing out("hasSoftware"). has("name", "RollerCoaster Tycoon"). has("version", "2"). fold(). coalesce( unfold(), addV("software"). property(single, "name", "RollerCoaster Tycoon"). property(single, "version", "2"). addE("hasSoftware"). from(V('80dba177-3863-4868-8ad3-d59f6b0ebf1c')) ) ). // Getting the device out("ownsDevice"). hasId('2b91ce56-4285-492e-b6c7-f177c9a35072'). hasLabel("device"). not(has("deleted", true)). // Adding edge mergeE([ (T.label): "hasSoftware", (from): Merge.outV, (to): Merge.inV ]). option(Merge.outV, V('2b91ce56-4285-492e-b6c7-f177c9a35072')). option(Merge.inV, V('80dba177-3863-4868-8ad3-d59f6b0ebf1c'). out("hasSoftware"). has("name", "RollerCoaster Tycoon"). has("version", "2") ). // Back to the org if to add another V('80dba177-3863-4868-8ad3-d59f6b0ebf1c'). ```

Mentioned as one of the possible paths in the post Achieving Inadequate Graph Write Throughput, let's explore possible improvements to our graph query... by dissecting it!

Exploring Various Queries

Considering that the query conditionally creates a vertex and edges, we want to explore the performance of separating these pieces into atomic operations. The hypothesis is that the combination of conditionals and edge creation for the original software update query (example above) is a key driver of its duration.

Ok, great. What happens if we break things down?

How about something like this:

Create the software vertex.
Retrieve the ID of the software vertex
Use the ID of the software vertex to:
1. Create the edge from organization to software denoting ownership.
2. Create the edge from device to software, denoting installation.

But there's clearly more than one way to get the data we want. So why not explore our options?

In the following sections, let's explore different levels of atomicity, and see where there's improvement, and evaluate a path forward. All tests will be run with 6 software to add the organization and one device to connect to the 6 software added.

Execution 1: Query As-Is

Using the same query detailed in the above example, let's do some lightweight benchmarking to see where we're starting.

The following is performed in its entirety by the single query:

For each software:
- Upsert the software edge in the organization with the coalesce/fold/unfold pattern
- Navigate to the device
- Look up the software
- Upsert an edge using mergeE
Chain the remaining 5 software writes to the same query

Operation Description	Create new software	Update existing software
Single Query, Chained	19747 ms	18383 ms

Execution 2: Remove the sideEffect step

In Gremlin, the sideEffect is used to perform some computation or operation without affecting the flow of the traversal (see Apache TinkerPop documentation for more detail). Here, we're using it to conditionally create (i.e. upsert) the software node, and still maintain our traversal from the organization node, from which we'll move on to the device via edge "ownsDevice".

Check out the relevant snippet:

// from an Organization...
    .sideEffect(// add software if missing)
    // From the same Organization, now get the device
    .out("ownsDevice")...

In this test, we want to eliminate the use of sideEffect. To do so, we'll simply perform our operation (adding the software if missing), and then simply traverse back to the organization with a chained navigation, i.e. .V('{org vertex id}').

Operation Description	Create new software	Update existing software
Without sideEffect	14335 ms	14499 ms

Ok, there's some improvement here! We're feeling perhaps surprised, perhaps intrigued. Let's continue!

Execution 3: Remove Extra Label and Property Checks

Many of the additional descriptors could be considered "extra" checks, in that IDs should be sufficient. While we can certainly rely on IDs, these checks help improve readability of the queries, and help us gain a measure of safety, accuracy, and confidence. However, since they're not strictly necessary, let's benchmark without them.

Let's remove all that isn't strictly necessary; property filters like .not(has("deleted", true)) or labels like .hasLabel("organization") are now gone, poof! For this test, we'll only use IDs, e.g. .V('{org vertex id}').

Operation Description	Create new software	Update existing software
ID-only lookups and traversals	12714 ms	12681 ms

Ok, again, we're seeing some measure of improvement here.

Execution 4: Split All Writes Into Irreducible Operations

Although AWS and Apache TinkerPop both suggest chaining operations, which should result in faster execution compared with executing several single operations, we still want to explore a query broken down into irreducible operations.

Why should chaining operations be faster? Well, a few reasons.

Reduced Network Overhead: Chaining operations in a single query can reduce network overhead because all operations are sent to the server in a single request.
Query Optimization: Some graph databases can optimize the execution of a single query more effectively than a series of separate queries.
Conciseness: Although not related to performance, chained queries can be more concise and easier to read, especially for simple traversals. This could improve developer agility, though!

Unfortunately, not only do we lose out on these potential advantages, we're also gaining a disadvantage with regard to atomicity^[1] - we're losing it! Say goodbye to our dear friend, for when we separate all of the operations into pieces, we separate ourselves from atomicity.

Fortunately for us, however, the operations we're performing do not necessarily need to be atomic; we're comfortable accepting cases where some of the operations succeed, and other need retrying, as long as we're eventually consistent.

Ok, let's move on to the test itself.

We'll perform the following two operations for all 6 software independently (for a total of 12 writes):

Upsert for the software vertex and edge to the organization (using coalesce/fold/unfold)
Upsert an edge from the software to the device (using mergeE)

Operation Description	Create new software	Update existing software
software vertex and edge to organization (single invocation)	46 ms	45 ms
device to software edge (single invocation)	50 ms	42 ms
*Aggregated Total (All 6 software* invocations)**	576 ms	522 ms

Woah!

Holy moly.

These results are a total change with respect to the previous tests. Splitting the operations, going against Apache TinkerPop/AWS recommendations, really changed the performance. What gives?

Well, we're not quite done yet...

Execution 5: chain back the operations

Remember that we're benchmarking with 6 software writes. In the previous test, we separated them entirely, without chaining at all. For this test, let's try bringing them all back together in a single connected chain, but for the two separate operations like we did previously.

So we'll try:

For 6 chained software, create the software vertices and edges to organization (using coalesce/fold/unfold)
For 6 chained software, upsert an edge from the software to the device (using mergeE)

Operation Description	Create new software	Update existing software
software vertex and edge to organization (chain of 6)	6318 ms	6009 ms
device to software edge (chain of 6)	61 ms	60 ms

Chaining the vertex creation results in a very slow execution. However, the device to software edge creation is fast... what's going on?

Well, on the surface, it looks like there's a difference between our implementations: chaining mergeE is appears fast, and coalesce/fold/unfold is super slow. That gives us an idea...

Let’s try to replace coalesce/fold/unfold with MergeV, and see what happens.

Execution 6: Replace coalesce/fold/unfold with mergeV

Just like our use of coalesce, mergeV() can help us to conditionally create the software vertex.

For this test case, we suspect that coalesce/fold/unfold could be contributing to the problem, and so we aim to replace it.

From the Apache TinkerPop documentation for mergeV, we learn that the

mergeV()-step is used to add vertices and their properties to a graph in a "create if not exist" fashion... [and] the input passed tomergeV()can be either aMap, or a childTraversalthat produces aMap.

Ok, so we need to provide some stuff to mergeV() in order to correctly use it. What can we provide?

Well, herein lies the problem. Since our graph contains software vertices that are specific to an organization (and therefore can be duplicated between organizations), we can't just specify only the properties for name and version, because that doesn't nest the software within an organization. Without this aspect, we would create software super nodes that aren't attached to organizations^[2].

But what about the non-Map option, or a child Traversal? To use a traversal with mergeV() would mean to bring back coalesce/fold/unfold, like so:

g.V(orgId).
    out('hasSoftware').
    has('name', softwareName).
    has('version', softwareVersion).
    fold().coalesce(
        unfold(), 
        addV('software').
            property('name', softwareName).
            property('version', softwareVersion)
    ).
    mergeV().

Since the whole point of this test is to eliminate coalesce/fold/unfold, we'd be back where we started with this approach for mergeV().

Great, so we're back where we started?

Not necessarily! Let's explore another option: making the software properties passed to mergeV() unique per an organization.

We'll simply add another property on software, organization, along with its ID, and boom, we're in business.

Here's what that looks like:

``` // add sw vertex, returning its ID g. mergeV([ (T.label): 'software', 'name': 'RollerCoaster Tycoon', 'version': '2', 'organization': '80dba177-3863-4868-8ad3-d59f6b0ebf1c' ]) .id()

// add the edge to organization g. mergeE([ (T.label): "hasSoftware", (from): Merge.outV, (to): Merge.inV ]). option(Merge.inV, V('')). option(Merge.outV, V('80dba177-3863-4868-8ad3-d59f6b0ebf1c'))

// add the edge from software to device g. mergeE([ (T.label): "hasSoftware", (from): Merge.outV, (to): Merge.inV ]). option(Merge.inV, V('')). option(Merge.outV, V('2b91ce56-4285-492e-b6c7-f177c9a35072')) ```

There's one other ^_small wrinkle. When adding properties to a vertex with this syntax, we want to be careful about the cardinality^[3] of the values. For some graph databases (cough cough AWS Neptune), the values of properties are by default a set. While Gremlin exposes a configuration to set this default gremlin.tinkergraph.defaultVertexPropertyCardinality, it may not be accessible. In our situation, we would ideally only have single values for properties, called single cardinality, but this is not the default behavior in our graph, and an overriding behavior cannot be specified with mergeV() directly. Instead, the only option is to modify this after the fact with option:

option(Merge.onMatch,sideEffect(property(single,"name","RollerCoaster Tycoon")).constant([:]))

With all of those caveats out of the way, let's proceed with our test: and create edges with mergeE() and the ID property of the upserted software node.

Operation Description	Create new software	Update existing software
software vertex (with `mergeV`)	45 ms	41 ms
organization to software edge (single invocation)	41 ms	39 ms
device to software edge (single invocation)	44 ms	40 ms
*Aggregated Total (All 6 software* invocations)**	780 ms	720 ms

Alright, so this test was admittedly fruitless. We had to add a bunch of caveats in order to get rid of coalesce/fold/unfold, and our performance didn't improve when compared with Execution 4. But hey, just because the result doesn't bear fruit doesn't mean that it wasn't worth exploring.

Putting things together

Considering all the results above, we have some really promising results. If we combine the execution patterns of two of our tests, plus some of the minor optimizations we saw in the early executions, we come up wih a new approach:

Upsert the software and add the edge to organization in N independent operations using Execution 4.
Upsert all device to software edges in a chained operation using Execution 5.

Here's the first step looks like:

g.
    V('80dba177-3863-4868-8ad3-d59f6b0ebf1c').
    out("hasSoftware").
    has("name", "RollerCoaster Tycoon").
    has("version", "2").
    fold().
    coalesce(
            unfold(),
            addV("software").
                    property(single, "name", "RollerCoaster Tycoon").
                    property(single, "version", "2").
                    addE("hasSoftware").
                    from(V('80dba177-3863-4868-8ad3-d59f6b0ebf1c'))
    )
    ).
    V('80dba177-3863-4868-8ad3-d59f6b0ebf1c'). // back to Org for next steps

And then the second:

``` // Getting the device g. V('80dba177-3863-4868-8ad3-d59f6b0ebf1c'). out("ownsDevice"). hasId('2b91ce56-4285-492e-b6c7-f177c9a35072').

// Adding edge
mergeE([ (T.label): "hasSoftware", (from): Merge.outV, (to): Merge.inV ]).
option(Merge.outV, V('2b91ce56-4285-492e-b6c7-f177c9a35072')).
option(Merge.inV,
        V('80dba177-3863-4868-8ad3-d59f6b0ebf1c').
        out("hasSoftware").
        has("name", "RollerCoaster Tycoon").
        has("version", "2")
).

// Back to the org to chain another (e.g. 5 more!)
V('80dba177-3863-4868-8ad3-d59f6b0ebf1c').

```

Performing this as a final test, we produced the following results...

Operation Description	Create new software	Update existing
software vertex and edge to organization (single invocation)	46 ms	45 ms
device to software edge (chain of 6)	61 ms	60 ms
Aggregated Total	291 ms	330 ms

Whew!

We made a bunch of progress here.

And we think our write throughput is finally unblocked... but there's no testing like production testing. Stay tuned for more results as we continue to explore the performance of our graph database with this scenario!

Footnotes

1. ↩︎ In database systems, an atomic transaction is an_indivisible_and_irreducible_series of database operations such that either_all_occur, or_none_occur. See Wikipedia's great entry on this topic for more.
2. ↩︎ To read a previous exploration of this problem, and the decision to nest software within organization, see Software Vertex Madness.
3. ↩︎ In mathematics, cardinality refers to the number of elements within a set.

Breaking Down A Graph Write QueryPublished on Fri, 8 Dec 2023|12 minute read