Introducing Crux – Open Time Store

We have just released Crux, an open-source bitemporal database.

What is Crux?

Crux is a document store that indexes documents for graph query. The indexes are bitemporal: you can query against both valid time (when a fact is true in the domain you are modelling) and transaction time (when Crux recorded it). Why this is useful is covered in our previous post "the value of bitemporality".
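The following minimal Python sketch makes the two time axes concrete. It is a toy model, not Crux's index format or API, and `Version`, `as_of` and the address data are hypothetical. Each version records when it became true (valid time) and when it was recorded (transaction time). A lookup picks the latest version that is valid at the requested valid time, considering only versions already recorded at the requested transaction time.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Version:
    """One recorded version of an entity (hypothetical toy model, not Crux's format)."""
    valid_time: date        # when the fact became true in the modelled world
    tx_time: date           # when the database learned about it
    doc: Optional[dict]     # the document; None would mark a deletion


def as_of(history: list[Version], valid_time: date, tx_time: date) -> Optional[dict]:
    """Return the document valid at `valid_time`, as known at `tx_time`."""
    known = [v for v in history
             if v.valid_time <= valid_time and v.tx_time <= tx_time]
    if not known:
        return None
    # The latest valid time wins; ties go to the most recently recorded version.
    return max(known, key=lambda v: (v.valid_time, v.tx_time)).doc


# Hypothetical example: a customer's address, including a later correction.
history = [
    Version(date(2019, 1, 1), date(2019, 1, 1), {"address": "1 Old St"}),
    # Moved on 1 March, but we only recorded it on 5 March.
    Version(date(2019, 3, 1), date(2019, 3, 5), {"address": "2 New Rd"}),
    # On 1 April we corrected the January address (back-dated in valid time).
    Version(date(2019, 1, 1), date(2019, 4, 1), {"address": "1 Old Street"}),
]

then_view = as_of(history, date(2019, 3, 2), date(2019, 3, 2))    # {"address": "1 Old St"}
later_view = as_of(history, date(2019, 3, 2), date(2019, 3, 10))  # {"address": "2 New Rd"}
corrected = as_of(history, date(2019, 2, 1), date(2019, 5, 1))    # {"address": "1 Old Street"}
```

Asking the same valid-time question at two different transaction times gives two different answers. That is what lets you reproduce what the system believed at a given moment, even after later corrections.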

Unbundled

Crux is an 'unbundled' database, to use Martin Kleppmann’s phrase, shipping as a connected set of pluggable parts. Users can swap out parts and contribute their own, and Crux itself follows the Unix philosophy of each part doing one thing particularly well.

This pluggability lets Crux grow with your scaling needs. You can start out with a transaction log backed by local disk and later switch it out for Kafka, which offers much higher data throughput and stronger retention guarantees.

With this open, unbundled architecture, Crux is intended to be extended and experimented with. Its various parts are described by Clojure protocols, so users can supply their own implementations that either fully replace or decorate the existing ones.
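The idea of swappable and decoratable parts can be sketched in Python, with a `typing.Protocol` standing in for a Clojure protocol. `TxLog`, `FileTxLog` and `CountingTxLog` are hypothetical names used only for illustration, because Crux's real protocols and method names differ. A Kafka-backed log would be one more class with the same two methods.

```python
from __future__ import annotations

import json
import os
import tempfile
from typing import Iterator, Protocol


class TxLog(Protocol):
    """Hypothetical transaction-log contract, analogous to a Clojure protocol."""
    def submit(self, tx: dict) -> int: ...
    def read_from(self, offset: int) -> Iterator[tuple[int, dict]]: ...


class FileTxLog:
    """Local-disk implementation: one JSON transaction per line, offset = line number."""
    def __init__(self, path: str) -> None:
        self.path = path
        with open(path, "a+", encoding="utf-8") as f:  # creates the file if missing
            f.seek(0)
            self._next_offset = sum(1 for _ in f)

    def submit(self, tx: dict) -> int:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(tx) + "\n")
        offset, self._next_offset = self._next_offset, self._next_offset + 1
        return offset

    def read_from(self, offset: int) -> Iterator[tuple[int, dict]]:
        with open(self.path, encoding="utf-8") as f:
            for i, line in enumerate(f):
                if i >= offset:
                    yield i, json.loads(line)


class CountingTxLog:
    """Decorator: wraps any TxLog, counts submissions and delegates the real work."""
    def __init__(self, inner: TxLog) -> None:
        self.inner = inner
        self.submitted = 0

    def submit(self, tx: dict) -> int:
        self.submitted += 1
        return self.inner.submit(tx)

    def read_from(self, offset: int) -> Iterator[tuple[int, dict]]:
        return self.inner.read_from(offset)


# Callers depend only on the TxLog shape, so implementations can be swapped or wrapped.
log = CountingTxLog(FileTxLog(os.path.join(tempfile.mkdtemp(), "tx.log")))
log.submit({"op": "put", "id": "ivan"})
log.submit({"op": "delete", "id": "ivan"})
tail = list(log.read_from(1))
```

The decorator adds behaviour without touching the wrapped implementation. This is the "decorate rather than replace" option described above.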

How Crux Works

Figure: Crux Architecture

Crux is schemaless. Transactions are submitted through the Crux API, and the data is then written to two event-log topics: the transaction topic and the document topic.

We use two topics because, whilst the transaction topic is immutable, messages in the document topic can be permanently erased. This is the basis of Crux’s ground-up strategy for making content eviction easy for data-privacy reasons, in line with compliance regimes such as GDPR.

Using a separate topic for documents also allows compaction to remove duplicates. Each message ID is a content hash of its document, so identical documents share an ID. On Kafka, the transaction topic uses a single partition. Sharding the document topic is on our roadmap, which could let it use multiple partitions.
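The split between an immutable transaction topic and an erasable, content-addressed document topic can be modelled with two in-memory lists. This simplified sketch rests on several assumptions. SHA-1 over canonical JSON is our own choice here, since Crux has its own hashing scheme. `submit`, `evict` and `compact` are hypothetical helpers. `compact` mimics Kafka-style log compaction by keeping only the latest message per key and dropping keys whose latest message is a tombstone.

```python
from __future__ import annotations

import hashlib
import json
from typing import Optional

tx_topic: list[list[str]] = []                     # immutable: each tx lists document hashes
doc_topic: list[tuple[str, Optional[dict]]] = []   # keyed by content hash; None = tombstone


def content_hash(doc: dict) -> str:
    """Equal documents get equal IDs (assumed scheme: SHA-1 of sorted-key JSON)."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def submit(docs: list[dict]) -> int:
    """Write documents to the document topic and their hashes to the transaction topic."""
    hashes = []
    for doc in docs:
        h = content_hash(doc)
        doc_topic.append((h, doc))
        hashes.append(h)
    tx_topic.append(hashes)
    return len(tx_topic) - 1


def evict(h: str) -> None:
    """Erase content by writing a tombstone for its key; the transaction topic is untouched."""
    doc_topic.append((h, None))


def compact(topic: list[tuple[str, Optional[dict]]]) -> dict[str, dict]:
    """Keep the latest message per key and drop tombstoned keys."""
    latest: dict[str, Optional[dict]] = {}
    for key, value in topic:
        latest[key] = value
    return {k: v for k, v in latest.items() if v is not None}


alice = {"id": "alice", "email": "alice@example.com"}
submit([alice])
submit([{"email": "alice@example.com", "id": "alice"}])  # same content, same hash
before_evict = compact(doc_topic)   # one entry: the duplicate collapsed
evict(content_hash(alice))
after_evict = compact(doc_topic)    # empty: the content is gone
```

Afterwards the transaction topic still records that two transactions happened and which hashes they referenced. Only the personal data itself has been removed.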

The event-log that Crux uses is the golden store of data, with Crux leveraging Kafka’s infinite retention capability.

Crux nodes then ingest the data from the event log and index the transactions and documents into a local key/value store such as RocksDB or LMDB. This store is the foundation for both a local document store and the set of bitemporal indexes that Crux maintains for graph query. RocksDB (an LSM tree) and LMDB (a B+ tree) use fundamentally different data structures and therefore offer a choice of performance characteristics and trade-offs.
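A node replays the log into a local ordered key/value store and derives indexes from it. The sketch below shows that job with several simplifications. `SortedKV` is a tiny in-memory stand-in for RocksDB or LMDB, offering only an ordered map with prefix scans. `Node` and the `("ave", attribute, value, entity)` key layout are hypothetical. Time dimensions are left out to keep it short. Because the indexes are derived purely from the log, a fresh node could rebuild them by replaying from offset 0.

```python
from __future__ import annotations

import bisect
from typing import Iterator


class SortedKV:
    """In-memory stand-in for an ordered key/value store: sorted keys plus prefix scans."""
    def __init__(self) -> None:
        self._keys: list[tuple] = []
        self._vals: dict[tuple, object] = {}

    def put(self, key: tuple, value: object) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def scan_prefix(self, prefix: tuple) -> Iterator[tuple[tuple, object]]:
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i][:len(prefix)] == prefix:
            yield self._keys[i], self._vals[self._keys[i]]
            i += 1


class Node:
    """Hypothetical node: tails the transaction log and indexes documents locally."""
    def __init__(self, kv: SortedKV) -> None:
        self.kv = kv
        self.offset = 0  # next transaction to ingest

    def ingest(self, tx_log: list[list[str]], doc_store: dict[str, dict]) -> None:
        for hashes in tx_log[self.offset:]:
            for h in hashes:
                doc = doc_store[h]
                self.kv.put(("doc", h), doc)  # local document store
                for attr, value in doc.items():
                    # attribute-value-entity index entry pointing at the document
                    self.kv.put(("ave", attr, str(value), doc["id"]), h)
            self.offset += 1

    def entities_with(self, attr: str, value: str) -> list[str]:
        return [key[3] for key, _ in self.kv.scan_prefix(("ave", attr, value))]


doc_store = {
    "h1": {"id": "alice", "city": "London"},
    "h2": {"id": "bob", "city": "London"},
    "h3": {"id": "carol", "city": "Leeds"},
}
tx_log = [["h1"], ["h2", "h3"]]
node = Node(SortedKV())
node.ingest(tx_log, doc_store)
londoners = node.entities_with("city", "London")  # ["alice", "bob"]
```

Sorted keys matter here: an ordered store turns "all entities with city = London" into a single range scan over adjacent keys.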

Crux currently provides both a Java API and a Clojure API. See the JavaDocs.

Transacting and Querying

Crux supports an edn Datalog query format that is similar to, though not the same as, Datomic’s. To get a feel for transacting to and querying against Crux, check out the query documentation or read Ivan Fedorov’s "a bitemporal tale".

Crux supports four transaction operations:

  • PUT

  • DELETE

  • CAS

  • EVICT

PUT stores a document. DELETE deletes it from a given valid time, but the data is still kept in Crux’s history. Use EVICT to get rid of data permanently, either for all of history or for a given valid-time window. Use CAS (compare-and-swap) to check that a document/entity is what you think it is before adding a new version. If it is not, the transaction is aborted.
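The semantics of the four operations can be illustrated with a toy, single-node model. Everything in it is hypothetical: the `Db` class, the tuple encoding of operations and the integer valid times. It models valid time only, and its EVICT removes an entity's whole history rather than a valid-time window. It does show three things: DELETE leaves history intact, EVICT does not, and a failed CAS aborts the entire transaction.

```python
from __future__ import annotations

import copy
from typing import Optional


class TxAborted(Exception):
    """Raised when a CAS precondition fails; no operation in the transaction applies."""


class Db:
    """Toy model of PUT / DELETE / CAS / EVICT over valid time (hypothetical names)."""
    def __init__(self) -> None:
        self.history: dict[str, list[tuple[int, Optional[dict]]]] = {}

    def entity(self, eid: str, vt: int) -> Optional[dict]:
        """The document for `eid` at valid time `vt`, or None if absent or deleted."""
        versions = [v for v in self.history.get(eid, []) if v[0] <= vt]
        versions.sort(key=lambda v: v[0])  # stable: later writes at equal vt stay last
        return versions[-1][1] if versions else None

    def submit(self, ops: list[tuple]) -> None:
        """Apply all ops atomically against a staged copy; commit only if all succeed."""
        staged = copy.deepcopy(self.history)
        view = Db()
        view.history = staged  # lets CAS see earlier ops in the same transaction
        for op, eid, *args in ops:
            if op == "put":
                doc, vt = args
                staged.setdefault(eid, []).append((vt, doc))
            elif op == "delete":
                (vt,) = args
                staged.setdefault(eid, []).append((vt, None))  # history is kept
            elif op == "cas":
                expected, new, vt = args
                if view.entity(eid, vt) != expected:
                    raise TxAborted(f"CAS failed for {eid!r}")
                staged.setdefault(eid, []).append((vt, new))
            elif op == "evict":
                staged.pop(eid, None)  # erase every version of the entity
            else:
                raise ValueError(f"unknown operation {op!r}")
        self.history = staged


db = Db()
db.submit([("put", "ivan", {"name": "Ivan"}, 1)])
db.submit([("delete", "ivan", 5)])
before_delete = db.entity("ivan", 3)   # {"name": "Ivan"}: history is kept
after_delete = db.entity("ivan", 6)    # None from valid time 5 onwards

try:
    # The CAS expects a document that is not there, so the PUT is discarded too.
    db.submit([("put", "ivan", {"name": "Ivan"}, 7),
               ("cas", "ivan", {"name": "Petr"}, {"name": "Ivan F."}, 7)])
    aborted = False
except TxAborted:
    aborted = True

db.submit([("evict", "ivan")])
evicted = "ivan" not in db.history     # no version survives
```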

Internally, Crux uses a Worst-Case Optimal Join algorithm, which lets the query engine lazily stream out results for an arbitrarily complex query with multiple join conditions and clauses. Combined with an external merge sort for any additional sorting, this means we avoid materialising intermediate results in memory.
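For readers unfamiliar with worst-case optimal joins, here is a compact Python sketch in the style of Generic Join, one of the algorithms from the literature that achieves worst-case optimal bounds. It is not Crux's implementation, and Crux's index layout differs. Rather than joining two relations at a time and materialising the intermediate table, it binds one variable at a time. For each variable it intersects the candidate values from every relation that mentions that variable, iterating over the smallest candidate set. Because it is a generator, results stream out lazily. `build_trie` and `generic_join` are hypothetical names, relations are plain sets of tuples, and every variable in `order` must appear in at least one atom. The example finds triangles in a small directed graph, a classic case where pairwise joins can blow up.

```python
from __future__ import annotations

from typing import Iterator


def build_trie(variables: tuple[str, ...], rows: set[tuple], order: list[str]) -> dict:
    """Nest rows into dicts, one level per variable, following the global order."""
    positions = sorted(range(len(variables)), key=lambda i: order.index(variables[i]))
    trie: dict = {}
    for row in rows:
        node = trie
        for i in positions:
            node = node.setdefault(row[i], {})
    return trie


def generic_join(atoms: list[tuple[tuple[str, ...], set[tuple]]],
                 order: list[str]) -> Iterator[dict]:
    """Lazily yield every binding of the variables in `order` that satisfies all atoms."""
    var_sets = [set(variables) for variables, _ in atoms]
    roots = [build_trie(variables, rows, order) for variables, rows in atoms]

    def search(depth: int, nodes: list[dict], binding: dict) -> Iterator[dict]:
        if depth == len(order):
            yield dict(binding)
            return
        var = order[depth]
        relevant = [i for i, vs in enumerate(var_sets) if var in vs]
        # Iterate over the smallest candidate set and probe the others.
        smallest = min(relevant, key=lambda i: len(nodes[i]))
        for value in nodes[smallest]:
            if all(value in nodes[i] for i in relevant):
                next_nodes = list(nodes)
                for i in relevant:
                    next_nodes[i] = nodes[i][value]
                binding[var] = value
                yield from search(depth + 1, next_nodes, binding)
                del binding[var]

    yield from search(0, roots, {})


edges = {(1, 2), (2, 3), (1, 3), (3, 4)}
triangle = [(("a", "b"), edges), (("b", "c"), edges), (("a", "c"), edges)]
results = generic_join(triangle, ["a", "b", "c"])  # nothing is computed yet
first = next(results)                               # {"a": 1, "b": 2, "c": 3}
```

The only state held while streaming is the current partial binding and one trie node per relation, so no intermediate join result is ever built in full.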

Deployment

Figure: Crux Architecture (deployment)

Crux can be embedded in your application as a JAR, or you can use its HTTP server. You can run Crux in 'standalone' mode without Kafka, substituting a local disk-based event log, or you can deploy a cluster of Crux nodes that use Kafka.

Open

Crux is open source, so you can see the code and commit history, warts and all. You can follow the GitHub issues where design decisions are made and contribute to that process. You can fork Crux and send PRs our way. We encourage developers to try out Crux, to surface and publish patterns for using it, and to share their ideas and critique with us.

JUXT will offer various support models for Crux, including enterprise support and managed hosting. If you have any questions about Crux or would like to talk to us about using it, please email us or visit our Zulip.

Have a play with Crux: add the JAR to your project and scale up from there. Crux is in alpha, so please raise any issues on our GitHub.


Published: 2019-05-15

