Predix Columnar Store Service Overview

About Columnar Store

Predix Columnar Store is a data storage service based on Cassandra, a NoSQL database designed to handle large data workloads across multiple nodes with no single point of failure.

Cassandra has a peer-to-peer distributed architecture in which data is distributed among multiple homogeneous nodes. Nodes are organized into data centers, and one or more data centers form a cluster. Data is replicated across nodes and data centers to protect against catastrophic loss and to speed up request processing. Any authenticated user can connect to any node in any data center and access data by using CQL (Cassandra Query Language, which is similar to SQL). Read and write requests can be sent to any node in a cluster; the recipient node acts as a proxy between the client application and the nodes where the requested data is located. If a node or data center is down, data is retrieved from the nearest available node, and changes are synchronized when the failed node or data center is restored.
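
The request flow described above can be sketched as a small in-memory model. This is only an illustration under simplifying assumptions, not how Cassandra or Columnar Store is implemented: the class names, the node names, the hash-based placement, and the hint list are all hypothetical, and Cassandra's real recovery mechanisms (hinted handoff and repair) are considerably more involved.

```python
import hashlib


class Node:
    """One peer in the toy cluster; every node has the same role."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # rows this node holds as a replica
        self.hints = []   # (target node, key, value) writes held for down replicas


class Cluster:
    def __init__(self, names, replication_factor=2):
        self.nodes = [Node(n) for n in names]
        self.rf = replication_factor

    def replicas_for(self, key):
        # Hash the key to a starting position, then take the next `rf` nodes.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.rf)]

    def write(self, coordinator, key, value):
        # The coordinator proxies the write to the replicas that own the key.
        for replica in self.replicas_for(key):
            if replica.up:
                replica.data[key] = value
            else:
                # Remember the missed write so it can be replayed later.
                coordinator.hints.append((replica, key, value))

    def read(self, coordinator, key):
        # Any live node may coordinate; it returns the first live replica's copy.
        for replica in self.replicas_for(key):
            if replica.up and key in replica.data:
                return replica.data[key]
        return None

    def restore(self, node):
        # Bring a node back and replay the writes it missed while down.
        node.up = True
        for holder in self.nodes:
            remaining = []
            for target, key, value in holder.hints:
                if target is node:
                    node.data[key] = value
                else:
                    remaining.append((target, key, value))
            holder.hints = remaining


cluster = Cluster(["node-a", "node-b", "node-c"], replication_factor=2)
down = cluster.replicas_for("turbine-7")[0]
down.up = False                                   # one replica fails
coordinator = next(n for n in cluster.nodes if n is not down)
cluster.write(coordinator, "turbine-7", "rpm=3600")
value_while_down = cluster.read(coordinator, "turbine-7")
cluster.restore(down)                             # missed write is replayed
print(value_while_down, down.data.get("turbine-7"))
```

In this sketch the read succeeds while one replica is down because the second replica still holds the row, and the restored node receives the write it missed.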

Cassandra Infrastructure Components

  • Node: Data is stored in nodes, which can be virtual or physical locations.
  • Data center: A group of related nodes, either physical or virtual, in the same physical location. Replication is configured at this level, and data can be written to multiple data centers. Distinct workloads should be handled by separate data centers to keep requests close to each other and reduce data latency.
  • Cluster: A group of one or more data centers that can be distributed across multiple physical locations.
  • Commit log: Every write is first appended to this log for durability and is also held in memory. When the in-memory buffer is full, its contents are written (flushed) to disk. After all of the data covered by a log has been written to disk, the log can be archived, deleted, or recycled.
  • SSTable: A sorted string table (SSTable) is a file to which Cassandra writes data. SSTables are not modified after they are written, are stored to disk sequentially, and are maintained separately for each Cassandra table.
  • CQL Table: A collection of ordered columns that has a primary key and is fetched by table row.
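
The relationship between the commit log and SSTables can be illustrated with a toy write path. This is a minimal sketch, not Cassandra's storage engine: the `ToyTable` class and its `memtable_limit` threshold are hypothetical, and the in-memory buffer (which Cassandra calls a memtable) is modeled as a plain dictionary.

```python
class ToyTable:
    """Toy write path: commit log -> in-memory buffer -> SSTable flush."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold (rows)
        self.commit_log = []   # append-only record of writes not yet flushed
        self.memtable = {}     # in-memory rows awaiting flush
        self.sstables = []     # flushed, sorted, never modified afterwards

    def write(self, key, value):
        self.commit_log.append((key, value))   # durability first
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the buffer out as one sorted table, then recycle the log.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newer SSTables take precedence over older ones.
        for sstable in reversed(self.sstables):
            for k, v in sstable:
                if k == key:
                    return v
        return None


table = ToyTable(memtable_limit=3)
for key, value in [("c", 3), ("a", 1), ("b", 2), ("a", 10)]:
    table.write(key, value)

print(table.sstables)    # one flushed table, sorted by key
print(table.commit_log)  # only the write made after the flush
print(table.read("a"))   # the newest value for "a"
```

The third write fills the buffer and triggers a flush, so the fourth write is the only entry left in the commit log, and reads check the buffer before falling back to the flushed tables.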

Features and Benefits

Columnar Store provides the power and flexibility of the Cassandra database within the Predix platform, with pre-built infrastructure, built-in integration, and easy provisioning.

Columnar Store has the following features:
  • Decentralized: The masterless architecture means that all nodes are equal and there is no single point of failure. Data can be written to and read from any node and is automatically distributed among nodes. The failure of a single piece of hardware therefore does not make your data unavailable, and no single node becomes a network bottleneck.
  • Fault tolerant: Columnar Store distributes your data across multiple nodes and data centers to provide even more failover protection. When nodes fail, they can be easily restored or replaced, and the commit log design prevents data loss.
  • Scalable: Easy provisioning means you can quickly scale from three to n nodes as your needs evolve.
  • Fully replicated: You can customize data replication by selecting a replication factor that meets your requirements.
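
In Cassandra itself, the replication factor is a property of a keyspace and is declared in CQL. The helper below is a sketch that builds such a statement. The keyspace name and data center names are hypothetical, and whether Columnar Store lets you run this statement yourself or sets replication when the service is provisioned is an assumption that is not confirmed here.

```python
def create_keyspace_cql(keyspace, replication_by_dc):
    """Build a CQL CREATE KEYSPACE statement with one replication factor per data center."""
    if not replication_by_dc:
        raise ValueError("at least one data center is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc, factor in sorted(replication_by_dc.items()):
        if factor < 1:
            raise ValueError(f"replication factor for {dc} must be >= 1")
        options.append(f"'{dc}': {factor}")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        "WITH replication = {" + ", ".join(options) + "};"
    )


# Hypothetical names: three copies in one data center, two in another.
statement = create_keyspace_cql("sensor_data", {"dc_east": 3, "dc_west": 2})
print(statement)
```

A higher replication factor keeps more copies of each row and tolerates more node failures, at the cost of more storage and more work per write.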

About Cassandra

Columnar Store is based on Cassandra, a non-relational database that offers benefits not found in traditional RDBMS products.

The changing data landscape of today's online applications has created a need for data storage technologies that offer low latency, massive scalability, continuous uptime, and global data distribution with the ability to read and write in any location. These key requirements, along with the desire to reduce software and operational costs, are the reasons behind the growing popularity of non-relational database technologies.

Cassandra differs from a more traditional relational database, such as PostgreSQL, in the following ways:

Table 1. Relational Databases Compared to Cassandra

Relational Database                                     Cassandra
------------------------------------------------------  ------------------------------------------------
Supports moderate incoming data velocity                Supports high incoming data velocity
Incoming data from one or few locations                 Incoming data from many locations
Designed to manage mostly structured data               Designed to manage all types of data
Supports complex and nested transactions                Supports simple transactions
Single point of failure with failover                   No single point of failure with continuous uptime
Handles moderate data volumes                           Handles very high data volumes
Centralized architecture and deployment                 Decentralized architecture and deployment
Most data written in a single location                  Data written in many locations
Read scalability support, with consistency sacrifices   Read and write scalability support
Vertical scale-up deployment                            Horizontal scale-out deployment

When deciding whether Columnar Store is the best choice for your data storage needs, consider the following questions:
  • What volume of incoming data do you need to store?
  • Do you anticipate that data volume will grow over time?
  • What is the expected incoming data velocity?
  • How many locations generate the data you need to store?
  • Is your data structured or unstructured?
  • What level of transaction complexity support do you need?
  • How important are continuous uptime and data durability?

Columnar Store Architecture

Figure: Columnar Store Architecture

Predix Columnar Store can exchange data with Cloud Foundry apps and can receive inputs from other Predix services. Cloud Foundry apps can send data to Columnar Store and to other Predix services. Apps and services that run in external clouds cannot access Columnar Store or any other component of the Predix Data Services Virtual Private Cloud (VPC).