Overview of NewSQL Database: Scalable Relational Systems
By: Ramsy Johnstone Banda
Student Id: 2120176007
The scalability limitations of traditional database management systems inspired the development of alternative database technologies such as Not only SQL (NoSQL) and NewSQL. NewSQL is a class of database systems that provides the scalable performance of NoSQL systems for OLTP workloads while maintaining the ACID properties of traditional DBMSs. NewSQL delivers high availability and performance for modern data without sacrificing the strong consistency and transaction capabilities that NoSQL gives up. This paper provides an overview of NewSQL as a class of scalable relational database systems, in order to help potential users apply the technology where and when it makes sense.
Keywords: NewSQL, DBMS, OLTP, NoSQL
Today’s databases are expected to deliver extreme performance, to be flexible enough to handle a variety of data formats, and to scale easily. The generation of large amounts of unstructured data, together with the computational and storage requirements of applications such as social networking, has pushed traditional Relational Database Management Systems (RDBMS) to their limits. Data that is huge in volume, varied in format and high in velocity cannot be managed effectively by an RDBMS, which poses problems for technology vendors that capture, store and process such data with its complex relationships.
Traditional RDBMSs are no longer adequate on their own: they present problems in data modelling and are constrained in scaling across several servers and large data volumes. In addition, the rapid growth of the data generated daily, and the complexity and interdependence of data from sources such as social networks, Web 2.0 and the Internet, have captured the attention of the software community internationally.
In an attempt to address these new requirements, organizations and companies maintain server farms and data centres that contain clusters of thousands of commodity hardware machines. However, traditional relational databases do not work well in such settings, because their full support of the Atomicity, Consistency, Isolation, Durability (ACID) properties confines them largely to vertical scalability (Binani, Gutt and Upadhyay, 2011). An RDBMS typically delivers performance on the order of thousands of transactions per second. In online transaction processing (OLTP) scenarios such as games and advertising, which can involve close to or more than a million transactions per second, a traditional RDBMS does not easily scale horizontally. Because of this limitation, current application scenarios require system designs tailored to the nature of the application, its data and its queries.
Consequently, alternative databases were designed that are schema-less, usually avoid join operations, typically scale horizontally, and keep their data in distributed data stores. One such class of schema-less database systems is Not only SQL (NoSQL), which was designed to meet the scalability requirements of distributed architectures.
Despite the many features and benefits of NoSQL, its lack of Structured Query Language (SQL) support and its non-adherence to the ACID properties leave a gap for enterprises that run OLTP workloads. Support for ACID properties and SQL is a requirement for businesses responding to current needs, such as developing new and powerful applications on scalable OLTP systems and migrating existing applications to cope with rapid data growth. Therefore, a new type of data-management solution was required to address large-scale OLTP concerns without sacrificing SQL interfaces and ACID properties.
The motivation of this paper is to provide an independent understanding of the strengths and weaknesses of the various NewSQL approaches to supporting applications that process huge volumes of data without losing SQL and ACID properties, and to provide a global overview of these modern relational NewSQL databases.
The rest of the paper is structured as follows. Section 2 provides background information. Section 3 describes NoSQL databases. Section 4 discusses NoSQL scalability and performance. Section 5 presents NewSQL databases, covering their characteristics, their categorization, a comparison with traditional and NoSQL systems, and their adoption. Finally, the paper closes with a conclusion.
The relational model, implemented by Relational Database Management Systems (RDBMS), has dominated the different data models since the 1980s, with a variety of implementations such as PostgreSQL, MySQL, Microsoft SQL Server and Oracle (Pavlo & Aslett, 2016). RDBMSs also conform to the ACID properties, which enforce strong transactional and consistency requirements. However, RDBMSs are limited by the constraints of the relational model and of horizontal scalability over many servers. These problems can be attributed to the exponential growth in the volume of data generated daily, and to the increasing complexity and interrelatedness of data from, for instance, social networks, the Internet and Web 2.0.
These problems gave rise to Not only Structured Query Language (NoSQL) systems in the late 2000s (Aslett, 2011). The key aspect of NoSQL systems is that they forgo the strong transactional guarantees and the relational model of traditional DBMSs in favour of eventual consistency and alternative data models such as key/value, graphs and documents (Pavlo & Aslett, 2016). NoSQL DBMSs are designed around the Consistency, Availability and Partition tolerance (CAP) theorem, and their transactions conform to the Basically Available, Soft state, Eventually consistent (BASE) properties (Moniruzzaman, 2014).
Despite the horizontal scalability provided by NoSQL, many enterprise systems that handle high-profile data, for example financial transaction and order processing systems, could not use NoSQL systems because they lack strong transactional and consistency guarantees (Pavlo & Aslett, 2016). NoSQL databases focus on analytical processing of large-scale datasets and offer increased scalability over commodity hardware (Konstantinou, Angelou, & Boumpouka, 2013). Their development was triggered by the computational and storage requirements of applications such as Big Data analytics, business intelligence and social networking over petabyte-scale datasets (Moniruzzaman & Hossain, 2013). These requirements motivated horizontally scalable non-relational data stores such as Facebook’s Cassandra, Google’s Bigtable and its open-source implementation HBase (Pavlo & Aslett, 2016).
Until recently, implementing a scale-out architecture for Online Transaction Processing (OLTP) required either adopting a NoSQL model or relying on sharding and explicit replication. This need inspired the development of NewSQL. NewSQL systems were developed to achieve the same scalability as NoSQL DBMSs while keeping the relational model and transactional support of legacy DBMSs (Pavlo & Aslett, 2016).
3.0 NoSQL Databases
Not only SQL (NoSQL) databases are distributed, non-relational databases designed for large-scale data storage and for massively parallel data processing across a large number of commodity servers (Moniruzzaman, 2014). They are broadly divided into the following types: key-value stores, e.g. SimpleDB, Redis, Riak, Dynamo, Voldemort and BerkeleyDB; wide-column data stores, e.g. Cassandra, DynamoDB, Accumulo, HBase and BigTable; graph databases, e.g. Neo4j; and document stores, e.g. MongoDB and CouchDB. NoSQL systems were designed as an alternative database technology to meet the scalability requirements of distributed architectures. None of these data stores requires a fixed table schema (they are schema-less), and they avoid join operations and scale horizontally. NoSQL systems were developed mainly in response to the failure of existing suppliers to address the performance, scalability and flexibility requirements of large-scale data processing in cloud computing and web applications. These database systems conform to the CAP (Consistency, Availability, Partition tolerance) theorem, and thus their transactions conform to the BASE (Basically Available, Soft state, Eventually consistent) properties.
The characteristics of NoSQL systems are that they do not use SQL, store huge amounts of data, converge to a consistent state across a distributed environment (eventual consistency), are fault tolerant, have no fixed schema, do not support ACID properties, and scale horizontally, which leads to high performance and a flexible structure.
4.0 NoSQL Scalability and Performance
NoSQL databases automatically spread data across servers without requiring the application to participate. They also support data replication across the cluster and/or across data centres to ensure availability and to support disaster recovery. Servers can therefore be added to or removed from the data layer without application downtime. In addition, NoSQL systems retain their full expressive power even when distributed across several servers. They also reduce latency and increase data throughput by transparently caching data in system memory.
Although NoSQL can help manage large distributed data, it does not address the scaling concerns of OLTP. Its lack of SQL support and its non-adherence to ACID properties limited the extent of its adoption. In addition, NoSQL offers ineffective support for applications already written for an RDBMS, and the new knowledge it demands creates a bottleneck for large-scale DBMS deployments. The need to support SQL while adhering to ACID properties inspired the development of NewSQL.
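The paragraph above does not name the mechanism used to spread data, so the following is only a sketch of one well-known technique, consistent hashing, which lets a server join the data layer while most keys stay where they are. The class name `HashRing`, the server names `s1` to `s4` and the `vnodes` parameter are all hypothetical choices for illustration.

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring mapping keys to servers (illustrative only)."""

    def __init__(self, servers=(), vnodes=64):
        self.vnodes = vnodes      # virtual points per server, smooths the spread
        self._ring = []           # sorted list of (position, server)
        for server in servers:
            self.add_server(server)

    @staticmethod
    def _position(label):
        # First 8 bytes of SHA-256 give a stable 64-bit ring position.
        return int.from_bytes(hashlib.sha256(label.encode()).digest()[:8], "big")

    def add_server(self, server):
        # Place several virtual points for the server on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._position(f"{server}#{i}"), server))

    def server_for(self, key):
        # The owner is the first ring point at or after the key, wrapping around.
        idx = bisect.bisect_left(self._ring, (self._position(key),))
        return self._ring[idx % len(self._ring)][1]

keys = [f"user:{n}" for n in range(1000)]
ring = HashRing(["s1", "s2", "s3"])
before = {k: ring.server_for(k) for k in keys}
ring.add_server("s4")                       # a new server joins the data layer
after = {k: ring.server_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

In this sketch every key listed in `moved` now belongs to the new server `s4`; keys that did not move keep their original owner, which is what allows a server to be added without reshuffling the whole data set.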
5.0 NewSQL Databases
NewSQL is a class of scalable relational database management systems (RDBMS) for Online Transaction Processing (OLTP) that provides the scalable performance of NoSQL systems for read-write workloads while maintaining the ACID (Atomicity, Consistency, Isolation, Durability) guarantees of transactional database systems (Stonebraker, 2011). These systems break through the scalability, performance and availability limitations of conventional RDBMSs and provide partial or full SQL query capability. NewSQL systems do so by employing NoSQL-style features such as column-oriented data storage and distributed architectures, by using technologies such as in-memory processing, symmetric multiprocessing (SMP) or massively parallel processing (MPP), and by integrating NoSQL or search components designed to handle the volume, variety, velocity and variability challenges of Big Data (Moniruzzaman & Hossain, 2013). NewSQL should therefore be considered as an alternative to NoSQL and traditional RDBMSs for new OLTP applications.
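To make the ACID guarantee concrete, the sketch below shows atomicity for a typical OLTP operation, a transfer between two accounts. It uses Python's built-in `sqlite3` module purely as a stand-in for any SQL DBMS; it is not a NewSQL system, and the `accounts` table and `transfer` function are hypothetical names chosen for illustration.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True if committed."""
    try:
        # `with conn` commits if the block succeeds and rolls back on an exception.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit above was undone too.
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

ok = transfer(conn, 1, 2, 30)         # succeeds: balances become 70 and 80
rejected = transfer(conn, 1, 2, 500)  # fails: would overdraw account 1
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

After the rejected transfer the balances are unchanged at 70 and 80: the credit to account 2 that ran before the failing debit is rolled back, so the transaction takes effect either completely or not at all.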
5.1 NewSQL Characteristics
The technical characteristics of NewSQL are: (1) SQL as the means of interaction between the application and the DBMS; (2) support for ACID transactions; (3) non-blocking concurrency control, so that reads and writes do not conflict with each other; (4) an architecture that provides higher performance per processing node; and (5) a scalable architecture with distributed memory and the ability to run in a cluster with a large number of nodes. NewSQL systems are also reported to be approximately 50 times faster than traditional OLTP RDBMSs (Kumar & Charu, 2014).
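Characteristic (3) can be illustrated with multi-version concurrency control, one common way to keep readers and writers from blocking each other. The text above does not say which scheme a given NewSQL system uses, so this is a single-threaded toy under that assumption; `MVCCStore` and its methods are hypothetical names.

```python
class MVCCStore:
    """Toy multi-version store: writers append versions, readers use a snapshot."""

    def __init__(self):
        self._versions = {}   # key -> list of (commit_ts, value) in commit order
        self._clock = 0       # timestamp of the latest commit

    def begin_read(self):
        """Return a snapshot timestamp; later commits stay invisible to it."""
        return self._clock

    def write(self, key, value):
        """Commit a new version without overwriting older ones."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts."""
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

store = MVCCStore()
store.write("stock", 10)
snapshot = store.begin_read()      # a reader starts here
store.write("stock", 7)            # a writer commits while the reader is active
seen_by_reader = store.read("stock", snapshot)
seen_by_new_reader = store.read("stock", store.begin_read())
```

The earlier reader still sees 10 while a reader that starts after the second write sees 7. Because the writer adds a version instead of updating in place, neither side has to wait for the other.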
5.2 Categorization of NewSQL
The categorization of NewSQL systems is based on the salient aspects of the implementations adopted by vendors. The categories are: (1) novel systems built from scratch on new architectures; (2) middleware that re-implements the same sharding infrastructure developed by others; and (3) database-as-a-service offerings from cloud computing providers that are also based on new architectures.
5.3 New architectures
This type of NewSQL system is built on a new codebase to achieve scalability and performance improvements. DBMSs in this category are based on distributed architectures that operate on shared-nothing resources and contain components to support multi-node concurrency control, fault tolerance through replication, flow control and distributed query processing. Replicating the database across clusters allows new machines to be introduced easily into a system that is already up and running (Moniruzzaman, 2014), which gives these systems their scalability. This was not possible in a traditional RDBMS.
Because these DBMSs are built for distributed operation, all parts of the system can be optimized for a multi-node environment. For instance, most NewSQL DBMSs can send intra-query data directly between nodes without routing it through a central location.
These DBMSs also manage their own primary storage, either in memory or on disk, to improve performance. The DBMS is responsible for distributing the database across its resources with a custom engine instead of relying on an off-the-shelf distributed filesystem or storage fabric. This allows the DBMS to send the query to the data, which results in significantly less network traffic than transmitting the data to the computation. For example, bringing data to the computation moves not only tuples but also indexes and materialised views, which increases network traffic. This category is the best option for those who want to develop highly scalable and efficient OLTP systems.
Adopting these new solutions requires data migration and some changes to application code. It also means that the organization will potentially lose access to existing administration and reporting tools. Solutions in this category can be software only, such as VoltDB, NuoDB and Drizzle, or delivered as an appliance, such as Clustrix and TransLattice.
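The difference between sending the query to the data and sending the data to the computation can be sketched by counting what crosses the network in each case. The node names, rows and function names below are made up for illustration, and "shipped items" is only a rough proxy for network traffic.

```python
# Hypothetical partitions: each node stores (order_id, amount) rows locally.
nodes = {
    "node_a": [(1, 20), (2, 35), (3, 5)],
    "node_b": [(4, 50), (5, 15)],
    "node_c": [(6, 40), (7, 10), (8, 25)],
}

def data_to_computation(nodes, min_amount):
    """Pull every row to a coordinator, then filter and sum there."""
    shipped = [row for rows in nodes.values() for row in rows]
    total = sum(amount for _, amount in shipped if amount >= min_amount)
    return total, len(shipped)      # (result, items sent over the network)

def query_to_data(nodes, min_amount):
    """Run the filter and a partial sum on each node; ship one number per node."""
    partials = [
        sum(amount for _, amount in rows if amount >= min_amount)
        for rows in nodes.values()
    ]
    return sum(partials), len(partials)

pulled_total, pulled_items = data_to_computation(nodes, 20)
pushed_total, pushed_items = query_to_data(nodes, 20)
```

Both strategies return the same total of 170, but the first ships all 8 rows to the coordinator while the second ships only 3 partial sums, one per node.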
5.4 Transparent Sharding Middleware
These solutions allow an organisation to split a database into multiple shards that are stored across a cluster of single-node DBMS instances to improve scalability. One example of transparent sharding is ScaleBase. A centralized middleware component routes queries, coordinates transactions, and manages data placement, replication and partitioning across the nodes. Each DBMS node contains a shim layer that communicates with the middleware; it executes queries on behalf of the middleware at its local DBMS instance and returns the results. The middleware thereby presents a single logical database to the application without the need to modify the underlying DBMS. Transparent sharding middleware allows existing skillsets and ecosystems to be reused and avoids the need to rewrite code or perform any data migration. However, such systems still require a traditional DBMS on each node, such as MySQL, Postgres or Oracle. These DBMSs are based on a disk-oriented architecture and so cannot use the concurrency control schemes or storage managers optimized for memory-oriented storage that are built into some of the new NewSQL architectures. Disk-oriented architectures prevent these traditional DBMSs from scaling up to take advantage of higher CPU core counts and larger memory capacities (Pavlo & Aslett, 2016). This approach also incurs redundant query planning and optimization on sharded nodes for complex queries, i.e. once at the middleware and once on the individual DBMS nodes, although this does allow each node to apply its own local optimizations to each query.
NewSQL database-as-a-service (DBaaS) products are offered by cloud computing vendors. The cloud computing provider is responsible for maintaining the physical configuration of the database, i.e. replication, backups and system tuning (for example, resizing the buffer pool). The customer accesses and controls the DBMS through a dashboard and a connection URL. Amazon’s Aurora and ClearDB are examples of DBaaS. In this approach, customers pay according to their expected application resource utilization, for example the maximum storage size, computation power and memory allocation guaranteed by the vendor. Because of economies of scale, the key providers of cloud computing are large companies.
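The routing role of the middleware described above can be sketched as follows. This is a minimal illustration, not how ScaleBase or any other product is implemented: in-memory SQLite databases stand in for the single-node DBMS instances, the `users` table and the class `ShardingMiddleware` are hypothetical, and cross-shard transaction coordination and replication are left out.

```python
import sqlite3
import zlib

class ShardingMiddleware:
    """Toy router that presents several single-node databases as one table."""

    def __init__(self, shard_count):
        # Each in-memory SQLite database stands in for one single-node DBMS.
        self.shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for db in self.shards:
            db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    def _shard_for(self, user_id):
        # A stable hash of the shard key picks the node that owns the row.
        index = zlib.crc32(str(user_id).encode()) % len(self.shards)
        return self.shards[index]

    def insert_user(self, user_id, name):
        db = self._shard_for(user_id)
        with db:  # commit on success, roll back on error
            db.execute("INSERT INTO users VALUES (?, ?)", (user_id, name))

    def get_user(self, user_id):
        # Point lookup on the shard key: routed to exactly one node.
        row = self._shard_for(user_id).execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
        return row[0] if row else None

    def count_users(self):
        # No shard key in the query: scatter to every node, gather partial counts.
        return sum(
            db.execute("SELECT COUNT(*) FROM users").fetchone()[0]
            for db in self.shards)

middleware = ShardingMiddleware(shard_count=3)
for user_id, name in enumerate(["ana", "bo", "chen", "dee"], start=1):
    middleware.insert_user(user_id, name)
```

The application only calls the middleware, for example `middleware.get_user(3)` or `middleware.count_users()`, and never needs to know which node holds a given row. Each node still plans and executes the SQL it receives, which is the redundant planning step noted in the text.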
Table 1: Comparison of OldSQL, NoSQL and NewSQL
Characteristic | OldSQL | NoSQL | NewSQL
ACID | Yes | No | Yes
OLTP | Yes | No | Yes
Data analysis | Yes | No | Yes
Distributed computing | Yes | Yes | Yes
Scaling | No | Yes | Yes
Performance with data growth | Fast | Fast | Very fast
Performance overhead | Large | Moderate | Minimal
Query complexity | Low | High | Very high
Table 1 shows a comparison of the characteristics of traditional relational databases (OldSQL), NoSQL, and NewSQL with their abilities and strengths.
From this table we can see that NewSQL is an improvement on traditional RDBMS (OldSQL): it retains the features of standard databases while implementing the innovations of NoSQL.
NewSQL databases are a new type of database that provides both data integrity and scalability at low cost. They can be built on a new architecture that stores the data in memory, or they can extend the capabilities of existing databases. Traditional RDBMS vendors can adopt the concepts of NewSQL by implementing them themselves or by acquiring technologies and integrating them into their products.
NewSQL systems emerged as an alternative database technology to address the OLTP scalability problems that traditional DBMSs and NoSQL could not address. These systems provide the scalable performance of NoSQL systems for OLTP read-write workloads while still maintaining the ACID guarantees of traditional database systems. NewSQL database systems are suitable for businesses responding to current needs, such as developing new and powerful applications on scalable OLTP systems or migrating existing applications to cope with rapid data growth. NewSQL enhances SQL databases with horizontal scaling while maintaining ACID properties, which allows them to work with Big Data and to serve concurrent workloads.
Konstantinou, I., Angelou, E., & Boumpouka, C. (2013). On the Elasticity of NoSQL Databases over Cloud Management Platforms (extended version), 10.
Kumar, R., & Charu, S. (2014). NewSQL Databases: Scalable RDBMS for OLTP Needs to Handle Big Data. https://doi.org/10.13140/RG.2.1.2915.8561
Moniruzzaman, A. B. M. (2014). NewSQL: Towards Next-Generation Scalable RDBMS for Online Transaction Processing (OLTP) for Big Data Management. International Journal of Database Theory and Application, 7(6), 121–130. https://doi.org/10.14257/ijdta.2014.7.6.11
Moniruzzaman, A. B. M., & Hossain, S. A. (2013). NoSQL Database: New Era of Databases for Big data Analytics – Classification, Characteristics and Comparison. International Journal of Database Theory and Application, 6(4), 14.
Pavlo, A., & Aslett, M. (2016). What’s Really New with NewSQL? ACM SIGMOD Record, 45(2), 45–55. https://doi.org/10.1145/3003665.3003674
Stonebraker, M. (2011). New SQL: An Alternative to NoSQL and Old SQL for New OLTP Apps. Retrieved June 6, 2018, from https://cacm.acm.org/blogs/blog-cacm/109710-new-sql-an-alternative-to-nosql-and-old-sql-for-new-oltp-apps/fulltext