NoSQL and NewSQL
Prof. SAMUEL MADDEN
1. Traditional DBMS features:
1) Record oriented (row store);
2) Schema (carefully structured);
3) Use SQL;
4) Do transactions (ACID semantics)
2. For high-throughput, high-velocity data
1) A single-node failure may not be a problem: it is OK to fail over to another node, so 100% consistency is not required;
2) Some data is not relational, e.g. JSON;
3) A strict data schema may not be necessary given such diverse data sources;
Definition (data model): the way data is stored/represented in a database system.
3. NoSQL
NoSQL is non-relational. It's either a Key-Value store or a document store.
1) Key-Value: can only put (write) and get (read);
2) Document: like key-value, but with values as dictionaries or XML/json documents.
No ACID: limited atomicity and consistency; no schema.
Easy to understand, implement and program; easy to distribute across nodes;
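The put/get interface and the schema-free documents described above can be sketched in a few lines. This is only a toy illustration, not any particular product's API: the class name `KeyValueStore` and the keys and values are made up for the example.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy key-value store: the only operations are put and get."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Overwrite whatever was stored under the key; no schema is checked.
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Return None when the key is absent.
        return self._data.get(key)


kv = KeyValueStore()

# Key-value style: the value is an opaque blob.
kv.put("user:1", b"opaque bytes")

# Document style: the value is a dictionary (think JSON document).
kv.put("user:2", {"name": "Ann", "tags": ["a", "b"]})

# No schema: another document may carry completely different fields.
kv.put("user:3", {"name": "Bob", "age": 31})

doc = kv.get("user:2")
print(doc["name"], doc["tags"])
```

Because every operation touches exactly one key, such a store is easy to spread across nodes: each key can simply be assigned to one node.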
Eventual consistency plus a majority read/write protocol is used to tolerate replica failures. Note that the majority read protocol includes the 'most recent' rule: among the replicas read, return the most recent value.
Without the majority read/write protocol, eventual consistency alone may give wrong (stale) answers.
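A minimal sketch of a majority read/write protocol with the 'most recent' rule, for a single value replicated on three nodes. The names `Replica` and `QuorumStore`, and the use of an integer version number to decide which value is most recent, are assumptions made for the example; real systems differ in how they version and repair replicas.

```python
from typing import Any, List, Optional, Tuple


class Replica:
    """One copy of a single value, tagged with a version number."""

    def __init__(self) -> None:
        self.version = 0
        self.value: Optional[Any] = None
        self.up = True  # set to False to simulate a failed node


class QuorumStore:
    def __init__(self, n: int = 3) -> None:
        self.replicas = [Replica() for _ in range(n)]
        self.majority = n // 2 + 1

    def _live(self) -> List[Replica]:
        return [r for r in self.replicas if r.up]

    def read(self) -> Tuple[int, Optional[Any]]:
        live = self._live()
        if len(live) < self.majority:
            raise RuntimeError("no majority reachable")
        # Contact a majority, then apply the 'most recent' rule.
        contacted = live[: self.majority]
        newest = max(contacted, key=lambda r: r.version)
        return newest.version, newest.value

    def write(self, value: Any) -> int:
        live = self._live()
        if len(live) < self.majority:
            raise RuntimeError("no majority reachable")
        # Read a majority first so the new version exceeds every earlier write.
        version, _ = self.read()
        new_version = version + 1
        for r in live[: self.majority]:
            r.version, r.value = new_version, value
        return new_version


store = QuorumStore(3)
store.write("a")                 # lands on replicas 0 and 1
store.replicas[0].up = False     # replica 0 fails
store.write("b")                 # lands on replicas 1 and 2
store.replicas[0].up = True      # replica 0 recovers, still holding "a"
store.replicas[1].up = False     # replica 1 fails

stale = store.replicas[0].value  # asking a single replica gives the old value
version, value = store.read()    # majority read returns the newest value
print(stale, value)
```

Any two majorities share at least one replica, so a majority read always contacts at least one replica that saw the latest majority write. Reading replica 0 alone, by contrast, returns the stale value.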
4. NewSQL: H-Store (by MIT)
Provides ACID and SQL, but with throughput as high as NoSQL.
Reduces caching, logging/recovery, and locking overhead to boost speed.
In-memory row store with data partitioning.
H-Store executes transactions serially instead of concurrently, and thus avoids locking.
H-Store does compact logging to avoid heavy-weight recovery.
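The notes do not say what "compact logging" consists of. One possible reading, assumed here, is that the log records only which procedure ran and with which arguments, instead of images of every modified record, and that recovery replays those entries in order. The sketch below illustrates that idea only; `transfer`, `run_and_log` and `recover` are hypothetical names, not H-Store functions.

```python
import json
from typing import Callable, Dict, List


def transfer(state: Dict[str, int], src: str, dst: str, amount: int) -> None:
    """Hypothetical procedure: move `amount` from one account to another."""
    state[src] -= amount
    state[dst] += amount


# Procedures that may appear in the log, looked up by name.
PROCEDURES: Dict[str, Callable[..., None]] = {"transfer": transfer}


def run_and_log(state: Dict[str, int], log: List[str], name: str, *args) -> None:
    # Append one short entry (procedure name + arguments), then execute.
    log.append(json.dumps({"proc": name, "args": list(args)}))
    PROCEDURES[name](state, *args)


def recover(initial: Dict[str, int], log: List[str]) -> Dict[str, int]:
    # Rebuild the state by replaying the logged calls in their original order.
    state = dict(initial)
    for line in log:
        entry = json.loads(line)
        PROCEDURES[entry["proc"]](state, *entry["args"])
    return state


initial = {"a": 100, "b": 0}
state = dict(initial)
log: List[str] = []

run_and_log(state, log, "transfer", "a", "b", 30)
run_and_log(state, log, "transfer", "b", "a", 5)

recovered = recover(initial, log)
print(recovered == state)
```

Replay of this kind only reproduces the same state if the procedures are deterministic and are re-run in the same order, which fits well with serial execution.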
Partition: a single thread in each partition
- stored procedures can be expressed using SQL, but are predeclared when setting up the customized H-Store system, rather than arbitrarily composed by users.
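The combination of one thread per partition and predeclared procedures can be sketched as follows. This is an illustration under simplifying assumptions, not H-Store's implementation: procedures are plain Python functions rather than SQL, every transaction touches a single key (and therefore a single partition), and the names `Partition`, `MiniStore`, `deposit` and `balance` are invented for the example.

```python
import queue
import threading
import zlib
from typing import Any, Callable, Dict


class Partition:
    """Owns a slice of the data and runs its transactions one at a time."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.inbox: queue.Queue = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self) -> None:
        while True:
            proc, args, reply = self.inbox.get()
            # Only this thread ever touches self.data, so no locks are needed.
            reply.put(proc(self.data, *args))


def deposit(data: Dict[str, int], key: str, amount: int) -> int:
    data[key] = data.get(key, 0) + amount
    return data[key]


def balance(data: Dict[str, int], key: str) -> int:
    return data.get(key, 0)


# Predeclared when the store is set up; clients cannot submit anything else.
PROCEDURES: Dict[str, Callable[..., Any]] = {"deposit": deposit, "balance": balance}


class MiniStore:
    def __init__(self, n_partitions: int = 4) -> None:
        self.partitions = [Partition() for _ in range(n_partitions)]

    def call(self, name: str, key: str, *args: Any) -> Any:
        proc = PROCEDURES[name]  # KeyError for a procedure that was not declared
        # Route by a stable hash of the key so a key always maps to one partition.
        index = zlib.crc32(key.encode()) % len(self.partitions)
        reply: queue.Queue = queue.Queue(maxsize=1)
        self.partitions[index].inbox.put((proc, (key, *args), reply))
        return reply.get()


store = MiniStore()

# 100 concurrent clients all deposit into the same account.
clients = [
    threading.Thread(target=store.call, args=("deposit", "acct-1", 1))
    for _ in range(100)
]
for t in clients:
    t.start()
for t in clients:
    t.join()

total = store.call("balance", "acct-1")
print(total)
```

Even though the clients run concurrently, each partition applies their requests one after another, so no update is lost and no lock is taken. Transactions that span several partitions would need extra coordination, which this sketch leaves out.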
H-Store can run 25 times as many transactions as traditional DBMSs.
OLTP workloads are mostly easy to partition.
5. Summary
New DBMSs should support new data models (key-value, documents) and high availability (use replicas for high-throughput data and ensure consistency).