2015-08-03

Notes on the MITx Big Data course: NoSQL and NewSQL

NoSQL and NewSQL

Prof. Samuel Madden

1. Traditional DBMS features:

1) Record oriented (row store);
2) Schema (carefully structured);
3) Use SQL;
4) Support transactions (ACID semantics).
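
As a rough illustration of points 2) to 4), the sketch below uses Python's built-in sqlite3 module as a stand-in for a traditional DBMS. The account table, the transfer function and the CHECK constraint are made up for this example and do not come from the lecture.

```python
import sqlite3

def make_bank():
    """Create an in-memory database with a fixed schema (hypothetical example table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move money between two rows as one transaction: both updates or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit is undone too.
        return False

def balances(conn):
    """Return {id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

A transfer that would overdraw the source account fails as a whole: the credit applied by the first UPDATE is rolled back together with the rejected debit, which is the atomicity part of ACID.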

2. For high-throughput, high-velocity data

1) The failure of a single node need not be a problem: it is OK to fail over to another node, so 100% consistency is not required;
2) Some data is not relational, e.g. JSON;
3) A strict data schema may be unnecessary given such diverse data sources.

Definition (data model): the way data is stored and represented in a database system.

3. NoSQL

NoSQL systems are non-relational. The two kinds covered here are key-value stores and document stores.
1) Key-value: supports only put (write) and get (read);
2) Document: like key-value, but the values are dictionaries or XML/JSON documents.
No ACID (only limited atomicity and consistency) and no schema.
Easy to understand, implement and program; easy to distribute across nodes.
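
A minimal sketch of the two models, assuming plain in-memory Python dictionaries as storage. The class names, the find method and the node_for helper are hypothetical and only meant to show how small the interface is and why keys are easy to spread over nodes.

```python
import zlib

class KeyValueStore:
    """Values are opaque to the store: the only operations are put and get."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

class DocumentStore(KeyValueStore):
    """Values are dict documents, so the store can look inside them."""
    def find(self, field, value):
        # No schema: documents may have different fields, so use .get().
        return sorted(k for k, doc in self._data.items()
                      if doc.get(field) == value)

def node_for(key, n_nodes):
    """Pick the node that owns a key by hashing it (deterministic)."""
    return zlib.crc32(key.encode("utf-8")) % n_nodes
```

Because every operation names exactly one key, a request can be routed to a single node with something like node_for, with no coordination between nodes.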

Eventual consistency plus a majority read/write protocol is used to tolerate replica failures. Note that the majority read protocol includes the 'most recent' rule: among the replicas read, the most recent version wins.
Without the majority read/write protocol, eventual consistency alone may give wrong answers.
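
The protocol can be sketched as follows, assuming a single client, in-process replicas and an integer version number standing in for 'most recent'. The Replica and QuorumStore classes are hypothetical; a real system would also have to deal with concurrent writers and with repairing stale replicas.

```python
class Replica:
    """One copy of the data; `up` simulates whether the node is reachable."""
    def __init__(self):
        self.up = True
        self.data = {}  # key -> (version, value)

class QuorumStore:
    def __init__(self, n=3):
        self.replicas = [Replica() for _ in range(n)]
        self.majority = n // 2 + 1

    def _quorum(self):
        """Return a majority of the replicas that are currently up."""
        live = [r for r in self.replicas if r.up]
        if len(live) < self.majority:
            raise RuntimeError("fewer than a majority of replicas are up")
        return live[:self.majority]

    def read(self, key):
        """Majority read: ask a majority, keep the answer with the highest version."""
        answers = [r.data.get(key, (0, None)) for r in self._quorum()]
        return max(answers, key=lambda a: a[0])

    def write(self, key, value):
        """Majority write: pick the next version and store it on a majority."""
        version = self.read(key)[0] + 1
        for r in self._quorum():
            r.data[key] = (version, value)
        return version
```

Any two majorities share at least one replica, so a majority read always includes a replica that saw the latest majority write, and the 'most recent' rule picks that answer. Reading a single replica has no such guarantee, since that replica may have been down during the last write.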

4. NewSQL: H-Store (by MIT)
Provides ACID and SQL, but with throughput as high as NoSQL.
Reduces the time spent on caching, logging/recovery and locking to boost speed.
Row store, in-memory, data partitioning.
H-Store executes transactions serially instead of concurrently, thus avoiding locking.
H-Store uses compact logging to avoid heavyweight recovery.

Partitioning: a single thread runs in each partition.
- Stored procedures can be expressed in SQL, but they are predeclared when the customized H-Store system is set up, rather than composed ad hoc by users.
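
A toy sketch of this execution model, not H-Store's actual code: each partition owns its data and one worker thread, and clients may only invoke procedures registered up front. The Partition and MiniStore classes, the two example procedures and the hash routing are all assumptions made for illustration, and the procedures are plain Python functions rather than SQL.

```python
import queue
import threading
import zlib

def deposit(data, key, amount):
    """Read-modify-write with no lock: safe because one thread owns `data`."""
    data[key] = data.get(key, 0) + amount
    return data[key]

def balance(data, key):
    return data.get(key, 0)

# Procedures are declared once, at setup time.
PROCEDURES = {"deposit": deposit, "balance": balance}

class Partition:
    """Owns a slice of the data and executes requests one at a time."""
    def __init__(self, procedures):
        self.data = {}
        self.procedures = procedures
        self.inbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            name, args, reply = self.inbox.get()
            try:
                reply.put(("ok", self.procedures[name](self.data, *args)))
            except Exception as exc:  # report failures to the caller
                reply.put(("error", exc))

class MiniStore:
    def __init__(self, procedures, n_partitions=2):
        self.partitions = [Partition(procedures) for _ in range(n_partitions)]

    def call(self, name, key, *args):
        """Route a procedure call to the partition that owns `key` and wait."""
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        reply = queue.Queue(maxsize=1)
        self.partitions[index].inbox.put((name, (key, *args), reply))
        status, result = reply.get()
        if status == "error":
            raise result
        return result
```

Many client threads can call store.call("deposit", "alice", 1) at once without losing updates, because all requests for a key are queued to one partition and run serially there. Calling a name that was not predeclared fails with a KeyError.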

H-Store can run 25 times as many transactions as traditional DBMSs.

OLTP workloads are mostly easy to partition.

5. Summary

New DBMSs should support new data models (key-value, documents) and high availability (using replicas for high-throughput data while ensuring consistency).
