Spanner: Google's Globally-Distributed Database
https://ai.google/research/pubs/pub39966
Lock-free distributed read-only transactions
External consistency of writes
Temporal multi-versioning
Schematized, semi-relational database with an associated structured query language
Applications control replication and placement
All of this is possible because of TrueTime.
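TrueTime is a globally synchronized clock with bounded non-zero error: it returns a time interval that is guaranteed to contain the actual time at some point during the call’s execution. More specifically, TrueTime exposes three calls: TT.now() returns an interval [earliest, latest]; TT.after(t) is true if t has definitely passed; and TT.before(t) is true if t has definitely not arrived yet.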
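To make the interval semantics concrete, here is a minimal Python sketch of a TrueTime-like clock. The method names mirror the TT.now(), TT.after(t), and TT.before(t) calls from the Spanner paper; the class name SimulatedTrueTime, the fixed error bound epsilon, and the injectable clock function are assumptions for illustration only. The real TrueTime derives its error bound from GPS and atomic clock references rather than a constant.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float  # lower bound on the actual time, in seconds
    latest: float    # upper bound on the actual time, in seconds


class SimulatedTrueTime:
    """Toy stand-in for TrueTime: a local clock widened by an assumed error bound."""

    def __init__(self, epsilon: float = 0.004, clock=time.time):
        self.epsilon = epsilon  # assumed uncertainty (hypothetical value)
        self._clock = clock     # injectable so the behaviour can be tested

    def now(self) -> TTInterval:
        t = self._clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t: float) -> bool:
        # True only if t is earlier than every time "now" could possibly be.
        return self.now().earliest > t

    def before(self, t: float) -> bool:
        # True only if t is later than every time "now" could possibly be.
        return self.now().latest < t


# Usage with a fixed fake clock reading of 100.0 seconds.
tt = SimulatedTrueTime(epsilon=0.5, clock=lambda: 100.0)
interval = tt.now()
```

Note that for a timestamp inside the interval, both after() and before() return False: the clock cannot say which side of "now" that timestamp falls on, and this is exactly the uncertainty Spanner has to wait out.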
The key point about Spanner is that it gets serializability from locks, but it gets external consistency (similar to linearizability) from TrueTime.
As with most ACID databases, Spanner uses two-phase commit (2PC) and strict two-phase locking to ensure isolation and strong consistency. 2PC has been called an “anti-availability” protocol because all members must be up for it to work. Spanner mitigates this by making each member a Paxos group, so each 2PC “member” stays highly available even if some of its Paxos participants are down. Data is divided into groups that form the basic unit of placement and replication.
Note: I'd like to point out that Google runs Spanner on its own private global network. Spanner is not running over the public Internet; in fact, every Spanner packet flows only over Google-controlled routers and links. Thus, it is infeasible to borrow the idea of TrueTime and implement something similar on your own. However, if you are unsatisfied with logical clocks and want to leverage physical clocks, there is something called the Hybrid Logical Clock (HLC), which provides a feasible alternative. As far as I know, HLC is implemented in Dropbox's new distributed file system and in CockroachDB, an open-source clone of Spanner.
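For readers curious about what an HLC looks like, below is a minimal Python sketch of the update rules from the original HLC paper (Kulkarni et al.). It is not taken from CockroachDB or any other production system; the class and method names (HybridLogicalClock, tick, update) are assumptions chosen for illustration. Each timestamp is a pair (l, c): l tracks the largest physical time seen so far, and c is a logical counter that breaks ties when l does not advance.

```python
import time


class HybridLogicalClock:
    """Illustrative HLC: timestamps are (l, c) tuples compared lexicographically."""

    def __init__(self, physical_clock=time.time_ns):
        self._pt = physical_clock  # injectable physical clock (assumption for testing)
        self.l = 0                 # largest physical time observed so far
        self.c = 0                 # logical counter used when l does not advance

    def tick(self):
        """Timestamp a local event or an outgoing message."""
        prev = self.l
        self.l = max(prev, self._pt())
        self.c = self.c + 1 if self.l == prev else 0
        return (self.l, self.c)

    def update(self, remote):
        """Merge the timestamp carried by a received message."""
        rl, rc = remote
        prev = self.l
        self.l = max(prev, rl, self._pt())
        if self.l == prev and self.l == rl:
            self.c = max(self.c, rc) + 1
        elif self.l == prev:
            self.c += 1
        elif self.l == rl:
            self.c = rc + 1
        else:
            self.c = 0
        return (self.l, self.c)


# Node b's physical clock (5) lags node a's (10), yet causality is preserved.
a = HybridLogicalClock(physical_clock=lambda: 10)
b = HybridLogicalClock(physical_clock=lambda: 5)
sent = a.tick()
received = b.update(sent)
later = b.tick()
```

Unlike TrueTime, an HLC captures causality between events that exchange messages, but it gives no bound on real-time uncertainty by itself, so it is an alternative with different guarantees rather than a drop-in replacement.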
TrueTime is used by Cloud Spanner to assign timestamps to transactions, and it allows Spanner to provide external consistency, which is the strictest consistency property for transaction-processing systems. It is stated as follows: for any two transactions T1 and T2, if T2 starts to commit after T1 finishes committing, then the timestamp for T2 is greater than the timestamp for T1.
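The Spanner paper enforces this ordering with a rule it calls commit wait: the coordinator picks a commit timestamp no smaller than the upper bound of the current TrueTime interval, then delays making the commit visible until that timestamp has definitely passed. The Python sketch below illustrates the idea under simplifying assumptions: tt_now is a hypothetical stand-in for TT.now() with a constant EPSILON, everything runs in a single process, and apply_writes is a placeholder callback. It is not how Spanner is actually implemented.

```python
import time

EPSILON = 0.002  # assumed clock uncertainty in seconds (illustrative value)


def tt_now():
    """Hypothetical stand-in for TT.now(): returns (earliest, latest)."""
    t = time.monotonic()
    return t - EPSILON, t + EPSILON


def commit(apply_writes):
    """Choose a commit timestamp, wait out the uncertainty, then expose the writes."""
    _, latest = tt_now()
    s = latest  # commit timestamp: no earlier than any possible current time

    # Commit wait: block until s is guaranteed to be in the past,
    # i.e. until the equivalent of TT.after(s) holds.
    while tt_now()[0] <= s:
        time.sleep(EPSILON / 4)

    apply_writes(s)  # only now does the transaction become visible
    return s


# T2 starts to commit after T1 has finished, so its timestamp must be larger.
log = []
t1 = commit(log.append)
t2 = commit(log.append)
```

The wait lasts roughly twice the uncertainty bound, which is why keeping the TrueTime error small matters so much for write latency.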