您的当前位置:首页正文

系统设计DDIA之Chapter 7 Transactions 之弱隔离级别之快照隔离

2024-11-08 来源:个人技术集锦

快照隔离和可重复读是用于管理数据库并发性的隔离级别,通过为每个事务提供一致的数据库视图来避免数据不一致的问题。以下是这两者的概述:

Snapshot Isolation and Repeatable Read are isolation levels that help manage concurrency in databases by allowing transactions to see a consistent view of the database. Here's a breakdown:

  1. Read Committed Issues: While the "read committed" isolation level prevents dirty reads and dirty writes, it does not prevent all concurrency issues. For example, a phenomenon called read skew can occur, where a transaction sees different data at different times, leading to inconsistent states.

  2. Snapshot Isolation:

    • What It Is: Snapshot isolation ensures that each transaction reads from a consistent snapshot of the database taken at the start of the transaction. Even if the data changes due to other transactions, a transaction under snapshot isolation will only see the data as it was when the transaction began.
    • Use Cases: It is particularly useful for long-running, read-only queries (e.g., backups, analytics) that need to see a stable version of the data. It avoids issues like read skew, where different parts of a query see different versions of data.
    • Implementation: Implemented using Multi-Version Concurrency Control (MVCC), which maintains multiple versions of each piece of data. When a transaction begins, it is given a unique transaction ID and can see only those versions of data that were committed before it started. It ignores any changes made by transactions that start after it.
  3. How MVCC Works:

    • Multiple Versions: The database keeps several committed versions of an object because different transactions may need to see the state of the database at different points in time.
    • Visibility Rules: At the start of a transaction, the database identifies which transactions are in progress. It makes visible only the data committed before the transaction started, ensuring a consistent snapshot.
    • No Locking for Reads: Readers do not block writers, and writers do not block readers, enhancing concurrency and performance.
  4. Index Handling in MVCC:

    • Index Points to All Versions: The index may point to all versions of an object, and queries filter out versions not visible to the current transaction.
    • Append-only Structures: Some databases use append-only structures like copy-on-write B-trees, which maintain consistent snapshots without modifying existing versions.
  5. Naming Confusion with Repeatable Read:

    • Many databases use different names for snapshot isolation. For example, PostgreSQL and MySQL call it "repeatable read," while Oracle calls it "serializable." The SQL standard does not define snapshot isolation, leading to confusion and inconsistencies between database implementations.

问题列表和答案

  1. 什么是读偏差(read skew),为什么在“读已提交”隔离级别下是可接受的?

    • 回答读偏差指的是一个事务在不同时间点读取同一数据时,由于其他并发事务的修改,看到不同的值,导致数据不一致。在“读已提交”隔离级别下,这种现象是可接受的,因为“读已提交”只保证一个事务只能读取已提交的数据,而不保证事务在整个执行过程中看到的数据是一个一致的快照。
  2. 为什么快照隔离(snapshot isolation)对长时间运行的只读查询有用?

    • 回答快照隔离对长时间运行的只读查询(例如备份或分析)有用,因为它提供了一个一致的数据库快照,确保查询过程中不会被其他事务的更改所影响。这样,查询能够在不需要锁定资源的情况下看到数据的稳定视图,避免因数据不断变化而导致的无意义结果,同时提高了性能和并发性。
  3. 数据库如何使用多版本并发控制(MVCC)实现快照隔离?

    • 回答:数据库通过**多版本并发控制(MVCC)**实现快照隔离,方法是为每个数据项维护多个版本。每个事务在开始时获得一个唯一的事务 ID,并在其开始时获取数据库的一个快照。事务只能看到在其开始之前已提交的数据版本,而忽略之后发生的更改,从而确保事务看到的数据是一致的。
  4. 在快照隔离下,一致快照的可见性规则是什么?

    • 回答
      • 一个事务可以看到由其开始之前已经提交的事务创建的数据。
      • 一个事务看不到由其开始之后的事务创建或修改的数据。
      • 如果数据被标记为删除,但删除操作是由一个在当前事务之后开始的事务发起的,则该数据对当前事务仍然可见。
  5. 在多版本数据库中,索引如何在快照隔离下工作?

    • 回答:在多版本数据库中,索引可能指向一个对象的所有版本,查询时会过滤掉当前事务不可见的版本。此外,一些数据库使用追加结构写时复制的技术,在更新时创建新的索引副本而不是修改原有索引,从而保持一致的快照并避免锁定冲突。
  6. 快照隔离和可重复读之间的命名混淆是什么,为什么会存在这种混淆?

    • 回答:由于SQL 标准并没有对快照隔离进行明确定义,这导致不同的数据库对同一隔离级别使用不同的名称。比如,PostgreSQL 和 MySQL 将快照隔离称为“可重复读”,而 Oracle 则称之为“可串行化”。这种命名混淆的存在是因为标准定义不明确,使得不同数据库的实现之间存在差异和不一致。
  7. What is read skew, and why is it acceptable under the "read committed" isolation level?

    • Answer: Read skew occurs when a transaction reads the same data at different times and sees different values due to modifications by other concurrent transactions, leading to inconsistent data. Under the "read committed" isolation level, this is acceptable because "read committed" only ensures that a transaction reads data that has been committed but does not guarantee a consistent snapshot of data throughout the transaction.
  8. Why is snapshot isolation useful for long-running, read-only queries?

    • Answer: Snapshot isolation is useful for long-running, read-only queries (such as backups or analytics) because it provides a consistent snapshot of the database, ensuring that the query is not affected by changes made by other transactions. This allows the query to see a stable view of the data without needing to lock resources, avoiding meaningless results due to data changes and improving performance and concurrency.
  9. How do databases implement snapshot isolation using Multi-Version Concurrency Control (MVCC)?

    • Answer: Databases implement snapshot isolation using Multi-Version Concurrency Control (MVCC) by maintaining multiple versions of each data item. Each transaction is assigned a unique transaction ID at its start and gets a snapshot of the database at that moment. The transaction only sees the versions of data that were committed before it started, ignoring any changes made afterward, ensuring a consistent view.
  10. What are the visibility rules for a consistent snapshot under snapshot isolation?

    • Answer:
      • A transaction can see data created by transactions that committed before it started.
      • A transaction cannot see data created or modified by transactions that started after it.
      • If data is marked for deletion by a transaction that started after the current transaction, the data is still visible to the current transaction.
  11. How do indexes work in a multi-version database under snapshot isolation?

    • Answer: In a multi-version database, indexes may point to all versions of an object, and queries filter out versions not visible to the current transaction. Some databases use append-only or copy-on-write structures, creating new index copies on updates rather than modifying the original index, thus maintaining a consistent snapshot and avoiding lock contention.
  12. What is the naming confusion between snapshot isolation and repeatable read, and why does it exist?

    • Answer: The naming confusion exists because the SQL standard does not explicitly define snapshot isolation, leading different databases to use different names for the same isolation level. For example, PostgreSQL and MySQL refer to snapshot isolation as "repeatable read," while Oracle calls it "serializable." This confusion arises from the ambiguous standard definition, resulting in differences and inconsistencies among database implementations.
Top