Notes on Data Base Operating Systems

Jim Gray
IBM Research Laboratory
San Jose, California 95193

Summer 1977
This paper plagiarizes the work of the large and anonymous army of people working in the field. Because of the state of the field, there are few references to the literature (much of the “literature” is in internal memoranda, private correspondence, program logic manuals and prologues to the source language listings of various systems.)
The section on data management largely reflects the ideas of Don Chamberlin, Ted Codd, Chris Date, Dieter Gawlick, Andy Heller, Frank King, Franco Putzolu, and Bob Taylor. The discussion of views is abstracted from a paper co-authored with Don Chamberlin and Irv Traiger.
The section on data communications stems from conversations with Denny Anderson, Homer Leonard, and Charlie Sanders.
The section on transaction scheduling derives from discussions with Bob Jackson and Thomas Work.
The ideas on distributed transaction management are an amalgam of discussions with Honor Leonard.
Bruce Lindsay motivated the discussion of exception handling.
The presentation of consistency and locking derives from discussions and papers co-authored with Kapali Eswaran, Raymond Lorie, Franco Putzolu and Irving Traiger. Also Ron Obermark (IMS program isolation), Phil Macri (DMS 1100), and Paul Roever clarified many locking issues for me.
The presentation of recovery is co-authored with Paul McJones and John Nauman. Dieter Gawlick made many valuable suggestions. It reflects the ideas of Mike Blasgen, Dar Busa, Ron Obermark, Earl Jenner, Tom Price, Franco Putzolu, Butler Lampson, Howard Sturgis and Steve Weick.
All members of the System R group (IBM Research, San Jose) have contributed materially to this paper.
I am indebted to Mike Blasgen and Dieter Gawlick, and especially to John Nauman, each of whom made many constructive suggestions about earlier drafts of these notes.
If you feel your ideas or work are inadequately plagiarized, please annotate this manuscript and return it to me.
At peak periods the system runs about thirty transactions per second with a response time of two seconds.
The DEBIT_CREDIT transaction is very "small". There is another class of transactions that behave rather differently. For example, once a month a transaction is run which produces a summary statement for each account. This transaction might be described by:

MONTHLY_STATEMENT:
    ANSWER ::= SELECT *
               FROM ACCOUNT, HISTORY
               WHERE ACCOUNT.ACCOUNT_NUMBER = HISTORY.ACCOUNT_NUMBER
                 AND HISTORY_DATE > LAST_REPORT
               GROUPED BY ACCOUNT.ACCOUNT_NUMBER,
               ASCENDING BY ACCOUNT.ACCOUNT_ADDRESS;

That is, collect all recent history records for each account and place them clustered with the account record into an answer file. The answers appear sorted by mailing address.
If each account has about fifteen transactions against it per month then this transaction will read 160,000,000 records and write a similar number of records. A naive implementation of this transaction will take 80 days to execute (50 milliseconds per disk seek implies two million seeks per day.) However, the system must run this transaction once a month and it must complete within a few hours.
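As a rough check on that estimate, the arithmetic can be made explicit (a sketch; the only inputs are the 50-millisecond seek time and the record count quoted above, and the naive assumption that every record touched costs one random seek):

    # Back-of-the-envelope check of the naive MONTHLY_STATEMENT cost.
    # Naive assumption: every record read costs one random disk seek.

    SEEK_TIME_SECONDS = 0.050          # 50 milliseconds per seek
    RECORDS_READ = 160_000_000         # account plus recent history records

    seeks_per_day = 86_400 / SEEK_TIME_SECONDS        # about 1.7 million seeks per day
    days_for_reads = RECORDS_READ / seeks_per_day     # about 93 days for the reads alone
    print(f"{seeks_per_day:,.0f} seeks/day; {days_for_reads:.0f} days just to read")

    # The writes roughly double this figure, so a record-at-a-time,
    # one-seek-per-record implementation cannot meet the requirement of
    # completing within a few hours; the records must be processed with
    # sequential (batched) access.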
There is a broad spread of transactions between these two types. Two particularly interesting types of transactions are conversational transactions that carry on a dialogue with the user and distributed transactions that access data or terminals at several nodes of a computer network.
Systems of 10,000 terminals or 100,000,000,000 bytes of on-line data or 150 transactions per second are generally considered to be the limit of present technology (software and hardware).
1.2. RELATIONSHIP TO OPERATING SYSTEM
If one tries to implement such an application on top of a general-purpose operating system it quickly becomes clear that many necessary functions are absent from the operating system. Historically, two approaches have been taken to this problem:
o Write a new, simpler and “vastly superior” operating system.
o Extend the basic operating system to have the desired function.
The first approach was very popular in the mid-sixties and is having a renaissance with the advent of minicomputers. The initial cost of a data management system is so low that almost any large customer can justify "rolling his own". The performance of such tailored systems is often ten times better than one based on a general-purpose system. One must trade this off against the problems of maintaining the system as it grows to meet new needs and applications. Groups that followed this path now find themselves maintaining a rather large operating system, which must be modified to support new devices (faster disks, tape archives,...) and new protocols (e. g. networks and displays.) Gradually, these systems have grown to include all the functions of a general-purpose operating system. Perhaps the most successful approach to this has been to implement a hypervisor that runs both the data management operating system and some standard operating system. The "standard" operating system runs when the data manager is idle. The hypervisor is simply an interrupt handler which dispatches one or another system.
The second approach of extending the basic operating system is plagued with a different set of difficulties. The principal problem is the performance penalty of a general-purpose operating system. Very few systems are designed to deal with very large files, or with networks of thousands of nodes. To take a specific example, consider the process structure of a general-purpose system: The allocation and deallocation of a process should be very fast (500 instructions for the pair is expensive) because we want to do it 100 times per second. The storage occupied by the process descriptor should also be small (less than 1000 bytes.) Lastly, preemptive scheduling of processes makes no sense since they are not CPU bound (they do a lot of I/O). A typical system uses 16,000 bytes to represent a process and requires 200,000 instructions to allocate and deallocate this structure (systems without protection do it cheaper.) Another problem is that the general-purpose systems have been designed for batch and time-sharing operation. They have not paid sufficient attention to issues such as continuous operation: keeping the system up for weeks at a time and gracefully degrading in case of some hardware or software error.
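To see why the quoted figures matter, the overhead can be tallied directly (a sketch using only the instruction counts and transaction rate above; the one-million-instructions-per-second processor is an assumption, not a figure from the text):

    # Process allocate/deallocate overhead at 100 process creations per second.

    TYPICAL_INSTRUCTIONS_PER_PAIR = 200_000    # typical general-purpose system
    EXPENSIVE_BUDGET_PER_PAIR = 500            # already called "expensive" above
    RATE_PER_SECOND = 100

    print(TYPICAL_INSTRUCTIONS_PER_PAIR * RATE_PER_SECOND)   # 20,000,000 instructions/second
    print(EXPENSIVE_BUDGET_PER_PAIR * RATE_PER_SECOND)       #     50,000 instructions/second

    # On a processor executing on the order of one million instructions per
    # second (an assumed figure), the typical system would need twenty such
    # processors just for process management, while even the 500-instruction
    # budget consumes only about five percent of one processor.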
1.3. GENERAL STRUCTURE OF DATA MANAGEMENT SYSTEMS
These notes try to discuss issues that are independent of which operating system strategy is adopted. No matter how the system is structured, there are certain problems it must solve. The general structure common to several data management systems is presented. Then two particular problems within the transaction management component are discussed in detail: concurrency control (locking) and system reliability (recovery).
Several experimental systems are under construction at present. Some of the more interesting are:
Astrahan et al., "System R: a Relational Approach to Database Management," ACM Transactions on Database Systems, Vol. 1, No. 2, June 1976.

Marill and Stern, "The Datacomputer-A Network Data Utility," Proc. 1975 National Computer Conference, AFIPS Press, 1975.

Stonebraker et al., "The Design and Implementation of INGRES," ACM Transactions on Database Systems, Vol. 1, No. 3, Sept. 1976.
There are very few publicly available case studies of data base usage. The following are interesting but may not be representative:
IBM Systems Journal, Vol. 16, No. 2, June 1977. (Describes the facilities and use of IMS and ACP).
"IMS/VS Primer," IBM World Trade Systems Center, Palo Alto, California, Form number S320-5767-1, January 1977.

"Share Guide IMS User Profile, A Summary of Message Processing Program Activity in Online IMS Systems," IBM Palo Alto-Raleigh Systems Center Bulletin, Form number G320-6005, January 1977.
Also there is one "standard" (actually "proposed standard") system:

CODASYL Data Base Task Group Report, April 1971. Available from the ACM.
The description of the system, the databases, the transactions, the telecommunications network, and of the users are all collected in the dictionary. This repository:
o Defines the attributes of objects such as databases and terminals.
o Cross-references these objects.
o Records natural language (e. g. German) descriptions of the meaning and use of objects.
When the system arrives, the dictionary contains only a few definitions of transactions (usually utilities), defines a few distinguished users (operator, data base administrator,...), and defines a few special terminals (master console). The system administrator proceeds to define new terminals, transactions, users, and databases. (The system administrator function includes data base administration (DBA) and data communications (network) administration (DCA).) Also, the system administrator may modify existing definitions to match the actual system or to reflect changes. This addition and modification process is treated as an editing operation.
For example, one defines a new user by entering the “define” transaction and selecting USER from the menu of definable types. This causes a form to be displayed, which has a field for each attribute of a user. The definer fills in this form and submits it to the dictionary. If the form is incorrectly filled out, it is redisplayed and the definer corrects it. Redefinition follows a similar pattern; the current form is displayed, edited and then submitted. (There is also a non-interactive interface to the dictionary for programs rather than people.)
All changes are validated by the dictionary for syntactic and semantic correctness. The ability to establish the correctness of a definition is similar to the ability of a compiler to establish the correctness of a program. That is, many semantic errors go undetected. These errors are a significant problem.
Aside from validating and storing definitions, the dictionary provides a query facility which answers questions such as: "Which transactions use record type A of file B?" or "What are the attributes of terminal 34261?"
The dictionary performs one further service, that of compiling the definitions into a "machine readable" form more directly usable by the other system components. For example, a terminal definition is converted from a variable length character string to a fixed format “descriptor” giving the terminal attributes in non-symbolic form.
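As an illustration of this compilation step, a minimal sketch follows (the descriptor layout, attribute names, and use of a packed binary encoding are assumptions made for the example, not the actual format of any particular system):

    import struct

    # Compile a symbolic terminal definition, as it might arrive from the
    # "define" form, into a fixed-format binary descriptor that other system
    # components can use without re-parsing text.  The layout is invented.

    TERMINAL_DESCRIPTOR = struct.Struct("8s H H B")   # name, line speed, screen size, type code
    TERMINAL_TYPE_CODES = {"DISPLAY": 1, "PRINTER": 2}

    def compile_terminal(definition):
        """Variable-length symbolic definition -> fixed-format descriptor."""
        return TERMINAL_DESCRIPTOR.pack(
            definition["NAME"].encode("ascii"),
            definition["LINE_SPEED"],
            definition["SCREEN_SIZE"],
            TERMINAL_TYPE_CODES[definition["TYPE"]],
        )

    descriptor = compile_terminal(
        {"NAME": "T34261", "LINE_SPEED": 9600, "SCREEN_SIZE": 1920, "TYPE": "DISPLAY"}
    )
    print(len(descriptor), "byte fixed-format descriptor")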
The dictionary is a database along with a set of transactions to manipulate this database. Some systems integrate the dictionary with the data management system so that the data definition and data manipulation interfaces are homogeneous. This has the virtue of sharing large bodies of code and of providing a uniform interface to the user. INGRES and System R are examples of such systems.
The data management component stores and retrieves sets of records. It implements the objects: network, set of records, cursor, record, field, and view.
3.1. RECORDS AND FIELDS
A record type is a sequence of field types, and a record instance is a corresponding sequence of field instances. Record types and instances are persistent objects. Record instances are the atomic units of insertion and retrieval. Fields are sub-objects of records and are the atomic units of update. Fields have the attributes of atoms (e. g. FIXED(31) or CHAR(*)) and field instances have atomic values (e. g. "3" or "BUTTERFLY"). Each record instance has a unique name called a record identifier (RID).
A field type constrains the type and values of instances of a field and defines the representation of such instances. The record type specifies what fields occur in instances of that record type.
A typical record might have ten fields and occupy 256 bytes although records often have hundreds of fields (e. g. a record giving statistics on a census tract has over 600 fields), and may be very large (several thousand bytes). A very simple record (nine fields and about eighty characters) might be described by:

DECLARE 1 PHONE_BOOK_RECORD,
          2 PERSON_NAME      CHAR(*),
          2 ADDRESS,
            3 STREET_NUMBER  CHAR(*),
            3 STREET_NAME    CHAR(*),
            3 CITY           CHAR(*),
            3 STATE          CHAR(*),
            3 ZIP_CODE       CHAR(5),
          2 PHONE_NUMBER,
            3 AREA_CODE      CHAR(3),
            3 PREFIX         CHAR(3),
            3 STATION        CHAR(4);
The operators on records include INSERT, DELETE, FETCH, and UPDATE. Records can be CONNECTED to and DISCONNECTED from membership in a set (see below). These operators actually apply to cursors, which in turn point to records.
The notions of record and field correspond very closely to the notions of record and element in COBOL or structure and field in PL/l. Records are variously called entities, segments, tuples, and rows by different subcultures. Most systems have similar notions of records although they may or may not support variable length fields, optional fields (nulls), or repeated fields.
3.2. SETS
A set is a collection of records. This collection is represented by and implemented as an "access path" that runs through the collection of records. Sets perform the functions of:
o Relating the records of the set.
o In some instances directing the physical clustering of records in physical storage.
A record instance may occur in many different sets but it may occur at most once in a particular set.
There are three set types of interest:
o Sequential set : the records in the set form a single sequence. The records in the set are ordered either by order of arrival (entry sequenced (ES)), by cursor position at insert (CS), or are ordered (ascending or descending) by some subset of field values (key sequenced (KS)). Sequential sets model indexed-sequential files (ISAM, VSAM).
o Partitioned set: The records in the set form a sequence of disjoint groups, each group being a sequential set. Cursor operators allow one to point at a particular group. Thereafter the sequential set operators are used to navigate within the group. The set is thus major ordered by hash and minor ordered (ES, CS or KS) within a group. Hashed files in which each group forms a hash bucket are modeled by partitioned sets.
o Parent-child set: The records of the set are organized into a two-level hierarchy. Each record instance is either a parent or a child (but not both). Each child has a unique parent and no children. Each parent has a (possibly null) list of children. Using parent-child sets one can build networks and hierarchies. Positional operators on parent-child sets include operators to locate parents, as well as operators to navigate on the sequential set of children of a parent. The CONNECT and DISCONNECT operators explicitly relate a child to a parent. One obtains implicit connect and disconnect by asserting that records inserted in one set should also be connected to another. (Similar rules apply for connect, delete and update.) Parent-child sets can be used to support hierarchical and network data models.
A partitioned set is a degenerate form of a parent-child set (the partitions have no parents), and a sequential set is a degenerate form of a partitioned set (there is only one partition.) In this discussion care has been taken to define the operators so that the operators on the simpler set types are a subset of those on the more general ones. This has the consequence that a program written against the simplest model will be able to run on any data, and it also allows for subset implementations on small computers.
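A minimal sketch of the three set types, and of how each is built from the simpler one, follows (class and method names are invented for illustration; a real implementation stores these as access paths on secondary storage, not as in-memory lists):

    # Sketch of the three set types.  A sequential set is a single ordered
    # sequence; a partitioned set is a sequence of sequential sets (groups);
    # a parent-child set attaches a sequential set of children to each parent.

    class SequentialSet:
        """Entry sequenced (ES) by default; key sequenced (KS) if a key is given."""
        def __init__(self, key=None):
            self.key = key
            self.records = []

        def insert(self, record):
            self.records.append(record)
            if self.key:                      # key sequenced
                self.records.sort(key=self.key)

    class PartitionedSet:
        """Major ordered by a partitioning (hash) function, minor ordered within a group."""
        def __init__(self, partition_fn, key=None):
            self.partition_fn = partition_fn
            self.key = key
            self.groups = {}                  # group id -> SequentialSet

        def insert(self, record):
            group = self.groups.setdefault(self.partition_fn(record), SequentialSet(self.key))
            group.insert(record)

    class ParentChildSet:
        """Each parent owns a sequential set of children; a child has exactly one parent."""
        def __init__(self, key=None):
            self.key = key
            self.children = {}                # parent id -> SequentialSet of children

        def connect(self, parent_id, child):
            self.children.setdefault(parent_id, SequentialSet(self.key)).insert(child)

    # Example: a hashed (partitioned) file of accounts, bucketed by account number.
    accounts_by_number = PartitionedSet(partition_fn=lambda r: r["ACCOUNT_NUMBER"] % 10)
    accounts_by_number.insert({"ACCOUNT_NUMBER": 18, "BALANCE": 100.0})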
Inserting a record in one set may trigger its connection to several other sets. If set "I" is an index for set "F" then an insert, delete, or update of a record in "F" may trigger a corresponding insert, delete, or update in set "I". In order to support this, the data manager must know:
o That insertion, update or deletion of a record causes its connection to, movement in, or disconnection from other sets.
o Where to insert the new record in the new set:
o For sequential sets, the ordering must be either key sequenced or entry sequenced.
o For partitioned sets, data manager must know the partitioning rule and know that the partitions are entry sequenced or key sequenced.
o For parent-child sets, the data manager must know that certain record types are parents and that others are children. Further, in the case of children, data manager must be able to deduce the parent of the child.
We will often use the term “file” as a synonym for set.
A cursor is opened to traverse a particular set. Positioning expressions have the syntax:
[Syntax diagram for cursor positioning expressions, listing the options RID, FIRST, N-th, LAST, and positions relative to the current cursor position (e.g. NEXT, PREVIOUS), each optionally qualified by a selection expression.]

where RID, FIRST, N-th, and LAST specify specific record occurrences while the other options specify the address relative to the current cursor position. It is also possible to set a cursor from another cursor.
The selection expression may be any Boolean expression valid for all record types in the set. The selection expression includes the relational operators: =, !=, >, <, <=, >=, and for character strings a "matches-prefix" operator sometimes called generic key. If next or previous is specified, the set must be searched sequentially because the current position is relevant. Otherwise, the search can employ hashing or indices to locate the record. The selection expression search may be performed via an index, which maps field values into RIDs.
Examples of commands are:
FETCH  (CURSOR1, NEXT NAME='SMITH') HOLD RETURNS (POP);
DELETE (CURSOR1, NEXT NAME='JOE' CHILD);
INSERT (CURSOR1, , NEWCHILD);
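How such a positional FETCH might be evaluated against a sequential set is sketched below (the function and field names are invented; the point is only that NEXT is relative to the current position and so forces a forward scan, whereas an absolute search could go through a hash or an index on the selected field):

    # Evaluate FETCH(cursor, NEXT <predicate>) against a sequential set held
    # in a list.  Position -1 stands for the "before first record" position.

    def fetch_next(records, position, predicate):
        """Return (new_position, record) for the first match after `position`,
        or (position, None) if no further record satisfies the predicate."""
        for i in range(position + 1, len(records)):
            if predicate(records[i]):
                return i, records[i]
        return position, None

    phone_book = [
        {"PERSON_NAME": "JONES", "CITY": "SAN JOSE"},
        {"PERSON_NAME": "SMITH", "CITY": "PALO ALTO"},
        {"PERSON_NAME": "SMITH", "CITY": "SAN JOSE"},
    ]
    pos, rec = fetch_next(phone_book, -1, lambda r: r["PERSON_NAME"] == "SMITH")
    print(pos, rec)   # 1, the first SMITH after the initial (before-first) position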
For partitioned sets one may point the cursor at a specific partition by qualifying these operators with the modifier GROUP. A cursor on a parent-child (or partitioned) set points to both a parent record and a child record (or group and child within group). Cursors on such sets have two components: the parent or group cursor and the child cursor. Moving the parent cursor positions the child cursor to the first record in the group or under the parent. For parent-child sets one qualifies the position operator with the modifier NEXT_PARENT in order to locate the first child of the next parent, or with the modifier WITHIN PARENT if the search is to be restricted to children of the current parent or group. Otherwise positional operators operate on children of the current parent.
There are rather obscure issues associated with cursor positioning.
The following is a good set of rules:
o A cursor can have the following positions:
o Null.
o Before the first record.
o At a record.
o Between two records.
o After the last record.
o If the cursor points at a null (empty) set, then it is null. If the cursor points to a non-null set then it is always non-null.
o Initially the cursor is before the first record unless the OPEN_CURSOR specifies a position.
o An INSERT operation leaves the cursor pointing at the new record.
o A DELETE operation leaves the cursor between the two adjacent records, or at the top if there is no previous record, or at the bottom if there is a previous but no successor record.
o An UPDATE operation leaves the cursor pointing at the updated record.
o If an operation fails the cursor is not altered.
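The rules above are concrete enough to write down directly; the following is a sketch of them for a sequential set held in memory (the sentinel values and the representation of "between two records" as an integer gap index are conveniences invented for the sketch):

    # Sketch of the cursor-positioning rules for a sequential set in a list.
    # Position is either a sentinel or an integer index into the list.

    BEFORE_FIRST, AFTER_LAST, NULL = "before_first", "after_last", "null"

    class Cursor:
        def __init__(self, records):
            self.records = records
            # A cursor on an empty set is null; otherwise it starts before the first record.
            self.position = NULL if not records else BEFORE_FIRST

        def insert(self, record):
            if self.position in (BEFORE_FIRST, NULL):
                at = 0
            elif self.position == AFTER_LAST:
                at = len(self.records)
            else:
                at = self.position + 1
            self.records.insert(at, record)
            self.position = at                    # rule: cursor points at the new record

        def delete(self):                         # assumes the cursor is at a record
            del self.records[self.position]
            if not self.records:                  # set became empty
                self.position = NULL
            elif self.position == 0:              # no previous record: "at the top"
                self.position = BEFORE_FIRST
            elif self.position >= len(self.records):   # no successor: "at the bottom"
                self.position = AFTER_LAST
            # otherwise position now names the gap between the two adjacent records

        def update(self, record):
            self.records[self.position] = record  # rule: cursor still points at the record

    c = Cursor(["a", "b"])
    c.insert("c")        # cursor now points at "c", inserted before "a"
    c.delete()           # "c" removed; cursor lands at the top (before "a")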
3.4. VARIOUS DATA MODELS
Data models differ in their notion of set.
3.4.1. RELATIONAL DATA MODEL
The relational model restricts itself to homogeneous (only one record type) sequential sets. The virtue of this approach is its simplicity and the ability to define operators that "distribute" over the set, applying uniformly to each record of the set. Since much of data processing involves repetitive operations on large volumes of data, this distributive property provides a concise language to express such algorithms. There is a strong analogy here with APL, which uses the simple data structure of the array and therefore is able to define powerful operators that work for all arrays. APL programs are very short and much of the control structure of the program is hidden inside the operators.
To give an example of this, a “relational” program to find all overdue accounts in an invoice file might be:
SELECT ACCOUNT_NUMBER FROM INVOICE WHERE DUE_DATE<TODAY;
This should be compared to a PL/l program with a loop to get next record, and test for DUE_DATE and END_OF_FILE. The MONTHLY_STATEMENT transaction described in the introduction is another instance of the power and usefulness of relational operators.
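For comparison, here is the record-at-a-time version of that query, written as a sketch in a present-day language rather than PL/1 (file and field names follow the example above; the loop, end-of-file handling, and answer bookkeeping that the relational statement hides are all explicit):

    # Record-at-a-time equivalent of:
    #   SELECT ACCOUNT_NUMBER FROM INVOICE WHERE DUE_DATE < TODAY;

    import datetime

    def overdue_accounts(invoice_file, today):
        answer = []
        for record in invoice_file:          # "get next record" until end of file
            if record["DUE_DATE"] < today:   # the DUE_DATE test
                answer.append(record["ACCOUNT_NUMBER"])
        return answer

    today = datetime.date(1977, 6, 1)
    invoices = [
        {"ACCOUNT_NUMBER": 1001, "DUE_DATE": datetime.date(1977, 5, 15)},
        {"ACCOUNT_NUMBER": 1002, "DUE_DATE": datetime.date(1977, 7, 1)},
    ]
    print(overdue_accounts(invoices, today))   # [1001]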
On the other hand, if the work to be done does not involve processing many records, then the set-oriented operators of the relational model offer less of an advantage.
Which implicitly sets up cursors one and two.
The implicit record naming of the hierarchical model makes programming much simpler than for a general network. If the data can be structured as a hierarchy in some application then it is desirable to use this model to address it.
3.4.3. NETWORK DATA MODEL
Not all problems conveniently fit a hierarchical model. If nothing else, different users may want to see the same information in a different hierarchy. For example an application might want to see the hierarchy "upside-down", with invoice at the top and location at the bottom. Support for logical hierarchies (views) requires that the data management system support a general network. The efficient implementation of certain relational operators (sort-merge or join) also requires parent-child sets and so requires the full capability of the network data model.
The general statement is that if all relationships are nested one-to-many mappings then the data can be expressed as a hierarchy. If there are many-to-many mappings then a network is required. To consider a specific example of the need for networks, imagine that several locations may service the same account and that each location services several accounts. Then the hierarchy introduced in the previous section would require either that locations be subsidiary to accounts and be duplicated, or that the accounts record be duplicated in the hierarchy under the two locations. This will give rise to complexities about the account having two balances. A network model would allow one to construct the structure:
[Diagram: two LOCATION records, each connected by parent-child links to ACCOUNT records (the links cross, since an account may be serviced by several locations), and each ACCOUNT record connected to its INVOICE records. A network built out of two parent-child sets.]
By using “symbolic” pointers (keys), one may map any network data structure into a relational structure. In that sense all three models are equivalent, and the relational model is completely general. However, there are substantial differences in the style and convenience of the different models. Analysis of specific cases usually indicates that associative pointers (keys) cost three page faults to follow (for a multi-megabyte set) whereas following a direct pointer costs only one page fault. This performance difference explains why the equivalence of the three data models is irrelevant. If there is heavy traffic between sets then pointers must be used. (High-level languages can hide the use of these pointers.)
It is my bias that one should resort to the more elaborate model only when the simpler model leads to excessive complexity or to poor performance.
3.5. VIEWS
Records, sets, and networks that are actually stored are called base objects. Any query evaluates to a virtual set of records which may be displayed on the user's screen, fed to a further query, deleted from an existing set, inserted into an existing set, or copied to form a new base set. More importantly for this discussion, the query definition may be stored as a named view. The principal difference between a copy and a view is that updates to the original sets that produced the virtual set will be reflected in a view but will not affect a copy. A view is a dynamic picture of a query whereas a copy is a static picture.
There is a need for both views and copies. Someone wanting to record the monthly sales volume of each department might run the following transaction at the end of each month (an arbitrary syntax):
MONTHLY_VOLUME= SELECT DEPARTMENT, SUM(VOLUME) FROM SALES GROUPED BY DEPARTMENT;
The new base set MONTHLY_VOLUME is defined to hold the answer. On the other hand, the current volume can be gotten by the view:
DEFINE CURRENT_VOLUME (DEPARTMENT, VOLUME) VIEW AS: SELECT DEPARTMENT, SUM(VOLUME) FROM SALES GROUPED BY DEPARTMENT;
Thereafter, any updates to the SALES set will be reflected in the CURRENT_VOLUME view. Again, CURRENT_VOLUME may be used in the same ways base sets can be used. For example one can compute the difference between the current and monthly volume.
The semantics of views are quite simple. Views can be supported by a process of substitution in the abstract syntax (parse tree) of the statement. Each time a view is mentioned, it is replaced by its definition.
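A minimal sketch of that substitution, treating the parse tree as a nested tuple (the tree shape, node tags, and the encoding of the CURRENT_VOLUME definition are simplifications of the example above):

    # Substitute view definitions into a query's parse tree.  A node is
    # ("SET", name) for a base set or view reference, or an operator tuple.

    VIEWS = {
        # CURRENT_VOLUME is defined as an aggregation over the SALES base set.
        "CURRENT_VOLUME": ("GROUP", ["DEPARTMENT"], ("SET", "SALES"), "SUM(VOLUME)"),
    }

    def expand_views(node):
        """Each time a view is mentioned, replace it by its definition."""
        if isinstance(node, tuple) and node[0] == "SET" and node[1] in VIEWS:
            return expand_views(VIEWS[node[1]])        # views may reference other views
        if isinstance(node, tuple):
            return tuple(expand_views(child) for child in node)
        return node

    query = ("SELECT", ["DEPARTMENT", "VOLUME"], ("SET", "CURRENT_VOLUME"), "VOLUME > 1000")
    print(expand_views(query))    # the view reference is replaced by the GROUP subtree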
To summarize, any query evaluates to a virtual set. Naming this virtual set makes it a view. Thereafter, this view can be used as a set. This allows views to be defined as field and record subsets of sets, statistical summaries of sets and more complex combinations of sets.
3.5.1 Views and Update
Any view can support read operations; however, since only base sets are actually stored, only base sets can actually be updated. To make an update via a view, it must be possible to propagate the updates down to the underlying base set.
If the view is very simple (e. g., record subset) then this propagation is straightforward. If the view is a one-to-one mapping of records in some base set but some fields of the base are missing from the view, then update and delete present no problem but insert requires that the unspecified ("invisible”) fields of the new records in the base set be filled in with the “undefined” value. This may or may not be allowed by the integrity constraints on the base set.
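A sketch of this straightforward case, a one-to-one field-subset view over a single base set (the field names and the undefined marker are illustrative only):

    # Propagate updates through a simple view: a field subset of one base set.
    # Update and delete map one-for-one to the base record; an insert must fill
    # the fields missing from the view with an "undefined" value, which the base
    # set's integrity constraints may or may not permit.

    UNDEFINED = None

    BASE_FIELDS = ["ACCOUNT_NUMBER", "CUSTOMER_NUMBER", "ACCOUNT_BALANCE"]
    VIEW_FIELDS = ["ACCOUNT_NUMBER", "ACCOUNT_BALANCE"]     # CUSTOMER_NUMBER is invisible

    def insert_via_view(base_set, view_record):
        base_record = {f: view_record.get(f, UNDEFINED) for f in BASE_FIELDS}
        if base_record["CUSTOMER_NUMBER"] is UNDEFINED:
            # The base set's integrity constraints decide whether this is legal;
            # this sketch simply flags the undefined field.
            raise ValueError("insert via view leaves CUSTOMER_NUMBER undefined")
        base_set.append(base_record)

    def update_via_view(base_record, view_record):
        for field in VIEW_FIELDS:                 # only visible fields may change
            base_record[field] = view_record[field]

    base = [{"ACCOUNT_NUMBER": 18, "CUSTOMER_NUMBER": 7, "ACCOUNT_BALANCE": 100.0}]
    update_via_view(base[0], {"ACCOUNT_NUMBER": 18, "ACCOUNT_BALANCE": 125.0})
    print(base[0])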
Beyond these very simple rules, propagation of updates from views to base sets becomes complicated, dangerous, and sometimes impossible.
To give an example of the problems, consider the WHOLE_THING view mentioned above. Deletion of a record may be implemented by a deletion from one or both of the constituent sets (PHONE_BOOK and ACCOUNTS). The correct deletion rule is dependent on the semantics of the data. Similar comments apply to insert and update.
My colleagues and I have resigned ourselves to the idea that there is no elegant solution to the view update problem. (Materialization (reading) is not a problem!) Existing systems use either very restrictive view mechanisms (subset only), or they provide incredibly ad hoc view update facilities. We propose that simple views (subsets) be done automatically and that a technique akin to that used for abstract data types be used for complex views: the view definer will specify the semantics of the operators NEXT, FETCH, INSERT, DELETE, and UPDATE.
3.6. STRUCTURE OF DATA MANAGER
Data manager is large enough to be subdivided into several components:
o View component : is responsible for interpreting the request and calling the other components to do the actual work. The view component implements cursors and uses them to communicate between the external and internal representations of the view.
o Record component : stores logical records on “pages”, manages the contents of pages and the problems of variable length and overflow records.
o Index component : implements sequential and associative access to sets. If only associative access is required, hashing should be used. If both sequential and associative accesses are required then indices implemented as B-trees should be used (see Knuth Vol. 3 or IBM's Virtual Storage Access Method.)
o Buffer manager : maps the data “pages” on secondary storage to a primary storage buffer pool. If the operating system provided a really fancy page manager (virtual memory) then the buffer manager might not be needed. But, issues such as double buffering of sequential I/O, Write Ahead Log protocol (see recovery section), checkpoint, and locking seem to argue against using the page managers of existing systems. If you are looking for a hard problem, here is one: define an interface to page management that is useable by data management in lieu of buffer management.
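A minimal sketch of the buffer manager's page mapping (a fix/unfix interface is assumed here as one common convention; write-ahead logging, checkpoints, and lock coordination, which the paragraph above argues are the hard parts, are deliberately omitted):

    # Buffer manager sketch: map page numbers on secondary storage into a
    # fixed pool of primary-storage frames, evicting an unfixed page when the
    # pool is full.  Logging, checkpoints, and dirty-page tracking are omitted.

    class BufferManager:
        def __init__(self, disk, pool_size):
            self.disk = disk                  # page_number -> page contents
            self.pool_size = pool_size
            self.frames = {}                  # page_number -> contents in the pool
            self.fixed = set()                # pages currently fixed (pinned)

        def fix(self, page_number):
            """Bring a page into the pool and pin it there; return its contents."""
            if page_number not in self.frames:
                if len(self.frames) >= self.pool_size:
                    # pick any unfixed page to evict (assumes one exists)
                    victim = next(p for p in self.frames if p not in self.fixed)
                    self.disk[victim] = self.frames.pop(victim)   # write back, then evict
                self.frames[page_number] = self.disk[page_number]
            self.fixed.add(page_number)
            return self.frames[page_number]

        def unfix(self, page_number):
            """Unpin a page; it becomes a candidate for eviction."""
            self.fixed.discard(page_number)

    bm = BufferManager(disk={1: "page one", 2: "page two", 3: "page three"}, pool_size=2)
    page = bm.fix(1)
    bm.unfix(1)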
3.7. A SAMPLE DATA BASE DESIGN
The introduction described a very simple database and a simple transaction that uses it. We discuss how that database could be structured and how the transaction would access it.
The database consists of the records:

ACCOUNT        (ACCOUNT_NUMBER, CUSTOMER_NUMBER, ACCOUNT_BALANCE, HISTORY)
CUSTOMER       (CUSTOMER_NUMBER, CUSTOMER_NAME, ADDRESS, .....)
HISTORY        (TIME, TELLER, CODE, ACCOUNT_NUMBER, CHANGE, PREV_HISTORY)
CASH_DRAWER    (TELLER_NUMBER, BALANCE)
BRANCH_BALANCE (BRANCH, BALANCE)
TELLER         (TELLER_NUMBER, TELLER_NAME, ......)
This is a very cryptic description that says that a customer record has fields giving the customer number, customer name, address and other attributes.
The CASH_DRAWER, BRANCH_BALANCE and TELLER files (sets) are rather small (less than 100,000 bytes). The ACCOUNT and CUSTOMER files are large (about 1,000,000,000 bytes). The history file is extremely large: if there are fifteen transactions against each account per month and if each history record is fifty bytes, then the history file grows 7,500,000,000 bytes per month. Traffic on BRANCH_BALANCE and CASH_DRAWER is high and access is by BRANCH_NUMBER and TELLER_NUMBER respectively. Therefore these two sets are kept in high-speed storage and are accessed via a hash on these attributes. Traffic on the ACCOUNT file is high but random. Most accesses are via ACCOUNT_NUMBER but some are via CUSTOMER_NUMBER. Therefore, the file is hashed on ACCOUNT_NUMBER (partitioned set). A key-sequenced index, NAMES, is maintained on these records that gives a sequential and associative access path to the records ascending by customer name. CUSTOMER is treated similarly (having a hash on customer number and an index on customer name.) The TELLER file is organized as a sequential set.

The HISTORY file is the most interesting. These records are written once and thereafter are only read. Almost every transaction generates such a record and for legal reasons the file must be maintained forever. This causes it to be kept as an entry-sequenced set. New records are inserted at the end of the set. To allow all recent history records for a specific account to be quickly located, a parent-child set is defined to link each ACCOUNT record (parent) to its HISTORY records (children). Each ACCOUNT record points to its most recent HISTORY record. Each HISTORY record points to the previous history record for that ACCOUNT.
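A sketch of that HISTORY organization (record layouts are simplified, the RID of a history record is modeled as its position in the entry-sequenced file, and an in-memory dictionary stands in for the hashed ACCOUNT file):

    # HISTORY is entry sequenced: new records go at the end of the file.
    # Each ACCOUNT record points at its most recent HISTORY record, and each
    # HISTORY record points at the previous one for that account, so all
    # recent history for an account can be found by following the chain.

    history_file = []                                  # entry-sequenced set; index = RID
    accounts = {18: {"ACCOUNT_BALANCE": 100.00, "HISTORY": None}}

    def add_history(account_number, teller, code, change, time):
        account = accounts[account_number]
        rid = len(history_file)                        # insert at end of the set
        history_file.append({
            "TIME": time, "TELLER": teller, "CODE": code,
            "ACCOUNT_NUMBER": account_number, "CHANGE": change,
            "PREV_HISTORY": account["HISTORY"],        # chain to previous record
        })
        account["HISTORY"] = rid                       # account points at most recent
        account["ACCOUNT_BALANCE"] += change

    def recent_history(account_number):
        rid = accounts[account_number]["HISTORY"]
        while rid is not None:
            record = history_file[rid]
            yield record
            rid = record["PREV_HISTORY"]

    add_history(18, teller=7, code="DEBIT", change=-25.00, time="1977-06-01T10:00")
    add_history(18, teller=7, code="CREDIT", change=+50.00, time="1977-06-02T11:30")
    print([r["CHANGE"] for r in recent_history(18)])   # [50.0, -25.0], newest first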
Given this structure, we can discuss the execution of the DEBIT_CREDIT transaction outlined in the introduction. We will assume that the locking is done at the granularity of a page and that recovery is achieved by keeping a log (see section on transaction management.)