Read/Write Consistency




 

Read/write consistency means that data written by a process becomes "visible" to readers as soon as the write() returns. It is explicitly guaranteed by IEEE 1003.1-1990. Figure 3.1 illustrates the concept for two processes, Process A and Process B, which run on the same node, access local data, and use Unix IO rather than stdio. In this section, the term "Unix IO" refers to file access that does not use stdio. When Process A or Process B executes a write(), the data goes to a cache for the device. When either process executes a read(), the data is read from that cache. Read/write consistency is maintained because both processes read directly from the same cache: when the write() of either process returns, the data is immediately visible to any other process on the system using Unix IO. There is a demonstration of this at the end of this section (see fig. 3.4). If caching is not used, data goes directly to the device.
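
The local Unix IO case can be sketched in C as follows. This is only an illustration, not the report's demonstration program: the function name write_then_read_unixio and the use of two descriptors inside a single process (standing in for Process A and Process B) are assumptions made to keep the example short. The writer's descriptor is deliberately left open while the reader opens the file, so the data is seen after write() returns rather than after close().

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes msg through one descriptor, then reads the
 * file back through a second, independently opened descriptor (a separate
 * open file description, as Process B would have).
 * Returns the number of bytes read into buf, or -1 on error. */
ssize_t write_then_read_unixio(const char *path, const char *msg,
                               char *buf, size_t buflen)
{
    assert(path != NULL && msg != NULL && buf != NULL);

    /* "Process A": open and write using Unix IO. */
    int wfd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (wfd < 0)
        return -1;

    size_t len = strlen(msg);
    if (write(wfd, msg, len) != (ssize_t)len) {
        close(wfd);
        return -1;
    }

    /* "Process B": open the same file after the write() has returned.
     * The file offset of a fresh descriptor starts at the beginning. */
    int rfd = open(path, O_RDONLY);
    if (rfd < 0) {
        close(wfd);
        return -1;
    }
    ssize_t n = read(rfd, buf, buflen);

    close(rfd);
    close(wfd);
    return n;
}
```

On a local file system, the read is expected to return exactly the bytes that were just written.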

  
Figure 3.1: Local case for read() and write().

There are several situations which involve the concept of read/write consistency when accessing local files. These are:

  1. Single Process.

    A process opens a file, writes some data at the beginning of the file, uses lseek() to position the file pointer to the beginning of the file, and reads. This case is read/write consistent. The process reads what it just wrote.

  2. Processes sharing an open file description.

    A process opens a file, writes some data at the beginning of the file, and forks a child, which uses lseek() to position the file pointer to the beginning of the file and reads. This case is read/write consistent. The child process reads what the parent process just wrote.

  3. Processes not sharing an open file description (see Process A and Process B in fig. 3.1).

    Process A opens a file and writes some data at the beginning of the file. Some time after the write of Process A returns, Process B opens the file, positioning the file pointer to the beginning of the file, and reads. This case is read/write consistent. Process B reads what Process A just wrote.

  4. Stdio.

    Figure 3.1 shows two processes, Process C and Process D, which use stdio to buffer data when accessing local files. To improve performance, data written using stdio usually goes to a buffer associated with the process, and it is not visible to any other process until the buffer is flushed. When an fwrite() returns, the data may have been written to disk or may still be in the buffer. An initial fread() reads data from the disk into a buffer associated with the process, and subsequent fread()s read from that buffer. Once all the data in the buffer has been read, the buffer is refilled from disk. Data in the buffer can therefore be inconsistent with data on disk, so read/write consistency is not guaranteed when using stdio. There is a demonstration of this case at the end of this section (see fig. 3.5).
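
The stdio case above can be sketched as follows. This is a minimal illustration under stated assumptions, not the report's WriteStdIO program: the names visible_size and stdio_visibility are hypothetical, stat() stands in for what another process would observe, and the stream is set to full buffering explicitly so that a short message stays in the process's buffer on typical implementations.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: size of the file as any other process would see it,
 * or -1 if stat() fails. */
static long visible_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1L;
}

/* Hypothetical helper: writes msg with fwrite() and records the externally
 * visible file size before and after fflush().
 * Returns 0 on success, -1 on error. */
int stdio_visibility(const char *path, const char *msg,
                     long *before_flush, long *after_flush)
{
    assert(path != NULL && msg != NULL);
    assert(before_flush != NULL && after_flush != NULL);

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* Assumption: request full buffering so that a message shorter than
     * BUFSIZ remains in the stdio buffer until it is flushed. */
    if (setvbuf(fp, NULL, _IOFBF, BUFSIZ) != 0) {
        fclose(fp);
        return -1;
    }

    size_t len = strlen(msg);
    if (fwrite(msg, 1, len, fp) != len) {
        fclose(fp);
        return -1;
    }
    *before_flush = visible_size(path);   /* fwrite() has returned */

    if (fflush(fp) != 0) {
        fclose(fp);
        return -1;
    }
    *after_flush = visible_size(path);    /* buffer handed to write() */

    return fclose(fp) == 0 ? 0 : -1;
}
```

With a short message, the size observed before the flush is typically zero even though fwrite() has already returned; after fflush() the full length is visible.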

  
Figure 3.2: read() and write() for two processes on the same client.

Read/write consistency is also an issue in a network environment. In order to improve performance, most client implementations use some caching mechanism when accessing remote files. Processes on the same node are usually read/write consistent because they read from and write to the same cache (see fig. 3.2). However, processes on different nodes using client caching may not be read/write consistent because the processes read from and write to different caches.

  
Figure 3.3: read() and write() for two processes on different clients using client caching.

The lack of read/write consistency when using stdio on the same node is analogous to the lack of read/write consistency when processes are running on different nodes. Both stdio and client caching are used to improve performance. With stdio, each process has its own set of read and write buffers, so read/write consistency is usually maintained only at the level of a single process. With client caching in a network environment, read/write consistency may be maintained only for the set of processes on a single client. Processes on different clients may not be read/write consistent when client caching is used.

One way to ensure read/write consistency among processes on different nodes is to forgo the use of client caching. This means that when Unix IO is used, all writes are sent to the server before the write() returns, and all reads are obtained directly from the server before the read() returns. This usually entails a performance penalty. There are techniques which permit the use of client caching and still maintain read/write consistency. An appendix contains references for some of these techniques.

Figure 3.3 depicts two processes, Process A and Process B, which are running on different clients and use client caching. Assume that both processes are accessing the same file on the server. An initial read() by a process reads data from the disk on the server into the cache of that process's client. Subsequent read()s by the process read data from the client cache. If Process A executes a write(), the data may not be written immediately to the server: all writes on the client are cached and are written to the server at some later time. Therefore, it is possible for data in a client cache to be inconsistent with the server's data. In order for Process B to see the data that Process A just wrote, Client A would have to write the data in its cache to the server, and Client B would have to fill its cache with new data from the server. Processes on different nodes may not be read/write consistent when using client caching because the data may not be "visible" to all processes after a write() returns.

Demonstrations were developed using an NFS implementation to illustrate read/write consistency. The following demonstrations use the programs WriteUnixIO and WriteStdIO. Source code for these programs is located in an appendix. WriteUnixIO is a program which reads input from the terminal and writes output to outfile. The file name outfile, which in the demonstrations refers to either a local file or a remote file mounted under the directory mnt, is a parameter to WriteUnixIO. Data is read and written a byte at a time using Unix IO. WriteStdIO is similar to WriteUnixIO except that the program uses stdio to buffer data for writing.

Periodically, the command "cat outfile; echo " "; ls -l outfile" is run to display the contents and the size of outfile. This command shows whether or not data is "visible" immediately after a write() returns. For all the demonstrations, Process A is either the program WriteUnixIO or WriteStdIO, and Process B is a process which displays the contents and size of outfile. Figure 3.4 and figure 3.5 illustrate how the demonstration proceeds. Commands are displayed in italics and the output of those commands is displayed in bold.
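
The byte-at-a-time copy that WriteUnixIO is described as performing might look roughly like the sketch below. This is not the report's source code, which is in the appendix and is not reproduced here; the function name copy_bytes_unixio, its signature, and the file permissions are assumptions chosen for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch: copies everything readable from in_fd to outfile,
 * one byte per read()/write() pair, using Unix IO only.
 * Returns the number of bytes copied, or -1 on error. */
long copy_bytes_unixio(int in_fd, const char *outfile)
{
    assert(outfile != NULL);

    int out_fd = open(outfile, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out_fd < 0)
        return -1;

    long total = 0;
    char c;
    ssize_t n;

    /* Each byte becomes visible to other local processes as soon as the
     * corresponding write() returns; there is no user-space buffer. */
    while ((n = read(in_fd, &c, 1)) == 1) {
        if (write(out_fd, &c, 1) != 1) {
            close(out_fd);
            return -1;
        }
        total++;
    }

    close(out_fd);
    return n < 0 ? -1 : total;
}
```

A caller could pass STDIN_FILENO as in_fd and the outfile argument from the command line, then run the cat and ls -l command from a second terminal while input is being typed.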

Processes on different nodes that do not use client caching are read/write consistent, but processes on different nodes that do use client caching may not be. Applications that need to guarantee read/write consistency should use record locking. In some implementations where client caching is used, record locking is the preferred method of guaranteeing read/write consistency. Because providing read/write consistency all of the time can degrade performance, some implementations guarantee it only among processes that use record locking for simultaneous file access.
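
Record locking with Unix IO can be sketched with fcntl() advisory locks, as below. The helper names lock_range, locked_write, and locked_read are hypothetical. Whether acquiring a lock also makes a given client flush or refresh its cache is implementation-specific, so this shows only the locking pattern, not a guarantee for any particular NFS client.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sets a lock of the given type (F_RDLCK, F_WRLCK or
 * F_UNLCK) on len bytes starting at start, blocking until it is granted. */
static int lock_range(int fd, short type, off_t start, off_t len)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_SETLKW, &fl);
}

/* Writes len bytes at offset while holding a write lock on that range.
 * Returns 0 on success, -1 on error. */
int locked_write(int fd, off_t offset, const void *data, size_t len)
{
    assert(data != NULL);
    if (lock_range(fd, F_WRLCK, offset, (off_t)len) < 0)
        return -1;

    int rc = -1;
    if (lseek(fd, offset, SEEK_SET) != (off_t)-1 &&
        write(fd, data, len) == (ssize_t)len)
        rc = 0;

    if (lock_range(fd, F_UNLCK, offset, (off_t)len) < 0)
        rc = -1;
    return rc;
}

/* Reads up to len bytes at offset while holding a read lock on that range.
 * Returns the number of bytes read, or -1 on error. */
ssize_t locked_read(int fd, off_t offset, void *buf, size_t len)
{
    assert(buf != NULL);
    if (lock_range(fd, F_RDLCK, offset, (off_t)len) < 0)
        return -1;

    ssize_t n = -1;
    if (lseek(fd, offset, SEEK_SET) != (off_t)-1)
        n = read(fd, buf, len);

    if (lock_range(fd, F_UNLCK, offset, (off_t)len) < 0)
        return -1;
    return n;
}
```

The descriptor must be open for writing to take a write lock and for reading to take a read lock. Both the writer and every reader have to use the locked calls for the locks to have any effect, since they are advisory.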

  
Table 3.1: Read/write consistency summary

Table 3.1 summarizes the results of the read/write consistency demonstrations. The cases listed as not read/write consistent may be consistent some of the time, but consistency is not guaranteed all of the time. For example, the demonstration which concerns processes on different nodes using NFS without client caching is read/write consistent even though Process B uses client caching. This is because Process B reads the entire file at once. Processes on different clients using client caching, or processes using stdio, may be read/write consistent some of the time, but not all of the time. The only way to guarantee read/write consistency for all cases of processes simultaneously accessing files, and for all implementations, is to use record locking and forgo the use of stdio.






Karen Olsen
Mon Aug 21 10:18:32 EDT 1995