CS 3733 Operating Systems: Producer-Consumer Failures


When a producer-consumer bounded buffer program runs without protecting its critical section, it is possible for some of the variables to attain incorrect values. In the case of a single producer and a single consumer, the only critical section is the incrementing and decrementing of the counter.

If the incrementing of the counter is interrupted in the producer, the counter may attain a value that is one greater than it should be. This can be referred to as an internal producer failure. In a similar way, if the decrementing of the counter in the consumer is interrupted, the counter may attain a value that is one smaller than it should be, giving an internal consumer failure. In either case the value of the counter is inconsistent with the number of items inserted into and removed from the buffer.
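
The following is a minimal sketch of how an interrupted update can produce these values. It assumes that the increment and the decrement are each carried out as a load, an arithmetic step, and a store; the text above does not say this, but it is the usual assumption. The function names and the starting value 5 are hypothetical, and the interleaving is written out by hand rather than produced by real threads.

```python
def interrupted_increment(counter):
    """Producer's counter + 1 is interrupted by a complete consumer decrement."""
    reg_p = counter            # producer loads the counter
    # producer is interrupted here; the consumer runs counter - 1 to completion
    reg_c = counter
    reg_c = reg_c - 1
    counter = reg_c
    # producer resumes with its stale register and stores over the decrement
    reg_p = reg_p + 1
    counter = reg_p
    return counter


def interrupted_decrement(counter):
    """Consumer's counter - 1 is interrupted by a complete producer increment."""
    reg_c = counter            # consumer loads the counter
    # consumer is interrupted here; the producer runs counter + 1 to completion
    reg_p = counter
    reg_p = reg_p + 1
    counter = reg_p
    # consumer resumes with its stale register and stores over the increment
    reg_c = reg_c - 1
    counter = reg_c
    return counter


# One insert plus one remove should leave the counter unchanged at 5.
after_producer_failure = interrupted_increment(5)   # one too large
after_consumer_failure = interrupted_decrement(5)   # one too small
```

In both functions one of the two updates is lost because the interrupted side writes back a value computed from a stale copy of the counter.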

An internal inconsistency may not immediately have any consequences. The counter is only used to indicate whether the buffer is full or empty, so an incorrect value will not affect the program unless the buffer becomes full or empty. If the counter is one greater than it should be, the consumer may attempt to remove an item from an empty buffer. In this case the consumer gets an item that has already been consumed. This is referred to as an external repeat failure since the consumer repeats the consume operation on an item.
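
As an illustration, here is a small single-threaded sketch of an external repeat failure. It assumes a circular buffer with separate in and out indices, which the text does not specify. The class BoundedBuffer, its method names, and the buffer size of 2 are hypothetical, and the internal failure is injected by hand instead of being caused by a real race.

```python
class BoundedBuffer:
    """Circular buffer whose full and empty tests rely only on counter."""

    def __init__(self, size):
        self.slots = [None] * size
        self.size = size
        self.inp = 0        # next slot the producer fills
        self.out = 0        # next slot the consumer empties
        self.counter = 0

    def put(self, item):
        if self.counter == self.size:
            return False    # a real producer would wait here
        self.slots[self.inp] = item
        self.inp = (self.inp + 1) % self.size
        self.counter += 1
        return True

    def get(self):
        if self.counter == 0:
            return None     # a real consumer would wait here
        item = self.slots[self.out]
        self.out = (self.out + 1) % self.size
        self.counter -= 1
        return item


buf = BoundedBuffer(2)
buf.put(1)
buf.put(2)
consumed = [buf.get(), buf.get()]   # items 1 and 2; the buffer is now empty

buf.counter += 1                    # injected internal producer failure
consumed.append(buf.get())          # consumer reads a slot it already emptied
```

After the last call, consumed holds 1, 2, 1: the stale contents of the first slot are consumed a second time.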

If the counter is one less than the correct amount, a producer may be able to put an item into the buffer when the buffer is full. This writes over an item in the buffer that has not yet been processed by a consumer, so the item is skipped. This is referred to as an external skip failure.
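
An external skip failure can be sketched in the same hand-driven style. The assumptions are a circular buffer with in and out indices, a hypothetical BoundedBuffer class, and a buffer size of 2; the counter error is injected directly instead of being produced by a real interrupted decrement.

```python
class BoundedBuffer:
    """Circular buffer whose full and empty tests rely only on counter."""

    def __init__(self, size):
        self.slots = [None] * size
        self.size = size
        self.inp = 0        # next slot the producer fills
        self.out = 0        # next slot the consumer empties
        self.counter = 0

    def put(self, item):
        if self.counter == self.size:
            return False    # a real producer would wait here
        self.slots[self.inp] = item
        self.inp = (self.inp + 1) % self.size
        self.counter += 1
        return True

    def get(self):
        if self.counter == 0:
            return None     # a real consumer would wait here
        item = self.slots[self.out]
        self.out = (self.out + 1) % self.size
        self.counter -= 1
        return item


buf = BoundedBuffer(2)
buf.put(1)
buf.put(2)                          # the buffer is now full

buf.counter -= 1                    # injected internal consumer failure
accepted = buf.put(3)               # full test passes, item 1 is overwritten

consumed = [buf.get(), buf.get()]   # item 1 never reaches the consumer
```

Here the producer's third put succeeds even though both slots are occupied, and the consumer then sees items 3 and 2 but never item 1.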

The important distinction between internal and external failures is that an internal failure does not directly cause the program to behave incorrectly, while an external failure can cause incorrect results.

Another way of looking at it is that an error in a program may not be immediately detected. By the time it has been detected, it may be difficult to trace back to where the error occurred, since it occurred so long ago. An internal inconsistency may not be detected for a long period of time and may go unnoticed during testing. An internal inconsistency may even correct itself before it does any damage, since an internal producer failure may be cancelled out by an internal consumer failure.
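
The cancelling effect can be made concrete with a short sketch. It compares the counter with the number of items inserted minus the number removed, a quantity the program itself does not normally track. The function counter_error and the three snapshots are hypothetical values chosen for illustration, not output from a real run.

```python
def counter_error(counter, inserted, removed):
    """Difference between the counter and the true number of buffered items."""
    return counter - (inserted - removed)


# Hypothetical snapshots of (counter, inserted, removed) over time.
history = [
    (1, 1, 0),   # consistent
    (2, 2, 1),   # after an internal producer failure: counter one too large
    (1, 3, 2),   # after an internal consumer failure: the two errors cancel
]

errors = [counter_error(*snapshot) for snapshot in history]
```

The errors list is 0, 1, 0. A test that only inspected the final state would see nothing wrong, even though the counter was inconsistent in between.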