Problem |
How do you know whether a problem will work itself out or not?
|
Context |
A fault-tolerant application where some errors, overload
conditions, etc. may be transient. The system can escalate
through recovery strategies, taking more drastic action at each
step. A typical example is a fault-tolerant telecommunication
system using static traffic engineering, where you want to check
for overload or transient faults.
|
Forces |
You want to catch faults and problems. There is no
sense in wasting time solving a problem that goes away by itself.
Many problems work themselves out, given time.
|
Solution |
Don't react immediately to detected conditions. Make
sure the condition really exists by checking several times, or use
Leaky Bucket Counters to detect a critical number of occurrences
in a specific time interval. For example, average over time or
simply wait a while to give transient faults a chance to pass.
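
As an illustration of "checking several times", the following is a minimal Python sketch, not part of the original pattern. The names `condition_persists`, `check`, `attempts`, and `delay_s` are hypothetical, and the choice of three attempts is an arbitrary assumption.

```python
import time
from typing import Callable


def condition_persists(
    check: Callable[[], bool],
    attempts: int = 3,        # assumed value; tune per system
    delay_s: float = 0.0,     # assumed pause between checks, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only if `check` reports the condition on every attempt."""
    for i in range(attempts):
        if not check():
            # The condition cleared by itself: treat it as transient.
            return False
        if i < attempts - 1:
            sleep(delay_s)  # give a transient fault a chance to pass
    return True
```

A caller would escalate to a recovery action only when `condition_persists(...)` returns `True`. A single clear reading is enough to ride over the fault.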
|
Resulting context |
Errors can be resolved with truly minimal
effort. The human operator need not intervene for transient errors
(as in the pattern Minimize Human Interaction).
|
Rationale |
This pattern detects "temporally dense" events. Think
of the events as spikes on a time line. If a small number of spikes
(specified by a threshold) occur together (where "together" is
specified by the interval), then the error is a transient. If the
episode transcends the interval, it's not transient: the leak rate
is faster than the refill rate, and the pattern indicates an error
condition. If the burst is more intense than expected (it exceeds the
error threshold) then it's unusual behavior not associated with
a transient burst, and the pattern indicates an error condition.
Used by Leaky Bucket Counters, Five Minutes of No Escalation
Messages, and others.
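
The timeline reasoning in this rationale can be sketched in Python as follows. This is one possible reading under stated assumptions, not the original implementation: the bucket starts full, each fault leaks one credit, and credits refill steadily over time. The names `LeakyBucketCounter`, `capacity`, `refill_interval`, and `first_escalation` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class LeakyBucketCounter:
    capacity: int            # assumed threshold: faults tolerated in one burst
    refill_interval: float   # assumed interval: seconds to restore one credit

    def __post_init__(self) -> None:
        self.credits = float(self.capacity)  # bucket starts full
        self.last_time: Optional[float] = None

    def record_fault(self, now: float) -> bool:
        """Record a fault at time `now`; return True if it should escalate."""
        if self.last_time is not None:
            # Refill in proportion to the quiet time since the last fault.
            restored = (now - self.last_time) / self.refill_interval
            self.credits = min(float(self.capacity), self.credits + restored)
        self.last_time = now
        self.credits -= 1.0  # each fault leaks one credit
        if self.credits < 0:
            self.credits = 0.0
            return True      # leaking outpaced refilling: not a transient
        return False


def first_escalation(times: Iterable[float], capacity: int = 3,
                     refill_interval: float = 10.0) -> Optional[float]:
    """Return the time of the first escalation, or None if all faults are ridden over."""
    bucket = LeakyBucketCounter(capacity, refill_interval)
    for t in times:
        if bucket.record_fault(t):
            return t
    return None
```

With these assumed settings, a short burst at times 0, 1, 2 followed by a quiet period never escalates. A fourth fault at time 3 exceeds the threshold. Faults arriving every 5 seconds outlast the refill and escalate at time 25.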
|
[Source: James Coplien, "Pattern Mining", C++ Report, Oct 95, p83] |