Archive Compression

Archive Compression Overview

The Data Archiver performs archive compression to conserve disk storage space. Archive compression can be used on tags populated by any method (collector, migration, File collector, SDK programs, Excel, and so on).

Archive compression is a tag property: it can be enabled or disabled per tag, and each tag can have its own deadband.

Archive compression applies to numeric data types (scaled, float, double float, integer and double integer). It does not apply to string or blob data types. Archive compression is useful only for analog values, not digital values.

Archive compression can result in fewer raw samples stored to disk than were sent by the collector.

If every sample is stored, no storage space is saved; space is saved only when some samples can be safely discarded. Briefly, points along a straight or linearly sloping line can be dropped without information loss, because the dropped points can be reconstructed by linear interpolation during data retrieval. The user still retrieves real-world values, even though fewer points were stored.
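As a sketch of why this is safe for linear data (the function name and sample values below are illustrative only, not part of any Historian API), a dropped point can be recomputed from the two stored samples around it:

    def interpolate(t, t1, v1, t2, v2):
        """Linearly interpolate the value at time t between the two
        stored samples (t1, v1) and (t2, v2)."""
        return v1 + (v2 - v1) * (t - t1) / (t2 - t1)

    # Suppose samples at t=0 (value 2.0) and t=20 (value 4.0) were stored,
    # and the sample at t=10 was dropped because it lay on the line
    # between them. Retrieval can still return the original value:
    print(interpolate(10, 0, 2.0, 20, 4.0))   # 3.0, the dropped value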

Archive compression uses a held sample: a sample held in memory but not yet written to disk. The incoming sample always becomes the held sample. When an incoming sample is received, the currently held sample is either written to disk or discarded. If the currently held sample is always written to disk, no compression occurs; if it is discarded, nothing is written to disk and storage space is conserved.

In other words, collector compression occurs when the collected incoming value is discarded, while archive compression occurs when the currently held value is discarded.

Held samples are written to disk when archive compression is disabled or the archiver is shut down.

Any sample written to disk is a true incoming sample; no timestamp, value, or quality is ever changed or interpolated.
Note: internal performance tags use an archive compression deadband of 0% or close to 0%.

Archive Compression Logic

The following describes the logic that is executed on every sample written to the archiver while archive compression is enabled for the tag:
IF the incoming sample's data quality equals the held sample's data quality
    IF the new sample is bad
        Toss the old (held) bad sample to avoid storing repeated bads;
        the incoming bad becomes the held sample, so the last bad value
        in a run is preserved
    ELSE
        Decide whether the new value exceeds the archive compression deadband
ELSE
    The data quality has changed; flush the held sample to disk

IF the deadband was exceeded or the data quality changed
    Store the old held sample in the archive
    Set up the new deadband threshold using the incoming value and the
    value written to disk

Make the incoming value the held value
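The following Python sketch shows one way this logic could work; it is an illustration, not the archiver's actual implementation. It simplifies the deadband test to an absolute comparison between the incoming value and the last value written to disk (the real archiver can also drop points along a sloping line), and, inferring from the Bad Data example later in this section, it writes an incoming bad sample to disk on a good-to-bad quality transition. The names ArchiveCompressor, Sample, process, and flush are invented for this sketch.

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class Sample:
        time: int        # offset in seconds from t0
        value: float
        good: bool       # data quality: True = Good, False = Bad

    class ArchiveCompressor:
        """Illustrative held-sample compression; not Historian code."""

        def __init__(self, deadband_pct: float, egu_lo: float = 0.0,
                     egu_hi: float = 100.0):
            # Percent-of-span deadband converted to engineering units.
            self.deadband = deadband_pct / 100.0 * (egu_hi - egu_lo)
            self.held: Optional[Sample] = None          # in memory only
            self.last_written: Optional[Sample] = None  # deadband basis
            self.disk: List[Sample] = []                # samples stored

        def _write(self, s: Sample) -> None:
            if s is not self.last_written:   # never store a sample twice
                self.disk.append(s)
                self.last_written = s

        def process(self, incoming: Sample) -> None:
            if self.held is None:
                self._write(incoming)        # first sample is always stored
            elif incoming.good == self.held.good:
                if not incoming.good:
                    pass   # repeated bad: toss the old held bad; the
                           # incoming bad becomes the held sample
                elif abs(incoming.value - self.last_written.value) > self.deadband:
                    self._write(self.held)   # deadband exceeded: store the
                                             # held sample and rebase on it
            else:
                self._write(self.held)       # quality changed: flush held
                if not incoming.good:
                    self._write(incoming)    # inferred from the Bad Data
                                             # example: store the first bad so
                                             # the bad period starts on time
            self.held = incoming             # incoming always becomes held

        def flush(self) -> None:
            # Compression disabled or archiver shutdown: write the held sample.
            if self.held is not None:
                self._write(self.held)
                self.held = None

The worked examples below feed their sample tables through this sketch to show which samples end up on disk.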

Example: Change of data quality

The effect of archive compression is demonstrated in the following examples.

This example demonstrates that:
  • A change in data quality causes held samples to be stored.
  • Held samples are returned only in a current value sampling mode query.
  • Restarting the archiver causes the held sample to be flushed to disk.
Normally, a flat straight line would never cause the held value to be written to disk. An important exception is that changes in data quality force the held value to be written to disk. Assume a large archive compression deadband, such as 75% on a 0 to 100 EGU span.
Time Value Quality
t0 2 Good
t1 2 Bad
t2 2 Good

The following SQL query lets you see which data values were stored:

select * from ihRawData where samplingmode=rawbytime and tagname = t20.ai-1.f_cv and timestamp > today

Notice that the value at t2 does not show up in a RawByTime query because it is a held sample. The held sample would appear in a current value query, but not in any other sampling mode:

select * from ihRawData where samplingmode=CurrentValue and tagname = t20.ai-1.f_cv

In this way, the stored points accurately reflect the true time period during which the data quality was bad.

Shutting down and restarting the archiver forces it to write the held sample. Running the same RawByTime query then shows that all three samples are stored, due to the change in data quality.
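Feeding this example through the ArchiveCompressor sketch above reproduces the same behavior (times 0, 1, 2 stand in for t0, t1, t2):

    c = ArchiveCompressor(deadband_pct=75.0)
    for t, v, good in [(0, 2, True), (1, 2, False), (2, 2, True)]:
        c.process(Sample(t, v, good))
    print([s.time for s in c.disk])   # [0, 1] -> t2 is only the held sample
    c.flush()                         # archiver shutdown
    print([s.time for s in c.disk])   # [0, 1, 2] -> all three samples stored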

Archive Compression of a Straight Line

In this case we have a straight flat line. Assume a small archive compression deadband, say 2% on a 0 to 100 EGU span. Because the data lies on a straight flat line, the deadband is never exceeded.
Time Value Quality
t0 2 Good
t0+5 2 Good
t0+10 2 Good
t0+15 2 Good
t0+20 2 Good

Shut down and restart the archiver, then perform the following SQL query:

select * from ihRawData where samplingmode=rawbytime and tagname = t20.ai-1.f_cv and timestamp > today

Only t0 and t0+20 are stored: t0 is the first point, and t0+20 is the held sample written to disk on archiver shutdown, even though the deadband was never exceeded.
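The same result falls out of the sketch above, using the 2% deadband and all-Good samples:

    c = ArchiveCompressor(deadband_pct=2.0)
    for t in [0, 5, 10, 15, 20]:
        c.process(Sample(t, 2.0, True))
    c.flush()                         # archiver shutdown
    print([s.time for s in c.disk])   # [0, 20] -> only t0 and t0+20 stored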

Bad Data

This example demonstrates that repeated bad values are not stored. Assume a large archive compression deadband, such as 75% on a 0 to 100 EGU span.
Time Value Quality
t0 2 Good
t0+5 2 Bad
t0+10 2 Bad
t0+15 2 Bad
t0+20 2 Good
t0+25 3 Good
  • The t0+5 value is stored because of the change in data quality.
  • The t0+10 value is not stored because repeated bad values are not stored.
  • The t0+15 value is stored when the t0+20 sample comes in, because of the change of quality.
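Running the table above through the ArchiveCompressor sketch gives the same stored samples (75% deadband on the default 0 to 100 span):

    c = ArchiveCompressor(deadband_pct=75.0)
    samples = [(0, 2, True), (5, 2, False), (10, 2, False),
               (15, 2, False), (20, 2, True), (25, 3, True)]
    for t, v, good in samples:
        c.process(Sample(t, v, good))
    print([s.time for s in c.disk])   # [0, 5, 15] -> t0, t0+5, and t0+15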

Disabling Archive Compression for a Tag

Assume a large archive compression deadband, such as 75% on a 0 to 100 EGU span.
Time Value Quality
t0 2 Good
t0+5 10 Good
t0+10 99 Good
t0+15 Archive compression disabled
  • The t0 value is stored because it is the first sample.
  • The t0+5 value is stored when the t0+10 sample comes in, because the change from 10 to 99 exceeds the deadband.
  • The t0+10 is stored when archive compression is disabled for the tag.
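With the sketch above, disabling compression corresponds to calling the illustrative flush method, which writes the held sample:

    c = ArchiveCompressor(deadband_pct=75.0)
    for t, v in [(0, 2), (5, 10), (10, 99)]:
        c.process(Sample(t, v, True))
    c.flush()                         # archive compression disabled for the tag
    print([s.time for s in c.disk])   # [0, 5, 10] -> all three samples stored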

Archive Compression of Good Data

This example demonstrates that the held value is written to disk when the deadband is exceeded.

In this case, we have an upward ramping line. Assume a large archive compression deadband, such as 75% on a 0 to 100 EGU span.

Time Value Quality
t0 2 Good
t0+5 10 Good
t0+10 10 Good
t0+15 10 Good
t0+20 99 Good

Shut down and restart the archiver, then perform the following SQL query:

select * from ihRawData where samplingmode=rawbytime and tagname = t20.ai-1.f_cv and timestamp > today

Because of archive compression, the t0+5 and t0+10 values are not stored. The t0+15 value is stored when the t0+20 sample arrives. The t0+20 value is not stored until a future sample arrives, no matter how long that takes.
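The sketch above reproduces this trace; after processing all five samples, only t0 and t0+15 are on disk and t0+20 remains held:

    c = ArchiveCompressor(deadband_pct=75.0)
    for t, v in [(0, 2), (5, 10), (10, 10), (15, 10), (20, 99)]:
        c.process(Sample(t, v, True))
    print([s.time for s in c.disk])   # [0, 15] -> t0+5 and t0+10 were dropped
    print(c.held.time)                # 20 -> t0+20 is still only the held sample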