Storage
Archive Storage
Historian stores up to 1200 seconds of future data.
Archive Compression Overview
The Data Archiver performs archive compression procedures to conserve disk storage space. Archive compression can be used on tags populated by any method (collector, migration, file collector, SDK programs, Excel, etc.)
Archive compression is a tag property. Archive compression can be enabled or disabled on different tags and can have different deadbands.
Archive compression applies to numeric data types (scaled, float, double float, integer and double integer). It does not apply to string or blob data types. Archive compression is useful only for analog values, not digital values.
Archive compression can result in fewer raw samples being stored to disk than were sent by the collector.
If every sample is stored, no storage space is saved; space can be reduced only if some samples can be safely discarded. Briefly, points along a straight or linearly sloping line can be dropped without loss of information to the user, because the dropped points can be reconstructed by linear interpolation during data retrieval. The user still retrieves real-world values, even though fewer points were stored.
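As a minimal sketch (not Historian code; the function and sample values are illustrative), points dropped from a straight segment can be reconstructed exactly from the two stored endpoints:

```python
def interpolate(t, t0, v0, t1, v1):
    """Linearly interpolate the value at time t between (t0, v0) and (t1, v1)."""
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

# Stored samples: only the endpoints of a ramp survive compression.
stored = [(0, 2.0), (20, 42.0)]

# A dropped sample at t=5 lay on the line, so retrieval reconstructs it exactly.
(t0, v0), (t1, v1) = stored
print(interpolate(5, t0, v0, t1, v1))  # 12.0
```

Any intermediate point that was exactly on the line comes back unchanged, which is why discarding it loses no information.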
Archive compression uses a held sample: a sample held in memory but not yet written to disk. The incoming sample always becomes the held sample. When an incoming sample is received, the currently held sample is either written to disk or discarded. If the currently held sample is always written to disk, no compression occurs. If the currently held sample is discarded, nothing is written to disk and storage space is conserved.
In other words, collector compression occurs when the collected incoming value is discarded. Archive compression occurs when the currently-held value is discarded.
Held samples are written to disk when archive compression is disabled or the archiver is shut down.
Archive Compression Logic
IF the incoming sample's data quality equals the held sample's data quality
    IF the new sample is bad
        Discard the sample to avoid storing repeated bad values
    ELSE
        Determine whether the new value exceeds the archive compression deadband
ELSE
    The data quality has changed; flush the held sample to disk
IF the deadband was exceeded or the data quality changed
    Store the held sample in the archive
    Set up a new deadband threshold using the incoming value and the value written to disk
    Make the incoming value the held value
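The decision flow above can be sketched in Python. This is an illustrative sketch, not Historian's actual implementation: the names are invented, the deadband test is passed in as a callable, and it assumes the repeated-bad rule discards the incoming sample.

```python
def on_incoming(held, incoming, exceeds_deadband, archive):
    """Process one incoming sample; return the new held sample.

    held/incoming are dicts like {"time": t, "value": v, "quality": "Good"}.
    exceeds_deadband is a callable standing in for the slope/deadband test.
    Samples flushed to disk are appended to 'archive'.
    """
    if held is None:
        return incoming                    # first sample simply becomes held
    if held["quality"] == incoming["quality"]:
        if incoming["quality"] == "Bad":
            return held                    # discard repeated bad values
        quality_changed = False
    else:
        quality_changed = True
    if quality_changed or exceeds_deadband(held, incoming):
        archive.append(held)               # flush the held sample to disk
    return incoming                        # incoming becomes the new held sample
```

Feeding in the change-of-quality sequence (Good, Bad, Good) flushes the first two samples and leaves the last one held in memory, which matches the RawByTime query behavior described in the examples that follow.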
Archive Compression Example: Change of Data Quality
The effect of archive compression is demonstrated in the following examples.
- A change in data quality causes held samples to be stored.
- Held samples are returned only in a current value sampling mode query.
- Restarting the archiver causes the held sample to be flushed to disk.
Time | Value | Quality |
---|---|---|
t0 | 2 | Good |
t1 | 2 | Bad |
t2 | 2 | Good |
The following SQL query lets you see which data values were stored:
select * from ihRawData where samplingmode=rawbytime and tagname = t20.ai-1.f_cv and timestamp > today
Notice that the value at t2 does not show up in a RawByTime query because it is a held sample. The held sample would appear in a current value query, but not in any other sampling mode:
select * from ihRawData where samplingmode=CurrentValue and tagname = t20.ai-1.f_cv
The points should accurately reflect the true time period for which the data quality was bad.
Shutting down and restarting the archiver forces it to write the held sample. Running the same SQL query would then show all three samples stored, due to the change in data quality.
Archive Compression Example: Archive Compression of Straight Line
Time | Value | Quality |
---|---|---|
t0 | 2 | Good |
t0+5 | 2 | Good |
t0+10 | 2 | Good |
t0+15 | 2 | Good |
t0+20 | 2 | Good |
Shut down and restart the archiver, then perform the following SQL query:
select * from ihRawData where samplingmode=rawbytime and tagname = t20.ai-1.f_cv and timestamp > today
Only t0 and t0+20 were stored. t0 is the first point, and t0+20 is the held sample written to disk at archiver shutdown, even though no deadband was exceeded.
Archive Compression Example: Bad Data
Time | Value | Quality |
---|---|---|
t0 | 2 | Good |
t0+5 | 2 | Bad |
t0+10 | 2 | Bad |
t0+15 | 2 | Bad |
t0+20 | 2 | Good |
t0+25 | 3 | Good |
- The t0+5 value is stored because of the change in data quality.
- The t0+10 value is not stored because repeated bad values are not stored.
- The t0+15 value is stored when the t0+20 sample arrives, because of the change of quality.
Archive Compression Example: Disabling Archive Compression for a Tag
Time | Value | Quality |
---|---|---|
t0 | 2 | Good |
t0+5 | 10 | Good |
t0+10 | 99 | Good |
t0+15 | Archive compression disabled |
- The t0 value is stored because it is the first sample.
- The t0+5 value is stored when the t0+10 sample arrives.
- The t0+10 is stored when archive compression is disabled for the tag.
Archive Compression Example: Archive Compression of Good Data
This example demonstrates that the held value is written to disk when the deadband is exceeded.
In this case, we have an upward ramping line. Assume a large archive compression deadband, such as 75% on a 0 to 100 EGU span.
Time | Value | Quality |
---|---|---|
t0 | 2 | Good |
t0+5 | 10 | Good |
t0+10 | 10 | Good |
t0+15 | 10 | Good |
t0+20 | 99 | Good |
Shut down and restart the archiver, then perform the following SQL query:
select * from ihRawData where samplingmode=rawbytime and tagname = t20.ai-1.f_cv and timestamp > today
Because of archive compression, the t0+5 and t0+10 values are not stored. The t0+15 value is stored when the t0+20 arrives. The t0+20 value would not be stored until a future sample arrives, no matter how long that takes.
Determining Whether Held Values are Written During Archive Compression
When archive compression is enabled for a tag, its current value is held in memory and not immediately written to disk. When a new value is received, the actual value of the tag is compared to the expected value to determine whether or not the held value should be written to disk. If the values are sufficiently different, the held value is written. This is sometimes described as "exceeding archive compression".
Archive compression uses a deadband on the slope of the line connecting the data points, not on the value or time stamp of the points themselves. The archive compression algorithm calculates the next expected value based on this slope, applies a deadband value, and checks whether the new value exceeds that deadband.
The "expected" value is what the value would be if the line continued with the same slope. A deadband value is an allowable deviation. If the new value is within the range of the expected value, plus or minus the deadband, it does not exceed archive compression and the current held value is not written to disk. (To be precise, the deadband is centered on the expected value, so that the actual range is plus or minus half of the deadband.)
Archive Compression Example: Exceeding Archive Compression
EGUs are 0 to 200000 for a simulation tag.
Enter 2% archive compression. This displays as 4,000 EGU units in the administration UI.
When a sample arrives, the archiver calculates the next expected value based on the slope and time since the last value was written. Let's say that the expected value is 17,000.
The deadband of 4,000 is centered, so the archiver adds and subtracts 2,000 from the expected value. Thus, the actual value must be from 15,000 to 19,000 inclusive for it to be ignored by the compression algorithm.
In other words, the actual value must be less than 15,000 or greater than 19,000 for it to exceed compression and for the held value to be written.
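The arithmetic above can be checked directly (all values taken from this example):

```python
# Deadband arithmetic for the example: 0-200,000 EGU span, 2% archive compression.
egu_span = 200_000 - 0
deadband = egu_span * 0.02        # 4,000 EGU units, as shown in the admin UI
expected_value = 17_000
half_band = deadband / 2          # the deadband is centered on the expected value
low, high = expected_value - half_band, expected_value + half_band
# A sample between low and high (inclusive) is ignored by the compression
# algorithm; a sample outside that range exceeds compression.
print(low, high)  # 15000.0 19000.0
```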
Determining Expected Value
The Archive Compression algorithm calculates the expected value from the slope, time, and offset (a combination of previous values and its timestamp):
ExpectedValue = m_CompSlope * Time + m_CompOffset;
Where
m_CompSlope = deltaValue / deltaT
m_CompOffset = lastValue - (m_CompSlope * LastTimeSecs)
Archive Compression Example: Determining Expected Value
Values arriving into the archiver for tag1 are
Time | Value |
---|---|
t0 | 2 |
t0+5 | 10 |
t0+10 | 20 |
m_CompSlope = deltaValue / deltaT
m_CompSlope = (20 - 10) / (10 - 5)
m_CompSlope = 2
m_CompOffset = lastValue - (m_CompSlope * LastTimeSecs)
m_CompOffset = 20 - (2 * 10)
m_CompOffset = 0
ExpectedValue = m_CompSlope * Time + m_CompOffset;
ExpectedValue = 2 * 15 + 0;
ExpectedValue = 30
The expected value at t0+15 is 30.
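The same numbers can be reproduced in a short script (a sketch using the variable names from the formulas above; times are in seconds relative to t0):

```python
# Last two samples for tag1: (t0+5, 10) and (t0+10, 20).
prev_time, prev_value = 5, 10
last_time, last_value = 10, 20

m_comp_slope = (last_value - prev_value) / (last_time - prev_time)  # 2.0
m_comp_offset = last_value - m_comp_slope * last_time               # 0.0
expected_value = m_comp_slope * 15 + m_comp_offset                  # at t0+15
print(expected_value)  # 30.0
```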
Archive Compression Example: Archive Compression of a Ramping Tag
An iFIX tag is associated with an RA register. This value ramps up to 100 then drops immediately to 0.
Assume a 5-second poll time in Historian. How much archive compression can be performed to still "store" the same information?
11-Mar-2003 19:31:40.000 0.17 Good NonSpecific
11-Mar-2003 19:32:35.000 90.17 Good NonSpecific
11-Mar-2003 19:32:40.000 0.17 Good NonSpecific
11-Mar-2003 19:33:35.000 91.83 Good NonSpecific
11-Mar-2003 19:33:40.000 0.17 Good NonSpecific
An archive compression of 1% stores the most samples. An archive compression of 0% logs every incoming sample. Even on a perfectly ramping signal with no deviations, 0% compression conserves no storage space and essentially disables archive compression.
Archive Compression Example: Archive Compression of a Drifting Tag
A drifting tag is one that ramps up, but whose value barely falls within the deadband each time. Even though a new held sample is created and the current one discarded, the slope is not updated unless the deadband is exceeded. With a properly chosen deadband, this is irrelevant: by specifying a deadband, the user is saying that the process is tolerant of changes within that band and that those changes do not need to be logged.
Archive Compression Example: Archive Compression of a Filling Tank
In the case of a filling tank, the value (representing the fill level) ramps up, then stops. In this case, the system also uses collector compression, so when the value stops ramping, no more data is sent to the archiver. At some point in the future, the value will begin increasing again.
As long as nothing is sent to the archiver, no raw samples are stored. During this flat period (or plateau), interpolated retrieval shows a flat line, because the last stored point is stretched forward in time. This illustrates why you should use interpolated retrieval methods on archive-compressed data.
How Archive Compression Timeout Works
The Archive Compression Timeout value specifies the maximum length of time a value can be held in memory before being written to disk. If a value has been held longer than the timeout period, the next data point is considered to exceed the deadband, regardless of the actual data received or the calculated slope.
After the value is written due to an archive compression timeout period, the timeout timer is reset and compression continues as normal.
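A minimal sketch of that rule (hypothetical names; not the archiver's actual code):

```python
def exceeds_compression(held_age_secs, timeout_secs, exceeds_deadband):
    """Return True if the held value should be written to disk.

    If the held value's age exceeds the archive compression timeout, the
    next point is treated as exceeding the deadband regardless of its value.
    """
    if timeout_secs is not None and held_age_secs > timeout_secs:
        return True                 # timeout forces the held value to disk
    return exceeds_deadband         # otherwise the normal deadband test decides
```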
Archive De-fragmentation - An Overview
What is de-fragmentation? A Historian IHA (Historian Data Archive) file can contain data values for multiple tags, and the data written for a particular tag may not occupy contiguous blocks; in other words, the data values in an IHA file are fragmented. Archive de-fragmentation reorganizes an archive so that the data values for each tag are stored contiguously.
Archive De-fragmentation improves the performance of reading and processing of archive data dramatically.
- An archive can be de-fragmented only when it is not active.
- A command-line tool can be run on any existing archive as needed.
- De-fragmentation can be performed on all archive versions; the resulting archive will be in the latest version.
- The de-fragmentation must be started manually.
De-fragmenting an existing archive
To de-fragment an archive using the tool: