Measuring Historian Performance

About Measuring Performance of Proficy Historian

You can use Windows Performance Counters to measure the activity and performance of the Data Archiver. The counters are familiar to system administrators and let you monitor the following Historian information:
  • Read rates
  • Historian counters side by side with non-Historian counters, such as CPU usage or handle counts
  • Thread counts
The following topics provide the objects and counters most useful for measuring and describing the system activity. The counters that are specified in the following topics are a subset of all the available counters.
Note:
  • The Historian Advanced Topics documentation is not a replacement for the documentation of the full set of counters. The examples in each topic use the counters to produce other measurements. Sometimes the measurement you want is not exposed as a single counter, but it can be derived from a combination of counters or a comparison of two counters.
  • The counters only describe the behavior of the Data Archiver. For more information about troubleshooting and optimizing performance using the Historian and Windows counters, refer to the appropriate Historian documentation.

About the Proficy Historian Overview Objects

The Overview object contains counters that measure the samples collected and sent by the Data Archiver. You cannot use these counters to perform the following actions:
  • Measure the performance of a specific read.
  • Track the reads of a specific client or program.

The Overview object is the preferred way to measure and describe a system. Its values are calculated as the sum of the numbers in each data store instance. After you understand the Overview object, you can identify the most active data store by using the associated counters.

The performance counters are more useful than the administrator UI because:
  • You cannot access the read rates in the administrator UI.
  • The write rate in the administrator UI is updated only once a minute, whereas the counters are updated in real time, making it much easier to see exactly when a problem began.
  • The administrator UI shows only the data for the last 10 minutes, but the counters can be displayed over a longer time period to locate active times.
  • You can view the counters alongside non-Historian counters in the same trend.
  • The counters are accessible when you cannot access the administrator UI for performance or security reasons.
The time taken by reads varies based on the load on the Data Archiver.
Note: The load on the Data Archiver does not depend only on the number of read calls. The load also increases with the number of tags, archives, and raw samples. You can monitor some of these activities using the counters.
You can use the Overview object to measure the following:
  • The number of samples examined internally compared to the number of samples returned to a user at a given time
  • The variability of reads and writes over a day, week, or month
  • The number of out-of-order writes during a given time range
  • The average number of samples examined per read call
Counter Name | Description
Read Rate (Calls/min) | The number of user- or program-initiated read calls processed over the last minute.
Read Raw Rate (Samp/min) | The number of raw data samples examined internally over the last minute in response to read calls.
Read Samp Rate (Samp/min) | The number of raw data samples returned to external programs over the last minute in response to read calls.
Note: The counts and rates in the above table are totals across the Data Archiver. They do not provide detailed information, such as why a particular read took the time it did (for example, 8 seconds) and where that time was spent. However, they can explain why the same read criteria take different amounts of time on two different days: if more reads or writes are happening in the Data Archiver, the same read takes more time.

Comparing Read Raw Rate and Read Samp Rate

Run a query for a one-month average of 200 tags that collect data every second, and assume that the data is stored in one-day archives. The Data Archiver has to examine a very large number of raw samples (200 tags x 60 seconds x 60 minutes x 24 hours x 30 days) spread across 30 one-day archives to produce only 200 returned samples. If the query completes in one minute without error, the Read Raw Rate value is a very large number and the Read Samp Rate value is 200.
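The arithmetic behind this example can be sketched as follows; the tag count, collection rate, and archive span are the hypothetical values used above.

# Sketch of the Read Raw Rate vs. Read Samp Rate arithmetic for the example query:
# a one-month average of 200 tags collected once per second, stored in one-day archives.

tags = 200
samples_per_day = 60 * 60 * 24        # one sample per second
days = 30

raw_samples_examined = tags * samples_per_day * days   # contributes to Read Raw Rate
samples_returned = tags                                 # one monthly average per tag (Read Samp Rate)

print(f"Raw samples examined: {raw_samples_examined:,}")   # 518,400,000
print(f"Samples returned: {samples_returned}")             # 200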

Run the same query with the Raw By Time sampling mode. The Read Raw Rate shows the same value because the same number of raw samples were examined. Because all the examined samples are returned to the caller, Read Samp Rate = Read Raw Rate.
Note: The number of archives examined is not reflected in the counters. You will get the same Read Raw Rate and Read Samp Rate whether the samples are stored in a single 30-day archive or in thirty one-day archives.

As with reads, you cannot look only at the number of write calls. A write can contain samples for multiple tags, and the timestamps on the data affect the number of archives a write accesses. A collector typically writes data for all its tags with the same timestamp, so the write call accesses a single archive. A migration program, by contrast, can write two years of data for a tag, which can access many archives.

The following table provides the counters that have a rate over the last minute. These counters describe the data write activity in the Data Archiver.
Counter Name | Description
Write Rate (Average) | The number of raw samples received from external programs in the last minute.
Write Rate (Max) | The highest value reached by Write Rate (Average) since the Data Archiver started.
The following table lists the counters that are totals since the Data Archiver started. If the Data Archiver runs for a long time, these counters wrap around and reset to zero.
Counter Name | Description
Writes (Expensive) | The total number of raw samples that were expensive writes since the Data Archiver started.
Writes (Total Failed) | The total number of data samples that failed to be stored since the Data Archiver started.
Writes (Total) | The total number of data samples stored to IHA files since the Data Archiver started.
Writes (Total OutOfOrder) | The total number of data samples written out of time order since the Data Archiver started. The number includes only successful writes. Out-of-order writes are slower than writes that arrive in time order.
Note: Although some counters are rates and some are totals, all the counters are in units of data samples.

Comparing the number of Raw Samples Read and Written

The Write Rate (Average) is the write equivalent of the Read Samp Rate. You can compare the two counters to see whether more read samples or write samples per minute are handled by your Data Archiver.
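A minimal sketch of that comparison, with hypothetical counter readings taken from Performance Monitor:

# Compare per-minute read and write sample rates from the Overview object.
# Both values are hypothetical readings.

write_rate_avg = 5_000    # Write Rate (Average), samples written per minute
read_samp_rate = 1_200    # Read Samp Rate (Samp/min), samples returned per minute

if write_rate_avg > read_samp_rate:
    print("More samples are being written than returned to readers this minute")
else:
    print("More samples are being returned to readers than written this minute")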

Understanding the varying load on the Data Archiver

Trend the Write Rate (Average) and Read Samp Rate over a 24-hour period. You may see certain times of the day where the load varies, such as when reports are run, when a collector performs a store-and-forward flush, or when data is recalculated with the Calculation collector. Then look at the data over a month. A system used for compliance or billing will have a very low read rate until the end-of-month report is run. Compare that to a system used for real-time, auto-updating trending, which will have a more consistent read load throughout the month.

Calculating the rate of out of order writes during a given time range

Out of order data writes are only exposed as a count, not a rate.

You can compute the number of out-of-order writes during a specific time range (for example, between 3:15 pm and 3:25 pm) by getting the value of Writes (Total OutOfOrder) at each timestamp and subtracting one from the other. You can convert the result to a rate per minute by dividing it by the length of the range, in this case 10 minutes.
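A minimal sketch of that calculation; the two counter readings are hypothetical values sampled at the start and end of the window.

# Convert two readings of Writes (Total OutOfOrder) into a per-minute rate.

def out_of_order_rate_per_min(count_start, count_end, window_minutes):
    # Assumes the counter did not wrap around between the two readings.
    return (count_end - count_start) / window_minutes

# Example: readings taken at 3:15 pm and 3:25 pm (a 10-minute window).
rate = out_of_order_rate_per_min(count_start=12_000, count_end=12_450, window_minutes=10)
print(f"{rate:.1f} out-of-order writes per minute")   # 45.0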

This measurement is useful because out-of-order data occurs in many systems, and each system has a base rate of out-of-order data. If the system has intermittent changes in write performance, you can calculate the out-of-order rate during those times and compare it to the base rate.

Calculating samples examined per read

Because both the number of read calls and the number of samples examined are exposed, you can divide one by the other to get the number of samples examined per read. In some systems the number is near one, which indicates many small reads; for example, the Calculation collector does many current-value reads that examine one sample and return it. The samples per read will also be near one if you query raw data, such as when replicating data. In contrast, an analytic program that summarizes one-second uncompressed data into 5-minute averages examines 300 samples per read. The number is an overall system-wide average, so it is not useful for troubleshooting a single read.
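A minimal sketch of the division, using hypothetical Overview counter readings that correspond to the 5-minute-average case above:

# Average samples examined per read call over the last minute.

read_rate = 40             # Read Rate (Calls/min): read calls in the last minute
read_raw_rate = 12_000     # Read Raw Rate (Samp/min): raw samples examined in the last minute

samples_per_read = read_raw_rate / read_rate
print(f"{samples_per_read:.0f} samples examined per read call")   # 300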

About Proficy Historian Message Queue Object

In any server software there are a number of queues. Ideally, all the queues should have 0 items most of the time, which implies that the server is keeping up with the workload. The read and write counters of the Overview object tell you how many read and write operations were performed. The queue counters, however, tell you how many actions are waiting to happen and whether the user had to wait for a response.

Measuring system performance through queues is an excellent way to determine whether the server has reached its steady-state performance limit. It can also tell you whether the usage comes in bursts and needs to be spread out more over time.

When using the queues for measurement, you should think about what the "items" on a queue are. The "items" or "messages" here are read calls or write calls. One read call can have multiple tag names, and one write call can have multiple data samples.

There are three queue instances exposed by counters:
  • Write Queue: Data writes from collectors and non-collectors.
  • Read Queue: Anything for data that is not a write. It is not just data reads; it can also be tag browses.
  • Msgs Queue: Anything other than the read queue and write queue. You can practically ignore this queue; it is only a tiny part of the activity and is not considered in this document.

You can get basic or very detailed information from the queue counters. At a basic level, if the queues are non-zero at a point in time, you are asking for too much work at that point in time. If your queues are always non-zero, you are always asking for too much and have reached your performance limit.

Use the Queue Counters on the Read and Write queues to measure the following parameters as explained in the sections that follow:
  • Last Read time vs Average Read Time
  • Variability of the current queue counts
  • Variability of the processed rate of read or write queue
  • Number of samples per write

Basic Queue Counters

These counters represent concepts that apply to any queue usage in any server software. There is a set of these counters for the read queue and a set for the write queue.

Counter Name | Description
Count (Max) | The highest number reached by Count (Total).
Count (Total) | The number of messages currently on the queue.
Processed Count | The number of messages processed from the queue since Data Archiver startup. This number wraps around and resets to zero if the Data Archiver runs for a long time.
Processed Rate (msgs/min) | The number of messages processed from the queue in the last minute.
Processing Time (Ave) | The average time (in milliseconds) to process a message, averaged since Data Archiver startup.
Processing Time (Last) | The time (in milliseconds) taken to process the most recently processed message.
Processing Time (Max) | The highest number reached by Processing Time (Last) since Data Archiver startup.
Recv Count (msgs) | The number of messages received into the queue since Data Archiver startup.
Recv Rate (msgs/min) | The rate at which messages were received in the last minute.

If your Processed Rate (msgs/min) keeps up with your Recv Rate (msgs/min), your Count (Total) will be zero because the Data Archiver is keeping up with the incoming requests.
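The relationship can be sketched with a toy backlog calculation; the rates are hypothetical and only illustrate the keeping-up rule described above.

# Toy model: queue depth after a number of minutes, given receive and processed rates.

def queue_count_after(minutes, recv_rate, processed_rate, start_count=0):
    backlog = start_count
    for _ in range(minutes):
        backlog = max(0, backlog + recv_rate - processed_rate)
    return backlog

print(queue_count_after(10, recv_rate=600, processed_rate=650))   # 0: keeping up
print(queue_count_after(10, recv_rate=600, processed_rate=500))   # 1000: falling behind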

The current values of these counters can be displayed at all times in the report view of Performance Monitor. You can also log these counters to a Performance Monitor group file so that the logged times can be matched up with periods of slow performance.

Detailed Queue Counters

These counters require a detailed understanding of how the queues are used.

There is no single read or write queue in memory. Each is a virtual queue that is the sum total of all the client queues. Each connection from a client uses a socket, and each socket is monitored by a thread called a client thread. A queue sits between each client thread and the pool of threads that access the IHA files; this can be called a client queue. No client thread goes directly to the IHA files. A fixed number of threads monitor all the client queues and read and write the IHA files.

A default system has one write thread and four read threads. Suppose you have 20 collectors and 35 clients connected to the Data Archiver, that is, 20 + 35 = 55 client threads. Because each client thread has one read queue and one write queue, that is 55 x 2 = 110 client queues.

The four read threads will monitor the 55 client read queues, most of which are empty most of the time. The one write thread monitors the 55 client write queues.

The Count (Total) on the Read Queue instance is the sum total of all the items on the 55 client read queues; the Count (Total) on the Write Queue instance is the corresponding sum for the 55 client write queues.
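A minimal sketch of the client-queue arithmetic from this example; the collector and client counts are the hypothetical ones used above.

# Client-queue arithmetic for the example deployment.

collectors = 20
clients = 35

client_threads = collectors + clients    # one client thread per connection
client_queues = client_threads * 2       # each client thread has one read and one write queue

print(client_threads)   # 55
print(client_queues)    # 110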

Counter Name | Description
Threads | The number of configured threads that go to the IHA files. This number does not change at runtime. It defaults to one write thread and four read threads.
Threads Working | The number of configured queue-processing worker threads that are currently working on processing a message. If there is not much work to do, there will be idle threads and this number will be much less than the Threads counter, possibly zero.
Time In Queue (Ave) | The average value of Time In Queue (Last) since Data Archiver startup.
Time In Queue (Last) | The time (in milliseconds) that the last message waited in the queue before a thread started processing it. This should be near zero, meaning the archiver is keeping up with the incoming reads and writes.
Time In Queue (Max) | The maximum value of Time In Queue (Last) since Data Archiver startup.
Client Queues with Msgs | The number of client queues with messages on them. In the previous example, this is how many of the 55 client read queues have at least one item on them. It does not matter how many items are on a client queue, only that it has at least one item. The number would be between 0 and 55.

This number gives some idea of how balanced the incoming load is and how balanced the servicing of the clients is. You do not want any single client doing so many reads or writes that other clients have to wait.

The time to process one read or one write would be Time In Queue (Last) + Processing Time (Last). However, these are overall system-wide counters, not a way to troubleshoot one read or one client. Time In Queue (Last) increases when Threads Working equals Threads, meaning all threads are busy.
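A minimal sketch of that sum, using hypothetical counter readings:

# Approximate end-to-end time for the most recent message, in milliseconds.

time_in_queue_last = 12      # Time In Queue (Last): wait before a thread picked it up
processing_time_last = 48    # Processing Time (Last): time spent processing it

total_ms = time_in_queue_last + processing_time_last
print(f"Most recent message took roughly {total_ms} ms end to end")   # 60 ms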

Example: Comparing current to average processing time

Every system is different and has its own "normal" data rate. You can measure whether your current rate is above or below normal. To determine whether the Recv Rate or Processed Rate is above or below normal, you must look at the number over a longer period of time, perhaps 1 hour or 24 hours.

To determine whether processing is taking longer than normal, you can trend Processing Time (Last) against Processing Time (Ave) over the same time range. One line will be above the other, showing whether the current processing time is above or below normal.

Example: Measuring the variability of Queue Count Total

This demonstrates that the Count (Total) can change. The number will change based on the Recv Count and the Processed Count.

The Write Queue Recv Rate is usually consistent, but you may see it increase during a store-and-forward flush of a collector. The Write Queue Processed Rate varies more, and that causes the Write Queue Count (Total) to vary as well. Consider an archive backup done at midnight each day. During a backup, the writes have to stop. The Write Queue Recv Rate stays the same because collectors are still writing, but the Processed Rate is zero during the backup, so the Write Queue Count (Total) grows.
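A minimal sketch of the backup scenario; the receive rate and backup duration are hypothetical.

# While writes are paused for a backup, the write queue grows at the receive rate.

recv_rate = 2_000        # Write Queue Recv Rate (msgs/min), unchanged during the backup
backup_minutes = 15      # how long writes are paused

backlog = recv_rate * backup_minutes   # approximate Write Queue Count (Total) at the end of the backup
print(f"Write queue backlog after the backup: {backlog:,} messages")   # 30,000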

The same thing happens if long reads are in progress. While those reads are being processed, the writes have to wait and the Write Queue Count (Total) grows. However, the Overview object Read Raw Rate should be high, indicating that the Data Archiver is busy doing some work, just not the writes.

If the writes are out of time order, the exact same number and bundle size of raw samples can take longer to write. Similarly, the exact same number of raw samples can take longer to read if there are cache misses and the Data Archiver has to do file I/O.

Reads are unlike writes because collectors keep sending writes even if they do not get responses. A client that does a read waits for the response before sending the next read, so reads do not queue up in the Data Archiver. In general, the Read Queue Count (Total) will not grow as high as the Write Queue Count (Total) unless you have many read clients.

You can measure how much your Read and Write Queue Count (Total) vary over a 24-hour period, and understand that Count (Total) variability is caused by the variability of the Recv Rate and Processed Rate. The variability of those rates is caused by the variability of the sizes of the reads and writes, combined with whatever else is happening on the machine.

Example: Computing the number of samples per write

The Overview object has a read calls counter (Read Rate) but does not have a write calls counter, so from the Overview object alone you cannot compute the number of samples per write call. However, because one Write Queue Recv Count is one write call, you can use that counter as the number of write calls.
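A minimal sketch of the resulting calculation, using hypothetical readings of the two counters over the same minute:

# Estimate samples per write call by combining an Overview counter with a Write Queue counter.

write_rate_avg = 50_000          # Overview: Write Rate (Average), samples received per minute
write_queue_recv_rate = 500      # Write Queue: Recv Rate (msgs/min), i.e. write calls per minute

samples_per_write_call = write_rate_avg / write_queue_recv_rate
print(f"{samples_per_write_call:.0f} samples per write call")   # 100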

About Proficy Historian Cache Object

Caching is used in many kinds of server software. You may already have a basic idea of the concept and terminology of caching and just need to know how Historian uses a cache to improve performance. The Historian Data Archiver stores and retrieves data from gigabytes of archives on disk. All those raw samples cannot be kept in memory, so for performance reasons the Data Archiver attempts to keep the most recently used information in memory. Cache hits avoid file I/O, which is the number one negative performance factor in any server software.

As with the queues, there are multiple caches in the Data Archiver, each holding a different type of object. Just as with "items" in queues, you want to understand what "objects" are in a cache. One read call in the Overview object becomes one Recv Count in the Queue object, which becomes one or more cache hits or misses in the Archive Data Cache. This is because one read may span raw samples stored in multiple data nodes, and some of those data nodes may be in cache while others may not.

There are four caches within the Data Archiver: ArchiveDataCache, ArchiveIndexCache, ArchiveTagCache, and ConfigTagCache. You can ignore three of them and monitor only the Archive Data Cache. Its contents are raw samples, so this cache is the simplest to understand and has the biggest effect on performance.

The Archive Data Cache starts empty at Data Archiver startup and fills as data is read and written.

Cache counters, like queue counters, are best viewed as current values in the report view of Performance Monitor. They are displayed on the Archive Data Cache instance.

Counter Name | Description
Hits | The number of reads where a program queried or re-queried a tag and time range and the data was found in the cache.
Misses | The number of reads where the requested information was either never in the cache or had been removed to make room for more recently accessed data.
Hit Percentage | Hits divided by the total number of reads (hits plus misses), expressed as a percentage. A high percentage means most data requests are being satisfied without having to access the disk.

The objects in the Archive Data Cache are the data nodes. One data node is about 250 consecutive raw samples for one tag.

Counter Name | Description
Obj Count | The number of objects (data nodes containing raw samples) in the cache.
Num Adds | The total number of objects added to the cache since Data Archiver startup. This number keeps increasing as new data is collected and queried.
Num Deletes | The total number of objects deleted from the cache. Deletes do not happen until the cache has reached its maximum size.
Size (MB) | The amount of memory used by the cache to contain the raw samples.

Possible uses of the counters are demonstrated in the sections that follow.

Example: Computing the cache hits for a specific time range

All the counters are totals since Data Archiver startup, which makes it hard to detect a period of time that had many cache misses. If you know that a read was run at 4:00 pm and took 1 minute, you can get the hit and miss counts at 3:59 and 4:02 and subtract them to find the hit percentage at the time the read was done. This is more useful than the hit percentage since startup. When subtracting, verify that the counters did not roll over and reset to zero.
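A minimal sketch of that window calculation; the counter readings are hypothetical values taken at 3:59 and 4:02.

# Hit percentage for a specific time window, from two readings of Hits and Misses.

def window_hit_percentage(hits_start, hits_end, misses_start, misses_end):
    hits = hits_end - hits_start
    misses = misses_end - misses_start
    if hits < 0 or misses < 0:
        raise ValueError("A counter rolled over during the window; the readings are not comparable")
    total = hits + misses
    return 100.0 * hits / total if total else 0.0

print(window_hit_percentage(hits_start=900_000, hits_end=901_800,
                            misses_start=40_000, misses_end=40_200))   # 90.0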

Example: Best Case Archive Data Cache hit percentage

Run the exact same SQL query 10 times with fixed start and end times. Your hits would be nine and your misses would be one (the first read), giving a cache hit percentage of 90%. If you keep doing the same read, you will keep hitting the same raw samples in the same data nodes.

Now suppose you have an auto-updating chart that always shows data up to the current time. You will have a high but not 100% cache hit percentage, because each time a new data node is created, you will have one cache miss accessing it.

Example: Diagnosing Data Archiver memory growth due to cache

The overall Data Archiver memory usage consists of multiple kinds of objects, but you can monitor the memory usage due to caching in detail.

The cache exposes its own memory usage numbers; another way is to look at the object count in the cache. If Data Archiver virtual memory use is increasing, look at the Obj Count over the same time period to see whether it is also increasing.

Example: Removing items from cache to limit memory usage

There is no maximum reserved size for the cache. If adding more objects would put you past the configured archiver memory usage, adding one object will delete another object. Similarly, if archiver memory is used for non-cache reasons, such as large tag browses, the Data Archiver cache will remove items to meet the target memory usage.

Example: Monitoring the size in bytes of the cache

Configure the Archiver Memory Size (MB) in the Admin UI to 100 MB and, in the report view of Performance Monitor, look at the Size (MB) counter of the ArchiveDataCache instance. It is zero. Now change the Archiver Memory Size to 1700 and restart the Data Archiver. The number is still zero.

This is because the counter measures how much space the cache is currently using, not a configured or maximum size. As you start reading and writing data, the Size (MB) counter will grow.