December 25, 2024

How Equipment Reliability Guides Maintenance

by Gay Gordon-Byrne, Vice President Technology & Development, TekTrakker Information Systems, LLC
As utilities integrate new technology into their grids and communications links, the challenges of keeping equipment in service (i.e., maintenance) will fall hard on the unprepared. So what should managers and planners expect when “… meters are no longer meters but computers with metering applications?” – Joe Rigby, CEO & Chairman, Pepco Holdings, Inc.

SmartGrid and AMI projects are deploying electronic equipment with very little history of reliability to support service projections. Useful life is being calculated by accounting rules instead of being informed by field experience. Reliability is assumed by planners to be sufficient, but there is no empirical support. Each utility is testing and piloting new devices focused on proving that SmartGrid connections can be made to function, but without regard to the difficulties and costs of keeping millions of computers functioning in the field.

Experience with similar devices deployed in other industries already shows that concerns should be high. The most reliable of current electronic devices do not approach the stability of the old brass and glass meters they must replace. Unless utilities are extremely careful, the hidden service requirements of many AMI and Smart Grid items will easily exceed the savings potential promised in many project proposals.

The dilemma is that utilities need to know in advance of massive deployments how well their selections perform in the field, yet all are exploring new options simultaneously. The best and most effective way to approach this information gap is to utilize the information already being collected internally by operations and maintenance departments. Tracking and reporting of repair and service issues from the maintenance teams informs the broader business and forms the backbone for improved decisions. So here are a few tips on how to take the guesswork out of the process and start making decisions based on reliable information instead…

Action Item # 1 – Calculate Current Maintenance Burden and Costs
Determining the reliability of existing products is fundamental to being able to evaluate the relative improvements that will come from deployment of new products. Planners need to be working with Operations and Maintenance departments to quantify the important baseline of current reliability and costs.

From an equipment maintenance perspective, a reliable product is one that never needs adjustments or attention. Calculating the failure rate or repair rate is therefore a direct measurement of reliability. Once the failure rate is calculated, all new equipment choices can be compared empirically and cost projections made effectively.

Unlike the costs of initial installation, repairs of any size are costly to the organization because they arise unexpectedly; need relatively quick attention; exercise the customer service, problem reporting, and supervisory systems; and always involve warm hands (labor) to resolve. A truck roll to replace a $5 battery is just as costly as a total meter replacement. The price of the part itself is often inconsequential compared with the labor to resolve the problem. It is the impact of hundreds of thousands of relatively low-cost items requiring high levels of repair attention that can quickly unravel the business case for deployments.

In the chart below, we compare various metering product types and their general failure rates in multiple formats. We used a total cost per repair of $500 as the basis for this chart to illustrate the impact of small repairs on the overall budget. The important concept is to grasp the very wide differences across various products and appreciate the cost impact to the organization of supporting the maintenance of the poorly performing products. 

This chart shows current standard meters with their industry “standard” failure rate of 0.5% as the starting point. Each of the new types of AMI meters is compared to the old style, each resulting in very different costs to maintain the same quantity of equipment.
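The arithmetic behind such a comparison can be sketched in a few lines. The 0.5% failure rate and the $500 cost per repair come from the text above; the fleet size and the two AMI failure rates are hypothetical placeholders chosen only for illustration, and the rates are assumed to be annual.

```python
def annual_repair_cost(units: int, failure_rate: float, cost_per_repair: float) -> float:
    """Expected yearly repair spend: units x annual failure rate x cost per repair."""
    return units * failure_rate * cost_per_repair


FLEET_SIZE = 100_000      # hypothetical number of meters in service
COST_PER_REPAIR = 500.0   # total cost per repair used in the article

# Only the 0.5% figure is from the article; the other rates are assumptions.
scenarios = {
    "standard meter (0.5%)": 0.005,
    "hypothetical AMI meter A (2%)": 0.02,
    "hypothetical AMI meter B (5%)": 0.05,
}

for name, rate in scenarios.items():
    repairs = FLEET_SIZE * rate
    cost = annual_repair_cost(FLEET_SIZE, rate, COST_PER_REPAIR)
    print(f"{name}: {repairs:,.0f} repairs, ${cost:,.0f} per year")
```

Under these assumptions, each additional percentage point of failure rate adds $500,000 per year to the cost of supporting the same 100,000 devices.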

Use Failure Rate to Project and Compare Products
Armed with the failure rate of any product, managers can project both the quantity of failures that they are likely to experience as well as the types of parts that they should stock and in what quantities. Most utilities can derive their own failure rates by utilizing tracking that is already in place for repairs to issue work orders and quantity information from asset management or fixed asset accounting systems. In cases where parts details are not readily available, calculating the device failure rate remains enormously valuable.
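As a minimal sketch of that derivation, assume the repair work orders and the installed quantities can each be exported as simple records. The field names, model names, and counts here are hypothetical, not taken from any particular work order or asset system.

```python
from collections import Counter

# Hypothetical work-order records for one reporting period.
work_orders = [
    {"model": "meter_a", "part": "battery"},
    {"model": "meter_a", "part": "display"},
    {"model": "meter_a", "part": "battery"},
    {"model": "meter_b", "part": "radio"},
]

# Hypothetical installed quantities from the asset or fixed asset system.
installed = {"meter_a": 200, "meter_b": 400}


def failure_rates(orders, installed_counts):
    """Repairs per installed unit for each model over the reporting period."""
    repairs = Counter(order["model"] for order in orders)
    return {
        model: repairs[model] / count
        for model, count in installed_counts.items()
        if count > 0
    }


def parts_used(orders):
    """Count of each part consumed, a starting point for stocking levels."""
    return Counter(order["part"] for order in orders)


rates = failure_rates(work_orders, installed)
for model, rate in rates.items():
    print(f"{model}: {rate:.2%} failure rate")
print(parts_used(work_orders).most_common())
```

Where part details are not recorded, the `failure_rates` step still works on its own, since it needs only a model name per work order and an installed count per model.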

By combining the granular elements collected in the cost-per-repair calculation (see illustration, following) managers can project labor, warehousing, support and training needs even into departments not under direct control. Reporting generated in this step is the basis for monitoring and validating product performance over time. Keeping a history of these reports allows later comparison of failure rates, which will identify changes in patterns such as those showing degradation of performance and shorter (functional) useful life expectancies.

This methodology is effective for monitoring any device in any setting. Heavily customized items or items manufactured for unique purposes can be tracked, but the resulting statistics are unlikely to be useful due to lack of a sufficient sample size. Calculating the failure rate of 1,000 of something will quickly reveal problems even with the most highly reliable equipment. The same calculations of 10 of something may take a decade to reveal trends, a timeframe of little use in solving near-term problems.
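The effect of sample size can be illustrated with a rough calculation. This sketch assumes the 0.5% figure quoted earlier is an annual rate and that failures occur independently; both are simplifying assumptions.

```python
def expected_failures_per_year(units: int, annual_failure_rate: float) -> float:
    """Average number of failures expected per year across the population."""
    return units * annual_failure_rate


def years_per_failure(units: int, annual_failure_rate: float) -> float:
    """Average waiting time, in years, between failures in the population."""
    return 1.0 / expected_failures_per_year(units, annual_failure_rate)


for units in (1_000, 10):
    per_year = expected_failures_per_year(units, 0.005)
    wait = years_per_failure(units, 0.005)
    print(f"{units} units: {per_year:.2f} failures/year, one every {wait:.1f} years")
```

With 1,000 units there are several failures to examine every year, while with 10 units a single failure is expected only once in about two decades, far too few events to show a trend.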

Maintenance vs. Warranty
We have observed a tendency for planners to assume that manufacturer warranties will cover all maintenance issues. We take the position that warranty is valuable only to the extent that it covers the cost of the parts or unit.

Most electronic equipment comes with a limited time warranty (usually under a year), during which period the manufacturer will replace the item at no additional cost. Labor is rarely included unless at significant extra cost. Even if warranty part replacement is of sufficient term, returning equipment under warranty involves organizational costs rarely considered. Equipment must be packaged according to the manufacturer’s instructions, a contact must be made to set up an RMA (Return Material Authorization), shipping both back and forth might not be included, and tracking of the RMA must be performed so that the replacement unit can be entered back into the asset system.

Warranty events must therefore be part of a maintenance strategy only for the value of the parts replacement. The additional organizational and labor costs to pull and replace the part or item may once again exceed the value of the initial product. A product with a high failure rate and a good warranty may be a poor choice if a more stable product is available once all costs are taken into account.
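The comparison can be made concrete with a small sketch. Every figure here (fleet size, failure rates, labor and overhead, part price, RMA handling cost) is a hypothetical assumption; the only idea taken from the text is that warranty offsets the part cost but not the labor and organizational cost of each event.

```python
def annual_cost(units, failure_rate, labor_and_overhead, part_cost,
                warranty_covers_part, rma_handling=0.0):
    """Yearly cost of failures when warranty offsets only the part itself."""
    failures = units * failure_rate
    part_expense = 0.0 if warranty_covers_part else part_cost
    # RMA packaging, shipping, and tracking apply only to warranty returns.
    handling = rma_handling if warranty_covers_part else 0.0
    return failures * (labor_and_overhead + part_expense + handling)


# Hypothetical product A: higher failure rate, part replaced under warranty.
cost_a = annual_cost(10_000, 0.03, 450.0, 50.0, True, rma_handling=40.0)
# Hypothetical product B: more stable, no warranty coverage at all.
cost_b = annual_cost(10_000, 0.01, 450.0, 50.0, False)

print(f"Product A (3% failures, warranty): ${cost_a:,.0f} per year")
print(f"Product B (1% failures, no warranty): ${cost_b:,.0f} per year")
```

In this illustration the product with the good warranty costs roughly three times as much to keep in service, because labor dominates each repair.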

Authorization, Training & Certification
Most electronic equipment is sold with an understanding that the warranty from the manufacturer is “null and void” if repairs are made by anyone other than an authorized repair center designated by the manufacturer. This is likely to be another area of unexpected expense. Current personnel will need some additional training to be qualified to service these new devices, and some outside services may need to be engaged to support products not easily repaired by existing staff.

Organizations already supporting hundreds of thousands of electronic devices in the field (such as large deployments of laptop and desktop computers) can provide some guidance on how maintenance support will be different for AMI and SmartGrid products than for the less complex devices that they usually replace.

Action Item # 2 – Calculate Your Own Cost-per-Repair
The attached checklist shows many of the areas commonly (and uncommonly) understood to tally in a total cost per repair analysis. Pinning down an exact number may be challenging, but an allowance needs to be included for every item on the list. Walking through a typical problem from initial call through completed repair shows why each element contributes to the hidden burden of each repair.

The initial problem is usually reported by a phone call into a call center. The call center staff attempts to rectify the problem immediately over the telephone. A large percentage of problems are issues of software or user interface and can be resolved remotely. If the problem appears to be hardware, the call center (help desk, or service desk) will escalate the problem to dispatch service personnel.

Most contracts for technician dispatch are based upon a Service Level Agreement (SLA), which dictates the minimum response time from the time the problem is confirmed by remote diagnosis. There are usually different levels of response required based on different types of severity, and contracts often include financial penalties for failure to respond within the contracted timeframe.

Once the determination has been made that a hardware problem needs attention, a technician is dispatched to diagnose, repair, or replace the problem device. This may not be the device or part identified by the call center. The service technician may require additional or more experienced assistance, different parts or devices, or remote diagnostic help to correct the problem.

Once the problem has been resolved, the technician reports on the damaged parts or equipment used, makes a determination if the problem was caused by user damage (such as vandalism), and accounts for their time. The back office organization feeds the data from the technician back into an RMA (Warranty) system to have any parts under warranty replaced, or updates the equipment inventory so that the correct parts are re-stocked, and ideally closes out the ticket with a tidy description of the actual problem as well as the resolution.

Management uses reporting generated from this cycle to monitor conformity with the SLA, keep track of all expenses, monitor effectiveness of staff, and feed into upper level accounting and billing systems.
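The cycle described above can be tallied step by step. In this sketch the cost categories and dollar amounts are hypothetical allowances, chosen only so that they sum to the $500 per repair used earlier in this article; each utility would substitute figures from its own checklist.

```python
# Hypothetical allowance, in dollars, for each step of a single repair.
repair_cost_items = {
    "call center handling": 25.0,
    "dispatch and scheduling": 15.0,
    "truck roll": 150.0,
    "technician labor": 200.0,
    "replacement part (battery)": 5.0,
    "back office and RMA processing": 45.0,
    "supervision and reporting": 60.0,
}

total_cost_per_repair = sum(repair_cost_items.values())
parts_share = repair_cost_items["replacement part (battery)"] / total_cost_per_repair

for item, cost in repair_cost_items.items():
    print(f"{item:32s} ${cost:7.2f}")
print(f"{'total cost per repair':32s} ${total_cost_per_repair:7.2f}")
print(f"part cost as share of total: {parts_share:.0%}")
```

Under these assumed allowances the part itself is about 1% of the total, which is why an allowance for every item matters more than the price of the component.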

Share and Compare
Utilities have a unique opportunity in the near term to share the failure rate data collected for internal purposes and leverage it to build a database of hardware failure rates. The ability of one utility to learn about itself is limited to those projects already underway. When data is shared, each project in test, scale deployment, or production becomes a piece of the overall puzzle of equipment reliability.

The wider the variety of equipment experience reported from peers, the more informed the decisions. Differences between products, architectures, and environments will be revealed, with the result that the entire industry can drive a focus on reliability, which is essential to AMI and SmartGrid success.

There should be a sense of urgency for this task. The pressures to create SmartGrid and AMI projects are intense and unprecedented. The funding being allocated to stimulate projects is a bounty for those vendors able to prevail in their marketing. In a very short period of time, utilities will be selecting billions of dollars in equipment in near blindness with regard to the equipment reliability. All stakeholders will benefit from good selections, and the opportunity to create the ability to make good selections exists now. Waiting only increases the period of blindness with no benefit.

Executing this type of sharing (aka benchmarking) is impossible without external help. No two utilities keep records in the same manner, and they almost certainly never describe the same products in exactly the same way. Benchmarking efforts are therefore limited to single-use surveys where participants submit data that has already been conformed manually to the syntax required by the survey. Manual efforts will always be too costly and too limited to achieve the depth and scope a shared database should provide.

This manual reporting impasse has been resolved by this solution, in part because the methodology employed was designed to facilitate sharing of disparate and non-standard files for exactly the purpose of comparing products and peers. Moreover, data standardization, organization, calculation, security, and reporting functions for the peer group, in the format of a Data Cooperative, are an integral part of the solution. Members join their peers anonymously, at very modest cost, and benefit from the combined wisdom of their peers at this critical point in time.