Eric Stockton, Director
Stockton Infrared Thermographic Services, Inc.
8472 Adams Farm Road
Randleman, NC 27317
Ph: 800-248-7226
www.compuscanir.com
The economic impact of downtime at data centers is the highest in the commercial sector. Therefore, the return on investment of predictive maintenance activities, especially infrared, is excellent. These information technology customers want extreme reliability from their equipment, which they call “availability.” In the last year or so, they have also become interested in more efficient cooling of the server racks, so that more servers can be operated per square foot of floor space. This paper outlines a systematic approach to using infrared thermography to check many different aspects of computer center operations.
The Business Impact of Downtime at Data Centers
Downtime in these facilities is not an option. Infrared thermography is being utilized for regular electrical switchgear surveys, optimizing of cooling systems and servers, and commissioning of all electrical equipment, including UPS modules, PDU (power distribution unit) equipment and computer servers. Many construction project specifications have infrared surveys as a requirement before the building is turned over to the owner.
Data center infrared thermography must have total accountability for all infrared data in the commissioning process, regardless of whether or not there are problems. This accountability can be achieved by documenting all equipment inspected with time, date, location, and equipment condition. The thermographer must create a data log and record the infrared video onto a digital storage device of some type. New technologies in data acquisition and report preparation will make historical data (images previously taken) available for comparison. This will enable the thermographer to more closely compare circuit boards and other UPS equipment with previously acquired images. If something fails or causes downtime in the system, an IR image of that component may be referenced to document that the equipment was operational, at thermal steady-state and in acceptable condition when the survey was made.
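The data log described above can be sketched as a simple record structure. This is only an illustration: the field names, the `InspectionRecord` class and the `write_log` helper are assumptions made for the example, not part of any particular reporting package.

```python
import csv
import io
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass
class InspectionRecord:
    """One logged observation. Field names are illustrative assumptions."""
    timestamp: datetime   # date and time of the observation
    location: str         # room or area where the equipment sits
    equipment_id: str     # label of the inspected item
    condition: str        # e.g. "acceptable" or "anomaly"
    image_file: str       # stored IR image or video clip for this item
    notes: str = ""


FIELDS = ["timestamp", "location", "equipment_id", "condition", "image_file", "notes"]


def write_log(records):
    """Return CSV text with one row per inspected item, problems or not."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        row = asdict(record)
        row["timestamp"] = record.timestamp.isoformat()
        writer.writerow(row)
    return buffer.getvalue()
```

Logging every item, including those in acceptable condition, is what allows a later image lookup to show that a component was operating normally when it was surveyed.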
Table 1: Uptime and Maximum Downtime [1]
|Availability Level||Uptime||Maximum Downtime per Year|
|six nines||99.9999%||31.5 seconds|
|five nines||99.999%||5 minutes, 15 seconds|
|four nines||99.99%||52 minutes, 33 seconds|
|three nines||99.9%||8 hours, 46 minutes|
|two nines||99.0%||87 hours, 36 minutes|
|one nine||90.0%||36 days, 12 hours|
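The downtime figures in Table 1 follow directly from the uptime percentage. A minimal sketch of the arithmetic, assuming a 365-day year (the function names are made up for this example):

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s, leap years ignored


def max_downtime_seconds(uptime_percent):
    """Maximum downtime per year, in seconds, for a given uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return SECONDS_PER_YEAR * (100.0 - uptime_percent) / 100.0


def split_duration(seconds):
    """Break a duration into whole (hours, minutes, seconds)."""
    whole = int(round(seconds))
    hours, remainder = divmod(whole, 3600)
    minutes, secs = divmod(remainder, 60)
    return hours, minutes, secs
```

For example, `split_duration(max_downtime_seconds(99.0))` returns `(87, 36, 0)`, matching the "two nines" row.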
Table 2: Data Center Downtime Losses
|Industry Sector||$ Revenue / Hour|
Source: Media Group
Estimates for other industries provide a cross-check. A 2004 survey, for instance, put losses on brokerage operations at $4,500,000/hour, banking operations at $2,100,000/hour, media operations at $1,150,000/hour and e-commerce operations at $113,000/hour. Retail operations trailed at $90,000/hour. Outages can also affect share value. For example, eBay’s outages in 1999 saw shares temporarily drop by over 26 percent, while E*Trade’s similar problems saw a 22 percent temporary drop [2].
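Combining the hourly loss figures above with an uptime level gives a rough upper bound on yearly exposure. The sketch below assumes losses scale linearly with downtime and that the full downtime allowance is used, both of which are simplifications. The hourly figures are the ones quoted in the text, while the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, leap years ignored

# Hourly revenue losses quoted from the 2004 survey in the text, in US dollars.
HOURLY_LOSS_USD = {
    "brokerage": 4_500_000,
    "banking": 2_100_000,
    "media": 1_150_000,
    "e-commerce": 113_000,
    "retail": 90_000,
}


def max_annual_loss_usd(sector, uptime_percent):
    """Loss if the whole yearly downtime allowance for this uptime is used."""
    downtime_hours = HOURS_PER_YEAR * (100.0 - uptime_percent) / 100.0
    return HOURLY_LOSS_USD[sector] * downtime_hours
```

Under these assumptions, a brokerage operation running at 99.0% uptime could be down for 87.6 hours per year, so `max_annual_loss_usd("brokerage", 99.0)` comes to about $394 million.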
IR Commissioning of Data Center Equipment
The commissioning process should include these types of equipment and considerations. The following infrastructure support equipment should be tested:
- Cooling systems, including chillers and all HVAC equipment
- CRAC (computer room air-conditioning) units
- All associated switchgear
- Emergency diesel generator systems
- ATS (automatic transfer switch) equipment
- UPS modules
- Resistive load banks and associated cables/connectors
- Static transfer switches
- Rotary UPS system, if applicable
- Battery banks, breakers and charging systems
- Transformers (utility and site)
- PDU (power distribution unit) equipment
- All distribution electrical panels
The following considerations apply during commissioning:
- All normal switchgear and electrical panels need to be checked under load
- Test generator leads and emergency source for the automatic transfer switches under load
- Resistive load banks must be attached to the PDUs and tested with increasing load percentages
- Each UPS module must be tested independently, including a full load battery test
- UPS battery connections and individual battery cells should be checked during and after discharge
- Rotary UPS systems must be checked during operation. Rotary systems use the same rectifier technology as static topologies on the front end to convert AC to DC, but use spinning motor-generators to recreate the sine wave on the output.
- Each PDU must be tested on both the preferred and alternate sources as well as in each respective bypass
- All normal transfers should be verified operable
- PDU distribution breakers must be checked after they are put into service on the panel boards
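The load bank testing with increasing load percentages mentioned above could be planned with a small helper like the one below. The step percentages (25, 50, 75, 100) and the rated capacity in the example are assumptions for illustration only; the project specification would dictate the actual steps.

```python
def load_steps(rated_kw, percentages=(25, 50, 75, 100)):
    """Return (percent, kW) pairs for a stepped load bank test.

    The default percentages are illustrative, not taken from any standard.
    """
    steps = []
    previous = 0
    for pct in percentages:
        if not 0 < pct <= 100:
            raise ValueError("each percentage must be in the range (0, 100]")
        if pct <= previous:
            raise ValueError("percentages must be strictly increasing")
        steps.append((pct, rated_kw * pct / 100))
        previous = pct
    return steps
```

For a hypothetical 150 kW PDU, `load_steps(150)` yields setpoints of 37.5, 75, 112.5 and 150 kW, and an infrared scan would be taken once the equipment reaches thermal steady-state at each step.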
Causes for Electrical Failure and Downtime in Data Centers [3]
The critical power distribution system takes conditioned power from the UPS and distributes it throughout the facility to individual loads. Most site failures occur in areas where hot electrical work is required and physical maintenance is difficult to perform.
Typical causes for failures include:
- Cover slipped while accessing load panel
- Overheated breakers tripped unexpectedly
- Wires were not physically secured under screws
- Screws were not torqued adequately
- Wires or circuit breaker handles were dislodged while adjacent work was being performed
- Screws were stripped
- Insulation was skinned, causing faulted wires
- Rotations were reversed
Infrared Applications for Servers and Server Racks
Ten percent of all server racks currently in service are too hot to meet industry standards for maximum IT reliability and performance [4].
“Institute research into computer room cooling indicates 1/3 all perforated tiles are incorrectly located and 60% of all available cooling capacity is being wasted by bypass airflow. Increasing under-floor static pressure to get air where it needs to go requires permanently blocking all unnecessary air escape routes. This includes sealing cable cutouts behind and underneath products or racks (this unmanaged airflow is what is really cooling most computer rooms) as well as the penetrations in the floor or walls or ceiling and any other openings in the raised floor. Perforated floor tiles with 25% openings can be replaced with 40% and 60% grates to permit a much higher airflow. For sites with unused raised floor space deliberately spreading equipment out to create white space and reduce the averaged gross watts per square foot power consumption will be a viable option.”
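Two quantities in the quoted passage reduce to simple arithmetic: the averaged gross watts per square foot, and the cooling capacity that remains after bypass airflow. The sketch below is a rough illustration with hypothetical function names, and all example figures other than the 60% bypass fraction are made up.

```python
def gross_watts_per_sq_ft(total_it_watts, floor_area_sq_ft):
    """Average power density over the whole raised-floor area."""
    if floor_area_sq_ft <= 0:
        raise ValueError("floor_area_sq_ft must be positive")
    return total_it_watts / floor_area_sq_ft


def delivered_cooling_kw(installed_cooling_kw, bypass_fraction):
    """Cooling that reaches the equipment after bypass airflow losses."""
    if not 0.0 <= bypass_fraction <= 1.0:
        raise ValueError("bypass_fraction must be between 0 and 1")
    return installed_cooling_kw * (1.0 - bypass_fraction)
```

Spreading a hypothetical 200,000 W load from 2,000 to 4,000 square feet halves the density from 100 to 50 watts per square foot. With the 60% bypass figure quoted above, only about 40 kW of every 100 kW of installed cooling would reach the equipment.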
Server infrared applications include:
- Thermally mapping complete data center from sub-floor to ceiling
- Verifying proper hot aisle / cold aisle operation, preventing short circuiting and bypassing of air flow
- Verifying high density server farm cooling capabilities
- Monitoring server rack temperature distribution patterns
- Finding internal server fans which are inoperable or damaged
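Monitoring rack temperature distribution can be sketched as a check over temperatures read from the thermal images. The example assumes each rack has a list of inlet temperatures in °C taken at several heights. The temperature limit is passed in as a parameter because this paper does not state one, and the function names are hypothetical.

```python
def summarize_rack(temps_c):
    """Return the hottest reading and the bottom-to-top spread for one rack."""
    if not temps_c:
        raise ValueError("at least one temperature reading is required")
    return {"max_c": max(temps_c), "spread_c": max(temps_c) - min(temps_c)}


def find_hot_racks(rack_inlet_temps_c, limit_c):
    """Return sorted IDs of racks whose hottest inlet reading exceeds limit_c.

    limit_c is whatever threshold the site follows; none is assumed here.
    """
    hot = []
    for rack_id, temps in rack_inlet_temps_c.items():
        if temps and max(temps) > limit_c:
            hot.append(rack_id)
    return sorted(hot)
```

A large spread between the bottom and top readings of a rack often suggests that hot exhaust air is recirculating to the upper inlets, which is the kind of short-circuiting the hot aisle / cold aisle check is meant to catch.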
Of course, the thermographer must comply with all OSHA and NFPA 70E requirements. The good news is that, unlike most industrial sites, switchgear rooms and data centers have controlled temperatures and low humidity, which makes wearing arc flash suits and associated safety equipment much less onerous for the thermographer.
How Does a Thermographer Become “Qualified” and Obtain Contracts to do Data Center Thermal Survey Work?
First, the thermographer must understand the critical nature of the equipment being tested as well as the surrounding equipment. Furthermore, he/she should understand that the work he/she is performing is critical and vital to the operation. A thermographer wanting to do this type of work should get general training and certification on electrical switchgear and also get specific training on data center equipment. He/she should contact UPS vendors and their clients and cultivate relationships with them.
Since this work has a high accountability, the methodology for performing the surveys and creating the reports must be “upgraded” from the typical office building or factory. This means the thermographer must use a high resolution, radiometric and sensitive thermal imager and learn how to record all thermal, visual and textual data by using a detailed data logging system. Also, data center specific work schedules often include nighttime maintenance windows from Saturday midnight until Sunday morning. Therefore, the thermographer must get used to working during off-peak times.
We know that large companies commission all data center equipment, but do smaller companies have UPS and server systems? Absolutely! To complete the commissioning process successfully and maintain the systems, large and small companies alike must find thermographers who are close by and have experience in critical facility activities.
A thermographer interested in providing these services must be commercially available to the UPS, electrical and facilities maintenance contractors. Having a great professional reputation with no accidents or system failures is essential to being the preferred thermographer for data center infrared work. What these infrared service clients want are the most professional, experienced and qualified thermographers in the electrical infrared industry.
1 Hiles, Andrew, (2004) “Five Nines: Chasing the Dream?” Continuity Central (12/18/06)
2 Hiles, Andrew, (2004) “Five Nines: Chasing the Dream?” Continuity Central (12/18/06)
3 UpTime Institute, (2006) “Procedures and Guidelines for Safely Working in an Active Data Center”, pg. 9, UpTime Institute (12/18/06)
4 Brill, Kenneth, (2006) “2005-2010 Heat Density Trends in Data Processing, Computer Systems, and Telecommunications Equipment: Perspectives, Implications and the Current Reality in Many Data Centers” pg. 13 UpTime Institute (12/18/06)
Eric R. Stockton received a BA in Zoology from the University of North Carolina at Chapel Hill in 1982. He was an environmental consultant for Carolina Power and Light’s Shearon Harris Nuclear Power Plant for 14 years before becoming Vice President of Stockton Infrared Thermographic Services, Inc. He now manages the CompuScanIR™, ElectriScanIR™ and ConnectIR™ divisions.