Use Cases and Limitations FAQ


The purpose of this article is to describe various use cases, capabilities and limitations of the NHDC. The Q&A will be revised as we learn more about the as-built operation of NHDC. Last revision: 3/14/2017 by K. Grier.

Power Scenarios

Q: What happens when a Utility (Edison) Power Failure event happens?

 A: This means Edison electrical power to NHDC has failed. Power to other parts of the campus may or may not be affected. NHDC enters a mode where the onsite generator provides power to the UPS systems and critical HVAC (Heating, Ventilation and Air Conditioning) components.

  1. Utility Power rack PDUs (Power Distribution Units, i.e. outlet strips) will lose power immediately. Customer equipment powered solely by a Utility Power PDU will lose power until power is restored. NHDC will also lose management access to Utility Power PDUs.
  2. UPS Power rack PDUs continue to provide power to customer equipment.
  3. The on-site generator is expected to start within 1 minute. The generator provides power to the UPS system, backup electric chillers, UPS room air handler and selected data center floor air handlers.
  4. As of Winter 2016 occupancy, the backup HVAC equipment powered by the generator can control ambient heat buildup in the customer spaces of NHDC.
  5. As of Winter 2016 occupancy, the central UPS system can power all customer IT equipment hosted at NHDC while on generator power.
  6. We have created categories of equipment priority in terms of the need to remain operational during Utility power failures if we need to shed either electrical or heat load. At this time they are:
  • Critical 1 (C1) - equipment needed to manage NHDC itself and campus core networking equipment
  • Critical 2 (C2) - customer equipment of high value to campus-wide operations, e.g. a heart-lung machine or Life / Public Safety equipment.
  • Non-Critical (NC) - everything else

As the room heats up, at to-be-determined thresholds we will command power outlets off in the order of NC, C2, C1. We expect to deploy signaling to customers advising on status and requesting shutdown, but it is not yet in production. We do make at-the-rack intake air temperature readings available at http://nhdcstatus.ets.ucsb.edu/, which customers can poll for alerts and use to self-initiate shutdown if they desire.
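The shedding order described above can be sketched as a simple sort by priority. This is an illustration only, not NHDC's automation: the `Priority` enum, the `shed_order` function and the outlet names are all hypothetical, and the actual temperature thresholds remain to be determined.

```python
from enum import IntEnum


class Priority(IntEnum):
    # Lower value is shed first, matching the NC, C2, C1 order in the text.
    NC = 0
    C2 = 1
    C1 = 2


def shed_order(outlets):
    """Return outlet names in the order they would be commanded off.

    `outlets` maps a (hypothetical) outlet name to its Priority.
    Ties within a category are broken alphabetically for repeatability.
    """
    ranked = sorted(outlets.items(), key=lambda item: (item[1], item[0]))
    return [name for name, _priority in ranked]


# Hypothetical inventory used only to exercise the function.
outlets = {
    "core-switch": Priority.C1,
    "compute-17": Priority.NC,
    "safety-db": Priority.C2,
    "compute-03": Priority.NC,
}

for name in shed_order(outlets):
    print(name, outlets[name].name)
```

Non-Critical outlets come first, then C2, with C1 held until last.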

Q: What happens if Utility Power fails and the generator does not start?

A:  "Houston, we have a problem." As stated above, the HVAC system is powered by the generator, not the UPS system. Without generator power the UPS room will heat rapidly, much more so than the NHDC customer equipment spaces. We will have approximately 15 minutes before reaching heat levels damaging to the UPS battery system. This scenario represents a "crash dive" event  where we will need to EPO (Emergency Power Off) equipment running in the facility and the UPS systems.  At this time we do not have the ability to signal customer equipment of a "crash dive" event. Customer equipment will simply lose power as we command power outlets down.

Update: In August 2012 we performed a UPS HVAC test that provides more information on this topic, sufficient to use for more precise planning. You may view the published results at NHDC UPS Room HVAC Load Bank Testing - August 2012

Q: What happens if a UPS fails?

A: NHDC has two UPS units (UPS1 and UPS2) providing a combined usable capacity of 162 kW (81 kW per UPS). They are installed as independent power distributions, i.e. there is no load bus sync. A device powered solely by a failing UPS unit will lose power if the UPS cannot enter bypass mode and thereby transfer load to Utility Power. Normally a UPS will transfer load to bypass when it detects a problem. However, if Utility Power is unavailable due to a power outage, then the loads on that UPS will lose power. Currently, a dual-power-supply device may have a power feed from both UPS and Utility Power, and can sustain a failure of a UPS unit as long as Utility Power is available. As of Fall 2015 we are providing additional UPS power feeds to row locations. This can provide dual-UPS power for critical systems.
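The failure cases in this answer can be written out as a small truth function. It is a sketch of the reasoning only: the feed labels (`"UPS1"`, `"UPS2"`, `"UTILITY"`), the function name and the `bypass_ok` flag are assumptions made for illustration, and a healthy UPS is assumed to keep supplying its loads.

```python
def stays_powered(feeds, failed_ups, utility_available, bypass_ok=True):
    """Return True if a device keeps at least one live power feed.

    feeds: set of feed labels for the device, e.g. {"UPS1", "UTILITY"}.
    failed_ups: label of the UPS that has failed, e.g. "UPS1".
    utility_available: whether Utility Power is present.
    bypass_ok: whether the failed UPS managed to enter bypass mode.
    """
    for feed in feeds:
        if feed == "UTILITY":
            live = utility_available
        elif feed == failed_ups:
            # Bypass hands the load to Utility Power, so it needs both
            # a successful transfer and Utility Power being present.
            live = bypass_ok and utility_available
        else:
            # Any other UPS is assumed healthy in this sketch.
            live = True
        if live:
            return True
    return False


# Single-fed device on the failing UPS during a utility outage: loses power.
print(stays_powered({"UPS1"}, "UPS1", utility_available=False))
# Dual-UPS device: the second UPS carries it.
print(stays_powered({"UPS1", "UPS2"}, "UPS1", utility_available=False))
```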

Q: Who determines if my gear is C2 or NC? 

A: This will be a joint determination between NHDC and the equipment owner, though the ultimate decision rests with NHDC. We understand that nearly all hosting customers will consider their gear critical to some function they provide. Yet the simple fact is that we cannot provide emergency power and/or cooling to everything that may eventually be in NHDC. The NHDC design goal was that 10% of the potential 2MW of load would be designated Critical IT Equipment, provided UPS power with generator backup, AND cooled via the Critical Cooling infrastructure. The other 90% would be non-critical and be on filtered Utility Power as well as ambient cooling. 

That said, as of Winter 2016 our UPS systems remain below their capacity and we have reserve thermal capacity, so all customer equipment coming into NHDC is being treated as C2 operationally, even though it is all coded as NC. As either UPS reaches 25% capacity, we will visit the Critical / Non-Critical determination with our customers. 

Q: Why do we have to visit the Critical / Non-Critical determination at only 25% of UPS Capacity? Isn't that pretty low?

A: First some background: the NHDC design criteria envisioned that only Critical IT Equipment would be on UPS power (our C1 and C2 designations). In this model, a Critical Server with dual power supplies would have one power supply on UPS1, and the other on UPS2. Should either UPS fail, then all load would go to the remaining UPS. Therefore neither UPS1 nor UPS2 could be loaded over 50%, as the failure of the other UPS would take the remaining one to 100% load. Since we do not want to load a UPS over 90%, we are limited to 45% load on each UPS unit, which is 81 kW per UPS, 162 kW for the pair. The HVAC cooling for the UPS room is designed around the 50% loading per UPS figure.
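The arithmetic behind the 45% figure can be checked in a few lines. The 90% cap and the 81 kW per UPS come from the text; the per-UPS rating printed below is inferred from those two numbers and is not stated in the source. Function names are illustrative.

```python
MAX_LOAD_FRACTION = 0.90   # "we do not want to load a UPS over 90%"
USABLE_KW_PER_UPS = 81.0   # usable capacity per UPS, from the text


def per_ups_limit(max_load_fraction):
    """Steady-state limit per UPS in a two-UPS pair.

    If one UPS fails, the survivor carries both shares, so each unit
    may only carry half of the cap in normal operation.
    """
    return max_load_fraction / 2


def implied_rating_kw(usable_kw, limit_fraction):
    """Rating implied by a usable capacity and its fraction (an inference)."""
    return usable_kw / limit_fraction


limit = per_ups_limit(MAX_LOAD_FRACTION)
rating = implied_rating_kw(USABLE_KW_PER_UPS, limit)

print(f"per-UPS limit: {limit:.0%}")
print(f"implied rating per UPS: {rating:.0f} kW (inferred, not from the source)")
print(f"usable capacity for the pair: {2 * USABLE_KW_PER_UPS:.0f} kW")
```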

No one really wants their IT equipment to lack a minimum of UPS protection, especially if capacity is initially available. Therefore NHDC Governance directed that all incoming customer equipment would receive UPS power until such time as that was no longer feasible. Our task was how to do that near-term, knowing what the design goal of NHDC was, and knowing where we would likely end up. To maximize UPS utilization today, most racks have a UPS power feed from one UPS (not both) and a Utility Power feed. Should Utility Power fail, all load will then transfer to the UPS. Therefore a UPS that is normally at 25% load will see its load jump to 50% if Utility Power fails (this assumes load is normally split about evenly between the two feeds). That is why at 25% UPS loading, we may need to start determining what equipment is Critical and Non-Critical. As of Fall 2015, we have provisioned dual UPS power and Utility Power to rows 7 and 9 to increase our power options in those locations. Rows 1-4 also received Utility Power feeds to supplement their UPS-only feeds coming out of the renovation.
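The jump from 25% to 50% follows if dual-corded equipment draws roughly half of its power from each feed. A minimal sketch under that assumption (the even split and the function name are assumptions, not measured NHDC behavior):

```python
def load_after_utility_failure(ups_fraction, share_on_ups=0.5):
    """Estimate UPS load once Utility Power is lost.

    ups_fraction: UPS load as a fraction of its rating in normal operation.
    share_on_ups: portion of each device's draw normally taken from the
        UPS feed (0.5 assumes an even split between the two feeds).
    """
    if not 0 < share_on_ups <= 1:
        raise ValueError("share_on_ups must be in (0, 1]")
    return ups_fraction / share_on_ups


for normal in (0.10, 0.25, 0.45):
    after = load_after_utility_failure(normal)
    print(f"normal {normal:.0%} -> on utility failure {after:.0%}")
```

With an even split, a UPS running at 45% would reach the 90% cap when Utility Power fails, which is why the conversation with customers needs to start well before that point.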

Q: Are there other UPS alternatives that NHDC can provide, as you prohibit customer in-rack UPS systems?

A: The NHDC design criteria really saw Non-Critical as cluster compute nodes, with the head nodes being Critical and on UPS protection. This is in fact the model that the CNSI Dell Cluster used when hosted in NHDC prior to the renovation, and it is what other UCs like UCB are moving to as they reach capacity on their UPS systems. As NHDC strives to meet the needs of Academic, Administrative and Research computing, we will likely have more equipment needing UPS protection than the design criteria envisioned. To that end, we anticipate that NHDC will select and deploy on-floor, row-based UPS systems that are managed by NHDC to provide additional UPS power, though not generator backed. Having NHDC provide and manage these systems will allow for standardization and incorporation into our signaling mechanisms.

Q: Tell me more about what type of signaling NHDC will provide to hosted systems?

A: This has proven to be more of a challenge than we had hoped. Our initial idea of using APC's PowerChute Network signaling was stymied by the fact that all clients accessing a "sentinel" APC UPS require administrative access to the UPS; there is no "read only" access. After exploring open and closed source APC software implementations, and discussions with APC development, we learned there were no plans to do otherwise and we abandoned that path.

As of December 2014 we have made available a parseable test report of rack intake air temperatures that customers may poll to determine if their equipment is potentially impacted by high temperature conditions and may ultimately be powered off by NHDC failsafe automation. Please visit our status information at:

http://nhdcstatus.ets.ucsb.edu/
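A customer polling these readings will probably want to act on a sustained rise rather than a single spike. The sketch below shows one way to make that decision once readings have been fetched and parsed; it deliberately does not parse the status report, whose format is not described here. The function name, the 88F threshold and the three-reading window are hypothetical choices for illustration.

```python
def should_self_shutdown(readings_f, threshold_f=88.0, consecutive=3):
    """Decide whether to begin a graceful shutdown.

    readings_f: intake temperatures in Fahrenheit, oldest first, as
        collected by the customer's own polling job.
    threshold_f: customer-chosen trigger, set below the point at which
        the facility would cut power (hypothetical default).
    consecutive: how many of the most recent readings must all be at or
        above the threshold before acting.
    """
    if consecutive < 1:
        raise ValueError("consecutive must be at least 1")
    if len(readings_f) < consecutive:
        return False
    recent = readings_f[-consecutive:]
    return all(temp >= threshold_f for temp in recent)


history = [79.5, 84.0, 88.2, 89.1, 89.7]
if should_self_shutdown(history):
    print("sustained high intake temperature: start graceful shutdown")
```

Choosing a threshold below the facility's own setpoints leaves time for an orderly shutdown before outlets are commanded off.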

As of Winter 2016 we are moving to APC Power Distribution Units and APC StruxureWare for Data Center Infrastructure Management. It is TBD if StruxureWare will provide a reasonable signaling method for customers.

HVAC (Heating, Ventilation, Air Conditioning) Scenarios

Q: What happens if NHDC has an undetected / unmitigated cooling failure, i.e. has an "FUO - Fever of Unknown Origin", and gets too hot?

A: Our PDUs provide a "Smart Load Shedding" feature where we set a maximum permissible temperature. When a PDU reaches this maximum temperature "setpoint" the PDU will alert us and shut down the outlets controlled by that PDU. The intelligence to do this is contained entirely within the rack PDU and requires no external management station. We shut down power outlets when a rack inlet air temperature of 92F is reached, which exceeds the 89.6F maximum allowable inlet temperature under the ASHRAE standards. More specifically, at 90F we shut down the UPS power PDU, at which point a dual power supply device will transfer all load to the Utility power PDU. At 92F the Utility power PDU is shut down. The temperature sensor is placed to detect inlet air temperature, on the cold aisle side, mid rack. The temperature setpoints are subject to revision based on operational requirements.
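The two setpoints can be expressed as a small state function, along with the Fahrenheit-to-Celsius conversion that relates 89.6F to 32C. This is a sketch of the logic as described, not the PDU firmware: the names are illustrative, the comparison is assumed to trigger at or above each setpoint, and any latching or restart behavior of the real PDUs is ignored.

```python
UPS_PDU_OFF_F = 90.0       # UPS power PDU shuts down at this inlet temperature
UTILITY_PDU_OFF_F = 92.0   # Utility power PDU shuts down at this inlet temperature


def f_to_c(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def pdu_state(inlet_f):
    """Return which rack PDUs are still energized at a given inlet temperature."""
    return {
        "ups_pdu_on": inlet_f < UPS_PDU_OFF_F,
        "utility_pdu_on": inlet_f < UTILITY_PDU_OFF_F,
    }


for temp in (85.0, 90.0, 92.0):
    state = pdu_state(temp)
    print(f"{temp:.1f}F ({f_to_c(temp):.1f}C): {state}")
```

Between the two setpoints a dual-corded device runs entirely from the Utility power PDU; a device fed only from the UPS PDU is already off.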

High Availability Scenarios

Q: What can I do with my equipment to run through a Gap Fire type event?

A: In an extended Utility Power Failure event NHDC equipment will be running on generator power until the temperature in NHDC rises and non-critical equipment, and possibly critical equipment, must be shut down. As of Winter 2016 all equipment housed at NHDC can be supported by the Critical Cooling system. Through the use of at-the-rack heat exchanging door technology connected to NHDC's critical cooling chilled water loop, cooling can be provided to equipment. NHDC is designed with the ability to use this technology with minimal additional facility work. However, there are significant cost considerations to implementing the technology, so the determination to use it for high availability should involve a cost-benefit, availability and risk analysis. Below is a photo of the POC system for NHDC deployed in production at SAASB during the NHDC renovation.

Maintenance Scenarios

Q: How does NHDC handle the periodic Preventive Maintenance (PM) necessary to the Physical Plant, i.e. HVAC, electrical, fire suppression, etc?

A: The physical plant equipment in NHDC has a manufacturer-defined maintenance schedule that must be adhered to in order to maintain warranty, ensure proper operation and provide a long service life. In contrast to a run-until-break maintenance model, NHDC mechanical equipment receives monthly, quarterly, semi-annual and annual Preventive Maintenance. PM will be scheduled in advance and posted on the public NHDC Calendar for customer review. PM that is not expected to have a material impact on NHDC service, e.g. maintenance on one of the five CRAH units, may be scheduled during business hours and will not be announced beyond the calendar. PM that may have a material impact on NHDC service, e.g. UPS service, will be scheduled outside business-day hours if possible, and announced to NHDC-L as well as posted to the calendar.

Outage Events

Q: Do you have a history / reports of significant outages or operational impacts at NHDC and are they publicly available?

A: Yes, at the link below you will find reports on major outages / incidents at NHDC.