Data centers have grown rapidly over the past few years, driven by the widespread use of the internet. They have become a critical part of IT infrastructure as we create and consume ever more data. Both colocation and cloud-based data centers have seen strong growth, although many applications are gradually moving to the cloud.
As these data centers grow, the key metric that drives the design decisions in building and running them is Total Cost of Ownership (TCO). Several TCO models have been proposed in the past, but in a nutshell, the costs break down into infrastructure, server hardware, networking, power and maintenance:
TCO = Ic + Sc + Nc + Pc + Mc
Infrastructure cost (Ic) includes the cost of land, buildings, and power and cooling provisioning assets. Server hardware cost (Sc) comprises the CPU, memory, storage and other capital costs of server acquisition, including hot spares. Memory is a significant portion of the server acquisition cost, and this capital overhead needs to be paired with high-performance cores: less performant cores require more systems to be built for a given performance target, adding even more to the cost of the peripherals. Networking cost (Nc) covers the cost of networking switches, cables, etc. Power provisioning cost (Pc) includes the energy-related costs of the power consumed by the hardware as well as the cooling overhead, both of which are greatly affected by the load running on the systems. Lastly, maintenance cost (Mc) depends on maintenance personnel, security, and planned and unplanned hardware/software upgrades. The infrastructure, server hardware and networking costs (Ic + Sc + Nc) are capital expenditures (CAPEX) for the data center, which need to be amortized over the period of operation with the cost of capital added. The maintenance and power-related costs (Pc + Mc) are the operating expenses (OPEX) of the data center.
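As an illustration, the roll-up above can be sketched in a few lines of Python. All dollar figures, the amortization period, and the cost of capital below are hypothetical; the annuity formula is one common way to fold cost of capital into CAPEX amortization:

```python
# Illustrative sketch of the TCO roll-up: TCO = Ic + Sc + Nc + Pc + Mc,
# with CAPEX (Ic + Sc + Nc) amortized over the operating period.
# All numbers are hypothetical, for illustration only.

def amortized_capex(capex, years, cost_of_capital):
    """Spread CAPEX over the operating period, adding cost of capital
    (standard annuity formula)."""
    r, n = cost_of_capital, years
    return capex * r / (1 - (1 + r) ** -n)

# Hypothetical cost components, in millions of dollars
Ic, Sc, Nc = 40.0, 120.0, 15.0   # CAPEX: infrastructure, servers, networking
Pc, Mc = 25.0, 10.0              # OPEX: power, maintenance (per year)

capex_per_year = amortized_capex(Ic + Sc + Nc, years=4, cost_of_capital=0.08)
tco_per_year = capex_per_year + Pc + Mc
print(f"Annualized TCO: ${tco_per_year:.1f}M")
```

Note how, with server hardware (Sc) dominating the CAPEX side, even modest changes in server cost or lifetime move the annualized TCO substantially.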
Among all of these cost factors, for a typical data center, server hardware dominates the TCO roll-up. The other two factors that play a pivotal role in TCO are infrastructure costs and power efficiency. Lately, power efficiency has seen significant gains through improvements in data center Power Usage Effectiveness (PUE), which is essentially a proxy for how much of the total facility power is used by the IT equipment, and thereby reflects the quality of the data center. PUE averages roughly 1.67 industry-wide, with some facilities approaching much better efficiencies, e.g. 1.10 for a leading hyperscaler data center. On the hardware side, a related metric is Server Power Usage Effectiveness (SPUE), which represents the electrical losses within the server enclosure; it too has improved over the years, going from 1.6 to 1.11 or lower in some cases. In addition to energy efficiency, energy proportionality is another key behavior that needs to be addressed: servers should be efficient not just under full load but across all load levels, with minimal energy used at idle. Server processors are more energy proportional than memory, disk, networking and cooling components, whose energy scaling is much worse at low loads. Maximizing the return on the TCO investment therefore requires running the entire server at higher utilizations, and processors need to be designed and architected to leverage the available memory bandwidth and stay energy efficient in these high-utilization scenarios. It is imperative that both hardware and software optimizations reflect such design choices to enable the best hardware performance in a data center.
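A small sketch shows how the two metrics compose. Assuming the usual definitions (PUE = total facility power / IT equipment power; SPUE = server input power / power delivered to compute components), their product determines what fraction of facility power actually reaches CPUs, memory and disks; the figures below are roughly the industry-average and best-in-class values cited above:

```python
# How PUE and SPUE multiply to determine the useful-power fraction.
# The product PUE * SPUE is sometimes called TPUE (total PUE).

def useful_power_fraction(pue, spue):
    """Fraction of total facility power reaching the compute components."""
    return 1.0 / (pue * spue)

# Roughly industry-average vs. best-in-class figures
avg = useful_power_fraction(pue=1.67, spue=1.6)
best = useful_power_fraction(pue=1.10, spue=1.11)
print(f"average facility:  {avg:.0%} of power does useful work")
print(f"best-in-class:     {best:.0%}")
```

The gap between the two cases (roughly 37% vs. 82% of power doing useful work) is why PUE and SPUE improvements translate so directly into TCO.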
Since server hardware and power efficiency, as noted above, are key levers for reducing TCO, they become the major optimization targets for improving data center efficiency. Within the server hardware, the CPUs are a smaller fraction of the overall purchase price, but they play an outsized role in defining the performance and power burden of the data center. The CPU and DRAM typically account for the bulk of the power at average load/utilization (> 80%), with the CPU as the biggest contributor. Such a power draw is justified only if it comes with a commensurate increase in server CPU performance. What the industry needs is a CPU that delivers the highest absolute performance with best-in-class energy efficiency and proportionality for a given data center size and energy footprint, i.e. for a given TCO. Hence Performance/TCO has become the ultimate figure of merit, accommodating all of the aforementioned factors, for evaluating the performance efficiency of a data center.
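As a minimal sketch of how Performance/TCO works as a comparison metric, consider two hypothetical server fleets sized to similar budgets; all throughput and cost numbers here are invented for illustration:

```python
# Performance/TCO as a figure of merit: aggregate throughput delivered
# per TCO dollar. All numbers are hypothetical.

def perf_per_tco(throughput, annual_tco):
    """Throughput (e.g. requests/s) per annualized TCO dollar."""
    return throughput / annual_tco

# Two hypothetical fleets: a baseline and a more efficient design that
# delivers more throughput at slightly lower annual cost.
baseline = perf_per_tco(throughput=1.00e6, annual_tco=80.0)   # per $M/year
efficient = perf_per_tco(throughput=1.25e6, annual_tco=76.0)
print(f"relative Performance/TCO gain: {efficient / baseline - 1:.0%}")
```

Note that the metric rewards both sides of the ratio: a design wins either by raising delivered performance or by lowering the energy and hardware costs that feed the TCO denominator.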
At NUVIA, we aim to create a server CPU that will provide industry leading CPU performance improving scalability and density while optimizing for energy efficiency to deliver world class Performance/TCO for our customers. Our vision is to reimagine silicon design and create a new class of processors that will power the next era of computing.
- T. Barnett, Jr. et al., “Cisco Global Cloud Index: 2015-2020”, 2016.
- L. A. Barroso et al., “The Datacenter as a Computer”, 3rd Edition, Synthesis Lecture on Computer Architecture, 2018.
- C. D. Patel et al., “Cost model for planning, development and operation of a data center”, HP Labs publication, 2005.
- Y. Cui et al., “Total cost of ownership model for data center technology evaluation”, 16th IEEE ITHERM Conference, 2017.
- D. Hardy et al., “An analytical framework for estimating TCO and exploring data center design space”, IEEE ISPASS, 2013.
- N. Rasmussen, “Determining total cost of ownership for data center and network room infrastructure”, White Paper 6, 2011.
- A. Shehabi et al., “United States Data Center Energy Usage Report”, LBNL-1005775, June 2016.
- A. Shehabi et al., “Data center growth in the United States: decoupling the demand for services from electricity use”, Environmental Research Letters, 2018.
- Climate Savers Computing Initiative, efficiency specifications. http://www.climatesaverscomputing.org/about/tech-specs
- B. Grot et al., “Optimizing datacenter TCO with scale-out processors”, IEEE Micro, special issue on energy-aware computing, 2012.