Business and technology challenges in the age of participation

December 27, 2005

Anil Valluri, Country Director, Client Solutions Organisation, Sun Microsystems India Pvt. Ltd.

Businesses today are increasingly defined by their applications, and now more
than ever an organization’s prospects for success are tied to its ability to
deploy technology in an agile and effective fashion. The stakes are high. In
today’s competitive and highly regulated business environment, the cost of
technology failure can be swift and severe. Even small lapses in IT competence
can result in widespread damage and loss.


Adding to the pressure, an endless variety of new networked devices and users
are demanding ever-higher levels of performance, capacity, availability, and
security from the applications and services that serve them. Real estate
concerns, along with very real and rising energy costs for both power and
cooling, are now significant factors that discourage merely adding endless
racks of traditional servers. The cost and complexity of managing very large
numbers of systems is another pressing concern, especially when coupled with
the very low levels of utilization typically found in traditional
infrastructure.


To respond to these myriad challenges, businesses must:

Increase application throughput along with capacity and performance to address
pressing business needs as well as capture new customers and opportunities

Reduce power, cooling, and real estate costs both to save money and to enable
necessary growth and scalability

Maintain application compatibility and enhance security across the organization
to preserve investments and limit risks to the firm and its clientele

Far more than mere packaging concerns, these issues go to the very heart of the
technology used to design processors, systems, and applications. Processor
design in particular can have enormous ramifications for business-level issues
and solutions. Unfortunately, traditional high-frequency, single-threaded
processors are yielding diminishing returns.

Even with ever-higher clock rates, these processors are producing only small
improvements in real-world application performance. At the same time, these
high-frequency processors generate escalating costs in the form of higher levels
of power consumption, and significantly higher levels of heat load that must be
addressed by multiple large and expensive HVAC systems. With economic and
competitive pressures at an all-time high, most understand that significant
change is needed.

Diminishing Returns of Complex Processor Design

While optimistic marketing statements constantly call attention to
presumably impressive multiple-gigahertz frequencies and high levels of cache
for new generations of processors, corresponding small gains in real-world
system performance and productivity continue to frustrate IT professionals.

Throughput Computing, along with Sun’s focus on optimizing real workload
performance, is designed to help resolve these divergent trends. This approach
provides higher levels of delivered performance and computational
throughput while greatly simplifying the data center. Understanding the
importance of throughput computing requires a look at how both processors and
systems have been designed in the past, and the trends that are defining better
ways forward.

The oft-quoted tenet of Moore’s Law states that the number of transistors that
will fit in a square inch of integrated circuitry approximately doubles
every two years. For over three decades the pace of Moore’s Law has held,
driving processor performance to new heights. Processor manufacturers have long
exploited these chip real estate gains to build increasingly complex processors,
with instruction-level parallelism (ILP) as a goal.
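The compounding at work in Moore’s Law is easy to sketch. The snippet below is a minimal illustration (with a purely hypothetical starting density) of how a doubling every two years plays out over a decade:

```python
def moore_density(initial, years, doubling_period=2):
    """Project transistor density after `years`, doubling every
    `doubling_period` years (the classic Moore's Law cadence)."""
    return initial * 2 ** (years / doubling_period)

# Starting from a hypothetical 1 million transistors:
print(moore_density(1_000_000, 10))  # five doublings in a decade -> 32 million
```

Five doublings in ten years is a 32-fold gain, which is the chip real estate budget that designers have historically spent on ever more elaborate single-pipeline machinery.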

Today these traditional processors employ very high frequencies along with a
variety of sophisticated tactics to accelerate a single instruction pipeline,
including:

Large caches

Superscalar designs

Out-of-order execution

Very high clock rates

Deep pipelines

Speculative pre-fetches

While these techniques have produced faster processors with impressive-sounding
multiple-gigahertz frequencies, they have largely resulted in complex, hot, and
power-hungry processors that serve neither many modern applications nor the
constraints of today’s data centers. In fact, many of today’s data center
workloads are simply unable to take advantage of the hard-won ILP provided by
these processors. As shown in Table 1, applications with high shared-memory and
data requirements are typically more focused on processing a large number of
simultaneous threads (thread-level parallelism) than on running a single
thread as quickly as possible (ILP).
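The throughput advantage of thread-level parallelism can be sketched with a toy model (all numbers assumed for illustration): each request needs some pipeline time plus a much longer memory stall, and while one thread waits on memory, other hardware threads can use the pipeline.

```python
def throughput(threads, compute=15.0, stall=85.0):
    """Completions per unit time for `threads` interleaved hardware threads.

    Each request needs `compute` units of pipeline time and `stall` units
    waiting on memory; one thread's stalls overlap with others' compute,
    until the pipeline itself is saturated.
    """
    utilization = min(1.0, threads * compute / (compute + stall))
    return utilization / compute

single = throughput(1)   # one completion per 100 time units
many = throughput(8)     # pipeline saturated: one completion per 15 units
print(round(many / single, 1))  # -> 6.7x the throughput from the same pipeline
```

The point of the model is that no thread runs any faster; the gain comes entirely from hiding memory stalls behind other threads’ work, which is the essence of thread-level parallelism.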


[Figure 1. Increasing single-threaded processor performance by 100 percent (a
50-percent reduction in compute time) provides only a small relative gain in
application performance due to memory latency.]

Figure 1 illustrates how even doubling processor performance (frequency) often
provides only a small relative increase in application performance. In this
example, though the compute time is reduced by half, only a small overall
improvement in execution time results, due to the constant and dominant
influence of memory latency.

Complicating matters, the disparity between processor speeds and memory access
speeds means that memory latency dominates application performance, erasing even
very impressive gains in clock rates. While processor speeds continue to double
every two years, memory speeds have typically doubled only every six years. This
growing disconnect is the result of memory suppliers focusing on density and
cost as their design center, rather than speed. Unfortunately, this relative gap
between processor and memory speeds leaves ultra-fast processors idle as much as
85 percent of the time, waiting for memory to return required data.
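The arithmetic behind this claim is simple. Using the figure above of a processor idle as much as 85 percent of the time, the sketch below (assumed numbers, for illustration only) shows why doubling the clock rate barely moves total execution time:

```python
def exec_time(compute, memory_stall, speedup=1.0):
    """Total execution time: a faster clock shrinks only the compute
    portion, while memory-stall time is unchanged."""
    return compute / speedup + memory_stall

base = exec_time(compute=15.0, memory_stall=85.0)               # 100 time units
doubled = exec_time(compute=15.0, memory_stall=85.0, speedup=2.0)  # 92.5 units

improvement = (base - doubled) / base
print(f"{improvement:.1%}")  # -> 7.5% faster, despite a 2x faster processor
```

This is Amdahl’s Law in miniature: when 85 percent of the time is spent waiting on memory, even an infinitely fast processor could cut execution time by only 15 percent.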

Ironically, as traditional processor execution pipelines get faster and more
complex, the effect of memory latency grows: fast, expensive processors spend
more cycles doing nothing. Worse still, idle processors continue to draw power
and generate heat. It’s easy to see that frequency (gigahertz) is a truly
misleading indicator of real performance.

While some vendors have seemingly awakened to the inherent limitations of
traditional, frequency-based processor designs, they are now attempting to
graft power-saving technologies and multiple cores onto old, once-discarded
architectures. Unfortunately, these efforts represent stop-gap measures at
best. Effective Throughput Computing can only be realized with fundamentally
new processor designs that deliver truly compelling benefits to customers
while leaving legacy approaches behind.
