HyperTransport™ Consortium

HyperTransport I/O Technology Comparison
With Traditional And Emerging
I/O Technologies

White Paper
HTC_WP04

June 2004

The HyperTransport Consortium

www.hypertransport.org
The HyperTransport Technology Consortium disclaims all warranties and liability for the use of this document and the information contained herein and assumes no responsibility for any errors that may appear in this document, nor does the HyperTransport Technology Consortium make a commitment to update the information contained herein.

DISCLAIMER
This document is provided “AS IS” with no warranties whatsoever, including any warranty of merchantability, non-infringement, fitness for any particular purpose, or any warranty otherwise arising out of any proposal, specification or sample. The HyperTransport Technology Consortium disclaims all liability for infringement of property rights relating to the use of information in this document. No license, express, implied, by estoppels, or otherwise, to any intellectual property rights is granted herein.

TRADEMARKS
HyperTransport is a licensed trademark of the HyperTransport Technology Consortium.

Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
I - The Spectrum of I/O Technologies

Over the past twenty years, system architectures have most often reflected the processor architecture upon which the system was based. Choices made in bus width, clock speed, and control signaling by processor architects became embedded in the system core. Peripheral I/O interfaces were added in a haphazard manner with additional mezzanine and specialized buses inserted where needed to relieve traffic on the main processor bus. In personal computers it was typical to find a fast, processor-centric NorthBridge device used to connect processor, memory and graphics, while a slower, I/O-centric SouthBridge was used to concentrate and connect I/O interfaces to the processor through a proprietary SouthBridge-NorthBridge link.

The PCI (Peripheral Component Interconnect) bus emerged as a primary choice in the proliferation of custom and semi-proprietary buses. Perhaps because it is processor agnostic, PCI became widely used in systems ranging from PCs and servers to embedded systems, even though its 66 MHz top clock speed was far from the Gigahertz and up clock speeds of modern processors. Like the processor buses upon which it was based, PCI was a shared, parallel multi-drop bus with multiplexed address and data signals along with a number of control signals. Unlike processor buses, PCI was designed to support board-to-board communications as well as chip-to-chip links. Along with 5-volt and later 3.3-volt signal drive capabilities, it included a card slot expansion connector specification and board form factor definitions. PCI supports robust system initialization, discovery and setup. During initialization, the operating system can discover all attached PCI compatible components, allocate resources and configure the I/O devices according to their capabilities.

As the industry realized that existing I/O buses were not adequate to support Gigahertz and up processor-based systems, two development camps for future I/O interconnect technologies arose. The first was aimed at extending the PCI bus and it resulted in the PCI-X specification that extended PCI and boosted bus speeds to 533 MHz. The second aimed at creating an entirely new technology, using point-to-point structures and the new low voltage differential signaling (LVDS) or differential CML (Current Mode Logic) electrical protocols. Three separate efforts emerged from AMD, Intel and Motorola.

AMD and its partners developed HyperTransport chip-to-chip interconnect technology, which uses 1.2V LVDS signals, in the late 1990s. HyperTransport technology was formally introduced to the industry on July 24, 2001 along with the formation of the HyperTransport Technology Consortium by founding members AMD, Alliance Semiconductor, Apple Computer, Broadcom Corporation, Cisco Systems, NVIDIA, PMC-Sierra, Sun Microsystems, and Transmeta. RapidIO, developed by Motorola, emerged in 2001 in its chip-to-chip parallel format, with a subsequent evolution into a backplane-oriented serial protocol called Serial RapidIO that uses 1.0V CML signals. The last technology to come to market has been PCI Express, originally developed by Intel and eventually adopted by the PCI-SIG organization. PCI Express is a serial I/O technology aimed at providing chip-to-chip links, board-to-board interconnects and even system-to-system links. PCI Express 1.0, using 1.0V CML signals, was approved in late 2003 and PCI Express AS (PCI Express Advanced Switching) in 2004.

This white paper will explore the evolution of interconnect technologies and contrast and compare the various I/O options.
Figure 1 - There are many old and new I/O technologies that support chip-to-chip, board-to-board and system-to-system communications. Traditionally, PCI and a plethora of proprietary local buses have emerged from the processor-centric, board-level designs to connect to other boards. At the system level, Ethernet and Fibre Channel have provided network and storage network links. More advanced point-to-point, inside-the-box technologies include HyperTransport, RapidIO and PCI Express. Outside-the-box technologies include InfiniBand, SPI-4 and PCI Express Advanced Switching.

**Where And Why Different Interconnect Technologies Are Used**

Why the proliferation of different, but functionally similar I/O technologies? Can’t the industry just settle on one master I/O technology and simplify matters? The answers become clear when one examines the specific I/O needs of systems targeting different applications.
Traditional Inside-The-Box Versus Outside-The-Box
Inside-the-box technologies include chip-to-chip links connecting devices on the main motherboard and board-to-board technologies based on expansion card architectures linking add-in card subsystems to the main motherboard. Traditional technologies for chip-to-chip communications are proprietary processor-centric local buses, local I/O buses or local coprocessor or interprocessor buses. While there are custom or proprietary board-level buses, PCI is the dominant solution in a wide array of applications.

Outside-the-box technologies are either network-oriented technologies such as Ethernet and SPI-4 or cluster-to-cluster technologies like Fibre Channel.

Chip-to-chip links require the lowest latency and highest performance. Their traffic has traditionally consisted of compute-oriented load/store memory data exchanges. Board-to-board links tolerate higher latency, but require higher drive capabilities and card and connector specifications. Their traffic has also traditionally been load/store traffic. System-to-system links are bandwidth focused, less sensitive to latency and carry their traffic in packets over channels.

Point-to-point Inside-The-Box Versus Outside-The-Box
The newer point-to-point technologies include HyperTransport and parallel RapidIO for chip-to-chip links, serial RapidIO and PCI Express for board-to-board links, and PCI Express Advanced Switching for system-to-system links. While HyperTransport is firmly focused on chip-to-chip communications, RapidIO and PCI Express present themselves as possible solutions to most, if not all chip-to-chip, board-to-board, and system-to-system communication problems.

While I/O technologies have similar objectives, the differences in the way they are implemented and in the overall performance characteristics they produce within any given system implementation should prompt system designers to carefully evaluate their options. For example, one of the advantages of HyperTransport and PCI Express is that they provide full PCI compatibility. This makes them suitable for applications in personal computers and servers that have a large installed base of PCI-compatible subsystems and a great deal of industry investment in PCI-oriented software and system expertise. On the other hand, RapidIO can support thousands of peer-to-peer connections, but offers more limited PCI compatibility. These features make it ideal for DSP farms in telecomm applications but limit its use in PCs and servers.

The figure below illustrates where different interconnect technologies are successfully being deployed. The width of the bar indicates that a technology is already in volume shipment.

![Where Interconnect Technologies Are Deployed Diagram](image)

Figure 2 – HyperTransport technology is already shipping in high volume in a number of applications as a PCI and proprietary bus replacement for chip-to-chip interfaces, while other emerging technologies offer primarily board-to-board advantages over PCI.
Figure 3 – The proliferation of I/O technologies makes more sense when examining the needs of the underlying application. For example, HyperTransport technology is widely used in personal computers and servers because of its low latency, high bandwidth and PCI compatibility. High performance systems, including supercomputers, require the extremely low latency that HyperTransport provides. Parallel RapidIO, a similar technology, targets pure embedded systems, and Serial RapidIO targets telecomm systems. PCI Express delivers its best in applications where PCI compatibility and board-to-board communications are a priority, while PCI Express Advanced Switching attempts to solve system-level communications in high-end systems.

II - Traditional Shared Buses Versus Point-to-point Interconnects

PCI and other multi-drop, processor-centric buses are shared parallel buses with multiplexed address/data lines and bus control signals. The disadvantage of shared buses is that as additional devices are connected to the bus, they must share the same bus and therefore total bus bandwidth is reduced. While one bus master is utilizing the bus, the other devices must wait until the bus is free, or they may request control of the bus through a bus contention protocol.

Parallel, shared buses such as PCI have many multiplexed address and data signals as shown in Figure 4. Multiplexed bus structures require extra sideband control signals to control the sharing of the address/data lines and to maintain the timing constraints throughout the system. The 3.3 Volt or 5.0 Volt operating voltages mean that the signal swings require additional passive filtering components to preserve signal integrity, leading to larger printed circuit board real estate requirements, poor system integration and higher product cost.

Figure 4 – A parallel 32-bit bus such as PCI has many different types of signals including multiplexed address/data lines, numerous control lines and system level signals. The many sideband signals are required to keep track of what type of activity is occurring on the multiplexed address/data lines.

III - HyperTransport Technology
A point-to-point parallel bus like HyperTransport has many advantages over shared bus structures. It needs far fewer sideband signals because its enhanced 1.2V LVDS signals are not multiplexed, use less power, exhibit better noise immunity and require no passive filtering. The electrical characteristics of the link are simplified and enable much faster clock speeds and correspondingly greater bandwidth. LVDS links use two wires for each signal (a balanced, or differential, line) carrying electrical signals that are equal in amplitude and timing but opposite in polarity. The mirror-image nature of the signals carried over the balanced line prevents electrical noise within the system from corrupting signal detection at the receiver, a typical problem of single-ended signaling in high-speed parallel buses, and so allows much cleaner signal transmission and higher clock rates. LVDS signaling consumes less power and delivers a more robust signal that requires no passive components to maintain signal integrity. Even with the use of two wires per signal, the faster, narrower HyperTransport link uses fewer total signals, consumes less power, and delivers higher bandwidth at a lower overall system cost than traditional parallel multiplexed buses.
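The noise rejection described above can be illustrated with a small numeric sketch. The voltage values and the function names `differential_receive` and `single_ended_receive` are illustrative assumptions, not figures from any specification; the model also assumes that noise couples equally onto both wires of the pair.

```python
def differential_receive(v_signal, noise):
    """Return what a differential receiver sees for one balanced pair."""
    # The driver places equal and opposite voltages on the two wires.
    wire_p = +v_signal + noise  # noise is assumed to couple equally
    wire_n = -v_signal + noise  # onto both wires of the pair
    # The receiver only looks at the difference, so the noise term cancels.
    return wire_p - wire_n


def single_ended_receive(v_signal, noise):
    """Return what a single-ended receiver sees: signal plus noise."""
    return v_signal + noise


for noise in (0.0, 0.15, -0.3):
    diff = differential_receive(0.3, noise)
    single = single_ended_receive(0.3, noise)
    print(f"noise={noise:+.2f} V  differential={diff:.2f} V  single-ended={single:.2f} V")
```

In this simplified model the differential result stays at twice the driven amplitude regardless of the noise value, while the single-ended result moves with the noise.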

Figure 5 – A parallel point-to-point bus such as HyperTransport (8-bit wide link shown) greatly reduces the number of signals required by combining the command, address and data information into HyperTransport packets that are carried in a single direction over CAD (Command, Address, Data) lines. The simplified control structure needs only 4 sideband signals. An 8-bit wide HyperTransport link can carry more data faster than a 32-bit wide PCI bus, but is far simpler and less costly to implement. A full 32-bit wide HyperTransport link can deliver up to 22.4 Gigabytes/second aggregate bandwidth.

**Bus Speeds and Bandwidth**

Standard PCI is defined as a 32- or 64-bit bus with a 33 or 66 MHz clock rate. Extended PCI or PCI-X boosts clock speeds to up to 533 MHz. The maximum theoretical bus bandwidth of the fastest PCI type bus is 4.3 Gigabytes/second.
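As a rough check of these figures, the theoretical peak of a shared parallel bus can be sketched as bus width times clock rate. The helper name `parallel_bus_peak_gbytes` is hypothetical, and the sketch assumes one transfer per quoted clock and 1 Gigabyte = 10^9 bytes.

```python
def parallel_bus_peak_gbytes(width_bits, clock_mhz):
    """Theoretical peak in Gigabytes/second, assuming one transfer per clock."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_mhz * 1e6 / 1e9


configurations = [
    ("PCI 32-bit / 33 MHz", 32, 33),
    ("PCI 64-bit / 66 MHz", 64, 66),
    ("PCI-X 64-bit / 533 MHz", 64, 533),
]

for name, width, mhz in configurations:
    peak = parallel_bus_peak_gbytes(width, mhz)
    print(f"{name}: {peak:.3f} Gigabytes/second")
```

The 64-bit, 533 MHz case works out to about 4.26 Gigabytes/second, which rounds to the 4.3 Gigabytes/second quoted above.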

One of the issues facing shared buses is that as more devices are connected to the bus, the total bandwidth decreases. Bus arbitration is required to allocate the bus among the multiple masters that need access to it. Worse, while one bus master uses the bus, the other masters must wait until the bus becomes available, either through completion of the initial transaction or through the bus contention protocol. With PCI-type buses, actual throughput falls to far less than half of theoretical maximum when two bus masters share the bus, and less than a quarter when three or more share. Worse yet, when slow speed devices share with high-speed devices, the slow devices can dramatically impact the bandwidth available to higher speed devices.

HyperTransport point-to-point links on the other hand, are direct links between two devices only. Bus traffic is encapsulated in the control and data packet format and passed from device to device in daisy-chain fashion. Between any two devices, bus bandwidth is always at the maximum speed supported by the chosen clock frequency. Multiple transfers can be initiated throughout the daisy chain, up to a maximum of 32 concurrent transfers in HyperTransport specification 1.03 and 128 concurrent transfers in specifications 1.1 (DirectPacket™) and 2.0.

With HyperTransport Specification 2.0, top clock speed is 1.4 GHz dual-data rate or DDR. Dual-data rates mean that data is exchanged on both the rising and falling edge of the clock. This yields an effective data throughput of 2.8 Gigatransfers/second per signal pair, supporting up to 22.4 Gigabytes/second aggregate data throughput or bandwidth.
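The arithmetic behind these numbers can be sketched as follows. The function name `ht_aggregate_gbytes` is hypothetical, and the sketch assumes that "aggregate" counts both unidirectional halves of a link and that 1 Gigabyte = 10^9 bytes.

```python
def ht_aggregate_gbytes(width_bits, clock_ghz=1.4):
    """Aggregate Gigabytes/second for a link of the given width."""
    # Double data rate: two transfers per clock on every signal pair.
    gigatransfers = clock_ghz * 2
    per_direction = gigatransfers * width_bits / 8
    # Assumption: aggregate bandwidth sums the two directions of the link.
    return 2 * per_direction


for width in (2, 4, 8, 16, 32):
    print(f"{width:2d}-bit link: {ht_aggregate_gbytes(width):.1f} Gigabytes/second aggregate")
```

Under these assumptions a 32-bit link at 1.4 GHz carries 11.2 Gigabytes/second in each direction, which matches the 22.4 Gigabytes/second aggregate figure.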

Due to smaller, narrower links with higher signal integrity, lower implementation cost, lower power consumption and far greater bandwidth, there is little doubt that unidirectional, point-to-point links represent the future of high-speed interconnect technology.

IV - Serial Links Versus Parallel Point-to-Point Links
Given the trend towards point-to-point links, there are two major alternatives emerging: parallel links such as HyperTransport and parallel RapidIO, and serial point-to-point links such as serial RapidIO and PCI Express.

Parallel RapidIO
Originally defined by Motorola, RapidIO is now supported by the RapidIO Trade Association, which offers a parallel and a serial I/O specification. The specifications define a three-layer architecture with the logical layer defining the protocol and packet formats, the transport layer defining the routing information to move a packet through the system, and the physical layer defining the device level interface characteristics such as packet transport mechanisms, flow control, electrical characteristics and low-level error management. The two specifications share programming models, transactions, addressing mechanisms, error management and transmission error reporting. They differ in the physical specification. The parallel interface is defined as an 8- or 16-bit wide interface using LVDS signals (8/16 LP-LVDS) and employs a source synchronous clocking scheme plus a set of control signals. The serial interface is defined as 1 or 4 lanes using 8b/10b serial encoding.

Like HyperTransport, Parallel RapidIO uses dual unidirectional links. It uses one clock signal per direction for an 8-bit wide link and two clock signals per direction for a 16-bit wide link. A FRAME signal is used to indicate the start of a packet or control symbol.

RapidIO data exchanges are packet based and use request and response pairs to complete transactions. A master, or initiator, generates a request transaction and a target generates a response transaction. Packets contain the control information required to successfully complete transmit and receive operations and the data, if any. 16-bit control symbols are used for packet acknowledgement, flow control, and maintenance. Packets can contain data payloads ranging from 1 to 256 bytes and up to 256 transactions can be outstanding. CRCs are used to ensure data integrity across the link.
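The request and response pairing can be sketched as a small bookkeeping structure. The class `OutstandingTransactions` and its methods are hypothetical and only model the limit of 256 outstanding transactions quoted above; they are not taken from the RapidIO specification.

```python
class OutstandingTransactions:
    """Match responses to requests by transaction ID (illustrative only)."""

    MAX_OUTSTANDING = 256  # limit quoted in the text

    def __init__(self):
        self._pending = {}  # transaction ID -> description of the request

    def issue(self, description):
        """Record a new request and return the transaction ID assigned to it."""
        if len(self._pending) >= self.MAX_OUTSTANDING:
            raise RuntimeError("no free transaction ID; wait for a response")
        # Pick the lowest ID that is not currently in flight.
        tid = next(t for t in range(self.MAX_OUTSTANDING) if t not in self._pending)
        self._pending[tid] = description
        return tid

    def complete(self, tid):
        """Retire the request that the response with this ID answers."""
        return self._pending.pop(tid)

    def __len__(self):
        return len(self._pending)


tracker = OutstandingTransactions()
read_id = tracker.issue("read 64 bytes")
print("request issued with ID", read_id)
print("response matched to:", tracker.complete(read_id))
```

An initiator built this way must stall once all 256 IDs are in flight and can only issue a new request after a response retires one of them.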

A primary difference between both parallel and serial RapidIO and HyperTransport is that HyperTransport offers the option of complete PCI transparency. The ability to conform exactly to PCI ordering and configuration specifications has greatly accelerated HyperTransport’s adoption in PCI-centric markets such as personal computers, servers, network equipment and even some embedded markets. RapidIO, on the other hand, supports a subset of PCI functionality but is not completely transparent to, and therefore not fully compatible with, PCI. The primary reason is that RapidIO was designed to serve embedded applications, especially telecom applications characterized by tens or hundreds of parallel processing elements, such as DSP farms requiring peer-to-peer connection. A PCI host-based ordering scheme is not needed for these types of market applications and RapidIO was not focused on PCI compatibility from its inception.

---

**Figure 6** – The RapidIO specification defines three layers, logical, transport and physical. RapidIO packets include information from all three layers. In the figure, numbers are bits, unless otherwise specified as bytes. Packet overhead can easily be greater than 50 percent on data packets smaller than 64 bytes.

---

<table>
<thead>
<tr>
<th>Request Packets</th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>Previous Packet</td>
<td>S, AckID, S, rsrv, S, rsrv, Prio</td>
</tr>
<tr>
<td>Target Address</td>
<td></td>
</tr>
<tr>
<td>Source Address</td>
<td></td>
</tr>
<tr>
<td>Transaction</td>
<td></td>
</tr>
<tr>
<td>Size</td>
<td>4</td>
</tr>
<tr>
<td>Source TID</td>
<td>8</td>
</tr>
<tr>
<td>Device Offset Address</td>
<td>32, 48, 64</td>
</tr>
<tr>
<td>Optional Data Payload</td>
<td>8 to 256 bytes</td>
</tr>
<tr>
<td>CRC</td>
<td>16</td>
</tr>
<tr>
<td>Next Packet</td>
<td></td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Response Packets</th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>Previous Packet</td>
<td>S, AckID, S, rsrv, S, rsrv, Prio</td>
</tr>
<tr>
<td>Target Address</td>
<td></td>
</tr>
<tr>
<td>Source Address</td>
<td></td>
</tr>
<tr>
<td>Transaction</td>
<td></td>
</tr>
<tr>
<td>Status</td>
<td>4</td>
</tr>
<tr>
<td>Target TID</td>
<td>8</td>
</tr>
<tr>
<td>Optional Data Payload</td>
<td>8 to 256 bytes</td>
</tr>
<tr>
<td>CRC</td>
<td>16</td>
</tr>
<tr>
<td>Next Packet</td>
<td></td>
</tr>
</tbody>
</table>

* from RapidIO Trade Association

A second difference between parallel RapidIO and HyperTransport links is in their flexibility and scalability. Parallel RapidIO is set at a fixed parallel data bus width of either 8 or 16 bits. HyperTransport, instead, can scale from 2 to 4, 8, 16 or 32 bits wide and allows narrower links to interface seamlessly with wider links. In addition, HyperTransport devices auto-negotiate width, frequencies and protocol level support allowing the intermixing of advanced HyperTransport 2.0 devices with devices that support earlier 1.1 and even 1.03/1.05 specifications.
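The idea of two devices settling on a common link configuration can be sketched as choosing the widest width and highest frequency that both ends support. This is a simplified illustration, not the negotiation procedure defined by the HyperTransport specification; the function name `negotiate` and the capability lists in the example are assumptions.

```python
def negotiate(a_widths, b_widths, a_freqs_mhz, b_freqs_mhz):
    """Pick the widest width and highest clock that both link partners support."""
    common_widths = set(a_widths) & set(b_widths)
    common_freqs = set(a_freqs_mhz) & set(b_freqs_mhz)
    if not common_widths or not common_freqs:
        raise ValueError("link partners share no common width or frequency")
    return max(common_widths), max(common_freqs)


# Illustrative capability lists: a newer, wider device facing an older, narrower one.
newer_device = {"widths": [2, 4, 8, 16, 32], "freqs": [200, 400, 800, 1400]}
older_device = {"widths": [8, 16], "freqs": [200, 400, 800]}

width, freq = negotiate(
    newer_device["widths"], older_device["widths"],
    newer_device["freqs"], older_device["freqs"],
)
print(f"link runs {width} bits wide at {freq} MHz")
```

The link ends up limited by the less capable partner, which is what allows devices of different generations to be mixed on the same chain.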

With a top clock speed of 1.0 GHz, parallel RapidIO delivers a peak bandwidth of 4 Gigabytes/second with 8-bit wide links and 8 Gigabytes/second with 16-bit wide links. In comparison, HyperTransport links deliver a peak bandwidth of 5.6 Gigabytes/second with 8-bit links, 11.2 Gigabytes/second with 16-bit links and 22.4 Gigabytes/second with full 32-bit wide links.

Another difference is in packet formats and packet overheads. Typical RapidIO transactions include NREAD, NWRITE, SWRITE and MESSAGE packets. Overhead ranges from 28 bytes for NREAD, NWRITE and MESSAGE transactions to 16 bytes for SWRITE transactions. Given a typical data payload of 64 bytes, RapidIO yields an overhead of nearly 44 percent. By comparison, HyperTransport has an overhead of 8 bytes for writes and 12 bytes for reads, equating to 12.5 percent and 18.75 percent respectively.
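The percentages above follow from dividing the overhead bytes by the payload bytes. The short sketch below reproduces them; the function names are hypothetical, and the second function, which expresses the same data as the share of link traffic that is payload, is an additional view that the paper itself does not quote.

```python
def overhead_percent(overhead_bytes, payload_bytes=64):
    """Overhead relative to payload, the measure used in the text."""
    return 100.0 * overhead_bytes / payload_bytes


def payload_share_percent(overhead_bytes, payload_bytes=64):
    """Share of all transmitted bytes that is payload (not quoted in the paper)."""
    return 100.0 * payload_bytes / (payload_bytes + overhead_bytes)


cases = [
    ("RapidIO NREAD/NWRITE/MESSAGE", 28),
    ("RapidIO SWRITE", 16),
    ("HyperTransport read", 12),
    ("HyperTransport write", 8),
]

for name, overhead in cases:
    print(f"{name}: {overhead_percent(overhead):.2f}% overhead, "
          f"{payload_share_percent(overhead):.1f}% of bytes are payload")
```

Passing a smaller `payload_bytes` value shows how quickly a fixed overhead comes to dominate short packets.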

**HyperTransport Advantages Over Parallel RapidIO**

While parallel RapidIO and HyperTransport share some common characteristics such as the use of an LVDS electrical interface and packet-based transactions, HyperTransport has some notable advantages. HyperTransport provides a more flexible and scalable solution compared to fixed width parallel RapidIO, a higher bandwidth solution, 100 percent PCI compatibility, and a more effective throughput resulting from lower packet overhead.

**Serial RapidIO**

As noted, serial RapidIO shares the same logical and transport layers as parallel RapidIO. The differences occur at the physical layer, with a few packet definition differences that reflect the lack of parallel control signals. At face value, these differences mean the two are not entirely software transparent to each other. More importantly, serial RapidIO uses differential current steering drivers defined by the IEEE 802.3 XAUI specification (10 Gigabit Ethernet Attachment Unit Interface). This technology was designed to drive signals over long distances within environments like system backplanes. The serial RapidIO specification defines a short-run signal transmitter designed for routing through a printed circuit board, or between a printed circuit board and a single mezzanine card-type connector, and a long-run signal transmitter designed for driving backplane signals. For interoperability, receiver inputs require AC coupling.

The serial RapidIO specification defines a Physical Coding Sublayer (PCS) and a Physical Media Attachment (PMA) sublayer. These sublayers serialize packets into an 8b/10b encoded serial bit stream at the transmitter and deserialize the bit stream and reformat the packets at the receiver. The PCS handles idle sequence generation, lane striping and de-striping, and lane alignment, and determines the port mode as 1-lane or 4-lane. At the transmit end, the PMA serializes the 10-bit parallel code-groups into the serial bit stream. At the receive end, it aligns the received bit stream to 10-bit code-group boundaries and feeds the 10-bit code-groups to the PCS.
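Lane striping and de-striping can be pictured as dealing the units of a packet across the lanes in turn and then reassembling them in the same order. The sketch below assumes simple round-robin distribution of bytes; the real PCS works on encoded code-groups and adds alignment rules that are not modelled here, and the function names `stripe` and `destripe` are hypothetical.

```python
def stripe(data: bytes, lanes: int = 4) -> list:
    """Deal bytes round-robin across the lanes (lane k gets bytes k, k+lanes, ...)."""
    return [data[lane::lanes] for lane in range(lanes)]


def destripe(lane_data: list) -> bytes:
    """Reassemble the original byte order from per-lane byte strings."""
    out = bytearray()
    longest = max(len(lane) for lane in lane_data)
    for position in range(longest):
        for lane in lane_data:
            # Later lanes may be one byte shorter when the length is uneven.
            if position < len(lane):
                out.append(lane[position])
    return bytes(out)


packet = bytes(range(10))
lanes = stripe(packet, lanes=4)
for number, lane in enumerate(lanes):
    print(f"lane {number}: {list(lane)}")
print("reassembled correctly:", destripe(lanes) == packet)
```

With a single lane the same functions degenerate to passing the packet through unchanged, which mirrors the 1-lane port mode.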

Parallel RapidIO uses a “packet and in-band control symbol” protocol with a separate FRAME signal to differentiate between a packet and a control symbol. Serial RapidIO, lacking any control signals, uses previously unused characters of the 8b/10b encoding technique to indicate start of a packet, end of a packet or an embedded control symbol.

One shared characteristic between serial RapidIO and PCI Express is that they were designed with large and very complex system architectures in mind. To that end, they support a wide range of maintenance, system organization, error detection, and error recovery features and a wide range of transaction formats and types. While useful with large systems, these features may conflict with the goal of delivering the lowest signal latency and the lowest transaction overhead for high effective bandwidth data movement in board level systems.
HyperTransport Advantages Over Serial RapidIO

HyperTransport has some notable advantages over serial RapidIO in board level systems. A primary advantage is that HyperTransport delivers far lower chip-to-chip latency than a serial link can provide. The 8b/10b encoding of a serial link adds an overhead of 20 percent. When this overhead is added to the native RapidIO packet overhead, it makes RapidIO a much less efficient solution than HyperTransport. A second advantage that HyperTransport has over RapidIO is raw bandwidth, with HyperTransport delivering as much as 22.4 Gigabytes/second aggregate bandwidth versus the maximum 8 Gigabytes/second of a 4-lane RapidIO link. A third advantage is complete PCI software transparency and compatibility. Another important advantage of HyperTransport is the ability to intermix load/store compute-oriented transactions with user packet-based, communications-oriented transactions.

Serial PCI Express

PCI Express, like HyperTransport, is a point-to-point link that uses dual unidirectional lanes to connect devices. Unlike HyperTransport, the link is serial, resulting in two effects. On the one hand, serial links eliminate all sideband signals and simplify the link to just one signal line or lane. On the other hand, a single lane that carries clock, command, system status, data and address information burdens the interface with the extensive overhead of serializers/deserializers and 8b/10b clock encoding/decoding logic.

PCI Express’s serial link operates in similar fashion to serial RapidIO, but uses its own unique differential current steering drivers, not those of the XAUI specification. PCI Express’s PCS and PMA layers operate similarly to those of serial RapidIO. The maximum transmission rate of PCI Express is 2.5 Gigabit/second, yielding a 2.0 Gigabit/second data transfer rate after subtracting the clocking and encoding overhead. This is less than Serial RapidIO’s 3.125 Gigabit/second transmission rate and 2.5 Gigabit/second data transfer rate. Both PCI Express and serial RapidIO yield less than HyperTransport’s 2.8 Gigatransfers/second rate. Note that HyperTransport uses the measurement of transfers/second rather than bits/second because HyperTransport links may vary in bit width. A single HyperTransport bit performs at a 2.8 Gigatransfer/second rate, but a single HyperTransport link is, at a minimum, 2 bits wide.
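The relationship between transmission rate and data rate under 8b/10b encoding can be checked with a one-line calculation: every 10 bits on the wire carry 8 bits of data. The function name `data_rate_gbps` is hypothetical.

```python
def data_rate_gbps(line_rate_gbps, data_bits=8, coded_bits=10):
    """Usable data rate of a lane after 8b/10b encoding overhead."""
    return line_rate_gbps * data_bits / coded_bits


for name, line_rate in [("PCI Express", 2.5), ("Serial RapidIO", 3.125)]:
    print(f"{name}: {line_rate} Gigabit/second on the wire, "
          f"{data_rate_gbps(line_rate)} Gigabit/second of data")
```

A clock-forwarded parallel link has no such encoding step, so in this simple view its data rate per signal pair equals its transfer rate.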
Management data is contained in the Data Link Layer Packet, or DLLP, and payload data is contained in a Transaction Layer Packet, or TLP. DLLPs and TLPs are interspersed in a PCI Express data stream. Within the DLLP are the traditional PCI functions and the new PCI Express functions like flow control and packet acknowledgements.

As shown in Figure 7, the PCI Express packet format consists of three layers, the Transaction Layer, the Data Link Layer and the Physical Layer.

Figure 7 – The PCI Express specification defines three layers, the Transaction, the Data Link and the Physical. The figure shows the overhead required for each layer in bytes. Packet overhead can be significant for data packets smaller than 64 bytes. While PCI Express supports large packet sizes, up to 4096 bytes, this can affect system throughput, as PCI Express has no mechanism for interrupting long packet transfers.

The data payload in PCI Express is carried in the Transaction Layer Packet, or TLP. In addition to the data, the TLP has a header of 12 or 16 bytes that holds information such as packet size, message type, traffic class for QoS and any special handling instructions. The TLP concludes with a CRC for data integrity.

An important aspect of PCI Express is that it is being introduced as a replacement for all other buses, with the intention of becoming the standard architecture for the next 15 years. For this reason, the effort put into the development of the PCI Express specification included all of the facilities needed not only to support current generations of products but also future products to be developed over the next decade. Consequently, the specification includes a significant number of features and capabilities that make PCI Express extremely robust in many important areas. These include data flow controls, QoS classifications, hot plug/swap capabilities, queuing, link error reporting, device-to-device error reporting, error handling, power management features, extended configuration attributes and peer-to-peer communications to support multi-hierarchy topologies, all within the PCI software compatibility framework.

Needless to say, in spite of the best efforts of the PCI Express developers, the need for further capabilities was still not fully satisfied in terms of interlink communications. The solution was to add yet another layer of encapsulation upon the base PCI Express layer, called PCI Express Advanced Switching, or AS. Advanced Switching breaks free of the PCI compatibility constraint, adding extensive peer-to-peer, multi-protocol support, message passing through multiple address domains, virtual channels for QoS, multicast support and advanced link support features.

To minimize market fragmentation, PCI Express base and PCI Express AS share the same physical and data link layers of the protocol. They diverge at the transaction layer, as shown in the following figure.
Figure 8 – The PCI Express and PCI Express AS specifications share the Data Link and the Physical Layer protocols. The two diverge at the Transaction Layer. Compatibility is maintained by allowing PCI Express AS to carry PCI Express base packets encapsulated within PCI Express AS packets.

PCI Express AS maintains compatibility with PCI Express by encapsulating PCI Express packets within the AS packet definition. In fact, PCI Express AS is designed to create a switching fabric that will be “protocol agnostic” enabling a complex PCI Express AS fabric to carry information from a variety of sources in a variety of native protocol formats.

As shown below, the PCI Express AS protocol is grafted upon the base PCI Express protocol with the insertion of an AS Start comma into the PCI Express data link header and the insertion of an AS Header at the start of the PCI Express base Transaction Layer. The AS Header defines a Protocol Interface, or PI, and a routing path.
Figure 9 – The PCI Express AS protocol is inserted into a base PCI Express format. The data payload, in this case a PCI Express base packet, could be data in any protocol format as PCI Express AS is protocol agnostic. The PCI Express AS protocol requires a start indicator, or comma, and an AS header that contains the PI or Protocol Interface and the routing path of the AS packet.
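
The encapsulation idea can be illustrated with a small sketch. It is a toy model only: the header layout (a 1-byte PI followed by a 4-byte routing path), the PI value, and all function names are assumptions made for illustration and are not taken from the PCI Express AS specification. What it shows is that the payload is carried unmodified, so the fabric never needs to understand the encapsulated protocol.

```python
import struct
from dataclasses import dataclass

# Hypothetical toy header, NOT the real AS header format:
#   1 byte  Protocol Interface (PI)
#   4 bytes routing path
AS_HEADER = struct.Struct(">BI")

PI_PCIE_BASE = 1  # illustrative PI value meaning "payload is a PCI Express base packet"


@dataclass(frozen=True)
class ASPacket:
    pi: int         # identifies the protocol of the payload
    path: int       # opaque routing path through the fabric
    payload: bytes  # carried as-is, in its native format


def encapsulate(pi: int, path: int, payload: bytes) -> bytes:
    """Prefix the payload with the toy AS header; the payload is not altered."""
    return AS_HEADER.pack(pi, path) + payload


def decapsulate(frame: bytes) -> ASPacket:
    """Split a frame back into header fields and the original payload."""
    pi, path = AS_HEADER.unpack_from(frame)
    return ASPacket(pi, path, frame[AS_HEADER.size:])


if __name__ == "__main__":
    native = bytes([0x40, 0x00, 0x00, 0x01])  # stand-in for a native packet
    frame = encapsulate(PI_PCIE_BASE, 0x0000_1234, native)
    print(decapsulate(frame))
```

Because only the header is interpreted, the same two functions would carry a payload in any other protocol equally well; only the PI value would change.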

HyperTransport Advantages Over Serial PCI Express and PCI Express Advanced Switching

HyperTransport technology delivers a number of advantages to board-level designers as compared to PCI Express and PCI Express AS. They include higher bandwidth, lower latency, and greater effective throughput due to lower packet overhead. Being a clock-forwarded parallel architecture, HyperTransport delivers the lowest latency possible. Because it needs no clock encoding/decoding overhead, HyperTransport’s 2.8 Gigatransfers/second transmission rate translates directly into a 2.8 Gigabit/second data rate per signal. This yields the highest aggregate bandwidth, 22.4 Gigabytes/second, of any board-level communications technology.

In addition, HyperTransport provides many of the system-level features that PCI Express provides, although in many cases with lower processing overhead than PCI Express. For example, starting with HyperTransport DirectPacket specification 1.1, HyperTransport provides the ability to carry user packets (in any protocol) across HyperTransport links. Virtual channels include streaming channels with flow control and link error reporting to support the convergence of computing and communications applications.
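
The bandwidth figures quoted above follow from simple arithmetic, which the sketch below reproduces. Two details are assumptions not stated in this section: that the 22.4 Gigabytes/second aggregate refers to a 32-bit-wide link counted in both directions, and that the comparison point is a first-generation PCI Express lane running at 2.5 Gigatransfers/second with 8b/10b encoding. The function names are illustrative.

```python
def payload_gbit_per_s(gtransfers_per_s, data_bits=1, coded_bits=1):
    """Usable Gbit/s on one signal after line-coding overhead.

    data_bits/coded_bits is the coding efficiency: 1/1 for a clock-forwarded
    link with no embedded clock, 8/10 for 8b/10b encoding.
    """
    return gtransfers_per_s * data_bits / coded_bits


def aggregate_gbyte_per_s(gbit_per_signal, width_bits, directions=2):
    """Aggregate GByte/s for a link of the given width, summed over directions."""
    return gbit_per_signal * width_bits * directions / 8


# HyperTransport: 2.8 GT/s, no encoding overhead (from the text).
ht_signal = payload_gbit_per_s(2.8)
# Assumption: 32-bit link width, both directions counted.
ht_total = aggregate_gbyte_per_s(ht_signal, width_bits=32)

# Assumption: PCI Express lane at 2.5 GT/s with 8b/10b encoding.
pcie_lane = payload_gbit_per_s(2.5, data_bits=8, coded_bits=10)

if __name__ == "__main__":
    print(f"HyperTransport per signal: {ht_signal:.1f} Gbit/s")
    print(f"HyperTransport aggregate:  {ht_total:.1f} GByte/s")
    print(f"PCI Express per lane:      {pcie_lane:.1f} Gbit/s")
```

Under these assumptions every transferred bit on a HyperTransport signal is a data bit, whereas an 8b/10b-encoded lane spends two of every ten transferred bits on encoding.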

Finally, HyperTransport provides some key technical advantages that other technologies do not provide. One of these is Priority Request Interleaving™, or PRI. It enables a high priority request command (only 8 bytes long) to be inserted within a potentially long, lower priority data transfer. A typical use is shown in the figure below. While data transfer 1 is underway between peripheral B and the host, the need arises for peripheral A to start a data transfer from the host. Without PRI, transfer 2 would have to wait until transfer 1 completes; should transfer 2 be the answer to a cache miss, for instance, that added latency would be prohibitive. With PRI, a control packet is promptly inserted within transfer 1’s data stream, instructing the link to initiate data transfer 2 on the other link channel concurrently with the completion of data transfer 1. This mechanism, unique to HyperTransport technology, greatly reduces the latency of HyperTransport-based systems and improves overall system responsiveness to heterogeneous traffic.
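
The latency benefit of PRI can be made concrete with a toy timing model. This is a sketch under stated assumptions, not a description of the actual link protocol: time is counted in 4-byte "beats", the link is assumed to be able to switch between packets at any beat boundary, and the 64-byte data transfer in the example is an arbitrary choice. Only the 8-byte request size comes from the text.

```python
BEAT_BYTES = 4     # assumed granularity at which the link can switch packets
REQUEST_BYTES = 8  # size of the high priority request command (from the text)


def beats(n_bytes):
    """Number of beats needed to send n_bytes (ceiling division)."""
    return -(-n_bytes // BEAT_BYTES)


def request_done_beat(data_bytes, arrival_beat, interleave):
    """Beat at which the high priority request has been fully sent.

    A lower priority data transfer of `data_bytes` starts at beat 0, and the
    request becomes ready to send at `arrival_beat`.
    """
    if interleave:
        start = arrival_beat                     # inserted into the data stream
    else:
        start = max(arrival_beat, beats(data_bytes))  # waits for transfer 1
    return start + beats(REQUEST_BYTES)


if __name__ == "__main__":
    for interleave in (False, True):
        done = request_done_beat(data_bytes=64, arrival_beat=3,
                                 interleave=interleave)
        label = "with PRI" if interleave else "without PRI"
        print(f"{label}: request sent by beat {done}")
```

In this model the request's waiting time without interleaving grows with the length of the transfer already in progress, while with interleaving it depends only on the request's own size.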
V – Conclusion

HyperTransport technology was designed to be an optimized board-level architecture delivering the lowest possible latency, the highest bandwidth, design flexibility, performance scalability and PCI compatibility. HyperTransport delivers all of these capabilities within a framework that enables board-level designers to develop system architectures free of cumbersome constraints and performance burdens.

The widespread adoption of HyperTransport across a broad spectrum of high performance product sectors, ranging from consumer devices to personal computers, network equipment and supercomputers, is tangible proof of the power and flexibility of the HyperTransport architecture. In many cases, HyperTransport’s integration extends into the processor, such as in AMD’s Opteron and Athlon64 64-bit x86 processors, Transmeta’s Efficeon x86 processor, Broadcom’s BCM1250 64-bit MIPS processor, and PMC-Sierra’s RM9000 64-bit MIPS processor family. In these instances, HyperTransport operates as a fully integrated front-side bus and the traditional NorthBridge-SouthBridge structure is eliminated. In other instances, such as in Apple’s G5 PowerMac, HyperTransport is used as an integrated, high performance I/O bus that pipes PCI, PCI-X, USB, FireWire and audio/video links through the system. In all cases, HyperTransport replaces the overlapping processor and local I/O buses of earlier generation systems with a unified, high bandwidth, low latency architecture that is scalable, low-cost and extensible to future product generations.

While other I/O technologies certainly have their place in the market and provide their own benefits to their intended applications, HyperTransport technology has been, is, and will continue to be the lowest latency, highest performance solution for board-level systems.

For more information on the HyperTransport technology, please refer to the other white papers available at www.hypertransport.org, in particular, the “HyperTransport™ I/O Technology Overview, An Optimized, Low-latency Board-level Architecture” white paper published in 2004 and “HyperTransport I/O Technology DirectPacket™ Specification: Efficient User Packet Handling Supports Streaming Communications.”

End of HTC_WP04 White Paper.
About the HyperTransport Consortium

The HyperTransport Technology Consortium is a membership-based non-profit organization in charge of managing and promoting HyperTransport Technology. It consists of over 40 member companies including major industry players in the personal computer, server, network equipment, silicon IP, software and supercomputing markets. Founding members include Advanced Micro Devices, Alliance Semiconductor, Apple Computer, Broadcom Corporation, Cisco Systems, NVIDIA, PMC-Sierra, Sun Microsystems, and Transmeta. Membership is open to any company interested in leveraging the HyperTransport technology and is based on a minimal yearly fee that includes the right to royalty-free use of HyperTransport technology and Intellectual Property. For more information, please visit: http://www.hypertransport.org/org_join.html.

The HyperTransport-enabled product portfolio includes tunnel, bridge, and graphics chips; programmable-logic devices; security processors; IP cores; BIOS software; verification and test tools; training courses; and an architecture reference manual. A full product listing can be found at: http://www.hypertransport.org/featuredproducts/products.html.