Peripheral Component Interconnect (PCI) is a hardware interface for connecting peripheral devices to an existing computer. Early computers shipped with several built-in PCI slots, but their number dwindled over time as more of the control circuitry was packed directly into the motherboard chipset.
PCI cards were designed to carry out various computing functions, such as attaching network, video and sound cards to the motherboard. A single PCI bus can host around five devices at a time, and the shared bus has a fixed width of only 32 bits.
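As a rough illustration of what that fixed width means in practice, the peak throughput of a conventional 32-bit PCI bus can be worked out from its width and clock. The sketch below assumes the common 33 MHz bus clock; 66 MHz variants also existed.

```python
# Back-of-the-envelope peak throughput of a conventional 32-bit PCI bus.
# Assumes the common 33 MHz bus clock (66 MHz variants also existed).

BUS_WIDTH_BITS = 32      # width of the shared parallel bus
CLOCK_HZ = 33_000_000    # 33 MHz bus clock

peak_bytes_per_s = BUS_WIDTH_BITS / 8 * CLOCK_HZ
print(f"Peak PCI throughput: {peak_bytes_per_s / 1e6:.0f} MB/s")
# ~132 MB/s, commonly quoted as 133 MB/s with the exact 33.33 MHz clock
```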
PCI has one major drawback holding it back, however: even as the devices attached to it, such as sound cards, video cards and processors, kept evolving, the standard itself changed little over the years.
The Need for PCI Express
A newer standard, PCI Express (PCIe), seeks to eliminate that very problem. PCIe connects very high speed components to the motherboard so they can keep pace with today's powerful processors. Desktop motherboards typically include several PCIe slots for adding devices such as graphics processing units, and a host of other add-on cards, including solid state drives, Wi-Fi cards and RAID cards, plug in the same way.
PCIe links are built from lanes, the paths through which data moves in and out of a PCIe card. Cards are classified by how many lanes they use, denoted by an x followed by the number of lanes present.
For example, common variants include x4 and x16, among others. A PCIe x4 card thus has four lanes and can transmit four bits per transfer cycle, one per lane. The larger the number after the x, the more lanes and therefore the more bandwidth the card has; the signalling rate of each individual lane stays the same.
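To put numbers on how bandwidth scales with lane count, the sketch below estimates per-direction throughput for a few link widths. It assumes PCIe 3.0 signalling, 8 GT/s per lane with 128b/130b encoding; other generations use different rates and encodings.

```python
# Rough per-direction throughput of a PCIe link, scaled by lane count.
# Assumes PCIe 3.0 signalling: 8 GT/s per lane with 128b/130b encoding.

GT_PER_S = 8e9            # raw transfers per second per lane
ENCODING = 128 / 130      # fraction of transfers carrying payload bits

def link_throughput_gb_s(lanes: int) -> float:
    """Approximate usable throughput in GB/s for a link with `lanes` lanes."""
    bits_per_s = lanes * GT_PER_S * ENCODING
    return bits_per_s / 8 / 1e9

for lanes in (1, 4, 16):
    print(f"x{lanes:<2} ~ {link_throughput_gb_s(lanes):.2f} GB/s per direction")
# x1  ~ 0.98 GB/s, x4  ~ 3.94 GB/s, x16 ~ 15.75 GB/s
```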
One key difference between PCIe and PCI is that PCIe uses a switched architecture that can run up to 32 separate serial lanes, unlike the shared bus that PCI uses. Each lane carries data serially, but multiple lanes operate in parallel, and every individual lane is full duplex with its own clocking.
What Are Accelerator Cards?
Accelerator cards are a special type of expansion card dedicated to accelerating specific workloads. They plug into PCIe slots and are treated as standard PCIe devices by the host processor.
Host programs pass instructions to the accelerator card to carry out various operations, usually through hardware-specific library code supplied by the card manufacturer. Once the card is done computing, it relays the results back to the host processor, as sketched below.
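In outline, the host-side flow usually looks something like the following sketch. The `accel` module and all of its functions are hypothetical stand-ins for whatever hardware-specific library a card vendor actually ships; the point is the pattern of copying data across the PCIe link, running a kernel on the card, and copying results back.

```python
# Hypothetical host-side offload flow. `accel` and its functions are
# illustrative stand-ins for a vendor's hardware-specific library.
import numpy as np
import accel  # hypothetical vendor-supplied library

def offload_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    device = accel.open(0)                       # attach to the PCIe accelerator
    buf_a = device.copy_to_device(a)             # move inputs over the PCIe link
    buf_b = device.copy_to_device(b)
    buf_c = device.run("matmul", buf_a, buf_b)   # the card executes the operation
    result = device.copy_from_device(buf_c)      # results return to the host
    device.close()
    return result
```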
Why We Need Them
With the onset of new technologies such as 5G networks, more devices are interconnected now than ever before. That growth drives demand for solutions to pressing problems such as the need for more storage, and with it the rising demand for cloud storage services.
This in turn has sent demand for computing performance through the roof, especially in servers and data processing centres.
Accelerator cards offer advantages such as flexibility, easy system configuration, and high speed parallel computing with low latency, all while keeping the development cycle short.
How Do They Work?
Accelerator cards are powered almost exclusively by application-specific integrated circuits (ASICs), which can themselves be referred to as accelerators. At their most basic level, these are integrated circuits designed to perform specialized tasks.
Such an IC usually combines analog circuitry, such as amplifiers and denoising circuits, with digital blocks such as registers, arithmetic logic units (ALUs) and memory blocks.
The digital blocks work with discrete signals, while the analog circuitry works with continuous signals.
These chips have numerous applications, but at their core they are mainly used to control other electronic devices and how they function. They are fabricated with metal-oxide-semiconductor (MOS) technology, and their complexity and functionality have increased significantly as feature sizes have shrunk and design tools have improved.
Because these chips are dedicated to one function or a small group of functions, they execute their workloads far faster and more efficiently than their counterparts, the general purpose processors.
Such operations are therefore accelerated on the card rather than being carried out on a general purpose processor, since the accelerator incorporates specialized logic that lets it perform these complex operations more efficiently.
Examples of accelerator cards include AI accelerator cards, PCIe accelerator cards, cryptographic accelerator cards, programmable accelerator cards and graphics accelerator cards. We shall discuss the first two below.
Where AI comes in
AI accelerators are hardware accelerators specifically designed to speed up machine learning and artificial intelligence applications in general.
These applications extend to computer vision and artificial neural networks (ANNs), which mostly fall under the umbrella of deep learning (DL). Examples include algorithms for the internet of things (IoT), robotics and task automation.
These accelerators make use of techniques such as optimized memory use and lower precision arithmetic, which increase computational throughput and speed up calculations.
Optimized memory use relies on algorithms analysed in the external memory model, also referred to as the I/O model or disk access model. This abstraction behaves like the random access machine (RAM) model but adds a cache memory on top of the main memory already in place.
The approach exploits how quickly data can be retrieved from the cache block, where read and write operations complete much faster than the same operations in main memory. In this model, an algorithm's performance is measured not by CPU running time but by the number of read and write operations it makes to external memory, as illustrated below.
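A toy way to see that cost metric in action: the sketch below counts block transfers between cache and main memory instead of CPU instructions, comparing a cache-friendly sequential scan against worst-case scattered accesses. The block size is a made-up, deliberately tiny value for illustration.

```python
# Toy illustration of the external-memory (I/O) model: the cost of an
# algorithm is the number of block transfers between cache and main
# memory, not the number of CPU instructions executed.

BLOCK_SIZE = 4  # elements moved per read/write of external memory (illustrative)

def sequential_scan_ios(n: int) -> int:
    """A sequential scan of n elements costs about n / BLOCK_SIZE transfers."""
    return -(-n // BLOCK_SIZE)  # ceiling division

def random_access_ios(n: int) -> int:
    """Worst case for scattered accesses: one block transfer per access."""
    return n

n = 1_000
print("sequential scan:", sequential_scan_ios(n), "block transfers")   # 250
print("random accesses:", random_access_ios(n), "block transfers")     # 1000
```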
Low precision arithmetic makes use of floating point values represented with very few bits, known as minifloats. These are specialized for particular workloads and do not fare well in general purpose numerical computation. Those workloads, which mostly fall under computer graphics, can tolerate the reduced range and precision. Machine learning techniques also make use of such reduced formats, bfloat16 being one example.
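To make the precision trade-off concrete, the snippet below approximates bfloat16 by keeping only the upper 16 bits of a float32 value (its sign, 8 exponent bits and top 7 mantissa bits). Real conversions usually round rather than truncate; truncation is used here only to keep the sketch short.

```python
# bfloat16 keeps the sign, exponent and top 7 mantissa bits of a float32,
# i.e. its upper 16 bits. Truncation is used here instead of rounding
# purely for clarity.
import numpy as np

def to_bfloat16(x: np.ndarray) -> np.ndarray:
    """Truncate float32 values to bfloat16 precision (returned as float32)."""
    bits = x.astype(np.float32).view(np.uint32)
    truncated = bits & np.uint32(0xFFFF0000)   # drop the low 16 mantissa bits
    return truncated.view(np.float32)

x = np.array([3.14159265, 0.1, 12345.678], dtype=np.float32)
print(to_bfloat16(x))
# [3.140625, 0.099609375, 12288.0] -- only ~2-3 significant digits survive
```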
PCIe Accelerator Cards
These cards are an answer to today's huge performance demands and rely on the availability of PCIe interfaces to plug into. The PCIe slots provide the data processing bandwidth needed to offload processor workloads and accomplish everything described above.
Accelerator cards are built to standardized PCIe slot form factors. This, however, poses a challenge, since the size of the accelerator boards is fixed and cannot be expanded.