Hello, my name is Elad Alon from UC Berkeley, and in this video I'm going to talk about
how far we can go with electrical I/O.
Before I dive into too much detail on the exact topic, I do want to make sure that we're
all on the same page as to why it is even important to think about electrical I/O in
the first place.
We're obviously in this era of data, the cloud, and things like that, and any time you're
processing all of that, you have to make sure you can move the data around fast enough
within those processing units to actually get things done in an overall efficient manner.
So it turns out that if you've ever looked inside those big boxes - the big racks that
actually implement the computers behind the cloud - the performance you can get from the
I/O between the different pieces of the system, as well as between chips on the different
backplanes and PCBs in there, often sets the overall power as well as the performance
of these systems.
Just to give an idea, these pictures here are realistic, and there can literally be
tens of thousands of links inside of one of these data centre systems.
So really, you have to pay a lot of attention to what are they going to be doing to you.
To peel back one more layer of the onion, let's look at what the electrical link designer
actually has to do.
It turns out that they've actually got some really stringent challenges that they have
to deal with.
So the first is the bump density - the ability to get signals on and off of the chip -
which is really not scaling very fast, for basically simple physical reasons.
So that implies that electrically, we have to make our data rates go up on a per lane
basis or really per I/O basis, to make sure we can support the overall increased throughput.
The other problem is that the overall chip has a power dissipation limit, so we can't
just go and arbitrarily increase the power we spend on links - we have to keep that
roughly constant.
So if you think about the link efficiency, the so-called pJ/bit or mW/Gb/s - this just
tells you, for however fast you're going, how much power you have to spend to go that fast.
It's going to have to improve, because I have a fixed power budget and my speed has to go up.
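Just to make that arithmetic concrete, here is a minimal Python sketch; the power budget and data rates are made-up illustrative numbers, not data from the talk. It simply shows that at a fixed per-link power, the pJ/bit has to improve in direct proportion to the data rate.

```python
# Hypothetical numbers: efficiency = power / data rate, so at constant power
# the pJ/bit must drop exactly as fast as the data rate climbs.
def pj_per_bit(power_mw, rate_gbps):
    # 1 mW per 1 Gb/s works out to exactly 1 pJ/bit.
    return power_mw / rate_gbps

budget_mw = 100.0                      # assumed fixed power budget per link
for rate_gbps in (10, 20, 40, 80):     # assumed data rates, Gb/s
    print(f"{rate_gbps:2d} Gb/s -> {pj_per_bit(budget_mw, rate_gbps):5.2f} pJ/bit")
# An 8x increase in data rate at constant power demands an 8x efficiency improvement.
```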
Now, to make life even worse - as I'm hinting at a little bit on the right-hand side with
the so-called channel picture (I'll clarify what I mean by channel in a couple more slides) -
the characteristics of the channel carrying the signals from the transmitter to the receiver
basically keep getting worse.
If you just use the exact same physical interconnect or platform you did before, it gets
harder and harder to receive the data correctly at the receiver and achieve a sufficiently
low so-called bit error rate.
What I want to do now is just see, roughly, how we've done so far in the link design space
given these difficult challenges that we're facing.
What I've done here is grab some data from published link papers at various conferences
over approximately the last 7-8 years. Obviously there's a lot of spread in the data points
here, so don't pay too much attention to the exact numbers I'm quoting, but there is certainly
a trend where the data rates are going up. You can see that there are some fast papers early
on, but it takes a while before the rest of the pack really catches up, and that's when people
start to actually commercially produce things.
But if you kind of draw a rough line across here, it has been about an 8x speed increase
over about 8 years - about 30% per year, at least in terms of publications that people
have been showing so far.
It turns out that the standards typically are a bit slower than that, so if you look
at for example PCI-express or one of these others, those tend to move only about 4x every
8 years or so.
But that makes sense because standards processes have a lot more involved in them.
This is really only showing the first things that people are publishing, and that's why
I think it moves a bit faster.
If we look at the power efficiency - remember, I said before that we have to make sure the
pJ/bit goes down, because otherwise we would just be spending more and more power on our links.
Again, there is a lot of clustering and spread in the data points here, but the line is indeed
moving down.
You can see that people are no longer doing these 100pJ/bit links - everyone is down in
the 10-20pJ/bit range, and a bunch of people are actually pushing into the single-pJ/bit
range, if not even below that.
So again, if I draw a very rough line across all this data, it's about a factor of 10
improvement in efficiency over 8 years - maybe a little bit faster, roughly 33% per year.
Again, that's good news, because it says that power has indeed stayed about constant.
The next natural question of course is: are we going to be able to keep this up?
I want to be the first to admit that there are a whole bunch of very complicated challenges
that need to be addressed to really keep marching down this path - a bunch of which I'm not
going to be able to talk about for the sake of time and clarity.
So I'm going to have to fairly dramatically simplify things in the rest of the discussion,
but the limitations I point out will tend to hold pretty true even if we take some of these
complicated issues into account.
Just as an example, I'm going to stick with 2-level modulation for now - a 1 is one level,
a 0 is another, and that's all I'm going to be doing.
You can do things like multi-level modulation, but for detailed reasons, it turns out that
the scaling there is a little bit less clear, and in fact, not entirely obvious that you'll
necessarily win from that standpoint.
I want to focus on two key issues.
The first is the so-called channel characteristic, which is just what the medium in between
my transmitter and receiver does to the signal I send. As we'll see in a second, if the
channel's so-called loss - meaning the frequency-dependent roll-off - gets worse, you need
more complicated link circuits and architectures, which obviously directly impacts how much
power you have to spend.
The other important point, as I'll show you, is that the device/circuit performance is going
to set some basic speed limits that eventually may be the dominant thing that stops us from
going any faster.
First, let's take a look at the so-called bad news.
Why is it that people are really actually worried about being able to continue moving
forward, even though historical trends have actually looked pretty good?
The first, and perhaps most important, is this: say I always have to communicate over 1m or
something like that. If you just look at the way these loss curves go - let's ignore the deep
notches, which are usually associated with reflections, and assume we get rid of those - then
basically you have these curves that all keep rolling off linearly in dB with frequency.
In other words, if you go from (in this case) 4-GHz up to about 8-GHz, you go from about
20dB all the way to beyond (in this case) 40dB - keep that in mind: that's another factor
of 100 of loss you have to deal with.
If I want to go another factor of 2, that's going to mean another factor of 10,000 in
loss from 8-GHz to 16-GHz.
That's a huge factor if you didn't do anything about it.
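To see where those factors come from, here is a quick Python sketch; the 5 dB/GHz slope is an assumed round number consistent with the roughly 20dB at 4-GHz quoted above, not a measurement. It shows how a loss that is linear in dB versus frequency becomes an exponentially growing loss factor.

```python
# Loss that grows linearly in dB with frequency grows exponentially as a raw factor.
db_per_ghz = 5.0                                  # assumed slope: 20 dB at 4 GHz

def loss_db(freq_ghz):
    return db_per_ghz * freq_ghz                  # linear in dB

def loss_factor(freq_ghz):
    return 10 ** (loss_db(freq_ghz) / 10.0)       # exponential as a plain ratio

for f_ghz in (4, 8, 16):
    print(f"{f_ghz:2d} GHz: {loss_db(f_ghz):3.0f} dB, factor of {loss_factor(f_ghz):,.0f}")
# 4 GHz: 20 dB (100x); 8 GHz: 40 dB (another factor of 100 on top); 16 GHz: 80 dB
# (another factor of 10,000 on top of that) - exactly the factors quoted above.
```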
The other bad news is that, along with this exponential badness, we can no longer count on
transistors getting faster, lower power, cheaper, etc. the way we could in the good old days:
even if you look at the ITRS (this prediction is about a year old now), you're only going
to get about a 2x improvement in intrinsic device speed over the next 8 years or so.
That means that if we want to do that 8x speed that I was saying before in the same 8-year
time period, clearly, this is going to be a big problem.
Just to be clear, for those of you who have not seen these things before, the reason channel
loss is a problem can be seen in these frequency-domain pictures showing the transfer
function - the amplitude versus frequency of what arrives at the receiver compared to what
you sent. Basically, the loss has two dominant effects.
If I look at my original transmitted binary pulse on the transmit end and I compare it
against the receiver, obviously this point is quite a bit smaller than that point, so
I'm going to need some sort of extra swing or gain to compensate for that.
But in addition, there's all this extra energy just hanging around due to the fact that I
low-pass filtered the signal in this particular case.
In fact again, you can see these notchy/ripply things associated with the notches in the
channel.
The point is that this residual energy introduces inter-symbol interference, meaning that if
I had another bit that was going to be transmitted over here somewhere - all this residual stuff
would basically interfere with that next bit and may make it so that I interpret what should
have been, for example, a 1 or a 0, the other way around.
Basically, the way you counter that is building additional circuits, usually known as equalizers,
that essentially figure out what this badness is and try to introduce the opposite effect
so you can really figure out what the original transmitted pulse was.
Of course, the more complicated the circuit, the more power it's going to burn.
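Just to make the equalization idea concrete, here is a minimal Python sketch with a made-up three-tap channel and a handful of feed-forward taps; a least-squares linear feed-forward equalizer is only one of many equalizer styles, not the specific architecture discussed in the talk. It solves for taps that approximately undo the channel's ISI.

```python
import numpy as np

np.random.seed(0)
# Made-up discrete-time channel: a main cursor plus two ISI post-cursors strong
# enough to occasionally close the eye on their own.
channel = np.array([1.0, 0.7, 0.4])

# Solve for FFE taps w so that convolving the channel with w looks as close as
# possible to a single clean impulse (a least-squares, zero-forcing-style fit).
n_taps = 5
target = np.zeros(len(channel) + n_taps - 1)
target[0] = 1.0
conv_mat = np.array([np.convolve(np.eye(n_taps)[k], channel) for k in range(n_taps)]).T
w, *_ = np.linalg.lstsq(conv_mat, target, rcond=None)

bits = np.random.randint(0, 2, 1000) * 2 - 1          # random +/-1 data
received = np.convolve(bits, channel)[:len(bits)]     # ISI-corrupted waveform
equalized = np.convolve(received, w)[:len(bits)]      # FFE output
print("bit errors without EQ:", int(np.sum(np.sign(received) != bits)))
print("bit errors with EQ:   ", int(np.sum(np.sign(equalized) != bits)))
```

The extra taps and the arithmetic they require are exactly the "more complicated circuit, more power" trade-off mentioned above.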
The good news, however, is that I think there are some fairly straightforward techniques
we can adopt to really dramatically reduce the losses we're going to see.
Most of the time when people show you those curves, they assume the so-called traditional
backplane type of application.
The actual chips that you're trying to talk to and from are these squares on these PCBs -
the signal runs through some interconnect on FR4 here, goes through a connector (in fact
two halves of the connector), gets out onto this so-called backplane, crosses over, and
then comes back all the way up on the other side.
The materials that people use there are not necessarily the best for getting low loss.
In fact, there are several options out there, but one that I think is indeed very compelling
overall is the so-called cable-based interconnect, where you can kind of see a picture of this
here - you might have a chip with a short connection over to this connector, jump out
through this flex cable, and then get over to the next chip.
This little cable is just made out of twinax - two conductors next to each other with some
Teflon wrapped around them, and a braided ground shield wrapped around that.
The real key point here is that those cables are built first of all with dielectrics that
have much lower loss per meter - Teflon is about a factor of 20-30x better than, for
example, FR4.
And the surface roughness of the conductors themselves also tends to be a lot better than
what you get with these standard backplane PCBs.
It turns out there is really nothing exotic in terms of the materials or construction being
used here, and in fact you can buy these at quite cheap price points.
These are some example loss numbers that were taken from initial measurements in 2010.
There are actually some updates in terms of particularly how people connected these cables
and made things even better, but much more importantly, if you just take a look at what
this curve looks like and compare it to the backplane channel - huge difference once you
get out into these high frequencies.
Even today, if you went out to 40-GHz or so, you're still looking at about 20dB of loss,
which is well within the range of what people are used to dealing with on backplanes today.
It turns out that if you take a close look at the standards, what almost all of them have
been doing is basically assuming that the loss (roughly speaking) stays pretty fixed even
as the data rates move up, just so that the complexity of the equalizers you have to build
doesn't scale up in some crazy way, which of course would hit your power consumption.
This is not to say that we're going to be able to take this out to infinity, but I'm
very confident that we'll be able to get low-enough loss channels out to the point actually where
I think the circuits will become the bottleneck.
So let's actually take a look at what the circuits themselves can do.
If you remember, there are really two considerations here: (1) how fast can I actually run
the circuit, and (2) how much power is it going to take?
Both of those are potentially the limiter.
Let's just take the simplest single-stage, Class-A, resistively loaded amplifier.
What I want to quickly do is derive how much power it will take to run this amplifier as
a function of the gain, the bandwidth, and really the capacitive loading that you're
driving.
If I have a certain gain-bandwidth product that I want to get - just Av·ωbw - that's set
by gm over the total load capacitance.
I just want to remind you that there's always some sort of Cself associated with the parasitics
of the transistor itself, and the key point is that the larger you make that gm, the larger
that Cself is going to be.
Just to capture that relationship, gm is related to the input gate capacitance through ωT,
and this gamma factor accounts for the ratio between that Cself - those drain parasitics -
and the total gate capacitance. Typically, that gamma is about 1, so you can almost forget
about it, but I'm going to carry it along for the sake of clarity.
If you just do about two lines of algebra - take these two equations, plug them back in,
rearrange - you'll find that the transconductance you need is
gm = Av·ωbw·CL / (1 - Av·ωbw/(ωT/γ)): the gain-bandwidth times the load capacitance,
divided by 1 minus the gain-bandwidth normalized to ωT/γ.
Pay close attention to that, I'm just going to do one small tweak in this next slide here,
which is: I told you how much transconductance I need - I really actually care about power,
in this case bias current, so I'm just going to relate that transconductance to the bias
current.
I can do that by just defining this magic V* to be the ratio of ID/gm.
If you haven't seen this before, you can think of this like an overdrive, but really, it's
by definition this quantity that I'm giving you here, so you can always actually measure
that quantity.
Once you've done that, the bias current is very simply
ID = Av·ωbw·V*·CL / (1 - Av·ωbw/(ωT/γ)): the gain-bandwidth times V* times the load
capacitance, divided by that same 1-minus-normalized-gain-bandwidth term.
What's interesting here is that you've probably always been told how easy it is in digital
to estimate power - it's just CVf. Well, analog is actually the same thing, just with a
slightly different V and f. Basically, the power is set by the gain-bandwidth, the overdrive
or V*, the load capacitance, and of course the supply voltage you're working at.
The thing that I really want to highlight here is the term in the denominator: there is
absolutely a bandwidth limit beyond which you cannot go, because as soon as that normalized
piece - Av·ωbw/(ωT/γ) - gets to 1, meaning you're operating right at the speed of the
transistor itself, you can spend infinite power and still not go any faster.
That's captured in these plots over on the right, where when you change the load capacitance,
it just scales all those curves up the same way, but they all have the same limit because
it's the same transistor biased the same way.
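As a rough numerical sanity check on that expression, here is a small Python sketch of ID = Av·ωbw·V*·CL / (1 - Av·ωbw/(ωT/γ)). The fT, V*, and load capacitance values are assumed illustrative numbers (the fT is picked to match the ~500-GHz figure coming up in a moment), not data from a specific process.

```python
import math

f_t = 500e9        # assumed transistor fT [Hz]
gamma = 1.0        # self-loading ratio Cself/Cgg, taken as ~1 as stated above
v_star = 0.15      # V* = ID/gm as defined above [V], assumed bias point
c_load = 20e-15    # assumed fixed load capacitance [F]
a_v = 1.5          # per-stage voltage gain

def bias_current(f_bw_hz):
    gbw = 2 * math.pi * a_v * f_bw_hz            # required gain-bandwidth [rad/s]
    x = gbw / (2 * math.pi * f_t / gamma)        # gain-bandwidth normalized to the device speed
    if x >= 1.0:
        return float("inf")                      # past the device limit: unreachable at any power
    return gbw * v_star * c_load / (1 - x)       # self-loading term blows up as x -> 1

for f_bw in (100e9, 200e9, 300e9, 325e9, 333e9):
    print(f"{f_bw/1e9:5.0f} GHz -> {bias_current(f_bw)*1e3:10.2f} mA")
# The current diverges as Av*f_bw approaches fT/gamma (here 500/1.5 ~ 333 GHz):
# beyond that point, no amount of power buys any more bandwidth.
```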
The reason I brought this up is that I think we can do a simple calculation that tells us
roughly where the speed limit is going to be. By 2023, if we follow those ITRS trend curves
(these are supposed to be for NMOS devices, but we will see), we should have NMOS transistors
with approximately 500-GHz of fT - that's roughly where things are supposed to line up.
So if I want to have a stage with a gain of 1.5, it's going to give me about 300-GHz of
absolute maximum bandwidth.
As I'm showing you in this highly simplified, yet representative block diagram of a typical
high-speed link, there are several stages that one typically needs along the chain here
to do different types of functionality associated with the link.
This is not to say that any of this is absolutely set in stone, but I'm saying I'm going to
need at least four of those gain stages, and I think even if you start doing some more
exotic things, you're still probably going to need about four gain stages just for the
driver and one stage of gain over at the receiver.
Basically, if I have four stages like that, the bandwidth of my cascade drops to about
150-GHz.
I should remind you if I actually operated exactly at this point, it would cost me infinite
power because everything is driving its own load capacitance.
Obviously that's not something practical that I'm going to be able to do, so if I back off
from this by about a factor of 2, the self-loading penalty - that 1/(1-x) term - comes down
to about 2x; let's assume for now that that's a tolerable point to operate at.
That would tell you that the final result is about 75-GHz or so of bandwidth. You can argue
a little about what the ratio between the bandwidth and the bit rate needs to be, but a
typical number is that the bit rate is about 1.5x the bandwidth, so that would give you
around 110-Gb/s or so.
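Here is the same back-of-the-envelope chain as a short Python sketch. The inputs are the rough numbers quoted above, and the cascade shrinkage uses the standard identical-first-order-stage approximation, which is my assumption rather than anything stated explicitly in the talk.

```python
f_t = 500e9            # projected NMOS fT (rough ITRS-style number quoted above)
gain_per_stage = 1.5   # per-stage gain assumed above
n_stages = 4           # gain stages needed along the link chain
backoff = 2.0          # back off ~2x from the absolute limit to keep power finite
bitrate_per_bw = 1.5   # rule of thumb: bit rate is about 1.5x the bandwidth

per_stage_bw = f_t / gain_per_stage                              # ~333 GHz per stage
cascade_bw = per_stage_bw * (2 ** (1.0 / n_stages) - 1) ** 0.5   # ~145 GHz for 4 stages
practical_bw = cascade_bw / backoff                              # ~72 GHz after backing off
bit_rate = practical_bw * bitrate_per_bw                         # ~110 Gb/s

print(f"per-stage limit : {per_stage_bw / 1e9:6.0f} GHz")
print(f"4-stage cascade : {cascade_bw / 1e9:6.0f} GHz")
print(f"practical BW    : {practical_bw / 1e9:6.0f} GHz")
print(f"bit-rate target : {bit_rate / 1e9:6.0f} Gb/s")
```

These land within rounding of the roughly 150-GHz, 75-GHz, and 110-Gb/s figures above.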
So that means we've got a factor of about 4 from where we are today, but much farther
than that starts to look pretty challenging.
To even hit that 110-Gb/s point - remember, today we are at about 30-Gb/s, which is about
where most people are starting to commercialize - the link efficiency would have to improve
by another 3-4x versus today's designs.
Unfortunately, I think most of that - in fact, almost all of it - is going to have to come
from getting transistors that have less capacitance, because the supply voltage is largely
limited by a linearity constraint: there's a certain amount of loss in the channel, and its
frequency dependence means there are small signals sitting alongside large signals. If I
want to handle those properly without clipping anything I don't want to clip, that indirectly
sets the supply voltage I need to use, because that sets the overdrives and therefore
everything else.
If the capacitance doesn't scale by this 3-4x factor, which is not entirely clear at this
point, then rather than being limited by how fast you can run this circuit, you may very
quickly hit this point where you could've run the circuit faster, but you can't actually
spend that much power.
With that, I would like to briefly summarize and make sure that the high-level points have
been made clear.
The first is that, as I pointed out right at the beginning, high speed design is really
about this need: we always have to go faster but do so at a constant power consumption.
So if we want to have any hope of doing that, especially given the way losses scale with
frequency as we move forward, we really have to manage those channel losses and probably
keep them about constant as we push up the speed, just to make sure our circuits aren't
constantly dealing with an exponentially increasing problem - because, again, most channels
beyond a certain frequency have loss that just increases linearly in dB with frequency,
which is really not a curve you want to be fighting against.
The good news, however, is that things like copper cable-based solutions - or, more generally,
just using better materials with lower loss - really look promising, and they actually don't
look all that expensive to implement.
So when we put all of that together, in about 8 years or so it looks like either the circuit
speed and/or the power consumption is likely to be the limiter, and again, I think 100-Gb/s
is a pretty safe rough estimate.
I'm again the first to admit that there are a lot of other ideas and techniques one can
think about applying to try to shift this around, but to be honest, they all have their
trade-offs, and I don't think they'll actually break this barrier substantially.
I think that does say that data rate scaling is going to slow down, because this is probably
about a factor of 2 slower than the previous scaling we had seen.
But there is at least another 3-4x beyond where we are today, and there's obviously still
a lot of work to be done to get to that point.
Thank you very much for your attention.