Hi, thanks for tuning into Singularity Prosperity. This video is the fifth in a multi-part series discussing computing. In this video, we'll be discussing the gap between computing performance and memory, and how this 'memory wall' is to be demolished.
[Music]
While computer performance and memory capacity have followed the trend of Moore's Law, memory access time, in other words latency, has not, improving by only approximately 1.1 times per year. While at first the difference between exponentials with a 2x versus 1.1x rate of growth is barely distinguishable, as time progresses it becomes more and more apparent; in terms of the memory gap, this compounds to the gap widening by roughly 50 percent or more per year, depending on the growth figures used. We are now at a point where this gap is large enough to cause severe bottlenecks in applications. For example, for machine learning workloads in genomics, financial and scientific research, 80 to 90 percent of the time is spent just accessing memory, while the other 10 to 20 percent is spent doing the computation itself.
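To make the compounding concrete, here's a minimal Python sketch using the rough growth rates quoted above; the rates are the video's illustrative figures, not measured data:

```python
# Illustrative only: compound the growth rates quoted above to see how
# quickly the processor-memory gap widens over time.
compute_growth = 2.0   # compute capability: ~2x per year (Moore's-Law-style trend)
latency_growth = 1.1   # memory access speed: ~1.1x per year

for years in (1, 5, 10, 20):
    gap = (compute_growth / latency_growth) ** years
    print(f"after {years:2d} years, compute outpaces memory by ~{gap:,.0f}x")
```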
This really exemplifies how important memory and storage are: a computer can only process data as fast as its memory can be accessed. Now, there are a variety of memory devices that computers use today, with each device storing certain types of information and having its own trade-offs based on performance, capacity and cost. The most expensive, best-performing memory is found on the die of a CPU core itself: the registers. The registers essentially just store the data that the CPU is using to compute its current instruction. Moving outward we are met with static random-access memory, SRAM, more commonly referred to as the CPU cache. There are multiple levels, usually two to three, each growing in capacity but decreasing in performance as we move further away from the registers, and they store the data that the CPU will need soon or has been using frequently. The SRAM and registers on a CPU are fixed in size due to space constraints and the processor's job: to process data, not store it. Going off chip there's DRAM, NAND solid-state drives and then mechanical hard drives, each many orders of magnitude slower than on-chip memory implementations: Today
there are three primary technologies
used to store and access data. First
there's DRAM memory, which serves as hot
storage for the most frequently accessed
data. DRAM is incredibly fast but also
expensive, especially at large capacities.
For cold data there are hard drives,
which are orders of magnitude slower
than DRAM but very cheap and high
capacity, and that's why the bulk of our
data is stored there. In the warm tier
there are NAND SSDs, which are much
faster than hard drives but cheaper than
DRAM.
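For a sense of scale across these tiers, here's a small Python sketch with ballpark access latencies; the numbers are common order-of-magnitude estimates, not benchmarks of any specific device:

```python
# Rough, order-of-magnitude access latencies for each tier of the
# storage pyramid (ballpark estimates, not measurements).
HIERARCHY = [
    ("CPU register",       0.3e-9),   # ~1 CPU cycle
    ("SRAM cache (L1-L3)", 1e-9),     # ~1-30 ns depending on level
    ("DRAM (hot)",         100e-9),
    ("NAND SSD (warm)",    100e-6),
    ("Hard drive (cold)",  10e-3),    # dominated by mechanical seek
]

_, base_latency = HIERARCHY[0]
for name, latency in HIERARCHY:
    print(f"{name:20s} ~{latency * 1e9:>14,.1f} ns "
          f"({latency / base_latency:>12,.0f}x slower than a register)")
```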
So, when viewing memory types in this pyramid structure, the processor registers are at the very top point and SRAM takes up the rest of the top 1% of the pyramid. To summarize, SRAM and DRAM are hot memory types, the most expensive but also the fastest. They are volatile, meaning all data is essentially lost when power is removed, and they are also fairly fixed in their storage capacity. Moving down to the warm storage tier we get NAND, in other words SSDs. This technology is beginning to mature with 3D NAND, allowing for rapid increases in non-volatile storage; in other words, it's suitable for storing long-term information, and due to their increasing capacity SSDs are now able to compete with mechanical hard drives. However, they are many orders of magnitude slower than DRAM and the speed the processor runs at. At the cold storage tier we have standard mechanical hard drives; these offer the most non-volatile storage but are extremely slow, really making them suitable only for backups or long-term storage on a budget. This is
how computer memory has been for quite some time, but now Intel and Micron's new 3D XPoint technology is coming in to change the game. 3D XPoint allows for memory devices that are much faster than NAND, have more storage capacity than DRAM and are non-volatile, essentially bridging the gap between DIMMs and SSDs: To make 3D XPoint memory
dense we packed lots of capacity into a
tiny footprint. We started by slicing
submicroscopic layers of materials into
columns, each containing a memory cell
and selector. Then we connected those
columns using an innovative crosspoint
structure consisting of perpendicular
wires, that enables memory cells to be
individually addressed by selecting one
wire on top and another at the bottom. We can stack these memory grids three-dimensionally to maximize density.
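Here's a toy Python model of the crosspoint idea just described, where selecting one top wire and one bottom wire uniquely addresses a cell, and layers stack for density; the class and sizes are made up purely for illustration:

```python
# Toy model of a crosspoint array: each memory cell sits at the
# intersection of one top wire and one bottom wire, so picking a
# (top, bottom) pair uniquely addresses a single cell.
class CrosspointLayer:
    def __init__(self, rows, cols):
        self.cells = [[0] * cols for _ in range(rows)]

    def write(self, top_wire, bottom_wire, value):
        # One top wire + one bottom wire selects exactly one cell.
        self.cells[top_wire][bottom_wire] = value

    def read(self, top_wire, bottom_wire):
        return self.cells[top_wire][bottom_wire]

# Stack the grids three-dimensionally to maximize density.
stack = [CrosspointLayer(4, 4) for _ in range(2)]
stack[1].write(top_wire=2, bottom_wire=3, value=1)
print(stack[1].read(2, 3))  # -> 1
```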
This technology, along with 3D NAND, will completely change the way the storage pyramid is organized: 3D XPoint
technology is about to completely
rewrite the rules governing the hot, warm,
cold storage pyramid. Why? Because it's
nearly as fast as DRAM while its cost is
much closer to NAND SSDs. Intel is
designing both DIMM and SSD solutions
with 3D XPoint technology. Intel DIMMs based on 3D XPoint technology will
enable you to greatly expand your
high-performance hot storage capacity,
without breaking the bank,
unleashing new levels of performance and
responsiveness for the apps and services
that power your business. Intel
Optane SSDs based on 3D XPoint
technology and other Intel storage
innovations will enable much faster warm
storage, plus Intel has already
introduced 3D NAND technology into SSDs,
boosting density and increasing the
capacity of those drives. These new Intel
SSD technologies can expand warm storage
capacity with excellent performance and
cost. So, with multiple memory types and new memory technologies breaking current memory paradigms, we can begin interfacing multiple memory technologies together to increase performance; this is referred to as a memory pool: The OS and your applications
run as is, without modification, with full
access to the extended memory pool.
Memory capacity associated with each
processor scales by up to 8x or 12
terabytes for a single processor. The
software also intelligently determines where data should be located in the pool to maximize performance; in select workloads, the software can enable servers to deliver near-DRAM-only performance even with DRAM supplying only one tenth of the overall memory pool capacity. This combination of increased capacity and cost efficiency means you can use Intel Memory Drive Technology to displace a portion of DRAM to reduce overall memory costs, or extend the memory pool capacity beyond DRAM limitations when greater system memory is required.
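As a rough illustration of software-managed tiering, and not Intel's actual algorithm, here's a minimal Python sketch of a pool that keeps recently touched pages in a small DRAM tier backed by a larger 3D XPoint-style tier; all names and sizes are hypothetical:

```python
# Minimal sketch of software-managed tiering: a small, fast DRAM tier
# in front of a large non-volatile tier, with hot pages promoted into
# DRAM and cold pages demoted out (simple LRU policy for illustration).
from collections import OrderedDict

class TieredMemoryPool:
    def __init__(self, dram_pages):
        self.dram_pages = dram_pages          # e.g. ~1/10 of the total pool
        self.dram = OrderedDict()             # page -> data, in LRU order
        self.slow_tier = {}                   # Optane/3D XPoint-style tier

    def access(self, page):
        if page in self.dram:                 # hot: served at DRAM speed
            self.dram.move_to_end(page)
            return self.dram[page]
        data = self.slow_tier.setdefault(page, f"data-{page}")
        self.dram[page] = data                # promote the hot page to DRAM
        if len(self.dram) > self.dram_pages:  # demote the coldest page
            cold, cold_data = self.dram.popitem(last=False)
            self.slow_tier[cold] = cold_data
        return data

pool = TieredMemoryPool(dram_pages=2)
for page in [1, 2, 1, 3, 1]:                  # page 1 stays hot in DRAM
    pool.access(page)
print(list(pool.dram))                        # -> [3, 1]
```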
Beyond Intel's Memory Drive Technology pooling various types of memory, each memory type is now also beginning to incorporate smart caching and prefetching assisted by AI. These software optimizations will have huge impacts on the performance of computer memory.
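As a simple example of the pattern detection such prefetchers build on, here's a classic stride predictor sketched in Python; AI-assisted prefetchers learn far richer access patterns, but the core idea of predicting the next access early is the same:

```python
# Illustrative stride prefetcher: if the last few accesses step through
# memory at a fixed stride, fetch the next address before it's needed.
def predict_next(addresses, history=3):
    if len(addresses) < history:
        return None
    recent = addresses[-history:]
    strides = {b - a for a, b in zip(recent, recent[1:])}
    if len(strides) == 1:                 # constant stride detected
        return addresses[-1] + strides.pop()
    return None                           # no stable pattern; don't prefetch

print(predict_next([100, 164, 228]))      # -> 292 (64-byte stride)
print(predict_next([100, 164, 300]))      # -> None (irregular pattern)
```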
We touched upon these principles heavily in previous videos in this series: new hardware devices to break current paradigms, as well as increased software optimization to leverage the hardware's resources. There are various other memory hardware and software optimizations beyond the scope of this video that we haven't even touched, such as DDR5 and virtual memory, to increase memory bandwidth and decrease latency; there are many amazing creators on this platform and resources online if you wish to learn more. While in the previous section
we discussed the types of memory devices
as well as new hardware and software
technologies to increase memory
bandwidth and decrease latency, we didn't
discuss another huge bottleneck, data
transfer. SRAM has transfer speeds in the range of hundreds of gigabytes per second and DDR4 DRAM in the range of 50 to 70 gigabytes per second; however, as mentioned previously, these are expensive, volatile, fixed-capacity memory sources.
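Some quick back-of-the-envelope arithmetic in Python shows why transfer speed matters; the rates below are the rough figures quoted in this video (including the SATA and NVMe-class numbers discussed shortly), not benchmarks:

```python
# Back-of-the-envelope: time to stream a 10 GB dataset at each tier's
# rough bandwidth (ballpark rates from this video, not measurements).
DATASET_GB = 10

for name, gb_per_s in [("SRAM cache", 500), ("DDR4 DRAM", 60),
                       ("NVMe SSD", 3.5), ("SATA SSD", 0.6)]:
    ms = DATASET_GB / gb_per_s * 1000
    print(f"{name:10s} @ {gb_per_s:6.1f} GB/s -> {ms:10.1f} ms")
```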
For non-volatile memory, there are two standards for data transfer: 1) the Advanced Host Controller Interface, AHCI, which uses the Serial Advanced Technology Attachment, SATA, and 2) Non-Volatile Memory Express, NVMe, which uses the Peripheral Component Interconnect Express, PCIe, to transfer data. Now, SATA and PCIe are the physical hardware mediums used to transfer data in the computer. SATA has been used for the majority of the 2000s for memory devices, with transfer rates tapping out at 600 megabytes per second; its medium is a cable. PCIe is connected directly to the CPU through a socket on the motherboard; for much of its lifetime it has primarily been used for graphics cards. PCIe 1.0 had transfer rates of 250 megabytes per second per lane, followed by 2.0 with 500 megabytes per second and now our current generation, 3.0, with one gigabyte per second.
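PCIe bandwidth scales with both the generation and the number of lanes, roughly per-lane rate times lane count; here's a quick Python sketch using the approximate per-lane rates quoted in this video:

```python
# PCIe bandwidth scales roughly as (per-lane rate) x (lane count).
# Per-lane rates below are the approximate figures from this video.
PER_LANE_GB_S = {"1.0": 0.25, "2.0": 0.5, "3.0": 1.0, "4.0": 2.0, "5.0": 4.0}

for gen, per_lane in PER_LANE_GB_S.items():
    slots = ", ".join(f"x{n}: {per_lane * n:4.1f} GB/s" for n in (1, 4, 8, 16))
    print(f"PCIe {gen} -> {slots}")
```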
As a side note, PCIe 4.0 is expected next year, with transfer rates of two gigabytes per second per lane, and 5.0 by 2021 with rates of 4 gigabytes per second. Now, with SATA tapping out in transfer rates, PCIe has been the new direction the industry has been leaning towards, with the NVMe protocol standard in development since 2012. AHCI was a great standard for old mechanical hard drives to transfer data; it could handle one data queue at a time, which was fine given how slow mechanical hard drives are, but with SSDs, AHCI severely bottlenecks performance. NVMe, however, can handle roughly 65,000 queues at a time, making it extremely parallel due to its utilization of the wider bandwidth of PCIe lanes and PCIe's direct connection to the CPU. NVMe is able to truly unleash the performance of SSDs, yielding up to six times the performance of SATA, essentially meaning SATA and AHCI are dead. The high-speed, low-latency transfer rates that NVMe brings, unlike with SATA, aren't going to be capped anytime soon, due to evolving PCIe and NVMe standards.
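To contrast the two command models conceptually (this is a loose analogy, not the real driver interface), here's a Python sketch in which each CPU core submits I/O on its own queue in parallel, the way NVMe's multi-queue design allows:

```python
# Conceptual contrast: AHCI exposes a single command queue, while NVMe
# exposes thousands of submission queues, so every core can issue I/O
# in parallel without contending for one queue. Purely illustrative.
from concurrent.futures import ThreadPoolExecutor

def submit_io(queue_id, command):
    # Stand-in for building a submission-queue entry and notifying the
    # device that new work is waiting on this queue.
    return f"queue {queue_id}: completed {command}"

# One submission queue per core; each core issues its own commands.
with ThreadPoolExecutor(max_workers=4) as cores:
    results = cores.map(submit_io, range(4),
                        [f"READ block {i}" for i in range(4)])
    for line in results:
        print(line)
```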
There are also various PCIe slot widths on a computer, such as x2, x4, x8 and x16, with each multiple adding additional bandwidth. For example, an x8 PCIe 3.0 slot could transfer 8 gigabytes per second. So, with multiple types of
memory, new memory types breaking currently defined memory paradigms, memory devices optimized with smart caching and prefetching, memory pooling, new data protocols and data transfer mediums, memory vertical scaling such as 3D NAND and 3D XPoint, and finally 3D integrated circuits, the memory gap will finally begin to shrink instead of grow. Beyond these concepts, in future videos we'll be discussing optical computing and neuromorphic computing.
[Music]
At this point the video has come to a conclusion; I'd like to thank you for taking the time to watch it.
If you enjoyed it, consider supporting me
on Patreon to keep this channel growing
and if you want me to elaborate on any
other topics discussed, or have any topic
suggestions, please leave them in the
comments below.
Consider subscribing for more content,
follow my Medium publication for
accompanying blogs and like my Facebook
page for more bite-sized chunks of
content. This has been Ankur, you've been
watching Singularity Prosperity and I'll
see you again soon!
[Music]