What is an integrated memory controller? RAM and RAM clock speed

Not long ago, processors of the AMD64 family based on the new revision E core appeared on the market. This core is manufactured on a 90 nm process using SOI (Silicon on Insulator) and DSL (Dual Stress Liner) technologies, and it has found its way into several of AMD's processor lines. The revision E core can be found in the Athlon 64 and Athlon 64 FX processors, where it carries the codenames Venice and San Diego; in the dual-core Athlon 64 X2 family, where it is called Toledo or Manchester; and in Sempron processors, where it is known as Palermo.

By developing new cores and bringing them to mass production, AMD strives not only to raise the maximum clock speeds of its processors but also to improve their characteristics. The revision E core became the next step on this path: with its introduction, Athlon 64 processors and their derivatives gained new capabilities. The most noticeable addition is support for SSE3 instructions, which the competitor's products have offered since the launch of CPUs based on the 90 nm Prescott core. In addition, the integrated memory controller has once again received its traditional fine-tuning.

Tests have shown that SSE3 support adds very little: today only a handful of applications make effective use of these instructions, and the SSE3 set itself can hardly claim to be a full-fledged instruction set extension.

Therefore, this time we decided to pay more attention to the changes made to the integrated memory controller of processors with the revision E core. Note that in earlier cores AMD not only increased the memory controller's performance but also expanded its compatibility with various combinations of memory modules. The revision D core, known primarily from the Athlon 64 processors codenamed Winchester, was a milestone in this respect. First, the memory controller in Winchester processors became slightly faster than in its predecessors. Second, Winchester-based processors became capable of working with DDR400 SDRAM modules installed in all four DIMM slots of a motherboard at once. It would seem that the optimum had been reached, but AMD's engineers thought otherwise: processors with the revision E core have an even more advanced memory controller.

Where were the engineers' efforts directed this time? Naturally, further optimizations were made to raise the memory controller's performance: tests of processors with the Venice core show a slight advantage over their Winchester-based counterparts. Compatibility has also improved again. AMD processors with the revision E core can now function normally with several memory modules of different organization and capacity installed in the system, which greatly simplifies the choice of components for future upgrades. Processors based on the new core can also work without problems with four double-sided DDR400 SDRAM modules. Another interesting property of the revision E core is the appearance of new dividers that set the memory frequency. Thanks to this, the new AMD CPUs now support, without any reservations, DDR SDRAM operating at frequencies above 400 MHz.
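As a rough illustration of how such dividers behave (a simplified sketch, not AMD's actual divider tables): K8-class CPUs derive the memory clock from the CPU clock through an integer divider, so the real memory frequency can land slightly below the nominal value.

```python
# Illustrative sketch, not AMD's actual divider tables: K8-class CPUs derive the
# memory clock from the CPU clock via an integer divider, so the resulting
# memory frequency is often slightly below the nominal DDR speed.
import math

def memory_clock_mhz(cpu_mhz: float, target_mem_mhz: float) -> float:
    """Pick the smallest integer divider whose result does not exceed the target."""
    divider = math.ceil(cpu_mhz / target_mem_mhz)
    return cpu_mhz / divider

print(memory_clock_mhz(2200, 200))            # DDR400 target on a 2.2 GHz CPU -> 200.0
print(round(memory_clock_mhz(2000, 233), 1))  # DDR466 target on a 2.0 GHz CPU -> 222.2
```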


In this article we will look at some of the features of the revision E core's integrated memory controller listed above, because, in our opinion, they clearly deserve it.

Working with four double-sided DDR400 SDRAM modules

The integrated memory controller of Athlon 64 processors is a rather capricious unit. Various unpleasant aspects of its behavior began to surface with the arrival of processors supporting two memory channels. It turned out that, because of the fairly high electrical load that memory modules place on the controller, the Athlon 64 has certain problems working with four DIMMs. When four memory modules are installed in an Athlon 64-based system, the CPU may lower their frequency, relax their timings, or refuse to work with them at all.

To be fair, it should be noted that the server counterpart of the Athlon 64, the Opteron, is free of such problems thanks to its use of more expensive registered modules. However, using such modules in desktop systems is not justified, so users have to put up with certain restrictions that arise when more than two DIMMs are installed.

Still, the problems described are gradually being solved. While the older Athlon 64 processors based on 130 nm cores could not handle four double-sided DDR400 SDRAM modules at 400 MHz at all and dropped their frequency to 333 MHz, the modern processors with 90 nm cores offer users noticeably better options. Already in the revision D core, known to us under the codename Winchester, it became possible to work with four double-sided DDR400 SDRAM modules, provided the Command Rate timing was set to 2T.

Memory is a device designed for recording (storing) and reading information.

The controller memory stores:

  1. the manufacturer's service programs,
  2. user programs,
  3. the controller configuration,
  4. data blocks (values of variables, timers, counters, markers, etc.).

Properties of memory. Memory is characterized by:

  1. Capacity (KB, MB or GB).
  2. Speed, i.e. memory access time.
  3. Volatility: behavior after a power failure.

Fig. 3.4 Types of memory (drawing by the author).

Operational memory (RAM, random access memory).

Advantage.

It is the fastest semiconductor electronic memory, designed for short-term storage of information.

Disadvantage.

The main drawback of this memory is volatility, i.e. the loss of data after the electrical power is turned off.

To back up RAM, some controllers use batteries or high-capacity capacitors that can hold a charge for up to several days.

The basic RAM element is an electronic flip-flop (static memory) or a capacitor (dynamic memory).

Fig. 3.5 The flip-flop, the main element of RAM (drawing by the author).

Dynamic memory requires cyclic refreshing of its capacitors; however, it is cheaper than static memory.

The memory matrix is a collection of individual memory cells (flip-flops).

One row of the matrix contains 8 memory cells (8 bits correspond to 1 byte).

Each memory cell has its own unique address (the row number, a point, then the bit number).

Columns (bits) are numbered from right to left, from 0 to 7.

Rows (bytes) are numbered from top to bottom, starting from 0.

Fig. 3.6 Memory matrix (drawing by the author).
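To make the addressing scheme concrete, here is a minimal Python sketch of the matrix described above; the function name and data are illustrative only.

```python
# A minimal model of the memory matrix described above: rows are bytes
# (numbered top to bottom from 0), columns are bits (numbered right to left
# from 0 to 7). A cell address is written as "row.bit".
def read_bit(memory: bytearray, row: int, bit: int) -> int:
    """Return the value of cell <row>.<bit>, where bit 0 is the rightmost column."""
    return (memory[row] >> bit) & 1

ram = bytearray([0b00000101, 0b11110000])
print(read_bit(ram, 0, 0))  # cell 0.0 -> 1
print(read_bit(ram, 0, 1))  # cell 0.1 -> 0
print(read_bit(ram, 1, 7))  # cell 1.7 -> 1
```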

Permanent memory (ROM, read-only memory) is designed for long-term storage of information. Its main difference from RAM is that it can store information without a power source, i.e. it is non-volatile.

This memory, in turn, is divided into two types: programmed once (ROM) and reprogrammable (PROM).

Reprogrammable memory is written by the user with a programmer device. To do this, the memory contents must first be erased.

An older type of reprogrammable memory is EPROM, which is erased with ultraviolet light (EPROM – erasable programmable read-only memory).

Fig. 3.7 EPROM memory erased by ultraviolet light (source http://ru.wikipedia.org/wiki/%D0%A4%D0%B0%D0%B9%D0%BB:Eprom.jpg).

EEPROM (Electrically Erasable Programmable Read-Only Memory) is electrically erasable reprogrammable read-only memory, a type of non-volatile memory (like PROM and EPROM). This type of memory can be erased and rewritten up to a million times.

Today the classic two-transistor EEPROM technology has been almost completely replaced by NOR flash memory, but the name EEPROM remains firmly attached to this memory segment regardless of the underlying technology.

Fig. 3.8 Flash memory programming (source http://ru.wikipedia.org/wiki/%D0%A4%D0%B0%D0%B9%D0%BB:Flash_programming_ru.svg).

Flash memory is a type of solid-state semiconductor non-volatile rewritable memory.

It can be read as many times as desired (within the data storage period, typically 10-100 years), but it can be written to such memory only a limited number of times (maximum - about a million cycles). It does not contain moving parts, so, unlike hard drives, it is more reliable and compact.

Due to its compactness, low cost and low power consumption, flash memory is widely used in digital portable devices.

Conventional division of controller memory into areas

The controller provides the following memory areas to store the user program, data and configuration.

Load (boot) memory is non-volatile memory for the user program, data and configuration. When a project is loaded into the controller, it is first stored in load memory. This memory is located either on a memory card (if present) or built into the controller. As non-volatile memory, its contents are retained when power is turned off. A memory card can hold more than the memory built into the controller.

Working memory is volatile memory. The controller copies some elements of the project from load memory into working memory. The contents of this area are lost when power is lost, and the controller restores them when power returns.

Retained memory is non-volatile memory for a limited number of working-memory values. It is used to selectively store important user data in case of power loss. During a power failure the controller has enough time to save the values of a limited number of memory addresses; when power is restored, these saved values are recovered.
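The following is a toy Python model of these three areas and of what survives a power cycle; the class and key names are invented for illustration and do not correspond to any vendor's API.

```python
# A toy model (not any vendor's API) of the three memory areas described above
# and of what survives a power cycle.
class ControllerMemory:
    def __init__(self):
        self.load_memory = {}      # non-volatile: program, data, configuration
        self.work_memory = {}      # volatile: working copy used during execution
        self.retained_memory = {}  # non-volatile: selected values saved on power loss

    def power_off(self):
        # Selected working-memory values are saved; the rest of working memory is lost.
        self.retained_memory.update(
            {k: v for k, v in self.work_memory.items() if k.startswith("retain_")}
        )
        self.work_memory.clear()

    def power_on(self):
        # The project is copied from load memory, then retained values are restored.
        self.work_memory = dict(self.load_memory)
        self.work_memory.update(self.retained_memory)

mem = ControllerMemory()
mem.load_memory = {"program": "...", "retain_counter": 0}
mem.power_on()
mem.work_memory["retain_counter"] = 42
mem.power_off()
mem.power_on()
print(mem.work_memory["retain_counter"])   # 42 -- survived the power cycle
```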


Data recovery

Fig. 3.9 Phases of information recovery (drawing by the author).

1. The information about the state of the controlled process stored in RAM is called the process image. In other words, every physical terminal of the input/output block has a virtual counterpart (a flip-flop) in the controller's memory. To speed up information exchange, the processor normally works with this information in RAM rather than with the physical input/output terminals. The results of program processing are written from the process image to the output terminals cyclically (see the sketch after this list).

2. After the supply voltage is turned off (i.e. the voltage drops below a critical level), the most important information is copied back from RAM to EEPROM. The data areas to be saved are defined by the user.
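A minimal sketch of this cyclic exchange, with stand-in names for the I/O functions (the real scan cycle of a particular controller is, of course, more involved):

```python
# Stand-in names only: a toy version of the scan cycle described above. The CPU
# works on a process image held in RAM, not on the physical I/O terminals.
def scan_cycle(read_inputs, user_program, write_outputs, process_image: dict) -> dict:
    process_image["inputs"] = read_inputs()                          # terminals -> RAM image
    process_image["outputs"] = user_program(process_image["inputs"])
    write_outputs(process_image["outputs"])                          # RAM image -> terminals
    return process_image

image = scan_cycle(
    read_inputs=lambda: {"start_button": True},
    user_program=lambda inputs: {"motor": inputs["start_button"]},
    write_outputs=lambda outputs: None,
    process_image={},
)
print(image["outputs"])   # {'motor': True}
```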

  • What is a memory matrix?
  • How many memory cells are there in one row of the memory matrix?
  • How are the memory matrix columns numbered (direction and range)?
  • What are the two main types of controller memory?
  • What advantages does each type of memory have over the other (two answers)?
  • What two types is controller RAM divided into?
  • What two types is permanent memory divided into, according to how often it can be programmed?
  • What two types is reprogrammable read-only memory divided into, according to the erasing method?
  • Where does the information in RAM come from when the controller power is turned on?
  • Is all information in RAM lost when the power is turned off (if not, where and what information is saved)?
  • What is the information about the state of the input/output terminals in RAM called?
  • Which memory block does the processor primarily work with?

  Hello, Giktimes! Upgrading RAM is the most basic kind of PC upgrade, provided you are lucky and do not stumble upon one of the many hardware incompatibilities. We explain in which cases a set of fancy RAM will refuse to “start” in an old PC, why on some platforms you can add RAM only with “hand-picked” modules, and we warn you about other characteristic hardware quirks.


    We know about RAM that there is never too much of it, and that, depending on the age of the computer, you have to choose between the ancient DDR, the old DDR2, the mature DDR3 and the modern DDR4. At this point a guide of the “just buy it, it will somehow work, and you can always exchange it” level could end — instead it is time to look at the pleasant and not-so-pleasant specifics of choosing hardware. That is, the cases when:

    • it should work, but for some reason it does not;
    • the upgrade is not cost-effective, or is better done in several steps;
    • you want to carry out the upgrade with “minimal bloodshed”, in line with the PC's potential.

    Check where the controller is located

    If you are upgrading an outdated computer not just for the love of the art but for practical reasons, it makes sense to evaluate how viable the hardware platform is before investing in it. The most archaic platforms still in circulation are the chipsets for Socket 478 (Pentium 4, Celeron), which range from boards supporting PC133 SDRAM (the Intel 845 chipset, for example), through mainstream DDR-based options, up to later, strikingly more modern chipsets supporting PC2-5300 DDR2 (Intel 945GC and others).


    Previously, memory controllers lived outside the processor; now, as a rule, they work from inside it

    Against this background, the AMD alternatives of the same era look less colorful: all chipsets for Socket 754, home of the Athlon 64 and the K8 microarchitecture, support DDR memory, and the same type of memory was used by processors for Socket 939 (the Athlon 64 and the first dual-core Athlon 64 X2). Moreover, in AMD's case the memory controller was built into the processor. Today this approach would surprise no one, but Intel deliberately kept the controller in the chipset, precisely so that processors for the same socket could be paired with new types of RAM.

    For this reason, the subsequent AMD chips for Socket AM2/AM2+, with the RAM controller under the processor lid, worked only with DDR2, while Intel with its “long-lived” Socket 775 stretched the fun from DDR all the way to DDR3. On more modern platforms both processor manufacturers have moved the controller onto the CPU itself, and such tricks with supporting assorted types of RAM are a thing of the past.

    When is it cheaper to change a chipset than to shell out for old memory?

    This cumbersome list is given not to impress readers with the breadth and abundance of chipsets in outdated PCs, but to suggest a slightly unexpected upgrade maneuver. Its essence is simple: sometimes it is more rational to buy a motherboard that supports cheaper, more modern memory than to shell out for the now-rare RAM of the previous generation.

    After all, the same amount of DDR2 on the secondary market costs at least 50% more than DDR3 of comparable capacity — not to mention that DDR3 has not yet left the assembly lines, so it can be bought new, as an inexpensive kit.
    And new chipsets make it possible to expand RAM to capacities that are relevant today. For example, comparing Russian retail prices, 8 gigabytes (2 × 4 GB) of DDR2-800 memory will cost you about 10 thousand rubles, while the same amount of DDR3-1600 (Kingston Value RAM KVR16N11/8, for example) costs 3800-4000 rubles. Factoring in the sale of the old motherboard and the purchase of a new one, the idea looks reasonable.

    The realities of upgrading computers with native DDR and DDR2 support have long been known to everyone:

    • memory modules with different timings and frequencies most often do manage to work together; the “alignment” happens either to the SPD profile of the weaker module or (worse) the motherboard falls back to a standard RAM profile, as a rule at the minimum allowable clock frequency (a small sketch of this rule follows after this list).
    • ideally, the number of modules should equal the number of channels. Two 1 GB sticks in an old PC will work faster than four 512 MB modules: fewer modules means a lower load on the controller and higher efficiency.


    Two channels in the controller - two memory modules for maximum performance. The rest is a compromise between capacity and speed.
    • modules of equal capacity work more efficiently in dual-channel mode. In other words, 1 GB + 1 GB will be better than 1 GB + 512 MB + 512 MB.
    • evaluate the platform's capabilities before buying memory, because some chipsets do not unlock the potential of even their “antediluvian” type of RAM. For example, the Intel 945 Express platform has a dual-channel DDR2 controller supporting frequencies up to 667 MHz: it will recognize the PC2-6400 DDR2 modules you buy, but they will be limited in performance and will run only as PC2-5300.
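A small sketch of the “alignment to the weakest module” rule and the equal-capacity requirement for dual-channel mode; real boards may instead fall back to a conservative default profile.

```python
# Illustrative only: mixed modules usually get "aligned" to the slowest SPD
# profile and the loosest timings, and dual-channel mode wants pairs of equal
# capacity. Real boards may instead fall back to a conservative default.
def effective_settings(modules):
    """modules: list of (capacity_gb, frequency_mhz, cas_latency) tuples."""
    frequency = min(freq for _, freq, _ in modules)   # slowest module sets the clock
    cas = max(cl for _, _, cl in modules)             # loosest timings win
    dual_channel = len(modules) % 2 == 0 and len({cap for cap, _, _ in modules}) == 1
    return frequency, cas, dual_channel

print(effective_settings([(1, 800, 5), (1, 667, 5)]))                   # (667, 5, True)
print(effective_settings([(1, 800, 5), (0.5, 800, 6), (0.5, 800, 6)]))  # (800, 6, False)
```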


    The Intel LGA775 socket is one case where buying a motherboard with DDR3 support is easier and cheaper than expanding the memory within the platform's old DDR generation

    This list of nuances alone seems enough to make you want to “drag” an LGA775-based computer over to a chipset with DDR3 support. But — you may laugh — upgrading an old platform with new RAM has nuances of its own.

    On the debut platforms with DDR3 support (the Intel x4x and x5x chipsets and their AMD contemporaries), the controllers can work only with old-style modules. An absurd situation? Yes, but a fact is a fact.

    The trouble is that old systems do not speak the same “language” as modules built from high-density memory chips. In everyday terms, this means that a module whose 4 gigabytes are spread across eight chips on the front side of the printed circuit board will not work in an old PC, while an older module of the same capacity and frequency, implemented with 16 chips (8 per side), will be operational.
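A back-of-the-envelope illustration of what “chip density” means for the module described above (the function is purely illustrative):

```python
# Back-of-the-envelope arithmetic: the same 4 GB module can be built from
# 8 high-density chips or 16 lower-density chips, and early DDR3 controllers
# only understood the lower-density organization.
def chip_density_gbit(module_gb: float, chip_count: int) -> float:
    return module_gb * 8 / chip_count    # per-chip density in gigabits

print(chip_density_gbit(4, 8))    # 4.0 Gbit chips -- often rejected by old platforms
print(chip_density_gbit(4, 16))   # 2.0 Gbit chips -- usually accepted
```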

    Such compatibility problems are typical, for example, for the desktop Intel G41 Express (the same one that carries a considerable share of the surviving Core 2 Duo or Core 2 Quad) or the mobile Intel HM55 (laptops based on the first generation Intel Core based on the Nehalem microarchitecture).

    Sometimes motherboard/laptop manufacturers release new BIOS versions in order to teach older platforms to work with new RAM revisions, but most often there is no talk of any long-term support for old equipment. And, unfortunately, there is no talk of any special series of memory for owners of “outdated, but not quite” PCs - memory production has moved forward and turning it back is very expensive.

    To avoid wrestling with concepts like “memory chip density,” owners of old PCs are advised, at the everyday level, to look for double-sided DIMMs — two-sided memory modules, which are more likely to be compatible with the debut DDR3 platforms. In the Kingston line a suitable option is the HyperX Blu KHX1333C9D3B1K2/4G, a 4 GB DDR3 kit for desktops with sixteen memory chips on board. It is not easy to find on sale, but if you want 16 GB in an old PC, you have to get creative.

    And yes, the “best of the archaic” chipsets, such as the Intel P35 Express, for example, are also content with DDR3 support at 1333 instead of the 1600 MHz typical for modern budget platforms.


    HyperX Blu KHX1333C9D3B1K2 is one of the few ways to get 16 GB of RAM in older PCs

    No diversity - no problem

    After the long “holdout” with the memory controller in the northbridge of Intel platforms, the experiments stopped: all new Intel and AMD platforms placed the controller under the lid of the CPU itself. This is bad for platform longevity (you can no longer pull the trick of “switching” an old processor to a new type of memory), but RAM manufacturers adjusted, and, as you can see, DDR3 has not lost its popularity even in 2017. Today it is used by the following platforms:
    AMD      Intel
    AM3      LGA1366
    AM3+     LGA1156
    FM1      LGA1155
    FM2      LGA1150
    FM2+     LGA2011

    The list of processor architectures based on these platforms is much longer! But there is less variety in the choice of memory — in fact, almost none. The only exception is AMD processors for Socket AM3, which, to the delight of budget-conscious buyers, are also compatible with Sockets AM2 and AM2+. Accordingly, the “reds” equipped these processors with a universal controller supporting both DDR2 (for AM2+) and DDR3. True, to push DDR3 on Socket AM3 to 1333 and 1600 MHz you will have to tinker with the settings.


    This is roughly how new computers based on DDR3 and competing memory types compared in the recent past

    The principles for selecting memory in the case of DDR3-based platforms are as follows:

    • for FM1, FM2 and FM2+, if we are talking about an APU with powerful integrated graphics, you can and should choose the fastest RAM. Even the old FM1 chips can cope with DDR3 at 1866 MHz, and chips based on the Kaveri microarchitecture and its “restyling” Godavari in some cases squeeze all the juice even out of extremely overclocked DDR3 at 2544 MHz! And these are not hollow marketing megahertz — they genuinely help in real workloads. For such computers, memory overclocking is simply a must.


    Performance gains in AMD APUs depending on RAM frequency (source: ferra.ru)

    A good starting point is, for example, the HyperX HX318C10F modules: out of the box they run at 1866 MHz with CL10, and when overclocked they come in handy for the clock-sensitive AMD hybrid processors.


    AMD APUs desperately need high-frequency memory

    • "antique" Intel processors on LGA1156 and its server brother LGA1366 platforms capable of riding high-frequency DDR3 only if the multiplier is correctly selected. Intel itself guarantees stable work exclusively within the range “up to 1333 MHz”. By the way, do not forget that in addition to supporting ECC registered memory, the LGA1366 and LGA2011 server platforms offer three- and four-channel DDR3 controllers. And they remain, perhaps, the only candidates for upgrading RAM to 64 GB, because non-registered memory modules with a capacity of 16 GB are almost never found in nature. But in LGA2011, memory overclocking has become easily possible up to 2400 MHz.
    • Almost all processors based on microarchitectures Sandy Bridge and Ivy Bridge (LGA1155) support RAM with frequencies up to 1333 MHz. It is no longer possible to increase the clock generator frequency and thus achieve “easy” overclocking in this generation of Intel Core. But models with an unlocked multiplier and the “correct” motherboard capable of going far beyond the notorious 1333 MHz, so for Z-chipsets and processors with the K suffix it makes sense to spend money on modules HyperX Fury HX318C10F - the standard 1866 MHz is “driveable” almost to the maximum values ​​​​for Bridge processors. It won't seem enough!
    • LGA1150, a carrier of chips based on Haswell and Broadwell microarchitectures, became the last of Intel’s “civilian” platforms with support for DDR3, but the methods of interaction with RAM have not changed much since the days of Sandy Bridge and Ivy Bridge. Is it just support? mass models DDR3 at 1600 MHz has finally become a reality. If we talk about overclocking, then the theoretical maximum for processors with unlocked multipliers on overclocking motherboards is 2933 MHz! The maximum is the maximum, but with support for XMP profiles in modern DDR3 modules, it can be achieved high frequencies on aging types of memory it is no longer difficult.
    By the way, it was in the era of LGA1150 that memory came into use through the efforts of laptop developers DDR3L(although its production started back in 2008). It consumes a little less energy (1.35V versus 1.5V in “just” DDR3), and is compatible with all old chipsets that came out before its distribution on the market. But it is no longer advisable to install DDR3 at 1.5V in laptops that can only handle DDR3L - the memory either will not work at all or will not work correctly with the computer.

    DDR4: the fastest memory, and the baseline for upgrades and new purchases

    It is hard to call DDR4 SDRAM a new product: Intel's Skylake processors, the first mass-produced CPUs with DDR4 on board, came out back in 2015 and have already received a “restyling” in the form of the slightly better optimized and better-overclocking Kaby Lake. In 2016, AMD also demonstrated a platform with DDR4 support — though only demonstrated it, because the AM4 socket is intended for AMD's “finally, serious competition” Ryzen processors, which have only just been unveiled.


    DDR4 is still very young, but in order to unlock the potential of four-channel controllers on the Intel LGA 2011-v3 platform, overclocker memory is already needed

    Choosing memory for the newest platforms is extremely simple: the frequency of mass-produced DDR4 modules starts at 2133 MHz (achievable on DDR3 too, but only at a stretch), and capacities start at 4 GB. But buying a “starter” DDR4 configuration today is as short-sighted as settling for 800 MHz DDR3 was at the dawn of that standard.

    The memory controller built into LGA1151 processors is dual-channel, which means you need to fit into a pair of modules whose capacity is enough for modern games. Today that means 16 GB (no, we are not joking: 8 GB of RAM in 2017 no longer lets you live without compromises), and as for clock frequency, DDR4-2400 has become the sensible mainstream.

    In the server/extreme processors for the LGA 2011-v3 platform the memory controller is four-channel, and de jure only DDR4-2133 is supported, but overclocking memory on the Intel X99 chipset with an Intel Core i7 Extreme is not just easy but very easy. Well, a computer for maximalists needs memory for maximalists — for example, the “toughest” HyperX Predator DDR4 HX432C16PB3K2 with a clock frequency of 3200 MHz. On the “go big or go home” principle, an LGA 2011-v3 platform should be fitted with all four modules: only then can the four-channel controller realize the full speed potential of the memory subsystem.
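A tiny helper illustrating the “one module per channel” advice (illustrative only; actual kits are also constrained by the module capacities that are on sale):

```python
# Illustrative helper: to realize the controller's full width, every channel
# should be populated, so the kit size follows from the channel count.
def kit_for(channels: int, total_gb: int) -> tuple[int, int]:
    """Return (module_count, per_module_gb) that fills every channel evenly."""
    assert total_gb % channels == 0, "capacity should split evenly across channels"
    return channels, total_gb // channels

print(kit_for(2, 16))   # LGA1151: 2 x 8 GB
print(kit_for(4, 32))   # LGA 2011-v3: 4 x 8 GB
```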

    So you don't have to memorize the rules and exceptions

    What can be added to the nuances described above? Plenty: all-in-one PCs and nettops with non-reference component designs, laptops of the same model with completely different upgrade potential, individual capricious motherboard models, and other “rakes” that are easy to step on if you have not been following hardware trends on enthusiast forums.

    For such cases Kingston offers an online configurator. It lets you select guaranteed-compatible and efficient RAM for desktops, workstations, nettops, ultrabooks, servers, tablets and other devices.
    It is worth checking your PC hardware against the memory you are considering, so that you do not have to go back to the store and explain to the consultants that “the memory works, but my computer needs a DDR3-1600 that is not quite the usual DDR3-1600.”

    Don't leave the old-timers to their fate!

    Don't you find that the older the computer, the more troublesome a memory upgrade becomes? This article does not cover every possible difficulty and particular case of choosing memory (that is almost physically impossible, and you would tire of wading through such a compendium of trifles). But that is no reason to send still-working hardware to the dustbin of history.


    You can still rock at any age

    An outdated PC that looks hopeless from our overclocking-enthusiast point of view can still serve less ambitious users well, or be retrained as a home server or media center — and we will not even sing yet another ode to the “immortal” Sandy Bridge, which has celebrated its sixth anniversary and is still going strong. We wish you high performance and a fair wind in upgrading your PC!

    Fast RAM is good, but fast RAM at a discount is even better! So do not miss the chance to buy any of the HyperX Savage DDR4 and HyperX Predator DDR4 memory kits at a 10% discount with the promo code DDR4FEB at Yulmart before March 8. There is no such thing as too much memory, especially powerful, cool memory for the new PC platforms!

    For more information about Kingston and HyperX products, please visit the company's official website. HyperX will help you choose your kit.

    It seems that Intel is catching up with AMD in this respect. But, as often happens when a giant moves, the step forward is gigantic. Where Barcelona uses two 64-bit DDR2 memory controllers, the top Intel configuration includes as many as three DDR3 memory controllers. With DDR3-1333 memory, which Nehalem will also support, this yields up to 32 GB/s of bandwidth in some configurations. But the advantage of an integrated memory controller is not just bandwidth: it significantly reduces memory access latency, which is just as important given that each access costs several hundred clock cycles. For desktop use the reduced latency of the integrated memory controller is welcome, but the full benefit of the more scalable architecture will show in multi-socket server configurations. Previously, adding a CPU left the available throughput unchanged; now each additional processor increases it, since each CPU has its own memory.
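The 32 GB/s figure quoted above is easy to verify with simple arithmetic: three 64-bit channels at 1333 MT/s.

```python
# Checking the figure quoted above: three 64-bit (8-byte) DDR3-1333 channels.
channels = 3
transfer_rate_mt_s = 1333      # million transfers per second
bus_width_bytes = 8            # 64-bit channel

peak_mb_s = channels * transfer_rate_mt_s * bus_width_bytes
print(peak_mb_s / 1000)        # ~32 GB/s of theoretical peak bandwidth
```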

    Of course, one should not expect miracles. This is a Non-Uniform Memory Access (NUMA) configuration, meaning memory accesses carry varying overhead depending on where the data resides. Local memory is clearly accessed with the lowest latency and highest throughput, while access to remote memory goes through the intermediate QPI interface, which reduces performance.



    The performance impact is difficult to predict because it depends on the application and the operating system. Intel claims that with remote access, latency rises by about 70% and throughput is halved compared to local access. According to Intel, even with remote access over the QPI interface, latencies will be lower than on previous generations of processors, where the controller sat on the northbridge. However, this applies only to server applications, which have been developed with NUMA configurations in mind for quite some time.

    The memory hierarchy at Conroe was very simple; Intel focused on the performance of the shared L2 cache, which became the best solution for an architecture that was aimed primarily at dual-core configurations. But in the case of Nehalem, the engineers started from scratch and came to the same conclusion as competitors: the shared L2 cache is not a good fit for the native quad-core architecture. Different cores may flush data needed by other cores too often, leading to too many problems with internal buses and arbitration trying to provide all four cores with enough bandwidth while keeping latency low enough. To solve these problems, engineers equipped each core with its own L2 cache. Since it is allocated to each core and is relatively small (256 KB), it was possible to provide the cache with very high performance; in particular, latency has improved significantly compared to Penryn - from 15 clock cycles to approximately 10 clock cycles.

    Then there is a huge L3 cache (8 MB) responsible for communication between the cores. At first glance the Nehalem cache architecture resembles Barcelona's, but the third-level cache works very differently from AMD's: it is inclusive of all lower levels of the cache hierarchy. This means that if a core tries to access data and it is not in the L3 cache, there is no need to look in the other cores' private caches — it is not there. Conversely, if the data is present, four bits associated with each cache line (one bit per core) indicate whether the data may (potentially, but not necessarily) be present in the lower-level cache of another core, and if so, which one.

    This technique is very effective at maintaining coherence between the cores' private caches because it reduces the need for inter-core communication. Its drawback, of course, is that part of the cache capacity is lost to data already present in the other cache levels. This is not as bad as it sounds, since the L1 and L2 caches are small relative to the L3: all the data in the L1 and L2 caches occupies at most 1.25 MB of the 8 MB L3. As with Barcelona, the L3 cache runs at a different frequency than the chip itself, so access latency at this level can vary, but it should be around 40 clock cycles.
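The 1.25 MB figure can be checked with simple arithmetic, assuming the usual 32 KB + 32 KB L1 split per core (the text above only quotes the 256 KB L2):

```python
# Worst-case duplication in the inclusive L3, assuming the usual per-core
# 32 KB L1I + 32 KB L1D split alongside the 256 KB L2 quoted in the text.
per_core_kb = 32 + 32 + 256
duplicated_kb = 4 * per_core_kb    # four cores
print(duplicated_kb / 1024)        # 1.25 (MB) out of the 8 MB L3
```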

    The only disappointments in the new Nehalem cache hierarchy concern the L1 cache. The instruction cache bandwidth has not been increased — still 16 bytes per clock, versus 32 for Barcelona. This can create a bottleneck in a server-oriented architecture, since 64-bit instructions are larger than 32-bit ones, especially as Nehalem has one more decoder than Barcelona, which increases the pressure on the cache. As for the data cache, its latency has grown to four clock cycles versus Conroe's three, making it easier to run at high clock speeds. We will end with some good news, though: Intel's engineers have increased the number of L1 data cache misses the architecture can handle in parallel.

    TLB

    For many years now, processors have worked not with physical memory addresses but with virtual ones. Among other advantages, this approach lets a program allocate more memory than the computer physically has, keeping only the data needed at the moment in physical memory and everything else on the hard drive. It means that every memory access requires translating the virtual address into a physical one, and a huge table must be maintained to track the correspondence. The problem is that this table is so large that it cannot be stored on the chip: it resides in main memory, and parts of it can even be paged out to the HDD, so that portions of the table may be absent from memory.

    If every memory operation required such a translation step, everything would be far too slow. So engineers added a small cache directly on the processor that stores the translations of a few recently used addresses: the Translation Lookaside Buffer (TLB). Intel has completely redesigned the TLB in the new architecture. Until now, Core 2 used a very small (16-entry) but very fast first-level TLB that served only loads, plus a larger second-level TLB (256 entries) that handled loads missing from the L1 TLB as well as stores.

    Nehalem now has a true two-level TLB: the first-level TLB is split between data and instructions. The L1 data TLB stores 64 entries for small (4 KB) pages or 32 entries for large (2 MB/4 MB) pages, while the L1 instruction TLB stores 128 entries for small pages (the same as Core 2) and seven for large ones. The second level is a unified cache of up to 512 entries that works only with small pages. The goal of this improvement is to raise the performance of applications that work with large data sets. As with the two-level branch prediction system, it is another sign of a server-oriented architecture.
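To see why the large-page entries matter for data-heavy applications, here is the “TLB reach” of the quoted sizes (a simple calculation, not a statement about real workloads):

```python
# "TLB reach": how much memory the quoted entry counts can cover without a miss.
def tlb_reach_mb(entries: int, page_bytes: int) -> float:
    return entries * page_bytes / 2**20

print(tlb_reach_mb(64, 4 * 2**10))    # L1 DTLB, 4 KB pages      -> 0.25 MB
print(tlb_reach_mb(32, 2 * 2**20))    # L1 DTLB, 2 MB pages      -> 64.0 MB
print(tlb_reach_mb(512, 4 * 2**10))   # unified L2 TLB, 4 KB pages -> 2.0 MB
```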

    Let's return to SMT for a moment, since this technology also affects the TLB. The L1 data TLB and the L2 TLB are dynamically shared between the two threads. By contrast, the L1 instruction TLB is statically partitioned for small pages, while the part allocated to large pages is fully replicated — which makes sense given its small size (seven entries per thread).

    Memory access and prefetching

    Optimized Unaligned Memory Access

    In the Core architecture, memory access was subject to a number of performance limitations. The processor was optimized to access memory addresses aligned on 64-byte boundaries — the size of a cache line. For unaligned data, access was not only slow; unaligned read or write instructions also carried more overhead than their aligned counterparts, regardless of whether the data in memory was actually aligned. The reason was that these instructions caused multiple micro-ops to be generated in the decoders, reducing throughput for this type of instruction. As a result, compilers avoided generating them, substituting sequences of cheaper instructions instead.

    Thus, a memory read that spanned two cache lines was slowed by about 12 clock cycles, compared with 10 cycles for a write. Intel's engineers optimized this type of access to run faster. For a start, there is now no performance penalty for using unaligned read/write instructions when the data is actually aligned in memory. In other cases Intel has also optimized the access, reducing the penalty compared with the Core architecture.
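A minimal sketch of the condition that triggered the old split-access penalty: an access crosses a 64-byte cache-line boundary (the constant and function names are illustrative):

```python
# An access "splits" when it crosses a 64-byte cache-line boundary; that is the
# case the old penalty applied to. Constant and function are illustrative.
CACHE_LINE = 64

def straddles_cache_line(address: int, size: int) -> bool:
    return (address % CACHE_LINE) + size > CACHE_LINE

print(straddles_cache_line(0x1000, 16))   # aligned 16-byte read -> False
print(straddles_cache_line(0x103C, 8))    # starts 60 bytes into a line -> True
```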

    More prefetchers with more efficient operation

    In the Conroe architecture, Intel was especially proud of its hardware prefetchers. As you know, a prefetcher is a mechanism that monitors memory access patterns and tries to predict which data will be needed several clock cycles ahead. The goal is to proactively load that data into the cache, where it is closer to the processor, making use of available bandwidth while the processor does not need it.

    This technology works very well with most desktop applications, but in a server environment it often hurt performance. There are several reasons for this. First, memory accesses are often much harder to predict in server applications: access to a database, for example, is by no means linear — if one data element is requested, it does not mean the next one will be adjacent in memory. That limits the prefetcher's effectiveness. But the main problem was memory bandwidth in multi-socket configurations: it was already a bottleneck with several processors, and on top of that the prefetchers added extra load at this level. Whenever a microprocessor was not accessing memory, its prefetchers kicked in, trying to use bandwidth they assumed was free — but they could not know whether another processor needed it. This meant the prefetchers could rob a processor of bandwidth that was already in short supply in such configurations. To solve this problem, Intel found nothing better than to disable the prefetchers in such situations — hardly the best solution.

    Intel claims the issue has been resolved, though the company has not given any details about how the new prefetch mechanisms work. All it says is that there is no longer any need to disable them for server configurations. Even if nothing else had changed here, the benefits of the new memory organization — and the resulting extra bandwidth — should offset any negative impact of the prefetchers.

    Conclusion

    Conroe provided a serious foundation for new processors, and Nehalem builds on it. It uses the same efficient architecture but is now much more modular and scalable, which should guarantee success across different market segments. We are not saying that Nehalem revolutionizes the Core architecture, but the new processor does revolutionize the Intel platform, which now matches AMD's in design — and in implementation Intel has managed to outdo its competitor.



    With all the improvements made at this stage (integrated memory controller, QPI), it is not surprising to see that the changes to the execution core are not that significant. But the return of Hyper-Threading can be considered serious news, and a number of small optimizations should also provide a noticeable performance increase compared to Penryn at equal frequencies.

    It is quite obvious that the most significant increase will be in those situations where the main bottleneck was RAM. If you read the entire article, you probably noticed that Intel engineers paid maximum attention to this area. Besides the addition of an on-chip memory controller, which will undoubtedly provide the biggest boost in terms of data access operations, there are many other improvements both large and small - new cache and TLB architecture, unaligned memory access and prefetchers.

    With all the theoretical information in mind, we're looking forward to seeing how the improvements translate to real-world applications once the new architecture is released. We will be devoting several articles to this, so stay tuned!

    Memory

    Memory is a device for storing information. It consists of random access and permanent storage devices. The random access memory device is called RAM, read only memory - ROM.

    RAM - volatile memory

    RAM is designed for recording, reading and storing programs (both system and application), initial data, and intermediate and final results. Its memory elements can be accessed directly, in any order; hence its other name, RAM (Random Access Memory). All memory cells are grouped into sets of 8 bits (1 byte), and each such group has an address by which it can be accessed. RAM is used for temporary storage of data and programs; when the computer is turned off, the information in RAM is erased — RAM is volatile memory. In modern computers the memory capacity typically ranges from 512 MB to 4 GB. Modern application programs often require 128-256 or even 512 MB of memory to run; otherwise the program simply cannot work.

    RAM can be built from dynamic (Dynamic Random Access Memory, DRAM) or static (Static Random Access Memory, SRAM) chips. Static memory is significantly faster but much more expensive than dynamic memory. SRAM is used for register memory (the microprocessor's registers and cache memory), while main RAM is built from DRAM chips.

    ROM is non-volatile memory.

    In English-language literature this memory is called ROM (Read-Only Memory). The information in ROM is written at the memory chip manufacturer's factory and cannot be changed afterwards. ROM stores information that does not depend on the operating system.

    The ROM contains:


    • Program for controlling the operation of the processor itself

    • Programs for controlling the display, keyboard, printer, external memory

    • Programs for starting and stopping the computer (BIOS – Basic Input/Output System)

    • Device testing programs that check the correct operation of its units every time the computer is turned on (POST – Power-On Self-Test)

    • Information about where the operating system is located on the disk.

    CMOS - non-volatile memory

    CMOS RAM is non-volatile computer memory. This rewritable chip has a high density of elements (each cell is 1 byte in size) and low power consumption: the computer's battery is enough to power it. It takes its name from the technology it is built on, complementary metal-oxide-semiconductor (CMOS). CMOS RAM serves as a database for storing PC configuration information. The BIOS Setup program is used to set and store configuration settings in CMOS RAM. Each time the system boots, the parameters stored in the CMOS RAM chip are read to determine the configuration. Since some startup parameters can be changed, all of these variations are stored in CMOS: the BIOS Setup program writes its system information there and later reads it back when the PC boots. Despite the obvious connection between the BIOS and CMOS RAM, they are entirely different components.



    Keywords of this lecture

    controllers, chipset, ports, USB, COM, LPT, BIOS POST, CMOS, Boot, I/O devices,

    Controller (regulator, control device) — a device that controls the operation of various computer devices.

    Chipset (chipset)

    A set of chips designed to work together to perform a group of functions. In a computer, the chipset on the motherboard acts as the connecting component that ensures the joint operation of the memory subsystem, the central processing unit (CPU), input/output and so on. The motherboard (also called the mainboard or MB; colloquially the “mother” or “mobo”) is a complex multilayer printed circuit board on which the main components of a personal computer are installed: the central processor, the RAM controller and the RAM itself, the boot ROM, the controllers of the basic input/output interfaces, the chipset, and the connectors (slots) for attaching additional controllers over the USB, PCI and PCI-Express buses.

    North bridge (Northbridge; in some Intel chipsets called the Memory Controller Hub, MCH) — the chipset's system controller on an x86 motherboard, to which the following are connected:

    via the Front Side Bus — the microprocessor,

    via the memory controller bus — the RAM,

    via the graphics controller bus — the video adapter,

    and via an internal bus — the south bridge.

    South bridge (Southbridge; peripheral controller; I/O Controller Hub, ICH). Usually a single chip on the motherboard that, via the north bridge, links “slow” devices (slow compared to the CPU–RAM connection) to the central processor — for example, the bus connectors for attaching peripheral devices.

    AGP (Accelerated Graphics Port) — a specialized 32-bit system bus for video cards, developed by Intel in 1997.

    PCI (Peripheral Component Interconnect, literally “interconnection of peripheral components”) — an input/output bus for connecting peripheral devices to the computer's motherboard.

    Ultra DMA (Direct Memory Access). The various versions of ATA are known under the synonyms IDE, EIDE, UDMA and ATAPI. ATA (Advanced Technology Attachment) is a parallel interface for connecting drives (hard drives and optical drives) to the computer. In the 1990s it was the standard on the IBM PC platform; it is currently being displaced by its successor, SATA, with whose arrival it acquired the name PATA (Parallel ATA).

    USB (Universal Serial Bus) — a serial data-transfer interface for medium- and low-speed peripheral devices. Peripherals are connected to the USB bus with a four-wire cable: two wires (a twisted pair) in a differential arrangement are used to receive and transmit data, and the other two power the peripheral device. Thanks to the built-in power lines, USB allows peripherals without their own power supply to be connected (the maximum current drawn by a device from the USB power lines must not exceed 500 mA).

    LPT port (named after the standard printer device “LPT1”, Line Printer Terminal or Line PrinTer, in the MS-DOS family of operating systems) — the parallel printer port, standardized as IEEE 1284.

    COM port (communication port, serial port) — a bidirectional serial interface designed for exchanging information bit by bit. The port is called serial because information is transmitted through it one bit at a time (unlike a parallel port).

    PS/2 — a connector used to attach a keyboard and mouse. It first appeared in 1987 on IBM PS/2 computers and subsequently gained acceptance from other manufacturers and wide use in personal computers and workgroup servers. (The PS/2 itself was a series of IBM personal computers based on Intel 80286 and 80386 processors, produced from April 1987.)