How many memory interface mistakes are there to make? There is one saying I like very much: “Don’t make the same mistake twice”. There are enough mistakes to make that you probably don’t have time to try them all even once.
In memory interface design there are still enough mistakes left to make… But just to make sure you are not wasting your time on some of the well-known ones, Hermann Ruckerbauer has added a whole section with examples to his upcoming “Open the Black Box of Memory” course.
Roughly speaking, the mistakes Hermann collected fall into three piles of about equal size:
- Mistakes caused by the controller (silicon and package design)
- Board layout/design mistakes, and
- DRAM-related mistakes
Of course, it is not always simple to tell them apart, as things interact. Reliability issues more often originate in the silicon circuit design, so they are usually found in the controller and the DRAM. Board layout/design issues are usually found during design validation (or, at the latest, when an obsolete DRAM is replaced by its successor).
Here is one example that Hermann will explain in the course. This mistake is already “taken”, so if you really must make one, you will have to find another.
Memory interface mistake no 1
The system under investigation had been running fine in the field for quite a while. At some point, it was updated. Besides some other modifications to the design, the DRAMs had to be replaced because of obsolescence, and the software guys enabled some new features. After these changes, the system failed.
A design/layout review did not show any critical issues. The next step was a signal integrity measurement and compliance check. Again, this showed no real issue, except that the Vref noise measurement showed a small impact on the fail behavior. Running a Vref margin test revealed a weakness on the Vref.
With some additional measurements, the weakness could be traced back to the controller and a combination of data lines, patterns, and Vref. Knowing these details about the fails, it could be shown that the weakness was also visible on the old systems, but there it just did not lead to a fail.
The new DRAM generation and the higher traffic due to the software changes finally triggered the fail. With this homework done, the controller vendor could be approached. They confirmed that some data lines coupled too much to the internal Vref routing on the chip. An already known fix was a specific RC decoupling on the Vref.
During test and validation, it is wise to do a “Vref margin test”. This test can be done to check the robustness of a system by varying Vref until a failure occurs. This test can also give valuable information for systems that are already failing.
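To make the idea concrete, here is a minimal sketch of such a sweep in Python. It is not the procedure from the course. The names `vref_margin`, `passes`, and `make_fake_system`, the 5 mV step, and the 750 mV nominal value are all assumptions made for illustration. On real hardware, `passes` would have to set Vref through whatever mechanism the board or controller offers and then run a memory stress test; here it is replaced by a simulated pass window.

```python
from typing import Callable, Tuple


def vref_margin(
    passes: Callable[[int], bool],
    nominal_mv: int,
    step_mv: int = 5,
    max_offset_mv: int = 200,
) -> Tuple[int, int]:
    """Step Vref away from nominal in both directions until the test fails.

    `passes(vref_mv)` is a hypothetical hook: it should apply the given Vref
    and return True if the memory test passes. Returns (low_margin_mv,
    high_margin_mv), the largest passing offsets below and above nominal.
    """
    if not passes(nominal_mv):
        return (0, 0)  # already failing at nominal: no margin on either side

    def sweep(direction: int) -> int:
        margin = 0
        offset = step_mv
        while offset <= max_offset_mv and passes(nominal_mv + direction * offset):
            margin = offset
            offset += step_mv
        return margin

    return sweep(-1), sweep(+1)


def make_fake_system(low_mv: int, high_mv: int) -> Callable[[int], bool]:
    """Simulated system (assumption): passes only inside [low_mv, high_mv]."""
    return lambda vref_mv: low_mv <= vref_mv <= high_mv


healthy = make_fake_system(650, 850)  # symmetric window around 750 mV
weak = make_fake_system(730, 850)     # window eroded on the low side

print(vref_margin(healthy, 750))  # (100, 100)
print(vref_margin(weak, 750))     # (20, 100)
```

In this simulated case, both systems pass at nominal Vref, yet the lopsided margin of the second one exposes the kind of hidden weakness described above: it works today, but a small additional disturbance on Vref could push it into failure.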
Maybe the mistake was skipping a proper Vref margin test during design validation. Maybe the mistake was not reading and implementing everything from the errata sheets. Maybe the mistake was in the controller. In any case, let us agree not to make that mistake again 🙂
If you are interested in a compilation of other memory interface mistakes that have already been made by others, this might be one more reason to join the popular memory interface design course by Hermann Ruckerbauer from EyeKnowHow.