In search of bad 4164, 41256 DRAM


Jeff_Birt

Hi All,

 

I built a DRAM tester for fun and to brush up on Arduino development. A lot of the students I work with use Arduino, so I wanted to build something to stay up to date, and I have not done anything with an Arduino for more than a year. I found a project online as a starting point and added an automatic DRAM refresh driven by the Timer2 overflow interrupt. Even with a pokey Arduino and the ISR in C it can refresh 64 rows in less than 100 µs.
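As a rough sanity check on that refresh budget, here is a small sketch in C. Only the "64 rows per 100 µs burst" figure comes from the post. The refresh requirements used in the usage note (256 rows every 4 ms for a 41256, 128 rows every 2 ms for a 4164) are assumptions based on typical datasheets and vary by manufacturer, and both function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Permille (1/1000) of wall-clock time spent refreshing, given how many rows
 * must be refreshed per period and how long one burst of burst_rows takes. */
uint32_t refresh_overhead_permille(uint32_t rows, uint32_t period_us,
                                   uint32_t burst_rows, uint32_t burst_us)
{
    assert(period_us > 0 && burst_rows > 0);
    /* Total time spent refreshing all rows once, rounded up. */
    uint32_t busy_us = (rows * burst_us + burst_rows - 1u) / burst_rows;
    return busy_us * 1000u / period_us;
}

/* Longest allowed gap between bursts if the bursts are spread evenly. */
uint32_t max_burst_interval_us(uint32_t rows, uint32_t period_us,
                               uint32_t burst_rows)
{
    assert(rows > 0 && burst_rows > 0);
    uint32_t bursts = (rows + burst_rows - 1u) / burst_rows;  /* round up */
    return period_us / bursts;
}
```

With the assumed figures, refresh_overhead_permille(256, 4000, 64, 100) and refresh_overhead_permille(128, 2000, 64, 100) both give 100, i.e. about 10% of the time goes to refresh, and a 64-row burst is needed at least every 1000 µs.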

 

I did a quick test with a 41256 just now and was happy I could read/write all bits successfully. Then I realized I don’t have any known bad 4164/41256 type DRAM chips on hand. So, if you happen to have a few bad DRAM chips on hand and you’re in the USA I would gladly take them off your hands. Otherwise I’ll have to figure out a way to inject a simulated fault. Maybe writing a wrong value to a known cell after writing the proper pattern to all cells would be a good enough simulation?

 

I still have a way to go with the software. The original project I found online used separate functions for each test pattern. I want to create a single function that will write a bit pattern that is passed to it and then verify that the pattern is in memory. Maybe an n-second pause between the write and the verify would be good, to ensure that the refresh is working properly?
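One possible shape for that single write/verify function, together with the simulated fault mentioned earlier, is sketched below in C. It runs against a simulated 512 x 512 array rather than real pins: dram_write_bit and dram_read_bit are hypothetical stand-ins for the tester's pin-level routines, and repeating an 8-bit pattern along the linear address is just one assumed way of mapping a pattern onto a 1-bit-wide part.

```c
#include <assert.h>
#include <stdint.h>

#define ROWS 512u
#define COLS 512u

/* Simulated 256K x 1 DRAM, one byte per cell for clarity. */
static uint8_t cells[ROWS][COLS];

/* Hypothetical stand-ins for the real pin-level access routines. */
void dram_write_bit(uint16_t row, uint16_t col, uint8_t bit)
{
    assert(row < ROWS && col < COLS);
    cells[row][col] = bit & 1u;
}

uint8_t dram_read_bit(uint16_t row, uint16_t col)
{
    assert(row < ROWS && col < COLS);
    return cells[row][col];
}

/* Bit k of the pattern goes to every cell whose linear address is k mod 8. */
uint8_t pattern_bit(uint8_t pattern, uint16_t row, uint16_t col)
{
    uint32_t addr = (uint32_t)row * COLS + col;
    return (pattern >> (addr & 7u)) & 1u;
}

void write_pattern(uint8_t pattern)
{
    for (uint16_t r = 0; r < ROWS; r++)
        for (uint16_t c = 0; c < COLS; c++)
            dram_write_bit(r, c, pattern_bit(pattern, r, c));
}

/* Returns the number of cells that do not match the expected pattern. */
uint32_t verify_pattern(uint8_t pattern)
{
    uint32_t errors = 0;
    for (uint16_t r = 0; r < ROWS; r++)
        for (uint16_t c = 0; c < COLS; c++)
            if (dram_read_bit(r, c) != pattern_bit(pattern, r, c))
                errors++;
    return errors;
}

/* Simulated fault: flip one cell after the pattern has been written. */
void inject_fault(uint16_t row, uint16_t col)
{
    dram_write_bit(row, col, dram_read_bit(row, col) ^ 1u);
}
```

Calling write_pattern(0x55), then inject_fault(3, 7), then verify_pattern(0x55) should report exactly one bad cell, which at least proves the verify path notices a wrong bit.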

 

Thanks,

Jeff Birt


Re: In search of bad 4164, 41256 DRAM

daver2nab
Check out the March B and March C algorithms. These are pretty good 'off the peg' algorithms for finding quite a range of faults.

Dave



Re: In search of bad 4164, 41256 DRAM

Gerrit Heitsch
In reply to this post by Jeff_Birt
On 9/13/19 10:02 PM, Jeffrey Birt wrote:
>
> I did a quick test with a 41256 just now and was happy I could
> read/write all bits successfully. Then I realized I don’t have any known
> bad 4164/41256 type DRAM chips on hand. So, if you happen to have a few
> bad DRAM chips on hand and you’re in the USA I would gladly take them
> off your hands.

I have one, but I'm not in the USA, so it would be too expensive... But
you can see if you can get your hands on some MT4264-20, those tend to
develop stuck bits.

  Gerrit






Re: In search of bad 4164, 41256 DRAM

smf
In reply to this post by Jeff_Birt
On 13/09/2019 21:02, Jeffrey Birt wrote:
> The original project I found online used separate functions for each
> test pattern. I want to create a single function that will write a bit
> pattern that is passed to it and then verify that pattern is in memory.

While DRAM cells can fail completely, because of the analogue nature of DRAM you can find faults that only occur during specific access patterns. So you should probably generate the bit patterns using one of the many different algorithms that have been created over the years.

A lot of the DDR3 & DDR4 in use today is faulty if tested with rowhammer: https://en.wikipedia.org/wiki/Row_hammer



Re: In search of bad 4164, 41256 DRAM

Gerrit Heitsch
On 9/14/19 12:17 PM, smf wrote:

> On 13/09/2019 21:02, Jeffrey Birt wrote:
>> The original project I found online used separate functions for each
>> test pattern. I want to create a single function that will write a bit
>> pattern that is passed to it and then verify that pattern is in memory.
>
> While dram cells can fail completely, because of the analogue nature of
> dram you can find faults only occur during specific access patterns. So
> you probably should generate the bit patterns, using one of the many
> different algorithms that have been created over the years.
>
> A lot of ddr3 & ddr4 in use today are faulty if tested with
> rowhammer https://en.wikipedia.org/wiki/Row_hammer

That's why you should use ECC RAM. It doesn't completely protect you
against Rowhammer, but it makes discovery more likely since your logs
will fill with messages about corrected errors, and uncorrectable errors
will cause a system panic if your memory controller is configured correctly.

DDR4 should be less problematic than DDR3 since they included a feature
called 'Target row refresh'. Doesn't seem to result in full immunity though.

  Gerrit



Re: In search of bad 4164, 41256 DRAM

smf
On 14/09/2019 11:46, Gerrit Heitsch wrote:
> DDR4 should be less problematic than DDR3 since they included a
> feature called 'Target row refresh'. Doesn't seem to result in full
> immunity though.
Some DDR3 modules support pseudo target row refresh when used with
certain chipsets.

TRR also isn't part of the DDR4 standard, it's up to manufacturers to
decide whether they wish to support it & if it's not standardised then
it's hard to know if the implementation is the same and how effective it is.

https://arstechnica.com/information-technology/2016/03/once-thought-safe-ddr4-memory-shown-to-be-vulnerable-to-rowhammer/

It's a three year old article, but there are probably plenty of three
year old memory modules out there.




RE: In search of bad 4164, 41256 DRAM

Jeff_Birt
Respectfully, I'm interested in testing 40-year-old nK x 1-bit DRAM chips, not
modern DDR memory modules...

Jeff Birt









Re: In search of bad 4164, 41256 DRAM

Dr Jefyll
In reply to this post by Gerrit Heitsch
> I did a quick test with a 41256 just now and was happy I could
> read/write all bits successfully. Then I realized I don’t have any
> known bad 4164/41256 type DRAM chips on hand. So, if you happen to have
> a few bad DRAM chips on hand and you’re in the USA I would gladly take
> them off your hands.

I was going to suggest that, for testing purposes, you could extend the
refresh interval far beyond specification.  That would cause even a good
RAM to fail, and, all being well, the failure would be detected and
you'd know your test regimen is effective.

Oops, but wait a sec...  This raises a more general question.  I hope
your test regimen doesn't do its test reads too soon, because reading a
cell also causes that cell to be refreshed.  IOW, the test isn't fully
comprehensive unless you write to a cell, leave it alone for one refresh
interval, *then* do a read to verify its contents.
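To make the write, wait, then read idea concrete, here is a small simulation sketch in C. Everything in it is an assumption for illustration: the 8 x 8 array, the 4 ms retention figure, the model that a decayed cell reads back as 0 (real parts may read either value), and all of the function names. The one hardware fact it relies on is that any row access, read or write, restores the whole row.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ROWS 8u
#define COLS 8u
#define RETENTION_US 4000u  /* assumed; real cells often hold far longer */

static uint8_t  cell[ROWS][COLS];
static uint32_t last_refresh_us[ROWS];
static uint32_t now_us;

/* Any row access restores the row, unless its charge has already leaked. */
static void touch_row(uint16_t r)
{
    assert(r < ROWS);
    if (now_us - last_refresh_us[r] > RETENTION_US)
        for (uint16_t c = 0; c < COLS; c++)
            cell[r][c] = 0;  /* model assumption: decayed cells read 0 */
    last_refresh_us[r] = now_us;
}

void sim_write(uint16_t r, uint16_t c, uint8_t bit)
{
    touch_row(r);
    cell[r][c] = bit & 1u;
}

uint8_t sim_read(uint16_t r, uint16_t c)
{
    touch_row(r);
    return cell[r][c];
}

/* Advance simulated time, refreshing all rows twice per retention period
 * when refresh_on is set. */
void sim_wait(uint32_t us, bool refresh_on)
{
    const uint32_t step = RETENTION_US / 2u;
    while (us > 0) {
        uint32_t d = us < step ? us : step;
        now_us += d;
        us -= d;
        if (refresh_on)
            for (uint16_t r = 0; r < ROWS; r++)
                touch_row(r);
    }
}

/* Write all ones, leave the array alone for pause_us, then read back.
 * Returns the number of cells that lost their data. */
uint32_t retention_test(uint32_t pause_us, bool refresh_on)
{
    uint32_t bad = 0;
    for (uint16_t r = 0; r < ROWS; r++)
        for (uint16_t c = 0; c < COLS; c++)
            sim_write(r, c, 1);
    sim_wait(pause_us, refresh_on);
    for (uint16_t r = 0; r < ROWS; r++)
        for (uint16_t c = 0; c < COLS; c++)
            if (sim_read(r, c) != 1)
                bad++;
    return bad;
}
```

In this model a 10 ms pause with refresh running loses nothing, the same pause with refresh stopped loses every cell, and a 1 ms pause without refresh is too short to show anything, which is the trap of reading back too soon.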

Sorry if I'm stating the obvious; I'm just thinking out loud here!  Have
fun, and thanks for sharing your project with us.

cheers,
Jeff



RE: In search of bad 4164, 41256 DRAM

Jeff_Birt
In reply to this post by daver2nab

I did some reading on these ‘march’ algorithms this morning and found that a lot of the literature out there concerns byte/word-wide memories, testing memory in situ, and weeding out all the other things on a device (decoding logic, etc.) that could cause problems.

 

From what I can gather, the issue is coupling between adjacent memory cells, i.e. if cell 1 is at ‘1’ then cell 2 might not want to transition to ‘1’ or ‘0’. The ‘march’ tests are various sequences of reads and writes meant to detect these types of errors with a minimum of time/effort, with different forms being better at detecting some types of faults than others.

 

I’m not sure how to translate this to bit wide memory at present though.
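For what it is worth, march tests were originally formulated for bit-oriented memory, so on a x1 part each address holds exactly one bit and the elements can be applied directly. The sketch below, in C, shows March C- (a commonly used reduced form of March C, swapped in here because its six elements are easy to state) running against a small simulated array. The array size, the stuck-at fault hook, and all of the names are assumptions for illustration, not anything from the tester discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define N_CELLS 1024u  /* small simulated array; a 41256 has 262144 cells */

static uint8_t  mem[N_CELLS];
static bool     fault_on;    /* simulated stuck-at fault for demonstration */
static uint32_t fault_addr;
static uint8_t  fault_val;

void inject_stuck_at(uint32_t addr, uint8_t val)
{
    assert(addr < N_CELLS);
    fault_on = true;
    fault_addr = addr;
    fault_val = val & 1u;
}

void clear_fault(void) { fault_on = false; }

void mem_write(uint32_t a, uint8_t bit)
{
    assert(a < N_CELLS);
    mem[a] = (fault_on && a == fault_addr) ? fault_val : (bit & 1u);
}

uint8_t mem_read(uint32_t a)
{
    assert(a < N_CELLS);
    return mem[a];
}

/* One march element: visit every address in the given order, optionally
 * read and check (expect >= 0), then optionally write (write >= 0). */
static uint32_t march_element(bool ascending, int expect, int write)
{
    uint32_t errors = 0;
    for (uint32_t i = 0; i < N_CELLS; i++) {
        uint32_t a = ascending ? i : N_CELLS - 1u - i;
        if (expect >= 0 && mem_read(a) != (uint8_t)expect)
            errors++;
        if (write >= 0)
            mem_write(a, (uint8_t)write);
    }
    return errors;
}

/* March C-: {any(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); any(r0)}
 * Returns the number of failed reads. */
uint32_t march_c_minus(void)
{
    uint32_t e = 0;
    e += march_element(true,  -1,  0);  /* any order: w0     */
    e += march_element(true,   0,  1);  /* up:        r0, w1 */
    e += march_element(true,   1,  0);  /* up:        r1, w0 */
    e += march_element(false,  0,  1);  /* down:      r0, w1 */
    e += march_element(false,  1,  0);  /* down:      r1, w0 */
    e += march_element(true,   0, -1);  /* any order: r0     */
    return e;
}
```

On real hardware mem_read and mem_write would be replaced by the tester's own access routines, with the linear address split into row and column. The ascending and descending passes are what let the test catch faults where writing one cell disturbs a neighbour at a higher or lower address.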

 

Jeff Birt

 



RE: In search of bad 4164, 41256 DRAM

Jeff_Birt
In reply to this post by Dr Jefyll



>>I was going to suggest that, for testing purposes, you could extend the refresh interval far beyond specification.  That would cause even a good RAM to fail, and, all being well, the failure would be detected and you'd know your test regimen is effective.

That is an idea too and easy to implement.

>>Oops, but wait a sec...  This raises a more general question.  I hope your test regimen doesn't do its test reads too soon, because reading a cell also causes that cell to be refreshed.  IOW, the test isn't fully comprehensive unless you write to a cell, leave it alone for one refresh interval, *then* do a read to verify its contents.

Currently it writes all cells in a nested loop:

for (int c = 0; c < 512; c++) {      // column address
  for (int r = 0; r < 512; r++) {    // row address
    // write bit at (r, c)
  }
}

So, each row is automatically refreshed as the next column is written. In fact, I turn off the automatic refresh timer while writing.

The read test uses a similar loop and happens after all writes are done. I can add a variable delay here, with or without the automatic refresh timer running, as well.

Jeff Birt








Re: In search of bad 4164, 41256 DRAM

Gerrit Heitsch
In reply to this post by Jeff_Birt
On 9/14/19 6:20 PM, Jeffrey Birt wrote:

> I did some reading on these ‘march’ algorithms this morning and found
> that a lot of the literature that is out there concerns byte/word wide
> memories, testing memory in situ and weeding out all the other things on
> a device (decoding logic, etc. ) that could cause problems.
>
>  From what I can gather there is an issue where adjacent memory cells
> can have an issue, i.e. if cell one is at a ‘1’ then cell ‘2’ might not
> want to transition to a ‘1’ or ‘0’. The ‘marching’ are various bit
> patterns to detect these types of errors with a minimum of time/effort
> with different forms being better at detecting some types of faults than
> others.

Also, back then, an empty DRAM cell didn't necessarily read as '0' to
the outside. Take a look at the memory of a system that does not clear
the RAM after power on. You'll notice a pattern that varies between
manufacturers.

So, just writing a '1' and then waiting for it to fade to '0' if you
stop refreshing will not work on all cells.

  Gerrit



Re: In search of bad 4164, 41256 DRAM

smf
In reply to this post by Jeff_Birt
On 14/09/2019 14:23, Jeffrey Birt wrote:
> Respectfully, I'm interested in testing 40 year old nKx1bit DRAM chips not
> modern DDR memory modules...

Respectfully, I was providing easily verifiable evidence that memory
that appears to work fine will fail depending on the workload because of
analogue effects.

It doesn't matter how old or new the memory is.

I think Commodore had problems with memory supplied by Micron in the
1980s; there was some back and forth over whether Commodore were violating
access times or whether the memory was faulty.




Re: In search of bad 4164, 41256 DRAM

smf
In reply to this post by Gerrit Heitsch
On 14/09/2019 17:57, Gerrit Heitsch wrote:
> Also, back then, an empty DRAM cell didn't necessarily read as '0' to the outside. Take a look at the memory of a system that does not clear the RAM after power on. You'll notice a pattern that varies between manufacturers.
>
> So, just writing a '1' and then waiting for it to fade to '0' if you stop refreshing will not work on all cells.

There is some interesting research on DRAM that isn't refreshed here: https://citp.princeton.edu/topics/memory/

I'm not sure whether the power-on pattern and the pattern in unrefreshed memory will be the same; dealing with the differences is way into analogue territory.


Re: In search of bad 4164, 41256 DRAM

Gerrit Heitsch
On 9/16/19 11:44 PM, smf wrote:

> On 14/09/2019 17:57, Gerrit Heitsch wrote:
>> Also, back then, an empty DRAM cell didn't necessarily read as '0' to
>> the outside. Take a look at the memory of a system that does not clear
>> the RAM after power on. You'll notice a pattern that varies between
>> manufacturers.
>>
>> So, just writing a '1' and then waiting for it to fade to '0' if you
>> stop refreshing will not work on all cells.
>>
> There is some interesting research on dram that isn't refreshed
> here https://citp.princeton.edu/topics/memory/
>
> I'm not sure whether power on pattern and unrefreshed memory will be the
> same, it's way into analogue territory dealing with the differences.

It should be since at power on the capacitor in a DRAM cell is empty and
if you stop refreshing it, it will also become empty after a while.
Whether this 'empty' is read as '1' or '0' depends on the location on
the die and on the manufacturer.

  Gerrit




Re: In search of bad 4164, 41256 DRAM

smf
On 17/09/2019 09:32, Gerrit Heitsch wrote:
>
> It should be since at power on the capacitor in a DRAM cell is empty
> and if you stop refreshing it, it will also become empty after a
> while. Whether this 'empty' is read as '1' or '0' depends on the
> location on the die and on the manufacturer.

Can you explain why empty is read as 1 or 0 though? As far as I know
dram cells are either empty or full and it checks if the cell is half
full to work out the 0 or 1. So unless they randomly put inverters in
there, an empty cell is an empty cell.

My thought was that during power on the dram is going to be unstable &
it could generate the pattern if the dram did the equivalent of a
refresh and the read part of it was done when there wasn't enough power
to accurately determine the cell is empty enough while the write was
done as the power stabilised. Maybe the power up ends up triggering a
write without a read.

The placement/layout would then make the difference purely because of
things like how much power was leaking around the chip.



Re: In search of bad 4164, 41256 DRAM

Gerrit Heitsch
On 9/17/19 11:27 AM, smf wrote:

> On 17/09/2019 09:32, Gerrit Heitsch wrote:
>>
>> It should be since at power on the capacitor in a DRAM cell is empty
>> and if you stop refreshing it, it will also become empty after a
>> while. Whether this 'empty' is read as '1' or '0' depends on the
>> location on the die and on the manufacturer.
>
> Can you explain why empty is read as 1 or 0 though? As far as I know
> dram cells are either empty or full and it checks if the cell is half
> full to work out the 0 or 1. So unless they randomly put inverters in
> there, an empty cell is an empty cell.

They seem to do exactly that. Otherwise it's not possible that you get a
manufacturer specific pattern after power on.



> My thought was that during power on the dram is going to be unstable &
> it could generate the pattern if the dram did the equivalent of a
> refresh and the read part of it was done when there wasn't enough power
> to accurately determine the cell is empty enough while the write was
> done as the power stabilised. Maybe the power up ends up triggering a
> write without a read.

That should result in a more random pattern though.

But it should be easy to find out, hook a DRAM up to power and a CPU,
read it out after power on. Then stop any refresh and any access for a
while (minutes), and read it out again. If you get all zeros then there
are no inverters. If you get a pattern again, there are inverters.
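The comparison step of that experiment could look something like the sketch below, in C. It assumes the two dumps (one taken right after power on, one after leaving the chip unrefreshed for a while) have already been read into arrays with one bit per byte. The enum, the function names, and that layout are all made up for illustration, and the sketch only classifies the dumps; it says nothing about why a given chip behaves the way it does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { DECAY_ALL_ZERO, DECAY_ALL_ONE, DECAY_PATTERN } decay_result;

/* Classify a dump taken after refresh has been stopped for a while.
 * All zeros suggests no inverted cells; a mix of 0s and 1s suggests that
 * some cells read back inverted. */
decay_result classify_decay(const uint8_t *bits, size_t n)
{
    assert(bits != NULL);
    size_t ones = 0;
    for (size_t i = 0; i < n; i++)
        ones += bits[i] & 1u;
    if (ones == 0)
        return DECAY_ALL_ZERO;
    if (ones == n)
        return DECAY_ALL_ONE;
    return DECAY_PATTERN;
}

/* Count cells that differ between the power-on dump and the decayed dump.
 * Zero differences would mean both states give the same pattern. */
size_t count_differences(const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    size_t diff = 0;
    for (size_t i = 0; i < n; i++)
        if ((a[i] ^ b[i]) & 1u)
            diff++;
    return diff;
}
```

A DECAY_PATTERN result with few or no differences from the power-on dump would support the idea that the power-on pattern is simply what empty cells read as.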

  Gerrit




Re: In search of bad 4164, 41256 DRAM

Pasi 'A1bert' Ojala
In reply to this post by smf
On 17.09.2019 12:27, smf wrote:
> Can you explain why empty is read as 1 or 0 though? As far as I know
> dram cells are either empty or full and it checks if the cell is half
> full to work out the 0 or 1. So unless they randomly put inverters in
> there, an empty cell is an empty cell.
Hi,

The output sense is often differential, and bit-read line twisting is
used to prevent interference (whether sram or dram). It also makes sense
to invert the polarity of the cells for the same reason. Whether it
helps in reality is another thing.

In SRAM reads two bit-read lines are precharged up, and the cell tries
to drive one of them towards ground. The differential sense then decides
which bit it was.

In SRAM the power-up state is more random from chip to chip, but quite
consistent for an individual chip. Each cell powers up to 0 or 1 depending
on the mismatch of that cell's transistors, so each individual bit tends to
always power up to the same state. (Only some of the bits have transistors
so well matched that they can power up in either state.)

-Pasi



Re: In search of bad 4164, 41256 DRAM

MiaM
In reply to this post by Gerrit Heitsch
On Tue, 17 Sep 2019 11:46:14 +0200, Gerrit Heitsch
<[hidden email]> wrote:

> On 9/17/19 11:27 AM, smf wrote:
> > On 17/09/2019 09:32, Gerrit Heitsch wrote:
> >>
> >> It should be since at power on the capacitor in a DRAM cell is
> >> empty and if you stop refreshing it, it will also become empty
> >> after a while. Whether this 'empty' is read as '1' or '0' depends
> >> on the location on the die and on the manufacturer.
> >
> > Can you explain why empty is read as 1 or 0 though? As far as I know
> > dram cells are either empty or full and it checks if the cell is
> > half full to work out the 0 or 1. So unless they randomly put
> > inverters in there, an empty cell is an empty cell.
>
> They seem to do exactly that. Otherwise it's not possible that you
> get a manufacturer specific pattern after power on.

They can attach the fixed-voltage end of the capacitor to either +5V or
ground. That would give different default start up states.

> > My thought was that during power on the dram is going to be
> > unstable & it could generate the pattern if the dram did the
> > equivalent of a refresh and the read part of it was done when there
> > wasn't enough power to accurately determine the cell is empty
> > enough while the write was done as the power stabilised. Maybe the
> > power up ends up triggering a write without a read.
>
> That should result in a more random pattern though.
>
> But it should be easy to find out, hook a DRAM up to power and a CPU,
> read it out after power on. Then stop any refresh and any access for
> a while (minutes), and read it out again. If you get all zeros then
> there are no inverters. If you get a pattern again, there are
> inverters.

That is simple to do on a PC/XT as you can control the refresh circuit.
You have to do some tricks though as a parity error on ram read will
trigger an NMI.

Btw some demos rely on being able to turn off refresh. As long as the
code you execute and the data you read touch every row often enough,
those reads are equivalent to doing a refresh. That way they gain a bit
more memory bandwidth. :)

--
(\_/) Copy the bunny to your mails to help
(O.o) him achieve world domination.
(> <) Come join the dark side.
/_|_\ We have cookies.


Re: In search of bad 4164, 41256 DRAM

Gerrit Heitsch
On 9/17/19 6:19 PM, Mia Magnusson wrote:
> That is simple to do on a PC/XT as you can control the refresh circuit.
> You have to do some tricks though as a parity error on ram read will
> trigger an NMI.

It also depends on the architecture. To do it correctly, you need at
least 2 memory banks that are not serviced by the same /RAS signal. Many
systems run all RAMs on the same /RAS and select the banks with the /CAS
signal. Unfortunately, in this case, /RAS is all you need for refreshing
a row, so running your code in bank 0 will also refresh bank 1.

  Gerrit



Re: In search of bad 4164, 41256 DRAM

MiaM
On Tue, 17 Sep 2019 18:27:39 +0200, Gerrit Heitsch
<[hidden email]> wrote:

> On 9/17/19 6:19 PM, Mia Magnusson wrote:
> > That is simple to do on a PC/XT as you can control the refresh
> > circuit. You have to do some tricks though as a parity error on ram
> > read will trigger an NMI.
>
> It also depends on the architecture. To do it correctly, you need at
> least 2 memory banks that are not serviced by the same /RAS signal.
> Many systems run all RAMs on the same /RAS and select the banks with
> the /CAS signal. Unfortunately, in this case, /RAS is all you need for
> refreshing a row and running your code in bank 0 will also refresh
> bank 1.

Well, assuming you can disable all interrupts so the CPU doesn't fetch
any data from low memory, you could put your idle code in the
graphics/display card ram. That would most likely make sure it won't
cause any RAS signals, unless the DRAM controller asserts RAS even for
addresses outside the DRAM area.
