<html>
<head>
<title>BB Module Flash Requirements</title>
</head>
<body>
<h1 align="center">
BB Module Flash Requirements
</h1>
This document summarizes a key set of flash specifications for the memory module.
It is intended to allow sanity checking of potential flash parts. Once a part
is deemed acceptable given the specs within this document, it should still be
fully examined to ensure usability within our system.
<p>
<h2>Key Specifications</h2>
<ul>
<li> <b>Electrical Signaling:</b> the signaling protocol must be compatible
     with SmartMedia (SM) nand flash, with the exception of the LVD (low
     voltage detect) SM signal. Key timing parameters are noted in the
     specifications that follow.<p>
<li> <b>Operating Voltage, Current:</b> nominal 3.3V supply voltage, max 200mA
supply current per device. <p>
<li> <b>Page Size:</b> 528B page size (512B data + 16B OOB). <p>
<li> <b>Block Wear Lifecycle:</b> 1E5 erase-program cycles for block
lifetime. <p>
<li> <b>Bad Blocks:</b> max 124 bad blocks per 64MB over the specified
"Block Wear Lifecycle". <p>
<li> <b>Block Size:</b> 16KB or 32KB preferable; any power-of-two times
     8KB is probably doable. <p>
<li> <b>Bad Block Marking:</b> bad blocks must be marked such that every
page within a bad block has byte 517 containing at least 2 zero bits
(SM physical format specification). The flash memory must either be
shipped with this marking, or we must be capable of performing this
marking at initial burn-in of the memory module.
<p>
     Because the inability to perform this marking leads to a production
     reject for that memory module unit, we specify that the memory must be
     such that there is at most a 1E-5 probability that this marking cannot
     successfully be completed. <p>
<li> <b>Read and Write Timing:</b> The primary goal is to be able to
sustain a worst case 6MB/s "cartridge" DMA, and a typical writing speed
near 1MB/s. The following nand specs provide a rule of thumb indicating
this is possible:
<ul>
<li> 40us max page access on read (t<sub>R</sub>)
<li> 50ns (at most) for minimum period in repeated page buffer access
(t<sub>RC</sub>, t<sub>WC</sub>)
<li> 0ns for setup times measured from CLE/ALE assertion (to WE or RE)
<li> 1ms max page program (typical <= 500us)
<li> 10ms max block erase time per 16KB (typical <= 2ms)
</ul>
<p>
<li> <b>Permanent Errors (Bad Blocks):</b> these errors are either a failed
     erase or a failed program operation, and are specified in terms of a
     number of erase-program cycles. This number is incorporated into the
     "Bad Blocks" number above, so it is not individually specified here. <p>
<li> <b>Soft Errors Impacting Reads:</b> these errors lead to system
failures when they cause 2 bit errors in the same ECC region of
256B (2 such regions per page). There are two failure modes of
primary interest. In the first, the flash memory holding the SK or FA
fails. The main trouble with these failures is that both the
player and memory module must be returned to the depot for a fix.
Also, it is likely that the user will be confused because the UI
will not be capable of displaying any clear indication of failure.
This mode will be referred to as severe.
In the second mode, game or license data is impacted. This is less
severe since a trip to the depot with only the memory card can
fix the problem, and the UI can clearly indicate this to the user.
This mode will be referred to as mild.
<p>
The two types of soft errors that can cause these failures are briefly
described below. To determine how to compute the various probabilities
please refer to the more detailed analysis in the "System Impact..."
section to follow.
<ul>
     <li> <i>Data Retention.</i> In this case single bits may flip at
          random after some duration of time. We specify that this form
          of error should occur after XXX days with probability less
          than 1/100000 in the severe failure mode. For the mild failure
          mode the probability need only be less than 1/10000 (in
          addition, the number of days may be relaxed).
          <p>
          <i>NOTE: this error mode impacts shelf-life. It is currently
          difficult to obtain flash reliability data that covers a
          reasonable enough time from manufacture, to shelf, to sale,
          to product life. </i>
<li> <i>Read Disturbance.</i> In this case a bit within the same
block as another bit being read may be unintentionally
programmed from 1 to 0. For the severe failure mode we
specify that for a minimum of 100000 reads the probability
of failure should be less than 1/100000. In the mild failure
mode we specify that for a minimum of 1000000 reads the
probability of failure is less than 1/10000.
</ul>
</ul>
<h2>System Impact of Specifications</h2>
The specifications above that require more detailed analysis to rationalize
are treated below.
<h3>Electrical Signaling</h3>
The signal protocol has been specified to be compatible with standard nand
flash. However, we will still need to conduct a detailed timing analysis to
determine acceptability based on potential timing settings of the flash
controller's configuration register.
<h3>Page Size</h3>
The page size must be 512B data + 16B OOB because the hardware ECC engine
requires this layout.
<h3>Block Wear Lifecycle</h3>
Although the blocks containing the game binaries should not undergo
significant re-writing, these blocks are preferably unavailable for
wear-leveling because their placement remains relatively fixed due to the ATB
entry address-size constraints (i.e., once a game has been laid out to
satisfy ATB constraints we would prefer not to relocate it). The pool of
blocks available for relocation could therefore be relatively small. For this
specification we assume the worst case scenario that all such blocks must be
written each time a game is played. Now, if a game is played 10x per day,
every day for 3 years, this leads to approximately 11000 required writes for
state saving alone. The wear number must therefore be greater than 1E4, so to
maintain an order of magnitude safety margin 1E5 has been chosen.
<p>
This number could probably be relaxed if need be.
<h3>Bad Blocks</h3>
Bad blocks impact our ability to present a contiguous cartridge address-space
mapping to game code. The
<a href=../hw/pi-spec.html>PI</a>
ATB mappings are used to avoid bad blocks, and
there are approximately 190 usable ATB entries for a given game (assuming the
game uses one cartridge address space to access the cartridge). But, the
requirement that a given ATB entry must have its starting virtual address be
a multiple of the contiguous physical size mapped, and that the physical size
must be a power-of-two multiple of 16KB, further complicates matters.
<p>
To derive a worst-case number, take the largest game we would like to map and
assume bad blocks are distributed evenly in flash so that we would need a
single ATB entry for each contiguous region. For example, assuming the
maximum game size is 32MB, then with 128 ATB entries, each of size 256KB, the
game could be mapped. This leaves some margin in the total available ATB
entries for file-system fragmentation and other issues. Further, assume a
32MB game would be placed on a minimum 64MB memory. To determine how many bad
blocks we could afford and still be able to map the game with 128 ATB
entries, divide the device size of 64MB by the size of the flash segment that
contains the 256KB mappable region plus one bad block (for a 64MB part the
block size is assumed to be 16KB):
<blockquote>
64MB / (256KB + 16KB) = 240
</blockquote>
The worst case comes as the game size increases. From the game data, a 64MB
game is the largest previously released. Now assuming we would support this
game size with a flash module containing 96MB (64MB and 32MB parts), the
computation would become:
<blockquote>
96MB / (512KB + 16KB) = 186
</blockquote>
This yields a rate of 186 bad blocks per 96MB.
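<p>
As a quick cross-check, the following minimal sketch (C) reproduces the two
divisions worked above; the segment model of one mappable region plus one
evenly distributed bad block is the same assumption used in the derivation:
<pre>
/* Sketch: the two bad-block budget divisions worked above.  Each
 * segment is assumed to hold one mappable region plus one (evenly
 * distributed) bad block; sizes are in KB. */
#include &lt;stdio.h&gt;

int main(void)
{
    printf("64MB part:   %d segments\n", (64 * 1024) / (256 + 16)); /* 240 */
    printf("96MB module: %d segments\n", (96 * 1024) / (512 + 16)); /* 186 */
    return 0;
}
</pre>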
<h3>Block Size</h3>
Block sizes of 16KB and 32KB are usable depending on the "Bad Block Marking"
capabilities, discussed below. The ATB requires block sizes that are
power-of-two multiples of 16KB, and uses 16KB as the minimum granularity when
mapping game cartridge space addresses to flash. So, a 64KB block would
actually be legal as well. On the smaller end, an 8KB block is usable, but
it seems unlikely a part of interest to us would use this block size.
<p>
<i>Are there file-system issues constraining the block size as well?</i>
<h3>Bad Block Marking</h3>
The boot code depends on being able to determine if a 16KB "block" is bad by
reading byte 517 of the first page within the block. If this byte contains 2
or more 0 bits, the "block" is deemed to be bad (consistent with the
SmartMedia spec.). Assuming that an arbitrary page within the block may be
marked in this manner, a true 32KB block is acceptable since a single 32KB
bad block can have pages 0 and 32 marked bad. The boot code will simply treat
this case as two consecutive bad 16KB blocks.
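<p>
To make the convention concrete, here is a minimal sketch (C) of the check
described above; the 528B page layout (512B data + 16B OOB) is the one
specified earlier, while the function name is hypothetical and the actual
boot code is not reproduced here:
<pre>
/* Sketch of the bad-block test described above: a 16KB "block" is
 * deemed bad when byte 517 of the first page within it contains
 * 2 or more 0 bits (SmartMedia convention). */
#include &lt;stdint.h&gt;

#define BAD_MARK_OFFSET 517          /* byte 517 of the 528B page */

static int block_is_bad(const uint8_t page[528])
{
    unsigned b = page[BAD_MARK_OFFSET];
    int ones = 0;
    while (b) {                      /* count the 1 bits in the byte */
        ones += b % 2;
        b /= 2;
    }
    return (8 - ones) >= 2;          /* bad if 2 or more 0 bits */
}
</pre>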
<p>
This topic deserves more detailed coverage because of some inconsistency in
information pertaining to the marking of bad blocks when nand is shipped. For
Toshiba, the data sheet specifies that a bad block is determined by reading
every bit within the block: if a single bit is '0' the block is bad. The
Toshiba application notes instead indicate the first page of a bad block will
be marked per the SmartMedia spec (byte 517, as we assume in the boot code).
Because of this confusion we have asked Toshiba, and they have indicated that
they actually mark byte 517 of every page in a bad block. This last case is
the most desirable for us, but not imperative, since we have also asked if we
can mark byte 517 of any page in a bad block (although a program may fail
overall, turning at least 2 bits of byte 517 to 0 <i>should</i> work) and
Toshiba indicated that we can. So, we could use our own burn-in to ensure the
boot code reacts appropriately to a 32KB physical block size.
<p>
Samsung's data sheets specify that the first OR second page within the block
will be marked. We do not know if this implies that we cannot guarantee that
at least 2 bits in byte 517 can be programmed with 0s. If so, we would need
to change the boot code accordingly, but this seems unlikely for the reasons
below.
<p>
A block must be marked as bad when an erase or program operation fails (as
indicated by the status read that follows). Given that a block is bad, the
reason is most likely that an erase failed to flip 0 bits to 1. So, the
task of programming at least two 0 bits into byte 517 of any given page is
not hindered by this failure. There may be more catastrophic failure modes
that could preclude marking byte 517 with two 0 bits, but these are
considered far less likely. To be safe, we specify this probability (the
1E-5 marking-failure probability in "Key Specifications").
<h3>Read Timing</h3>
As mentioned previously, the true target for these specifications is to
enable a sustained "cartridge" DMA of 6MB/s. A rule-of-thumb was provided
that is a reasonable guide in determining if a part will achieve the 6MB/s
rate. However, the interaction of many flash timing parameters and the
<a href=../hw/pi-spec.html>PI</a>
timing configuration need to be considered to truly determine the DMA speed.
<p>
A more detailed discussion of this process and application to a number of
current flash parts is provided <a href=nand_dma_speeds.html>here</a>.
<p>
Note that our initial latency will be considerably higher than the N64's, but
we have decided this is acceptable given we can sustain the rate above.
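<p>
As a back-of-the-envelope check that the rule-of-thumb numbers support
6MB/s, the following minimal sketch (C) computes the sustained read rate
from t<sub>R</sub> and t<sub>RC</sub> alone; command/address overhead and
the PI configuration effects discussed above are ignored, so this is only
the rule of thumb, not the detailed analysis:
<pre>
/* Sketch: sustained read rate implied by the rule-of-thumb numbers
 * (tR = 40us page access, tRC = 50ns per byte, 528B page carrying
 * 512B of data).  Ignores command/address overhead and PI config. */
#include &lt;stdio.h&gt;

int main(void)
{
    double t_r    = 40e-6;                  /* max page access (s) */
    double t_rc   = 50e-9;                  /* min read cycle (s)  */
    double t_page = t_r + 528 * t_rc;       /* time per page       */
    double rate   = 512 / t_page / 1e6;     /* data rate, MB/s     */
    printf("%.2f MB/s\n", rate);            /* ~7.7 MB/s, above 6  */
    return 0;
}
</pre>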
<h3>Write Timing</h3>
Like the "Read Timing" case, the actual speed will depend on the optimumun
allowable setting for the
<a href=../hw/pi-spec.html>PI</a>
flash timing configuration, as determined
from the set of detailed flash specs. For most cases the worst case is
64ns per flash cycle, where the flash cycle is used to determine the time
for accessing individual data bytes, or emitting command/address cycles.
Then, the worst case numbers from the "Key Specifications" section,
<ul>
<li> 10ms block erase per 16KB
<li> 64ns byte write period (to flash page buffer)
<li> 64ns address and command write cycle
<li> 1ms page program
</ul>
yield (per 16KB of data): 32*1ms + 32*(528+5)*64ns + 10ms =
43.1ms/16KB, <br>
or 0.36MB/s (the individual byte accesses are nearly negligible, so
even if the flash configuration turned out to be slower this would
not have much impact).
<p>
Using typical numbers:<br>
<ul>
<li> 2ms block erase per 16KB
<li> 64ns byte write period (to flash page buffer)
<li> 64ns address write cycle
<li> 500us page program
</ul>
yields (per 16KB): 32*500us + 32*(528+4)*64ns + 2ms = 19.1ms/16KB, <br>
or 0.82MB/s.
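<p>
Both computations can be reproduced with a minimal sketch (C); the constants
are the worst-case and typical numbers listed above, and MB is taken as
2<sup>20</sup> bytes:
<pre>
/* Sketch: write throughput per 16KB (32 pages of 512B data), using
 * the worst-case and typical numbers listed above. */
#include &lt;stdio.h&gt;

static double mb_per_s(double t_erase, double t_prog, int cycles)
{
    double t_cyc   = 64e-9;               /* flash cycle time (s)    */
    double t_block = 32 * t_prog          /* 32 page programs        */
                   + 32 * cycles * t_cyc  /* buffer + cmd/addr writes */
                   + t_erase;             /* one block erase         */
    return (16 * 1024) / t_block / (1024.0 * 1024.0);
}

int main(void)
{
    printf("worst:   %.2f MB/s\n", mb_per_s(10e-3, 1e-3,   528 + 5));
    printf("typical: %.2f MB/s\n", mb_per_s(2e-3,  500e-6, 528 + 4));
    return 0;                             /* 0.36 and 0.82 MB/s      */
}
</pre>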
<h3>Soft Errors Impacting Reads</h3>
The description of failure modes was provided earlier. Here, the
method of computing the error probability for each type of soft
error (on reads) is provided, given the typical form of reliability test
data.
<ul>
<li> <i>Data Retention:</i> we assume this error affects isolated bits (no
     predisposition for neighbors to be affected). Data retention is
     generally measured with parameters such as:
<ul>
     <li> size of parts under test (e.g., 32MB)
<li> sample size (number of devices in test, typically 50-100)
<li> hours (72, ..., 1000)
<li> pre-conditioning cycles
<li> number of cumulative failed bits
</ul>
The "number of cumulative failed bits" is the measured parameter,
while the other parameters specify the test environment. The formula
for computing the probability of error is:
<blockquote>
P = 1 - (1-(1/N<sub>s</sub>))<sup>(N*(M-1))</sup>
(1+(M-1)/N<sub>s</sub>)<sup>N</sup>
</blockquote>
where:
<ul>
<li> N<sub>s</sub>: number of 256B regions in sample (number of parts
times size of each part, in bytes, divided by 256).
<li> M: number of cumulative failed bits.
     <li> N: number of 256B regions in the expected amount of
          memory covered by the failure mode. For the severe mode the
          total affected memory is estimated at 1MB (SK + FA). For the
          mild failure mode this is approximately the entire capacity of
          the memory module (i.e., if 64MB parts are tested, and the
          module will consist of a single 64MB part, N is N<sub>s</sub>
          divided by the sample size).
</ul>
     An upper bound that is easier to compute, less prone to round-off
     error, and appropriate given our expectation of typical parameters
     is:
<blockquote>
P < (1/2) (M/N<sub>s</sub>)<sup>2</sup>
( (N<sub>s</sub>-M+2)/(N<sub>s</sub>/N) )
</blockquote>
<p>
As an example, the probability of error anywhere in the
device (i.e., 2-bit or more errors in same 256B data unit) is
computed given reliability data from Toshiba for a 64MB
TC58512FT nand flash. Toshiba's data had:
<ul>
<li> 73 samples
<li> 64MB per sample
<li> 1.2e5 write-erase cycle pre-condition
<li> 1000 hrs (42 days)
<li> 7 cumulative failed bits
</ul>
This leads to P = 1.5E-8, which is quite acceptable. The only
issue is that we need a product lifecycle of much greater than
42 days!
<p>
     Using the upper bound approximation results in 1.8E-8, which is
     reasonably close (both computations are reproduced in the sketch
     following this list).
<p>
Similar test data from Samsung results in lower probabilities for
failure.
<li> <i>Read Disturbance:</i> this error effectively reprograms some bit
     within the block (on a different page) from 1 to 0. It is generally
     specified with parameters such as:
<ul>
     <li> size of parts under test (e.g., 32MB)
<li> sample size (number of devices in test, typically 50-100)
<li> target number of cumulative failed bits (i.e., reads will
occur over the sample until this number is reached)
<li> pre-conditioning cycles
     <li> number of read cycles until the number of cumulative failed
          bits, specified earlier, is reached.
</ul>
The last parameter, "number of read cycles...", is the measured
parameter, while the remainder specify test environment.
<p>
In this case, the probability of failure, P, is computed with the same
parameters as for the "Data Retention" case (since games are not
composed of all 1 bits this is actually an upper bound on the error).
However, P now measures the validity of the test environment.
Ideally, we would like to specify the experiment to produce our
desired P. Then the resultant number of reads obtained from the
test will determine if the part is reliable enough.
<p>
Also, for the severe error mode we are not interested in a high
number of pre-conditioning cycles, since the SK and FA will not
be written often. A higher number (typically 100000 cycles) is
relevant for the mild error mode.
<p>
     The 100000 reads number for severe errors was estimated as follows.
     The SK and FA are read every time the system resets or a game is
     restarted. Assuming the combination of these events occurs 10x/day,
     every day for 5 years, we obtain approximately 20000 reads. To have
     a reasonable buffer we specify 100000. The number of reads during
     game play is very difficult to estimate, so a number 10x that for
     severe errors is chosen. This number for the mild case is somewhat
     flexible because we could occasionally re-write the game "in
     place", though we would avoid this if possible.
<p>
As an example, the probability of error anywhere in the
device (i.e., 2-bit or more errors in same 256B data unit) is
computed given reliability data from Toshiba for a 64MB
TC58512FT nand flash. Toshiba's data had:
<ul>
<li> 44 samples
<li> 64MB per sample
<li> 1.2e5 write-erase cycle pre-condition
<li> 10 bit errors target
<li> 3.1E6 reads to reach target (over entire sample) worst case
</ul>
     This leads to P = 8.9E-8, which is quite acceptable. In this case
     this means that the experiment satisfies our requirements, since
     the computed P is < 1/100000. The number of reads, 3.1E6, is also
     within our specification.
<p>
Similar test data from Samsung results in lower probabilities for
failure.
</ul>
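<p>
As a closing cross-check, the following minimal sketch (C) evaluates both
the exact probability formula and its upper bound for the Toshiba
data-retention example above (73 samples of 64MB, M = 7 cumulative failed
bits, N taken as one 64MB device); evaluating via logs is one way to limit
the round-off error mentioned earlier:
<pre>
/* Sketch: evaluate P = 1 - (1 - 1/Ns)^(N(M-1)) * (1 + (M-1)/Ns)^N
 * and its upper bound, with the Toshiba TC58512FT data-retention
 * numbers quoted above.  Logs are used to limit round-off error. */
#include &lt;math.h&gt;
#include &lt;stdio.h&gt;

int main(void)
{
    double samples = 73.0;
    double Ns = samples * 64.0 * 1024 * 1024 / 256; /* 256B regions in sample  */
    double M  = 7.0;                                /* cumulative failed bits  */
    double N  = Ns / samples;                       /* regions in one 64MB part */

    double logq  = N * (M - 1) * log1p(-1.0 / Ns) + N * log1p((M - 1) / Ns);
    double P     = -expm1(logq);                    /* exact: ~1.5E-8 */
    double bound = 0.5 * (M / Ns) * (M / Ns) * (Ns - M + 2) / (Ns / N);
                                                    /* bound: ~1.8E-8 */
    printf("P = %.2g, bound = %.2g\n", P, bound);
    return 0;
}
</pre>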
</body>
</html>