# 1.1 Basic Knowledge on Terrestrial Secondary Particles

Cosmic rays, which have extremely high energies, arrive at the Earth's atmosphere from the galactic core and the sun. Primary cosmic rays in outer space consist mainly of protons (about 90%). Since cosmic rays are charged particles, they spiral around geomagnetic or heliomagnetic field lines, as illustrated in Figure 1.1. Some of them are trapped by the geomagnetic field to form the Van Allen radiation belts. Cosmic rays with energies below the *geomagnetic rigidity cutoff* are deflected before entering the geomagnetic field. On the other hand, some are drawn towards the geomagnetic poles along the geomagnetic field lines, sometimes accompanied by the aurora borealis or australis. Cosmic rays are deflected more strongly near the equator, since the geomagnetic field lines there are parallel to the surface of the Earth. Therefore, the intensity of cosmic rays that reach the atmosphere depends on the geomagnetic latitude.

When the energetic protons enter the Earth's atmosphere (troposphere and stratosphere), some undergo *nuclear spallation reactions* with nuclei in the atmosphere (mainly nitrogen and oxygen) to produce a number of light particles, including neutrinos, photons, electrons, muons, pions, protons and neutrons, as illustrated in Figure 1.2. Since secondary neutrons have longer ranges in the atmosphere than protons, they trigger cascades of spallation reactions that form *air showers* reaching the surface of the Earth. Figure 1.3 shows an estimated differential neutron spectrum at sea level in New York City (NYC), based on measurements at different locations in the USA [1]. The neutron energy at ground level extends beyond 1 GeV, and the flux above 1 MeV is around 20 n/cm<sup>2</sup>/h on average. As the air shields neutrons, the intensity (flux and energy) of neutrons depends on altitude and, to a slight extent, on atmospheric pressure [2]. The neutron flux at avionics altitude is higher than that at ground level by a factor of 100.

Furthermore, as cosmic rays are also deflected by the heliomagnetic field, which follows the sun's activity with its cycle of about 11 years, the neutron flux at ground level also has a cycle of about 11 years, as shown in Figure 1.4 [3]. At the *solar maximum*, the neutron flux at ground level is almost at its weakest, while it is at its strongest at the *solar minimum*. Under normal activity, the sun emits a large quantity of protons, but their energy is relatively low, as shown in Figure 1.5 for the solar maximum period [4], so protons from the sun do not directly cause air showers that reach ground level. However, when big flares take place on the sun's surface, a much larger quantity of protons is emitted with energy comparable to that of the galactic protons, as shown in Figure 1.5 [5], and this can cause air showers.

*Terrestrial Radiation Effects in ULSI Devices and Electronic Systems*, First Edition. Eishi H. Ibe. © 2015 John Wiley & Sons, Ltd. Published 2015 by John Wiley & Sons, Ltd.

Figure 1.1 Overall scheme of terrestrial radiation-induced single event effects

Figure 1.2 Initial stage of secondary particle production

Figure 1.3 Differential high-energy neutron spectrum at NYC sea level based on JESD89A

**Figure 1.4** Long-term cyclic variation in neutron flux measured at Moscow Neutron Monitor Centre (http://cr0.izmiran.rssi.ru/mosc/main.htm)

**Figure 1.5** Differential proton spectra originated from solar-minimum sun, big flares on the sun and the galactic core

# 1.2 CMOS Semiconductor Devices and Systems

CMOS (Complementary Metal Oxide Semiconductor) devices such as Static Random Access Memories (SRAMs) and Flip-Flops (FFs) are basically built on a stripe structure of alternating p- and n-wells (dual wells). For example, Figure 1.6 shows typical layouts of diffusion layers (nodes) for a one-bit SRAM cell and an OR gate cell on the stripe structure. All nodes in the memories and logic circuits of a chip are basically built on the same stripe structure. Unlike the dual-well structure, the triple-well structure has a deep n-well. In Silicon On Insulator (SOI) devices, Buried OXides (BOXs) are formed under the dual wells, as shown in Figure 1.7. Isolation oxides, usually Shallow Trench Isolation (STI) oxides, are also formed to isolate the nodes from each other laterally. When the SOI layer is thinner than the depth of the depletion layer in the SD (Source-Drain) channel, the structure is known as FD (Fully-Depleted) SOI. When the SOI layer is thicker than the depletion layer, the structure is known as PD (Partially-Depleted) SOI. In FDSOI, since the upper surface of the BOX is completely covered by the depletion layer, parasitic capacitance is greatly reduced compared with bulk or PDSOI devices, resulting in steep sub-threshold characteristics and reductions in latency and power consumption.



**Figure 1.6** Basic layouts of CMOSFET devices on the stripe structure of p- and n-wells and cross sections of triple and dual well. (a) Top view of CMOS substrate and (b) A-A' cross section (nMOSFET)



Figure 1.7 Structure of SOI device (image). (a) FDSOI and (b) PDSOI

Sugii et al. developed the Silicon on Thin BOX (SOTB) structure, in which a back-gate bias can be applied to the silicon layer below the thin (about 10 nm thick) BOX in order to control $V_{\text{th}}$ [6]. This structure is also known as a *double-gate* structure, a concept that originates from the FinFET [7].

Circuits are made by electrically connecting nodes with wires above the well structure. Figure 1.8 shows a typical circuit for an SRAM. Figure 1.8a illustrates a simplified representation of an SRAM cell, in which two inverters are connected in a ring. The circuit is stable when the two nodes hold opposite data, and it can thus store data. More specifically, as shown in Figure 1.8b, the two nodes Q and Q(bar) store data '1' (high/$V_{cc}$) or '0' (low/$V_{ss}$). When the transfer transistors Tr<sub>5</sub> and Tr<sub>6</sub> are 'ON', data on the nodes are written or read. When the state of $n_1/n_2$ is high, the pMOSFET (p-channel Metal Oxide Semiconductor Field Effect Transistor) $Tr_4$ is 'OFF' and the nMOSFET (n-channel MOSFET) $Tr_3$ is 'ON', making the state of $n_3/n_4$ low. This in turn switches the pMOSFET of the other inverter 'ON' and its nMOSFET 'OFF', so that the states of $n_1$–$n_4$ remain stable. Figure 1.8c shows an example of the corresponding layout of the source and drain nodes $n_1$–$n_{10}$.

**Figure 1.8** SRAM function and layout. (a) Equivalent circuit, (b) circuit and (c) node layout on p- and n-wells
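The bistable behaviour of the two cross-coupled inverters can be illustrated with a minimal Python sketch. It treats each inverter as an ideal logic element and is not a circuit-level model. The function names (`inverter`, `is_stable`, `settle_after_hit`) are hypothetical, and the assumption that a disturbance on one node lasts long enough for the opposite inverter to respond is made here only for illustration.

```python
from itertools import product


def inverter(level: int) -> int:
    """Ideal inverter: logic 1 becomes 0 and logic 0 becomes 1."""
    return 1 - level


def is_stable(q: int, q_bar: int) -> bool:
    """A state is stable when each node equals the inverse of the other."""
    return q == inverter(q_bar) and q_bar == inverter(q)


def settle_after_hit(q: int, q_bar: int, hit_node: str) -> tuple[int, int]:
    """Flip the struck node and let the opposite inverter follow.

    Assumption (illustrative only): the disturbance outlasts the
    feedback delay of the ring, so the new level is latched.
    """
    if hit_node == "Q":
        q = 1 - q
        q_bar = inverter(q)
    elif hit_node == "Q_bar":
        q_bar = 1 - q_bar
        q = inverter(q_bar)
    else:
        raise ValueError("hit_node must be 'Q' or 'Q_bar'")
    return q, q_bar


# Enumerate all four (Q, Q_bar) combinations and keep the stable ones.
stable_states = [s for s in product((0, 1), repeat=2) if is_stable(*s)]
print("stable states:", stable_states)

# A cell storing '1' (Q high, Q_bar low) struck on node Q.
print("after hit on Q:", settle_after_hit(1, 0, "Q"))
```

Only the two states in which Q and Q(bar) hold opposite values are stable, which is why the cell stores exactly one bit. In this idealised picture, a sufficiently long disturbance on one node moves the cell to the other stable state, that is, the stored data flip.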

Logic circuits consist of a number of combinational logic gates (AND, OR, NAND, NOR, ...) and sequential logic elements, typically FFs, as illustrated in Figure 1.9. In synchronous logic circuits, the overall operation is controlled by clock signals, typically produced by a Phase Locked Loop (PLL). An instruction from the CPU is executed within one clock interval by controlling the inputs of the gates between two successive FFs. The execution results are captured by each FF when the subsequent clock is sent to its clock input.



Figure 1.9 Example of logic circuit



Figure 1.10 Example of electronic system implementation

Figure 1.10 illustrates the typical architecture of an electronic system, which consists of a power supply, printed circuit boards (PCBs) carrying various chips, including a CPU/GPU (Graphic Processing Unit), RAM, ROM (Read-Only Memory), FPGA (Field Programmable Gate Array), PLL, DSP (Digital Signal Processor) and IO (Input/Output) ports, and the system itself, such as a server, router, automobile, train or aircraft. Such a system may have sensors to monitor the system condition, and actuators and motors to control its operation.

As a whole, any electronic system consists of a stack of layers, as illustrated in Figure 1.11, from the lowest layer (well/substrate) to the highest layer (application or final hardware product).

# 1.3 Two Major Fault Modes: Charge Collection and Bipolar Action

Figure 1.11 Example of stack layers in an electronic system

When a charged particle passes through a semiconductor device, electron-hole pairs are produced along the particle path (which we call the 'track'). When the particle passes through the depletion layer, or pn-junction, under an off-state n-diffusion layer at potential $V_{cc}$, electrons in the depletion layer are collected into the n-diffusion layer (which we call the 'node' or 'storage node') to cause a *single event fault*, while holes are repelled from the depletion layer by the potential of the n-diffusion layer and flow out through a ground contact. This movement of electrons and holes elongates the potential field along the track, so that additional electrons initially produced outside the depletion layer are also collected. This phenomenon is known as 'funnelling' [8]. In the triple-well structure shown in Figure 1.12, funnelling also takes place at the pn-junction adjacent to the deep n-well surface. When the charged particle passes through the pn-junctions of both the diffusion layer and the deep n-well, electrons are collected on both sides. In this case, about half of the electrons produced along the particle path may be collected into the storage node. This is the conventional soft-error model, or *charge collection model*, under which a soft error takes place only when the charged particle passes through the off-state n-diffusion layer and the amount of collected charge exceeds the critical charge $Q_{\text{crit}}$ necessary to flip the data in the diffusion layer.

**Figure 1.12** Charge collection model in a semiconductor structure by funnelling. (a) Charged particle hit and charge deposition and (b) charge collection to an n+ diffusion layer
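As a rough numerical illustration of the charge collection model, the following Python sketch converts the energy deposited along a track into charge and compares the collected part with $Q_{\text{crit}}$. It assumes the commonly quoted mean energy of about 3.6 eV per electron-hole pair in silicon. The deposited energies, the critical charge of 5 fC and the function names are hypothetical values chosen only for illustration, and the collection fraction of 0.5 reflects the 'about half' estimate given above for the triple-well case.

```python
E_PAIR_EV = 3.6              # mean energy per electron-hole pair in Si (eV), approximate
ELEMENTARY_CHARGE_C = 1.602e-19


def deposited_charge_fc(energy_mev: float) -> float:
    """Charge (fC) of one carrier type generated by `energy_mev` deposited in Si."""
    pairs = energy_mev * 1.0e6 / E_PAIR_EV
    return pairs * ELEMENTARY_CHARGE_C * 1.0e15


def causes_upset(energy_mev: float, collection_fraction: float,
                 q_crit_fc: float) -> bool:
    """True when the collected charge exceeds the critical charge."""
    collected_fc = deposited_charge_fc(energy_mev) * collection_fraction
    return collected_fc > q_crit_fc


Q_CRIT_FC = 5.0              # hypothetical critical charge of the storage node
COLLECTION_FRACTION = 0.5    # 'about half' of the electrons, as in the text

for energy_mev in (0.1, 1.0):    # hypothetical deposited energies
    charge = deposited_charge_fc(energy_mev)
    upset = causes_upset(energy_mev, COLLECTION_FRACTION, Q_CRIT_FC)
    print(f"{energy_mev} MeV -> {charge:.1f} fC generated, upset: {upset}")
```

With these illustrative numbers, 1 MeV of deposited energy generates roughly 44.5 fC, so half of it comfortably exceeds the assumed 5 fC threshold, whereas 0.1 MeV does not. Real values of $Q_{\text{crit}}$ and of the collected fraction depend on the device and must come from measurement or simulation.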

As device scaling reached the 130 nm process, Ibe et al. pointed out a novel soft-error mechanism, Multi-coupled Bipolar Interaction (MCBI) [9], under which a charged particle need not pass through the off-state diffusion layer for a soft error to occur, as illustrated in Figure 1.13. When a charged particle passes through pn-junction(s) of a p-well other than those of the storage node, electrons produced in the well flow out of the well through the same funnelling mechanism and holes are left behind, raising the potential of the well and turning the parasitic transistor in the well 'ON'. As a substantial number of nodes are formed in a single common well, faults due to MCBI can take place in multiple nodes and cause an SET (Single Event Transient) or MNT (Multi-Node Transient).

There are a number of sources of faults other than SET or MNT in electronic devices, as summarised in Table 1.1. Namely, electronic systems may fail due to cross-talk between two or more wires with high parasitic capacitance [10]; noise produced in the power supply [11]; Electro-Magnetic Interference (EMI) [11]; hole traps in oxides, which may cause a $V_{th}$ shift of the transistor [12], or a high potential due to holes accumulated in the oxide, which may form a leakage path in the adjacent Si channel (Total Ionisation Dose effects, TID; for more details, see Chapter 3); and lattice defects, which may deteriorate the device's functionality [13]. RILC (Radiation Induced Leakage Current) in the tunnel oxide of the floating gate causes a $V_{th}$ shift in flash memory, resulting in soft errors [14, 15]. The most important characteristic of single event failure is that its source is basically limited to one single well or substrate, whereas the other fault sources are not confined to the well or substrate. Faults that do not originate from terrestrial radiation are outside the scope of this book; in the remaining text, 'fault' refers to any fault caused by terrestrial radiation.

**Figure 1.13** Novel bipolar action model in a triple-well nMOSFET structure (MCBI: Multi-Coupled Bipolar Interaction model). (a) Penetration of a charged particle through p-n junctions on the p-well. (b) Elevation of potential in the p-well by residual holes to turn on the parasitic transistor

**Table 1.1** Modes of faults

| Class | Definition | Name | Characteristics | Source | Affected area | In-situ detection method | In-situ recovery/mitigation method |
|---|---|---|---|---|---|---|---|
| Transient/noise | Transient in electric potential and/or current in a chip | SET<sup>a</sup> | Single transient due to charge collected to the diffusion layer in the chip. Pulse width is below a few nanoseconds, and can last longer than two clock pulses | Well/substrate | Random but limited to single well | Time and/or space redundancy such as DMR<sup>b</sup>, TMR<sup>c</sup> | Time and/or space redundancy |
| | | MNT<sup>d</sup> | Simultaneous SETs in more than two diffusion layers. Mainly, MNTs take place in a single well due to charge sharing or bipolar action. Space redundancy techniques such as DICE<sup>e</sup> and TMR may not work against MNTs | Well/substrate | Random but limited to single well | Monitoring the well potential and/or current | None |
| | | RILC<sup>f</sup> | When a charged particle passes through the tunnel oxide of a floating gate memory (flash memory), a leakage path is formed in the tunnel oxide and the potential in the floating gate may shift, resulting in a change of $V_{th}$ that causes soft-error | Tunnel oxide of a floating gate memory | Random but limited to tunnel oxide | ECC/parity | ECC |
| | | Cross-talk | Noise propagation between close wires via parasitic capacitance | Wire | Random but limited to wire(s) | Time and/or space redundancy | Time and/or space redundancy |
| | | Disturbance in power supply | | Power supply line | Unlimited area | Monitor in power supply | None |
| | | EMI<sup>g</sup> | Electromagnetic noise including burst noise | Anywhere | Unlimited area | Electro-magnetic probe | None |
| Trap in oxide | Holes trapped in oxide. They may cause leakage current and may disappear in time | TID<sup>h</sup> | Parasitic level by the trap may cause $V_{th}$ shift, or potential due to holes in oxide may cause leakage path formation in adjacent semiconductor portion | Oxide | Random in a small area | $V_{th}$ measurement | Annealing may work |
| Defect | Lattice defects | | Damage or interstitials in crystal, which may deteriorate device functionality. They may cause stuck-at '0/1' error and can be permanent error | Anywhere/tunnel oxide | Random in a small area | None | None |

<sup>a</sup>Single event transient. <sup>b</sup>Double modular redundancy. <sup>c</sup>Triple modular redundancy. <sup>d</sup>Multi-node transient. <sup>e</sup>Dual interlocked storage cell. <sup>f</sup>Radiation induced leakage current. <sup>g</sup>Electro-magnetic interference. <sup>h</sup>Total ionization dose effects.

# 1.4 Four Hierarchies in Faulty Conditions in Electronic Systems: Fault – Error – Hazard – Failure

Starting from faults in the well or substrate, failures in electronic systems take place through a hierarchy of faulty conditions, as illustrated in Figure 1.14. As illustrated in Figure 1.6, faults are only produced in the wells, and faults do not always cause errors (data flips in memory elements). They are generally difficult to detect and may disappear or may not be strong enough to cause errors. Only when a fault is captured and causes data flips in memory elements such as SRAMs, DRAMs, flash memories and FFs is it regarded as an *error* at the device or circuit level. When a single particle penetrates a device or devices, it can cause multiple faults or multiple errors. By definition, the physical consequence of one single particle, including a neutron, is called a 'Single Event Effect (SEE)'. When an SEE causes one or more errors, we refer to this phenomenon as an 'SEU (Single Event Upset)'. Again, an SEU can consist of multiple errors. The important point is that the Soft Error Rate (SER) is defined by the number of SEUs, not by the number of errors.

Figure 1.14 Hierarchy of faulty conditions: fault-error-failure

One more important concept used in this book is the SEU cross section  $\sigma_{seu}$  that is defined by,

$$\sigma_{\rm seu} = \frac{N_{\rm seu}}{\Phi_{\rm p}},\tag{1.1}$$

where $N_{\rm seu}$ is the number of SEUs (not errors), in counts, and $\Phi_{\rm p}$ is the fluence of particles (neutrons), in n/cm<sup>2</sup>. Fluence means the total number of particles that have passed through a unit area.

$\sigma_{\rm seu}$ can be measured in accelerator experiments (see Chapter 4), and the SER can then be calculated from $\sigma_{\rm seu}$ as follows:

$$\mathrm{SER} = \sigma_{\rm seu} \times \phi_{\rm p} \times 10^9 \quad \text{(in FIT)},\tag{1.2}$$

where $\phi_{\rm p}$ is the flux of the particles (count h<sup>-1</sup> cm<sup>-2</sup>) and FIT (Failure In Time) is the unit of SER, expressed per 10<sup>9</sup> hours. Flux means the number of particles that pass through a unit area per unit time.
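Equations (1.1) and (1.2) can be chained in a few lines of Python. The sketch below is illustrative only: the per-event bit-flip counts and the fluence are hypothetical numbers, not measured data, and the function names are made up for this example. The ground-level flux of 20 n/cm<sup>2</sup>/h and the factor of 100 for avionics altitude are the figures quoted in Section 1.1; whether a measured cross section applies to that energy range is an assumption of this sketch.

```python
def count_seus(bits_flipped_per_event: list[int]) -> int:
    """Count SEUs: each particle event with at least one flipped bit is ONE SEU."""
    return sum(1 for bits in bits_flipped_per_event if bits > 0)


def seu_cross_section(n_seu: int, fluence_per_cm2: float) -> float:
    """Eq. (1.1): sigma_seu = N_seu / Phi_p, in cm^2."""
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return n_seu / fluence_per_cm2


def ser_fit(sigma_seu_cm2: float, flux_per_cm2_h: float) -> float:
    """Eq. (1.2): SER in FIT, i.e. events per 1e9 hours."""
    return sigma_seu_cm2 * flux_per_cm2_h * 1.0e9


# Hypothetical irradiation run: bits flipped by each recorded particle event.
events = [1, 1, 2, 1, 3, 0]
n_errors = sum(events)          # 8 flipped bits in total
n_seu = count_seus(events)      # but only 5 SEUs

fluence = 1.0e9                 # hypothetical fluence (n/cm^2)
sigma = seu_cross_section(n_seu, fluence)

ground_flux = 20.0              # n/cm^2/h, ground-level value quoted in Section 1.1
avionics_flux = ground_flux * 100

print(f"errors = {n_errors}, SEUs = {n_seu}, sigma = {sigma:.1e} cm^2")
print(f"SER at ground level:       {ser_fit(sigma, ground_flux):.0f} FIT")
print(f"SER at avionics altitude:  {ser_fit(sigma, avionics_flux):.0f} FIT")
```

Counting the five events rather than the eight flipped bits is what keeps the resulting SER consistent with the definition above; using the bit count would overstate the rate whenever multi-bit upsets occur.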

An error does not always cause a failure, depending mainly on its location and on the functionality of the system. Only when an error or errors propagate to the final output and cause a malfunction of the system can we call the consequence a '*failure*'. We may call an incorrect output of the controller or PCB that does not affect the normal operation of the system a '*hazard*'. An error does not always cause a system hazard or failure, because it may disappear or be *masked* during propagation in the chip or board. Mitigation techniques such as parity, Error Checking and Correction (ECC) and interleaving may be applied to reduce SER. A failure cannot be compensated for by the system without physical or economic damage. Failures include shutdown and abnormal operation of the system. Incorrect calculation results from supercomputers can also be categorised as failures.

# 1.5 Historical Background of Soft-Error Research

Scaling down of semiconductor devices to sub-100 nm technology encounters a wide variety of technical challenges, including $V_{\rm th}$ variation [6], Negative Bias Temperature Instability (NBTI) [16], the short-channel effect [17], gate leakage [18], and so on. Terrestrial neutron-induced SEU has become one of the key issues that can present a major setback in scaling. Before going into detail about the current situation concerning soft-error problems, let us look at the historical background against which the soft-error problem has escalated.

Table 1.2 summarises the history of soft-error research together with SRAM design rules and densities. Ever since $\alpha$-ray soft error was first discovered in DRAMs, we have experienced a number of distinctive *paradigm shifts*. Five paradigm shifts in soft-error research from 1979 to 2013 are highlighted in Table 1.2. It is well known that $\alpha$-ray soft error in DRAMs was discovered by May and Woods in 1979 [19]. In the same year, the possibility of soft error due to terrestrial neutrons was pointed out by Ziegler and Lanford [20]. As the impact of $\alpha$-rays was much larger than that of neutron-induced soft error, intensive efforts were focused on $\alpha$-ray soft error. By the early 1990s, $\alpha$-ray soft error in DRAMs had been overcome by several countermeasures, such as the triple-well structure, the use of low-alpha materials and shielding by package materials [21–23]. Scaling of semiconductor devices also has potentially beneficial effects on the reduction of $\alpha$-ray soft error: the probability of an alpha particle hitting the storage node falls as the area of the storage node shrinks, and the charge collected at the node falls as the volume for charge collection shrinks [24, 25]. Thus, for a time, soft error did not matter at ground level. As the device design rule shrank further to around 130 nm, soft error in SRAMs due to terrestrial neutrons became significant. Surprisingly, the trends in literature data on nucleon (neutron and proton) SER in DRAMs and SRAMs turned out to be reversed: the soft-error rate of SRAMs became much higher than that of DRAMs once memory density exceeded 4 Mbit/device or the SRAM design rule became smaller than 250 nm, as indicated in Figure 1.15. Owing to the beneficial effects of scaling mentioned previously, soft error in DRAMs, which have capacitors, has naturally decreased. Meanwhile, the parasitic capacitance of SRAMs has inevitably decreased, reducing the critical charge $Q_{\rm crit}$ as the trade-off for the beneficial effects of scaling.

**Table 1.2** Five paradigm shifts in the history of soft-error research

| No. | Years | Main features in paradigm shift | Events | SRAM design rule (nm) | SRAM density (bits) |
|---|---|---|---|---|---|
| 1 | 1979 to early 1990s | ✓ Discovery of α-ray soft error and development of mitigation techniques for DRAMs<br>✓ α-ray soft error in DRAMs is overcome | ✓ First finding by May (1979) | >250 | <64 K |
| 2 | Late 1990s to 2000 | ✓ Impact of terrestrial neutron-induced soft-error is widely recognised in SRAMs [9] | ✓ An SRAM vendor was blamed for soft-error of SRAMs | 130 | 128 K–4 M |
| 3 | 2000–2005 | ✓ MCU (multi-cell upset) in SRAMs becomes a major concern [26]<br>✓ Bipolar mode soft-error was discovered (*idem*.)<br>✓ Concerns spread over logic devices | ✓ JESD89 (2001)<br>✓ Polishing method: BPSG → CMP | 90 | 8 M |
| 4 | 2006–2009 | ✓ Memory SER was basically overcome [9]<br>✓ Mitigation of MNT | ✓ JESD89A (2006)<br>✓ AEC Q100 G (2008) | 65 | 16 M |
| 5 | 2010–2013 | ✓ Concerns spread over redundant large and real-time systems<br>✓ Power/cost effective mitigation techniques are becoming the top priority in terrestrial industries<br>✓ Concerns spread over terrestrial particles other than neutrons | ✓ ISO26262 (2011)<br>✓ IEC62396<br>✓ Cloud/exa-scale computing/big data | <40 | >32 M |

Figure 1.15 Reverse trends in SEU cross section in SRAMs and DRAMs as scaling proceeds

As this adverse trend in SRAM soft error due to terrestrial neutrons became obvious, major network vendors began, from 2000, to request that SRAM vendors carry out neutron irradiation tests to clarify the susceptibility to soft error [27]. The reversal phenomenon and the network vendors' action are believed to have triggered the paradigm shift in soft-error research from DRAM to SRAM. Filing neutron irradiation data reports for users became almost mandatory for memory vendors, and this triggered worldwide discussions on standard neutron testing methods. As a consequence, JESD89 [28], covering neutron, proton and $\alpha$-ray SER testing methods, was issued as the de facto soft-error testing standard in 2001.

In the third paradigm shift in 2000–2005, concerns over neutron soft error spread further in two directions from around 2004. One direction concerned multi-cell upset (MCU), which emerged from the 130 nm process owing to bipolar [26] or charge sharing effects [29]. When an MCU affects two or more bits in the same word of an SRAM, it cannot be recovered using EDAC (Error Detection and Correction) or ECC, and may result in a system crash: EDAC or ECC can detect two-bit errors and correct one-bit errors, but cannot correct two-bit errors in the same word. It was found that almost all MCUs take place in one MOSFET well and are aligned along only two adjacent bit lines. Based on this finding, MCUs in memory devices were basically overcome by applying both ECC and interleaving [9]. The other direction concerned SETs (noise) in sequential and combinational logic devices. SER in flip-flops is predicted to be close to that in SRAMs beyond the 90 nm design rule [30], but there have been no effective detection and correction methods for logic devices except for redundancy techniques. Moreover, obvious threats due to MNT have been found in most space redundancy techniques, such as TMR (Triple Modular Redundancy) [31] and DICE (Dual Interlocked storage Cell) [32]. In TMR, instructions are executed in three identical modules and two-out-of-three voting is carried out to obtain a reliable output for the next set of modules. DICE has two input nodes, and the input is transferred to the output only when the values of the two nodes are identical. MNT appears in logic devices owing to the same mechanism as MCU in memory devices. If this happens in two modules of TMR [33] or in two input nodes of DICE [34], the recovery mechanism fails and may result in SDC (Silent Data Corruption), which can cause unrecognisable system failure [35]. Several ideas for hardening DICE-like flip-flops were proposed separately by Uemura et al. [36] and Lee et al. [37] in 2010.
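Two of the arguments above, why interleaving lets ECC cope with MCUs and why an MNT in two modules defeats TMR voting, can be sketched in Python. This is a simplified illustration under stated assumptions, not a description of any particular memory or voter design: the row layout (four words of eight bits), the struck columns and all function names are hypothetical, and the ECC is assumed to correct at most one bit error per word.

```python
from collections import Counter


def word_of_column(col: int, n_words: int, word_bits: int,
                   interleaved: bool) -> int:
    """Map a physical column in a row to the logical word it belongs to."""
    if interleaved:
        # Neighbouring columns belong to different words.
        return col % n_words
    # Each word occupies a contiguous block of columns.
    return col // word_bits


def correctable_by_ecc(hit_columns: list[int], n_words: int,
                       word_bits: int, interleaved: bool) -> bool:
    """True when no word holds more than one flipped bit."""
    per_word = Counter(
        word_of_column(c, n_words, word_bits, interleaved) for c in hit_columns
    )
    return all(count <= 1 for count in per_word.values())


def tmr_vote(a: int, b: int, c: int) -> int:
    """Two-out-of-three majority vote on single-bit module outputs."""
    return (a & b) | (b & c) | (a & c)


# MCU along two adjacent bit lines (hypothetical columns 10 and 11).
mcu = [10, 11]
print("no interleaving:", correctable_by_ecc(mcu, 4, 8, interleaved=False))
print("interleaving:   ", correctable_by_ecc(mcu, 4, 8, interleaved=True))

# TMR with correct output 1: one upset module is outvoted, two are not.
print("single upset -> voted output", tmr_vote(0, 1, 1))
print("MNT in two modules -> voted output", tmr_vote(0, 0, 1))
```

Without interleaving, both flipped bits fall in the same word and exceed the single-bit correction capability; with interleaving, each word sees one error. In the TMR case the voter returns a wrong value without any indication of disagreement being acted upon, which is the silent data corruption scenario described above.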

In addition to these two directions, Baumann *et al.* pointed out soft error caused by the thermal neutron capture reaction of <sup>10</sup>B, which has a natural abundance of 19.9% in the boron contained in the BPSG (Boro-Phospho-Silicate Glass) used for flattening the wafer surface [38]. When thermal neutrons, which typically have an energy of 25 meV in equilibrium with atmospheric molecular motion, are captured by <sup>10</sup>B nuclei, a He ion (1.47 MeV) and a <sup>7</sup>Li ion (0.84 MeV) are released and cause soft error by direct ionisation. For the time being, this type of soft error appears to have been overcome by changing the flattening process from BPSG to CMP (Chemical Mechanical Polishing).

The fourth paradigm shift in 2006–2009 followed discussions over the revision of JESD89, which had started in 2003 because the original JESD89 had a number of limitations: for example, the only testing facility designated as the standard facility was the spallation neutron source at Los Alamos National Laboratory. The new version, JESD89A [39], was issued in 2006 with more reliable testing and analysis methods, including the *quasi-monoenergetic neutron* test method. The differential spectrum of terrestrial neutrons was revised and a number of neutron irradiation facilities were added as standard facilities. IEC60749-38, a standard consistent with JESD89A, was issued in 2008 as the de jure standard [40].

In the fifth paradigm shift during 2010–2013, the impact of soft error spread to large electronic systems. For big data centres [41] or exa-scale supercomputers [42], power reduction is one of the most important design goals. Space redundancy techniques, which have large area and power overheads, will therefore not be applied to such big systems, in principle. Real-time (safety-critical) systems such as avionics or micro-control units in automobiles are also attracting serious concern, and in-depth studies are being widely pursued [43, 44]. AEC Q100 G [45] and ISO26262 [46] for automobiles were also issued in 2008 and 2011, respectively.

With the common recognition that soft-error-induced system failure cannot be suppressed to a satisfactory level by applying mitigation techniques to only a single stack layer (device, circuit, chip, board, firmware, OS, middleware, and so on), communication and combined mitigation techniques among stack layers are encouraged as an unavoidable option [47, 48]. It has been generally accepted since around 2000 that mitigation techniques applied to a single stack layer can be neither an effective nor a promising solution against system failures, and that collaboration among stack layers must be realised [49, 50]. In reality, such collaboration has turned out to be very difficult since the basic engineering skills in each stack layer are essentially and significantly different. Most engineers and researchers cannot expand their specialties beyond their own stack layers. Novel strategies to overcome this situation are needed in order to allow further exploration. A *built-in* communication scheme among the stack layers is proposed by Ibe *et al.* in their LABIR (inter-LAyer Built-In Reliability) concept [51, 52]. Evans *et al.* are proposing RIIF (Reliability Information Interchange Format) as the common format or protocol to be used in system design among stack layers [53].

In the possible sixth paradigm shift, other terrestrial particles, such as muons, low-energy neutrons and protons, are being pointed out as possible SER threats at ground level. Mainly due to the scaling of semiconductor devices below 100 nm, a number of new radiations/particles are being identified as sources of soft error. Alpha particles can again cause soft error in SRAMs, even with VLA (Very Low Alpha)-level packaging [54]. Sub-100 nm SRAMs are sensitive to soft error due to terrestrial muons [55]. Low-energy (thermal) neutrons cause soft error through the neutron capture reaction of <sup>10</sup>B even in devices without BPSG processes [56, 57]. Concerns about electrons (beta rays) and gamma rays have grown since the severe nuclear accident at the Fukushima nuclear plant site [58, 59].

# 1.6 General Scope of This Book

In Chapter 2, sources of terrestrial radiation and their properties are introduced. In Chapter 3, mechanisms of radiation effects are described in depth. In Chapter 4, fundamentals of electronic devices and systems, such as memories, logic gates, FPGAs, OS and processors, that are necessary to understand the following chapters are introduced. Chapter 5 summarises a wide range of experimental facilities for irradiation tests. In Chapter 6, simulation techniques for soft error and typical simulation results are introduced, particularly for neutron irradiation effects on several error/fault modes. Chapter 7 describes detection and classification techniques for faults, errors and failures. In Chapter 8, mitigation techniques against failures in electronic systems, and the associated challenges, are summarised. Chapter 9 summarises this book. The Appendices provide some additional information together with some coding techniques in Visual Basic and some sample codes. Those who are not interested in simulation may skip Chapter 6 and the Appendices.

### References

- [1] JEDEC Standard JESD89A. (2006) Measurement and Reporting of Alpha Particle and Terrestrial Cosmic Ray-Induced Soft Errors in Semiconductor Devices, JEDEC.
- [2] Nakamura, T., Baba, M., Ibe, E. et al. (2008) Terrestrial Neutron-Induced Soft-Errors in Advanced Memory Devices, World Scientific, Hackensack, NJ.
- [3] http://cr0.izmiran.rssi.ru/mosc/main.htm (accessed August 17, 2014)
- [4] Solar Energetic Particles and Cosmic Rays http://www.solar-system-school.de/lectures/marsch/7.pdf (accessed 14 February 2013)
- [5] http://www.kusastro.kyoto-u.ac.jp/~kamaya/ISP/chap7.ppt (accessed 14 February 2013)

- [6] Sugii, N., Tsuchiya, R., Ishigaki, T. *et al.* (2008) Comprehensive study on Vth variability in silicon on thin BOX (SOTB) CMOS with small random-dopant fluctuation: finding a way to further reduce variation. IEEE International Electron Devices Meeting, San Francisco, CA, 15–17 December, pp. 249–253.
- [7] Hisamoto, D., Lee, W.-C., Kedzierski, J. *et al.* (2000) FinFET-a self-aligned double-gate MOSFET scalable to 20 nm. *IEEE Transactions on Electron Devices*, **47** (12), 2320–2325.
- [8] Hu, C. (1982) Alpha-particle-induced field and enhanced collection of carriers. *IEEE Electron Device Letters*, EDL-3 (2), 31–34.
- [9] Ibe, E., Chung, S., Wen, S. *et al.* (2006) Spreading diversity in multi-cell neutron-induced upsets with device scaling. The 2006 IEEE Custom Integrated Circuits Conference, San Jose, CA, 10–13 September, 2006, pp. 437–444.
- [10] Walker, C.S. (1990) Capacitance, Inductance and Crosstalk Analysis (Artech House Antennas and Propagation Library), Artech House Publishers.
- [11] Kanekawa, N., Ibe, E., Suga, T. and Uematsu, Y. (2010) Dependability in Electronic Systems-Mitigation of Hardware Failures, Soft Errors, and Electro-Magnetic Disturbances, Springer, New York.
- [12] Crain, S.H., Mazur, J.E., Katz, R.B. et al. (2001) Analog and digital single-event effects experiments in space. *IEEE Transactions on Nuclear Science*, 48 (6), 1841–1848.
- [13] Lacoe, R.C. (2004) Fabricating radiation-hardened digital components at commercial CMOS foundries using hardness-by-design techniques. The 6th International Workshop on Radiation Effects on Semiconductor Devices for Space Application, Tsukuba, 6–8 October 2004, pp. 227–234.
- [14] Cellere, G., Pellati, P., Chimenton, A. *et al.* (2001) Radiation effects on floating-gate memory cells. *IEEE Transactions on Nuclear Science*, 48 (6), 2222.
- [15] Butt, N.Z. and Alam, M. (2008) Modeling single event upsets in floating gate memory devices. IEEE International Reliability Physics Symposium, Anaheim, CA, No. 5D.1, pp. 547–555.
- [16] Wen, S., Wong, R. and Silburt, A. (2008) IC component SEU impact analysis. IEEE Workshop on Silicon Errors in Logic – System Effects, University of Texas at Austin, 26 March, p. 27.
- [17] Villanueva, D., Pouydebasque, A., Robilliart, E. *et al.* (2003) Impact of the lateral source/drain abruptness on MOSFET characteristics and transport properties. IEEE International Electron Devices Meeting, Washington, DC, 7–10 December 2003 (9.4).
- [18] Clark, L.T., Moh, K.C., Holbert, K.E. et al. (2007) Optimizing radiation hard by design SRAM cells. *IEEE Transactions on Nuclear Science*, 54 (6), 2028–2036.
- [19] May, T.C. and Woods, M.H. (1979) Alpha-particle-induced soft errors in dynamic memories. *IEEE Transactions on Electron Devices*, ED-26 (1), 2–9.
- [20] Ziegler, J.F. and Lanford, W.A. (1979) Effect of cosmic rays on computer memories. *Science*, **206**, 776–788.
- [21] Takeuchi, K., Shimohigashi, K., Takeda, E. and Yamasaki, E. (1987) Experimental characterization of  $\alpha$ -induced charge collection mechanism for megabit DRAM cells. IEEE International Solid-State Circuits Conference, 10 February 1987, pp. 99–100.

- [22] Sai-Halasz, G.A., Wordeman, M.R. and Dennard, R.H. (1982) Alpha-particle-induced soft error rate in VLSI circuits. *IEEE Transactions on Electron Devices*, **ED-29** (4), 725–731.
- [23] Thompson, C.E. and Meese, J.M. (1981) Reduction of α-particle sensitivity in dynamic semiconductor memories (16k d-RAMs) by neutron irradiation. *IEEE Transactions on Nuclear Science*, **28** (6), 3987–3993.
- [24] Ibe, E. (2001) Current and future trend on cosmic-ray-neutron induced single event upset at the ground down to 0.1-micron-device. The Svedberg Laboratory Workshop on Applied Physics, Uppsala, Sweden, May, 3 (1).
- [25] Ibe, E., Yahagi, Y., Kataoka, F. *et al.* (2002) A self-consistent integrated system for terrestrial-neutron induced single event upset of semiconductor devices at the ground. 2002 International Conference on Information Technology and Application, Bathurst, Australia, 25–28 November 2002, pp. 273–221.
- [26] Ibe, E., Kameyama, H., Yahagi, Y. *et al.* (2004) Distinctive asymmetry in neutron-induced multiple error patterns of 0.13 µm process SRAM. The 6th International Workshop on Radiation Effects on Semiconductor Devices for Space Application, Tsukuba, Japan, 6–8 October 2004, pp. 19–23.
- [27] Cataldo, A. (2001) SRAM Soft Errors Cause Hard Network Problems, http://eetimes.com/electronics-news/4042377/SRAM-soft-errors-cause-hard-network-problems (accessed 18 February 2013).
- [28] JEDEC Standard JESD89. (2001) Measurement and Reporting of Alpha Particle and Terrestrial Cosmic Ray Induced Soft Errors in Semiconductor Devices. JEDEC, pp. 1–63.
- [29] Seifert, N., Gill, B., Zhang, M. *et al.* (2007) On the scalability of redundancy based SER mitigation schemes. International Conference on IC Design and Technology, Austin, Texas, 18–20 May (G2), pp. 197–205.
- [30] Shivakumar, P., Kistler, M.S., Keckler, A. *et al.* (2002) Modeling the effect of technology trends on the soft error rate of combinational logic. International Conference on Dependable Systems and Networks, pp. 389–398.
- [31] Pilotto, C., Azambuja, J.R. and Kastensmidt, F.L. (2008) Synchronizing triple modular redundant designs in dynamic partial reconfiguration applications. Proceedings of the 21st Annual Symposium on Integrated Circuits and System Design, September 2008, pp. 199–204.
- [32] Calin, T., Nicolaidis, M. and Velazco, R. (1996) Upset hardened memory design for submicron CMOS technology. *IEEE Transactions on Nuclear Science*, 43 (6), 2874–2878.
- [33] Quinn, H., Morgan, K., Graham, P. *et al.* (2007) Domain crossing events: limitations on single device triple-modular redundancy circuits in Xilinx FPGAs. International Nuclear and Space Radiation Effects Conference, Honolulu, Hawaii, July 23–27 (C-5).
- [34] Seifert, N. and Zia, V. (2007) Assessing the impact of scaling on the efficacy of spatial redundancy based mitigation schemes for terrestrial applications. IEEE Workshop on Silicon Errors in Logic – System Effects 3, Austin, TX, April 3, 4.
- [35] Quinn, H., Tripp, J., Fairbanks, T. and Manuzzato, A. (2011) Improving microprocessor reliability through software mitigation. IEEE Workshop on Silicon Errors in Logic – System Effects, Champaign, Illinois, 29–30 March, pp. 16–21.

- [36] Uemura, T., Tosaka, Y., Matsuyama, H.K. and Shono, K. (2010) SEILA: soft error immune latch for mitigating multi-node-SEU and local-clock-SET. IEEE International Reliability Physics Symposium 2010, Anaheim, California, 2–6 May, pp. 218–223.
- [37] Lee, H.-H., Lilja, K. and Mitra, S. (2010) Design of a sequential logic cell using LEAP: layout design through error aware placement. IEEE Workshop on System Effects of Logic Soft Errors, Stanford University, 23 March.
- [38] Baumann, R.C. and Smith, E.B. (2000) Neutron-induced boron fission as a major source of soft errors in deep submicron SRAM devices. IEEE International Reliability Physics Symposium Proceedings, San Jose, CA, 10–13 April, pp. 152–157.
- [39] JEDEC Standard JESD89A. (2006) Measurement and Reporting of Alpha Particle and Terrestrial Cosmic Ray Induced Soft Errors in Semiconductor Devices. JEDEC, pp. 1–93.
- [40] IEC IEC60749-38. (2008) Part 38: Soft Error Test Method for Semiconductor Devices With Memory. Semiconductor Devices. Mechanical and Climatic Test Methods, pp. 1–9.
- [41] Pecchia, A., Cotroneo, D., Kalbarczyk, Z. and Iyer, R.K. (2011) Improving log-based field failure data analysis of multi-node computing systems. International Conference on Dependable Systems and Networks, Hong Kong, China, 28–30 June, pp. 97–108.
- [42] Bronevetsky, G. and deSupinski, B. (2007) Soft error vulnerability of iterative linear algebra methods. IEEE Workshop on Silicon Errors in Logic – System Effects 3, Austin, TX, April 3, 4.
- [43] Abella, J., Cazorla, F.J., Gizopoulos, D. *et al.* (2011) Towards improved survivability in safety-critical systems. 17th IEEE International On-Line Testing Symposium, Athens, Greece, 13–15 July (S3), pp. 242–247.
- [44] Baumeister, D. and Anderson, S.G.H. (2012) Evaluation of chip-level irradiation effects in a 32-bit safety microcontroller for automotive braking applications. IEEE Workshop on Silicon Errors in Logic – System Effects, Champaign-Urbana, Illinois, 27–28 March (2.2).
- [45] Automotive Electronics Council (2007) Failure Mechanism Based Stress Test Quantification for Integrated Circuits, AEC-Q100 Rev.G, pp. 1–35.
- [46] ISO ISO26262. (2011) *Road Vehicles-Functional Safety*, International Organization for Standardization.
- [47] Quinn, H. (2011) Study on cross layer reliability. IEEE Workshop on Silicon Errors in Logic – System Effects, Champaign, Illinois, 29–30 March.
- [48] Carter, N. (2010) Cross-layer reliability. IEEE Workshop on System Effects of Logic Soft Errors, Stanford University, 23 March.
- [49] Slayman, C. (2003) Eliminating the threat of soft errors a system vendor perspective. IRPS SER Panel Discussion, Eliminating the Threat of Soft Error, Dallas, TX, 2 April 2003, (6).
- [50] Ibe, E., Kameyama, H., Yahagi, Y. and Yamaguchi, H. (2005) Single event effects as a reliability issue of IT infrastructure. 3rd International Conference on Information Technology and Applications, Sydney, Australia, I, 3–7 July 2005, pp. 555–560.
- [51] Ibe, E., Shimbo, K., Toba, T. *et al.* (2011) LABIR: inter-LAyer built-in reliability for electronic components and systems. Silicon Errors in Logic – System Effects, Champaign, IL, 27 March.

- [52] Ibe, E., Shimbo, K., Taniguchi, H. *et al.* (2011) Quantification and mitigation strategies of neutron induced soft-errors in CMOS devices and components-the past and future. IEEE International Reliability Physics Symposium, Monterey, CA, 12–14 April (3C2).
- [53] Evans, A., Nicolaidis, M., Wen, S.-J. *et al.* (2012) RIIF: reliability information interchange format. IEEE International On-Line Testing Symposium, Sitges, Spain, 27–29 June 2012 (6.2).
- [54] Kobayashi, H., Kawamoto, N., Kase, J. and Shiraishi, K. (2009) Alpha particle and neutron-induced soft error rates and scaling trends in SRAM. IEEE International Reliability Physics Symposium 2009, Montreal, Quebec, 28–30 April (2H4), pp. 206–211.
- [55] Sierawski, B.D., Mendenhall, M.H., Reed, R.A. *et al.* (2010) Muon-induced single event upsets in deep-submicron technology. *IEEE Transactions on Nuclear Science*, 57 (6), 3273–3278.
- [56] Wen, S., Wong, R., Romain, M. and Tam, N. (2010) Thermal neutron soft error rate for SRAMs in the 90nm-45nm technology range. 2010 IEEE International Reliability Physics Symposium, Anaheim, CA, 2–6 May (SE5.1), pp. 1036–1039.
- [57] Wen, S., Pai, S.Y., Wong, R. *et al.* (2010) B10 findings and correlation to thermal neutron soft error rate sensitivity for SRAMs in the Sub-micron technology. IEEE International Integrated Reliability Workshop, Stanford Sierra, CA, 17–21 October, pp. 31–33.
- [58] Baumann, R.C. (2011) Determining the impact of alpha-particle-emitting contamination from the Fukushima Daiichi disaster on Japanese manufacturing sites. 12th European Conference on Radiation and its Effects on Component and Systems, Sevilla, Spain, 19–23 September, pp. 784–787.
- [59] Ibe, E., Toba, T., Shimbo, K. and Taniguchi, H. (2012) Fault-based reliable design-on-upper-bound of electronic systems for terrestrial radiation including muons, electrons, protons and low energy neutrons. IEEE International On-Line Testing Symposium, Sitges, Spain, 27–29 June, 2012 (3.2).