Designing High Availability Systems

Taylor, Zachary
Ranganathan, Subramanyam

104.21 € (VAT included)

A practical, step-by-step guide to designing world-class, high availability systems using both classical and DFSS (Design for Six Sigma) reliability techniques.

Whether designing telecom, aerospace, automotive, medical, financial, or public safety systems, every engineer aims for the utmost reliability and availability in the systems he or she designs. But between the dream of world-class performance and reality falls the shadow of complexities that can bedevil even the most rigorous design process. While there is an array of robust predictive engineering tools, there has been no single-source guide to understanding and using them ... until now.

Offering a case-based approach to designing, predicting, and deploying world-class high-availability systems from the ground up, this book brings together the best classical and DFSS reliability techniques. Although it focuses on technical aspects, this guide also considers the business and market constraints that require systems to be designed right the first time.

Written in plain English and following a step-by-step cookbook format, Designing High Availability Systems:

  • Shows how to integrate an array of design/analysis tools, including Six Sigma, Failure Analysis, and Reliability Analysis
  • Features many real-life examples and case studies describing predictive design methods, tradeoffs, risk priorities, what-if scenarios, and more
  • Delivers numerous high-impact takeaways that you can apply to your current projects immediately
  • Provides access to MATLAB programs for simulating the problem sets presented, along with PowerPoint slides to assist in outlining the problem-solving process

Designing High Availability Systems is an indispensable working resource for system engineers, software/hardware architects, and project teams working in all industries.

Contents:

Preface
List of Abbreviations

1. Introduction

2. Initial Considerations for Reliability Design
  2.1 The Challenge
  2.2 Initial Data Collection
  2.3 Where Do We Get MTBF Information?
  2.4 MTTR and Identifying Failures
  2.5 Summary

3. A Game of Dice: An Introduction to Probability
  3.1 Introduction
  3.2 A Game of Dice
  3.3 Mutually Exclusive and Independent Events
  3.4 Dice Paradox Problem and Conditional Probability
  3.5 Flip a Coin
  3.6 Dice Paradox Revisited
  3.7 Probabilities for Multiple Dice Throws
  3.8 Conditional Probability Revisited
  3.9 Summary

4. Discrete Random Variables
  4.1 Introduction
  4.2 Random Variables
  4.3 Discrete Probability Distributions
  4.4 Bernoulli Distribution
  4.5 Geometric Distribution
  4.6 Binomial Coefficients
  4.7 Binomial Distribution
  4.8 Poisson Distribution
  4.9 Negative Binomial Random Variable
  4.10 Summary

5. Continuous Random Variables
  5.1 Introduction
  5.2 Uniform Random Variables
  5.3 Exponential Random Variables
  5.4 Weibull Random Variables
  5.5 Gamma Random Variables
  5.6 Chi-Square Random Variables
  5.7 Normal Random Variables
  5.8 Relationship between Random Variables
  5.9 Summary

6. Random Processes
  6.1 Introduction
  6.2 Markov Process
  6.3 Poisson Process
  6.4 Deriving the Poisson Distribution
  6.5 Poisson Interarrival Times
  6.6 Summary

7. Modeling and Reliability Basics
  7.1 Introduction
  7.2 Modeling
  7.3 Failure Probability and Failure Density
  7.4 Unreliability, F(t)
  7.5 Reliability, R(t)
  7.6 MTTF
  7.7 MTBF
  7.8 Repairable System
  7.9 Nonrepairable System
  7.10 MTTR
  7.11 Failure Rate
  7.12 Maintainability
  7.13 Operability
  7.14 Availability
  7.15 Unavailability
  7.16 Five 9s Availability
  7.17 Downtime
  7.18 Constant Failure Rate Model
  7.19 Conditional Failure Rate
  7.20 Bayes’s Theorem
  7.21 Reliability Block Diagrams
  7.22 Summary

8. Discrete-Time Markov Analysis
  8.1 Introduction
  8.2 Markov Process Defined
  8.3 Dynamic Modeling
  8.4 Discrete Time Markov Chains
  8.5 Absorbing Markov Chains
  8.6 Nonrepairable Reliability Models
  8.7 Summary

9. Continuous-Time Markov Systems
  9.1 Introduction
  9.2 Continuous-Time Markov Processes
  9.3 Two-State Derivation
  9.4 Steps to Create a Markov Reliability Model
  9.5 Asymptotic Behavior (Steady-State Behavior)
  9.6 Limitations of Markov Modeling
  9.7 Markov Reward Models
  9.8 Summary

10. Markov Analysis: Nonrepairable Systems
  10.1 Introduction
  10.2 One Component, No Repair
  10.3 Nonrepairable Systems: Parallel System with No Repair
  10.4 Series System with No Repair: Two Identical Components
  10.5 Parallel System with Partial Repair: Identical Components
  10.6 Parallel System with No Repair: Nonidentical Components
  10.7 Summary

11. Markov Analysis: Repairable Systems
  11.1 Repairable Systems
  11.2 One Component with Repair
  11.3 Parallel System with Repair: Identical Component Failure and Repair Rates
  11.4 Parallel System with Repair: Different Failure and Repair Rates
  11.5 Summary

12. Analyzing Confidence Levels
  12.1 Introduction
  12.2 pdf of a Squared Normal Random Variable
  12.3 pdf of the Sum of Two Random Variables
  12.4 pdf of the Sum of Two Gamma Random Variables
  12.5 pdf of the Sum of n Gamma Random Variables
  12.6 Goodness-of-Fit Test Using Chi-Square
  12.7 Confidence Levels
  12.8 Summary

13. Estimating Reliability Parameters
  13.1 Introduction
  13.2 Bayes’ Estimation
  13.3 Example of Estimating Hardware MTBF
  13.4 Estimating Software MTBF
  13.5 Revising Initial MTBF Estimates and Tradeoffs
  13.6 Summary

14. Six Sigma Tools for Predictive Engineering
  14.1 Introduction
  14.2 Gathering Voice of Customer (VOC)
  14.3 Processing Voice of Customer
  14.4 Kano Analysis
  14.5 Analysis of Technical Risks
  14.6 Quality Function Deployment (QFD) or House of Quality
  14.7 Program Level Transparency of Critical Parameters
  14.8 Mapping DFSS Techniques to Critical Parameters
  14.9 Critical Parameter Management (CPM)
  14.10 First Principles Modeling
  14.11 Design of Experiments (DOE)
  14.12 Design Failure Modes and Effects Analysis (DFMEA)
  14.13 Fault Tree Analysis
  14.14 Pugh Matrix
  14.15 Monte Carlo Simulation
  14.16 Commercial DFSS Tools
  14.17 Mathematical Prediction of System Capability instead of “Gut Feel”
  14.18 Visualizing System Behavior Early in the Life Cycle
  14.19 Critical Parameter Scorecard
  14.20 Applying DFSS in Third-Party Intensive Programs
  14.21 Summary

15. Design Failure Modes and Effects Analysis
  15.1 Introduction
  15.2 What Is Design Failure Modes and Effects Analysis (DFMEA)?
  15.3 Definitions
  15.4 Business Case for DFMEA
  15.5 Why Conduct DFMEA?
  15.6 When to Perform DFMEA
  15.7 Applicability of DFMEA
  15.8 DFMEA Template
  15.9 DFMEA Life Cycle
  15.10 The DFMEA Team
  15.11 DFMEA Advantages and Disadvantages
  15.12 Limitations of DFMEA
  15.13 DFMEAs, FTAs, and Reliability Analysis
  15.14 Summary

16. Fault Tree Analysis
  16.1 What Is Fault Tree Analysis?
  16.2 Events
  16.3 Logic Gates
  16.4 Creating a Fault Tree
  16.5 Fault Tree Limitations
  16.6 Summary

17. Monte Carlo Simulation Models
  17.1 Introduction
  17.2 System Behavior over Mission Time
  17.3 Reliability Parameter Analysis
  17.4 A Worked Example
  17.5 Component and System Failure Times Using Monte Carlo Simulations
  17.6 Limitations of Using Nontime-Based Monte Carlo Simulations
  17.7 Summary

18. Updating Reliability Estimates: Case Study
  18.1 Introduction
  18.2 Overview of the Base Station Controller - Data Only (BSC-DO) System
  18.3 Downtime Calculation
  18.4 Calculating Availability from Field Data Only
  18.5 Assumptions Behind Using the Chi-Square Methodology
  18.6 Fault Tree Updates from Field Data
  18.7 Summary

19. Fault Management Architectures
  19.1 Introduction
  19.2 Faults, Errors, and Failures
  19.3 Fault Management Design
  19.4 Repair versus Recovery
  19.5 Design Considerations for Reliability Modeling
  19.6 Architecture Techniques to Improve Availability
  19.7 Redundancy Schemes
  19.8 Summary

20. Application of DFMEA to Real-Life Example
  20.1 Introduction
  20.2 Cage Failover Architecture Description
  20.3 Cage Failover DFMEA Example
  20.4 DFMEA Scorecard
  20.5 Lessons Learned
  20.6 Summary

21. Application of FTA to Real-Life Example
  21.1 Introduction
  21.2 Calculating Availability Using Fault Tree Analysis
  21.3 Building the Basic Events
  21.4 Building the Fault Tree
  21.5 Steps for Creating and Estimating the Availability Using FTA
  21.6 Summary

22. Complex High Availability System Analysis
  22.1 Introduction
  22.2 Markov Analysis of the Hardware Components
  22.3 Building a Fault Tree from the Hardware Markov Model
  22.4 Markov Analysis of the Software Components
  22.5 Markov Analysis of the Combined Hardware and Software Components
  22.6 Techniques for Simplifying Markov Analysis
  22.7 Summary

References
Index

  • ISBN: 978-1-118-55112-7
  • Publisher: Wiley-Blackwell
  • Binding: Hardcover
  • Pages: 480
  • Publication date: 10/12/2013
  • Number of volumes: 1
  • Language: English