Approximate dynamic programming: solving the curses of dimensionality

Powell, Warren Buckler

€131.35 (VAT incl.)

CONTENTS: Preface. Acknowledgments.
1. The challenges of dynamic programming. 1.1 A dynamic programming example: a shortest path problem. 1.2 The three curses of dimensionality. 1.3 Some real applications. 1.4 Problem classes. 1.5 The many dialects of dynamic programming. 1.6 What is new in this book? 1.7 Pedagogy. 1.8 Bibliographic notes.
2. Some illustrative models. 2.1 Deterministic problems. 2.2 Stochastic problems. 2.3 Information acquisition problems. 2.4 A simple modeling framework for dynamic programs. 2.5 Bibliographic notes. Problems.
3. Introduction to Markov decision processes. 3.1 The optimality equations. 3.2 Finite horizon problems. 3.3 Infinite horizon problems. 3.4 Value iteration. 3.5 Policy iteration. 3.6 Hybrid value-policy iteration. 3.7 Average reward dynamic programming. 3.8 The linear programming method for dynamic programs. 3.9 Monotone policies. 3.10 Why does it work? 3.11 Bibliographic notes. Problems.
4. Introduction to approximate dynamic programming. 4.1 The three curses of dimensionality (revisited). 4.2 The basic idea. 4.3 Q-learning and SARSA. 4.4 Real-time dynamic programming. 4.5 Approximate value iteration. 4.6 The post-decision state variable. 4.7 Low-dimensional representations of value functions. 4.8 So just what is approximate dynamic programming? 4.9 Experimental issues. 4.10 But does it work? 4.11 Bibliographic notes. Problems.
5. Modeling dynamic programs. 5.1 Notational style. 5.2 Modeling time. 5.3 Modeling resources. 5.4 The states of our system. 5.5 Modeling decisions. 5.6 The exogenous information process. 5.7 The transition function. 5.8 The objective function. 5.9 A measure-theoretic view of information. 5.10 Bibliographic notes. Problems.
6. Policies. 6.1 Myopic policies. 6.2 Lookahead policies. 6.3 Policy function approximations. 6.4 Value function approximations. 6.5 Hybrid strategies. 6.6 Randomized policies. 6.7 How to choose a policy? 6.8 Bibliographic notes. Problems.
7. Policy search. 7.1 Background. 7.2 Gradient search. 7.3 Direct policy search for finite alternatives. 7.4 The knowledge gradient algorithm for discrete alternatives. 7.5 Simulation optimization. 7.6 Why does it work? 7.7 Bibliographic notes. Problems.
8. Approximating value functions. 8.1 Lookup tables and aggregation. 8.2 Parametric models. 8.3 Regression variations. 8.4 Nonparametric models. 8.5 Approximations and the curse of dimensionality. 8.6 Why does it work? 8.7 Bibliographic notes. Problems.
9. Learning value function approximations. 9.1 Sampling the value of a policy. 9.2 Stochastic approximation methods. 9.3 Recursive least squares for linear models. 9.4 Temporal difference learning with a linear model. 9.5 Bellman's equation using a linear model. 9.6 Analysis of TD(0), LSTD and LSPE using a single state. 9.7 Gradient-based. 9.8 Least squares temporal differencing with kernel regression. 9.9 Value function approximations based on Bayesian learning. 9.10 Why does it work? 9.11 Bibliographic notes. Problems.
10. Opti…
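For readers skimming the contents, the following is a minimal, illustrative sketch (not taken from the book) of the classical value iteration backup listed under Section 3.4, the exact computation that the approximate methods of Chapters 4-9 are designed to scale past the curses of dimensionality. The two-state MDP, its transition matrices, rewards, and discount factor are invented purely for demonstration.

import numpy as np

# Tiny made-up MDP: states 0..1, actions 0..1.
# P[a][s, s'] = transition probability, R[a][s] = expected one-step reward.
P = [np.array([[0.9, 0.1],
               [0.4, 0.6]]),
     np.array([[0.2, 0.8],
               [0.7, 0.3]])]
R = [np.array([1.0, 0.0]),
     np.array([0.5, 2.0])]
gamma = 0.95  # discount factor

V = np.zeros(2)
for _ in range(1000):
    # Bellman backup: V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    Q = np.array([R[a] + gamma * P[a] @ V for a in range(2)])
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-8:  # stop when the backup has converged
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=0)  # greedy policy with respect to the converged values
print("V* =", V, "greedy policy:", policy)

This exhaustive sweep over all states and actions is exactly what becomes intractable for high-dimensional problems, which is the motivation for the approximation strategies the book develops.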

  • ISBN: 978-0-470-60445-8
  • Publisher: John Wiley & Sons
  • Binding: Paperback
  • Pages: 656
  • Publication date: 05/08/2011
  • No. of volumes: 1
  • Language: English