Call Number | Q325.6 .S888 2018
Edition | 2nd ed.
Physical Description | 1 online resource (591 pages)
Series Title | Adaptive Computation and Machine Learning Ser.
Language | English
Contents | Intro -- Series Page -- Title Page -- Copyright -- Dedication -- Table of Contents -- Preface to the Second Edition -- Preface to the First Edition -- Summary of Notation -- 1. Introduction -- 1.1. Reinforcement Learning -- 1.2. Examples -- 1.3. Elements of Reinforcement Learning -- 1.4. Limitations and Scope -- 1.5. An Extended Example: Tic-Tac-Toe -- 1.6. Summary -- 1.7. Early History of Reinforcement Learning -- I: Tabular Solution Methods -- 2. Multi-armed Bandits -- 2.1. A k-armed Bandit Problem -- 2.2. Action-value Methods -- 2.3. The 10-armed Testbed -- 2.4. Incremental Implementation -- 2.5. Tracking a Nonstationary Problem -- 2.6. Optimistic Initial Values -- 2.7. Upper-Confidence-Bound Action Selection -- 2.8. Gradient Bandit Algorithms -- 2.9. Associative Search (Contextual Bandits) -- 2.10. Summary -- 3. Finite Markov Decision Processes -- 3.1. The Agent-Environment Interface -- 3.2. Goals and Rewards -- 3.3. Returns and Episodes -- 3.4. Unified Notation for Episodic and Continuing Tasks -- 3.5. Policies and Value Functions -- 3.6. Optimal Policies and Optimal Value Functions -- 3.7. Optimality and Approximation -- 3.8. Summary -- 4. Dynamic Programming -- 4.1. Policy Evaluation (Prediction) -- 4.2. Policy Improvement -- 4.3. Policy Iteration -- 4.4. Value Iteration -- 4.5. Asynchronous Dynamic Programming -- 4.6. Generalized Policy Iteration -- 4.7. Efficiency of Dynamic Programming -- 4.8. Summary -- 5. Monte Carlo Methods -- 5.1. Monte Carlo Prediction -- 5.2. Monte Carlo Estimation of Action Values -- 5.3. Monte Carlo Control -- 5.4. Monte Carlo Control without Exploring Starts -- 5.5. Off-policy Prediction via Importance Sampling -- 5.6. Incremental Implementation -- 5.7. Off-policy Monte Carlo Control -- 5.8. *Discounting-aware Importance Sampling -- 5.9. *Per-decision Importance Sampling -- 5.10. Summary.
6. Temporal-Difference Learning -- 6.1. TD Prediction -- 6.2. Advantages of TD Prediction Methods -- 6.3. Optimality of TD(0) -- 6.4. Sarsa: On-policy TD Control -- 6.5. Q-learning: Off-policy TD Control -- 6.6. Expected Sarsa -- 6.7. Maximization Bias and Double Learning -- 6.8. Games, Afterstates, and Other Special Cases -- 6.9. Summary -- 7. n-step Bootstrapping -- 7.1. n-step TD Prediction -- 7.2. n-step Sarsa -- 7.3. n-step Off-policy Learning -- 7.4. *Per-decision Methods with Control Variates -- 7.5. Off-policy Learning Without Importance Sampling: The n-step Tree Backup Algorithm -- 7.6. *A Unifying Algorithm: n-step Q(σ) -- 7.7. Summary -- 8. Planning and Learning with Tabular Methods -- 8.1. Models and Planning -- 8.2. Dyna: Integrated Planning, Acting, and Learning -- 8.3. When the Model Is Wrong -- 8.4. Prioritized Sweeping -- 8.5. Expected vs. Sample Updates -- 8.6. Trajectory Sampling -- 8.7. Real-time Dynamic Programming -- 8.8. Planning at Decision Time -- 8.9. Heuristic Search -- 8.10. Rollout Algorithms -- 8.11. Monte Carlo Tree Search -- 8.12. Summary of the Chapter -- 8.13. Summary of Part I: Dimensions -- II: Approximate Solution Methods -- 9. On-policy Prediction with Approximation -- 9.1. Value-function Approximation -- 9.2. The Prediction Objective (VE) -- 9.3. Stochastic-gradient and Semi-gradient Methods -- 9.4. Linear Methods -- 9.5. Feature Construction for Linear Methods -- 9.5.1. Polynomials -- 9.5.2. Fourier Basis -- 9.5.3. Coarse Coding -- 9.5.4. Tile Coding -- 9.5.5. Radial Basis Functions -- 9.6. Selecting Step-Size Parameters Manually -- 9.7. Nonlinear Function Approximation: Artificial Neural Networks -- 9.8. Least-Squares TD -- 9.9. Memory-based Function Approximation -- 9.10. Kernel-based Function Approximation -- 9.11. Looking Deeper at On-policy Learning: Interest and Emphasis -- 9.12. Summary.
10. On-policy Control with Approximation -- 10.1. Episodic Semi-gradient Control -- 10.2. Semi-gradient n-step Sarsa -- 10.3. Average Reward: A New Problem Setting for Continuing Tasks -- 10.4. Deprecating the Discounted Setting -- 10.5. Differential Semi-gradient n-step Sarsa -- 10.6. Summary -- 11. *Off-policy Methods with Approximation -- 11.1. Semi-gradient Methods -- 11.2. Examples of Off-policy Divergence -- 11.3. The Deadly Triad -- 11.4. Linear Value-function Geometry -- 11.5. Gradient Descent in the Bellman Error -- 11.6. The Bellman Error is Not Learnable -- 11.7. Gradient-TD Methods -- 11.8. Emphatic-TD Methods -- 11.9. Reducing Variance -- 11.10. Summary -- 12. Eligibility Traces -- 12.1. The λ-return -- 12.2. TD(λ) -- 12.3. n-step Truncated λ-return Methods -- 12.4. Redoing Updates: Online λ-return Algorithm -- 12.5. True Online TD(λ) -- 12.6. *Dutch Traces in Monte Carlo Learning -- 12.7. Sarsa(λ) -- 12.8. Variable λ and γ -- 12.9. Off-policy Traces with Control Variates -- 12.10. Watkins's Q(λ) to Tree-Backup(λ) -- 12.11. Stable Off-policy Methods with Traces -- 12.12. Implementation Issues -- 12.13. Conclusions -- 13. Policy Gradient Methods -- 13.1. Policy Approximation and its Advantages -- 13.2. The Policy Gradient Theorem -- 13.3. REINFORCE: Monte Carlo Policy Gradient -- 13.4. REINFORCE with Baseline -- 13.5. Actor-Critic Methods -- 13.6. Policy Gradient for Continuing Problems -- 13.7. Policy Parameterization for Continuous Actions -- 13.8. Summary -- III: Looking Deeper -- 14. Psychology -- 14.1. Prediction and Control -- 14.2. Classical Conditioning -- 14.2.1. Blocking and Higher-order Conditioning -- 14.2.2. The Rescorla-Wagner Model -- 14.2.3. The TD Model -- 14.2.4. TD Model Simulations -- 14.3. Instrumental Conditioning -- 14.4. Delayed Reinforcement -- 14.5. Cognitive Maps -- 14.6. Habitual and Goal-directed Behavior.
14.7. Summary -- 15. Neuroscience -- 15.1. Neuroscience Basics -- 15.2. Reward Signals, Reinforcement Signals, Values, and Prediction Errors -- 15.3. The Reward Prediction Error Hypothesis -- 15.4. Dopamine -- 15.5. Experimental Support for the Reward Prediction Error Hypothesis -- 15.6. TD Error/Dopamine Correspondence -- 15.7. Neural Actor-Critic -- 15.8. Actor and Critic Learning Rules -- 15.9. Hedonistic Neurons -- 15.10. Collective Reinforcement Learning -- 15.11. Model-based Methods in the Brain -- 15.12. Addiction -- 15.13. Summary -- 16. Applications and Case Studies -- 16.1. TD-Gammon -- 16.2. Samuel's Checkers Player -- 16.3. Watson's Daily-Double Wagering -- 16.4. Optimizing Memory Control -- 16.5. Human-level Video Game Play -- 16.6. Mastering the Game of Go -- 16.6.1. AlphaGo -- 16.6.2. AlphaGo Zero -- 16.7. Personalized Web Services -- 16.8. Thermal Soaring -- 17. Frontiers -- 17.1. General Value Functions and Auxiliary Tasks -- 17.2. Temporal Abstraction via Options -- 17.3. Observations and State -- 17.4. Designing Reward Signals -- 17.5. Remaining Issues -- 17.6. Reinforcement Learning and the Future of Artificial Intelligence -- References -- Index.
Subject | Reinforcement learning.
Supplement/Special Issue Entry | Print version: Sutton, Richard S. Reinforcement Learning, Second Edition. Cambridge : MIT Press, c2018. 9780262039246
ISBN | 9780262352703, 9780262039246