Lecture 2: Optimal Control, Trajectory Optimization, and Planning

Class > Deep Reinforcement Learning Spring 2017, 2017. 7. 25. 21:47

□ Trajectory optimization

○ Shooting method: optimize over actions only

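- it is an unconstrained optimization problem, so it is simple to set up

- there are no dynamics constraints to enforce, but the objective becomes very sensitive to the actions (the first action affects every later state, while the last action affects only the final cost term), so the problem is often poorly conditioned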

○ Collocation method: optimize over actions and states, with constraints

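- each action is optimized jointly with its state; the states are decision variables too, tied together by the dynamics constraints $x_t = f(x_{t-1}, u_{t-1})$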
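To make the contrast between the two methods concrete, here is a minimal sketch of both objectives on a hypothetical toy linear system (the matrices `A`, `B`, the cost, and all function names are illustrative assumptions, not from the lecture). Shooting takes only the actions and rolls the dynamics forward; collocation takes states and actions and exposes the dynamics violations as residuals that a constrained solver would have to drive to zero.

```python
import numpy as np

# Hypothetical toy dynamics x_{t+1} = A x_t + B u_t, used only for illustration.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])


def f(x, u):
    return A @ x + B @ u


def cost(x, u):
    # Hypothetical quadratic cost: squared distance from the origin plus control effort.
    return float(x @ x + 0.1 * (u @ u))


def shooting_objective(x1, us):
    """Shooting: only the actions are variables; the states come from rolling out f."""
    x, total = x1, 0.0
    for u in us:
        total += cost(x, u)
        x = f(x, u)
    return total


def collocation_objective(xs, us):
    """Collocation: states and actions are both variables; f is not called here."""
    return sum(cost(x, u) for x, u in zip(xs, us))


def collocation_residuals(x1, xs, us):
    """Equality constraints for the solver: x_1 is fixed and x_{t+1} - f(x_t, u_t) = 0."""
    res = [xs[0] - x1]
    res += [xs[t + 1] - f(xs[t], us[t]) for t in range(len(us) - 1)]
    return np.concatenate(res)
```

When the states are exactly the rollout of the actions, the residuals vanish and the two objectives agree; the difference is only in what the optimizer is allowed to change directly.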

○ Linear case : LQR

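- linear dynamics: $f(x_t, u_t) = F_t \begin{bmatrix} x_t \\ u_t \end{bmatrix} + f_t$

- quadratic cost: $c(x_t, u_t) = \frac{1}{2} \begin{bmatrix} x_t \\ u_t \end{bmatrix}^T C_t \begin{bmatrix} x_t \\ u_t \end{bmatrix} + \begin{bmatrix} x_t \\ u_t \end{bmatrix}^T c_t$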

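- base case: solve for $u_T$ only

* Define the Q function: the cost $c$ plus a constant (since this is the last step, nothing follows it): $Q(x_T, u_T) = \text{const} + \frac{1}{2} \begin{bmatrix} x_T \\ u_T \end{bmatrix}^T C_T \begin{bmatrix} x_T \\ u_T \end{bmatrix} + \begin{bmatrix} x_T \\ u_T \end{bmatrix}^T c_T$

* Take the partial derivative of Q with respect to $u_T$ and set it to zero: $\nabla_{u_T} Q(x_T, u_T) = C_{u_T,x_T} x_T + C_{u_T,u_T} u_T + c_{u_T} = 0$, which gives $u_T = K_T x_T + k_T$ with $K_T = -C_{u_T,u_T}^{-1} C_{u_T,x_T}$ and $k_T = -C_{u_T,u_T}^{-1} c_{u_T}$

* Substitute this $u_T$ back in to get the value function, which is again quadratic: $V(x_T) = \text{const} + \frac{1}{2} x_T^T V_T x_T + x_T^T v_T$, where $V_T = C_{x_T,x_T} + C_{x_T,u_T} K_T + K_T^T C_{u_T,x_T} + K_T^T C_{u_T,u_T} K_T$ and $v_T = c_{x_T} + C_{x_T,u_T} k_T + K_T^T c_{u_T} + K_T^T C_{u_T,u_T} k_T$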

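- general step: solve for $u_t$ (starting from $t = T-1$)

* Define the Q function: cost + constant + the V function of the next state: $Q(x_t, u_t) = \text{const} + c(x_t, u_t) + V(x_{t+1})$

* Since $x_{t+1} = f(x_t, u_t) = F_t \begin{bmatrix} x_t \\ u_t \end{bmatrix} + f_t$, substitute this expression for $x_{t+1}$ in $V(x_{t+1})$

* Collect Q into quadratic form, $Q(x_t, u_t) = \text{const} + \frac{1}{2} \begin{bmatrix} x_t \\ u_t \end{bmatrix}^T Q_t \begin{bmatrix} x_t \\ u_t \end{bmatrix} + \begin{bmatrix} x_t \\ u_t \end{bmatrix}^T q_t$ with $Q_t = C_t + F_t^T V_{t+1} F_t$ and $q_t = c_t + F_t^T V_{t+1} f_t + F_t^T v_{t+1}$, then differentiate with respect to $u_t$: $u_t = K_t x_t + k_t$ with $K_t = -Q_{u_t,u_t}^{-1} Q_{u_t,x_t}$ and $k_t = -Q_{u_t,u_t}^{-1} q_{u_t}$ ($V_t$ and $v_t$ follow as in the base case, with $Q_t$, $q_t$ in place of $C_t$, $c_t$)

* Repeat this backward recursion down to $t = 1$; then, starting from the known $x_1$, run forward: $u_t = K_t x_t + k_t$, $x_{t+1} = f(x_t, u_t)$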
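The backward recursion and forward pass above can be sketched in NumPy as follows. This is a minimal sketch under a few assumptions: time-varying matrices are stored in Python lists, the state (dimension `n`) comes before the action in $z_t = [x_t; u_t]$, each $C_t$ is symmetric, and $Q_{u_t,u_t}$ is invertible. The function names are hypothetical. Initializing $V_{T+1} = 0$ makes the first loop iteration exactly the base case, since then $Q_T = C_T$ and $q_T = c_T$.

```python
import numpy as np


def lqr_backward(F, f, C, c, n):
    """Backward recursion for time-varying LQR with z_t = [x_t; u_t].

    Dynamics x_{t+1} = F[t] @ z_t + f[t]; cost 0.5 * z_t^T C[t] z_t + z_t^T c[t].
    n is the state dimension, so z[:n] is x_t and z[n:] is u_t.
    Returns lists K, k with u_t = K[t] @ x_t + k[t].
    """
    T = len(C)
    V, v = np.zeros((n, n)), np.zeros(n)  # V_{T+1} = 0, so the first pass is the base case
    K, k = [None] * T, [None] * T
    for t in reversed(range(T)):
        # Q_t = C_t + F_t^T V_{t+1} F_t,  q_t = c_t + F_t^T V_{t+1} f_t + F_t^T v_{t+1}
        Q = C[t] + F[t].T @ V @ F[t]
        q = c[t] + F[t].T @ V @ f[t] + F[t].T @ v
        Qxx, Qxu, Qux, Quu = Q[:n, :n], Q[:n, n:], Q[n:, :n], Q[n:, n:]
        qx, qu = q[:n], q[n:]
        # Gradient of Q w.r.t. u_t set to zero: K_t = -Quu^{-1} Qux, k_t = -Quu^{-1} qu
        K[t] = -np.linalg.solve(Quu, Qux)
        k[t] = -np.linalg.solve(Quu, qu)
        # Substitute u_t back in: V(x_t) = const + 0.5 x_t^T V_t x_t + x_t^T v_t
        V = Qxx + Qxu @ K[t] + K[t].T @ Qux + K[t].T @ Quu @ K[t]
        v = qx + Qxu @ k[t] + K[t].T @ qu + K[t].T @ Quu @ k[t]
    return K, k


def lqr_forward(F, f, K, k, x1):
    """Forward pass from the known x_1: u_t = K_t x_t + k_t, x_{t+1} = f(x_t, u_t)."""
    x, us = x1, []
    for t in range(len(K)):
        u = K[t] @ x + k[t]
        us.append(u)
        x = F[t] @ np.concatenate([x, u]) + f[t]
    return us


def trajectory_cost(F, f, C, c, x1, us):
    """Total cost of any open-loop action sequence (useful for sanity checks)."""
    x, total = x1, 0.0
    for t, u in enumerate(us):
        z = np.concatenate([x, u])
        total += 0.5 * z @ C[t] @ z + z @ c[t]
        x = F[t] @ z + f[t]
    return float(total)
```

Comparing `trajectory_cost` of the returned actions against small random perturbations of them is a cheap way to check the signs in the recursion: with deterministic dynamics and a strictly convex cost in the actions, every perturbation should cost more.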

○ Stochastic dynamics

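- if the dynamics are Gaussian, $p(x_{t+1} \mid x_t, u_t) = \mathcal{N}\left(F_t \begin{bmatrix} x_t \\ u_t \end{bmatrix} + f_t, \Sigma_t\right)$, the algorithm does not change (worth working through the proof as an exercise!)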
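A sketch of why, assuming the linear-Gaussian model with a noise covariance $\Sigma_t$ that does not depend on $x_t$ or $u_t$: write $x_{t+1} = \mu_t + \epsilon_t$ and take the expectation of the quadratic value function (the cross terms vanish because $\mathbb{E}[\epsilon_t] = 0$).

```latex
\begin{aligned}
x_{t+1} &= \mu_t + \epsilon_t, \qquad \mu_t = F_t \begin{bmatrix} x_t \\ u_t \end{bmatrix} + f_t, \qquad \epsilon_t \sim \mathcal{N}(0, \Sigma_t) \\
\mathbb{E}\left[V(x_{t+1})\right]
  &= \text{const} + \mathbb{E}\left[\tfrac{1}{2}(\mu_t + \epsilon_t)^T V_{t+1} (\mu_t + \epsilon_t) + (\mu_t + \epsilon_t)^T v_{t+1}\right] \\
  &= \text{const} + \tfrac{1}{2}\mu_t^T V_{t+1} \mu_t + \mu_t^T v_{t+1} + \tfrac{1}{2}\operatorname{tr}\left(V_{t+1}\Sigma_t\right)
\end{aligned}
```

The extra term $\frac{1}{2}\operatorname{tr}(V_{t+1}\Sigma_t)$ does not depend on $u_t$, so the minimizing $K_t$ and $k_t$ are the same as in the deterministic case; the controller is simply applied to the actual (noisy) $x_t$ in the forward pass.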

○ Nonlinear case : DDP/iterative LQR

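- linearize the dynamics and take a quadratic approximation of the cost around the current trajectory $(\hat{x}_t, \hat{u}_t)$, which gives a linear-quadratic system that LQR can solve; then repeat around the new trajectory

- see the lecture slides for details

- it can be thought of as an approximation of Newton's method for solving $\min_{u_1, \dots, u_T} c(x_1, u_1) + c(f(x_1, u_1), u_2) + \dots + c(f(f(\dots)\dots), u_T)$

- to get the actual Newton method, a second-order approximation of the dynamics is also needed (this is DDP)

- in the forward pass, the actual nonlinear dynamics are applied, not the approximation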
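A compact iLQR sketch, under stated assumptions: a hypothetical pendulum model, a cost that is already quadratic (so only the dynamics need approximating), dynamics Jacobians $F_t$ computed by finite differences, and a simple backtracking line search on $k_t$ added as a design choice. It uses first-order dynamics only, so it is iLQR rather than DDP, and every forward pass rolls out the real nonlinear dynamics. All names and constants are illustrative.

```python
import numpy as np

DT = 0.1                        # hypothetical time step
C = np.diag([1.0, 0.1, 0.01])   # hypothetical cost 0.5 z^T C z on z = [angle, velocity, torque]


def dynamics(x, u):
    """Hypothetical pendulum (explicit Euler): x = [angle, angular velocity], u = [torque]."""
    return np.array([x[0] + DT * x[1], x[1] + DT * (-9.8 * np.sin(x[0]) + u[0])])


def total_cost(xs, us):
    return sum(0.5 * np.concatenate([x, u]) @ C @ np.concatenate([x, u]) for x, u in zip(xs, us))


def rollout(x1, us):
    """Roll out the true nonlinear dynamics to get x_1..x_T."""
    xs = [x1]
    for u in us[:-1]:
        xs.append(dynamics(xs[-1], u))
    return xs


def jacobian(x, u, eps=1e-6):
    """Central finite differences of the dynamics w.r.t. z = [x; u], i.e. F_t."""
    n, z = len(x), np.concatenate([x, u])
    J = np.zeros((n, len(z)))
    for i in range(len(z)):
        d = np.zeros_like(z)
        d[i] = eps
        J[:, i] = (dynamics(*np.split(z + d, [n])) - dynamics(*np.split(z - d, [n]))) / (2 * eps)
    return J


def ilqr(x1, us, iters=50):
    n, T = len(x1), len(us)
    xs = rollout(x1, us)
    cost = total_cost(xs, us)
    for _ in range(iters):
        # Backward pass: LQR on deviations from (x_hat, u_hat). The linearized dynamics
        # have no offset (f_t = 0), and the quadratic cost gives C_t = C, c_t = C z_hat_t.
        V, v = np.zeros((n, n)), np.zeros(n)
        K, k = [None] * T, [None] * T
        for t in reversed(range(T)):
            F = jacobian(xs[t], us[t])
            Q = C + F.T @ V @ F
            q = C @ np.concatenate([xs[t], us[t]]) + F.T @ v
            Qxu, Qux, Quu, qu = Q[:n, n:], Q[n:, :n], Q[n:, n:], q[n:]
            K[t] = -np.linalg.solve(Quu, Qux)
            k[t] = -np.linalg.solve(Quu, qu)
            V = Q[:n, :n] + Qxu @ K[t] + K[t].T @ Qux + K[t].T @ Quu @ K[t]
            v = q[:n] + Qxu @ k[t] + K[t].T @ qu + K[t].T @ Quu @ k[t]
        # Forward pass with the real dynamics; shrink the open-loop term until the cost drops.
        for alpha in 0.5 ** np.arange(10):
            new_xs, new_us = [x1], []
            for t in range(T):
                u = us[t] + K[t] @ (new_xs[t] - xs[t]) + alpha * k[t]
                new_us.append(u)
                if t + 1 < T:
                    new_xs.append(dynamics(new_xs[t], u))
            new_cost = total_cost(new_xs, new_us)
            if new_cost < cost:
                xs, us, cost = new_xs, new_us, new_cost
                break
        else:
            break  # no step size reduced the cost: stop (treated as converged)
    return xs, us, cost
```

The line search only accepts steps that lower the true cost, so the returned cost never exceeds that of the initial action sequence.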

○ Additional reading

○ Discrete case : MCTS

- intuition : choose nodes with best reward, but also prefer rarely visited nodes

- build the action-state tree outward from the start state

- at a leaf node, run some default policy (e.g., a random one) to estimate its value

- Additional reading

* Browne et al. (2012). A Survey of Monte Carlo Tree Search Methods.

- case study : imitation learning from MCTS

* Guo et al. (2014). Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning. NIPS 2014.

* in DAgger, MCTS can be used in place of the expert policy to label the states visited by the learned policy
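The MCTS loop described above (prefer high-return nodes but add a bonus for rarely visited ones, grow the tree from the start state, estimate leaf values with a random default policy) can be sketched as a small UCT-style planner. This is a minimal sketch assuming a deterministic, hypothetical `step(state, action) -> (next_state, reward)` function and the common UCB1-style bonus with an arbitrary constant `c`; it is not the exact variant from the lecture or the cited papers. In the DAgger case study, a planner like this would play the role of the expert that labels states.

```python
import math
import random


class Node:
    def __init__(self, state, depth, reward=0.0):
        self.state, self.depth, self.reward = state, depth, reward  # reward earned entering this node
        self.children = {}                                          # action -> Node
        self.visits, self.value_sum = 0, 0.0


def uct_score(parent, child, c=1.4):
    """Average return (exploit) plus a bonus that favors rarely visited children (explore)."""
    return child.value_sum / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)


def mcts(root_state, actions, step, horizon, n_iters=500, seed=0):
    """UCT-style MCTS for a deterministic step(state, action) -> (next_state, reward)."""
    rng = random.Random(seed)
    root = Node(root_state, 0)
    for _ in range(n_iters):
        node, path, ret = root, [root], 0.0
        # Selection + expansion: follow UCT until a node has an untried action, then add one child.
        while node.depth < horizon:
            untried = [a for a in actions if a not in node.children]
            if untried:
                a = rng.choice(untried)
                s, r = step(node.state, a)
                node.children[a] = Node(s, node.depth + 1, r)
            else:
                a = max(actions, key=lambda b: uct_score(node, node.children[b]))
            node = node.children[a]
            ret += node.reward
            path.append(node)
            if untried:
                break
        # Rollout: from the leaf, run a random default policy to estimate its value.
        s, d = node.state, node.depth
        while d < horizon:
            s, r = step(s, rng.choice(actions))
            ret += r
            d += 1
        # Backup: credit every node on the path with the return from the root
        # (siblings share the same earlier rewards, so comparisons between them are unaffected).
        for visited in path:
            visited.visits += 1
            visited.value_sum += ret
    # Recommend the most-visited action at the root.
    return max(root.children, key=lambda a: root.children[a].visits)
```

For example, on a hypothetical chain of states 0 to 5 where each move is rewarded by the new position divided by 5, `mcts(0, [-1, 1], step, horizon=8)` is expected to recommend moving right (`1`).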

○ What's wrong with known dynamics?

- in many cases the true dynamics are hard to know

Posted by GOnNO