Reference for Reinforcement Learning


RL for game playing

Newest (from the last two years):
  1. Heinrich, Johannes, and David Silver. “Deep Reinforcement Learning from Self-Play in Imperfect-Information Games” (2016).
  2. Finn, Chelsea, Tianhe Yu, Justin Fu, Pieter Abbeel, and Sergey Levine. “Generalizing Skills with Semi-Supervised Reinforcement Learning.” arXiv (2016)
  1. Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. “Playing Atari with Deep Reinforcement Learning” (2013).
  2. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A., Veness, J., Bellemare, M., Graves, A., Riedmiller, M., Fidjeland, A., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and Hassabis, D. “Human-level control through deep reinforcement learning.” Nature 518, 529–533 (2015).
  3. Nair, Arun, Praveen Srinivasan, Sam Blackwell, Cagdas Alcicek, Rory Fearon, Alessandro De Maria, Vedavyas Panneershelvam, et al. “Massively Parallel Methods for Deep Reinforcement Learning.” arXiv (2015).

Autonomous Driving

  1. Fridman, Lex, and Bryan Reimer. “Semi-Automated Annotation of Discrete States in Large Video Datasets.” arXiv (2016).

Software Frameworks

  1. Neubig, Graham, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, et al. “DyNet: The Dynamic Neural Network Toolkit” (2017).


  1. Sze, Vivienne, Yu-Hsin Chen, Joel Emer, Amr Suleiman, and Zhengdong Zhang. “Hardware for Machine Learning: Challenges and Opportunities.” arXiv (2016).


  1. Dr. John Schulman

GitHub Projects

  1. gym

Recommended open courses

Reinforcement Learning

  1. [Berkeley] CS 294: Deep Reinforcement Learning, Spring 2017
  2. [MIT] 6.S094: Deep Learning for Self-Driving Cars, Spring 2017
  3. [Stanford] CS231n: Convolutional Neural Networks for Visual Recognition, Winter 2016.


  1. [Udacity] Siraj Raval’s Deep Learning Nanodegree Foundation Program
  2. [Udacity] Self-Driving Car Engineer Nanodegree


  1. [YouTube] Bay Area Deep Learning School
  2. McGill Artificial Intelligence Society

[book note] Manning MEAP: Streaming Data – Message Queueing Tier

This chapter covers

  • Why we need a message queuing tier
  • Understanding message durability
  • How to accommodate offline consumers
  • What are message delivery semantics
  • Choosing the right technology

3.1 Why we need a message queuing tier

With a streaming system we want the same: to decouple the components in each tier and, more importantly, to decouple the tiers from each other. If you look up interprocess communication in the literature, you will find various models; in this chapter we focus on the message-queuing model. By adopting this model, our collection tier is decoupled from our analytics tier.

  • The decoupling allows our tiers to work at a higher level of abstraction: they pass messages instead of making explicit calls to the next layer.
  • These are two very good properties to have in any system, let alone a distributed streaming one, and as we will see in this and the coming chapters, decoupling the tiers provides some wonderful benefits.
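
To make the decoupling concrete, here is a minimal in-memory sketch. It is only an illustration under simplifying assumptions: `MessageQueue`, `collect` and `analyze` are hypothetical names, not anything from the book, and a real message queuing tier would be a separate, durable service rather than an array.

```javascript
// Hypothetical in-memory stand-in for a message queuing tier.
class MessageQueue {
  constructor() {
    this.messages = [];
  }

  // Called by the producer (the collection tier).
  publish(message) {
    this.messages.push(message);
  }

  // Called by the consumer (the analytics tier); returns undefined when empty.
  consume() {
    return this.messages.shift();
  }
}

// The collection tier only knows about the queue, not about who reads from it.
function collect(queue, event) {
  queue.publish({ receivedAt: Date.now(), payload: event });
}

// The analytics tier drains whatever is waiting, at its own pace.
function analyze(queue) {
  const payloads = [];
  let message;
  while ((message = queue.consume()) !== undefined) {
    payloads.push(message.payload);
  }
  return payloads;
}

const tierQueue = new MessageQueue();
collect(tierQueue, 'click');
collect(tierQueue, 'scroll');
const drained = analyze(tierQueue);
```

Neither function calls the other; both only touch the queue, so either side could be replaced without the other noticing.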


[book note] Manning MEAP: JavaScript Next – Unit 2: Objects & Arrays

Unit 2 Objects & Arrays

Lesson 5: New Array Methods

Arrays are probably the most common data structure used in JavaScript. We use them to hold all kinds of data, but sometimes getting the data we want into or out of the array isn’t as easy as it should be. However, those tasks just got a whole lot easier with some of the new array methods. In this lesson we will cover the following:

  • Constructing Arrays with Array.from
  • Constructing Arrays with Array.of
  • Constructing Arrays with Array.prototype.fill
  • Searching in Arrays with Array.prototype.includes
  • Searching in Arrays with Array.prototype.find
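
As a quick orientation, the five methods in this list can be sketched roughly like this. The variable names are only for illustration:

```javascript
// Array.from builds a real array from an array-like or iterable value.
const fromString = Array.from('abc');
const fromArrayLike = Array.from({ length: 3 }, (_, i) => i * 2);

// Array.of builds an array from its arguments, even when given a single number
// (Array(3), by contrast, creates an array with three empty slots).
const single = Array.of(3);

// fill overwrites every slot with the same value.
const zeros = new Array(4).fill(0);

// includes answers "is this value present?"
const hasTwo = [1, 2, 3].includes(2);

// find returns the first element matching the predicate, or undefined.
const firstEven = [1, 3, 4, 6].find((n) => n % 2 === 0);
```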

Priming Exercise: Consider this snippet of jQuery code that grabs all the DOM nodes with a specific CSS class and sets them to be red. If you were going to implement this from scratch, what considerations would you have to make? For example, if you were to use document.querySelectorAll, which returns a NodeList (not an Array), how would you iterate over each node to update its color?

$('.danger').css('color', 'red')
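
One possible answer, sketched with Array.from: convert the array-like NodeList into a real array, then iterate over it. `setColor` is a hypothetical helper name, and the sketch assumes each node exposes a `style` object the way DOM elements do.

```javascript
// Sets the text color on every node in an array-like collection such as a NodeList.
function setColor(nodes, color) {
  Array.from(nodes).forEach((node) => {
    node.style.color = color;
  });
}

// In a browser this would be called as:
// setColor(document.querySelectorAll('.danger'), 'red');
```

Taking the collection as a parameter keeps the helper usable with anything array-like, not just the result of querySelectorAll.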


[book note] Manning MEAP: JavaScript Next – Unit 1: Variables & Strings

Unit 1: Variables & Strings

Lesson 1. Declaring Variables with let

In the history of JavaScript, variables have always been declared using the keyword var. ES6 introduces two new ways to declare variables, with the let and const keywords. Both of these work slightly differently from variables declared with var. There are two primary differences with let:

let variables have different scoping rules

let variables behave differently when hoisted

Priming Exercise:

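for (var i = 0; i < 5; i++) {
  setTimeout(function () {
    console.log(i); // assumed body; the callback is empty in this excerpt
  }, 1);
}

for (let n = 0; n < 5; n++) {
  setTimeout(function () {
    console.log(n); // assumed body; the callback is empty in this excerpt
  }, 1);
}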

1.1 How Scope Works with let

if (true) {
  let foo = 'bar';
}

console.log(foo); // An error is thrown because foo does not exist outside the block it was declared in.

This makes variables much more predictable and prevents bugs caused by a variable leaking outside the block it is used in. A block is the body of a statement or function. It is the area between the opening and closing curly braces, { and }. You can even use curly braces to create a free-standing block that isn’t tied to a statement.
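
A minimal sketch of a free-standing block, with hypothetical names (`freeStandingBlock`, `scoped`) chosen only for illustration:

```javascript
function freeStandingBlock() {
  const seen = [];
  {
    // This block is not attached to any statement.
    let scoped = 'inside';
    seen.push(scoped);
  }
  // typeof is safe to use on an undeclared name; it reports 'undefined'.
  seen.push(typeof scoped);
  return seen;
}
```

Calling `freeStandingBlock()` returns `['inside', 'undefined']`: the variable is usable inside the braces and gone once they close.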

There is one exception to that rule, though: in a for loop, a variable declared with let inside the for loop’s clause will be in the scope of the for loop’s block:
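
A small sketch of the for loop exception, using a hypothetical `loopScope` function so the result can be inspected:

```javascript
function loopScope() {
  const callbacks = [];
  for (let i = 0; i < 3; i++) {
    // i is visible here even though it was declared in the loop's clause.
    callbacks.push(() => i);
  }
  return {
    // Each iteration gets its own i, so the closures remember 0, 1 and 2.
    captured: callbacks.map((fn) => fn()),
    // Outside the loop, i no longer exists.
    visibleOutside: typeof i !== 'undefined',
  };
}
```

Here `loopScope().captured` is `[0, 1, 2]` and `visibleOutside` is `false`. With var in place of let, every closure would see the same final value of i.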


How to do a literature search

Q1: What is a literature search?

A: A literature search is a systematic and thorough search of all types of published literature in order to identify a breadth of good quality references relevant to a specific topic.

Q2: Why do we do a literature search?

A: The success of your research project is dependent on a thorough review of the academic literature at the outset. It is therefore a fundamental element of the methodology of any research project. Effective literature searching is a critical skill in its own right and will prove valuable for any future information-gathering activity, whether in academia or not. Getting the literature search right will save hours of time through the course of your research project and will inform and improve the quality of the research you go on to do for yourself.


334. Increasing Triplet Subsequence

Difficulty: Medium

Given an unsorted array, return whether an increasing subsequence of length 3 exists in the array.

Formally the function should:

Return true if there exist i, j, k such that arr[i] < arr[j] < arr[k] given 0 ≤ i < j < k ≤ n-1; otherwise return false.

Your algorithm should run in O(n) time complexity and O(1) space complexity.


Given [1, 2, 3, 4, 5],

return true.

Given [5, 4, 3, 2, 1],

return false.


Approach: two cursors (min + second min)
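
The idea can be sketched in JavaScript as follows. This is one way to realize the two-cursor approach in O(n) time and O(1) space, not necessarily the same code as the C++ version; `increasingTriplet`, `first` and `second` are names chosen for this sketch.

```javascript
function increasingTriplet(nums) {
  let first = Infinity;  // smallest value seen so far
  let second = Infinity; // smallest value that has a smaller value somewhere before it
  for (const n of nums) {
    if (n <= first) {
      first = n;
    } else if (n <= second) {
      second = n;
    } else {
      // n is larger than second, and second has a smaller value before it.
      return true;
    }
  }
  return false;
}
```

Using `<=` rather than `<` keeps equal values from counting as increasing, so an input like [1, 1, 1] correctly yields false.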

C++ version