# Activation Functions

The activation ops provide different types of nonlinearities for use in neural networks. These include:

• smooth nonlinearities (sigmoid, tanh, elu, softplus, and softsign)
• continuous but not everywhere differentiable functions (relu, relu6, and relu_x), and
• random regularization (dropout)

All activation ops apply componentwise, and produce a tensor of the same shape as the input tensor.

## tf.nn.relu(features, name=None)

Computes rectified linear: max(features, 0).
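A minimal plain-Python sketch of the formula. It works on a flat list of floats rather than a tensor, and the helper name `relu` is illustrative only; it mirrors the math, not TensorFlow's kernel.

```python
def relu(features):
    """Componentwise max(x, 0) over a flat list of floats."""
    return [max(x, 0.0) for x in features]

print(relu([-2.0, -0.5, 0.0, 1.5]))  # [0.0, 0.0, 0.0, 1.5]
```

With TensorFlow installed, `tf.nn.relu(tf.constant([-2.0, -0.5, 0.0, 1.5]))` should produce the same values as a tensor.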

## tf.nn.relu6(features, name=None)

Computes Rectified Linear 6: min(max(features, 0), 6).
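The same formula as a list-based sketch (hypothetical helper `relu6`), showing that the output is clipped to the interval [0, 6]:

```python
def relu6(features):
    """Componentwise min(max(x, 0), 6): a ReLU whose output is capped at 6."""
    return [min(max(x, 0.0), 6.0) for x in features]

print(relu6([-3.0, 2.5, 6.0, 10.0]))  # [0.0, 2.5, 6.0, 6.0]
```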

## tf.nn.elu(features, name=None)

Computes exponential linear: exp(features) - 1 if features < 0, features otherwise.

See "Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)" (Clevert et al., 2015).
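A plain-Python sketch of the ELU formula as written above (that is, with no extra scale factor on the negative branch). The helper name `elu` is illustrative; it uses `math.expm1`, which computes exp(x) - 1 more accurately than subtracting 1 from `math.exp(x)` when x is close to 0.

```python
import math

def elu(features):
    """Componentwise exp(x) - 1 for x < 0, and x itself otherwise."""
    return [math.expm1(x) if x < 0 else x for x in features]

print([round(v, 3) for v in elu([-1.0, 0.0, 2.0])])  # [-0.632, 0.0, 2.0]
```

Note that the negative branch saturates at -1 for large negative inputs, while positive inputs pass through unchanged, as in ReLU.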

## tf.nn.softplus(features, name=None)

Computes softplus: log(exp(features) + 1).
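Evaluating log(exp(x) + 1) literally fails for large x, because `math.exp(1000.0)` raises `OverflowError` in Python. A minimal sketch (hypothetical helper `softplus`) uses the algebraically equivalent form max(x, 0) + log1p(exp(-|x|)), which never exponentiates a positive number:

```python
import math

def softplus(features):
    """Componentwise log(exp(x) + 1), computed in an overflow-safe form."""
    # For every real x: log(1 + e^x) == max(x, 0) + log1p(e^{-|x|}).
    return [max(x, 0.0) + math.log1p(math.exp(-abs(x))) for x in features]

print([round(v, 4) for v in softplus([-2.0, 0.0, 2.0, 1000.0])])
# [0.1269, 0.6931, 2.1269, 1000.0]
```

For large positive x the result approaches x, and for large negative x it approaches 0, which is why softplus is often described as a smooth approximation of ReLU.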

## tf.nn.softsign(features, name=None)

Computes softsign: features / (abs(features) + 1).
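A list-based sketch of softsign (the helper name is illustrative). Like tanh it maps every input into the open interval (-1, 1), but it approaches the bounds polynomially rather than exponentially:

```python
def softsign(features):
    """Componentwise x / (|x| + 1); every output lies strictly between -1 and 1."""
    return [x / (abs(x) + 1.0) for x in features]

print(softsign([-3.0, 0.0, 1.0]))  # [-0.75, 0.0, 0.5]
```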

## tf.nn.dropout(x, keep_prob, noise_shape=None, seed=None, name=None)

Computes dropout.

With probability keep_prob, outputs the input element scaled up by 1 / keep_prob; otherwise, outputs 0. The scaling ensures that the expected sum is unchanged.
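A minimal sketch of the keep-and-rescale rule on a flat list, assuming Python's `random` module as the noise source (TensorFlow uses its own random ops, so the `seed` here does not reproduce TensorFlow's masks). The hypothetical helper `dropout` keeps each element independently, and the check at the end shows the mean staying close to the input mean, as the text describes:

```python
import random

def dropout(x, keep_prob, seed=None):
    """Keep each element with probability keep_prob and scale kept ones by 1 / keep_prob."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so "< keep_prob" happens with probability keep_prob.
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in x]

x = [1.0] * 100_000
y = dropout(x, keep_prob=0.5, seed=0)
print(sum(y) / sum(x))  # close to 1.0: the expected sum is preserved
```

In TensorFlow 2.x, `tf.nn.dropout` takes `rate` (the probability of dropping an element) instead of `keep_prob`, so `rate = 1 - keep_prob`.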

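By default, each element is kept or dropped independently. If noise_shape is specified, it must be broadcastable to the shape of x, and only dimensions with noise_shape[i] == shape(x)[i] will make independent decisions. For example, if shape(x) = [k, l, m, n] and noise_shape = [k, 1, 1, n], each batch and channel component will be kept independently, and each row and column will be kept or not kept together.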
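To make the noise_shape rule concrete, here is a 2-D sketch with a hypothetical helper `dropout_2d`. It assumes each noise_shape entry is either 1 or the matching dimension of x (a simplified form of broadcasting): one keep/drop decision is drawn per cell of the smaller noise array, and a size-1 noise dimension reuses that decision along the whole axis.

```python
import random

def dropout_2d(x, keep_prob, noise_shape, seed=None):
    """Dropout on a 2-D list x; each noise_shape entry must be 1 or the matching dim of x."""
    rows, cols = len(x), len(x[0])
    n_rows, n_cols = noise_shape
    if n_rows not in (1, rows) or n_cols not in (1, cols):
        raise ValueError("noise_shape must be broadcastable to the shape of x")
    rng = random.Random(seed)
    # One keep/drop decision per cell of the (possibly smaller) noise array.
    mask = [[rng.random() < keep_prob for _ in range(n_cols)] for _ in range(n_rows)]
    # Broadcast: along a size-1 noise dimension, every position reuses index 0.
    return [
        [x[i][j] / keep_prob if mask[i if n_rows > 1 else 0][j if n_cols > 1 else 0] else 0.0
         for j in range(cols)]
        for i in range(rows)
    ]

x = [[1.0] * 4 for _ in range(3)]
for row in dropout_2d(x, keep_prob=0.5, noise_shape=(3, 1), seed=1):
    print(row)  # each row is either all 2.0 or all 0.0
```

With noise_shape=(3, 1), rows make independent decisions while all columns within a row share one; with (1, 4) it is the other way around.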

## tf.nn.bias_add(value, bias, data_format=None, name=None)

Adds bias to value.
This is (mostly) a special case of tf.add where bias is restricted to 1-D. Broadcasting is supported, so value may have any number of dimensions.
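A nested-list sketch of the broadcasting behaviour, assuming the default data layout in which the 1-D bias lines up with the last dimension of value (the sketch ignores `data_format`, and the helper name `bias_add` is illustrative). It recurses through however many leading dimensions value has:

```python
def bias_add(value, bias):
    """Add a 1-D bias along the last dimension of a nested-list value of any rank."""
    if isinstance(value[0], list):
        # Not yet at the innermost dimension: recurse into each sub-list.
        return [bias_add(v, bias) for v in value]
    if len(value) != len(bias):
        raise ValueError("bias length must match the last dimension of value")
    return [v + b for v, b in zip(value, bias)]

print(bias_add([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [0.5, -1.0, 10.0]))
# [[1.5, 1.0, 13.0], [4.5, 4.0, 16.0]]
```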

## tf.sigmoid(x, name=None)

Computes sigmoid of x element-wise. Specifically, y = 1 / (1 + exp(-x)).
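Computing 1 / (1 + exp(-x)) directly raises `OverflowError` in Python for very negative x, because exp(-x) becomes huge. A common workaround, shown here as a sketch with a hypothetical helper `sigmoid`, switches to the equivalent form exp(x) / (1 + exp(x)) when x < 0:

```python
import math

def sigmoid(xs):
    """Componentwise 1 / (1 + exp(-x)), arranged so that exp never overflows."""
    out = []
    for x in xs:
        if x >= 0:
            out.append(1.0 / (1.0 + math.exp(-x)))  # exp(-x) <= 1 here
        else:
            z = math.exp(x)  # x < 0, so z lies in (0, 1)
            out.append(z / (1.0 + z))
    return out

print(sigmoid([0.0]))  # [0.5]
```

Both branches compute the same function; the split only changes which exponential is evaluated.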

## tf.tanh(x, name=None)

Computes hyperbolic tangent of x element-wise.
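For reference, tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)), and it is a rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1, which maps the sigmoid's (0, 1) range onto (-1, 1). The sketch below (hypothetical helper names) checks both forms against Python's `math.tanh` for a few moderate inputs:

```python
import math

def tanh_from_definition(x):
    """tanh(x) = (e^x - e^-x) / (e^x + e^-x); fine for moderate |x|."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def tanh_via_sigmoid(x):
    """The same function written as 2 * sigmoid(2x) - 1."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    print(x, round(tanh_from_definition(x), 6), round(tanh_via_sigmoid(x), 6), round(math.tanh(x), 6))
```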

# Reinforcement Learning

All these examples can be unified under a general formulation: performing an action in a scenario can yield a reward. A more technical term for scenario is a state, and we call the collection of all possible states a state-space. Performing an action causes the state to change. But the question is, what series of actions yields the highest cumulative reward?
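To make the vocabulary concrete, here is a toy sketch under entirely hypothetical assumptions: five states laid out on a line, two actions (move left or right), and a reward of 1.0 whenever the agent lands on the last state. It answers the question above by brute force, scoring every action sequence of a fixed length. That costs 2^horizon evaluations, which is exactly why practical reinforcement learning needs smarter methods than exhaustive search.

```python
from itertools import product

N_STATES = 5           # hypothetical state-space: positions 0..4 on a line
ACTIONS = (-1, +1)     # move left or move right
GOAL = N_STATES - 1    # landing here yields a reward

def step(state, action):
    """Perform an action: the state changes and the environment returns a reward."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def cumulative_reward(actions, start=0):
    """Total reward collected by performing the actions in order from the start state."""
    state, total = start, 0.0
    for a in actions:
        state, r = step(state, a)
        total += r
    return total

# Brute force: score every action sequence of length 6 and keep the best one.
horizon = 6
best = max(product(ACTIONS, repeat=horizon), key=cumulative_reward)
print(best, cumulative_reward(best))  # (1, 1, 1, 1, 1, 1) 3.0
```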

## Real-world examples

Here are some examples to open your eyes to some successful uses of RL by Google:

• Game playing
• input:
• More game playing
• Robotics and control:

## Formal notions

Reinforcement learning isn't supervised learning, because the training data comes from the algorithm deciding between exploration and exploitation. And it isn't unsupervised learning, because the algorithm receives feedback from the environment. As long as you're in a situation where performing an action in a state produces a reward, you can use reinforcement learning to discover the best sequence of actions to take.
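One common way to see the exploration/exploitation trade-off in code is an epsilon-greedy agent on a multi-armed bandit. This is a swapped-in illustration, not a method from the text: the payout probabilities are hypothetical, and the helper name is illustrative. The agent never sees those probabilities; its only training data is the rewards it collects from the arms it chooses, explored at random with probability epsilon and otherwise exploited greedily from its current estimates.

```python
import random

def epsilon_greedy_bandit(payout_probs, epsilon=0.1, steps=10_000, seed=0):
    """Estimate each arm's value purely from reward feedback, balancing explore and exploit.

    payout_probs: hypothetical per-arm probability of a 1.0 reward; the agent only
    observes the rewards, never these numbers.
    """
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                             # explore: try a random arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])    # exploit: best arm so far
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0  # feedback from the environment
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean update
        total += reward
    return estimates, counts, total

estimates, counts, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print([round(e, 2) for e in estimates], counts)
```

With epsilon = 0 the agent could lock onto whichever arm first looked good; with epsilon = 1 it would never use what it learned. The small random fraction keeps gathering data on the other arms.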

You may notice that reinforcement learning lingo involves anthropomorphizing the algorithm into taking "actions" in "situations" to "receive rewards." In fact, the algorithm is often referred to as an "agent" that "acts with" the environment. It shouldn't be a surprise that much of reinforcement learning theory is applied in robotics.

A robot performs actions to change between different states. But how does it decide which action to take? The next section introduces a new concept, called the policy, to answer this question.