Module pacai.student.valueIterationAgent

from pacai.agents.learning.value import ValueEstimationAgent

class ValueIterationAgent(ValueEstimationAgent):
    """
    A value iteration agent.

    Make sure to read `pacai.agents.learning` before working on this class.

    A `ValueIterationAgent` takes a `pacai.core.mdp.MarkovDecisionProcess` on initialization,
    and runs value iteration for a given number of iterations using the supplied discount factor.

    Some useful mdp methods you will use:
    `pacai.core.mdp.MarkovDecisionProcess.getStates`,
    `pacai.core.mdp.MarkovDecisionProcess.getPossibleActions`,
    `pacai.core.mdp.MarkovDecisionProcess.getTransitionStatesAndProbs`,
    `pacai.core.mdp.MarkovDecisionProcess.getReward`.

    Additional methods to implement:

    `pacai.agents.learning.value.ValueEstimationAgent.getQValue`:
    The q-value of the state-action pair (after the indicated number of value iteration passes).
    Note that value iteration does not necessarily create this quantity,
    and you may have to derive it on the fly.

    `pacai.agents.learning.value.ValueEstimationAgent.getPolicy`:
    The policy is the best action in the given state
    according to the values computed by value iteration.
    You may break ties any way you see fit.
    Note that if there are no legal actions, which is the case at the terminal state,
    you should return None.
    """

    def __init__(self, index, mdp, discountRate = 0.9, iters = 100, **kwargs):
        super().__init__(index, **kwargs)

        self.mdp = mdp
        self.discountRate = discountRate
        self.iters = iters
        self.values = {}  # A dictionary which maps each state to its estimated value.

        # Compute the values here.
        raise NotImplementedError()

    def getValue(self, state):
        """
        Return the value of the state (computed in __init__).
        """

        return self.values[state]

    def getAction(self, state):
        """
        Returns the policy at the state (no exploration).
        """

        return self.getPolicy(state)

Classes

class ValueIterationAgent (index, mdp, discountRate=0.9, iters=100, **kwargs)

A value iteration agent.

Make sure to read pacai.agents.learning before working on this class.

A ValueIterationAgent takes a MarkovDecisionProcess on initialization, and runs value iteration for a given number of iterations using the supplied discount factor.
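
For reference, the standard value iteration update can be written as below. The notation is introduced here for illustration and does not come from pacai: T(s, a, s') stands for the transition probability (the kind of quantity getTransitionStatesAndProbs reports), R(s, a, s') for the reward, gamma for the discount factor, and A(s) for the legal actions in s. Starting from V_0(s) = 0 is a common convention and an assumption here.

```latex
V_{k+1}(s) = \max_{a \in A(s)} \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma \, V_k(s') \right],
\qquad V_0(s) = 0
```

Each pass computes V_{k+1} from the previous V_k for every state, so running for the given number of iterations yields V after that many passes.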

Some useful mdp methods you will use: MarkovDecisionProcess.getStates(), MarkovDecisionProcess.getPossibleActions(), MarkovDecisionProcess.getTransitionStatesAndProbs(), MarkovDecisionProcess.getReward().

Additional methods to implement:

ValueEstimationAgent.getQValue(): The q-value of the state-action pair (after the indicated number of value iteration passes). Note that value iteration does not necessarily create this quantity, and you may have to derive it on the fly.
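
One way to derive the q-value on the fly is a one-step lookahead over the stored state values. The symbols are illustrative notation rather than pacai names: T is the transition probability, R the reward, gamma the discount factor, and V_k the state values after k passes.

```latex
Q_k(s, a) = \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma \, V_k(s') \right]
```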

ValueEstimationAgent.getPolicy(): The policy is the best action in the given state according to the values computed by value iteration. You may break ties any way you see fit. Note that if there are no legal actions, which is the case at the terminal state, you should return None.
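
The pieces described above can be sketched on a standalone toy problem. This is a minimal illustration under stated assumptions, not the pacai implementation: `ToyMDP` is a hypothetical class that only mimics the four method names listed above, the argument lists of `getTransitionStatesAndProbs(state, action)` and `getReward(state, action, nextState)` are assumed, and `q_value`, `value_iteration`, and `policy` are helper names invented for this example.

```python
class ToyMDP:
    """Hypothetical three-state MDP; it only mimics the method names listed above."""

    def getStates(self):
        return ['A', 'B', 'END']

    def getPossibleActions(self, state):
        # The terminal state has no legal actions.
        return [] if state == 'END' else ['stay', 'go']

    def getTransitionStatesAndProbs(self, state, action):
        # Assumed to return (nextState, probability) pairs; deterministic here.
        if action == 'stay':
            return [(state, 1.0)]
        return [('B' if state == 'A' else 'END', 1.0)]

    def getReward(self, state, action, nextState):
        # Assumed signature; a reward of 10 is paid only for entering 'END'.
        return 10.0 if nextState == 'END' else 0.0


def q_value(mdp, values, state, action, discount):
    # One-step lookahead: expected reward plus discounted value of the successor.
    return sum(prob * (mdp.getReward(state, action, nxt) + discount * values.get(nxt, 0.0))
               for nxt, prob in mdp.getTransitionStatesAndProbs(state, action))


def value_iteration(mdp, discount=0.9, iters=100):
    values = {state: 0.0 for state in mdp.getStates()}
    for _ in range(iters):
        # Batch update: every new value is computed from the previous pass.
        new_values = {}
        for state in mdp.getStates():
            actions = mdp.getPossibleActions(state)
            # States without legal actions keep a value of 0.
            new_values[state] = max(
                (q_value(mdp, values, state, a, discount) for a in actions),
                default=0.0)
        values = new_values
    return values


def policy(mdp, values, state, discount):
    actions = mdp.getPossibleActions(state)
    if not actions:
        return None  # No legal actions, e.g. at the terminal state.
    # Ties are broken by whichever action max() encounters first.
    return max(actions, key=lambda a: q_value(mdp, values, state, a, discount))


mdp = ToyMDP()
values = value_iteration(mdp, discount=0.9, iters=100)
print(values)
print({s: policy(mdp, values, s, 0.9) for s in mdp.getStates()})
```

Building `new_values` separately from `values` keeps each pass dependent only on the previous one; updating a single dictionary in place is a different (asynchronous) variant that can give different intermediate results.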

Args

alpha: The learning rate.
epsilon: The exploration rate.
gamma: The discount factor.
numTraining: The number of training episodes.

Ancestors

Static methods

def loadAgent(name, index, args={})

Inherited from: ValueEstimationAgent.loadAgent

Load an agent with the given class name. The name can be fully qualified or just the bare class name. If the bare name is given, the class should …

Methods

def final(self, state)

Inherited from: ValueEstimationAgent.final

Inform the agent about the result of a game.

def getAction(self, state)

Returns the policy at the state (no exploration).

def getPolicy(self, state)

Inherited from: ValueEstimationAgent.getPolicy

What is the best action to take in the state? Note that because we might want to explore, this might not coincide with …

def getQValue(self, state, action)

Inherited from: ValueEstimationAgent.getQValue

Should return Q(state,action).

def getValue(self, state)

Return the value of the state (computed in __init__).

def observationFunction(self, state)

Inherited from: ValueEstimationAgent.observationFunction

Make an observation on the state of the game. Called once for each round of the game.

def registerInitialState(self, state)

Inherited from: ValueEstimationAgent.registerInitialState

Inspect the starting state.