Module pacai.student.valueIterationAgent
from pacai.agents.learning.value import ValueEstimationAgent

class ValueIterationAgent(ValueEstimationAgent):
    """
    A value iteration agent.

    Make sure to read `pacai.agents.learning` before working on this class.

    A `ValueIterationAgent` takes a `pacai.core.mdp.MarkovDecisionProcess` on initialization,
    and runs value iteration for a given number of iterations using the supplied discount factor.

    Some useful mdp methods you will use:
    `pacai.core.mdp.MarkovDecisionProcess.getStates`,
    `pacai.core.mdp.MarkovDecisionProcess.getPossibleActions`,
    `pacai.core.mdp.MarkovDecisionProcess.getTransitionStatesAndProbs`,
    `pacai.core.mdp.MarkovDecisionProcess.getReward`.

    Additional methods to implement:

    `pacai.agents.learning.value.ValueEstimationAgent.getQValue`:
        The q-value of the state action pair (after the indicated number of value iteration passes).
        Note that value iteration does not necessarily create this quantity,
        and you may have to derive it on the fly.

    `pacai.agents.learning.value.ValueEstimationAgent.getPolicy`:
        The policy is the best action in the given state
        according to the values computed by value iteration.
        You may break ties any way you see fit.
        Note that if there are no legal actions, which is the case at the terminal state,
        you should return None.
    """

    def __init__(self, index, mdp, discountRate = 0.9, iters = 100, **kwargs):
        super().__init__(index, **kwargs)

        self.mdp = mdp
        self.discountRate = discountRate
        self.iters = iters
        self.values = {}  # A dictionary which holds the value of each state.

        # Compute the values here.
        raise NotImplementedError()

    def getValue(self, state):
        """
        Return the value of the state (computed in __init__).
        """

        return self.values[state]

    def getAction(self, state):
        """
        Returns the policy at the state (no exploration).
        """

        return self.getPolicy(state)
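The value computation in `__init__` is left as a stub. A minimal sketch of batch value iteration follows, written as standalone functions rather than as the agent itself. `TwoStateMDP`, `computeQValue`, and `valueIteration` are hypothetical names; the toy MDP only mirrors the four method names listed in the docstring, and their signatures (in particular `getReward(state, action, nextState)`) are assumptions, not pacai's documented API.

```python
from typing import Dict, List, Tuple


class TwoStateMDP:
    """Hypothetical toy MDP. The method names mirror the MarkovDecisionProcess
    methods listed in the docstring; the signatures are assumptions."""

    def getStates(self) -> List[str]:
        return ['start', 'goal', 'TERMINAL']

    def getPossibleActions(self, state: str) -> List[str]:
        if state == 'TERMINAL':
            return []  # No legal actions in the terminal state.
        if state == 'goal':
            return ['exit']
        return ['stay', 'go']

    def getTransitionStatesAndProbs(self, state: str, action: str) -> List[Tuple[str, float]]:
        if action == 'exit':
            return [('TERMINAL', 1.0)]
        if action == 'go':
            return [('goal', 0.8), ('start', 0.2)]  # 'go' succeeds 80% of the time.
        return [('start', 1.0)]  # 'stay'

    def getReward(self, state: str, action: str, nextState: str) -> float:
        return 10.0 if action == 'exit' else 0.0


def computeQValue(mdp, values: Dict[str, float], state: str, action: str,
                  discount: float) -> float:
    # Q(s, a) = sum over s' of T(s, a, s') * (R(s, a, s') + discount * V(s')).
    return sum(prob * (mdp.getReward(state, action, nextState) + discount * values[nextState])
               for nextState, prob in mdp.getTransitionStatesAndProbs(state, action))


def valueIteration(mdp, discount: float = 0.9, iters: int = 100) -> Dict[str, float]:
    # Start every state at V_0(s) = 0.
    values = {state: 0.0 for state in mdp.getStates()}
    for _ in range(iters):
        # Batch update: each pass reads only the values from the previous pass.
        newValues = {}
        for state in mdp.getStates():
            actions = mdp.getPossibleActions(state)
            if not actions:
                newValues[state] = 0.0  # Terminal states keep a value of zero.
            else:
                newValues[state] = max(computeQValue(mdp, values, state, action, discount)
                                       for action in actions)
        values = newValues
    return values
```

With `discount=0.9` and `iters=100`, `'goal'` gets the value 10.0 and `'start'` converges towards 7.2 / 0.82 ≈ 8.78. The sketch uses batch updates, so the result after `iters` passes matches the "after the indicated number of value iteration passes" wording; updating the dictionary in place would be a different (Gauss-Seidel style) variant.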
Classes
class ValueIterationAgent (index, mdp, discountRate=0.9, iters=100, **kwargs)
A value iteration agent.
Make sure to read pacai.agents.learning before working on this class.
A ValueIterationAgent takes a MarkovDecisionProcess on initialization, and runs value iteration for a given number of iterations using the supplied discount factor.
Some useful mdp methods you will use:
- MarkovDecisionProcess.getStates()
- MarkovDecisionProcess.getPossibleActions()
- MarkovDecisionProcess.getTransitionStatesAndProbs()
- MarkovDecisionProcess.getReward()
Additional methods to implement:
- ValueEstimationAgent.getQValue(): The q-value of the state action pair (after the indicated number of value iteration passes). Note that value iteration does not necessarily create this quantity, and you may have to derive it on the fly.
- ValueEstimationAgent.getPolicy(): The policy is the best action in the given state according to the values computed by value iteration. You may break ties any way you see fit. Note that if there are no legal actions, which is the case at the terminal state, you should return None.
Args
- alpha: The learning rate.
- epsilon: The exploration rate.
- gamma: The discount factor.
- numTraining: The number of training episodes.
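The behaviour described above corresponds to the standard value-iteration equations. This is a sketch under stated assumptions: values start at zero, the reward is a function R(s, a, s') of the transition, T(s, a, s') are the probabilities returned by getTransitionStatesAndProbs, A(s) are the actions returned by getPossibleActions, γ is discountRate, and K is iters.

```latex
V_0(s) = 0, \qquad
V_{k+1}(s) =
\begin{cases}
  \max_{a \in A(s)} \sum_{s'} T(s, a, s') \bigl[ R(s, a, s') + \gamma V_k(s') \bigr] & \text{if } A(s) \neq \emptyset \\
  0 & \text{if } A(s) = \emptyset
\end{cases}

Q_K(s, a) = \sum_{s'} T(s, a, s') \bigl[ R(s, a, s') + \gamma V_K(s') \bigr]

\pi_K(s) = \arg\max_{a \in A(s)} Q_K(s, a) \qquad (\text{None if } A(s) = \emptyset)
```

Under these assumptions getValue returns V_K, getQValue derives Q_K on the fly from the stored values, and getPolicy is the greedy argmax over Q_K.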
Ancestors
- ValueEstimationAgent
- BaseAgent
- abc.ABC
Static methods
def loadAgent(name, index, args={})
Inherited from: ValueEstimationAgent.loadAgent
Load an agent with the given class name. The name can be fully qualified or just the bare class name. If the bare name is given, the class should …
Methods
def final(self, state)
Inherited from: ValueEstimationAgent.final
Inform the agent about the result of a game.
def getAction(self, state)
Returns the policy at the state (no exploration).
def getPolicy(self, state)
Inherited from: ValueEstimationAgent.getPolicy
What is the best action to take in the state? Note that because we might want to explore, this might not coincide with …
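The class docstring pins down what an override of getPolicy must do: pick the best action by Q-value, break ties freely, and return None when there are no legal actions. A minimal sketch, assuming the caller passes in the legal actions and a Q-value function; `extractPolicy` is a hypothetical helper, not part of pacai.

```python
from typing import Any, Callable, Optional, Sequence


def extractPolicy(state: Any, possibleActions: Sequence[str],
                  qValue: Callable[[Any, str], float]) -> Optional[str]:
    """Hypothetical helper: greedy policy extraction from Q-values."""
    if not possibleActions:
        return None  # Terminal state: no legal actions.
    # max() returns the first maximal action, which is one acceptable tie-breaking rule.
    return max(possibleActions, key=lambda action: qValue(state, action))
```

Inside the agent, `possibleActions` would presumably come from `self.mdp.getPossibleActions(state)` and `qValue` would be `self.getQValue`.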
def getQValue(self, state, action)
Inherited from: ValueEstimationAgent.getQValue
Should return Q(state,action).
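Because value iteration stores only state values, Q(state, action) has to be derived on the fly from them. A minimal sketch under stated assumptions: `qValueFromValues` is a hypothetical helper, and its `transitions` and `reward` parameters stand in for `mdp.getTransitionStatesAndProbs(state, action)` (assumed to yield `(nextState, probability)` pairs) and `mdp.getReward(state, action, nextState)` (signature assumed).

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical callable types standing in for the MDP methods named above.
TransitionFn = Callable[[Any, Any], List[Tuple[Any, float]]]
RewardFn = Callable[[Any, Any, Any], float]


def qValueFromValues(state: Any, action: Any, transitions: TransitionFn,
                     reward: RewardFn, values: Dict[Any, float],
                     discount: float) -> float:
    """Q(s, a) derived from the stored state values V."""
    total = 0.0
    for nextState, prob in transitions(state, action):
        # Probability-weighted immediate reward plus the discounted successor value.
        total += prob * (reward(state, action, nextState) + discount * values[nextState])
    return total
```

For example, an action that reaches a state worth 10.0 with probability 0.8 (reward 0, discount 0.9) and otherwise stays in a state worth 0.0 has a Q-value of 0.8 × 0.9 × 10.0 = 7.2.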
def getValue(self, state)
Return the value of the state (computed in __init__).
def observationFunction(self, state)
Inherited from: ValueEstimationAgent.observationFunction
Make an observation on the state of the game. Called once for each round of the game.
def registerInitialState(self, state)
Inherited from: ValueEstimationAgent.registerInitialState
Inspect the starting state.