MindSpore Reinforcement API
Components for MindSpore Reinforcement Learning Framework.
mindspore_rl.agent
Components for agent, actor, learner, trainer.
- class mindspore_rl.agent.Actor[source]
Base class for all actors.
Examples
>>> from mindspore.ops import operations as P
>>> from mindspore_rl.agent.actor import Actor
>>> from mindspore_rl.network import FullyConnectedNet
>>> from mindspore_rl.environment import GymEnvironment
>>> class MyActor(Actor):
...     def __init__(self):
...         super(MyActor, self).__init__()
...         self.argmax = P.Argmax()
...         self.actor_net = FullyConnectedNet(4, 10, 2)
...         self.env = GymEnvironment({'name': 'CartPole-v0'})
>>> my_actor = MyActor()
>>> print(my_actor)
MyActor<
(actor_net): FullyConnectedNet<
(linear1): Dense<input_channels=4, output_channels=10, has_bias=True>
(linear2): Dense<input_channels=10, output_channels=2, has_bias=True>
(relu): ReLU<>
>
(environment): GymEnvironment<>
>
- act(state)[source]
The interface of the act function. Users need to overload this function according to their algorithm, but its argument must be the state output from the environment.
- Parameters
state (Tensor) – the output state from the environment.
- Returns
done (Tensor), whether the simulation is finished or not.
reward (Tensor), simulation reward.
state (Tensor), simulation state.
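A minimal sketch of an act override, reusing the attributes from the MyActor example above; the (done, reward, state) return order follows the returns documented here, and the environment's step is assumed to return (state, reward, done) as documented for GymEnvironment:
>>> class MyActor(Actor):
...     def act(self, state):
...         # Greedy action from the actor network, then one environment step.
...         action = self.argmax(self.actor_net(state))
...         new_state, reward, done = self.env.step(action)
...         return done, reward, new_state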
- act_init(state)[source]
The interface of the act initialization function. Users need to overload this function according to their algorithm, but its argument must be the state output from the environment.
- Parameters
state (Tensor) – the output state from the environment.
- Returns
done (Tensor), whether the simulation is finished or not.
reward (Tensor), simulation reward.
state (Tensor), simulation state.
- env_setter(env)[source]
Set the actor's environment to the input env, which is created by GymEnvironment or another environment class.
- Parameters
env (object) – the input environment.
- Returns
env (object), the environment that has been set for the actor.
- evaluate(state)[source]
The interface of the act evaluation function. Users need to overload this function according to their algorithm, but its argument must be the state output from the environment.
- Parameters
state (Tensor) – the output state from the environment.
- Returns
done (Tensor), whether the simulation is finished or not.
reward (Tensor), simulation reward.
state (Tensor), simulation state.
- reset_collect_actor()[source]
Reset the collect actor: reset its environment and return the initial state together with a done flag that is always false.
- Returns
state (Tensor), the state of the actor after reset.
done (Tensor), always false after a reset.
- class mindspore_rl.agent.Agent(num_actor, actors, learner)[source]
The base class for the Agent.
- Parameters
num_actor (int) – the number of actors in the agent.
actors (object) – the actor instances created by class Actor.
learner (object) – the learner instance created by class Learner.
Examples
>>> from mindspore_rl.agent.learner import Learner
>>> from mindspore_rl.agent.actor import Actor
>>> from mindspore_rl.agent.agent import Agent
>>> actor_num = 1
>>> actors = Actor()
>>> learner = Learner()
>>> agent = Agent(actor_num, actors, learner)
>>> print(agent)
Agent<
(_actors): Actor<>
(_learner): Learner<>
>
- property actors
Get the instance of actors in the agent.
- Returns
actors (object), actors object created by class Actor.
- env_setter(env)[source]
Set the environment for the actors in the agent.
- Parameters
env (object) – the input environment.
- learn(samples)[source]
The learn function interface.
- Parameters
samples (Tensor) – the sample from replay buffer.
- property learner
Get the instance of learner in the agent.
- Returns
learner (object), learner object created by class Learner.
- property num_actor
Get the number of actors in the agent.
- Returns
num_actor (int), actor numbers.
- class mindspore_rl.agent.Learner[source]
The base class of the learner.
Examples
>>> from mindspore_rl.agent.learner import Learner
>>> from mindspore_rl.network import FullyConnectedNet
>>> class MyLearner(Learner):
...     def __init__(self):
...         super(MyLearner, self).__init__()
...         self.target_network = FullyConnectedNet(4, 10, 2)
>>> my_learner = MyLearner()
>>> print(my_learner)
MyLearner<
(target_network): FullyConnectedNet<
(linear1): Dense<input_channels=4, output_channels=10, has_bias=True>
(linear2): Dense<input_channels=10, output_channels=2, has_bias=True>
(relu): ReLU<>
>
>
- learn(samples)[source]
The interface for the learn function. The behavior of the learn function depends on the user's implementation. Usually, it takes the samples from the replay buffer or other Tensors, and calculates the loss for updating the networks. A sketch of an override follows the returns below.
- Parameters
samples (Tensor) – Sampling from the buffer.
- Returns
success (bool), whether the training succeeded or not.
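A minimal sketch of a learn override, assuming an MSE-style loss; the sample layout (state/target pairs) is illustrative, not part of this API:
>>> import mindspore.nn as nn
>>> from mindspore_rl.agent.learner import Learner
>>> from mindspore_rl.network import FullyConnectedNet
>>> class MyLearner(Learner):
...     def __init__(self):
...         super(MyLearner, self).__init__()
...         self.policy_network = FullyConnectedNet(4, 10, 2)
...         self.loss_fn = nn.MSELoss()
...     def learn(self, samples):
...         # Illustrative only: unpack sampled transitions and compute a loss.
...         state, target = samples
...         loss = self.loss_fn(self.policy_network(state), target)
...         return loss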
- class mindspore_rl.agent.Trainer(msrl)[source]
The trainer base class.
Note
See dqn_trainer.py for a reference implementation.
- Parameters
msrl (object) – the MSRL instance that provides the function handlers.
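A minimal trainer sketch built from the MSRL function handlers listed under mindspore_rl.core.MSRL below; the train method name and the handler call signatures are assumptions modeled on the DQN example, not guarantees of this base class:
>>> class MyTrainer(Trainer):
...     def __init__(self, msrl):
...         super(MyTrainer, self).__init__(msrl)
...         self.msrl = msrl
...     def train(self, episode):
...         # Assumed loop: reset, act until done, then learn from the buffer.
...         for _ in range(episode):
...             state, done = self.msrl.agent_reset()
...             while not done:
...                 done, reward, state = self.msrl.agent_act(state)
...             self.msrl.agent_learn(self.msrl.sample_buffer())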
mindspore_rl.core
Helper components used to implement RL algorithms.
- class mindspore_rl.core.MSRL(config)[source]
The MSRL class provides the function handlers and APIs for reinforcement learning algorithm development.
It exposes the following function handlers to the user. The input and output of these function handlers are identical to those of the user-defined functions:
agent_act_init, agent_act_collect, agent_act_eval, agent_act, agent_reset, sample_buffer, agent_learn
- Parameters
config (dict) – provides the algorithm configuration (a minimal sketch follows this list).
Top level: defines the algorithm components.
key: ‘actor’, value: the actor configuration (dict).
key: ‘learner’, value: the learner configuration (dict).
key: ‘policy_and_network’, value: the policy and networks used by actors and learners (dict).
key: ‘env’, value: the environment configuration (dict).
Second level: the configuration of each algorithm component.
key: ‘number’, value: the number of actors/learners (int).
key: ‘type’, value: the type of the actor/learner/policy_and_network/environment (class name).
key: ‘params’, value: the parameters of actor/learner/policy_and_network/environment (dict).
key: ‘policies’, value: the list of policies used by the actor/learner (list).
key: ‘networks’, value: the list of networks used by the actor/learner (list).
key: ‘environment’, value: True if the component needs to interact with the environment, False otherwise (Bool).
key: ‘buffer’, value: the buffer configuration (dict).
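A minimal configuration sketch following the two-level structure above; MyActor, MyLearner, MyPolicyAndNetwork and the *_params dicts are user-defined placeholders:
>>> config = {
...     'actor': {'number': 1, 'type': MyActor, 'params': actor_params,
...               'policies': ['init_policy', 'collect_policy', 'eval_policy'],
...               'networks': ['actor_net'], 'environment': True},
...     'learner': {'number': 1, 'type': MyLearner, 'params': learner_params,
...                 'networks': ['policy_net', 'target_net']},
...     'policy_and_network': {'type': MyPolicyAndNetwork, 'params': policy_params},
...     'env': {'number': 1, 'type': GymEnvironment,
...             'params': {'name': 'CartPole-v0'}}
... }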
- get_replay_buffer()[source]
Return the instance of the replay buffer.
- Returns
buffers (object), the instance of the replay buffer. If the buffer is None, the return value will be None.
- get_replay_buffer_elements(transpose=False, shape=None)[source]
It will return all the elements in the replay buffer.
- Parameters
transpose (boolean) – whether the output elements need to be transposed; if true, shape must also be filled. Default: False.
shape (Tuple[int]) – the shape used in the transpose. Default: None.
- Returns
elements (List[Tensor]), a list of tensors containing all the elements in the replay buffer.
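For example, assuming msrl is an initialized MSRL instance whose algorithm configured a replay buffer:
>>> elements = msrl.get_replay_buffer_elements(transpose=False)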
- class mindspore_rl.core.Session(config)[source]
The Session is a class for running MindSpore RL algorithms.
- Parameters
config (dict) – the algorithm configuration or the deployment configuration of the algorithm. For more details of configuration of algorithm, please have a look at https://www.mindspore.cn/reinforcement/docs/zh-CN/r1.5/index.html
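A usage sketch, assuming a config dict like the one shown for MSRL above; in the MindSpore RL examples the session is then driven with a trainer class, though the exact run interface may vary between releases:
>>> from mindspore_rl.core import Session
>>> session = Session(config)
>>> # e.g. session.run(class_type=MyTrainer, episode=500), as in the DQN example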
mindspore_rl.environment
Component used to implement custom environments.
- class mindspore_rl.environment.GymEnvironment(params)[source]
The GymEnvironment class provides the functions to interact with different environments.
- Parameters
params (dict) – A dictionary that contains all the parameters used to create the instance of GymEnvironment, such as the name of the environment. Since this environment is based on Gym, the name of the environment should match a name in Gym.
- Supported Platforms:
Ascend
GPU
CPU
Examples
>>> env_params = {'name': 'CartPole-v0'}
>>> environment = GymEnvironment(env_params)
>>> print(environment)
GymEnvironment<>
- property action_space_dim
Get the action space dim of the environment.
- Returns
A tuple representing the dimensions of the action space.
- clone()[source]
Make a copy of the environment.
- Returns
env (object), a copy of the original environment object.
- reset()[source]
Reset the environment to the initial state. It is always used at the beginning of each episode, and returns the value of the initial state.
- Returns
A tensor representing the initial state of the environment.
- property state_space_dim
Get the state space dim of the environment.
- Returns
A tuple representing the dimensions of the state space.
- step(action)[source]
Execute one environment step, i.e. interact with the environment once (a usage sketch follows the returns below).
- Parameters
action (Tensor) – A tensor that contains the action information.
- Returns
state (Tensor), the environment state after performing the action.
reward (Tensor), the reward after performing the action.
done (mindspore.bool_), whether the simulation is finished or not.
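A minimal interaction sketch combining reset and step; encoding the CartPole-v0 action as an int32 Tensor is an assumption, since the required dtype and shape are environment-specific:
>>> import mindspore as ms
>>> from mindspore import Tensor
>>> env = GymEnvironment({'name': 'CartPole-v0'})
>>> state = env.reset()
>>> state, reward, done = env.step(Tensor(0, ms.int32))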
- class mindspore_rl.environment.GymMultiEnvironment(params)[source]
The GymMultiEnvironment class provides the functions to interact with different environments. It is the multi-environment version of GymEnvironment.
- Parameters
params (dict) – A dictionary that contains all the parameters used to create the instance of GymMultiEnvironment, such as the name of the environment and the number of environments. Since this environment is based on Gym, the name of the environment should match a name in Gym.
- Supported Platforms:
Ascend
GPU
CPU
Examples
>>> env_params = {'name': 'CartPole-v0', 'env_nums': 10}
>>> environment = GymMultiEnvironment(env_params)
>>> print(environment)
GymMultiEnvironment<>
- property action_space_dim
Get the action space dim of the environment.
- Returns
A tuple representing the dimensions of the action space.
- clone()[source]
Make a copy of the environment.
- Returns
env (object), a copy of the original environment object.
- reset()[source]
Reset the environment to the initial state. It is always used at the beginning of each episode, and returns the initial state of each environment.
- Returns
A list of tensors representing the initial states of the environments.
- property state_space_dim
Get the state space dim of the environment.
- Returns
A tuple representing the dimensions of the state space.
- step(action)[source]
Execute one environment step, i.e. interact with the environments once (a usage sketch follows the returns below).
- Parameters
action (Tensor) – A tensor that contains the action information.
- Returns
state (Tensor), a list of environment states after performing the actions.
reward (Tensor), a list of rewards after performing the actions.
done (Tensor), whether each environment simulation is finished or not.
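A multi-environment sketch along the same lines; one action per environment, batched along the first dimension, is an assumption:
>>> import numpy as np
>>> from mindspore import Tensor
>>> env = GymMultiEnvironment({'name': 'CartPole-v0', 'env_nums': 10})
>>> states = env.reset()
>>> actions = Tensor(np.zeros((10,), np.int32))
>>> states, rewards, dones = env.step(actions)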
mindspore_rl.network
Network components used to implement policies.
mindspore_rl.policy
Policies used in RL algorithms.
- class mindspore_rl.policy.EpsilonGreedyPolicy(input_network, size, epsi_high, epsi_low, decay, action_space_dim)[source]
Produces an epsilon-greedy sample action based on the given policy.
- Parameters
input_network (Cell) – A network returns policy action.
size (int) – Shape of epsilon.
epsi_high (float) – A high epsilon for exploration, between [0, 1].
epsi_low (float) – A low epsilon for exploration, between [0, epsi_high].
decay (float) – A decay factor applied to epsilon.
action_space_dim (int) – Dimensions of the action space.
Examples
>>> import numpy as np
>>> from mindspore import Tensor
>>> from mindspore_rl.network import FullyConnectedNet
>>> from mindspore_rl.policy import EpsilonGreedyPolicy
>>> state_dim, hidden_dim, action_dim = (4, 10, 2)
>>> input_net = FullyConnectedNet(state_dim, hidden_dim, action_dim)
>>> policy = EpsilonGreedyPolicy(input_net, 1, 0.1, 0.1, 100, action_dim)
>>> state = Tensor(np.ones([1, state_dim]).astype(np.float32))
>>> step = Tensor(np.array([10,]).astype(np.float32))
>>> output = policy(state, step)
>>> print(output.shape)
(1,)
- class mindspore_rl.policy.GreedyPolicy(input_network)[source]
Produces a greedy action based on the given policy.
- Parameters
input_network (Cell) – a network that generates action probabilities from the input state.
Examples
>>> import numpy as np
>>> from mindspore import Tensor
>>> from mindspore_rl.network import FullyConnectedNet
>>> from mindspore_rl.policy import GreedyPolicy
>>> state_dim, hidden_dim, action_dim = 4, 10, 2
>>> input_net = FullyConnectedNet(state_dim, hidden_dim, action_dim)
>>> policy = GreedyPolicy(input_net)
>>> state = Tensor(np.ones([2, 4]).astype(np.float32))
>>> output = policy(state)
>>> print(output.shape)
(2,)
- class mindspore_rl.policy.Policy[source]
The virtual base class for policies. This class should be overridden before use in the model.
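A minimal sketch of a Policy subclass; like the built-in policies it is called as a MindSpore Cell, so the action selection goes in construct (the network argument is illustrative):
>>> from mindspore.ops import operations as P
>>> from mindspore_rl.policy import Policy
>>> class MyPolicy(Policy):
...     def __init__(self, network):
...         super(MyPolicy, self).__init__()
...         self.network = network
...         self.argmax = P.Argmax()
...     def construct(self, state):
...         # Select the highest-scoring action from the network output.
...         return self.argmax(self.network(state))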
- class mindspore_rl.policy.RandomPolicy(action_space_dim)[source]
Produces a random action between [0, action_space_dim).
- Parameters
action_space_dim (int) – dimension of the action space.
Examples
>>> action_space_dim = 2
>>> policy = RandomPolicy(action_space_dim)
>>> output = policy()
>>> print(output.shape)
(1,)