Category : rllib

I'm trying to follow this tutorial to understand the basics of RLlib. I'm using pipenv on OS X to set up my environment with the following Pipfile:

[[source]]
name = "pypi"
url = "https://pypi.python.org/simple"
verify_ssl = true

[dev-packages]

[packages]
ray = {extras = ["default", "rllib"], version = "*"}
torch = "*"
jupyterlab = "*"
ipywidgets = ..

Read more
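A minimal smoke test for an environment like the one above, assuming ray[rllib] and torch are installed and the pre-2.0 trainer API; CartPole-v0 is used only as an illustrative environment:

import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init(ignore_reinit_error=True)
# One PPO training iteration on a toy env to confirm the install works.
trainer = PPOTrainer(config={"env": "CartPole-v0", "framework": "torch", "num_workers": 1})
print(trainer.train()["episode_reward_mean"])
ray.shutdown()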

In Ray RLlib, I usually run a PPO training with ray.tune.run like this:

ray.init(log_to_driver=False, num_cpus=3, local_mode=args.local_mode, num_gpus=1)

env_config = {"code": "codeA"}
config = {
    "env_config": {"code": "codeA"},
    "parm": "paramA",
}
stop = {
    "training_iteration": args.stop_iters,
    "timesteps_total": args.stop_timesteps,
    "episode_reward_mean": args.stop_reward,
}

results = tune.run(
    trainer,
    config=config,
    verbose=0,
    stop=stop,
    checkpoint_at_end=True,
    metric="episode_reward_mean",
    mode="max",
    checkpoint_freq=1,
)

checkpoints = results.get_trial_checkpoints_paths(
    trial=results.get_best_trial(metric="episode_reward_mean", mode="max"),
    metric="episode_reward_mean",
)
checkpoint_path = checkpoints[0][0]
metric = checkpoints[0][1]

At ..

Read more
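A hedged follow-up sketch of how the retrieved checkpoint is commonly used, assuming trainer is a Trainer class (e.g. PPOTrainer) and env is an instance of the environment built from env_config; both names and the rollout loop are assumptions, not part of the original question:

agent = trainer(config=config)          # re-instantiate the trainer class
agent.restore(checkpoint_path)          # load the best checkpoint found above

obs = env.reset()                       # env: assumed environment instance
done = False
episode_reward = 0.0
while not done:
    action = agent.compute_action(obs)  # compute_single_action() in newer RLlib
    obs, reward, done, info = env.step(action)
    episode_reward += reward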

I am implementing a two-player (Alice and Bob) rock-paper-scissors environment using the MultiAgentEnv class. The observation space contains both players' utilities (rewards) and both players' actions. The action space contains 0 (rock), 1 (paper), 2 (scissors). Reward for each player: winner +20, loser -20. If it's a tie, in one case (a) all players get +10, in the other case (b) all ..

Read more
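A minimal sketch of such a two-player environment, assuming the pre-2.0 MultiAgentEnv API (per-agent obs/reward/done dicts plus a "__all__" done flag); the 4-element observation layout, the 10-step episode length, and tie variant (a) are illustrative assumptions:

import numpy as np
from gym.spaces import Box, Discrete
from ray.rllib.env.multi_agent_env import MultiAgentEnv

ROCK, PAPER, SCISSORS = 0, 1, 2

class TwoPlayerRPS(MultiAgentEnv):
    # Alice and Bob play one round of rock-paper-scissors per step.
    def __init__(self, config=None):
        self.agents = ["alice", "bob"]
        self.action_space = Discrete(3)
        # Observation: [alice_reward, bob_reward, alice_action, bob_action]
        self.observation_space = Box(low=-20.0, high=20.0, shape=(4,), dtype=np.float32)
        self.num_moves = 0

    def reset(self):
        self.num_moves = 0
        obs = np.zeros(4, dtype=np.float32)
        return {agent: obs for agent in self.agents}

    def step(self, action_dict):
        a, b = action_dict["alice"], action_dict["bob"]
        if a == b:
            rewards = {"alice": 10.0, "bob": 10.0}   # tie, variant (a)
        elif (a - b) % 3 == 1:                       # alice beats bob
            rewards = {"alice": 20.0, "bob": -20.0}
        else:                                        # bob beats alice
            rewards = {"alice": -20.0, "bob": 20.0}
        obs_vec = np.array([rewards["alice"], rewards["bob"], a, b], dtype=np.float32)
        self.num_moves += 1
        dones = {"__all__": self.num_moves >= 10}    # assumed episode length
        return ({agent: obs_vec for agent in self.agents}, rewards, dones, {})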

I have an observation space for a 30-element array and I have used dtype np.float32 in all of them. The error:

ValueError: ('Observation ({}) outside given space ({})!', array(4.2), Box(17.799999237060547, -19.700000762939453, (30,), float32))

# high limits for observations
high = np.array([min(self.relativevalue)] * 30, dtype=np.float32)
# low limits for observations
low = np.array([max(self.relativevalue)] * 30, dtype=np.float32)
# observation space ..

Read more
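For reference, a hedged sketch of how the bounds and observations usually need to line up: Box requires low <= high elementwise, so low should come from min() and high from max(), and each returned observation must be a float32 array of shape (30,) rather than a 0-d scalar like array(4.2). relative_value below is a hypothetical stand-in for self.relativevalue:

import numpy as np
from gym.spaces import Box

relative_value = [-19.7] * 15 + [17.8] * 15  # hypothetical stand-in data

low = np.array([min(relative_value)] * 30, dtype=np.float32)   # smaller bound
high = np.array([max(relative_value)] * 30, dtype=np.float32)  # larger bound
observation_space = Box(low=low, high=high, dtype=np.float32)  # shape inferred as (30,)

# Observations must match the space: a (30,) float32 array, not array(4.2).
obs = np.zeros(30, dtype=np.float32)
assert observation_space.contains(obs)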

I'm building a speaker-listener training environment with RLlib based on this repository: https://github.com/openai/multiagent-particle-envs using PettingZoo and SuperSuit: https://github.com/PettingZoo-Team/SuperSuit I've encountered the following error when trying to run my code:

NotImplementedError: Cannot convert a symbolic Tensor (default_policy/cond/strided_slice:0) to a numpy array

As I lack experience with these packages, I do not understand if the ..

Read more
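For context, a hedged sketch of one common way to wire an MPE speaker-listener env into RLlib via SuperSuit padding; the module paths and version suffixes (simple_speaker_listener_v3, pad_*_v0, the location of PettingZooEnv) vary between PettingZoo/SuperSuit/Ray releases and are assumptions here:

import supersuit as ss
from pettingzoo.mpe import simple_speaker_listener_v3
from ray import tune
from ray.rllib.env import PettingZooEnv  # ray.rllib.env.wrappers.pettingzoo_env in newer Ray
from ray.tune.registry import register_env

def env_creator(config):
    env = simple_speaker_listener_v3.env()
    # Pad so all agents share one observation/action space, as a single shared policy expects.
    env = ss.pad_observations_v0(env)
    env = ss.pad_action_space_v0(env)
    return PettingZooEnv(env)

register_env("speaker_listener", env_creator)
tune.run("PPO", config={"env": "speaker_listener", "framework": "torch"})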

I have some offline experiences (s, a, r, s') that were generated with a heuristic, and I want to use them when training SAC agents. Using the saving_experiences example to prepare my data gives me an error when used with SAC. Here is a colab where the issue is reproduced for the Pendulum-v0 environment. What ..

Read more
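A hedged sketch of the saving_experiences pattern (SampleBatchBuilder plus JsonWriter, pre-2.0 module paths) for Pendulum-v0; the output path and the random-action stand-in for the heuristic are illustrative assumptions:

import gym
from ray.rllib.evaluation.sample_batch_builder import SampleBatchBuilder
from ray.rllib.offline.json_writer import JsonWriter

batch_builder = SampleBatchBuilder()
writer = JsonWriter("/tmp/pendulum-out")  # assumed output directory

env = gym.make("Pendulum-v0")
obs = env.reset()
for t in range(200):
    action = env.action_space.sample()  # stand-in for the heuristic policy
    new_obs, rew, done, info = env.step(action)
    batch_builder.add_values(
        t=t, eps_id=0, obs=obs, actions=action,
        rewards=rew, new_obs=new_obs, dones=done,
    )
    obs = env.reset() if done else new_obs
writer.write(batch_builder.build_and_reset())

# SAC can then read the data via the offline input API, e.g.:
# config = {"env": "Pendulum-v0", "input": "/tmp/pendulum-out", ...}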

I am comparing both kinds of algorithms on the CartPole environment, with the imports:

import ray
from ray import tune
from ray.rllib import agents

ray.init()  # Skip or set ignore_reinit_error=True if already called

Running this works perfectly:

experiment = tune.run(
    agents.ppo.PPOTrainer,
    config={
        "env": "CartPole-v1",
        "num_gpus": 1,
        "num_workers": 0,
        "num_envs_per_worker": 50,
        ..

Read more
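The same tune.run pattern applies to the second trainer in the comparison, along the lines of the sketch below; DQNTrainer is only a hypothetical stand-in, since the preview does not name the other algorithm:

experiment = tune.run(
    agents.dqn.DQNTrainer,  # hypothetical second algorithm, for illustration only
    config={
        "env": "CartPole-v1",
        "num_gpus": 1,
        "num_workers": 0,
        "num_envs_per_worker": 50,
    },
)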

I've been trying to set up a custom LSTM model with RLlib, but for some reason I'm getting an incompatible-shapes error within my LSTM layer when trying to train. In particular, the error seems to be related to batch size, as the dimensions listed in the incompatible-shapes message change linearly with my batch size values. ..

Read more
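For reference, a hedged sketch of a custom LSTM model built on RLlib's torch RecurrentNetwork (pre-2.0 module paths), following the [B, T, feature] shape conventions RLlib uses for recurrent models; the layer sizes are illustrative:

import numpy as np
import torch
import torch.nn as nn
from ray.rllib.models import ModelCatalog
from ray.rllib.models.modelv2 import ModelV2
from ray.rllib.models.torch.recurrent_net import RecurrentNetwork as TorchRNN
from ray.rllib.utils.annotations import override

class MyLSTMModel(TorchRNN, nn.Module):
    def __init__(self, obs_space, action_space, num_outputs, model_config, name,
                 fc_size=64, lstm_size=64):
        nn.Module.__init__(self)
        super().__init__(obs_space, action_space, num_outputs, model_config, name)
        self.obs_size = int(np.product(obs_space.shape))
        self.lstm_size = lstm_size
        self.fc1 = nn.Linear(self.obs_size, fc_size)
        self.lstm = nn.LSTM(fc_size, lstm_size, batch_first=True)
        self.logits = nn.Linear(lstm_size, num_outputs)
        self.value_branch = nn.Linear(lstm_size, 1)
        self._features = None

    @override(ModelV2)
    def get_initial_state(self):
        # One hidden and one cell state per sequence, shape [lstm_size] each.
        return [self.fc1.weight.new(1, self.lstm_size).zero_().squeeze(0),
                self.fc1.weight.new(1, self.lstm_size).zero_().squeeze(0)]

    @override(ModelV2)
    def value_function(self):
        return torch.reshape(self.value_branch(self._features), [-1])

    @override(TorchRNN)
    def forward_rnn(self, inputs, state, seq_lens):
        # inputs: [B, T, obs_size]; state: list of two [B, lstm_size] tensors.
        x = nn.functional.relu(self.fc1(inputs))
        self._features, [h, c] = self.lstm(
            x, [torch.unsqueeze(state[0], 0), torch.unsqueeze(state[1], 0)])
        return self.logits(self._features), [torch.squeeze(h, 0), torch.squeeze(c, 0)]

ModelCatalog.register_custom_model("my_lstm", MyLSTMModel)
# Then set "model": {"custom_model": "my_lstm", "max_seq_len": 20} in the trainer config.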