NMLO 2020 RL Competition

Link to Download Competition Files

Description

For this final contest, Reinforcement Learning, you will be tasked with training an RL agent to act in an RL environment, with the goal of achieving the highest average reward.

This special contest will differ slightly from the previous contests, as Kaggle InClass doesn't currently support evaluating custom RL environments. Instead, you will submit your code and your trained, saved model by email.

The subject of the email should be NMLO_RL_Contest_Lastname_Firstname or NMLO_RL_Contest_Lastname_Firstname_PartnerLastname_PartnerFirstname if you are working in a team. If your files are too large, please upload the files to your Google Drive and send us a link to the files so we can download them. Also, please only send one email after you are sure that you are done working on this competition. This will make our lives much easier. Our email is tjmachinelearning@gmail.com. Thanks!

We will score you on your latest submission at the end of the competition.

Environment

This custom OpenAI Gym environment, ReachAndAvoid-v0, requires the agent to control a two-jointed robotic arm to reach toward a target in 2D space while avoiding a specific area in the same space. The action space consists of two inputs, one controlling the angular velocity of each joint. These angular velocities are capped at a maximum speed of 0.5 rad/sec. The inputs are bounded from -1 to +1, and the environment scales them to the corresponding angular velocities internally.

The observation space of this environment has 6 dimensions: [rotation of the arm's base "shoulder" joint, rotation of the arm's second "elbow" joint, x-position of the goal, y-position of the goal, x-position of the center of the avoid-area, y-position of the center of the avoid-area]. Rotations are given in radians from 0 to 2pi, and positions are given in meters. The rotation of the base joint is 0 when pointing East and pi when pointing West. The rotation of the second joint is measured relative to the first link, so rotating the base joint does not change the reported rotation of the second joint.
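As a quick sanity check, the environment can be loaded and inspected like any other Gym environment. The sketch below shows the action and observation spaces described above; the module name gym_reach_and_avoid is an assumption, so check the sample code in the contest download for the exact import that registers the environment.

import gym
import gym_reach_and_avoid  # assumed module name; registers ReachAndAvoid-v0

env = gym.make("ReachAndAvoid-v0")

print(env.action_space)       # expected: Box(2,), inputs bounded in [-1, +1]
print(env.observation_space)  # expected: Box(6,)

obs = env.reset()
# The 6 observation entries, in the order described above:
shoulder_angle, elbow_angle, goal_x, goal_y, avoid_x, avoid_y = obs

# One random step: two inputs in [-1, +1], scaled internally to angular velocities.
obs, reward, done, info = env.step(env.action_space.sample())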

The 2D environment space where the robotic arm operates is bounded by a circle with a radius of 20 meters, and each link of the arm is 10 meters long. The base joint of the arm is located at the center of this circular boundary, at coordinate (0m, 0m). The radius of the goal-area is 1m, and the radius of the avoid-area is 8m.
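To build intuition, the end-effector position can be derived from the two joint angles with simple forward kinematics. The sketch below assumes the geometry described above (base at the origin, 10-meter links, elbow angle measured relative to the first link); the environment computes this internally, and its own convention is authoritative.

import numpy as np

L1 = L2 = 10.0  # each link of the arm is 10 meters long

def end_effector_position(shoulder_angle, elbow_angle):
    # Elbow position: end of the first link, measured from the base at (0, 0).
    elbow_x = L1 * np.cos(shoulder_angle)
    elbow_y = L1 * np.sin(shoulder_angle)
    # The elbow angle is relative to the first link, so the second link's
    # absolute orientation is the sum of the two angles.
    ee_x = elbow_x + L2 * np.cos(shoulder_angle + elbow_angle)
    ee_y = elbow_y + L2 * np.sin(shoulder_angle + elbow_angle)
    return ee_x, ee_y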

The reward function for the agent is as follows. If the arm's end-effector (think: the hand at the end of the arm) reaches the goal, the reward is 100. If the end-effector reaches the avoid-area, the reward is -100. At any timestep where the end-effector is touching neither (i.e., still in transit), the agent receives a reward of -(distance from end-effector to goal).
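The snippet below is only an illustration of that reward logic, assuming Euclidean distances and the radii given above (1m goal, 8m avoid-area); the environment's own implementation is what actually scores you.

import numpy as np

GOAL_RADIUS = 1.0
AVOID_RADIUS = 8.0

def reward_sketch(ee_pos, goal_pos, avoid_pos):
    # ee_pos, goal_pos, avoid_pos: (x, y) positions in meters
    goal_dist = np.linalg.norm(np.subtract(ee_pos, goal_pos))
    avoid_dist = np.linalg.norm(np.subtract(ee_pos, avoid_pos))
    if goal_dist <= GOAL_RADIUS:
        return 100.0      # reached the goal
    if avoid_dist <= AVOID_RADIUS:
        return -100.0     # entered the avoid-area
    return -goal_dist     # otherwise, penalized by distance to the goal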

Each episode has a maximum length of 50 seconds, and the environment uses a timestep of 0.1 seconds per action, so an episode lasts at most 500 steps. The episode ends early if the end-effector touches either the goal or the avoid-area.

Your score on this contest will be determined by your agent's performance over 100 episodes, as evaluated by the contest judges. The average reward over these episodes, scored by the environment and normalized, will serve as your competition score. You can therefore earn more points by reducing the time the end-effector takes to reach the goal and by always avoiding the avoid-area.
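If you want to estimate your score locally, a rough evaluation loop (without the judges' normalization, which is not specified here) might look like the following. The model path, the module name, and the assumption that your model maps an observation directly to the two action inputs are all placeholders; adapt them to your own agent.

import gym
import numpy as np
from tensorflow import keras
import gym_reach_and_avoid  # assumed module name; registers the environment

env = gym.make("ReachAndAvoid-v0")
model = keras.models.load_model("my_agent")  # placeholder path to your saved model

episode_rewards = []
for _ in range(100):
    obs = env.reset()
    done = False
    total = 0.0
    while not done:
        # Assumes the model outputs the two action inputs directly from an observation.
        action = np.clip(model.predict(obs[np.newaxis, :])[0], -1.0, 1.0)
        obs, reward, done, info = env.step(action)
        total += reward
    episode_rewards.append(total)

print("Average reward over 100 episodes:", np.mean(episode_rewards))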

Submission Requirements

Code must be written in Python 3

pip install requirements: Gym==0.10.11, Keras==2.4.3, Tensorflow==2.2.0
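For example, all three pinned versions can be installed with a single command:

pip install gym==0.10.11 keras==2.4.3 tensorflow==2.2.0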

This custom environment can be installed from the link above. To install it, unzip the download, navigate into the parent folder containing all of the contest data, and then navigate into "gym-reach-and-avoid". Afterwards, install it using

pip install -e ./

Please submit both the code used for training and the final saved model (note: a saved model, not just saved weights) by email, as described above, before the deadline. The model must be a saved Keras model. See below for instructions on how to save a Keras model.
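For reference, saving a complete Keras model (rather than just its weights) and loading it back might look like the sketch below; the architecture and file names are placeholders only.

from tensorflow import keras

# Placeholder architecture -- any compiled Keras model is saved the same way.
model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(6,)),
    keras.layers.Dense(2, activation="tanh"),
])
model.compile(optimizer="adam", loss="mse")

# Saves the full model (architecture, weights, optimizer state), not just the weights.
model.save("my_agent")        # TF SavedModel directory
# model.save("my_agent.h5")   # or a single HDF5 file
restored = keras.models.load_model("my_agent")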

External Resources

Saving a complete keras model (structure, weights, etc)

How to use OpenAI Gym

Sample code can also be found within the contest data download.