Deep reinforcement learning-based industrial robotic manipulation
Abstract
Pick-and-place robotic systems are found in all major industries, where they increase throughput and efficiency. Most industrial pick-and-place applications today, however, are designed through hard-coded, static programming approaches that completely lack the element of learning: any modification to the task or environment requires reprogramming from scratch. This thesis targets this gap and introduces a learning ability into the robotic pick-and-place operation, which makes the operation more efficient and strengthens its adaptability.

We divide this thesis into three parts. In the first part, we focus on learning and carrying out pick-and-place operations on various objects moving on a conveyor belt in a non-visual setting, i.e., using proximity sensors instead of vision sensors. The problem under consideration is formulated as a Markov Decision Process (MDP) and solved using Reinforcement Learning (RL). We train and test both model-free off-policy and on-policy RL algorithms in this setting and perform their comparative analysis; the core difference between the two families of update rules is sketched below.
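As an illustration of the off-policy versus on-policy distinction studied in the first part, the following minimal Python sketch contrasts the tabular Q-learning and SARSA update rules. The state and action space sizes, hyperparameters, and exploration scheme are placeholders for illustration, not the thesis's actual conveyor-belt MDP.

```python
# Illustrative sketch of off-policy (Q-learning) vs. on-policy (SARSA)
# tabular updates. The MDP below is a placeholder, not the thesis's.
import numpy as np

n_states, n_actions = 10, 4          # hypothetical discretized MDP
alpha, gamma, eps = 0.1, 0.99, 0.1   # assumed hyperparameters
rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))

def eps_greedy(s):
    # Epsilon-greedy behavior policy used during training.
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def q_learning_update(s, a, r, s_next):
    # Off-policy: bootstraps from the greedy action in s_next.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: bootstraps from the action actually taken in s_next.
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

# One illustrative transition with dummy dynamics:
s = 0
a = eps_greedy(s)
r, s_next = 1.0, 1                                  # placeholder values
q_learning_update(s, a, r, s_next)                  # target: max_a' Q
sarsa_update(s, a, r, s_next, eps_greedy(s_next))   # target: Q(s', a')
```

Q-learning is off-policy because it bootstraps from the greedy action regardless of what the behavior policy does next; SARSA is on-policy because it bootstraps from the action the current policy actually takes.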
In the second part, we develop a self-learning deep reinforcement learning (DRL) based framework for industrial pick-and-place tasks on regular and irregular-shaped objects in a cluttered environment. We design the MDP and solve it by deploying the model-free off-policy Q-learning algorithm, using the pixel-wise parameterization technique in the fully connected network (FCN) that serves as the Q-function approximator; this parameterization is sketched below.
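The following PyTorch sketch shows the general idea of pixel-wise Q-value parameterization: a small convolutional network outputs one Q-value per pixel of the input observation, and the greedy action is the pixel with the highest value. The layer sizes, input encoding (e.g., an RGB-D heightmap), and image resolution are assumptions for illustration, not the thesis's actual architecture.

```python
# Minimal sketch of pixel-wise Q-value parameterization: the network
# maps an input observation to one Q-value per pixel, and the greedy
# action is the highest-valued pixel. Sizes here are assumptions.
import torch
import torch.nn as nn

class PixelQNet(nn.Module):
    def __init__(self, in_channels=4):        # e.g., RGB + depth channel
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),               # one Q-value per pixel
        )

    def forward(self, x):                      # x: (B, C, H, W)
        return self.net(x).squeeze(1)          # Q-map: (B, H, W)

q_net = PixelQNet()
obs = torch.randn(1, 4, 64, 64)                # dummy observation
with torch.no_grad():
    q_map = q_net(obs)
best = torch.argmax(q_map.view(1, -1), dim=1)  # flat index of best pixel
py, px = divmod(int(best), q_map.shape[-1])    # action location (row, col)
```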
we extend this vision-based self-supervised DRL-based framework to enable the robotic
arm to learn and perform prehensile (grasping) and non-prehensile (non-grasping,
sliding, pushing) manipulations together in sequential manner to improve the efficiency
and throughput of the pick-and-place task. We design the MDP and solve it by using
the Deep Q-networks. We consider three robotic manipulations from both prehensile and non-prehensile category and design large network of three FCNs without creating
any bottleneck situation. The pixel-wise parameterization technique is utilized for Q function approximation. We also present the performance comparisons among various
variants of the framework and very promising test results at varying clutter densities
across a range of complex scenario test cases.
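Building on the PixelQNet sketch above, the following illustrates how a single argmax over the stacked Q-maps of three primitive-specific networks can jointly select both a manipulation primitive and the pixel at which to execute it. The primitive names and the one-network-per-primitive layout are assumptions for illustration, not the thesis's exact design.

```python
# Sketch of joint action selection over three manipulation primitives,
# each with its own pixel-wise Q-map. Reuses PixelQNet from the sketch
# above; the primitive names below are assumed, not from the thesis.
import torch

primitives = ["grasp", "push", "slide"]
q_nets = {name: PixelQNet() for name in primitives}

def select_action(obs):
    # Stack the three Q-maps and take a single argmax over
    # (primitive, pixel), so the choice of primitive and of
    # execution location is made jointly.
    with torch.no_grad():
        maps = torch.stack([q_nets[p](obs) for p in primitives], dim=1)
    flat = torch.argmax(maps.view(obs.shape[0], -1), dim=1)
    idx = int(flat[0])
    h, w = maps.shape[-2:]
    prim = primitives[idx // (h * w)]
    py, px = divmod(idx % (h * w), w)
    return prim, (py, px)

prim, loc = select_action(torch.randn(1, 4, 64, 64))
```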