Journal of Command and Control

Language: fa
Title: Autonomous Navigation of Wheeled Robot using a Deep Reinforcement Learning Based Approach
Subject: Artificial Intelligence
Article type: Research

Abstract: This study develops a deep reinforcement learning-based method for autonomous robot navigation. The approach builds on DDPG and one of its improved variants, SD3. We modify this algorithm to suit the autonomous navigation problem and optimize it for that application. Because it uses convolutional layers, the modified algorithm can handle high-dimensional state spaces. To reduce oscillation during motion and encourage the robot to move faster, we propose two reward terms: a reward based on linear velocity and a penalty based on angular velocity. To improve generalization, we use an algorithm that randomly changes the shape, layout, and number of obstacles in the environment. To speed up learning and improve the robot's performance, we normalize all input data. Finally, we implement the proposed algorithm with ROS and the Gazebo simulator and compare its results with the original SD3 and DDPG algorithms; the proposed algorithm performs better than both.

Keywords: Autonomous navigation, Deep reinforcement learning, SD3, DDPG
Pages: 31-45
URL: http://ic4i-journal.ir/browse.php?a_code=A-10-421-2&slc_lang=fa&sid=1

Authors:
Kourosh Dadashtabar Ahmadi, dadashtabar@yahoo.com, 10031947532846002143, Yes, Malek Ashtar University of Technology
Ali Akbar Kiaei, ali.a.kiaei@gmail.com, 10031947532846002144, No, Malek Ashtar University of Technology
Mohammad Amin Abbaszadeh, abbaszadeh.193@gmail.com, 10031947532846002145, No, Malek Ashtar University of Technology
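
The abstract names a linear-velocity reward, an angular-velocity penalty, and input normalization, but gives no formulas. As a minimal sketch only, assuming a weighted linear combination for the reward and min-max scaling for the inputs, these terms might look like the following. The weights, velocity limits, and function names are hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical limits; the abstract does not state any numeric values.
MAX_LINEAR_VEL = 0.5   # m/s (assumed)
MAX_ANGULAR_VEL = 1.0  # rad/s (assumed)


def normalize(values, low, high):
    """Min-max scale readings to [0, 1], clipping anything outside [low, high].

    Assumption: the abstract only says inputs are normalized, not how.
    """
    v = np.asarray(values, dtype=np.float64)
    return (np.clip(v, low, high) - low) / (high - low)


def velocity_reward(linear_vel, angular_vel, w_lin=1.0, w_ang=0.5):
    """Reward forward speed and penalize turning rate.

    Both velocities are scaled to [0, 1] before weighting, so the result
    lies in [-w_ang, w_lin]. The weights are illustrative assumptions.
    """
    lin = np.clip(linear_vel / MAX_LINEAR_VEL, 0.0, 1.0)
    ang = np.clip(abs(angular_vel) / MAX_ANGULAR_VEL, 0.0, 1.0)
    return float(w_lin * lin - w_ang * ang)
```

In a full training loop, a term like this would presumably be added to the usual goal-reached and collision rewards at each step. Scaling both velocities before weighting keeps the two terms comparable regardless of the robot's actual speed limits.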