Date: 2024-07-06



COMP9414 24T2
Artificial Intelligence
Assignment 2 - Reinforcement Learning
Due: Week 9, Wednesday, 24 July 2024, 11:55 PM.
1 Problem context
Taxi Navigation with Reinforcement Learning: In this assignment,
you are asked to implement Q-learning and SARSA methods for a taxi
navigation problem. To run your experiments and test your code, you should
make use of the Gym library, an open-source Python library for developing
and comparing reinforcement learning algorithms. You can install Gym on
your computer simply by using the following command in your command
prompt:
pip install gym
In the taxi navigation problem, there are four designated locations in the
grid world indicated by R(ed), G(reen), Y(ellow), and B(lue). When the
episode starts, one taxi starts off at a random square and the passenger is
at a random location (one of the four specified locations). The taxi drives
to the passenger’s location, picks up the passenger, drives to the passenger’s
destination (another one of the four specified locations), and then drops off
the passenger. Once the passenger is dropped off, the episode ends. To show
the taxi grid world environment, you can use the following code:

import gym

env = gym.make("Taxi-v3", render_mode="ansi").env
state = env.reset()  # in Gym >= 0.26 this returns (observation, info)
rendered_env = env.render()
print(rendered_env)
In order to render the environment, there are three modes known as
“human”, “rgb_array”, and “ansi”. The “human” mode visualizes the
environment in a way suitable for human viewing, and the output is a graphical
window that displays the current state of the environment (see Fig. 1). The
“rgb_array” mode provides the environment’s state as an RGB image, and
the output is a NumPy array representing the RGB image of the environment.
The “ansi” mode provides a text-based representation of the environment’s
state, and the output is a string that represents the current state of the
environment using ASCII characters (see Fig. 2).
Figure 1: “human” mode presentation for the taxi navigation problem in
Gym library.
You are free to choose the presentation mode between “human” and
“ansi”, but for simplicity, we recommend “ansi” mode. Based on the given
description, there are six discrete deterministic actions that are presented in
Table 1.
Figure 2: “ansi” mode presentation for the taxi navigation problem in Gym
library. Gold represents the taxi location, blue is the pickup location, and
purple is the drop-off location.
Table 1: Six possible actions in the taxi navigation environment.
Action                Number of the action
Move South            0
Move North            1
Move East             2
Move West             3
Pickup Passenger      4
Drop off Passenger    5
For this assignment, you need to implement the Q-learning and SARSA
algorithms for the taxi navigation environment. The main objective for this
assignment is for the agent (taxi) to learn how to navigate the grid-world
and deliver the passenger in the minimum possible number of steps. To
accomplish the learning task, you should empirically determine
hyperparameters, e.g., the learning rate α, exploration parameters (such as
ε or T), and discount factor γ for your algorithm. Your agent should be
penalized -1 per step it takes, receive a +20 reward for delivering the
passenger, and incur a -10 penalty for executing “pickup” and “drop-off”
actions illegally. You should try different exploration parameters to find
the best balance between exploration and exploitation.
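The difference between the two algorithms lies in the update target, which can be sketched as follows. This is a minimal illustration rather than a reference solution: the function names, the list-of-lists Q-table layout (one row per state, one entry per action), and the argument order are assumptions made for this example, and ε-greedy is only one of the exploration schemes the text allows.

```python
import random

def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick a random action with probability epsilon, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties at random so an all-zero row does not always yield action 0.
    return rng.choice([a for a, v in enumerate(q_row) if v == best])

def q_learning_update(q, s, a, r, s_next, done, alpha, gamma):
    """Off-policy update: bootstrap from the best action available in s_next."""
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])

def sarsa_update(q, s, a, r, s_next, a_next, done, alpha, gamma):
    """On-policy update: bootstrap from the action actually chosen in s_next."""
    target = r if done else r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])
```

In a training loop, Q-learning would call `epsilon_greedy` once per step, while SARSA would choose `a_next` before updating and then reuse it as the next action.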
As an outcome, you should plot the accumulated reward per episode and
the number of steps taken by the agent in each episode for at least 1000
learning episodes for both the Q-learning and SARSA algorithms. Examples
of these two plots are shown in Figures 3–6. Please note that the provided
plots are just examples and, therefore, your plots will not be exactly like the
provided ones, as the learning parameters will differ for your algorithm.
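One possible way to produce the two required plots is sketched below. It assumes that training has already collected two lists, one accumulated reward and one step count per episode, and that matplotlib is installed; `moving_average`, `plot_training`, and the smoothing window are hypothetical names and choices for this example, not part of the assignment.

```python
def moving_average(values, window):
    """Trailing mean over up to `window` most recent values."""
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out

def plot_training(rewards, steps, title, window=50):
    """Plot accumulated reward and steps per episode side by side."""
    # Imported here so the smoothing helper stays usable without matplotlib.
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    panels = ((rewards, "Accumulated reward"), (steps, "Steps"))
    for ax, (series, label) in zip(axes, panels):
        ax.plot(series, alpha=0.3, label="per episode")
        ax.plot(moving_average(series, window), label=f"{window}-episode mean")
        ax.set_xlabel("Episode")
        ax.set_ylabel(label)
        ax.set_title(f"{title}: {label.lower()}")
        ax.legend()
    fig.tight_layout()
    return fig
```

Calling `plot_training(rewards, steps, "Q-learning")` and then `plt.show()` (or `fig.savefig(...)`) once per algorithm would yield the four panels.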
Figure 3: Q-learning reward. Figure 4: Q-learning steps.
Figure 5: SARSA reward. Figure 6: SARSA steps.
After training your algorithm, you should save your Q-values. Based on
your saved Q-table, your algorithms will be tested on at least 100 random
grid-world scenarios with the same characteristics as the taxi environment for
both the Q-learning and SARSA algorithms using the greedy action selection
method. Therefore, your Q-table will not be updated during testing for the
new steps.
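Saving and reloading a Q-table, together with greedy action selection, might look like the sketch below. The Q-table format is left open by the assignment, so the choice of JSON and a list-of-lists table here is an assumption made to keep the example dependency-free; NumPy's `np.save` and `np.load` would serve equally well for array-based tables. All function names are hypothetical.

```python
import json

def save_q_table(q, path):
    """Write the Q-table (a list of per-state action-value lists) to disk."""
    with open(path, "w") as f:
        json.dump(q, f)

def load_q_table(path):
    """Read a Q-table previously written by save_q_table."""
    with open(path) as f:
        return json.load(f)

def greedy_action(q, state):
    """Return the index of the highest-valued action; no exploration, no update."""
    row = q[state]
    return max(range(len(row)), key=row.__getitem__)
```

Because testing only reads from the table, nothing in the test loop should call an update function.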
Your code should be able to visualize the trained agent for both the
Q-learning and SARSA algorithms. This means you should render the
“Taxi-v3” environment (you can use the “ansi” mode) and run your trained agent
from a random position. You should present the steps your agent is taking
and how the reward changes from one state to another. An example of the
visualized agent is shown in Fig. 7, where only the first six steps of the taxi
are displayed.
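A visualization loop along these lines could be sketched as follows. It assumes the Gym 0.26 or later API, in which `reset()` returns `(observation, info)` and `step()` returns five values, and an environment created in “ansi” mode so that `render()` returns a string. The function name and the printed format are hypothetical.

```python
# Action names in the order given by Table 1.
ACTION_NAMES = ["South", "North", "East", "West", "Pickup", "Drop off"]

def replay_greedy(env, q, max_steps=100):
    """Run one greedy episode, printing every frame with its reward."""
    state, _ = env.reset()   # assumes Gym >= 0.26: (observation, info)
    print(env.render())      # initial frame; a string in "ansi" mode
    total_reward = 0
    history = []
    for step in range(1, max_steps + 1):
        # Greedy choice from the saved Q-table; the table is never updated.
        action = max(range(len(q[state])), key=lambda a: q[state][a])
        state, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        name = ACTION_NAMES[action]
        history.append((step, name, reward, total_reward))
        print(f"Step {step}: {name}, reward {reward}, total {total_reward}")
        print(env.render())
        if terminated or truncated:
            break
    return history
```

With Gym installed, the environment could be created as in the earlier snippet, `gym.make("Taxi-v3", render_mode="ansi")`, and passed in together with a loaded Q-table.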
2 Testing and discussing your code
As part of the assignment evaluation, your code will be tested by tutors
along with you in a discussion carried out in the tutorial session in week 10.
The assignment has a total of 25 marks. The discussion is mandatory and,
therefore, we will not mark any assignment not discussed with tutors.
Figure 7: The first six steps of a trained agent (taxi) based on Q-learning
algorithm.
Before your discussion session, you should prepare the necessary code for
this purpose by loading your Q-table and the “Taxi-v3” environment. You
should be able to calculate the average number of steps per episode and the
average accumulated reward (for a maximum of 100 steps for each episode)
for the test episodes (using the greedy action selection method).
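The two averages could be computed with a test loop like the one sketched here. It assumes the Gym 0.26 or later API (`reset()` returning `(observation, info)` and `step()` returning five values) and a Q-table indexable as `q[state][action]`; the function names are hypothetical.

```python
def run_greedy_episode(env, q, max_steps=100):
    """Play one episode greedily; return (accumulated reward, steps taken)."""
    state, _ = env.reset()
    total_reward, steps = 0, 0
    for _ in range(max_steps):
        action = max(range(len(q[state])), key=lambda a: q[state][a])
        state, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        steps += 1
        if terminated or truncated:
            break
    return total_reward, steps

def evaluate(env, q, episodes=100, max_steps=100):
    """Average accumulated reward and average steps over the test episodes."""
    results = [run_greedy_episode(env, q, max_steps) for _ in range(episodes)]
    avg_reward = sum(r for r, _ in results) / episodes
    avg_steps = sum(s for _, s in results) / episodes
    return avg_reward, avg_steps
```

An episode that never delivers the passenger simply stops at `max_steps`, so it contributes 100 steps and a strongly negative reward to the averages.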
You are expected to propose and build your algorithms for the taxi
navigation task. You will receive marks for each of these subsections as shown
in Table 2. Except for what has been mentioned in the previous section, it is
fine if you want to include any other outcome to highlight particular aspects
when testing and discussing your code with your tutor.
For both Q-learning and SARSA algorithms, your tutor will consider the
average accumulated reward and the average number of steps taken for the
test episodes in the environment for a maximum of 100 steps for each episode.
For your Q-learning algorithm, the agent should perform at most 13 steps per
episode on average and obtain an average accumulated reward of at least 7.
Numbers worse than that will result in a score of 0 marks for that specific
section. For your SARSA algorithm, the agent should perform at most 15 steps
per episode on average and obtain an average accumulated reward of at least 5.
Numbers worse than that will result in a score of 0 marks for that specific
section.
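The pass criteria above can be written down as a small check to run before the discussion session. The numbers are taken from the text; the dictionary keys and the function name are hypothetical.

```python
# Thresholds from the assignment text: (max average steps, min average reward).
THRESHOLDS = {"q_learning": (13, 7), "sarsa": (15, 5)}

def meets_thresholds(algorithm, avg_steps, avg_reward):
    """True if the test averages satisfy the stated limits for the algorithm."""
    max_steps, min_reward = THRESHOLDS[algorithm]
    return avg_steps <= max_steps and avg_reward >= min_reward
```

For example, `meets_thresholds("sarsa", avg_steps, avg_reward)` could be called on the averages measured over the test episodes.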
Finally, you will receive 1 mark for code readability for each task, and
your tutor will also give you a maximum of 5 marks for each task depending
on the level of code understanding as follows: 5. Outstanding, 4. Great,
3. Fair, 2. Low, 1. Deficient, 0. No answer.
Table 2: Marks for each task.
Task                                                          Marks
Results obtained from agent learning
  Accumulated rewards and steps per episode plots for
  Q-learning algorithm.                                       2 marks
  Accumulated rewards and steps per episode plots for
  SARSA algorithm.                                            2 marks
Results obtained from testing the trained agent
  Average accumulated rewards and average steps per episode
  for Q-learning algorithm.                                   2.5 marks
  Average accumulated rewards and average steps per episode
  for SARSA algorithm.                                        2.5 marks
  Visualizing the trained agent for Q-learning algorithm.     2 marks
  Visualizing the trained agent for SARSA algorithm.          2 marks
Code understanding and discussion
  Code readability for Q-learning algorithm                   1 mark
  Code readability for SARSA algorithm                        1 mark
  Code understanding and discussion for Q-learning algorithm  5 marks
  Code understanding and discussion for SARSA algorithm       5 marks
Total marks                                                   25 marks
3 Submitting your assignment
The assignment must be done individually. You must submit your assignment
solution via Moodle. This will consist of a single .zip file containing three
files: the .ipynb Jupyter notebook and your saved Q-tables for Q-learning and
SARSA (you can choose the format for the Q-tables). Remember that the files
with your Q-tables will be loaded during your discussion session to run the
test episodes. Therefore, you should also provide a script in your Python
code at submission to perform these tests. Additionally, your code should
include short text descriptions to help markers better understand your code.
Please be mindful that providing clean and easy-to-read code is a part of
your assignment.
Please indicate your full name and your zID at the top of the file as a
comment. You can submit as many times as you like before the deadline;
later submissions overwrite earlier ones. After submitting your file, it is
good practice to take a screenshot of it for future reference.
Late submission penalty: UNSW has a standard late submission
penalty of 5% of your mark per day, capped at five days from the
assessment deadline. After that, students cannot submit the assignment.
4 Deadline and questions
Deadline: Week 9, Wednesday, 24 July 2024, 11:55 PM. Please use the
forum on Moodle to ask questions related to the project. We will prioritise
questions asked in the forum. However, you should not share your code there,
to avoid making it public and enabling plagiarism. If your question requires
sharing code, use the course email cs9414@cse.unsw.edu.au instead.
Although we try to answer questions as quickly as possible, we might take
up to 1 or 2 business days to reply. Therefore, last-minute questions might
not be answered in time.
For any questions regarding the discussion sessions, please contact your
tutor directly. Your tutor's email address is listed in Table 3.
5 Plagiarism policy
Your program must be entirely your own work. Plagiarism detection software
might be used to compare submissions pairwise (including submissions for
any similar projects from previous years) and serious penalties will be applied,
particularly in the case of repeat offences.
Do not copy from others. Do not allow anyone to see your code.
Please refer to the UNSW Policy on Academic Honesty and Plagiarism if you
require further clarification on this matter.