      COMP9414 24T2
      Artificial Intelligence
      Assignment 2 - Reinforcement Learning
      Due: Week 9, Wednesday, 26 July 2024, 11:55 PM.
      1 Problem context
      Taxi Navigation with Reinforcement Learning: In this assignment,
      you are asked to implement Q-learning and SARSA methods for a taxi nav-
      igation problem. To run your experiments and test your code, you should
      make use of the Gym library, an open-source Python library for developing
      and comparing reinforcement learning algorithms. You can install Gym on
      your computer simply by using the following command in your command
      prompt:
      pip install gym
      In the taxi navigation problem, there are four designated locations in the
      grid world indicated by R(ed), G(reen), Y(ellow), and B(lue). When the
      episode starts, one taxi starts off at a random square and the passenger is
      at a random location (one of the four specified locations). The taxi drives
      to the passenger’s location, picks up the passenger, drives to the passenger’s
      destination (another one of the four specified locations), and then drops off
      the passenger. Once the passenger is dropped off, the episode ends. To show
      the taxi grid world environment, you can use the following code:

      env = gym.make("Taxi-v3", render_mode="ansi").env
      state = env.reset()
      rendered_env = env.render()
      print(rendered_env)
      In order to render the environment, there are three modes known as
      “human”, “rgb_array”, and “ansi”. The “human” mode visualizes the
      environment in a way suitable for human viewing, and the output is a graphical
      window that displays the current state of the environment (see Fig. 1). The
      “rgb_array” mode provides the environment’s state as an RGB image, and
      the output is a numpy array representing the RGB image of the environment.
      The “ansi” mode provides a text-based representation of the environment’s
      state, and the output is a string that represents the current state of the
      environment using ASCII characters (see Fig. 2).
      Figure 1: “human” mode presentation for the taxi navigation problem in
      Gym library.
      You are free to choose the presentation mode between “human” and
      “ansi”, but for simplicity, we recommend “ansi” mode. Based on the given
      description, there are six discrete deterministic actions that are presented in
      Table 1.
      For this assignment, you need to implement the Q-learning and SARSA
      algorithms for the taxi navigation environment. The main objective for this
      assignment is for the agent (taxi) to learn how to navigate the grid world
      and deliver the passenger in the minimum possible number of steps. To accomplish
      the learning task, you should empirically determine hyperparameters, e.g.,
      the learning rate α, exploration parameters (such as ε or T), and discount
      factor γ for your algorithm. Your agent should be penalized -1 per step it
      Figure 2: “ansi” mode presentation for the taxi navigation problem in Gym
      library. Gold represents the taxi location, blue is the pickup location, and
      purple is the drop-off location.
      Table 1: Six possible actions in the taxi navigation environment.

      Action               Action number
      Move South           0
      Move North           1
      Move East            2
      Move West            3
      Pickup Passenger     4
      Drop off Passenger   5
      takes, receive a +20 reward for delivering the passenger, and incur a -10
      penalty for executing “pickup” and “drop-off” actions illegally. You should
      try different exploration parameters to find the best value for exploration
      and exploitation balance.
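As a rough sketch of how these pieces fit together, the two update rules and an ε-greedy action selection could look like the following. The function names and the 500 × 6 table shape (Taxi-v3 has 500 discrete states and the six actions above) are our own choices, not part of the assignment:

```python
import numpy as np

def epsilon_greedy(q_table, state, epsilon, rng):
    """With probability epsilon take a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(q_table.shape[1]))
    return int(np.argmax(q_table[state]))

def q_learning_update(q, s, a, r, s_next, alpha, gamma):
    # Off-policy: bootstrap from the best action in the next state.
    q[s, a] += alpha * (r + gamma * np.max(q[s_next]) - q[s, a])

def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma):
    # On-policy: bootstrap from the action that will actually be taken next.
    q[s, a] += alpha * (r + gamma * q[s_next, a_next] - q[s, a])

# Tabular Q for Taxi-v3: 500 states x 6 actions, initialized to zero.
q_table = np.zeros((500, 6))
```

The only difference between the two algorithms is the bootstrap target: Q-learning uses the maximum over next-state values, while SARSA uses the value of the next action your exploration policy actually selects.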
      As an outcome, you should plot the accumulated reward per episode and
      the number of steps taken by the agent in each episode for at least 1000
      learning episodes for both the Q-learning and SARSA algorithms. Examples
      of these two plots are shown in Figures 3–6. Please note that the provided
      plots are just examples and, therefore, your plots will not be exactly like the
      provided ones, as the learning parameters will differ for your algorithm.
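One way to produce these two plots is sketched below with matplotlib; the function name, figure layout, and file output are our own choices:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt

def plot_learning_curves(rewards, steps, algo_name, filename):
    """Plot accumulated reward and steps taken per episode, side by side."""
    fig, (ax_r, ax_s) = plt.subplots(1, 2, figsize=(10, 4))
    ax_r.plot(rewards)
    ax_r.set_xlabel("Episode")
    ax_r.set_ylabel("Accumulated reward")
    ax_r.set_title(f"{algo_name} reward")
    ax_s.plot(steps)
    ax_s.set_xlabel("Episode")
    ax_s.set_ylabel("Steps per episode")
    ax_s.set_title(f"{algo_name} steps")
    fig.tight_layout()
    fig.savefig(filename)
    plt.close(fig)
```

Called once per algorithm (e.g. with the per-episode lists accumulated over 1000 training episodes), this yields one figure per algorithm matching the layout of Figures 3–6.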
      After training your algorithm, you should save your Q-values. Based on
      your saved Q-table, your algorithms will be tested on at least 100 random
      grid-world scenarios with the same characteristics as the taxi environment for
      both the Q-learning and SARSA algorithms using the greedy action selection
      Figure 3: Q-learning reward. Figure 4: Q-learning steps.
      Figure 5: SARSA reward. Figure 6: SARSA steps.
      method. Therefore, your Q-table will not be updated during testing for the
      new steps.
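Saving and reloading the table can be as simple as NumPy's binary format; the file name and the pretend table values below are illustrative only:

```python
import numpy as np

# Hypothetical trained table for Taxi-v3: 500 states x 6 actions.
q_table = np.zeros((500, 6))
q_table[42, 3] = 1.0  # pretend state 42 prefers action 3 (Move West)

# After training: persist the table so it can be reloaded for the test episodes.
np.save("q_table_qlearning.npy", q_table)

# At test time: reload and act greedily; the table is never updated here.
loaded = np.load("q_table_qlearning.npy")
greedy_action = int(np.argmax(loaded[42]))
```

Any format works for the submission (the assignment leaves it open), but `.npy` round-trips the array shape and dtype without extra code.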
      Your code should be able to visualize the trained agent for both the Q-
      learning and SARSA algorithms. This means you should render the “Taxi-
      v3” environment (you can use the “ansi” mode) and run your trained agent
      from a random position. You should present the steps your agent is taking
      and how the reward changes from one state to another. An example of the
      visualized agent is shown in Fig. 7, where only the first six steps of the taxi
      are displayed.
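A replay loop along these lines could produce that output. This is a sketch assuming the older Gym step API that returns a 4-tuple and a `render()` that returns a string in “ansi” mode; with gym ≥ 0.26 or gymnasium, `reset()` returns `(state, info)` and `step()` returns five values, so the unpacking must be adjusted:

```python
import numpy as np

def visualize_agent(env, q_table, max_steps=20):
    """Run the greedy policy from a random start, printing each rendered step."""
    state = env.reset()
    total_reward = 0
    for step in range(max_steps):
        action = int(np.argmax(q_table[state]))  # greedy action selection
        state, reward, done, _ = env.step(action)
        total_reward += reward
        print(f"Step {step + 1}: action={action}, reward={reward}, "
              f"total={total_reward}")
        print(env.render())
        if done:
            break
    return total_reward
```

Printing the per-step reward alongside the running total shows how the reward changes from one state to another, as the assignment asks.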
      2 Testing and discussing your code
      As part of the assignment evaluation, your code will be tested by tutors
      together with you in a discussion carried out during the tutorial session in
      week 10. The assignment has a total of 25 marks. The discussion is mandatory
      and, therefore, we will not mark any assignment that is not discussed with tutors.
      Before your discussion session, you should prepare the necessary code for
      this purpose by loading your Q-table and the “Taxi-v3” environment. You
      should be able to calculate the average number of steps per episode and the
      Figure 7: The first six steps of a trained agent (taxi) based on Q-learning
      algorithm.
      average accumulated reward (for a maximum of 100 steps for each episode)
      for the test episodes (using the greedy action selection method).
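The test-time averages could be computed along these lines; again a sketch against the older 4-tuple `step()` API, and `evaluate_agent` is our own name:

```python
import numpy as np

def evaluate_agent(env, q_table, n_episodes=100, max_steps=100):
    """Greedy rollouts; returns (avg_steps, avg_reward) over n_episodes."""
    steps_list, reward_list = [], []
    for _ in range(n_episodes):
        state = env.reset()
        total_reward, steps = 0, 0
        for _ in range(max_steps):
            action = int(np.argmax(q_table[state]))  # greedy, no exploration
            state, reward, done, _ = env.step(action)
            total_reward += reward
            steps += 1
            if done:
                break
        steps_list.append(steps)
        reward_list.append(total_reward)
    return float(np.mean(steps_list)), float(np.mean(reward_list))
```

Capping each rollout at 100 steps matches the evaluation conditions described below, and keeps a poorly trained policy from looping forever.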
      You are expected to propose and build your algorithms for the taxi nav-
      igation task. You will receive marks for each of these subsections as shown
      in Table 2. Beyond what has been mentioned in the previous section, it is
      fine if you want to include any other outcomes to highlight particular aspects
      when testing and discussing your code with your tutor.
      For both Q-learning and SARSA algorithms, your tutor will consider the
      average accumulated reward and the average number of steps taken for the test episodes
      in the environment for a maximum of 100 steps for each episode. For your
      Q-learning algorithm, the agent should perform at most 13 steps per episode
      on average and obtain a minimum average accumulated reward of 7. Numbers
      worse than these will result in a score of 0 marks for that specific section.
      For your SARSA algorithm, the agent should perform at most 15 steps per
      episode on average and obtain a minimum average accumulated reward of 5.
      Numbers worse than these will result in a score of 0 marks for that specific
      section.
      Finally, you will receive 1 mark for code readability for each task, and
      your tutor will also give you a maximum of 5 marks for each task depending
      on the level of code understanding as follows: 5. Outstanding, 4. Great,
      3. Fair, 2. Low, 1. Deficient, 0. No answer.
      Table 2: Marks for each task.

      Task                                                              Marks
      Results obtained from agent learning
        Accumulated rewards and steps per episode plots for the
        Q-learning algorithm                                            2 marks
        Accumulated rewards and steps per episode plots for the
        SARSA algorithm                                                 2 marks
      Results obtained from testing the trained agent
        Average accumulated rewards and average steps per episode
        for the Q-learning algorithm                                    2.5 marks
        Average accumulated rewards and average steps per episode
        for the SARSA algorithm                                         2.5 marks
        Visualizing the trained agent for the Q-learning algorithm      2 marks
        Visualizing the trained agent for the SARSA algorithm           2 marks
      Code understanding and discussion
        Code readability for the Q-learning algorithm                   1 mark
        Code readability for the SARSA algorithm                        1 mark
        Code understanding and discussion for the Q-learning algorithm  5 marks
        Code understanding and discussion for the SARSA algorithm       5 marks
      Total marks                                                       25 marks
      3 Submitting your assignment
      The assignment must be done individually. You must submit your assignment
      solution via Moodle. The submission will consist of a single .zip file containing
      three files: your .ipynb Jupyter notebook and your saved Q-tables for Q-learning
      and SARSA (you can choose the format for the Q-tables). Remember that your
      Q-table files will be loaded during your discussion session to run the test
      episodes; therefore, your submitted Python code should also include a script to
      perform these tests. Additionally, your code should include short text
      descriptions to help markers better understand your code.
      Please be mindful that providing clean and easy-to-read code is a part of
      your assignment.
      Please indicate your full name and your zID at the top of the file as a
      comment. You can submit as many times as you like before the deadline –
      later submissions overwrite earlier ones. After submitting your file, a good
      practice is to take a screenshot of it for future reference.
      Late submission penalty: UNSW has a standard late submission
      penalty of 5% per day of your mark, capped at five days from the assessment
      deadline; after that, students cannot submit the assignment.
      4 Deadline and questions
      Deadline: Week 9, Wednesday, 24 July 2024, 11:55 PM. Please use the
      forum on Moodle to ask questions related to the project. We will prioritise
      questions asked in the forum. However, you should not share your code in the
      forum, to avoid making it public and enabling plagiarism. In that case, use
      the course email cs9414@cse.unsw.edu.au as an alternative.
      Although we try to answer questions as quickly as possible, we might take
      up to 1 or 2 business days to reply; therefore, last-minute questions might
      not be answered in time.
      For any questions regarding the discussion sessions, please contact your
      tutor directly. You can find your tutor's email address in Table 3.
      5 Plagiarism policy
      Your program must be entirely your own work. Plagiarism detection software
      might be used to compare submissions pairwise (including submissions for
      any similar projects from previous years) and serious penalties will be applied,
      particularly in the case of repeat offences.
      Do not copy from others. Do not allow anyone to see your code.
      Please refer to the UNSW Policy on Academic Honesty and Plagiarism if you
      require further clarification on this matter.
      主站蜘蛛池模板: 特级做A爰片毛片免费看无码| 国产成人无码免费看片软件 | 不卡无码人妻一区三区音频| 国产成人无码专区| 丰满熟妇人妻Av无码区| 无码中文人妻在线一区二区三区| 韩国无码AV片在线观看网站| 国产精品亚韩精品无码a在线| 亚洲熟妇无码八V在线播放| 精品无人区无码乱码大片国产| 无码人妻丝袜在线视频| 一区二区三区人妻无码| 精品少妇人妻AV无码专区不卡| 人妻丰满av无码中文字幕| 久久亚洲AV无码西西人体| 毛片一区二区三区无码| 日韩精品无码一本二本三本| 日韩va中文字幕无码电影| 国产精品无码一区二区三区免费| 亚洲av无码久久忘忧草| 亚洲av永久无码精品国产精品| 无码aⅴ精品一区二区三区| 亚洲人成人伊人成综合网无码| 亚洲av无码不卡| 国产产无码乱码精品久久鸭| 国产亚洲精久久久久久无码77777| 人妻系列无码专区久久五月天| 内射人妻无码色AV天堂| 人妻精品久久无码区| 亚洲AV日韩AV无码污污网站 | 麻豆亚洲AV成人无码久久精品| 无码精品A∨在线观看| 亚洲AV无码国产精品色午友在线| 亚洲午夜国产精品无码| 国精品无码一区二区三区左线 | 久久午夜夜伦鲁鲁片免费无码影视 | 亚洲国产精品无码久久一线| 人妻中文无码久热丝袜| 亚洲欧洲日产国码无码久久99| 国产V亚洲V天堂无码| 久久老子午夜精品无码|