AMD Instinct MI200 Accelerator Firmware Update Tool (amdfwflash) 
User Guide V 2.0
================================================================

Introduction
================================================================
This document provides step-by-step instructions for updating the IFWI 
(Integrated Firmware Image) and RMFW (Remote Management Firmware) images
using the AMD FW Flash tool (amdfwflash) on the AMD Instinct MI200 server 
platforms.

The amdfwflash tool v2.0 is delivered with four versions of IFWI and RMFW:
	* Maintenance Update#1 (mu1) 
	* Maintenance Update#2 (mu2)
	* Maintenance Update#3 (mu3) 
	* General Availability (GA) 
By default, the tool updates to the most recent version, Maintenance Update#3 (mu3).

The tool can also update or roll back your IFWI and/or RMFW to a desired
version. For instance, it can update your MI200 platform from the GA version
to Maintenance Update#1 (mu1) or Maintenance Update#2 (mu2). The steps to be
followed are outlined in this document.

NOTE:
The amdfwflash tool is not intended to be used in a Virtual Machine/Guest 
operating system (OS) environment.

WARNING:
Using the amdfwflash tool in a Virtual Machine/Guest OS is an unsupported
configuration and may result in undefined behavior.
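
A script that wraps the tool can check for a virtualized environment before
doing anything else. The following is a minimal sketch, assuming a
systemd-based distribution where the systemd-detect-virt command is available.
The function name is_virtualized is illustrative and is not part of amdfwflash.

```shell
# Sketch: decide whether a virtualization type string means "VM or container".
# systemd-detect-virt prints "none" on bare metal. An empty string means the
# command was unavailable, which this sketch treats as "not detected".
is_virtualized() {
    case "$1" in
        ""|none) return 1 ;;
        *)       return 0 ;;
    esac
}

# systemd-detect-virt exits non-zero on bare metal, so ignore its exit status.
virt_type="$(systemd-detect-virt 2>/dev/null || true)"

if is_virtualized "$virt_type"; then
    echo "Virtualized environment detected ($virt_type); do not run amdfwflash here." >&2
else
    echo "No virtualization detected (reported type: ${virt_type:-unknown})."
fi
```

An empty result only means that nothing was detected, so it does not prove
that the host is bare metal.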


Getting Started
================================================================
Prior to updating the FW, follow the instructions below:

* Identify the server with the AMD Instinct MI200 accelerator(s) requiring 
a FW update or GPU replacement.
* Ensure that you have the appropriate login credentials for the server.

NOTE: To execute the firmware update tool, you must have sudo or root permissions
on the server.

* To access the system console, make sure you have access to the BMC/IPMI interface.
* Ensure network access to the AMD FW Flash tool repository
(repo.radeon.com).
* Ensure that all applications are closed prior to launching the tool and that 
no Operating System (OS) updates are pending in the background. 
Notify server users about the server maintenance for the FW update.

NOTE: It is strongly recommended to run the FW update tool from the system console
rather than over a network session, so that a network interruption or lost
connection cannot disrupt the update.
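
Some of the prerequisites above can be checked from a shell before starting.
The following is a sketch using only standard utilities. The function names
(have_cmd, is_root, repo_reachable, preflight) are illustrative, and the
repository check assumes curl is installed.

```shell
# Sketch of a pre-flight check for the prerequisites listed above.

# Succeeds when the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Succeeds when running as root (effective UID 0).
is_root() {
    [ "$(id -u)" -eq 0 ]
}

# Succeeds when an HTTPS request to the repository host completes.
# Not called automatically because it needs network access and curl.
repo_reachable() {
    curl --silent --head --max-time 10 --output /dev/null "https://repo.radeon.com/"
}

# Reports whether the tool can be run with sufficient privileges.
preflight() {
    if is_root; then
        echo "OK: running as root"
    elif have_cmd sudo; then
        echo "NOTE: not root; sudo is available and will be needed"
    else
        echo "FAIL: neither root nor sudo is available"
        return 1
    fi
}
```

A typical call would be "preflight && repo_reachable". The remaining items
(BMC/IPMI access, closed applications, user notification) still need to be
confirmed manually.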

1)  Instructions 
================================================================
To update the FW on AMD Instinct MI200 Accelerator(s), or when replacing AMD Instinct 
MI200 Accelerator(s) in a server, first configure the system for FW maintenance. Once 
the system is configured for FW maintenance, execute the amdfwflash command to 
update or roll back the IFWI and/or RMFW to a desired version. For RMFW updates, the 
driver must be loaded.
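
The driver requirement can be checked before an RMFW update. The following is
a sketch that assumes "the driver" refers to the amdgpu kernel module, which
is the module named in the verification section of this guide. The function
name module_loaded is illustrative.

```shell
# Sketch: check whether a kernel module appears in a /proc/modules-style file.
# Assumption: the driver required for RMFW updates is the amdgpu module.
# $1 = module name, $2 = modules file (defaults to /proc/modules).
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}" 2>/dev/null
}

if module_loaded amdgpu; then
    echo "amdgpu is loaded"
else
    echo "amdgpu is not loaded; load it with: sudo modprobe amdgpu"
fi
```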

2)  Configuring the System for FW Maintenance or AMD Instinct MI200 Replacement
================================================================
Download and install the AMD FW Flash Tool from the repo.radeon.com repository:
1. Log in to the server with the MI200 GPUs requiring a FW update.
   $ ssh user@mi200_server

2. Set up the AMD FW Flash Tool Package repository.
* Set up Ubuntu OS apt repo:
   Step 1:
   wget -q -O - https://repo.radeon.com/fwupdater/amdfw.gpg.key | sudo apt-key add -

   Step 2:
   echo 'deb [arch=amd64] https://repo.radeon.com/fwupdater/amdfwflash/2.0/deb/ ubuntu main' | sudo tee /etc/apt/sources.list.d/amdfwflash.list

* Set up RHEL 8 or RHEL 9 yum repo:

echo -e '[amdfwflash]\nname=amdfwflash\nenabled=1\nautorefresh=0\ngpgkey=https://repo.radeon.com/fwupdater/amdfw.gpg.key\nbaseurl=https://repo.radeon.com/fwupdater/amdfwflash/2.0/rpm\ngpgcheck=1' | sudo tee /etc/yum.repos.d/amdfwflash.repo

* Set up SLES 15 SP3 or SP4 zypper repo:

echo -e '[amdfwflash]\nenabled=1\nautorefresh=0\ngpgkey=https://repo.radeon.com/fwupdater/amdfw.gpg.key\nbaseurl=https://repo.radeon.com/fwupdater/amdfwflash/2.0/rpm\ntype=rpm-md\ngpgcheck=1' | sudo tee /etc/zypp/repos.d/amdfwflash.repo

3. Update the AMD FW Flash Tool package repository.
* Ubuntu OS
    sudo apt update

  To verify, search for the `amdfwflash` package:
    sudo apt search amdfwflash

* RHEL 8 or RHEL 9
    sudo yum update

  To verify, search for the `amdfwflash` package:
    sudo yum search amdfwflash

* SLES 15 SP3 or SP4
    sudo zypper update

  To verify, search for the `amdfwflash` package:
    sudo zypper search amdfwflash

4. Install the AMD FW Flash Tool package.
* Ubuntu OS
    sudo apt install amdfwflash

* RHEL 8 or RHEL 9
    sudo yum install amdfwflash

* SLES 15 SP3 or SP4
    sudo zypper install amdfwflash

5. Verify the installation of AMD FW Flash tool package. 
* Ubuntu OS
    dpkg -l | grep amdfwflash

* RHEL 8 or RHEL 9
    rpm -qa | grep amdfwflash

* SLES 15 SP3 or SP4
    rpm -qa | grep amdfwflash

6. Reboot the server for FW maintenance update or power off to replace the MI200 GPUs.
    sudo reboot
OR
    sudo poweroff

NOTE: If there is a replacement of the AMD Instinct MI200 Accelerator in the system, 
power off the system.

Refer to the section "Updating and Rolling Back the AMD Instinct MI200 FW Version" to 
update or roll back the AMD Instinct MI200 FW to a desired version.
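
The per-distribution steps above differ only in the package manager. A script
can pick the right one from the ID field of /etc/os-release, as in the sketch
below. The function names pkg_manager_for and os_id are illustrative, and the
mapping covers only the three distributions listed in this guide.

```shell
# Sketch: map the ID field of /etc/os-release to the package manager used in
# the installation steps of this guide.
pkg_manager_for() {
    case "$1" in
        ubuntu) echo apt ;;
        rhel)   echo yum ;;
        sles)   echo zypper ;;
        *)      return 1 ;;
    esac
}

# Read ID from an os-release file in a subshell, so that the variables it
# defines do not leak into the calling shell.
os_id() {
    ( . "${1:-/etc/os-release}" && printf '%s\n' "$ID" )
}

if pm="$(pkg_manager_for "$(os_id 2>/dev/null)")"; then
    echo "Use: sudo $pm install amdfwflash"
else
    echo "Unsupported or undetected distribution; see the supported list above." >&2
fi
```

The sketch only prints the install command for review. The repository setup
from step 2 still has to be done for the detected distribution first.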

3)  Updating and Rolling Back the AMD Instinct MI200 FW Version
================================================================
Follow the steps below to update or roll back the AMD Instinct MI200 FW to a desired version.

3.1 Updating to an MI200 Maintenance Update Version:
1. Log in to the BMC/IPMI interface of the server identified for the FW update.

2. Launch the remote/virtual console on the server.

3. Log in to the server. (NOTE: You must have sudo or root permissions to execute
the amdfwflash tool to update the IFWI on MI200 GPUs.)

4. Run the amdfwflash utility to list the GPU devices.
    sudo /opt/amdfwflash/sbin/amdfwflash --list-devices

NOTE: The output should list all the GPU devices in the system. If the output 
does not list all the GPU devices, contact Customer Care.

5. Run the amdfwflash utility to update the IFWI and/or RMFW of all GPUs in the system
to the latest MI200 Maintenance Update#3 version.
    sudo /opt/amdfwflash/sbin/amdfwflash --update-ifwi
OR
    sudo /opt/amdfwflash/sbin/amdfwflash --update-ifwi mu3

    sudo /opt/amdfwflash/sbin/amdfwflash --update-rmfw
OR
    sudo /opt/amdfwflash/sbin/amdfwflash --update-rmfw mu3

6. Alternatively, to update the IFWI and/or RMFW of all GPUs in the system
to the MI200 Maintenance Update#1 version, run:
    sudo /opt/amdfwflash/sbin/amdfwflash --update-ifwi mu1

    sudo /opt/amdfwflash/sbin/amdfwflash --update-rmfw mu1

7. Alternatively, to update the IFWI and/or RMFW of all GPUs in the system
to the MI200 Maintenance Update#2 version, run:
    sudo /opt/amdfwflash/sbin/amdfwflash --update-ifwi mu2

    sudo /opt/amdfwflash/sbin/amdfwflash --update-rmfw mu2

8. Save the system log and console output to a file.

9. The amdfwflash tool saves a copy of the old IFWI and/or RMFW images under /tmp before
updating. Archive the generated FW images from /tmp folder for later
reference.
    tar cvf ifwi-backup.tar /tmp/amdfwflash/ifwi/backup

    tar cvf rmfw-backup.tar /tmp/amdfwflash/rmfw/backup

10. Reboot the server (an AC power cycle is recommended) to make the FW update effective.
    sudo reboot
OR
    sudo ipmitool power cycle

11. Refer to the section "Verifying the AMD Instinct MI200 FW Version" to complete 
the FW update. After a successful verification of the FW update, the server may resume normal operation.
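
Because the update commands differ only in the image type and the target
version, a small wrapper can validate both before anything is flashed. The
sketch below only prints the command (a dry run). It uses the flags and
version names shown in the steps above and nothing else. The function name
build_update_cmd is illustrative.

```shell
# Sketch: validate arguments and print the amdfwflash update command.
AMDFWFLASH=/opt/amdfwflash/sbin/amdfwflash

# $1 = image type (ifwi or rmfw), $2 = target version (mu1, mu2, or mu3).
build_update_cmd() {
    case "$1" in
        ifwi|rmfw) ;;
        *) echo "unknown image type: $1" >&2; return 1 ;;
    esac
    case "$2" in
        mu1|mu2|mu3) ;;
        *) echo "unknown version: $2" >&2; return 1 ;;
    esac
    echo "sudo $AMDFWFLASH --update-$1 $2"
}

# Dry run: print the commands for review instead of executing them.
build_update_cmd ifwi mu2
build_update_cmd rmfw mu2
```

To capture the console output for step 8, one option is to pipe the real
command through tee, for example
"sudo /opt/amdfwflash/sbin/amdfwflash --update-ifwi mu2 2>&1 | tee ifwi-update.log"
(the log file name is arbitrary).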

3.2 Rolling Back to an Earlier MI200 FW Version:
1. Log in to the BMC/IPMI interface of the server identified for the FW rollback.

2. Launch the remote/virtual console on the server.

3. Log in to the server. (NOTE: You must have sudo or root permissions to 
execute the amdfwflash tool to roll back the FW on MI200 GPUs.)

4. Run the amdfwflash utility to list the GPU devices.
    sudo /opt/amdfwflash/sbin/amdfwflash --list-devices

NOTE: The output should list all the GPU devices in the system. If the output 
does not list all the GPU devices, contact Customer Care.

5. Run amdfwflash to roll back the IFWI and/or RMFW of all GPUs to the 
GA version.
    sudo /opt/amdfwflash/sbin/amdfwflash --rollback-ifwi

    sudo /opt/amdfwflash/sbin/amdfwflash --rollback-rmfw

6. Alternatively, run amdfwflash to roll back the IFWI and/or RMFW of all GPUs
from the mu2 version to the mu1 version.
    sudo /opt/amdfwflash/sbin/amdfwflash --rollback-ifwi mu1

    sudo /opt/amdfwflash/sbin/amdfwflash --rollback-rmfw mu1

7. Alternatively, run amdfwflash to roll back the IFWI and/or RMFW of all GPUs
from the mu3 version to the mu2 version.
    sudo /opt/amdfwflash/sbin/amdfwflash --rollback-ifwi mu2

    sudo /opt/amdfwflash/sbin/amdfwflash --rollback-rmfw mu2

8. Save the system log and console output to a file.

9. The amdfwflash tool saves a copy of the old IFWI and/or RMFW images under /tmp
before flashing. Archive the saved FW images from the /tmp folder for later reference.
    tar cvf ifwi-backup.tar /tmp/amdfwflash/ifwi/backup

    tar cvf rmfw-backup.tar /tmp/amdfwflash/rmfw/backup

10. Reboot the server (an AC power cycle is recommended) to make the FW rollback
effective.
    sudo reboot
OR
    sudo ipmitool power cycle

11. Refer to the section "Verifying the AMD Instinct MI200 FW Version" to 
complete the FW rollback. After a successful verification of the FW version, the
server may resume normal operation.
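
The archive step (step 9 in both procedures) can be made more robust by
checking that the backup directory exists and by giving each archive a unique
name. The following is a sketch. The function name archive_fw_backup and the
archive naming scheme are illustrative choices, not something the tool
prescribes.

```shell
# Sketch: archive a backup directory into a tar file named after the image
# type, the host name, and the current time.
# $1 = backup directory, $2 = destination directory for the archive.
# Prints the path of the archive it created.
archive_fw_backup() {
    src="$1"
    dest="$2"
    if [ ! -d "$src" ]; then
        echo "backup directory not found: $src" >&2
        return 1
    fi
    parent="$(dirname "$src")"
    kind="$(basename "$parent")"            # e.g. "ifwi" or "rmfw"
    out="$dest/$kind-backup-$(uname -n)-$(date +%Y%m%d-%H%M%S).tar"
    # -C keeps the paths inside the archive relative to the parent directory.
    tar -cf "$out" -C "$parent" "$(basename "$src")" && echo "$out"
}
```

For example, "archive_fw_backup /tmp/amdfwflash/ifwi/backup "$HOME"" would
archive the IFWI backup into the home directory. Since /tmp may be cleared on
reboot on some systems, it is safest to run this before step 10.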

4)  Verifying the AMD Instinct MI200 FW Version
================================================================
1. Log in to the system.

2. Run the amdfwflash utility to list the GPU devices.
    sudo /opt/amdfwflash/sbin/amdfwflash --list-devices

NOTE: The output should list all the GPU devices in the system. If the output
does not list all the GPU devices, contact Customer Care.

3. If the AMD ROCm software is installed, run the rocm-smi --showhw command to 
display the IFWI version under the VBIOS column.
    /opt/rocm/bin/rocm-smi --showhw

NOTE: If your environment has blacklisted the amdgpu driver for normal operation,
run the following command to load the driver before executing rocm-smi.
    sudo modprobe amdgpu

4. Verify that all the MI200 GPUs have been updated to the same IFWI and RMFW
versions.

NOTE: In the event of a console output error, contact Customer Care. After
successful verification of the FW update, the server may resume normal operation.
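
The comparison in step 4 can be automated once the version strings have been
collected, one per GPU. The sketch below checks only that all lines of its
input are identical. How the version strings are extracted from the
amdfwflash or rocm-smi output is left open here, because this guide does not
specify the exact output layout. The function name all_versions_match is
illustrative.

```shell
# Sketch: succeed only when every non-empty line on stdin is identical.
# Input: one firmware version string per GPU, one per line.
# Empty input (no GPUs reported) is treated as a failure.
all_versions_match() {
    count="$(grep -v '^[[:space:]]*$' | sort -u | wc -l | tr -d '[:space:]')"
    [ "$count" -eq 1 ]
}
```

For example, with the version strings saved to a file named versions.txt (a
hypothetical name), "all_versions_match < versions.txt && echo consistent"
reports whether every GPU carries the same version.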

5)  Uninstalling the amdfwflash Tool Package
================================================================
Uninstall the amdfwflash tool package.
* Ubuntu OS
    sudo apt remove amdfwflash

* RHEL 8 or RHEL 9
    sudo yum remove amdfwflash

* SLES 15 SP3 or SP4
    sudo zypper rm amdfwflash
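
Removing the package does not remove the repository file that was created
during setup. As an optional cleanup, the sketch below prints the matching
file for each package manager, using the paths from the repository setup step
in this guide. The function name repo_file_for is illustrative, and the
commands are printed for review rather than executed.

```shell
# Sketch: print the repository file created during setup for a package manager.
repo_file_for() {
    case "$1" in
        apt)    echo /etc/apt/sources.list.d/amdfwflash.list ;;
        yum)    echo /etc/yum.repos.d/amdfwflash.repo ;;
        zypper) echo /etc/zypp/repos.d/amdfwflash.repo ;;
        *)      return 1 ;;
    esac
}

# Print (do not run) the cleanup command for each supported package manager.
for pm in apt yum zypper; do
    echo "$pm: sudo rm -f $(repo_file_for "$pm")"
done
```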

6)  Replacing the AMD Instinct MI200 GPU (RMA)
================================================================
The IFWI and RMFW versions of all AMD Instinct MI200 Accelerators within
a system must be identical for the system to work properly.

1. When replacing AMD Instinct MI200 Accelerator(s) in a system, the system
must first be configured for the AMD Instinct MI200 replacement. 
Refer to the section "Configuring the System for FW Maintenance or AMD 
Instinct MI200 Replacement" for steps on how to configure the system. 

2. Once the system is configured for the AMD Instinct MI200 replacement, power
off the system and replace the AMD Instinct MI200 Accelerator(s) according to
the assembly instruction manual.

3. After replacing the AMD Instinct MI200 Accelerator, power on the system and
follow the steps in "Updating and Rolling Back the AMD Instinct MI200 FW 
Version" to update or roll back the IFWI and/or RMFW on all AMD Instinct MI200 
Accelerator(s) to a desired version.

7)  Additional Support
================================================================
If you have questions or need any additional information, please contact your AMD
Representative. You may also submit a question at Online Service Request 
(https://www.amd.com/en/support/contact-email-form) using the keyword 
amdfwflash in the subject line.

8)  References
================================================================
For additional information, please refer to the following web sites:
1. System Administration Guide: https://documentation.suse.com/sles/15-SP4/html/SLES-all/cha-mod.html
2. Knowledge base site: https://access.redhat.com/solutions/41278


***End of Text***