November 16, 2018

FreeRTOS SDK v2.3 for i.MX8MQ

One of the many advantages of developing with NXP’s i.MX 8M Family of application processors is the ability to use both the Cortex-A53 and the Cortex-M4 cores. As a result, we have had questions about how to run FreeRTOS on the Cortex-M4 core of our Nitrogen8M SBC. This blog post first presents the architecture of the i.MX 8MQ processor as a starting point for the discussion, and then explains how to build and run the FreeRTOS SDK v2.3 on its MCU.

The i.MX 8M (Quad) processor couples a Cortex-A53 cluster (Core Complex 1, 1 to 4 cores) with a Cortex-M4 (Core Complex 2) to offer the best of the MPU and MCU worlds.

i.MX8M Block Diagram

For the impatient

You can download any of our currently available OS images for i.MX8MQ to use the Cortex-M4:

Then make sure you have the latest U-Boot (v2018.07, Nov. 14th 2018 or newer):

After the upgrade and resetting the board, you should see several m4 variables:

=> env default -a
=> printenv m4boot
m4boot=load ${devtype} ${devnum}:1 ${m4loadaddr} ${m4image}; dcache flush; bootaux ${m4loadaddr}
=> saveenv

Note that you can find pre-built versions of the examples here:

You can then start your first Hello World application on the Cortex-M4 manually (after copying one of the binaries above to your storage):

=> load mmc 0 $m4loadaddr hello_world.bin
=> dcache flush
=> bootaux $m4loadaddr

On the second serial port, you should see the following output:

Hello World!

Architecture 

As an introduction, here are the definitions of the terms used throughout this post:

  • MCU: Microcontroller Unit such as the ARM Cortex-M series, here referring to the Cortex-M4
  • MPU: Microprocessor Unit such as the ARM Cortex-A series, here referring to the Cortex-A53
  • RTOS: Real-Time Operating System such as FreeRTOS or MQX

The i.MX8M processors offer an MCU and an MPU in the same chip. This is called a Heterogeneous Multicore Processing Architecture.

How does it work?

The first thing to know is that one of the cores is the “master”, meaning that it is in charge of booting the other core, which otherwise stays in reset.

The BootROM always boots the Cortex-A core first. In this article, it is assumed that U-Boot is the bootloader used by your system, because U-Boot provides a bootaux command that starts the Cortex-M4.

Once started, both CPUs are on their own, executing different instructions at different speeds.

Where is the code running from?

It actually depends on the application linker script used. When GCC is linking your application into an ELF executable file, it needs to know the code location in memory.

There are several options. The code can be located in one of the following:

  • TCM (Tightly Coupled Memory): 128kB available
  • DDR: up to 1MB available (can be increased, set in the device tree)

Note that the TCM is the preferred option when possible: it is an internal memory dedicated to the Cortex-M4, so it offers the best performance.

External memories, such as the DDR, offer more space but are also much slower to access.

In this article, it is assumed that every application runs from the TCM.
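As a rough host-side check before choosing the TCM, the sketch below compares the size of a firmware image with the 128kB figure quoted above. The fits_in_tcm helper name is hypothetical, and the check is only approximate: a .bin file does not include zero-initialized data, stack or heap, so an image that passes may still not fit at run time.

```shell
# TCM size as quoted in this article (128kB), in bytes.
TCM_SIZE_BYTES=$((128 * 1024))

# Hypothetical helper: succeeds when the image is no larger than the TCM.
# Returns 2 if the file cannot be read.
fits_in_tcm() {
    # Reading from stdin makes wc print only the byte count.
    size=$(wc -c < "$1") || return 2
    [ "$size" -le "$TCM_SIZE_BYTES" ]
}
```

For example, `fits_in_tcm release/hello_world.bin && echo "fits in TCM"` could be run from the application's armgcc folder after a build.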

When is the MCU useful?

The MCU is well suited to real-time tasks, whereas the MPU can provide a great UI experience with a non-real-time OS such as GNU/Linux.

It is worth stressing that the Linux kernel is neither real-time nor deterministic, whereas FreeRTOS on the Cortex-M4 is.

Also, since its firmware is small and fast to load, the MCU can be fully operational within a few hundred milliseconds, whereas Linux usually takes much longer to become operational.

Examples of applications where the MCU has proven to be useful:

  • Motor control: DC motors only perform well in a real-time environment since feedback response time is crucial
  • Automotive: CAN messages can be handled by the MCU and operational at a very early stage

Resource Domain Controller (RDC)

Since both cores can access the same peripherals, a mechanism has been created to avoid concurrent access. It ensures that a program’s behavior on one core does not depend on what is executed or accessed on the other core.

This mechanism is the RDC. It can be used to grant peripheral and memory access permissions to each core.

The examples and demo applications in the FreeRTOS BSP use the RDC to allocate peripheral access permissions. When running an ARM Cortex-A application alongside a FreeRTOS BSP example/demo, it is important to respect the reserved peripherals.

The FreeRTOS BSP application has reserved peripherals that are used only by ARM Cortex-M4, and any access from ARM Cortex-A core on those peripherals may cause the program to hang.

The default RDC settings are:

  • The ARM Cortex-M4 core is assigned to RDC domain 1, and ARM Cortex-A core and other bus masters use the default assignment (RDC domain 0).
  • Every example/demo has its specific RDC setting in its board.c (see BOARD_RdcInit() function).

Users of this package can remove or change the RDC settings in the example/demo or in their own application. It is recommended to restrict access to a peripheral to the only core using it when possible.
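To make the idea of per-domain permissions more concrete, here is a small model of how an access policy word could be composed. It assumes two permission bits per RDC domain and the access encoding listed in the comments; both are assumptions made for illustration, and rdc_policy is a hypothetical helper, not a function from the BSP. The real settings live in BOARD_RdcInit() and should be checked against the reference manual.

```shell
# Assumed access encoding (illustration only):
#   0 = no access, 1 = write only, 2 = read only, 3 = read/write
# Assumed layout: two permission bits per domain, domain 0 in the lowest bits.
rdc_policy() {
    domain=$1
    access=$2
    echo $(( access << (domain * 2) ))
}

# Combine several per-domain policies into one word with a bitwise OR.
rdc_combine() {
    word=0
    for policy in "$@"; do
        word=$(( word | policy ))
    done
    echo "$word"
}
```

Under these assumptions, `rdc_policy 1 3` describes a peripheral that only domain 1 (the Cortex-M4 in the default settings) can read and write, while `rdc_combine "$(rdc_policy 0 3)" "$(rdc_policy 1 3)"` would leave it open to both domains.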

Also, for a peripheral not to show up as available in Linux, it must be disabled in the device tree, which is why a specific device tree is used when using the MCU:

The memory declaration is also modified in the device tree above in order to reserve some areas for FreeRTOS and/or shared memory.

Remote Processor Messaging (RPMsg)

The Remote Processor Messaging (RPMsg) is a virtio-based messaging bus that allows Inter Processor Communications (IPC) between independent software contexts running on homogeneous or heterogeneous cores present in an Asymmetric Multi Processing (AMP) system.

The RPMsg API is compliant with the RPMsg bus infrastructure present in the upstream Linux kernel from 3.4.x onward.

This API offers the following advantages:

  • No data processing in the interrupt context
  • Blocking receive API
  • Zero-copy send and receive API
  • Receive with timeout provided by RTOS

Note that the DDR is used by default in RPMsg to exchange messages between cores.

Where can I find more documentation?

The BSP actually comes with some documentation which we recommend reading in order to know more on the subject:

Build instructions

Development environment setup

In order to build the FreeRTOS BSP, you first need to download and install a toolchain for ARM Cortex-M processors.

~$ cd && mkdir toolchains && cd toolchains
~/toolchains$ wget https://developer.arm.com/-/media/Files/downloads/gnu-rm/7-2017q4/gcc-arm-none-eabi-7-2017-q4-major-linux.tar.bz2
~/toolchains$ tar xjf gcc-arm-none-eabi-7-2017-q4-major-linux.tar.bz2
~/toolchains$ rm gcc-arm-none-eabi-7-2017-q4-major-linux.tar.bz2

FreeRTOS relies on cmake to build, so you also need to make sure the following packages are installed on your machine:

~$ sudo apt-get install make cmake

Download the BSP

The FreeRTOS SDK v2.3 is available from our GitHub freertos-boundary repository.

~$ git clone https://github.com/boundarydevices/freertos-boundary.git freertos
~$ cd freertos
~/freertos$ git checkout imx8m_2.3_ga

Finally, you need to export the ARMGCC_DIR variable so FreeRTOS knows your toolchain location.

~/freertos$ export ARMGCC_DIR=~/toolchains/gcc-arm-none-eabi-7-2017-q4-major
~/freertos$ export PATH=$PATH:~/toolchains/gcc-arm-none-eabi-7-2017-q4-major/bin
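Since a wrong ARMGCC_DIR only shows up later as a build failure, it can be worth checking it first. The following is a minimal sketch; check_armgcc_dir is a hypothetical helper that only verifies that an executable arm-none-eabi-gcc exists under the bin folder of the given directory.

```shell
# Hypothetical helper: verify that the toolchain directory (first argument,
# defaulting to $ARMGCC_DIR) contains an executable arm-none-eabi-gcc.
check_armgcc_dir() {
    dir=${1:-$ARMGCC_DIR}
    if [ -x "$dir/bin/arm-none-eabi-gcc" ]; then
        echo "toolchain found in $dir"
    else
        echo "arm-none-eabi-gcc not found under '$dir/bin'" >&2
        return 1
    fi
}
```

Running `check_armgcc_dir` with no argument checks the directory currently exported in ARMGCC_DIR.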

Build the FreeRTOS apps

All the applications are located under the nitrogen8m folder:

~/freertos$ tree boards/nitrogen8m/ -L 1
boards/nitrogen8m/
├── cmsis_driver_examples
├── demo_apps
├── driver_examples
├── multicore_examples
├── project_template
└── rtos_examples

As an example, we will build the hello_world application:

~/freertos$ cd boards/nitrogen8m/demo_apps/hello_world/armgcc/
~/freertos/boards/nitrogen8m/demo_apps/hello_world/armgcc$ ./build_release.sh
~/freertos/boards/nitrogen8m/demo_apps/hello_world/armgcc$ ls release/
hello_world.bin  hello_world.elf

You can then copy that hello_world.bin firmware to the root of the eMMC or any other storage you use.

Run the demo apps

Basic setup

By default, the firmware is loaded from eMMC to TCM.

Before going any further, make sure to hook up the second serial port to your machine as the one marked as “console” will be used for U-Boot and the other one will display data coming from the MCU.

This blog post assumes that the firmware file is named m4_fw.bin. If you wish to use another name, you need to set the m4image variable:

=> setenv m4image hello_world.bin
=> saveenv
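Alternatively, the firmware can be copied to the storage under the default m4_fw.bin name, so that m4image does not need to be changed. This is a minimal host-side sketch, assuming the first partition of the storage is already mounted on your machine; install_m4_firmware and the mount point passed to it are hypothetical.

```shell
# Hypothetical helper: copy a built firmware to the root of a mounted
# partition under the default name used in this article (m4_fw.bin).
install_m4_firmware() {
    src=$1
    mount_point=$2
    [ -f "$src" ] || { echo "no such firmware: $src" >&2; return 1; }
    # sync flushes the write so the storage can be unmounted safely afterwards.
    cp "$src" "$mount_point/m4_fw.bin" && sync
}
```

For example: `install_m4_firmware release/hello_world.bin /media/$USER/rootfs`, where the mount point depends on your system.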

If you want to load the MCU firmware manually from eMMC, here is the procedure:

=> load mmc 0 $m4loadaddr $m4image
=> dcache flush
=> bootaux $m4loadaddr

If you want to load the MCU firmware from TFTP:

=> dhcp $m4loadaddr 192.168.1.60:$m4image
=> dcache flush
=> bootaux $m4loadaddr

In order to start the MCU automatically at boot up, we need to set a variable that will tell the boot.scr to load the firmware.

To do so, make sure to save this variable.

=> setenv m4enabled 1
=> saveenv

Hello World app

The Hello World project is a simple demonstration program that uses the BSP software. It prints the “Hello World” message to the ARM Cortex-M4 terminal using the BSP UART drivers.

The purpose of this demo is to show how to use the UART and to provide a simple project for debugging and further development.

In U-Boot, type the following:

=> setenv m4image hello_world.bin
=> load mmc 0 $m4loadaddr $m4image
=> dcache flush
=> bootaux $m4loadaddr

On the second serial port, you should see the following output:

Hello World!

You can then type anything in that terminal. It will be echoed back to the serial port, as you can see in the source code.

RPMsg TTY demo

This demo application demonstrates the RPMsg remote peer stack. It works with the Linux RPMsg master peer to transfer string content back and forth. The Linux driver creates a tty node to which you can write. The MCU displays what it receives and echoes the same message back as an acknowledgement. The tty reader on the ARM Cortex-A core can then get the message and start another transaction. The demo shows RPMsg’s ability to send arbitrary content back and forth.

In U-Boot, type the following in order to boot the OS automatically while loading the M4:

=> setenv m4image rpmsg_lite_str_echo_rtos_imxcm4.bin
=> setenv m4enabled 1
=> boot

On the second serial port, you should see the following output:

RPMSG String Echo FreeRTOS RTOS API Demo...

Once Linux has booted up, you need to load the RPMsg module so the communication between the two cores can start.

# modprobe imx_rpmsg_tty
# echo "this is a test" > /dev/ttyRPMSG30

The last command above writes into the tty node, so the Cortex-M4 should have received the data, as can be seen on the second serial port.

Nameservice sent, ready for incoming messages...
Get Message From Master Side : "hello world!" [len : 12]
Get Message From Master Side : "this is a test" [len : 14]
Get New Line From Master Side

RPMsg Ping Pong demo

Like the previous demo, this one demonstrates RPMsg communication. After the communication channels are created, Linux sends a first integer to FreeRTOS. The receiving peer adds 1 to the integer and sends it back; the exchange repeats until the counter passes 100, and then stops.

In U-Boot, type the following:

=> setenv m4image rpmsg_lite_pingpong_rtos_linux_remote.bin
=> setenv m4enabled 1
=> boot

On the second serial port, you should see the following output:

RPMSG Ping-Pong FreeRTOS RTOS API Demo...

Once Linux has booted up, you need to load the RPMsg module so the communication between the two cores can start.

# modprobe imx_rpmsg_pingpong
[   30.501148] get 1 (src: 0x1e)
[   30.506527] get 3 (src: 0x1e)
...
[   30.730958] get 101 (src: 0x1e)
[   30.734104] imx_rpmsg_pingpong virtio0.rpmsg-openamp-demo-channel.-1.30: goodbye!
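The odd values in the kernel log above (1, 3, ..., 101) can be reproduced with a small model of the exchange. This is only a sketch inferred from that log, not the driver's code: it assumes that Linux starts with 0, that each side adds 1 before sending, and that Linux stops once the received value exceeds 100. simulate_pingpong is a hypothetical name.

```shell
# Model of the ping-pong counter as seen from the Linux side.
# The optional argument is the stop threshold (100 by default).
simulate_pingpong() {
    limit=${1:-100}
    value=0                          # first integer sent by Linux (assumed to be 0)
    while :; do
        value=$((value + 1))         # the Cortex-M4 adds 1 and sends the value back
        echo "get $value"            # what Linux receives
        if [ "$value" -gt "$limit" ]; then
            break                    # exchange ends once the limit is passed
        fi
        value=$((value + 1))         # Linux adds 1 before the next ping
    done
}
```

With the default limit, the model prints "get 1" first and "get 101" last, which matches the first and last values in the log.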

While the main serial port shows the data received from the MCU, the secondary serial port shows the data received from the MPU.

RPMSG Share Base Addr is 0xb8000000
Link is up!
Nameservice announce sent.
Waiting for ping...
Sending pong...
...
Waiting for ping...
Sending pong...
Ping pong done, deinitializing...
Looping forever...

 

That’s it, you should now be able to build, modify, run and debug