Expedite the edit-compile-debug process

CIOL Bureau

By: Sandy Orlando and Jean Claude Coutant, Wind River Systems

Multi-core technology

Multi-core technology delivers significant advantages to hardware and software developers by providing higher processor performance, more effective power usage, and a smaller physical footprint for embedded devices.

Multi-core solutions are often implemented in tandem with multi-processing, in which multiple processors are used on a single board or in an integrated system. Unlocking the power of multi-core and multi-processing solutions takes more than just silicon; it takes a new approach to debugging that allows software and hardware developers to debug an entire system and optimize the edit-compile-debug process.

JTAG debugging

JTAG debugging has traditionally been used for hardware bring-up and more recently to augment agent-based debugging. However, on-chip debugging plays a more significant role in multi-core and multi-processing debugging, helping to debug the operating system and middleware by isolating complex interactions between the software running on one or more cores.

A multi-core processor is a single chip that contains multiple distinct processing engines. This design provides increased CPU performance, functional specialization, and partitioning options.

Multi-core is typically implemented in one of two configurations.

1) Symmetric multi-processing (SMP): a single operating environment in which one SMP operating system runs across multiple homogeneous cores with shared memory. The operating system abstracts the hardware from the developer and decides which core to use for each task.

2) Asymmetric multi-processing (AMP): a collection of interacting but independent operating environments, with a separate instance of an operating system running on each core. This independence means that the environment can be either homogeneous (all processors the same, one OS type) or heterogeneous (multiple processor types or operating systems).

The added complexity of the multi-core environment requires a robust tool chain for debugging operating systems that run on multiple cores, as well as the hardware related to those cores.

While the conventional definition of multi-core is multiple cores on a single die, the real-world use of multi-core and its debugging challenges are a specific instance of the more general multi-processing case and extend beyond single-die debugging. Developers are also building solutions that place multiple CPUs, each with one or more cores, on a single board. In highly complex systems, developers are writing software that runs across multiple CPU boards in a system using multi-core and multi-processing technology.

The convergence of multi-core and multiprocessing technology is introducing new debugging challenges based on growing system complexity and the requirement to realize the inherent performance potential of multi-core through optimized hardware and software development.

Specific challenges include:

  • Effectively managing shared resources such as memory and peripherals
  • Debugging OS and application code over multiple cores, boards, and systems
  • Optimizing the JTAG interface and fully utilizing the JTAG bandwidth
  • Debugging homogeneous and heterogeneous cores on a single die and then coordinating the debugging over an entire system
  • Effectively using JTAG debugging with agent-based debug and ensuring a smooth handoff between different debug tasks
  • Ensuring synchronization when debugging applications over multiple cores

There are three primary technology options for multi-core JTAG debugging.

  • A debugger that supports all cores through a single JTAG interface.
  • JTAG muxing (multiplexing) using independent debuggers at a single JTAG debug interface.
  • JTAG linkers or addressable scan ports.

These approaches deal with a central issue in debugging multiple cores with a JTAG interface: the limited number of JTAG interfaces that SoC vendors provide. To save on costs, many SoC vendors provide only a single JTAG interface on a die, regardless of the number of cores. The challenge for developers is how to cost-effectively use that interface to synchronize the debug of multiple cores and multiple processors.

Daisy-Chain Methodology

The single-debugger method uses the IEEE 1149.1 standard daisy-chain methodology. The JTAG interface has four wires: TDI, TDO, TCK, and TMS. For connecting to the JTAG interface in multi-core debugging, the relevant wires are TDI and TDO. In daisy-chaining, the TDO output of the first core is connected to the TDI input of the second core, and so on through the last core in the chain.
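
The TDO-to-TDI wiring can be illustrated with a small simulation. This is only a sketch: the function name `shift_through_chain` and the list-based register model are assumptions made for illustration, and the TCK/TMS state machine is left out entirely.

```python
def shift_through_chain(register_lengths, bits_in):
    """Shift bits through shift registers wired TDO-to-TDI.

    register_lengths: one entry per device, index 0 nearest the chain's TDI.
    bits_in: bits in shift order (the first element enters TDI first).
    Returns (registers, bits_out), where bits_out is what appears on the
    chain's TDO, one bit per clock.
    """
    # All registers start cleared; position 0 of each list is its TDI side.
    registers = [[0] * n for n in register_lengths]
    bits_out = []
    for bit in bits_in:
        carry = bit  # value presented at the chain's TDI on this clock
        for reg in registers:
            # The register takes the incoming bit and hands its oldest bit
            # to the next device in the chain.
            reg.insert(0, carry)
            carry = reg.pop()
        bits_out.append(carry)
    return registers, bits_out


# Three devices with 8-bit registers: the first 8 bits shifted in end up in
# the device nearest TDO, because they travel the furthest.
pattern = [1] * 8 + [0] * 16
registers, bits_out = shift_through_chain([8, 8, 8], pattern)
```

After the 24 clocks in the usage lines, the register nearest TDO holds the eight ones and the other two hold zeros, which is why a debugger must order its data by position in the chain.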

The daisy-chain methodology is standards-based, widely used, and works in all the multi-core debugging scenarios: single die, multiple CPUs on a board, and complex systems. It also works well in a heterogeneous environment in which more than one processor family and operating system is used for development, as in mobile handsets and consumer electronics devices. In the daisy-chain method, the JTAG debuggers use the software interface of a JTAG server, which manages the addressing of the individual cores, regardless of location, via a single JTAG interface. This solves the problem of the limited JTAG connectivity often found in multi-core environments.
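
One way to picture how a server addresses a single core on a shared chain is to build the instruction-register scan so that every other device receives BYPASS, which IEEE 1149.1 defines as an all-ones instruction. The sketch below is not Wind River's implementation; `build_ir_scan` is a hypothetical helper, and it assumes each instruction is shifted least-significant bit first.

```python
def build_ir_scan(ir_lengths, target, instruction):
    """Return the IR scan bits, in shift order, that load `instruction`
    into device `target` and BYPASS into every other device.

    ir_lengths: IR length of each device, index 0 nearest the chain's TDI.
    """
    if not 0 <= target < len(ir_lengths):
        raise ValueError("target is not on the chain")
    if not 0 <= instruction < (1 << ir_lengths[target]):
        raise ValueError("instruction does not fit the target's IR")

    bits = []
    # Bits shifted first travel furthest, so start with the device
    # nearest TDO and finish with the device nearest TDI.
    for index in reversed(range(len(ir_lengths))):
        length = ir_lengths[index]
        bypass = (1 << length) - 1  # BYPASS is all ones
        value = instruction if index == target else bypass
        # Assumption: least-significant bit is shifted first.
        bits.extend((value >> i) & 1 for i in range(length))
    return bits


# Hypothetical chain: 4-bit, 8-bit, and 5-bit IRs; address the middle device.
scan = build_ir_scan([4, 8, 5], target=1, instruction=0x02)
```

The scan is 17 bits long even though only 8 of them carry the instruction for the addressed device.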

The JTAG server also enables the developer to synchronize cores, start and stop processes on the same JTAG clock, and add or remove a connection without affecting the other microprocessors or devices on the scan chain. This method maintains accurate clocking and facilitates the debugging of different operating systems across multiple cores, or of different processes in the same operating system running across multiple cores. The key objections to the daisy-chain method are performance and JTAG bandwidth utilization.

The issue with daisy-chained JTAG is that the amount of data to transmit at the Shift-IR stage depends on the number of devices on the scan chain as well as the IR length of each device. For example, it takes 24 bits of data to access the 8-bit IR register of one device in a daisy chain containing three of those devices. The same problem exists at the Shift-DR stage, but it is minimized by the fact that a device in bypass mode adds only one bit to the DR scan.
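
The arithmetic above can be written out as a short sketch. The function names are illustrative, and the DR calculation assumes that every device other than the target is in bypass mode.

```python
def ir_scan_bits(ir_lengths):
    """Bits shifted at Shift-IR: every device's IR is in the path."""
    return sum(ir_lengths)


def dr_scan_bits(target_dr_length, devices_in_chain):
    """Bits shifted at Shift-DR when all non-target devices are in bypass.

    Each bypassed device contributes a single-bit bypass register.
    """
    return target_dr_length + (devices_in_chain - 1)


# The example from the text: three devices, each with an 8-bit IR.
ir_bits = ir_scan_bits([8, 8, 8])            # 24 bits
# A hypothetical 32-bit data register on one of those three devices.
dr_bits = dr_scan_bits(32, devices_in_chain=3)  # 34 bits
```

The IR overhead grows with the sum of all IR lengths, while the DR overhead grows by only one bit per additional device.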

When the JTAG server is designed properly, as in the JTAG solution offered by Wind River, there is little performance degradation. Wind River's JTAG accelerator and server technology reduces the idle time between JTAG sequence packets so that virtually 100% of the available JTAG bandwidth is used.

The other concern with the JTAG server involves additional debugging capabilities, such as using a stop request signal to stop a core immediately, or using a stop indication signal to stop one core and then synchronize the stopping of all the others. Like the other limitations, this one depends on the vendor implementation. For example, the Wind River Workbench On-Chip Debugging solution centralizes the multi-core and multi-processing debug function and can start and stop multiple cores simultaneously. It can debug up to eight cores at once in a single scan chain, regardless of whether those cores are on a single die, on a board, or in a system configuration.

With the Wind River multi-core solution, the developer can stop and start cores simultaneously and set breakpoints, including conditional breakpoints, on one or more cores. In addition, the Workbench Eclipse framework and agent-based debugging enable developers to manage multi-core and multi-processor projects from a single console. The developer has the flexibility to use the JTAG connection for hardware bring-up and for kernel, middleware, and other application debugging, and then to move seamlessly to agent-based debugging when appropriate, all within the same debug application. These capabilities increase collaboration between developers and shorten the time to problem resolution.

The dominant alternative to the single debugger is JTAG multiplexing. This technique extends the IEEE JTAG specification to support an independent debugger for every core, all connected through a shared JTAG interface.

Mux Technology

The mux technology enables the developer to access multiple discrete cores on a single die by registering, through a single JTAG interface, the core to be debugged. The main advantage of this solution is its connection and debugging performance. Because the mux connects to each core individually, it does not have the bit-shifting challenge of the daisy-chaining method and provides relatively good performance on a single die. The other advantage is that this solution does not require any modifications to the tool, enabling it to be used effectively across multiple projects.

The main issue with the mux approach is the inability to start and stop cores simultaneously in order to synchronize applications across multiple cores during debugging. With a mux, stopping all the cores requires the developer to stop each core sequentially, which introduces a delay called skid. This delay makes it more difficult to locate problems with the OS, middleware, and application across cores, especially if the parts of the application running on different cores depend on one another.

For example, if the developer has a product with a DSP and an ARM9 core, with the DSP streaming video and the ARM9 core providing the file system, synchronization of the start and stop of the cores is critical. If there is a large delay (skid) between the stopping of the DSP and the starting of the ARM core during debugging, the DSP streaming video can quickly overrun the ARM file buffer and video traffic will be dropped, making it challenging to analyze the problem. Moreover, the muxing process has now introduced a new variable that the developer must measure and account for during troubleshooting, dramatically increasing the debugging cycle time.
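
A rough model shows why skid matters in a producer and consumer arrangement like this one. Everything here is an assumption for illustration: the function names, the per-core stop latency, the data rate, and the buffer size are all hypothetical, and the model simply assumes that the producing core keeps running for the whole skid interval after the consuming core has halted.

```python
def sequential_stop_skid_us(core_count, per_core_stop_us):
    """Time between the first and last core halting when a mux stops
    the cores one after another (hypothetical fixed latency per stop)."""
    return (core_count - 1) * per_core_stop_us


def bytes_produced_during_skid(rate_bytes_per_s, skid_us):
    """Data a still-running producer emits during the skid interval."""
    return rate_bytes_per_s * skid_us // 1_000_000


def buffer_overruns(buffer_free_bytes, rate_bytes_per_s, skid_us):
    """True if the producer emits more than the consumer's free space."""
    return bytes_produced_during_skid(rate_bytes_per_s, skid_us) > buffer_free_bytes


# Hypothetical numbers: two cores, 500 microseconds to stop each one,
# a 4 MB/s stream, and 1 KB of free buffer space on the halted core.
skid = sequential_stop_skid_us(core_count=2, per_core_stop_us=500)
overrun = buffer_overruns(1024, 4_000_000, skid)
```

With these made-up figures the producer emits 2,000 bytes during a 500-microsecond skid, which exceeds the 1,024 bytes of free space. A synchronized stop corresponds to a skid of zero.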

The final issue occurs in heterogeneous environments when debugging cores from different vendors, such as a processor from one vendor and a DSP from another. In this case, muxing is more complex and, if the instrumentation is not uniformly available, impossible to execute. The issue is compounded further when multiple cores run across a system. In that case, muxing alone will not solve the problem; developers will also need to use an addressable scan port.

Addressable Scan Port

The last technology is the addressable scan port. This architecture requires the use of very specialized components. These components allow the developer to partition the JTAG scan-chain into functional groups with each group being accessed by a unique address. This is a multi-drop architecture and is often used in backplanes where a separate addressable scan-chain would be routed across the backplane such that each board in a rack would have a dedicated scan-chain. This architecture is limited in speed by the speed of the addressable scan port itself, typically 25 MHz.
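
The partitioning idea can be sketched as a toy model in which each board's local scan chain is reached only after its address is selected. The class and method names are hypothetical and do not correspond to any particular addressable scan port device or its selection protocol.

```python
class AddressableScanBackplane:
    """Toy model of a multi-drop backplane with one addressable scan
    port per board. Only the selected board's chain is in the scan path."""

    def __init__(self):
        self._chains = {}      # address -> IR lengths of that board's devices
        self._selected = None  # address currently connected to the bus

    def add_board(self, address, ir_lengths):
        if address in self._chains:
            raise ValueError(f"address {address:#x} already in use")
        self._chains[address] = list(ir_lengths)

    def select(self, address):
        if address not in self._chains:
            raise KeyError(f"no board at address {address:#x}")
        self._selected = address

    def ir_scan_bits(self):
        """IR scan length seen by the debugger for the selected board."""
        if self._selected is None:
            raise RuntimeError("no board selected")
        return sum(self._chains[self._selected])


# Two hypothetical boards in a rack, each with its own scan chain.
backplane = AddressableScanBackplane()
backplane.add_board(0x01, [8, 8])
backplane.add_board(0x02, [4, 5, 6])
backplane.select(0x02)
bits = backplane.ir_scan_bits()  # only board 0x02's devices are counted
```

In this model the scan length depends only on the selected board rather than on every device in the rack, which is the point of partitioning the chain by address.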

JTAG debugging can play a valuable role in multi-core development and improve the edit-compile-debug time when integrated with a standards-based integrated development environment such as Eclipse. The most effective technology solution is a single JTAG debugger that leverages the IEEE 1149.1 JTAG standard in a daisy chain and employs JTAG acceleration to improve throughput and performance. Vendors such as Wind River offer on-chip debugging capabilities that integrate effectively with agent-based debugging to improve debugging performance in even the most complex environments.