Recovering hung channels / ports

Introduction

This article contains information on stuck/hung channels and recovery methods for Dialogic JCT series Media Boards.

Details

Applications may experience two types of stuck channels, also known as hung ports: those where an expected call control event is missing and those where an expected media event is missing.  
Applications should track whether a call control event or a media event is expected but has not yet been returned.  The most common way to detect a missing event is to start a timer when calling the functions most likely to expose a stuck channel, such as dx_stopch().
  
A missing event after dx_stopch() or ec_stopch() is a clear indication that something is not working properly in one of the lower layers.  As a rule of thumb, wait an extra 30 seconds beyond the time the event is normally expected and, if none is returned, attempt channel recovery.
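The timer described above can be sketched as a per-channel deadline. This is a minimal C sketch, assuming the application supplies the current time and knows how long the expected event normally takes; `channel_watch`, `watch_arm()`, `watch_disarm()` and `watch_expired()` are hypothetical helpers, not part of the Dialogic API.

```c
#include <stdbool.h>
#include <time.h>

/* Extra grace period (rule of thumb from this article) before a
 * channel is treated as stuck. */
#define STUCK_GRACE_SECONDS 30

/* Hypothetical per-channel bookkeeping; not part of the Dialogic API. */
typedef struct {
    bool   awaiting_event;  /* true while an expected event is outstanding */
    time_t deadline;        /* after this time the channel is considered stuck */
} channel_watch;

/* Call right after issuing dx_stopch()/ec_stopch().
 * expected_secs: how long the event normally takes to arrive
 * (an application-specific assumption). */
void watch_arm(channel_watch *w, time_t now, int expected_secs)
{
    w->awaiting_event = true;
    w->deadline = now + expected_secs + STUCK_GRACE_SECONDS;
}

/* Call when the expected event arrives. */
void watch_disarm(channel_watch *w)
{
    w->awaiting_event = false;
}

/* Poll periodically; returns true when the event is overdue and
 * channel recovery should be attempted. */
bool watch_expired(const channel_watch *w, time_t now)
{
    return w->awaiting_event && now >= w->deadline;
}
```

A natural place to poll `watch_expired()` is the application's event loop, for example each time its wait for events times out without delivering anything.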
  
If the missing event is on the call control side, gc_ResetLineDev() is the first API to use to attempt recovery.  If the missing event is on the media side, use dx_resetch() or ec_resetch() to attempt recovery.
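The choice of the first recovery call can be expressed as a small lookup. This sketch assumes the application already knows which layer owes it an event and whether the stalled media operation was driven through the ec_ functions; the enum and function names are hypothetical, and the actual gc_ResetLineDev(), dx_resetch() or ec_resetch() call is left to the caller.

```c
#include <stdbool.h>

/* Which layer owes the application an event (hypothetical names). */
typedef enum {
    MISSING_CALL_CONTROL,  /* a call control (Global Call) event never arrived */
    MISSING_MEDIA          /* a media event, e.g. after dx_stopch(), never arrived */
} missing_event_kind;

/* First recovery call to attempt (hypothetical names). */
typedef enum {
    RECOVER_GC_RESETLINEDEV,  /* caller invokes gc_ResetLineDev() */
    RECOVER_DX_RESETCH,       /* caller invokes dx_resetch() */
    RECOVER_EC_RESETCH        /* caller invokes ec_resetch() */
} recovery_call;

/* uses_ec: true when the stalled media operation was started and
 * stopped through the ec_ functions (for example ec_stopch()).
 * It is ignored for call control failures. */
recovery_call first_recovery_call(missing_event_kind kind, bool uses_ec)
{
    if (kind == MISSING_CALL_CONTROL)
        return RECOVER_GC_RESETLINEDEV;
    return uses_ec ? RECOVER_EC_RESETCH : RECOVER_DX_RESETCH;
}
```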
  
** Note that applications should not use dx_resetch() as a substitute for dx_stopch(). Call dx_resetch() only after the application has detected that dx_stopch() has not returned an event. 
  
** Note also that dx_resetch() has some limitations and may not recover the channel in every stuck channel / hung port scenario. Treat the TDX_RESET event as an indication of success and the TDX_RESETERR event as an indication of failure.
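The two notes above amount to a small per-channel state machine: dx_resetch() is issued only after the stop event is overdue, and TDX_RESET or TDX_RESETERR decides the outcome. The following C sketch models that flow. The state and input names are hypothetical, the caller is assumed to translate real events and watchdog timeouts into `ch_input` values, and treating an overdue reset event as a failure is an assumption rather than documented behavior.

```c
/* Hypothetical per-channel recovery states. */
typedef enum {
    CH_IDLE,             /* no outstanding stop request */
    CH_STOPPING,         /* dx_stopch() issued, waiting for its event */
    CH_RESETTING,        /* stop event overdue, dx_resetch() issued */
    CH_NEEDS_ESCALATION  /* reset failed; try close/re-open or beyond */
} ch_state;

/* Hypothetical inputs, mapped by the caller from events and timers. */
typedef enum {
    IN_STOP_ISSUED,  /* application called dx_stopch() */
    IN_STOP_EVENT,   /* the expected event after dx_stopch() arrived */
    IN_TIMEOUT,      /* watchdog deadline passed */
    IN_RESET_OK,     /* TDX_RESET received */
    IN_RESET_ERR     /* TDX_RESETERR received */
} ch_input;

ch_state ch_next(ch_state s, ch_input in)
{
    switch (s) {
    case CH_IDLE:
        return in == IN_STOP_ISSUED ? CH_STOPPING : CH_IDLE;
    case CH_STOPPING:
        if (in == IN_STOP_EVENT)
            return CH_IDLE;
        if (in == IN_TIMEOUT)
            return CH_RESETTING;  /* only now should dx_resetch() be called */
        return CH_STOPPING;
    case CH_RESETTING:
        if (in == IN_RESET_OK)
            return CH_IDLE;
        if (in == IN_RESET_ERR || in == IN_TIMEOUT)
            return CH_NEEDS_ESCALATION;  /* overdue reset treated as failure */
        return CH_RESETTING;
    case CH_NEEDS_ESCALATION:
    default:
        return CH_NEEDS_ESCALATION;
    }
}
```

Note that the only path into `CH_RESETTING` is a timeout while stopping, which keeps dx_resetch() from ever being used in place of dx_stopch().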

Further information

  1. When individual channels on a board go down, this usually indicates a protocol or communication error between the underlying layers.  These stuck channels / hung ports can generally be recovered by calling dx_resetch().  If that does not return the channel to the IDLE state, closing and re-opening the channel might work.

  1. If an individual DSP (digital signal processor) goes down, the firmware is no longer able to process tasks on that DSP, and only the subset of channels on the board tied to that DSP is affected. An individual DSP cannot be restarted on its own, so the only option is to restart the whole board via DCM (Dialogic Configuration Manager).

  1. If the CP (control processor) on the board goes down, the firmware is no longer able to process tasks on the board's CP, and all channels on the board go down at once. In this case, the only option is to restart the board via DCM.

  1. In some situations, neither dx_resetch(), closing and re-opening the channel, nor even restarting the board via DCM will work.  This may still indicate a protocol error between the underlying layers, and re-establishing communication between them requires restarting the system itself.

  1. Stuck channels can also occur when the CPU load on the host system is so high that the Dialogic stack cannot be scheduled CPU time, or when available physical memory is depleted to the point that the Dialogic stack cannot allocate the resources it needs to keep operating.  
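The scenarios above suggest an escalation order that depends on how much of the board is affected. This is a minimal sketch, assuming the application can classify the failure scope and counts how many recovery steps have already failed for the incident; the enum and function names are hypothetical. Closing and re-opening is assumed to mean dx_close() followed by dx_open(), and the board and system restarts are operations outside the application that the code only names.

```c
/* How much of the board is affected (hypothetical names). */
typedef enum {
    SCOPE_CHANNEL,  /* individual channels stuck */
    SCOPE_DSP,      /* subset of channels tied to one DSP */
    SCOPE_CP        /* all channels on the board at once */
} failure_scope;

/* Recovery steps named in this article (hypothetical names). */
typedef enum {
    ACT_RESET_CHANNEL,   /* dx_resetch() */
    ACT_CLOSE_REOPEN,    /* close and re-open the channel */
    ACT_RESTART_BOARD,   /* restart the board via DCM */
    ACT_RESTART_SYSTEM   /* restart the host system */
} recovery_step;

/* failed_attempts: number of steps already tried without success. */
recovery_step next_step(failure_scope scope, int failed_attempts)
{
    static const recovery_step channel_ladder[] = {
        ACT_RESET_CHANNEL, ACT_CLOSE_REOPEN, ACT_RESTART_BOARD, ACT_RESTART_SYSTEM
    };
    /* A failed DSP or CP cannot be reset per channel or per DSP. */
    static const recovery_step board_ladder[] = {
        ACT_RESTART_BOARD, ACT_RESTART_SYSTEM
    };
    const recovery_step *ladder = channel_ladder;
    int n = (int)(sizeof channel_ladder / sizeof channel_ladder[0]);

    if (scope != SCOPE_CHANNEL) {
        ladder = board_ladder;
        n = (int)(sizeof board_ladder / sizeof board_ladder[0]);
    }
    if (failed_attempts < 0)
        failed_attempts = 0;
    if (failed_attempts >= n)
        failed_attempts = n - 1;  /* stay on the last resort */
    return ladder[failed_attempts];
}
```

Once the ladder reaches a system restart it stays there, since no further recovery option is listed above.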

Product List

Dialogic JCT series Media Boards
