Driver Debugging

1 Wanrouter hwprobe ERROR
2 Wanrouter status displays "disconnected"
3 Port I/O Issue: Wanpipe Overruns
4 WANPIPE Tools Compilation Failed
5 Errors while stopping WANPIPE

Wanrouter hwprobe ERROR

wanrouter hwprobe
FATAL: Error inserting wanpipe (/lib/modules/2.6.18-194.el5PAE/kernel/drivers/net/wan/wanpipe.ko)

[dmesg : Unknown symbol in module, or unknown parameter () ]

If you see the above message after installing the wanpipe driver, the cause is one of the following:

There are no Sangoma AFT cards in the system, or they are not detected by the system
The wanpipe Driver did not install properly

To verify your system has detected the Sangoma cards, run "lspci" in the linux command line:

lspci | grep Sangoma
... 07:01.0 Network controller: Sangoma Technologies Corp. A104d QUAD T1/E1 AFT car

or if using a USB device,

lsusb

Verify that you see "Sangoma" somewhere near the end of the list.

If the system did not detect your Sangoma card, try re-seating the card in the PCI slot, or try another slot, then run "lspci" again to verify that the card is detected.
If the card is detected, check /var/log/messages for the reason the Wanpipe driver did not install properly:

vi /var/log/messages

...
wanpipe: Unknown symbol _dahdi_ec_span
wanpipe: Unknown symbol dahdi_alarm_notify
wanpipe: Unknown symbol dahdi_hdlc_getbuf
wanpipe: Unknown symbol dahdi_register
wanpipe: Unknown symbol _dahdi_receive
wanpipe: Unknown symbol dahdi_qevent_lock
wanpipe: Unknown symbol dahdi_hooksig
...

This means the Wanpipe driver was not installed against the same version of DAHDI that is currently running on the system. To resolve this, install the Wanpipe driver against the DAHDI version reported by

dahdi_cfg -vvv

If you are running Trixbox, Elastix, or another binary distribution, run

rpm -qa | grep dahdi
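
The unresolved symbols shown in the log excerpt above can be listed with a short filter instead of paging through the whole file. This is a minimal sketch: the function name `unknown_symbols` is made up for illustration, and it assumes the log lines contain the text "wanpipe: Unknown symbol" exactly as in the excerpt.

```shell
# List the unresolved symbols reported for wanpipe, one per line, without
# duplicates. Reads log text on stdin.
# Assumption: lines look like "wanpipe: Unknown symbol <name>" as shown above.
unknown_symbols() {
    sed -n 's/.*wanpipe: Unknown symbol \([A-Za-z0-9_]*\).*/\1/p' | sort -u
}

# Usage (log path as used in this guide):
#   unknown_symbols < /var/log/messages
```

If every listed name contains "dahdi", the mismatch is between Wanpipe and the running DAHDI modules, as described above.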

If you have verified the above and the same issue still occurs, then previous versions of DAHDI, or even zaptel, are corrupting the Wanpipe install process.

-> Try removing any DAHDI/zaptel modules currently loaded into the kernel
*Note: never run these commands while Wanpipe is running, as this will crash the system!

/etc/init.d/dahdi stop

for dahdi or

/etc/init.d/zaptel stop

for zaptel.

-> Then try re-installing the Wanpipe driver.

-> If the issue persists, try re-installing DAHDI/zaptel and then re-install the Wanpipe driver.

-> If the issue still exists, type the following in your command line:

modprobe -l | grep dahdi | xargs rm

depmod -a

Again re-install Dahdi/zaptel, and then re-install the Wanpipe Driver.

Wanrouter status displays "disconnected"

Type the following into your Linux terminal and check for output as follows:

wanrouter status
...
| Device name | Protocol | Station | Status       |
| wanpipe1    | AFT ISDN | N/A     | Disconnected |
...

The 'status' field should normally display 'connected' for all wanpipe interfaces. The above output indicates that the physical layer is down, due to alarms on the line. To diagnose the alarms on the line, run the following command:

wanpipemon -i wXg1 -c Ta

where 'X' stands for the wanpipe number that is showing 'disconnected'.

Please see E1-T1 Alarms for a sample output of the above command, that describes the meaning of each alarm.

When reading the output of the above command, use the following rules of thumb:

First check the Rx Level
The correct value is -2.5dB
Anything other than -2.5dB indicates that there is a problem with the cable.
Options
Rx = -44dB -> either no cable is plugged into the port, the cable is in very poor shape (replace the cable), or the wrong type of cable is used (straight-through vs. cross-over). Sangoma cards are shipped with a cable; if you see -44dB using this cable, chances are you need a different type of cable.
Rx = -2.5dB -> Rx level is perfect
Rx = -10dB to -20dB -> there is a signal on the line, but it is very weak. This indicates a cable problem.
Sangoma cards will not come up if there is no clock on the line.
One way to confirm that the Telco is not supplying the clock is to go back to the TDM Physical Configuration section and configure the TDM port for Master T1/E1 clock. Note: the Telco should always supply the clock.
Rx Alarms
Rx alarms indicate that there is something wrong on the line. They have the following meanings:
RED - We are not receiving any kind of signal on the line (usually indicates that the line is not active).
AIS - The remote end is keeping us down on purpose (e.g. the line is in maintenance).
RAI - We receive a good signal from the remote end, but the remote end does not see a good signal from us (thus the remote end is down).
Note: If the only alarms you experience are LOF and RED, and you are using E1, try changing between CRC4 and NCRC4.
Short/Open Circuit
These statistics usually indicate cable issues or that the port is not plugged in at all.
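
The Rx level rules of thumb above can be written down as a small lookup. This is only a sketch: the function name `rx_level_hint` is hypothetical, and it takes the dB value as a plain number that you read off the wanpipemon output yourself, since the exact output layout is not assumed here.

```shell
# Map an Rx level in dB (a number, e.g. -2.5) to the rule of thumb above.
rx_level_hint() {
    awk -v db="$1" 'BEGIN {
        db += 0                          # force numeric comparison
        if (db == -2.5)
            print "perfect"
        else if (db == -44)
            print "no cable, bad cable, or wrong cable type"
        else if (db <= -10 && db >= -20)
            print "weak signal: cable problem"
        else
            print "not -2.5dB: check the cable"
    }'
}

# Usage:
#   rx_level_hint -44
```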

Port I/O Issue: Wanpipe Overruns

Once the wanpipe port is connected (see previous section), the voice/B-channels start transmitting and receiving data. At the interrupt level, the card is fully operational and voice DMAs are passing data to and from memory. To confirm that wanpipe port interrupts are being serviced on time, without IRQ slips, check the ifconfig statistics. Symptoms of missed interrupts include audio issues, faxing issues, and, in very severe cases, the wanpipe TDM driver restarting itself and dropping all calls through the system.

Note that in voice communications the interrupts occur at 1 ms intervals (1000 Hz) and are handled by a single core at a time. A single card is attached to a single PCI interrupt, so an 8-core system does not help in processing interrupt requests from a single card.

The ifconfig command will display all network interfaces. Sangoma network interfaces start with "w".

For example,
w1g1 - indicates span 1
w2g1 - indicates span 2

Run the "ifconfig" command every 0.5 seconds in your shell:

watch -d -n 0.5 ifconfig

Look for "overrun" statistics in wXg1 interfaces.
Confirm that overrun statistics are NOT INCREMENTING.
It's OK if a few overruns are present, as overruns are normal until all ports become connected.
However, once the ports are connected, the overrun counters should stop increasing.

*NOTE: What we are looking for here, is that overruns are not incrementing every few seconds.
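
Instead of watching the counters by eye, two snapshots can be compared. The sketch below is an illustration only: the function names are made up, it assumes Sangoma interfaces are named wXgY as in the examples above, and it accepts both the older "overruns:N" and the newer "overruns N" ifconfig layouts.

```shell
# Print "<interface> <rx_overruns>" for every wXgY interface found in
# ifconfig output read from stdin.
rx_overruns() {
    awk '
        /^[^ \t]/ { iface = $1; sub(/:$/, "", iface) }   # interface header line
        iface ~ /^w[0-9]+g[0-9]+$/ && /RX/ && match($0, /overruns[: ][0-9]+/) {
            # "overruns" plus one separator character is 9 characters long
            print iface, substr($0, RSTART + 9, RLENGTH - 9)
        }
    '
}

# Compare two snapshot files written by rx_overruns and print every
# interface whose counter grew between them.
overruns_grew() {
    awk 'NR == FNR { before[$1] = $2; next }
         ($1 in before) && ($2 + 0 > before[$1] + 0) {
             print $1, before[$1], "->", $2
         }' "$1" "$2"
}

# Usage:
#   ifconfig | rx_overruns > /tmp/before; sleep 5
#   ifconfig | rx_overruns > /tmp/after
#   overruns_grew /tmp/before /tmp/after
```

No output from `overruns_grew` means no counter increased between the two snapshots.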

If overrun counters ARE incrementing, YOU HAVE OVERRUNS!
You may also notice the following output in /var/log/messages:

...
Jun 14 18:12:21 kernel: [20491.813839] wanpipe1: Warning: Excessive Fifo Errors:Resync (rx=128/tx=128)
Jun 14 18:12:21 kernel: [20491.814841] wanpipe2: Warning: Excessive Fifo Errors:Resync (rx=127/tx=127)
Jun 14 18:12:21 kernel: [20491.816119] wanpipe3: Warning: Excessive Fifo Errors:Resync
...

Overruns are caused by:

Clocking: The timing of the information being received by the Sangoma Card (specific to digital cards: T1/E1, BRI)
Processing interrupts: The ability of the computer system to process the hardware interrupts generated by the card in a synchronous fashion

Troubleshooting Interrupts

Since the hardware interrupts generated by the Sangoma card are handled by the system's DMA engine, it's best to run a DMA engine test on the system.
If you have a serial ATA (SATA) hard drive, type the following in your Linux CLI:

hdparm -t /dev/sda
/dev/sda: Timing buffered disk reads: 354 MB in 3.01 seconds = 117.69 MB/sec

If you have a parallel-ata (IDE) hard drive use

hdparm -t /dev/hda

The rate reported above must be over 45 MB/sec.
If 'using_dma' is not 1, then your hard drive is monopolizing your CPU time.
The above test uses your hard drive to test the efficiency of the DMA engine on the system.
If the rate is below the required minimum, you must resolve this issue before you can use Sangoma cards.
*Note: Most likely, you will need to update the kernel chipset drivers on your system.
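
The 45 MB/sec threshold can be checked mechanically. The following is a rough sketch with a hypothetical function name, `dma_rate_ok`; it assumes the "= NNN MB/sec" ending shown in the hdparm sample output above.

```shell
# Read "hdparm -t" output on stdin, print the measured rate, and succeed
# only if it is above the minimum (default 45 MB/sec, per the text above).
dma_rate_ok() {
    awk -v min="${1:-45}" '
        match($0, /= *[0-9.]+ MB\/sec/) {
            rate = substr($0, RSTART, RLENGTH)
            gsub(/[^0-9.]/, "", rate)        # keep only the number
        }
        END {
            if (rate == "") { print "no hdparm timing line found"; exit 2 }
            print rate " MB/sec"
            exit ((rate + 0 > min + 0) ? 0 : 1)
        }
    '
}

# Usage:
#   hdparm -t /dev/sda | dma_rate_ok && echo "DMA test passed"
```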
---------
If the above test passes, make certain that no PCI power-save modes are enabled that would prevent full-capacity PCI operation, and disable ACPI. You should also enable the APIC interrupt handler (if you run "cat /proc/interrupts" and see anything other than IO-APIC for your wanpipe interrupts, follow this step).
Open your bootloader configuration file and add "apic=on acpi=off" to the end of the kernel command line. The location of the bootloader configuration file is system dependent. For example, on a system using GRUB:
vi /etc/grub.conf

*Note: you must reboot your system in order for the above changes to be applied
---------
If overruns persist, then there may be some other processes sharing the same IRQ as the wanpipe interfaces, hindering wanpipe performance. View the current running IRQ settings:

cat /proc/interrupts

*NOTE: the above is an ideal view of /proc/interrupts
If you notice any other device sharing an IRQ with your wanpipe interface(s), try isolating your wanpipe(s) on their own IRQ by one of two methods:
-> Check your system BIOS; it may have an option to accomplish this.
-> Place the Sangoma card in another slot, then run "cat /proc/interrupts" to verify.
If you notice that not all processors/cores are being used to handle interrupts concurrently, you can adjust the smp_affinity settings for your specific IRQ:
-> Guide on how to do this here: http://www.cs.uwaterloo.ca/~brecht/servers/apic/SMP-affinity.txt
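
The two /proc/interrupts checks described above (IO-APIC handling and IRQ sharing) can be summarised per wanpipe line. This is a sketch under stated assumptions: the function name is made up, and it assumes the wanpipe handlers appear in /proc/interrupts under names beginning with "wanpipe", with shared devices listed comma-separated at the end of the line.

```shell
# Summarise each /proc/interrupts line that mentions wanpipe: IRQ number,
# whether the line shows IO-APIC, and whether a non-wanpipe device shares it.
# Reads /proc/interrupts text on stdin.
wanpipe_irq_report() {
    awk '
        /wanpipe/ {
            line = $0; sub(/[ \t]+$/, "", line)      # drop trailing blanks
            irq = $1; sub(/:$/, "", irq)
            ctrl = (line ~ /IO-APIC/) ? "IO-APIC" : "other"
            n = split(line, dev, /, */)              # device names are comma-separated
            m = split(dev[1], head, /[ \t]+/)
            dev[1] = head[m]                         # first device = last word before the comma
            shared = "no"
            for (i = 1; i <= n; i++)
                if (dev[i] !~ /^wanpipe/) shared = "yes"
            print "irq=" irq, "controller=" ctrl, "shared_with_other_device=" shared
        }
    '
}

# Usage:
#   wanpipe_irq_report < /proc/interrupts
```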
---------
If overruns continue, your Sangoma card has a hardware echo canceller, and you are using Asterisk with DAHDI, try changing the DAHDI chunk size.
(If your Sangoma card does not have a hardware echo canceller, skip to the next step.)
To check whether your card has a hardware echo canceller, run "wanrouter hwprobe" and check for HWEC=X:

wanrouter hwprobe

If HWEC=<anything but 0>, then you have a hardware echo canceller.
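
The HWEC check can be scripted as follows. This is a sketch only: the function name `has_hwec` is hypothetical, and the only thing assumed about the hwprobe output is that it contains a token of the form HWEC=N, as described above.

```shell
# Read "wanrouter hwprobe" output on stdin and print each HWEC value found.
# Exit status is 0 if at least one value is non-zero, 1 otherwise.
has_hwec() {
    awk '
        match($0, /HWEC=[0-9]+/) {
            v = substr($0, RSTART + 5, RLENGTH - 5)   # skip the "HWEC=" prefix
            print "HWEC=" v
            if (v + 0 > 0) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage:
#   wanrouter hwprobe | has_hwec && echo "hardware echo canceller present"
```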
Changing the DAHDI chunk size reduces the number of hardware interrupts generated by the card. This gives your system more time to process each interrupt before the next one arrives. Please see Reduce hardware interupts & context switching by 70% to adjust your DAHDI chunk size.
*Note: This step involves re-installing the wanpipe driver.
------------
If overruns continue, then investigate the CPU usage on an interrupt level using the "top" application
(NOTE: only one CPU can be used per interrupt)
top

-> press the '1' key once in the top screen output and a few extra lines should populate near the top of the screen with information based on each CPU on the system:
Watch the value beside "hi" for each CPU. This indicates the CPU usage for Interrupts.
If this number is very high, it may be evidence that your system is not able to process interrupts in time.
-----------
If overruns continue to increment, isolate the Sangoma cards from the TDM lines to find where the issue originates.
1. Put all the ports in loopback by creating loopback plugs for all configured ports (see HERE for the loopback pinouts for your card)
2. In all wanpipe configuration files (/etc/wanpipe/wanpipeX.conf), change the clock source from NORMAL to MASTER (TE_CLOCK = MASTER)
3. Restart wanrouter (wanrouter restart)
In this setup the card generates its own clock and communicates with itself (as if lines were plugged into the card).
If this stops the overruns from incrementing, then the issue is due to multiple clock signals from the connected E1/T1 lines: proceed to Troubleshooting Clocking.
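
Step 2 can be previewed without touching the original file. The sketch below assumes the clock mode is stored on a "TE_CLOCK = NORMAL" line, as in the port examples later in this guide; the function name `with_master_clock` is made up for illustration.

```shell
# Print the given wanpipe config with TE_CLOCK switched from NORMAL to MASTER.
# Nothing is modified in place; redirect the output to a new file and review
# it before replacing the original.
with_master_clock() {
    sed 's/^\([[:space:]]*TE_CLOCK[[:space:]]*=[[:space:]]*\)NORMAL/\1MASTER/' "$1"
}

# Usage:
#   with_master_clock /etc/wanpipe/wanpipe1.conf > /tmp/wanpipe1.conf.loopback
```

Remember to set the clock back to NORMAL on Telco-facing ports once the loopback test is finished.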

Troubleshooting Clocking

When connected directly to the Telco, the clocking in /etc/wanpipe/wanpipeX.conf MUST be set to NORMAL
When connected to another pbx/channel bank, clocking in /etc/wanpipe/wanpipeX.conf MUST be set to MASTER

Note: it is typical to see overruns occur when first starting the Wanpipe driver, or starting up individual ports, as there is a short time interval required for the clock to synchronize.

All Sangoma cards operate using a single clock source. When data is being received by the Sangoma card on all the configured ports, the on-board DMA buffers temporarily store the data for each port. When all the buffers are filled, a hardware interrupt is sent to the system's DMA engine to tell the kernel that data from the Sangoma card is ready to be processed. The DMA buffers for all ports are read at this time.

If at least one of the lines connected to a port on the card has a different clock signal than the others, the DMA buffers can fill at different times. When the hardware interrupt is signalled, some buffers are full while others are not. This scenario causes wanpipe overruns.

Typically, all telcos in the same geographical region use the same clock signal. Sometimes, though, if one or more of the connected lines are from different telcos, the clock signals can be slightly off from one another and cause overruns.

Watch the live output of "ifconfig" in a separate window
watch -n 1 "ifconfig"
Stop all wanpipe interface(s)
wanrouter stop
Start each wanpipe interface in ascending order, 30 seconds apart, and only proceed once the "overruns" counters in ifconfig have stopped incrementing

wanrouter start wanpipe1
wanrouter start wanpipe2
.......
If the next interface you start causes "overruns" to increment, on that wanpipe interface only or on all wanpipe interfaces, the line connected to that wanpipe interface has a different clock than the rest of the connected lines. Verify that the line is in good health using the wanpipemon utility, and contact your telco if there is an issue with the clock. Otherwise, you will need to connect this line to a separate Sangoma card to meet the "1 clock per card" requirement.
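
The staggered start described above can be wrapped in a small loop. This is a sketch, not a replacement for watching the counters: the function name and the WANROUTER and DELAY variables are hypothetical knobs added so the loop can be dry-run, and the overrun check between ports is still done by eye in the "watch" window.

```shell
# Start wanpipe1..wanpipeN one at a time, pausing between each so the
# overrun counters can be checked before the next port comes up.
# WANROUTER and DELAY are hypothetical overrides, useful for a dry run.
start_in_order() {
    count=$1
    n=1
    while [ "$n" -le "$count" ]; do
        ${WANROUTER:-wanrouter} start "wanpipe$n"
        echo "wanpipe$n started; check overruns before continuing"
        sleep "${DELAY:-30}"
        n=$((n + 1))
    done
}

# Usage (after "wanrouter stop"), for a four-port card:
#   start_in_order 4
```

Interrupt the loop with Ctrl-C as soon as overruns begin to increment; the port started last is the one to investigate.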

If some ports on your Sangoma card interface with the Telco and others with a PBX/channel bank, verify that the PBX/channel bank ports take their timing from the Telco clock (and not from the internal oscillator); otherwise two clock signals will be in use:

In the setup below there is one Telco link and three channel bank/PBX links, with the clocking in the /etc/wanpipe/wanpipe*.conf files set as shown. Ports 2-4 take their clock from the oscillator on the card itself, while port 1 gets its clock from the Telco, which results in two different clocks. The configuration scripts produce this setup when NORMAL clocking is selected for port 1 and MASTER for ports 2-4, but the clocking source is "Free run" rather than "Port 1".

This Setup Is Wrong; Just For Example

Port 1: TE_CLOCK = NORMAL
TE_REF_CLOCK = 0

Port 2-4: TE_CLOCK = MASTER
TE_REF_CLOCK = 0

In the setup below there is again one Telco link and three channel bank/PBX links, with the clocking in the /etc/wanpipe/wanpipe*.conf files set as shown. Ports 2-4 take their clock from port 1, which carries the Telco's clock, so only a single clock is in use. The configuration scripts produce this setup when NORMAL clocking is selected for port 1 and MASTER for ports 2-4, with the clocking source set to "Port 1" rather than "Free run". This setup is correct because there is only one clock; the fix in the configuration is "TE_REF_CLOCK = 1" rather than "TE_REF_CLOCK = 0" for ports 2-4.

This Setup Is Correct

Port 1: TE_CLOCK = NORMAL
TE_REF_CLOCK = 0

Port 2-4: TE_CLOCK = MASTER
TE_REF_CLOCK = 1
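
The difference between the wrong and the correct setup can be checked across the configuration files. The following is a sketch under assumptions: the function name `clock_check` is made up, each file is assumed to describe one port, and the values are assumed to appear as plain "TE_CLOCK = ..." and "TE_REF_CLOCK = ..." lines as in the examples above.

```shell
# Flag ports that run as MASTER with TE_REF_CLOCK = 0 while another port is
# NORMAL (Telco-clocked), i.e. the two-clock setup described above.
# Arguments: the wanpipe config files to inspect. Exit status 1 if flagged.
clock_check() {
    awk '
        function value(line) {
            sub(/^[^=]*=[ \t]*/, "", line)   # drop "NAME = "
            sub(/[ \t]+$/, "", line)         # drop trailing blanks
            return line
        }
        /^[ \t]*TE_CLOCK[ \t]*=/     { clock[FILENAME] = value($0) }
        /^[ \t]*TE_REF_CLOCK[ \t]*=/ { ref[FILENAME] = value($0) }
        END {
            for (f in clock) if (clock[f] == "NORMAL") normal = 1
            for (f in clock)
                if (normal && clock[f] == "MASTER" && ref[f] + 0 == 0) {
                    print f ": MASTER with TE_REF_CLOCK = 0 (second, free-running clock)"
                    bad = 1
                }
            exit (bad ? 1 : 0)
        }
    ' "$@"
}

# Usage:
#   clock_check /etc/wanpipe/wanpipe*.conf
```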

WANPIPE Tools Compilation Failed

If the Wanpipe driver fails to install, and you see:

"       !!! WANPIPE Tools Compilation Failed !!!
     Possible solution:
            Wanpipe header files were not installed properly
            in /usr/include/wanpipe directory
    Please contact Sangoma Tech. at 905 474-1990"

Possible reason:

-> Missing Wanpipe requirements
See the following page to verify that all requirements are met:
-> Wanpipe Requirement

If you have verified the requirements, the cause is recorded in setup_drv_compile.log, located in the wanpipe driver source directory:
e.g. /usr/src/wanpipe-3.X.X/setup_drv_compile.log

Errors while stopping WANPIPE

You may try to stop wanpipe with "wanrouter stop" and see the following:

wanrouter stop

Shutting down wanpipe1 interface: w1g1

Shutting down device: wanpipe1

wanconfig: WAN device wanpipe1 did not shutdown

: ioctl(wanpipe1,ROUTER_DOWN) failed:

: 16 - Device or resource busy

If you router was not running ignore this message

!! Otherwise, check the /var/log/wanrouter and

/var/log/messages for errors

This means that your wanpipe interface(s) are still being used by Asterisk/FreeSWITCH and cannot be stopped.

To resolve this issue and stop Wanpipe, you must first stop the application that is using wanpipe (i.e. Asterisk/FreeSWITCH).

If you have verified nothing on your system is still using Wanpipe, you may force wanpipe to stop:

wanrouter stop all
