opvxa24xx: Work around defective OpenVox PCIe hardware / firmware design

The OpenVox A1610P/A1610E/A2410P/A2410E series cards all appear to share
a design flaw whereby wild DMA reads are occasionally emitted from the card.
We suspect this flaw is related to setup timing for the address bus inside
the FPGA PCI controller not being met, as the misbehavior is temperature
dependent and is largely, if not completely, eliminated by ensuring that
only the lower 14 address bits of the PCI bus are ever set and by forcing
the link speed to 2.5GT/s.
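
The patch below implements only the DMA-mask half of that mitigation.
For reference, here is a minimal sketch of how the 2.5GT/s cap could be
applied from the driver.  It is not part of this change, the helper name
is hypothetical, and it assumes a kernel that provides
pcie_capability_clear_and_set_word() and the PCI_EXP_LNKCTL2_TLS_*
definitions from <uapi/linux/pci_regs.h>:

    /* Hypothetical helper (not in this patch): pin the link leading to
     * the card at PCIe gen1 (2.5GT/s).  Target Link Speed is controlled
     * by the downstream port above the endpoint, so the configuration
     * writes go to the upstream bridge rather than to the card itself. */
    static void a24xx_force_gen1(struct pci_dev *pdev)
    {
            struct pci_dev *bridge = pci_upstream_bridge(pdev);

            if (!bridge)
                    return;

            /* Request 2.5GT/s as the Target Link Speed... */
            pcie_capability_clear_and_set_word(bridge, PCI_EXP_LNKCTL2,
                                               PCI_EXP_LNKCTL2_TLS,
                                               PCI_EXP_LNKCTL2_TLS_2_5GT);

            /* ...then retrain the link so the new speed takes effect. */
            pcie_capability_set_word(bridge, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_RL);
    }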

As all wild DMA, even wild DMA reads, is caught by the advanced PCIe controllers
on high availability platforms (mostly OpenPOWER, such as Talos II / Blackbird),
these cards will regularly drop offline due to the PCIe interface freezing and
invoking the EEH handler.  Since recovery from this state without a full reload
of Asterisk and DAHDI, along with the driver module, is not guaranteed, the wild
DMA presents a significant problem for high availability analog PBX servers.
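
Note that this driver does not implement the kernel's PCI error recovery
callbacks, which is part of why an EEH freeze currently requires a full
reload.  A minimal sketch of the hooks a future fix could register through
the .err_handler field of struct pci_driver (callback names hypothetical;
a real implementation would also have to quiesce and reprogram the card's
DMA engine around the reset):

    static pci_ers_result_t a24xx_error_detected(struct pci_dev *pdev,
                                                 pci_channel_state_t state)
    {
            /* Stop touching the device and ask the platform for a slot reset. */
            return PCI_ERS_RESULT_NEED_RESET;
    }

    static pci_ers_result_t a24xx_slot_reset(struct pci_dev *pdev)
    {
            /* Bring the function back up after the platform reset... */
            if (pci_enable_device(pdev))
                    return PCI_ERS_RESULT_DISCONNECT;
            pci_set_master(pdev);

            /* ...then restore card state and restart DMA before resuming. */
            return PCI_ERS_RESULT_RECOVERED;
    }

    static const struct pci_error_handlers a24xx_err_handler = {
            .error_detected = a24xx_error_detected,
            .slot_reset     = a24xx_slot_reset,
    };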

For now, work around the broken hardware by setting a DMA mask of 14 bits.
Should OpenVox fix the problem in future hardware / firmware revisions,
the DMA mask can be reset to the standard 32 bits for this type of PCI device.
@@ -2396,8 +2396,23 @@ static int __devinit a24xx_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	wc->index = index;
 	// Set the DMA mask appropriately
-	if (dma_set_mask(&pdev->dev, DMA_BIT_MASK(32)) || dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32))) {
+	/* The closed-source proprietary OpenVox firmware / HDL on this card is *badly* broken.
+	 * It will try to read from unauthorized host addresses via DMA if more than 26 or so address bits are used,
+	 * and even then it's unknown how reliable the card will be with that many address bits actually in use.
+	 *
+	 * Due to the closed firmware source, it's unknown if this is intentional or just wild DMA resulting from bugs
+	 * in the firmware / HDL. In any case, x86 allows these through with unknown / undefined effect, whereas POWER
+	 * blocks them and throws an EEH (the correct behavior in this instance). The side effect is that on POWER
+	 * we lose the card on each wild DMA, as recovery without module reload and service restart is not guaranteed.
+	 *
+	 * Since we only need a ~1k DMA buffer, constrain the card's physical DMA address to the lower 14 bits...
+	 * On POWER systems, this usually results in a PCI physical address of 0x0000000000001000.
+	 *
+	 * NOTE: This workaround only reduces the EEH frequency, it does not fully resolve the issue. Raptor has had
+	 * good success in significantly reducing (if not outright eliminating) these issues with a combination of both
+	 * this workaround and forcing the PCIe link speed to v1.x transfer rates (2.5GT/s).
+	 */
+	if (dma_set_mask(&pdev->dev, DMA_BIT_MASK(14)) || dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(14))) {
 		if (wc_dev->freeregion) {
 			release_mem_region(wc_dev->mem_region, wc_dev->mem_len);
 			iounmap((void *)wc_dev->mem32);
@@ -2420,8 +2435,18 @@ static int __devinit a24xx_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		return -ENOMEM;
 	}
+	if (wc_dev->writedma & (~DMA_BIT_MASK(14))) {
+		dma_free_coherent(&pdev->dev, wc_dev->dma_buffer_size, (void *)wc_dev->writechunk, wc_dev->writedma);
+		if (wc_dev->freeregion) {
+			release_mem_region(wc_dev->mem_region, wc_dev->mem_len);
+			iounmap((void *)wc_dev->mem32);
+		}
+		dev_warn(&wc_dev->dev->dev, "opvxa24xx: dma_alloc_coherent() gave a physical address %p outside of the requested range, aborting!\n", (void *)wc_dev->writedma);
+		return -EIO;
+	}
 	if(debug) {
-		printk("opvxa24xx: dma buffer allocated at %p, pci(%p), size %d bytes\n", wc_dev->writechunk, (void *)wc_dev->writedma, wc_dev->dma_buffer_size);
+		printk("opvxa24xx: dma buffer allocated at %p, pci(0x%016llx), size %d bytes\n", wc_dev->writechunk, (unsigned long long)wc_dev->writedma, wc_dev->dma_buffer_size);
 	}
 	__a24xx_malloc_chunk(wc_dev,ms_per_irq);