When a hardware error occurs in an HBM memory cell, the internal SRAM of the memory controller is used to replace the faulty memory. This method is called ACLS (Adaptive Cache Line Sparing).

The IMU reports the ACLS RAS event; rasdaemon records it and runs ACLS to replace the faulty memory. HBM ACLS can repair one memory cell (258-bit) at a time. The HBM can check which HBM cell a physical address belongs to and filter out invalid HBM addresses. If memory errors occur in different HBM cells, multiple RAS errors are reported.

The feature depends on the Linux kernel options CONFIG_HISI_HBMDEV [1] and CONFIG_HWPOISON_INJECT [2].

[1]: https://gitee.com/openeuler/kernel/pulls/2757
[2]: https://gitee.com/openeuler/kernel/blob/OLK-5.10/mm/hwpoison-inject.c

Signed-off-by: Junhao He <hejunhao3@huawei.com>