1 Commits

Author SHA1 Message Date
Junhao He
156e08f2fd rasdaemon: Add HBM Memory ACLS support for HiSilicon
When a hardware error occurs in a cell of the HBM memory, the internal
SRAM of the memory controller is used to replace the faulty memory. This
method is called ACLS (Adaptive Cache Line Sparing). The IMU reports the
ACLS RAS event, and rasdaemon records it and runs ACLS to replace the
faulty memory.

HBM ACLS can repair one memory cell (258 bits) at a time. The HBM can
check which HBM cell a physical address belongs to and filter out
invalid HBM addresses. If memory errors occur in different HBM cells,
multiple RAS errors are reported, one per cell.
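
The per-cell behaviour described above can be illustrated with a small
sketch. It is not the rasdaemon or kernel code: the address window
(HBM_BASE, HBM_SIZE), the amount of address space covered by one cell
(HBM_CELL_BYTES), and all function names are hypothetical and chosen
only for illustration. The sketch drops addresses outside the assumed
HBM window and collapses addresses in the same cell into one repair
request.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values: the commit message does not state the HBM address
 * window or how much physical address space one cell covers. */
#define HBM_BASE       0x1000000000ULL
#define HBM_SIZE       0x400000000ULL   /* 16 GiB, assumed */
#define HBM_CELL_BYTES 32ULL            /* assumed cell granularity */

/* Return true if the physical address lies inside the assumed HBM window. */
static bool hbm_addr_valid(uint64_t pa)
{
	return pa >= HBM_BASE && pa - HBM_BASE < HBM_SIZE;
}

/* Map a valid physical address to the index of the cell that contains it. */
static uint64_t hbm_cell_index(uint64_t pa)
{
	return (pa - HBM_BASE) / HBM_CELL_BYTES;
}

/*
 * Collect the distinct cells touched by a batch of error addresses.
 * Invalid addresses are skipped, and each cell is listed once, so one
 * repair request per faulty cell can be issued afterwards.
 * Returns the number of cell indices written to cells[].
 */
static size_t hbm_cells_to_repair(const uint64_t *addrs, size_t n,
				  uint64_t *cells, size_t max_cells)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (!hbm_addr_valid(addrs[i]))
			continue;

		uint64_t cell = hbm_cell_index(addrs[i]);
		bool seen = false;

		for (size_t j = 0; j < count; j++) {
			if (cells[j] == cell) {
				seen = true;
				break;
			}
		}
		if (!seen && count < max_cells)
			cells[count++] = cell;
	}
	return count;
}
```

Under these assumptions, two errors at HBM_BASE and HBM_BASE + 8 fall
into the same cell and yield one repair request, while an error at
HBM_BASE + 32 yields a second one.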

This feature depends on the Linux kernel options CONFIG_HISI_HBMDEV [1]
and CONFIG_HWPOISON_INJECT [2].

[1]: https://gitee.com/openeuler/kernel/pulls/2757
[2]: https://gitee.com/openeuler/kernel/blob/OLK-5.10/mm/hwpoison-inject.c

Signed-off-by: Junhao He <hejunhao3@huawei.com>
2023-12-13 16:48:19 +08:00