SC22 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Workshops Archive

ClusterLog: Clustering Logs for Effective Log-Based Anomaly Detection


Workshop: 12th Workshop on Fault-Tolerance for HPC at Extreme Scale (FTXS 2022)

Authors: Chris Egersdoerfer, Di Zhang, and Dong Dai (University of North Carolina, Charlotte)


Abstract: With the increasing prevalence of scalable file systems in HPC, accurate anomaly detection on runtime logs is becoming increasingly important. As it currently stands, however, many log-based anomaly detection methods encounter numerous challenges when applied to logs from many parallel file systems (PFSes), due to the irregularity and ambiguity of their time-based log sequences. To circumvent these problems, this study proposes ClusterLog, a log pre-processing method that clusters the temporal sequence of log keys based on their semantic similarity. By grouping semantically and sentimentally similar logs, it aims to represent log sequences with the smallest number of unique log keys, with the intent of improving the ability of a downstream sequence-based model to learn the log patterns. The preliminary results indicate not only its effectiveness in reducing the granularity of log sequences without the loss of important sequence information, but also its generalizability to different file systems' logs.
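
The core idea in the abstract, replacing each log key in a sequence with the ID of a cluster of similar keys, can be illustrated with a minimal sketch. This is not the authors' implementation: token-overlap (Jaccard) similarity stands in here for the semantic and sentiment similarity the abstract describes, and the greedy clustering, the `threshold` value, the function names, and the sample log keys are all hypothetical choices made for illustration.

```python
import re


def tokens(log_key):
    """Return the set of lowercase word tokens in a log key."""
    return set(re.findall(r"[a-z]+", log_key.lower()))


def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; a stand-in for semantic similarity."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_log_keys(log_keys, threshold=0.5):
    """Greedily assign each log key to the first cluster whose representative
    (the first key placed in that cluster) is at least `threshold` similar;
    otherwise start a new cluster. Returns a dict: log key -> cluster ID."""
    representatives = []  # token set of the first member of each cluster
    assignment = {}
    for key in log_keys:
        toks = tokens(key)
        for cluster_id, rep in enumerate(representatives):
            if jaccard(toks, rep) >= threshold:
                assignment[key] = cluster_id
                break
        else:
            # No existing cluster is similar enough: open a new one.
            assignment[key] = len(representatives)
            representatives.append(toks)
    return assignment


def compress_sequence(sequence, assignment):
    """Rewrite a temporal sequence of log keys as a sequence of cluster IDs."""
    return [assignment[key] for key in sequence]


# Hypothetical log keys, invented for illustration only.
LOG_KEYS = [
    "connection to server lost",
    "connection to server dropped",
    "disk write failed",
    "disk read failed",
]

assignment = cluster_log_keys(LOG_KEYS)
sequence = [LOG_KEYS[0], LOG_KEYS[2], LOG_KEYS[1], LOG_KEYS[3]]
compressed = compress_sequence(sequence, assignment)
print(compressed)  # [0, 1, 0, 1]
```

In this toy example, four distinct log keys collapse into two cluster IDs, so a downstream sequence model would see a vocabulary of two symbols rather than four. A real pipeline would presumably replace `jaccard` with a learned semantic or sentiment-aware similarity.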




