Bug
Resolution: Unresolved
Normal
CentOS Stream 9, CentOS Stream 10, eln
No
Moderate
rhel-sst-pt-llvm-rust-go
rhel-sst-pt-llvm-rust-go
ssg_platform_tools
False
ppc64le

During the Rust 1.86 rebase, we found that ppc64le was failing thousands of UI tests, but only on RHEL, where we target pwr9. On Fedora, which targets pwr8, the results were much more typical (though our test runs are never perfectly clean). The problem occurs with both LLVM 19 on el9/el10 and LLVM 20 on eln. Here are example builds; check the ppc64le logs:
- eln https://koji.fedoraproject.org/koji/buildinfo?buildID=2691296
- c10s https://kojihub.stream.centos.org/koji/buildinfo?buildID=76639
- c9s https://kojihub.stream.centos.org/koji/buildinfo?buildID=76640
A more direct cause of the failures is that rustc --error-format=json reports is_primary: true for every location span, which confuses the test infrastructure that matches expected errors to specific lines. Spans that only provide context for an error should have is_primary: false. Since nothing in this code is architecture-specific, this looks like a codegen bug.
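To spot the symptom without running the whole UI suite, a small filter over rustc's JSON diagnostics can flag records whose spans are all primary. This is a minimal sketch, not part of the original investigation: the function name suspicious_diagnostics is hypothetical, and it relies only on the documented JSON diagnostic fields ($message_type, message, spans, is_primary). It is a heuristic, since some legitimate diagnostics do carry several primary spans.

```python
import json


def suspicious_diagnostics(lines):
    """Return the messages of diagnostics whose spans are all marked primary.

    Heuristic: only diagnostics with two or more top-level spans are checked,
    because a lone span is normally primary anyway. Some legitimate diagnostics
    carry several primary spans, so treat hits as leads rather than proof.
    """
    flagged = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip any non-JSON output mixed into stderr
        diag = json.loads(line)
        # Newer rustc tags each record; skip non-diagnostics such as artifacts.
        if diag.get("$message_type", "diagnostic") != "diagnostic":
            continue
        spans = diag.get("spans", [])
        if len(spans) >= 2 and all(span.get("is_primary") for span in spans):
            flagged.append(diag.get("message", ""))
    return flagged
```

For example, capture diagnostics with rustc --error-format=json repro.rs 2> diag.json (repro.rs being a hypothetical file whose error carries a secondary label, such as a type mismatch with an "expected due to this" note) and pass open("diag.json") to suspicious_diagnostics. With a healthy compiler the result should normally be empty for such a file; with the behavior described above, that diagnostic's message should show up.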
Reproducing it appears to depend on the PGO flags; in fact, even the first step, merely instrumenting the compiler for profiling, causes the same JSON issue. To bisect it, I built with: ./x.py build --stage 2 sysroot --rust-profile-generate=$SOME_PATH
That led me to an upstream commit, which makes some sense since it modifies the JSON error code. However, I think that commit is only where the code changed in a way that triggers an underlying codegen issue, and I haven't found that root cause yet.
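The bisect loop described above could in principle be automated with git bisect run. This is a hedged sketch, not the procedure actually used: PROFILE_DIR, the stage 2 compiler path STAGE2_RUSTC, REPRO_FILE and the script name are hypothetical and depend on the checkout and host triple. It rebuilds the instrumented compiler with the x.py command above, compiles a file whose error should include a secondary span, and marks the commit bad if every span comes back primary. The exit codes follow git bisect run's convention (0 good, 125 skip, other values up to 127 bad).

```python
import json
import subprocess

# Hypothetical settings; adjust to the local checkout and host triple.
PROFILE_DIR = "/tmp/rustc-profiles"
STAGE2_RUSTC = "build/host/stage2/bin/rustc"
REPRO_FILE = "repro.rs"  # should produce an error with a secondary label

# Exit codes understood by `git bisect run`.
BISECT_GOOD, BISECT_BAD, BISECT_SKIP = 0, 1, 125


def verdict(build_ok, primary_flags):
    """Map one step's outcome to a `git bisect run` exit code."""
    if not build_ok or not primary_flags:
        return BISECT_SKIP  # nothing to judge at this commit
    return BISECT_BAD if all(primary_flags) else BISECT_GOOD


def run_step():
    """Build an instrumented stage 2 compiler, then inspect its JSON spans."""
    build = subprocess.run(
        ["./x.py", "build", "--stage", "2", "sysroot",
         f"--rust-profile-generate={PROFILE_DIR}"]
    )
    if build.returncode != 0:
        return verdict(False, [])
    result = subprocess.run(
        [STAGE2_RUSTC, "--error-format=json", REPRO_FILE],
        capture_output=True, text=True,
    )
    flags = []
    for line in result.stderr.splitlines():
        if line.startswith("{"):
            diag = json.loads(line)
            flags += [bool(s.get("is_primary")) for s in diag.get("spans", [])]
    return verdict(True, flags)
```

Saved as a hypothetical bisect_step.py outside version control, it could be driven with git bisect run python3 -c "import bisect_step, sys; sys.exit(bisect_step.run_step())". Each step rebuilds stage 2, so expect it to be slow.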
I also don't know whether this bug manifests in any other way. For now, I'm disabling PGO on the ppc64le builds, which at least gets us back to the status quo.