Hello everyone,
This thread is dedicated to sharing the meeting minutes of the LLVM Qualification Working Group. We will use this space to publish summaries and action items from our monthly sync-ups, in order to keep the broader community informed.
Meeting notes are initially drafted collaboratively in a shared FramaPad and then archived here after each session for long-term reference and discussion.
Notes (FramaPad): MyPads
The LLVM Qualification WG was formed following community interest in exploring how LLVM components could be qualified for use in safety-critical domains (e.g. automotive, oil & gas, medical). We welcome contributions from all perspectives: compiler developers, toolchain integrators, users from regulated industries, and others interested in software tool confidence, safety assurance, and systematic quality evidence.
If you’re interested in participating or following along, feel free to join the discussions here or connect via the LLVM Community Discord in the #fusa-qual-wg channel.
Warm regards,
Wendi
(on behalf of the LLVM Qualification WG)
uwendi
3
[LLVM Qual WG] arm-tv demo with @regehr
2025/07/31 8:30AM JST
Recording: link
Chat transcript: link
Notes by Gemini 
Summary
@regehr introduced Alive2, a software tool for refinement checking of LLVM optimizations, and the arm-tv tool, developed by his group, for translation validation of ARM 64-bit assembly code, explaining their methodologies and demonstrating their application in bug detection. While arm-tv has found 46 bugs, primarily silent miscompiles, scalability challenges were acknowledged, particularly around memory access. Questions were raised about limitations during lifting, the tool’s trustworthiness, and adding new architectures.
Details
- Introduction to Alive2: @regehr introduced Alive2, a software tool for refinement checking of LLVM optimizations. He explained that LLVM’s middle-end rewrites Intermediate Representation (IR) to improve code, often making it faster or smaller. These transformations are considered “refinements,” meaning the new code’s set of meanings is a subset of the old code’s. Alive2 uses symbolic execution of the code before and after optimization and generates questions for the Z3 theorem prover to verify that the optimized code refines the unoptimized code.
- Alive2 Compiler Explorer: @regehr encouraged attendees to try Alive2 via its compiler explorer instance at alive2.llvm.org, noting its ease of use and providing an example problem to explore. He also mentioned that papers have been written about Alive2, but hands-on use is likely more engaging.
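As a small illustration (not taken from the talk), here is a hypothetical minimal src/tgt pair of the kind one can paste into alive2.llvm.org: Alive2 checks whether the `@tgt` function refines the `@src` function, in this case the classic strength reduction of a multiply-by-two into a shift:

```llvm
; Hypothetical minimal example for the Alive2 compiler explorer.
; Alive2 pairs @src (before) with @tgt (after) and asks Z3
; whether @tgt refines @src.
define i32 @src(i32 %x) {
  %r = mul i32 %x, 2      ; original: multiply by two
  ret i32 %r
}

define i32 @tgt(i32 %x) {
  %r = shl i32 %x, 1      ; optimized: shift left by one
  ret i32 %r
}
```

Since these two functions are equivalent, the refinement check succeeds; a buggy transformation would instead produce a concrete counterexample input from Z3.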
- arm-tv overview: @regehr presented the arm-tv tool, developed by his group, which performs translation validation for ARM 64-bit assembly code. He demonstrated an LLVM function that uses `memcmp` and showed how the ARM backend optimizes it, including inline substitution of `memcmp` and replacing control flow with a conditional select. The arm-tv tool aims to prove that the assembly code is a faithful translation of the LLVM IR.
- Translation Validation methodology: @regehr explained that translation validation involves assigning a mathematical meaning to the code before and after transformation. Alive2 is used to formally represent the meaning of LLVM functions. For ARM code, arm-tv assigns meaning either by using hand-written instruction semantics derived from the manual or through a mechanically derived version from ARM’s formal description of instructions. The tool then translates the ARM code back into LLVM IR and invokes Alive2 for a refinement check.
- arm-tv in action: @regehr demonstrated arm-tv, which is called backend-tv and also supports RISC-V. The tool parses assembly into LLVM MCInst, lifts the ARM assembly code by building a small execution environment that resembles an ARM processor with registers initialized with “freeze poison” (an indeterminate bit pattern), and then processes the lifted instructions. This process results in a clumsy but optimizable function that Alive2 can then efficiently check against the original code.
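To make the “freeze poison” initialization concrete, here is a hypothetical sketch (assumed for illustration, not from the demo) of how an uninitialized input register might be modeled in the lifted IR; `freeze` of `poison` yields an arbitrary but fixed bit pattern:

```llvm
; Hypothetical sketch of register initialization in lifted IR.
; %x0 stands for ARM register X0 at function entry: an unknown
; but stable 64-bit value, modeled as freeze of poison.
define i64 @lifted_entry() {
  %x0 = freeze i64 poison   ; arbitrary, but fixed once chosen
  ret i64 %x0
}
```

Because the frozen value is unconstrained, the verifier must establish refinement for every possible choice of that value.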
- Bug detection with arm-tv: @regehr shared that arm-tv has found 46 bugs, primarily silent miscompiles, most of which are in the machine-independent parts of the LLVM backend. He noted that while arm-tv recently started supporting RISC-V, fewer bugs have been found compared to ARM, attributing this to the multi-backend impact of the existing bugs: fixes in machine-independent code benefit both targets. @regehr mentioned that most bugs were found with the help of fuzzers and an automated testing workflow.
- Origin and scalability challenges: @regehr revealed that the impetus for arm-tv came from a conversation with JF Bastien years ago about trusting LLVM’s top-of-tree for automotive applications. @YoungJunLee inquired about handling large functions more efficiently, to which @regehr acknowledged scalability as a significant weakness of the tool, particularly with memory access, indicating that improvements to Alive2’s memory encoding are needed.
- Limitations and trustworthiness: @uwendi asked about limitations or loss of information during lifting. @regehr explained that while ARM assembly semantics are cleaner, challenges arose in lifting code with powerful pointers to LLVM’s weaker object-offset model, necessitating changes to Alive2’s memory model to support “physical pointers”. He addressed concerns about trusting arm-tv, suggesting documenting the tool’s scope and limitations, with a separate group of people needed to verify its implementation for certification purposes.
- Tool Usage and Bug Reporting: @regehr stated that currently, only his team uses arm-tv. When a bug is reported by the tool, he verifies it on an actual ARM machine to confirm the misbehavior before reporting it to the LLVM developers, ensuring the tool’s output is vetted. He also mentioned the existence of false alarms due to the complexity of the LLVM memory model.
- Impact on LLVM specification and Future Work: @regehr shared an anecdote where arm-tv uncovered an ambiguity in the interaction between the LLVM LangRef and the AArch64 ABI document, which led to a resolution and fix in LLVM. Regarding future work, he expressed interest in supporting translation validation of inline assembly and concurrency-related aspects of LLVM IR, such as volatile accesses and interrupt handlers in embedded systems.
- Adding new architectures: Luc Forget inquired about the modularity of arm-tv for adding new ISA semantics. @regehr explained that while not “super modular,” refactoring had made it easier to add RISC-V support, and adding a third architecture would likely not be difficult, though Alive2’s lack of multiple address space support remains a limitation for GPU backends. He also highlighted that supporting a new architecture primarily requires a description of its instruction set. @regehr mentioned that for ARM, they can automatically generate the instruction semantics from ARM’s Architecture Specification Language (ASL), but for RISC-V, it was done by hand. He hopes to derive x86-64 semantics automatically in the future, as manual implementation is too extensive.
uwendi
4
LLVM Qualification Group’s August Sync-Up Agenda
Hi all,
The main topics for the next sync-up are as follows:
Internal process update: proposed changes to membership criteria
(Thanks to @petarj and @etomzak for their inputs)
- Discussion: Proposed changes to membership criteria, to address the challenges the current internal process poses for active collaboration and contribution.
- Action: If possible, please complete the Participant Introduction and Membership Criteria Form before the sync-up.
Clang C/C++ WG insights on conformance to ISO
(Follow-up on the previous discussion regarding specifications and traceability to tests for Clang)
- Invitees: @Endill and @AaronBallman (to the EU/Asia or Americas-friendly timeslots, depending on their availabilities)
- Related RFC: https://discourse.llvm.org/t/rfc-c-conformance-test-suite/69821
- Current Status: Overview of Clang’s test suite (clang/test/cxx) and conformance challenges.
- Discussion: How these insights impact LLVM Qualification Group’s goals, explore possible steps on creating better traceability and conformance for Clang.
Open Floor
- Any additional topics, questions, comments, or suggestions from group members.
- Review action items and assignees.
uwendi
6
LLVM Qualification Group’s September Sync-Up Agenda
Hi all, hope you’re having a great summer!
For those who filled in the Participant Introduction & Membership form and indicated interest in being active contributors (Q3): our next sync-up is planned for next week (@petarj @CarlosAndresRamirez @evodius96 @petbernt @slotosch @YoungJunLee @ZakyHermawan).
Ahead of the call, I’d like to invite you to drop a quick message on Discord about the offline reviews we talked about last time (see also the minutes):
Additional quick topics for the agenda:
- Introduction of @ZakyHermawan
- Concerns or viewpoints about meeting transcriptions / AI summaries (Gemini)
- @slotosch’s proposal for the LLVM Conference in Santa Clara
- Eclipse SDV’s interest in the LLVM open qualification initiative + invitation to their community meetup in Japan
- Poster at Innovations in Compiler Technology
- Insights from a conversation with an ELISA project member on resources & funding
Given that time is short, I may also create separate Discord threads to keep these discussions moving more efficiently.
Thanks again to everyone who answered the form. @etomzak You’re warmly welcome in our calls, even if your availability is limited.
Small note: at the moment, @evodius96 is officially the only member from US/Canada time zones. @PLeVasseur is interested and expected to join sync-ups, so just letting you know for context.
Hi Wendi, due to travel I won’t be able to attend this upcoming call. If there is a better time for everyone, please don’t hesitate to make a change. Thank you!
uwendi
8
Hi @evodius96, thanks for letting me know! Since you won’t be able to attend, I’ll cancel the upcoming call. We’ll keep the EU/Asia sync-up as the main source of updates this time, so I’d kindly ask you to have a look at the minutes afterward to stay in the loop. Looking forward to catching up with you in a future call once you’re back from your travels. Safe travels!
uwendi
9
Handling non-technical topics asynchronously
Hi all,
@petarj @CarlosAndresRamirez @petbernt @slotosch @YoungJunLee @capitan-davide
cc: @evodius96 @PLeVasseur @etomzak @ZakyHermawan
For our upcoming sync-up (tomorrow), we have more items on the agenda than we can realistically cover in one hour. Here’s a draft of the presentation (the final version will be uploaded to GitHub after the sync-up):
To make sure we use our meeting time efficiently, and to give everyone a fair chance to contribute, I’d like to suggest that we handle some of the non-technical topics asynchronously on our Discord channel.
Topics for discussion in Discord:
By shifting these items to Discord, we’ll free up the sync-up call to focus on technical discussions (e.g. directions for a grey-box approach, tool usage confidence, evaluation of development processes).
Outcomes from Discord discussions will also be summarized here in our meeting minutes on Discourse so nothing is lost. Looking forward to your thoughts and contributions on Discord! 
uwendi
10
Notes - September 2025
Participants
EU/Asia-friendly (Tuesday 2025/09/02, 5:30PM JST, 1h)
- Carlos Ramirez (host)
- Davide Cunial
- Erik Tomusk
- Florian Gilcher
- Jorge Sousa
- José Rui Simoes
- Oscar Slotosch
- Petter Berntsson
- Vlad Serebrennikov
- Wendi Urribarri (co-host - note taking & check time)
- YoungJun Lee
- Zaky Hermawan
Americas-friendly (Wednesday 2025/09/03, 6:00AM JST, 1h)
Cancelled - See https://discourse.llvm.org/t/llvm-qualification-wg-sync-ups-meeting-minutes/87148/8
Agenda
Refer to https://discourse.llvm.org/t/llvm-qualification-wg-sync-ups-meeting-minutes/87148/6
Links
Highlights
Non-technical topics
About note taking
- No shared concerns from “core members”
- One shared concern about AI writing down every word (from a non-member)
- Gemini not enabled today
New self-nomination through the Google Form - Zaky’s presentation
- EE student from Indonesia
- Coming as an individual
- Working with ISO/SAE 21434 (cybersecurity)
Oscar’s idea for the US LLVM 2025 conference (end of October)
- Proposal to have a corner about compiler qualification at the exhibition for sponsors
- Discuss with people and attract interest in it
- No conclusion about this point, to be taken for discussion to Discord
Technical topics
Wrap-up about direction and focus of the discussions since July
Reference functional safety standard
- Members from several industries (automotive, trains, robots, etc), so different functional safety standards apply
- The general framework for functional safety of E/E/PE systems is IEC 61508, so it makes sense to use it as first guidance
- As IEC 61508 is the parent of other functional safety standards, the expectations around tool confidence are very similar
Need to provide evidence of tool usage
- Three questions from the safety standards (see slides)
- If the answers are Yes / Yes / No, then there is a need to provide the evidence
- Comments about question 3:
- Most safety standards are written for users, so it depends on how much they examine the “relevant outputs”
- In the case of a compiler, relevant output → final executable
- More and more difficult to thoroughly verify the final executable (complexity)
- Many of the tools traditionally used by vendors are closed source; some open tools exist that can be used to check the relevant outputs
First target: Clang compiler
- As a tool provider of Clang, we don’t know what the usage will be
- As a tool user, you can restrict yourself (for example, using it only for debugging, not for mass-production)
- Users will need to rely on the compiler depending on the usage
- All the C++ parsing and semantic analysis is done by the Clang frontend
- Language + Standard => Version changes are fast
- Which flavor of C or C++?
- C++ spec improves significantly
- C spec is more rigorous
- Suggestion of small scope:
- Limit to the lexer? Spec-wise, it is simpler
- Opinion 1: the usefulness of restricting to the lexer is limited; from a safety point of view, one may trust the lexer, but what about the rest? Requirements need to be associated with concrete use cases
- Opinion 2: agreed; a valid use case for the lexer is needed
About effort for a conformance test suite:
- Opinion 1: Amount of effort would be huge even for 1 version
- Opinion 2: Testing is laborious but not very hard
- Opinion 3: If you want to do a good conformance test, the bottleneck is interpretation of the standard; testing against the C/C++ specification is not like doing so for Rust
- Comment: commercial test suites are expensive, 40-45K Euro to qualify only one version of a compiler
- What is generated is version dependent
- About usage of Alive2:
- Replace the Clang front-end with Alive2 front-end and generate Alive2 IR from source code?
- Clarification: alivecc doesn’t replace Clang itself; it simply adds a pass plugin for verification at the IR transformation stage
Grey-box approach
- Qualification is typically black-box activity
- Disadvantage: to be done for every combination, optimization options, etc
- Grey-box approach could be useful, but one limitation is lack of specification of intermediate I/O
- Example: specification of the IR
- Identification of regressions in IR could be useful
Possibility of LTS?
- From this RFC, this will not happen - https://discourse.llvm.org/t/rfc-llvm-lts/84049
- Labor can be massive
- Very difficult interpretation of what an LTS is
- In the Rust community an LTS is 2 years
- Rather do qualification work incrementally on top of the main version
- Which C++ flavor to support is a big question
Example of funding
- Fleet of students
- Reasonable budget
- Example: University of Romania
- “Top leadership” needed to guide the students
Selection of qualification methods
- ISO 26262 proposes four qualification methods
- Evaluation of the tool dev process is highly recommended only for ASIL A and ASIL B
- Not to be used alone; it must be combined with Validation to cover ASIL C and ASIL D
- Clarification:
- Proposal is not about using Validation or Evaluation of dev process alone
- Have a mix of both to cover all safety integrity levels with at least one highly recommended method
- Many tool vendors already use a mix of these two methods for “certification”: validation by the vendor + audit by a certifying body
Actions
Wendi:
- share summary of topics with the group and the community
- point out the possible ways to proceed
- create threads for each subject on Discord? (easier to communicate)
All: participate in the open discussions (preferably on Discord, but Discourse is also fine)
uwendi
11
Just a quick update: I’ve submitted a PR to update the documentation and add links to the August and September 2025 sync-up slide decks, which helped guide our recent discussions:
https://github.com/llvm/llvm-project/pull/156897
The slides are currently hosted in llvm-project/docs/qual-wg/slides, but following feedback, I plan to migrate them to a more appropriate location (likely llvm-www) once confirmed with the community. Please feel free to check the PR for details, and let me know if you have any feedback!
uwendi
12
LLVM Qualification Group’s October Sync-Up Agenda
Calendar: Getting Involved — LLVM 22.0.0git documentation
Non-technical Topics
- Docs updates (September) – summary of recent GitHub changes for LLVM Qualification Group Docs: #156897, #157804, #156184, #158842, #160021, #161113
- New members (September) – welcome to @sousajo-cc @jr-simoes @ZakyHermawan
- Decision-taking in the WG (requested by @slotosch) – discussion on how we define consensus, use votes, and set time limits for open topics
Technical Topics
- Upstream efforts & action plan (small deliverables) – build an initial roadmap based on the “confidence in the use of software tools” workflow
- Tutorial / Introduction (proposed by @YoungJunLee) – outline for newcomer materials
- Qualification focus areas (proposed by @petbernt) – first candidate areas and lightweight templates
Looking forward to seeing everyone at the October sync-up and continuing to shape our next steps together.
uwendi
13
Notes - October 2025
Participants
EU/Asia-friendly (Tuesday 2025/10/07, 5:30PM JST, 1h)
- Carlos Ramirez
- Davide Cunial
- Oscar Slotosch
- Petter Berntsson
- Wendi Urribarri
- YoungJun Lee
- Zaky Hermawan
Americas-friendly (Wednesday 2025/10/08, 6:00AM JST, 1h)
- Alan Phipps
- Wendi Urribarri
Agenda
Refer to https://discourse.llvm.org/t/llvm-qualification-wg-sync-ups-meeting-minutes/87148/12
Links
Highlights
Gemini notes taken in the EU/Asia meeting, modified by @uwendi
Non-technical topics
Meeting Logistics and New Members
- Slides shared on Discord before the meeting
- Addition of three new members
- Updated member list has been added into the official group webpage (PR not merged yet)
Challenges in Decision-Making
- Difficulties in making decisions within the group
- Cultural factors, challenges in reaching consensus, and time limits identified as primary issues
- What does consensus mean for us; what constitutes it?
- Find compromise, common understanding
- Clear when people have different opinions, but for people who are not giving an opinion, how can we know?
- Divided positions - What happens when 50/50?
- Consensus could mean no objections are raised, or majority rule if participation is lacking
- Engagement in early conversations and reasoning are necessary, but for clear yes/no decisions, voting with a >50% threshold should be used, excluding non-votes from the percentage => If someone doesn’t vote, we cannot count the vote
- If there is a “gray” area, then maybe discuss it in the next sync-up meeting
- What is the model from other WGs? In general, it’s done through “informal” decision making
- Don’t overcomplicate => Lightweight process
Technical topics
Discussion on Confidence and Qualification Activities
- Two work threads: classical black-box validation + component confidence
- These are explorations, not final decisions
- Suggestion: rephrase the “confidence” aspect as “gray/white box qualification approach by focusing on sub-components” rather than “improving confidence,” explaining that tool confidence is a risk analysis that cannot be improved; rather, risks can be reduced through qualification
- Counter-opinion: different providers can have different Tool Confidence Levels (TCLs) for the same tool due to mitigation actions, which could be seen as improving confidence
- Proposal: this WG could provide advice on activities to increase confidence in tool usage, beyond just qualification
- Suggested reformulation:
- Create confidence in the use of the tools, e.g. by using them carefully and checking the tool results
- This will help to reduce the tool confidence level (TCL) as defined in ISO 26262, which measures the tool risks
- The remaining risks of the tools have to be reduced by tool qualification
- The most used qualification method is validation
- It can be performed as a black-box approach, just by testing the requirements, e.g. the compliance of a compiler with a C or C++ standard
- Another validation strategy, which can be applied to open-source tools, is a white/grey box approach, i.e. validating sub-components of the tools, e.g. a lexer, parser, or backend of a compiler
- Different perspective:
- Validation is acknowledged as a common qualification method for use of a tool in higher-risk contexts (SIL 3/4, ASIL C/D…)
- But evaluation of the development process is another highly recommended method for lower-criticality contexts (ASIL B)
- Let’s document our arguments about the suitability of LLVM’s development process upstream, and perform an assessment with auditors who could volunteer for it
- Development process: LLVM developer policy + evaluation of the process: OSS best practices + human factors metrics
- A qualification could then be achieved up to ASIL B context usage
- Validation would be a second-step approach for qualification at higher criticality levels
Tool Qualification and Library Qualification
- Tool qualification by validation: different strategies like using test suites or breaking down qualification by validating sub-components such as the lexer or parser
- Qualifying sub-components is useful for overall qualification
- Library qualification is more rigorous and typically requires a white box approach with code coverage measurements
Focus on Upstream LLVM and Reusability
- The group’s focus is exclusively on the upstream LLVM project, aiming to create reusable output that downstream companies can utilize
- The group strives for reusability, building general or usable tools for various standards
Challenges in Defining a Toy Project
- Several constraints, including the lack of resources and the difficulty in choosing a language (C, C++, Rust…), compiler version, language standard, and compiler flags for a deliverable “toy project”
- Agreeing on a toy project would help answer these questions and provide a concrete direction for the group’s work
Standard Library Function Qualification
- Start with function-level qualification of standard library functions within a limited scope, such as a single header file, as it would be easier to manage than qualifying the entire compiler
- Use publicly known sources like CPP reference for requirements and tracing tests, aiming to provide upstream qualification evidence and design that downstream users could then utilize
- Library qualification is more rigorous than tool qualification but selecting requirements for library functions is a good idea
- If the group explores the library qualification topic, it might focus on pure C language due to the significant differences and complexities of C++
- The difference between C and C++ standard libraries is substantial, C++ libraries change frequently
- ISO/PAS 8926:2024 to qualify software components (incl. libraries)
- Planned to be merged with Part 8, Chapter 12 in the next edition of ISO 26262 (2027)
- Two axes: analysis of provenance and analysis of complexity
- Depending on results of the analysis, different qualification activities must be performed
Outputs and Collaboration
- Draft tutorial for newcomers, focusing on organizing documents, presentations, and projects like compiler and code coverage sanitizers
- External toolchains could be referenced
- RFC to gather community input, aiming to provide a framework for compiler qualification and enable sharable evidence for downstream users
- Other efforts can and will be explored in the future, and all members are encouraged to work on any of these subjects, based on their interests and bandwidth/availability
Actions
- @petbernt will make a proposal presentation for library qualification
- @YoungJunLee and @ZakyHermawan will help write the draft tutorial
- @slotosch will put his thoughts on tool confidence and qualification activities in a message in the group’s Discord channel
- @uwendi will continue working on analysis of the suitability of the LLVM Developer Policy as a development process for safety-critical use, and on how to reuse good-quality best practices from ELISA as part of this analysis/assessment
- @CarlosAndresRamirez will run experiments on LLVM and provide evidence to see what can be accomplished using human-centric metrics analysis in LLVM’s upstream development process
- All: provide feedback on @petbernt’s RFC (considering the potential library approach) by end of the current calendar week, then @petbernt will publish it
uwendi
14
LLVM Qualification Group’s November Sync-Up Agenda
Pre-reads / Prep
- Think ahead: Which linking policy should we adopt for Meeting Materials and why
- Bring 1–2 bullets: Your progress, top blocker, and any help needed
Non-technical topics
- Docs updates (October): summary of GitHub changes for LLVM Qualification Group Docs
- Decision on “Meeting Materials” linking policy: Per-meeting slide links (monthly PRs) versus Single folder link (rely on folder contents) in Meeting Materials
- WG page check-in: What to add/correct/improve (wording, links, missing topics, clarifications)
- US LLVM & relevant bits for this WG: Attendee highlights; what should feed into our backlog?
Technical topics / Round-robin updates
Looking forward to seeing everyone at the November sync-up.
uwendi
15
To our readers,
FYI, we’re merging the two regional sync-ups into one Tuesday 13:00 UTC call; see our Discord channel and the LLVM calendar for details.
uwendi
16
Notes - November 2025
Participants
Tuesday 2025/11/05, 13:00 UTC, 1h
Agenda
Refer to LLVM Qualification WG sync-ups meeting minutes - #14 by uwendi
Links
Highlights
Gemini notes taken, modified by @uwendi
Non-technical topics
Pull Requests and Meeting Materials
- The latest Pull Request (PR), prepared by @YoungJunLee, was merged, and there are currently no pending PRs.
- There is a concern regarding meeting materials being stored only in a personal Google Drive. We stopped storing them on GitHub due to feedback from 2 people in the community. We’ll consider contacting the infrastructure team to determine a suitable storage location, possibly archiving quarterly PDFs on GitHub while continuing to share links from the Google Drive for collaboration.
Takeaways from the US LLVM Conference
- @evodius96 attended the presentation from Peter Smith from ARM, which was interesting and showed a practical example relevant to compiler qualification.
- @uwendi attended @slotosch’s round table. Other attendees asked questions about the group’s work. Feedback from Peter Smith is that he would like to see a keynote from the group next year outlining our vision. Directed the attendees to @petbernt’s request for comments.
- Does Solid Sands only provide a test suite for toolchain quality, as suggested by a previous conference talk? It looks like they currently offer a broader selection, including library qualification. Some concerns exist because it seems they do not participate in the ISO C and C++ committees, potentially impacting their ability to keep up with language changes and interpretations.
- Discussed the challenge of judging the quality of conformity test kits without insight into the build and testing processes. Suggestion: false positives might be an indicator of quality.
- Conference presentation and group visibility: there might be motivation for the group to do more for next year’s conference, possibly a keynote, round table, or talk, to create more awareness. It was too early to present at the recent conference. April’s LLVM Europe conference might be a target, or potentially a conference in Asia if one is organized.
- Operational Maturity and Code Reviews: @uwendi attended the operational maturity round table, discussing code reviews as a way to prevent errors. There is an RFC from Infra about enforcing pull requests. Noted that 30% of commits from 5% of contributors are reviewed post-merge, and issues include a lack of reviewers and long post-commit review times.
Technical topics / Round-robin updates
Status of the Qualification RFC and Library Proposal
- Concern from @petbernt that non-context-aware readers might misunderstand the RFC.
- @petbernt agreed to post it in other Discord channels to increase visibility.
- @petbernt also has a small proposal for library qualification that is not yet finished but could be shared later in the week.
Tutorial Preparation
- @uwendi provided materials to @YoungJunLee that could be helpful to prepare the tutorial / set of initial documents for newcomers.
- Open to further assistance with the interpretation of the functional safety standards
- @uwendi can contact old colleagues from the railways industry for materials related to their standards.
- @uwendi shared personal efforts regarding the ISO 26262 standard, noting a successful submission and acceptance of around 30 simple comments and plans for two or three significant (possibly controversial) upcoming comments regarding the description of methods 1b and 1c (evaluation of the development process and validation of the software tool).
- @uwendi expressed concerns about the wording and requirements for method 1b, suggesting that evaluation of the tool development process should be highly recommended for all ASILs.
Software Tool Experiments and Automation
- @CarlosAndresRamirez shared results from his quality-focused experiments on the development process, which were conducted on LLVM and Alive2, confirming at least 12 defects in Alive2 that still need to be reported.
- @CarlosAndresRamirez noted that adoption of this strategy will face resistance unless the process is fully automated to prevent extra effort for developers. He will work on automation, possibly involving git hooks, and prepare templates and documentation for the group, emphasizing that this work relates to “evaluation of the development process” (Method 1b in ISO 26262).
- @uwendi requested a demo; agreed to schedule offline.
LLVM Developer Policy Suitability Analysis
- @uwendi presented a draft analysis of the LLVM developer policy’s suitability based on the open-source software practices checklist from the Elisa project’s Lighthouse OSS SIG. She gave an initial, rough evaluation based on gut feeling due to the immaturity of the maturity scale, but stated that the written process generally “looks good”.
- Areas rated as having “limited maturity” included one item in security/supply chain (SBOMs) and the lack of a formal bus factor metric program.
- While the written process looks good, measuring its execution and follow-through requires automation; this work is currently blocked on the undecided maturity scale.
- The checklist for LLVM’s suitability would support method 1b, evaluation of the development process; she argued that a best-practices list is more suitable than requiring assessment against a national or international standard, especially for open source.
- @uwendi proposed a new action for herself to start writing templates for the workflow steps outlined in the previous sync-up. These templates would be useful for both downstream and upstream. For tool qualification, the idea is to include worksheets for each method.
Actions / Next steps
- @petbernt will post some slides with a small proposal for library qualification, and post the RFC in other channels in Discord, to make it more visible.
- @CarlosAndresRamirez will report the confirmed bugs found in Alive2.
- @uwendi will contact the infrastructure team to discuss where meeting materials could be stored, possibly archiving quarterly PDFs in GitHub while continuing to share links to her Google Drive.
- @petbernt will ask Peter Smith from ARM to add a comment to the RFC requesting input from the community.
- @uwendi will contact an old colleague from the railways industry to ask if they have materials and can help with the new version of the railway standard for tool qualification.
- @CarlosAndresRamirez will work on automation using git hooks or similar mechanisms for the development process quality strategy and prepare templates and documentation for the human-centered approach to finding defects to share with the group.
- @CarlosAndresRamirez will give a demo of the defect finding experiments at a dedicated slot or the next sync-up.
- @uwendi will start writing templates for the gray boxes in the workflow from the last sync-up and share them with the group for review and improvements.
- @petbernt will send some presentations about Solid Sands and how their SuperTest suite and framework work.
uwendi
17
LLVM Qualification Group’s December Sync-Up Agenda
Hello all,
Our next sync-up meeting will be dedicated to a special topic: Function-Level Qualification Methodology for libc/libc++ (@petbernt’s proposal)
Slides (open for comments):
@petbernt will tell us more about why standard libraries matter for qualification, giving us a walkthrough illustrated with examples from the slide deck:
- Overview of the proposed proof-of-concept:
- Unique challenges: vast API surface, varied implementations, historical behavior, testability
- Why “function-level qualification” might offer a scalable entry point
- Relationship to previous WG discussions on requirements traceability, upstream-friendly artefacts, and modular qualification pilots
- Structure of the approach
- Requirements decomposition (per function)
- Test strategy (functional, boundary, behavioral)
- Traceability approach across libc/libc++
- Criteria for function selection in the PoC
Let’s also take time for clarifying questions and discussion about strengths & gaps. Some guiding questions:
- Does the function-level approach scale?
- Is the requirements/test breakdown consistent with typical qualification workflows?
- What do we consider “minimum viable artefacts” for a PoC?
- How should the WG ensure upstream-friendliness?
- What is the interaction with existing test suites (libc/test, libc++/test, other conformance suites)?
- How might other methods & tools (static analyzers, fuzzing, translation validation) play a complementary role?
- Any foreseeable blockers?
See you next week!
1 Like
uwendi
18
Notes - December 2025
Participants
Tuesday 2025/12/02, 13:00 UTC, 1h
Agenda
Refer to LLVM Qualification WG sync-ups meeting minutes - #17 by uwendi
Links
Gemini notes taken, modified by @uwendi
Summary
@petbernt proposed a function-based qualification approach for C and C++ standard library functions, aiming to create modular, upstream-friendly artifacts (like design and traceability evidence) that downstream users can reuse for functional safety qualification and certification. We discussed the feasibility, complexity, and compliance of applying this component-level approach to extensive libraries like libc and libc++ under standards like ISO 26262.
Highlights
- Proposal of a Function-Based Qualification Approach for C and C++ Standard Library Functions: The goal of this concept is to explore a methodology for function-level qualification of the C and C++ standard libraries that is upstream-friendly and modular, motivated by the standard library’s inherent modularity, which makes it ideal for incremental qualification. It aims to demonstrate that qualification artifacts can be created in an open-source community and reused by downstream consumers. This requires compatibility with open-source methodologies: requirements and design artifacts would potentially be built from public non-normative references like cppreference.com, since direct quotation of ISO specification text is prohibited due to copyright.
- Outcome and Objective of the Qualification Methodology Proposal: The outcome of the proposal would be a practical and transparent methodology demonstrating how safety qualification evidence can be created and collaboratively maintained upstream, incrementally increasing the scope of what the open-source community qualifies. This proposal focuses on a practical pilot direction for the Qual WG.
- Clarification on the Upstream Methodology and Downstream Qualification: The proposal is not about qualifying upstream, but about creating qualification materials (like design artifacts, or traceability artifacts between implementation, design, and tests) that can live upstream and be used downstream. The actual qualification and eventual certification would be the responsibility of downstream users, who would review the evidence and provide it to a third party for assessment.
- Compliance with Safety Standards and Modularity of Qualification: The methodology could be applied to all the different standards, noting that most standards focus on documentation and artifact tracing. Function-level qualification of the C and C++ libraries was considered a good approach, a necessity given the vast number of functions in the C++ template library. The group agreed on the need for a modular approach, viewing the proposal as a proof of concept that can be incrementally expanded.
- Scope and Role of the WG in Qualification Evidence Creation: The scope of the proposal is to create upstream evidence, not certification, by providing reusable upstream qualification artifacts, and it is not intended to qualify or certify LLVM itself. The concept envisions the WG producing artifacts in the upstream LLVM repository, which downstream users would then reuse in their safety case or safety plans before providing them to certification bodies. The goal is to reduce duplicated effort by having more evidence provided upstream. The downstream users would then reuse and extend these artifacts within their own safety life cycle, and hopefully begin contributing qualification evidence upstream. The WG’s role is to enable qualification, not to act as a certifying body, with the outcome being a model for creating referenceable qualification evidence for downstream organizations to build upon.
- Definition of a Software Component and its Relevance to Safety Standards: A software component is defined as a self-contained part of a software system with a defined interface (e.g., public API function signatures, header files), documented behavior that can be verified with specific expectations, an implementation (source code), and a verification scope with tests, coverage, and analysis tied to it. Safety standards, such as IEC 61508 and ISO 26262, view runtime libraries as software components, requiring each component to have behavioral requirements, a design representation, verification, coverage, evidence of exercise, and traceability between all these elements. Standard library functions naturally fit this definition, with one function or one header ideally classified as one component.
- Benefits of Function Level Qualification: Function-level qualification works because each C and C++ function naturally maps to a small, self-contained software component, enabling incremental qualification of one function, one header, or one behavioral subset at a time. This approach encourages parallel contributions, as each contributor can qualify a self-contained component, create a design artifact, and specific test cases. This small scope (for example, four or five requirements per function) could be manageable and easily parallelizable, aiding with traceability and review.
- High-Level Approach for Qualification Artifacts: The proposed approach involves defining upstream-friendly requirements that express observable behavior, derived from reverse engineering existing implementations and from public non-normative references (like cppreference.com, since ISO text cannot be quoted). Requirements should be broken down into atomic, easily testable requirements (similar to AUTOSAR style), where one requirement states only one thing. The behavior should also be modeled in a design artifact, such as a PlantUML diagram, serving as a detailed design artifact that represents the function’s control flow and outcomes. This process would then trace requirements and the design model to the implementation and testing.
- Establishing Traceability and Leveraging Existing Test Suites: A lightweight traceability system is needed, which could use YAML or Markdown tables, linking requirements and design models to implementations (source code) and tests. @petbernt suggested trying to leverage and extend the existing LLVM test suites for libc and libc++ to achieve full coverage. Evidence such as coverage metrics (line and branch initially, with MC/DC coverage planned for future expansion) should be recorded and published in upstream qualification artifacts.
- Recommendation to Start with Simple Pure Functions: It is recommended to start with simple pure functions like memcpy, memmove, memcmp, binary_search, fabs, or isalpha, because they are deterministic and have observable outcomes, making them easy to describe and test without platform dependencies, hidden state, or global variables. This serves as an ideal proof of concept before addressing more complex or stateful functions.
- Memcpy Example: Key Behaviors and Qualification Steps: The example used is memcpy, with key behaviors being: returning the destination pointer, copying exactly count bytes from source to destination, and having undefined behavior for overlapping regions. The proposed qualification steps for memcpy involve creating small atomic requirements based on public reference models (like cppreference.com), modeling it in PlantUML, linking requirements to existing tests, and verifying and potentially extending coverage. The description of memcpy from cppreference.com was cited, highlighting that undefined behavior occurs for overlapping objects or invalid/null pointers.
- Refining Specifications into Atomic Requirements and Assumptions: The cited description of memcpy was broken down into six atomic verifiable requirements and two assumptions for the user regarding undefined behavior. A requirement should say only one thing and be testable with one specific test case; that is the definition of atomic. The user assumptions require that source and destination regions do not overlap and that the caller ensures valid, non-null pointers. There is no specific methodology proposed yet for performing this refinement from natural language to atomic requirements, nor recommendations for ensuring that the rewritten specifications are truly a refinement of the original natural-language description (i.e., that the refinement is correct).
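As an illustration only (the file path, ID scheme, and field names below are hypothetical, not part of the proposal as presented), one such atomic requirement could be captured in a small YAML file:

```yaml
# requirements/memcpy/REQ-MEMCPY-001.yaml -- hypothetical path, ID, and schema
id: REQ-MEMCPY-001
function: memcpy
header: string.h
statement: >
  memcpy shall return the value of its destination argument.
derived_from: cppreference.com (non-normative reference)
verification: test
```

Keeping one file per requirement makes each requirement individually reviewable through the normal pull-request workflow.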
- Behavioral Model using PlantUML and Traceability: @petbernt presented a behavioral model for memcpy in PlantUML, where the requirements are traced to specific sections of the diagram. The simple example shows the function signature, the copy loop, reading and writing to addresses, and returning the destination pointer. Since PlantUML is text-based, it can be handled through regular LLVM pull request review methods.
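A minimal sketch of what such an activity diagram might look like (this is not the diagram presented; the requirement ID in the note is hypothetical):

```plantuml
@startuml
start
:memcpy(dest, src, count);
while (bytes copied < count?) is (yes)
  :read next byte from src;
  :write byte to dest;
endwhile (no)
:return dest;
note right: traces to REQ-MEMCPY-001 (return value)
stop
@enduml
```

Because the model is plain text, diffs against it show up in review just like code changes.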
- Proposed Structure for Qualification Evidence in LLVM Repository: The qualification evidence is proposed to live in specific folders: a requirements folder (containing YAML files, one per atomic requirement, defining what must be true, derived from behavioral descriptions), a design folder (containing PlantUML models referencing requirements), and a traceability file per function. The traceability file would list all linked requirements and reference design artifacts, test files, and test cases, verifiable through validation scripts. An example of the traceability file structure was shown, listing header files, design reference, requirements, assumptions, and associated test functions and files, which establishes verifiable links and enables automated validation.
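To make the structure concrete, a per-function traceability file along these lines might look as follows (all paths, IDs, and field names are hypothetical sketches of the description above, not the file that was shown):

```yaml
# traceability/memcpy.yaml -- hypothetical path and schema
function: memcpy
header: string.h
design: design/memcpy.puml
requirements:
  - REQ-MEMCPY-001   # returns the destination pointer
  - REQ-MEMCPY-002   # copies exactly count bytes
assumptions:
  - ASM-MEMCPY-001   # source and destination regions do not overlap
  - ASM-MEMCPY-002   # caller ensures valid, non-null pointers
tests:
  - file: test/src/string/memcpy_test.cpp
    cases: [ReturnsDestination, CopiesExactCount]
```

A validation script can then check mechanically that every listed requirement, design artifact, and test case actually exists, which is what makes the links verifiable.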
- Referencing ISO Standards and Open Source Requirements: Regarding references to ISO standards, there was a suggestion that while copying the text is restricted, referring to the unique clause IDs of the standards is possible and provides a more credible and stable reference than a wiki page. Nevertheless, not all upstream maintainers or users have access to the copyrighted ISO standard, which is why an open-source-friendly approach using publicly known sources (like cppreference.com) is needed. The current libc repository already references specific clauses in the ISO standards and POSIX, but it is acknowledged that non-ISO descriptions might differ slightly.
- Additional Requirements from ISO 26262 for Unit Verification: ISO 26262 software component qualification refers to software unit verification (which aligns with C functions) and requires systematic consideration of equivalence classes, boundary values, and extreme values, in addition to simple requirements and tests. These additional efforts are currently beyond the basic proof of concept.
- Automated Generation of Traceability Matrix Visualization: Based on the traceability file, a PlantUML-based traceability matrix visualization can be automatically generated, linking design, requirements, test cases, and test files, which provides a clear visualization of traceability. This visualization would also be committed as text files to the upstream repository.
- Placement of Qualification Evidence within LLVM Projects: Qualification evidence is suggested to reside close to its specific project, such as within the libc or libc++ subprojects, with separate subfolders for requirements, design, traceability, and scripts. This structure avoids top-level clutter, mirrors the LLVM modular layout, and clarifies ownership for reviewers. Existing test cases would be reused, and new tests for coverage gaps would reside in the same test directory. A top-level qualification tools folder could hold shared templates and utility scripts for a common methodology across subprojects.
- Coverage and Confidence Building: For the initial proof of concept, line and branch coverage are suggested to demonstrate the methodology’s feasibility, with the potential to expand to MC/DC coverage in the future once the test infrastructure is stable. The goal is to identify coverage gaps and create more tests until each qualified function becomes a reusable evidence unit, fully testable with 100% coverage and traceability.
- Reusability of Existing Tests and Tooling for Traceability: Regarding the usefulness of the existing tests in the framework, at least one person in the WG has analyzed the test cases, confirming that they are often not mapped to specific functions or requirements, but that adding this information (e.g., in comments) would make them usable. The same person suggested that while PlantUML might be useful for flow specification, it might be “over-engineering” for simple functions, and that for traceability, knowing the file location is sufficient and easily translated into a qualification report.
- Feasibility Concerns Regarding Effort and Library Complexity: Some expressed concern that implementing this approach looks like an “enormous effort” and questioned its applicability to large and complex libraries like libc and libc++, especially given the different implementations for various architectures. In response, it was noted that companies are already performing this work proprietarily, and that sharing basic modeling of simple functions upstream would provide the majority of the work, simplifying the task for downstream users, who would only need to add architecture-specific artifacts.
- Handling Compliance and Deviations in Complex Functions: Another concern is that complex or stateful functions in the libraries might not inherently comply with ISO 26262. Even if there are deviations from the standard, documenting the actual behavior and providing known workarounds (such as assumptions for undefined behavior in a safety manual) still results in a valid safety case for an assessor, as the documented behavior is transparent. One participant confirmed finding bugs in the C++ specification itself and agreed that a safety manual is the correct approach to document known issues.
- Proprietary Qualification Efforts and Upstream Contribution: Existing companies offer products with test cases, requirements, coverage, and analysis for over a thousand C++ template specifications; they would not upstream this, as it is their business model. The purpose of the Qualification WG should still be to attempt some form of qualification upstream, not only to reduce costs by making some qualification work available upstream, but also to unify a fragmented landscape of knowledge and improve the understanding of basic properties of these libraries that are common to all downstreams.
- Focus on Latest Standard Version and Next Steps: The methodology aligns with the current LLVM libc efforts toward C23, and clarity on focusing on the latest standard version is important. Regarding a plan for developing the testing framework/scripts, LLVM already has the lit test framework, and the work would involve mapping existing tests to requirements and extending coverage.
Actions / Next steps
- @petbernt will check with people working on library development and testing in the community (e.g. the Clang C/C++ WG) for opinions, and for historical background on any similar approaches that may already have been tried in the past for upstream libraries.
- @uwendi will reflect upon and write down ideas on how to explain the spec refinement process in the methodology.
1 Like
uwendi
19
LLVM Qualification Group’s January 2026 Sync-Up Agenda
Here’s the agenda for next sync-up meeting:
- 2026 Objectives & Radar: Key focus areas for next year; topics on the horizon; how to engage the broader LLVM community with regards to our action ideas.
- Action Items Review: Quick status check on open actions; identifying ways to unblock or move stalled items forward.
- Small, Practical Deliverables: Ideas for lightweight, useful outputs (e.g. short notes, alignment with other LLVM groups, “good-enough-for-now” artifacts that can evolve).
See you next Tuesday!