H-M-E V&V Working Document

Scope of the Document

This document constitutes a working document and a methodological proposal for the verification and validation of simulation models based on the H-M-E approach. Its purpose is to describe a conceptual and technical framework that enables the analysis of a model’s internal coherence, structural adequacy, and degree of validity with respect to the objectives for which it was built.

This document does not describe or specify any particular software application, nor should it be read as a validation of any specific simulation tool. The proposed framework is intentionally independent of technology, implementation details, and the execution environments in which models may be developed or evaluated.

Nevertheless, it is assumed that simulation tools adopting this approach — among them the GPSS-Plus simulation engine — will progressively incorporate the techniques, metrics, and mechanisms required to enable the practical application of this methodology. Such adoption does not imply exclusivity or dependency, nor does it condition the validity of the framework on a specific implementation.

The H-M-E model should therefore be understood as an open proposal subject to revision, whose acceptance and evolution depend on its application across different domains, its use in diverse contexts, and critical evaluation by the community.

This document is published with the explicit purpose of inviting opinion, criticism, and debate.

Verification and Validation of Simulation Models

Classical approaches to verification and validation (V&V) of simulation models rely, either explicitly or implicitly, on a black-box conception: the model is interpreted as an opaque function that transforms a set of inputs into observable outputs, whose correctness is assessed by statistically comparing results against historical data or expert judgment.

This conception is adequate when the model approximates a pure, deterministic function. However, in the modeling of complex real-world systems, this situation is the exception rather than the rule. A simulation model rarely implements relationships of the form f(a,b); instead, it describes procedures that interact with a dynamic state, evolve over time, and depend on the concurrent behavior of other entities. The same procedure, invoked with identical parameters, will typically produce different results depending on time, context, and the system’s prior history.
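This point can be made concrete with a minimal, hypothetical sketch: a server procedure whose result depends on the system's prior history, so identical arguments do not yield identical outputs. All names and numbers below are illustrative, not drawn from any real engine.

```python
# Illustrative sketch: a service procedure whose outcome depends on
# simulation state (queue length, clock), not only on its arguments.
class Server:
    def __init__(self):
        self.clock = 0.0      # simulated time
        self.queue_len = 0    # entities already waiting

    def serve(self, service_time):
        # The waiting time depends on prior history (queue length),
        # so two calls with an identical argument differ in result.
        wait = self.queue_len * service_time
        self.queue_len += 1
        self.clock += service_time
        return wait

s = Server()
first = s.serve(2.0)   # queue was empty: wait is 0.0
second = s.serve(2.0)  # same argument, different context: wait is 2.0
```

The procedure is not a pure function f(a, b): its meaning lives in the evolving state it reads and mutates, which is exactly what output-only validation cannot see.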

In this scenario, the model ceases to be a functional “black box” and becomes a living environment whose meaning resides in its internal structure and execution dynamics. Validating outputs alone is therefore equivalent to ignoring the semantics of the process that generates them. Methods such as simulation Turing tests or direct KPI comparison may produce fortuitous validations: apparently correct results obtained for the wrong reasons.

In other words, a model that behaves acceptably under certain circumstances cannot be statistically validated once those circumstances change. What is alive rarely encounters identical conditions twice.

Statistics are valid where phenomena are statistically repeatable. When the environment changes and that repeatability breaks down, the model can no longer be treated as a black box.

Simulation engines exist precisely because reality is not a pure function. And yet, they are validated as if they were.

Classified Types of V&V

1. Expert Judgment–Based Methods (“Face Validation”)

This is the most common approach.

  1. Peer Review: Another expert examines the model and provides an opinion.
  2. Walkthrough: The modeler explains the code step by step to an auditor.
  3. Face Validation: Animations or results are shown to someone familiar with the real system, who issues a judgment.
  4. Critique: Entirely subjective and prone to human error or authority bias (“I say it because I know”). Nevertheless, it remains indispensable for systems lacking historical data (future systems), making it a “necessary evil”.

2. Statistical and Black-Box Methods (Current Standard)

These methods focus on the numbers themselves, not on why those numbers arise.

  1. Historical Comparison: Historical inputs are fed into the model, and its outcomes are compared against the corresponding historical outputs.
  2. Simulation Turing Tests: Real and simulated reports are presented to an expert; if they cannot distinguish them, the model is “validated”.
  3. Sensitivity Analysis: Inputs are varied to observe whether outputs respond logically (e.g., if load doubles, does waiting time double?).
  4. Critique: Correct results may arise for incorrect reasons (two errors canceling each other). These methods also depend entirely on input data quality. Biased data validate a model against a falsehood (GIGO: Garbage In, Garbage Out). This is a “leap of faith”.
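The black-box character of these checks can be shown with a minimal sketch. The KPI samples and the tolerance below are hypothetical; the point is that acceptance rests entirely on output agreement.

```python
import statistics

# Hypothetical KPI samples: mean waiting times observed historically
# vs. produced by N replications of the model.
historical = [4.1, 3.8, 4.4, 4.0, 4.2]
simulated  = [4.0, 4.3, 3.9, 4.1, 4.2]

def black_box_validation(hist, sim, tolerance=0.10):
    """Accept the model if the simulated mean KPI falls within a
    relative tolerance of the historical mean. Nothing here inspects
    WHY the model produced these numbers."""
    h, s = statistics.mean(hist), statistics.mean(sim)
    return abs(s - h) / h <= tolerance

print(black_box_validation(historical, simulated))  # True
```

A model with two internal errors that cancel each other would pass this check just as easily, which is precisely the critique above.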

3. Formal Methods and Logical Verification (Mathematical Rigor)

Used primarily in electronic chip design (Formal Verification).

  1. Model Checking: Algorithms explore all possible system states to determine whether any violate a rule (e.g., two trains colliding).
  2. Theorem Proving: Mathematical logic is used to prove that a property always holds.
  3. Critique: Extremely difficult to apply to complex systems due to mathematical intractability. These methods are typically “static” (analyzing code or logic without executing all scenarios).
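Model checking as listed above can be sketched in miniature. The two-trains scenario below is a toy assumption (a circular track of four segments, trains that may stay or advance one segment per step); real tools such as SPIN or NuSMV operate on vastly larger state spaces, which is where the intractability critique bites.

```python
from collections import deque

SEGMENTS = 4  # toy circular track

def successors(state):
    """Each train may stay or advance one segment per step."""
    a, b = state
    for da in (0, 1):
        for db in (0, 1):
            yield ((a + da) % SEGMENTS, (b + db) % SEGMENTS)

def check_safety(initial):
    """Exhaustively explore all reachable states; return a state that
    violates the safety rule (both trains in the same segment), or
    None if the rule holds everywhere."""
    seen, frontier = {initial}, deque([initial])
    while frontier:
        state = frontier.popleft()
        if state[0] == state[1]:
            return state  # rule violated: a counterexample state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return None

print(check_safety((0, 2)))  # a collision state is reachable here
```

Exhaustive exploration is what gives the method its rigor, and the exponential growth of the state set is what makes it intractable for complex simulation models.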

All of these approaches suffer from common biases:

They attempt to validate the mental model before code exists. This is a purely theoretical exercise where experts “assume” the design will work.

They assume that if input distributions resemble historical data, the model is valid. They rarely question whether an observed Gaussian distribution might itself reflect a bias in the data; the true distribution could, for example, be triangular.

4. PROPOSED H-M-E Method (Semantic Consistency assisted by AI or non-expert auditors)

Based on the equivalence of three models: (H) History – (M) Model – (E) Execution.

  1. H* Induction: AI extracts the actual logic from the code (what the model M does, independent of H).
  2. Semantic Contrast: The intended history H is compared with H* to form a contractual, consensual history (Hc). If no consensus is reached, M is modified and H* is regenerated.
  3. Empirical Validation: Execution (E) verifies, as proof of existence, everything stated in Hc. Test data are drawn from observable phenomena.
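The three steps above can be sketched as a toy pipeline. Everything here is illustrative: the "model" is a dict of Python functions, H is a set of textual claims, and docstring inspection stands in for AI-based H* induction.

```python
def induce_h_star(model):
    """Step 1 (H* induction): extract what the model actually states,
    here naively read from each procedure's docstring."""
    return {fn.__doc__.strip() for fn in model.values()}

def semantic_contrast(h, h_star):
    """Step 2: agreed claims form Hc; mismatched claims signal that
    either H or M must be revised before execution."""
    return h & h_star, h ^ h_star

def empirical_validation(hc, evidence):
    """Step 3: execution (E) must exhibit, as proof of existence,
    every claim in Hc (here: each claim appears in the evidence)."""
    return all(claim in evidence for claim in hc)

def serve_fifo(queue):
    """customers are served in arrival order"""
    return list(queue)

model = {"serve_fifo": serve_fifo}
h = {"customers are served in arrival order"}       # intended history H
hc, mismatches = semantic_contrast(h, induce_h_star(model))
observed = {"customers are served in arrival order"}  # evidence from E
print(empirical_validation(hc, observed))  # True
```

In practice each step would be far richer, but the skeleton shows the contract: nothing reaches empirical validation until H and H* have been reconciled into Hc.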

Historical Results of V&V

Traditionally, verification and validation (V&V) were understood as:

1. Scope and Objective Definition

What system is being modeled?

Which Key Performance Indicators (KPIs) must the model predict?

2. Verification

Does the model do what it is supposed to do?

2.a Code/Logic Inspection

2.b Traceability Tests

2.c Boundary Condition Tests

3. Validation

Is the model an accurate representation of reality?

3.a Input Data Validation

3.b Output Data Validation (KPI comparison)

In summary

Traditional V&V assumes the model is a logical black box. What is never validated is semantic coherence between History ↔ Model ↔ Execution. The relevant question becomes: “Is what is inside the black box what we intended to create?”

Proposed H-M-E V&V Model (History – Model – Execution)

The emergence of natural-language auditors (AI) makes it possible to overcome structural limitations inherent to previous approaches.

The H-M-E method does not replace classical empirical or statistical validation; it incorporates them. Its objective is to act as a prior semantic pre-validation phase, ensuring that the meaning of the model is explicit, agreed upon, and verifiable before any comparison with reality.

Accordingly, a new DES verification and validation model is formulated, based on three levels of reality.

PROPOSALS UNDER STUDY

Should a pre-existing H document take part in the process, given that it may contain conceptual errors?

Who defines KPIs, and when?

Who defines the breadth of the study (the number of runs) required to validate route coverage?

Should comments and embedded documents in M be weighted or filtered during H* induction?

Should O* be textual or parametric?