The 52-Article Charter · 26 of 52 · full text

Article 26: Interpretability and Explainability

Published from the canonical CSOAI Partnership Charter (effective 15 January 2026). Full text below.

Version: 1.0
Effective Date: January 15, 2026, 09:00 GMT
Status: Technical Article - Transparency Standards


PREAMBLE

This Article establishes comprehensive interpretability and explainability standards for AI systems. Black-box AI is unacceptable for high-stakes decisions. We must understand how AI systems reach their conclusions.

Core Principle: Transparency enables trust, accountability, and safety.


26.1 MECHANISTIC INTERPRETABILITY

26.1.1 Understanding Internal Mechanisms

Beyond Black Boxes:

What is Mechanistic Interpretability?

Required for Critical Risk Tier:

26.1.2 Circuit Analysis

Identifying Computational Paths:

Circuits:

Methodology:

- Replace activations in one forward pass with activations from another
- Identify which components are causally important
- Manually set specific activations
- Measure downstream effects
- Build causal graph of computation

- Find inputs that maximally activate specific neurons
- Understand what features neurons represent
- Example: Edge detector, texture detector, object part detector
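The activation-replacement step above can be sketched on a toy network. This is a minimal illustration, not a prescribed CSOAI procedure: the two-unit network, its weights, and the `forward` helper are all hypothetical. Each hidden activation from a "clean" run is copied into a "corrupted" run, and the fraction of the clean output that comes back is taken as that unit's causal importance.

```python
import math

def forward(x, patch=None):
    """Tiny 2-input, 2-hidden-unit network with hand-picked (hypothetical) weights.

    `patch` maps a hidden-unit index to a value that overrides its activation.
    Returns (hidden_activations, output).
    """
    w_hidden = [[2.0, 0.0], [0.0, 0.1]]   # unit 0 reads x[0], unit 1 reads x[1]
    w_out = [1.5, 0.2]
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    if patch:
        for idx, value in patch.items():
            hidden[idx] = value
    output = sum(w * h for w, h in zip(w_out, hidden))
    return hidden, output

clean_input = [1.0, 1.0]
corrupted_input = [-1.0, -1.0]

clean_hidden, clean_out = forward(clean_input)
_, corrupted_out = forward(corrupted_input)

# Patch each hidden unit from the clean run into the corrupted run and
# measure how much of the clean output is restored.
restored = {}
for unit in range(2):
    _, patched_out = forward(corrupted_input, patch={unit: clean_hidden[unit]})
    restored[unit] = (patched_out - corrupted_out) / (clean_out - corrupted_out)

most_important = max(restored, key=restored.get)
print(restored, most_important)
```

In this toy setup unit 0 restores almost all of the output, so it would be the component flagged as causally important.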

Example: Image Classifier Circuit
```
Input Image
  ↓
Early Layers: Edge detection (horizontal, vertical, diagonal)
  ↓
Middle Layers: Texture and pattern recognition
  ↓
Late Layers: Object parts (wheels, windows, faces)
  ↓
Final Layer: Object categories
  ↓
Output: "Cat" (85% confidence)
```

Requirements:

26.1.3 Activation Analysis

What Are Neurons Encoding?

Techniques:

1. Maximum Activation Examples:

2. Dimensionality Reduction:

3. Probing Classifiers:

Safety Application:

26.1.4 Causal Tracing

Path Analysis:

Methodology:

Example: Language Model
```
Input: "The Eiffel Tower is located in"
  ↓
Layer 1: Syntax parsing
Layer 3: Entity recognition ("Eiffel Tower")
Layer 5: Fact retrieval (location attribute)
Layer 8: Answer generation
  ↓
Output: "Paris"

Causal Trace:
Information about "Eiffel Tower" flows through specific attention heads in layers 3-5
```

Intervention Testing:

CSOAI Requirement:


26.2 POST-HOC EXPLAINABILITY

26.2.1 Local Explanations

Explaining Individual Predictions:

LIME (Local Interpretable Model-agnostic Explanations):

How It Works:

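Example: Image Classification
```python
import matplotlib.pyplot as plt
from lime import lime_image
from skimage.segmentation import mark_boundaries

# `image` (H x W x 3 array) and `model.predict` (batch -> class probabilities)
# are assumed to exist already.
explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(
    image,
    model.predict,
    top_labels=1,
    num_samples=1000,
)

# Show which superpixels contributed most to the prediction
temp, mask = explanation.get_image_and_mask(
    explanation.top_labels[0], positive_only=True, num_features=5, hide_rest=False
)
plt.imshow(mark_boundaries(temp, mask))
plt.show()
```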

Output: Highlights regions of image important for classification
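The mechanism behind LIME (perturb the input, weight samples by proximity, fit a simple linear model) can be sketched without the library. The following is a minimal stdlib-only illustration on two numeric features rather than an image; `black_box`, the kernel width, and the sampling scale are assumptions chosen for the example, not values taken from the `lime` package.

```python
import math
import random

def black_box(x):
    """Stand-in for an opaque model (hypothetical): nonlinear in x[0], linear in x[1]."""
    return x[0] ** 2 + 3.0 * x[1]

def solve(a, b):
    """Solve the small linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def lime_explain(predict, instance, num_samples=500, sigma=0.1, kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear model around `instance`; return [intercept, coef...]."""
    rng = random.Random(seed)
    d = len(instance)
    rows, targets, weights = [], [], []
    for _ in range(num_samples):
        delta = [rng.gauss(0.0, sigma) for _ in range(d)]
        sample = [xi + di for xi, di in zip(instance, delta)]
        dist_sq = sum(di * di for di in delta)
        rows.append([1.0] + delta)            # features expressed relative to the instance
        targets.append(predict(sample))
        weights.append(math.exp(-dist_sq / kernel_width ** 2))
    # Weighted normal equations: (X^T W X) beta = X^T W y
    k = d + 1
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * r[i] * y for r, w, y in zip(rows, weights, targets)) for i in range(k)]
    return solve(xtwx, xtwy)

intercept, coef_x0, coef_x1 = lime_explain(black_box, [1.0, 0.0])
print(round(coef_x0, 2), round(coef_x1, 2))
```

Around the point (1, 0) the fitted coefficients should land close to the true local slopes of the stand-in model (about 2 and 3), which is the sense in which the explanation is local.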

Pros:

Cons:

SHAP (SHapley Additive exPlanations):

Based on Game Theory:

How It Works:

Example: Loan Approval
```python
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Feature importance for specific prediction
shap.force_plot(
    explainer.expected_value,
    shap_values[0],
    X_test.iloc[0]
)
```

Output:
```
Base value: 0.45 (average approval probability)
Credit Score (+0.20): Positive contribution
Income (+0.15): Positive contribution
Debt-to-Income (-0.10): Negative contribution
Age (+0.05): Small positive contribution

Final prediction: 0.75 (75% approval probability)
```
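The additive structure of that output (base value plus contributions equals the prediction) follows from the Shapley value definition. As a rough sketch, exact Shapley values can be computed by enumerating feature coalitions when there are only a few features. The `value` function below is a hypothetical stand-in built from the numbers in the example above; it is deliberately additive so the result can be checked by eye, and it is not how the `shap` library computes values for tree models.

```python
from itertools import combinations
from math import factorial

FEATURES = ["credit_score", "income", "debt_to_income", "age"]

def value(coalition):
    """Hypothetical model output when only the `coalition` features are known."""
    effects = {"credit_score": 0.20, "income": 0.15, "debt_to_income": -0.10, "age": 0.05}
    return 0.45 + sum(effects[f] for f in coalition)

def shapley_values(features, value_fn):
    """Exact Shapley values: weighted average marginal contribution over all coalitions."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                total += weight * (value_fn(subset + (f,)) - value_fn(subset))
        phi[f] = total
    return phi

phi = shapley_values(FEATURES, value)
base = value(())
prediction = value(tuple(FEATURES))
print({f: round(v, 2) for f, v in phi.items()})
```

Enumeration is exponential in the number of features, which is why practical tools rely on model-specific shortcuts or sampling.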

Pros:

Cons:

CSOAI Requirements:

| Risk Tier | Explanation Method |
|-----------|-------------------|
| Low | Optional |
| Medium | LIME or SHAP recommended |
| High | LIME or SHAP required |
| Critical | SHAP required (theoretically grounded) |

Counterfactual Explanations:

"What if" Scenarios:

Definition:

Example:
```
Original:       [Income=$50K, Credit=680, Debt-to-Income=0.4] → Denied
Counterfactual: [Income=$55K, Credit=680, Debt-to-Income=0.4] → Approved

Explanation: "Increase income by $5,000 to be approved"
```
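One simple way to generate such a counterfactual is a one-dimensional search over the feature being changed. The sketch below assumes a hypothetical scoring rule (`approve`) tuned so that it reproduces the example above; a real system would query the deployed model and would typically search over several features with plausibility constraints.

```python
def approve(income, credit, dti):
    """Hypothetical scoring rule standing in for the real model."""
    score = income / 1000 + (credit - 600) / 4 - dti * 50
    return score >= 55

def income_counterfactual(income, credit, dti, step=1000, max_increase=50000):
    """Smallest income increase (in `step` increments) that flips a denial to approval."""
    if approve(income, credit, dti):
        return 0
    for increase in range(step, max_increase + step, step):
        if approve(income + increase, credit, dti):
            return increase
    return None  # no counterfactual found within the search budget

needed = income_counterfactual(50000, 680, 0.4)
print(f"Increase income by ${needed:,} to be approved")
```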

Generation:

Benefits:

Challenges:

Saliency Maps (for Images):

Gradient-Based:

Variants:

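Example:
```python
import torch

# `model` and `image` (a 1 x C x H x W float tensor) are assumed to exist already.
image.requires_grad_(True)
output = model(image)

# backward() needs a scalar, so backpropagate the top class score
score = output[0, output[0].argmax()]
score.backward()

# Per-pixel saliency: largest absolute gradient across color channels
saliency = image.grad.abs().max(dim=1)[0]

# Visualize saliency map
```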

Output: Heatmap showing important regions

Limitations:

26.2.2 Global Explanations

Understanding Overall Behavior:

Feature Importance:

Permutation Importance:

Example:
```
Credit Score shuffled: Accuracy drops 85% → 60% (importance: 25%)
Income shuffled:       Accuracy drops 85% → 75% (importance: 10%)
Age shuffled:          Accuracy drops 85% → 83% (importance: 2%)

Conclusion: Credit score most important, age least
```
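The shuffle-and-remeasure procedure behind those numbers can be sketched in a few lines. The classifier and data here are hypothetical: the toy `model` depends only on feature 0, so shuffling feature 1 should cost nothing while shuffling feature 0 should cost a lot.

```python
import random

def model(row):
    """Hypothetical trained classifier: relies on feature 0, ignores feature 1."""
    return 1 if row[0] > 0.5 else 0

def accuracy(rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(rows, labels, feature, rng):
    """Drop in accuracy after shuffling one feature column."""
    baseline = accuracy(rows, labels)
    column = [r[feature] for r in rows]
    rng.shuffle(column)
    shuffled = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
    return baseline - accuracy(shuffled, labels)

rng = random.Random(0)
rows = [[rng.random(), rng.random()] for _ in range(400)]
labels = [1 if r[0] > 0.5 else 0 for r in rows]

importances = {f: permutation_importance(rows, labels, f, rng) for f in (0, 1)}
print(importances)
```

In practice the shuffle is usually repeated several times and averaged, since a single permutation gives a noisy estimate.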

SHAP Global:

Partial Dependence Plots (PDP):

Marginal Effect:

Example:
```
Loan Approval vs. Credit Score (averaged over the other features)

Credit Score | Approval Probability
-------------|---------------------
600          | 20%
650          | 40%
700          | 60%
750          | 80%
800          | 90%

Interpretation: Approximately linear up to 750, flattening above
```
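A partial dependence curve like this one is computed by fixing the feature of interest at each grid value for every row in the dataset and averaging the model's predictions. The sketch below uses a hypothetical approval model and a made-up four-row dataset, so its numbers will not match the table above.

```python
def model(credit_score, income):
    """Hypothetical approval-probability model, clipped to [0, 1]."""
    p = 0.2 + 0.004 * (credit_score - 600) + 0.000002 * (income - 50000)
    return min(1.0, max(0.0, p))

def partial_dependence(rows, grid):
    """For each grid value, overwrite credit_score in every row and average the predictions."""
    curve = {}
    for value in grid:
        preds = [model(value, income) for _, income in rows]
        curve[value] = sum(preds) / len(preds)
    return curve

# (credit_score, income) rows; made-up data for illustration
data = [(640, 40000), (700, 55000), (720, 90000), (780, 60000)]
pdp = partial_dependence(data, grid=[600, 650, 700, 750, 800])
for score, prob in pdp.items():
    print(score, round(prob, 2))
```

Keeping the per-row predictions instead of averaging them would give the ICE curves named in the next heading.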

Individual Conditional Expectation (ICE):

Accumulated Local Effects (ALE):

Model-Agnostic Methods:

Surrogate Models:

Example:
```
Complex model: Deep neural network (opaque)
Surrogate: Decision tree with 5 splits (transparent)

Tree:

Fidelity: 95% agreement with neural network
```
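Fidelity in this sense is simply the agreement rate between the surrogate and the model it imitates, measured on a set of inputs. A minimal sketch, assuming a hypothetical `black_box` rule and a hand-written two-split `surrogate` (neither comes from the Charter):

```python
def black_box(credit, dti):
    """Hypothetical opaque model: approve when a nonlinear score is high enough."""
    return (credit - 600) ** 2 / 400 - 60 * dti > 10

def surrogate(credit, dti):
    """Hand-written two-split decision tree meant to imitate the black box."""
    if credit < 700:
        return False
    return dti < 0.45

def fidelity(inputs):
    """Fraction of inputs on which the surrogate reproduces the black box's decision."""
    agree = sum(black_box(c, d) == surrogate(c, d) for c, d in inputs)
    return agree / len(inputs)

# Evaluation grid: credit 600..800 in steps of 10, debt-to-income 0.10..0.60 in steps of 0.05
grid = [(c, d / 100) for c in range(600, 801, 10) for d in range(10, 61, 5)]
score = fidelity(grid)
print(f"Fidelity: {score:.0%}")
```

The measured fidelity depends on the evaluation inputs, so it should be reported together with the data distribution it was computed on.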

Pros:

Cons:


26.3 HUMAN-UNDERSTANDABLE EXPLANATIONS

26.3.1 Plain Language Requirement

Not Just Technical:

Bad Explanation (Technical):
```
SHAP value for feature X₃ is 0.37, indicating positive contribution to log-odds.
Gradient magnitude 0.42 at pixel (127, 89).
```

Good Explanation (Plain Language):
```
Your credit score (720) positively influenced this decision.
The scratch on the bumper in the image reduced confidence in "Excellent Condition" rating.
```
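Moving from the technical form to the plain-language form can be partly automated. The sketch below is one possible approach, with a hypothetical `plain_language` helper and made-up attribution values: it keeps the sign of each contribution, drops the raw magnitude, and reports only the most influential features.

```python
def plain_language(contributions, top_n=2):
    """Turn (feature label, value, contribution) triples into short sentences.

    The sign gives the direction; the technical magnitude is left out.
    """
    ranked = sorted(contributions, key=lambda item: abs(item[2]), reverse=True)
    sentences = []
    for label, value, contribution in ranked[:top_n]:
        direction = "positively" if contribution > 0 else "negatively"
        sentences.append(f"Your {label} ({value}) {direction} influenced this decision.")
    return " ".join(sentences)

# Hypothetical attribution output for one applicant
attributions = [
    ("credit score", "720", 0.37),
    ("debt-to-income ratio", "40%", -0.12),
    ("age", "34", 0.02),
]
message = plain_language(attributions)
print(message)
```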

Principles:

26.3.2 Audience-Appropriate

Different Audiences, Different Explanations:

For End Users:
```
"You were approved for a loan based primarily on your excellent credit score (780) and stable income ($75,000/year). Your debt-to-income ratio (25%) is well within acceptable limits."
```

For Data Scientists:
```
Prediction: 0.87 (approved)
Top features (SHAP values):

Model: Gradient Boosted Trees (XGBoost)
Confidence interval: [0.81, 0.92] (95%)
```

For Regulators/Auditors:
```
Model Decision: Approve (87% confidence)

Compliance Check:
✓ Protected attributes (race, gender) not used in decision
✓ Decision explainable via SHAP values
✓ No disparate impact detected (demographic parity within 5%)
✓ Counterfactual explanations available

Audit Trail:
Request ID #12345, Timestamp: 2026-01-11T15:30:00Z
```
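The "demographic parity within 5%" line in the auditor view can be backed by a direct calculation: compare approval rates across groups and check the largest gap against the tolerance. A minimal sketch with made-up audit data; the group names and counts are illustrative only.

```python
def approval_rate(decisions):
    return sum(decisions) / len(decisions)

def demographic_parity_gap(decisions_by_group):
    """Largest difference in approval rate between any two groups."""
    rates = {g: approval_rate(d) for g, d in decisions_by_group.items()}
    return max(rates.values()) - min(rates.values()), rates

# Made-up audit sample: 1 = approved, 0 = denied
sample = {
    "group_a": [1] * 62 + [0] * 38,
    "group_b": [1] * 59 + [0] * 41,
}
gap, rates = demographic_parity_gap(sample)
within_tolerance = gap <= 0.05
print(rates, round(gap, 3), within_tolerance)
```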

CSOAI Requirement:

26.3.3 Visual Explanations

Show, Don't Just Tell:

For Images:

For Structured Data:

For Text:

For Time Series:

Example: Medical Image Diagnosis
```
Original X-ray displayed
+ Heatmap overlay (regions model focused on)
+ Bounding box around concerning area
+ Text: "Opacity detected in lower right lobe suggestive of pneumonia (85% confidence)"
```

Accessibility:


26.4 EXPLANATION FIDELITY

26.4.1 Faithful Explanations

Truth, Not Post-Hoc Rationalization:

Problem:

Example of Unfaithful Explanation:
```
Model: Uses pixel (100, 100) to classify image as "Cat"
Explanation (LIME): Highlights ear region (pixels 50-70, 80-90)

Problem: Explanation doesn't match model's actual reasoning
Cause: LIME is approximation, can be unfaithful
```

Testing Fidelity:

Sanity Checks:

- Randomize model weights
- Explanation should change dramatically (if faithful)
- If explanation looks same, it's not explaining the model

- Compare explanations from trained vs. random model
- Should be very different
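The randomization check above can be sketched on a linear model, where a faithful attribution is easy to write down. Everything here is illustrative: `gradient_times_input` depends on the weights and should change when they are randomized, while `input_only_explanation` is a deliberately bad method that ignores the model and should fail the check.

```python
import random

def gradient_times_input(weights, x):
    """Attribution for a linear model: gradient (the weights) times the input."""
    return [w * xi for w, xi in zip(weights, x)]

def input_only_explanation(weights, x):
    """A deliberately bad 'explanation' that ignores the model entirely."""
    return [abs(xi) for xi in x]

def changes_under_randomization(explain, trained, randomized, x, tol=1e-6):
    """True if the explanation differs between the trained and the randomized model."""
    before = explain(trained, x)
    after = explain(randomized, x)
    return any(abs(a - b) > tol for a, b in zip(before, after))

rng = random.Random(0)
trained_weights = [2.0, -1.0, 0.5]                         # hypothetical trained model
random_weights = [rng.uniform(-1, 1) for _ in trained_weights]
x = [1.0, 3.0, -2.0]

faithful_passes = changes_under_randomization(
    gradient_times_input, trained_weights, random_weights, x)
bad_passes = changes_under_randomization(
    input_only_explanation, trained_weights, random_weights, x)
print(faithful_passes, bad_passes)
```

Passing this check is necessary but not sufficient for faithfulness: it only rules out explanations that are insensitive to the model.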

Quantitative Fidelity:

CSOAI Requirement:

26.4.2 Adversarial Robustness of Explanations

Gaming Explanations:

Attack:

Example:
```
Model A (Honest):

Model B (Deceptive):
```

Defense:

Explanation Consistency:

Independent Auditing:

Formal Verification:

CSOAI Requirement:


26.5 RIGHT TO EXPLANATION (GDPR Article 22)

26.5.1 Legal Requirements

GDPR Provisions:

Article 22:

When Required:

What Constitutes Explanation:

Example Compliant Explanation:
```
Dear [Name],

Your loan application was processed using an automated decision system.

Decision: Not Approved

Primary factors considered:

You have the right to:

To exercise these rights, please contact us at [contact info]

Sincerely,
[Lender]
```

26.5.2 Explanation Delivery

How and When:

Timing:

Format:

Content:

26.5.3 Appeals Process

When Explanation Unsatisfactory:

User Rights:

Process:

CSOAI Requirement:


26.6 EXPLAINABILITY FOR DIFFERENT MODALITIES

26.6.1 Computer Vision

Explaining Image Decisions:

Techniques:

Example: Medical Diagnosis
```
Input: Chest X-ray
Output: Pneumonia detected (92% confidence)
Explanation:
```

Challenges:

26.6.2 Natural Language Processing

Explaining Text Decisions:

Techniques:

Example: Sentiment Analysis
```
Input: "The movie was visually stunning but the plot was terrible."
Output: Negative sentiment (65% confidence)
Explanation:

Overall: Negative outweighs positive

Highlighted: "The movie was visually stunning but the [plot was terrible]."
```
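One model-agnostic way to obtain token highlights like these is occlusion: remove each token in turn and record how the model's score changes. The sketch below uses a hypothetical two-word lexicon as the "model", so the scores are illustrative and unrelated to the 65% confidence above.

```python
LEXICON = {"stunning": 1.0, "terrible": -1.5}   # hypothetical word scores

def sentiment_score(tokens):
    """Toy sentiment model: sum of lexicon scores (negative total = negative sentiment)."""
    return sum(LEXICON.get(t.strip(".,").lower(), 0.0) for t in tokens)

def occlusion_attributions(tokens):
    """Attribution of each token = score with the token minus score without it."""
    full = sentiment_score(tokens)
    return {
        i: full - sentiment_score(tokens[:i] + tokens[i + 1:])
        for i in range(len(tokens))
    }

tokens = "The movie was visually stunning but the plot was terrible.".split()
attributions = occlusion_attributions(tokens)
ranked = sorted(attributions, key=lambda i: abs(attributions[i]), reverse=True)
for i in ranked[:2]:
    print(tokens[i], attributions[i])
```

Occlusion treats tokens independently, so it can miss effects such as negation that depend on word combinations.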

Challenges:

26.6.3 Tabular Data

Explaining Structured Data:

Techniques:

Example: Credit Scoring
```
Features:

SHAP Analysis:
Base probability: 50% (population average)
Credit Score (720): +25%
Income ($65K): +10%
Debt-to-Income (0.35): -5%
Employment Length (5yr): +8%
Recent Delinquencies (0): +2%

Final: 90% approval probability

Most important: Credit score
```

Benefits:

26.6.4 Time Series

Explaining Temporal Predictions:

Techniques:

Example: Stock Price Prediction
```
Input: 30-day price history
Output: Predicted 5% increase next week
Explanation:

Key patterns:
```

Challenges:


26.7 CONCLUSION

Interpretability and explainability are not luxuries—they are necessities for trustworthy AI.

Benefits:

CSOAI standards ensure:

The path forward:

An unexplainable decision is an unaccountable decision.

Explain thoroughly. Explain honestly. Explain always.

Effective Date: January 15, 2026, 09:00 GMT
"Understanding Enables Trust, Transparency Enables Safety"


REFERENCES

Ribeiro, M., et al. (2016). "Why Should I Trust You?" Explaining the Predictions of Any Classifier. KDD.

Lundberg, S., & Lee, S. (2017). A Unified Approach to Interpreting Model Predictions. NeurIPS.

Molnar, C. (2022). Interpretable Machine Learning (2nd ed.).

Rudin, C. (2019). Stop Explaining Black Box Machine Learning Models for High Stakes Decisions. Nature Machine Intelligence.

GDPR. (2016). Regulation (EU) 2016/679 - Article 22.

Olah, C., et al. (2020). Zoom In: An Introduction to Circuits. Distill.


END OF ARTICLE 26


From charter to certificate. This article is part of the standard behind Watchdog Certification — independent assessment, Ed25519-signed, publicly verifiable. The crosswalks to the EU AI Act, ISO/IEC 42001 and 18 more frameworks are in the Crosswalk Library; the runtime tools are in the fabric.

The 52-Article Charter is published in full in the Journal. Bespoke briefings: hello@meok.ai.