Enabling AI to explain its predictions in plain language

EXPLINGO: Transforming AI Explanations into Clear, Natural Language with LLMs

The Challenge of Understanding AI Decisions

Artificial intelligence (AI) is revolutionizing industries—from healthcare diagnostics to financial forecasting. However, its “black box” nature makes it difficult for users to trust AI-driven decisions. While Explainable AI (XAI) techniques like SHAP (SHapley Additive exPlanations) provide insights into model behavior, these explanations are often too technical for non-experts.

To bridge this gap, researchers from MIT and ESPACE-DEV IRD have developed EXPLINGO, a groundbreaking system that converts AI explanations into natural-language narratives using Large Language Models (LLMs). Presented at the IEEE Big Data Conference, EXPLINGO introduces a two-part framework:

  1. NARRATOR – Generates human-readable explanations.

  2. GRADER – Automatically evaluates their quality.

This article explores:
✔ How EXPLINGO transforms SHAP explanations into clear narratives.
✔ The science behind its accuracy and reliability.
✔ Real-world applications in healthcare, finance, and beyond.
✔ Challenges and the future of explainable AI.


Why Traditional AI Explanations Fail

[Image: A cluttered SHAP bar plot with dozens of features (Source: Pexels/Free to use)]

The Problem with SHAP and Other XAI Methods

Most AI models provide explanations via:

  • Feature importance scores (e.g., SHAP values).

  • Decision rules (e.g., “If age > 50, predict high risk”).

But these methods have critical limitations:
❌ Too technical – Non-experts struggle with numerical outputs (see the sketch after this list).
❌ Overwhelming – Models with 100+ features produce unreadable charts.
❌ Lack context – Users don’t understand why a feature matters.
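
To make the "too technical" limitation concrete, here is a minimal sketch of the raw output a typical SHAP pipeline hands to a user. It assumes the open-source shap and scikit-learn packages are available; the model, data, and feature names are synthetic placeholders, not taken from the EXPLINGO paper.

```python
# Minimal sketch of the raw, numeric output a typical XAI pipeline produces.
# The model, data, and feature names are synthetic placeholders.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
feature_names = ["age", "blood_pressure", "cholesterol", "bmi"]
X = rng.normal(size=(200, len(feature_names)))
y = 0.8 * X[:, 0] - 0.4 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# SHAP assigns each feature a signed contribution to one prediction.
explainer = shap.TreeExplainer(model)
contributions = explainer.shap_values(X[:1])[0]

# This bare list of signed numbers is what a non-expert is asked to interpret.
for name, value in zip(feature_names, contributions):
    print(f"{name}: {value:+.3f}")
```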

Real-World Consequences

  • A doctor might ignore an AI’s cancer prediction if the explanation is unclear.

  • A loan officer could approve a risky application if the AI’s reasoning isn’t transparent.

  • Consumers may distrust AI recommendations if they seem arbitrary.

Solution Needed: AI explanations should be as intuitive as a conversation.


How EXPLINGO Works: AI That Explains AI

[Image: A flowchart showing EXPLINGO’s two-step process (Source: AI-generated/DALL·E)]

1. NARRATOR: The AI Storyteller

The NARRATOR uses GPT-4 to convert SHAP explanations into plain-language summaries.

Input Example:

  • SHAP values: (age, 45, +0.3), (blood_pressure, 130, -0.2)

  • Context: “The model predicts heart disease risk.”

Output:
“The patient’s age (45) slightly increases their risk of heart disease, while their elevated blood pressure (130) reduces it marginally.”

Key Innovations:
✔ Customizable style – Users provide 3-5 example narratives to guide tone (e.g., formal vs. casual).
✔ No hallucinations – The LLM strictly follows the SHAP data (see the prompt sketch below).
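
The article does not reproduce EXPLINGO’s actual prompts, but a minimal sketch of how a NARRATOR-style prompt could be assembled is shown below. The build_narrator_prompt helper, the example narrative, and the call_llm stand-in are illustrative assumptions, not the published implementation.

```python
# Illustrative sketch of a NARRATOR-style prompt: SHAP values plus a few
# user-written example narratives, with an explicit "no extra facts" rule.
# The call_llm() helper is a hypothetical stand-in for any LLM client.

def build_narrator_prompt(shap_values, context, example_narratives):
    examples = "\n".join(f"- {ex}" for ex in example_narratives)
    features = "\n".join(
        f"- {name} = {value} (contribution: {contribution:+.2f})"
        for name, value, contribution in shap_values
    )
    return (
        f"Context: {context}\n\n"
        f"Feature contributions (SHAP):\n{features}\n\n"
        f"Write a short narrative explanation in the same style as these examples:\n"
        f"{examples}\n\n"
        "Rules: mention only the features listed above, keep the direction and "
        "relative size of each contribution, and do not invent any other facts."
    )

prompt = build_narrator_prompt(
    shap_values=[("age", 45, +0.3), ("blood_pressure", 130, -0.2)],
    context="The model predicts heart disease risk.",
    example_narratives=[
        "The patient's high cholesterol is the main driver of the elevated risk.",
    ],
)
# narrative = call_llm(prompt)  # hypothetical LLM client call
print(prompt)
```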

2. GRADER: The Quality Checker

The GRADER evaluates narratives on four metrics:

  1. Accuracy – Does it match the SHAP values?

  2. Completeness – Does it cover all key features?

  3. Fluency – Does it sound natural?

  4. Conciseness – Is it brief yet informative?

Example GRADER Prompt:
“Score this narrative from 0-4 on fluency: ‘The house’s large size increases its predicted price by ~$16K.’” (A minimal scoring sketch appears after the list below.)

Why It Matters:

  • Ensures reliable explanations before they reach users.

  • Adapts to domain-specific needs (e.g., healthcare prioritizes accuracy over conciseness).
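
Building on the example prompt above, here is a hedged sketch of how a GRADER-style check might be wired up. The rubric wording, the 0-4 parsing, and the call_llm argument are assumptions for illustration, not the published implementation; the stubbed LLM simply returns a fixed score.

```python
# Illustrative GRADER-style check: ask an LLM to score one metric on a 0-4
# scale and parse the result. The rubric text and call_llm() are assumptions.
import re

METRIC_RUBRICS = {
    "accuracy": "Do the stated directions and magnitudes match the SHAP values?",
    "completeness": "Are all listed features mentioned?",
    "fluency": "Does the narrative read naturally?",
    "conciseness": "Is it brief while still informative?",
}

def grade(narrative, shap_values, metric, call_llm):
    prompt = (
        f"Score the narrative from 0-4 on {metric}. {METRIC_RUBRICS[metric]}\n"
        f"SHAP values: {shap_values}\n"
        f"Narrative: {narrative}\n"
        "Answer with a single integer."
    )
    reply = call_llm(prompt)
    match = re.search(r"[0-4]", reply)
    return int(match.group()) if match else None

# Example with a stubbed LLM that always answers "3".
score = grade(
    narrative="The house's large size increases its predicted price by ~$16K.",
    shap_values=[("size_sqft", 2400, +16000)],
    metric="fluency",
    call_llm=lambda prompt: "3",
)
print(score)  # -> 3
```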


Real-World Applications

[Image: A doctor reviewing an AI-generated health report (Source: Unsplash/Free to use)]

1. Healthcare: Transparent Diagnostics

Problem: Doctors distrust AI predictions without clear reasoning.
EXPLINGO Solution:
“The AI suspects diabetes because of high blood sugar (key factor), age (moderate factor), and family history (minor factor).”

2. Finance: Explainable Credit Scoring

Problem: Loan applicants distrust denials they don’t understand.
EXPLINGO Solution:
“Your application was declined due to: credit score (below 650), high debt-to-income ratio, and limited credit history.”

3. Retail: Smarter Product Recommendations

Problem: Customers ignore AI suggestions that seem random.
EXPLINGO Solution:
“We recommended this laptop because you searched for lightweight models, it fits your budget, and it has high ratings.”


Challenges & Future Work

[Image: A researcher fine-tuning an AI model (Source: Pexels/Free to use)]

Current Limitations

🔸 Comparative terms (e.g., “larger,” “higher”) can confuse the GRADER.
🔸 Requires well-written examples for optimal customization.
🔸 Edge cases (e.g., 1,000+ features) remain challenging.

Next Steps

1️⃣ Interactive Q&A – Let users ask follow-up questions (e.g., “Why did income matter more than savings?”).
2️⃣ Multi-modal explanations – Combine text with simple visuals.
3️⃣ Smaller, local models – Adapt EXPLINGO for cost-effective deployment.


Toward Trustworthy AI

[Image: A futuristic AI assistant explaining decisions in real-time (Source: AI-generated/DALL·E)]

EXPLINGO represents a leap forward in AI transparency. By converting complex explanations into natural narratives, it empowers users to:
✅ Understand AI decisions without technical expertise.
✅ Trust predictions in high-stakes domains.
✅ Customize explanations for their needs.

Key Takeaways:
🔹 NARRATOR turns SHAP values into human-friendly stories.
🔹 GRADER ensures explanations are accurate, complete, and clear.
🔹 Open-source integration via PyReal makes this accessible.

The Future? Imagine:

  • Patients discussing AI diagnoses with doctors in plain language.

  • Businesses auditing AI decisions as easily as reading a report.

With EXPLINGO, AI is no longer a black box—it’s a transparent, trustworthy advisor.

EXPLINGO: Making AI Decisions Transparent Through Natural Language Explanations

[IMAGE: A minimalist interface showing AI explanations being converted into plain text – suggest using a copyright-free image from Unsplash showing futuristic technology or AI visualization]

The Trust Gap in Artificial Intelligence

Artificial intelligence has become an integral part of critical decision-making processes across industries. From determining who qualifies for a loan to diagnosing complex medical conditions, AI systems are increasingly responsible for consequential outcomes that affect people’s lives. However, a fundamental issue undermines their effectiveness: the “black box problem.”

Most advanced AI models operate as inscrutable systems whose internal logic remains hidden from end users. While they excel at making predictions, they often fail to explain the reasoning behind those predictions in ways that humans can easily comprehend. This lack of transparency creates a significant trust barrier between AI systems and the people who need to rely on their outputs.

Consider a physician who receives an AI-generated diagnosis suggesting a rare condition. Without understanding why the system arrived at this conclusion, the doctor may hesitate to incorporate this information into their treatment plan—potentially missing critical insights. Similarly, when a loan application is rejected by an automated system, both the applicant and the loan officer are left wondering about the specific factors that influenced this outcome.

This communication gap isn’t merely a technical inconvenience—it represents a significant obstacle to AI adoption in high-stakes domains where transparency is non-negotiable.

[IMAGE: A person looking confused while examining a complex chart with multiple data points – suggest using a copyright-free image from Pexels showing someone analyzing data visualizations]

Why Traditional AI Explanations Fall Short

Current AI explanation techniques have made important strides toward transparency, but they remain inadequate for most users. Methods like SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) provide insights into which features most influenced a model’s prediction. However, these explanations typically manifest as technical visualizations or numerical scores that require specialized knowledge to interpret.

The limitations of traditional explanation methods become apparent when we examine them closely:

Feature importance scores, while mathematically precise, often lack contextual meaning. When a healthcare AI reports that “blood pressure has a SHAP value of +0.35,” medical professionals must still translate this abstract number into actionable information.

Visualizations quickly become overwhelming in complex models. A system analyzing hundreds of variables might generate bar charts or waterfall plots that are visually cluttered and difficult to parse, especially under time constraints.

Decision rules can become unwieldy in sophisticated models. While a simple “if-then” structure might work for basic systems, modern neural networks operate through intricate patterns of weights and activations that don’t translate neatly into rule-based explanations.

Most critically, these technical explanations fail to bridge the communication gap between AI systems and their human users. This disconnect undermines trust, hampers adoption, and potentially leads to missed opportunities for AI to improve decision-making processes.

EXPLINGO: AI That Speaks Human Language

[IMAGE: A flowchart showing the transformation from data to natural language – suggest using a copyright-free diagram showing information flow or transformation]

Recognizing these challenges, researchers from MIT and ESPACE-DEV IRD have developed EXPLINGO, an innovative system that transforms technical AI explanations into natural language narratives that anyone can understand. Presented at the IEEE Big Data Conference, EXPLINGO represents a significant advancement in making AI systems more accessible and trustworthy.

At its core, EXPLINGO consists of two complementary components working in tandem:

  1. NARRATOR: This component takes technical explanation data (such as SHAP values) and converts them into human-readable narratives using Large Language Models (LLMs). Unlike general-purpose AI chatbots, NARRATOR is specifically designed to translate feature importance scores into coherent explanations that highlight the most significant factors influencing a prediction.
  2. GRADER: This quality assurance component evaluates the narratives produced by NARRATOR across four critical dimensions:
    • Accuracy: Does the narrative faithfully represent the original explanation data?
    • Completeness: Are all important features included in the explanation?
    • Fluency: Is the narrative easy to read and understand?
    • Conciseness: Does the explanation provide information efficiently without unnecessary details?

Together, these components ensure that EXPLINGO produces explanations that are both technically accurate and intuitively comprehensible.

How NARRATOR Works

The NARRATOR component employs advanced LLMs (like GPT-4) to transform technical explanations into natural language. For example, when given SHAP values from a heart disease prediction model, NARRATOR might generate:

“Your risk of heart disease is elevated primarily due to your age (45 years) and family history. However, your healthy blood pressure (110/70) and regular exercise routine partially offset these risk factors.”

This transformation occurs through a carefully designed process:

  1. The system ingests the numerical explanation data along with contextual information about the prediction task.
  2. Users can customize the explanation style by providing sample narratives, allowing the system to match the tone and terminology appropriate for specific audiences.
  3. The LLM is explicitly constrained to avoid “hallucinations” or adding information not present in the original explanation data.

This approach ensures that explanations remain faithful to the underlying model while being expressed in natural, accessible language.

[IMAGE: A side-by-side comparison of technical explanation vs. natural language explanation – suggest using a copyright-free visualization showing data transformation]

The GRADER’s Quality Control

Creating natural language explanations is valuable only if those explanations accurately reflect the original AI decisions. This is where GRADER plays a crucial role, serving as a quality control mechanism to ensure explanations meet rigorous standards.

GRADER employs its own set of language models to evaluate narratives across the four dimensions mentioned earlier. For instance, when checking accuracy, GRADER might analyze statements like “The house’s large size increases its predicted price by approximately $16,000” to verify that this information matches the original SHAP values.

What makes GRADER particularly powerful is its adaptability to different domains. In medical settings, the system can prioritize accuracy and completeness over conciseness, ensuring that physicians receive all relevant information. Conversely, in consumer applications, GRADER might emphasize fluency and brevity to enhance user engagement.

Transforming Industries Through Clearer AI Communication

[IMAGE: A healthcare professional reviewing AI-assisted diagnosis with a patient – suggest using a copyright-free image from Unsplash showing medical consultation with technology]

EXPLINGO’s potential extends across numerous domains where AI already plays a crucial role. By bridging the gap between technical explanations and human understanding, this technology promises to transform how we interact with artificial intelligence.

Healthcare: Building Trust in AI Diagnostics

In healthcare settings, AI systems can analyze complex medical data to identify patterns indicative of disease. However, medical professionals remain hesitant to rely on these systems without understanding their reasoning.

With EXPLINGO, an AI diagnostic tool might explain:

“This patient’s CT scan shows patterns consistent with early-stage lung cancer (70% confidence). The model identified three suspicious nodules in the right lung’s upper lobe, with irregular borders and density patterns matching historical cases of malignancy. The patient’s 30-year smoking history and recent weight loss further support this assessment, though their normal blood work slightly reduces the confidence level.”

This detailed yet accessible explanation allows physicians to evaluate the AI’s reasoning critically, potentially identifying cases where the system might be missing contextual factors or misinterpreting data. Rather than replacing medical expertise, EXPLINGO enhances it by making AI insights more interpretable.

Finance: Transparent Credit Decisions

Financial institutions increasingly rely on AI to evaluate loan applications, but customers often feel frustrated when rejected without clear explanations. This lack of transparency can lead to distrust and missed opportunities for institutions to provide constructive feedback.

Using EXPLINGO, a loan decision system could generate explanations like:

“Your loan application was declined primarily because your debt-to-income ratio (48%) exceeds our threshold of 40%. Additionally, your credit history shows three late payments in the past 12 months, which significantly impacts our risk assessment. On the positive side, your stable employment history and the substantial down payment you offered partially offset these concerns, but not enough to approve the application under our current guidelines.”

This level of clarity accomplishes multiple goals: it helps applicants understand specific factors affecting their creditworthiness, provides actionable feedback for improvement, and potentially reduces the perception of arbitrary or biased decision-making.

[IMAGE: A business meeting with data visualizations – suggest using a copyright-free image from Pexels showing business analytics or decision-making]

Retail: Personalized Recommendations That Make Sense

E-commerce platforms use sophisticated recommendation engines to suggest products, but consumers often perceive these recommendations as random or irrelevant when they don’t understand the underlying logic.

With EXPLINGO, an online retailer could explain:

“We’re recommending this laptop based on your recent browsing history of lightweight ultrabooks with at least 16GB of RAM. This model matches your apparent preference for devices with extended battery life (10+ hours) and high-resolution displays. We’re prioritizing this specific brand because you’ve purchased accessories from them previously, suggesting potential brand loyalty.”

This transparency not only improves the customer experience but also builds trust in the recommendation system, potentially increasing conversion rates and customer satisfaction.

Technical Innovations Behind EXPLINGO

The development of EXPLINGO required solving several challenging technical problems at the intersection of explainable AI and natural language processing.

Preventing Hallucinations and False Information

A critical concern when using LLMs to generate explanations is the risk of “hallucinations”—AI-generated content that sounds plausible but includes fabricated information. EXPLINGO addresses this through careful prompt engineering and constraints that anchor the narrative strictly to the provided explanation data.

The system employs a structured template approach that maps specific explanation components to narrative elements, ensuring that the LLM doesn’t invent features or importance scores that weren’t present in the original model output. This strict adherence to factuality is crucial for maintaining trust in high-stakes domains.
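
The paper’s template mechanism is not reproduced in this article. As one illustration of the kind of anchoring described above, a post-hoc consistency check could verify that the numbers and the most influential feature named in a narrative actually come from the explanation data. The function below is a hypothetical sketch, not EXPLINGO’s implementation.

```python
# Illustrative consistency check (not EXPLINGO's actual mechanism): verify
# that the narrative only quotes numbers present in the SHAP data and that
# it names the most influential feature.
import re

def narrative_is_anchored(narrative, shap_values):
    text = narrative.lower()

    # Every number quoted in the narrative must come from the explanation data.
    known_numbers = {f"{value:g}" for _, value, _ in shap_values}
    quoted_numbers = set(re.findall(r"\d+(?:\.\d+)?", text))
    numbers_ok = quoted_numbers <= known_numbers

    # The most influential feature should be mentioned by name.
    top_feature = max(shap_values, key=lambda item: abs(item[2]))[0]
    top_mentioned = top_feature.lower().replace("_", " ") in text

    return numbers_ok and top_mentioned

print(narrative_is_anchored(
    "The patient's age (45) slightly increases their risk.",
    shap_values=[("age", 45, +0.3), ("blood_pressure", 130, -0.2)],
))  # -> True
```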

Customizable Explanation Styles

Different contexts require different communication styles. A medical explanation for a physician should employ different terminology than one intended for a patient; likewise, financial explanations for regulatory compliance differ from those for consumers.

EXPLINGO addresses this through a novel “few-shot” learning approach where users provide 3-5 example narratives to establish the desired tone, vocabulary, and structure. The system then generalizes from these examples to generate explanations that match the preferred style while maintaining technical accuracy.
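
As a rough illustration of this customization knob (the style presets below are hand-written and hypothetical), the same explanation can be narrated for different audiences simply by swapping which example block is pasted into the prompt:

```python
# Hypothetical style presets: the same SHAP data narrated for two audiences
# by swapping the few-shot examples that are pasted into the prompt.

CLINICIAN_EXAMPLES = [
    "Elevated HbA1c is the dominant positive contributor; BMI adds a smaller effect.",
]
PATIENT_EXAMPLES = [
    "Your blood sugar is the main reason the model flags a higher risk; your weight plays a smaller role.",
]

def few_shot_block(examples):
    header = "Match the tone and vocabulary of these examples:\n"
    return header + "\n".join(f"- {ex}" for ex in examples)

print(few_shot_block(CLINICIAN_EXAMPLES))
print()
print(few_shot_block(PATIENT_EXAMPLES))
```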

[IMAGE: A user interface showing customization options for explanations – suggest using a copyright-free interface mockup or design]

Quantitative Quality Assessment

Perhaps the most innovative aspect of EXPLINGO is its approach to evaluating explanation quality. Rather than relying solely on human judgment, which can be subjective and resource-intensive, GRADER provides automated quality assessments across multiple dimensions.

This assessment isn’t merely binary (good/bad) but uses a nuanced scoring system that can be weighted differently depending on the application context. For example:

  • In healthcare: Accuracy might receive a weight of 0.4, completeness 0.3, fluency 0.2, and conciseness 0.1
  • In consumer applications: Fluency might receive 0.4, conciseness 0.3, accuracy 0.2, and completeness 0.1

This flexible approach ensures that explanations meet the specific needs of each domain while maintaining a baseline of quality across all applications.
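
To make the weighting concrete, here is a tiny sketch that combines per-metric scores (each on a 0-4 scale) into a single quality score; the weights simply mirror the illustrative numbers above and are not prescribed by the paper.

```python
# Combine per-metric scores (each on a 0-4 scale) into one weighted quality
# score. The weights mirror the illustrative numbers above.
DOMAIN_WEIGHTS = {
    "healthcare": {"accuracy": 0.4, "completeness": 0.3, "fluency": 0.2, "conciseness": 0.1},
    "consumer":   {"accuracy": 0.2, "completeness": 0.1, "fluency": 0.4, "conciseness": 0.3},
}

def weighted_quality(scores, domain):
    weights = DOMAIN_WEIGHTS[domain]
    return sum(weights[metric] * score for metric, score in scores.items())

scores = {"accuracy": 4, "completeness": 3, "fluency": 4, "conciseness": 2}
print(weighted_quality(scores, "healthcare"))  # 0.4*4 + 0.3*3 + 0.2*4 + 0.1*2 = 3.5
print(weighted_quality(scores, "consumer"))    # 0.2*4 + 0.1*3 + 0.4*4 + 0.3*2 = 3.3
```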

Challenges and Future Directions

[IMAGE: Researchers working on AI systems – suggest using a copyright-free image from Unsplash showing technology development or research]

Despite its promising capabilities, EXPLINGO faces several challenges that researchers continue to address:

Linguistic Nuances and Ambiguity

The GRADER component sometimes struggles with comparative language and implicit quantification. Phrases like “slightly higher” or “significantly lower” can be interpreted differently depending on context, making it difficult to verify whether these descriptions accurately reflect the underlying numerical values.

Future research aims to develop more sophisticated semantic understanding that can better evaluate these nuanced expressions against the original explanation data.

Scaling to Complex Models

While EXPLINGO performs well with models containing dozens of features, modern deep learning systems may consider thousands or even millions of parameters. Translating such complex decision processes into coherent narratives remains challenging.

Researchers are exploring hierarchical explanation approaches that organize features into meaningful clusters and provide explanations at varying levels of detail, allowing users to “zoom in” on specific aspects of interest.
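
As a rough sketch of that direction (the grouping below is a hand-written mapping, not a published EXPLINGO feature), per-feature SHAP contributions could be summed into a few named clusters before narration, so a reader sees group-level effects first and can then zoom in:

```python
# Illustrative only: aggregate many per-feature SHAP contributions into a few
# named groups so the narrative can start at a coarse level and "zoom in".
from collections import defaultdict

FEATURE_GROUPS = {
    "age": "demographics",
    "sex": "demographics",
    "blood_pressure": "vitals",
    "heart_rate": "vitals",
    "cholesterol": "lab results",
    "hba1c": "lab results",
}

def grouped_contributions(shap_values):
    totals = defaultdict(float)
    for feature, contribution in shap_values.items():
        totals[FEATURE_GROUPS.get(feature, "other")] += contribution
    # Largest absolute group effect first, ready for a NARRATOR-style prompt.
    return sorted(
        ((group, round(total, 2)) for group, total in totals.items()),
        key=lambda item: abs(item[1]),
        reverse=True,
    )

shap_values = {
    "age": 0.30, "sex": 0.05, "blood_pressure": -0.20,
    "heart_rate": 0.02, "cholesterol": 0.15, "hba1c": 0.25,
}
print(grouped_contributions(shap_values))
# -> [('lab results', 0.4), ('demographics', 0.35), ('vitals', -0.18)]
```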

Domain Knowledge Integration

Generic language models may lack the specialized knowledge required to generate accurate explanations in highly technical domains. A medical explanation that uses terminology incorrectly could confuse or mislead healthcare providers.

To address this limitation, future versions of EXPLINGO may incorporate domain-specific knowledge bases and terminologies to ensure explanations use appropriate vocabulary and concepts for each field.

Beyond Text: The Multi-Modal Future of AI Explanations

[IMAGE: Visual explanation combining text and graphics – suggest using a copyright-free dashboard or data visualization]

The current implementation of EXPLINGO focuses primarily on textual explanations, but researchers envision a multi-modal future where explanations combine text, visualizations, and interactive elements.

Imagine a healthcare application where EXPLINGO not only explains that “abnormal cell clusters in the upper quadrant” influenced a cancer diagnosis but also highlights these regions on the original medical image. The text would provide context and significance, while the visual component would show precisely what the AI system detected.

Similarly, financial explanations might incorporate interactive charts that allow users to explore “what-if” scenarios—for example, showing how a loan application outcome might change if the applicant reduced their debt-to-income ratio by 5%.

This integration of multiple communication modalities could address different learning styles and information preferences while providing more comprehensive understanding than any single approach alone.

Toward a More Transparent AI Future

As artificial intelligence systems continue to permeate critical aspects of society, the need for transparent, understandable explanations becomes increasingly urgent. Regulatory frameworks like the European Union’s AI Act now mandate explainability for high-risk applications, and public acceptance of AI technologies hinges on their transparency.

EXPLINGO represents a significant step toward addressing this challenge by transforming technical explanations into accessible narratives. By bridging the gap between AI systems and human users, this technology promises to:

  • Enhance trust in AI-driven decisions by making reasoning transparent
  • Empower users to identify potential biases or errors in model reasoning
  • Enable more effective collaboration between human experts and AI systems
  • Make AI benefits accessible to broader populations regardless of technical background

[IMAGE: A diverse group of people interacting with AI systems – suggest using a copyright-free image showing technology inclusivity]

The ultimate vision is not just explainable AI but truly interpretable AI—systems whose decision processes are inherently understandable to the humans who work alongside them. While EXPLINGO doesn’t fundamentally change how AI models make decisions, it dramatically improves how these decisions are communicated, representing an essential bridge between current black-box systems and the transparent AI of the future.

As we continue to integrate AI systems into consequential decision processes, technologies like EXPLINGO will be crucial for ensuring these systems serve human needs effectively. By speaking our language—literally and figuratively—AI can fulfill its promise as a tool that amplifies human capabilities rather than replacing or confounding them.

Conclusion: The Human Element in AI Communication

At its core, EXPLINGO addresses a fundamentally human challenge: effective communication. Despite their computational power, AI systems remain tools designed to serve human objectives. Their effectiveness ultimately depends not just on technical accuracy but on their ability to convey information in ways that human users can understand, evaluate, and apply.

By transforming abstract mathematical representations into natural language explanations, EXPLINGO recognizes that AI systems must operate within human communication frameworks to achieve their full potential. This approach doesn’t diminish the technical sophistication of AI; rather, it makes that sophistication accessible to those who can benefit from it most.

As we look toward a future where AI increasingly supports critical decision-making, the ability to explain—clearly, accurately, and appropriately—may prove just as important as the ability to predict. Through innovations like EXPLINGO, we move closer to AI systems that not only think powerfully but also communicate effectively, bridging the gap between artificial intelligence and human understanding.
