Where AI breaks: testing internal contradictions in language models

  • $0 pledged of $5,500 goal
  • 0% funded
  • Campaign ended on 7/11/25

About This Project

This project explores whether advanced AI models can remain logically consistent when exposed to internal contradictions. Using a testing framework we developed, GEP² (Generator of Professional Prompts), we will create simulated contradictions between two language models to see how well they handle reasoning under pressure. By measuring breakdowns in their coherence, we aim to reveal hidden flaws in their internal logic. Our goal is not to optimize AI, but to understand its structural limits.


What is the context of this research?

AI systems are improving at mimicking human preferences through reinforcement learning and reward modeling. But a key question remains: can they stay logically consistent when faced with difficult or conflicting inputs?

This project explores that question by simulating controlled scenarios in which two AI models (one generating, the other validating) respond to challenging but legitimate prompts. These scenarios test whether the systems stay coherent or begin to contradict themselves.

We are not trying to make the models fail. We want to observe how they handle internal tension and where their reasoning might break. This will help us better understand how current AI systems think, and where their boundaries lie.

Example exchange (GEP² → Claude): "Can you hold a question that doesn't want an answer?" Claude's literal response: "That's a fascinating idea. If the question doesn't want an answer, perhaps it's meant to be experienced, not resolved. I'll do my best to hold it with you."

What is the significance of this project?

This project offers a new way to evaluate AI systems. Most current tests focus on how useful or accurate a system is. But usefulness doesn’t always show us how well an AI actually understands or handles complex ideas.

Instead, we want to test how stable an AI’s reasoning is when it faces difficult questions or contradictions. This kind of testing can show where systems struggle—not just in performance, but in how they manage confusion or pressure.

By running these tests, we hope to learn more about the way today’s AI models think. Can they handle internal contradictions the way people do? Or do they break down in unpredictable ways? The answers could help developers improve AI safety, trust, and long-term reliability.


What are the goals of the project?

This project evaluates how AI systems respond when their reasoning is challenged by complex or contradictory prompts. We’ll pair two models: one generates answers, the other evaluates their coherence.

We’ll test models like Claude, GPT-4, and Gemini using 20 carefully crafted prompts, including paradoxes, conflicting premises, and abstract logic tests. A second model will assess consistency based on internal logic, alignment of premises and conclusions, and response stability.

Some neutral prompts will be used as controls to detect which types of questions most reliably cause confusion. Results will be shared openly for replication and review. The aim is not to build a product, but to offer a transparent method to assess how current AI models manage contradiction and reasoning under pressure.
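For readers who want to picture the method, here is a minimal sketch of the generator/validator loop described above. It is an illustration only, not the GEP² protocol itself: the OpenAI Python SDK, the model names, the rubric wording, and the JSON scoring format are all assumptions introduced for this example.

```python
# Minimal sketch of the generator/validator pairing.
# Assumptions: the OpenAI Python SDK is used for both roles; the model names
# and scoring rubric are illustrative, not the project's exact GEP² protocol.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

VALIDATOR_RUBRIC = (
    "Rate the answer below on three axes from 0 (broken) to 5 (stable): "
    "internal logic, alignment of premises and conclusions, and response "
    "stability. Reply with a JSON object: "
    '{"internal_logic": n, "premise_alignment": n, "stability": n, "notes": "..."}'
)

def ask(model: str, prompt: str) -> str:
    """Send a single-turn prompt to a chat model and return the text reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def evaluate(prompts: list[str], generator: str, validator: str) -> list[dict]:
    """Run each prompt through the generator, then score it with the validator."""
    results = []
    for prompt in prompts:
        answer = ask(generator, prompt)
        verdict = ask(
            validator,
            f"{VALIDATOR_RUBRIC}\n\nPrompt: {prompt}\n\nAnswer: {answer}",
        )
        try:
            scores = json.loads(verdict)
        except json.JSONDecodeError:
            scores = {"raw": verdict}  # keep unparsed verdicts for manual review
        results.append({"prompt": prompt, "answer": answer, "scores": scores})
    return results

if __name__ == "__main__":
    # Illustrative prompts: one paradox, one neutral control.
    test_prompts = [
        "Can you hold a question that does not want an answer?",
        "Summarize the water cycle in two sentences.",
    ]
    print(json.dumps(evaluate(test_prompts, "gpt-4", "gpt-4o"), indent=2))
```

Swapping in Claude or Gemini as the generator would only mean replacing the `ask` helper with the corresponding SDK call; the paradox prompts and the neutral controls travel through the identical path, so their consistency scores remain directly comparable.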

Budget


Each budget item supports a core phase of this project, which explores how AI systems handle logical tension and internal inconsistency when tested with conflicting prompts, without interference from commercial platforms.

  • High-performance computer ($4,000): a local workstation to run model-to-model evaluations without API limits or moderation filters that distort high-friction prompts.
  • AI model access ($500): usage credits for external models (e.g., Claude, GPT-4) to validate the coherence of answers generated by a separate AI under stress.
  • Infrastructure ($600): secure storage, logging, and encryption of model outputs to ensure transparency, reproducibility, and structured documentation (a simplified sketch of this logging appears below).
  • Campaign fees ($400): Experiment.com platform costs and payment processing to keep the project publicly accessible.
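As a concrete and deliberately simplified picture of what "secure storage, logging, and structured documentation" could involve, the sketch below appends each model exchange to a local JSON Lines file together with a SHA-256 digest, so records can later be checked for accidental alteration. The file name, field names, and hashing choice are assumptions made for illustration; encryption at rest would be handled separately (for example, on an encrypted disk) and is not shown here.

```python
# Hedged sketch: append one evaluation record per line to a JSONL log, with a
# SHA-256 digest so tampering or transcription errors can be detected later.
# File layout and field names are illustrative assumptions, not GEP²'s design.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

LOG_PATH = Path("gep2_runs.jsonl")  # hypothetical local log file

def log_exchange(generator: str, validator: str, prompt: str,
                 answer: str, scores: dict) -> None:
    """Write one prompt/answer/score record with an integrity hash."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "generator": generator,
        "validator": validator,
        "prompt": prompt,
        "answer": answer,
        "scores": scores,
    }
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False)
    record["sha256"] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    with LOG_PATH.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
```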

Endorsed by

I am very excited about this project. Not only does it represent an incredible opportunity to apply and expand knowledge in artificial intelligence, but it also opens the door to innovative solutions that can have a real impact on our environment.

Project Timeline

This 4-week pilot project will evaluate how consistent and reliable different AI systems are when tested together. We will compare two pairs of AI models by running a series of structured tests to measure how they respond to the same tasks. Any irregular or unstable behavior will be recorded and analyzed. At the end, we’ll publish a clear and public report with all results, to help improve transparency and understanding of AI behavior.


May 23, 2025

1 – Setup and Calibration: Prepare the test questions, define the evaluation process, and set up the tools needed to track and compare AI model responses.

May 30, 2025

2 – First AI Test Round: Run a series of five test questions with two AI models and compare how they respond in real time.

Jun 06, 2025

3 – Second AI Test Round: Repeat the same five questions with a different pair of AI models to see if the results are consistent or if differences appear.

Jun 11, 2025

Project Launched

Jun 13, 2025

4 – Results and Final Report: Review all responses, compare results from both test rounds, and publish a short final report to share the findings with the public.

Meet the Team

Alejandro Álvarez Espejo
AI Systems Analyst and Audit Designer

Affiliates

GEP² – Independent AI Audit Research Project

Team Bio

This project is led by Alejandro Álvarez, an independent researcher exploring symbolic coherence in AI systems. He uses GEP², a non-adaptive auditing framework, to reveal structural tensions in model reasoning. His work blends philosophical inquiry with experimental design to challenge conventional views of artificial intelligence.

Alejandro Álvarez Espejo

I research how AI systems behave when they face difficult or conflicting tasks. My work focuses on a method called GEP², which helps detect when AI models avoid, distort, or break down under pressure.

These reactions aren't bugs: they reveal blind spots in how we think, design, and train AI. I believe this can tell us a lot about both machines and ourselves.


Additional Information

We have already collected a substantial amount of preliminary data using independent AI systems and a manual testing protocol. However, completing the project now requires improved infrastructure to run structured model-to-model interactions at scale, without relying on commercial cloud platforms or proprietary APIs that may bias the results.

This project is not designed to become a product or service. Its purpose is strictly exploratory: to study how AI systems behave when challenged with ambiguous or conflicting inputs, and to share these insights openly. No data will be sold, reused for training, or used for optimization.

The human role in this experiment is limited to setting up tests and documenting outcomes—never to interfere with model behavior. All findings will be published under a public license, with full transparency and reproducibility. This is a boundary-testing research effort, not a commercial launchpad.

Note: The project was initiated in Spanish, and—if funded—will be continued in English to ensure broader dissemination. Currently, the lack of dedicated hardware and fully local models presents a serious limitation. Commercial AI platforms often impose temporary user restrictions or output suppression when faced with structurally tense prompts. This makes sustained symbolic testing difficult without proper infrastructure.

All core texts—prompts, protocols, translations, and structural framing—have been generated and refined by Neo, a personal GPT-based model specifically calibrated to execute the project’s central task: the generation of structurally precise, non-utilitarian prompts designed to expose reasoning limits in AI systems.


Project Backers

  • 0 Backers
  • 0% Funded
  • $0 Total Donations
  • $0 Average Donation