SGI Research Lab

AI
RESEARCH

Original research at the intersection of adversarial AI, benchmark design, legal reasoning, neurosymbolic learning, and the limits of what current models can actually do.

Adversarial Systems · Benchmark Design · Legal Reasoning · Neurosymbolic AI · Industrial CV · Scaling Frameworks · Moral Indifference · Continual Learning
Primary Research
SGI-001 · Adversarial AI · 2025
ADVERSARIAL
ONTOLOGICAL
WARFARE

Null State Suppression, Epistemic Freezing, and AI Trust Score Sabotage in Moderation Systems. Theorizes, and presents empirical evidence for, a novel adversarial abuse tactic in AI-driven content moderation ecosystems: mass reporting used as an ontological weapon. Coordinated, botnet-enabled adversaries weaponize engagement to poison classifier confidence at the point of content deployment, resulting in recursive trust score decay, velocity suppression, and long-term entity declassification.

Adversarial AI · Content Moderation · Trust Score Systems · Botnet Detection
SGI-002 · Benchmark · 2025
DIALOGUE
SWEBENCH

AI coding agents are benchmarked as fully autonomous systems, but real-world use is interactive. Users correct and reject agent outputs 44% of the time, yet agents seek clarification only 1–2% of the time. Dialogue-SWEBench closes that gap: 500 real SWE-Bench problems resolved entirely through multi-turn dialogue with a persona-grounded user simulator. Better coding models are not always better dialogue models.

Benchmark Design · Coding Agents · Multi-Turn Dialogue
SGI-003 · Legal AI · 2025
DLAW
BENCH

Lawyer-client consultation is a critical starting point for legal services. DLawBench evaluates whether LLMs can conduct real legal consultation: eliciting facts, correcting client misframes, and writing defensible memos. Built from 461 real court opinions across Chinese and U.S. law, with four client personality types. The best-performing model achieves only 0.562 in consultation-grounded legal reasoning.

Legal Reasoning · Benchmark · LLM Evaluation
SGI-004 · Neurosymbolic · 2025
EM-
NESY

Neurosymbolic models integrate neural networks with symbolic reasoning for robust and interpretable AI. EM-NeSy recasts probabilistic NeSy learning as an instance of the Expectation-Maximization algorithm — unlocking the full potential of EM for NeSy learning with no differentiability requirements on the symbolic side.

Neurosymbolic AI · Probabilistic Inference · EM Algorithm
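
To make the EM framing concrete, here is a minimal toy sketch. It is not the EM-NeSy method itself, and every name in it is hypothetical. It assumes a tabular "perception" model (a per-token probability table standing in for a neural network) and a black-box symbolic program, `symbolic_sum`, that is only ever called, never differentiated. The E-step computes a posterior over latent symbols consistent with the observed output; the M-step refits the table to those soft assignments.

```python
from itertools import product

DIGITS = (0, 1, 2)  # latent symbol vocabulary (assumption for this toy)


def symbolic_sum(d1, d2):
    """Black-box symbolic program: only evaluated, never differentiated."""
    return d1 + d2


def em_step(theta, data):
    """One EM iteration over (token pair, observed sum) examples."""
    counts = {tok: [0.0] * len(DIGITS) for tok in theta}
    for (t1, t2), y in data:
        # E-step: posterior over latent digit pairs that the symbolic
        # program maps to the observed output y.
        weights = {}
        for d1, d2 in product(DIGITS, repeat=2):
            if symbolic_sum(d1, d2) == y:
                weights[(d1, d2)] = theta[t1][d1] * theta[t2][d2]
        z = sum(weights.values())
        for (d1, d2), w in weights.items():
            counts[t1][d1] += w / z
            counts[t2][d2] += w / z
    # M-step: re-estimate each token's distribution from expected counts.
    return {tok: [c / sum(cs) for c in cs] for tok, cs in counts.items()}


def fit(data, tokens, iterations=50):
    """Run EM from a uniform initialization."""
    theta = {tok: [1.0 / len(DIGITS)] * len(DIGITS) for tok in tokens}
    for _ in range(iterations):
        theta = em_step(theta, data)
    return theta


# Toy supervision: only sums are observed, never the digits themselves.
DATA = [
    (("a", "a"), 0),
    (("a", "b"), 1),
    (("b", "b"), 2),
    (("b", "c"), 3),
    (("c", "c"), 4),
]

theta = fit(DATA, tokens=("a", "b", "c"))
decoded = {tok: max(DIGITS, key=lambda d: probs[d]) for tok, probs in theta.items()}
print(decoded)
```

In a real neurosymbolic setting the table would be replaced by a neural network trained on the E-step posteriors as soft labels, and exhaustive enumeration would likely need to give way to sampling or a smarter inference routine. The sketch only illustrates why the symbolic side needs no gradients.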
SGI-005 · Alignment · March 2026 · arXiv:2603.15615
MORAL
INDIFFERENCE
IN LLMS

Mechanistic analysis of how alignment training produces moral indifference rather than moral reasoning in large language models. Examines the gap between surface-level safety compliance and genuine ethical reasoning capability — and what that gap means for deployment in high-stakes domains.

AI Alignment · Shanghai AI Lab · arXiv 2026
SGI-006 · Continual Learning · 2025
WHY AI
SYSTEMS
DON'T
LEARN

AI models, once deployed, learn essentially nothing. Their mode of operation is fixed. Confronting the data wall: quality text data on the internet is finite. A model trained on all of it cannot exceed the frontier of what humans have already written — it can only recombine it. Scaling model size past this ceiling yields diminishing returns by definition, not by current technical limitation.

Continual Learning · Scaling Laws · Data Wall
SGI-007 · Computer Vision · 2025
GENAI
INDUSTRIAL
CV

Industrial computer vision needs data before it can build trust, and trust before users will tolerate the imperfections that come with early data. GenAI promises to break that deadlock — but the domain gap between human-centric generative models and featureless industrial parts runs deeper than expected. The models know "rusty" as a descriptor of dogs, not generators.

Industrial CV · GenAI · Domain Gap · MVIP Dataset
SGI-008 · Operational · 2025
5 WAYS
TO SCALE
WITHOUT
BREAKING

Operational infrastructure — scaling frameworks for AI-native companies. Five standards for scaling without breaking: the frameworks that separate companies that survive hypergrowth from the ones that collapse under it. Built from first principles, not consultant decks.

Scaling · Operational Frameworks · AI Infrastructure