<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.9.5">Jekyll</generator><link href="https://yhjboong.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://yhjboong.github.io/" rel="alternate" type="text/html" /><updated>2026-04-22T23:09:00+00:00</updated><id>https://yhjboong.github.io/feed.xml</id><title type="html">Dan (Hojun) Yoo</title><subtitle>Incoming CS PhD student at Purdue University, advised by Prof. Jason Wu. Research in Human-AI Interaction, LLM reasoning, and evaluation methodology.</subtitle><author><name>Dan (Hojun) Yoo</name><email>hyoo@nd.edu</email></author><entry><title type="html">Understanding LLM Reasoning Through Meaning-Removed Steering Vectors</title><link href="https://yhjboong.github.io/posts/2024/12/meaning-removed-steering-vectors/" rel="alternate" type="text/html" title="Understanding LLM Reasoning Through Meaning-Removed Steering Vectors" /><published>2024-12-01T00:00:00+00:00</published><updated>2024-12-01T00:00:00+00:00</updated><id>https://yhjboong.github.io/posts/2024/12/meaning-removed-steering-vectors</id><content type="html" xml:base="https://yhjboong.github.io/posts/2024/12/meaning-removed-steering-vectors/"><![CDATA[<p>Large Language Models (LLMs) have shown remarkable capabilities in reasoning tasks, but understanding and controlling their internal reasoning processes remains a significant challenge. In my ongoing research at the MINE Lab (University of Notre Dame), I’m working on a novel approach to this problem: <strong>meaning-removed steering vectors</strong>.</p>

<h2 id="the-challenge">The Challenge</h2>

<p>Traditional approaches to understanding LLM reasoning often struggle to separate the <em>semantic content</em> of what models are processing from the <em>behavioral patterns</em> of how they process it. When we intervene in a model’s hidden states, are we changing what it “thinks about” or how it “thinks”? This fundamental question has important implications for both interpretability and control.</p>

<h2 id="our-approach-meaning-removed-control-vectors">Our Approach: Meaning-Removed Control Vectors</h2>

<p>Our key innovation is constructing control vectors that isolate behavioral patterns from semantic content. Here’s how it works:</p>

<h3 id="1-position-matched-rephrasing">1. Position-Matched Rephrasing</h3>
<p>We create pairs of sentences that have the same syntactic structure and reasoning requirements but different semantic content. For example:</p>
<ul>
  <li>Original: “If all roses are flowers, and this is a rose, then…”</li>
  <li>Rephrased: “If all cars are vehicles, and this is a car, then…”</li>
</ul>
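
<p>A minimal sketch of how such pairs could be generated from a shared template. The template string, the helper names, and the whitespace-based position check are illustrative assumptions; a real pipeline would compare positions with the model's own tokenizer.</p>

```python
# Hypothetical template: the structure stays fixed, only content words change.
TEMPLATE = "If all {plural} are {category}, and this is a {singular}, then..."


def make_pair(original, rephrased):
    """Fill the same template with two different sets of content words."""
    return TEMPLATE.format(**original), TEMPLATE.format(**rephrased)


def position_matched(a, b):
    """Check that both sentences have the same number of whitespace tokens.

    Assumption: whitespace splitting stands in for a real tokenizer.
    """
    return len(a.split()) == len(b.split())


pair = make_pair(
    {"plural": "roses", "category": "flowers", "singular": "rose"},
    {"plural": "cars", "category": "vehicles", "singular": "car"},
)
print(pair[0])
print(pair[1])
print(position_matched(*pair))
```

<p>Filling one template twice keeps every structural word at the same position, so any difference between the two sentences comes from the content words alone.</p>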

<h3 id="2-vector-construction">2. Vector Construction</h3>
<p>By comparing the hidden state representations of these matched pairs, we extract what we call “meaning-removed control vectors” (<em>v</em><sub>r</sub><sup>−</sup>). These vectors capture the reasoning <em>process</em> while factoring out the specific <em>content</em>.</p>
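
<p>The exact construction is not spelled out in this post, so the following is only one plausible reading, shown on toy numbers. It assumes that the difference between the hidden states of a matched pair approximates a content direction, and that "meaning removal" means projecting that direction out of a raw steering vector. All names and values are hypothetical.</p>

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def subtract(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def remove_component(v, direction):
    """Subtract from v its projection onto direction."""
    scale = dot(v, direction) / dot(direction, direction)
    return [a - scale * d for a, d in zip(v, direction)]


# Toy hidden states for two matched pairs (assumed values, 3 dimensions).
h_original = [[1.0, 2.0, 0.5], [0.8, 2.2, 0.4]]
h_rephrased = [[0.0, 2.0, 0.5], [0.0, 2.2, 0.4]]

# Assumption: pairs differ only in content, so their mean difference
# approximates a content direction.
content_dir = mean_vector(
    [subtract(a, b) for a, b in zip(h_original, h_rephrased)]
)

# Assumption: a raw steering vector obtained elsewhere.
raw_vector = [0.6, 1.0, -0.3]
v_meaning_removed = remove_component(raw_vector, content_dir)
print(v_meaning_removed)
```

<p>After the projection, the resulting vector is orthogonal to the estimated content direction, which is the property the "meaning-removed" name suggests.</p>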

<h3 id="3-causal-interventions">3. Causal Interventions</h3>
<p>We then inject or ablate these vectors at different layers of the model to observe their effects on reasoning behavior.</p>
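
<p>The two intervention types can be sketched on a single toy hidden state. This assumes the common formulation in which injection adds a scaled vector and ablation removes the hidden state's projection onto that vector; the function names and the <code>alpha</code> parameter are hypothetical. In a real model the same arithmetic would be applied to a chosen layer's activations, and that wiring is omitted here.</p>

```python
import math


def inject(hidden, vector, alpha=1.0):
    """Add a scaled control vector to a hidden state."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


def ablate(hidden, vector):
    """Remove the component of the hidden state along the control vector."""
    norm = math.sqrt(sum(v * v for v in vector))
    unit = [v / norm for v in vector]
    coeff = sum(h * u for h, u in zip(hidden, unit))
    return [h - coeff * u for h, u in zip(hidden, unit)]


hidden = [1.0, 2.0, 3.0]   # toy hidden state
vector = [0.0, 1.0, 0.0]   # toy control vector

steered = inject(hidden, vector, alpha=2.0)
ablated = ablate(hidden, vector)
print(steered)  # [1.0, 4.0, 3.0]
print(ablated)  # [1.0, 0.0, 3.0]
```

<p>Injection pushes the state further along the vector, while ablation zeroes out whatever the state already had in that direction and leaves the other dimensions unchanged.</p>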

<h2 id="key-findings">Key Findings</h2>

<p>Our preliminary results show that these interventions can:</p>
<ul>
  <li><strong>Substantially improve reflective reasoning (ΔR)</strong> - the model becomes better at double-checking its own logic</li>
  <li><strong>Minimally affect transition and execution behaviors (ΔT, ΔE)</strong> - the model’s basic language generation remains intact</li>
  <li><strong>Provide interpretable insights</strong> into where and how reasoning happens in transformer architectures</li>
</ul>

<h2 id="validation-and-reproducibility">Validation and Reproducibility</h2>

<p>We’ve implemented rigorous validation through:</p>
<ul>
  <li><strong>Cosine similarity analysis</strong> to verify that our vectors capture meaningful patterns</li>
  <li><strong>Nearest-neighbor analyses</strong> to understand the semantic space around our interventions</li>
  <li><strong>Re-sampling stability filters</strong> to ensure robustness across different examples</li>
  <li><strong>Layer-wise ablation studies</strong> to map reasoning processes across the model</li>
</ul>
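
<p>As an illustration of the first and third checks, the sketch below computes cosine similarity and a simple stability score across vectors extracted from different re-samples. Using the minimum pairwise cosine as the score is an assumption made for this example, not a description of the actual filter.</p>

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def stability_score(vectors):
    """Lowest pairwise cosine similarity across re-sampled vectors.

    Assumption: a vector is considered stable when this score is high.
    """
    return min(cosine(u, v) for u, v in combinations(vectors, 2))


# Toy vectors from three hypothetical re-samples.
resampled = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(cosine(resampled[0], resampled[1]))
print(stability_score(resampled))
```

<p>A score near 1 means every re-sample produced nearly the same direction; a low score flags a vector that depends heavily on which examples were drawn.</p>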

<h2 id="broader-impact">Broader Impact</h2>

<p>This work has implications beyond just understanding LLMs:</p>
<ul>
  <li><strong>AI Safety</strong>: Better control over reasoning processes could help prevent hallucinations</li>
  <li><strong>Education</strong>: Understanding how models reason could inform how we teach reasoning to humans</li>
  <li><strong>Tool Development</strong>: More controllable reasoning could enable better AI assistants</li>
</ul>

<h2 id="open-science">Open Science</h2>

<p>As part of our commitment to reproducible research, we’re releasing:</p>
<ul>
  <li>Complete tooling for vector extraction and analysis</li>
  <li>Comprehensive tutorials and documentation</li>
  <li>Example datasets and validation scripts</li>
</ul>

<p>This work also contributes to the NSF C2D project (Award #2321054), for which I’ve been developing educational materials and tutorials to help other researchers understand these techniques.</p>

<h2 id="whats-next">What’s Next?</h2>

<p>We’re currently preparing our manuscript for submission to EACL 2026. The next steps include:</p>
<ul>
  <li>Expanding to more diverse reasoning tasks</li>
  <li>Testing on larger model architectures</li>
  <li>Developing real-time intervention techniques for practical applications</li>
</ul>

<h2 id="conclusion">Conclusion</h2>

<p>Understanding how LLMs reason is one of the most important challenges in modern AI research. By separating the “what” from the “how” of reasoning, meaning-removed steering vectors offer a new lens for both interpreting and controlling these powerful systems.</p>

<p>If you’re interested in learning more about this work or discussing potential collaborations, feel free to reach out at <a href="mailto:hyoo@nd.edu">hyoo@nd.edu</a>.</p>

<hr />

<p><em>This blog post describes ongoing research at the MINE Lab, University of Notre Dame, under the supervision of Prof. Xiangliang Zhang. The work is currently under submission for peer review.</em></p>]]></content><author><name>Dan (Hojun) Yoo</name><email>hyoo@nd.edu</email></author><category term="machine learning" /><category term="llm reasoning" /><category term="interpretability" /><category term="research" /><summary type="html"><![CDATA[Large Language Models (LLMs) have shown remarkable capabilities in reasoning tasks, but understanding and controlling their internal reasoning processes remains a significant challenge. In my ongoing research at the MINE Lab (University of Notre Dame), I’m working on a novel approach to this problem: meaning-removed steering vectors.]]></summary></entry></feed>