<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Tsx watch]]></title><description><![CDATA[I'll document some tutorials that are self-tweaked, AI-generated, and experience seasoned]]></description><link>https://blog.astrobot.tech</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1767286427687/7fe3505a-f1d9-48e6-a7ee-9749d7314a6e.png</url><title>Tsx watch</title><link>https://blog.astrobot.tech</link></image><generator>RSS for Node</generator><lastBuildDate>Thu, 30 Apr 2026 20:42:51 GMT</lastBuildDate><atom:link href="https://blog.astrobot.tech/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Recursive Language Models (RLMs)]]></title><description><![CDATA[Paper: Zhang, Kraska & Khattab — MIT CSAIL, January 2026Code: github.com/alexzhang13/rlm
TL;DR

From first principles — before RLMs, performance degradation over large contexts was a known issue. RLM ]]></description><link>https://blog.astrobot.tech/recursive-language-models-rlms</link><guid isPermaLink="true">https://blog.astrobot.tech/recursive-language-models-rlms</guid><dc:creator><![CDATA[Aditya Raj]]></dc:creator><pubDate>Mon, 30 Mar 2026 19:51:05 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/65d204e2728da825eb42aafc/f1b23c18-2eeb-4f58-9c53-ef3cfea75da9.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Paper:</strong> Zhang, Kraska &amp; Khattab — MIT CSAIL, January 2026<br /><strong>Code:</strong> <a href="https://github.com/alexzhang13/rlm">github.com/alexzhang13/rlm</a></p>
<h2>TL;DR</h2>
<blockquote>
<p>From first principles — before RLMs, performance degradation over large contexts was a known issue. RLM addressed this by saying: "I will write code to split the large context into meaningful chunks, and for each chunk I will call an LLM. That sub-LLM call can itself further split and call recursively."</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/65d204e2728da825eb42aafc/0c42dafa-a9bf-4d7d-b71f-a5d3c69c5934.png" alt="" style="display:block;margin:0 auto" />

<p>The paper is responding to three eras of handling long context:</p>
<p><strong>Era 1 — Vanilla LLM:</strong> Just stuff everything into the window. Works until the context gets long, then quality collapses — the paper calls this <em>context rot</em>. The model attends poorly to tokens far back in the sequence.</p>
<p><strong>Era 2 — Compaction/summary agents:</strong> Split and summarise as you go. Slightly better, but <em>lossy</em> — once you summarise chunk 1, those details are gone forever. Fails for tasks that need every part of the document.</p>
<p><strong>Era 3 — RLM:</strong> Don't put the context in the window at all. Store it as a variable in the REPL. The root LLM writes code to <em>selectively inspect</em> only what it needs, calls sub-LLMs on those pieces, and those sub-LLMs can do the exact same thing again. Nothing is discarded — everything remains accessible in the REPL variable at all times.</p>
<p>The elegant thing is that the recursion is not designed by a human — the <em>model itself</em> decides how to split, how deep to go, and when a chunk is small enough to answer directly. That's what makes it general purpose rather than task-specific.</p>
<hr />
<h2>The Problem: Context Rot</h2>
<p>Frontier LLMs have a fixed context window. When you stuff a very long prompt into it, quality degrades steeply — the model attends poorly to tokens far back in the sequence. The paper calls this <strong>context rot</strong>.</p>
<p>The two approaches before RLMs both fell short:</p>
<table>
<thead>
<tr>
<th>Approach</th>
<th>What it does</th>
<th>Limitation</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Vanilla LLM</strong></td>
<td>Feed the entire prompt directly into the window</td>
<td>Window fills up; context rot kicks in</td>
</tr>
<tr>
<td><strong>Compaction / summary agents</strong></td>
<td>Summarise chunks as context fills</td>
<td>Lossy — early details are permanently discarded</td>
</tr>
</tbody></table>
<p>Neither approach works for tasks that need <strong>dense access throughout</strong> the full document — things like aggregating every line of a dataset, understanding an entire codebase, or reasoning across millions of tokens.</p>
<hr />
<h2>The RLM Idea — From First Principles</h2>
<p>The core insight is simple:</p>
<blockquote>
<p><strong>Don't put the large context into the LLM's attention window at all. Store it as a variable in an external environment. Let the LLM write code to split it into meaningful chunks, then call itself on each chunk. Each sub-call can itself split and call further.</strong></p>
</blockquote>
<p>This is the recursive part — the same model appears at every level of the tree, calling itself until chunks are small enough to answer directly.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65d204e2728da825eb42aafc/1b794245-570f-4173-b5c0-b9ca7cf90cd5.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>How It Works</h2>
<h3>1. The REPL Environment</h3>
<p>When an RLM receives a long prompt <code>P</code>, it does <strong>not</strong> feed <code>P</code> to the LLM. Instead:</p>
<pre><code class="language-python">state["context"] = P          # stored as a variable
hist = [Metadata(state)]      # only length, prefix, structure given to LLM
</code></pre>
<p>The LLM never sees the full text. It only knows the context exists as a variable. To read it, the model must <strong>write code</strong>:</p>
<pre><code class="language-python">chunk = context[:50000]
result = llm_query(f"Summarise this: {chunk}")
</code></pre>
<p>The REPL executes that code and returns only a short truncated <code>stdout</code> back to the model — forcing it to use variables and sub-calls to manage long content rather than printing everything into its window.</p>
<h3>2. The <code>llm_query()</code> Function</h3>
<p>The REPL is initialised with two things:</p>
<ul>
<li><p><code>context</code> — the full prompt as a string variable</p>
</li>
<li><p><code>llm_query()</code> — a function that is itself a sub-RLM call</p>
</li>
</ul>
<p>This means the model can write loops like:</p>
<pre><code class="language-python">chunks = [context[i:i+50000] for i in range(0, len(context), 50000)]
results = [llm_query(f"Answer this about the chunk: {c}") for c in chunks]
final = llm_query(f"Combine these: {results}")
</code></pre>
<p>Each <code>llm_query()</code> call is a full RLM invocation — same loop, same REPL, same ability to recurse further.</p>
<h3>3. The RLM Loop (Algorithm 1)</h3>
<pre><code class="language-python">state ← InitREPL(prompt=P)
state ← AddFunction(state, sub_RLM)   # inject itself as callable
hist  ← [Metadata(state)]

while True:
    code         ← LLM(hist)              # model writes code
    state, stdout ← REPL(state, code)     # code runs, may call sub_RLM
    hist         ← hist + code + Metadata(stdout)
    if state["Final"] is set:
        return state["Final"]
</code></pre>
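<p>To make the loop concrete, here is a minimal Python sketch of the same algorithm. It is illustrative only: <code>call_llm</code> and <code>run_repl</code> are hypothetical stand-ins for a chat-model API and a sandboxed Python executor, not the paper's actual code.</p>
<pre><code class="language-python">def rlm(prompt: str, depth: int = 0, max_depth: int = 1) -&gt; str:
    # At the depth limit, a sub-call degenerates into a plain LLM call
    # (the paper's experiments used max depth = 1).
    if depth &gt;= max_depth:
        return call_llm(prompt)

    # The long prompt lives in the REPL as a variable; the model only
    # ever sees metadata about it, never the full text.
    env = {
        "context": prompt,
        "llm_query": lambda p: rlm(p, depth + 1, max_depth),  # recursion
    }
    hist = [f"context stored as a variable: {len(prompt)} chars"]

    while "FINAL" not in env:
        code = call_llm("\n".join(hist))    # model writes code
        stdout, env = run_repl(code, env)   # code runs, may call llm_query
        hist.append(code)
        hist.append(stdout[:2000])          # only truncated stdout returns
    return env["FINAL"]
</code></pre>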
<img src="https://cdn.hashnode.com/uploads/covers/65d204e2728da825eb42aafc/a5e98d77-f1f6-4f4b-b277-6d3f8ff6e99a.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>The Recursion Tree</h2>
<p>For a 10,000-page document:</p>
<img src="https://cdn.hashnode.com/uploads/covers/65d204e2728da825eb42aafc/6d6e1b63-11d7-4f0b-87fe-1b5d9f87ccd9.png" alt="" style="display:block;margin:0 auto" />

<pre><code class="language-plaintext">Root LLM  (10,000 pages — too large)
├── Sub-LLM 1  (100 pages — still too large → splits again)
│   ├── Sub-Sub-LLM 1.1  (10 pages — fits → answer)
│   ├── Sub-Sub-LLM 1.2  (10 pages — fits → answer)
│   └── ...merge → section 1 summary
├── Sub-LLM 2  (100 pages — fits → answer directly)
├── ...
└── Sub-LLM N  (100 pages — fits → answer directly)
         ↓
Root merges all results → FINAL answer
</code></pre>
<p>The split is not uniform — the root LLM uses code (regex, keyword search, structural markers) to <strong>intelligently decide</strong> how to slice the context. It might fetch only documents containing a keyword, or split by Markdown headers, or chunk by newlines.</p>
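<p>A hypothetical trajectory inside the REPL might look like this (<code>context</code> and <code>llm_query</code> are the two names the environment provides; the filter topic is made up for illustration):</p>
<pre><code class="language-python">import re

# Inspect structure first, cheaply, without any LLM calls.
sections = context.split("\n## ")       # split on Markdown headers

# Sub-call only on sections that can possibly matter.
relevant = [s for s in sections if re.search(r"revenue", s, re.I)]
answers = [llm_query(f"Extract the revenue figures:\n{s}") for s in relevant]
</code></pre>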
<hr />
<h2>When Does It Stop Splitting?</h2>
<p>Two stopping conditions:</p>
<p><strong>1. Natural base case</strong> — the chunk fits comfortably in the sub-LLM's context window. The model answers directly. No further splitting needed.</p>
<p><strong>2. Hard depth limit</strong> — a maximum recursion depth is enforced as a safety guardrail. In the paper's experiments, depth was set to <strong>1</strong>: sub-LLMs could not split further and were treated as plain LLM calls.</p>
<p>There is no explicit "is this small enough?" check baked into the system — the <strong>model itself</strong> learns to judge this through prompting or fine-tuning. This is why weaker models like Qwen3-8B struggled as RLMs without fine-tuning: they didn't reliably know when to stop splitting vs when to just answer.</p>
<hr />
<h2>Key Design Choices vs Naive Approaches</h2>
<p>The paper contrasts RLMs with a "similar-looking" Algorithm 2 that is far less expressive:</p>
<table>
<thead>
<tr>
<th></th>
<th>RLM (Algorithm 1)</th>
<th>Naive agent (Algorithm 2)</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Where does</strong> <code>P</code> <strong>live?</strong></td>
<td>REPL variable — never in LLM window</td>
<td>Directly in <code>hist</code> — window fills immediately</td>
</tr>
<tr>
<td><strong>How are outputs generated?</strong></td>
<td>Via REPL variables — unbounded length</td>
<td>LLM generates directly — bounded by window</td>
</tr>
<tr>
<td><strong>Sub-calls</strong></td>
<td>Programmatic — inside loops, Ω(|P|) sub-calls possible</td>
<td>Issued one at a time; each result enters <code>hist</code> and consumes the window</td>
</tr>
</tbody></table>
<hr />
<h2>Results</h2>
<p>RLMs were evaluated on four tasks of increasing complexity:</p>
<table>
<thead>
<tr>
<th>Task</th>
<th>Complexity</th>
<th>RLM (GPT-5)</th>
<th>Base GPT-5</th>
</tr>
</thead>
<tbody><tr>
<td>S-NIAH (needle-in-haystack)</td>
<td>O(1)</td>
<td>Strong</td>
<td>Strong (within window)</td>
</tr>
<tr>
<td>BrowseComp+ 1K docs</td>
<td>Linear</td>
<td><strong>91.3%</strong></td>
<td>0% (can't fit in window)</td>
</tr>
<tr>
<td>OOLONG</td>
<td>Linear</td>
<td><strong>56.5%</strong></td>
<td>44.0%</td>
</tr>
<tr>
<td>OOLONG-Pairs</td>
<td>Quadratic</td>
<td><strong>58.0%</strong></td>
<td>0.1%</td>
</tr>
</tbody></table>
<p>Key findings:</p>
<ul>
<li><p>RLMs handle inputs <strong>up to 2 orders of magnitude beyond</strong> the model's context window</p>
</li>
<li><p>On information-dense tasks, RLMs outperform all baselines by <strong>double-digit percentages</strong></p>
</li>
<li><p>Median inference cost is <strong>comparable to or cheaper than</strong> a base model call</p>
</li>
<li><p>A fine-tuned 8B model (RLM-Qwen3-8B) outperformed base Qwen3-8B by <strong>28.3% on average</strong></p>
</li>
</ul>
<hr />
<h2>Emergent Patterns in RLM Trajectories</h2>
<p>Even without task-specific training, RLMs develop consistent strategies:</p>
<ul>
<li><p><strong>Regex filtering</strong> — use code to search for keywords before calling sub-LLMs, avoiding unnecessary processing</p>
</li>
<li><p><strong>Batch chunking</strong> — split by newlines, headers, or fixed character counts and process in parallel</p>
</li>
<li><p><strong>Variable stitching</strong> — for long-output tasks, store sub-call results in variables and concatenate into a final answer</p>
</li>
</ul>
<hr />
<h2>Limitations</h2>
<ul>
<li><p>Sub-calls are currently <strong>synchronous and blocking</strong> — async calls would dramatically reduce runtime</p>
</li>
<li><p>Max recursion depth of 1 was used — <strong>deeper recursion</strong> is unexplored</p>
</li>
<li><p>Models without strong <strong>coding capabilities</strong> struggle as RLMs</p>
</li>
<li><p>Thinking models can <strong>run out of output tokens</strong> mid-trajectory if reasoning tokens are too long</p>
</li>
</ul>
<hr />
<h2>One-Line Summary</h2>
<blockquote>
<p>An RLM stores the prompt as a code variable, writes a program to split it into chunks, calls itself on each chunk, and each sub-call can split further — stopping only when a chunk fits in the context window or the recursion depth limit is hit.</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[What is HyDE]]></title><description><![CDATA[The core question is: what makes zero-shot retrieval fail, and what would fix it?
Let me build up the intuition step by step.The root problem: A user query like "how do I fix a leaky pipe?" and a docu]]></description><link>https://blog.astrobot.tech/what-is-hyde</link><guid isPermaLink="true">https://blog.astrobot.tech/what-is-hyde</guid><category><![CDATA[RAG ]]></category><dc:creator><![CDATA[Aditya Raj]]></dc:creator><pubDate>Tue, 24 Mar 2026 05:18:15 GMT</pubDate><content:encoded><![CDATA[<img src="https://cdn.hashnode.com/uploads/covers/65d204e2728da825eb42aafc/85f49482-a5c4-43eb-a1f8-cda56da57d35.png" alt="" style="display:block;margin:0 auto" />

<h3>The core question is: <strong>what makes zero-shot retrieval fail, and what would fix it?</strong></h3>
<p>Let me build up the intuition step by step.</p>
<p><strong>The root problem:</strong> A user query like <em>"how do I fix a leaky pipe?"</em> and a document containing <em>"plumbing maintenance procedures for residential water systems"</em> might mean the same thing — but their vector representations land in different neighborhoods. Zero-shot retrieval compares <em>short + sparse</em> query embeddings to <em>long + rich</em> document embeddings. That mismatch is structural.</p>
<p>So the question becomes: <strong>how do we close this gap?</strong></p>
<hr />
<p><strong>First principles thinking — three possible strategies:</strong></p>
<ol>
<li><p><strong>Strategy 1 — Bring the document closer to the query.</strong> Summarize or compress documents into query-like representations. (This is what dense retrieval models are trained to do.) But it's hard to do at inference time.</p>
</li>
<li><p><strong>Strategy 2 — Bring the query closer to the document.</strong> Expand the query — add synonyms, related terms, more context. Old-school NLP did this (query expansion). But how do you know <em>what</em> to expand with?</p>
</li>
<li><p><strong>Strategy 3 — Bypass the gap entirely.</strong> What if instead of comparing <em>queries to documents</em>, we compared <em>documents to documents</em>?</p>
</li>
</ol>
<p>That third insight is the key leap. Let's follow it. This is the core idea of <strong>HyDE — Hypothetical Document Embeddings</strong>.</p>
<hr />
<h3><strong>The key insight unpacked:</strong></h3>
<p>Instead of asking <em>"find documents similar to this query"</em>, HyDE asks <em>"find documents similar to what the answer would look like."</em></p>
<p>The LLM generates a <strong>hypothetical document</strong> — essentially, a plausible answer to the query. It doesn't matter if the answer is factually correct. What matters is that it <strong>lives in the same vector space distribution</strong> as real documents. A hallucinated but plausible paragraph about pipe leaks will structurally resemble a real plumbing article far more than the original query <em>"how do I fix a leaky pipe?"</em> ever could.</p>
<p>Then you embed that hypothetical doc and use it as your search vector.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65d204e2728da825eb42aafc/1e3d1eb9-a9a5-4940-b5bf-610abe5b1bdb.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h3><strong>Why does this work, from first principles?</strong></h3>
<p>Embedding models are trained on documents. Their geometry reflects document-level patterns — vocabulary, phrasing, density, style. A short query is an <em>outlier</em> in that space. A paragraph, even a fabricated one, is a native inhabitant.</p>
<p>You're essentially using the LLM as a <strong>query-to-document translator</strong>, converting a sparse user intent signal into something the embedding space is comfortable navigating.</p>
<hr />
<h3><strong>The beautiful tradeoff:</strong></h3>
<p>HyDE offloads the semantic interpretation burden to the LLM (which is good at understanding intent) and lets the embedding model do what it's good at (comparing document-like things to other document-like things). The two models play to their strengths.</p>
<h2>Final Pipeline</h2>
<img src="https://cdn.hashnode.com/uploads/covers/65d204e2728da825eb42aafc/b155be36-0469-42d9-ba4b-5f7aff9c6647.png" alt="" style="display:block;margin:0 auto" />

<p>So HyDE is actually two separate LLM calls doing two completely different jobs:</p>
<ul>
<li><p><strong>LLM call 1</strong> — be creative, hallucinate freely, just sound like a document</p>
</li>
<li><p><strong>LLM call 2</strong> — be accurate, stick to what the retrieved docs actually say</p>
</li>
</ul>
<p>That separation of concerns is the elegance of it.</p>
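<p>A minimal sketch of the retrieval half, assuming <code>generate</code> is any LLM completion call and <code>embed</code> any document-trained embedding model returning NumPy vectors (both names are placeholders, not a specific library API):</p>
<pre><code class="language-python">import numpy as np

def hyde_retrieve(query, docs, doc_vecs, generate, embed, k=5):
    # LLM call 1 (creative): hallucinate a plausible answer document.
    fake_doc = generate(f"Write a short passage answering: {query}")

    # Embed the hypothetical document instead of the raw query.
    q = embed(fake_doc)

    # Cosine similarity: a document-to-document comparison.
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(-sims)[:k]]
</code></pre>
<p>LLM call 2 then answers the query grounded only in the docs this returns.</p>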
<h2>Abstract</h2>
<img src="https://cdn.hashnode.com/uploads/covers/65d204e2728da825eb42aafc/bd2936b4-9c6c-4481-8faf-29af5d38c65a.png" alt="" style="display:block;margin:0 auto" />

<p><strong>Research Paper: Precise Zero-Shot Dense Retrieval without Relevance Labels</strong></p>
<p>Ref: <a href="https://arxiv.org/abs/2212.10496">https://arxiv.org/abs/2212.10496</a></p>
]]></content:encoded></item><item><title><![CDATA[Positional Encoding in Transformers]]></title><description><![CDATA[1. Overview
Positional Encoding is a technique used in Transformer architectures to encode the order of tokens in a sequence.
Transformers process tokens in parallel, unlike sequential models such as ]]></description><link>https://blog.astrobot.tech/positional-encoding-in-tranformers</link><guid isPermaLink="true">https://blog.astrobot.tech/positional-encoding-in-tranformers</guid><dc:creator><![CDATA[Aditya Raj]]></dc:creator><pubDate>Wed, 04 Mar 2026 08:47:29 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/65d204e2728da825eb42aafc/45557b44-6965-40b6-a387-9cbf728a8c2a.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>1. Overview</h2>
<p><strong>Positional Encoding</strong> is a technique used in <strong>Transformer architectures</strong> to encode the <strong>order of tokens in a sequence</strong>.</p>
<p>Transformers process tokens <strong>in parallel</strong>, unlike sequential models such as RNNs. Because of this, the model has <strong>no inherent understanding of token order</strong>. Positional encoding injects information about <strong>token positions</strong> into token embeddings before they enter the self-attention layers.</p>
<p>The final representation sent to the transformer is:</p>
<pre><code class="language-plaintext">Input = TokenEmbedding + PositionalEncoding
</code></pre>
<p>This allows the model to learn both:</p>
<ul>
<li><p><strong>semantic meaning</strong> (from embeddings)</p>
</li>
<li><p><strong>sequence order</strong> (from positional encoding)</p>
</li>
</ul>
<hr />
<h2>2. Why This Exists</h2>
<h2>The Core Problem</h2>
<p>Self-attention processes tokens <strong>simultaneously</strong>, not sequentially.</p>
<p>Example sentence:</p>
<pre><code class="language-plaintext">"Nitish killed lion"
"Lion killed Nitish"
</code></pre>
<p>Both sentences contain the same tokens:</p>
<pre><code class="language-plaintext">[Nitish, killed, lion]
</code></pre>
<p>If sent to self-attention simultaneously, the model cannot distinguish <strong>token order</strong>, so both sequences appear identical.</p>
<p>This is a fundamental limitation because <strong>word order determines meaning</strong> in natural language.</p>
<h3>Why Previous Models Didn't Have This Problem</h3>
<table>
<thead>
<tr>
<th>Model</th>
<th>Order Awareness</th>
<th>Reason</th>
</tr>
</thead>
<tbody><tr>
<td>RNN</td>
<td>Yes</td>
<td>Tokens processed sequentially</td>
</tr>
<tr>
<td>LSTM</td>
<td>Yes</td>
<td>Hidden state carries time information</td>
</tr>
<tr>
<td>Transformer</td>
<td>No</td>
<td>Tokens processed in parallel</td>
</tr>
</tbody></table>
<p>Transformers sacrifice <strong>sequential processing</strong> for <strong>parallel efficiency</strong>, so positional information must be explicitly added.</p>
<hr />
<h2>3. First Principles Explanation</h2>
<p>To solve the ordering problem, we must encode <strong>position information</strong> alongside token embeddings.</p>
<h3>Components</h3>
<ol>
<li><p><strong>Token Embedding</strong></p>
</li>
<li><p><strong>Positional Encoding</strong></p>
</li>
<li><p><strong>Self-Attention Layer</strong></p>
</li>
</ol>
<h3>Interaction</h3>
<pre><code class="language-plaintext">Token → Embedding Vector
Position → Positional Encoding Vector

Final Input = Embedding + Positional Encoding
</code></pre>
<p>Each token therefore carries:</p>
<pre><code class="language-plaintext">semantic information + positional information
</code></pre>
<h3>Design Requirements</h3>
<p>A good positional encoding must satisfy:</p>
<ol>
<li><p><strong>Bounded values</strong>: neural networks train best with values in small ranges (e.g. -1 to 1).</p>
</li>
<li><p><strong>Continuous values</strong>: neural networks prefer <strong>smooth functions</strong>, not discrete jumps.</p>
</li>
<li><p><strong>Ability to capture relative positions</strong>: the model should infer relationships like <code>distance(token_i, token_j)</code>.</p>
</li>
<li><p><strong>Unique representation</strong>: each position must have a <strong>distinct encoding</strong>.</p>
</li>
</ol>
<hr />
<h2>4. How It Works</h2>
<h3>Step 1 — Tokenize Sentence</h3>
<p>Example:</p>
<pre><code class="language-plaintext">Sentence: "River Bank"
Tokens: [River, Bank]
</code></pre>
<h3>Step 2 — Convert Tokens to Embeddings</h3>
<p>Example:</p>
<pre><code class="language-plaintext">River → embedding vector (d_model)
Bank  → embedding vector (d_model)
</code></pre>
<p>Example dimension:</p>
<pre><code class="language-plaintext">d_model = 512
</code></pre>
<h3>Step 3 — Generate Positional Encoding</h3>
<p>For position <code>pos</code> and dimension <code>i</code>:</p>
<pre><code class="language-plaintext">PE(pos,2i)   = sin(pos / 10000^(2i/d_model))
PE(pos,2i+1) = cos(pos / 10000^(2i/d_model))
</code></pre>
<p>Key ideas:</p>
<ul>
<li><p><strong>even dimensions → sine</strong></p>
</li>
<li><p><strong>odd dimensions → cosine</strong></p>
</li>
</ul>
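<p>A small NumPy sketch of these formulas (illustrative, not tied to any particular library):</p>
<pre><code class="language-python">import numpy as np

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]             # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]         # even dimension indices
    angle = pos / np.power(10000.0, i / d_model)  # (seq_len, d_model/2)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)   # even dimensions: sine
    pe[:, 1::2] = np.cos(angle)   # odd dimensions: cosine
    return pe

# "River Bank": two positions, d_model = 512
# input_vectors = token_embeddings + positional_encoding(2, 512)
</code></pre>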
<h3>Step 4 — Add Positional Encoding to Embedding</h3>
<pre><code class="language-plaintext">InputVector = Embedding + PositionalEncoding
</code></pre>
<h3>Step 5 — Send to Self-Attention</h3>
<p>The resulting vector contains both:</p>
<pre><code class="language-plaintext">semantic meaning + position
</code></pre>
<p>This vector becomes the input to the transformer encoder.</p>
<hr />
<h2>5. Example</h2>
<h3>Situation</h3>
<p>Sentence:</p>
<pre><code class="language-plaintext">"The lion runs"
</code></pre>
<p>Token positions:</p>
<pre><code class="language-plaintext">The  → position 0
lion → position 1
runs → position 2
</code></pre>
<h3>Implementation Idea</h3>
<p>Compute positional encodings:</p>
<pre><code class="language-plaintext">PE(0)
PE(1)
PE(2)
</code></pre>
<p>Then combine:</p>
<pre><code class="language-plaintext">Embedding(The)  + PE(0)
Embedding(lion) + PE(1)
Embedding(runs) + PE(2)
</code></pre>
<h3>Expected Outcome</h3>
<p>The transformer can now learn relationships such as:</p>
<ul>
<li><p>which word comes first</p>
</li>
<li><p>relative distances between words</p>
</li>
<li><p>syntactic dependencies</p>
</li>
</ul>
<hr />
<h2>Summary</h2>
<ul>
<li><p>Transformers process tokens <strong>in parallel</strong>, losing order information.</p>
</li>
<li><p><strong>Positional encoding</strong> injects token position information.</p>
</li>
<li><p>Encodings are generated using <strong>sine and cosine functions</strong> at multiple frequencies.</p>
</li>
<li><p>Positional vectors have the <strong>same dimension as embeddings</strong>.</p>
</li>
<li><p>Final input to transformers is:</p>
</li>
</ul>
<pre><code class="language-plaintext">embedding + positional_encoding
</code></pre>
]]></content:encoded></item><item><title><![CDATA[How to Create a Custom Solana Token with Metadata (Using Token-2022)? ]]></title><description><![CDATA[If you have your Solana CLI set up and you are ready to launch your own digital asset, the new Token-2022 standard (also known as Token Extensions) is the way to go. It offers built-in metadata suppor]]></description><link>https://blog.astrobot.tech/how-to-create-custom-solana-token</link><guid isPermaLink="true">https://blog.astrobot.tech/how-to-create-custom-solana-token</guid><category><![CDATA[Solana]]></category><category><![CDATA[Web3]]></category><dc:creator><![CDATA[Aditya Raj]]></dc:creator><pubDate>Sun, 01 Mar 2026 07:31:55 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/65d204e2728da825eb42aafc/752f73f8-2d51-4093-a07f-881905ca27f4.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you have your Solana CLI set up and you are ready to launch your own digital asset, the new <strong>Token-2022</strong> standard (also known as Token Extensions) is the way to go. It offers built-in metadata support directly at the mint level, meaning you no longer need to rely on external programs like Metaplex just to give your token a name and a picture.</p>
<p>In this guide, we will walk through the exact terminal commands to create a token — we'll use <strong>Adit Coin (ARDT)</strong>, short for <strong>Aditya Raj De Token</strong> — mint an initial supply, and link it to its metadata.</p>
<hr />
<h2>Step 1: Create the Token Mint</h2>
<p>The first step is to create the actual token "blueprint" (the Mint). We need to specify that we are using the new Token-2022 program and that we want to enable the metadata extension. We will also set the decimals to <code>9</code> (the standard for most Solana tokens).</p>
<p>Run this command in your terminal:</p>
<pre><code class="language-plaintext">spl-token create-token \
  --program-id TokenzQdBNbLqP5VEhdkAS6EPFLC1PHnBqCXEpPxuEb \
  --enable-metadata \
  --decimals 9
</code></pre>
<p><strong>What happens next:</strong></p>
<p>The CLI will generate a unique address for your token. In our example logs, this returned <code>BTJ4cPPwssx6b7jy9GoUyBeLBxufyCSoUzvuegEUCVx8</code>. Keep track of this address; you will need it for the next steps!</p>
<p>Verify Here: <a href="https://explorer.solana.com/tx/2WF1A4JDkpWWfGKTjrihiLZVe4gTPeTKDAieAX6EnkLnrA8tqr4qu339e2iKn6EG2w2CwfaQjeGVHDLYWHrbZJno?cluster=devnet">https://explorer.solana.com/tx/2WF1A4JDkpWWfGKTjrihiLZVe4gTPeTKDAieAX6EnkLnrA8tqr4qu339e2iKn6EG2w2CwfaQjeGVHDLYWHrbZJno?cluster=devnet</a></p>
<img src="https://cdn.hashnode.com/uploads/covers/65d204e2728da825eb42aafc/e1d51d9f-8c40-4bdc-8257-b928340f56b9.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>Step 2: Create an Associated Token Account (ATA)</h2>
<p>Before you can actually mint (create) any of these new tokens, your wallet needs a specific sub-account designed to hold them. Think of it like opening a specific currency pocket inside your main wallet.</p>
<p>Create the account using your new token address:</p>
<pre><code class="language-plaintext">spl-token create-account BTJ4cPPwssx6b7jy9GoUyBeLBxufyCSoUzvuegEUCVx8
</code></pre>
<p><strong>What happens next:</strong></p>
<p>You will receive an output with your new Associated Token Account address (e.g., <code>HJPewMqabd4gnqmTQwiwNiHCQV14ETjVsn7spTiEpiAN</code>). This is where your newly minted tokens will land.</p>
<p>Verify Here:</p>
<p><a href="https://explorer.solana.com/tx/5WPj1VRJYxSZizsxTotE62SCHiuU3DTnJxLhUnuw3Ln6gMe8nU6ueAqskwgdVdqAzzskRcE23tNAXLvpXi7zhjBD?cluster=devnet">https://explorer.solana.com/tx/5WPj1VRJYxSZizsxTotE62SCHiuU3DTnJxLhUnuw3Ln6gMe8nU6ueAqskwgdVdqAzzskRcE23tNAXLvpXi7zhjBD?cluster=devnet</a></p>
<img src="https://cdn.hashnode.com/uploads/covers/65d204e2728da825eb42aafc/0fec82a5-3468-49f0-9d82-6468f5cf70c2.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>Step 3: Mint Your Tokens</h2>
<p>Now it's time to bring your tokens into existence. You can mint tokens in batches or all at once. Let's mint an initial batch of 100 tokens, and then another 100 just to be sure everything is working.</p>
<pre><code class="language-plaintext"># First Mint
spl-token mint BTJ4cPPwssx6b7jy9GoUyBeLBxufyCSoUzvuegEUCVx8 100

# Second Mint
spl-token mint BTJ4cPPwssx6b7jy9GoUyBeLBxufyCSoUzvuegEUCVx8 100
</code></pre>
<p>Verify Here:</p>
<p><a href="https://explorer.solana.com/tx/64VJQycoCaMSSeMu63Vubxzp2VMzd3GPD82yTYu4E8ynvtSDYZBodVReQtkiD7H1F715o5N9pnnJHfmieC26yfK?cluster=devnet">https://explorer.solana.com/tx/64VJQycoCaMSSeMu63Vubxzp2VMzd3GPD82yTYu4E8ynvtSDYZBodVReQtkiD7H1F715o5N9pnnJHfmieC26yfK?cluster=devnet</a></p>
<img src="https://cdn.hashnode.com/uploads/covers/65d204e2728da825eb42aafc/7f53c5bc-56a6-4de6-90d9-5670c0e45c09.png" alt="" style="display:block;margin:0 auto" />

<p><strong>What happens next:</strong></p>
<p>You successfully generated 200 Adit Coins! They are now sitting safely inside your Token Account.</p>
<h2>Step 4: Initialize the Metadata</h2>
<p>Right now, the blockchain just sees a string of letters and numbers. Let's give it an identity. We will use the <code>initialize-metadata</code> command to assign a Name, a Ticker Symbol, and a URI (a link to a JSON file hosted on IPFS that contains your token's image and description).</p>
<pre><code class="language-plaintext">spl-token initialize-metadata \
  BTJ4cPPwssx6b7jy9GoUyBeLBxufyCSoUzvuegEUCVx8 \
  "Adit Coin" \
  "ARDT" \
  "https://identical-amaranth-bear.myfilebase.com/ipfs/QmYKcq66TEop68NiD1brpsUZ7uVyXNK4thbX7nD4c4Rxtm"
</code></pre>
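<p>The URI should resolve to a small JSON file describing the token. A minimal example of the shape that file typically takes (the field values below are illustrative, adjust them for your own asset):</p>
<pre><code class="language-plaintext">{
  "name": "Adit Coin",
  "symbol": "ARDT",
  "description": "Aditya Raj De Token, a demo Token-2022 asset",
  "image": "https://your-gateway.example/ipfs/&lt;image-CID&gt;"
}
</code></pre>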
<p>Verify Here:</p>
<p><a href="https://explorer.solana.com/tx/48bQmUJiE8iAQ5FuA1LESkbjZMGbVD3zPWo9xjGKv2qcftkDT4fraAQ9G6XwgwiSp2BviGXgpNKLJLPpVkZqaLoE?cluster=devnet">https://explorer.solana.com/tx/48bQmUJiE8iAQ5FuA1LESkbjZMGbVD3zPWo9xjGKv2qcftkDT4fraAQ9G6XwgwiSp2BviGXgpNKLJLPpVkZqaLoE?cluster=devnet</a></p>
<img src="https://cdn.hashnode.com/uploads/covers/65d204e2728da825eb42aafc/37be94ff-6037-402b-ab2f-eb3a9975ae9d.png" alt="" style="display:block;margin:0 auto" />

<p><strong>What happens next:</strong></p>
<p>Your token is now fully registered with its on-chain identity! If you look up your token address on an explorer like SolanaFM or Solscan, you will see "Adit Coin", the symbol "ARDT", and the logo pulling from your IPFS link.</p>
<hr />
<p><strong>Thanks for following along!</strong> 🚀</p>
<p>Building and learning in public is a huge part of the journey. Whether you want to collaborate on the next hackathon, discuss Web3 and AI projects, or just talk tech, my inbox is always open.</p>
<p>Let's connect: <a href="https://linkedin.com/">LinkedIn</a> | <a href="https://github.com/">GitHub</a></p>
]]></content:encoded></item><item><title><![CDATA[Installing Solana CLI in an Easy Way]]></title><description><![CDATA[Setting up the Solana CLI manually on Windows by grabbing the binaries straight from GitHub is a great way to have full control over your development environment.]]></description><link>https://blog.astrobot.tech/installing-solana-cli-windows-manual</link><guid isPermaLink="true">https://blog.astrobot.tech/installing-solana-cli-windows-manual</guid><dc:creator><![CDATA[Aditya Raj]]></dc:creator><pubDate>Mon, 23 Feb 2026 19:12:12 GMT</pubDate><enclosure url="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/65d204e2728da825eb42aafc/456e0629-1449-4b9a-9461-9869d9fffbe8.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Setting up the Solana CLI manually on Windows by grabbing the binaries straight from GitHub is a great way to have full control over your development environment. The plan is simple: go to the Solana Labs GitHub, download the pre-built binary, and manually add it to your Windows PATH.</p>
<p>Here is a straightforward guide to get you set up.</p>
<p><em>(A quick factual note before we begin: The core Solana Labs client recently transitioned to a new developer team called Anza, and the software is now called "Agave". You can still find the releases on the Solana Labs GitHub, but going forward, the most up-to-date versions will be on the Anza Agave GitHub repo. The steps below apply perfectly to both!)</em></p>
<h3>How to Manually Install the Solana CLI on Windows</h3>
<h2><strong>Step 1: Download the Pre-built Binary</strong></h2>
<ol>
<li><p>Navigate to the official releases page: the <a href="https://github.com/solana-labs/solana/releases">Solana Labs GitHub Releases</a> (or the newer Anza Agave repo mentioned above).</p>
</li>
<li><p>Look for the <strong>Latest</strong> stable release (for example, <code>v1.18.26</code> or <code>v2.0.0</code>).</p>
</li>
<li><p>Scroll down to the <strong>Assets</strong> section at the bottom of the release notes.</p>
</li>
<li><p>Download the Windows binary archive. It will be named something like: <code>solana-release-x86_64-pc-windows-msvc.tar.bz2</code>.</p>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/65d204e2728da825eb42aafc/005164fa-1999-4bb2-860b-cc291dbdc02c.png" alt="" style="display:block;margin:0 auto" /></li>
</ol>
<h2><strong>Step 2: Extract the Files</strong></h2>
<p>Windows doesn't always natively handle <code>.tar.bz2</code> files, so you may need a free extraction tool like <strong>7-Zip</strong> or <strong>WinRAR</strong>.</p>
<ol>
<li><p>Open your extraction tool and extract the downloaded file.</p>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/65d204e2728da825eb42aafc/5c52cd85-aa01-4c9d-97d1-aa5bfba5ff42.png" alt="" style="display:block;margin:0 auto" />
</li>
<li><p>Move the extracted folder (it should be named <code>solana-release</code>) to a permanent, secure location on your hard drive where it won't be deleted. A good spot is <code>C:\solana-release</code>.  </p>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/65d204e2728da825eb42aafc/a2b121ff-452a-418a-bcfd-0986e552caca.png" alt="" style="display:block;margin:0 auto" /></li>
</ol>
<p>Here is where the binaries are located:</p>
<ol>
<li><p>Relocate the binaries to the <code>C</code> drive (<code>C:\solana-cli\solana-release</code> in my case)</p>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/65d204e2728da825eb42aafc/f222d722-7b20-4de8-93fa-f42ff2940a63.png" alt="" style="display:block;margin:0 auto" /></li>
</ol>
<h2><strong>Step 3: Manually Add the CLI to your Windows PATH</strong></h2>
<p>To use Solana commands from any folder in your terminal, Windows needs to know exactly where those tools live.</p>
<ol>
<li><p>Click the Windows Start button, type <strong>Environment Variables</strong>, and select <strong>Edit the system environment variables</strong>.</p>
</li>
<li><p>In the System Properties window that pops up, click the <strong>Environment Variables...</strong> button near the bottom right.</p>
</li>
<li><p>In the new window, look under the <strong>User variables</strong> (or <strong>System variables</strong> if you want it applied to all users) and find the variable named <strong>Path</strong>.</p>
</li>
<li><p>Select <strong>Path</strong> and click <strong>Edit</strong>.</p>
</li>
<li><p>Click <strong>New</strong>, and paste the exact folder path to the <code>bin</code> directory inside your extracted Solana folder. If you followed Step 2 exactly, this will be: <code>C:\solana-release\bin</code>.</p>
</li>
<li><p>Click <strong>OK</strong> on all three windows to save your changes.  </p>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/65d204e2728da825eb42aafc/a54ab868-8601-467f-b657-e4f6a3bbf013.png" alt="" style="display:block;margin:0 auto" /></li>
</ol>
<h2><strong>Step 4: Verify the Installation</strong></h2>
<ol>
<li><p>Open a brand new <strong>Command Prompt</strong> or <strong>PowerShell</strong> window (if you had one open already, you must close and restart it so it can fetch the new PATH data).</p>
</li>
<li><p>Type the following command and press Enter:</p>
</li>
</ol>
<pre><code class="language-plaintext">solana --version
# solana-cli 1.18.26 (src:d9f20e95; feat:3241752014, client:SolanaLabs)
</code></pre>
<p>If everything is configured correctly, it will output the version of the Solana CLI you just installed!</p>
<ol>
<li><code>solana-keygen new</code> for generating a public-private keypair</li>
</ol>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/65d204e2728da825eb42aafc/f42ad807-1609-429e-b520-f438201960be.png" alt="" style="display:block;margin:0 auto" />

<ol>
<li><p><code>solana config get</code> to get the config file contents.</p>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/65d204e2728da825eb42aafc/d59aa857-1916-4485-8d0e-15889607ad95.png" alt="" style="display:block;margin:0 auto" />
</li>
<li><p>A few more commands</p>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/65d204e2728da825eb42aafc/48456254-ed12-478c-a036-4eebc910ab96.png" alt="" style="display:block;margin:0 auto" />
</li>
<li><p>After Airdropping on devnet through <a href="https://faucet.solana.com/">https://faucet.solana.com/</a></p>
</li>
</ol>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/65d204e2728da825eb42aafc/5b193d39-27db-45a4-8f29-259ed155eb68.png" alt="" style="display:block;margin:0 auto" />

<p>Thanks for stopping by! If this helped you out, let me know. I'm actively exploring new ideas in AI and full-stack development, and I'd love to see what you're building too.</p>
<p>💼 <a href="https://linkedin.com/in/astro-adityaraj">LinkedIn</a> • 🐙 <a href="https://github.com/astrobot-me/">GitHub</a> • ✉️ <a href="http://astrobot.tech/">Portfolio</a></p>
<p>Ref:</p>
<p><a href="https://solana.com/docs/intro/installation/solana-cli-basics">https://solana.com/docs/intro/installation/solana-cli-basics</a></p>
]]></content:encoded></item><item><title><![CDATA[A Deep Dive into SQL Logical Query Processing]]></title><description><![CDATA[If you come from an imperative programming background, such as JavaScript, Python, or C++, SQL can feel counterintuitive. You define a variable, and one line later, the compiler tells you it doesn't exist.
This isn't a syntax error on your part; it i...]]></description><link>https://blog.astrobot.tech/a-deep-dive-into-sql-logical-query-processing</link><guid isPermaLink="true">https://blog.astrobot.tech/a-deep-dive-into-sql-logical-query-processing</guid><category><![CDATA[Evaluation-order]]></category><category><![CDATA[SQL]]></category><category><![CDATA[Query]]></category><category><![CDATA[Query Processing]]></category><dc:creator><![CDATA[Aditya Raj]]></dc:creator><pubDate>Thu, 01 Jan 2026 16:41:55 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767285294570/3290ecc4-915b-42da-b219-576bda9dcaca.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you come from an imperative programming background, such as JavaScript, Python, or C++, SQL can feel counterintuitive. You define a variable, and one line later, the compiler tells you it doesn't exist.</p>
<p>This isn't a syntax error on your part; it is a fundamental misunderstanding of how the SQL engine parses and executes commands. To move from writing "working" queries to writing performant, production-grade queries, you need to understand <strong>Logical Query Processing</strong>.</p>
<h3 id="heading-1-the-common-pitfall">1. The Common Pitfall</h3>
<p>Let’s start with the scenario that trips up almost every junior developer. You want to calculate the total value of an order and filter for high-value transactions.</p>
<p>Intuitively, you write this:</p>
<pre><code class="lang-sql"><span class="hljs-comment">/* ❌ The "Imperative" Approach */</span>
<span class="hljs-keyword">SELECT</span> 
    order_id, 
    (quantity * unit_price) <span class="hljs-keyword">AS</span> total_amount <span class="hljs-comment">-- Variable defined here</span>
<span class="hljs-keyword">FROM</span> 
    orders
<span class="hljs-keyword">WHERE</span> 
    total_amount &gt; <span class="hljs-number">1000</span>; <span class="hljs-comment">-- Variable referenced here</span>
</code></pre>
<p><strong>The Result:</strong> <code>Error: Column 'total_amount' does not exist.</code></p>
<p><strong>The Confusion:</strong> In JavaScript, if you declare <code>const total = qty * price</code>, you can use <code>total</code> immediately on the next line. Why can’t SQL do the same?</p>
<h3 id="heading-2-the-solution">2. The Solution</h3>
<p>Before explaining the <em>why</em>, here is the standard fix. You have two primary options:</p>
<p><strong>Option A: Repeat the Expression</strong></p>
<p>Since the alias is unavailable, you must pass the raw calculation to the filter.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> order_id, (quantity * unit_price) <span class="hljs-keyword">AS</span> total_amount
<span class="hljs-keyword">FROM</span> orders
<span class="hljs-keyword">WHERE</span> (quantity * unit_price) &gt; <span class="hljs-number">1000</span>;
</code></pre>
<p><strong>Option B: Common Table Expressions (CTEs)</strong></p>
<p>For complex logic, calculating variables in a preliminary step (a CTE) allows you to reference them later, mimicking the imperative flow.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">WITH</span> CalculatedOrders <span class="hljs-keyword">AS</span> (
    <span class="hljs-keyword">SELECT</span> order_id, (quantity * unit_price) <span class="hljs-keyword">AS</span> total_amount
    <span class="hljs-keyword">FROM</span> orders
)
<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> CalculatedOrders
<span class="hljs-keyword">WHERE</span> total_amount &gt; <span class="hljs-number">1000</span>;
</code></pre>
<h3 id="heading-3-deep-dive-logical-query-processing-order">3. Deep Dive: Logical Query Processing Order</h3>
<p>To understand why the first query failed, we have to look at the <strong>Order of Execution</strong>.</p>
<p>SQL is a <strong>declarative</strong> language. You describe <em>what</em> result you want, and the database engine decides <em>how</em> to get it. However, the engine processes the clauses of your query in a strict, pre-defined sequence known as Logical Query Processing.</p>
<p>While you write the query in this order:</p>
<p><strong>SELECT → FROM → WHERE → GROUP BY → ORDER BY</strong></p>
<p>The database engine executes it in this order:</p>
<h4 id="heading-phase-1-from-and-join-the-source">Phase 1: FROM and JOIN ( The Source )</h4>
<p>The engine begins by identifying the data source. If you are using <code>JOINs</code>, it creates a virtual table representing the Cartesian product of all tables involved, then filters based on the join predicates (<code>ON</code>). At this stage, the engine only knows about the columns that physically exist in your tables.</p>
<h4 id="heading-phase-2-where-the-row-filter">Phase 2: WHERE ( The Row Filter )</h4>
<p>This is where our error occurs. The <code>WHERE</code> clause is applied to the rows returned from Phase 1. Its job is to discard rows that do not meet the criteria.</p>
<ul>
<li><strong>Crucial Detail:</strong> The <code>SELECT</code> clause has not happened yet. The engine has not computed any derived columns, renamed any fields, or assigned any aliases. Therefore, <code>total_amount</code> literally does not exist in memory yet. The engine can only filter based on the raw columns (<code>quantity</code>, <code>unit_price</code>).</li>
</ul>
<h4 id="heading-phase-3-group-by-the-bucketing">Phase 3: GROUP BY ( The Bucketing )</h4>
<p>If specified, the remaining rows are now grouped into "buckets" based on common values.</p>
<h4 id="heading-phase-4-having-the-group-filter">Phase 4: HAVING ( The Group Filter )</h4>
<p>This acts like a <code>WHERE</code> clause, but for groups. It filters out entire buckets (e.g., "only keep groups with more than 5 items").</p>
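<p>For example, assuming an <code>orders</code> table with <code>customer_id</code> and <code>status</code> columns (illustrative names):</p>
<pre><code class="lang-sql">SELECT customer_id, COUNT(*) AS order_count
FROM orders
WHERE status = 'shipped'   -- Phase 2: row filter, raw columns only
GROUP BY customer_id       -- Phase 3: bucket rows per customer
HAVING COUNT(*) &gt; 5;       -- Phase 4: keep only qualifying buckets
</code></pre>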
<h4 id="heading-phase-5-select-the-projection">Phase 5: SELECT ( The Projection )</h4>
<p><strong>This is the turning point.</strong> Only <em>after</em> the data has been sourced, filtered, grouped, and re-filtered does the engine finally compute the expressions in your <code>SELECT</code> list.</p>
<ul>
<li><p>This is where <code>(quantity * unit_price)</code> is calculated.</p>
</li>
<li><p>This is where the alias <code>total_amount</code> is assigned.</p>
</li>
<li><p>This explains why the alias was invisible to the <code>WHERE</code> clause—it hadn't been created yet.</p>
</li>
</ul>
<h4 id="heading-phase-6-order-by-the-presentation">Phase 6: ORDER BY ( The Presentation )</h4>
<p>The result set is sorted. Since this occurs after Phase 5, you can actually use aliases here.</p>
<p><code>ORDER BY total_amount DESC</code> is perfectly valid because <code>total_amount</code> was created in the previous step.</p>
<h3 id="heading-4-why-this-design-matters">4. Why This Design Matters</h3>
<p>You might wonder why SQL was designed this way. Why not calculate <code>SELECT</code> earlier?</p>
<p>It comes down to efficiency.</p>
<p>If the engine calculated <code>(quantity * unit_price)</code> for every single row in the database (Phase 1) before filtering them (Phase 2), it would waste massive amounts of computational power on rows that are about to be discarded anyway.</p>
<p>By forcing <code>WHERE</code> to run before <code>SELECT</code>, the database ensures it only performs expensive calculations on the rows that actually qualify for the final result.</p>
<h2 id="heading-think-of-sql-as-two-phases">Think of SQL as Two Phases</h2>
<h3 id="heading-phase-1-decide-what-data-exists">🔹 Phase 1: Decide <em>WHAT DATA EXISTS</em></h3>
<p>This phase determines the <strong>shape of the data</strong>.</p>
<p>Order:</p>
<pre><code class="lang-sql">FROM →WHERE →GROUPBY →HAVING
</code></pre>
<p>Questions answered here:</p>
<ul>
<li><p><strong>Which tables?</strong></p>
</li>
<li><p><strong>Which rows?</strong></p>
</li>
<li><p><strong>Which groups?</strong></p>
</li>
<li><p><strong>Which groups are valid?</strong></p>
</li>
</ul>
<hr />
<h3 id="heading-phase-2-decide-what-to-show">🔹 Phase 2: Decide <em>WHAT TO SHOW</em></h3>
<p>This phase formats the output.</p>
<p>Order:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> →ORDERBY →<span class="hljs-keyword">LIMIT</span>
</code></pre>
<p>Questions answered here:</p>
<ul>
<li><p><strong>Which columns?</strong></p>
</li>
<li><p><strong>Which calculations?</strong></p>
</li>
<li><p><strong>In what order?</strong></p>
</li>
<li><p><strong>How many rows?</strong></p>
</li>
</ul>
<h2 id="heading-correct-mental-model-one-liner">Correct Mental Model (One-Liner)</h2>
<blockquote>
<p>SQL groups data first, then calculates aggregates, then sorts the final result.</p>
</blockquote>
<h2 id="heading-interview-grade-evaluation-order-full-version">Interview-Grade Evaluation Order (Full Version)</h2>
<pre><code class="lang-sql">FROM
→WHERE
→GROUPBY
→HAVING
→<span class="hljs-keyword">SELECT</span>
→ORDERBY
→<span class="hljs-keyword">LIMIT</span>
</code></pre>
<h3 id="heading-summary">Summary</h3>
<p>When writing SQL, you must mentally shift from an "<strong>Input → Process → Output</strong>" model to a "<strong>Filter → Group → Project</strong>" model.</p>
<ol>
<li><p><strong>FROM</strong>: Load the tables.</p>
</li>
<li><p><strong>WHERE</strong>: Remove rows using raw data only.</p>
</li>
<li><p><strong>SELECT</strong>: Compute values and name them.</p>
</li>
<li><p><strong>ORDER BY</strong>: Sort the final output (aliases allowed).</p>
</li>
</ol>
<p>Understanding this pipeline prevents you from fighting the database and allows you to write queries that are not just syntactically correct but also logically sound.</p>
]]></content:encoded></item></channel></rss>