I Built LogLens CLI, a Log Analyzer That Uses No Vector DB. Here's How It Works

Software developer with a strong foundation in React, Node.js, PostgreSQL, and AI-driven applications. Experienced in remote sensing, satellite image analysis, and vector databases. Passionate about defense tech, space applications, and problem-solving. Currently building AI-powered solutions and preparing for a future in special forces.

A few days back I was exploring why RAG is expensive and whether I could avoid that cost for structured data. I kept hitting the same wall: traditional RAG systems chunk text, embed it, and retrieve "similar" chunks. That works fine for documents, but for structured data like databases or logs it fundamentally breaks.

Then I came across jq and the idea of a JSON agent. What if, instead of retrieving similar chunks, the LLM just wrote a query against the actual data? That would mean no embeddings and no vector database.

I built a basic prototype and the results surprised me: token usage was low, responses were precise, and it could answer analytical questions that traditional RAG would completely fumble. So I started testing with different JSON files.

Then I thought: what about system logs?

Most modern stacks can export logs as JSON. But when I started testing, I hit a normalization problem. Real-world logs are messy: some lines are structured JSON, others are plaintext like INFO:app.auth:JWT validation failed, and some are Python tracebacks spanning 15 lines. None of this maps cleanly to a JSON array you can query with jq.

That problem, and the solution I built around it, became LogLens CLI.


The Core Problem With Log Analysis Today

Logs are the most underused source of truth in any engineering org. They contain exact timestamps, correlation IDs, error traces, HTTP status codes, and response latencies. But getting insight out of them typically means:

  • Manual grep patterns that miss context

  • An ELK or Splunk stack that takes weeks to set up

  • LLM-based RAG that chunks logs like documents — which completely breaks for analytical questions

That last one is worth dwelling on. Here's why vector-based RAG fails for logs:

| Query | Traditional RAG | LogLens |
|---|---|---|
| "How many 500 errors?" | Retrieves chunks mentioning errors; the LLM guesses a count | jq counts exactly: `map(select(.response_status >= 500))`, then `length` |
| "Which endpoint is slowest?" | No way to aggregate across all records | jq groups and averages across every record |
| "What failed at 14:32?" | Semantic similarity on timestamps doesn't work | jq filters by ISO 8601 string range |
| Hallucination risk | LLM fills gaps with plausible fiction | Evidence panel shows the verbatim source log lines |

The insight is simple: logs are structured data, not documents. The right tool for structured data is a query engine, not a similarity search.


The Architecture: Vectorless RAG

Instead of embedding-based retrieval, LogLens uses the LLM as a programmer. Given a schema and a natural-language question, the LLM writes a jq program that executes deterministically on the actual log data. The result is exact: there is no similarity approximation, and every number comes from the data rather than from the model's memory.
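To make the "LLM as programmer" idea concrete, here is a minimal sketch of the execution half in Python. It assumes the `jq` binary is on `PATH` and that parsed records are held as a list of dicts. `run_jq`, the sample records, and the three programs (plausible answers an LLM might write for the questions in the comparison table) are illustrative, not LogLens's own code.

```python
import json
import shutil
import subprocess
from typing import Any


def run_jq(program: str, records: list[dict[str, Any]]) -> list[Any]:
    """Run a jq program over a JSON array of records; return every value it emits.

    Hypothetical helper: assumes the `jq` CLI is installed and on PATH.
    Raises RuntimeError (carrying jq's stderr) if the program fails to compile or run.
    """
    if shutil.which("jq") is None:
        raise RuntimeError("jq binary not found on PATH")
    proc = subprocess.run(
        ["jq", "-c", program],      # -c: one compact JSON value per output line
        input=json.dumps(records),  # the whole record array becomes jq's input (.)
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip())
    return [json.loads(line) for line in proc.stdout.splitlines() if line.strip()]


# Programs an LLM might write for the three questions in the comparison table.
COUNT_5XX = "map(select(.response_status >= 500)) | length"
SLOWEST_ENDPOINT = (
    "group_by(.request)"
    " | map({request: .[0].request, avg_ms: (map(.response_time_ms) | add / length)})"
    " | max_by(.avg_ms)"
)
FAILED_AT_1432 = (
    'map(select(.timestamp >= "2026-04-30T14:32:00Z"'
    ' and .timestamp < "2026-04-30T14:33:00Z" and .response_status >= 500))'
)

SAMPLE = [
    {"timestamp": "2026-04-30T14:31:58Z", "request": "/v1/study", "response_status": 200, "response_time_ms": 120},
    {"timestamp": "2026-04-30T14:32:01Z", "request": "/v1/study/publish", "response_status": 500, "response_time_ms": 2300},
    {"timestamp": "2026-04-30T14:32:40Z", "request": "/v1/study/publish", "response_status": 502, "response_time_ms": 1900},
    {"timestamp": "2026-04-30T14:35:10Z", "request": "/v1/study", "response_status": 404, "response_time_ms": 268},
]

if __name__ == "__main__":
    for name, prog in [("5xx count", COUNT_5XX), ("slowest", SLOWEST_ENDPOINT), ("14:32", FAILED_AT_1432)]:
        print(name, run_jq(prog, SAMPLE))
```

Returning every emitted value as a list keeps the helper honest about programs that produce zero or several results instead of assuming exactly one.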

Here's the full pipeline:

Raw Log File
  → [Log Parser]          — normalize messy logs into structured JSON
  → [Schema Discovery]    — stream the JSON, build a field map (cached)
  → [ID Map Builder]      — find entity relationships (cached)
  → [Skill Detection]     — pick the right domain context (Nginx? Python? systemd?)
  → [Two-Pass jq Gen]     — LLM writes a jq query, executes it, refines it
  → [Synthesis]           — LLM turns raw jq output into a human answer + evidence

Let me walk through each layer.


Layer 1: The Log Parser

This was the trickiest part to get right. Real-world logs look like this:

{"request":"/v1/study","method":"GET","response_status":404,"response_time_ms":268}
INFO:app.services.auth:JWT token validation failed for user u_123
  Traceback (most recent call last):
    File "/app/services/auth.py", line 47, in validate
      raise AuthError("token expired")
{"timestamp":"2026-04-30T06:45:38Z","level":"ERROR","message":"publish failed"}

Three formats, mixed together, in the same file.

The parser handles this with a priority-ordered detection strategy:

Line received
  ├── Starts with `{` ?  ──► JSON mode — buffer until valid JSON decodes
  ├── Matches LEVEL:LOGGER:MSG ? ──► Plaintext mode — handle multiline tracebacks
  └── Otherwise ──► Unknown fallback — try to extract timestamp, assign level=UNKNOWN
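Here's a minimal sketch of that detection order as a Python generator. It assumes a fixed set of level names, treats indented lines as traceback continuations of the preceding plaintext record, and adds a hypothetical `MAX_JSON_LINES` cap so a stray `{` cannot swallow the rest of the file. The regexes and output field names are assumptions, not the LogLens parser itself.

```python
import json
import re
from typing import Iterable, Iterator

# LEVEL:LOGGER:MSG, e.g. "INFO:app.services.auth:JWT token validation failed"
PLAIN_RE = re.compile(r"^(DEBUG|INFO|WARNING|ERROR|CRITICAL):([\w.]+):(.*)$")
# Loose ISO 8601-style timestamp for the fallback path (assumption)
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?")
MAX_JSON_LINES = 500  # hypothetical cap on how long a JSON buffer may grow


def parse_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per logical record, holding at most one record in memory."""
    json_buf: list[str] = []       # lines of a JSON object still being assembled
    current: dict | None = None    # open plaintext record that may gain traceback lines

    for raw in lines:
        line = raw.rstrip("\n")
        stripped = line.strip()

        # 1. JSON mode: buffer from a line starting with "{" until the text decodes
        if json_buf or stripped.startswith("{"):
            if current is not None:  # flush the pending plaintext record first
                yield current
                current = None
            json_buf.append(line)
            try:
                obj = json.loads("\n".join(json_buf))
            except json.JSONDecodeError:
                if len(json_buf) >= MAX_JSON_LINES:  # give up: emit as unknown text
                    yield {"level": "UNKNOWN", "message": "\n".join(json_buf)}
                    json_buf = []
                continue  # otherwise keep buffering
            json_buf = []
            yield obj
            continue

        if not stripped:
            continue  # blank line: nothing to record

        # 2. Plaintext mode: LEVEL:LOGGER:MSG opens a new record
        match = PLAIN_RE.match(stripped)
        if match:
            if current is not None:
                yield current
            level, logger, message = match.groups()
            current = {"level": level, "logger": logger, "message": message}
            continue

        # Indented lines (traceback frames) belong to the open plaintext record
        if current is not None and line[:1] in (" ", "\t"):
            current["message"] += "\n" + line
            continue

        # 3. Fallback: keep the line, extract a timestamp if there is one
        if current is not None:
            yield current
            current = None
        ts = TS_RE.search(stripped)
        yield {"timestamp": ts.group(0) if ts else None, "level": "UNKNOWN", "message": stripped}

    # End of input: flush whatever is still open
    if current is not None:
        yield current
    if json_buf:
        yield {"level": "UNKNOWN", "message": "\n".join(json_buf)}
```

Because it is a generator, it can sit directly on top of an open file handle (`parse_lines(open(path))`) without reading the file up front.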

After parsing, every record gets normalized so downstream code always sees the same field names regardless of the original format:

| Canonical field | Accepted source names |
|---|---|
| timestamp | written_at, time, ts, @timestamp |
| level | severity, log_level, priority |
| message | msg, log, text |
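A sketch of that normalization step, assuming aliases are simply renamed and that the first alias found (in table order) wins when several are present. `normalize` and `ALIASES` are hypothetical names.

```python
from typing import Any

# Canonical field -> accepted source names, taken from the table above.
ALIASES: dict[str, tuple[str, ...]] = {
    "timestamp": ("written_at", "time", "ts", "@timestamp"),
    "level": ("severity", "log_level", "priority"),
    "message": ("msg", "log", "text"),
}


def normalize(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `record` with aliased fields renamed to their canonical names."""
    out = dict(record)
    for canonical, sources in ALIASES.items():
        if canonical in out:
            continue  # already canonical; leave any alias fields alone
        for source in sources:
            if source in out:
                out[canonical] = out.pop(source)  # first alias in table order wins
                break
    return out
```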

The parser is a generator: it yields records one at a time and never loads the whole file into memory, so parsing memory stays flat no matter how large the log file is.


Layer 2: Schema Discovery + ID Map (Cached)

Once the logs are parsed, LogLens makes a single streaming pass to build two things that get cached and reused for every query:

Schema — a structural map of every field: its type, and what percentage of records have it.

{
  "response_status": { "types": ["integer"], "occurrence_rate": 1.0 },
  "response_time_ms": { "types": ["integer"], "occurrence_rate": 0.94 },
  "correlation_id": { "types": ["string"], "occurrence_rate": 1.0 }
}

This is what the LLM gets when asked to write a jq query. It tells the LLM exactly which fields exist and their types — without exposing any actual data values.
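A minimal single-pass schema builder in that spirit might look like this. It assumes only top-level fields are mapped (nested objects are reported as `object` rather than recursed into) and that rates are rounded to two decimals; `discover_schema` is a hypothetical name.

```python
import json
from collections import Counter
from typing import Any, Iterable


def json_type(value: Any) -> str:
    """Map a decoded JSON value to a schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def discover_schema(records: Iterable[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """One streaming pass: field -> observed types and share of records containing it."""
    total = 0
    seen: Counter = Counter()
    types: dict[str, set[str]] = {}
    for rec in records:
        total += 1
        for field, value in rec.items():
            seen[field] += 1
            types.setdefault(field, set()).add(json_type(value))
    return {
        field: {"types": sorted(types[field]), "occurrence_rate": round(seen[field] / total, 2)}
        for field in seen
    }


if __name__ == "__main__":
    demo = [{"response_status": 200, "response_time_ms": 12}, {"response_status": 500}]
    print(json.dumps(discover_schema(demo), indent=2))
```

It only counts and tags types, so memory grows with the number of distinct fields rather than the number of records, which lets it consume a streaming parser directly.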

ID Map: a cross-reference dictionary. LogLens scans every record for fields whose names look like IDs (matching `.*(id|uuid|guid|key|token|hash).*`) and builds a lookup: { "uuid-value": [record_indices...] }. This enables queries like "show me all events with correlation_id X" without re-scanning the file.
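A sketch of the ID map under the same streaming assumption. `build_id_map` and the substring pattern are illustrative; a plain substring match will also catch names like `valid` or `hidden`, so a real implementation would likely want a tighter rule.

```python
import re
from collections import defaultdict
from typing import Any, Iterable

# Field names that "look like IDs" (hypothetical rule: substring match, case-insensitive)
ID_FIELD_RE = re.compile(r"(id|uuid|guid|key|token|hash)", re.IGNORECASE)


def build_id_map(records: Iterable[dict[str, Any]]) -> dict[str, list[int]]:
    """Map each ID value to the indices of the records that mention it."""
    id_map: dict[str, list[int]] = defaultdict(list)
    for idx, rec in enumerate(records):
        for field, value in rec.items():
            if isinstance(value, bool) or not isinstance(value, (str, int)):
                continue  # only scalar strings/ints make useful lookup keys
            if ID_FIELD_RE.search(field):
                id_map[str(value)].append(idx)
    return dict(id_map)
```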

These two artefacts are computed once on loglens ingest and cached. A developer asking 20 questions about the same log file pays this cost exactly once.
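One way to get that compute-once behaviour across separate `loglens ingest` and `loglens chat` invocations is an on-disk cache keyed by the log file's path, size, and modification time. This is a hypothetical sketch (`cached_artifacts`, the key format, and the JSON cache file are all assumptions), not necessarily how LogLens stores its cache.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable


def cached_artifacts(
    log_path: Path,
    cache_dir: Path,
    build: Callable[[Path], dict[str, Any]],
) -> dict[str, Any]:
    """Return cached schema/ID-map artefacts for log_path, building them only once."""
    st = log_path.stat()
    # Any change to the file's size or mtime produces a new key, forcing a rebuild
    key_src = f"{log_path.resolve()}:{st.st_size}:{st.st_mtime_ns}"
    key = hashlib.sha256(key_src.encode()).hexdigest()[:16]
    cache_file = cache_dir / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    artifacts = build(log_path)  # the expensive streaming pass
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(artifacts))
    return artifacts
```

Storing the artefacts as JSON means dictionary keys come back as strings, which suits an ID map keyed by string values.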


Layer 3: The Skills System

Before any LLM call, LogLens determines the domain context — the vocabulary and rules specific to this type of log. An Nginx access log needs different guidance than a Python app log or a systemd journal.

Skills are pluggable .toml files. Each skill has three sections:

[meta]
name = "nginx_access"
description = "Nginx access logs"

[detection]
signals = ["response_status", "request", "method", "response_time_ms"]

[prompts]
domain_context = """
DOMAIN: Nginx Access Logs
- Flag endpoints with response_status >= 500 as critical
- P95 latency > 2000ms is a performance issue
- Group errors by .request to find the most affected endpoint
"""

jq_hints = """
- HTTP status: .response_status (integer)
- Endpoint path: .request (string)
- Latency: .response_time_ms (integer, milliseconds)
- Filter 5xx: select(.response_status >= 500)
"""

Skill detection is pure string matching — no LLM call needed. Each skill has signals (field names). LogLens scores every skill by counting how many of its signals appear in the schema, and picks the winner.

Skill blending handles hybrid logs. If a log has both level/logger fields (app_logs skill) and response_status/request fields (nginx_access skill), both skills score above the blend threshold and their prompts are merged. You'll see this in the CLI footer as skill: nginx_access+app_logs.
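Detection and blending together could be sketched like this. The score is the raw count of a skill's signals found among the schema's field names, as described above; the blend rule (every skill with at least a hypothetical `MIN_SIGNAL_HITS` matches joins, otherwise the single best skill is used) and the `generic` label are assumptions.

```python
from typing import Any, Collection

MIN_SIGNAL_HITS = 2  # hypothetical blend threshold: matched signals needed to join the blend


def select_skills(
    schema_fields: Collection[str],
    skills: list[dict[str, Any]],
    min_hits: int = MIN_SIGNAL_HITS,
) -> tuple[str, str, str]:
    """Score skills by signal hits; return (label, merged domain_context, merged jq_hints)."""
    scored = [
        (sum(sig in schema_fields for sig in skill["detection"]["signals"]), skill)
        for skill in skills
    ]
    # Highest score first; list.sort is stable (even with reverse=True), so ties keep input order
    scored.sort(key=lambda pair: pair[0], reverse=True)

    chosen = [skill for hits, skill in scored if hits >= min_hits]
    if not chosen and scored and scored[0][0] > 0:
        chosen = [scored[0][1]]  # nothing clears the threshold: use the single best match

    label = "+".join(skill["meta"]["name"] for skill in chosen) or "generic"
    context = "\n".join(skill["prompts"]["domain_context"] for skill in chosen)
    hints = "\n".join(skill["prompts"]["jq_hints"] for skill in chosen)
    return label, context, hints
```

No LLM call is involved: it is set membership and counting, so it runs in microseconds even with many skills installed.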

The skills directory is open for contribution — anyone can write a kubernetes.toml or django.toml and drop it in without touching any Python code.


Layer 4: Two-Pass jq Retrieval

This is the heart of the system. When you ask a question, the agent runs two LLM calls before synthesis:

Pass 1 — Exploration

The LLM is given the schema and asked to write a fuzzy jq query matching any keyword from the question. The goal isn't precision — it's to get a few real records back so the LLM can see the actual field names and values.

Why? Because users ask about "the publish API" without knowing whether the field is .request, .endpoint, .url, or .path. Pass 1 resolves this by returning real records with real field names.
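The Pass 1 request could be assembled along these lines. The stopword list, keyword extraction, and prompt wording are assumptions for illustration; the jq program in the closing comment is the kind of deliberately loose query Pass 1 is after, not output captured from LogLens.

```python
import json
import re
from typing import Any

# Small, illustrative stopword list (assumption)
STOPWORDS = {"the", "why", "what", "which", "how", "many", "show", "and", "for",
             "with", "are", "was", "did", "does", "any", "all"}


def extract_keywords(question: str) -> list[str]:
    """Lower-cased words of 3+ chars that are not stopwords, in order, deduplicated."""
    keywords: list[str] = []
    for word in re.findall(r"[a-z0-9_/.-]+", question.lower()):
        word = word.strip(".-")  # drop trailing punctuation such as "api."
        if len(word) > 2 and word not in STOPWORDS and word not in keywords:
            keywords.append(word)
    return keywords


def pass1_prompt(question: str, schema: dict[str, Any], sample_size: int = 5) -> str:
    """Ask for a loose jq query whose only job is to surface real field names."""
    keywords = extract_keywords(question)
    return (
        "You write jq programs. The input is a JSON array of log records.\n"
        f"Schema (field -> types, occurrence_rate):\n{json.dumps(schema, indent=2)}\n\n"
        f"Question: {question}\n"
        f"Write ONE jq program returning at most {sample_size} records that loosely "
        f"match ANY of these keywords, in any field: {', '.join(keywords)}.\n"
        "Precision does not matter yet; the goal is to see real field names and values.\n"
        "Reply with the jq program only."
    )

# A program of the kind Pass 1 is after (what an LLM might return for "publish API"):
#   [.[] | select(tostring | test("publish|api"; "i"))] | .[:5]
```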

Pass 2 — Precise Extraction

Using the Pass 1 sample as ground truth, the LLM writes a precise jq query. It now knows the exact field names and can write something like:

[.[] | select(.request == "/v1/study/publish")] | 
sort_by(.timestamp)

If either pass fails (syntax error, empty result), LogLens feeds the error back to the LLM and retries up to 3 times:

JQ failed: "compile error: syntax error... at <stdin>:1"
→ LLM: "That JQ failed: <error>. Write a corrected version."
→ Retry
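The generate, execute, repair loop can be sketched with the LLM and the jq runner passed in as plain callables, which keeps it testable without network calls or a `jq` binary. `generate_and_run`, `JqError`, and the wording of the repair prompt are hypothetical; "up to 3 times" is read here as one initial attempt plus three retries, and an empty result is treated as a failure, as described above.

```python
from typing import Any, Callable


class JqError(Exception):
    """Raised by the jq runner when a program fails to compile or run."""


def _is_empty(outputs: list[Any]) -> bool:
    # No output at all, or only null / [] / {} values, counts as "nothing found";
    # a legitimate count of 0 is NOT empty.
    return all(o is None or o == [] or o == {} for o in outputs)


def generate_and_run(
    prompt: str,
    llm: Callable[[str], str],
    run_jq: Callable[[str], list[Any]],
    max_retries: int = 3,
) -> tuple[str, list[Any]]:
    """Ask the LLM for a jq program, run it, and feed failures back for repair."""
    request = prompt
    last_error = "no attempt made"
    for _ in range(max_retries + 1):  # one initial attempt plus the retries
        program = llm(request).strip()
        try:
            outputs = run_jq(program)
        except JqError as exc:
            last_error = str(exc)
        else:
            if not _is_empty(outputs):
                return program, outputs
            last_error = "the program ran but returned no results"
        # Repair prompt: original task + the error + the program that caused it
        request = (
            f"{prompt}\n\nThat jq failed: {last_error}\n"
            f"Failed program:\n{program}\n"
            "Write a corrected version."
        )
    raise JqError(f"giving up after {max_retries} retries: {last_error}")
```

In practice `run_jq` would wrap a `jq` subprocess and raise `JqError` on a non-zero exit; injecting it keeps the retry policy separate from process handling.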

One important safeguard: the jq generation prompt includes an explicit allowlist of ~50 real jq builtins. Without it, the LLM invents functions; I observed hallucinations like parse_time(), hours_ago(), and strtotime(), none of which exist in jq. The allowlist blocks these before they reach the subprocess.
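A pre-execution check against the allowlist might look like this. It is a rough, regex-based sketch, not a jq parser: it strips string literals, then flags any name written as a call (`name(`) that is not an allowlisted builtin, a jq keyword, or a function the program defines with `def`. Zero-argument builtins used without parentheses (like `length`) are not checked, and the allowlist shown is an illustrative subset of real jq builtins, not LogLens's list.

```python
import re

# Illustrative subset of real jq builtins (the article's allowlist has ~50 entries).
ALLOWED_BUILTINS = frozenset({
    "select", "map", "map_values", "length", "keys", "has", "add", "any", "all",
    "group_by", "sort_by", "unique_by", "max_by", "min_by", "sort", "unique",
    "first", "last", "limit", "range", "reverse", "empty", "not", "type",
    "test", "match", "capture", "sub", "gsub", "split", "join", "contains",
    "startswith", "endswith", "ltrimstr", "rtrimstr", "ascii_downcase", "ascii_upcase",
    "tostring", "tonumber", "tojson", "fromjson", "to_entries", "from_entries",
    "with_entries", "fromdateiso8601", "todateiso8601", "strptime", "mktime", "now",
    "floor", "min", "max", "del", "index", "indices",
})
# jq keywords that may be followed by "(" without being function calls
KEYWORDS = frozenset({"if", "then", "elif", "else", "end", "and", "or", "as",
                      "def", "reduce", "foreach", "try", "catch", "label"})

STRING_RE = re.compile(r'"(?:[^"\\]|\\.)*"')          # jq string literals
DEF_RE = re.compile(r"\bdef\s+([A-Za-z_]\w*)")         # user-defined functions
CALL_RE = re.compile(r"(?<![.$\w])([A-Za-z_]\w*)\s*\(")  # name( not preceded by . $ or a word char


def unknown_functions(program: str) -> list[str]:
    """Names called with "(" that are not allowlisted, keywords, or defined via def."""
    code = STRING_RE.sub('""', program)  # ignore text inside string literals
    defined = set(DEF_RE.findall(code))
    return sorted({
        name for name in CALL_RE.findall(code)
        if name not in ALLOWED_BUILTINS and name not in KEYWORDS and name not in defined
    })
```

A non-empty result can be fed back to the LLM as a "that jq failed" message, so the invented function never reaches the subprocess.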


Layer 5: Synthesis With Evidence Anchors

After retrieval, the raw jq output goes to a synthesis prompt. The LLM is instructed to respond in a strict format:

ANSWER:
<Direct answer in 1-2 sentences. Lead with the key finding.>

DETAILS:
<Bullet points with flags: critical / healthy / recommendation / data point>

EVIDENCE:
<Verbatim log lines copied from the retrieved data — not paraphrased>

Three mechanisms prevent hallucination:

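  1. Explicit empty-data handling: if jq returns nothing, the data section reads "NO DATA FOUND" instead of being left blank, so the LLM is told outright that there are no results rather than being handed a gap it could fill with invented ones.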

  2. History summary, not history injection — instead of injecting the full Q&A history (which would bias the LLM toward old numbers), only the last 3 user questions are included as topic context. No previous answers, no previous numbers.

  3. Structured output rendering — the CLI parses ANSWER, DETAILS, and EVIDENCE sections separately. Evidence gets its own red-bordered panel so engineers can immediately verify the cited lines.
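The first two mechanisms both live in how the synthesis prompt is assembled. A sketch under stated assumptions: the prompt wording and `build_synthesis_prompt` are illustrative, and "nothing" is taken to mean no jq output or only `null`, `[]`, or `{}` values, so a legitimate count of `0` still reaches the model as data.

```python
import json
from typing import Any


def build_synthesis_prompt(question: str, jq_outputs: list[Any], past_questions: list[str]) -> str:
    """Assemble the synthesis prompt: explicit empty-data marker, questions-only history."""
    if all(o is None or o == [] or o == {} for o in jq_outputs):
        data = "NO DATA FOUND"  # never hand the model an empty section to fill in
    else:
        data = json.dumps(jq_outputs, indent=2)
    # Topic context only: the last 3 user questions, no previous answers or numbers
    history = "\n".join(f"- {q}" for q in past_questions[-3:]) or "- (none)"
    return (
        "Answer using ONLY the DATA section. If it says NO DATA FOUND, say so plainly.\n"
        "Reply with ANSWER:, DETAILS: and EVIDENCE: sections; copy evidence lines verbatim.\n\n"
        f"Earlier questions (topic context only):\n{history}\n\n"
        f"Question: {question}\n\n"
        f"DATA:\n{data}\n"
    )
```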

Here's what it looks like in practice:

╭─────────────────────── Copilot ────────────────────────╮
│ app.services.auth leads with 847 ERROR events (61%).   │
│                                                         │
│ ⚠️  JWT failures spiked at 14:32 UTC — 340 in 4 mins  │
│ ✅  app.services.billing had zero ERROR events          │
│ 💡  Rotate the signing key, check token expiry config  │
╰─────────────────────────────────────────────────────────╯
╭─────────────────────── Evidence ───────────────────────╮
│ [2026-04-30T14:32:01Z] ERROR app.services.auth:        │
│   JWT token validation failed, user=u_4821             │
│ [2026-04-30T14:32:04Z] ERROR app.services.auth:        │
│   JWT token validation failed, user=u_2093             │
╰─────────────────────────────────────────────────────────╯
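Panels like these depend on the reply being split by header before rendering. A minimal sketch of that split, assuming each header starts its own line and that a reply with no headers at all is shown whole as the answer; `parse_sections` is a hypothetical name, and the bordered-panel rendering itself is left out.

```python
import re

# A header is one of the three section names at the start of a line
SECTION_RE = re.compile(r"^(ANSWER|DETAILS|EVIDENCE):", re.MULTILINE)


def parse_sections(text: str) -> dict[str, str]:
    """Split a synthesis reply into its ANSWER / DETAILS / EVIDENCE sections."""
    sections = {"ANSWER": "", "DETAILS": "", "EVIDENCE": ""}
    headers = list(SECTION_RE.finditer(text))
    if not headers:
        sections["ANSWER"] = text.strip()  # model ignored the format: show everything
        return sections
    for i, header in enumerate(headers):
        # Each section runs until the next header (or the end of the reply)
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        sections[header.group(1)] = text[header.end():end].strip()
    return sections
```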

The Auto-Briefing: Zero LLM Calls

One feature worth calling out: the briefing panel that appears when you run loglens chat makes zero LLM calls. It runs 5 pure jq programs directly against the cached records:

  1. Total record count

  2. 4xx and 5xx counts

  3. Top 3 failing endpoints (grouped by .request, sorted by error count)

  4. Most recent 5xx failure

  5. Top 3 slowest endpoints (grouped and averaged)

This gives you an immediate system health snapshot before asking a single question — with no API latency and no cost.
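To show what those five aggregations compute, here is the same logic in plain Python; LogLens itself runs them as jq programs against the cached records. `briefing`, the `top_n` parameter, and counting both 4xx and 5xx responses as "failing" for item 3 are assumptions.

```python
from collections import Counter, defaultdict
from typing import Any

Record = dict[str, Any]


def _status(rec: Record) -> int:
    """HTTP status as an int; records without one count as neither 4xx nor 5xx."""
    value = rec.get("response_status")
    return value if isinstance(value, int) else 0


def briefing(records: list[Record], top_n: int = 3) -> dict[str, Any]:
    """Compute the five briefing numbers with no LLM involved."""
    failing = [r for r in records if _status(r) >= 400]  # 4xx + 5xx (assumption)
    server_errors = [r for r in failing if _status(r) >= 500]

    # Mean latency per endpoint, skipping records without a numeric latency
    latencies: dict[str, list[float]] = defaultdict(list)
    for r in records:
        ms = r.get("response_time_ms")
        if isinstance(ms, (int, float)):
            latencies[r.get("request", "?")].append(ms)
    averages = [(endpoint, sum(v) / len(v)) for endpoint, v in latencies.items()]

    return {
        "total": len(records),                                   # item 1
        "count_4xx": len(failing) - len(server_errors),          # item 2
        "count_5xx": len(server_errors),                         # item 2
        "top_failing": Counter(r.get("request", "?") for r in failing).most_common(top_n),  # item 3
        "latest_5xx": max(server_errors, key=lambda r: r.get("timestamp", ""), default=None),  # item 4
        "top_slowest": sorted(averages, key=lambda p: p[1], reverse=True)[:top_n],  # item 5
    }
```

"Most recent" relies on ISO 8601 timestamps in a single time zone comparing correctly as strings, the same property the jq time-range filters use.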


Performance

| Operation | LLM calls | jq calls |
|---|---|---|
| ingest | 0 | 0 |
| chat startup (briefing) | 0 | 5 |
| query (happy path) | 3 | 2 |
| query (with 1 retry) | 4 | 3 |
| query (worst case) | 6 | 5 |

For 3,500 records, ingest takes about 1 second. The schema and ID map are never recomputed after that. Each query then costs only its LLM calls: typically 3 API calls to a Haiku- or Flash-tier model, which comes to fractions of a cent per question.


What I Learned

The LLM as programmer pattern is underused. Most people use LLMs to retrieve or summarize. Using them to write code that executes against real data is a different mode entirely — more precise, more verifiable, and often cheaper because you're not embedding large amounts of text.

Schema is more valuable than data in the prompt. Giving the LLM the field names, types, and occurrence rates is enough for it to write accurate queries. You don't need to embed actual records (which would bloat the prompt and expose sensitive data).

Two passes beat one. The exploration pass that resolves ambiguous field names before the precision pass is the single biggest reliability improvement in the whole pipeline. Without it, the LLM guesses field names and fails constantly.

Structured output with evidence is non-negotiable for trust. Showing the raw log lines that produced the answer is what makes the tool feel reliable rather than magical. Engineers don't want to trust a black box — they want to see the receipts.


Try It

LogLens CLI is open source and free to use. You bring your own API key (supports Anthropic, OpenAI, Groq, Gemini).

curl -fsSL https://raw.githubusercontent.com/ShivprasadRoul/LogLens-CLI/main/install.sh | bash
loglens ingest /var/log/nginx/access.log --name myapp
loglens chat myapp

GitHub: github.com/ShivprasadRoul/LogLens-CLI

Feedback & feature requests: loglens.featurebase.app

If you work with logs and want to try it on your own data, I'd love to hear what breaks. The Skills system is designed to be extended — if you write a skill for a log format you work with, open a PR.