From Weekend Hack to 173 Languages
Rob Kunkle
x/twitter: @lux
linkedin: linkedin.com/in/robkunkle
ASIMOV DevLabs — Circuit Launch, Oakland
March 2026
| Current Approach | With Structured Graphs |
|---|---|
grep -r "def " *.py |
?func a python:function_definition |
| String matching, regex, glob | Typed nodes with relationships |
| No understanding of structure | AST + LSP + dataflow |
| Context window = search results | Precise graph queries |
| Hopes the right file is nearby | Traverses cross-file dependencies |
Markdown, grep, glob, regex—these are approximations. Graphs are the actual structure.
1 week. Claude + Python.
We built a parser that turned Python repos into RDF graphs, queryable with SPARQL.
SELECT ?func ?name ?startRow
WHERE {
?func a python:function_definition ;
python:text ?name ;
python:startRow ?startRow .
}
ORDER BY ?startRow
An agent could ask structured questions about code and get precise answers. No hallucination. No guessing file locations.
Agents are extremely adept at using SPARQL for generalized queries.
Find all functions, trace imports, map class hierarchies—all as graph queries.
Query for functions that exist but are never referenced anywhere. Instant cleanup list.
"Compare our repo to our competitor's." Same graph schema, same queries, side by side.
Agents can generate accurate docs because they have the full structural picture, not just string matches.
Everything grounded in actual code facts.
I even asked my friend Dan to try it ...
"Sorry, our codebase is Ruby."
And just like that, the real project began.
I knew this was a powerful idea. Store codebases as RDF, give agents SPARQL, and they get a clear view of the entire repo.
But how could I make this readily available to everyone?
Not just Python. Not just my repos. A universal code intelligence layer that any agent could use against any codebase, regardless of the language.
The answer, it turned out, was no: nobody is using SPARQL for code understanding.
Instead I found:
Cursor, Copilot, Claude Code—that space is covered.
IDE plugins that parse on every keystroke? That's an LSP's job.
How would I even benefit from this?
Research tools that no one actually ever uses.
Look at a repo, at a given commit, and give an agent the full picture.
173 languages
Tree-sitter gives us fast, incremental AST parsing for basically every language that matters. One parser to rule them all.
Python has function_definition, Ruby has method, Go has function_declaration—same concept, different names.
The ontology isn't documentation—it's the source of truth that drives everything:
Change the ontology, and the parser, materializer, and query tool all adapt. Zero code changes.
python:function_definition
a owl:Class ;
rdfs:subClassOf ts-core:Node ;
rdfs:subClassOf repolex:function_definition .
python:name
a owl:DatatypeProperty ;
python:isTerminalField true .
repolex:function_definition
a owl:Class ;
rdfs:subClassOf ts-core:Node .
# Ruby, Go, Java all map here too:
ruby:method
rdfs:subClassOf repolex:function_definition .
go:function_declaration
rdfs:subClassOf repolex:function_definition .
RDFS reasoning materializes the cross-language types at parse time. Query once, get results across all languages.
Key insight: Git already content-addresses files. Same content = same blob hash, regardless of filename or location.
Stored as gzipped N-Quads. Append-only commits + content-addressed blobs = efficient incremental updates.
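Git's content addressing is easy to reproduce: a blob's ID is the SHA-1 of a small header plus the file bytes, so identical content hashes identically wherever it lives:

```python
import hashlib

def git_blob_hash(data: bytes) -> str:
    # Git hashes "blob <size>\0" + content, so the ID depends
    # only on the bytes, never on filename or path.
    header = f"blob {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()

# Matches `git hash-object --stdin`:
print(git_blob_hash(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

A file that hasn't changed between commits keeps its blob hash, so its already-parsed graph can be reused as-is.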
# We see:
import jmespath
# Tree-sitter gives us:
# import_statement
# dotted_name: "jmespath"
#
# But where does jmespath
# come from? No idea.
# Now we know:
?import a repolex:lsp.import_statement ;
repolex:scm.resolution_node ?node .
?node ts-core:sourceFile
"/site-packages/jmespath/__init__.py" ;
ts-core:startRow 1 .
SCM queries (.scm files) capture resolution nodes from tree-sitter. Multilspy resolves them to actual definitions.
Currently supporting 11 languages via Microsoft's multilspy. Enables call graphs, cross-file navigation.
Structural syntax from tree-sitter. Every node, field, relationship in the source code.
?func a python:function_definition
Semantic resolution via multilspy. Where things are defined, what calls what.
?node a repolex:lsp.call_site
How data moves through the code. Variable assignments, return values, parameter passing.
?assign repolex:flowsTo ?usage
Execution paths, branches, loops. Which code runs under what conditions.
?block repolex:branchesTo ?target
Each layer is a separate named graph. Compose them for richer queries.
The dependency question: your code imports libraries. Those libraries import other libraries. How do you query across all of them?
We use deps.dev to resolve dependencies to their actual GitHub repos, and address all code with the org/repo structure. Load multiple graphs into one store. Named graphs keep them organized. SPARQL queries span them all.
SELECT ?func ?name
WHERE {
?func a repolex:function_definition ;
ts-core:text ?name .
FILTER NOT EXISTS {
?call repolex:lsp.call_site ?func
}
}
SELECT ?file ?module
WHERE {
?node a repolex:lsp.import_statement ;
repolex:scm.resolution_node ?res .
?res ts-core:sourceFile ?file .
?node ts-core:text ?module .
}
SELECT ?class ?method (COUNT(?call) AS ?n)
WHERE {
?class a repolex:class_definition .
# assumes a containment edge from class to method;
# the exact property name may differ in the shipped ontology
?method a repolex:function_definition ;
ts-core:parent ?class .
?call repolex:lsp.call_site ?method .
} GROUP BY ?class ?method
SELECT ?yourFile ?libFunc ?libGraph
WHERE {
GRAPH ?yourGraph {
?call repolex:lsp.call_site ?libFunc ;
ts-core:sourceFile ?yourFile . }
GRAPH ?libGraph {
?libFunc a repolex:function_definition . }
}
No servers. No cloud compute bills. Just GitHub.
Repolex fits into ASIMOV as a perception layer—structured knowledge that feeds into ASIMOV's intelligence architecture.
Verifiable, local-first knowledge. No cloud dependency. The graphs carry provenance—you can trace every triple back to a specific commit, file, and line.
Done
Tree-sitter parsing for 173 languages • Ontology-driven development • Content-addressed blob storage • Git history graphs • RDFS cross-language reasoning • lexq query tool with JSON-LD compaction • SCM query capture • LSP integration (11 languages via multilspy) • Dataflow analysis • Control flow graphs • Slimmer AST graphs
In Progress
LLM query ergonomics • Premade CONSTRUCT queries • repolex-forx GitHub Actions pipeline fine-tuning • ASIMOV module integration
Next
Public-facing graph registry • Parse 1,000 open source repos • AI training set publication
We've built something that turns Git repositories into composable knowledge graphs.
But Git repos aren't just code.
Documentation repos
Config repos (IaC, K8s)
Data repos
Research paper repos
Legal document repos
Anything versioned in Git
github.com/repolex-ai
Rob Kunkle
x/twitter: @lux
linkedin: linkedin.com/in/robkunkle
rob.kunkle@gmail.com
RDF SPARQL tree-sitter knowledge graphs