Modern Tree-sitter, part 6: reinventing symbols-view
We've been telling a series of stories about all the different ways that Tree-sitter can improve the editing experience in Pulsar. Today's story about symbols-view starts a bit slowly, but it's got a great ending: the addition of a major new feature to Pulsar 1.113.
Background
Back in March, @mauricioszabo gave me an assignment:
Currently, "definitions" are implemented using CTags in symbols-view. What do you think about transforming this into a "service", like pulsar.definitions or editor.definitions? That way, the tokenizer can "push" definitions into this service, and symbols-view can query for the definitions on the current file.
I'll explain what he's talking about.
You might already use the symbols-view package to navigate to important parts of your source code files. For instance, if you want to jump to the definition of render in a given file, you can
- press Ctrl+R (or Cmd+R on macOS),
- start typing render, and
- press Return to accept the first result in the list (or use arrow keys or the mouse to navigate to a different result).
Choosing the render symbol in your symbols list will move the editor to the line where render is defined.
This is a time-saving feature. But how does it work? How does Pulsar know which items to put in the list? How does it know where your render method is defined? You might be surprised: it uses an ancient program called ctags (specifically, a fork called Exuberant Ctags).
(What Maurício calls "definitions" is what symbols-view calls "symbols," and what ctags calls, well, "tags." For simplicity I'll use the term "symbol" just to align with what Pulsar calls it.)
ctags works well enough that you might never notice its drawbacks, but it has plenty of them. It reads files from disk, so it can return inaccurate results if you use it on a file that has been modified since its last save. For the same reason, it doesn't work at all on new files that haven't yet been saved. And it needs special configuration for each language it supports. That means that even after you've written a Pulsar grammar for your newly invented Language X, you won't get any symbol-based navigation unless you modify the symbols-view package itself and tell ctags how to find your language's symbols.
I know you're probably tired of hearing me say "Tree-sitter would be great for this task!", but code navigation systems really are in its wheelhouse. The trees we're already using to highlight code and do other useful tasks can be queried to supply symbols much more easily than via ctags. And many parsers even come bundled with a query file that does the work of identifying the symbols we're interested in.
You might have noticed how GitHub can nowadays give you an outline-like view of a source code file, listing lines where methods are defined. That's all happening through Tree-sitter. If GitHub can use it for symbol navigation, so can we.
Refactoring symbols-view
But to make that happen, we need to change how symbols-view works. All it knows about is ctags! Could we rip all that out and replace it with a Tree-sitter solution? Yes, but in the process we'd be abandoning support for any languages that don't yet have Tree-sitter parsers.
A better approach would be to know about both strategies and pick the best one on the fly. So let's figure out exactly what Maurício's request, "transforming [symbols] into a 'service'," means.
A crash course in services
In Pulsar, services are how packages talk to one another. Suppose I've authored package-b and it depends on another package called package-a that someone else has written. I could reach into atom.packages and grab the reference to package-a, but this feels weird for a number of reasons. For one, it incorrectly assumes that package-a has already been activated. It may get activated after package-b, or it may never get activated because the user has disabled or uninstalled it.
But even if package-b were able to find and consume package-a this way, it'd create a tight coupling between the two. That coupling would break if package-a renamed itself, or if it changed implementation details that package-b was relying on.
So instead of talking to each other directly, the two packages can agree on a service called foo and communicate through it. One package defines itself as a provider of service foo, and the other defines itself as a consumer of service foo.
During startup, Pulsar will activate each package, notice the match, and arrange an introduction as soon as both packages have been activated. The provider will end up returning an object that the consumer can use however it likes; this object is typically some sort of interface with methods that the consumer can call.
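The matchmaking described above can be modeled in a few lines of TypeScript. This is a toy, not Pulsar's code: ToyServiceHub, its methods, and the service name foo are invented for illustration. Real packages declare what they provide and consume in package.json (under providedServices and consumedServices) and let Pulsar make the introductions.

```typescript
// A toy model of how a host could introduce providers to consumers.
// ToyServiceHub is hypothetical; it is not Pulsar's implementation.
type ProvideFn = () => unknown;
type ConsumeFn = (service: unknown) => void;

class ToyServiceHub {
  private providers = new Map<string, ProvideFn>();
  private consumers = new Map<string, ConsumeFn[]>();

  // Called when a package that provides `service` finishes activating.
  activateProvider(service: string, provide: ProvideFn): void {
    this.providers.set(service, provide);
    // Introduce the new provider to any consumers that were already waiting.
    for (const consume of this.consumers.get(service) ?? []) {
      consume(provide());
    }
  }

  // Called when a package that consumes `service` finishes activating.
  activateConsumer(service: string, consume: ConsumeFn): void {
    const waiting = this.consumers.get(service) ?? [];
    waiting.push(consume);
    this.consumers.set(service, waiting);
    // If the provider is already active, introduce them right away.
    const provide = this.providers.get(service);
    if (provide) consume(provide());
  }
}

interface FooService {
  greet(name: string): string;
}

const hub = new ToyServiceHub();
const log: string[] = [];

// package-b (the consumer) happens to activate first...
hub.activateConsumer("foo", (service) => {
  log.push((service as FooService).greet("package-b"));
});

// ...and package-a (the provider) activates later; the hub still introduces them.
hub.activateProvider("foo", (): FooService => ({
  greet: (name) => `hello, ${name}`,
}));

console.log(log); // log now holds one entry: "hello, package-b"
```

Neither package refers to the other by name. Each one knows only the service name and the shape of the object exchanged through it.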
Services thus act as contracts between packages. And they can be versioned, too. If it wants, package-a can provide several different versions of the service at once; this leaves the author free to make changes without breaking packages that consume the older version.
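Here's a sketch of a provider that exposes two versions of foo side by side. The versions map mirrors the shape of a package.json providedServices entry, where each version number points to a method on the package's main module. The method names are hypothetical. Matching on the major version alone is a simplification: a real consumer declares a semver range.

```typescript
// Hypothetical provider module offering two versions of service "foo".
interface FooV1 { greet(name: string): string }
interface FooV2 { greet(name: string, punctuation: string): string }

const packageA = {
  provideFooV1(): FooV1 {
    return { greet: (name) => `hello, ${name}` };
  },
  provideFooV2(): FooV2 {
    return { greet: (name, punctuation) => `hello, ${name}${punctuation}` };
  },
};

// Mirrors the shape of package.json's providedServices.foo.versions.
const providedVersions: Record<string, keyof typeof packageA> = {
  "1.0.0": "provideFooV1",
  "2.0.0": "provideFooV2",
};

// Simplification: match on major version only, not full semver ranges.
function provideForMajor(major: number): unknown {
  for (const [version, method] of Object.entries(providedVersions)) {
    if (Number(version.split(".")[0]) === major) {
      return packageA[method]();
    }
  }
  return undefined;
}

// An older consumer keeps working after version 2 ships.
const oldFoo = provideForMajor(1) as FooV1;
const newFoo = provideForMajor(2) as FooV2;
console.log(oldFoo.greet("package-b"));      // prints "hello, package-b"
console.log(newFoo.greet("package-b", "!")); // prints "hello, package-b!"
```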
A built-in example
This flexibility makes new things possible. Consider a package like autocomplete-plus, the bundled package that provides an autocompletion menu in Pulsar. It doesn't try to implement the various tactics that can be used to suggest completion candidates; all it does is make the user interface for an autocomplete menu. It then defines an autocomplete.provider service so that other packages can provide completion suggestions. Packages like autocomplete-html, autocomplete-css, and others know how to suggest context-specific completions at the cursor, so they feed that data to autocomplete-plus.
We like this approach because it gives users an incredible amount of control. For example, if you don't like the HTML autocompletion suggestions, you can change autocomplete-html's configuration, or even disable it entirely. Or you could write your own alternative to autocomplete-html. Or you could even write your own alternative to autocomplete-plus! By registering as a consumer of autocomplete.provider, your replacement package would be able to communicate with packages like autocomplete-html just as easily as autocomplete-plus can.
This is the model we need for symbols-view. We now have a second approach for generating symbols that can compete favorably with the ctags strategy. So let's reinvent symbols-view in the style of autocomplete-plus and make it a consumer of a new service we'll invent named symbol.provider.
The built-in ctags provider can be spun off into a package called symbol-provider-ctags, and our new Tree-sitter-based approach can live in a package called symbol-provider-tree-sitter. These packages can provide the symbol.provider service for symbols-view to consume.
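To make the new contract concrete, here's a simplified sketch of what a symbol.provider could look like. The field and method names (isExclusive, canProvideSymbols, getSymbols) and the request shape are assumptions for illustration; the symbols-view documentation describes the real interface. The toy provider uses a regex. That is neither the ctags strategy nor the Tree-sitter one; it's just the smallest thing that satisfies the interface.

```typescript
// A simplified, hypothetical shape for a symbol.provider; the real
// service's field and method names may differ.
type SymbolKind = "function" | "class" | "constant" | "other";

interface SymbolEntry {
  name: string;
  row: number;       // zero-based line where the symbol is defined
  kind?: SymbolKind; // richer providers can say what the symbol is
}

interface SymbolRequest {
  scope: "file" | "project";
  text: string;      // current buffer contents, saved or not
}

interface SymbolProvider {
  name: string;
  isExclusive: boolean;
  canProvideSymbols(request: SymbolRequest): boolean;
  getSymbols(request: SymbolRequest): SymbolEntry[];
}

// A toy provider that finds `function name` declarations with a regex.
const regexProvider: SymbolProvider = {
  name: "toy-regex-provider",
  isExclusive: true,
  canProvideSymbols: (request) => request.scope === "file",
  getSymbols(request) {
    const symbols: SymbolEntry[] = [];
    request.text.split("\n").forEach((line, row) => {
      const match = /\bfunction\s+([A-Za-z_$][\w$]*)/.exec(line);
      if (match) symbols.push({ name: match[1], row, kind: "function" });
    });
    return symbols;
  },
};

const request: SymbolRequest = {
  scope: "file",
  text: "const a = 1;\nfunction render() {}\n",
};
if (regexProvider.canProvideSymbols(request)) {
  console.log(regexProvider.getSymbols(request)); // one entry: render, row 1
}
```

The important part is the separation of concerns. symbols-view only has to know how to ask a provider for symbols. It doesn't need to know how the provider finds them.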
How will it work?
I've talked about why Pulsar chose not to leverage the built-in highlights.scm query files that exist for most Tree-sitter parsers: we needed richer information than they could provide. Luckily, that's not true for other kinds of query files! Many parsers also provide tags.scm query files, and they're easy for us to consume as-is.
When a user presses Ctrl+R / Cmd+R, we can run a query against the current buffer. Any node that is captured as @name in a tags.scm file can be represented as a symbol. Often the node will be contained in a larger capture called (for example) @definition.function; we can detect that and infer that the text captured by @name refers to a function.
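Here's a sketch of that inference step. To stay self-contained, it doesn't call a real Tree-sitter binding. Instead, it works on a hypothetical, pre-flattened list of captures (capture name, text, and character offsets), which is roughly the information a tags.scm query produces. The Capture shape is an assumption.

```typescript
// A pattern in the style of many parsers' tags.scm files:
//
//   (function_declaration
//     name: (identifier) @name) @definition.function
//
// Each match yields two captures: the whole declaration
// (@definition.function) and the identifier inside it (@name).

// Hypothetical, pre-flattened capture data (no real Tree-sitter binding).
interface Capture {
  name: string;       // capture name without the leading "@"
  text: string;       // source text covered by the captured node
  startIndex: number; // offsets into the buffer text
  endIndex: number;
}

interface TaggedSymbol {
  name: string;
  kind?: string;      // inferred from the enclosing @definition.* capture
}

function symbolsFromCaptures(captures: Capture[]): TaggedSymbol[] {
  const definitions = captures.filter((c) => c.name.startsWith("definition."));
  return captures
    .filter((c) => c.name === "name")
    .map((nameCapture) => {
      // Pick the smallest @definition.* capture that contains this @name,
      // so a method inside a class is labeled "method", not "class".
      let enclosing: Capture | undefined;
      for (const d of definitions) {
        const contains =
          d.startIndex <= nameCapture.startIndex && nameCapture.endIndex <= d.endIndex;
        const smaller =
          !enclosing || d.endIndex - d.startIndex < enclosing.endIndex - enclosing.startIndex;
        if (contains && smaller) enclosing = d;
      }
      return {
        name: nameCapture.text,
        kind: enclosing?.name.slice("definition.".length),
      };
    });
}

// Captures for the source text "class Foo { render() {} }".
const symbols = symbolsFromCaptures([
  { name: "definition.class", text: "class Foo { render() {} }", startIndex: 0, endIndex: 25 },
  { name: "name", text: "Foo", startIndex: 6, endIndex: 9 },
  { name: "definition.method", text: "render() {}", startIndex: 12, endIndex: 23 },
  { name: "name", text: "render", startIndex: 12, endIndex: 18 },
]);
console.log(symbols); // Foo is a "class", render is a "method"
```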
The information we get is not only richer than what ctags can provide, but also more accurate, since we're querying against the current buffer text. Even if the file hasn't been saved recently. Even if it hasn't been saved at all!
Now, we can only do this when the file in question is using a Tree-sitter grammar, so it's not a universal solution. But we can prefer a Tree-sitter symbol provider where it's available, and fall back to our ctags provider where it isn't.
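That fallback can be expressed as an ordered preference list: ask each provider whether it can handle the current editor, and take the first one that says yes. The package names come from this post. The EditorContext fields and the canHandle predicates are hypothetical simplifications of how such a decision might be made.

```typescript
// Hypothetical description of the editor a symbol request comes from.
interface EditorContext {
  grammarUsesTreeSitter: boolean;
  fileIsSaved: boolean;
}

interface CandidateProvider {
  packageName: string;
  canHandle(context: EditorContext): boolean;
}

// Ordered from most to least preferred.
const candidates: CandidateProvider[] = [
  {
    // Queries the live syntax tree, so unsaved changes are fine.
    packageName: "symbol-provider-tree-sitter",
    canHandle: (context) => context.grammarUsesTreeSitter,
  },
  {
    // Reads the file from disk, so it needs a saved file.
    packageName: "symbol-provider-ctags",
    canHandle: (context) => context.fileIsSaved,
  },
];

function pickProvider(context: EditorContext): CandidateProvider | undefined {
  return candidates.find((p) => p.canHandle(context));
}

console.log(pickProvider({ grammarUsesTreeSitter: true, fileIsSaved: false })?.packageName);
// prints "symbol-provider-tree-sitter"
```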
Project symbols
Another thing that symbols-view has long supported (in theory, at least) is project-based symbol navigation, allowing you to search for and jump to symbols in other files.
It's been able to do this because ctags can read project-wide symbol metadata, a genuine upside it has over some other approaches. But this feature only works if the user has generated a special file called a "tags file" for their project. Pulsar itself can't generate this file on its own because it doesn't know which files it should crawl to find symbols (imagine if it tried to crawl your entire node_modules folder!), so the ctags strategy requires the user to regenerate that file on a regular basis.
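For reference, a ctags tags file is plain text with one symbol per line. Each line has the symbol name, the file it lives in, and an address (a line number or a search pattern), separated by tabs. These can be followed by ;" and extension fields such as a kind. Lines starting with !_TAG_ hold metadata. Here's a minimal parser for that format. It covers only the common cases, and the CtagsSymbol shape is our own, so treat it as a sketch.

```typescript
// Our own shape for one parsed tags-file entry.
interface CtagsSymbol {
  name: string;
  file: string;
  address: string; // a line number or a /^...$/ search pattern
  kind?: string;   // e.g. "f" or "function", depending on ctags options
}

function parseTagLine(line: string): CtagsSymbol | undefined {
  // Header lines such as "!_TAG_FILE_FORMAT" carry metadata, not symbols.
  if (line.startsWith("!_TAG_") || line.trim() === "") return undefined;

  const firstTab = line.indexOf("\t");
  const secondTab = line.indexOf("\t", firstTab + 1);
  if (firstTab < 0 || secondTab < 0) return undefined;

  const name = line.slice(0, firstTab);
  const file = line.slice(firstTab + 1, secondTab);
  let address = line.slice(secondTab + 1);
  let kind: string | undefined;

  // Extension fields follow a `;"` marker at the end of the address.
  const marker = address.lastIndexOf(';"\t');
  if (marker >= 0) {
    const fields = address.slice(marker + 3).split("\t");
    address = address.slice(0, marker);
    for (const field of fields) {
      if (field.startsWith("kind:")) kind = field.slice("kind:".length);
      // A bare field without a colon is the short form of the kind.
      else if (!field.includes(":") && kind === undefined) kind = field;
    }
  } else if (address.endsWith(';"')) {
    address = address.slice(0, -2);
  }

  return { name, file, address, kind };
}

console.log(parseTagLine('render\tsrc/view.js\t/^  render() {$/;"\tm\tclass:View'));
```

Because these lines describe files on disk rather than open buffers, they go stale as soon as you edit, which is why the tags file has to be regenerated.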
For now, our Tree-sitter symbol provider can only suggest symbols in the current file. If you activate Toggle Project Symbols via Ctrl+Shift+R / Cmd+Shift+R, it won't even volunteer for the job. Using Tree-sitter to list the symbols in an open buffer is very fast precisely because the buffer is open; we've already paid the startup cost of the initial parse. But there's no way Tree-sitter could parse all of a project's files in a similar amount of time. If we want project-wide symbol search, we'll have to look elsewhere.
Go to declaration
"Who cares," you may think. And I'll admit I don't attempt a project-wide symbol search very often. But there's a related feature I'm pretty sure you'll like.
symbols-view defines a Go To Declaration command. It searches the project for a symbol matching the word under the cursor. If there's one result, it's opened automatically; if there's more than one, it offers the options in a list for you to choose from. And when you're done, there's a corresponding Return From Declaration command that takes you back to the place you just were.
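Under the hood, that behavior amounts to a symbol lookup plus a stack of the places you jumped from. Here's a hypothetical sketch. DeclarationNavigator and its types are invented for illustration, and a real implementation would also have to open files, move cursors, and show the picker.

```typescript
// Hypothetical types; the real command also opens files and shows a picker.
interface Position { file: string; row: number }
interface ProjectSymbol { name: string; position: Position }

type GoToResult =
  | { kind: "none" }
  | { kind: "opened"; position: Position }
  | { kind: "choose"; candidates: ProjectSymbol[] };

class DeclarationNavigator {
  private returnStack: Position[] = [];
  private symbols: ProjectSymbol[];

  constructor(symbols: ProjectSymbol[]) {
    this.symbols = symbols;
  }

  goToDeclaration(word: string, current: Position): GoToResult {
    const matches = this.symbols.filter((s) => s.name === word);
    if (matches.length === 0) return { kind: "none" };
    if (matches.length > 1) return { kind: "choose", candidates: matches };
    // Exactly one match: remember where we were, then jump.
    this.returnStack.push(current);
    return { kind: "opened", position: matches[0].position };
  }

  // Called after the user picks one of several candidates.
  confirmChoice(choice: ProjectSymbol, current: Position): Position {
    this.returnStack.push(current);
    return choice.position;
  }

  // "Return From Declaration": go back to the most recent jump origin.
  returnFromDeclaration(): Position | undefined {
    return this.returnStack.pop();
  }
}

const nav = new DeclarationNavigator([
  { name: "Point", position: { file: "types.ts", row: 3 } },
  { name: "render", position: { file: "a.ts", row: 1 } },
  { name: "render", position: { file: "b.ts", row: 7 } },
]);
const here: Position = { file: "main.ts", row: 10 };
console.log(nav.goToDeclaration("Point", here)); // opened: types.ts, row 3
console.log(nav.returnFromDeclaration());        // back to main.ts, row 10
```

A stack, rather than a single saved position, lets you dive several declarations deep and then surface one level at a time.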
Dive into a definition with Ctrl+Alt+Down / Cmd+Alt+Down, then return to the surface with Ctrl+Alt+Up / Cmd+Alt+Up:
Here I've demonstrated it on a TypeScript type, but it'll work on functions, classes, and other kinds of things, too.
Did you know this feature existed? I didn't. It's been available to you this whole time if you've had a tags file to supply project-wide symbols, which nobody does. But with a refactored symbols-view, another candidate for supplying these symbols enters the arena: a language server.
Language servers
I hesitate to mention language servers merely in passing, because they're a deep enough topic to require their own multi-part blog post series. But let me give it a shot.
There are a handful of Pulsar packages with names like ide-x, where x is the name of a language. Several of them were even originally developed by the Atom team. For now I'll call them IDE backend packages.
What these packages have in common is that they all run something called a language server under the hood. A language server is designed to be a brain for a few dozen common features you'd want from your code editor: autocompletion, code linting, refactor support, and the like. A single language server typically knows how to do these tasks for one specific language or framework.
Language servers are exciting because they make it easier for weirdos like us to use editors other than the market leader. Instead of having to write all those features from scratch for, say, TypeScript, an upstart code editor could instead communicate with typescript-language-server and write some glue code to wire up the language server's features to the features of the editor.
The good news is that the language server specification includes several requests that are relevant to symbols-view: textDocument/documentSymbol for same-file symbols, workspace/symbol for project-wide symbols, and even textDocument/definition for finding where a symbol is defined. Some IDE backend packages already have "brains" capable of doing these tasks!
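Here's what that conversation looks like in practice. LSP messages are JSON-RPC 2.0 objects, each preceded by a Content-Length header that gives the body's size in bytes. The sketch below frames a textDocument/documentSymbol request. It then flattens a response of DocumentSymbol objects into list rows, translating the numeric SymbolKind values defined by the LSP specification. The helper names and the example URI are hypothetical, and only a handful of the kinds are mapped.

```typescript
// A few SymbolKind values from the LSP specification (there are 26 in all).
const SYMBOL_KINDS: Record<number, string> = {
  5: "class",
  6: "method",
  11: "interface",
  12: "function",
  13: "variable",
  14: "constant",
};

interface LspPosition { line: number; character: number }
interface LspRange { start: LspPosition; end: LspPosition }

interface DocumentSymbol {
  name: string;
  kind: number;
  range: LspRange;
  selectionRange: LspRange;
  children?: DocumentSymbol[];
}

// Number of bytes `text` occupies when encoded as UTF-8.
function utf8Length(text: string): number {
  let bytes = 0;
  for (const char of text) {
    const codePoint = char.codePointAt(0) ?? 0;
    bytes += codePoint < 0x80 ? 1 : codePoint < 0x800 ? 2 : codePoint < 0x10000 ? 3 : 4;
  }
  return bytes;
}

// Frame a JSON-RPC request the way LSP expects it on the wire.
function frameDocumentSymbolRequest(id: number, uri: string): string {
  const body = JSON.stringify({
    jsonrpc: "2.0",
    id,
    method: "textDocument/documentSymbol",
    params: { textDocument: { uri } },
  });
  return `Content-Length: ${utf8Length(body)}\r\n\r\n${body}`;
}

// Flatten a DocumentSymbol tree into rows a symbol list could display.
function toListItems(symbols: DocumentSymbol[]): { name: string; kind: string; row: number }[] {
  return symbols.flatMap((s) => [
    { name: s.name, kind: SYMBOL_KINDS[s.kind] ?? "other", row: s.selectionRange.start.line },
    ...toListItems(s.children ?? []),
  ]);
}

console.log(frameDocumentSymbolRequest(1, "file:///project/app.ts"));
```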
But here's the bad news: since the symbol.provider service has only just been invented, those IDE backend packages need updates before they can be used for symbol navigation.
I've started to do a bit of that work. Inspired by ide-typescript (but mainly starting fresh), I've been working on a package currently called pulsar-ide-typescript-alpha that aims to be its drop-in replacement. It should be able to do everything that ide-typescript can already do, but it will also be able to offer project-wide symbol search and go-to-declaration functionality.
And it might take a few version bumps on dependencies, but most other IDE backend packages can also be updated to take advantage of these features.
Anyway, back to symbols-view
Unlike autocomplete-plus, which aggregates suggestions from multiple providers and shows all of them to the user, symbols-view is mainly interested in choosing the best provider for the job. There's little point in aggregating across a language server and Tree-sitter and ctags, since they're largely going to be offering the same list of symbols with varying degrees of richness, and you'd be pretty annoyed if Pulsar offered you three different list entries for the same function. Inside symbols-view, these are called "exclusive" providers because only one of them will be picked for the job.
But I wanted to leave the door open for some creative and unexpected usages, so symbols-view also has a concept of "supplemental" providers. A provider that marks itself as supplemental is saying it'd like to contribute symbols that would probably not already be in an exclusive provider's list. You may be wondering what kinds of symbols would fit the bill, so let me give you an example…
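Here's one way the exclusive/supplemental policy could be expressed. It's a hypothetical sketch rather than the real symbols-view logic. The numeric score stands in for however a provider signals how good a fit it is, and the provider objects are made up.

```typescript
interface ListSymbol { name: string; row: number }

// Hypothetical provider shape for this sketch.
interface RankedProvider {
  name: string;
  isExclusive: boolean;
  score: number; // how good a fit this provider claims to be (0 = can't help)
  getSymbols(): ListSymbol[];
}

function collectSymbols(providers: RankedProvider[]): ListSymbol[] {
  const usable = providers.filter((p) => p.score > 0);

  // Exclusive providers compete: only the highest-scoring one is used.
  let best: RankedProvider | undefined;
  for (const p of usable) {
    if (p.isExclusive && (!best || p.score > best.score)) best = p;
  }

  // Supplemental providers all contribute alongside the winner.
  const supplemental = usable.filter((p) => !p.isExclusive);
  const chosen = best ? [best, ...supplemental] : supplemental;

  return chosen.flatMap((p) => p.getSymbols()).sort((a, b) => a.row - b.row);
}

const listed = collectSymbols([
  { name: "symbol-provider-tree-sitter", isExclusive: true, score: 0.9, getSymbols: () => [{ name: "render", row: 4 }] },
  { name: "symbol-provider-ctags", isExclusive: true, score: 0.5, getSymbols: () => [{ name: "render", row: 4 }] },
  { name: "bookmarks", isExclusive: false, score: 1, getSymbols: () => [{ name: "// TODO: fix this", row: 1 }] },
]);
console.log(listed); // the bookmark on row 1, then a single "render" on row 4
```

Only one "render" entry appears, even though two exclusive providers know about it. The supplemental entry is added on top.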
Did you know you can bookmark lines in a buffer? Try it out: right-click on any line of your editor and select Toggle Bookmark. The built-in bookmarks package keeps track of them and will also let you navigate between them via F2 and Shift+F2.
Anyway, to illustrate the idea of a supplemental provider, I wrote one: symbol-provider-bookmarks will turn each of your bookmarks into a symbol, then display them in the symbols-view UI alongside your main provider's symbols, using the text of the bookmarked line as the symbol name.
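The core of such a provider is small. This sketch assumes it has already been told which rows are bookmarked; how symbol-provider-bookmarks actually gets that information isn't shown here. It turns each row into a symbol named after the line's text. The function name and the placeholder for blank lines are our own choices.

```typescript
// Hypothetical symbol shape tagged so the UI can tell bookmarks apart.
interface BookmarkSymbol { name: string; row: number; tag: string }

function bookmarksToSymbols(bufferText: string, bookmarkedRows: number[]): BookmarkSymbol[] {
  const lines = bufferText.split("\n");
  return bookmarkedRows
    .filter((row) => row >= 0 && row < lines.length) // ignore stale rows
    .sort((a, b) => a - b)
    .map((row) => ({
      // Use the bookmarked line's text, trimmed, as the symbol name.
      name: lines[row].trim() || `(blank line ${row + 1})`,
      row,
      tag: "bookmark",
    }));
}

console.log(bookmarksToSymbols("a\n  function render() {\n\nend", [3, 1]));
```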
This one's not bundled with Pulsar, so grab it from the package registry if it sounds interesting.
Shipping now
I've had most of this article written for months, but I decided to wait to publish it until we could show this stuff off. That time is now.
Pulsar 1.113 makes two major changes that will vastly improve the quality of the symbol searching you might already be accustomed to:
- The new version of symbols-view is now in place. It will offer you ctags-based symbols in grammars that don't use Tree-sitter, but it will prefer Tree-sitter-supplied symbols in most grammars. If you truly don't like change, you can disable the symbol-provider-tree-sitter package and just rely on symbol-provider-ctags, or else you can configure symbols-view to prefer some providers over others. But I'm betting you'll want to keep using the symbols provided by Tree-sitter, because…
- As you may have heard, modern Tree-sitter grammars are now the default! The system that we shipped in experimental fashion back in Pulsar 1.106 is now ready for prime time. For now, you can opt back into legacy Tree-sitter with the new core.useLegacyTreeSitter setting, but not for long, because the legacy system will be dropped when we're finally able to migrate to a newer version of Electron.
Because common languages like JavaScript, Python, Ruby, and many others have full-featured modern Tree-sitter grammars, they will also be using our new Tree-sitter symbol provider for Ctrl+R / Cmd+R. That means the symbol results should be better across the board: more accurate and more comprehensive. (If it seems worse, please file a bug.)
How does this actually improve the symbols-view experience? Let's see what our original example looks like with a Tree-sitter symbol provider:
The richness of the metadata we get from these sources has allowed us to enhance the symbols-view UI, too! You'll be shown the "kind" of thing that a symbol is: class, function, constant, et cetera. In many cases, these kinds will be illustrated with icons. Visit the package settings page for symbols-view to explore the possibilities.
And there are even a few killer new features. Open a symbols list on a JSON file and marvel at the entries you see:
The entire key path is now the name of the symbol! The same sorts of query and predicate tricks we've seen in previous installments in this series can be used for awesome features like this. The symbol-provider-tree-sitter README has more details.
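symbol-provider-tree-sitter gets these names from queries and predicates, and its README explains how. The sketch below only illustrates the idea of a key path as a symbol name. It walks the output of JSON.parse instead, which is a different technique from the package's, and it treats arrays as leaves for brevity.

```typescript
// Produce a dotted key path for every key in a parsed JSON value.
// Not how symbol-provider-tree-sitter works; an illustration only.
function keyPaths(value: unknown, prefix = ""): string[] {
  // Only objects have keys; arrays and primitives are treated as leaves here.
  if (value === null || typeof value !== "object" || Array.isArray(value)) return [];
  const paths: string[] = [];
  for (const [key, child] of Object.entries(value)) {
    const path = prefix ? `${prefix}.${key}` : key;
    paths.push(path, ...keyPaths(child, path));
  }
  return paths;
}

const manifest = '{"name":"pulsar","engines":{"atom":">=1.0.0"},"files":["lib"]}';
console.log(keyPaths(JSON.parse(manifest)));
// prints [ 'name', 'engines', 'engines.atom', 'files' ]
```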
And it's early days for pulsar-ide-typescript-alpha, but I've been using it for a few months as a symbol provider (and a go-to-declaration provider!) on TypeScript and JavaScript projects. Feel free to give it a shot yourself. (And if you're interested in bringing one of the other ide-x packages into the year 2024, please do broach the topic on GitHub Discussions, Discord, or one of our other communities.)
Conclusion
After overhauling Pulsar's syntax highlighting, indentation, code folding, and language injections, we've found yet another way that Tree-sitter can improve our existing editor experience. But in this case, there's an even better improvement just around the corner: IDE backend packages and language servers. I'll be sure to go into more detail on the Pulsar IDE experience in future posts.
Integrating Tree-sitter has been a difficult project. I started working on it in earnest in February of 2023; it shipped in June behind an experimental flag; and as of January 2024, it's finally the default grammar type.
Our Tree-sitter series is nearing an end, but there's one more thing to cover: the challenges. Could it have been easier? Can Tree-sitter overcome its pain points and drawbacks? We'll talk about it next time.