Modern Tree-sitter, part 6: reinventing symbols-view

savetheclocktower, January 22, 2024
  • dev
  • modernization
  • tree-sitter
About 12 min

We've been telling a series of stories about all the different ways that Tree-sitter can improve the editing experience in Pulsar. Today's story about symbols-view starts a bit slowly, but it's got a great ending: the addition of a major new feature to Pulsar 1.113.

Background

Back in March, @mauricioszabo gave me an assignment:

Currently, "definitions" are implemented using CTags in symbols-view. What do you think about transforming this into a "service", like pulsar.definitions or editor.definitions? That way, the tokenizer can "push" definitions into this service, and symbols-view can query for the definitions on the current file.

I'll explain what he's talking about.

You might already use the symbols-view package to navigate to important parts of your source code files. For instance, if you want to jump to the definition of render in a given file, you can

  • press Ctrl+R (or Cmd+R on macOS),
  • start typing render, and
  • press Return to accept the first result in the list (or use arrow keys or the mouse to navigate to a different result).

Choosing the render symbol in your symbols list will move the editor to the line where render is defined.

This is a time-saving feature. But how does it work? How does Pulsar know which items to put in the list? How does it know where your render method is defined? You might be surprised: it uses an ancient program called ctags, specifically a fork called Exuberant Ctags.

(What Maurício calls "definitions" is what symbols-view calls "symbols," and what ctags calls, well, "tags." For simplicity I'll use the term "symbol" just to align with what Pulsar calls it.)

ctags works well enough that you might never notice its drawbacks, but it's got plenty of drawbacks. It reads files from disk, so it can return inaccurate results if you use it on a file that has been modified since its last save. For the same reason, it doesn't work at all on new files that haven't yet been saved. And it needs special configuration for each language it supports, meaning that, even after you've written a Pulsar grammar for your newly-invented Language X, you won't get any symbol-based navigation unless you modify the symbols-view package itself and tell ctags how to find your language's symbols.

I know you're probably tired of hearing me say "Tree-sitter would be great for this task!", but code navigation systems really are in its wheelhouse. The trees we're already using to highlight code and do other useful tasks can be queried to supply symbols much more easily than via ctags. And many parsers even come bundled with a query file that does the work of identifying the symbols we're interested in.

You might have noticed how GitHub can nowadays give you an outline-like view of a source code file, listing lines where methods are defined. That's all happening through Tree-sitter. If GitHub can use it for symbol navigation, so can we.

Refactoring symbols-view

But to make that happen, we need to change how symbols-view works. All it knows about is ctags! Could we rip all that out and replace it with a Tree-sitter solution? Yes, but in the process we'd be abandoning support for any languages that don't yet have Tree-sitter parsers.

A better approach would be to know about both strategies and pick the best one on the fly. So let's figure out exactly what Maurício's request ("transforming [symbols] into a 'service'") means.

A crash course in services

In Pulsar, services are how packages talk to one another. Suppose I've authored package-b and it depends on another package called package-a that someone else has written. I could reach into atom.packages and grab the reference to package-a, but this feels weird for a number of reasons. For one, it incorrectly assumes that package-a has already been activated. It may get activated after package-b, or it may never get activated because the user has disabled or uninstalled it.

But even if package-b were able to find and consume package-a this way, it'd create a tight coupling between the two. That coupling would break if package-a renamed itself, or if it changed implementation details that package-b was relying on.

So instead of communicating directly, they can invent a service called foo and use it to communicate. One package defines itself as a provider of service foo, and the other defines itself as a consumer of service foo.

During startup, Pulsar will activate each package, notice the match, and arrange an introduction as soon as both packages have been activated. The provider will end up returning an object that the consumer can use however it likes; this object is typically some sort of interface with methods that the consumer can call.

Services thus act as contracts between packages. And they can be versioned, too. If it wants, package-a can provide several different versions of the service at once; this leaves the author free to make changes without breaking packages that consume the older version.
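To make the matchmaking concrete, here's a toy sketch of the idea. This is not Pulsar's actual implementation: ToyServiceHub, isMatch, and the "match on major version" rule are all names and simplifications I've made up for illustration. The point is only that it doesn't matter which side shows up first.

```typescript
// Toy service hub: a sketch of the matchmaking idea, not Pulsar's real code.
// All names here are hypothetical.
type Consumer = (service: unknown) => void;

interface Provision { name: string; version: string; service: unknown; }
interface Consumption { name: string; major: number; callback: Consumer; }

// Simplified rule: a consumer matches a provider when the service name and
// the major version number agree.
function isMatch(p: Provision, c: Consumption): boolean {
  return p.name === c.name && parseInt(p.version, 10) === c.major;
}

class ToyServiceHub {
  private provisions: Provision[] = [];
  private consumptions: Consumption[] = [];

  // A provider offers an object under a service name and version.
  provide(name: string, version: string, service: unknown): void {
    const provision: Provision = { name, version, service };
    this.provisions.push(provision);
    // Introduce the new provider to every consumer that is already waiting.
    for (const consumption of this.consumptions) {
      if (isMatch(provision, consumption)) consumption.callback(service);
    }
  }

  // A consumer asks for a service name and the major version it understands.
  consume(name: string, major: number, callback: Consumer): void {
    const consumption: Consumption = { name, major, callback };
    this.consumptions.push(consumption);
    // Introduce the new consumer to every provider that already exists.
    for (const provision of this.provisions) {
      if (isMatch(provision, consumption)) callback(provision.service);
    }
  }
}
```

Because a provider can call provide more than once with different version strings, an older consumer keeps working while a newer one picks up the newer contract.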

A built-in example

This flexibility makes new things possible. Consider a package like autocomplete-plus, the bundled package that provides an autocompletion menu in Pulsar. It doesn't try to implement the various tactics that can be used to suggest completion candidates; all it does is make the user interface for an autocomplete menu. It then defines an autocomplete.provider service so that other packages can provide completion suggestions. Packages like autocomplete-html, autocomplete-css, and others know how to suggest context-specific completions at the cursor, so they feed that data to autocomplete-plus.

Service Diagram 1

We like this approach because it gives users an incredible amount of control. For example, if you don't like the HTML autocompletion suggestions, you can change autocomplete-html's configuration, or even disable it entirely. Or you could write your own alternative to autocomplete-html. Or you could even write your own alternative to autocomplete-plus! By registering as a consumer of autocomplete.provider, your replacement package would be able to communicate with packages like autocomplete-html just as easily as autocomplete-plus can.

Service Diagram 2

This is the model we need for symbols-view. We now have a second approach for generating symbols that can compete favorably with the ctags strategy. So let's reinvent symbols-view in the style of autocomplete-plus and make it a consumer of a new service we'll invent named symbol.provider.

Service Diagram 3

The built-in ctags provider can be spun off into a package called symbol-provider-ctags, and our new Tree-sitter–based approach can be in a package called symbol-provider-tree-sitter. These packages can provide the symbol.provider service for symbols-view to consume.

How will it work?

I've talked about why Pulsar chose not to leverage the built-in highlights.scm query files that exist for most Tree-sitter parsers: we needed richer information than they could provide. Luckily, that's not true for other kinds of files! Many parsers also provide tags.scm query files, and they're easy for us to consume as-is.

When a user presses Ctrl+R / Cmd+R, we can run a query against the current buffer. Any node that is captured as @name in a tags.scm file can be represented as a symbol. Often the node will be contained in a larger capture called (for example) @definition.function; we can detect that and infer that the text captured by @name refers to a function.
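Here's a rough sketch of that inference step. It assumes a simplified, hypothetical capture shape (a capture name plus a character range) rather than real Tree-sitter nodes, and symbolsFromCaptures is a name I've invented; the actual provider works against the syntax tree.

```typescript
// Hypothetical, simplified capture: a capture name (without the "@") plus a
// character range in the buffer. Real Tree-sitter captures carry syntax nodes.
interface Capture { name: string; start: number; end: number; }
interface SymbolEntry { name: string; kind: string | null; position: number; }

function symbolsFromCaptures(source: string, captures: Capture[]): SymbolEntry[] {
  const definitionPrefix = "definition.";
  const symbols: SymbolEntry[] = [];
  for (const capture of captures) {
    // Only @name captures become symbols.
    if (capture.name !== "name") continue;
    let kind: string | null = null;
    let smallest = Infinity;
    // Find the tightest @definition.* capture that encloses this @name.
    for (const outer of captures) {
      const isDefinition = outer.name.indexOf(definitionPrefix) === 0;
      const encloses = outer.start <= capture.start && outer.end >= capture.end;
      const size = outer.end - outer.start;
      if (isDefinition && encloses && size < smallest) {
        smallest = size;
        kind = outer.name.slice(definitionPrefix.length);
      }
    }
    symbols.push({
      name: source.slice(capture.start, capture.end),
      kind,
      position: capture.start,
    });
  }
  return symbols;
}
```

A @name with no enclosing @definition.* capture still becomes a symbol in this sketch; it just has no kind.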

The information we get is not only richer than what ctags can provide, but also more accurate, since we're querying against the current buffer text. Even if the file hasn't been saved recently. Even if it hasn't been saved at all!

Now, we can only do this when the file in question is using a Tree-sitter grammar, so it's not a universal solution. But we can prefer a Tree-sitter symbol provider where it's available, and fall back to our ctags provider where it isn't.

Project symbols

Another thing that symbols-view has long supported, theoretically, is project-based symbol navigation, allowing you to search for and jump to symbols in other files.

Project Symbols example

It's been able to do this because ctags can read project-wide symbol metadata, a genuine upside it has over some other approaches. But this feature only works if the user has generated a special file called a "tags file" for their project. Pulsar itself can't generate this file on its own because it doesn't know which files it should crawl to find symbols (imagine if it tried to crawl your entire node_modules folder!), so the ctags strategy requires the user to regenerate that file on a regular basis.

For now, our Tree-sitter symbol provider can only suggest symbols in the current file. If you activate Toggle Project Symbols via Ctrl+Shift+R / Cmd+Shift+R, it won't even volunteer for the job. Using Tree-sitter to list the symbols in an open buffer is very fast precisely because the buffer is open; we've already paid the startup cost of the initial parse. But there's no way Tree-sitter could parse all of a project's files in a similar amount of time. If we want project-wide symbol search we'll have to look elsewhere.

Go to declaration

"Who cares," you may think. And I'll admit I don't attempt a project-wide symbol search very often. But there's a related feature I'm pretty sure you'll like.

symbols-view defines a Go To Declaration command. It'll search the project for a symbol matching the word under the cursor. If there's one result, it'll get opened automatically; if there's more than one, it offers up the options in a list for you to choose. And when you're done, there's a corresponding Return From Declaration command that takes you back to the place you just were.

Dive into a definition with Ctrl+Alt+Down / Cmd+Alt+Down, then return to the surface with Ctrl+Alt+Up / Cmd+Alt+Up:

Here I've demonstrated it on a TypeScript type, but it'll work on functions and classes and other types of things, too.
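The dive-and-return behavior boils down to a stack of cursor positions. Here's a minimal sketch of that idea; DeclarationNavigator and SourceLocation are hypothetical names, and the real commands deal with editors and panes rather than plain objects.

```typescript
// Hypothetical stand-in for "a place in the project".
interface SourceLocation { file: string; line: number; }

class DeclarationNavigator {
  private visited: SourceLocation[] = [];

  constructor(public current: SourceLocation) {}

  // Jump to a declaration, remembering where we came from. With exactly one
  // candidate we go straight there; otherwise `choose` stands in for the list
  // that the user would pick from.
  goTo(
    candidates: SourceLocation[],
    choose: (options: SourceLocation[]) => SourceLocation
  ): boolean {
    if (candidates.length === 0) return false;
    const target = candidates.length === 1 ? candidates[0] : choose(candidates);
    this.visited.push(this.current);
    this.current = target;
    return true;
  }

  // Pop back to the most recent place we jumped from, if there is one.
  returnFrom(): boolean {
    const previous = this.visited.pop();
    if (previous === undefined) return false;
    this.current = previous;
    return true;
  }
}
```

Since it's a stack, several dives in a row unwind in reverse order, one Return From Declaration at a time.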

Did you know this feature existed? I didn't. It's been available to you this whole time if you've had a tags file to supply project-wide symbols, the way nobody does. But with a refactored symbols-view, another candidate for supplying these symbols enters the arena: a language server.

Language servers

I hesitate to mention language servers merely in passing, because they're a deep enough topic to require their own multi-part blog post series. But let me give it a shot.

There are a handful of Pulsar packages named like ide-x, where x is the name of a language. Several of them were even originally developed by the Atom team. For now Iā€™ll call them IDE backend packages.

What these packages have in common is that they all run something called a language server under the hood. A language server is designed to be a brain for a few dozen common features you'd want from your code editor: autocompletion, code linting, refactor support, and the like. A single language server typically knows how to do these tasks for one specific language or framework.

Language servers are exciting because they make it easier for weirdos like us to use editors other than the market leader. Instead of having to write all those features from scratch for, say, TypeScript, an upstart code editor could instead communicate with typescript-language-server and write some glue code to wire up the language server's features to the features of the editor.

The good news is that the language server specification includes several actions that are relevant to symbols-view: textDocument/documentSymbol for same-file symbols, workspace/symbol for project-wide symbols, and even textDocument/definition for finding where a symbol is defined. Some IDE backend packages already have "brains" capable of doing these tasks!
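For a sense of what those three actions look like on the wire, here's a sketch that builds the JSON-RPC request bodies by hand. The method names and parameter shapes follow the language server specification; the helper function names are mine, and I'd expect a real package to lean on a client library rather than assemble messages like this.

```typescript
// Line and character are both zero-based in the language server protocol.
interface LspPosition { line: number; character: number; }

interface RpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: object;
}

let nextRequestId = 1;

// Hypothetical helper: wraps a method and its params in a JSON-RPC envelope.
function buildRequest(method: string, params: object): RpcRequest {
  return { jsonrpc: "2.0", id: nextRequestId++, method, params };
}

// Same-file symbols.
function documentSymbolRequest(uri: string): RpcRequest {
  return buildRequest("textDocument/documentSymbol", { textDocument: { uri } });
}

// Project-wide symbols matching a search string.
function workspaceSymbolRequest(query: string): RpcRequest {
  return buildRequest("workspace/symbol", { query });
}

// Where is the symbol at this position defined?
function definitionRequest(uri: string, position: LspPosition): RpcRequest {
  return buildRequest("textDocument/definition", { textDocument: { uri }, position });
}
```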

But here's the bad news: since the symbol.provider service has only just been invented, those IDE backend packages need updates before they can be used for symbol navigation.

I've started to do a bit of that work. Inspired by ide-typescript, but mainly starting fresh, I've been working on a package currently called pulsar-ide-typescript-alpha that aims to be its drop-in replacement. It should be able to do everything that ide-typescript can already do, but it will also be able to offer project-wide symbol search and go-to-declaration functionality.

And it might take a few version bumps on dependencies, but most other IDE backend packages can also be updated to take advantage of these features.

Anyway, back to symbols-view

Unlike autocomplete-plus, which aggregates suggestions from multiple providers and shows all of them to the user, symbols-view is mainly interested in choosing the best provider for the job. There's little point in aggregating across a language server and Tree-sitter and ctags, since they're largely going to be offering the same list of symbols with varying degrees of richness, and you'd be pretty annoyed if Pulsar offered you three different list entries for the same function. Inside symbols-view they're called "exclusive" providers because only one of them will be picked for the job.

But I wanted to leave the door open for some creative and unexpected usages, so symbols-view also has a concept of "supplemental" providers. A provider that marks itself as supplemental is saying it'd like to contribute symbols that would probably not already be in an exclusive provider's list. You may be wondering what kinds of symbols would fit the bill, so let me give you an example...

Did you know you can bookmark lines in a buffer? Try it out: right-click on any line of your editor and select Toggle Bookmark. The built-in bookmarks package keeps track of them and will also let you navigate between them via F2 and Shift+F2.

Anyway, to illustrate the idea of a supplemental provider, I wrote one: symbol-provider-bookmarks will turn each of your bookmarks into a symbol, then display them in the symbols-view UI alongside your main provider's symbols, using the text of the bookmarked line as the symbol name.

symbol-provider-bookmarks example

This one's not bundled with Pulsar, so grab it from the package registry if it sounds interesting.
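The exclusive-versus-supplemental split can be sketched in a few lines. To be clear, this is my simplified illustration and not the actual symbol.provider contract: ProviderSketch, its score hook, and collectSymbols are hypothetical names, and the scoring scheme is an assumption.

```typescript
// Hypothetical provider shape, for illustration only.
interface ProviderSketch {
  name: string;
  isExclusive: boolean;
  // Assumed scoring hook: 0 means "I can't help with this file".
  score(usesTreeSitter: boolean): number;
  getSymbols(): string[];
}

function collectSymbols(providers: ProviderSketch[], usesTreeSitter: boolean): string[] {
  let best: ProviderSketch | null = null;
  let bestScore = 0;
  const supplemental: ProviderSketch[] = [];

  for (const provider of providers) {
    const score = provider.score(usesTreeSitter);
    if (score <= 0) continue; // This provider sits the round out.
    if (!provider.isExclusive) {
      supplemental.push(provider); // Every willing supplemental provider contributes.
    } else if (score > bestScore) {
      best = provider; // Only the highest-scoring exclusive provider wins.
      bestScore = score;
    }
  }

  let symbols: string[] = best ? best.getSymbols() : [];
  for (const provider of supplemental) {
    symbols = symbols.concat(provider.getSymbols());
  }
  return symbols;
}
```

With a Tree-sitter provider that scores itself above ctags but bows out of non-Tree-sitter files, this gives the "prefer Tree-sitter, fall back to ctags" behavior, while a bookmarks-style provider tags along either way.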

Shipping now

I've had most of this article written for months, but I decided to wait to publish it until we could show this stuff off. That time is now.

Pulsar 1.113 makes two major changes that will vastly improve the quality of the symbol searching you might already be accustomed to:

  1. The new version of symbols-view is now in place. It will offer you ctags-based symbols in grammars that don't use Tree-sitter, but it will prefer Tree-sitter–supplied symbols in most grammars. If you truly don't like change, you can disable the symbol-provider-tree-sitter package and just rely on symbol-provider-ctags, or else you can configure symbols-view to prefer some providers over others.

    But I'm betting you'll want to keep using the symbols provided by Tree-sitter, because...

  2. As you may have heard, modern Tree-sitter grammars are now the default! The system that we shipped in experimental fashion back in Pulsar 1.106 is now ready for prime time. For now, you can opt back into legacy Tree-sitter with the new core.useLegacyTreeSitter setting, but not for long, because the legacy system will be dropped when we're finally able to migrate to a newer version of Electron.

Because common languages like JavaScript, Python, Ruby, and many others have full-featured modern Tree-sitter grammars, they will also be using our new Tree-sitter symbol provider for Ctrl+R / Cmd+R. That means the symbol results should be better across the board: more accurate and more comprehensive. (If it seems worse, please file a bug.)

How does this actually improve the symbols-view experience? Let's see what our original example looks like with a Tree-sitter symbol provider:

The richness of the metadata we get from these sources has allowed us to enhance the symbols-view UI, too! You'll be shown the "kind" of thing that a symbol is: class, function, constant, et cetera. In many cases, these kinds will be illustrated with icons. Visit the package settings page for symbols-view to explore the possibilities.

And there are even a few killer new features. Open a symbols list on a JSON file and marvel at the entries you see:

symbols-view JSON example

The entire key path is now the name of the symbol! The same sorts of query and predicate tricks we've seen in previous installments in this series can be used for awesome features like this. The symbol-provider-tree-sitter README has more details.
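To show what "the entire key path" means, here's a sketch that produces the same kind of names by walking a parsed JSON value. This is a swapped-in technique for illustration: the real provider gets there with Tree-sitter queries and predicates, not JSON.parse, and both the keyPaths name and the dot separator are my assumptions.

```typescript
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

// Returns one entry per object key, named by its full path from the root.
// Arrays and primitives are treated as leaves in this sketch.
function keyPaths(value: JsonValue, prefix: string = ""): string[] {
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    return [];
  }
  let paths: string[] = [];
  for (const key of Object.keys(value)) {
    const path = prefix === "" ? key : prefix + "." + key;
    paths.push(path);
    // Recurse so that nested keys carry their ancestors' names.
    paths = paths.concat(keyPaths(value[key], path));
  }
  return paths;
}
```

Fed a package.json-style object with a nested scripts block, this yields entries like scripts.test rather than a bare test, which is what makes nested keys findable by typing part of their parent's name.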

And it's early days for pulsar-ide-typescript-alpha, but I've been using it for a few months as a symbol provider (and a go-to-declaration provider!) on TypeScript and JavaScript projects. Feel free to give it a shot yourself. (And if you're interested in bringing one of the other ide-x packages into the year 2024, please do broach the topic on GitHub Discussions, Discord, or one of our other communities.)

Conclusion

After overhauling Pulsar's syntax highlighting, indentation, code folding, and language injections, we've found yet another way that Tree-sitter can improve our existing editor experience. But in this case, there's an even better improvement just around the corner: IDE backend packages and language servers. I'll be sure to go into more detail on the Pulsar IDE experience in future posts.

Integrating Tree-sitter has been a difficult project. I started working on it in earnest in February of 2023; it shipped in June behind an experimental flag; and it's finally the default grammar type in January of 2024.

Our Tree-sitter series is nearing an end, but there's one more thing to cover: the challenges. Could it have been easier? Can Tree-sitter overcome its pain points and drawbacks? We'll talk about it next time.