Intro
Over time, microprocessors get orders of magnitude faster (thanks, Moore!), memory and storage get orders of magnitude larger, and our operating environments get orders of magnitude more complex. And yet, the way we program now is largely unchanged from decades ago: we use a text editor to manually enter lines of code, laboriously test and debug the code, and repeat until we have a reasonably bug-free product.
This means that our tools for writing ever more complex software which can run on ever more complex operating environments are proportionally less powerful than ever before.
To be sure, certain aspects of software development have gotten vastly more efficient: we have more modern languages, efficient optimizing compilers, and smart IDEs that were only a pipe dream a decade or two ago. And yet, we still suffer from a toolchain deficit: despite using modern software development tools, programmers still spend an inordinate amount of their time doing mundane drudge work, much of which can--and should--be handled by automated tools.
Enough talk: it's time for examples! The problems I'm addressing fall into the following rough categories:
- Wrangling with tools, both individually and as part of a development pipeline, and dealing with new, buggy dev tools.
- Drudge work: renaming variables, refactoring code, moving code, doing basic housekeeping, ensuring internal consistency.
- Dealing with idiosyncrasies of the simplistic development and runtime environments. (Why are we storing things in hidden files like .env? Shouldn't this be in a completely separate part of our project? Kind of like a database of project-specific configuration settings.)
We tend to perpetuate all of these problems listed above for one simple reason: programmers love to fiddle with techie things, and have been doing it for so long that we conflate "doing techie things" with "developing software." Heck, it would seem like downright cheating if we could somehow write software without having to deal with things like missing "includes", typos in CSS selectors, and a bezillion other things we do all day long as developers. After all, isn't that what we're getting paid to write and fix? Even if we could eliminate a good chunk of that work, why would we want to?
It would be like telling an accountant that we're about to simplify the tax code down to a 10-page document: first, they'd laugh and tell you it's impossible. Then, when you insisted it was really happening, they'd wonder what they would end up actually doing all day.
You may still be skeptical at this point, and you're probably already thinking of your favorite dev tools, editor plug-ins, and browser extensions that currently make your life as a software developer seem downright streamlined. Unless you're a brand spanking new junior dev, you've probably already settled into a daily routine of writing code, hunting bugs, and managing it all with Git. You've mastered the tools, and all that's left to do is...learn more languages? Remove more bugs from your existing codebase? Keep adding more plug-ins to your editor to shave off a few minutes of time every day to help your dev time become even more efficient? How do you expect to be developing in a year from now? 5 years? 10 years? 20 years? What's your end game here?
If you've even given this some thought at all, I submit that your goals fall wildly, massively short of where they could and should be. Maybe you thought you'd master a few more languages, or learn how to write asynchronous code, or even bone up on blockchain or ML. But have you given any real thought to the *mechanics* of how you'd continue to develop code?
Let me put it simply: by continuing to develop code in languages that are written as a series of flat text files, using IDEs that crudely (albeit sometimes cleverly) manage those text files, running on operating systems that are holdovers from the 1960s, we are holding ourselves back massively.
I propose developing and using software development tools which take the vast majority of drudgework off the programmer's plate and help automate the development process in ways we've only barely touched, let alone thought of. Storing project files in a folder? That's great, but filesystems are terrible at storing additional metadata, and the tools for interacting with them are so, well, 1970. Don't get me wrong, Unix commands for searching and manipulating the filesystem are amazingly powerful, but they are, in a nutshell, crude.
Our projects should be self-contained environments which can be copied, moved, manipulated, and, probably most important, interacted with via automatic scripts.
I know what you're thinking: this sounds a lot like Docker containers! Yes, it does. But Docker containers are just a convenient way of packaging and passing around pre-configured operating environments. Tools like Docker are certainly useful, but nowhere near as powerful and abstract as what I have in mind.
Recently I had the decidedly unpleasant experience of trying to refactor a Python project. Every time I moved a function from one file to another, I had to drag along all the imports used by that function. Not to mention deleting all the (now useless) imports in the file where that function used to reside. I felt like I had stepped back 40 years, to the days when I was writing C code and manually curating .h files and makefiles. The IDE I was using (VS Code) was nice enough to highlight where I was missing an import, and showed me imports that weren't being used, and I can already hear most of you saying, "Yeah! See? The IDE is helping make your life easier! Refactoring that code should be a snap!"
Why should I have to manually refactor ANY of that? The IDE sees what code I'm moving, it knows what imports that code needs and doesn't need. It should *automatically* change the imports where necessary to allow my code to continue to function. This is not just a matter of saving me the time of manually doing these updates; it's a matter of preventing possible introduction of errors. The more a human being is hitting the keyboard, the more chance they'll introduce a typo. Maybe inadvertently change "x++" to "xx++". Especially if, like most programmers, we try to get smart by writing complex greps.
Yes, I know you're a regexp genius who knows how to replace "from foo import bar" with "from foo import bar, baz" in all files. But that doesn't impress me, especially because one typo and you've hosed everything. What would impress me would be an IDE that did it for me automatically, without my having to ask. Why? Because the IDE should have the smarts to do that, saving me time that is better spent on writing things that the IDE can't (and will never be able to) figure out on its own, like changing business logic according to new tax laws or whatever.
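To make the point concrete, here is a minimal sketch of how a tool could work out which imports travel with a function when it moves. It uses Python's standard `ast` module; the function name `imports_needed_by` and the sample source are my own inventions for illustration, and it only looks at top-level imports and plain name references.

```python
import ast


def imports_needed_by(source: str, func_name: str) -> list[str]:
    """Return the top-level import statements that `func_name` actually uses."""
    tree = ast.parse(source)

    # Map each name bound by a top-level import to the statement that binds it.
    bound = {}
    for node in tree.body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                local = alias.asname or alias.name.split(".")[0]
                bound[local] = ast.unparse(node)

    # Find the function and collect every bare name it mentions.
    func = next(
        n for n in tree.body
        if isinstance(n, ast.FunctionDef) and n.name == func_name
    )
    used = {n.id for n in ast.walk(func) if isinstance(n, ast.Name)}
    return sorted({stmt for name, stmt in bound.items() if name in used})


SOURCE = """
import os
import json
from math import sqrt

def load(path):
    with open(os.path.expanduser(path)) as f:
        return json.load(f)

def hyp(a, b):
    return sqrt(a * a + b * b)
"""

print(imports_needed_by(SOURCE, "hyp"))
print(imports_needed_by(SOURCE, "load"))
```

Moving `hyp` to another file would mean carrying only the `math` import with it, and an editor that knows this has no reason to make me do it by hand.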
Offload to the IDE everything that the IDE *should* be able to do (but doesn't currently, because we treat modern IDEs as glorified terminals that hardly help us at all).
Don't get me wrong, there are many excellent extensions for VS Code. Just don't get me started on the "Top 10 Must-Have VS Code Extensions to Improve Productivity" articles, which espouse extensions for things like "dark mode" and "auto rename tag". Sure, those extensions are definitely useful, but in my experience:
- They're buggy
- They're not always maintained
- Even when they work perfectly and are constantly maintained (which is rare), they are a Band-Aid to a much, much larger problem.
In short, they are the top of the food pyramid, not the bottom: they are the least important of all the tools we should be relying on to develop software faster, more efficiently, and more accurately.
Ref
What Is Ref?
Ref is an IDE (Integrated Development Environment) which supports syntax highlighting, automatic FTP uploads, sophisticated regex search-and-replace within multiple files, and integrates with your project's source control repository.
So far these sound like features of any typical IDE, but in fact these are not Ref's main strengths. The real power of Ref goes much deeper: Ref also contains parsers that allow it to gain an understanding of your code's components (variables, constants, functions, classes, etc.), structure, scopes, various files and other resources, and much more.
Rather than treating your source code like a long series of free-form characters (as pretty much every other editor and IDE does), Ref parses and stores your source code internally as structured data objects, kind of like a cross between a spreadsheet and the browser's DOM.
Once you've entered code into Ref, any syntactic or referential errors or warnings are displayed immediately, and each block, statement, expression, keyword, etc. is stored as a discrete unit (object). The "code" that the programmer writes and sees in Ref's editor is simply the outward manifestation (i.e. rendering) of Ref's internal representation of that code.
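As a rough illustration of "code as structured data" (not Ref's actual internals, which are not specified here), Python's own `ast` module already works this way: the text is parsed into typed objects, and the text you see can be regenerated from those objects after an edit.

```python
import ast

tree = ast.parse("total = price * qty")
assign = tree.body[0]

# Each construct is a discrete object with typed fields, not a run of characters.
print(type(assign).__name__)        # the statement node
print(type(assign.value).__name__)  # the expression on the right-hand side
print(assign.targets[0].id)         # the name being assigned

# Edit the structure, then re-render the text from it.
assign.value.op = ast.Add()
rendered = ast.unparse(tree)
print(rendered)
```

The edit here touches the operator node, never the characters; the source line is just a rendering of the tree.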
What's the purpose of all these tools for analysis and introspection? The best way to describe the benefits is to understand the basic problems Ref tries to help solve:
The Problems
Imagine you're working on a large program that you want to refactor. Part of that process involves changing the name of the variable $i to $j. (Code samples herein assume you're programming in some PHP- or JavaScript-like language, but in fact Ref can parse various source languages, including Python, Perl, C, C++, and more.)
Most programmers will use their IDE's search-and-replace feature to make the change, perhaps with a clever regexp to make sure they don't mistakenly change things like $i2, and do change things like ${i}. When you're only making the change in a small area of your code, this works fairly well (although it's still subject to failure: is your regex smart enough to ignore string literals like '$i'?). When you're making such changes to a large chunk of code, things can go south very quickly. And if you're making this change to thousands of instances in hundreds of files, well, good luck!
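Here is a small sketch of the difference, using Python identifiers in place of the `$i` above. The regex version rewrites the text inside a string literal; the token-aware version (built on the standard `tokenize` module, with a helper name `rename_identifier` that I made up) only touches real identifiers.

```python
import io
import re
import tokenize


def rename_identifier(source: str, old: str, new: str) -> str:
    """Rename `old` to `new` only where it appears as an identifier token."""
    lines = source.splitlines(keepends=True)
    hits = [
        tok.start
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.NAME and tok.string == old
    ]
    # Apply replacements right-to-left so earlier column offsets stay valid.
    for row, col in sorted(hits, reverse=True):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + new + line[col + len(old):]
    return "".join(lines)


SOURCE = 'i = 0\ni2 = i + 1\nprint("i =", i)\n'

naive = re.sub(r"\bi\b", "index", SOURCE)       # also rewrites the string literal
safe = rename_identifier(SOURCE, "i", "index")  # leaves the string literal alone
print(naive)
print(safe)
```

Even this is only token-aware, not scope-aware: it would happily rename an unrelated `i` in another function, which is exactly the kind of thing a tool with a real symbol table should prevent.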
What should be a simple task (renaming a variable) can be fraught with danger. And that's just for what should be a simple task! What if you want to rename the class Foo to Bar, including all places where you reference the class name Foo? You're back to clumsy regexes again.
But this dilemma gets worse, and quickly: imagine you want to move a chunk of code into a separate function. First step: copy and paste the code into the body of a new function. That was easy! But now comes the hard part: which variables (not to mention constants and other functions) did that block of code make use of that are no longer in scope in your new function? There's no easy way to determine that, aside from reading the code and looking at what you need to either pass in as an argument, make global (shudder!), or otherwise give your function access to (e.g. PHP's "use").
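The question "what does this block need from the outside?" is mechanical enough to sketch. The visitor below walks a block with Python's `ast` module and reports names that are read before the block assigns them; the class and function names are hypothetical, and it deliberately ignores branches (a name assigned in only one arm of an `if` counts as assigned).

```python
import ast
import builtins


class BlockInputs(ast.NodeVisitor):
    """Collect names a block reads before assigning them itself."""

    def __init__(self):
        self.assigned = set()
        self.inputs = []

    def _read(self, name):
        known = name in self.assigned or hasattr(builtins, name)
        if not known and name not in self.inputs:
            self.inputs.append(name)

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load):
            self._read(node.id)
        else:
            self.assigned.add(node.id)

    def visit_Assign(self, node):
        self.visit(node.value)  # the right-hand side is evaluated first
        for target in node.targets:
            self.visit(target)

    def visit_AugAssign(self, node):
        self.visit(node.value)
        if isinstance(node.target, ast.Name):
            self._read(node.target.id)  # `x += 1` reads x before writing it
        self.visit(node.target)


def inputs_of_block(block_source: str) -> list[str]:
    visitor = BlockInputs()
    visitor.visit(ast.parse(block_source))
    return visitor.inputs


BLOCK = """
subtotal = sum(prices)
tax = subtotal * rate
total = subtotal + tax + shipping
"""

print(inputs_of_block(BLOCK))
```

The names it reports are the candidates for the new function's parameter list, which is the part I currently have to work out by reading the code.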
Another example: How many times have you forgotten to "include" a file containing a function that you want to call? Or had to change an "include" to a different filename when you renamed a file? With Ref, all that goes away. Theoretically, there should be no need to "include" anything—just reference a function in a file, and Ref does the rest (specifically, it transpiles your code to add an "include" where necessary). Best of all, if the function you refer to is ever renamed (or even moved to a different file), all references to it are automatically changed as well to reflect that.
One last example: imagine you want to create a new variable $foo. How can you reliably determine whether there already exists such a variable (not to mention function, constant, etc.) in the given scope? Aside from laboriously parsing through the code manually, your standard IDE will usually fail you here. But helping with this kind of task is exactly what Ref excels at.
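For what it's worth, the information needed to answer that question already exists inside most compilers. As a sketch, Python exposes its compiler's symbol tables through the standard `symtable` module; the helper `name_status` and the sample source are mine, purely for illustration.

```python
import symtable

SOURCE = """
LIMIT = 10

def total(items):
    count = 0
    for item in items:
        count += item
    return min(count, LIMIT)
"""

module = symtable.symtable(SOURCE, "<example>", "exec")
func = next(t for t in module.get_children() if t.get_name() == "total")


def name_status(table, name):
    """Say how `name` is known inside the scope described by `table`."""
    try:
        symbol = table.lookup(name)
    except KeyError:
        return "free to use"
    if symbol.is_parameter():
        return "parameter"
    if symbol.is_local():
        return "local"
    return "global or builtin"


for candidate in ("items", "count", "LIMIT", "foo"):
    print(candidate, "->", name_status(func, candidate))
```

Note that "free to use" only means the function body never mentions the name; a module-level name the function doesn't reference would need a second lookup in the enclosing table.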
At this point you may be thinking, "All my functions begin with fn_, all my class names are InitCaps, all my local variables are camelCase, and all my constants are ALL_CAPS! It's easy to write a regexp to match exactly what I want, without mistakenly matching anything else! Also, none of my functions is more than ten lines long, so it's easy for me to understand everything about my code!"
If that's really the case, I tip my hat to your use of best practices, but I would still argue that Ref can help you tremendously with those and other tasks, especially when you're dealing with code written by other people that may not follow your conventions. Besides, the main reason we have such naming conventions is because programmers--not their IDEs--are still tasked with managing symbol names, and the only way for an unassisted programmer to determine a symbol's type is to encode that in its name. With Ref, a symbol's type (not to mention scope) is always at your fingertips. Ref will tell you instantly whether your attempt to change $foo to $foo->bar is valid, i.e. whether $foo can be dereferenced based on its type.
Ref manages the translation between frail, human-readable code (source code) and abstract internal data structures representing what that code really means.
Ref encourages you to stop worrying about language (implementation)-specific details and instead start thinking about what your program does, not how it does it.
Does this mean Ref uses a declarative, rather than imperative, paradigm? No, not really in the classical senses of those words. Ref just considers source code to be ugly, and the underlying meaning of the source code to be beautiful. Ref helps you manage those underlying ideas, not the ugly source code that those ideas must inevitably be expressed as, even though, paradoxically, we programmers still communicate these ideas via ugly source code. At the end of the day, we human programmers have (unfortunately) agreed that for ( ... ) { ... } is both the physical and abstract representation of a loop.
At this point you may be thinking that Ref is trying to be one of those weird niche languages where everything is expressed visually, like a flowchart. (Any old-schoolers remember Rocky's Boots?) Indeed, projects like Hour of Code even teach coding in those highly abstract, managed environments in which you drag-and-drop components like loops and if/thens, and right-click or double-click to edit their properties ("End loop when iteration reaches 'n'"). That's not what Ref is trying to do, although there's no reason why you shouldn't be able to add such a visual interface to Ref.
Ref attempts to reconcile the paradox that programmers will forever struggle with: we want to optimize our code by taking advantage of implementation details, but we also want to abstract away messy implementation details.
What Ref Can Do
Symbols
- Identify unused symbols
  - Ref can tell you which symbols are not used at all, allowing you to remove cruft.
- Identify duplicate symbols at write time (not compile time or run time)
  - Ref can tell you if you're redefining an existing symbol, and where that existing symbol was first defined.
- Determine which symbols are in scope at any given point (and filter them by type, e.g. local or global variable, function, class, constant, etc.)
  - Click on any line of code and Ref will tell you which symbols are in scope: a callout appears showing tabs for variables, constants, classes, functions. Click on any symbol to see its type, and optionally jump to the file and line where it was first declared.
- For any symbol, determine which block(s) of code it is in scope for
  - You can click on any symbol and Ref can show you which scope(s) it's visible in. You can see where that variable was first declared (var foo) or, if Weak Typing is enabled, first used (foo = 0), which effectively declares it by default. The scope(s) are highlighted in a color which makes them obvious, and Ref will show you an overview of the relevant file(s) so you can easily jump between them.
- Identify undefined symbols
  - Are there any points in your code where symbols have not been properly declared before being used (e.g. weak typing)? Ref can point them out and let you resolve them, either manually or automatically (by declaring each symbol at the top of the block where it is first used).
- Identify ambiguous symbols
  - Are there any points in your code where a symbol is ambiguous, i.e. it refers to both a variable and a function that are both in scope? Ref shows you these instances and lets you rename ambiguous symbols, either manually or automatically.
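A couple of the checks in this list can be approximated in a few lines. The sketch below (my own helper, `audit_names`, built on Python's `ast` module) flags names that are assigned but never read, and names that are read but never assigned anywhere. It is deliberately crude: it ignores scopes and control flow, so treat it as a picture of the idea rather than a real analysis.

```python
import ast
import builtins


def audit_names(source: str):
    """Return (unused, undefined) names, ignoring scopes and control flow."""
    stored, loaded = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            (loaded if isinstance(node.ctx, ast.Load) else stored).add(node.id)
        elif isinstance(node, ast.arg):
            stored.add(node.arg)  # function parameters count as definitions
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            stored.add(node.name)
    unused = sorted(stored - loaded)
    undefined = sorted(n for n in loaded - stored if not hasattr(builtins, n))
    return unused, undefined


SOURCE = """
def area(width, height):
    scale = 2
    return width * heigth

print(area(3, 4))
"""

unused, undefined = audit_names(SOURCE)
print("unused:", unused)
print("undefined:", undefined)
```

The misspelled `heigth` shows up as undefined, and the correctly spelled `height` shows up as unused, at write time rather than at run time.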
Renaming Symbols
- Ref lets you rename symbols, either globally or in a given scope, by renaming them either where they are originally defined (e.g. var $foo), or anywhere they are used ($foo++). Just rename $foo once and all instances of $foo in that scope will get changed automatically and instantly. And if your new name is not unique, Ref will warn you and suggest an alternate name.
- When changing an instance of a variable from, say, $foo to $bar, Ref will ask if you want to change (a) just this instance, or (b) all instances.
  - In either case, if this would result in a name clash with an existing $bar, Ref will ask "use existing $bar (Y/N)?"
    - If the user selects Yes, the requested instances of $foo in that scope (a or b) will change to use the existing $bar in that scope.
    - If the user selects No, they have the option of specifying a new variable name; see above for the same rules for deciding whether to allow it (i.e. whether it's already declared in that scope).
  - If there is no name clash with an existing $bar, the requested instance(s) of $foo will get changed to $bar in the requested scope.
With Ref, defining a variable (or any other type of symbol, really) is analogous to putting a value in a spreadsheet cell (e.g. cell C12); when you refer to that variable elsewhere in your code, it's analogous to adding an absolute reference to that cell (e.g. $C$12): when you change the contents of cell C12, the value in all cells which refer to $C$12 automatically changes to display that new value. Likewise, when you change var $foo to var $bar, Ref gives you the option to rename all instances of $foo, regardless of whether you're renaming it where it was originally defined or any place it is referred to (e.g. $bar = $foo / 2).
Ref : your code :: Excel : data
Referencing Symbols
- When typing or highlighting a symbol that is in scope, Ref will highlight that instance as "safe", meaning it is in scope and was declared properly.
- When typing or highlighting a symbol that does not exist in that scope, Ref will highlight that instance as "undefined". If the symbol is a variable: in "strongly typed" mode, this will be an error; in "weakly typed mode" this will be ok (and will automatically generate a "var variable_name" declaration on the previous line). If the symbol is anything else, Ref will flag this as an error and will suggest which file to import/include to bring that symbol into scope.
- When creating a new variable (var $foo) that is already in scope, Ref will show the new $foo as overriding the main $foo. The idea is to indicate to the programmer, "are you sure you want to override that pre-existing variable?" This is considered a warning; Ref can be configured to ignore such warnings, although this is not recommended.
Strings and String Interpolation
A large part of programming involves manipulating and generating strings. These days most languages have the ability to do string interpolation, which often beats having to assemble strings solely via string operators (not that string operators are always a bad thing).
The problem with string interpolation is that it still requires "proprietary" syntax to specify an embedded variable (e.g. "Hello, ${fname} ${lname}"). When you want to include any of those special characters verbatim, you need to escape them, requiring even more "proprietary" characters.
Most programmers take this process with a grain of salt and overlook the annoying (read: error-prone!) requirements for embedding strings (not to mention other variables) within a string. (Some programmers are even silly enough to consider the ability to write complex string expressions as a badge of honor, rather than the annoying pain that it really is.)
So why is the syntax for string interpolation such a bad thing, and what does Ref do about it?
In keeping with Ref's philosophy: anything that requires programmers to track things (in this case, weird symbols such as the characters that indicate a variable, string delimiters, etc.) which the IDE could (and should!) track, well, let the IDE keep track of it!
Want to embed a variable inside a string? No problem, just tell Ref that the thing you're typing is a variable (exact mechanics of how to do this are TBD, but hey, how hard could it be?) and Ref will remember it. Even better, you can tell Ref to show all variables in a given style (e.g. larger font, different color, etc.) so they are easy to identify visually in a large block of code. (Although smart programmers will use Ref's sophisticated search feature to find a given variable, rather than parse blocks of code visually.)
Ref doesn't require you to delimit strings with quotes or apostrophes. Sure, Ref can be configured to show string delimiters, but those are just syntactic sugar which Ref will patiently accommodate you by displaying if you so desire. Same with special characters for embedding strings, variables, literals, escape characters, etc.: in the ideal world you shouldn't need them, and if you're using Ref's native language (also called Ref, naturally), you'll never see them.
This means that you never again have to worry about special characters to delimit the beginning and end of a string, an embedded variable, an escape character, or anything else that attempts to clumsily bridge the gap between a string and the syntax representing that string.
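Since the exact mechanics are TBD, the following is only one guess at what this could look like under the hood: the string is stored as a list of typed parts, and the delimiters and escapes are generated when the string is rendered into a target language (here, a Python f-string). The `Text` and `Var` classes and the `to_fstring` function are hypothetical names, and the sketch ignores newlines inside the text.

```python
from dataclasses import dataclass


@dataclass
class Text:
    value: str  # literal characters, stored exactly as typed


@dataclass
class Var:
    name: str  # a reference to a variable, not characters


def to_fstring(parts) -> str:
    """Render the parts as Python f-string source, escaping where needed."""
    out = []
    for part in parts:
        if isinstance(part, Var):
            out.append("{" + part.name + "}")
        else:
            escaped = part.value.replace("\\", "\\\\").replace('"', '\\"')
            out.append(escaped.replace("{", "{{").replace("}", "}}"))
    return 'f"' + "".join(out) + '"'


PARTS = [
    Text("Hello, "),
    Var("fname"),
    Text('! Use {braces} and "quotes" freely.'),
]

print(to_fstring(PARTS))
```

The programmer typed braces and quotes as plain text and never thought about escaping; the escapes exist only in the generated output.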
The ability to hide such syntactic cruft isn't just so you can remove a few extra characters from your code; the purpose is really to eliminate one of the many sources of potential errors when writing code--errors which, unfortunately, advanced programmers tend to be relatively good at avoiding in the first place, but which trip up beginner and intermediate programmers all the time. I would argue, though, that even advanced programmers will find this feature of Ref quite compelling.
Code Blocks
Ref excels at managing blocks of code: Ref handles all the housekeeping related to the tasks of moving, copying, commenting out (or in), or writing comments about any block of code.
This may sound needlessly pedantic given that most programmers take it for granted that programming by definition involves manually parsing through blocks of code to determine how to begin and end them. It's one of those things that most of us have gotten so good at that we just assume it's easy, until we watch a newbie crash and burn hard trying to do something like insert a block inside a longer block, or wrap a block inside another block.
Have you ever tried to wrap a block inside another block and had your editor close the newly-opened block for you, as part of its autocompletion feature? (In other words, you write <div> and the editor automatically adds </div> at the end, way before you were ready for it.) What would ordinarily have been a useful feature is now working against you, all because you had no way of telling the editor, "wrap this block in a <div>...</div>".
Ref makes it easy to:
- Create a new, standalone block ( -> <div>...</div>)
- Wrap a block in a new block (<bar>...</bar> -> <foo><bar>...</bar></foo>)
- Remove a block but not its contents (<foo><bar>...</bar></foo> -> <bar>...</bar>)
- Remove a block and its contents (<foo><bar>...</bar></foo> -> )
Of course, I'm using HTML in these examples but the same principles would work for any language that uses any kind of hierarchical markup. Theoretically Ref could even be used to edit a graph, although I can't even begin to wrap my head around how that UI would look! Maybe version 2.0.
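Once the blocks are nodes in a tree rather than matched pairs of characters, the operations in the list above become ordinary tree edits. Here is a sketch using Python's standard `xml.etree.ElementTree`; the `wrap` and `unwrap` helpers are my own, and they ignore text that sits between sibling elements.

```python
import xml.etree.ElementTree as ET


def wrap(parent, child, tag):
    """Replace `child` under `parent` with a new <tag> element containing it."""
    index = list(parent).index(child)
    wrapper = ET.Element(tag)
    parent.remove(child)
    wrapper.append(child)
    parent.insert(index, wrapper)
    return wrapper


def unwrap(parent, wrapper):
    """Remove `wrapper` but keep its children in its place."""
    index = list(parent).index(wrapper)
    parent.remove(wrapper)
    for offset, child in enumerate(list(wrapper)):
        parent.insert(index + offset, child)


root = ET.fromstring("<body><bar>text</bar></body>")

foo = wrap(root, root[0], "foo")
wrapped = ET.tostring(root, encoding="unicode")
print(wrapped)

unwrap(root, foo)
unwrapped = ET.tostring(root, encoding="unicode")
print(unwrapped)
```

There is no closing tag to misplace in either operation, because the closing tag only exists when the tree is rendered.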
Ref automatically checks the following and does them for you:
- Are all your blocks properly opened and closed (i.e. any missing or extra braces)? This is never a problem with Ref, because Ref automatically closes all blocks. In fact, the only time you ever need to type a brace to indicate the beginning of a block of code is when you're first creating it; thereafter, Ref manages the end of the block for you. With Ref, it's literally impossible to mistakenly delete the closing } because that character is simply syntactic sugar to show you where the block ends. (This applies whether blocks are enclosed by brackets, braces, parentheses, angle brackets, or even begin ... end. Remember, Ref is language-agnostic.)
- When you move a block of code from one area to another, how do you reliably determine which variables (and constants and functions and classes) are no longer in the new scope, and which variables/functions/classes from the moved block of code clash with variables/functions/classes that are already in scope where you moved the code to?
  - The mechanism for moving an arbitrary block of code prevents invalid blocks from being moved. E.g. given this block of code:

        1. while ( foo ) {
        2.     if ( bar ) {
        3.         baz();
        4.         bat++;
        5.     }
        6. }

    It would be impossible to move or copy lines 2-3 elsewhere. You would have to move lines 2-5, i.e. an entire self-contained block. (In fact Ref will let you copy lines 2-3 elsewhere; it will just create a properly closed block on your behalf.)
  - After you move a block of code, symbols that clash with an existing symbol will result in those being flagged as "need to be renamed". You can either select an existing symbol name, or specify a new symbol (Ref will walk you through the process of defining that symbol). Ref can also automatically rename symbols if you wish (e.g. Ref can rename $foo to $foo2). See the section on Introspection for more info on how Ref deals with this.
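The "is this selection a whole block?" check from the while/if example can be sketched very simply for brace-delimited code. The function name is hypothetical, and this version counts every brace character, so braces inside strings or comments would fool it; a real implementation would work on the parsed tree instead.

```python
def is_self_contained(lines, first, last):
    """True if lines first..last (1-based, inclusive) open and close the same blocks."""
    depth = 0
    for line in lines[first - 1:last]:
        for ch in line:
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth < 0:  # closes a block that was opened outside the selection
                    return False
    return depth == 0          # nothing opened inside the selection is left open


CODE = [
    "while ( foo ) {",
    "    if ( bar ) {",
    "        baz();",
    "        bat++;",
    "    }",
    "}",
]

print(is_self_contained(CODE, 2, 3))  # the `if` is opened but never closed
print(is_self_contained(CODE, 2, 5))  # the whole `if` block
```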
Strong/Weak Typing
Some languages use strong typing while others are weak (for varying definitions of "strong" and "weak"). Wouldn't it be nice if you could enforce (or not!) strong or weak typing at will?
When "Strong Typing" is enabled, assigning the wrong type of value to an explicitly typed variable will be considered a warning (or not, if "Coercion OK" is enabled).
Variable Declaration
Some languages require variables to be declared before they are defined. Wouldn't it be nice if you could enforce (or not!) declaration before definition?
When "Declaration Before Definition" is enabled, assigning a value to a variable before that variable is declared will be considered a warning; when it is disabled, no warning is raised.
Comments
- You can comment out blocks of code in their entirety, even if they overlap with other blocks of code. (I.e. no need to worry about whether the block of code you're commenting out contains /* or */ or any other syntax-specific characters.)
- Since Ref does not consider comments to be part of the code, they are not delimited by any special characters; instead, they are stored in a different "scope" from the code, although they can be searched as if they exist in that part of the code where they are written, just as with traditional code.
- Traditional "in-code" comments are still allowed (e.g. // or /* ... */) but their use is discouraged.
Introspection
- Variable references that clash with an existing variable will be flagged as "unresolved" and the user can select either:
  - Use existing variable $foo; or
  - Use new name (user supplies new name; IDE helps by declaring new var name in the correct spot, preferably before first instance; warns if new name is also being used; etc.)
- Ref lets you see all unresolved clashes, and easily either resolve them individually or auto-resolve them all.
This is just the tip of the introspection iceberg. Since Ref by definition stores the formalized, normalized version of your code, it stands to reason that this database should be queryable, not just via command line queries but via a scriptable interface. Wouldn't it be great if you could write meta-code along the lines of:
foreach ( 'for' in Project::foo as $f ) {
echo $f->initializer;
}
What does that do, and why would I want to do it? This is just pseudo-code, of course, but the idea is that you could query your codebase for all instances of a for() loop and determine its initializing statement. Why? Well, this is just an example, not necessarily a useful one.
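The pseudo-code above has a rough real-world analogue today, at least for a single file. This sketch uses Python's `ast` module to list every `for` loop along with its loop variable and the thing it iterates over (Python loops have no C-style initializer, so that is the closest equivalent). The function name `for_loops` and the sample source are mine.

```python
import ast


def for_loops(source: str):
    """Return (line, loop variable, iterable) for every `for` loop, in line order."""
    return sorted(
        (node.lineno, ast.unparse(node.target), ast.unparse(node.iter))
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.For)
    )


SOURCE = (
    "for user in users:\n"
    "    for order in user.orders:\n"
    "        ship(order)\n"
    "for i in range(3):\n"
    "    pass\n"
)

for line, variable, iterable in for_loops(SOURCE):
    print(f"line {line}: {variable} over {iterable}")
```

What Ref adds to this picture is persistence and scope: the query would run against the whole project's stored structure, not a one-off parse of one file.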
I can imagine all sorts of amazing statistics you could get, not to mention rules you could write for checking/enforcing security measures and business rules, e.g. "all function calls that begin with get_user_...() must start with a call to user_security_check()", and "any function that calls foo() should never then call bar()".
Clearly Ref would need to contain its own built-in meta-language and API that would give the user access to the database of projects stored in Ref.
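Ref's meta-language doesn't exist yet, but the first rule above can be sketched against Python source with the `ast` module. I'm reading the rule as "every function whose name starts with get_user_ must make user_security_check() its first statement"; the checker name `unguarded_functions` is hypothetical, and a docstring as the first statement would count as a violation in this simple version.

```python
import ast


def unguarded_functions(source, prefix="get_user_", guard="user_security_check"):
    """Names of functions matching `prefix` whose first statement is not a call to `guard`."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name.startswith(prefix):
            first = node.body[0]
            guarded = (
                isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Call)
                and isinstance(first.value.func, ast.Name)
                and first.value.func.id == guard
            )
            if not guarded:
                offenders.append(node.name)
    return offenders


SOURCE = """
def get_user_email(uid):
    user_security_check()
    return db[uid].email

def get_user_phone(uid):
    return db[uid].phone
"""

print(unguarded_functions(SOURCE))
```

A rule like this could run continuously in the editor, so the violation is flagged the moment the function is written rather than in a code review weeks later.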
Transpiling
Depending on what language you develop in, your runtime and/or IDE may already do some of these things to varying degrees. That's great, but Ref goes several steps further.
First, keep in mind that Ref is language-independent: if you're developing a PHP project, Ref's parser will understand PHP. Likewise, if you're developing a Python project, Ref's parser will operate in Python mode. Ref supports multiple languages and is always expanding to add more. This is purely for pragmatic reasons: today, developers develop in the languages they develop in; nobody does actual development in a theoretical language! But more realistically, we all know that programmers develop in the same old languages largely due to inertia: what works, works; there's no blame being placed here.
These "real world language-specific" versions of Ref are designed to solve immediate problems with those languages. But Ref also attempts to solve larger problems by adding features to those languages that they don't already contain, like the ability to manage not just code but also resources (text, images, etc.). In that way, all the components of your project are stored in Ref's database, and you can make use of all of Ref's introspection tools to gain insights into your project--not just while developing but also during runtime!
By storing your entire code base (including external data and configuration files) in what is essentially a database, in which references to symbols are linked to each other, Ref's IDE lets you develop in a virtual, project-centric environment. The term "project" here is similar to current IDEs' use of the word, but instead of referring to "a bunch of files which make up a software project" it instead refers to the next higher level of abstraction: a collection of symbols, namespaces, and data objects which can be navigated via Ref's editor. While your code may reside in separate files, and you can certainly continue to develop that way, Ref encourages you to take a more holistic approach to your project and make use of Ref's references to traverse, manage, and develop your project.
To this end, Ref also operates in "abstract language" mode, in which you can develop in Ref's native language (also called "Ref"), which happens to be very C-like. Why would you do this? Because the Ref language can automatically transpile into your language of choice! In other words, write your code in Ref, and save it in, say, PHP. This makes Ref like JSX or SASS in the sense that it transpiles high-level code into native code. When you add in Ref's tools for introspection and managing resources, you have an extremely powerful tool for not just development but also runtime analysis.
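To give a feel for what "write once, save as PHP" could mean, here is a toy sketch: one internal representation of a loop, rendered as either PHP or Python source. The node classes and the `emit` function are invented for this example and have nothing to do with Ref's real grammar, which isn't defined here.

```python
from dataclasses import dataclass


@dataclass
class Assign:
    name: str
    value: str  # a literal, kept as text for simplicity


@dataclass
class Loop:
    var: str
    count: int
    body: list


def emit(node, lang, depth=0):
    """Render a node as a list of source lines in the requested language."""
    pad = "    " * depth
    if isinstance(node, Assign):
        if lang == "php":
            return [f"{pad}${node.name} = {node.value};"]
        return [f"{pad}{node.name} = {node.value}"]
    body = [line for child in node.body for line in emit(child, lang, depth + 1)]
    if lang == "php":
        v = "$" + node.var
        return [f"{pad}for ({v} = 0; {v} < {node.count}; {v}++) {{", *body, pad + "}"]
    return [f"{pad}for {node.var} in range({node.count}):", *body]


program = Loop("i", 3, [Assign("total", "0")])

print("\n".join(emit(program, "php")))
print("\n".join(emit(program, "python")))
```

The sigils, semicolons, and braces appear only in the output; the stored program has none of them.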
Are you still wondering why Ref is named "Ref"? Because every symbol in your project is stored in Ref's database, it can be accessed by name or by symbol type (variable, constant, function, class name, file name, etc.). In fact, every instance of a given symbol in your code is simply a reference (get it?) to that symbol in the database. Once your code has been fully parsed and analyzed and stored in a database (rather than a series of, shudder, text files), it's easy to make changes to it! Want to change $foo to $bar in a given scope? That just boils down to Ref making a DB call behind the scenes along the lines of UPDATE tbl_symbol SET sym_name = "$bar" WHERE sym_name = "$foo" AND ...
There's much more to it than that, of course, but that's the underlying theory of how Ref works.
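As a minimal sketch of that idea, here is how a scoped rename might look against an in-memory SQLite database. Only tbl_symbol and sym_name come from the text above; the sym_id and scope_id columns, the tbl_reference table, and the function names are assumptions made up for illustration, not Ref's actual schema.

```python
import sqlite3


def rename_symbol(conn, scope_id, old_name, new_name):
    """Rename a symbol inside one scope and return the number of rows changed."""
    clash = conn.execute(
        "SELECT 1 FROM tbl_symbol WHERE sym_name = ? AND scope_id = ?",
        (new_name, scope_id),
    ).fetchone()
    if clash:
        raise ValueError(f"{new_name} already exists in scope {scope_id}")
    cur = conn.execute(
        "UPDATE tbl_symbol SET sym_name = ? WHERE sym_name = ? AND scope_id = ?",
        (new_name, old_name, scope_id),
    )
    conn.commit()
    return cur.rowcount


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: each use of a symbol is a row pointing at sym_id.
    conn.execute(
        "CREATE TABLE tbl_symbol "
        "(sym_id INTEGER PRIMARY KEY, sym_name TEXT, scope_id INTEGER)"
    )
    conn.execute(
        "CREATE TABLE tbl_reference "
        "(ref_id INTEGER PRIMARY KEY, sym_id INTEGER, line INTEGER)"
    )
    conn.executemany(
        "INSERT INTO tbl_symbol (sym_name, scope_id) VALUES (?, ?)",
        [("$foo", 1), ("$foo", 2), ("$baz", 1)],
    )
    conn.executemany(
        "INSERT INTO tbl_reference (sym_id, line) VALUES (?, ?)",
        [(1, 10), (1, 12), (2, 30)],
    )
    changed = rename_symbol(conn, 1, "$foo", "$bar")
    # Every reference picks up the new name because it stores sym_id, not text.
    uses = conn.execute(
        "SELECT r.line, s.sym_name FROM tbl_reference r "
        "JOIN tbl_symbol s ON s.sym_id = r.sym_id ORDER BY r.line"
    ).fetchall()
    return changed, uses
```

One row changes, yet both uses of the symbol in scope 1 now read $bar, while the $foo in scope 2 is untouched. The clash check stands in for the kind of name-collision warning an editor like Ref would presumably give before committing a rename.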
If you're starting to think Ref sounds a lot like Ted Nelson's Project Xanadu, you're not wrong! While Ref contains some elements of the idea of "everything is linkable" (my super oversimplification of Xanadu's Tumbler), Ref is definitely more than "just" a complex hyperlinking/referencing system.
How Does Ref Work From the Developer's Perspective?
Developers use Ref as a powerful IDE. The main difference between Ref's editor and that of a traditional IDE is that once you've entered a line of code in Ref, it gets parsed and displayed similarly to a spreadsheet: you can click on symbols (variables, constants, etc.) within that line of code to edit them and get more info (see above).
If you make a major change to a line of code (e.g. turning a for() loop into an assignment), the block of code that made up the loop is automatically outdented (because it's no longer in a loop) and Ref analyzes that section of code for reference errors.
To make a block (e.g. a for() loop) around multiple lines of code in a traditional editor, you would manually add a for() statement, along with opening and closing braces. With Ref, you create a block around code by highlighting several lines of code, right-clicking, and selecting the "Make block..." option, which will let you create a block of your choice (whether simply a new scope, or a loop, or a function, etc.). If the lines of code you're attempting to make a block around are not "blockable" (e.g. because they begin inside an existing block and end outside it), Ref will prevent you from creating the block, and/or possibly assist you by suggesting a syntactically correct containing scope for the block.
Similarly, to remove the block around a bunch of code, in a traditional editor you would remove the opening and closing braces (and, if it's a loop, the for() or while() code that precedes it). The most difficult part of this process is the administrative step of finding the matching brace. With Ref, you would right-click the block indicator (as opposed to the line indicator) anywhere in that block and select "Unblock" from the context menu. (Another option would be "Delete Block", which deletes the entire block, including the code inside it.) With Ref, there is no need to find matching braces because you're dealing with a block as a whole, which Ref has already identified.
You can drag-and-drop multiple lines of code into a pre-existing block, and they will get indented automatically. In fact, indenting/outdenting is completely automatic with Ref: the developer never has to worry about that. If you care, you can set the displayed indentation level, but that's nothing but syntactic sugar.
New Editing Paradigm
With traditional editors, your code is a long sequence of characters separated by newlines and indented with tabs purely to help the poor programmer navigate the tree structure. With Ref, your code is a series of nested objects. That means the process of editing your code really means creating, updating, deleting, or moving those blocks.
Because Ref encourages you to think of code as more abstract objects, the editing process suddenly becomes much different than when you were treating code as a linear sequence of characters. In order to ensure your code is always in a valid state (i.e. no syntax errors), Ref must give you tools to cleanly perform the following operations:
- Insert new block
- Surround selection with new block
- Move block from point "a" to point "b"
- Delete block and its contents
- Delete just the block but not its contents
- Copy block from point "a" to point "b"
There are probably a few more primitive operations I haven't thought of but this is the basic idea. More complex combinations of these primitives might exist as individual features as well (similar to the concept that multiplication is simply repeated addition).
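The primitives listed above can be sketched as operations on a small tree. This is only an illustration under assumed names (Block, surround, unwrap, and so on are hypothetical, not Ref's API), and it ignores details such as moving a block into its own descendant.

```python
class Block:
    """A node in the code tree: a single statement or a container such as a loop."""

    def __init__(self, text, children=None):
        self.text = text
        self.children = list(children or [])
        self.parent = None
        for child in self.children:
            child.parent = self

    def insert(self, index, block):
        """Insert new block."""
        block.parent = self
        self.children.insert(index, block)

    def detach(self):
        """Delete block and its contents (the detached subtree is returned)."""
        self.parent.children.remove(self)
        self.parent = None
        return self

    def unwrap(self):
        """Delete just the block but not its contents."""
        parent = self.parent
        index = parent.children.index(self)
        self.detach()
        for offset, child in enumerate(self.children):
            parent.insert(index + offset, child)
        self.children = []

    def move_to(self, new_parent, index):
        """Move block from point "a" to point "b"."""
        self.detach()
        new_parent.insert(index, self)

    def clone(self):
        """Copy block: build an independent copy of this subtree."""
        return Block(self.text, [child.clone() for child in self.children])

    def render(self, depth=0):
        """Indentation is derived from depth; it is never stored."""
        lines = ["    " * depth + self.text]
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines


def surround(parent, start, end, text):
    """Surround selection: wrap parent.children[start:end] in a new block."""
    wrapped = parent.children[start:end]
    del parent.children[start:end]
    block = Block(text, wrapped)
    parent.insert(start, block)
    return block
```

Because surround only accepts a run of sibling nodes, there is no way to express a selection that starts inside one block and ends outside it, so the tree can never become unbalanced. Copying to point "b" is just clone() followed by insert() on the target parent.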
Expressions vs Blocks
While Ref excels at managing blocks of code as a tree structure, it has similar features for expressions. Imagine the following expression:
( ( ( $a + $b ) + 2 ) / $c > $d - $e ) + 5
Now imagine trying to edit this, and not get confused by the parentheses. With a traditional editor, the best you can do (at least that I can think of) would be to reformat it, adding some clarifying parentheses while you're at it:
(
  (
    (
      (
        ( $a + $b )
        +
        2
      )
      /
      $c
    )
    >
    ( $d - $e )
  )
  +
  5
)
(In fact, this is how I write all my expressions; they're just much easier to read.)
Now that this long expression has been divided up into a series of smaller expressions, it's much easier to understand. (Still TBD: how much Ref would do this automatically, and how much would be left to the user to format the way they prefer.)
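If Ref did produce this layout automatically, the mechanics might look something like the sketch below. It borrows Python's own ast module as a stand-in for Ref's parser, so the $ sigils are dropped and only a handful of operators are handled. The function names are made up for illustration.

```python
import ast

# Only the operators needed for the example; a real tool would cover them all.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/",
       ast.Gt: ">", ast.Lt: "<"}


def to_tree(node):
    """Reduce a Python AST to nested (operator, left, right) tuples and leaf strings."""
    if isinstance(node, ast.Expression):
        return to_tree(node.body)
    if isinstance(node, ast.BinOp):
        return (OPS[type(node.op)], to_tree(node.left), to_tree(node.right))
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        return (OPS[type(node.ops[0])], to_tree(node.left),
                to_tree(node.comparators[0]))
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return repr(node.value)
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def render(tree, indent=0):
    """Lay out one operand or operator per line, two spaces per nesting level."""
    pad = "  " * indent
    if isinstance(tree, str):
        return [pad + tree]
    op, left, right = tree
    if isinstance(left, str) and isinstance(right, str):
        # Simple leaf-only expressions stay on a single line.
        return [f"{pad}( {left} {op} {right} )"]
    return ([pad + "("]
            + render(left, indent + 1)
            + ["  " * (indent + 1) + op]
            + render(right, indent + 1)
            + [pad + ")"])


def layout(source):
    return render(to_tree(ast.parse(source, mode="eval")))
```

Calling print("\n".join(layout("(((a + b) + 2) / c > d - e) + 5"))) yields the same 17-line shape as the hand-formatted version, including the clarifying parentheses around d - e, because the grouping comes from the parse tree rather than from the parentheses the programmer typed.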
Regardless of whether Ref actually does this reformatting for you, you'd be able to highlight any expression or subexpression and edit it in its entirety, without having to worry about matching parentheses. (TBD: the exact UI mechanism for the user to specify "this (sub)expression" vs "this chunk of code regardless of whether it's an entire (sub)expression".)
In any case, the point here is that once again Ref removes the burden from the programmer of worrying about exactly where an expression begins and ends.
And, of course, every time you edit a line of code, Ref re-parses it and lets you know immediately if you've made any syntax errors or references to unknown symbols.
If you're accustomed to a traditional editor, Ref will at first seem like a needless restriction on your ability to make any freeform change you want to any line of code. With a traditional editor, until you've finished your edits, your code is technically in a state of syntax error--and as a developer, that's usually fine, because you know how you want it to end up. The problem is, you don't always know if you've created syntactically correct code until your IDE either stops showing unmatched braces, or you attempt to run your code.
With Ref, it's impossible for your code to be in a state of syntax error! This is a feature, not a bug. In fact, it's one of Ref's strengths.
Because sometimes it's just easier to do it the old fashioned way, Ref also has a "cowboy-coding" mode, in which you can cut to the chase and edit code directly, exactly like a traditional editor. Once you exit that mode, though, Ref will warn you about any untenable code you may have written.
Caveats
Depending on what language you're writing in and how far you're pushing its limits, Ref's analysis tools may be limited. For example, if you're using PHP's include feature, Ref will be able to parse and digest any constant filename (e.g. include 'foo.php') but filenames determined at runtime (e.g. include $foo) are beyond Ref's ability to analyze deeply. The same applies to variable variables: Ref doesn't have the ability to determine all possible values of a variable.
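To make the distinction concrete, here is a rough sketch that sorts the include statements in a piece of PHP source into ones that could be followed statically and ones that could not. It uses regular expressions rather than a real PHP parser, so it is deliberately conservative: anything that is not a plain quoted string without a $ in it is treated as dynamic. The function name and return format are assumptions for illustration.

```python
import re

# Matches include/require (optionally _once) up to the terminating semicolon.
INCLUDE_RE = re.compile(
    r"\b(?:include|require)(?:_once)?\b\s*\(?\s*(.+?)\s*\)?\s*;"
)
# A single quoted string with no quotes or $ inside counts as a constant filename.
LITERAL_RE = re.compile(r"""^(['"])([^'"$]+)\1$""")


def classify_includes(php_source):
    """Return a list of ("static", filename) or ("dynamic", expression) pairs."""
    results = []
    for match in INCLUDE_RE.finditer(php_source):
        argument = match.group(1)
        literal = LITERAL_RE.match(argument)
        if literal:
            results.append(("static", literal.group(2)))
        else:
            results.append(("dynamic", argument))
    return results
```

A tool like Ref could follow the static entries and index the files they name, and flag the dynamic ones as places where its symbol table is incomplete.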
I'm pretty sure these seemingly intractable problems are related to the Halting Problem (and to Rice's theorem, which says that non-trivial questions about a program's runtime behavior are undecidable in general), although I'm hoping some Comp Sci genius might be able to give Ref the ability to make some headway in these areas in certain circumstances. Until then, the more of Ref's features you want to take advantage of, the more you'll have to avoid language features that Ref can't analyze.
Okay, But What Does Ref Actually Look Like?
Patience. Sample screenshots are coming soon. Meanwhile, here's a crude mockup:
Name | Type | Defined In File | Line |
---|---|---|---|
$x | int | /home/my_project/my_file.php | 123 |
$y | int | /home/my_project/my_file.php | 124 |
$z | int | /home/my_project/my_file.php | 125 |
Perhaps you're thinking, "Hmmm, except for the missing semicolons at the end of each statement, and the missing braces at the beginning and end of a block, this looks just like a regular editor with some PHP code!"
Correct! Ref's rendering of your code is as close as possible to the way you wrote it. Ref has a toggle to show/hide syntactic sugar such as statement-terminating semicolons and block-enclosing braces. Since those are ultimately just eye candy, some programmers may prefer to hide them. But make no mistake: when you save your "final" code, Ref inserts those where necessary (depending on the source language you're editing).
But let's move past the similarity to your favorite IDE to the real power of Ref.
Since it's quite difficult to make this mockup functional in any meaningful way, I'll have to resort to textual descriptions.
Code Editor
The challenge in building Ref's UI is to allow the user to edit their code in a manner that is as similar as possible to a traditional text editor, while still giving them the ability to treat their code as virtual objects that can be analyzed and "understood" by Ref. That means users need the ability to select an individual item in a line (e.g. a variable or constant) to get stats on that item (see the Introspection Pane), rename it, etc.
One particularly difficult task in creating Ref's UI will be to allow the user to highlight any part of the code, but restrict them from moving lines that don't form a fully self-contained block. For example, if you highlight a function declaration and the first few lines of the function body, you can't meaningfully move that somewhere else because it's not a fully self-contained block of code (because your highlight stops before the end of the function).
This is uncharted territory for me, since I'm not sure what the options should be:
- Outright prevent the user from moving this block?
- Let them move it but attempt to "repair" the "hole" they made?
- Just move the lines that are inside the block (in this case the function body) but leave the function declaration as-is?
- Or let them move the lines they want, but mark the now "headless" function body as incomplete (i.e. syntax error)? (This kind of defeats the purpose of Ref, but on the other hand it needs to be able to deal with this kind of scenario when reading in syntactically incorrect files.)
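Whichever option wins, the editor first has to detect that a selection cuts through a block. A minimal sketch of that check follows, assuming each block is known as an inclusive (start_line, end_line) range. The function names are hypothetical, and the code does not decide between the options above. It only detects the problem and, for the "repair" style options, computes the smallest enclosing selection that would be safe to move.

```python
def straddled_blocks(blocks, sel_start, sel_end):
    """Return the blocks that the selection cuts through.

    A block is cut when the selection contains exactly one of its two
    boundary lines: its header without its end, or its end without its header.
    """
    cut = []
    for start, end in blocks:
        has_start = sel_start <= start <= sel_end
        has_end = sel_start <= end <= sel_end
        if has_start != has_end:
            cut.append((start, end))
    return cut


def is_movable(blocks, sel_start, sel_end):
    """A selection is movable when it cuts through no block at all."""
    return not straddled_blocks(blocks, sel_start, sel_end)


def expand_selection(blocks, sel_start, sel_end):
    """Grow the selection until it no longer cuts through any block."""
    while True:
        cut = straddled_blocks(blocks, sel_start, sel_end)
        if not cut:
            return sel_start, sel_end
        sel_start = min([sel_start] + [start for start, _ in cut])
        sel_end = max([sel_end] + [end for _, end in cut])
```

For example, with a function spanning lines 10 to 20 that contains a loop on lines 12 to 15, selecting lines 10 to 13 cuts through both blocks, while selecting lines 12 to 15 or lines 16 to 18 is fine.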
Most of the other features should operate in a pretty straightforward manner:
- Edit a line: Double-click whitespace to the right of a line (or in the left margin?), or right-click anywhere on the line and select "Edit" from the context menu to edit a line.
- Edit part of a line (e.g. an expression): Double-click the part of the line you want to edit, or right-click the part you want to edit (it will highlight so you can see what you right-clicked) and select "Edit Expression" from the context menu.
- Select line(s): Click the handle in the left margin. You can then use Ctrl/Shift+PgUp/PgDn/ArrowUp/ArrowDown to select multiple lines.
Introspection and Analysis
The top pane is obviously the current file you're editing. The pane at the bottom is similar to the console in your browser's Developer Tools: it shows the symbols (variables, constants, classes, etc.) which are in scope at the given line in the editor that you've selected. If you click any of those symbols, you can jump to the line in the file where they were defined. (Ref's UI also has "Forward" and "Back" buttons so you can navigate forward and backward to the file/line you were at previously, similar to a web browser's forward and back buttons. Note that this is not the same as Undo/Redo, nor is it the same as tracing through your code! It's merely a convenient way to cycle back (or forward) to the places in your code that you visited previously.)
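The "Forward" and "Back" behavior described above is the same two-stack scheme web browsers use, and it is independent of the undo history because it records where you looked, not what you changed. A minimal sketch, with a made-up class name and made-up file names:

```python
class NavigationHistory:
    """Browser-style back/forward over visited (file, line) locations."""

    def __init__(self):
        self._back = []
        self._forward = []
        self.current = None

    def visit(self, file, line):
        """Jump to a new location; this discards any forward history."""
        if self.current is not None:
            self._back.append(self.current)
        self.current = (file, line)
        self._forward.clear()

    def back(self):
        """Return to the previous location, if there is one."""
        if self._back:
            self._forward.append(self.current)
            self.current = self._back.pop()
        return self.current

    def forward(self):
        """Re-visit the location we most recently went back from, if any."""
        if self._forward:
            self._back.append(self.current)
            self.current = self._forward.pop()
        return self.current
```

Clicking a symbol in the bottom pane would call visit() with the location of its definition, and the toolbar buttons would call back() and forward().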
It should be similarly easy to figure out how to invoke and display the analysis tools: errors, warnings, and info messages could each be shown or hidden, and Ref could walk the user through resolving them, giving hints along the way about its best guess as to what the code should say.
I fully expect the Ref development process will raise all sorts of interesting questions and dilemmas about how much freedom the programmer should have to write "bad" code (for various definitions of "bad") vs. how much control the editor should retain to prevent the programmer from shooting themselves in the foot. That discovery process, to me, will be the most exciting part about developing Ref!
The first iteration of Ref will focus largely on editor mechanics, and making it easy for the programmer to just write code, while preventing them from making the most obvious mistakes (typos, undeclared variables, name clashes, etc.). After that, more powerful analysis and introspection tools can be added gradually.
Obtaining Ref
Are you ready to download Ref and start using it? No problem, as long as you're willing to first write it! Currently, Ref is just a grand idea. It will take people with far more talent and time than I have to implement even the most basic features. Thanks for volunteering!
If you're as convinced as I am that Ref could singlehandedly revolutionize the field of software development (and not just for the web but all programming everywhere), please join me in continuing to formalize Ref's features and bring it to fruition. My ultimate goal is to be able to write Ref in my language of choice: Ref!
Contact me at
Related Ideas
- Intentional programming [Wikipedia]
- Source Code in Database [Wikipedia]
- Codebase as Database: Turning the IDE Inside Out with Datalog [Pete Vilter's blog]
- The Great Software Stagnation [Jonathan Edwards' blog]
- Subtext language [research project by Jonathan Edwards]