The Invisible Interface

· 06.27.2013 · etc

Everyone's always fascinated by new modes of (digital) interactions, and there are a lot of interesting and novel ideas around what might be the dominant interaction medium in the future. Touch? Gesture? Voice? Eye-tracking?

Although these modes are where interaction design seems to be heading, I want to revisit a hugely efficient yet largely unappreciated mode, a sibling in some ways to these new ones, that has been around for ages: keyboard shortcuts.

In discussions of interaction design, I seldom, if ever, see keyboard shortcuts mentioned (I'll be talking about the desktop web from here on out, since that's what uses a hardware keyboard). This may be because interaction design by and large seems focused on web design, and keyboard shortcuts have been relegated to the realm of desktop software^[1] (I'm not sure why they didn't fully carry over). But where they exist, they are typically used: interaction designers use them all the time, I'm sure, while working in Illustrator, Photoshop, OmniGraffle, etc. Ironically, though, keyboard shortcuts always seem like an afterthought in the designs produced with this software, if they are thought about at all.

Perhaps keyboard shortcuts are not thought of because design is so focused on immediate intuitiveness and user-friendliness. And to be honest, keyboard shortcuts are not necessarily either of those (well, at first). There is almost always a learning curve to them, and their usage is often associated with only advanced or "power" users. That's a valid concern: you want to entice new users to use your product, and spare them an intimidating or hidden interface. It doesn't have to be that way, especially if shortcuts follow established conventions, so that users can predict how an unfamiliar interface will behave. But in general, I'm advocating keyboard shortcuts not as a replacement for an existing interface but as a supplement to it, especially for products that people may be using for several hours a day, every day.

Physical-Metaphor Interfaces

Lately I've been captured by the idea of invisible interfaces — interfaces that don't necessarily require visual elements. Why is that a good thing? What makes keyboard shortcuts so great? Well, a lot of interface design is still grounded in physical metaphor. You have to move your mouse cursor (or stylus) to a button, which you then press, and then something happens. This is fairly intuitive in that this is how we interact with things in the real world: I have to make a targeted motion to manipulate something.

In human-computer interaction, Fitts's law describes the trade-off between speed and accuracy when working with this type of interface. Smaller or farther targets take longer to "acquire", and trying to acquire them more quickly means sacrificing accuracy.

On the left, D is the distance from the cursor to the target, and S is the width of the target.
Fitts's law is typically expressed as T = a + b * log₂(1 + D/S), where a and b are constants that depend on the input device (mouse, stylus, etc.), and T is the time to acquire the target.

On the right, a keypress is a much more direct means to action.

But in the digital world, we have the benefit of much more direct routes between intent and action. I can hit a combination of keys, and immediately an action is executed. No need to waddle my cursor through space and time to get the job done. The intent-action gap is condensed dramatically, and we can effectively circumvent the constraints of Fitts's law.
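To get a rough feel for that constraint, here is a minimal Python sketch of Fitts's law in its common Shannon form. The constants a and b below are made-up illustrative values, not measurements for any real device, and the function name is my own.

```python
import math

def fitts_time(distance, width, a=0.1, b=0.15):
    """Estimated seconds to acquire a target (Shannon form of Fitts's law).

    a and b are hypothetical device constants chosen for illustration only.
    """
    return a + b * math.log2(1 + distance / width)

# Same distance, shrinking target: the estimate grows as the target narrows.
for width in (200, 50, 10):
    print(width, round(fitts_time(800, width), 3))
```

A keypress has no distance to travel and no target to hit, so the pointing term that dominates this estimate simply does not apply to it.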

And furthermore, the interface doesn't necessarily need to take up any space any more. It's "invisible"; it exists in the muscle memory of the user, and actions can be executed impulsively.

Key Expressions

There is, however, something even more powerful than keyboard shortcuts: keyboard expressions.

That is, certain keys or key combinations correspond to certain actions, which can be chained together like words in a sentence, and you can express more complex actions in a few keystrokes.

Vim in action.

Vim is probably the ultimate manifestation of this approach. Vim is a text editor favored by programmers^[2] for its extreme efficiency, and notorious for being difficult to learn. Its steep learning curve can be frustrating, but once you learn it, the amount of time and effort it saves you is seemingly infinite.

In Vim, certain keys are mapped to certain actions, and you can express complex chains of action in a few keystrokes. There are really only a handful of keys and a bit of syntax you need to know, but their combinatorial power can be very potent. These expressions make Vim one of the most elegant and poetic tools I've ever used.

Say, for a somewhat contrived example, that you're editing a document, you're somewhere in the middle of it and you wanted to delete the first line and then return to the line you're currently on.

In a normal text editor, you'd grab your mouse, move up to that line, select it all, then hit delete, then move the mouse back to the line you were on. This requires a degree of precision, especially if you're moving quickly, to position the mouse over the correct line (if you look closely, when selecting the original line again, I accidentally select the line below at first). We have to worry about Fitts's law here.

In Vim, all you have to do is type:

ggdd``

gg jumps you to the top of the document, dd deletes the line you're on, then `` jumps you back to where you were before. The discreteness of the keystroke — that is, it's pressed or it's not — means we can't accidentally select the wrong line^[3]. Here, the `` command will resolutely and absolutely bring you back to the last line you were on; the computer won't accidentally jump you to an adjacent line.

It may not seem like a big difference, but this is just scratching Vim's surface, and if this is something you're doing a lot, it saves you a great deal of time and headache.

The real power of Vim is that these keystroke combinations are a language. You "say" what you want to do. Want to delete the next 10 lines of text? You can just type:

10dd

To break it down, what you're "saying" is:

10 = "10 times..."

dd = "execute the delete line command"
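The "count, then command" grammar can be sketched in a few lines of Python. This is a toy model covering only the three commands used in this post (gg, dd and ``), not how Vim itself is implemented; the function name and the way the remembered position is tracked are my own assumptions.

```python
import re

# Hypothetical toy grammar: an optional count followed by one command.
TOKEN = re.compile(r"([1-9]\d*)?(gg|dd|``)")

def run(keys, lines, cursor):
    """Apply `keys` to `lines` (a list of str); return (lines, cursor).

    `cursor` is a 0-based line index.
    """
    lines = list(lines)
    previous = cursor  # the position that `` returns to
    pos = 0
    while pos < len(keys):
        match = TOKEN.match(keys, pos)
        if match is None:
            raise ValueError(f"unrecognized keys at {keys[pos:]!r}")
        count = int(match.group(1) or 1)
        command = match.group(2)
        if command == "gg":
            # Jump to line `count` (the first line by default).
            previous = cursor
            cursor = max(min(count - 1, len(lines) - 1), 0)
        elif command == "dd":
            # Delete `count` lines starting at the cursor.
            end = min(cursor + count, len(lines))
            removed = end - cursor
            del lines[cursor:end]
            # Keep the remembered position pointing at the same text.
            if previous >= end:
                previous -= removed
            elif previous >= cursor:
                previous = cursor
            cursor = min(cursor, max(len(lines) - 1, 0))
        elif command == "``":
            # Swap the current and remembered positions.
            previous, cursor = cursor, previous
        pos = match.end()
    return lines, cursor

doc = ["title", "a", "b", "c", "d"]
new_doc, line = run("ggdd``", doc, 3)  # start on the line containing "c"
print(new_doc, new_doc[line])
```

Because every token is read the same way, adding a new command to this sketch means adding one branch, and the count prefix works with it for free. That is the combinatorial payoff of treating keystrokes as a language.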

The Invisible Interface

Here's a more realistic example.

Think about some sort of office software, say a presentation creation application. It will have a fairly complex interface due to the sheer amount of actions available — you have certain actions for type, such as changing font size, italicizing, underlining, and other formatting options, and then certain actions for a shape, such as coloring, size, position, stroke size, and so on. To mitigate this onslaught of options, actions are stuffed in menus, and a select few are surfaced as keyboard shortcuts.

What if this application had an invisible interface like Vim's? Say I'm on slide 10, and I want to move this slide's title, "Space and Times", to slide 22. In a traditional interface, I'd have to visually scan for the title, then move the cursor to select it, then hit CTRL+X to cut it out, then move over to the sidebar that lists all the slides, possibly scroll down this sidebar until I see slide 22, then select slide 22, then paste in the title.

With an expressive keystroke language, I could accomplish the same with just:

/Spac<Enter>xxg22gpp

To break this down:

/ = "start searching for an object starting with the text..."

Spac = "Spac" (matches the text object containing "Space and Times")

<Enter> = (hit Enter) "select this matching object"

xx = "and cut it"

g = "then go to slide..."

22 = "22"

g = (confirm the go to movement)

pp = "and then paste"
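The breakdown above maps directly onto a small parser. The Python sketch below is purely hypothetical, since this keystroke language belongs to an imagined application: the action names are invented, and the literal text <Enter> stands in for pressing the Enter key.

```python
import re

# Hypothetical grammar for the imagined presentation app; none of these
# commands belong to a real product.
PATTERNS = [
    ("select", re.compile(r"/(.+?)<Enter>")),  # /text<Enter>
    ("goto", re.compile(r"g(\d+)g")),          # g<slide number>g
    ("cut", re.compile(r"xx")),
    ("paste", re.compile(r"pp")),
]

def parse(keys):
    """Turn a keystroke string into a list of (action, argument) pairs."""
    actions = []
    pos = 0
    while pos < len(keys):
        for name, pattern in PATTERNS:
            match = pattern.match(keys, pos)
            if match:
                # Only patterns with a capture group carry an argument.
                argument = match.group(1) if pattern.groups else None
                actions.append((name, argument))
                pos = match.end()
                break
        else:
            raise ValueError(f"unrecognized keys at {keys[pos:]!r}")
    return actions

print(parse("/Spac<Enter>xxg22gpp"))
```

The parser only turns keystrokes into a list of intended actions; a real application would still need to execute each one against its document model.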

This might look like complicated gibberish, but in practice it's very fluid, and once you're used to it, it's hard to go back to physical-metaphor interfaces.

Beyond the Keyboard

These ideas can be expanded beyond hardware keyboard inputs to other inputs as well. Broadly speaking, the general idea here is that, with a set of limited, distinguishable inputs, you can craft an interaction "language", expressed through meaningful combinations of input values, vastly expanding the power of the few inputs. This can decrease reliance on visual elements for input, which are often single-purpose (i.e. you click a button and it triggers a single, specific action). Gestural interfaces, in addition to other trending interfaces, might fall into this categorization.

Does this approach make sense for all interfaces? Not necessarily. There are concerns, for instance, of satisficing, where users tend to opt for suboptimal, but low-penalty, behaviors, preferring to settle for less-than-best because the best requires an investment of time and effort. Of course, if your interactions with a particular system are short and infrequent, that strategy makes sense. But even with interfaces where there is repeated and prolonged engagement, people typically continue to satisfice. The initial investment of time and effort is off-putting, and people are terrible at evaluating long-term gains against short-term costs. For example, even though the Dvorak keyboard layout is often claimed to be more efficient and easier on the hands than the QWERTY keyboard layout (which is a vestigial pattern from typewriters), hardly anyone uses it because it's too damn inconvenient to learn.

But I believe it's at least important to consider this option. Within these interfaces is a potential for much more fluid and efficient, and even enjoyable (Vim is really fun to use), interactions. And it's interesting to move away from a reliance on visual digital interfaces and start exploring one that we carry with us, one that exists in muscle memory.

1. One exception is Google Docs, which has an extensive set of keyboard shortcuts and is arguably directly modeled off of desktop software.

2. There's also Emacs.

3. This doesn't mean you can't make any errors in Vim — you can still hit the wrong key, of course!