Discovering high-quality comments and discussions

12.05.2014 15:34

Say you have a lot of users commenting on your service, and inevitably all the pains of unfettered (pseudo-)anonymous chatter emerge - spam, abusive behavior, trolling, meme circlejerking, etc. How do you sort through all this cruft? You can get community feedback - up and down votes, for example - but then you have to make sense of all of that too.

There are many different approaches for doing so. Is any one the best? I don’t think so - it really depends on the particular context and the nuances of what you hope to achieve. There is probably not a single approach generalizable to all social contexts. We won’t unearth some grand theory of social physics which could support such a solution; things don’t work that way.

Nevertheless! I want to explore a few of those approaches and see if any come out more promising than the others. This post isn’t going to be exhaustive but is more of a starting point for ongoing research.

Goals

There are a lot of different goals a comment ranking system can be used for, such as:

  • detection of spam
  • detection of abusive behavior
  • detection of high-quality contributions

Here, we want to detect high-quality contributions and minimize spam and abusive behavior. The three are interrelated, but the focus is detecting high-quality contributions, since we can assume that it encapsulates the former two.

Judge

First of all, what is a “good” comment? Who decides?

Naturally, it really depends on the site. If it’s some publication, such as the New York Times, which has in mind a particular kind of atmosphere they want to cultivate (top-down), that decision is theirs.

If, however, we have a more community-driven (bottom-up) site built on some agnostic framework (such as Reddit), then the criteria for “good” is set by the members of that community (ideally).

Size and demographics of the community also play a large part in determining these criteria. As one might expect, a massive site with millions of users has very different needs than a small forum with a handful of tight-knit members.

So what qualifies as good will vary from place to place. Imgur will value different kinds of contributions than the Washington Post. As John Suler, who originally wrote about the Online Disinhibition Effect, puts it:

According to the theory of “cultural relativity,” what is considered normal behavior in one culture may not be considered normal in another, and vice versa. A particular type of “deviance” that is despised in one chat community may be a central organizing theme in another.

To complicate matters further, one place may value many different kinds of content. What people are looking for varies day to day: sometimes something silly is wanted, sometimes something more cerebral (BuzzFeed is good at appealing to both impulses). But very broadly, there are two categories of user contributions which may be favored by a particular site:

  1. low-investment: quicker to digest, shorter, easier to produce. Memes, jokes, puns, and their ilk usually fall under this category. These comments are more likely made based on the headline rather than the content of the article itself.
  2. high-investment: lengthier, requires more thought to understand, and more work to produce.

For our purposes here, we want to rank comments according to the latter category - in general, comments tend towards the former, so creating a culture of that type of user contribution isn’t really difficult. It often happens on its own (the “Fluff Principle”[1]), and that’s what many services want to avoid! Consistently high-quality, high-investment user contribution is the holy grail.

I should also note that low-investment and high-investment user contributions are both equally capable of being abusive and offensive (e.g. a convoluted racist rant).

Comments vs discussions

I’ve been talking about high-quality comments but what we really want are high-quality discussions (threads). That’s the whole point of a commenting system, isn’t it? To get users engaged and discussing whatever the focus of the commenting is (an article, etc). So the relation of a comment to its subsequent or surrounding discussion will be an important criterion.

Our definition of high-quality

With all of this in mind, our definition of high-quality will be:

A user contribution is considered high quality if it is not abusive, is on-topic, and incites or contributes to civil, constructive discussion which integrates multiple perspectives.

Other considerations

Herd mentality & the snowball effect

We want to ensure that any comment which meets our criteria has an equal opportunity of being seen. We don’t want a snowball effect[2] where the highest-ranked comment always has the highest visibility and thus continues to attract the most votes. Some kind of churn is necessary for keeping things fresh, but also so that no single perspective dominates the discussion. We want to discourage herd mentality and other polarizing triggers[3].

Gaming the system

A major challenge in designing such a system is preventing anyone from gaming it. That is, we want to minimize the effects of Goodhart’s Law:

When a measure becomes a target, it ceases to be a good measure.

This includes automated attacks (e.g. bots promoting certain content) or corporate-orchestrated manipulation such as astroturfing (where a company tries to artificially generate a “grassroots” movement around a product or policy).

Minimal complexity

On the user end of things, we want to keep things relatively simple. We don’t want to have to implement complicated user feedback mechanisms for the sake of gathering more data solely for better ranking comments. We should leverage existing features of the comment (detailed below) where possible.

Distinguish comment ranking from ranking the user

Below I’ve included “user features” but we should take care that we judge the comment, not the person. Any evaluation system needs to recognize that some people have bad days, which doesn’t necessarily reflect on their general behavior[4]. And given enough signals that a particular behavior is not valued by the community, users can adjust their behavior accordingly - we want to keep that option open.

No concentration of influence

Expanding on this last point, we don’t want our ranking system to enable an oligarchy of powerusers, which has a tendency of happening in some online communities. This can draw arbitrary lines between users; we want to minimize social hierarchies which may work to inhibit discussion.

Transparency

Ideally whatever approach is adopted is intuitive enough that any user of the platform can easily understand how their submissions are evaluated. Obscurity is too often used as a tool of control.

Adaptable and flexible

Finally, we should be conscious of historical hubris and recognize that it’s unlikely we will “get it right” because getting it right is dependent on a particular cultural and historical context which will shift and evolve as the community itself changes. So whatever solution we implement, it should be flexible.

Our goals

In summary, what we hope to accomplish here is the development of some computational approach which:

  • requires minimal user input
  • minimizes snowballing/herd mentality effects
  • is difficult to game by bots, astroturfers, or other malicious users
  • promotes high-quality comments and discussions (per our definition)
  • penalizes spam
  • penalizes abusive and toxic behavior
  • is easy to explain and intuitive
  • maintains equal opportunity amongst all commenters (minimizes poweruser bias)
  • is adaptable to changes in community values

What we’re working with

Depending on how your commenting system is structured, you may have the following data to work with:

  • Comment features:
    • noisy user sentiment (simple up/down voting)
    • clear user sentiment (e.g. Slashdot’s more explicit voting options, such as “insightful”)
    • comment length
    • comment content (the actual words that make up the comment)
    • posting time
    • the number of edits
    • time interval between votes
  • Thread features:
    • length of the thread’s branches (and also the longest, min, avg, etc branch length)
    • number of branches in the thread
    • moderation activity (previous bans, deletions, etc)
    • total number of replies
    • length of time between posts
    • average length of comments in the thread
  • User features:
    • aggregate historical data of this user’s past activity
    • some value of reputation

Pretty much any commenting system includes a feature where users can express their sentiment about a particular comment. Some are really simple (and noisy) - basic up/down voting, for instance. It’s really hard to tell what exactly a user means by an up or downvote - myriad possible reactions are reduced to a single surface form. Furthermore, the low cost of submitting a vote means votes will be used more liberally, which is perhaps the intended effect. High-cost unary or binary voting systems, such as Reddit Gold (which requires purchase before usage), have a scarcity which is a bit clearer in communicating the severity of the feedback. The advantage of low-cost voting is that it’s more democratic - anyone can do it. High-cost voting introduces barriers which may bar certain users from participating.

Other feedback mechanisms are more sophisticated and more explicit about the particular sentiment the user means to communicate. Slashdot allows randomly-selected moderators to specify whether a comment was insightful, funny, flamebait, etc, where each option has a different voting value.
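As a sketch, a moderation system of this sort might map each label to a weight and sum them (the labels and weights below are illustrative assumptions, not Slashdot’s actual values):

VOTE_WEIGHTS = {
    "insightful": 2.0,
    "funny": 1.0,
    "flamebait": -2.0,
}

def labeled_score(labels):
    # labels: the list of labels moderators have applied to a comment;
    # unknown labels contribute nothing
    return sum(VOTE_WEIGHTS.get(label, 0.0) for label in labels)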

I’ve included user features here, but as mentioned before, we want to rate the comment and discussion while trying to be agnostic about the user. So while you could have a reputation system of some kind, it’s better to see how far you can go without one. Any user is wholly capable of producing good and bad content, and we care only about judging the content.


Approaches

Caveat: the algorithms presented here are just sketches meant to convey the general idea behind an approach.

Basic vote interpretation

The simplest approach is to just manipulate the explicit user feedback for a comment:

score = upvotes - downvotes

But this approach has a few issues:

  • scores are biased towards post time. The earlier someone posts, the greater visibility they have, therefore the greater potential they have for attracting votes. These posts stay at the top, thus maintaining their greater visibility and securing that position (snowballing effect).
  • it may lead to domination and reinforcement of the most popular opinion, drowning out any alternative perspectives.

In general, simple interpretations of votes don’t really provide an accurate picture.

A simple improvement here would be to take into account post time. We can penalize older posts a bit to try and compensate for this effect.

score = (upvotes - downvotes) - age_of_post

Reddit used this kind of approach but ended up biasing post time too much, such that those who commented earliest typically dominated the comments section. Or you could imagine someone who posts at odd hours - by the time others are awake to vote on it, the comment is already buried because of the time penalty.

This approach was later replaced by a more sophisticated interpretation of votes, taking them as a statistical sample rather than a literal value and using that to estimate a more accurate ranking of the comment (this is Reddit’s “best” sorting):

If everyone got a chance to see a comment and vote on it, it would get some proportion of upvotes to downvotes. This algorithm treats the vote count as a statistical sampling of a hypothetical full vote by everyone, much as in an opinion poll. It uses this to calculate the 95% confidence score for the comment. That is, it gives the comment a provisional ranking that it is 95% sure it will get to. The more votes, the closer the 95% confidence score gets to the actual score.

If a comment has one upvote and zero downvotes, it has a 100% upvote rate, but since there’s not very much data, the system will keep it near the bottom. But if it has 10 upvotes and only 1 downvote, the system might have enough confidence to place it above something with 40 upvotes and 20 downvotes – figuring that by the time it’s also gotten 40 upvotes, it’s almost certain it will have fewer than 20 downvotes. And the best part is that if it’s wrong (which it is 5% of the time), it will quickly get more data, since the comment with less data is near the top – and when it gets that data, it will quickly correct the comment’s position. The bottom line is that this system means good comments will jump quickly to the top and stay there, and bad comments will hover near the bottom.

Here is a pure Python implementation, courtesy of Amir Salihefendic:

from math import sqrt

def _confidence(ups, downs):
    # Lower bound of the Wilson score interval: a conservative
    # estimate of the true upvote proportion given the sample so far
    n = ups + downs

    if n == 0:
        return 0

    z = 1.0  # 1.0 ~ 85% confidence, 1.6 ~ 95%
    phat = float(ups) / n
    return (phat + z*z/(2*n) - z*sqrt((phat*(1-phat) + z*z/(4*n))/n)) / (1 + z*z/n)

def confidence(ups, downs):
    if ups + downs == 0:
        return 0
    else:
        return _confidence(ups, downs)
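As a quick sanity check against the example in the quote above (with z = 1.0), the 10-up/1-down comment does indeed outrank the 40-up/20-down one:

confidence(10, 1)   # ~0.79
confidence(40, 20)  # ~0.60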

Reddit’s Pyrex implementation is available here.

The benefit here is you no longer need to take post time into account. When the votes arrive doesn’t matter - all that matters is the sample size!

Manipulating the value of votes

How you interpret votes depends on how you collect and value them. Different interaction designs for voting systems, even if the end result is just an up or downvote, can influence the value users place on a vote and the conditions under which they submit one.

For instance, Quora makes only upvoters visible and hides downvoters. This is a big tangent so I will just leave it at that for now.

In terms of valuing votes, it’s been suggested that votes should not be equal - that they should be weighted according to the reputation of the voter:

  • Voting influence is not the same for all users: it’s not 1 (+1 or -1) for everyone but in the range 0-1.
  • When a user votes for a content item, they also vote for the creator (or submitter) of the content.
  • The voting influence of a user is calculated using the positive and negative votes that they have received for their submissions.
  • Exemplary users always have a static maximum influence.

Although, like any system where power begets power, there’s potential for a positive feedback loop where highly-voted users amass all the voting power and then you have a poweruser oligarchy.
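A minimal sketch of what such weighting might look like, assuming a user’s influence is simply the fraction of positive votes their own submissions have received (the neutral default for new users and the exemplary flag are assumptions for illustration):

def voting_influence(received_ups, received_downs, exemplary=False):
    # Influence in the range 0-1, derived from the votes this user's
    # own submissions have received
    if exemplary:
        return 1.0  # exemplary users keep a static maximum influence
    total = received_ups + received_downs
    if total == 0:
        return 0.5  # neutral starting point for new users (an assumption)
    return received_ups / float(total)

def weighted_score(votes):
    # votes: a list of (direction, influence) pairs, direction being +1 or -1
    return sum(direction * influence for direction, influence in votes)

Under this scheme, a downvote from a well-regarded user counts for more than one from a user whose own submissions are poorly received.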

Using features of the comment

For instance, we could use the simple heuristic: longer comments are better.

Depending on the site, there is some correlation between comment length and comment quality. This appears to be the case with Metafilter and Reddit (or at least, the more discussion-driven subreddits tend to have longer comments).
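As a sketch, a length signal could be folded into an existing score as a log-scaled bonus (the log scaling and the weight knob are assumptions for illustration, not an established formula):

from math import log

def length_adjusted(base_score, comment_text, weight=0.1):
    # Log scaling so a rambling 2000-character comment doesn't
    # dwarf a sharp 200-character one; weight is a tuning knob
    return base_score + weight * log(1 + len(comment_text))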

Using the structure of the thread

Features of the thread itself can provide a lot of useful information.
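For instance, given a nested representation of a thread, the branch features listed earlier can be computed directly. A minimal sketch, assuming each comment is a dict with an id and a list of replies:

def branch_lengths(comment, depth=1):
    # Return the length of every branch (root-to-leaf path) in a
    # thread represented as {"id": ..., "replies": [...]}
    if not comment.get("replies"):
        return [depth]
    lengths = []
    for reply in comment["replies"]:
        lengths.extend(branch_lengths(reply, depth + 1))
    return lengths

def thread_features(root):
    lengths = branch_lengths(root)
    return {
        "num_branches": len(lengths),
        "longest_branch": max(lengths),
        "avg_branch_length": sum(lengths) / float(len(lengths)),
    }

Long branches suggest sustained back-and-forth discussion; many short branches suggest drive-by reactions.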

Srikanth Narayan’s tldr project visualizes some heuristics we can use for identifying certain types of threads.


Conclusion

Here I’ve only discussed a few approaches for evaluating the quality of comments and using that information to rank and sort them. But there are other techniques which can be used in support of the goals set out at the start of this post. For example, posting frequency or time interval limits could be set to discourage rapid (knee-jerk?) commenting; the usage of real identities could be made mandatory (à la Facebook - not a very palatable solution); posting privileges could be scaled (à la Stack Overflow) or limited to only certain users (à la Metafilter). Gawker’s Kinja allows the thread parent to “dismiss” any child comments. You could provide explicit structure for responses, as the New York Times has experimented with. Disqus creates a global reputation across all its sites, but preserves some frontend pseudonymity (you can appear with different identities, but the backend ties them all together).

Spamming is best handled by shadowbanning, where users don’t know that they’re banned and have the impression that they are interacting with the site normally. Vote fuzzing is a technique used by Reddit where actual vote counts are obscured so that voting bots have difficulty verifying their votes or whether or not they have been shadowbanned.
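A toy version of vote fuzzing might look like the following (the jitter scheme is an assumption; Reddit doesn’t publish the exact details):

import random

def fuzzed_counts(true_ups, true_downs, jitter=0.1):
    # Display slightly perturbed counts so a bot can't confirm
    # whether its vote registered (or whether it's shadowbanned);
    # ranking still uses the true counts internally
    def fuzz(n):
        delta = random.randint(-int(n * jitter) - 1, int(n * jitter) + 1)
        return max(0, n + delta)
    return fuzz(true_ups), fuzz(true_downs)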

What I’ve discussed here are technical approaches, which alone cannot solve many of the issues which plague online communities. The hard task of changing peoples’ attitudes is pretty crucial too.

As Joshua Topolsky of the Verge ponders:

Maybe the way to encourage intelligent, engaging and important conversation is as simple as creating a world where we actually value the things that make intelligent, engaging and important conversation. You know, such as education, manners and an appreciation for empathy. Things we used to value that seem to be in increasingly short supply.


[1]: The Fluff Principle, as described by Paul Graham:

on a user-voted news site, the links that are easiest to judge will take over unless you take specific measures to prevent it. (source)

User LinuxFreeOrDie also runs through a theory on how user content sites tend towards low-investment content. The gist is that low-investment material is quicker to digest and more accessible, thus more people will vote on it more quickly, so in terms of sheer volume, that content is most likely to get the highest votes. A positive feedback effect happens where the low-investment material subsequently becomes the most visible, therefore attracting an even greater share of votes.

[2]: Users are pretty strongly influenced by the existing judgement on a comment, which can lead to a snowballing effect (at least in the positive direction):

At least when it comes to comments on news sites, the crowd is more herdlike than wise. Comments that received fake positive votes from the researchers were 32% more likely to receive more positive votes compared with a control… And those comments were no more likely than the control to be down-voted by the next viewer to see them. By the end of the study, positively manipulated comments got an overall boost of about 25%. However, the same did not hold true for negative manipulation. The ratings of comments that got a fake down vote were usually negated by an up vote by the next user to see them. (source)

[3]: Researchers at the George Mason University Center for Climate Change Communication found that negative comments set the tone for a discussion:

The researchers were trying to find out what effect exposure to such rudeness had on public perceptions of nanotech risks. They found that it wasn’t a good one. Rather, it polarized the audience: Those who already thought nanorisks were low tended to become more sure of themselves when exposed to name-calling, while those who thought nanorisks are high were more likely to move in their own favored direction. In other words, it appeared that pushing people’s emotional buttons, through derogatory comments, made them double down on their preexisting beliefs. (source)

[4]: Riot Games’ player behavior team found that toxic behavior is typically sporadically distributed amongst normally well-behaved users:

if you think most online abuse is hurled by a small group of maladapted trolls, you’re wrong. Riot found that persistently negative players were only responsible for roughly 13 percent of the game’s bad behavior. The other 87 percent was coming from players whose presence, most of the time, seemed to be generally inoffensive or even positive. These gamers were lashing out only occasionally, in isolated incidents–but their outbursts often snowballed through the community. (source)


No Monoliths

10.09.2014 15:28


The dust has mostly settled around Ello, the new social network that promises no information tracking or selling of user data to advertisers. Ello has been popular and refreshing because of these policies. Of course, there’s concern and criticism around whether the folks behind Ello can be taken at their word, and around the long-term business viability of this strategy - their current plan (as of 10/08/2014) for revenue is selling premium features:

We occasionally offer special features to our users. If we create a special feature that you really like, you may choose to support Ello by paying a very small amount of money to add that feature to your Ello account.

And since Ello, it seems, blew up prematurely, its privacy policy has also drawn fire, though the team has promised to tighten things up.

Aside from these problems, which are understandable for any nascent social network, Ello has been a great opportunity to evaluate what we want and expect from social networks, and the inadequacies of the currently dominant services. Ello has frequently been contrasted against Facebook (naturally), whose policies are always found problematic by at least some subset of its users. Facebook’s selling of personal data and excessive tracking is the main thing Ello positions itself against, and a big part of what has drawn people to the platform.

I don’t think Ello is a solution to these problems. Ello only challenges the symptoms of social networks like Facebook - that is, of a centrally-controlled social network that, whether intentionally or not, is perceived to be the one platform everyone needs to be on. For social networking services structured in this way, I believe it is inevitable that user data eventually gets collected (and on Ello, it does, ostensibly to improve the site) and sold. It is a feature of this structure that you, as a user, must entrust your data to a third party which you do not know and with which you have no personal relationship. What happens with that data is then at the discretion of that third party.

To really resolve these problems, we must challenge the fundamental structure of these services. An ideal solution is one similar to that which the ill-fated Diaspora pursued, but perhaps taken a step further. I would love to see a service which allows users to self-host their own social network for their friends or community/communities (or for the less technically-oriented, spawn a cloud-hosted version at the click of a button).

Each community then has the opportunity to manage its own data, implement its own policies, make its own decisions about financial support for the network (e.g. have users pay membership fees, run on donations, or even sell data for ads if the users are ok with it).

Each network can run independently on its own social norms, but all the networks are technically interoperable. So if I wanted to I could join multiple networks, post across networks, and so on, all with the same identity.

But this kind of plurality of networks better acknowledges that people have multiple identities for different social contexts, something which monolithic social networks (i.e. one platform for all things social) are not well-suited for. With the latter, all activity is tied to one identity, and then there’s a great deal of manual identity management to keep each social sphere properly contained. With this parallel network design, users can - if they want - share an identity across multiple networks, or they can have different identities for different networks, which for the user are all linked to a private master identity (thus making management a bit easier). To others, each identity appears as a distinct user.

It’s something worth trying.


Anonymity in Online Spaces

06.23.2014 17:15

Disallowing anonymity is a very brutish way of curbing bad behavior online. In certain circumstances it makes sense; there are probably particular online spaces where anonymity doesn’t add much - sites like LinkedIn, for example, which are explicitly meant to bind your real-world identity to the internet. But sites that are for discussion, expression, leveraging distributed expertise, etc - as opposed to merely emulating IRL networks and interactions online - use it sloppily, a lame bandage for problems they’d rather not consider with more nuance. Anonymity needs to be appreciated as a unique feature of the internet, adding dimensions to such platforms which significantly distinguish them from the limitations of meatspace.

Using real identities is appealing for a few reasons. The most obvious is that it attaches an IRL identity to online behavior and ideally transfers some of those IRL dimensions which make us less likely to act like assholes to our online activity. For instance, the accountability you have to your social relationships is transferred.

Related to that, it creates continuity in your online behavior by establishing a record of it. Over time this record becomes valuable, if only because of the time investment put into it, and it provides access to past behavior from which future behavior can be (however unreliably) interpreted. So people are less likely to act like assholes because doing so jeopardizes their ability to participate in other online communities.

Finally, there’s the simple fact that using a real identity can introduce friction compared to anonymous systems. Anonymous commenting systems often make it so easy to post (just type and send, no authentication process to slow you down) that you’re likely to get more knee-jerk reactions. The inconvenience of the flow for associating your real identity is often enough to deter these sorts of low-value contributions.


The Dream of the Internet

06.21.2014 14:30


It is upsetting though unsurprising that, amidst our current internet-driven company bonanza, much of the original spirit of the internet has been lost. Yes, most internet companies are founded on networking ideals of connection and communication - the most universally lauded (read: marketed) aspect of internet services - but these manipulatively uplifting appeals overshadow, perhaps intentionally, an equally powerful promise of the internet’s architecture.

The internet’s architecture is decentralized by nature: it is about individual computers communicating with one another[1]. But the dominant model is completely antithetical to that: communication between users on the internet is now almost entirely mediated through corporate-controlled and centralized servers (e.g. the servers of social networking services).

The internet services through which most of us find value are typically centralized. When I access Gmail, I am gathering my mail from servers concentrated under Google. When I’m syncing files from Dropbox, those files are coming from servers concentrated under Dropbox. When I send messages to friends on Facebook, those messages are routed through servers concentrated under Facebook.

Centralized internet

If you’ve been around for the past couple decades then you are well aware that the no-cost distribution and infinite replicability of digital information has undermined many industries founded on the scarcity of their products (i.e. piracy). Anyone appreciative of this unique quality of digital information is likely to wonder: why is this model of infinite replicability and distribution paradoxically absent from services - Google, Dropbox, Facebook, et al - which are digital, born and bred?

It is because the business models of these companies are not about providing digital services. They are about consolidation. Their value is directly derived from being centrally positioned within the network and extracting data (and data == value) from all communication that must pass through it.

This consolidation is created by controlling access. When discussing internet services, we tend to gloss over their material foundations. But these companies are about creating a scarcity of service, which is rooted in the concentration of the hardware running the service. Control over the software - that is, the prevention of its distribution and replication - is necessary because it enables control over the hardware as well. Only Facebook, Inc. can administer the Facebook software; thus it will run only on their hardware. If I want to access the service, I must access it on their hardware. Thus my communication must go through their hardware, the wellspring of their value. And because that hardware belongs to them, they control access to it - even if their user policies say otherwise.


The principles of open source software (OSS)[2] are meant to counteract this balkanization of the internet (or the formation of the “Splinternet”). OSS is fundamental to supporting the decentralized spirit of the internet. Anyone can run OSS on their own servers and provide the service to their own community or simply to themselves. I don’t have to access the service on an untrusted party’s hardware: I can access it on a friend’s server or even on my own computer. Open source software enables the freedom to access services on hardware that you or someone you trust controls.

Decentralized internet

Diaspora is an example of an open source, decentralized alternative to conventional social networking services (such alternatives are collectively known as “the federated web”). Individuals or organizations can host their own “pods” (servers running the Diaspora software) so that the service is physically distributed across computers that are not concentrated under any one group. However, your identity on the network is portable, so the experience is similar to that of a centralized service: you can access any Diaspora pod without really noticing the difference.

For example, say a group of friends decides to host a Diaspora server (pod) and you sign up to the service through it. You’re all free to interact with each other on it, and your personal data remains on that server, under your jurisdiction. You control the access to it.

If you meet someone who is part of a different pod, that’s no problem - you can still communicate with them because the service functions as a cohesive whole.


We could even go a step deeper than the software. Where needed, we should look to the level of collectively defining protocols, or more widespread adoption of those that already exist. A protocol is a set of standardized rules or a “language” which developers can implement in their own software. Provided that everyone adheres to the protocol, different systems can communicate reliably. Thus individuals and communities can run OSS on their own servers, and these servers can communicate amongst each other. Though the hardware is distributed amongst independent hosts, the standardization at the software layer forms a cohesive whole in the final user experience. Thus you can achieve the sensation of a centralized service with the crucial feature that the data is not concentrated under the control of a single entity.

A familiar example is email. There are a few email protocols which you may have seen when digging deep into your Gmail settings: SMTP (for outgoing mail) and IMAP and POP3 (for incoming mail). There are many, many different email services - Gmail, Hotmail, Yahoo, Fastmail, etc - yet they are all able to communicate with each other because they adhere to these common sets of rules. I can send an email from Gmail and a Hotmail user will receive it without issue. The added benefit here is that the user is not locked into any particular software experience - they can choose from many clients, or even roll their own, and they will all work so long as they stick to the protocol.
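To make the idea concrete, here’s a minimal sketch of handing a message to an SMTP server with Python’s standard library (the addresses, relay host, and credentials are placeholders):

import smtplib
from email.message import EmailMessage

# Any client that speaks SMTP can hand mail to any compliant server;
# which provider sits on the other end doesn't matter
msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.org"
msg["Subject"] = "Protocols, not platforms"
msg.set_content("This crosses providers because everyone speaks SMTP.")

with smtplib.SMTP("smtp.example.com", 587) as server:  # hypothetical relay
    server.starttls()
    server.login("alice@example.com", "app-password")   # placeholder credentials
    server.send_message(msg)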

For example: imagine if you could favorite a tweet through Facebook. You can’t right now, because the two services do not share a common standard for how a “favorite” is registered in the software. A favorite on Twitter is not equivalent to a like on Facebook, although what the user is trying to communicate through each action may be equivalent. For the user this is inflexible: your activity on one network is not portable at all to another network.

OStatus is an open protocol which attempts to standardize these social interactions. I could host my own social networking service adhering to the OStatus protocol and someone else could have their own service completely independent to mine. So long as their service also implements the OStatus protocol, users will be able to interact across the platforms, and are thus afforded a unique mobility not present in the social networking ecosystem today.


There are a number of decentralized alternatives to popular, centralized services, Diaspora among them.

Some of these services still require work and effort; certainly they are at a disadvantage when set against capital-laden organizations. But they are worthwhile projects and needed alternatives. The effort is worth it.


Of course, a major appeal of centralized services is their ease of use for non-technical folk. Someone with no experience provisioning servers will have a very hard time deploying a service on their own. It’s often a pain even for me. And it’s hard to fully appreciate the decentralization potential of the internet if you have no idea how it works. But these are problems which can be solved with some education and discussion.

It’s clear that, with this tightening grip over the flow of users and their information across networks, the original dream of the internet has been lost - but it has also become more crucial than ever. The vision of communities running their own servers with open source software, so they have control over access to their own data, is still within reach.

1. It's worth noting that the physical connection between two computers on the internet is typically routed through other hardware controlled by others, such as your ISP. This is where strong encryption practices come into play: while your data travels through many other devices, practically speaking only you and your intended recipient have access to it. Furthermore, initiatives around mesh networking are trying to replace centralized ISP hardware with a distributed model of independently-run nodes.

2. OSS has the additional crucial quality of transparency: anyone can independently audit the code and influence its development (in theory at least: sometimes you have projects lorded over by a single owner). Its development is more likely to reflect the demands of the community which uses and depends on it, as opposed to an external party in what inevitably is an asymmetric relationship of service controller and end user. Not every user, of course, will be directly involved in its development, but OSS at least allows the possibility.


The Overmind vs the Hivemind

06.17.2014 22:04

I was introduced to the Overmind last week, a Starcraft (Brood War) AI developed at Berkeley. Starcraft is a “real-time strategy” (RTS) game, where two or more players face off, competing for resources and building troops to fight and ultimately destroy each other. Last one standing is the victor.

It’s an interesting game to develop AI for because it requires a great deal of intricate management, both at the macro level (managing your economy; that is, resource gathering and allocation) and at the micro level (positioning and controlling your individual troops in battle). There’s enough complexity that a game can go in many unexpected ways; it is an ideal environment for humans to creatively respond and adapt to - something which computers traditionally have a great deal of trouble with.

The Overmind AI controlling a fleet of Mutalisks (the orange and green flying alien creatures) with terrifying precision

When I think of “AI vs human”, I tend to think of it in a sort of Deep Blue vs Garry Kasparov sense. A solitary expert against an expertly programmed machine. The machine proves its superiority by consistently beating the human. The Overmind fits that narrative (though it still had trouble with human professional Starcraft players).

It’s interesting to think of the Overmind in contrast to Twitch Plays Pokemon (TPP), where masses of participants essentially button-mashed their way through Pokemon Red (and is now making its way through the other titles). Anyone can join the game’s chat room and submit a button to press. The system had two modes, “anarchy” and “democracy”: in the former, every command someone submitted was executed; in the latter, there was a consensus vote on what the next command would be. It took 16 days of non-stop play to complete a game which typically takes a bit more than a day. TPP is the complete antithesis of unitary control by a program.

Twitch Plays Pokemon (via Wikipedia)

Sci-fi AI narratives (and the AI field’s ultimate aspirations) are far more ambitious than these video game bots. In Iain M. Banks’ The Culture, general artificial intelligences are organizing galactic societies, thus liberating organic beings from the complexities of deciding policy and the stress of providing for and governing themselves.

This is a huge leap from video game bots - impressive as they are - but it was explored long before the conception of these bots, as early as the mid-20th century, in Soviet Russia. Francis Spufford’s Red Plenty details an attempt at integrating computers into the central planning model:

Soviet planners, economists, physicists and mathematicians…persuaded the Soviet leadership that, using cybernetic principles and the newly developed computers, the centralised, planned Soviet economy could at last be made efficient.


These ambitions for AI are far removed from one-on-one video game contests. Rather than substituting for an individual, these programs are meant to replace the collective decision-making capacity of entire societies of people. Naturally, at this scale the evaluation of success is much murkier than in a game with simple victory conditions. And when we begin discussing decisions which impact peoples’ lives, the particular strategies the AI uses become matters of ethics rather than merely mechanics.

It feels as if we are still quite a long way off from a managerial AI becoming a reality. But with the frenzy around big data, where companies like Palantir aggregate massive amounts of disparate data to draw and act upon high-level conclusions, the technical feasibility of a computational caretaker (/overlord) will only grow. When will we see organizations managed by “Computer Executive Officers” instead of by people? In The Culture, such artificial intelligences are the foundation for utopia, but they are equally the fodder of nightmarish futures - Skynet, ARIIA, etc… - will we be willing to use them?
