Just add water

  • 11.14.2015
  • etc
No Man's Sky (Rock Paper Shotgun)
No Man's Sky (Rock Paper Shotgun)

The go-to rebuttal to increasing automation tends to be something around how creativity could not be emulated by computers, at least, not for awhile. There's some truth to that statement depending on how you define "creativity", I suppose. The least charitable definition might be synonymous with "content generation", a domain typically exclusive to humans - artists, musicians, writers, and so on - but computers have made some claim to this territory.

The poster child of the automated (i.e. procedural) content generation beachhead is No Man's Sky, which generates a literal universe, complete with stars and beaches and creatures, using interweaving mathematical equations. The universe it generates will reportedly take 5 billion years to explore, and that's just one of those universes. In theory, an infinite amount of universes can be generated by setting different seed values (the value which is used to determine all other values in the game). This interview with Sean Murray goes a bit more into depth on their approach.

No Man's Sky procedural creatures (via)
No Man's Sky procedural creatures (via)

These procedurally-generated universes aren't random, though they are designed to give that appearance. They are completely deterministic depending on the seed value - this is a critical property since there needs to be consistency. If you leave a planet, you should not only be able to come back to that planet, but it should be in a similar state to when you left it.

A big challenge in procedural content generation is figuring out a way of creating things which feel organic and natural - for No Man's Sky, the solution is the impressive-sounding Superformula, developed by John Gielis in 2003. In polar coordinates, with radius $r$ and angle $\varphi$, parameterized by $a, b, m, n_1, n_2, n_3$, the Superformula is:

$$ r\left(\varphi\right) = \left( \left| \frac{\cos\left(\frac{m\varphi}{4}\right)}{a} \right| ^{n_2} + \left| \frac{\sin\left(\frac{m\varphi}{4}\right)}{b} \right| ^{n_3} \right) ^{-\frac{1}{n_{1}}}. $$

The above is the formula for two dimensions, but it is easily generalized to higher dimensions as well by using spherical products of superformulas (cf. Wikipedia).

Some forms resulting from the 2D formula:

2D Superformula-generated shapes (Wikipedia)
2D Superformula-generated shapes (Wikipedia)

Procedural generation is a really beautiful approach to game design. It's not so much about creating a specific experience but rather about defining the conditions for limitless experiences. No Man's Land is far from the first to do this, but it is the first (as far as I know) to have all of its content procedurally generated. The game Elite (from 1984!) also had procedurally-generated universe. Elite's universe was much simpler of course, but used a clever approach using the Fibonacci sequence to simulate randomness:

$$ \begin{aligned} x_0 &= \text{seed} \\ x_n &= x_{n-1} + x_{n-2} \end{aligned} $$

And then taking the last few digits of values generated from this sequence.

For example, take the Fibonacci sequence:

$$ 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, \dots $$

Let's jump at head in the sequence:

$$ 1597, 2584, 4181, 6765, 10946, 17711, 28657, \dots $$

If we just look at the last three digits, the resulting sequence looks quite random:

$$ 597, 584, 181, 765, 946, 711, 657, \dots $$

The great thing about this is that these universes can be shared by sharing seed values.

Elite's procedurally-generated universe (Gamasutra)
Elite's procedurally-generated universe (Gamasutra)

Procedural content generation is interesting, but other games focus on "procedural" stories. These games have (some) manually generated content - creatures, buildings, etc - but focus on designing complex systems which form the bedrock of the game's ability to spawn wild and fantastic stories. Dwarf Fortress and RimWorld are two great examples, which are essentially fantasy and sci-fi world-simulators (respectively) which model things like each individual's mental health, weather patterns, crop growth, and so on. No one writes the stories ahead of time, no one has a premeditated experience for you put in place - it's all off-the-cuff based on the dynamics of the game's rules.

RimWorld (via)
RimWorld (via)

The stories that come out of these games are amazing. Dwarf Fortress has an especially vibrant community based around the often absurd things that happen in game (for example, see r/dwarffortress, or for a more academic treatment, see Josh Diaz's thesis).

With the independent game industry continually expanding, I think we'll see more of these kinds of games. They can be developed with relatively small teams but have the richness and depth (and often to a better degree) of a massively-built studio game.

Machine Learning 101

  • 11.10.2015
  • etc

This post is an adaptation from a talk given at artsec and a workshop given at a few other places (Buenos Aires Media Party 2015, MozFest 2015). The goal is to provide an intuition on machine learning for people without advanced math backgrounds - given that machine learning increasingly rules everything around us, everyone should probably have some mental model of how it works.

The materials for the workshop are available on GitHub.

What is "machine learning" anyways?

There are plenty of introductory machine learning tutorials and courses available, but they tend to focus on cursory tours of algorithms - here's linear regression, here's SVMs, here's neural networks, etc - without building an intuition of what machine learning is actually doing under the hood. This intuition is obvious to practitioners but not for people new to the field. To get the full value of the other machine learning resources out there, that intuition really helps.

But first, to clarify a bit what machine learning is: a basic working definition could be "algorithms for computers to learn patterns". Pattern recognition is something that people are very good at, but difficult for computers to do.

Here we'll go through walkthrough a very simple machine learning task which is prototypical of many real-world machine learning problems. First we'll go through it by hand, noting where our human superpower of pattern recognition comes into play, and then think about how we can translate what we did into something a computer is capable of executing.

Learning functions by hand

A common machine learning goal is to take a dataset and learn the pattern (i.e. relationship) between different variables of the data. Then that pattern can be used to predict values of those variables for new data.

Consider the following data:

Some data
Some data

If I were to ask you to describe the data by a pattern you see in that data, you'd likely draw a line. It is quite obvious to us that, even though the data points don't fall exactly in a line, a line seems to satisfactorily represent the data's pattern.

But a drawn line is no good - what are we supposed to do with it? It's hard to make use of that in a program.

A better way of describing a pattern is as a mathematical equation (i.e. a function). In this form, we can easily plug in new inputs to get predicted outputs. So we can restate our goal of learning a pattern as learning a function.

You may remember that lines are typically expressed in the form:

$$ y = mx + b $$

As a refresher: $y$ is the output, $x$ is the input, $m$ is the slope, and $b$ is the intercept.

Lines are uniquely identified by values of $m$ and $b$:

Some lines
Some lines

Variables like $m$ and $b$ which unique identify a function are called parameters (we also say that $m$ and $b$ paramterize the line). So finding a particular line means finding particular values of $m$ and $b$. So when we say we want to learn a function, we are really saying that we want to learn the parameters of a function, since they effectively define the function.

So how can we find the right values of $m, b$ for the line that fits our data?

Trial-and-error seems like a reasonable approach. Let's start with $m=12, b=2$ (these values were picked arbitrarily, you could start with different values too).

Trying $m=12, b=2$
Trying $m=12, b=2$

The line is still quite far from the data, so let's try lowering the slope to $m=8$.

Trying $m=8, b=2$
Trying $m=8, b=2$

The line's closer, but still far from the data, so we can try lowering the slope again. Then we can check the resulting line, and continue adjusting until we feel satisfied with the result. Let's say we carry out this trial-and-error and end up with $m=4.6, b=40$.

Trying $m=4.6, b=40$
Trying $m=4.6, b=40$

So now we have a function, $y = 4.6x + b$, and if we got new inputs we could predict outputs that would follow the pattern of this dataset.

But this was quite a laborious process. Can we get a computer to do this for us?

Learning functions by computer

The basic approach we took by hand was just to:

  1. Pick some random values for $m$ and $b$
  2. Compare the resulting line to the data to see if it's a good fit
  3. Go back to 1 if the resulting line is not a good fit

This is a fairly repetitive process and so it seems like a good candidate for a computer program. There's one snag though - we could easily eyeball whether or not a line was a "good" fit. A computer can't eyeball a line like that.

So we have to get a bit more explicit about what a "good" fit is, in terms that a computer can make use of. To put it another way, we want the computer to be able to calculate how "wrong" its current guesses for $m$ and $b$ are. We can define a function to do so.

Let's think for a moment about what we would consider a bad-fitting line. The further the line was from the dataset, the worse we consider it. So we want lines that are close to our datapoints. We can restate this in more concrete terms.

First, let's notate our line function guess as $f(x)$. Each datapoint we have is represented as $(x, y)$, where $x$ is the input and $y$ is the output. For a datapoint $(x, y)$, we predict an output $\hat y$ based on our line function, i.e. $\hat y = f(x)$.

We can look at each datapoint and calculate how far it is from our line guess with $y - \hat y$. Then we can combine all these distances (which are called errors) in some way to get a final "wrongness" value.

There are many ways we can do this, but a common way is to square all of these errors (only the magnitude of the error is important) and then take their mean:

$$ \frac{\sum (y - \hat y)^2 }{n} $$

Where $n$ is the number of datapoints we have.

It also helps to think about this as:

$$ \frac{\sum (y - f(x))^2 }{n} $$

To make it a bit clearer that the important part here is our guess at the function $f(x)$.

A function like this which calculates the "wrongness" of our current line guess is called a loss function (or a cost function). Clearly, we want to find a line (that is, values of $m$ and $b$) which are the "least wrong".

Another way of saying this is that we want to find parameters which minimize this loss function. This is basically what we're doing when we eyeball a "good" line.

When we guessed the line by hand, we just iteratively tried different $m$ and $b$ values until it looked good enough. We could use a similar approach with a computer, but when we did it, we again could eyeball how we should change $m$ and $b$ to get closer (e.g. if the line is going above the datapoints, we know we need to lower $b$ or lower $m$ or both). How then can a computer know in which direction to change $m$ and/or $b$?

Changing $m$ and $b$ to get a line that is closer to the data is the same as changing $m$ and $b$ to lower the loss. It's just a matter of having the computer figure out in what direction changes to $m$ and/or $b$ will lower the loss.

Remember that a derivative of a function tells us the rate of change at a specific point in that function. We could compute the derivative of the loss function with respect to our current guesses for $m$ and $b$. This will inform us as to which direction(s) we need to move in order to lower the loss, and gives us new guesses for $m$ and $b$.

Then it's just a matter of repeating this procedure - with our new guesses for $m$ and $b$, use the derivative of the loss function to figure out what direction(s) to move in and get new guesses, and so on.

To summarize, we took our trial-and-error-by-hand approach and turned it into the following computer-suited approach:

  1. Pick some random values for $m$ and $b$
  2. Use a loss function to compare our guess $f(x)$ to the data
  3. Determine how to change $m$ and $b$ by computing the derivative of the loss function with respect to $f(x)$
  4. Go back to 1 and repeat until the loss function can't get any lower (or until it's low enough for our purposes)

A lot of machine learning is just different takes on this basic procedure. If it had to be summarized in one sentence: an algorithm learns some function by figuring out parameters which minimize some loss function.

Different problems and methods involve variations on these pieces - different loss functions, different ways of changing the parameters, etc - or include additional flourishes to help get around the problems that can crop up.

Beyond lines

Here we worked with lines, i.e. functions of the form $y = mx + b$, but not all data fits neatly onto lines. The good news is that this general approach applies to sundry functions, both linear and nonlinear. For example, neural networks can approximate any function, from lines to really crazy-looking and complicated functions.

To be honest, there is quite a bit more to machine learning than just this - figuring out how to best represent data is another major concern, for example. But hopefully this intuition will help provide some direction in a field which can feel like a disconnected parade of algorithms.

House of Cards

  • 07.17.2015
  • etc

Bots coast on the credentials of the real users of the computers they hijack. Bots were observed to click more often (but not improbably more often) than real people. Sophisticated bots moved the mouse, making sure to move the cursor over ads. Bots put items in shopping carts and visited many sites to generate histories and cookies to appear more demographically appealing to advertisers and publishers. - The Bot Baseline: Fraud in Digital Advertising. December 2014. White Ops, Inc. and the Association of National Advertisers.

The study cited above looked at 5.5 billion ad impressions over 60 days across 36 organizations. 23 percent of all observed video impressions were bots (p16).

Like too many things in our world, most useful internet services must justify themselves financially. The business model of much of the internet is based on advertising, and that whole schema is based on the assumption that advertising increases consumption. The further assumption there is that the subjects viewing the advertisements both can be influenced by ads and have the capacity to make purchases - that is, that they're human.

These behaviors are easily simulated, up to the point of purchase - although perhaps it makes economic sense for a bot to occasionally make a purchase in order to maintain the broader illusion of human engagement. There's something here about the dissolving of the internet human subject but I can't quite put my finger on it.

These bots are deployed by fraudsters to increase revenue, but since the purpose of automation is for outsourcing things we'd rather not do ourselves, maybe we regular folks can have bots simulating our own advertising engagement.

It's worth flipping through the report, if only for this title:


Assume Good Faith

  • 04.06.2015
  • etc

Throughout the many interviews we've been conducting for the Coral Project, the one that has stuck out the most to me was our talk with Jeffrey Lin, Lead Designer of Social Systems at Riot Games. At Riot, he built up a team explicitly designed to address the social and community problems which were common in League of Legends, Riot's flagship game.

Like most online games, players would regularly have to deal with hostility and toxicity from other players. For most of video gaming history, developers would typically just dismiss these social frictions as a problem beyond their control.

Generally, the impression is that this kind of toxicity comes from a relatively small portion of dedicated malicious actors. One of the key insights the social team uncovered was that - at least for League of Legends, but I suspect elsewhere as well - this was not the case. Yes, there were some consistently bad actors. But by and large regular players ended up accounting for most of the toxicity. Toxicity is distributed in the sense that a lot of it comes from people who are just having a bad day, but otherwise socialize well.

One of the social team's principles is to acknowledge that players have a good moral compass. The challenge is in designing systems which allow them to express it. If players have to contend with toxic behavior day in and day out, then their general impression will be that toxic behavior is the norm. There is no space for them to assert their own morality, and so they stay quiet.

In group dynamics, this phenomenon is known as pluralistic ignorance - when members of a community privately feel one way about something, but never express that feeling because they perceive the norm of the community to be the opposite. Not only do they not express it, but in some cases they may be excessively vocal in their support for the perceived community norm.

A classic example is the story of The Emperor's New Clothes - the emperor is tricked into wearing "clothes" which are invisible to the unworthy (in reality, he is wearing nothing). No one is willing to admit they do not see any clothes because they do not want to communicate to others that they are unworthy. Privately, everyone holds the belief that the emperor is not wearing any clothes. But publicly, they cannot admit it. It takes a child - ignorant of the politics behind everyone else's silence - to point out that the emperor is naked.

A more contemporary example is drinking on college campuses. College drinking is an extremely visible part of our cultural understanding of the college experience (e.g. through movies). As a result, many students have the impression that all of their peers are aligned with this norm, while they are privately less comfortable with it. In reality, many of their peers are also less comfortable with it. This is complicated by the fact that students who do conform or buy into the norm are often very vocal about it, to the point of intimidation - and at this point the norm becomes self-enforcing because there is even more social incentive (driven by insecurity) to publicly conform to the norm (called posturing).

Wikipedia operates on a similar principle, which they call "Assume good faith":

Assuming good faith is a fundamental principle on Wikipedia. It is the assumption that editors' edits and comments are made in good faith. Most people try to help the project, not hurt it. If this were untrue, a project like Wikipedia would be doomed from the beginning. This guideline does not require that editors continue to assume good faith in the presence of obvious evidence to the contrary (vandalism). Assuming good faith does not prohibit discussion and criticism. Rather, editors should not attribute the actions being criticized to malice unless there is specific evidence of malice.

Or to put it more succinctly, "give people the benefit of the doubt".

The key insight to draw from all of this is that moderation systems should geared towards reforming users rather than punishing them. Once we acknowledge that people typically have a decent moral compass, we should reconsider the entire moderator-user relationship. It does not have to be an antagonistic one. Most users are not consistently bad and may just need a nudge or a reminder about the effects of their behavior. Moderation systems should instead be opportunities for a community to express their values and for a user to gain better understanding of them. And they should be designed so that the community's values reflect the aggregate of its members' private values rather than a dominant norm which no one really believes in.

This attitude of good faith is refreshing well beyond the scope of the Coral project. So many arguments about important issues seem to devolve into unfair characterizations of "human nature", which have never held much water for me. The behaviors we observe are only one possible manifestation of a person, guided by the systems in which they operate, and we cannot confidently extrapolate claims about some immutable "human nature" from them.

For further reading, The emperor's dilemma: A computational model of self-enforcing norms (unfortunately I cannot find a PDF) develops a computational model of pluralistic ignorance.

News Automata

  • 12.06.2014
  • etc

Since mid-September I've been teaching at The New School as part of their new Journalism+Design program. Heather Chaplin, who organizes the program, brought me in to teach a new class I put together called News Automata, which explores the technical underpinnings, ethical concerns, and potential benefits/pitfalls of technology in journalism.

The way I ended up teaching the class deviated a bit from my original plan. For reference, here's the initial syllabus:

News Automata Fall 2014 syllabus

Week 1: Introduction to the Course

Introduction to journalism and how technology is affecting it both as an industry and as a practice. What is the point of journalism? We'll talk about a few of the major definitions of journalism's responsibility.

We'll also talk about the format of the course and what'll be expected.

Some things we'll go over:

  • Bias in journalism? Is "fair and balanced" actually fair?
  • Does journalism work?
  • Varying definitions of journalism: the spotlight, the watchdog, the entertainer, the simulacrum, the influencer, the activator, the business

Week 2-3: Leveraging networks: from consumers => producers

How are individuals becoming empowered to participate in the production of news, whether through whistleblowing or "citizen journalism"? How does news "break" in the age of Twitter? How are journalists collaborating with on-the-ground sources to have direct access to those affected, in real time? How are online communities becoming involved in the investigative process? We'll discuss some of the challenges in this area, such as verifying accuracy and security, and look at how things can go wrong.

Some things we'll go over:

Week 4: Bots and drones: the automated assemblage

Automation is creeping into more and more parts of our lives. Jobs which existed a few decades ago are now obsolete. Jobs which exist now will likely be obsolete in an even shorter amount of time. Will journalists be one of them?

Some things we'll go over:

Week 5: Information overload and context

The information age is a blessing and a curse. What good is all that information if you don't have the time, energy, or attention to make use of it? What approaches are being used to make the news easier to digest and give us the fullest understanding of what's happening in the world? Technology and design are both about getting more with less. How can we get the maximum impact from the smallest amount of information?

Some things we'll go over:

Week 6: Engineering virality and control over networks

Facebook wants you to be happy, BuzzFeed knows you like lists and quizzes, and Upworthy understands how to tease you into clicking a link. They have all been immensely successful. Is that all these companies are about? Or is there something larger at play? What happens when all news is designed just to get us to click into it? Or has that already happened?

Some things we'll go over:

Week 7: The Filter Bubble and the new gatekeepers

The most prominent approach to managing information overload is self, algorithmic, or thought-leader curation. But the nature of these three filtering mechanisms leads many to worry – are we just seeing more of the same? How does that affect how we think about the world? Is that fundamentally antithetical to journalism's aspirations as a practice?

Some things we'll go over:

Week 8: Taking action

Journalism is more than about knowing what's happening. It's also about acting on that knowledge. How can we design systems to tighten that action-intent gap? How can we make it easier for people to organize and act on the issues they feel strongly about?

Some things we'll go over:

Here's what the classes ended up being about:

  • What does journalism hope to accomplish?
  • Social networks and news production: crowdsourcing/citizen journalism, problems of verification, perspectives, popular curation vs the gatekeeper model
  • Automation, news bots, intro to machine learning, intro to natural language processing
  • Communities and discussions online, anonymity, and bits of behavioral economics, game theory, decision theory, group dynamics, sociological/psychological research for understanding online behavior
  • How elections are reported (the week after midterm elections), how algorithmic filters work, filter bubbles/"personalized propaganda"
  • Hands-on: building our own news feeds and algorithmic filters with Python
  • The maturing medium of video games (narrative, mechanics, aesthetic, and technical perspectives) and how it relates to journalism

There are still two more classes in which I plan on covering:

  • The physical basis of the internet (an overview on its infrastructure and the politics of that infrastructure)
  • Taking action and digital impact IRL: slacktivism, hacktivism (Anonymous, Wikileaks), doxxing, DDoS/LOIC, etc

This was my first time teaching a class so it's been a great learning experience for me, and my students were great. I'm hoping to teach it again next fall. The class was public which ended up being a boon (I was worried about it at first) - lots of people from all sorts of backgrounds stopped in for a class or two and had interesting contributions to our discussions.