Simulating the swarm

· 07.16.2015 · etc

For some time, I've been enamored with an idea Tim Hwang once shared with me: creating bots to test a social network, rather than for purely malicious or financial reasons. Say you're developing a new social networking platform and you've thought ahead far enough to implement some automated moderation features. How do you know they'll work as expected?

Imagine you could simulate how users are expected to behave - spawn hundreds, thousands, even millions of bots with different personality profiles, modeled from observed behaviours on actual social networks, and let them loose on the network. Off the top of my head, you'd need to model things at both the platform (network) level and the agent (user) level.

At the platform level, you'd want to model (a rough sketch follows this list):

  • When a new user joins
  • How users are related on the social graph, in terms of influence. With this, you can model how ideas or behaviours spread through the network.
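
As a very rough sketch of the platform level - every name, distribution, and library choice here is my own assumption (a Bernoulli arrival process, influence-weighted attachment, and networkx for the graph), not anything prescribed above:

```python
import random
import networkx as nx  # assumed dependency for the social graph

def simulate_platform(steps=100, arrival_rate=0.3, seed=42):
    """Toy platform-level loop: users arrive over time and attach to
    existing users, weighted by influence (preferential-attachment-ish)."""
    rng = random.Random(seed)
    graph = nx.DiGraph()
    graph.add_node(0, influence=1.0)  # seed user

    for t in range(steps):
        # When a new user joins: Bernoulli arrival each tick (an assumption).
        if rng.random() < arrival_rate:
            new_id = graph.number_of_nodes()
            influence = rng.betavariate(2, 5)  # most users have low influence
            graph.add_node(new_id, influence=influence)
            # How users are related on the social graph: follow a few
            # existing users, preferring the more influential ones.
            existing = [n for n in graph.nodes if n != new_id]
            weights = [graph.nodes[n]['influence'] for n in existing]
            followed = rng.choices(existing, weights=weights,
                                   k=min(3, len(existing)))
            for target in set(followed):
                graph.add_edge(new_id, target)  # new_id follows target
    return graph

g = simulate_platform()
print(g.number_of_nodes(), 'users,', g.number_of_edges(), 'follow edges')
```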

At the agent (user) level (again, a sketch follows the list):

  • When a user leaves the network
  • When a user sends a message
    • What that message contains (no need for detail here, perhaps you can represent it as a number in [-1, 1], where -1=toxic and 1=positive)
    • Who that message is sent to
  • What affects user engagement
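
To make these agent-level events concrete, here is one way a single simulation tick might be handled for one user. The parameter names, probabilities, and distributions are all placeholder assumptions (including using the "enjoyment" value discussed further below to nudge the leave probability):

```python
import random

rng = random.Random(0)

def step_user(user, followers):
    """One tick for a single agent. `user` is a dict of profile parameters,
    `followers` is the list of user ids who would see their messages."""
    events = []

    # When a user leaves the network: small chance, higher if unhappy.
    leave_prob = 0.01 + 0.05 * max(0.0, -user['enjoyment'])
    if rng.random() < leave_prob:
        events.append(('leave', user['id']))
        return events

    # When a user sends a message: governed by base verbosity/engagement.
    if rng.random() < user['verbosity']:
        # What that message contains: a single number in [-1, 1], where
        # negative means toxic and positive means friendly; more toxic
        # users skew negative, with some noise.
        valence = max(-1.0, min(1.0, -user['toxicity'] + rng.gauss(0, 0.3)))
        # Who that message is sent to: a random follower, if any.
        if followers:
            recipient = rng.choice(followers)
            events.append(('message', user['id'], recipient, valence))

    return events

# Example: a fairly active, mildly toxic user with two followers.
user = {'id': 1, 'toxicity': 0.4, 'verbosity': 0.8, 'enjoyment': 0.2}
print(step_user(user, followers=[2, 3]))
```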

In terms of user profiles, some parameters (which can change over time) might be:

  • Level of toxicity/aggressiveness
  • Base verbosity/engagement (how often they send messages)
  • Base influence (how influential they are in general)
  • Interest vectors - say there are n topics in the world and users can have some discrete position on each of them (e.g. some users feel "1" towards a topic, others feel "2", still others feel "3"). You could use these interest vectors to model group dynamics (a sketch of such a profile follows this list).
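
A profile could be as small as a record of these parameters plus a rule for letting them drift over time. Again, a hedged sketch; the field names and the random-walk drift are placeholders:

```python
import random
from dataclasses import dataclass, field

rng = random.Random(1)

@dataclass
class UserProfile:
    toxicity: float      # level of toxicity/aggressiveness, in [0, 1]
    verbosity: float     # base probability of posting on a given tick
    influence: float     # base influence over the rest of the network
    interests: list = field(default_factory=list)  # discrete position per topic

    def drift(self, scale=0.02):
        """Parameters can change over time: random-walk them slightly."""
        self.toxicity = min(1.0, max(0.0, self.toxicity + rng.gauss(0, scale)))
        self.verbosity = min(1.0, max(0.0, self.verbosity + rng.gauss(0, scale)))

# Example: three topics, with discrete positions on each.
alice = UserProfile(toxicity=0.1, verbosity=0.6, influence=0.3, interests=[1, 3, 2])
alice.drift()
print(alice)
```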

Finally, you need to develop some metrics to quantify how well your platform does. Perhaps users have some "enjoyment" value which goes down when they are harassed, and you can look at the mean enjoyment of the network. Or you could look at how often users start leaving the network. Another interesting thing to look at would be the structure of the social graph. Are there high levels of interaction between groups with distant interest vectors (that is, are people from different backgrounds and interests co-mingling)? Or are all the groups relatively isolated from one another?
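
As a hedged sketch of those metrics: mean enjoyment, the churn rate, and the fraction of messages exchanged between users whose interest vectors are far apart. The distance measure, the threshold, and the data layout here are all assumptions:

```python
from statistics import mean

def network_metrics(users, messages, left_users, distance_threshold=2):
    """users: id -> {'enjoyment': float, 'interests': [int, ...]}
    messages: list of (sender_id, recipient_id) pairs observed so far.
    left_users: set of ids that have quit the network."""
    mean_enjoyment = mean(u['enjoyment'] for u in users.values())
    churn_rate = len(left_users) / (len(users) + len(left_users))

    # How often do people with distant interest vectors actually talk?
    def interest_distance(a, b):
        return sum(abs(x - y) for x, y in zip(users[a]['interests'],
                                              users[b]['interests']))
    cross_group = [1 if interest_distance(s, r) >= distance_threshold else 0
                   for s, r in messages if s in users and r in users]
    mingling = mean(cross_group) if cross_group else 0.0

    return {'mean_enjoyment': mean_enjoyment,
            'churn_rate': churn_rate,
            'cross_group_interaction': mingling}

users = {1: {'enjoyment': 0.5, 'interests': [1, 1]},
         2: {'enjoyment': -0.2, 'interests': [3, 2]}}
print(network_metrics(users, messages=[(1, 2)], left_users={3}))
```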

You'd also have to incorporate the idiosyncrasies of each particular network. For instance, is there banning or moderation? You could add these as attributes on individual nodes (i.e. is_moderator=True|False). This can get quite complex when modeling features like subreddits, where moderator abilities exist only in certain contexts. Direct messaging poses a problem as well, since by its nature that data is unavailable for modeling behaviour. Reddit also has upvoting and downvoting, which affect the visibility of contributions, whereas Twitter does not work this way.
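
Those idiosyncrasies could mostly live as per-node attributes and per-platform visibility rules. The sketch below shows a context-dependent moderator check and a toy ranking rule; the scoring is invented for illustration and is not either site's actual algorithm:

```python
def can_remove(user, post):
    """Moderator abilities may only apply in certain contexts, e.g. a user
    moderates some subreddits but not others."""
    return post.get('subreddit') in user.get('moderates', set())

def visible_posts(posts, platform, limit=5):
    """posts: dicts with 'ups', 'downs', 'timestamp'. Returns what a user
    would plausibly see first under each platform's conventions."""
    if platform == 'reddit':
        # Up/down votes affect visibility: rank by net score.
        return sorted(posts, key=lambda p: p['ups'] - p['downs'],
                      reverse=True)[:limit]
    # Twitter-like: no vote-based ranking, just newest first.
    return sorted(posts, key=lambda p: p['timestamp'], reverse=True)[:limit]

mod = {'id': 7, 'moderates': {'aww'}}
post = {'id': 'x', 'subreddit': 'aww', 'ups': 3, 'downs': 1, 'timestamp': 5}
print(can_remove(mod, post))                        # True
print([p['id'] for p in visible_posts([post], 'reddit')])
```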

Despite these complications, it may be enough to create some prototypical users and provide a simple base model of their interactions. Developers of other networks can then tailor these to whatever features are particular to their platform.

With the recently released Reddit dataset, consisting of almost 1.7 billion comments, building prototypical user models may be within reach. You could obtain data from other sites (such as New York Times comments) to build additional prototypical users.
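
One hedged way to go from a raw comment dump to prototypical users: compute a few per-user behavioural features (say, comments per day, mean message valence, number of distinct subreddits) and cluster them, treating the cluster centers as prototypes. The feature choices and the use of k-means are assumptions on my part, and scoring comment valence would itself require a separate model:

```python
import numpy as np
from sklearn.cluster import KMeans  # assumed dependency

def prototype_users(user_features, n_prototypes=5):
    """user_features: array of shape (n_users, n_features), e.g. columns for
    comments per day, mean valence in [-1, 1], and distinct subreddits.
    Returns cluster centers to use as prototypical user profiles."""
    model = KMeans(n_clusters=n_prototypes, n_init=10, random_state=0)
    model.fit(user_features)
    return model.cluster_centers_

# Fake stand-in for features that would be derived from the Reddit dump.
features = np.array([[0.5, -0.1, 3],
                     [4.0, -0.8, 1],
                     [1.0,  0.2, 7],
                     [3.5, -0.7, 2],
                     [0.4,  0.1, 4]])
print(prototype_users(features, n_prototypes=2))
```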

This would likely be a crude approximation at best, but what model isn't? As the saying goes, all models are wrong, but some are useful.

These are just some initial thoughts, but I'd like to give this idea more consideration and see how feasible it is to construct.