Half-Earth Socialism (The Game)

06.07.2022

projects

For the past year I worked on Half-Earth Socialism, an online game accompanying the book of the same name by Drew Pendergrass and Troy Vettese (Verso 2022). The game launched at the beginning of May; you can play it here. This post will make more sense after you've played the game!

This post is adapted from a talk I gave at Trust (who organized the project) and goes a bit into the design and development process of the game.

Genesis

Trust was approached early in 2021 about developing a website to accompany the forthcoming book Half-Earth Socialism, which would be published a little more than a year later. To very, very briefly summarize the book (the book itself is a quick read so I encourage you to give it a look!): the authors focus on land use as the central variable of concern for the health of the planet (hence the name "Half-Earth" socialism, building off of E.O. Wilson's idea of the same name) and emphasize the need for rational democratic planning to make decisions around the future of the world. For example: how much land should be devoted to energy production, and how much to food?

Democratic planning requires some way for people to meaningfully engage with plans: to understand them, evaluate them, and make their own plans. The original proposal for the site was based around a linear programming calculator where people could play with the parameters of a model that Drew wrote. I imagined it as something akin to Chris Crawford's Balance of the Planet (1990), where the player is similarly adjusting parameters of a global model to influence planetary health.

Drew's model and a screenshot of Chris Crawford's *Balance of the Planet* (1990)

In this original version you could choose your energy mix, energy use levels, meat consumption, and so on. The model would figure out the allocation of land, emissions, and so on that were required. Using this model you can easily compare results and see, for example, that veganism opens up quite a lot of land for energy production. But part of the book's appeal is the vignettes throughout that imagine what life might be like under Half-Earth socialism or what it might be like were it not to happen. We wanted to take this model and build a richer, more narrative experience around it to reflect some of the feeling those vignettes evoked: the pacing of such a world, people's concerns and values, and the social fabric of their lives.

Reading Group

Before any development began we first participated in a reading group, organized by Chiara Di Leone, covering topics like: socialist cybernetics, cybernetic planning, complex systems management interface design, climate modeling, and games that we found interesting or related to these topics. These helped us coalesce around a set of mechanics, design elements, constraints, and feelings to develop the game around.

Two games that we looked closely at were 11 bit studios' Frostpunk and Nerial's Reigns. Frostpunk's gameplay is centered around one primary variable (heat) and is filled with many brutal, no-win policy decisions around managing morale. The game is really well organized around its primary variable: it's very easy to see how heat is distributed, what's producing it, how it decays, and so on, so even though the decisions can be difficult, you're seldom disoriented or confused about what your priorities are. The game is, however, very dark and depressing. In contrast to Half-Earth Socialism's more utopian outlook, Frostpunk is a never-ending crisis. We knew that we wanted our game's arc to be different: the beginning is a difficult struggle to get through, but if you do well the game opens up into a world better than the pre-crisis past. There's light on the other side!

We knew early on that we wanted the game to be a web game, keeping in line with the original website idea and making sure that it's easy to access. If you're making a web game then you really need to consider mobile usage. We expected that most people would share the game on Twitter and thus others would likely encounter it on their phones and want to give it a try right there. That assumption's held: the majority of plays so far have been on mobile resolutions. Reigns was a main inspiration of how you could make a simple yet deeply engaging game for mobile. It has one main interaction—swiping left or right—but it's enough to support a great branching narrative. It's also very flexible in terms of time commitment: you can play a session for a few minutes or an hour if you want. That's something we wanted to replicate in our game—players could finish a run in 5 to 10 minutes, do something else, and come back and try a different approach later. That isn't how things turned out (people have spent 30 minutes just reading the starting cards) and in the end the scope of our game was just too complex to really reproduce these elements of Reigns.

From the whole reading group process came a few orienting values:

Accuracy: Most games prioritize a fun and engaging experience, and often systems that have real-world correlates are simplified in service of this priority. But because of the nature of the book and the weight of the subject, we wanted to prioritize rigor and accuracy a bit more than a typical game would. This was a huge challenge because representing something as complicated as the planet and human economic activity requires a great deal of simplification regardless of commitment to accuracy, and with the complexity of the planet and the global economy, details matter a lot. And often details that don't seem to matter end up becoming quite important later on.
Amusing: Climate change and ecological disaster are already very weighty topics without us needing to exaggerate them. We wanted to bring some levity to the game so that the player doesn't feel pummeled by depressing thing after depressing thing (although it does kind of happen because of the subject matter). So we used dialogue, character design, etc. as outlets to lower the "seriousness" of the game and leave plenty of space for the content itself to carry the weight.
Expressive: Another goal was for the game to play like a political compass quiz, in a way. We had a few player "builds" (above) in mind that made play feel expressive.
Quick: Shorter games that you play in rapid succession, to try different things. I think originally we wanted a session to be 5-10min...but sessions last far longer now. It takes at least 5 minutes just to soak in the game's starting content!

As mentioned above, take a cue from the book and convey the feeling of living under HES through dialogue and other narrative elements.

Development proceeded in roughly five categories:

Story: the main beats of the game and game dialogue.
Art Direction: character design, graphics, game feel, and so on.
Game Design: the game's primary mechanics and interactions.
Legibility: the game's information design, how players access/read the information they need to make decisions and how they make those decisions.
Technical Requirements: various constraints around the game's technical infrastructure and architecture

Story

We wanted to give the game world a richer feel through events, which give us opportunities to world-build and develop character personalities. They're also one main way players get feedback about their plans. More importantly they allow us to represent way more than we could with the core game model alone, which focuses on a relatively limited set of variables (water usage, land usage, electricity production, fuel production, etc). With a very simple probability system we can represent a much wider range of events like mass coral bleaching events or cultural changes like a new cuisine trend of eating invasive jellyfish, without needing to make the model itself much more complex.
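To make the idea concrete, here is a minimal sketch of what such a probability system could look like. It assumes each event has a base chance per turn plus modifiers that read the world state; the names (`Event`, `roll_events`, `warming_c`) and all the numbers are hypothetical placeholders, not taken from the game's actual code or data.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# World state as a flat mapping of variable name -> value (an assumption).
State = Dict[str, float]


@dataclass
class Event:
    name: str
    base_probability: float  # chance per turn before any modifiers
    # Each modifier inspects the state and returns an amount to add.
    modifiers: List[Callable[[State], float]] = field(default_factory=list)

    def probability(self, state: State) -> float:
        p = self.base_probability + sum(m(state) for m in self.modifiers)
        return min(1.0, max(0.0, p))  # clamp to a valid probability


def roll_events(events: List[Event], state: State, rng: random.Random) -> List[str]:
    """Return the names of the events that fire this turn."""
    return [e.name for e in events if rng.random() < e.probability(state)]


# Placeholder events: bleaching becomes more likely as warming increases.
coral_bleaching = Event(
    name="Mass coral bleaching",
    base_probability=0.05,
    modifiers=[lambda s: 0.3 * s.get("warming_c", 0.0)],
)
jellyfish_cuisine = Event(name="Invasive jellyfish cuisine trend", base_probability=0.1)

if __name__ == "__main__":
    rng = random.Random(0)
    print(roll_events([coral_bleaching, jellyfish_cuisine], {"warming_c": 1.5}, rng))
```

The appeal of this shape is that adding a new event only means adding a new entry with its own conditions; the core model does not need to know about it.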

Of course, the events we have in the game pale in comparison to all the possible events one might think of for the future. They allow us to represent more detail than we could otherwise but open up an overwhelming amount of things we could represent. It's one of those areas where we had to just stop adding things at some point, even though we probably could have added hundreds more.

Technical Requirements

Mobile and web support were the primary technical constraints, and those came with their usual challenges (mostly cross-browser compatibility). The more interesting technical requirements were related to the models and data that we needed.

Hartin, C. A., Patel, P., Schwarber, A., Link, R. P., and Bond-Lamberty, B. P.: A simple object-oriented and open-source model for scientific and policy analyses of the global climate system – Hector v1.0, Geosci. Model Dev., 8, 939-955, doi:10.5194/gmd-8-939-2015, 2015.

Early on we knew we wanted to have some kind of climate model running, but climate models are usually huge, requiring long runs on supercomputers. Drew suggested Hector, which is a "simple climate model". It runs quickly on commodity hardware, but of course lacks the depth and detail of its massive counterparts. From a technical standpoint I didn't really want to have a server crunching a climate model for several players at once, even if it's a relatively simple one. We managed to get it to run directly in the browser so that each player runs their own model independently.

We have some other models running too:

a biome model, not running on mobile but on desktop, which colors the world according to temperature and precipitation changes over time.
a linear programming model for determining production resource allocation and guided assistance in planning. This ended up being reduced to a much simpler form because there weren't models available for the browser that handled the kind of optimization we needed.

The other major set of requirements were data requirements. For processes there were input requirements per unit output, impacts (e.g. CO2 emissions) per unit output, and current global process mixes. We also needed current emissions, current biodiversity loss, population projections aggregated for each region, global per-capita demand for each of our outputs, estimated per-capita impacts based on regional income levels, and impacts and inputs for different sectors/industries.
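As a rough illustration of how per-unit inputs, per-unit impacts, and a process mix fit together, here is a small sketch. The `Process` structure, the `plan_totals` function, and every number in the usage example are assumptions made up for illustration; they are not the game's schema or real data.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Process:
    name: str
    output: str                         # what the process produces, e.g. "electricity"
    inputs_per_unit: Dict[str, float]   # resources needed per unit of output
    impacts_per_unit: Dict[str, float]  # impacts (e.g. emissions) per unit of output


def plan_totals(
    demand: float,
    mix: Dict[str, float],
    processes: Dict[str, Process],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Total inputs and impacts for meeting `demand` with the given process mix.

    `mix` maps process name -> share of the output it supplies (shares sum to 1).
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("process mix shares must sum to 1")
    inputs: Dict[str, float] = {}
    impacts: Dict[str, float] = {}
    for name, share in mix.items():
        process = processes[name]
        produced = demand * share
        for key, amount in process.inputs_per_unit.items():
            inputs[key] = inputs.get(key, 0.0) + produced * amount
        for key, amount in process.impacts_per_unit.items():
            impacts[key] = impacts.get(key, 0.0) + produced * amount
    return inputs, impacts


if __name__ == "__main__":
    # Placeholder processes with invented coefficients, for illustration only.
    processes = {
        "a": Process("a", "electricity", {"land": 2.0}, {"co2": 1.0}),
        "b": Process("b", "electricity", {"land": 4.0}, {"co2": 0.0}),
    }
    print(plan_totals(100.0, {"a": 0.25, "b": 0.75}, processes))
```

Changing the mix and rerunning is the kind of comparison the original calculator idea was built around: the same demand met by a different mix yields different land and emissions totals.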

It's often very difficult to track down good numbers for these. Data may be available at regional or national levels, but not globally. Or there may be a lot of variability in estimates. Some technologies like vertical farming and cellular agriculture are very new, so there are only estimates for very limited cases, if at all. For example, with vertical farming we could only find a couple of sources, covering one or two crops and published by vertical farming startups (so it's very unclear how trustworthy the numbers are), and instead relied more on values from more general indoor greenhouse farming.

Art Direction

The game's art direction is mostly Son La's department but I'll briefly mention my two main contributions.

The first is the globe's design, which took an embarrassingly long time (I'm really inexperienced with shaders). Our main aesthetic reference was a sort of retro-computing, so we played a lot with poorer color representation (dithering) and lower resolutions (pixelation).

The final version drew from this graphic that was produced for an article about the book:

Illustration by Lukas Eigler-Harding and Ariel Noltimier-Strauss

And here's the final version:

The final globe design

One major element of the game's feel was the imagery we used to represent projects, processes, and events. We had a very small team, and it was really only Son La creating visual content (and also working on the UI development). We ended up with 238 events, 123 projects, 29 processes, 9 industries, and 20 regions, all of which needed images, so ~420 images total. There was no way we could create all of that on our own, so we looked at CC-licensed and public domain imagery. The problem with sourcing images that way is that they vary a lot in quality and style, and they kind of just look like digital photos. We played around with ways to process them so that they were more interesting and consistent and ended up with the following line:

Legibility

There is a lot to consider when assembling your plan and so a major challenge was making information available and clear to the player. All throughout the interface are tooltips and "factors" cards that break down what's contributing to whatever variable you clicked on. The hope is that whatever info you need to make a decision is available quickly, but it still is and feels like a lot of information!

Game Design

The most challenging part of the project was the game design. There were a lot of different things we wanted to communicate, different feelings we wanted to evoke, and so that led us down a few different design paths. But the biggest difficulty was that the game is meant to somewhat accurately represent a set of very complex systems: we wanted some legitimacy and rhetorical weight behind what happens in the game. This commitment to accuracy and complexity is directly at odds with making an entertaining and accessible game. Most games are not complex in terms of their mechanics. Even games that are very deep do not have to be complex, and they often aren't. Truly complex systems make for unfun games because they are by their very nature inscrutable, so they become a very frustrating experience. You never quite know why something is happening: is it because of something you did, is it because of something you didn't do, or did it have nothing to do with you at all? Part of what makes a game fun is learning its rules and systems, and one reasonably expects consistency, predictability, and legibility in how the game responds to your actions. Complex systems don't care.

Balancing was also very difficult to do with such a complex game. We couldn't anticipate all the paths or strategies a player might try. Ultimately we hoped to avoid players finding a strategy that works in the game that wouldn't work in the real world because of some detail we left unmodeled, but it would require a lot more testing to have some assurance that we succeeded.

I'll briefly describe two sets of ideas we had that I really liked but got cut for one reason or another.

Regional System

I remember "space" being a big concern. Given the book's focus on land use, how we deal with space is really important. But it came in conflict with priorities like accuracy. Games might abstract space to make it more manageable, like using larger units of space which then reduces the level of detail you can represent spatially; or they might limit their focus e.g. to a single region or map or level. The earth is really big. We can't really represent it in great detail without just blasting the player with stuff to manage. So we could limit the player's focus at a given time to a single region. That's basically where this design came from.

One bonus was the opportunity for more visual feedback/eye candy: you could zoom into a region and see wildlife return, to make your impacts on the world feel rewarding and more obvious.

But there was a more important game design element to this regional system. What we wanted to avoid was the "god view" in games, where the player unilaterally makes decisions of all kinds, which is contra the democratic planning that's emphasized in the book. It's very challenging to design a game about global planning without making it a god game! Games are often fantasies of control...representing democracy in a game is very difficult because it can disrupt that fantasy and make the game a frustrating experience. It's interesting how certain kinds of friction are expected in a game, like a hard boss fight being difficult, but others, like having your decisions questioned or ignored, are not.

I don't think we succeeded in avoiding the god view. The parliamentary system is supposed to represent that to some degree, and you can be ousted from power if you're too unpopular. But that's not quite the same as democratic planning. One iteration of this regional system idea had a greater and more autonomous role for individual regions (in the current game they mostly just exist to spatialize the game a bit, but they don't really do much on their own). The player would set targets and maybe some specific policies/projects but regions would go and figure out how to achieve those targets on their own. There would be more bargaining with regions to accept targets or to achieve them in a particular way.

A further iteration on this idea—which was definitely out of reach given our constraints—was a multiplayer regional system. Different people play different regions, and perhaps elect one player to be a global planner for a term. The global planner mediates regional relationships: regions have to negotiate with one another, like if I'm representing East Asia I want North America to reduce their energy usage in exchange for reducing my coal usage or something.

Turn-based System

The other major concept was a turn-based game, similar to Into the Breach. Each turn is some fixed time amount, and you have a preview of everything that will happen in the next turn or next n turns. For example: this natural gas plant will emit this much methane next turn, this patch of permafrost has 50% chance of melting by next turn, this hurricane will move left 2 tiles next turn, etc. It doesn't really work for the global scale, since you can't deal with things like an individual power plant, but I still think you could make a fun and interesting game this way if you could get away with more simplifications. And it's not clear what amount of time a turn should represent. The hurricane movement, for example, requires a much shorter turn time, but other decisions like building new power plants are better suited to turns of a year or more. Similarly, it's not clear what the spatial resolution should be. How many hexagons should the globe be divided into? A hurricane and a power plant are on two different scales.

One advantage of this design was that scientific uncertainty can more easily be a bigger part of the game. That natural gas plant might emit this much methane, but you don't really know without better sensors. That 50% estimate for that patch of permafrost is based on your best climate models, but if you invest in improving them and training more climate scientists you'd have a better estimate. In this way the benefits of good long-term planning manifest as "powerups" in a sense that make the shorter game loop easier.

Card System

We ended up settling on a card-based system. It felt more familiar and easier conceptually, giving players something to hold on to while we barraged them with other new information to absorb. Cards also provide a convenient way to compartmentalize "abilities" and give players a discrete object to think with. Basically all player actions are expressed through manipulating cards in some way.

Diablo equipment slots and Civilization 6 policy slots

There are still a lot of ways to use cards. One idea was to have your plan be something like equipment slots in an RPG, like Diablo (on the left here). But instead of a head slot for your helmet you'd have a concrete slot for your concrete production technology, and another one for your transport policy, and so on.

This is kind of how it works in Civilization 6 (on the right). This form ended up being too limiting because there are many projects we wanted to include that don't fit neatly into an existing slot, thus requiring many single-purpose slots, and just making the whole thing clunky and confusing.

Above are a couple other ideas—on the left we have something based around a deck. One idea was that your "plan" is a deck of cards that you assemble and use to react to events, under the idea that a good plan prepares you for the future. So if your scientists are telling you a major heatwave is likely, you'd have a mass cooling center policy prepared to respond to that heatwave, if it does happen.

This ended up not really working because it feels like busywork for the player: you're making two redundant decisions, the decision to establish the cooling centers, and the decision to use them when the heatwave occurs. In trying to minimize extra actions, we'd assume that if you have the cooling centers in place, you'd want to use them.

Card prototypes

Above is one of our prototypes for card interactions. This one is a Reigns-like interface, with four directions instead of two. We ditched it because as I mentioned the Reigns reactive playstyle didn't quite fit what we needed.

We settled on this "scanning" interaction, which has a retro computing punchcard-like vibe. Conceptually this is like the deck idea without the redundancy: in a way you're scanning cards to add them to your deck (plan) so they make bad future events less likely.

We also wanted to add a political aspect to the game, in part to make gameplay more interesting (the player can't just do everything they want to do—to lessen the "god view" problem) and also add some drama and an opportunity to develop the world through some strong personalities. They also give players some scaffolding to develop playstyles. There's more clear guidance on what an accelerationist might want, for example, to nudge the player towards using those cards. The parliament system was also conceived to bring some of the "democratic" side of "democratic planning", but I don't think we succeeded in that. It's my main regret of the game. I feel that we would have needed to design the game very differently for that to have worked, and at that point it was too late to make such major changes.

Tooling

The editor

One last thing I want to show off is the Half-Earth editor, where all the content (dialogue, events, regions, projects, processes, and so on) and model parameters are written. Whenever I embark on a project like this, one of the first things I do (once the main architecture/schemas are sorted out) is build content authoring tools. It makes later work much quicker, makes it easier to experiment with the content, helps me think through more ideas and possible conflicts in that level of the design, and makes it much easier to bring on others (like writers/researchers Lucy Chinen and Spencer Roberts) to contribute without needing to muck about in the code.

Final note

Half-Earth Socialism was one of the bigger projects I've worked on, and I'm proud of what we accomplished with such a small team. Our ultimate ambition was way beyond our capacity and resources, but we managed to achieve quite a lot of it. The game was made possible because of these people:

The people behind *Half-Earth Socialism*.

Getting in front of pharma: Automated public discovery of drug candidates

11.30.2018

projects

A couple weeks ago Sean and I were fortunate enough to participate in another edition of Rhizome's 7x7, this time in Beijing in collaboration with the Chinese Central Academy of Fine Arts (CAFA). I was very excited to collaborate with Sean again after our first collaboration in New York at the New Museum, and to have the chance to try something different together.

We thought about revisiting our previous project, cell.farm, which was a proposal for a cryptocurrency/distributed computing system for which the proof-of-work protocol involved computing simulation updates for an atomic-level model of a human cell (though our proposal initially suggested simulating a ribosome). Such detailed simulation of biological processes would be a boon for medical research, but simulating even the simplest cell at that resolution is so computationally demanding that it's infeasible even for the world's best supercomputers. But the aggregate computing power of the Bitcoin network is orders of magnitude larger than any supercomputer, and might be able to run such a model in a reasonable amount of time. By adopting that model for in silico cells, a crucial part of medical research is essentially collectivized, and as part of our design, so too are the results of that research. The project bears similarity to Folding@Home and its crypto-based derivatives (e.g. FoldingCoin), but as far as I know none of these projects explicitly distribute ownership of the research that results from the network. There were also some design details that we didn't have time to hash out, and we left open a big question of computational verifiability: given a simulation update from a node, how can you be certain that they actually computed that value rather than returned some random value? (Golem has this problem too, the difficulty of which is discussed a bit here).

This time around, rather than a project about medical research abstractly, we focused specifically on the pharmaceutical industry, the 1.1 trillion dollar business lying at the nexus of intellectual property law, predatory business practices, and the devaluing of human life.

The pharmaceutical industry

“Is curing patients a sustainable business model?” Goldman Sachs analysts ask

(For background I'm going to lean heavily on the "Pill of Sale" episode of the Ashes Ashes podcast which goes into more detail about the pharmaceutical industry — definitely worth a listen.)

Most Americans are familiar with exorbitantly-priced drugs, if not directly then via one of the many horrifying stories of people crowdfunding their continued existence or flying elsewhere to access more reasonable prices. A hepatitis C cure from Gilead, Sovaldi, costs $84,000 for a 12-week course and is the subject of a recent Goldman Sachs report. The report describes cures as effective as Sovaldi (up to 97%) as bad for business since you cure yourself out of a market. Even something as common as insulin can cost a significant portion of income, to the point where people die from needing to ration it.

This hostile environment is thinly justified with rhetoric around drug development costs and enforced through the patent law system, all under the implicit, sometimes explicit, assumption that it is necessary for drug companies to make a profit on their drugs. Patents provide exclusive rights for a company to sell a particular drug; this temporary monopoly essentially gives them carte blanche to set whatever price they want so that they recoup the drug development costs, so the story goes. These patents last 20 years and can basically be extended by "exclusivity" periods which add up to another 7 years. A drug may take 10-15 years to develop, leaving a window of at least 5 years of exclusive rights to produce and sell it. "Orphan drugs", drugs that treat rare conditions, may have longer monopolies to compensate for the smaller market size. After this period generics are permitted to enter the market, which drives the cost down, but there are all sorts of tricks available that can prolong this protection period even further, a practice called "evergreening". For example, slightly modifying how the drug is delivered (e.g. by tablet or capsule) can be enough for it to essentially be re-patented.

(It's worth noting that prices can be high even for generics. For example, epinephrine — commonly known as an EpiPen, essential for severe allergic reactions — can be bought for about 0.10-0.95USD outside the US, whereas generics in the US can cost about $70.)

Drug development pipeline. From: Pharmaceutical Research and Manufacturers of America, Drug Discovery and Development: Understanding the R&D Process, www.innovation.org.

Drug development is expensive, averaging over $2.5 billion per drug, and that's only counting those that gain FDA approval. However, these exclusivity rights are not merely used to recapture R&D costs, as is often said, but instead to flagrantly gouge prices such that the pharmaceutical industry is tied with banking for the largest profit margins of any industry (as high as 43% in the case of Pfizer).

The narrative around high drug development costs also takes for granted that pharmaceutical companies are the ones bearing all of these costs. A considerable amount of the basic research that is foundational to drug development is funded publicly; the linked study found that public funding contributed to every drug that received FDA approval from 2010-2016. The amount of funding is estimated to be over $100 billion.

It used to be that inventions resulting from federal funding remained under federal ownership, but the 1980 Bayh–Dole Act offered businesses and other institutions the option to claim private ownership. The result is the public "paying twice" for these drugs. The Act does preserve "march-in rights" for the government, allowing the government to circumvent the patent and assign licenses independently if the invention is not made "available to the public on reasonable terms", but as of now these rights have never been exercised. In 2016 there was an unsuccessful attempt to use these march-in rights to lower the price of a prostate cancer drug called Xtandi, priced at $129,000/year.

All of this isn't to say that the work of the pharmaceutical industry isn't valuable; drugs are a necessary part of so many people's lives. I recently started using sumatriptan to deal with debilitating migraines, and am hugely grateful it exists (and is not ridiculously expensive). It's because pharmaceuticals are such a critical part of life that their development and distribution should not be dictated by the values that currently shape it.

One particularly egregious example of this mess is the nightmare scenario of Purdue Pharmaceuticals, owned by the Sackler family (who are also prolific patrons of the arts), producers of OxyContin (accounting for over 80% of their sales last year), basically responsible for the ongoing opioid crisis (affecting at least 2.1 million Americans directly, and many more collaterally), and recently granted a patent for a drug that treats opioid addiction. The patented treatment is a small modification of an existing generic.

The day before our 7x7 presentation a story broke in the Guardian: "Sackler family members face mass litigation and criminal investigations over opioids crisis".

Computational drug discovery

One reason drug development is so difficult is that the space of possible drug compounds is extremely large, estimated to be between 10⁶⁰ and 10⁶³ compounds. For comparison, there are an estimated 10²² to 10²⁴ stars in the entire universe, and according to this estimate about 10⁴⁹-10⁵⁰ atoms making up our entire world.

PubChem's chemical space, from "Exploring Chemical Space for Drug Discovery Using the Chemical Universe Database"

Drug development is in large part a search problem, looking to find useful compounds within this massive space. A brute-force search is impossible; even if it took only a couple seconds to examine each possible compound you'd see several deaths of our sun (a lifespan of about 10 billion years) before fully exploring that space.
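The back-of-the-envelope arithmetic can be checked with a few lines. This sketch takes the lower bound of 10⁶⁰ compounds, reads "a couple seconds" as 2 seconds per compound (an assumption), and uses the 10-billion-year solar lifespan quoted above; the function name is made up for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # one Julian year in seconds


def sun_lifetimes_to_search(
    compounds: int = 10**60,              # lower bound on the chemical space
    seconds_per_compound: float = 2.0,    # assumed reading of "a couple seconds"
    sun_lifespan_years: float = 10e9,     # about 10 billion years
) -> float:
    """How many solar lifespans an exhaustive, serial search would take."""
    total_years = compounds * seconds_per_compound / SECONDS_PER_YEAR
    return total_years / sun_lifespan_years


if __name__ == "__main__":
    print(f"{sun_lifetimes_to_search():.2e} solar lifespans")
```

Under these assumptions the result is on the order of 10⁴² solar lifespans, so "several deaths of our sun" is, if anything, a very generous understatement.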

More effective techniques for searching this space include slightly modifying existing drugs for different therapeutic applications ("me-too" compounds) and literally looking at plants and indigenous medical traditions for leads (this general practice is called "bioprospecting" and this particularly colonialist form is called "biopiracy").

Of course with the proliferation of machine learning there is a big interest in searching this space computationally. Two main categories are virtual screening (looking through known compounds for ones that look promising) and molecular generation (generating completely new compounds that look promising). We focused on molecular generation for reasons described below.

A primary goal for matter.farm is to publicize this work in computational drug discovery and also help researchers use these generated compounds as potential leads for new beneficial drugs. With our system, which is also open source, independent and institutional researchers alike can access automated drug discovery technology and hopefully accelerate the drug development process.

Prior art and public discovery of drugs

(For this section we spoke with a patent lawyer who requested that we note that they are not representing us.)

One crucial criterion for a patent is that the invention must be novel; that is, the invention cannot have already been known to the public. An existing publicly-known instance of an invention is called "prior art" and can invalidate a patent claim. However, sufficient variations to an invention may qualify it as original enough to be patentable (this is the idea behind evergreening, described above).

If a drug is discovered and made public prior to a patent claim on it, it would function as prior art and make that compound un-patentable in its current form. If we were able to generate new molecules that could function as useful drugs, and make public those new molecules, then perhaps we can prevent companies from patenting them and maintaining a temporary monopoly on their distribution.

This is the second goal of matter.farm: develop an automated drug discovery system to find and publish useful drugs so they cannot be patented.

Additional efforts

Other efforts to address the problems with the pharmaceutical industry can be found in initiatives like Medicare for All and the proposed Prescription Drug Price Relief Act, and the organizing happening around those. The issues with the pharmaceutical industry are just one piece of a more general hostility in American healthcare.

There is also a burgeoning DIY medicine movement which aims to build alternatives to industrialized medicine, providing autonomy, access, and reliability where those are normally withheld. For example, the artist Ryan Hammond is working on genetically modifying tobacco plants to produce estrogen and testosterone, and the Four Thieves Vinegar Collective (discussed in the Ashes Ashes "Pill of Sale" episode) provides instructions for a DIY EpiPen and a DIY lab ("MicroLab") for synthesizing various pharmaceuticals, including Naloxone and Sovaldi.

That's it for the background of the project. The following section describes how the system works in more detail.

The project code is available here.

How it works

The complete matter.farm system involves three components:

A molecular generation model, using a version of the graph variational autoencoder ("JTNN-VAE") described in [2], modified to be conditional ("JTNN-CVAE"). This generates new compounds given a receptor and action, e.g. a nociceptin receptor agonist.
An ATC code prediction model. The Anatomical Therapeutic Chemical (ATC) classification system categorizes compounds based on their therapeutic effects. We use this to estimate what a generated compound might treat.
A retrosynthesis planner. Retrosynthetic analysis is the process of coming up with a plan to synthesize some target compound from an inventory of base compounds (e.g. compounds you can purchase directly from a supplier). This is necessary to meet the enablement requirement of prior art; that is, it's not enough to come up with a new compound, you also need to sufficiently demonstrate how it could be synthesized.

To train these various models we relied on a number of public data sources, including PubChem, UniProtKB, STITCH, BindingDB, ChEMBL, and DrugBank.

Common chemical compound representations, from "Prediction methods and databases within chemoinformatics: emphasis on drugs and drug candidates".

Chemical compounds can be represented in a number of ways, e.g. as a 2D structural diagram or a 3D model. One of the most portable formats is SMILES, which represents a molecule as an ASCII string, and is what we use throughout the project.

Clustering

For training the JTNN-CVAE model we needed to cluster the compounds in a meaningful way. At first we didn't look at receptors and actions but rather tried to leverage the vast published chemical research literature (in PubMed) and USPTO patents.

For the first attempt we tried learning word2vec embeddings from PubMed article titles and abstracts, then representing documents using TF-IDF as described in the WISDM algorithm, and finally clustering using DBSCAN or OPTICS. This ended up being way too slow, memory intensive, and limited in what clustering algorithms we could try.

The second attempt involved generating a compound graph such that an edge exists between compound A and B if they appear in an article or patent together. So instead of linking compounds based on the content of the articles they're mentioned in, they're linked solely on the virtue of being mentioned together in an article, under the assumption that this indicates some meaningful similarity. Then we ran a label propagation community detection algorithm to identify clusters within the graph. The graph was fairly sparse however and in the end looking at the connected components seemed to be enough. There were still some limitations with speed and memory that led us to abandon that approach.
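As a rough illustration of the co-mention idea, here is a minimal sketch in plain Python. The document contents are made-up toy data, and the function names are hypothetical; our actual pipeline ran on PubMed and USPTO data at a much larger scale.

```python
from collections import defaultdict
from itertools import combinations

def comention_graph(documents):
    """Build an undirected graph: compounds are linked if they share a document."""
    graph = defaultdict(set)
    for compounds in documents:
        for c in compounds:
            graph[c]  # make sure isolated compounds still appear as nodes
        for a, b in combinations(set(compounds), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)

def connected_components(graph):
    """Return the connected components of the graph as a list of sets."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components

# Hypothetical toy data: each entry is the set of compounds named in one document.
documents = [
    {"morphine", "naloxone"},
    {"naloxone", "oxycodone"},
    {"aspirin", "ibuprofen"},
    {"caffeine"},
]
clusters = connected_components(comention_graph(documents))
print(clusters)
```

Here the three opioid-related compounds end up in one component even though morphine and oxycodone never appear in the same document, which is exactly the kind of transitive link the co-mention assumption relies on.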

Finally we decided to cluster based on the receptors that compounds were known to interact with. This reduced the number of compounds we were able to look at (since the compounds with known receptor interactions are far fewer than all known compounds), but the data was richer and more explicit than co-mentions in text documents. A drug's effects are determined by the receptor it targets and how it interacts with that receptor (does it activate it, does it block it, etc). For instance, OxyContin is a mu-type and kappa-type opioid receptor agonist (it activates them), and Naloxone (used to treat opioid overdose) is a mu-type and kappa-type opioid receptor antagonist (it blocks them).

G-Protein Coupled Receptor, from Random42

The ChEMBL data has information about both receptors and the type of interaction (agonist, antagonist, etc), whereas BindingDB has information about only the receptors but for more compounds. So we used a two-pass approach: on the first pass, we created initial clusters based on receptor-action types, and on the second pass we augmented these clusters with the BindingDB compounds, assigning each to the cluster that matches its target receptor and has the highest fingerprint similarity to the cluster's current members. This resulted in 461 receptor-action clusters.
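The second pass can be sketched roughly as follows. This is an illustration rather than our actual code: fingerprints are represented as plain sets of "on" bits, similarity to a cluster is taken as the mean Tanimoto similarity to its members (one possible reading of "highest fingerprint similarity"), and the clusters and fingerprints are made up.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def assign_second_pass(clusters, compound_fp, target_receptor):
    """Pick the cluster for `target_receptor` whose members are most similar on average.

    `clusters` maps (receptor, action) -> list of member fingerprints.
    Returns the chosen (receptor, action) key, or None if no cluster matches.
    """
    best_key, best_score = None, -1.0
    for (receptor, action), members in clusters.items():
        if receptor != target_receptor or not members:
            continue
        score = sum(tanimoto(compound_fp, m) for m in members) / len(members)
        if score > best_score:
            best_key, best_score = (receptor, action), score
    return best_key

# Hypothetical first-pass clusters with tiny made-up fingerprints.
clusters = {
    ("mu-opioid", "agonist"): [{1, 2, 3, 4}, {1, 2, 3, 5}],
    ("mu-opioid", "antagonist"): [{7, 8, 9}, {7, 8, 10}],
    ("kappa-opioid", "agonist"): [{1, 2, 3, 4}],
}
new_fp = {7, 8, 9, 11}  # a compound with a known receptor but no known action
key = assign_second_pass(clusters, new_fp, "mu-opioid")
clusters[key].append(new_fp)
print(key)
```

Only clusters for the compound's known receptor are considered, so similarity is used purely to guess the missing action type.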

Molecular generation model

The JTNN-VAE model described in [2] is a variational autoencoder that handles molecular graphs. A variational autoencoder is a generative model that learns how to compress ("encode") data in such a way that it can be reliably decompressed ("decoded"). It accomplishes this by learning an underlying probability distribution that describes the data. Once the model has learned this distribution you can sample it to generate new data that looks like the old data. This post provides an overview on variational autoencoders.

We modified the model to be conditional, allowing us to sample from the learned probability distribution conditioned on the cluster (the receptor-action) we want to generate new compounds for.

ATC code prediction model

The ATC code prediction model is a straightforward multiclass neural network. Though ATC code prediction is technically a multilabel problem (a compound may have more than one ATC code), most compounds had only one code, so we treated it as a multiclass problem. ATC codes have 5 levels, from low detail to high detail; we predict level 3 codes ("pharmacological/therapeutic/chemical subgroup"), with one class for each level 3 code. We use 2048-bit Morgan fingerprints as representations for the compounds and achieved about 80% accuracy with this naive approach — not ideal, but fine for our purposes and time constraints.
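To make the label setup concrete, here is a small sketch of how full ATC codes could be reduced to level 3 classes. The level lengths follow the standard ATC format (e.g. N02AA01 for morphine); the rule for collapsing multiple codes into one label is an arbitrary assumption for illustration, not necessarily what we did.

```python
# ATC code character lengths at each of the five levels, e.g. for "N02AA01":
# N (1), N02 (2), N02A (3), N02AA (4), N02AA01 (5).
ATC_LEVEL_LENGTHS = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}

def atc_level(code, level=3):
    """Truncate a full ATC code to the given level."""
    return code[:ATC_LEVEL_LENGTHS[level]]

def to_multiclass_label(codes, level=3):
    """Collapse a compound's ATC codes to a single class at the given level.

    Assumption: when a compound has several codes we keep the alphabetically
    first one, which is just one arbitrary way to force a single label.
    """
    return sorted({atc_level(c, level) for c in codes})[0]

print(atc_level("N02AA01"))                         # level 3 subgroup of morphine's code
print(to_multiclass_label(["N02BA01", "B01AC06"]))  # a compound with two codes
```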

Retrosynthesis planner

Here's where it came down to the wire. We attempted to implement the model ("3N-MCTS") described in [7], which involves Monte Carlo Tree Search (MCTS) and three policy networks. The policy networks predict what reaction rules might apply to a given compound. The paper's model is trained on Reaxys data, which is way too expensive for us, so we used a smaller dataset extracted from USPTO patents (from [26]). Our implementation is basically complete but we didn't have enough time to train the models.

In an 11th-hour Hail Mary we used MIT's ASKCOS system, which got us mostly partial synthesis plans. At some point we should revisit the 3N-MCTS system to see if we can get that working.

Sampling and filtering

Once all the models were ready we sampled 100 new compounds for each of the 461 receptor-action clusters. Then we filtered down to valid compounds (with no charge) and to those not present in PubChem's collection of 96 million compounds. We ended up whittling the set down to about 15,000 new compounds for which we then predicted ATC codes and generated synthesis plans.
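The filtering step looks roughly like the sketch below. It is deliberately simplified: `is_valid` is a stand-in for a real validity check from a cheminformatics toolkit, the charge test is a regex heuristic that flags any charged bracket atom in a SMILES string (stricter than checking net charge), and the novelty check assumes both sides use the same canonical SMILES form.

```python
import re

N_CLUSTERS = 461
SAMPLES_PER_CLUSTER = 100
total_sampled = N_CLUSTERS * SAMPLES_PER_CLUSTER  # candidates before filtering

# Matches a SMILES bracket atom carrying a charge, e.g. [NH4+] or [O-].
CHARGED_ATOM = re.compile(r"\[[^\]]*[+-][^\]]*\]")

def has_charge(smiles):
    """Heuristic: True if any bracket atom in the SMILES string is charged."""
    return bool(CHARGED_ATOM.search(smiles))

def filter_candidates(candidates, known, is_valid):
    """Keep valid, uncharged candidates not in the known set, without duplicates."""
    kept, seen = [], set()
    for smiles in candidates:
        if smiles in seen or smiles in known:
            continue
        seen.add(smiles)
        if is_valid(smiles) and not has_charge(smiles):
            kept.append(smiles)
    return kept

# Toy example; `known` stands in for PubChem and the validity check is a stub.
candidates = ["CCO", "C[NH3+]", "CCO", "CC(=O)O", "c1ccccc1"]
known = {"c1ccccc1"}
novel = filter_candidates(candidates, known, is_valid=lambda s: True)
print(total_sampled, novel)
```

Sampling 100 compounds for each of 461 clusters gives 46,100 candidates, so ending up with about 15,000 means roughly a third survived the filters.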

Future work

Time was fairly tight for the project so we didn't get to tune or train the system as much as we wanted to. And it would have been nice to try to experiment with more substantial modifications of the models we used. But for me this project was a wonderful learning experience and renewed an interest in chemistry. I want to spend some more time in this area, especially in materials science because of its relevance to lower-impact technologies, such as this cooling material.

References

Botev, Viktor, Kaloyan Marinov, and Florian Schäfer. "Word importance-based similarity of documents metric (WISDM): Fast and scalable document similarity metric for analysis of scientific documents." Proceedings of the 6th International Workshop on Mining Scientific Publications. ACM, 2017
Jin, Wengong, Regina Barzilay, and Tommi Jaakkola. "Junction Tree Variational Autoencoder for Molecular Graph Generation." arXiv preprint arXiv:1802.04364 (2018).
Kusner, Matt J., Brooks Paige, and José Miguel Hernández-Lobato. "Grammar variational autoencoder." arXiv preprint arXiv:1703.01925 (2017).
Goh, Garrett B., Nathan O. Hodas, and Abhinav Vishnu. "Deep learning for computational chemistry." Journal of computational chemistry 38.16 (2017): 1291-1307.
Yang, Xiufeng, et al. "ChemTS: an efficient python library for de novo molecular generation." Science and technology of advanced materials 18.1 (2017): 972-976.
Liu, Yue, et al. "Materials discovery and design using machine learning." Journal of Materiomics 3.3 (2017): 159-177.
Segler, Marwin HS, Mike Preuss, and Mark P. Waller. "Planning chemical syntheses with deep neural networks and symbolic AI." Nature 555.7698 (2018): 604.
Kim, Edward, et al. "Virtual screening of inorganic materials synthesis parameters with deep learning." npj Computational Materials 3.1 (2017): 53.
Josse, Julie, Jérome Pagès, and François Husson. "Testing the significance of the RV coefficient." Computational Statistics & Data Analysis 53.1 (2008): 82-91.
Cordasco, Gennaro, and Luisa Gargano. "Community detection via semi-synchronous label propagation algorithms." Business Applications of Social Network Analysis (BASNA), 2010 IEEE International Workshop on. IEEE, 2010.
Community Detection in Python
Jin, Wengong, Regina Barzilay, and Tommi Jaakkola. "Junction Tree Variational Autoencoder for Molecular Graph Generation." arXiv preprint arXiv:1802.04364 (2018).
What are the differences between community detection algorithms in igraph?
Summary of community detection algorithms in igraph 0.6
Wang, Yong-Cui, et al. "Network predicting drug’s anatomical therapeutic chemical code." Bioinformatics 29.10 (2013): 1317-1324.
Liu, Zhongyang, et al. "Similarity-based prediction for Anatomical Therapeutic Chemical classification of drugs by integrating multiple data sources." Bioinformatics 31.11 (2015): 1788-1795.
Cheng, Xiang, et al. "iATC-mHyb: a hybrid multi-label classifier for predicting the classification of anatomical therapeutic chemicals." Oncotarget 8.35 (2017): 58494.
Szklarczyk D, Santos A, von Mering C, Jensen LJ, Bork P, Kuhn M. STITCH 5: augmenting protein-chemical interaction networks with tissue and affinity data. Nucleic Acids Res. 2016 Jan 4;44(D1):D380-4.
Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, Sajed T, Johnson D, Li C, Sayeeda Z, Assempour N, Iynkkaran I, Liu Y, Maciejewski A, Gale N, Wilson A, Chin L, Cummings R, Le D, Pon A, Knox C, Wilson M. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2017 Nov 8. doi: 10.1093/nar/gkx1037.
Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A, Han L, He J, He S, Shoemaker BA, Wang J, Yu B, Zhang J, Bryant SH. PubChem Substance and Compound databases. Nucleic Acids Res. 2016 Jan 4; 44(D1):D1202-13. Epub 2015 Sep 22 [PubMed PMID: 26400175] doi: 10.1093/nar/gkv951.
Gilson, Michael K., et al. "BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology." Nucleic acids research 44.D1 (2015): D1045-D1053.
The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45: D158-D169 (2017)
Gaulton A, Hersey A, Nowotka M, Bento AP, Chambers J, Mendez D, Mutowo P, Atkinson F, Bellis LJ, Cibrián-Uhalte E, Davies M, Dedman N, Karlsson A, Magariños MP, Overington JP, Papadatos G, Smit I, Leach AR. (2017) 'The ChEMBL database in 2017.' Nucleic Acids Res., 45(D1) D945-D954.
Papadatos, George, et al. "SureChEMBL: a large-scale, chemically annotated patent document database." Nucleic acids research 44.D1 (2015): D1220-D1228.
Lowe, Daniel Mark. Extraction of chemical structures and reactions from the literature. Diss. University of Cambridge, 2012.
Lowe, Daniel (2017): Chemical reactions from US patents (1976-Sep2016). figshare. Fileset. CC0 License.
Liu, Bowen, et al. "Retrosynthetic reaction prediction using neural sequence-to-sequence models." ACS central science 3.10 (2017): 1103-1113.
Klucznik, Tomasz, et al. "Efficient syntheses of diverse, medicinally relevant targets planned by computer and executed in the laboratory." Chem 4.3 (2018): 522-532.
Segler, Marwin HS, and Mark P. Waller. "Neural‐Symbolic Machine Learning for Retrosynthesis and Reaction Prediction." Chemistry–A European Journal 23.25 (2017): 5966-5971.
Law, James, et al. "Route designer: a retrosynthetic analysis tool utilizing automated retrosynthetic rule generation." Journal of chemical information and modeling 49.3 (2009): 593-602.
Schwaller, Philippe, et al. "“Found in Translation”: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models." Chemical science 9.28 (2018): 6091-6098.
Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016).
Yang, Zhilin, William W. Cohen, and Ruslan Salakhutdinov. "Revisiting semi-supervised learning with graph embeddings." arXiv preprint arXiv:1603.08861 (2016).
Kipf, Thomas, et al. "Neural relational inference for interacting systems." arXiv preprint arXiv:1802.04687 (2018).
Wei, Jennifer N., David Duvenaud, and Alán Aspuru-Guzik. "Neural networks for the prediction of organic chemistry reactions." ACS central science 2.10 (2016): 725-732.
Coley, Connor W., et al. "Prediction of organic reaction outcomes using machine learning." ACS central science 3.5 (2017): 434-443.
Gupta, Anvita. "Predicting Chemical Reaction Type and Reaction Products with Recurrent Neural Networks."
Plehiers, Pieter P., et al. "Automated reaction database and reaction network analysis: extraction of reaction templates using cheminformatics." Journal of cheminformatics 10.1 (2018): 11.
MIT's ASKCOS ("Automated System for Knowledge-based Continuous Organic Synthesis")

Culture: A Social Network Simulator

06.18.2018

projects

This is a proposal for Culture, a social network simulator designed and developed to teach students about bot development.

This proposal was originally developed for a class on "news bots" I was scheduled to teach in the fall of 2017 (I ended up having a conflict and was unable to teach it). I wanted students to not only explore the impact of bots from a theory perspective, but also engage hands-on to see just how radically influential these bots are on social media platforms.

And not only bots. Ideally students would take on the role of other actors in social media ecosystems, such as a "traditional" media publication, or as an advertiser, or as a political candidate, or as an influencer, or even as the platform itself, making decisions around aspects such as the newsfeed algorithm.

Unfortunately, there are a number of challenges that make hands-on experience infeasible with live social networks:

Ethical concerns. For example, many bots are meant to deceive and manipulate, and we'd be working with real user data.
Issues of access. For example, rate-limiting and limited access to data. For privacy reasons APIs generally don't provide sensitive user data to developers, though some such data may be provided to advertisers. And of course, with live social networks there isn't a way for students to change the newsfeed algorithms for the entire network.
Limits of reality. For example, a student can't magically become an influencer on Twitter, but in a simulated setting, they can.

There are also some technical obstacles, namely that students taking the class weren't required to have any programming background and I didn't want to spend too much time on introductory programming lessons. Even if students were fairly experienced in programming, working with bots has a lot of advanced challenges, such as dealing with natural language. A simulated social network can be simplified so that these problems are easier to deal with.

This proposal doesn't really have a strong advertising component. After speaking with Irwin Chen about it, I realized it's a pretty big omission. So an updated proposal will include all of that: selling ads, ad targeting, ad exchanges, etc. It's not an area I know well, so I'd have to speak with some people and do some research before sketching that out.

Overview

Culture will be an agent-based simulation of a simple social network modeled off of Twitter. As such, the simulation will consist of the following (each part is elaborated further below):

users communicate in a rudimentary language
users have different personalities
each user will have a feed of messages from people they follow and include promoted/ad messages
- here students can potentially design their own news feed algorithms and see how that affects individual/public opinion
users can message, post media, block, be blocked, be banned, follow, unfollow
messages and media influence users
the network responds to and affects outside events

Motivation

So much of our exposure to and understanding of the world beyond our immediate experience is mediated by social networks, which is to say by newsfeed algorithms and other individual users of these networks. Students should develop a stronger literacy in these dynamics if they are to adequately navigate this information ecology.

This literacy is best developed by direct interaction with these social networks, such as Twitter or Facebook, rather than through theory alone. However, working directly with these networks may be impractical in that they are massive, closed-source, and limited in access. For instance, due to API limits it is impossible to survey or conduct analyses of the entire population of the network, or to examine in detail its inner operations.

Furthermore, there is no room for counterfactual speculation in these existing social networks. For example, we can't intervene and change the behaviors of all users and see how information propagation changes as a result. This limits the pedagogical value of working directly with, for example, Twitter or Facebook.

A simulated social network addresses these concerns. It can be designed to model the dynamics of its real counterparts, it can be entirely open in that students have access to all the network's data, and its parameters can be tweaked to see how information propagation evolves under different circumstances. Students can develop bots on this network without worrying about API limits, spam protection, and so on. In contrast to the black-box nature of a real social network, a simulated social network functions more like a sandbox.

Agents

The simulated agents are individual users of the social network. They are randomly generated to have particular personalities and interests (see below). Their generation is part of the simulation's initialization. Students do not directly interact with these agents, but can indirectly interact with them via, for example, ads and bots they create (see below).

Language

Dealing with natural language is difficult even for experienced developers and advanced researchers in the topic. Broadly, the problem of natural language in the context of bots can be described in two parts: understanding and generation. Both are very difficult and beyond the scope of the courses that this simulation is designed for, which includes introductory classes.

To avoid dealing with natural language, the simulation will consist of a very basic grammar and a relatively small vocabulary which can be easily expanded as needed. Because of its relative simplicity, the same natural language processing techniques that are currently used for "real" languages can also be applied, but with greater success, and better yet, simpler heuristics will go a longer way. Thus students will not need to have a deep understanding of, for example, word vectors or TF-IDF, but may develop their own simpler techniques that will still be effective.

This simpler language will consist of verbs, nouns, and modifiers (adjectives and adverbs) (collectively, "terms"). Because the courses are assumed to be taught in English, this language will be reflective of English.

These terms are combined into formal propositional statements, e.g. single-payer-healthcare + country -> < freedom, which expresses the opinion that implementing single payer health care in this country will cause (->) a loss (<) of freedom. (This is just a sketch of the syntax; it's subject to change).
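A minimal sketch of how such a statement could be parsed is below. Since the syntax is still subject to change, everything here is an assumption: terms are joined with " + ", "->" separates causes from the effect, "<" marks a loss, and ">" is a hypothetical counterpart marking a gain.

```python
from dataclasses import dataclass

@dataclass
class Proposition:
    causes: list      # terms on the left of "->"
    direction: int    # -1 for "<" (loss); +1 for ">" (gain, an assumed counterpart)
    effect: str       # the term being affected

def parse_proposition(text):
    """Parse e.g. 'a + b -> < c' into a Proposition (sketch of an unsettled syntax)."""
    left, right = (part.strip() for part in text.split("->", 1))
    causes = [term.strip() for term in left.split(" + ")]
    sign, effect = right.split(None, 1)
    direction = {"<": -1, ">": 1}[sign]
    return Proposition(causes, direction, effect.strip())

p = parse_proposition("single-payer-healthcare + country -> < freedom")
print(p)
```

With statements this regular, a student's bot can read opinions with a few string operations instead of a natural language pipeline.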

This is a bit limiting (there is no room for poetics, for instance), but it will provide a strong starting point that can be expanded on later.

The design of this language will involve developing a network of terms (i.e. defining term associations), such that terms represent mixtures of other terms and values in the simulation (e.g. individuality/collectivism, see "Personalities" below). This term association network is opaque to the students; they do not get to see what these terms mean to the agents in the simulation. As with the real world, they must use algorithms or their own intuition from observing the network to determine what language best communicates their messages.

For example: the term "car" may be connected to the terms "individuality" and "freedom" to establish that the term "car" symbolically evokes these two ideas. We could then imagine ads for "cars" appeal more to agents with personalities that align more with those concepts relative to agents who, for example, align more with "collectivity" and "freedom".

Terms also have sentiment valences, e.g. "bad" may have a valence of -0.5 to express a negative opinion, whereas "terrible" may have a stronger valence of -0.8, and so on.
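Here is one way the term associations and valences could be represented, as a sketch only. The weights, the extra "bus" term, and the scoring rules (a weighted overlap for appeal, a plain sum for sentiment) are all made up for illustration.

```python
# Hypothetical association weights and valences; the real network would be
# designed by hand and hidden from students.
ASSOCIATIONS = {
    "car": {"individuality": 0.6, "freedom": 0.4},
    "bus": {"collectivity": 0.7, "freedom": 0.3},
}
VALENCE = {"bad": -0.5, "terrible": -0.8}

def appeal(term, agent_values):
    """How strongly a term resonates with an agent: weighted overlap of values."""
    weights = ASSOCIATIONS.get(term, {})
    return sum(w * agent_values.get(value, 0.0) for value, w in weights.items())

def sentiment(terms):
    """Sum of the valences of the terms in a message (0 for neutral terms)."""
    return sum(VALENCE.get(t, 0.0) for t in terms)

individualist = {"individuality": 0.9, "freedom": 0.8, "collectivity": 0.1}
collectivist = {"individuality": 0.1, "freedom": 0.8, "collectivity": 0.9}

print(appeal("car", individualist), appeal("car", collectivist))
print(sentiment(["terrible", "car"]), sentiment(["bad", "car"]))
```

In this toy setup "car" resonates more with the individualist agent than the collectivist one, even though both value "freedom" equally.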

Ideally, this term association network is not objective but rather subjective; i.e. differs depending on the particular agent. For example, the term "freedom" may be associated with different values for one agent than for another. However, it is likely that this will be computationally infeasible (though some kind of heuristics could be developed to simplify it).

This term association network also changes over time as terms are used in slightly different contexts. This provides a way for the meaning of terms to change or be entirely inverted, e.g. a negative term being co-opted as a positive identifying term for a group.

The language is the part of the simulation that will require most care in designing - it needs to represent important aspects of how language is used in social networks (e.g. to express opinion/judgement, to harass/abuse, to make propositional statements, etc).

Personalities

Simulated agents will have what could loosely be described as "personalities"; that is, a set of parameters that determines how the agent interacts with others (e.g. aggressiveness/friendliness, within-bubble/outside-bubble, etc) and what their values are (e.g. conservative/progressive, individualist/collectivist, etc). These personalities will be generated randomly, via a Bayes Net (or some similar probabilistic model) that will be editable in some way. A model like a Bayes Net lets us describe assumed relationships between values (e.g. more collectivist agents are more likely to be friendly).
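As a sketch of what that generation could look like, here is a tiny two-node network (collectivist -> friendly) sampled with the standard library. The probabilities are invented, and a real version would have many more traits and an editable structure.

```python
import random

def sample_personality(rng):
    """Sample one agent from a tiny two-node network: collectivist -> friendly."""
    collectivist = rng.random() < 0.5
    # Conditional probability table; the numbers are made up.
    p_friendly = 0.8 if collectivist else 0.4
    friendly = rng.random() < p_friendly
    return {"collectivist": collectivist, "friendly": friendly}

def friendly_rate(agents, collectivist):
    """Share of friendly agents among those with the given collectivist value."""
    group = [a for a in agents if a["collectivist"] == collectivist]
    return sum(a["friendly"] for a in group) / len(group)

rng = random.Random(0)  # seeded so a simulation run can be reproduced
agents = [sample_personality(rng) for _ in range(10_000)]
print(friendly_rate(agents, True), friendly_rate(agents, False))
```

Editing the conditional probability table is then all it takes to try out a different assumed relationship between traits.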

These personalities also determine who agents tend to interact with (under principles of homophily, i.e. like attracts like) and also what kind of messaging resonates with them (e.g. messages about rugged individuality will resonate more with individualist agents).

Messages

"Messages" are the equivalent of Twitter's tweets. Agents compose their own messages based on their personalities and who they are interacting with. Messages may affect an agent's mood and also their personality (see below).

Media

Text is not the only important part of a social network - memes and other media (news stories, videos, etc) form a crucial part of their information flow.

States

Agents' states include their personalities, in addition to other attributes like mood and use frequency (how often they visit the social network) and post frequency (how often they post messages). Mood may affect, for example, how agents interact with other agents (e.g. with more or less hostility). This can be used to model emotion contagion.

Influence

Based on who they interact with and what other messaging they are exposed to (e.g. targeted ads), the personalities (traits/opinions) of an agent may shift over time. Various social phenomena, e.g. bipolarization, can be modeled here.
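One simple candidate for such an update rule is sketched below. It is an assumption on my part, not a settled design: opinions live in [-1, 1], messages close to an agent's current opinion pull the agent toward them, and messages that are too far away push the agent in the opposite direction, which is one way polarization can emerge.

```python
def update_opinion(opinion, message_stance, openness=0.1, tolerance=1.0):
    """Shift an opinion in [-1, 1] after exposure to one message.

    Messages within `tolerance` pull the agent toward them; messages further
    away push the agent in the opposite direction (a simple backfire rule).
    """
    gap = message_stance - opinion
    step = openness * gap if abs(gap) <= tolerance else -openness * gap
    return max(-1.0, min(1.0, opinion + step))

print(update_opinion(0.2, 0.6))   # nearby message: drifts toward it
print(update_opinion(0.5, -1.0))  # distant message: drifts away from it
```

The `openness` and `tolerance` parameters (both hypothetical) could themselves be part of an agent's personality.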

Events

Social networks are not closed systems; they do not exist in isolation. The "outside" world affects what goes on in the network, just as what goes on in the network can spill out and affect the outside world.

Part of the simulation will support external events (also simulated) that affect and can be affected by the social network, such as an election. The outside event(s) affect what are popular topics (i.e. topics that are relevant and agents are more likely to talk about and respond to) and they can be defined to have some relationship to the shape of discourse in the network.

In this section, "social network" is used not to refer to the platform itself, but to the actual network of relationships between users (expressed by "following" relationships). Some users may be highly connected (many followers), and students may, for example, as part of their strategy (whether for ads or opinion influence) try to target these opinion leaders.

Visualization

It will likely be too computationally taxing to display all activity on the social network, but students will have access to various views that provide summaries (e.g. mean sentiment towards some topic, number of users talking about a topic, etc). Ideally an API can be provided like a real social network, so that students can build their own visualizations as part of their bot development process, but this may be limited by the size of the simulation.

Bot API

A simple API will be provided for students to develop their own bots that interact with this network. These bots can follow, be followed, message, etc like agents can and will be the primary way students interact with the social network.

What distinguishes bots from simulated agents is that bots are designed and controlled by students, whereas the simulated agents represent "real" users of the network.
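To give a feel for what the student-facing side might look like, here is a purely hypothetical sketch. None of these class or method names exist yet; the actual API would be designed alongside the simulation.

```python
class Bot:
    """Base class a student would subclass; all method names are illustrative."""

    def __init__(self, name):
        self.name = name
        self.following = set()

    def on_message(self, author, text):
        """Called for each message in the bot's feed; return a reply or None."""
        return None

class SupportBot(Bot):
    """Follows and replies to anyone who mentions a chosen term."""

    def __init__(self, name, term):
        super().__init__(name)
        self.term = term

    def on_message(self, author, text):
        if self.term in text.split():
            self.following.add(author)
            return f"@{author} good {self.term}"
        return None

bot = SupportBot("carfan", "car")
reply = bot.on_message("agent42", "car -> < freedom")
print(reply)
```

Because the simulated language is so regular, even a bot this small can do something meaningful, which is the point of keeping the language simple.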

Learning Objectives

The network functions as a simplified social landscape for students to understand how ads, bots, and news feed algorithms affect opinion, trends, and discussion on a social network, and how that links up with broader spheres of discourse outside of the network. Some students may, for example, design bots that influence opinion in a certain direction, while others may design bots to influence opinion in a different direction, while still others may design bots that root out these interfering bots. Depending on how the network is designed, some students can be the managers of the social network platform.

The goal is for students to develop a comprehensive mental model about the dynamics of social media and communication in the internet age, to peek "behind the curtain" and develop a critical perspective when using social media and reading the news (i.e. develop social media literacy).

Extensions

In theory this simulated social network can be extended with features that could be present on any social network, such as anonymous accounts, different kinds of blocking and muting functionality, and so on. Thus it can also be a place where students can experiment with new features to see how that affects dynamics on the network.

Variants

Ideally the simulator accommodates students who are comfortable with programming and those who aren't.

For students who aren't, bot templates could be provided which require little to no programming experience, or another layer can be developed where they "purchase" bot, marketing, and other services that run automatically.

If there are multiple classes going on, they can all work from the same simulation and take on different roles. If one class is focused on advertising, they can take on roles of the advertising ecosystem, while in another class perhaps they collectively take on the role of the platform. The potential for cross-class interactivity is exciting.

7x7 Cutting Room Floor

05.22.2018

projects

I was fortunate enough to participate in this year's edition of Rhizome's Seven on Seven with Sean Raspet. The event pairs an artist and a technologist and gives some limited time for the pair to come up with and implement a concept or project. In previous editions pairs only had a day or so; this time we had about a month.

It was enough time to churn through several ideas that never made it to the final presentation. We landed on producing a white paper proposing leveraging blockchain-based distributed computing to collectively simulate a complete human cell at the atomic level, starting with something relatively simple like a red blood cell. A human cell might have hundreds of trillions of atoms and so simulating one at the atomic resolution is basically infeasible with existing computational resources. But it is more feasible now than it was maybe a decade ago.

Through our research we came across some staggering statistics about the computing power of the Bitcoin network, namely that it is estimated to have an aggregate computing power of 80.7 zettaFLOPS (80.7 million petaFLOPS) as of May 2018. The world's reigning supercomputer, the Sunway TaihuLight, has a theoretical peak of 125 petaFLOPS. The Folding@Home network, which enables people to donate spare computing power for protein folding simulations, had an aggregate power of about 100 petaFLOPS in January 2018. Not bad for a volunteer distributed network, but still far off from the Bitcoin network. There are more details in the white paper, but those numbers stuck out.
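For a sense of scale, here is the arithmetic behind that comparison, using only the figures quoted above and taking the Bitcoin estimate at face value.

```python
PETA = 10**15
ZETTA = 10**21

bitcoin = 80.7 * ZETTA          # estimated aggregate, May 2018
taihulight = 125 * PETA         # theoretical peak
folding_at_home = 100 * PETA    # aggregate, January 2018

print(f"Bitcoin in petaFLOPS: {bitcoin / PETA:,.0f}")
print(f"vs. Sunway TaihuLight: {bitcoin / taihulight:,.0f}x")
print(f"vs. Folding@Home: {bitcoin / folding_at_home:,.0f}x")
```

By these estimates the Bitcoin network has several hundred thousand times the throughput of either the top supercomputer or Folding@Home.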

Anyways, we went through a few ideas before we landed on this white paper. Our first focus was on the phosphorus commodity market in relationship to "peak phosphorus". This was something Sean had been researching for some time now, and for the past few months I've been poking around the agri-tech scene, so I was naturally drawn to it as a topic. The gist is that phosphorus is a mineral crucial to agriculture, a key component in fertilizers (along with nitrogen and potassium; the history of nitrogen fertilizer is a very interesting and troubling one), and is basically a non-renewable resource (some can be recovered from waste but I'm not sure what percentage of it is recoverable). At some point in the relatively near future phosphorus extraction may become too expensive or difficult and that could lead to some serious food security crises. So we were thinking of various ways to represent this issue. Here are a few ideas we played around with.

Global phosphorus simulation

Phosphorus mining, like any extractive industry, is global. We wanted to be able to convey geopolitical issues like Morocco's occupation of Western Sahara, which is where Morocco mines its phosphorus. The most straightforward way to do something like that is a 4X-style global simulation, so we played around with that first.

I designed a little framework for laying out a hex-based map (similar to the cartog library I created for my Simulation & Cybernetics class, but I wanted to support 3D):

That's about as far as we got in terms of implementation. But the general idea was that we'd model the dynamics of the global phosphorus market, with some shocks and random events, and projections of changes in relevant indicators like growth rates, meat consumption rates, and so on. And somehow you'd see these effects on this map and through changes in the price of commodity phosphorus.
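For readers unfamiliar with hex maps, the core of such a layout is a mapping from grid coordinates to positions. The following is a minimal sketch using the common axial-coordinate convention for pointy-top hexes; it is not the framework from this project, and every function name here is made up for illustration. The 2D centers could serve as ground-plane coordinates in a 3D scene.

```python
import math


def hex_center(q, r, size=1.0):
    """Center of the pointy-top hex at axial coordinates (q, r)."""
    x = size * math.sqrt(3) * (q + r / 2)
    y = size * 1.5 * r
    return (x, y)


def hex_corners(q, r, size=1.0):
    """The six corner points of a pointy-top hex, starting at -30 degrees."""
    cx, cy = hex_center(q, r, size)
    corners = []
    for i in range(6):
        angle = math.radians(60 * i - 30)
        corners.append((cx + size * math.cos(angle), cy + size * math.sin(angle)))
    return corners


def hex_neighbors(q, r):
    """Axial coordinates of the six hexes adjacent to (q, r)."""
    directions = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]
    return [(q + dq, r + dr) for dq, dr in directions]


def hex_grid(cols, rows):
    """Axial coordinates for a roughly rectangular map of cols x rows hexes.

    Shifting q by half the row index keeps the map rectangular instead of
    letting it skew into a rhombus.
    """
    return [(col - row // 2, row) for row in range(rows) for col in range(cols)]
```

With this convention, every pair of adjacent hexes has centers exactly `sqrt(3) * size` apart, which makes it easy to sanity-check a layout.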

I didn't want this hex map to be the only "output" of the simulation. We wanted to show that the macro-level dynamics of the phosphorus market are intimately connected to the health of individual plants, and so I wanted to set up an automated growing system as a more material visualization. The system would be hydroponic or aeroponic, with a phosphorus nutrient pump that releases more or less phosphorus depending on its simulated price. As peak phosphorus approaches, the plant's health starts to deteriorate as it manifests symptoms of phosphorus deficiency. There were a few issues here, namely that 1) it's a pretty big task to set up such a growing system, and 2) the changes in the plant's health would happen over long time scales relative to the simulation (e.g. one simulation year might run in one real minute, and the impacts on the plant's health might not be visible for a few real days).
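Both the pump rule and the time-scale problem can be sketched in a few lines. This is purely illustrative: the linear price-to-dose mapping, the function names, and the price thresholds are all assumptions, since the design never got as far as specifying them.

```python
def dose_fraction(price, cheap_price, expensive_price):
    """Fraction of the full nutrient dose to release at a simulated price.

    Assumed rule: full dose at or below cheap_price, nothing at or above
    expensive_price, and a straight line in between.
    """
    if expensive_price <= cheap_price:
        raise ValueError("expensive_price must be greater than cheap_price")
    t = (price - cheap_price) / (expensive_price - cheap_price)
    return 1.0 - min(1.0, max(0.0, t))


def simulated_years(real_seconds, real_seconds_per_sim_year=60.0):
    """Simulated years that pass during a span of real time."""
    return real_seconds / real_seconds_per_sim_year


if __name__ == "__main__":
    three_days = 3 * 24 * 60 * 60
    print(dose_fraction(price=200, cheap_price=100, expensive_price=300))
    print(simulated_years(three_days))
```

At one simulated year per real minute, three real days correspond to thousands of simulated years, which is the mismatch described above: the plant would react long after the simulated market had moved on.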

A build on this idea we considered is that we'd reserve some set amount of funds for the plant, and it would actually have to "purchase" phosphorus from the nutrient reservoir on its own.

Commodity traders vs food consumers

For a while I've wanted to make an asymmetric game which consists of two separate games that are at first glance unrelated. For example, on one side of the room is a relatively innocuous-looking life simulator game where you have to e.g. buy a house and care for your family. On the other side of the room is a stock market game where you just try to earn the highest return on your investments. What isn't apparent at first is that the actions of the player in the stock market game directly affect how difficult the life-simulator game is, for example, by triggering financial crises or affecting house prices.

We briefly considered doing something along these lines. The idea was that when we presented, we'd direct audience members to a website where they could join our phosphorus game. Some audience members would be redirected to the "commodity trader" version of the game, while others would instead be redirected to the "food consumer" version.

The commodity trader game is basically the same as the stock market game, except just for phosphorus trading.

The food consumer game is built around a "basket", like a simplified version of a consumer price index focused on products especially affected by phosphorus prices. As a player you'd have some nutritional requirements to meet or some other purchasing obligations and some weekly budget with which to buy food. We didn't really get far enough to thoroughly think through the mechanics.
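The "basket" idea can be sketched as a fixed set of quantities priced at two points in time, the way a simplified price index works. The mechanics were never worked out, so the function names below are hypothetical, and the items and prices in the usage example are placeholders rather than real data.

```python
def basket_cost(basket, prices):
    """Total cost of buying every item in the basket at the given prices."""
    return sum(quantity * prices[item] for item, quantity in basket.items())


def basket_index(basket, prices, base_prices):
    """Cost of the basket relative to a base period, where base = 100."""
    return 100 * basket_cost(basket, prices) / basket_cost(basket, base_prices)


if __name__ == "__main__":
    # Placeholder quantities and prices, for illustration only.
    basket = {"bread": 2, "milk": 1}
    base_prices = {"bread": 1.0, "milk": 2.0}
    shocked_prices = {"bread": 1.5, "milk": 2.0}
    print(basket_index(basket, shocked_prices, base_prices))
```

A player's weekly budget could then be compared against `basket_cost` each turn to decide whether the household's requirements can still be met.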

I did have a really fun time modeling the food:

Plant care Tamagotchi

Riffing off the plant-as-visualization idea, we also toyed around with the idea of some kind of plant-tamagotchi. You'd have to manage its water and phosphorus needs by doing some sort of trading or other gameplay. I can't really remember how far we got with the design.

I did enjoy making this wilting animation though:

Physics-based food thing

I honestly can't remember what the concept was for this. The most I can recall is that we discussed a system where you could rapidly click on some food or raw material objects to create derivative objects (such as beef and milk from a cow) and that somehow we'd connect that to the relative use of phosphorus in these products. For example, a cow requires a lot of feed which requires a lot of phosphorus, which results in a less efficient phosphorus-to-calorie ratio than if you had just eaten the feed grains yourself. I think I was really just excited about making something physics-based.

I'll definitely use this again for a different project.

Scripts I Have Known And Loved

07.14.2017

projects

I've been using Linux as my main driver (Ubuntu 14.04, recently and catastrophically upgraded to 16.04, with no desktop environment; I'm using bspwm as my window manager) for about two years now. It's been challenging and frustrating, but ultimately rewarding — the granular control is totally worth it.

Over these two years I've gradually accumulated a series of Bash and Python scripts to help me work quickly and smoothly. They generally operate by two principles: accessible from anywhere and usable with as few keystrokes as possible.

All the scripts are available in my dotfiles repo in the bin folder.

Screenshots and recordings

A few of the scripts are devoted to taking screenshots (shot) or screen recordings (rec). This is a basic feature in OSX (and many Linux desktop environments, afaik), but something I had to implement manually for my system. The benefit is being able to customize the functionality quite a bit. For example, the rec script will automatically convert the screen recording into an optimized gif (using another script, vid2gif).
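One common way to get an optimized gif from a recording is ffmpeg's two-pass palette approach, sketched below. This is an assumption about how a script like vid2gif might work, not a description of the actual script; the function name, frame rate, width, and file names are all placeholders. The function only builds the two command lines so they can be inspected before being run.

```python
import subprocess


def gif_commands(src, dst, palette="palette.png", fps=15, width=640):
    """Build the two ffmpeg commands for a palette-based gif conversion.

    Pass 1 generates a color palette tuned to the video; pass 2 renders the
    gif using that palette, which usually looks better than the default.
    """
    filters = f"fps={fps},scale={width}:-1:flags=lanczos"
    make_palette = [
        "ffmpeg", "-y", "-i", src,
        "-vf", f"{filters},palettegen", palette,
    ]
    render_gif = [
        "ffmpeg", "-y", "-i", src, "-i", palette,
        "-filter_complex", f"{filters}[x];[x][1:v]paletteuse", dst,
    ]
    return [make_palette, render_gif]


def convert(src, dst):
    """Run both passes; requires ffmpeg to be installed."""
    for command in gif_commands(src, dst):
        subprocess.run(command, check=True)
```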

The shot script lets me directly copy the image or the path to the screenshot immediately after it's taken, but sometimes I need to refer to an old screenshot. It's a pain to navigate to the screenshot folder and find the one I'm looking for, so I have another script, shots, which lets me browse and search through my screenshots and screengifs with dmenu (which is a menu that's basically accessible from anywhere).

Passwords and security

Entering passwords and managing sensitive information is often a really inconvenient process, but some scripts make it easier.

I use KeePassX to manage my passwords, which means when I want to enter a password, I have to open up KeePassX, unlock the database, search for the password, and then copy and paste it into the password input.

This is a lot of steps, but my keepass script does all this in far fewer keystrokes.

I open it with super+p, enter my master password, directly search for my password, and select it. Then it pastes the password into the input form automatically.

It can also create and save new passwords as needed.

I use 2FA on sites that support it, which means there's another step after entering a password - opening up the authenticator to get the auth code. I have another script, 2fa, which I open with super+a, that copies the appropriate auth code into my clipboard so I can paste it in straight away.

These two scripts make logging in and good account security way easier to manage.
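For context on what an authenticator computes, the auth codes used by most 2FA apps are time-based one-time passwords (TOTP, RFC 6238). The sketch below shows that calculation with only the standard library. It is not the 2fa script itself, and it assumes the usual defaults of SHA-1, a 30-second step, and a base32-encoded secret.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_b32, now=None, step=30, digits=6):
    """Compute a TOTP code (RFC 6238, HMAC-SHA1) from a base32 secret."""
    # Normalize the secret: drop spaces, uppercase, restore any padding.
    cleaned = secret_b32.replace(" ", "").upper()
    cleaned += "=" * (-len(cleaned) % 8)
    key = base64.b32decode(cleaned)

    # The moving factor is the number of whole time steps since the epoch.
    timestamp = time.time() if now is None else now
    counter = int(timestamp // step)

    # HOTP (RFC 4226): HMAC the counter, then dynamically truncate.
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)
```

A script along these lines would look up the secret for the chosen account, call `totp(secret)`, and hand the result to a clipboard tool.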

For local data that I want to encrypt, I have a script called crypt that lets me easily encrypt/decrypt individual files or directories with my GPG key. I use this script in another script, vault, which makes it easy to encrypt/decrypt a particular directory (~/docs/vault) of sensitive information.

Finally, I have a lock script for when I'm away from my computer that pixelates the screen contents and requires my password to unlock (this was snagged from r/unixporn).

Working with `hubble`

My main driver is a relatively low-powered chromebook (an Acer C720), so for heavy processing I have a beefy personal server ("hubble") that I access remotely. It's not publicly accessible - as in, it's not a box provided by a service like Digital Ocean but a literal computer under my desk. This introduces some challenges in reliably connecting to it from anywhere, so I have a few scripts to help out with that.

hubble can consume quite a bit of power so I don't like to leave it running when it's not in use. It's easy enough to shut off a server remotely (shutdown now) but turning it on remotely is trickier.

There's a really useful program called wakeonlan that lets you send a special packet to a network interface (specifying its MAC address) that will tell its machine to boot up. However, you still need a computer running on the same network to send that packet from.
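The "special packet" is known as a magic packet: six 0xFF bytes followed by the target's MAC address repeated sixteen times, usually sent as a UDP broadcast. Here is a minimal sketch of what a tool like wakeonlan does under the hood. The function names are made up, and the broadcast address and port 9 are common defaults rather than anything specific to this setup.

```python
import socket


def magic_packet(mac):
    """Build a Wake-on-LAN magic packet for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"not a valid MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)
    # Six bytes of 0xFF, then the MAC address sixteen times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet on the local network over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))
```

Because the packet is a local broadcast, it only reaches machines on the same network, which is why a sender on that network is needed.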

I keep a Banana Pi running at all times on that network. Its power consumption is much lower so I don't feel as bad having it run all the time. When I need to access hubble, I ssh into the Pi and then run wakeonlan to boot it up.

This Pi isn't publicly accessible either - there's no public IP I can ssh directly into. Fortunately, using the script tunnel, I can create an ssh tunnel between the Pi and my hosting server (where this website and my other personal sites are kept), which does have a public IP, such that my hosting server acts as a bridge that the Pi piggybacks off of.

Finally, sometimes I'll run a web service on hubble but want to access it through my laptop's browser. I can use portfwd to connect a local laptop port to one of hubble's ports, so that my laptop treats it as its own. It makes doing web development on hubble way easier.
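Both of these rely on standard ssh port forwarding: `-R` exposes a local port on a remote machine (the Pi-to-hosting-server tunnel), and `-L` maps a local port to a port on a remote machine (the portfwd case). The sketch below only builds the command lines. It is not the tunnel or portfwd script, and the host names and port numbers in the usage example are placeholders.

```python
def reverse_tunnel_cmd(bridge, bridge_port=2222, local_port=22):
    """ssh command, run on the hidden machine, that exposes its local_port
    as bridge_port on the publicly reachable bridge host."""
    return ["ssh", "-N", "-R", f"{bridge_port}:localhost:{local_port}", bridge]


def port_forward_cmd(host, remote_port, local_port=None):
    """ssh command that makes host's remote_port reachable on a local port."""
    if local_port is None:
        local_port = remote_port
    return ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}", host]


if __name__ == "__main__":
    # Placeholder host names and ports, for illustration only.
    print(" ".join(reverse_tunnel_cmd("user@bridge.example.com")))
    print(" ".join(port_forward_cmd("hubble", 8000)))
```

The `-N` flag tells ssh not to run a remote command, so the connection exists only to carry the forwarded port.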

Misc.

I have several other scripts that do little things here and there. Some highlights:

q: quickly searches my file system, with previews for images.
sms: lets me send an arbitrary notification over Signal, so I can, for example, run a long-running job and get texted when it's finished: ./slow_script && sms "done!" || sms "failed!". This also accepts attachments!
twitch: lets me immediately start streaming to Twitch.
caffeine: prevents the computer from falling asleep. I have it bound to super+c with an eye indicator in my bar.
phonesync: remotely sync photos from my phone to my laptop and media from my laptop to my phone (they must be on the same LAN though).
emo: emoji support on Linux is still not very good; this script lets me search for emoji by name to paste into an input. Still thinking of a better solution for this...
bkup: this isn't in my dotfiles repo, but I use it quite a bit. It lets me specify a backup system in YAML (example) that is run with bkup <backup name>.
office: unfortunately this repo is not yet public (need to clean out some sensitive info) but this is a suite of scripts that automates a lot of freelance paperwork-ish stuff that I used to do manually in InDesign or Illustrator:
- generate invoices from a YAML file
- generate contracts from a YAML file
- generate a prefilled W9
