Almost four years ago I got it in my head that it would be fun to try to build a way to automatically detect breaking news. I thought of creating a predictive algorithm based on deviations of word usage across the internet - if usage of the word "pizza" suddenly deviated from its historical mean, something was going on with pizza! Looking back, it was a really silly idea, but it got me interested in working programmatically with natural language and eventually (somehow) morphed into Argos, the news automation service I started working on a little over a year ago. Since I recently got Argos to a fairly well-functioning tech-demo stage, this seems like a good spot to reflect on what's been done so far.
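That abandoned idea is easy to state precisely: measure how many standard deviations today's count of a word sits from its historical mean (a z-score) and flag large upward jumps. Here is a minimal sketch, assuming per-day word counts have already been collected. The function names and the three-standard-deviation threshold are illustrative, and none of this is part of Argos.

```python
import math
from statistics import mean, stdev


def usage_zscore(history: list[int], today: int) -> float:
    """Standard deviations between today's count of a word and its historical mean."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least two past days
    if sigma == 0:
        # Perfectly flat history: no change scores 0, any change is unboundedly surprising.
        return 0.0 if today == mu else math.copysign(math.inf, today - mu)
    return (today - mu) / sigma


def is_spiking(history: list[int], today: int, threshold: float = 3.0) -> bool:
    # Only upward deviations count as "something is going on".
    return usage_zscore(history, today) > threshold
```

With daily counts for "pizza" of `[100, 110, 90, 105, 95]`, a day with 180 mentions sits roughly 10 standard deviations above the mean and would be flagged. A day with 104 mentions would not.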
The fundamental technical goal for Argos is to automatically apply structure to a chaotic news environment, to make news easier for computers to process and for people to understand.
When news pundits talk about the changing nature of news in the digital age, they often try to pinpoint the "atomic unit" of news. Stretching this analogy a bit far, Argos tries to break news into its "subatomic particles" and let others assemble them into whatever kind of atom they want. Argos can function as a service providing smaller pieces of news to readers, but also as a platform that other developers can build on.
The long-term vision for Argos is to contribute to what I believe is journalism's best function - to provide a simulacrum of the world beyond our individual experience. There are a lot of things standing in the way of that goal. This initial version of Argos focuses on the two biggest obstacles: information overload and complex stories that span long time periods.
At this point in development, Argos watches news sources for new articles and automatically groups them into events. It then takes these events and builds stories out of them, presented as timelines. For example, the grand jury announcing its decision on Darren Wilson would be one event. Darren Wilson's resignation would be another, and the protests that followed the grand jury decision in Ferguson and across the country would be a third. Multiple publications reported on each of these events. Much of that reporting is redundant, so by collapsing these articles into a single unit, Argos cuts down on noise and redundancy.
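How Argos actually groups articles isn't described here. The sketch below only illustrates the general idea, and it uses a swapped-in, deliberately simple technique: single-pass clustering over bag-of-words vectors. Each incoming article joins the most similar existing event if its cosine similarity to that event clears a threshold, and otherwise starts a new event. The function names and the 0.5 threshold are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    # Plain term counts; a real system would at least drop stopwords and weight by TF-IDF.
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def group_into_events(articles: list[str], threshold: float = 0.5) -> list[list[str]]:
    events: list[list[str]] = []
    centroids: list[Counter] = []  # summed term counts of each event's articles
    for article in articles:
        vec = vectorize(article)
        scores = [cosine(vec, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            events[best].append(article)
            centroids[best].update(vec)  # cosine ignores scale, so a sum works as a centroid
        else:
            events.append([article])
            centroids.append(Counter(vec))
    return events
```

Two headlines about the grand jury decision share most of their words and land in one event, while a headline about Ukraine starts its own. Single-pass clustering is cheap enough to run as articles stream in, but it is sensitive to arrival order, which is one reason real systems tend to use something more robust.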
These events would all be grouped into the same story. The ongoing protests around Eric Garner's murder would also be an event but would not necessarily be part of the same story, even though the two are related thematically.
A five-point summary is generated for each event, with each point drawn from, and cited to, that event's source articles. The timeline for a story thus works as an automatically generated brief covering everything that has happened up to the latest event. The main use case here is long-running stories like Ferguson or the Ukraine conflict, which are often hard to follow if you haven't been tracking them from the start.
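Argos's summarization method isn't specified here. For illustration only, here is a classic frequency-based extractive approach, swapped in as an assumption. It splits every source article into sentences, scores each sentence by how often its words recur across the event's coverage, and keeps the top five. It also remembers which article each sentence came from, so every point can be cited. Names like `summarize_event` and the small stopword list are hypothetical.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is", "was", "were", "at", "by", "with"}


def tokens(text: str) -> list[str]:
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize_event(articles: list[str], points: int = 5) -> list[tuple[int, str]]:
    """Return up to `points` (article_index, sentence) pairs so each point can cite its source."""
    sentences = [
        (i, s.strip())
        for i, article in enumerate(articles)
        for s in re.split(r"(?<=[.!?])\s+", article)
        if s.strip()
    ]
    # Words that recur across the event's coverage are assumed to be the important ones.
    freq = Counter(w for _, s in sentences for w in tokens(s))

    def score(sentence: str) -> float:
        words = tokens(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    chosen: list[tuple[int, str]] = []
    seen: set[str] = set()
    # Stable sort: ties keep article order, so the earliest source gets the citation.
    for i, s in sorted(sentences, key=lambda pair: score(pair[1]), reverse=True):
        if s.lower() not in seen:  # skip verbatim repeats, e.g. shared wire copy
            chosen.append((i, s))
            seen.add(s.lower())
        if len(chosen) == points:
            break
    return chosen
```

Because redundant coverage repeats the same key facts, the words behind those facts score highly. This is the same redundancy that grouping articles into events is meant to collapse, here put to use.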
Argos can also identify the people, places, organizations, and other key terms being discussed in an event, and instantly provide background information on them to inform or remind readers.
Finally, Argos calculates a "social (media) importance" score for each event to try to estimate which topics are trending. This is mainly to support a "day-in-brief" function (or "week-in-brief", etc.), i.e. the top n most "important" events of the day (assuming talked about == important). Later on, it would be great to also integrate discussions and other social signals happening around an event.
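Once each event has an importance score, the brief itself is just a top-n selection over a date window. A minimal sketch follows, assuming the score has already been computed. The `Event` type and the `brief` function are hypothetical, and how the score is calculated is left out entirely.

```python
import heapq
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    title: str
    day: date
    importance: float  # hypothetical precomputed "social (media) importance" score


def brief(events: list[Event], start: date, end: date, n: int = 10) -> list[Event]:
    """Top-n events dated within [start, end]; start == end gives a day-in-brief."""
    in_window = (e for e in events if start <= e.day <= end)
    # nlargest avoids sorting the whole window when only n items are needed.
    return heapq.nlargest(n, in_window, key=lambda e: e.importance)
```

Passing a seven-day window instead of a single day turns the same function into a week-in-brief.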
I've been testing Argos mainly with world and political news (under the assumption that those would be easier to work with for technical reasons). So far that has been working well, so I recently started trying some different news domains, though it's too early to say how that's working out.
The API is not yet public, and I'm not sure when it will be officially released. At the moment I can't devote a whole lot of time to the project (if you're interested in getting involved, get in touch).
Argos does have an unreleased Android app (and an older version for iOS), which at this point is mainly a tech demo for small-scale testing. Frankly, I don't know whether Argos will work best as a consumer product or as an intermediary technology powering some other consumer service.
(Later I'll write a post detailing the development of Argos up until now.)