Since my master’s at MIT, I have participated in a handful of projects whose goal was not to publish a paper but to deliver a product. They were not ordinary products: all of them centered on civic goods such as government transparency, data literacy, and civic engagement. They gave me the opportunity to put my eight years of experience as a professional software developer in Brazil to good use, while also informing my reflections as a communication scholar.

Machine Learning for Local Newsrooms

Large national news outlets usually have in-house teams of data scientists producing insights that optimize audience engagement and drive reader revenue. Local newsrooms, in contrast, rarely have the resources to tap into the insights that machine learning models can provide.

At the Local News Lab, I worked on open-source machine learning solutions that local newsrooms can use to improve article recommendations, increase reader retention, and grow subscriptions.

Methods: Statistical learning, collaborative filtering, semantic similarity.

Tools: Python, AWS, PyTorch.

Promise Tracker – Data Collection for Civic Action

At the MIT Center for Civic Media, I worked on the creation of Promise Tracker, a tool to promote citizen participation and government accountability. The project was conceived by my master’s thesis advisor, Ethan Zuckerman. As its name suggests, Promise Tracker helps communities record commitments made by public officials and use a mobile app to track the progress and completion of those commitments.

I led workshops with community leaders and nonprofit organizations in Brazil for concept validation and usability testing, and I later spoke about that experience at TEDxReset in Istanbul.

To understand the theoretical underpinnings of the project, take a look at the paper I wrote and presented at the 2018 annual conference of the International Communication Association. It traces the origins of the concept of monitorial citizenship and explains how that concept inspired Promise Tracker.

Brazilians make up the lion’s share of Promise Tracker’s user base, thanks to my role in the project’s early phases and to project lead Emilie Reiser’s love for my home country. For that reason, the MIT Center for Civic Media decided to transfer ownership of Promise Tracker to a local partner: the CoLab research group, led by Professor Gisele Craveiro at the University of São Paulo (USP). The project is thriving, and its website is regularly updated with Promise Tracker’s latest news and awards.

Methods: Codesign workshops, cognitive walk-throughs, group interviews.

Source code (GitHub): All repositories associated with the project.

open.contractors – Making Contracting Data Available to Reporters and the Public

My project partner, Allison McCartney, and I decided to build open.contractors after realizing how inconsistent and messy federal spending data was. At the time, usaspending.gov published these public records as a single, mammoth CSV file.

I was responsible for cleaning the data, making it consistent, and building a relational database from it. Then my friend Allison built a web dashboard where users could slice and dice the data. We decided to focus on the Department of Defense, which receives the largest share of the discretionary federal budget.

We were one of the first recipients of the Magic Grants offered by the Brown Institute for Media Innovation at Columbia and Stanford universities. Thanks to that grant, we had a working prototype by the end of 2017.

In early 2018, we realized that a significant share of our users, mostly data journalists, knew enough SQL to query the data directly. For that reason, and with funding from the Tow Center for Digital Journalism, I made the data available on AWS Athena and wrote a road map explaining how to access it.

In late 2018, however, usaspending.gov released a new version of its web platform with considerable improvements. Chief among them was a PostgreSQL data dump with tables and relationships analogous to the ones I had manually recreated from the original CSV file. The data had also been sanitized and was far more consistent than ever before. Those changes were certainly welcome from a civic perspective, but they rendered open.contractors largely superfluous.

Tools: PostgreSQL, Django/Python, Elasticsearch, Linux scripts.

Source code (GitHub): Cleaning the data, making it consistent, and importing it into PostgreSQL and Athena.

Workbench – Data Platform for Journalists

I was one of the early collaborators on this project, an initiative of Columbia Journalism School.

In 2016, the school sent me to the annual conference of the National Institute for Computer-Assisted Reporting (NICAR), where I interviewed fourteen data journalists from different mainstream media outlets. Based on those conversations, I wrote a memo to Steve Coll, then dean of the Journalism School, about the unmet needs of data journalists. That memo helped inform the school’s decision to support the creation of a new platform that would not only address the needs it outlined but also serve as a training ground for new data journalists.

The project was entrusted to Professor Jonathan Stray, whose commitment to journalism is rivaled only by his passion for technology. He invited me to work with him, and Pierre Conti, an accomplished designer, also joined the initial team. I stayed with them for a year, working as a software developer and helping build the product based on feedback from early test users.

I left a few months before the official launch in 2018 to focus on my dissertation research; by then, the team had naturally grown. The platform elicited a warm response from the data journalism community. Unfortunately, like many projects in this space, it could not generate enough revenue through its freemium model to become sustainable. As a result, it was discontinued in August 2018.

Methods: In-depth interviews with data journalists.

Tools: Django/Python, React/JavaScript, PostgreSQL.

Source code (GitHub): Project’s repository.