GitHub is where most of today’s open source software development happens, but social media platforms like Twitter also play an important role in open source by facilitating communication and coordination among developers.
This week’s paper describes an exploratory study that cross-links users on Twitter and GitHub to see how they use Twitter to communicate about open source GitHub projects.
Linking users on Twitter and GitHub
The GHTorrent project mirrors public GitHub metadata. The researchers extracted user profile names from one of the project’s data dumps and then tried to link those profile names to Twitter accounts.
There are many ways to make such cross-links. In this case the researchers used two methods that are simple to implement and fairly precise, at the cost of some recall:
- Explicit links to Twitter accounts from GitHub user profiles
- Links to Twitter accounts from personal websites that are linked from GitHub user profiles.
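The two linking methods above can be sketched roughly as follows. This is a minimal illustration, not the researchers' actual implementation: the regular expression, the list of reserved paths, the function names, and the rule of accepting a source only when it links to exactly one handle are all assumptions made for the example.

```python
import re

# Assumed pattern: twitter.com/<handle>, where a handle is 1-15 word characters.
TWITTER_URL = re.compile(
    r"twitter\.com/(?:#!/)?@?([A-Za-z0-9_]{1,15})(?![A-Za-z0-9_])",
    re.IGNORECASE,
)

# Paths on twitter.com that are not user accounts (illustrative, not exhaustive).
RESERVED = {"share", "intent", "home", "search", "hashtag", "i"}


def find_twitter_handles(text):
    """Return the distinct Twitter handles linked in `text`, lowercased, in order."""
    handles = []
    for match in TWITTER_URL.finditer(text):
        handle = match.group(1).lower()
        if handle not in RESERVED and handle not in handles:
            handles.append(handle)
    return handles


def link_account(profile_text, website_html=""):
    """Try the GitHub profile first, then the linked personal website.

    A source is only trusted when it points to exactly one handle,
    which favours precision over recall.
    """
    for source in (profile_text, website_html):
        handles = find_twitter_handles(source)
        if len(handles) == 1:
            return handles[0]
    return None
```

Checking the profile before the website mirrors the order of the two methods in the list. Returning `None` for ambiguous pages is one way to keep the links precise while giving up a little recall.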
The resulting dataset contains 70,427 GitHub-Twitter user account pairs and is freely available on Zenodo.
An exploratory analysis
First, the 3,200 most recent tweets were retrieved for each of the 70,427 users. The researchers then discarded any tweets that:
- were published before 2018 or after July 2019;
- did not contain the substring
- included references to multiple repositories.
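A filter along these lines could implement the three criteria. It is a sketch under several assumptions: the required substring is not given in the text above, so it is passed in as a parameter; a repository reference is assumed to look like `github.com/<owner>/<repo>`; and the tweet text is assumed to contain expanded rather than shortened URLs. The names `keep_tweet` and `referenced_repos` are hypothetical.

```python
import re
from datetime import date

# Assumed shape of a repository reference: github.com/<owner>/<repo>.
REPO_URL = re.compile(r"github\.com/([\w.-]+)/([\w.-]+)", re.IGNORECASE)


def referenced_repos(text):
    """Return the set of (owner, repo) pairs referenced in a tweet, lowercased."""
    return {
        (owner.lower(), repo.lower().rstrip("."))
        for owner, repo in REPO_URL.findall(text)
    }


def keep_tweet(text, published, required_substring,
               start=date(2018, 1, 1), end=date(2019, 7, 31)):
    """Return True if the tweet survives all three discard rules."""
    # Rule 1: discard tweets published before 2018 or after July 2019.
    if not (start <= published <= end):
        return False
    # Rule 2: discard tweets that lack the required substring.
    if required_substring not in text:
        return False
    # Rule 3: discard tweets that reference more than one repository.
    return len(referenced_repos(text)) <= 1
```

Collecting references into a set means that a tweet linking to the same repository twice, for example to its homepage and to one of its issues, still counts as referencing a single repository.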
Then, users were assigned to one of five mutually exclusive groups based on metadata from the GHTorrent data dump and the GitHub API:
- Owner: an individual who owns a repository. Organisational GitHub accounts are excluded from the analysis.
- Collaborator: someone who has write access to a repository, i.e. can close others’ issues or pull requests.
- Contributor: someone who contributed to a repository by creating a pull request, commit, or issue.
- Follower: follows the repository’s owner or one of its collaborators or contributors.
- Other: none of the above.
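Because the groups are mutually exclusive, a user who matches several definitions has to end up in exactly one of them. The sketch below assumes that the order of the list doubles as a precedence order (Owner first, Other last); the text does not state this explicitly. The `Repo` class and the `following` mapping are hypothetical stand-ins for the metadata from GHTorrent and the GitHub API.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Hypothetical container for the repository metadata used below."""
    owner: str
    collaborators: set = field(default_factory=set)
    contributors: set = field(default_factory=set)


def assign_group(user, repo, following):
    """Assign `user` to exactly one group for `repo`.

    `following` maps a user name to the set of user names they follow.
    Checks run from the most to the least involved role, so the first
    match wins and the groups stay mutually exclusive.
    """
    if user == repo.owner:
        return "Owner"
    if user in repo.collaborators:
        return "Collaborator"
    if user in repo.contributors:
        return "Contributor"
    stakeholders = {repo.owner} | repo.collaborators | repo.contributors
    if following.get(user, set()) & stakeholders:
        return "Follower"
    return "Other"
```

For example, a collaborator who also opened an issue would match both the Collaborator and Contributor definitions, but under this ordering is counted only as a Collaborator.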
Finally, the researchers sampled 200 tweets from each of these five groups and qualitatively coded the motivation and purpose of each tweet. Some tweets turned out to be written in a language other than English. Discarding these left 786 valid tweets out of the 1,000 sampled.
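Drawing the same number of tweets from every group is a form of stratified sampling. A minimal sketch, assuming the tweets are available as `(group, tweet)` pairs and that sampling is uniform without replacement (the text does not say how the sample was drawn); `sample_per_group` and its parameters are hypothetical names:

```python
import random
from collections import defaultdict


def sample_per_group(tweets, per_group=200, seed=0):
    """Sample up to `per_group` tweets from each group.

    `tweets` is an iterable of (group, tweet) pairs. A fixed seed keeps
    the sample reproducible.
    """
    by_group = defaultdict(list)
    for group, tweet in tweets:
        by_group[group].append(tweet)

    rng = random.Random(seed)
    return {
        group: rng.sample(items, min(per_group, len(items)))
        for group, items in by_group.items()
    }
```

Sampling per group rather than from the pooled tweets ensures that small groups, such as owners, are represented as well as large ones.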
Why and how developers tweet about repositories
Tweets that mention GitHub repositories are generally written for one of six reasons:
- Question (3%): This can be about technical details or an open issue. Most tweets link to a GitHub issue.
- Answer (12%): Responses to questions by other Twitter users. Tweets often link to the repository homepage, a file or an issue.
- Call for action (4%): The author of the tweet asks the community to do something, like starring a repository or helping to solve an issue. Tweets typically link to an issue or the repository homepage.
- Repository advertisement (47%): Tweets that advertise the repository by linking to its homepage.
- Work discussion (16%): Tweets about how to use a repository. Tweets can link to pretty much anything from a repository.
- Information sharing (18%): These are tweets that provide information about a repository or a specific part of it, without explicitly advocating for its use.
As the percentages show, advertising is by far the most common purpose: almost half of the tweets that link to a GitHub repository (47%) are intended to advertise it. Unsurprisingly, most of these tweets are written by regular users of the repository and by repository owners themselves.
Owners write relatively few answers, calls for action, work discussions, and information-sharing tweets. These types of tweets are more often written by collaborators and (to a lesser extent) contributors. This suggests that collaborators play an important role in the project’s usability and user satisfaction.
The authors of this paper have published a dataset with more than 70,000 GitHub-Twitter user account pairs.
An exploratory study suggests that different types of repository stakeholders write different types of tweets.