Affiliations

The affiliations data set is one of the key parts of the Bitergia Analytics Platform. This page explains how to modify the information about specific contributors or organizations.

The huge variety of tools used by software development communities makes any analysis really difficult. It is not enough to measure the activity in the code, issues, Q&A forums, mailing lists, etc. Without a way to connect the activity of contributors across the different tools (data sources, as we call them), it is not possible to offer high-quality metrics. As an example, the three metrics below are only possible with the affiliations data set:

  • Total number of contributors of a community. If your team members are using GitHub and Jira, are you sure you are not counting them twice?
  • Impact of an organization/company in the project. To get this metric you need the link between community members and organizations.
  • Retention rate of your contributors. How can you tell that someone is new to the community if you do not have a way to identify people using different accounts?

Affiliations Editor#

Identities are collected and stored by the Bitergia Analytics Platform. To improve the quality of the data set, use the affiliations editor (HatStall). The typical actions our customers perform are described below.

The most common issue is that some organizations are under-represented. Follow the steps below to fix this:

  • Go to the Affiliations dashboard.
  • Select the organization Unknown in either of the two pie charts. Organizations are shown in the inner circle.
  • Identify the domains that should be associated with an organization.
  • Open the identities editor. It is available at https://[INSTANCE].biterg.io/identities.
  • Go to the Organizations section at the top and search for the organization you are missing.
  • If you do not find it, add that organization to the database.
  • If you find it, extend the information available for that organization.
  • Once the domain is added to the organization, a process refreshes the metrics you see on the dashboards. This takes a couple of hours.

Adding a new organization#

An organization can be added via HatStall, which is available at https://[INSTANCE].biterg.io/identities.

Go to the Organizations page and click the Add button in the top right corner to add an organization.

To enable automatic affiliation, add the base domain of the organization.

New organization

Removing an organization#

An organization can be removed via HatStall. Note that all the enrollments of contributors affiliated with the deleted organization will be lost. Go to the Organizations page and use the Search box to look for the organization. Click on the Edit button (on the right side) and then on the Remove button.

Remove organization

Adding an e-mail domain to an organization#

The platform can map email domains to organizations. Once a domain is mapped, every identity whose email address uses that domain is associated with the corresponding organization.

Go to the Organizations page and use the Search box to look for an organization. For each organization listed, the Edit button (on the right side) lets you add or remove the organization's domains, or delete the organization completely.

Manage organizations
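The domain mapping described in this section can be pictured with a minimal Python sketch. It is only an illustration of the idea, not the platform's implementation: the table DOMAIN_TO_ORG, the function organization_for_email and the example domains are hypothetical, and exact, case-insensitive matching on the domain is an assumption.

```python
from typing import Dict, Optional

# Hypothetical domain-to-organization table; in the platform this
# information is managed through the Organizations page.
DOMAIN_TO_ORG: Dict[str, str] = {
    "example.com": "Example Inc.",
    "example.org": "Example Foundation",
}


def organization_for_email(
    email: str, domains: Dict[str, str] = DOMAIN_TO_ORG
) -> Optional[str]:
    """Return the organization mapped to the email's domain, or None."""
    _, separator, domain = email.rpartition("@")
    if not separator:
        # Not an email address, so there is no domain to look up.
        return None
    # Assumption: domains are compared case-insensitively and exactly.
    return domains.get(domain.strip().lower())
```

An identity whose domain is not in the table gets no organization from this lookup, which corresponds to the Unknown slice mentioned earlier on this page.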

Enrolling a person to an organization#

Go to the Profile page of the user and click on the Add button. Then set the start and end dates and link the profile to an organization. More details here

Enroll identity

Un-enrolling a person from an organization#

Go to the Profile page of the user, and click on the un-enroll button.

Unenroll

Marking bots#

An identity can be marked as a bot. This option is available in the Profile section in HatStall. It helps to filter out automated activity in the dashboards while keeping that information in the database.

Search for the identity that you want to mark as a bot, click on Edit, check the Bot? box and click on Submit.

Mark as bot

Preventing removed affiliations from being restored#

If you changed the affiliation of some contributors and later discover that your changes were reverted, try the following. Instead of removing an organization that was automatically added to the contributor, change its timeframe. For instance, you can set the enrollment from 1900-01-01 to 1900-01-02, so that no data falls within that enrollment.

How dates work#

Dates are set by default to the beginning of the day, i.e. 00:00:00. Thus, when a contributor moves from Organization A to Organization B on 2019-10-01, the periods should be set as:

  • Organization A: 1900-01-01 to 2019-10-01
  • Organization B: 2019-10-01 to 2100-01-01

This works because the end date for Organization A is set under the hood to 2019-10-01 00:00:00 (not inclusive) and the start date for Organization B is set to 2019-10-01 00:00:00 (inclusive).

So 2019-10-01 is the first day the profile is enrolled in Organization B, and 2019-09-30 is the last day the profile is enrolled in Organization A.

This can be summarized as the following condition, where date is the day being checked to decide the enrollment: start_date <= date < end_date.
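The condition can be checked with a short Python sketch. This is an illustration under stated assumptions rather than the platform's code: the Enrollment tuple and the functions is_enrolled and organization_on are hypothetical names, and only the half-open interval rule comes from the text above.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Enrollment(NamedTuple):
    organization: str
    start: date  # inclusive
    end: date    # exclusive


def is_enrolled(enrollment: Enrollment, day: date) -> bool:
    """Apply the rule start_date <= date < end_date."""
    return enrollment.start <= day < enrollment.end


def organization_on(enrollments: List[Enrollment], day: date) -> Optional[str]:
    """Return the first organization whose period contains the day."""
    for enrollment in enrollments:
        if is_enrolled(enrollment, day):
            return enrollment.organization
    return None


# The periods from the example above.
enrollments = [
    Enrollment("Organization A", date(1900, 1, 1), date(2019, 10, 1)),
    Enrollment("Organization B", date(2019, 10, 1), date(2100, 1, 1)),
]
```

With these periods, organization_on(enrollments, date(2019, 9, 30)) gives "Organization A" and organization_on(enrollments, date(2019, 10, 1)) gives "Organization B", because the shared boundary day belongs only to the enrollment that starts on it.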

Matching of identities#

Our support team will study the data sources you are tracking and choose the best algorithm to identify the different accounts used by your community members. The goal is to avoid duplicated identities without unifying identities that belong to different people.

Types of matching#

There are four types of matching that allow the system to unify identities: email, email-name, github and username.

  • email: same email address.
  • email-name: same email address or same full name.
  • github: same username extracted from any of the GitHub APIs.
  • username: same username in any source.
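The four criteria can be compared side by side in a small Python sketch. It is a simplified illustration, not the algorithm the platform runs: the Identity tuple and the matches function are hypothetical, values are compared case-insensitively as an assumption, and a GitHub identity is recognized here simply by a source name starting with "github".

```python
from typing import NamedTuple, Optional


class Identity(NamedTuple):
    source: str
    email: Optional[str] = None
    name: Optional[str] = None
    username: Optional[str] = None


def _same(a: Optional[str], b: Optional[str]) -> bool:
    """True when both values are present and equal, ignoring case."""
    return bool(a) and bool(b) and a.strip().lower() == b.strip().lower()


def matches(a: Identity, b: Identity, matcher: str) -> bool:
    """Decide whether two identities would be unified under a matcher."""
    if matcher == "email":
        return _same(a.email, b.email)
    if matcher == "email-name":
        return _same(a.email, b.email) or _same(a.name, b.name)
    if matcher == "github":
        # Assumption: both identities must come from a GitHub source.
        both_github = a.source.startswith("github") and b.source.startswith("github")
        return both_github and _same(a.username, b.username)
    if matcher == "username":
        return _same(a.username, b.username)
    raise ValueError(f"unknown matcher: {matcher}")


# Hypothetical identities of one person in three data sources.
git_id = Identity("git", email="jane@example.com", name="Jane Doe")
github_id = Identity("github", name="Jane Doe", username="jdoe")
jira_id = Identity("jira", email="JANE@example.com", username="jdoe")
```

In this example, email unifies the git and Jira identities, email-name also links the GitHub identity through the full name, and username links the GitHub and Jira identities. The looser matchers find more links but are also more likely to merge two different people who share a name or a username.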

Troubleshooting#

Contributor activity is labeled with a single company when the contributor is enrolled in several companies#

When two enrollments overlap in time, the system uses only one of them to update the enriched information on the dashboard, so only that one is visible. If you detect identities with overlapping dates, edit them so that the profile is not enrolled in more than one company at the same time.
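Overlapping enrollments can be spotted with a check like the following Python sketch. It assumes the half-open periods described in "How dates work" (start inclusive, end exclusive); the Enrollment tuple and both function names are hypothetical and not part of HatStall.

```python
from datetime import date
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Enrollment(NamedTuple):
    organization: str
    start: date  # inclusive
    end: date    # exclusive


def overlaps(a: Enrollment, b: Enrollment) -> bool:
    """Two half-open periods overlap when each starts before the other ends."""
    return a.start < b.end and b.start < a.end


def overlapping_pairs(
    enrollments: List[Enrollment],
) -> List[Tuple[Enrollment, Enrollment]]:
    """Return every pair of enrollments of one profile that overlap in time."""
    return [(a, b) for a, b in combinations(enrollments, 2) if overlaps(a, b)]
```

Two enrollments that only share a boundary day, such as one ending on 2019-10-01 and another starting on 2019-10-01, are not reported, since the end date is not inclusive.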

Changes are not visible in the metrics dashboard#

Changes made to the identities database need to be synchronized with the metrics data set before they appear on the dashboards.

For each data source (e.g., git, mboxes) there are three main phases: collection, enrichment and identities refresh. The last one synchronizes the data stored in Elasticsearch with the data available in HatStall. The time needed to complete the three phases depends on the amount of data in the data source, but it generally takes less than 3 hours.