Document Type

Dissertation

Rights

Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence

Disciplines

Computer Sciences

Publication Details

A dissertation submitted in partial fulfilment of the requirements of Technological University Dublin for the degree of M.Sc. in Computer Science (Data Science), 2021.

Abstract

Stack Overflow is the world’s largest community of software developers, where users ask and answer questions on tagged software development topics. The set of questions a user answers is representative of their knowledge base, or “wheelhouse”. It is proposed that clustering users by their wheelhouse yields communities of software developers with similar skill sets. These communities represent the different roles within software development and could serve as a basis for defining roles at any point in time in an ever-evolving software development landscape. A network graph of site users was created, in which two users are linked if they answered questions on the same topic. Eight distinct communities were identified using the Louvain method. The modularity of this partition was 0.46, indicating community structure that is unlikely to occur at random. The partition was validated against the results of previous research that used data from the same time period. Using the top 5 tags extracted from each identified community, the F1-score between the communities and the external dataset was found to be 0.75. A statistical test at the 95% confidence level showed that the identified communities were not identical to those of the previous research. Nonetheless, the two sets of results are strongly similar. It is therefore suggested that Stack Overflow data can be used to identify and define roles within software development. Applying the method to 2021 data identified a previously unknown community of experts in R, C and Rust. The method could be applied directly to any of the 177 Stack Exchange sites and could form the basis of job roles for a wide range of industries.
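The pipeline described in the abstract (link users who answered on the same topic, partition the graph, score the partition by modularity, compare top tags with an external reference) can be illustrated with a minimal sketch. This is not the dissertation's code: the toy `answers` data, the function names, the unweighted edges, and the set-overlap form of the F1-score are all assumptions made for illustration. The Louvain search itself is not implemented here; the partition is supplied by hand, and in practice a library routine such as `networkx.community.louvain_communities` could produce it.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: user -> tags of the questions they answered.
answers = {
    "u1": {"python", "pandas"},
    "u2": {"python", "numpy"},
    "u3": {"pandas", "numpy", "sql"},
    "u4": {"javascript", "css", "sql"},
    "u5": {"javascript", "html"},
    "u6": {"css", "html"},
}


def build_edges(answers):
    """Link two users if they share at least one tag (unweighted; an assumption)."""
    edges = set()
    for a, b in combinations(sorted(answers), 2):
        if answers[a] & answers[b]:
            edges.add((a, b))
    return edges


def modularity(edges, communities):
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2] for an unweighted graph.

    L_c is the number of edges inside community c, d_c the total degree of its
    nodes, and m the total number of edges.
    """
    m = len(edges)
    community_of = {u: i for i, c in enumerate(communities) for u in c}
    internal = defaultdict(int)
    degree = defaultdict(int)
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree[ca] += 1
        degree[cb] += 1
        if ca == cb:
            internal[ca] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def f1_score(predicted_tags, reference_tags):
    """F1 (harmonic mean of precision and recall) between two tag sets."""
    overlap = len(predicted_tags & reference_tags)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted_tags)
    recall = overlap / len(reference_tags)
    return 2 * precision * recall / (precision + recall)


edges = build_edges(answers)
# Hand-picked partition standing in for the output of a Louvain run.
partition = [{"u1", "u2", "u3"}, {"u4", "u5", "u6"}]
q = modularity(edges, partition)

# Hypothetical top-5 tags of one community versus a hypothetical reference role.
f1 = f1_score(
    {"python", "pandas", "numpy", "r", "sql"},
    {"python", "pandas", "numpy", "scikit-learn"},
)
```

On this toy graph (two triangles joined by one edge) the hand-picked partition has a positive modularity, while placing all users in a single community gives exactly zero, which is the sense in which a positive score signals structure beyond what random linking would produce.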

DOI

https://doi.org/10.21427/RZNJ-0036

