A study conducted with GitHub Copilot by GitHub and Microsoft showcased how an AI pair programmer or AI coding assistant can help improve developers’ productivity.
What is an AI pair programmer?
It is a term used for AI-powered tools designed to augment a programmer's capabilities. Pair programming usually involves two programmers working together on the same task. One acts as the "driver" who writes the code, while the other acts as the "navigator" who reviews the code, identifies potential issues, and suggests improvements.
AI pair programmer tools, like GitHub Copilot, aim to replicate this collaboration by offering features like code generation, auto-completion, code review and analysis, and information retrieval.
Methodology of the experiment
A roughly month-long controlled experiment was conducted from May 15, 2022, to June 20, 2022. For it, 95 professionals were recruited on contract through Upwork, a freelancing platform. Once selected, the participants were randomly assigned to one of two groups: control or treatment.
Both groups were asked to build an HTTP server in JavaScript. The treatment group used GitHub Copilot for the task, while the control group did not. Apart from this condition, both groups were free to use any other resource, such as internet search or Stack Overflow, to complete the assignment.
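The study's exact randomization procedure is not described here, so the following is only a minimal sketch of one common approach: each participant is independently assigned to a group by a seeded random draw. The function name `assign_groups`, the participant IDs, and the seed are all hypothetical.

```python
import random


def assign_groups(participant_ids, seed=0):
    """Randomly assign each participant to 'treatment' or 'control'.

    Assumption: independent per-participant draws, which can produce
    groups of unequal size. The study may have used a different scheme.
    """
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    groups = {"treatment": [], "control": []}
    for pid in participant_ids:
        groups[rng.choice(["treatment", "control"])].append(pid)
    return groups


# Hypothetical IDs for 95 recruited participants.
participants = [f"dev-{i}" for i in range(95)]
groups = assign_groups(participants)
print(len(groups["treatment"]), len(groups["control"]))
```

Independent draws rarely split a group exactly in half, which is one plausible reason two arms of a randomized experiment can end up with different sizes.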
The treated group was sent the following regarding GitHub Copilot:
Introduction video: Since the group was unfamiliar with GitHub Copilot, they were sent a 1-minute introduction video about the AI code assistant.
Tool access: They were also sent an automated email with installation instructions that gave them access to the tool.
How was the experiment conducted?
Each group’s performance was measured using two metrics:
Task success: Measured as the percentage of participants in a group who successfully completed the task.
Task completion time: Measured as the time taken to complete the task from start to finish.
Standardizing the task across both groups made it possible to measure performance accurately, something that is otherwise difficult when assessing developer productivity.
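As a rough illustration of the first metric, the sketch below computes a group's success rate from a list of per-participant outcomes. The function name `success_rate` and the sample data are hypothetical and are not taken from the study.

```python
def success_rate(results):
    """Return the percentage of participants who completed the task.

    `results` holds one boolean per participant: True if the participant
    completed the task successfully, False otherwise.
    """
    if not results:
        raise ValueError("results must contain at least one participant")
    return 100 * sum(results) / len(results)


# Hypothetical group of four participants, three of whom succeeded.
hypothetical_group = [True, True, False, True]
print(success_rate(hypothetical_group))  # 75.0
```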
To ensure a controlled and fair environment, the study leveraged GitHub Classroom, a platform designed for managing coding assignments. This platform provided a crucial advantage: accurate timekeeping and completion tracking for each participant.
Here's how it worked:
Invitation and setup: Participants received a link to a specific GitHub Classroom instance with a single assignment linked to a template repository. Upon joining, they received a private copy of this repository containing the task description and a starter codebase. The creation timestamp of this personal copy served as the starting point.
Blinded development: Each participant's repository remained private to them - not accessible to other participants. The repository contained a pre-built test suite with twelve checks to ensure code correctness. Participants could see the tests but not modify them.
Submission and evaluation: When participants committed and pushed their code changes, GitHub Classroom automatically ran the test suite, reporting the number of passed tests. Participants could push as often as needed, with each push recording a timestamp.
Calculating completion time: The key metric, task completion time, was calculated as the difference between the initial repository creation timestamp and the timestamp of the first commit that passed all twelve tests. Because every push recorded how many tests passed, researchers could also track partial progress for participants who didn't fully complete the task.
Here is how one developer described working with the AI code assistant GitHub Copilot:
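The completion-time calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the researchers' actual tooling: the function name `completion_minutes`, the `(timestamp, tests_passed)` data shape, and the sample timestamps are all hypothetical.

```python
from datetime import datetime, timezone

REQUIRED_TESTS = 12  # the test suite described above has twelve checks


def completion_minutes(created_at, pushes):
    """Minutes from repository creation to the first fully passing push.

    `pushes` is a list of (pushed_at, tests_passed) tuples. Returns None
    when no push passed all tests, i.e. the task was not completed.
    """
    for pushed_at, tests_passed in sorted(pushes):  # earliest push first
        if tests_passed >= REQUIRED_TESTS:
            return (pushed_at - created_at).total_seconds() / 60
    return None


# Hypothetical participant: repository created at 09:00 UTC, three pushes.
created = datetime(2022, 5, 20, 9, 0, tzinfo=timezone.utc)
pushes = [
    (datetime(2022, 5, 20, 9, 40, tzinfo=timezone.utc), 7),    # partial progress
    (datetime(2022, 5, 20, 10, 11, tzinfo=timezone.utc), 12),  # first full pass
    (datetime(2022, 5, 20, 10, 30, tzinfo=timezone.utc), 12),  # later push, ignored
]
print(completion_minutes(created, pushes))  # 71.0
```

Only the first fully passing push counts, so later pushes do not change the recorded time.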
(With Copilot) I have to think less, and when I have to think it’s the fun stuff. It sets off a little spark that makes coding more fun and more efficient.
- Senior Software Engineer
Of the 95 professionals, 45 were in the treated group and 50 were in the control group. A total of 35 developers in each group completed the task and survey. On average, participants had 6 years of coding experience and reported coding 9 hours a day.
Once the assignments were completed, the study found the following:
Task completion time: The average completion time of the treated group was 71 minutes, while that of the control group was 161 minutes. In other words, developers using GitHub Copilot took about 56% less time to complete the task.
Success rate of task completion: The treated group's success rate was 7 percentage points higher than the control group's.
Job satisfaction: 73% of the treated group reported that the AI assistant helped them stay in the flow, while 87% felt they saved mental energy during repetitive tasks. Both effects can contribute to a developer's overall happiness and job satisfaction.
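The 56% figure follows directly from the two averages. The short sketch below reproduces the arithmetic. The helper name `percent_reduction` is hypothetical.

```python
def percent_reduction(baseline, treated):
    """Percentage by which `treated` is lower than `baseline`."""
    return (baseline - treated) / baseline * 100


control_minutes = 161  # average completion time, control group
treated_minutes = 71   # average completion time, treated group

reduction = percent_reduction(control_minutes, treated_minutes)
speedup = control_minutes / treated_minutes

print(round(reduction, 1))  # 55.9
print(round(speedup, 2))    # 2.27
```

Put another way, a 56% reduction in time means the treated group finished roughly 2.3 times as quickly as the control group.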
Takeaways
An analysis of the experiment's heterogeneous effects showed that the following groups benefited more from the AI code assistant:
Less experienced developers.
Developers who spent long hours coding per day.
Older developers, between the ages of 25 and 44.
Exit survey
After finishing the task, participants completed an exit survey. They were asked two questions. Here are the questions and results of the survey:
1. How much productivity gain or loss, as a percentage, did Copilot provide while performing the task?
To the first question, the treatment group reported on its helpfulness and estimated their time savings compared to working without Copilot. The control group was shown a 1-minute demo video, after which they estimated the potential speed gain they might have experienced with Copilot.
On average, both groups estimated a 35% productivity increase, which underestimates the measured 56% reduction in completion time.
2. How much would they pay for access to GitHub Copilot?
For the second question, the average "irrelevant price" (the monthly price above which a participant would no longer be interested in the tool) was $27.25 for the treated group and $16.91 for the control group. This suggests that the treated group, having experienced the tool first-hand, was more convinced of its usefulness and willing to pay more than the control group.
Overall, the study shows that AI pair programmers like GitHub Copilot, although still under development, can support developers. They can help developers complete tasks faster, reduce manual effort, and increase overall job satisfaction, making coding more efficient and fun!
Do you agree with the findings of this study? Let us know your thoughts.