
How to cultivate trust with AI coding assistants


Does your team use AI coding assistants? Have you been urged to increase usage rates? Have you noticed that team members often don’t use AI coding assistants to write code? Besides conventional knowledge sharing and use-case sharing, are there any other effective methods to boost usage?

 

Since my team enabled GitHub Copilot three months ago, these questions have been bothering me — this blog post is my attempt at finding answers.

 

Why increase the usage rate of AI coding assistants?

 

Before discussing “how”, we must first clarify “why.”

 

Personally, I believe it’s based on a core assumption: using AI coding assistants enhances development efficiency, and the more they are used, the greater the efficiency improvement. Let’s temporarily accept this assumption as valid and verify it through practice.

 

So, can we mandate that team members use AI coding assistants in all their work? Given the unproven assumption, such a mandate carries significant risk: what if it doesn’t improve efficiency? What if it’s not true that “the more it’s used, the higher the efficiency”?

 

As discussed in How to handle unknown problems?, the key to handling such situations lies in reducing potential losses, with the best approach being granting developers autonomy — let them decide when and in what scenarios to use AI coding assistants.

 

In cases where developers have the freedom to choose whether or not to use AI coding assistants, will usage naturally increase? From our team’s experience using GitHub Copilot for three months, developers tend to use it frequently in the following scenarios:

  • Code comprehension.

  • Solution analysis.

  • Function-level code refactoring.

  • Function-level algorithm implementation.

  • Unit test generation.

 

They rarely use it in these scenarios:

  • Line-level code modifications.

  • Modifications that can be otherwise achieved with IDE shortcuts.

  • Bug fixes.

  • Complex task implementations (especially those spanning multiple files or modules).

 

From an inline code suggestion perspective, the usage rate might already be 100%, as many features of AI coding assistants are enabled by default and used by every developer while coding. However, although line-level code suggestions are stronger than those of conventional plugins, they still offer limited help to developers: they cannot reduce cognitive load and thus cannot significantly enhance development efficiency.

 

To further enhance development efficiency, the key lies in increasing the usage rate in complex scenarios. So, how do we achieve this goal when developers have autonomy? We will discuss this in the following sections.

 

A usage rate model for AI coding assistants

 

An AI coding assistant can be understood as a kind of automation tool which is capable of completing tasks assigned by developers. The field of human-computer interaction has numerous studies on automation tools; I found the model in Trust, self-confidence and operators’ adaptation to automation particularly suitable for explaining the usage rate of AI coding assistants:

 

Here, we define the Usage Rate as the percentage of time developers use AI coding assistants out of their total development time. From the model, we see that four factors affect usage rates:

  • Trust: Developers’ level of trust in AI coding assistants, which can be rated from 1-10 based on subjective perception.

  • Self-confidence: Developers’ confidence in their own abilities, which can also be rated from 1-10.

  • Bias (b): Developers’ negative predisposition towards AI coding assistants, mainly shaped by past experience and environment. A higher b value means lower willingness to use AI coding assistants.

  • Shape parameter (s): The strength of developers’ inclination to maintain existing work habits. Larger s values indicate greater resistance to changing current development habits.
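To make the model concrete, it can be sketched in code. The logistic form below, and the exact placement of b and s in it, are my assumptions for illustration rather than the paper’s exact equation:

```python
import math

def usage_rate(trust: float, self_confidence: float, b: float, s: float) -> float:
    # Hypothetical logistic sketch: usage rises with the gap between
    # trust in the assistant and self-confidence, shifted by bias b
    # and sharpened by shape parameter s.
    return 1.0 / (1.0 + math.exp(-s * (trust - self_confidence - b)))

# A task where the developer trusts the assistant more than themselves
high = usage_rate(trust=8, self_confidence=5, b=1, s=0.9)
# A task where the developer is highly self-confident
low = usage_rate(trust=4, self_confidence=8, b=1, s=0.9)
print(high > low)  # the first scenario yields a higher usage rate
```

Whatever the exact functional form, the qualitative behavior is what matters: usage grows when trust outweighs self-confidence, shrinks with bias, and changes more abruptly as s increases.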

 

It’s important to note that an AI coding assistant is not a single-function automation tool like a shortcut key, but a general automation tool capable of handling any programming task. Therefore, for the same AI coding assistant, the model parameters (trust, self-confidence, b, s) vary across task types.

 

For example:

  • Although developers trust AI coding assistants to generate code blocks efficiently, this does not mean they equally trust them to fix online bugs effectively.

  • While developers are confident in implementing new features efficiently within existing structures, it does not mean they are equally confident in refactoring architectures efficiently.

  • Even though developers are eager to try various uses of AI coding assistants in personal projects, it does not mean they are equally willing to take time and risk in commercial projects to explore various uses of AI coding assistants.

  • With increased trust and reduced bias, the usage rate increases differently for code block generation compared to simple refactorings (ones that can be done with shortcuts).

 

How can we increase the usage rate of AI coding assistants?

 

Now let’s examine what measures we can take according to the model to increase usage rates.

 

Eliminate bias (b)

 

Each developer’s bias towards AI coding assistants varies. Some are very optimistic and willing to try all kinds of tasks, and even failures don’t change their attitude towards AI coding assistants. Others are very cautious, only willing to try after AI coding assistants prove effective on specific types of tasks, and quick to give up when they encounter failures.

 

To eliminate bias, we first need to strengthen the motivation for using AI coding assistants. For example, we can incorporate the ability to use AI coding assistants into developers’ capability evaluations, encouraging them to try different methods in daily work, share success stories, summarize best practices, solve team pain points and develop these capabilities in others.

 

Next, we need to recognize the costs of using AI coding assistants: developers need time and effort to write prompts, review code, correct errors and try different approaches. These costs might not always yield returns; for example, multiple attempts may reveal that the AI coding assistant cannot complete a task, leaving the developer to finish it manually. If we do not acknowledge such costs, developers bear them alone, making them hesitant and increasingly cautious about using AI coding assistants.

 

Finally, we need to enhance internal team information sharing. Unlike IDEs, AI coding assistants lack fixed usage methods, so each person may discover methods with high success or failure rates. Only by strengthening information sharing, continuously collecting successful or failed usage methods, summarizing effective practices and promoting them widely can we maximize the reduction of biases.

 

Provide more information to calibrate subjective perception

 

From the model, we see that developers’ trust and self-confidence are major factors affecting usage rates and these two factors are subjective perceptions. We need to provide sufficient information to showcase the development efficiency of AI coding assistants and their own development efficiency to calibrate these subjective perceptions.

 

For instance, we can identify stories and code commits using AI coding assistants through tags, then analyze metrics of such stories and code commits separately to gauge the development efficiency of AI coding assistants.
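As one concrete form of tagging, the team could adopt a commit-message marker; the `[ai-assisted]` trailer below is a hypothetical team convention, not a GitHub Copilot feature, and the analysis is a minimal sketch:

```python
def ai_assisted_share(commit_messages: list[str], tag: str = "[ai-assisted]") -> float:
    # Fraction of commits carrying the (hypothetical) AI-assistance tag.
    if not commit_messages:
        return 0.0
    tagged = sum(1 for msg in commit_messages if tag.lower() in msg.lower())
    return tagged / len(commit_messages)

log = [
    "feat: add invoice export [ai-assisted]",
    "fix: correct rounding in totals",
    "refactor: extract pricing service [ai-assisted]",
]
print(ai_assisted_share(log))  # 2 of 3 commits are tagged
```

Comparing cycle time or defect rates between tagged and untagged work then gives the team evidence, rather than impressions, about where the assistant actually helps.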

 

Enhance trust level

 

Although trust often refers to interpersonal trust, in human-computer interaction, it has been extended to include trust in automation tools.

 

So, how to enhance trust level? We’ll discuss this in detail in the next section.

 

A trust model for AI coding assistants

 

What is trust?

 

Here, I use the definition of trust from the paper Trust in Automation: Designing for Appropriate Reliance: trust is the attitude that an agent (like an AI coding assistant) will help achieve an individual’s goals in a situation characterized by uncertainty and vulnerability.

 

Trust model

 

Regarding trust, various models describe its components. Some describe it as competence, integrity, consistency, loyalty and openness. Others describe it as credibility, reliability, intimacy and self-orientation. Still others describe it as ability, benevolence and integrity.

 

Here, I adopt the model from Trust in Automation: Designing for Appropriate Reliance, which describes the components of trust as performance, process and purpose.

Trust = Performance + Process + Purpose

 

Where:

 

  • Performance: Current and historical performance of the AI coding assistant, including characteristics like reliability, predictability, ability, etc. It primarily manifests in specific tasks and usage scenarios. Examples:

    • Whether it is familiar with the developer’s domain.

    • Whether it understands the developer’s task context.

    • Whether it can comprehend assigned tasks.

    • Whether it can complete tasks within expected time frames.

    • Whether it can ensure task completion quality.

    • Whether it can consistently complete tasks over multiple uses.

       

  • Process: How the AI coding assistant completes tasks, including characteristics like dependability, openness, consistency, understandability, etc. It primarily manifests in behavioral patterns. Examples:

    • Whether it asks good questions about the developer’s tasks.

    • Whether it provides detailed plans before implementation.

    • Whether actual outcomes match described plans.

    • Whether it follows the developer’s best practices to complete tasks.

    • Whether it adheres to developer instructions.

    • Whether it respects developer feedback.

    • Whether it allows developers to interrupt at any time.

    • Whether it has a well-designed permission control mechanism.

    • Whether it allows easy exit and environment restoration.

       

  • Purpose: Consistency between the design intent of the AI coding assistant and developer goals. Examples:

    • Whether it has hallucinations.

    • Whether it can guarantee data security.

    • Whether it ensures compliance.

    • Whether it engages in malicious operations.

    • Whether it intentionally deceives.

    • Whether it respects developer goals.
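One way to make the three components tangible is to rate each dimension and combine them into a single score. The equal weighting below is my own simplification for illustration, not something prescribed by the paper:

```python
def trust_score(performance: float, process: float, purpose: float,
                weights: tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    # Weighted combination of the three trust components, each rated 1-10.
    return sum(w * d for w, d in zip(weights, (performance, process, purpose)))

# An assistant that performs well but often ignores team practices
print(round(trust_score(performance=8, process=4, purpose=9), 1))  # -> 7.0
```

In practice a team might weight process more heavily than performance, since an assistant that works in an opaque or erratic way erodes trust even when its output is good.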

Trust enhancement model for AI coding assistants

 

Before discussing the trust enhancement model, we need to first classify the potential erroneous behaviors of AI coding assistants according to the SRK behavior model (Skill-Rule-Knowledge Behavioral Model), so as to analyze behaviors that affect trust in a targeted manner.

 

Behavioral error classification

 

Based on the SRK behavioral model, we can classify AI coding assistant errors into the following three categories:

  • Slips (Skill-based): Behavior itself is correct but executed incorrectly. Examples include AI coding assistants indicating the need to install a dependency but generating an incorrect installation command; AI coding assistants wanting to read command-line output but failing to do so.

  • Rule-based mistakes: Behavior itself is wrong due to incorrect rules. Examples include AI coding assistants writing code first and then tests, not following TDD to write tests first; adding large amounts of logic to existing modules instead of splitting them into new ones.

  • Knowledge-based mistakes: Behavior itself is wrong due to a lack of applicable rules or relevant knowledge to make correct decisions. Examples include AI coding assistants not incorporating the latest language features into training sets, generating outdated code; AI coding assistants misunderstanding project terminology, generating erroneous code.
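A lightweight way to operationalize this classification is to log observed failures against the three categories. The categories come from the SRK model described above; the data structure itself is a sketch of a possible team practice:

```python
from dataclasses import dataclass
from enum import Enum

class ErrorKind(Enum):
    SLIP = "skill-based slip"
    RULE_MISTAKE = "rule-based mistake"
    KNOWLEDGE_MISTAKE = "knowledge-based mistake"

@dataclass
class ObservedError:
    description: str
    kind: ErrorKind

# Example entries, mirroring the failure modes listed in the text
log = [
    ObservedError("generated wrong install command for a dependency", ErrorKind.SLIP),
    ObservedError("wrote code before tests instead of following TDD", ErrorKind.RULE_MISTAKE),
    ObservedError("used outdated language features in generated code", ErrorKind.KNOWLEDGE_MISTAKE),
]
print(sum(1 for e in log if e.kind is ErrorKind.RULE_MISTAKE))
```

Such a log makes it easy to see which category dominates and therefore whether the fix belongs in skills, rules, or knowledge.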

 

Trust enhancement model

 

Based on the dynamic model of trust and reliance on automation in Trust in Automation: Designing for Appropriate Reliance and behavioral error classification, we developed the following trust enhancement model for AI coding assistants.

To enhance trust in AI coding assistants, we first need to establish an enabling team to drive the entire process. Next, we need to visualize the performance of both AI coding assistants and developers to accurately calibrate our trust and self-confidence regarding AI coding assistants. Finally, we need to organize knowledge-sharing initiatives — such as sessions, training, workshops and playbooks — at various levels (company-wide, departmental, team-specific, etc.) so that everyone can promptly grasp the company’s AI strategies, successful and failed use cases, methods of usage, and the internal operating principles of AI coding assistants.

 

Based on the above measures, we need to adopt a trial-and-error approach to improve the behavior of AI coding assistants. Throughout this process, we must continuously experiment and collect failed use cases, identifying behaviors causing low trust and making corresponding improvements.

 

For identified behaviors, we need to analyze them based on the trust model, finding the root causes of low trust in a manner similar to an Ishikawa diagram. For example, high hallucination rates signify problems in the purpose of AI coding assistants; not following TDD signifies problems in the process of AI coding assistants; inability to read command-line outputs signifies performance issues of AI coding assistants.

 

Problems with purpose require company analysis, as they relate to the procurement of AI coding assistants. For example, high hallucination rates might be due to contracts not enabling advanced models; data leaks might result from inadequate data protection mechanisms in AI coding assistants. Companies need to analyze results and decide whether to update contracts or choose other AI coding assistants to ensure alignment of purpose with our goals. Solving these issues has the highest impact on enhancing our trust in AI coding assistants.

 

Problems with process require team analysis in conjunction with the behavioral error classification, as each team’s practices differ, as do task processes. For example, not following TDD might be due to AI coding assistants choosing traditional implementation methods; directly generating code without explanations might be due to prioritizing faster task completion; seeing no plan beforehand might be due to AI coding assistants choosing a “do-first-think-later” mode. Teams need to analyze results and supplement missing skills, rules, or knowledge to optimize task processes. Solving these issues has a moderate impact on enhancing our trust in AI coding assistants.

 

Performance issues require developer analysis in conjunction with the behavioral error classification, as each developer faces different tasks. For example, using incorrect frameworks might be due to AI coding assistants not understanding the current project’s technology stack; inability to read command-line outputs might be due to integration issues in the AI coding assistant’s command-line plugin; generating large classes might be due to non-adherence to the open/closed principle. Developers need to analyze results and supplement missing skills, rules, or knowledge to achieve better task performance. Solving these issues has the lowest impact on enhancing our trust in AI coding assistants.

 

Applying the trust enhancement model in GitHub Copilot

 

Copilot instructions template

 

Based on the trust enhancement model from the previous section, we optimize the behavior of GitHub Copilot from three dimensions: knowledge, rules and skills, designing the following copilot instruction template:

 

Knowledge addresses knowledge-based mistakes. When discovering knowledge-based mistakes in GitHub Copilot, we can add the missing knowledge to this part. Examples include system architecture, coding guidelines, technology stacks, etc.

 

Rules address rule-based mistakes. When discovering rule-based mistakes in GitHub Copilot, we can add applicable rules for the scenario to this part. Examples include defining problems first before solving, creating plans before implementation, using TDD during implementation, etc.

Skills address slips. When discovering slips in GitHub Copilot, we can add the missing skills to this part. Examples include problem definition, planning solutions, TDD skills, etc.
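Putting the three dimensions together, a repository-level instructions file might be structured as follows. GitHub Copilot reads repository custom instructions from `.github/copilot-instructions.md`; the section contents here are illustrative placeholders drawn from the examples above, not the team’s actual template:

```markdown
<!-- .github/copilot-instructions.md — minimal sketch of the template -->
# Copilot instructions

## Knowledge
<!-- Add entries when knowledge-based mistakes are found -->
- System architecture: ...
- Coding guidelines: ...
- Technology stack: ...

## Rules
<!-- Add entries when rule-based mistakes are found -->
- Define the problem before solving it.
- Create a plan before implementation.
- Use TDD during implementation.

## Skills
<!-- Add entries when slips are found -->
- Problem definition: ...
- Solution planning: ...
- TDD workflow: ...
```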


 

Team collaboration

 

The above template is primarily maintained by the team’s AI lead, ensuring GitHub Copilot’s development process aligns with the team’s best practices. In the above template, we included best practices in the development process, such as problem definition, problem planning, TDD, thinking aloud, running tests, executing commands, etc.

 

The above template will be uploaded to the code repository, ensuring each team member adopts best practices while using GitHub Copilot for development. However, this does not guarantee that GitHub Copilot can successfully complete all tasks—it only increases the probability of task success. For failed tasks, developers need to analyze failure reasons and add the missing knowledge, rules, or skills to the copilot instructions template, gradually making it the repository’s unique copilot instructions.

 

AI leads need to understand each repository’s copilot instructions, identify widely-used knowledge, rules, or skills and add them to the copilot instructions template.

 

Through this method, the team’s trust in GitHub Copilot will increase, along with usage rates.

 

Conclusion

 

An AI coding assistant is like a new employee. Despite potentially possessing strong capabilities, we still need to retrain them to align with our values, adhere to our best practices and learn required skills. Only when we fully trust them can we assign important tasks.

 

Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.
