DevBots Assist and Empower Human-Bot Collaborative Software Architecting

The architectural design of software-intensive systems is a complex process in which stakeholders' perspectives must be unified, designers' intellect is leveraged, and tool-based automation and pattern-driven reuse are applied. The process produces a blueprint that guides software implementation and evaluation.

Despite its many benefits, architecture-centric software engineering (ACSE) faces many challenges, which may stem from a lack of standardized processes, socio-technical constraints, and a lack of human expertise. These challenges may hinder the development of existing and emergent categories of software (such as IoT, blockchain, and quantum systems). Software development bots (DevBots) built on large language models (LLMs) can help combine architects' knowledge with AI decision-making to enable rapid architecting in human-bot collaborative ACSE. An emerging solution for such collaboration is ChatGPT, a disruptive technology that was not designed specifically for software engineering but can express and refine architectural artifacts through natural language processing (NLP).

This article reports preliminary results from the authors' case study (Reference 1), which suggest that ChatGPT can imitate the role of an architect to support and lead ACSE but still requires human supervision and decision-making support in collaborative architecting.

01 Overview

The architecture of software-intensive systems enables architects to specify structural composition, express behavioral constraints, and rationalize design decisions, thereby hiding implementation complexity behind architectural components and sketching a blueprint for software implementation. ACSE aims to leverage architectural knowledge (such as tactics and patterns), architectural languages, tools, and architects' decisions (human intellect) to create a model that drives the implementation, validation, and maintenance phases of software systems. In recent years, ACSE has been used to study the role of architecture in complex and emergent software categories (such as blockchain applications and quantum systems) and has proved useful for systematizing software development in industrial scenarios. Despite its great potential, ACSE faces a variety of challenges, including but not limited to mapping stakeholder perspectives to architectural requirements; managing architectural drift, erosion, and technical debt; and a lack of automation and architectural expertise for developing complex and emergent software categories. Under such circumstances, a software engineer may end up as a so-called "lonesome architect", who needs non-intrusive, process- and tool-based support that reuses knowledge and leverages decision support to address ACSE challenges.

Background and motivation: The process of architecting software applications and services, or the "architecting process", unifies many methods that support incremental, process-centric, and systematic approaches to applying ACSE in software development. Empiricism still plays a key role in deriving or using architecting processes that support architecting activities such as analysis, synthesis, and evaluation.

To enrich the architecting process and empower architects, research and development efforts focus on integrating patterns and styles (knowledge), recommender systems (intelligence), and distributed architecting (collaboration) into the ACSE process. Artificial intelligence for software engineering (AI for SE) is an active research field that aims to combine AI solutions with SE practices to inject intelligence into software development processes and tools. From an ACSE perspective, AI research usually focuses on developing decision support systems or development bots that assist architects with recommendations on design decisions and on pattern and style selection, or that predict points of architectural failure and degradation. To date, no solution has been proposed that uses AI to enrich the architecting process and implement human-bot collaborative architectural design. Human-bot collaboration can unify the knowledge of architects with the intelligent-agent capabilities of bots, so that bots can guide the architecting process under human conversation and supervision. Such collaboration allows architects to delegate architecting tasks to bots, supervise the bots through natural language conversations to achieve automation, and free themselves from cumbersome ACSE tasks.

Research objective: Chat Generative Pre-trained Transformer (ChatGPT) has become a disruptive technology and an unprecedented example of a bot that can interact with humans in context-preserving conversations and generate clear responses to complex queries. Although ChatGPT was not developed specifically to address software engineering challenges, it can generate diverse textual specifications, including architectural requirements, Unified Modeling Language (UML) scripts, source code libraries, and test cases. Recent research has started to explore the role of ChatGPT in engineering education, software testing, and source code generation, but no research has yet examined what role ChatGPT can play as a DevBot in the architecting process. Considering that ACSE can be driven by architects' conversations and feedback and can benefit from intelligent, automated architecting, the authors conducted a preliminary investigation into whether ChatGPT can process an architecture story (scenario) conversed to it by an architect and then analyze, synthesize, and evaluate a software architecture in human-bot collaborative architecting.

Contributions: The authors use ChatGPT to analyze, synthesize, and evaluate the architecture of microservice-based software using a process-centric and scenario-based approach. Preliminary results show that ChatGPT's capabilities include, but are not limited to, processing an architecture story (conversed to it by an architect), clarifying architectural requirements, specifying models, recommending and applying architectural tactics and patterns, and developing scenarios for architecture evaluation. The primary contributions of this study are to:

· Investigate the potential of human-bot collaborative architecting, synthesize ChatGPT output and architect decisions, and automate ACSE with a preliminary case study.

· Identify the potential and risks of ChatGPT-assisted ACSE, including the ethical, governance, and socio-technical constraints of collaborative architecting.

· Lay the foundation for empirically grounded evidence on ChatGPT's capabilities and architect productivity in collaborative architecting (ongoing and future work).

The results of this study can help academic researchers develop new hypotheses about the role of ChatGPT in ACSE and investigate the possibility of human-bot collaborative architecting for emergent and future software. Practitioners can follow the guidelines provided to delegate cumbersome ACSE tasks to ChatGPT on an experimental basis.

02 Research Background and Method

This section introduces the core concepts (Figure 1) and the research method (Figure 2).

Human-Bot Collaborative Architecting

According to the ISO/IEC/IEEE 42010:2011 standard, software architecture aims to abstract the complexity of source code implementations through architectural components and connectors that represent a blueprint for the software applications, services, and systems to be developed. In academia and industry, the architecture-centric approach has proved effective for designing and developing software because it provides architectural knowledge such as patterns, styles, languages, and frameworks. To enable software designers and architects to design software architectures systematically and incrementally, a process for architecting software, that is, the software architecting process, is needed. The process can contain many fine-grained architecting activities that support the separation of architectural concerns in ACSE. The architecting process shown in Figure 1 is derived from a general model of architecture design based on five industrial approaches (Reference 12) and consists of three architecting activities: architectural analysis, architectural synthesis, and architectural evaluation. For example, the architectural evaluation activity uses scenarios to evaluate the designed architecture. During the architecting process, an architect can extract and document the required functionality and its expected quality, known as architecturally significant requirements (ASRs). ASRs need to be mapped to source code implementations through an architectural model that can be visualized or textually specified using architectural languages, such as UML or architecture description languages (ADLs). The architecture models for the ASRs need to be evaluated using an architecture evaluation method, such as the software architecture analysis method (SAAM) or the architecture tradeoff analysis method (ATAM).
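The requirement that ASRs be traceable to an architectural model can be illustrated with a small Python sketch. The `ASR`, `ArchitectureModel`, and `untraced` names are hypothetical and not part of the authors' method; the requirement texts borrow loosely from the CampusBike case study in Section 03, and the `BookBike` component is invented for illustration. The sketch only flags ASRs that no component claims to realize, which is one simple check an architect might run before evaluation.

```python
from dataclasses import dataclass, field
from enum import Enum


class ASRKind(Enum):
    """The three kinds of ASR used in this article."""
    FUNCTIONALITY = "functionality"
    QUALITY = "quality"
    CONSTRAINT = "constraint"


@dataclass(frozen=True)
class ASR:
    """An architecturally significant requirement (hypothetical structure)."""
    asr_id: str
    kind: ASRKind
    text: str


@dataclass
class ArchitectureModel:
    """Maps each component name to the IDs of the ASRs it realizes."""
    traces: dict[str, set[str]] = field(default_factory=dict)

    def realize(self, component: str, asr_id: str) -> None:
        self.traces.setdefault(component, set()).add(asr_id)


def untraced(asrs: list[ASR], model: ArchitectureModel) -> list[str]:
    """Return the IDs of ASRs that no component in the model realizes."""
    covered = set().union(*model.traces.values())
    return [a.asr_id for a in asrs if a.asr_id not in covered]


# Illustrative ASRs loosely based on the CampusBike case study.
asrs = [
    ASR("F1", ASRKind.FUNCTIONALITY, "View nearby available bikes"),
    ASR("Q1", ASRKind.QUALITY, "Reserve a bike within 90 seconds"),
    ASR("C1", ASRKind.CONSTRAINT, "Apply data minimization to registration data"),
]
model = ArchitectureModel()
model.realize("ViewBikes", "F1")
model.realize("BookBike", "Q1")  # hypothetical component
print(untraced(asrs, model))  # ['C1']
```

In this toy model the data minimization constraint is reported as untraced until some component, for example a registration component, is declared to realize it.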

Figure 1: LLMs, DevBots, process, and architecture

DevBots are AI-driven conversational agents or recommender systems that assist software engineers by providing a degree of automation or introducing intelligence into software engineering. From the software architecting perspective, the combination of AI and DevBots has been limited to bots that answer questions or provide recommendations on architectural erosion and maintenance. Currently, no research or solution demonstrates the incorporation of DevBots into the architecting process to implement human-bot collaborative software architecting. Human-bot collaboration can enrich the architecting process beyond Q&A and recommendations and integrate the wisdom of architects (human rationale) with the intelligence of bots (automated architecting process) in ACSE. Human-bot collaborative architecting can enable novices who lack experience or professional knowledge to design and architect software: they can specify requirements in natural language and convert them into ASRs, architecture models, and evaluation scenarios through DevBots. As shown in Figure 1, ChatGPT, which is based on a large language model (LLM), can converse with architects and guide the creation of architectural components under human supervision.

Research Method

The overall research method consists of three main phases, as shown in Figure 2.

Figure 2: Overview of the research method

Phase 1: Architecture story development. A software architecture story expresses core functions, expected quality (ASRs), and constraints in natural language and describes the expected solution in text. The story is developed by analyzing the software domain, which represents the operating environment of the system or a set of scenarios implemented through a software solution. Architects can analyze the domain and identify scenarios to write an architecture story as the basis for the architecting process. The architecture story is fed into ChatGPT through a prompt as the pre-processing step of human-bot collaborative architecting.
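Feeding the story to ChatGPT amounts to turning its parts into a prompt. The following Python sketch assembles such a prompt; the `ArchitectureStory` fields, the prompt wording, and the example domain description are assumptions for illustration, not the prompts the authors used (those are in the replication package, Reference 2). Sending the prompt to a chatbot is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class ArchitectureStory:
    """Hypothetical container for the parts of an architecture story."""
    system: str
    domain: str
    functions: list[str]
    qualities: list[str]
    constraints: list[str]


def bullets(items: list[str]) -> str:
    """Format items as a plain-text bullet list, one per line."""
    return "\n".join(f"- {item}" for item in items)


def story_prompt(story: ArchitectureStory) -> str:
    """Render the story as one natural-language prompt for a chatbot."""
    return (
        f"You are assisting a software architect who is designing {story.system}.\n"
        f"Domain: {story.domain}\n"
        f"Core functions:\n{bullets(story.functions)}\n"
        f"Expected quality:\n{bullets(story.qualities)}\n"
        f"Constraints:\n{bullets(story.constraints)}\n"
        "Identify the architecturally significant requirements (ASRs)."
    )


story = ArchitectureStory(
    system="CampusBike",
    domain="bike reservation for users on a university campus",  # assumed wording
    functions=["View nearby available bikes", "Reserve a bike"],
    qualities=["Reservations complete immediately and securely"],
    constraints=["Comply with relevant data security regulations"],
)
print(story_prompt(story))
```

Keeping functions, quality, and constraints in separate fields mirrors the three parts of the story described above and makes it easy to regenerate the prompt after the architect edits one part.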

Phase 2: Collaborative architecting implementation. This phase is based on three activities, as detailed below.

· Architecture analysis is driven by the architecture story fed to ChatGPT to express ASRs, through (i) automatically generated and recommended requirements (by ChatGPT), or (ii) manually specified requirements (by the architect), or (iii) continuous dialogue between ChatGPT and the architect to refine (add/delete/update) the requirements.

· Architecture synthesis integrates ASRs to create an architecture model or description as a point of reference, that is, a visualized software structure and running scenarios. The authors chose UML for architectural synthesis for several reasons, such as available documentation, ease of use, diversity of diagrams, tool support, and wide adoption as a language for representing software systems. In the synthesis process, the authors also reuse knowledge and best practices to optimize the architecture with tactics and patterns.

· Architectural evaluation is based on the scenarios in the architecture story and evaluates the synthesized architecture against the ASRs. Architecture evaluation is performed incrementally: the architecture is fully or partially validated based on the use cases or scenarios in the ASRs. During the evaluation, the authors used SAAM to supervise ChatGPT's architecture evaluation.
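Option (iii) of the analysis activity above, continuous refinement through dialogue, boils down to applying the architect's add, delete, and update decisions to the requirements the bot proposed. The Python sketch below assumes ASRs are kept as a mapping from hypothetical IDs to text; `Edit` and `refine` are illustrative names, not part of the authors' tooling. It works on a copy so that the bot's original proposal stays available for comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """One refinement step; op is 'add', 'delete', or 'update' (hypothetical)."""
    op: str
    asr_id: str
    text: str = ""


def refine(asrs: dict[str, str], edits: list[Edit]) -> dict[str, str]:
    """Apply the architect's edits to bot-proposed ASRs and return a new mapping."""
    refined = dict(asrs)  # copy: leave the bot's proposal untouched
    for e in edits:
        if e.op == "add":
            if e.asr_id in refined:
                raise ValueError(f"{e.asr_id} already exists")
            refined[e.asr_id] = e.text
        elif e.op == "update":
            if e.asr_id not in refined:
                raise KeyError(e.asr_id)
            refined[e.asr_id] = e.text
        elif e.op == "delete":
            del refined[e.asr_id]  # raises KeyError if the ID is unknown
        else:
            raise ValueError(f"unknown op: {e.op}")
    return refined


proposed = {
    "F1": "The system must allow users to view nearby available bikes",
    "Q1": "Bike reservation must be immediate and secure",
}
final = refine(proposed, [
    Edit("update", "Q1", "Bike reservation completes within 90 seconds"),
    Edit("add", "Q2", "Reservation token is encrypted"),
    Edit("add", "C1", "Apply data minimization to registration data"),
])
print(final)
```

Rejecting an `add` for an existing ID and an `update` for a missing one keeps the dialogue's intent explicit: a misdirected edit fails loudly instead of silently overwriting or creating a requirement.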

Phase 3: Empirical validation. This phase complements the first two phases. Empirical validation of collaborative architecting is an extension of this study and outlines future work; the current research scope is to explore and introduce the role of ChatGPT in human-bot collaborative architecting. Future work will explore a variety of socio-technical issues involved in ChatGPT-driven collaborative architecting based on empirically grounded guidelines.

03 Collaborative Architecting Case Study

This section describes the collaborative architecting process in detail and provides scenario-based examples through a case study, as shown in Figure 3.

Figure 3: Human-bot collaborative architecting process overview

Building the Architecture Story

An architecture story describes the expected software solution in text, that is, it expresses core functions and constraints in natural language. According to the method in Figure 2, the story is developed by analyzing the software domain, which represents the operating environment of the system or a set of scenarios implemented through a software solution. Architects can analyze the domain, identify scenarios to write an architecture story, and feed the story to ChatGPT to lay the foundation for the architectural analysis activity.

Architectural Analysis

After the architecture story is fed to ChatGPT, architectural analysis focuses on specifying ASRs based on the required functions (for example, viewing available bikes), expected quality (for example, response time < N), and constraints (such as compliance with relevant data security regulations) of the CampusBike software, the system used in the case study. ChatGPT should be able to outline the ASRs or any necessary constraints when queried by the architect. However, the case study showed that the ASRs and constraints expressed by ChatGPT need to be refined (added, deleted, or modified) by the architect. For example, "booking a bike" is expressed by ChatGPT as "......the system must allow users to view nearby available bikes and enable bike reservation immediately and securely". The requirements as refined by the architect are as follows.

· Functionality: viewing bikes through location proximity

· Quality: immediately (within 90 seconds); securely (the reservation token is encrypted)

· Constraint: apply data minimization to registration data (GDPR constraint)
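Two of the refined requirements above are concrete enough to check in code. The Python sketch below, offered under stated assumptions, filters registration data against an allow-list (data minimization, which GDPR Article 5(1)(c) describes as limiting personal data to what is necessary for the purpose) and tests the 90-second quality requirement for one reservation. The allow-listed field names are hypothetical; deciding which fields are actually necessary remains the architect's call. Token encryption is omitted because it would require a cryptography library outside the Python standard library.

```python
from datetime import datetime, timedelta

# Hypothetical allow-list: the registration fields booking actually needs.
REQUIRED_REGISTRATION_FIELDS = frozenset({"user_id", "email", "student_id"})

# Refined quality ASR: "immediately" means within 90 seconds.
MAX_RESERVATION_TIME = timedelta(seconds=90)


def minimize(registration: dict[str, str]) -> dict[str, str]:
    """Data minimization: drop every field that is not on the allow-list."""
    return {k: v for k, v in registration.items()
            if k in REQUIRED_REGISTRATION_FIELDS}


def meets_reservation_quality(requested: datetime, confirmed: datetime) -> bool:
    """True if the reservation was confirmed within the 90-second limit."""
    return timedelta(0) <= confirmed - requested <= MAX_RESERVATION_TIME


raw = {
    "user_id": "u42",
    "email": "student@uni.example",
    "student_id": "s1001",
    "home_address": "1 Example Street",
    "date_of_birth": "2000-01-01",
}
print(minimize(raw))
# {'user_id': 'u42', 'email': 'student@uni.example', 'student_id': 's1001'}
```

An allow-list rather than a deny-list means that any new field added to the registration form is excluded by default until someone argues it is necessary.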

Following the architecture story, Figure 4 shows the architect's query and ChatGPT's response (human-bot collaboration) to specify functionality, quality, and constraints, collectively referred to as ASRs. The ASRs are iteratively refined through dialogue between the human and the bot to produce a final list (see Reference 2).

Figure 4: Developing and refining the requirements

Architectural Synthesis

The ASRs are synthesized into an architectural model that can be expressed in an architectural (or modeling) language such as UML. The authors used UML component diagrams to represent the overall architecture and UML class diagrams to represent its details. During the synthesis, the authors refined the UML class diagram to apply the singleton pattern to the "UserLogin" class, which helps restrict each user to a single login session across devices. The authors also applied the caching tactic to "ViewBikes" and the data minimization constraint to "User Location".
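To make two of these refinements concrete, here is a minimal Python sketch of a singleton `UserLogin` and a caching tactic on `ViewBikes`. It is not the authors' design: the method names, the 30-second time-to-live, and the injectable clock are assumptions. Note that a Python singleton is one instance per process, so it enforces a single session per user only within one server; a deployment with several servers would need a shared session store to honor the same constraint.

```python
import time
from typing import Callable


class UserLogin:
    """Singleton session registry: at most one active device per user."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._sessions = {}  # user_id -> device_id
        return cls._instance

    def login(self, user_id: str, device_id: str) -> None:
        # A new login replaces any session held on another device.
        self._sessions[user_id] = device_id

    def is_active(self, user_id: str, device_id: str) -> bool:
        return self._sessions.get(user_id) == device_id


class ViewBikes:
    """Caching tactic: reuse nearby-bike results for `ttl` seconds."""

    def __init__(self, lookup: Callable[[str], list[str]], ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._lookup = lookup  # hypothetical backend query by area
        self._ttl = ttl
        self._clock = clock
        self._cache: dict[str, tuple[float, list[str]]] = {}

    def nearby(self, area: str) -> list[str]:
        now = self._clock()
        hit = self._cache.get(area)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh enough: skip the backend
        bikes = self._lookup(area)
        self._cache[area] = (now, bikes)
        return bikes


login = UserLogin()
login.login("u42", "phone")
UserLogin().login("u42", "laptop")  # same instance; replaces the phone session
print(login.is_active("u42", "phone"))  # False
```

The TTL bounds how stale a "nearby bikes" list can be, which is the usual trade-off of the caching tactic: fewer backend queries in exchange for occasionally showing a bike that was just reserved.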

Figure 5: Modeling and refining the architecting

Figure 5 shows the architect's prompt asking ChatGPT to create a UML class diagram script. The singleton pattern, caching tactic, and data minimization constraint are applied in the class diagram through additional dialogue between the architect and ChatGPT, as described in Reference 2.
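The generated script is not reproduced in this text, so the following is only an illustration of what a text-based class diagram carrying the three refinements might look like. It assumes PlantUML notation, where a stereotype such as `<<Singleton>>` can mark an applied pattern or tactic; the member names and the `Cached` and `DataMinimization` stereotypes are hypothetical, and "User Location" is written as `UserLocation` to keep it a plain identifier. Generating the script from a small Python structure makes it easy for the architect to compare successive versions returned by the bot.

```python
def class_diagram(classes: dict[str, dict], links: list[tuple[str, str, str]]) -> str:
    """Emit PlantUML class-diagram text; stereotypes mark applied patterns."""
    lines = ["@startuml"]
    for name, spec in classes.items():
        stereotype = spec.get("stereotype")
        suffix = f" <<{stereotype}>>" if stereotype else ""
        lines.append(f"class {name}{suffix} {{")
        lines.extend(f"  {member}" for member in spec.get("members", []))
        lines.append("}")
    for source, target, label in links:
        lines.append(f"{source} --> {target} : {label}")
    lines.append("@enduml")
    return "\n".join(lines)


# Hypothetical members; stereotypes record the architect's refinements.
campus_bike = {
    "UserLogin": {
        "stereotype": "Singleton",
        "members": ["- {static} instance : UserLogin",
                    "+ {static} getInstance() : UserLogin",
                    "+ login(userId : String, deviceId : String)"],
    },
    "ViewBikes": {
        "stereotype": "Cached",
        "members": ["- cache : Map<String, List<Bike>>",
                    "+ nearby(area : String) : List<Bike>"],
    },
    "UserLocation": {
        "stereotype": "DataMinimization",
        "members": ["+ coarseArea() : String"],
    },
}
print(class_diagram(campus_bike, [("ViewBikes", "UserLocation", "uses")]))
```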

Architectural Evaluation

Once the synthesis is complete (see Figure 5), the architecture needs to be evaluated to determine whether it meets the ASRs and constraints (see Figure 4). The authors used SAAM (see Reference 3) to evaluate the synthesized architecture, as shown in Figure 6. For example, the architect specifies that SAAM be applied to evaluate the "View Bike" component. ChatGPT presents a scenario in which the "View Bike" component is evaluated on its own and a scenario in which it interacts with other components. Based on these individual-component and interaction scenarios, an evaluation report is generated that shows the evaluation results for the functionality, quality, and constraints of the CampusBike software architecture.
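SAAM (surveyed in Reference 3) classifies scenarios as direct (supported by the architecture as is) or indirect (requiring a change), and treats several indirect scenarios that affect the same component as a scenario interaction that may signal poor separation of concerns. The Python sketch below applies that bookkeeping to hypothetical CampusBike scenarios; it simplifies SAAM considerably and is not the report ChatGPT produced in the case study. The `BookBike` and `UserLocation` component names and all scenarios are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    components: frozenset[str]  # components the scenario touches
    direct: bool                # True if supported without modification


def saam_report(scenarios: list[Scenario]) -> dict:
    """Summarize a simplified SAAM evaluation."""
    indirect_by_component: dict[str, list[str]] = defaultdict(list)
    for s in scenarios:
        if not s.direct:
            for c in s.components:
                indirect_by_component[c].append(s.name)
    return {
        "direct": sorted(s.name for s in scenarios if s.direct),
        "indirect": sorted(s.name for s in scenarios if not s.direct),
        # Scenario interaction: several indirect scenarios hit one component.
        "interactions": {c: sorted(names)
                         for c, names in indirect_by_component.items()
                         if len(names) > 1},
    }


scenarios = [
    Scenario("View nearby bikes", frozenset({"ViewBikes"}), direct=True),
    Scenario("Book a viewed bike", frozenset({"ViewBikes", "BookBike"}), direct=True),
    Scenario("Add e-bike type", frozenset({"ViewBikes"}), direct=False),
    Scenario("Hide exact location", frozenset({"ViewBikes", "UserLocation"}), direct=False),
]
report = saam_report(scenarios)
print(report["interactions"])
# {'ViewBikes': ['Add e-bike type', 'Hide exact location']}
```

Here `ViewBikes` is flagged because two different change scenarios both land on it, which is the kind of finding an architect would then discuss with the bot before accepting the evaluation report.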

Figure 6: Architecture evaluation

04 Related Work

This section reviews the most relevant existing research, outlining the application of AI in SE and ACSE and the role of ChatGPT in SE.

AI in SE and ACSE

Research on combining AI with SE can be divided into two dimensions: AI for SE and SE for AI (see References 4 and 5). From the AI for SE perspective, Xie (Reference 4) argues that SE research needs to go beyond the traditional approach of applying AI for tool-based automation and pattern selection and should explore ways to inject intelligence into SE processes and solutions. Specifically, SE solutions need to maintain a so-called "intelligent balance" among processes, patterns, and tools, that is, unify and balance machine intelligence and human intelligence to cope with emergent software categories, such as blockchain and quantum applications. Barenkamp et al. (Reference 5) studied the role of AI in the SE process by combining the results of a systematic review with interviews with software developers.

Their results indicate that SE needs to strengthen intelligence in three areas:

(1) Automation of cumbersome and complex software engineering activities, such as code debugging.

(2) Big data analysis to discover patterns.

(3) Evaluation of data in neural networks and software-defined networks.

Regarding the application of AI in software architecting, Herold et al. (Reference 6) surveyed existing research and proposed a conceptual framework for applying machine learning to mitigate architectural degradation.

ChatGPT-Assisted SE

From an SE perspective, ChatGPT is an unprecedented chatbot that produces clear answers to complex queries. However, its potential and risks in the software development process remain unexplored (References 7 and 8). Recent proposals and experimental results show that research on ChatGPT has focused on supporting engineering education, software programming, and testing. Avila-Chauvet et al. (Reference 9) described how programmers can develop online behavioral tasks in HTML, CSS, and JavaScript with human-bot assistance by conversing with ChatGPT.

They stressed that, although ChatGPT requires human supervision and intervention, it can write good programming solutions and reduce developers' time and effort in the programming process. Similarly, in Reference 8, the author advocated an incremental process (human conversation with ChatGPT) for writing genetic programming code in JavaScript to solve the traveling salesman problem. Beyond source code development, some research has focused on testing and debugging with ChatGPT (References 10 and 11). Sobania et al. (Reference 11) evaluated ChatGPT's performance in automated bug fixing. Compared with existing automated bug-fixing techniques, ChatGPT allows software testers to identify and fix bugs incrementally through conversation.

Summary: Based on a review of the existing literature, there is currently no research or development exploring the role of ChatGPT in guiding and supporting ACSE through LLM-based dialogues with software engineers. This study complements recent research on software test automation and bug fixing with ChatGPT but focuses on architecture-centric software system development. In the broader context of AI for SE, this research advocates human-bot collaborative architecting, which combines the knowledge and supervision of architects with the capabilities of bots to enrich the ACSE process and architect software-intensive systems and services.

05 Discussion and Validity Threats

This section discusses socio-technical issues and potential threats to the validity of collaborative architecting.

Socio-Technical Issues of ChatGPT in ACSE

Beyond the potential of ChatGPT, the authors also highlight several socio-technical issues that need to be discussed in collaborative architecting. By "socio-technical", the authors mean adopting a unified perspective on what is a "social" concern and what is a "technical" limitation of collaborative architecting. The issues were investigated systematically; the most prominent ones are as follows.

Response variation: In a human-bot conversation, ChatGPT may generate different responses to the same query. For example, the authors observed that a query such as "what architectural style is best suited to CampusBike" may produce a variety of responses, including microservices, layered, and client-server architectures. Such variation in recommendations or scripted artifacts (UML scripts, ASR specifications, etc.) may affect the consistency of the architecting process and eventually lead to differences in architecture analysis, synthesis, and evaluation. One way to reduce response variation is to combine iterative conversations with ChatGPT, which refine its output, with the architect's supervision, which ensures that the resulting architectural artifacts are consistent and coherent.
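One way to make response variation visible before committing to a recommendation is to pose the same query several times and measure how often the answers agree. The Python sketch below assumes the responses have already been collected as plain strings (no chatbot API is called) and maps each one to a style with a deliberately crude, hypothetical keyword table; a response that names several styles is assigned the first style listed in that table. A low agreement ratio suggests that further dialogue and architect supervision are needed.

```python
from collections import Counter

# Hypothetical keyword table mapping free-text answers to style labels.
STYLE_KEYWORDS = {
    "microservice": "microservices",
    "layered": "layered",
    "client-server": "client-server",
}


def classify(response: str) -> str:
    """Return the first style whose keyword occurs in the response."""
    text = response.lower()
    for keyword, style in STYLE_KEYWORDS.items():
        if keyword in text:
            return style
    return "other"


def agreement(responses: list[str]) -> tuple[str, float]:
    """Most frequent style and the fraction of responses recommending it."""
    if not responses:
        raise ValueError("need at least one response")
    counts = Counter(classify(r) for r in responses)
    style, n = counts.most_common(1)[0]
    return style, n / len(responses)


responses = [
    "A microservices architecture fits CampusBike best.",
    "A layered architecture is the simplest option.",
    "Microservices, because bike and user services scale independently.",
    "A client-server architecture is sufficient.",
]
print(agreement(responses))  # ('microservices', 0.5)
```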

Ethics and intellectual property: Textual specifications, architecture-specific scripts, and source code produced by ChatGPT may, in some cases, raise ethical problems or infringe copyright or other intellectual property rights. For example, software generated by ChatGPT for the GETLOCATION component of the CampusBike system may leak users' location data and fail to comply with regulatory guidelines (such as GDPR and CCPA). These issues must be handled with caution, and the architect's role is critical in ensuring that the generated architecture does not raise ethical concerns or infringe intellectual property rights.
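A common mitigation for location leakage is to reduce precision before a location leaves the component that collects it. The Python sketch below rounds coordinates for a hypothetical GETLOCATION output; the `Location` type, the two-decimal default, and the example coordinates are assumptions, and rounding alone does not guarantee regulatory compliance, which still depends on the architect's review. One hundredth of a degree of latitude is roughly 1.1 km, so the chosen precision trades privacy against how useful "nearby bikes" remains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """A latitude/longitude pair in decimal degrees."""
    lat: float
    lon: float


def coarsen(loc: Location, decimals: int = 2) -> Location:
    """Round both coordinates so the exact position is not exposed."""
    if decimals < 0:
        raise ValueError("decimals must be non-negative")
    return Location(round(loc.lat, decimals), round(loc.lon, decimals))


exact = Location(60.18452, 24.82731)  # illustrative coordinates
print(coarsen(exact))  # Location(lat=60.18, lon=24.83)
```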

Output bias: The output bias of such a conversational bot can be attributed to multiple factors, including but not limited to the input, the training data, or algorithmic bias. From an architectural perspective, recommendations of modeling notations, tactics, patterns, or styles for a particular architecture may reflect their widespread adoption or bias in the training data rather than their suitability for a particular context. In addition, architectural recommendations (specific styles), design decisions (pattern selection), or validation scenarios (evaluation methods) may be affected by such bias, producing suboptimal artifacts in ACSE.

Validity Threats

Validity threats are limitations, constraints, or potential flaws in the study that may affect the generalizability, replicability, and validity of the results. Future work can focus on minimizing these threats to ensure methodological rigor and generalizable results.

Internal validity concerns systematic errors (bias) in research processes such as design, implementation, and analysis. To address internal validity when designing and implementing the study, the authors followed a systematic approach and used a well-known architecting process (Reference 12) and architecture evaluation method (Reference 3). The case-based approach, combined with incremental architecting (Figure 3), helped analyze and refine the study, but more work is needed to determine whether the results hold when different architecting processes or evaluation methods are adopted.

External validity concerns whether the research results can be generalized to other scenarios. The authors studied only a single case of moderate complexity, which may limit the generalizability of the study. Specifically, increased complexity of the architecting process (cross-organization development), the category of software to be developed (mission-critical software), and human expertise (novice or experienced engineers) may affect the external validity of this study. As emphasized in the conclusions, future work will validate collaborative architecting by involving architecting teams and analyzing their feedback to understand to what extent threats to external validity can be reduced.

Conclusion validity concerns whether the conclusions of the study are reliable and credible. To minimize this threat, the authors employed a three-phase process (Figure 2) to support a fine-grained software architecting process and to validate the results (future work). In addition, a case study-based approach was used to provide a scenario-based demonstration of the research results. However, some conclusions (for example, about architect productivity and ChatGPT's effectiveness) can only be verified through further experiments with multiple case studies, different teams, and real scenarios of collaborative architecting.

06 Conclusions and Future Research

ChatGPT has become a disruptive technology and an unprecedented conversational bot that mimics human conversation and generates well-conceived textual artifacts (recommendations, scripts, source code, and more), often as solutions to the problems posed to it. Among its numerous applications, including content creation, digital assistance, and virtual teaching, ChatGPT's role as a DevBot and its ability to architect software-intensive systems remain unexplored. This study investigated the potential and risks of architecting assisted and empowered by ChatGPT in human-bot collaborative ACSE. The research advocates that, in the context of AI for SE, the traditional application of AI for tool-based automation should give way to a broader perspective, such as human-bot collaborative architecting, that enriches existing processes by injecting intelligence into them.

This case study shows how ChatGPT can be used for architecting and which factors need to be considered in collaborative architecting. When integrating ChatGPT into SE or ACSE, it is important to consider variation in responses and artifacts, the types of ethical impact, the level of human decision support and supervision, and legal and socio-technical issues. The study requires empirical verification, based on evidence and experiments, to objectively evaluate factors such as improvement in engineer productivity, SE process optimization, and how effectively novice developers and designers can use ChatGPT to develop complex and emergent categories of software.

Requirements for future research: The authors plan to expand the research into a series of studies that explore human feedback and validation (for example, from the perspective of architects) and integrate ChatGPT into the development of software for quantum computing systems. The authors are currently collaborating with multiple software development teams with diverse demographic attributes (such as geographical distribution, type of expertise, level of experience, and software category) to experiment with ChatGPT in software architecting and document architects' responses. Specifically, through case studies involving ChatGPT-assisted architecting, the authors are collecting architect feedback through interviews and documentation to empirically study the validity, rigor, acceptance, impact on human productivity, and potential risks of ChatGPT in ACSE.

References

[1] A. Ahmad, M. Waseem, P. Liang, M. Fehmideh, M. S. Aktar, and T. Mikkonen, "Towards human-bot collaborative software architecting with ChatGPT," in Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering (EASE). ACM, 2023, pp. 279–285.

[2] A. Ahmad, M. Waseem, P. Liang, M. Fehmideh, M. S. Aktar, and T. Mikkonen, "Replication package for the paper: Towards human-bot collaborative software architecting with ChatGPT," https://github.com/shamimaaktar1/ChatGPT4SA, 2023.

[3] L. Dobrica and E. Niemela, "A survey on software architecture analysis methods," IEEE Transactions on Software Engineering, vol. 28, no. 7, pp. 638–653, 2002.

[4] T. Xie, "Intelligent software engineering: Synergy between AI and software engineering," in Proceedings of the 11th Innovations in Software Engineering Conference (ISEC). ACM, 2018, pp. 1–1.

[5] M. Barenkamp, J. Rebstadt, and O. Thomas, "Applications of AI in classical software engineering," AI Perspectives, vol. 2, no. 1, p. 1, 2020.

[6] S. Herold, C. Knieke, M. Schindler, and A. Rausch, "Towards improving software architecture degradation mitigation by machine learning," in Proceedings of the 12th International Conference on Adaptive and Self-Adaptive Systems and Applications (ADAPTIVE). IARIA, 2020, pp. 36–39.

[7] A. Borji, "A categorical archive of ChatGPT failures," arXiv preprint arXiv:2302.03494, 2023.

[8] F. Doglio, "The rise of ChatGPT and the fall of the software developer - is this the beginning of the end?" Dec. 2022. [Online]. Available: https://tinyurl.com/3mxrfmjh

[9] L. Avila-Chauvet, D. Mejía, and C. O. Acosta Quiroz, "ChatGPT as a support tool for online behavioral task programming," SSRN preprint SSRN:4329020, 2023.

[10] S. Jalil, S. Rafi, T. D. LaToza, K. Moran, and W. Lam, "ChatGPT and software testing education: Promises & perils," arXiv preprint arXiv:2302.03287, 2023.

[11] D. Sobania, M. Briesch, C. Hanna, and J. Petke, "An analysis of the automatic bug fixing performance of ChatGPT," arXiv preprint arXiv:2301.08653, 2023.

[12] C. Hofmeister, P. Kruchten, R. L. Nord, H. Obbink, A. Ran, and P. America, "A general model of software architecture design derived from five industrial approaches," Journal of Systems and Software, vol. 80, no. 1, pp. 106–126, 2007.