[{"data":1,"prerenderedAt":564},["ShallowReactive",2],{"content-query-sBs8RuXULj":3},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"title":8,"description":9,"date":10,"cover":11,"type":12,"body":13,"_type":558,"_id":559,"_source":560,"_file":561,"_stem":562,"_extension":563},"/technology-blogs/en/2926","en",false,"","Seven Technical Debt Types in Machine Learning Systems","Technical debt is the cost incurred from having to do additional rework caused by choosing a simplified solution that speeds up software development rather than adopting a superior approach that requires more time.","2023-02-17","https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/12/29/09177b9a4d3b4bd7a5331302d657857d.png","technology-blogs",{"type":14,"children":15,"toc":555},"root",[16,24,30,41,46,59,74,79,96,106,116,130,135,145,155,167,179,191,203,208,213,230,244,249,259,269,283,288,298,308,318,328,338,349,360,371,385,390,395,400,405,410,415,420,434,439,449,459,473,483,493,503,513,527,532,540,545,550],{"type":17,"tag":18,"props":19,"children":21},"element","h1",{"id":20},"seven-technical-debt-types-in-machine-learning-systems",[22],{"type":23,"value":8},"text",{"type":17,"tag":25,"props":26,"children":27},"p",{},[28],{"type":23,"value":29},"Author: Wang Lei | Source: Zhihu",{"type":17,"tag":25,"props":31,"children":32},{},[33,39],{"type":17,"tag":34,"props":35,"children":36},"strong",{},[37],{"type":23,"value":38},"Technical debt",{"type":23,"value":40}," is the cost incurred from having to do additional rework caused by choosing a simplified solution that speeds up software development rather than adopting a superior approach that requires more time. Analogous with monetary debt, although it may seem beneficial in the short term, it must be repaid in the future. Software engineers must spend extra time and effort to fix problems caused by using the simplified solution or re-construct the software for the optimal implementation. 
In 1992, Ward Cunningham first introduced this concept.",{"type":17,"tag":25,"props":42,"children":43},{},[44],{"type":23,"value":45},"Recent years have witnessed the quick evolution of AI technologies and the in-depth integration of AI technologies into various industries. However, how to develop a reliable machine learning system remains a challenge, as developers focus more on algorithm optimization and technological innovation.",{"type":17,"tag":25,"props":47,"children":48},{},[49,51,57],{"type":23,"value":50},"In 2015, a group of Google engineers with extensive experience in developing and maintaining machine learning systems published a paper titled ",{"type":17,"tag":52,"props":53,"children":54},"em",{},[55],{"type":23,"value":56},"Hidden Technical Debt in Machine Learning Systems",{"type":23,"value":58},". The paper highlights the potential technical debt that may arise during the development of machine learning systems.",{"type":17,"tag":25,"props":60,"children":61},{},[62,67,69],{"type":17,"tag":34,"props":63,"children":64},{},[65],{"type":23,"value":66},"01",{"type":23,"value":68}," ",{"type":17,"tag":34,"props":70,"children":71},{},[72],{"type":23,"value":73},"Complex Models Erode Boundaries",{"type":17,"tag":25,"props":75,"children":76},{},[77],{"type":23,"value":78},"In conventional software engineering practices, engineers use encapsulation and modular design to define abstraction boundaries and improve the maintainability of code. However, it is difficult to enforce strict module abstraction boundaries for machine learning systems due to the need for massive amounts of external data to implement service logics. 
The paper's authors identify several factors that may increase technical debt in machine learning systems.",{"type":17,"tag":25,"props":80,"children":81},{},[82,87,89,94],{"type":17,"tag":34,"props":83,"children":84},{},[85],{"type":23,"value":86},"Entanglement",{"type":23,"value":88},": Machine learning systems mix signals together and entangle them, making isolation of improvements impossible. For example, assume that a machine learning system uses features X1, X2, ... Xn in a model. If the input distribution of values in X1 is changed, the weights of the remaining ",{"type":17,"tag":52,"props":90,"children":91},{},[92],{"type":23,"value":93},"n",{"type":23,"value":95}," − 1 features (X2, ... Xn) may all change. This makes the system hard to manage. The authors refer to this as the Changing Anything Changes Everything (CACE) principle. A possible mitigation is to isolate models and to focus on detecting changes in prediction behavior as they occur.",{"type":17,"tag":25,"props":97,"children":98},{},[99,104],{"type":17,"tag":34,"props":100,"children":101},{},[102],{"type":23,"value":103},"Correction Cascades",{"type":23,"value":105},": Assume that a model ma is used to solve problem A, but a solution for a slightly different problem A' is required. In this case, a model m'a that takes ma as input and learns a small correction can be used to quickly solve the problem. However, this correction creates a new system dependency on ma, which makes it significantly more expensive to analyze and improve the model in the future. The cost increases when more correction models are cascaded, with a model for problem A'' learned on top of m'a, and so on. This may cause improvement deadlocks. 
A possible mitigation is to extend ma to learn the corrections directly by adding features that distinguish between the cases, or to accept the cost of creating a separate model for problem A'.",{"type":17,"tag":25,"props":107,"children":108},{},[109,114],{"type":17,"tag":34,"props":110,"children":111},{},[112],{"type":23,"value":113},"Undeclared Consumers",{"type":23,"value":115},": Typically, predictions made by a machine learning model can be widely accessed, either at runtime or through files or logs that other systems may later read. If access control is not in place, undeclared consumers may silently use the model's output as the input for another system. In practice, this hidden tight coupling can significantly increase the cost and difficulty of making any modifications. In classical software engineering, these problems are often referred to as visibility debt. To prevent them, strict access controls or service-level agreements (SLAs) are required on the system.",{"type":17,"tag":25,"props":117,"children":118},{},[119,124,125],{"type":17,"tag":34,"props":120,"children":121},{},[122],{"type":23,"value":123},"02",{"type":23,"value":68},{"type":17,"tag":34,"props":126,"children":127},{},[128],{"type":23,"value":129},"Data Dependencies Cost More than Code Dependencies",{"type":17,"tag":25,"props":131,"children":132},{},[133],{"type":23,"value":134},"Dependency debt is considered a key contributor to code complexity and technical debt in classical software engineering, and data dependencies in machine learning systems build up debt in a similar way. In conventional software engineering, code dependencies can often be identified through static analysis using compilers and linkers, but comparable tools for identifying data dependencies are rare. 
This can easily lead to a large number of complex data dependencies in the system, and once they arise, they are difficult to remove.",{"type":17,"tag":25,"props":136,"children":137},{},[138,143],{"type":17,"tag":34,"props":139,"children":140},{},[141],{"type":23,"value":142},"Unstable Data Dependencies",{"type":23,"value":144},": To move quickly, teams often use the outputs of one system as the inputs of another. However, these inputs may be unstable, and a change in the producing system can have far-reaching effects on the consuming system that are costly to diagnose and resolve. A feasible solution proposed by the authors is to version the inputs: the consumer keeps using a frozen, stable version, and when the data is updated, the new version is validated before the consumer switches to it.",{"type":17,"tag":25,"props":146,"children":147},{},[148,153],{"type":17,"tag":34,"props":149,"children":150},{},[151],{"type":23,"value":152},"Underutilized Data Dependencies",{"type":23,"value":154},": Underutilized data dependencies can creep into a model in the following ways:",{"type":17,"tag":25,"props":156,"children":157},{},[158,160,165],{"type":23,"value":159},"1. ",{"type":17,"tag":34,"props":161,"children":162},{},[163],{"type":23,"value":164},"Legacy Features",{"type":23,"value":166},": A feature added early in development is made redundant by newer features over time, but is never removed.",{"type":17,"tag":25,"props":168,"children":169},{},[170,172,177],{"type":23,"value":171},"2. ",{"type":17,"tag":34,"props":173,"children":174},{},[175],{"type":23,"value":176},"Bundled Features",{"type":23,"value":178},": A group of features is evaluated and added together, often under deadline pressure, even though some features in the group add little or no value.",{"type":17,"tag":25,"props":180,"children":181},{},[182,184,189],{"type":23,"value":183},"3. 
",{"type":17,"tag":34,"props":185,"children":186},{},[187],{"type":23,"value":188},"ε-Features",{"type":23,"value":190},": These features bring only a very small accuracy gain but add significant complexity and cost, and may cause side effects when the data changes.",{"type":17,"tag":25,"props":192,"children":193},{},[194,196,201],{"type":23,"value":195},"4. ",{"type":17,"tag":34,"props":197,"children":198},{},[199],{"type":23,"value":200},"Correlated Features",{"type":23,"value":202},": If two features are strongly correlated but one is more directly causal, many machine learning methods credit them equally or even pick the non-causal one. The system then becomes brittle if the correlation later changes.",{"type":17,"tag":25,"props":204,"children":205},{},[206],{"type":23,"value":207},"In this case, the authors suggest running \"leave-one-feature-out\" evaluations periodically to identify and remove unnecessary features.",{"type":17,"tag":25,"props":209,"children":210},{},[211],{"type":23,"value":212},"In an actual machine learning system, only a small portion of code is used for machine learning. The infrastructure surrounding it is vast and complex.",{"type":17,"tag":25,"props":214,"children":215},{},[216,221,223,228],{"type":17,"tag":34,"props":217,"children":218},{},[219],{"type":23,"value":220},"Static Analysis of Data Dependencies",{"type":23,"value":222},": Tools for static analysis of data dependencies are far less common than those for code, but are essential. The authors suggest developing such tools to assist in analyzing data dependencies. One such tool is the automated feature management system described in ",{"type":17,"tag":52,"props":224,"children":225},{},[226],{"type":23,"value":227},"Ad click prediction: a view from the trenches",{"type":23,"value":229},", which enables data sources and features to be annotated. 
Automated checks can then be run to ensure that all annotations are complete and dependencies are functioning properly.",{"type":17,"tag":25,"props":231,"children":232},{},[233,238,239],{"type":17,"tag":34,"props":234,"children":235},{},[236],{"type":23,"value":237},"03",{"type":23,"value":68},{"type":17,"tag":34,"props":240,"children":241},{},[242],{"type":23,"value":243},"Feedback Loops",{"type":17,"tag":25,"props":245,"children":246},{},[247],{"type":23,"value":248},"A key characteristic of a live machine learning system is that its behavior may in turn affect the system itself. This creates a form of analysis debt: it is difficult to predict the behavior of a model before the model is released. Feedback loops take various forms and often emerge only gradually over time, so they are difficult to detect and address.",{"type":17,"tag":25,"props":250,"children":251},{},[252,257],{"type":17,"tag":34,"props":253,"children":254},{},[255],{"type":23,"value":256},"Direct Feedback Loops",{"type":23,"value":258},": A model may directly affect the selection of its future training data. In general, the theoretically correct solution is to use bandit algorithms, but standard supervised learning algorithms are usually used instead, because bandit algorithms may not scale well to the size of the action spaces required for real-world problems. These effects can be mitigated by introducing some randomization or by isolating part of the data from the model's influence.",{"type":17,"tag":25,"props":260,"children":261},{},[262,267],{"type":17,"tag":34,"props":263,"children":264},{},[265],{"type":23,"value":266},"Hidden Feedback Loops",{"type":23,"value":268},": Although direct feedback loops are costly to analyze, they at least pose a statistical problem that developers can investigate directly. 
However, this is much harder for hidden feedback loops, in which seemingly unrelated systems influence each other indirectly.",{"type":17,"tag":25,"props":270,"children":271},{},[272,277,278],{"type":17,"tag":34,"props":273,"children":274},{},[275],{"type":23,"value":276},"04",{"type":23,"value":68},{"type":17,"tag":34,"props":279,"children":280},{},[281],{"type":23,"value":282},"ML-System Anti-Patterns",{"type":17,"tag":25,"props":284,"children":285},{},[286],{"type":23,"value":287},"As mentioned above, in an actual machine learning system, only a small portion of code is used for machine learning. Most code belongs to the infrastructure surrounding it. Several anti-patterns in machine learning systems are as follows:",{"type":17,"tag":25,"props":289,"children":290},{},[291,296],{"type":17,"tag":34,"props":292,"children":293},{},[294],{"type":23,"value":295},"Glue Code",{"type":23,"value":297},": Machine learning researchers tend to build general-purpose libraries, but using generic packages usually requires a large amount of glue code to get data into and out of them. A mature machine learning system may end up with (up to) 5% machine learning code and (at least) 95% glue code, in which case building a clean native solution may cost less than reusing a generic package. One important strategy to combat glue code is to wrap machine learning libraries in common APIs.",{"type":17,"tag":25,"props":299,"children":300},{},[301,306],{"type":17,"tag":34,"props":302,"children":303},{},[304],{"type":23,"value":305},"Pipeline Jungles",{"type":23,"value":307},": Pipeline jungles usually occur in the data preparation phase. As new input signals are added to data preparation, the system may become a jungle of scrapes, joins, samples, and various intermediate outputs. Pipeline jungles can be avoided only by thinking holistically about data collection and feature extraction. Glue code and pipeline jungles are symptoms of integration problems, which may stem from an over-separation of research and engineering responsibilities. 
Encouraging collaboration between engineers and researchers, for example, by placing them on the same team, can effectively reduce such problems.",{"type":17,"tag":25,"props":309,"children":310},{},[311,316],{"type":17,"tag":34,"props":312,"children":313},{},[314],{"type":23,"value":315},"Dead Experimental Code Paths",{"type":23,"value":317},": Developing machine learning code involves a large number of experiments, and it is tempting to run them by adding conditional branches to the main production code. This is attractive in the short term because each individual branch is cheap to add. As time progresses, however, these experimental code paths accumulate and create growing technical debt due to the increasing difficulty of maintaining backward compatibility and the exponential increase in cyclomatic complexity. Therefore, it is necessary to periodically review and delete dead code paths. Usually only a small portion of the branches is used in practice, and the others can be removed after being tested once.",{"type":17,"tag":25,"props":319,"children":320},{},[321,326],{"type":17,"tag":34,"props":322,"children":323},{},[324],{"type":23,"value":325},"Abstraction Debt",{"type":23,"value":327},": The previous problems highlight that machine learning systems lack strong abstractions comparable to the relational database abstraction. For example, what are the most reasonable interfaces for describing data flows, models, and predictions? Especially in distributed learning, there is still a lack of widely accepted abstractions. 
Even the widely used Map-Reduce and Parameter-server cannot meet the requirements of abstraction.",{"type":17,"tag":25,"props":329,"children":330},{},[331,336],{"type":17,"tag":34,"props":332,"children":333},{},[334],{"type":23,"value":335},"Common Smells",{"type":23,"value":337},": In software engineering, design smells can indicate potential problems in components or systems. The following lists several common smells in machine learning systems:",{"type":17,"tag":25,"props":339,"children":340},{},[341,342,347],{"type":23,"value":159},{"type":17,"tag":34,"props":343,"children":344},{},[345],{"type":23,"value":346},"Plain-Old-Data Type Smell",{"type":23,"value":348},": Information used and generated by machine learning systems is usually encoded using raw data types, such as raw floating-point numbers and integers. In a robust system, model parameters should indicate whether they are computational parameters or decision thresholds, and the prediction process should clearly specify what information is used and what information is produced.",{"type":17,"tag":25,"props":350,"children":351},{},[352,353,358],{"type":23,"value":171},{"type":17,"tag":34,"props":354,"children":355},{},[356],{"type":23,"value":357},"Multiple-Language Smell",{"type":23,"value":359},": It is usually convenient to write specific parts of a system in a specific language, especially when that language has convenient libraries or syntax. However, using multiple languages often increases testing and handover costs.",{"type":17,"tag":25,"props":361,"children":362},{},[363,364,369],{"type":23,"value":183},{"type":17,"tag":34,"props":365,"children":366},{},[367],{"type":23,"value":368},"Prototype Smell",{"type":23,"value":370},": It is convenient to test new ideas on a small scale in prototyping environments. However, regularly relying on a prototype environment indicates that the entire system is brittle and difficult to update. 
Maintaining a prototype environment incurs costs, and there is a significant risk that under time pressure, the prototype environment may be directly used as the production environment. Moreover, research results within a small scope rarely reflect the actual situation.",{"type":17,"tag":25,"props":372,"children":373},{},[374,379,380],{"type":17,"tag":34,"props":375,"children":376},{},[377],{"type":23,"value":378},"05",{"type":23,"value":68},{"type":17,"tag":34,"props":381,"children":382},{},[383],{"type":23,"value":384},"Configuration Debt",{"type":17,"tag":25,"props":386,"children":387},{},[388],{"type":23,"value":389},"Machine learning systems, due to their complexities, often require a wide range of system configurations, including the use of features, the selection of data, algorithm-specific parameter settings, preprocessing, postprocessing, and validation methods. Therefore, the maintainability and readability of the configurations should be strengthened. In this case, the following principles should be followed:",{"type":17,"tag":25,"props":391,"children":392},{},[393],{"type":23,"value":394},"1. The configuration option to be changed can be easily found.",{"type":17,"tag":25,"props":396,"children":397},{},[398],{"type":23,"value":399},"2. Manual errors, omissions, or oversights should be avoided.",{"type":17,"tag":25,"props":401,"children":402},{},[403],{"type":23,"value":404},"3. Differences between two configuration versions can be easily viewed.",{"type":17,"tag":25,"props":406,"children":407},{},[408],{"type":23,"value":409},"4. A basic automatic verification mechanism is provided.",{"type":17,"tag":25,"props":411,"children":412},{},[413],{"type":23,"value":414},"5. Invalid or redundant configurations can be detected.",{"type":17,"tag":25,"props":416,"children":417},{},[418],{"type":23,"value":419},"6. 
Configurations should undergo code review and be checked into a code repository.",{"type":17,"tag":25,"props":421,"children":422},{},[423,428,429],{"type":17,"tag":34,"props":424,"children":425},{},[426],{"type":23,"value":427},"06",{"type":23,"value":68},{"type":17,"tag":34,"props":430,"children":431},{},[432],{"type":23,"value":433},"Dealing with Changes in the External World",{"type":17,"tag":25,"props":435,"children":436},{},[437],{"type":23,"value":438},"Machine learning systems often interact with external environments. However, the constantly changing external environments pose significant challenges for maintaining machine learning systems.",{"type":17,"tag":25,"props":440,"children":441},{},[442,447],{"type":17,"tag":34,"props":443,"children":444},{},[445],{"type":23,"value":446},"Fixed Thresholds in Dynamic Systems",{"type":23,"value":448},": In real-world scenarios, a decision threshold often needs to be selected for a given model. However, such thresholds are usually set manually, and if the model is updated with new data, the original thresholds may no longer be applicable. Manually updating thresholds across many models is time-consuming and error-prone. One mitigation strategy is to learn the thresholds through a simple evaluation on held-out validation data.",{"type":17,"tag":25,"props":450,"children":451},{},[452,457],{"type":17,"tag":34,"props":453,"children":454},{},[455],{"type":23,"value":456},"Monitoring and Testing",{"type":23,"value":458},": Unit testing of individual components and end-to-end testing of running systems are effective ways to verify systems, but in a changing environment these tests alone are not sufficient to ensure system reliability. To ensure the long-term reliability of systems, real-time monitoring and warning mechanisms are also needed. 
Generally, useful things to monitor include prediction bias, action limits, and upstream producers.",{"type":17,"tag":25,"props":460,"children":461},{},[462,467,468],{"type":17,"tag":34,"props":463,"children":464},{},[465],{"type":23,"value":466},"07",{"type":23,"value":68},{"type":17,"tag":34,"props":469,"children":470},{},[471],{"type":23,"value":472},"Other Areas of ML-related Debt",{"type":17,"tag":25,"props":474,"children":475},{},[476,481],{"type":17,"tag":34,"props":477,"children":478},{},[479],{"type":23,"value":480},"Data Testing Debt",{"type":23,"value":482},": Because data replaces code in a machine learning system, and code should be tested, it is crucial to test the input data as well.",{"type":17,"tag":25,"props":484,"children":485},{},[486,491],{"type":17,"tag":34,"props":487,"children":488},{},[489],{"type":23,"value":490},"Reproducibility Debt",{"type":23,"value":492},": Reproducibility of experiments is important, although machine learning systems may make reproducibility difficult due to stochastic algorithms, non-deterministic parallel learning, initial condition dependency, and interaction with external environments.",{"type":17,"tag":25,"props":494,"children":495},{},[496,501],{"type":17,"tag":34,"props":497,"children":498},{},[499],{"type":23,"value":500},"Process Management Debt",{"type":23,"value":502},": When dozens or hundreds of models are running simultaneously, good process management is needed to update configurations, allocate resources, and locate blockages in data flows. An important principle is to avoid excessive manual steps.",{"type":17,"tag":25,"props":504,"children":505},{},[506,511],{"type":17,"tag":34,"props":507,"children":508},{},[509],{"type":23,"value":510},"Cultural Debt",{"type":23,"value":512},": A long-standing hard line between research and engineering responsibilities in machine learning is also not conducive to building good machine learning systems. 
Team culture is also important, and it is suggested that algorithm research, technical innovation, and AI engineering be placed on an equal footing. That is, teams should reward deleting features, reducing complexity, and improving reproducibility, stability, and monitoring.",{"type":17,"tag":25,"props":514,"children":515},{},[516,521,522],{"type":17,"tag":34,"props":517,"children":518},{},[519],{"type":23,"value":520},"08",{"type":23,"value":68},{"type":17,"tag":34,"props":523,"children":524},{},[525],{"type":23,"value":526},"Summary",{"type":17,"tag":25,"props":528,"children":529},{},[530],{"type":23,"value":531},"The paper does not propose new algorithms or technologies. Instead, the authors focus on engineering issues in machine learning systems. Based on their years of engineering experience, they summarize a series of technical debt types in machine learning systems, such as data dependencies, feedback loops, and anti-patterns, and propose some mitigation methods.",{"type":17,"tag":25,"props":533,"children":534},{},[535],{"type":17,"tag":34,"props":536,"children":537},{},[538],{"type":23,"value":539},"References",{"type":17,"tag":25,"props":541,"children":542},{},[543],{"type":23,"value":544},"[1] Ward Cunningham. The WyCash Portfolio Management System. 1992-03-26 [2008-09-26]",{"type":17,"tag":25,"props":546,"children":547},{},[548],{"type":23,"value":549},"[2] D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, Dan Dennison. Hidden Technical Debt in Machine Learning Systems. NIPS 2015.",{"type":17,"tag":25,"props":551,"children":552},{},[553],{"type":23,"value":554},"[3] H. B. McMahan, G. Holt, D. Sculley, M. Young, D. Ebner, J. Grady, L. Nie, T. Phillips, E. Davydov, D. Golovin, S. Chikkerur, D. Liu, M. Wattenberg, A. M. Hrafnkelsson, T. Boulos, and J. Kubica. Ad click prediction: a view from the trenches. 
In The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, Chicago, IL, USA, August 11-14, 2013.",{"title":7,"searchDepth":556,"depth":556,"links":557},4,[],"markdown","content:technology-blogs:en:2926.md","content","technology-blogs/en/2926.md","technology-blogs/en/2926","md",1776506108075]