
How Are Code Smells Used In Machine Learning?

The code smells catalogue helps data scientists and programmers create and maintain high-quality machine learning code. Machine learning has grown in prominence in recent years: its techniques have been intensively researched in academia and are used in industry to create economic value. However, guidelines for code quality in machine learning applications are lacking, and code smells in particular have received little attention in this field. Although machine learning code is typically a small component of a larger system, it plays a vital role in that system's core functionality. Keeping this code in good shape is therefore important for avoiding problems in the long run.

A group of researchers from Delft University of Technology and AI for Fintech Research in the Netherlands, led by Haiyin Zhang, Luís Cruz, and Arie van Deursen, identified and proposed a list of 22 machine learning-specific code smells drawn from papers, grey literature, GitHub commits, and Stack Overflow posts. They described each smell with its context, the potential long-term problems it causes, and proposed solutions. They also linked each smell to its pipeline stage and to the supporting evidence from both academic and grey literature.

Code Smells Collection And Description

The researchers collected machine learning-specific code smells from academic literature, grey literature, community-based coding Q&A platforms (such as Stack Overflow), and public software repositories (such as GitHub). Concretely, they mined papers and grey literature, reused existing bug datasets, and mined Stack Overflow posts. They then triangulated the collected smells against the recommendations in the official documentation of the machine learning libraries. Finally, the code smell catalogue was validated.

In total, 22 machine learning-specific code smells were collected and described. For each smell, the researchers gave a general description, followed by the context in which the smell occurs, the problem it causes, and the solution. Finally, they summarised all of the smells, including the references that support each one, the stage of the machine learning pipeline where it is most relevant, and its main effect.
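The structure of a catalogue entry described above can be sketched as a small data type. This is only an illustration, not the researchers' own format: the class name `CodeSmell`, the helper `group_by_stage`, and every field value other than the smell's name are assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeSmell:
    """One catalogue entry, mirroring the fields listed in the article."""
    name: str
    description: str
    context: str
    problem: str
    solution: str
    pipeline_stage: str    # stage of the ML pipeline where the smell matters most
    effect: str            # main effect of having the smell
    references: tuple = () # supporting sources (left empty in this sketch)


def group_by_stage(smells):
    """Index smell names by the pipeline stage they are attached to."""
    by_stage = defaultdict(list)
    for smell in smells:
        by_stage[smell.pipeline_stage].append(smell.name)
    return dict(by_stage)


# Hypothetical entry: only the name is taken from the catalogue. The other
# field values are placeholder wording written for this example.
example = CodeSmell(
    name="Gradients Not Cleared Before Backward Propagation",
    description="Stored gradients are not reset between training steps.",
    context="Frameworks that accumulate gradients across backward passes.",
    problem="Each update mixes in gradients from earlier steps.",
    solution="Reset the gradients before every backward pass.",
    pipeline_stage="model training",
    effect="error-prone",
)

catalogue = [example]
print(group_by_stage(catalogue))
```

Grouping by stage is one way to use such a summary, for example to review only the smells that apply to the part of the pipeline being changed.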

Implications

The catalogue draws on recurring code issues from many sources to help explain prevalent errors in machine learning application development. Many data scientists lack software engineering experience and are not up to date on software engineering best practices, so the catalogue helps bridge this gap by offering guidance for designing machine learning applications. New versions of machine learning libraries are released regularly. The researchers reused the "TensorFlow Bugs" replication package and found that several instances had become outdated because of TensorFlow's upgrade to version 2. They therefore expect new API-specific code smells to emerge with new library versions and features. Indeed, the findings suggest that the majority of API-related smells are recorded only in grey literature rather than in academic literature. Compiling a catalogue of code smells thus helps promote collaboration between practitioners and academics. Because the ecosystem of artificial intelligence frameworks changes so quickly, some smells may become obsolete over time.

They predicted that three code smells in their catalogue would prove temporary: Dataframe Conversion API Misused, Matrix Multiplication API Misused, and Gradients Not Cleared Before Backward Propagation. Temporary smells may become obsolete after a few years, whereas other smells are expected to persist for a long time. Even so, these three smells are significant and should be recognised so that practitioners can prevent problems down the road.
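To make the third of these smells concrete, here is a minimal pure-Python sketch of what can go wrong when gradients are not cleared. It is a toy imitation, not code from the catalogue or from any framework: `ToyParameter` and `train` are hypothetical names, and the accumulating `backward` method only mimics how a framework such as PyTorch adds new gradients to the stored ones, which is why training loops there typically call `optimizer.zero_grad()` before `loss.backward()`.

```python
class ToyParameter:
    """A single scalar weight whose gradient accumulates across backward calls."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0

    def backward(self, x, target):
        # Loss is (value * x - target) ** 2. Its derivative with respect to
        # value is ADDED to the stored gradient instead of replacing it.
        self.grad += 2 * (self.value * x - target) * x

    def zero_grad(self):
        # Reset the stored gradient before the next backward pass.
        self.grad = 0.0


def train(data, lr, clear_gradients):
    """Run plain gradient descent over data, optionally clearing gradients."""
    w = ToyParameter(0.0)
    for x, target in data:
        if clear_gradients:
            w.zero_grad()
        w.backward(x, target)
        w.value -= lr * w.grad
    return w.value


# Fifty identical samples for which the ideal weight is 2.0.
data = [(1.0, 2.0)] * 50

smelly = train(data, lr=0.1, clear_gradients=False)  # gradients pile up
clean = train(data, lr=0.1, clear_gradients=True)    # gradients reset each step

print("without clearing:", smelly)
print("with clearing:   ", clean)
```

With gradients cleared, the weight settles close to 2.0. Without clearing, each update also carries the gradients of every earlier step, so in this toy setup the weight keeps swinging around the target instead of converging.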

Conclusion

The researchers gathered the code smells from a variety of sources: they mined 1750 papers and 2170 grey literature entries, reused existing bug datasets containing 88 Stack Overflow posts and 87 GitHub commits, and gathered 403 complementary Stack Overflow posts. They examined the pitfalls described in blog posts and assessed whether each one should be considered a code smell. In total they collected 22 code smells, including both generic and API-specific ones, and classified each smell according to its pipeline stage and its effect. The catalogue is intended to help the machine learning community improve code quality. It would be interesting to find out how common these code smells are in real-world machine learning systems and how useful a catalogue of machine learning-specific code smells is in practice.
