6 machine learning misunderstandings

by IDG Reporter - December 22nd, 2016

Like any technology, machine learning can wreak havoc on a network if it is implemented improperly. Before embracing it, enterprises should understand the ways machine learning can fall flat, so that they avoid setting back their operations and alienating C-suite executives.

Forgetting unexpected variable behaviour

It’s amazing what a computer will consider important that a human will immediately dismiss as trivial. This is why it’s imperative to consider as many relevant variables and potential outcomes as possible prior to deploying a machine learning algorithm.

Take, for example, a model trained to separate images of vehicles into two categories: trucks and cars. If all the images of trucks were taken at night and all the car photos were taken during the day, the model could learn that any image of a vehicle taken at night must be a truck, because lighting separates the training images just as well as the vehicle itself does.

Accounting for key variables and outcomes up front reduces the chance that the solution behaves in unwanted or unexpected ways.
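The truck-and-car trap can be reproduced in a few lines. The sketch below is illustrative only: the rows, the `brightness` and `length_m` features and the one-split "decision stump" learner are all assumptions made for the example, not anything taken from a real project.

```python
# Illustrative sketch: a one-split "decision stump" trained on confounded data.
# All values are made up. Each row is (brightness, length_m, label).
# Every truck was photographed at night, every car during the day.
TRAIN = [
    (0.10, 7.5, "truck"), (0.15, 6.8, "truck"), (0.20, 5.2, "truck"),
    (0.80, 4.2, "car"), (0.85, 4.8, "car"), (0.90, 5.4, "car"),
]


def fit_stump(rows):
    """Return (accuracy, feature_index, threshold, label_below, label_above)
    for the single split with the best training accuracy."""
    best = None
    for f in (0, 1):  # 0 = brightness, 1 = length_m
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between neighbours
            for below, above in (("truck", "car"), ("car", "truck")):
                correct = sum((below if r[f] < t else above) == r[2] for r in rows)
                acc = correct / len(rows)
                if best is None or acc > best[0]:
                    best = (acc, f, t, below, above)
    return best


def predict(stump, features):
    """Label a new example using the fitted split."""
    _, f, t, below, above = stump
    return below if features[f] < t else above


stump = fit_stump(TRAIN)
print("training accuracy:", stump[0])
print("feature used:", ["brightness", "length_m"][stump[1]])
# A 7.2 m truck photographed in daylight is judged by brightness alone.
print("daylight truck predicted as:", predict(stump, (0.85, 7.2)))
```

On this toy data the learner scores perfectly in training by splitting on brightness, then labels a long truck photographed in daylight as a car. Length is the more meaningful signal, but it separates the training rows slightly less cleanly, so it is ignored.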

Neglecting your data homework

To build a trained statistical model, one has to understand where the data being analysed came from and how it was collected. This information is critical to determining the variables and potential outcomes that influence the algorithm’s performance.

Additionally, if a model is misclassifying data, the cause may be that it was not trained on data representative of what it is expected to handle.
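One simple way to check whether training data is representative is to compare how often each category appears in the training set and in the data the model actually receives. The sketch below assumes a hypothetical "lighting" attribute recorded for each image; the function names and figures are invented for illustration.

```python
from collections import Counter


def category_shares(values):
    """Return each category's share of the total, e.g. {'day': 0.5, 'night': 0.5}."""
    if not values:
        raise ValueError("need at least one value")
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def share_gaps(train_values, live_values):
    """Per category: share in live data minus share in training data."""
    train = category_shares(train_values)
    live = category_shares(live_values)
    categories = sorted(set(train) | set(live))
    return {c: live.get(c, 0.0) - train.get(c, 0.0) for c in categories}


# Hypothetical metadata: the lighting condition recorded for each image.
train_lighting = ["night"] * 50 + ["day"] * 50
live_lighting = ["night"] * 10 + ["day"] * 90

for category, gap in share_gaps(train_lighting, live_lighting).items():
    print(f"{category}: {gap:+.2f}")
```

A large gap in either direction suggests the model was trained on a mix of conditions that differs from the one it now faces. How large a gap matters is a judgment call that depends on the project.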

Develop, test and then unleash the model

Producing a useful model comes down to the structure and quality of the training data. Before releasing machine learning into the enterprise, data scientists test a model against data sets to verify its performance. The data has to be diligently visualised, and the whole data pipeline monitored as new data is added for self-training. A common pitfall is rushing this step: data scientists may test a model as quickly as possible, using too few test data sets or ones that don’t represent the information the algorithm will encounter in the real world.

It’s critical to have enough data for the selected variables to be weighted properly; only then is the model genuinely tested. Feeding in more data during this phase can improve performance substantially and makes it more likely that, once in a production environment, the machine learning project truly enhances operations.
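One common way to keep a test set representative is to hold out a fixed fraction of every class, often called a stratified split. The article does not prescribe this technique; the sketch below is one possible approach, and the function name, the 20 per cent default and the sample rows are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_of, test_fraction=0.2, seed=0):
    """Split rows into (train, test), holding out roughly the same
    fraction of every class so the test set keeps the class mix."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    train, test = [], []
    for group in by_label.values():
        group = group[:]  # copy, so the caller's data is not reordered
        rng.shuffle(group)
        # Hold out at least one example per class; a class with a single
        # example therefore ends up in the test set only.
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical data set: 80 car images and 20 truck images, as (label, id).
rows = [("car", i) for i in range(80)] + [("truck", i) for i in range(20)]
train, test = stratified_split(rows, label_of=lambda row: row[0])
print(len(train), "training rows,", len(test), "test rows")
print("trucks in test set:", sum(1 for label, _ in test if label == "truck"))
```

A split like this only preserves the class mix of the data already collected. If that data was gathered under unrepresentative conditions, the test set inherits the same blind spots.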

Ignoring potential blunders

A project’s final goal may create new obstacles that can lead to potential blunders.

Not every machine learning project will be public or give users open access to manipulate data, but awareness of the environment in which the algorithm operates helps prevent such blunders.

Choose more data

When performance testing does not yield the expected results, there are two options: design a better learning algorithm or collect more data. Adding more data helps engineers understand the model’s performance limitations. If data is easy to collect, keep feeding it to the algorithm to see whether you reach the desired outcome without a redesign.
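A learning curve is one way to judge whether more data is likely to help: train on progressively larger slices of the data and watch accuracy on a fixed held-out set. The sketch below uses synthetic one-dimensional data and a deliberately simple nearest-centroid model; both are stand-ins chosen for illustration, not a recommendation.

```python
import random
from statistics import mean


def make_data(n, rng):
    """Synthetic 1-D data: class 0 centred at 0.0, class 1 at 1.5 (made-up values)."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        rows.append((rng.gauss(1.5 * label, 1.0), label))
    return rows


def fit_centroids(rows):
    """Nearest-centroid 'model': the mean feature value of each class seen."""
    labels = {y for _, y in rows}
    return {c: mean(x for x, y in rows if y == c) for c in labels}


def accuracy(centroids, rows):
    """Share of rows whose nearest centroid matches their label."""
    hits = sum(
        min(centroids, key=lambda c: abs(x - centroids[c])) == y for x, y in rows
    )
    return hits / len(rows)


def learning_curve(train, test, sizes):
    """Held-out accuracy after training on the first n rows, for each n."""
    return [(n, accuracy(fit_centroids(train[:n]), test)) for n in sizes]


rng = random.Random(0)
train, test = make_data(400, rng), make_data(200, rng)
for n, acc in learning_curve(train, test, [10, 50, 100, 400]):
    print(f"{n:4d} training rows -> held-out accuracy {acc:.2f}")
```

If accuracy is still climbing at the largest size, collecting more data is a reasonable next step. If it has flattened well below the target, more of the same data is unlikely to close the gap, and a redesign may be the better option.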

Don’t rule out an ensemble

One approach that has recently been successful in practical applications is ensemble learning, a process in which multiple models are combined to solve a computational intelligence problem. One example is stacking simple classifiers such as logistic regressions. Ensemble methods like this can achieve better predictive performance than any of the constituent classifiers on its own.
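A back-of-the-envelope calculation shows why combining models can beat any single one. The sketch below uses majority voting, a simpler ensemble than the stacking mentioned above, and it rests on a strong assumption: every classifier is right with the same probability and their errors are independent. Real classifiers rarely meet that assumption, so treat the figures as an upper bound on what voting could deliver.

```python
from math import comb


def majority_vote_accuracy(p, n):
    """Probability that more than half of n classifiers are right, assuming
    each is right with probability p and their errors are independent."""
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1)
    )


# Hypothetical classifiers that are each right 70 per cent of the time.
for n in (1, 3, 5):
    print(n, "voters:", round(majority_vote_accuracy(0.7, n), 3))
```

Under these assumptions, three voters reach about 78 per cent and five about 84 per cent, against 70 per cent for one. The gain shrinks as the classifiers' errors become more correlated, which is why ensembles tend to work best when their members differ from one another.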




Copyright © 2017 Computer News Middle East. All rights reserved. Product of CPI Media Group. For more information e-mail us at webmaster@cpimediagroup.com.