15.06.2017       Issue 182 (12.06.2017 - 18.06.2017)       Interesting projects, tools, libraries

MLBox - an automated machine learning system



MLBox, Machine Learning Box

MLBox is a powerful Automated Machine Learning Python library. It provides the following features:

  • Fast reading and distributed data preprocessing/cleaning/formatting
  • Highly robust feature selection and leak detection
  • Accurate hyper-parameter optimization in high-dimensional space
  • State-of-the-art predictive models for classification and regression (Deep Learning, Stacking, LightGBM,...)
  • Prediction with model interpretation

For installation instructions, please refer to https://github.com/AxeldeRomblay/MLBox/blob/master/python-package/README.md

For more details, please refer to the docs.

Experiments: https://www.kaggle.com/c/two-sigma-connect-rental-listing-inquiries/leaderboard | Rank: 85/2488

Getting started: 30 seconds to MLBox

The main MLBox package is divided into 3 sub-packages: preprocessing, optimisation and prediction. They are aimed, respectively, at reading and preprocessing data, testing and optimising a wide range of learners, and predicting the target on a test dataset.

A few lines are enough to import MLBox:

from mlbox.preprocessing import *
from mlbox.optimisation import *
from mlbox.prediction import *

Then, all you need to provide is:

  • the list of paths to your train and test datasets
  • the name of the target you are trying to predict (classification or regression)

paths = ["<file_1>.csv", "<file_2>.csv", ..., "<file_n>.csv"]  # to modify
target_name = "<my_target>"  # to modify

Now let MLBox do the job!

... to read and preprocess your files:

data = Reader(sep=",").train_test_split(paths, target_name)  #reading
data = Drift_thresholder().fit_transform(data)  #deleting non-stable variables

... to evaluate models (here with the default configuration):
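A minimal sketch of this step, assuming the `Optimiser` class from `mlbox.optimisation` with its `scoring` and `n_folds` arguments and its `evaluate` method. The helper name `evaluate_default` and the argument values are illustrative only, and "accuracy" only makes sense for a classification target. The import sits inside the function so that mlbox is only needed when the function is actually called:

```python
def evaluate_default(data, scoring="accuracy", n_folds=5):
    """Cross-validate the default MLBox pipeline on preprocessed data.

    `data` is expected to be the object produced by the reading and
    drift-thresholding step. The helper name and defaults are illustrative.
    """
    # Lazy import: mlbox is only required when the function runs.
    from mlbox.optimisation import Optimiser

    opt = Optimiser(scoring=scoring, n_folds=n_folds)
    # Passing None instead of a parameter dict evaluates the default configuration.
    return opt.evaluate(None, data)
```

With the variables defined earlier, this would be called as `evaluate_default(data)`.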

... or to test and optimize the whole Pipeline [OPTIONAL], which is made of the following steps:

  • missing data encoder, aka 'ne'
  • categorical variables encoder, aka 'ce'
  • feature selector, aka 'fs'
  • meta-features stacker, aka 'stck'
  • final estimator, aka 'est'

NB: please have a look at all the options available for configuring the Pipeline (steps, parameters and values...)

opt = Optimiser()  # default settings; pass scoring and n_folds to change the evaluation

space = {
        'ne__numerical_strategy' : {"search":"choice", "space":[0, 'mean']},
        'ce__strategy' : {"search":"choice", "space":["label_encoding", "random_projection"]},
        'fs__strategy' : {"search":"choice", "space":["variance", "l1"]},
        'fs__threshold': {"search":"choice", "space":[0.1,0.2,0.3]},
        'est__strategy' : {"search":"choice", "space":["XGBoost"]},
        'est__max_depth' : {"search":"choice", "space":[5,6]},
        'est__subsample' : {"search":"uniform", "space":[0.6,0.9]}
        }

best = opt.optimise(space, data, max_evals = 5)  # best parameters found in 5 evaluations
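To make the search-space format easier to read, here is a small standalone sketch that draws one configuration from a dictionary in the format above and groups the result by pipeline step. It uses plain random sampling for illustration only and is not the search strategy MLBox applies; the helper names `sample_configuration` and `group_by_step` are hypothetical and not part of MLBox.

```python
import random


def sample_configuration(space, rng=random):
    """Draw one value per entry of a {"search": ..., "space": ...} dictionary."""
    config = {}
    for name, spec in space.items():
        if spec["search"] == "choice":
            # "choice": pick one of the listed values.
            config[name] = rng.choice(spec["space"])
        elif spec["search"] == "uniform":
            # "uniform": pick a float between the two bounds.
            low, high = spec["space"]
            config[name] = rng.uniform(low, high)
        else:
            raise ValueError("unsupported search type: %r" % spec["search"])
    return config


def group_by_step(config):
    """Split 'step__parameter' keys into {step: {parameter: value}}."""
    grouped = {}
    for name, value in config.items():
        step, _, parameter = name.partition("__")
        grouped.setdefault(step, {})[parameter] = value
    return grouped


example_space = {
    "ne__numerical_strategy": {"search": "choice", "space": [0, "mean"]},
    "est__max_depth": {"search": "choice", "space": [5, 6]},
    "est__subsample": {"search": "uniform", "space": [0.6, 0.9]},
}

config = sample_configuration(example_space, random.Random(0))
grouped = group_by_step(config)
print(grouped)
```

Each key is read as `<step>__<parameter>`, so `est__max_depth` sets `max_depth` on the final estimator and `ne__numerical_strategy` sets `numerical_strategy` on the missing data encoder.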

... and finally, to predict on the test set with the best parameters (or None for the default configuration):

That's all! Have a look at the "save" folder, where you will find:
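A minimal sketch of this step, assuming the `Predictor` class from `mlbox.prediction` and its `fit_predict` method. The helper name `predict_on_test` is illustrative only, and the import is done inside the function so that mlbox is only needed when it is called:

```python
def predict_on_test(params, data):
    """Fit the pipeline on the train set and predict on the test set.

    `params` is a parameter dictionary such as the one returned by the
    optimisation step, or None to use the default configuration.
    """
    # Lazy import: mlbox is only required when the function runs.
    from mlbox.prediction import Predictor

    predictor = Predictor()
    predictor.fit_predict(params, data)
    return predictor
```

With the variables defined earlier, this would be called as `predict_on_test(best, data)`, or as `predict_on_test(None, data)` to skip the optimisation step.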

  • your predictions
  • feature importances
  • drift coefficients of your variables (0.5 = very stable, 1. = not stable at all)
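
One way to read this scale, offered as an assumption rather than a description of how MLBox computes it: a score between 0.5 and 1 can be obtained as the AUC of telling test rows from train rows using a single variable. The standalone sketch below computes that AUC directly from pairwise comparisons; the function name `drift_score` is hypothetical.

```python
def drift_score(train_values, test_values):
    """AUC of separating test rows from train rows with one numeric variable.

    Returns a value between 0.5 (the two samples cannot be told apart)
    and 1.0 (the two samples are perfectly separable).
    """
    wins = 0.0
    for test_value in test_values:
        for train_value in train_values:
            if test_value > train_value:
                wins += 1.0
            elif test_value == train_value:
                wins += 0.5  # ties count as half a win
    auc = wins / (len(train_values) * len(test_values))
    # The direction of the shift does not matter, only how separable it is.
    return max(auc, 1.0 - auc)


stable = drift_score([1, 2, 3, 4], [1, 2, 3, 4])
shifted = drift_score([1, 2, 3, 4], [10, 11, 12, 13])
print(stable, shifted)
```

Here `stable` comes out at 0.5 because the two samples are identical, and `shifted` at 1.0 because every test value is larger than every train value.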

How to Contribute

MLBox has been developed and used by many active community members. Your help is very valuable to make it better for everyone.

  • Check out the call for contributions to see what can be improved, or open an issue if you want something.
  • Contribute to the tests to make it more reliable.
  • Contribute to the documentation to make it clearer for everyone.
  • Contribute to the examples to share your experience with other users.
  • Open an issue if you run into problems during development.

For more details, please refer to CONTRIBUTING.
