Regression job and pipelines

Study notes

Before training a model, the data must be preprocessed (prepared) to make it easier for the algorithm to fit.
The two most common preparation techniques are:
  • Scaling numeric features - bring them all into the same range, e.g.:

    A     B     C
    3     480   65

    will become:

    A     B     C
    0.3   0.48  0.65

  • Encoding categorical variables, e.g.:

    Size: S, M, L

    will become:

    Size: 0, 1, 2

    or better (one-hot encoding):

    Size_S   Size_M   Size_L
    1        0        0
    0        1        0
    0        0        1

    Both techniques are sketched in code right after this list.
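
A minimal sketch of both techniques on the toy values above. The per-column ranges given to MinMaxScaler (0-10, 0-1000, 0-100) are assumptions chosen only so the output reproduces 0.3 / 0.48 / 0.65, and the sparse_output argument needs scikit-learn 1.2 or newer:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Scale the numeric example into the 0-1 range
scaler = MinMaxScaler()
scaler.fit(np.array([[0, 0, 0], [10, 1000, 100]]))        # assumed min/max per column
print(scaler.transform(np.array([[3, 480, 65]])))         # [[0.3  0.48 0.65]]

# One-hot encode the Size column; output columns are in alphabetical order (L, M, S)
encoder = OneHotEncoder(sparse_output=False)
print(encoder.fit_transform(np.array([['S'], ['M'], ['L']])))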

The preprocessing steps and the algorithm are then packaged together into a pipeline.

# Will be used:
# sklearn.compose: ColumnTransformer
# sklearn.pipeline: Pipeline
# sklearn.impute: SimpleImputer
# sklearn.preprocessing: StandardScaler, OneHotEncoder
# sklearn.linear_model: LinearRegression
# sklearn.ensemble: GradientBoostingRegressor (the regressor actually used below)

# Train the model
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
import numpy as np

# preprocessing for numeric columns (scale them)
numeric_features = [1, 2]
numeric_transformer = Pipeline(
    steps=[
        ('scaler', StandardScaler())
    ]
)

# preprocessing for categorical columns (one-hot encode them)
categorical_features = [3, 4]
categorical_transformer = Pipeline(
    steps=[
        ('onehot', OneHotEncoder(handle_unknown='ignore'))
    ]
)

# combine the preprocessing steps above
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features),
    ]
)

# put both the preprocessing steps and the algorithm into the same pipeline
pipeline = Pipeline(
    steps=[
        ('preprocessor', preprocessor),
        ('regressor', GradientBoostingRegressor())
    ]
)

# Train model
model = pipeline.fit(X_train, y_train)

print(model)

Result:

Pipeline(steps=[('preprocessor',
                 ColumnTransformer(transformers=[('num',
                                                  Pipeline(steps=[('scaler',
                                                                   StandardScaler())]),
                                                  [1, 2]),
                                                 ('cat',
                                                  Pipeline(steps=[('onehot',
                                                                   OneHotEncoder(handle_unknown='ignore'))]),
                                                  [3, 4])])),
                ('regressor', GradientBoostingRegressor())])
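
Once fitted, the pipeline applies the same preprocessing to any data it scores. A minimal sketch, assuming a held-out X_test / y_test split exists alongside X_train / y_train:

from sklearn.metrics import mean_squared_error, r2_score

# Predictions run the test data through the same scaling/encoding first
predictions = model.predict(X_test)
print('MSE:', mean_squared_error(y_test, predictions))
print('R2: ', r2_score(y_test, predictions))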

Now it is easy to swap in another algorithm:

from sklearn.ensemble import RandomForestRegressor

pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('regressor', RandomForestRegressor())])
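
The swapped pipeline is a fresh, unfitted estimator, so it has to be trained again. A short sketch reusing the same assumed training and test split:

# Refit: the preprocessing and the new algorithm are trained together
model = pipeline.fit(X_train, y_train)
print(model.score(X_test, y_test))   # R^2 of the random forest on the held-out data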


# Save job:
import joblib
joblib.dump(model, 'my_pipelined_job.pkl')

# Load the job back into a model
loaded_model = joblib.load('my_pipelined_job.pkl')
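
The loaded pipeline behaves exactly like the original. A minimal sketch, where X_new stands for a hypothetical batch of new rows with the same columns as the training data:

# New data is preprocessed by the saved steps before prediction
predictions = loaded_model.predict(X_new)
print(predictions)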



References:
Create machine learning models - Training | Microsoft Learn
1. Supervised learning — scikit-learn 1.2.1 documentation