Pipeline package

Submodules

Base module

class src.pipe.base.DateProcessing

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Date processing module, used to create and manipulate date-based features within the sklearn Pipeline base structure.

fit(X, y=None)
fit_transform(X, y=None)

Apply the date transformation to the date columns of the provided Dataframe.

Parameters

X (pd.DataFrame) – Dataframe with data to be processed.

Returns

Dataframe without the date columns and with the new numerical features.

Return type

pd.DataFrame

transform(X)

Apply the date transformation to the date columns of the provided Dataframe.

Parameters

X (pd.DataFrame) – Dataframe with data to be processed.

Returns

Dataframe without the date columns and with the new numerical features.

Return type

pd.DataFrame
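
The reference above does not show how the date features are derived, so the following is only a minimal sketch of the general pattern, not the package's code. The class name DateFeaturesSketch, the date_columns parameter, the created_at column, and the choice of year, month, and day of week as features are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DateFeaturesSketch(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for DateProcessing (illustration only)."""

    def __init__(self, date_columns=("created_at",)):
        # Assumed parameter: names of the columns that hold dates.
        self.date_columns = date_columns

    def fit(self, X, y=None):
        # Nothing is learned from the data, so fit only returns self.
        return self

    def transform(self, X):
        out = X.copy()
        for col in self.date_columns:
            dates = pd.to_datetime(out[col])
            # Assumed feature set: year, month, and day of week (Monday = 0).
            out[f"{col}_year"] = dates.dt.year
            out[f"{col}_month"] = dates.dt.month
            out[f"{col}_dayofweek"] = dates.dt.dayofweek
        # Drop the original date columns so only numerical features remain.
        return out.drop(columns=list(self.date_columns))


df = pd.DataFrame({"created_at": ["2021-03-01", "2021-12-25"], "price": [10.0, 20.0]})
result = DateFeaturesSketch().fit_transform(df)
print(result)
```

Because fit learns nothing in this sketch, fit_transform (inherited from TransformerMixin) and transform return the same result for the same input.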

class src.pipe.base.OverallProcessing

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

This class condenses light feature-processing steps, such as removing NaN values and dropping undesired columns, using the sklearn Pipeline base structure.

fit(X, y=None)

No processing is performed here.

fit_transform(X, y=None)

Transform the provided Dataframe.

Parameters

X (pd.DataFrame) – Dataframe with data to be processed.

Returns

Dataframe with processed data.

Return type

pd.DataFrame

transform(X)

Transform the provided Dataframe.

Parameters

X (pd.DataFrame) – Dataframe with data to be processed.

Returns

Dataframe with processed data.

Return type

pd.DataFrame
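
As a rough illustration of this kind of cleanup transformer, the sketch below drops a configurable set of columns and then removes rows that contain NaN values. It is not the package's implementation: the class name CleanupSketch, the drop_columns parameter, and the decision to drop rows (rather than fill missing values) are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class CleanupSketch(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for OverallProcessing (illustration only)."""

    def __init__(self, drop_columns=("id",)):
        # Assumed parameter: columns that should not reach the model.
        self.drop_columns = drop_columns

    def fit(self, X, y=None):
        # No state to learn, matching the documented no-op fit.
        return self

    def transform(self, X):
        # errors="ignore" skips columns that are absent from X.
        out = X.drop(columns=list(self.drop_columns), errors="ignore")
        # Remove rows with any NaN and renumber the index.
        return out.dropna().reset_index(drop=True)


df = pd.DataFrame({"id": [1, 2, 3], "a": [1.0, np.nan, 3.0]})
cleaned = CleanupSketch().fit_transform(df)
print(cleaned)
```

One design caveat for this sketch: dropping rows inside a transformer changes the row count of X but not of y, so in a supervised Pipeline it may be safer to fill missing values or to clean the data before splitting features and labels.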

class src.pipe.base.TextProcessing

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

This processing module converts text features into numerical features using TfidfVectorizer, following the sklearn Pipeline base structure.

fit(X, y=None)

Fit the TfidfVectorizer model on the text features present in X and listed in _TEXT_COLUMNS.

Parameters
  • X (pd.DataFrame) – Dataframe with text feature data.

  • y (pd.Series) – Series with labels.

Returns

This class.

Return type

TextProcessing

fit_transform(X, y=None)

Fit the TfidfVectorizer on the provided data, then transform that same data using the fitted model.

Parameters
  • X (pd.DataFrame) – Dataframe with text feature data.

  • y (pd.Series) – Series with labels.

Returns

Dataframe without text columns, and with numerical features included.

Return type

pd.DataFrame

transform(X)

Transform the provided Dataframe using the previously trained TfidfVectorizer.

Parameters

X (pd.DataFrame) – Dataframe with data to be processed.

Returns

Dataframe without text columns, and with numerical features included.

Return type

pd.DataFrame

Module contents