Pipeline package
Submodules
Base module
- class src.pipe.base.DateProcessing
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Date processing module, used to create and manipulate date-based features within the sklearn Pipeline base structure.
- fit(X, y=None)
- fit_transform(X, y=None)
Apply the date transformation to the date columns of the provided Dataframe.
- Parameters
X (pd.DataFrame) – Dataframe with data to be processed.
- Returns
Dataframe without the date columns and with the new numerical features.
- Return type
pd.DataFrame
- transform(X)
Apply the date transformation to the date columns of the provided Dataframe.
- Parameters
X (pd.DataFrame) – Dataframe with data to be processed.
- Returns
Dataframe without the date columns and with the new numerical features.
- Return type
pd.DataFrame
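As a rough illustration of what the date transformation can involve, the sketch below expands each date column into year, month, day and day-of-week columns, then drops the original column. It is not the DateProcessing implementation: the class name DateFeaturesSketch, the date_columns parameter and the chosen features are assumptions made for this example.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DateFeaturesSketch(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for DateProcessing; not the project's code."""

    def __init__(self, date_columns=None):
        # Hypothetical parameter: names of the date columns to expand.
        self.date_columns = date_columns

    def fit(self, X, y=None):
        # Nothing is learned from the data; return self so Pipeline can chain calls.
        return self

    def transform(self, X):
        X = X.copy()
        for col in self.date_columns or []:
            dates = pd.to_datetime(X[col])
            # Derive numerical features from each date column.
            X[f"{col}_year"] = dates.dt.year
            X[f"{col}_month"] = dates.dt.month
            X[f"{col}_day"] = dates.dt.day
            X[f"{col}_dayofweek"] = dates.dt.dayofweek  # Monday=0 ... Sunday=6
        # Drop the original date columns, matching the documented return value.
        return X.drop(columns=list(self.date_columns or []))
```

Because this sketch's fit learns nothing, the default TransformerMixin.fit_transform (fit followed by transform) returns the same result as transform alone.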
- class src.pipe.base.OverallProcessing
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
This class condenses lightweight feature processing steps, such as removing NaN values, dropping undesired columns and similar cleanup, using the sklearn Pipeline base structure.
- fit(X, y=None)
No processing is performed here.
- fit_transform(X, y=None)
Transform the provided Dataframe.
- Parameters
X (pd.DataFrame) – Dataframe with data to be processed.
- Returns
Dataframe with processed data.
- Return type
pd.DataFrame
- transform(X)
Transform the provided Dataframe.
- Parameters
X (pd.DataFrame) – Dataframe with data to be processed.
- Returns
Dataframe with processed data.
- Return type
pd.DataFrame
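A stateless cleanup step of the kind OverallProcessing describes might look like the sketch below, assuming the undesired columns are passed in as a list. The class name CleanupSketch and the drop_columns and fill_value parameters are hypothetical. NaN values are filled rather than removed as rows, because a transformer inside a Pipeline that drops rows from X would leave X and y with different lengths when the final estimator is fitted.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class CleanupSketch(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for OverallProcessing; not the project's code."""

    def __init__(self, drop_columns=None, fill_value=0):
        # Hypothetical parameters: undesired column names and the NaN replacement.
        self.drop_columns = drop_columns
        self.fill_value = fill_value

    def fit(self, X, y=None):
        # Stateless step: as in the documented fit, nothing is learned.
        return self

    def transform(self, X):
        # Drop undesired columns (silently skipping names that are absent).
        X = X.drop(columns=list(self.drop_columns or []), errors="ignore")
        # Fill NaN values instead of dropping rows, keeping X aligned with y.
        return X.fillna(self.fill_value)
```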
- class src.pipe.base.TextProcessing
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
This processing module is responsible for converting the text features into numerical features using TfidfVectorizer, within the sklearn Pipeline base structure.
- fit(X, y=None)
Fit the TfidfVectorizer model to the text features present in X and listed in _TEXT_COLUMNS.
- Parameters
X (pd.DataFrame) – Dataframe with text feature data.
y (pd.Series) – Series with labels.
- Returns
The fitted instance (self).
- Return type
src.pipe.base.TextProcessing
- fit_transform(X, y=None)
Fit the TfidfVectorizer on the provided data, then transform that same data with the fitted model.
- Parameters
X (pd.DataFrame) – Dataframe with text feature data.
y (pd.Series) – Series with labels.
- Returns
Dataframe without the text columns and with the new numerical features included.
- Return type
pd.DataFrame
- transform(X)
Transform the provided Dataframe using the already-fitted TfidfVectorizer.
- Parameters
X (pd.DataFrame) – Dataframe with data to be processed.
- Returns
Dataframe without the text columns and with the new numerical features included.
- Return type
pd.DataFrame