%global _empty_manifest_terminate_build 0
Name: python-alchemyml
Version: 0.1.36
Release: 1
Summary: AlchemyML API package
License: MIT License
URL: https://github.com/alchemyml/alchemyml
Source0: https://mirrors.aliyun.com/pypi/web/packages/f2/1e/9bcf3f9c7dacafc9838bd95d03c11fe03d77fcb6573f9a16fa5210a6a249/alchemyml-0.1.36.tar.gz
BuildArch: noarch
%description
# AlchemyML API Documentation
Version Date: 2021-05-28
## Prerequisites
* Python >= 3.6
* * requests >= 2.22.0
* * urllib3 >= 1.25.7
## Module Overview
### Description
AlchemyML is a multi-environment solution for data exploiation.
To maximize customer convenience, there are three ways to run it: via the AlchemyML Platform, via the API, and via ad hoc solutions. The one documented below is the second tool, AlchemyML API.
AlchemyML API is an easy way to use advanced data analysis techniques in Python, accelerating the work of the data scientist and optimizing her/his time and resources.
AlchemyML API has operations at the dataset level (upload, list, delete...), at the experiment level (create, send, add to project, view metrics and logs...) and at the project level (create, update, delete...). Moreover, it also has specific actions so that the client can perform her/his own experiment manually: pre-process the dataset, remove highly correlated columns, detect outliers, impute missings...
## List of scripts and their functions
* __init__
* alchemyml()
* get_api_token
* _CRUD_classes
* dataset()
* upload
* view
* update
* delete
* statistical_descriptors
* download
* send
* experiment()
* create
* view
* update
* delete
* statistical_descriptors
* results
* add_to_project
* extract_from_project
* send
* project()
* create
* view
* update
* delete
* _manual_ops
* actions()
* list_preprocessed_dataframes
* download_dataframe
* prepare_dataframe
* encode_dataframe
* drop_highly_correlated_components
* impute_inconsistencies
* drop_invalid_columns
* target_column_analysis
* balancing_dataframe
* impute_missing_values
* merge_cols_into_dt_index
* detect_experiment_type
* build_model
* operational_info
* detect_outliers
* impute_outliers
## __init__.py - Code explanations
### Prerequisites - Imports
* **Python** packages:
* JSON: `import json`
* Internal classes and functions from **alchemyml**:
* `from ._CRUD_classes import dataset, experiment, project`
* `from ._manual_ops import actions`
* `from ._request_handler import retry_session`
### class _alchemyml_
Main class containing all AlchemyML functionalities
#### method _get_api_token_
```python
def get_api_token(self, username, password):
from ._request_handler import retry_session
url = 'https://alchemyml.com/api/token/'
data = json.dumps({'username':username, 'password':password})
session = retry_session(retries = 10)
r = session.post(url, data)
if r.status_code == 200:
tokenJSON = json.loads(r.text)
self.dataset.token = tokenJSON['access']
self.experiment.token = tokenJSON['access']
self.project.token = tokenJSON['access']
self.actions.token = tokenJSON['access']
return tokenJSON['access']
else:
msgJSON = json.loads(r.text)
msg = msgJSON['message']
return msg
```
##### Description
This method returns the necessary token to be used from now on for the API requests. To be able to make use of the API before all it is necessary to sign-up.
##### I/O
* Parameters:
* _**username**_ (_str_): Username.
* _**password**_ (_str_): Password.
## _CRUD_classes.py - Code explanations
### Prerequisites - Imports
* **Python** packages:
* JSON: `import json`
* OS: `import os`
* Sys: `import sys`
* Functions from **_request_handler**:
* `from ._request_handler import retry_session, general_call`
### class _dataset_
This class unifies and condenses all the operations related to datasets in their most general sense: uploading them to the workspace, listing them, removing them...
Each and every operation (request) needs the token that is obtained through the class _authentication_.
#### method _upload_
```python
def upload(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Through the call to this method, you will be able to upload a dataset.
We recommend you to consider the next points before uploading your dataset:
* The accepted reading formats are: .xlsx, .xls, .csv, .json, .xml, .sql.
* Files whose name contains two extensions will not be accepted. E.g.: 'Iris.xlsx.csv'.
* Your data set should contain at least 50 observations.
* The file must not exceed the size limit specified by the AlchemyML team.
* Make sure that your data are not empty. Otherwise, this file will be rejected.
##### I/O
* Parameters:
* _**file_path**_ (_str_): The path where the dataset file is located.
* _**dataset_name**_ (_str_): Custom name for the dataset file.
* _**description**_ (_str_, optional): Optional parameter to specify description if needed for the dataset. If no description is inputted, no description is added to the dataset.
#### method _get_
```python
def get(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method lists the datasets available in the workspace.
* By setting the _detail_ parameter to **True** or **False**, you can control receiving the details of each uploaded dataset or simply a list with the names of the datasets.
* By setting the _dataset_name_ parameter, you can control for which datasets return the details.
##### I/O
* Parameters:
* _**dataset_name**_ (_str_/_list_, optional): Name or list of names of the dataset(s) for which details will be returned.
* _**detail**_ (_bool_, optional): Optional boolean parameter to return the details for the specified dataset(s) (False/ True).
#### method _update_
```python
def update(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method gives the option to rename a dataset and/ or update the datasets description. At least one of the two previous options must be selected.
##### I/O
* Parameters:
* _**dataset_name**_ (_str_, optional): Name of the dataset to update.
* _**new_dataset_name**_ (_str_, optional): New name of the specified dataset. If no name is inputted, the dataset won't be renamed.
* _**new_description**_ (_str_/_list_, optional): New description for the specified dataset. If no description is inputted, the description is not going to be updated.
#### method _delete_
```python
def delete(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Through the use of the Delete method you will be able to delete one, several or all uploaded datasets. Note that if a dataset consists of experiments associated with it, you must first remove the experiments that have been created.
AlchemyML is not responsible for any consequences that may be caused by removing one, several or all datasets.
##### I/O
* Parameters:
* _**dataset_name**_ (_str_/_list_): Name or list of names of the datasets to be removed from workspace. If _All_ used, then all datasets will be removed. Datasets will be removed only if were not used in any experiment.
#### method _statistical_descriptors_
```python
def statistical_descriptors(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method returns the most relevant statistical descriptors for each column of a dataset.
##### I/O
* Parameters:
* _**dataset_name**_ (_str_): Name of the dataset to return statistical descriptors.
#### method _download_
```python
def download(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Method to download a dataset from the workspace
##### I/O
* Parameters:
* _**dataset_name**_ (_str_): Name of the dataset to download
* _**file_path**_ (_str_, optional): Path to download the dataset. By default, the dataset is downloaded to Downloads folder.
#### method _send_
```python
def send(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Method to send a dataset to another user
##### I/O
* Parameters:
* _**dataset_name**_ (_str_): Name of the dataset to send
* _**destination_email**_ (_str_): E-mail of the user to whom the dataset is to be sent
### class _experiment_
This class unifies and condenses all the operations related to experiments in their most general sense: uploading them to the workspace, listing them, removing them... this class also contains the methods for adding experiments to projects, updating them, deleting them...
Each and every operation (request) needs the token that is obtained through the class _authentication_.
#### method _create_
```python
def create(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
By default, an automatic experiment will be created.
This option implies the execution of a sequence of steps that go from the dataset intake to the construction of the predictive model, including the pre-processing and cleaning of the data.
If the experiment procedure is set to manual, then the user has the possibility to control each phase of the experiment by running the available modules in the desired order.
The possible operations that can be executed are those that appear in the manual operations section.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name used for the creation the experiment. This name is given by the user.
* _**description**_ (_str_, optional): Optional parameter to specify description if needed for the experiment. If no description is inputted, no description is going to be added to the experiment.
* _**dataset_name**_ (_str_): Name of the dataset used in the creation of experiment.
* _**target_column**_ (_str_): Specifying the target column name.
* _**clients_choice**_ (_str_): Type of experiment. Valid options: Regression, Classification, Time Series, Auto Detect.
* _**experiment_procedure**_ (_str_, optional): Valid options are: auto or manual.
#### method _get_
```python
def get(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Such as the datasets section, this method will let you know which experiments you have in your workspace.
* By setting the detail parameter to **True** or **False** you can control receiving details of each experiment or simply get a list with the names of the experiments.
* By setting the _experiment_name_ parameter, you control for which experiments return the details (one or some).
##### I/O
* Parameters:
* _**experiment_name**_ (_str_/_list_, optional): The name or list of experiment names to be listed.
* _**detail**_ (_bool_, optional): Optional boolean parameter to return the details for the specified experiment(s) (False/ True).
#### method _update_
```python
def update(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method gives the option to rename an experiment and/ or update the experiments description. At least one of the two previous options must be selected.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment to update.
* _**new_experiment_name**_ (_str_, optional): New name of the specified experiment. If no name is inputted, the experiment is not going to be renamed.
* _**new_description**_ (_str_/_list_, optional): New description for the specified experiment. If no description is inputted, the description is not going to be updated.
#### method _delete_
```python
def delete(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Through the use of the endpoint Delete you will be able to delete one, several or all the experiments created.
AlchemyML is not responsible for any consequences that may be caused by removing one, several or all experiments.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_/_list_): Name or list of experiment names to be deleted. If _All_ used, then all experiments will be removed.
#### method _statistical_descriptors_
```python
def statistical_descriptors(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method returns the most relevant statistical descriptors for each column of the preprocessed dataset used in the experiment creation.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment to return statistical descriptors.
* _**dataset_name**_ (_str_): Name of the dataset used in the experiment creation.
#### method _results_
```python
def results(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
The creation of an experiment (previous method CREATE) returns the results of that experiment. This method gives the option to retrieve the previous results whenever these are needed.
The results are delivered in a JSON structure consisting of two keys: **log**, **model_metrics**.
* **log** contains the information related to the decisions that AlchemyML has taken throughout the creation of the experiment, until finishing the construction of the predictive model.
* On the other hand, **model_metrics** will include the analytical information of these results: metrics obtained, relevant variables, type of experiment, etc.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment to return the results.
#### method _add_to_project_
```python
def add_to_project(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method gives the possibility to include an experiment or various into a specified project.
Projects are the way to order and group different experiments that are included within a general topic. For example, you could create a project under the theme of Smart Cities that includes experiments related to this topic.
##### I/O
* Parameters:
* _**associated_experiments**_ (_str_/_list_): Name or list of experiment names to be included into a specified project.
* _**project_name**_ (_str_): The name of the project in which experiment(s) will be included.
#### method _extract_from_project_
```python
def extract_from_project(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Given a project this method gives the possibility to extract specified experiments from it.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_/_list_): Name or list of experiment names that are desired to be extracted from a given project.
* _**project_name**_ (_str_): The project from which will be extracted the specified experiments.
#### method _send_
```python
def send(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This endpoint gives the possibility to send one or various experiments to another registered user.
If the user exists, a confirmation email will be sent. When the recipient confirms that he wants to receive an experiment from another user, an exact copy of the experiment will appear within his/her experiments section and will also be visible through the Workspace.
##### I/O
* Parameters:
* _**destination_email**_ (_str_): The receivers email address.
* _**experiment_name**_ (_str_/_list_): The name or list of experiment names to be sent.
### class _project_
This class unifies and condenses all the operations related to projects in their most general sense: creating them, listing them, deleting them...
Each and every operation (request) needs the token that is obtained through the class _authentication_.
#### method _create_
```python
def create(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method creates a new project.
Projects are the way to order and group different experiments that are included within a general topic. For example, you could create a project under the theme of Smart Cities that includes experiments related to this topic.
##### I/O
* Parameters:
* _**project_name**_ (_str_): Name of the project.
* _**description**_ (_str_, optional): Optional parameter to specify description if needed for the project. If no description is inputted, no description is going to be added to the project.
* _**associated_experiments**_ (_str_/_list_, optional): Name or list of experiment names to be added to the project. If no experiments are inputted, an empty project is going to be created.
#### method _get_
```python
def get(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Such as the datasets section, this method will let you know which projects you have in your workspace.
* By setting the detail parameter to **True** or **False** you can control receiving details of each project or simply get a list with the names of the projects.
* By setting the _project_name_ parameter, you control for which projects return the details (one or some).
##### I/O
* Parameters:
* _**project_name**_ (_str_/_list_, optional): Name or list of names of the project(s).
* _**detail**_ (_bool_, optional): Optional boolean parameter to return the details for the specified project(s) (False/ True).
#### method _update_
```python
def update(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method gives the option to rename a project and/ or update the projects description. At least one of the two previous options must be selected.
##### I/O
* Parameters:
* _**project_name**_ (_str_): Name of the project to be updated.
* _**new_project_name**_ (_str_, optional): New name of the specified project. If no name is inputted, the project is not going to be renamed.
* _**new_description**_ (_str_/_list_, optional): New description for the specified project. If no description is inputted, the description is not going to be updated.
#### method _delete_
```python
def delete(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Through the use of the method Delete you will be able to delete one, several or all the projects created.
AlchemyML is not responsible for any consequences that may be caused by removing one, several or all projects.
##### I/O
* Parameters:
* _**project_name**_ (_str_/_list_): Name or list of names of the projects to be deleted. If All used then, all projects will be removed.
## _manual_ops.py - Code explanations
### Prerequisites - Imports
* **Python** packages:
* Sys: `import sys`
* Functions from **_request_handler**:
* `from ._request_handler import general_call`
### class _actions_
Class that encompasses all the operations available in a manual experiment.
#### method _list_preprocessed_dataframes_
```python
def list_preprocessed_dataframes(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Method for listing the available processed dataframes for the given experiment.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Experiment name to which processed dataframes will be returned.
#### method _download_dataframe_
```python
def download_dataframe(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
As the name of the endpoint suggests, this method gives the option to download the available processed dataframes for a given experiment.
* If keyword all in dataframe_name, all available dataframes will be downloaded.
* If unknown the available processed dataframes, call first List preprocessed dataframes.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of experiment for which dataframe(s) needed to be download.
* _**dataframe_name**_ (_str_): Dataframe name to be downloaded. Using the keyword all, all dataframes available for the experiment will be downloaded in a rar archive.
#### method _prepare_dataframe_
```python
def prepare_dataframe(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This module is responsible for performing a first pre-processing of the dataset loaded by the user before the data goes through the AlchemyMLs next modules.
In general terms, it seeks to remove spaces to the left and right of a string, remove quotes from cells that are of type string, convert numerical data that comes in string format to numerical format, interpret and convert data that is of type date but comes in string format.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment to be prepared.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
#### method _encode_dataframe_
```python
def encode_dataframe(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This is the sub-module in charge of coding the variables that indicate a category and are string type in numerical codes.
This operation is carried out because the automatic learning algorithms need to understand the nature of the data converted into numbers.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment to be encoded.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
#### method _drop_highly_correlated_components_
```python
def drop_highly_correlated_components(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This is the method responsible for dropping highly correlated columns and duplicate rows.
The threshold to consider a column as highly correlated with another one is 0.9999.
Highly correlated columns can be both numerical and categorical columns.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
* _**component**_ (_str_, optional): Specifying whether you want to drop: "rows", "columns" or "both".
* _**delete_duplicated_indices**_ (_bool_, optional): You can specify wether to take into account the index when dropping duplicated rows.
* _**keep**_ (_bool_, optional): keep = False will drop the first duplicated index, and keep = True will drop the last duplicated index.
#### method _impute_inconsistencies_
```python
def impute_inconsistencies(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This is the method responsible for iterating over each column of a dataset to find and correct inconsistencies. It is basically a submodule that searches for misspelled words, numbers or dates in an attempt to correct them.
You can choose between applying the operations to the entire dataset or just to the target column.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
* _**just_target**_ (_bool_, optional): Specifying whether you want to treat existing inconsistencies on the target or on the whole dataset (True/False).
#### method _drop_invalid_columns_
```python
def drop_invalid_columns(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Method to drop invalid columns in a experiment.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
* _**invalid_cols**_ (_list_, optional): Optional parameter to specify a column or list of columns to be considered as invalid.
#### method _target_column_analysis_
```python
def target_column_analysis(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This is the method responsible for telling the user wether the dataset is balanced or not by inspecting the target column.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
#### method _balancing_dataframe_
```python
def balancing_dataframe(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This is the method that deals with unbalanced classification datasets.
It detects unbalanced data, decides whether the data can be balanced (extreme cases are rejected), collects information on unbalance indicators and determines the method to be applied at the classification stage in order to adjust a balanced classifier.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
* _**auto_strategy**_ (_bool_, optional): Determines wether to force the generation of a balanced dataset or not. If auto_strategy is set to False, a balanced dataset will always be generated.
#### method _initial_exp_info_
```python
def initial_exp_info(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method returns initial information for the specified experiment.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
#### method _impute_missing_values_
```python
def impute_missing_values(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Method to use for missing values imputation.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
#### method _merge_cols_into_dt_index_
```python
def merge_cols_into_dt_index(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This is the method in charge of finding candidate columns with which to try to build a single datetime column.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
#### method _detect_experiment_type_
```python
def detect_experiment_type(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Method that gives the option to detect experiment type.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
* _**selected_option**_ (_str_, optional): For detect experiment type, the options available are: Regression, Classification, Time Series, Auto Detect.
#### method _build_model_
```python
def build_model(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Method to build the model for a given experiment.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**selected_option**_ (_str_, optional): For build the model the options available are: Regression, Classification, Time Series, Auto Detect.
#### method _operational_info_
```python
def operational_info(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Through this method you can enter operational information related to each column: in this way you can specify what are the operating limits of a column and its tolerances.
You can also indicate some values that you know and that occur within the values of the column so that the _impute_outliers_ module does not take them into account.
In addition, you can group the time-dependent columns by intervals (morning/evening/night) and you can detail whether the behavior of a column depends on the categories of another categorical column.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**columns_info**_ (_str_/_list_/_dict_): Information on columns.
#### method _detect_outliers_
```python
def detect_outliers(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method gives the option of detect outliers. Different strategies are available, as univariate, bivariate, multivariate, complete.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment to be used for outlier detection.
* _**detection_strategy_info**_ (_dict_): Strategies available to employ for detection: univariate, bivariate, multivariate. The general form of the dictionary is: {'univariate':cols (string-list), 'bivariate':cols (string-list), 'multivariate':cols:(string-list)}.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
#### method _impute_outliers_
```python
def impute_outliers(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Through this method outliers may be imputed using one of the available strategies.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Experiment name on which outliers imputation is going to take place.
* _**cols_to_impute**_ (_str_/_list_/_float_): Defines to columns on which outliers imputation is going to take place.
* _**handling_strategy**_ (_str_/_dict_): Available options: 'auto', 'mean', 'median', 'mode', 'random_values', 'clipping', 'n_neighbors', 'quartile'.
%package -n python3-alchemyml
Summary: AlchemyML API package
Provides: python-alchemyml
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip
%description -n python3-alchemyml
# AlchemyML API Documentation
Version Date: 2021-05-28
## Prerequisites
* Python >= 3.6
* * requests >= 2.22.0
* * urllib3 >= 1.25.7
## Module Overview
### Description
AlchemyML is a multi-environment solution for data exploiation.
To maximize customer convenience, there are three ways to run it: via the AlchemyML Platform, via the API, and via ad hoc solutions. The one documented below is the second tool, AlchemyML API.
AlchemyML API is an easy way to use advanced data analysis techniques in Python, accelerating the work of the data scientist and optimizing her/his time and resources.
AlchemyML API has operations at the dataset level (upload, list, delete...), at the experiment level (create, send, add to project, view metrics and logs...) and at the project level (create, update, delete...). Moreover, it also has specific actions so that the client can perform her/his own experiment manually: pre-process the dataset, remove highly correlated columns, detect outliers, impute missings...
## List of scripts and their functions
* __init__
* alchemyml()
* get_api_token
* _CRUD_classes
* dataset()
* upload
* view
* update
* delete
* statistical_descriptors
* download
* send
* experiment()
* create
* view
* update
* delete
* statistical_descriptors
* results
* add_to_project
* extract_from_project
* send
* project()
* create
* view
* update
* delete
* _manual_ops
* actions()
* list_preprocessed_dataframes
* download_dataframe
* prepare_dataframe
* encode_dataframe
* drop_highly_correlated_components
* impute_inconsistencies
* drop_invalid_columns
* target_column_analysis
* balancing_dataframe
* impute_missing_values
* merge_cols_into_dt_index
* detect_experiment_type
* build_model
* operational_info
* detect_outliers
* impute_outliers
## __init__.py - Code explanations
### Prerequisites - Imports
* **Python** packages:
* JSON: `import json`
* Internal classes and functions from **alchemyml**:
* `from ._CRUD_classes import dataset, experiment, project`
* `from ._manual_ops import actions`
* `from ._request_handler import retry_session`
### class _alchemyml_
Main class containing all AlchemyML functionalities
#### method _get_api_token_
```python
def get_api_token(self, username, password):
from ._request_handler import retry_session
url = 'https://alchemyml.com/api/token/'
data = json.dumps({'username':username, 'password':password})
session = retry_session(retries = 10)
r = session.post(url, data)
if r.status_code == 200:
tokenJSON = json.loads(r.text)
self.dataset.token = tokenJSON['access']
self.experiment.token = tokenJSON['access']
self.project.token = tokenJSON['access']
self.actions.token = tokenJSON['access']
return tokenJSON['access']
else:
msgJSON = json.loads(r.text)
msg = msgJSON['message']
return msg
```
##### Description
This method returns the necessary token to be used from now on for the API requests. To be able to make use of the API before all it is necessary to sign-up.
##### I/O
* Parameters:
* _**username**_ (_str_): Username.
* _**password**_ (_str_): Password.
## _CRUD_classes.py - Code explanations
### Prerequisites - Imports
* **Python** packages:
* JSON: `import json`
* OS: `import os`
* Sys: `import sys`
* Functions from **_request_handler**:
* `from ._request_handler import retry_session, general_call`
### class _dataset_
This class unifies and condenses all the operations related to datasets in their most general sense: uploading them to the workspace, listing them, removing them...
Each and every operation (request) needs the token that is obtained through the class _authentication_.
#### method _upload_
```python
def upload(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Through the call to this method, you will be able to upload a dataset.
We recommend you to consider the next points before uploading your dataset:
* The accepted reading formats are: .xlsx, .xls, .csv, .json, .xml, .sql.
* Files whose name contains two extensions will not be accepted. E.g.: 'Iris.xlsx.csv'.
* Your data set should contain at least 50 observations.
* The file must not exceed the size limit specified by the AlchemyML team.
* Make sure that your data are not empty. Otherwise, this file will be rejected.
##### I/O
* Parameters:
* _**file_path**_ (_str_): The path where the dataset file is located.
* _**dataset_name**_ (_str_): Custom name for the dataset file.
* _**description**_ (_str_, optional): Optional parameter to specify description if needed for the dataset. If no description is inputted, no description is added to the dataset.
#### method _get_
```python
def get(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method lists the datasets available in the workspace.
* By setting the _detail_ parameter to **True** or **False**, you can control receiving the details of each uploaded dataset or simply a list with the names of the datasets.
* By setting the _dataset_name_ parameter, you can control for which datasets return the details.
##### I/O
* Parameters:
* _**dataset_name**_ (_str_/_list_, optional): Name or list of names of the dataset(s) for which details will be returned.
* _**detail**_ (_bool_, optional): Optional boolean parameter to return the details for the specified dataset(s) (False/ True).
#### method _update_
```python
def update(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method gives the option to rename a dataset and/ or update the datasets description. At least one of the two previous options must be selected.
##### I/O
* Parameters:
* _**dataset_name**_ (_str_, optional): Name of the dataset to update.
* _**new_dataset_name**_ (_str_, optional): New name of the specified dataset. If no name is inputted, the dataset won't be renamed.
* _**new_description**_ (_str_/_list_, optional): New description for the specified dataset. If no description is inputted, the description is not going to be updated.
#### method _delete_
```python
def delete(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Through the use of the Delete method you will be able to delete one, several or all uploaded datasets. Note that if a dataset consists of experiments associated with it, you must first remove the experiments that have been created.
AlchemyML is not responsible for any consequences that may be caused by removing one, several or all datasets.
##### I/O
* Parameters:
* _**dataset_name**_ (_str_/_list_): Name or list of names of the datasets to be removed from workspace. If _All_ used, then all datasets will be removed. Datasets will be removed only if were not used in any experiment.
#### method _statistical_descriptors_
```python
def statistical_descriptors(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method returns the most relevant statistical descriptors for each column of a dataset.
##### I/O
* Parameters:
* _**dataset_name**_ (_str_): Name of the dataset to return statistical descriptors.
#### method _download_
```python
def download(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Method to download a dataset from the workspace
##### I/O
* Parameters:
* _**dataset_name**_ (_str_): Name of the dataset to download
* _**file_path**_ (_str_, optional): Path to download the dataset. By default, the dataset is downloaded to Downloads folder.
#### method _send_
```python
def send(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Method to send a dataset to another user
##### I/O
* Parameters:
* _**dataset_name**_ (_str_): Name of the dataset to send
* _**destination_email**_ (_str_): E-mail of the user to whom the dataset is to be sent
### class _experiment_
This class unifies and condenses all the operations related to experiments in their most general sense: uploading them to the workspace, listing them, removing them... this class also contains the methods for adding experiments to projects, updating them, deleting them...
Each and every operation (request) needs the token that is obtained through the class _authentication_.
#### method _create_
```python
def create(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
By default, an automatic experiment will be created.
This option implies the execution of a sequence of steps that go from the dataset intake to the construction of the predictive model, including the pre-processing and cleaning of the data.
If the experiment procedure is set to manual, then the user has the possibility to control each phase of the experiment by running the available modules in the desired order.
The possible operations that can be executed are those that appear in the manual operations section.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name used for the creation the experiment. This name is given by the user.
* _**description**_ (_str_, optional): Optional parameter to specify description if needed for the experiment. If no description is inputted, no description is going to be added to the experiment.
* _**dataset_name**_ (_str_): Name of the dataset used in the creation of experiment.
* _**target_column**_ (_str_): Specifying the target column name.
* _**clients_choice**_ (_str_): Type of experiment. Valid options: Regression, Classification, Time Series, Auto Detect.
* _**experiment_procedure**_ (_str_, optional): Valid options are: auto or manual.
#### method _get_
```python
def get(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Such as the datasets section, this method will let you know which experiments you have in your workspace.
* By setting the detail parameter to **True** or **False** you can control receiving details of each experiment or simply get a list with the names of the experiments.
* By setting the _experiment_name_ parameter, you control for which experiments return the details (one or some).
##### I/O
* Parameters:
* _**experiment_name**_ (_str_/_list_, optional): The name or list of experiment names to be listed.
* _**detail**_ (_bool_, optional): Optional boolean parameter to return the details for the specified experiment(s) (False/ True).
#### method _update_
```python
def update(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method gives the option to rename an experiment and/ or update the experiments description. At least one of the two previous options must be selected.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment to update.
* _**new_experiment_name**_ (_str_, optional): New name of the specified experiment. If no name is inputted, the experiment is not going to be renamed.
* _**new_description**_ (_str_/_list_, optional): New description for the specified experiment. If no description is inputted, the description is not going to be updated.
#### method _delete_
```python
def delete(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Through the use of the endpoint Delete you will be able to delete one, several or all the experiments created.
AlchemyML is not responsible for any consequences that may be caused by removing one, several or all experiments.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_/_list_): Name or list of experiment names to be deleted. If _All_ used, then all experiments will be removed.
#### method _statistical_descriptors_
```python
def statistical_descriptors(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method returns the most relevant statistical descriptors for each column of the preprocessed dataset used in the experiment creation.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment to return statistical descriptors.
* _**dataset_name**_ (_str_): Name of the dataset used in the experiment creation.
#### method _results_
```python
def results(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
The creation of an experiment (previous method CREATE) returns the results of that experiment. This method gives the option to retrieve the previous results whenever these are needed.
The results are delivered in a JSON structure consisting of two keys: **log**, **model_metrics**.
* **log** contains the information related to the decisions that AlchemyML has taken throughout the creation of the experiment, until finishing the construction of the predictive model.
* On the other hand, **model_metrics** will include the analytical information of these results: metrics obtained, relevant variables, type of experiment, etc.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment to return the results.
#### method _add_to_project_
```python
def add_to_project(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method gives the possibility to include an experiment or various into a specified project.
Projects are the way to order and group different experiments that are included within a general topic. For example, you could create a project under the theme of Smart Cities that includes experiments related to this topic.
##### I/O
* Parameters:
* _**associated_experiments**_ (_str_/_list_): Name or list of experiment names to be included into a specified project.
* _**project_name**_ (_str_): The name of the project in which experiment(s) will be included.
#### method _extract_from_project_
```python
def extract_from_project(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Given a project this method gives the possibility to extract specified experiments from it.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_/_list_): Name or list of experiment names that are desired to be extracted from a given project.
* _**project_name**_ (_str_): The project from which will be extracted the specified experiments.
#### method _send_
```python
def send(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This endpoint gives the possibility to send one or various experiments to another registered user.
If the user exists, a confirmation email will be sent. When the recipient confirms that he wants to receive an experiment from another user, an exact copy of the experiment will appear within his/her experiments section and will also be visible through the Workspace.
##### I/O
* Parameters:
* _**destination_email**_ (_str_): The receivers email address.
* _**experiment_name**_ (_str_/_list_): The name or list of experiment names to be sent.
### class _project_
This class unifies and condenses all the operations related to projects in their most general sense: creating them, listing them, deleting them...
Each and every operation (request) needs the token that is obtained through the class _authentication_.
#### method _create_
```python
def create(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method creates a new project.
Projects are the way to order and group different experiments that are included within a general topic. For example, you could create a project under the theme of Smart Cities that includes experiments related to this topic.
##### I/O
* Parameters:
* _**project_name**_ (_str_): Name of the project.
* _**description**_ (_str_, optional): Optional parameter to specify description if needed for the project. If no description is inputted, no description is going to be added to the project.
* _**associated_experiments**_ (_str_/_list_, optional): Name or list of experiment names to be added to the project. If no experiments are inputted, an empty project is going to be created.
#### method _get_
```python
def get(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Such as the datasets section, this method will let you know which projects you have in your workspace.
* By setting the detail parameter to **True** or **False** you can control receiving details of each project or simply get a list with the names of the projects.
* By setting the _project_name_ parameter, you control for which projects return the details (one or some).
##### I/O
* Parameters:
* _**project_name**_ (_str_/_list_, optional): Name or list of names of the project(s).
* _**detail**_ (_bool_, optional): Optional boolean parameter to return the details for the specified project(s) (False/ True).
#### method _update_
```python
def update(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method gives the option to rename a project and/ or update the projects description. At least one of the two previous options must be selected.
##### I/O
* Parameters:
* _**project_name**_ (_str_): Name of the project to be updated.
* _**new_project_name**_ (_str_, optional): New name of the specified project. If no name is inputted, the project is not going to be renamed.
* _**new_description**_ (_str_/_list_, optional): New description for the specified project. If no description is inputted, the description is not going to be updated.
#### method _delete_
```python
def delete(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Through the use of the method Delete you will be able to delete one, several or all the projects created.
AlchemyML is not responsible for any consequences that may be caused by removing one, several or all projects.
##### I/O
* Parameters:
* _**project_name**_ (_str_/_list_): Name or list of names of the projects to be deleted. If All used then, all projects will be removed.
## _manual_ops.py - Code explanations
### Prerequisites - Imports
* **Python** packages:
* Sys: `import sys`
* Functions from **_request_handler**:
* `from ._request_handler import general_call`
### class _actions_
Class that encompasses all the operations available in a manual experiment.
#### method _list_preprocessed_dataframes_
```python
def list_preprocessed_dataframes(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Method for listing the available processed dataframes for the given experiment.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Experiment name to which processed dataframes will be returned.
#### method _download_dataframe_
```python
def download_dataframe(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
As the name of the endpoint suggests, this method gives the option to download the available processed dataframes for a given experiment.
* If keyword all in dataframe_name, all available dataframes will be downloaded.
* If unknown the available processed dataframes, call first List preprocessed dataframes.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of experiment for which dataframe(s) needed to be download.
* _**dataframe_name**_ (_str_): Dataframe name to be downloaded. Using the keyword all, all dataframes available for the experiment will be downloaded in a rar archive.
#### method _prepare_dataframe_
```python
def prepare_dataframe(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This module is responsible for performing a first pre-processing of the dataset loaded by the user before the data goes through the AlchemyMLs next modules.
In general terms, it seeks to remove spaces to the left and right of a string, remove quotes from cells that are of type string, convert numerical data that comes in string format to numerical format, interpret and convert data that is of type date but comes in string format.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment to be prepared.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
#### method _encode_dataframe_
```python
def encode_dataframe(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This is the sub-module in charge of coding the variables that indicate a category and are string type in numerical codes.
This operation is carried out because the automatic learning algorithms need to understand the nature of the data converted into numbers.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment to be encoded.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
#### method _drop_highly_correlated_components_
```python
def drop_highly_correlated_components(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This is the method responsible for dropping highly correlated columns and duplicate rows.
The threshold to consider a column as highly correlated with another one is 0.9999.
Highly correlated columns can be both numerical and categorical columns.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
* _**component**_ (_str_, optional): Specifying whether you want to drop: "rows", "columns" or "both".
* _**delete_duplicated_indices**_ (_bool_, optional): You can specify wether to take into account the index when dropping duplicated rows.
* _**keep**_ (_bool_, optional): keep = False will drop the first duplicated index, and keep = True will drop the last duplicated index.
#### method _impute_inconsistencies_
```python
def impute_inconsistencies(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This is the method responsible for iterating over each column of a dataset to find and correct inconsistencies. It is basically a submodule that searches for misspelled words, numbers or dates in an attempt to correct them.
You can choose between applying the operations to the entire dataset or just to the target column.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
* _**just_target**_ (_bool_, optional): Specifying whether you want to treat existing inconsistencies on the target or on the whole dataset (True/False).
#### method _drop_invalid_columns_
```python
def drop_invalid_columns(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Method to drop invalid columns in a experiment.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
* _**invalid_cols**_ (_list_, optional): Optional parameter to specify a column or list of columns to be considered as invalid.
#### method _target_column_analysis_
```python
def target_column_analysis(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This is the method responsible for telling the user wether the dataset is balanced or not by inspecting the target column.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
#### method _balancing_dataframe_
```python
def balancing_dataframe(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This is the method that deals with unbalanced classification datasets.
It detects unbalanced data, decides whether the data can be balanced (extreme cases are rejected), collects information on unbalance indicators and determines the method to be applied at the classification stage in order to adjust a balanced classifier.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
* _**auto_strategy**_ (_bool_, optional): Determines wether to force the generation of a balanced dataset or not. If auto_strategy is set to False, a balanced dataset will always be generated.
#### method _initial_exp_info_
```python
def initial_exp_info(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method returns initial information for the specified experiment.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
#### method _impute_missing_values_
```python
def impute_missing_values(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Method to use for missing values imputation.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
#### method _merge_cols_into_dt_index_
```python
def merge_cols_into_dt_index(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This is the method in charge of finding candidate columns with which to try to build a single datetime column.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
#### method _detect_experiment_type_
```python
def detect_experiment_type(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Method that gives the option to detect experiment type.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
* _**selected_option**_ (_str_, optional): For detect experiment type, the options available are: Regression, Classification, Time Series, Auto Detect.
#### method _build_model_
```python
def build_model(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Method to build the model for a given experiment.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**selected_option**_ (_str_, optional): For build the model the options available are: Regression, Classification, Time Series, Auto Detect.
#### method _operational_info_
```python
def operational_info(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Through this method you can enter operational information related to each column: in this way you can specify what are the operating limits of a column and its tolerances.
You can also indicate some values that you know and that occur within the values of the column so that the _impute_outliers_ module does not take them into account.
In addition, you can group the time-dependent columns by intervals (morning/evening/night) and you can detail whether the behavior of a column depends on the categories of another categorical column.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**columns_info**_ (_str_/_list_/_dict_): Information on columns.
#### method _detect_outliers_
```python
def detect_outliers(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method gives the option of detect outliers. Different strategies are available, as univariate, bivariate, multivariate, complete.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment to be used for outlier detection.
* _**detection_strategy_info**_ (_dict_): Strategies available to employ for detection: univariate, bivariate, multivariate. The general form of the dictionary is: {'univariate':cols (string-list), 'bivariate':cols (string-list), 'multivariate':cols:(string-list)}.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
#### method _impute_outliers_
```python
def impute_outliers(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Through this method outliers may be imputed using one of the available strategies.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Experiment name on which outliers imputation is going to take place.
* _**cols_to_impute**_ (_str_/_list_/_float_): Defines to columns on which outliers imputation is going to take place.
* _**handling_strategy**_ (_str_/_dict_): Available options: 'auto', 'mean', 'median', 'mode', 'random_values', 'clipping', 'n_neighbors', 'quartile'.
%package help
Summary: Development documents and examples for alchemyml
Provides: python3-alchemyml-doc
%description help
# AlchemyML API Documentation
Version Date: 2021-05-28
## Prerequisites
* Python >= 3.6
* * requests >= 2.22.0
* * urllib3 >= 1.25.7
## Module Overview
### Description
AlchemyML is a multi-environment solution for data exploiation.
To maximize customer convenience, there are three ways to run it: via the AlchemyML Platform, via the API, and via ad hoc solutions. The one documented below is the second tool, AlchemyML API.
AlchemyML API is an easy way to use advanced data analysis techniques in Python, accelerating the work of the data scientist and optimizing her/his time and resources.
AlchemyML API has operations at the dataset level (upload, list, delete...), at the experiment level (create, send, add to project, view metrics and logs...) and at the project level (create, update, delete...). Moreover, it also has specific actions so that the client can perform her/his own experiment manually: pre-process the dataset, remove highly correlated columns, detect outliers, impute missings...
## List of scripts and their functions
* __init__
* alchemyml()
* get_api_token
* _CRUD_classes
* dataset()
* upload
* view
* update
* delete
* statistical_descriptors
* download
* send
* experiment()
* create
* view
* update
* delete
* statistical_descriptors
* results
* add_to_project
* extract_from_project
* send
* project()
* create
* view
* update
* delete
* _manual_ops
* actions()
* list_preprocessed_dataframes
* download_dataframe
* prepare_dataframe
* encode_dataframe
* drop_highly_correlated_components
* impute_inconsistencies
* drop_invalid_columns
* target_column_analysis
* balancing_dataframe
* impute_missing_values
* merge_cols_into_dt_index
* detect_experiment_type
* build_model
* operational_info
* detect_outliers
* impute_outliers
## __init__.py - Code explanations
### Prerequisites - Imports
* **Python** packages:
* JSON: `import json`
* Internal classes and functions from **alchemyml**:
* `from ._CRUD_classes import dataset, experiment, project`
* `from ._manual_ops import actions`
* `from ._request_handler import retry_session`
### class _alchemyml_
Main class containing all AlchemyML functionalities
#### method _get_api_token_
```python
def get_api_token(self, username, password):
from ._request_handler import retry_session
url = 'https://alchemyml.com/api/token/'
data = json.dumps({'username':username, 'password':password})
session = retry_session(retries = 10)
r = session.post(url, data)
if r.status_code == 200:
tokenJSON = json.loads(r.text)
self.dataset.token = tokenJSON['access']
self.experiment.token = tokenJSON['access']
self.project.token = tokenJSON['access']
self.actions.token = tokenJSON['access']
return tokenJSON['access']
else:
msgJSON = json.loads(r.text)
msg = msgJSON['message']
return msg
```
##### Description
This method returns the necessary token to be used from now on for the API requests. To be able to make use of the API before all it is necessary to sign-up.
##### I/O
* Parameters:
* _**username**_ (_str_): Username.
* _**password**_ (_str_): Password.
## _CRUD_classes.py - Code explanations
### Prerequisites - Imports
* **Python** packages:
* JSON: `import json`
* OS: `import os`
* Sys: `import sys`
* Functions from **_request_handler**:
* `from ._request_handler import retry_session, general_call`
### class _dataset_
This class unifies and condenses all the operations related to datasets in their most general sense: uploading them to the workspace, listing them, removing them...
Each and every operation (request) needs the token that is obtained through the class _authentication_.
#### method _upload_
```python
def upload(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Through the call to this method, you will be able to upload a dataset.
We recommend you to consider the next points before uploading your dataset:
* The accepted reading formats are: .xlsx, .xls, .csv, .json, .xml, .sql.
* Files whose name contains two extensions will not be accepted. E.g.: 'Iris.xlsx.csv'.
* Your data set should contain at least 50 observations.
* The file must not exceed the size limit specified by the AlchemyML team.
* Make sure that your data are not empty. Otherwise, this file will be rejected.
##### I/O
* Parameters:
* _**file_path**_ (_str_): The path where the dataset file is located.
* _**dataset_name**_ (_str_): Custom name for the dataset file.
* _**description**_ (_str_, optional): Optional parameter to specify description if needed for the dataset. If no description is inputted, no description is added to the dataset.
#### method _get_
```python
def get(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method lists the datasets available in the workspace.
* By setting the _detail_ parameter to **True** or **False**, you can control receiving the details of each uploaded dataset or simply a list with the names of the datasets.
* By setting the _dataset_name_ parameter, you can control for which datasets return the details.
##### I/O
* Parameters:
* _**dataset_name**_ (_str_/_list_, optional): Name or list of names of the dataset(s) for which details will be returned.
* _**detail**_ (_bool_, optional): Optional boolean parameter to return the details for the specified dataset(s) (False/ True).
#### method _update_
```python
def update(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method gives the option to rename a dataset and/ or update the datasets description. At least one of the two previous options must be selected.
##### I/O
* Parameters:
* _**dataset_name**_ (_str_, optional): Name of the dataset to update.
* _**new_dataset_name**_ (_str_, optional): New name of the specified dataset. If no name is inputted, the dataset won't be renamed.
* _**new_description**_ (_str_/_list_, optional): New description for the specified dataset. If no description is inputted, the description is not going to be updated.
#### method _delete_
```python
def delete(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Through the use of the Delete method you will be able to delete one, several or all uploaded datasets. Note that if a dataset consists of experiments associated with it, you must first remove the experiments that have been created.
AlchemyML is not responsible for any consequences that may be caused by removing one, several or all datasets.
##### I/O
* Parameters:
* _**dataset_name**_ (_str_/_list_): Name or list of names of the datasets to be removed from workspace. If _All_ used, then all datasets will be removed. Datasets will be removed only if were not used in any experiment.
#### method _statistical_descriptors_
```python
def statistical_descriptors(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method returns the most relevant statistical descriptors for each column of a dataset.
##### I/O
* Parameters:
* _**dataset_name**_ (_str_): Name of the dataset to return statistical descriptors.
#### method _download_
```python
def download(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Method to download a dataset from the workspace
##### I/O
* Parameters:
* _**dataset_name**_ (_str_): Name of the dataset to download
* _**file_path**_ (_str_, optional): Path to download the dataset. By default, the dataset is downloaded to Downloads folder.
#### method _send_
```python
def send(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Method to send a dataset to another user
##### I/O
* Parameters:
* _**dataset_name**_ (_str_): Name of the dataset to send
* _**destination_email**_ (_str_): E-mail of the user to whom the dataset is to be sent
### class _experiment_
This class unifies and condenses all the operations related to experiments in their most general sense: uploading them to the workspace, listing them, removing them... this class also contains the methods for adding experiments to projects, updating them, deleting them...
Each and every operation (request) needs the token that is obtained through the class _authentication_.
#### method _create_
```python
def create(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
By default, an automatic experiment will be created.
This option implies the execution of a sequence of steps that go from the dataset intake to the construction of the predictive model, including the pre-processing and cleaning of the data.
If the experiment procedure is set to manual, then the user has the possibility to control each phase of the experiment by running the available modules in the desired order.
The possible operations that can be executed are those that appear in the manual operations section.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name used for the creation the experiment. This name is given by the user.
* _**description**_ (_str_, optional): Optional parameter to specify description if needed for the experiment. If no description is inputted, no description is going to be added to the experiment.
* _**dataset_name**_ (_str_): Name of the dataset used in the creation of experiment.
* _**target_column**_ (_str_): Specifying the target column name.
* _**clients_choice**_ (_str_): Type of experiment. Valid options: Regression, Classification, Time Series, Auto Detect.
* _**experiment_procedure**_ (_str_, optional): Valid options are: auto or manual.
#### method _get_
```python
def get(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Such as the datasets section, this method will let you know which experiments you have in your workspace.
* By setting the detail parameter to **True** or **False** you can control receiving details of each experiment or simply get a list with the names of the experiments.
* By setting the _experiment_name_ parameter, you control for which experiments return the details (one or some).
##### I/O
* Parameters:
* _**experiment_name**_ (_str_/_list_, optional): The name or list of experiment names to be listed.
* _**detail**_ (_bool_, optional): Optional boolean parameter to return the details for the specified experiment(s) (False/ True).
#### method _update_
```python
def update(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method gives the option to rename an experiment and/ or update the experiments description. At least one of the two previous options must be selected.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment to update.
* _**new_experiment_name**_ (_str_, optional): New name of the specified experiment. If no name is inputted, the experiment is not going to be renamed.
* _**new_description**_ (_str_/_list_, optional): New description for the specified experiment. If no description is inputted, the description is not going to be updated.
#### method _delete_
```python
def delete(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Through the use of the endpoint Delete you will be able to delete one, several or all the experiments created.
AlchemyML is not responsible for any consequences that may be caused by removing one, several or all experiments.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_/_list_): Name or list of experiment names to be deleted. If _All_ used, then all experiments will be removed.
#### method _statistical_descriptors_
```python
def statistical_descriptors(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method returns the most relevant statistical descriptors for each column of the preprocessed dataset used in the experiment creation.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment to return statistical descriptors.
* _**dataset_name**_ (_str_): Name of the dataset used in the experiment creation.
#### method _results_
```python
def results(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
The creation of an experiment (previous method CREATE) returns the results of that experiment. This method gives the option to retrieve the previous results whenever these are needed.
The results are delivered in a JSON structure consisting of two keys: **log**, **model_metrics**.
* **log** contains the information related to the decisions that AlchemyML has taken throughout the creation of the experiment, until finishing the construction of the predictive model.
* On the other hand, **model_metrics** will include the analytical information of these results: metrics obtained, relevant variables, type of experiment, etc.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment to return the results.
#### method _add_to_project_
```python
def add_to_project(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method gives the possibility to include an experiment or various into a specified project.
Projects are the way to order and group different experiments that are included within a general topic. For example, you could create a project under the theme of Smart Cities that includes experiments related to this topic.
##### I/O
* Parameters:
* _**associated_experiments**_ (_str_/_list_): Name or list of experiment names to be included into a specified project.
* _**project_name**_ (_str_): The name of the project in which experiment(s) will be included.
#### method _extract_from_project_
```python
def extract_from_project(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Given a project this method gives the possibility to extract specified experiments from it.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_/_list_): Name or list of experiment names that are desired to be extracted from a given project.
* _**project_name**_ (_str_): The project from which will be extracted the specified experiments.
#### method _send_
```python
def send(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This endpoint gives the possibility to send one or various experiments to another registered user.
If the user exists, a confirmation email will be sent. When the recipient confirms that he wants to receive an experiment from another user, an exact copy of the experiment will appear within his/her experiments section and will also be visible through the Workspace.
##### I/O
* Parameters:
* _**destination_email**_ (_str_): The receivers email address.
* _**experiment_name**_ (_str_/_list_): The name or list of experiment names to be sent.
### class _project_
This class unifies and condenses all the operations related to projects in their most general sense: creating them, listing them, deleting them...
Each and every operation (request) needs the token that is obtained through the class _authentication_.
#### method _create_
```python
def create(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method creates a new project.
Projects are the way to order and group different experiments that are included within a general topic. For example, you could create a project under the theme of Smart Cities that includes experiments related to this topic.
##### I/O
* Parameters:
* _**project_name**_ (_str_): Name of the project.
* _**description**_ (_str_, optional): Optional parameter to specify description if needed for the project. If no description is inputted, no description is going to be added to the project.
* _**associated_experiments**_ (_str_/_list_, optional): Name or list of experiment names to be added to the project. If no experiments are inputted, an empty project is going to be created.
#### method _get_
```python
def get(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Such as the datasets section, this method will let you know which projects you have in your workspace.
* By setting the detail parameter to **True** or **False** you can control receiving details of each project or simply get a list with the names of the projects.
* By setting the _project_name_ parameter, you control for which projects return the details (one or some).
##### I/O
* Parameters:
* _**project_name**_ (_str_/_list_, optional): Name or list of names of the project(s).
* _**detail**_ (_bool_, optional): Optional boolean parameter to return the details for the specified project(s) (False/ True).
#### method _update_
```python
def update(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method gives the option to rename a project and/ or update the projects description. At least one of the two previous options must be selected.
##### I/O
* Parameters:
* _**project_name**_ (_str_): Name of the project to be updated.
* _**new_project_name**_ (_str_, optional): New name of the specified project. If no name is inputted, the project is not going to be renamed.
* _**new_description**_ (_str_/_list_, optional): New description for the specified project. If no description is inputted, the description is not going to be updated.
#### method _delete_
```python
def delete(self, *args, **kwargs):
str_meth_name = self.class_name + '.' + sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Through the use of the method Delete you will be able to delete one, several or all the projects created.
AlchemyML is not responsible for any consequences that may be caused by removing one, several or all projects.
##### I/O
* Parameters:
* _**project_name**_ (_str_/_list_): Name or list of names of the projects to be deleted. If All used then, all projects will be removed.
## _manual_ops.py - Code explanations
### Prerequisites - Imports
* **Python** packages:
* Sys: `import sys`
* Functions from **_request_handler**:
* `from ._request_handler import general_call`
### class _actions_
Class that encompasses all the operations available in a manual experiment.
#### method _list_preprocessed_dataframes_
```python
def list_preprocessed_dataframes(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Method for listing the available processed dataframes for the given experiment.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Experiment name to which processed dataframes will be returned.
#### method _download_dataframe_
```python
def download_dataframe(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
As the name of the endpoint suggests, this method gives the option to download the available processed dataframes for a given experiment.
* If keyword all in dataframe_name, all available dataframes will be downloaded.
* If unknown the available processed dataframes, call first List preprocessed dataframes.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of experiment for which dataframe(s) needed to be download.
* _**dataframe_name**_ (_str_): Dataframe name to be downloaded. Using the keyword all, all dataframes available for the experiment will be downloaded in a rar archive.
#### method _prepare_dataframe_
```python
def prepare_dataframe(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This module is responsible for performing a first pre-processing of the dataset loaded by the user before the data goes through the AlchemyMLs next modules.
In general terms, it seeks to remove spaces to the left and right of a string, remove quotes from cells that are of type string, convert numerical data that comes in string format to numerical format, interpret and convert data that is of type date but comes in string format.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment to be prepared.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
#### method _encode_dataframe_
```python
def encode_dataframe(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This is the sub-module in charge of coding the variables that indicate a category and are string type in numerical codes.
This operation is carried out because the automatic learning algorithms need to understand the nature of the data converted into numbers.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment to be encoded.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
#### method _drop_highly_correlated_components_
```python
def drop_highly_correlated_components(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This is the method responsible for dropping highly correlated columns and duplicate rows.
The threshold to consider a column as highly correlated with another one is 0.9999.
Highly correlated columns can be both numerical and categorical columns.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
* _**component**_ (_str_, optional): Specifying whether you want to drop: "rows", "columns" or "both".
* _**delete_duplicated_indices**_ (_bool_, optional): You can specify wether to take into account the index when dropping duplicated rows.
* _**keep**_ (_bool_, optional): keep = False will drop the first duplicated index, and keep = True will drop the last duplicated index.
#### method _impute_inconsistencies_
```python
def impute_inconsistencies(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This is the method responsible for iterating over each column of a dataset to find and correct inconsistencies. It is basically a submodule that searches for misspelled words, numbers or dates in an attempt to correct them.
You can choose between applying the operations to the entire dataset or just to the target column.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
* _**just_target**_ (_bool_, optional): Specifying whether you want to treat existing inconsistencies on the target or on the whole dataset (True/False).
#### method _drop_invalid_columns_
```python
def drop_invalid_columns(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Method to drop invalid columns in a experiment.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
* _**invalid_cols**_ (_list_, optional): Optional parameter to specify a column or list of columns to be considered as invalid.
#### method _target_column_analysis_
```python
def target_column_analysis(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This is the method responsible for telling the user wether the dataset is balanced or not by inspecting the target column.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
#### method _balancing_dataframe_
```python
def balancing_dataframe(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This is the method that deals with unbalanced classification datasets.
It detects unbalanced data, decides whether the data can be balanced (extreme cases are rejected), collects information on unbalance indicators and determines the method to be applied at the classification stage in order to adjust a balanced classifier.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
* _**auto_strategy**_ (_bool_, optional): Determines wether to force the generation of a balanced dataset or not. If auto_strategy is set to False, a balanced dataset will always be generated.
#### method _initial_exp_info_
```python
def initial_exp_info(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method returns initial information for the specified experiment.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
#### method _impute_missing_values_
```python
def impute_missing_values(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Method to use for missing values imputation.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
#### method _merge_cols_into_dt_index_
```python
def merge_cols_into_dt_index(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This is the method in charge of finding candidate columns with which to try to build a single datetime column.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**download**_ (_bool_, optional): Optional boolean parameter to be set up if results needed to be downloaded.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
#### method _detect_experiment_type_
```python
def detect_experiment_type(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Method that gives the option to detect experiment type.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
* _**selected_option**_ (_str_, optional): For detect experiment type, the options available are: Regression, Classification, Time Series, Auto Detect.
#### method _build_model_
```python
def build_model(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Method to build the model for a given experiment.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**target_col_name**_ (_str_, optional): Specifying the target column name.
* _**selected_option**_ (_str_, optional): For build the model the options available are: Regression, Classification, Time Series, Auto Detect.
#### method _operational_info_
```python
def operational_info(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Through this method you can enter operational information related to each column: in this way you can specify what are the operating limits of a column and its tolerances.
You can also indicate some values that you know and that occur within the values of the column so that the _impute_outliers_ module does not take them into account.
In addition, you can group the time-dependent columns by intervals (morning/evening/night) and you can detail whether the behavior of a column depends on the categories of another categorical column.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment on which process will take place.
* _**columns_info**_ (_str_/_list_/_dict_): Information on columns.
#### method _detect_outliers_
```python
def detect_outliers(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
This method gives the option of detect outliers. Different strategies are available, as univariate, bivariate, multivariate, complete.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Name of the experiment to be used for outlier detection.
* _**detection_strategy_info**_ (_dict_): Strategies available to employ for detection: univariate, bivariate, multivariate. The general form of the dictionary is: {'univariate':cols (string-list), 'bivariate':cols (string-list), 'multivariate':cols:(string-list)}.
* _**prepare_dataset**_ (_bool_, optional): Optional boolean parameter that specifies if the dataset needs preparation or not.
#### method _impute_outliers_
```python
def impute_outliers(self, *args, **kwargs):
str_meth_name = sys._getframe().f_code.co_name
input_args = locals()['args']
input_kwargs = locals()['kwargs']
return general_call(self, str_meth_name, input_args, input_kwargs)
```
##### Description
Through this method outliers may be imputed using one of the available strategies.
##### I/O
* Parameters:
* _**experiment_name**_ (_str_): Experiment name on which outliers imputation is going to take place.
* _**cols_to_impute**_ (_str_/_list_/_float_): Defines to columns on which outliers imputation is going to take place.
* _**handling_strategy**_ (_str_/_dict_): Available options: 'auto', 'mean', 'median', 'mode', 'random_values', 'clipping', 'n_neighbors', 'quartile'.
%prep
%autosetup -n alchemyml-0.1.36
%build
%py3_build
%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .
%files -n python3-alchemyml -f filelist.lst
%dir %{python3_sitelib}/*
%files help -f doclist.lst
%{_docdir}/*
%changelog
* Thu Jun 08 2023 Python_Bot - 0.1.36-1
- Package Spec generated