%global _empty_manifest_terminate_build 0
Name:           python-light-famd
Version:        0.0.3
Release:        1
Summary:        Light Factor Analysis of Mixed Data
License:        BSD License
URL:            https://github.com/Cauchemare/Light_FAMD
Source0:        https://mirrors.aliyun.com/pypi/web/packages/9e/f0/60e56c2e3c00e33cfeab5d54dfdb917fa960fd8d178fb57be1320af7010b/light_famd-0.0.3.tar.gz
BuildArch:      noarch

Requires:       python3-scikit-learn
Requires:       python3-scipy
Requires:       python3-pandas
Requires:       python3-numpy

%description
# Light_FAMD

`Light_FAMD` is a library for processing [factor analysis of mixed data](https://www.wikiwand.com/en/Factor_analysis). This includes a variety of methods, including [principal component analysis (PCA)](https://www.wikiwand.com/en/Principal_component_analysis) and [multiple correspondence analysis (MCA)](https://www.researchgate.net/publication/239542271_Multiple_Correspondence_Analysis). The goal is to provide an efficient and light implementation of each algorithm along with a scikit-learn API.

## Table of contents

- [Usage](#usage)
- [Guidelines](#guidelines)
- [Principal component analysis (PCA)](#principal-component-analysis-pca)
- [Correspondence analysis (CA)](#correspondence-analysis-ca)
- [Multiple correspondence analysis (MCA)](#multiple-correspondence-analysis-mca)
- [Multiple factor analysis (MFA)](#multiple-factor-analysis-mfa)
- [Factor analysis of mixed data (FAMD)](#factor-analysis-of-mixed-data-famd)
- [Going faster](#going-faster)

`Light_FAMD` doesn't have any extra dependencies apart from the usual suspects (`sklearn`, `pandas`, `numpy`), which are included with Anaconda.

### Guidelines

Each base estimator (`CA`, `PCA`) provided by `Light_FAMD` extends scikit-learn's `TransformerMixin` and `BaseEstimator`, which means the `fit_transform`, `set_params` and `get_params` methods can be used directly.

Under the hood `Light_FAMD` uses a [randomised version of SVD](https://scikit-learn.org/dev/modules/generated/sklearn.utils.extmath.randomized_svd.html).
This algorithm finds a (usually very good) approximate truncated singular value decomposition using randomization to speed up the computations. It is particularly fast on large matrices from which you only want to extract a small number of components. To obtain a further speed-up, `n_iter` can be set to 2 or less, at the cost of some precision. If you want reproducible results, set the `random_state` parameter.

The randomised version of SVD is an iterative method. Because each of `light_famd`'s algorithms uses SVD, they all have an `n_iter` parameter which controls the number of iterations used for computing the SVD. On the one hand, the higher `n_iter` is, the more precise the results will be. On the other hand, increasing `n_iter` increases the computation time. In general the algorithm converges very quickly, so using a low `n_iter` (the default behaviour) is recommended.

In this package the inheritance relationships are as shown below (A -> B: A is the superclass of B):

- PCA -> MFA -> FAMD
- CA -> MCA

Choose the method that matches your situation:

- All your variables are numeric: use principal component analysis (`PCA`)
- You have a contingency table: use correspondence analysis (`CA`)
- You have more than two variables and they are all categorical: use multiple correspondence analysis (`MCA`)
- You have groups of categorical **or** numerical variables: use multiple factor analysis (`MFA`)
- You have both categorical and numerical variables: use factor analysis of mixed data (`FAMD`)

The next subsections give an overview of each method along with usage information.
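The `n_iter`/`random_state` behaviour described above can be seen directly with scikit-learn's `randomized_svd`, the routine `Light_FAMD` builds on. A minimal standalone sketch on random data (it does not require `light_famd` itself):

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

# Random 100 x 50 matrix standing in for a preprocessed data table.
rng = np.random.RandomState(0)
X = rng.rand(100, 50)

# Approximate truncated SVD keeping the top 2 components.
# `n_iter` trades precision for speed; a fixed `random_state`
# makes the randomised result reproducible.
U, s, Vt = randomized_svd(X, n_components=2, n_iter=3, random_state=42)
print(U.shape, s.shape, Vt.shape)  # (100, 2) (2,) (2, 50)

# Same seed -> identical results on a second run.
U2, s2, Vt2 = randomized_svd(X, n_components=2, n_iter=3, random_state=42)
print(np.allclose(s, s2))  # True
```

With `random_state=None` the two runs would generally differ slightly, which is why the guidelines above recommend fixing the seed for reproducible results.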
The following papers give a good overview of the field of factor analysis if you want to go deeper:

- [A Tutorial on Principal Component Analysis](https://arxiv.org/pdf/1404.1100.pdf)
- [Theory of Correspondence Analysis](http://statmath.wu.ac.at/courses/CAandRelMeth/caipA.pdf)
- [Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions](https://arxiv.org/pdf/0909.4061.pdf)
- [Computation of Multiple Correspondence Analysis, with code in R](https://core.ac.uk/download/pdf/6591520.pdf)
- [Singular Value Decomposition Tutorial](https://davetang.org/file/Singular_Value_Decomposition_Tutorial.pdf)
- [Multiple Factor Analysis](https://www.utdallas.edu/~herve/Abdi-MFA2007-pretty.pdf)

Note that `Light_FAMD` doesn't support sparse input; see [Truncated_FAMD](https://github.com/Cauchemare/Truncated_FAMD) for an alternative that handles sparse and big data.

### Principal-Component-Analysis: PCA

**PCA**(rescale_with_mean=True, rescale_with_std=True, n_components=2, n_iter=3, copy=True, check_input=True, random_state=None, engine='auto'):

**Args:**
- `rescale_with_mean` (bool): Whether to subtract each column's mean or not.
- `rescale_with_std` (bool): Whether to divide each column by its standard deviation or not.
- `n_components` (int): The number of principal components to compute.
- `n_iter` (int): The number of iterations used for computing the SVD.
- `copy` (bool): Whether to perform the computations in place or not.
- `check_input` (bool): Whether to check the consistency of the inputs or not.
- `engine` (string): `"auto"` for scikit-learn's `randomized_svd`, `"fbpca"` for Facebook's randomized SVD implementation.
- `random_state` (int, RandomState instance or None, optional, default=None): The seed of the pseudo-random number generator to use when shuffling the data.
If int, `random_state` is the seed used by the random number generator; if a RandomState instance, it is the random number generator; if None, the random number generator is the RandomState instance used by `np.random`.

Returns: ndarray of shape (M, k), where M is the number of samples and k is the number of components.

**Examples:**

```
>>> import numpy as np
>>> import pandas as pd
>>> np.random.seed(42)  # This is for doctest reproducibility
>>> from light_famd import PCA
>>> X = pd.DataFrame(np.random.randint(0, 10, size=(10, 3)), columns=list('ABC'))
>>> pca = PCA(n_components=2)
>>> pca.fit(X)
PCA(check_input=True, copy=True, engine='auto', n_components=2, n_iter=3,
    random_state=None, rescale_with_mean=True, rescale_with_std=True)
>>> print(pca.explained_variance_)
[20.20385109  8.48246239]
>>> print(pca.explained_variance_ratio_)
[0.6734617029875277, 0.28274874633810754]
>>> print(pca.column_correlation(X))  # Pearson correlation between each component and each original column; entries with p-value >= 0.05 are NaN.
          0        1
A -0.953482      NaN
B  0.907314      NaN
C       NaN  0.84211
>>> print(pca.transform(X))
[[-0.82262005  0.11730656]
 [ 0.05359079  1.62298683]
 [ 1.03052849  0.79973099]
 [-0.24313366  0.25651395]
 [-0.94630387 -1.04943025]
 [-0.70591749 -0.01282583]
 [-0.39948373 -1.52612436]
 [ 2.70164194  0.38048482]
 [-2.49373351  0.53655273]
 [ 1.8254311  -1.12519545]]
>>> print(pca.fit_transform(X))
[[-0.82262005  0.11730656]
 [ 0.05359079  1.62298683]
 [ 1.03052849  0.79973099]
 [-0.24313366  0.25651395]
 [-0.94630387 -1.04943025]
 [-0.70591749 -0.01282583]
 [-0.39948373 -1.52612436]
 [ 2.70164194  0.38048482]
 [-2.49373351  0.53655273]
 [ 1.8254311  -1.12519545]]
```

### Correspondence-Analysis: CA

**CA**(n_components=2, n_iter=10, copy=True, check_input=True, random_state=None, engine='auto'):

**Args:**
- `n_components` (int): The number of principal components to compute.
- `n_iter` (int): The number of iterations used for computing the SVD.
- `copy` (bool): Whether to perform the computations in place or not.
- `check_input` (bool): Whether to check the consistency of the inputs or not.
- `engine` (string): `"auto"` for scikit-learn's `randomized_svd`, `"fbpca"` for Facebook's randomized SVD implementation.
- `random_state` (int, RandomState instance or None, optional, default=None): The seed of the pseudo-random number generator to use when shuffling the data. If int, `random_state` is the seed used by the random number generator; if a RandomState instance, it is the random number generator; if None, the random number generator is the RandomState instance used by `np.random`.

Returns: ndarray of shape (M, k), where M is the number of samples and k is the number of components.

**Examples:**

```
>>> import numpy as np
>>> import pandas as pd
>>> from light_famd import CA
>>> X = pd.DataFrame(data=np.random.randint(0, 100, size=(10, 4)), columns=list('ABCD'))
>>> ca = CA(n_components=2, n_iter=2)
>>> ca.fit(X)
CA(check_input=True, copy=True, engine='auto', n_components=2, n_iter=2,
   random_state=None)
>>> print(ca.explained_variance_)
[0.16892141 0.0746376 ]
>>> print(ca.explained_variance_ratio_)
[0.5650580210934917, 0.2496697790527281]
>>> print(ca.transform(X))
[[ 0.23150854 -0.39167802]
 [ 0.36006095  0.00301414]
 [-0.48192602 -0.13002647]
 [-0.06333533 -0.21475652]
 [-0.16438708 -0.10418312]
 [-0.38129126 -0.16515196]
 [ 0.2721296   0.46923757]
 [ 0.82953753  0.20638333]
 [-0.500007    0.36897935]
 [ 0.57932474 -0.1023383 ]]
>>> print(ca.fit_transform(X))
[[ 0.23150854 -0.39167802]
 [ 0.36006095  0.00301414]
 [-0.48192602 -0.13002647]
 [-0.06333533 -0.21475652]
 [-0.16438708 -0.10418312]
 [-0.38129126 -0.16515196]
 [ 0.2721296   0.46923757]
 [ 0.82953753  0.20638333]
 [-0.500007    0.36897935]
 [ 0.57932474 -0.1023383 ]]
```

### Multiple-Correspondence-Analysis: MCA

The `MCA` class inherits from the `CA` class.
```
>>> import numpy as np
>>> import pandas as pd
>>> from light_famd import MCA
>>> X = pd.DataFrame(np.random.choice(list('abcde'), size=(10, 4), replace=True), columns=list('ABCD'))
>>> print(X)
   A  B  C  D
0  d  e  a  d
1  e  d  b  b
2  e  d  a  e
3  b  b  e  d
4  b  d  b  b
5  c  b  a  e
6  e  d  b  a
7  d  c  d  d
8  b  c  d  a
9  a  e  c  c
>>> mca = MCA(n_components=2)
>>> mca.fit(X)
MCA(check_input=True, copy=True, engine='auto', n_components=2, n_iter=10,
    random_state=None)
>>> print(mca.explained_variance_)
[0.90150495 0.76979456]
>>> print(mca.explained_variance_ratio_)
[0.24040131974598467, 0.20527854948955893]
>>> print(mca.transform(X))
[[ 0.55603013  0.7016272 ]
 [-0.73558629 -1.17559462]
 [-0.44972794 -0.4973024 ]
 [-0.16248444  0.95706908]
 [-0.66969377 -0.79951057]
 [-0.21267777  0.39953562]
 [-0.67921667 -0.8707747 ]
 [ 0.05058625  1.34573057]
 [-0.31952341  0.77285922]
 [ 2.62229391 -0.83363941]]
>>> print(mca.fit_transform(X))
[[ 0.55603013  0.7016272 ]
 [-0.73558629 -1.17559462]
 [-0.44972794 -0.4973024 ]
 [-0.16248444  0.95706908]
 [-0.66969377 -0.79951057]
 [-0.21267777  0.39953562]
 [-0.67921667 -0.8707747 ]
 [ 0.05058625  1.34573057]
 [-0.31952341  0.77285922]
 [ 2.62229391 -0.83363941]]
```

### Multiple-Factor-Analysis: MFA

The `MFA` class inherits from the `PCA` class. Since `FAMD` inherits from `MFA`, and the only difference from its superclass is that `FAMD` determines the `groups` parameter itself, we skip this section and go directly to `FAMD`.

### Factor-Analysis-of-Mixed-Data: FAMD

`FAMD` inherits from the `MFA` class, so you have access to all of `MFA`'s methods and properties.
```
>>> import numpy as np
>>> import pandas as pd
>>> from light_famd import FAMD
>>> X_n = pd.DataFrame(data=np.random.randint(0, 100, size=(10, 2)), columns=list('AB'))
>>> X_c = pd.DataFrame(np.random.choice(list('abcde'), size=(10, 4), replace=True), columns=list('CDEF'))
>>> X = pd.concat([X_n, X_c], axis=1)
>>> print(X)
    A   B  C  D  E  F
0  96  19  b  d  b  e
1  11  46  b  d  a  e
2   0  89  a  a  a  c
3  13  63  c  a  e  d
4  37  36  d  b  e  c
5  10  99  a  b  d  c
6  76   2  c  a  d  e
7  32   5  c  a  e  d
8  49   9  c  e  e  e
9   4  22  c  c  b  d
>>> famd = FAMD(n_components=2)
>>> famd.fit(X)
MCA PROCESS
MCA PROCESS ELIMINATED 0 COLUMNS SINCE THEIR MISS_RATES >= 99%
FAMD(check_input=True, copy=False, engine='auto', n_components=2, n_iter=2,
     random_state=None)
>>> print(famd.explained_variance_)
[17.40871219  9.73440949]
>>> print(famd.explained_variance_ratio_)
[0.32596621039327284, 0.1822701494502082]
>>> print(famd.column_correlation(X))
            0         1
A         NaN       NaN
B         NaN       NaN
C_a       NaN       NaN
C_b       NaN  0.824458
C_c  0.922220       NaN
C_d       NaN       NaN
D_a       NaN       NaN
D_b       NaN       NaN
D_c       NaN       NaN
D_d       NaN  0.824458
D_e       NaN       NaN
E_a       NaN       NaN
E_b       NaN       NaN
E_d       NaN       NaN
E_e       NaN       NaN
F_c       NaN -0.714447
F_d  0.673375       NaN
F_e       NaN  0.839324
>>> print(famd.transform(X))
[[ 2.23848136  5.75809647]
 [ 2.0845175   4.78930072]
 [ 2.6682068  -2.78991262]
 [ 6.2962962  -1.57451325]
 [ 2.52140085 -3.28279729]
 [ 1.58256681 -3.73135011]
 [ 5.19476759  1.18333717]
 [ 6.35288446 -1.33186723]
 [ 5.02971134  1.6216402 ]
 [ 4.05754963  0.69620997]]
>>> print(famd.fit_transform(X))
MCA PROCESS HAVE ELIMINATE 0 COLUMNS SINCE ITS MISSING RATE >= 99%
[[ 2.23848136  5.75809647]
 [ 2.0845175   4.78930072]
 [ 2.6682068  -2.78991262]
 [ 6.2962962  -1.57451325]
 [ 2.52140085 -3.28279729]
 [ 1.58256681 -3.73135011]
 [ 5.19476759  1.18333717]
 [ 6.35288446 -1.33186723]
 [ 5.02971134  1.6216402 ]
 [ 4.05754963  0.69620997]]
```

## Going faster

By default `light_famd` uses `sklearn`'s randomized SVD implementation. One of the goals of `Light_FAMD` is to make it possible to use a different SVD backend.
For now the only other supported backend is [Facebook's randomized SVD implementation](https://research.facebook.com/blog/fast-randomized-svd/), called [fbpca](http://fbpca.readthedocs.org/en/latest/). You can use it by setting the `engine` parameter to `'fbpca'`, or see [Truncated_FAMD](https://github.com/Cauchemare/Truncated_FAMD) for an alternative that selects the SVD solver automatically depending on the structure of the input:

```python
>>> import light_famd
>>> pca = light_famd.PCA(engine='fbpca')
```

%package -n python3-light-famd
Summary:        Light Factor Analysis of Mixed Data
Provides:       python-light-famd
BuildRequires:  python3-devel
BuildRequires:  python3-setuptools
BuildRequires:  python3-pip

%description -n python3-light-famd
# Light_FAMD

`Light_FAMD` is a library for processing [factor analysis of mixed data](https://www.wikiwand.com/en/Factor_analysis). This includes a variety of methods, including [principal component analysis (PCA)](https://www.wikiwand.com/en/Principal_component_analysis) and [multiple correspondence analysis (MCA)](https://www.researchgate.net/publication/239542271_Multiple_Correspondence_Analysis). The goal is to provide an efficient and light implementation of each algorithm along with a scikit-learn API.

## Table of contents

- [Usage](#usage)
- [Guidelines](#guidelines)
- [Principal component analysis (PCA)](#principal-component-analysis-pca)
- [Correspondence analysis (CA)](#correspondence-analysis-ca)
- [Multiple correspondence analysis (MCA)](#multiple-correspondence-analysis-mca)
- [Multiple factor analysis (MFA)](#multiple-factor-analysis-mfa)
- [Factor analysis of mixed data (FAMD)](#factor-analysis-of-mixed-data-famd)
- [Going faster](#going-faster)

`Light_FAMD` doesn't have any extra dependencies apart from the usual suspects (`sklearn`, `pandas`, `numpy`), which are included with Anaconda.
### Guidelines

Each base estimator (`CA`, `PCA`) provided by `Light_FAMD` extends scikit-learn's `TransformerMixin` and `BaseEstimator`, which means the `fit_transform`, `set_params` and `get_params` methods can be used directly.

Under the hood `Light_FAMD` uses a [randomised version of SVD](https://scikit-learn.org/dev/modules/generated/sklearn.utils.extmath.randomized_svd.html). This algorithm finds a (usually very good) approximate truncated singular value decomposition using randomization to speed up the computations. It is particularly fast on large matrices from which you only want to extract a small number of components. To obtain a further speed-up, `n_iter` can be set to 2 or less, at the cost of some precision. If you want reproducible results, set the `random_state` parameter.

The randomised version of SVD is an iterative method. Because each of `light_famd`'s algorithms uses SVD, they all have an `n_iter` parameter which controls the number of iterations used for computing the SVD. On the one hand, the higher `n_iter` is, the more precise the results will be. On the other hand, increasing `n_iter` increases the computation time. In general the algorithm converges very quickly, so using a low `n_iter` (the default behaviour) is recommended.

In this package the inheritance relationships are as shown below (A -> B: A is the superclass of B):

- PCA -> MFA -> FAMD
- CA -> MCA

Choose the method that matches your situation:

- All your variables are numeric: use principal component analysis (`PCA`)
- You have a contingency table: use correspondence analysis (`CA`)
- You have more than two variables and they are all categorical: use multiple correspondence analysis (`MCA`)
- You have groups of categorical **or** numerical variables: use multiple factor analysis (`MFA`)
- You have both categorical and numerical variables: use factor analysis of mixed data (`FAMD`)

The next subsections give an overview of each method along with usage information.
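The `n_iter`/`random_state` behaviour described above can be seen directly with scikit-learn's `randomized_svd`, the routine `Light_FAMD` builds on. A minimal standalone sketch on random data (it does not require `light_famd` itself):

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

# Random 100 x 50 matrix standing in for a preprocessed data table.
rng = np.random.RandomState(0)
X = rng.rand(100, 50)

# Approximate truncated SVD keeping the top 2 components.
# `n_iter` trades precision for speed; a fixed `random_state`
# makes the randomised result reproducible.
U, s, Vt = randomized_svd(X, n_components=2, n_iter=3, random_state=42)
print(U.shape, s.shape, Vt.shape)  # (100, 2) (2,) (2, 50)

# Same seed -> identical results on a second run.
U2, s2, Vt2 = randomized_svd(X, n_components=2, n_iter=3, random_state=42)
print(np.allclose(s, s2))  # True
```

With `random_state=None` the two runs would generally differ slightly, which is why the guidelines above recommend fixing the seed for reproducible results.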
The following papers give a good overview of the field of factor analysis if you want to go deeper:

- [A Tutorial on Principal Component Analysis](https://arxiv.org/pdf/1404.1100.pdf)
- [Theory of Correspondence Analysis](http://statmath.wu.ac.at/courses/CAandRelMeth/caipA.pdf)
- [Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions](https://arxiv.org/pdf/0909.4061.pdf)
- [Computation of Multiple Correspondence Analysis, with code in R](https://core.ac.uk/download/pdf/6591520.pdf)
- [Singular Value Decomposition Tutorial](https://davetang.org/file/Singular_Value_Decomposition_Tutorial.pdf)
- [Multiple Factor Analysis](https://www.utdallas.edu/~herve/Abdi-MFA2007-pretty.pdf)

Note that `Light_FAMD` doesn't support sparse input; see [Truncated_FAMD](https://github.com/Cauchemare/Truncated_FAMD) for an alternative that handles sparse and big data.

### Principal-Component-Analysis: PCA

**PCA**(rescale_with_mean=True, rescale_with_std=True, n_components=2, n_iter=3, copy=True, check_input=True, random_state=None, engine='auto'):

**Args:**
- `rescale_with_mean` (bool): Whether to subtract each column's mean or not.
- `rescale_with_std` (bool): Whether to divide each column by its standard deviation or not.
- `n_components` (int): The number of principal components to compute.
- `n_iter` (int): The number of iterations used for computing the SVD.
- `copy` (bool): Whether to perform the computations in place or not.
- `check_input` (bool): Whether to check the consistency of the inputs or not.
- `engine` (string): `"auto"` for scikit-learn's `randomized_svd`, `"fbpca"` for Facebook's randomized SVD implementation.
- `random_state` (int, RandomState instance or None, optional, default=None): The seed of the pseudo-random number generator to use when shuffling the data.
If int, `random_state` is the seed used by the random number generator; if a RandomState instance, it is the random number generator; if None, the random number generator is the RandomState instance used by `np.random`.

Returns: ndarray of shape (M, k), where M is the number of samples and k is the number of components.

**Examples:**

```
>>> import numpy as np
>>> import pandas as pd
>>> np.random.seed(42)  # This is for doctest reproducibility
>>> from light_famd import PCA
>>> X = pd.DataFrame(np.random.randint(0, 10, size=(10, 3)), columns=list('ABC'))
>>> pca = PCA(n_components=2)
>>> pca.fit(X)
PCA(check_input=True, copy=True, engine='auto', n_components=2, n_iter=3,
    random_state=None, rescale_with_mean=True, rescale_with_std=True)
>>> print(pca.explained_variance_)
[20.20385109  8.48246239]
>>> print(pca.explained_variance_ratio_)
[0.6734617029875277, 0.28274874633810754]
>>> print(pca.column_correlation(X))  # Pearson correlation between each component and each original column; entries with p-value >= 0.05 are NaN.
          0        1
A -0.953482      NaN
B  0.907314      NaN
C       NaN  0.84211
>>> print(pca.transform(X))
[[-0.82262005  0.11730656]
 [ 0.05359079  1.62298683]
 [ 1.03052849  0.79973099]
 [-0.24313366  0.25651395]
 [-0.94630387 -1.04943025]
 [-0.70591749 -0.01282583]
 [-0.39948373 -1.52612436]
 [ 2.70164194  0.38048482]
 [-2.49373351  0.53655273]
 [ 1.8254311  -1.12519545]]
>>> print(pca.fit_transform(X))
[[-0.82262005  0.11730656]
 [ 0.05359079  1.62298683]
 [ 1.03052849  0.79973099]
 [-0.24313366  0.25651395]
 [-0.94630387 -1.04943025]
 [-0.70591749 -0.01282583]
 [-0.39948373 -1.52612436]
 [ 2.70164194  0.38048482]
 [-2.49373351  0.53655273]
 [ 1.8254311  -1.12519545]]
```

### Correspondence-Analysis: CA

**CA**(n_components=2, n_iter=10, copy=True, check_input=True, random_state=None, engine='auto'):

**Args:**
- `n_components` (int): The number of principal components to compute.
- `n_iter` (int): The number of iterations used for computing the SVD.
- `copy` (bool): Whether to perform the computations in place or not.
- `check_input` (bool): Whether to check the consistency of the inputs or not.
- `engine` (string): `"auto"` for scikit-learn's `randomized_svd`, `"fbpca"` for Facebook's randomized SVD implementation.
- `random_state` (int, RandomState instance or None, optional, default=None): The seed of the pseudo-random number generator to use when shuffling the data. If int, `random_state` is the seed used by the random number generator; if a RandomState instance, it is the random number generator; if None, the random number generator is the RandomState instance used by `np.random`.

Returns: ndarray of shape (M, k), where M is the number of samples and k is the number of components.

**Examples:**

```
>>> import numpy as np
>>> import pandas as pd
>>> from light_famd import CA
>>> X = pd.DataFrame(data=np.random.randint(0, 100, size=(10, 4)), columns=list('ABCD'))
>>> ca = CA(n_components=2, n_iter=2)
>>> ca.fit(X)
CA(check_input=True, copy=True, engine='auto', n_components=2, n_iter=2,
   random_state=None)
>>> print(ca.explained_variance_)
[0.16892141 0.0746376 ]
>>> print(ca.explained_variance_ratio_)
[0.5650580210934917, 0.2496697790527281]
>>> print(ca.transform(X))
[[ 0.23150854 -0.39167802]
 [ 0.36006095  0.00301414]
 [-0.48192602 -0.13002647]
 [-0.06333533 -0.21475652]
 [-0.16438708 -0.10418312]
 [-0.38129126 -0.16515196]
 [ 0.2721296   0.46923757]
 [ 0.82953753  0.20638333]
 [-0.500007    0.36897935]
 [ 0.57932474 -0.1023383 ]]
>>> print(ca.fit_transform(X))
[[ 0.23150854 -0.39167802]
 [ 0.36006095  0.00301414]
 [-0.48192602 -0.13002647]
 [-0.06333533 -0.21475652]
 [-0.16438708 -0.10418312]
 [-0.38129126 -0.16515196]
 [ 0.2721296   0.46923757]
 [ 0.82953753  0.20638333]
 [-0.500007    0.36897935]
 [ 0.57932474 -0.1023383 ]]
```

### Multiple-Correspondence-Analysis: MCA

The `MCA` class inherits from the `CA` class.
```
>>> import numpy as np
>>> import pandas as pd
>>> from light_famd import MCA
>>> X = pd.DataFrame(np.random.choice(list('abcde'), size=(10, 4), replace=True), columns=list('ABCD'))
>>> print(X)
   A  B  C  D
0  d  e  a  d
1  e  d  b  b
2  e  d  a  e
3  b  b  e  d
4  b  d  b  b
5  c  b  a  e
6  e  d  b  a
7  d  c  d  d
8  b  c  d  a
9  a  e  c  c
>>> mca = MCA(n_components=2)
>>> mca.fit(X)
MCA(check_input=True, copy=True, engine='auto', n_components=2, n_iter=10,
    random_state=None)
>>> print(mca.explained_variance_)
[0.90150495 0.76979456]
>>> print(mca.explained_variance_ratio_)
[0.24040131974598467, 0.20527854948955893]
>>> print(mca.transform(X))
[[ 0.55603013  0.7016272 ]
 [-0.73558629 -1.17559462]
 [-0.44972794 -0.4973024 ]
 [-0.16248444  0.95706908]
 [-0.66969377 -0.79951057]
 [-0.21267777  0.39953562]
 [-0.67921667 -0.8707747 ]
 [ 0.05058625  1.34573057]
 [-0.31952341  0.77285922]
 [ 2.62229391 -0.83363941]]
>>> print(mca.fit_transform(X))
[[ 0.55603013  0.7016272 ]
 [-0.73558629 -1.17559462]
 [-0.44972794 -0.4973024 ]
 [-0.16248444  0.95706908]
 [-0.66969377 -0.79951057]
 [-0.21267777  0.39953562]
 [-0.67921667 -0.8707747 ]
 [ 0.05058625  1.34573057]
 [-0.31952341  0.77285922]
 [ 2.62229391 -0.83363941]]
```

### Multiple-Factor-Analysis: MFA

The `MFA` class inherits from the `PCA` class. Since `FAMD` inherits from `MFA`, and the only difference from its superclass is that `FAMD` determines the `groups` parameter itself, we skip this section and go directly to `FAMD`.

### Factor-Analysis-of-Mixed-Data: FAMD

`FAMD` inherits from the `MFA` class, so you have access to all of `MFA`'s methods and properties.
```
>>> import numpy as np
>>> import pandas as pd
>>> from light_famd import FAMD
>>> X_n = pd.DataFrame(data=np.random.randint(0, 100, size=(10, 2)), columns=list('AB'))
>>> X_c = pd.DataFrame(np.random.choice(list('abcde'), size=(10, 4), replace=True), columns=list('CDEF'))
>>> X = pd.concat([X_n, X_c], axis=1)
>>> print(X)
    A   B  C  D  E  F
0  96  19  b  d  b  e
1  11  46  b  d  a  e
2   0  89  a  a  a  c
3  13  63  c  a  e  d
4  37  36  d  b  e  c
5  10  99  a  b  d  c
6  76   2  c  a  d  e
7  32   5  c  a  e  d
8  49   9  c  e  e  e
9   4  22  c  c  b  d
>>> famd = FAMD(n_components=2)
>>> famd.fit(X)
MCA PROCESS
MCA PROCESS ELIMINATED 0 COLUMNS SINCE THEIR MISS_RATES >= 99%
FAMD(check_input=True, copy=False, engine='auto', n_components=2, n_iter=2,
     random_state=None)
>>> print(famd.explained_variance_)
[17.40871219  9.73440949]
>>> print(famd.explained_variance_ratio_)
[0.32596621039327284, 0.1822701494502082]
>>> print(famd.column_correlation(X))
            0         1
A         NaN       NaN
B         NaN       NaN
C_a       NaN       NaN
C_b       NaN  0.824458
C_c  0.922220       NaN
C_d       NaN       NaN
D_a       NaN       NaN
D_b       NaN       NaN
D_c       NaN       NaN
D_d       NaN  0.824458
D_e       NaN       NaN
E_a       NaN       NaN
E_b       NaN       NaN
E_d       NaN       NaN
E_e       NaN       NaN
F_c       NaN -0.714447
F_d  0.673375       NaN
F_e       NaN  0.839324
>>> print(famd.transform(X))
[[ 2.23848136  5.75809647]
 [ 2.0845175   4.78930072]
 [ 2.6682068  -2.78991262]
 [ 6.2962962  -1.57451325]
 [ 2.52140085 -3.28279729]
 [ 1.58256681 -3.73135011]
 [ 5.19476759  1.18333717]
 [ 6.35288446 -1.33186723]
 [ 5.02971134  1.6216402 ]
 [ 4.05754963  0.69620997]]
>>> print(famd.fit_transform(X))
MCA PROCESS HAVE ELIMINATE 0 COLUMNS SINCE ITS MISSING RATE >= 99%
[[ 2.23848136  5.75809647]
 [ 2.0845175   4.78930072]
 [ 2.6682068  -2.78991262]
 [ 6.2962962  -1.57451325]
 [ 2.52140085 -3.28279729]
 [ 1.58256681 -3.73135011]
 [ 5.19476759  1.18333717]
 [ 6.35288446 -1.33186723]
 [ 5.02971134  1.6216402 ]
 [ 4.05754963  0.69620997]]
```

## Going faster

By default `light_famd` uses `sklearn`'s randomized SVD implementation. One of the goals of `Light_FAMD` is to make it possible to use a different SVD backend.
For now the only other supported backend is [Facebook's randomized SVD implementation](https://research.facebook.com/blog/fast-randomized-svd/), called [fbpca](http://fbpca.readthedocs.org/en/latest/). You can use it by setting the `engine` parameter to `'fbpca'`, or see [Truncated_FAMD](https://github.com/Cauchemare/Truncated_FAMD) for an alternative that selects the SVD solver automatically depending on the structure of the input:

```python
>>> import light_famd
>>> pca = light_famd.PCA(engine='fbpca')
```

%package help
Summary:        Development documents and examples for light-famd
Provides:       python3-light-famd-doc

%description help
# Light_FAMD

`Light_FAMD` is a library for processing [factor analysis of mixed data](https://www.wikiwand.com/en/Factor_analysis). This includes a variety of methods, including [principal component analysis (PCA)](https://www.wikiwand.com/en/Principal_component_analysis) and [multiple correspondence analysis (MCA)](https://www.researchgate.net/publication/239542271_Multiple_Correspondence_Analysis). The goal is to provide an efficient and light implementation of each algorithm along with a scikit-learn API.

## Table of contents

- [Usage](#usage)
- [Guidelines](#guidelines)
- [Principal component analysis (PCA)](#principal-component-analysis-pca)
- [Correspondence analysis (CA)](#correspondence-analysis-ca)
- [Multiple correspondence analysis (MCA)](#multiple-correspondence-analysis-mca)
- [Multiple factor analysis (MFA)](#multiple-factor-analysis-mfa)
- [Factor analysis of mixed data (FAMD)](#factor-analysis-of-mixed-data-famd)
- [Going faster](#going-faster)

`Light_FAMD` doesn't have any extra dependencies apart from the usual suspects (`sklearn`, `pandas`, `numpy`), which are included with Anaconda.

### Guidelines

Each base estimator (`CA`, `PCA`) provided by `Light_FAMD` extends scikit-learn's `TransformerMixin` and `BaseEstimator`, which means the `fit_transform`, `set_params` and `get_params` methods can be used directly.
Under the hood `Light_FAMD` uses a [randomised version of SVD](https://scikit-learn.org/dev/modules/generated/sklearn.utils.extmath.randomized_svd.html). This algorithm finds a (usually very good) approximate truncated singular value decomposition using randomization to speed up the computations. It is particularly fast on large matrices from which you only want to extract a small number of components. To obtain a further speed-up, `n_iter` can be set to 2 or less, at the cost of some precision. If you want reproducible results, set the `random_state` parameter.

The randomised version of SVD is an iterative method. Because each of `light_famd`'s algorithms uses SVD, they all have an `n_iter` parameter which controls the number of iterations used for computing the SVD. On the one hand, the higher `n_iter` is, the more precise the results will be. On the other hand, increasing `n_iter` increases the computation time. In general the algorithm converges very quickly, so using a low `n_iter` (the default behaviour) is recommended.

In this package the inheritance relationships are as shown below (A -> B: A is the superclass of B):

- PCA -> MFA -> FAMD
- CA -> MCA

Choose the method that matches your situation:

- All your variables are numeric: use principal component analysis (`PCA`)
- You have a contingency table: use correspondence analysis (`CA`)
- You have more than two variables and they are all categorical: use multiple correspondence analysis (`MCA`)
- You have groups of categorical **or** numerical variables: use multiple factor analysis (`MFA`)
- You have both categorical and numerical variables: use factor analysis of mixed data (`FAMD`)

The next subsections give an overview of each method along with usage information.
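The `n_iter`/`random_state` behaviour described above can be seen directly with scikit-learn's `randomized_svd`, the routine `Light_FAMD` builds on. A minimal standalone sketch on random data (it does not require `light_famd` itself):

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

# Random 100 x 50 matrix standing in for a preprocessed data table.
rng = np.random.RandomState(0)
X = rng.rand(100, 50)

# Approximate truncated SVD keeping the top 2 components.
# `n_iter` trades precision for speed; a fixed `random_state`
# makes the randomised result reproducible.
U, s, Vt = randomized_svd(X, n_components=2, n_iter=3, random_state=42)
print(U.shape, s.shape, Vt.shape)  # (100, 2) (2,) (2, 50)

# Same seed -> identical results on a second run.
U2, s2, Vt2 = randomized_svd(X, n_components=2, n_iter=3, random_state=42)
print(np.allclose(s, s2))  # True
```

With `random_state=None` the two runs would generally differ slightly, which is why the guidelines above recommend fixing the seed for reproducible results.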
The following papers give a good overview of the field of factor analysis if you want to go deeper:

- [A Tutorial on Principal Component Analysis](https://arxiv.org/pdf/1404.1100.pdf)
- [Theory of Correspondence Analysis](http://statmath.wu.ac.at/courses/CAandRelMeth/caipA.pdf)
- [Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions](https://arxiv.org/pdf/0909.4061.pdf)
- [Computation of Multiple Correspondence Analysis, with code in R](https://core.ac.uk/download/pdf/6591520.pdf)
- [Singular Value Decomposition Tutorial](https://davetang.org/file/Singular_Value_Decomposition_Tutorial.pdf)
- [Multiple Factor Analysis](https://www.utdallas.edu/~herve/Abdi-MFA2007-pretty.pdf)

Note that `Light_FAMD` doesn't support sparse input; see [Truncated_FAMD](https://github.com/Cauchemare/Truncated_FAMD) for an alternative that handles sparse and big data.

### Principal-Component-Analysis: PCA

**PCA**(rescale_with_mean=True, rescale_with_std=True, n_components=2, n_iter=3, copy=True, check_input=True, random_state=None, engine='auto'):

**Args:**
- `rescale_with_mean` (bool): Whether to subtract each column's mean or not.
- `rescale_with_std` (bool): Whether to divide each column by its standard deviation or not.
- `n_components` (int): The number of principal components to compute.
- `n_iter` (int): The number of iterations used for computing the SVD.
- `copy` (bool): Whether to perform the computations in place or not.
- `check_input` (bool): Whether to check the consistency of the inputs or not.
- `engine` (string): `"auto"` for scikit-learn's `randomized_svd`, `"fbpca"` for Facebook's randomized SVD implementation.
- `random_state` (int, RandomState instance or None, optional, default=None): The seed of the pseudo-random number generator to use when shuffling the data.
If int, `random_state` is the seed used by the random number generator; if a RandomState instance, it is the random number generator; if None, the random number generator is the RandomState instance used by `np.random`.

Returns: ndarray of shape (M, k), where M is the number of samples and k is the number of components.

**Examples:**

```
>>> import numpy as np
>>> import pandas as pd
>>> np.random.seed(42)  # This is for doctest reproducibility
>>> from light_famd import PCA
>>> X = pd.DataFrame(np.random.randint(0, 10, size=(10, 3)), columns=list('ABC'))
>>> pca = PCA(n_components=2)
>>> pca.fit(X)
PCA(check_input=True, copy=True, engine='auto', n_components=2, n_iter=3,
    random_state=None, rescale_with_mean=True, rescale_with_std=True)
>>> print(pca.explained_variance_)
[20.20385109  8.48246239]
>>> print(pca.explained_variance_ratio_)
[0.6734617029875277, 0.28274874633810754]
>>> print(pca.column_correlation(X))  # Pearson correlation between each component and each original column; entries with p-value >= 0.05 are NaN.
          0        1
A -0.953482      NaN
B  0.907314      NaN
C       NaN  0.84211
>>> print(pca.transform(X))
[[-0.82262005  0.11730656]
 [ 0.05359079  1.62298683]
 [ 1.03052849  0.79973099]
 [-0.24313366  0.25651395]
 [-0.94630387 -1.04943025]
 [-0.70591749 -0.01282583]
 [-0.39948373 -1.52612436]
 [ 2.70164194  0.38048482]
 [-2.49373351  0.53655273]
 [ 1.8254311  -1.12519545]]
>>> print(pca.fit_transform(X))
[[-0.82262005  0.11730656]
 [ 0.05359079  1.62298683]
 [ 1.03052849  0.79973099]
 [-0.24313366  0.25651395]
 [-0.94630387 -1.04943025]
 [-0.70591749 -0.01282583]
 [-0.39948373 -1.52612436]
 [ 2.70164194  0.38048482]
 [-2.49373351  0.53655273]
 [ 1.8254311  -1.12519545]]
```

### Correspondence-Analysis: CA

**CA**(n_components=2, n_iter=10, copy=True, check_input=True, random_state=None, engine='auto'):

**Args:**
- `n_components` (int): The number of principal components to compute.
- `n_iter` (int): The number of iterations used for computing the SVD.
- `copy` (bool): Whether to perform the computations in place or not.
- `check_input` (bool): Whether to check the consistency of the inputs or not.
- `engine` (string): `"auto"` for scikit-learn's `randomized_svd`, `"fbpca"` for Facebook's randomized SVD implementation.
- `random_state` (int, RandomState instance or None, optional, default=None): The seed of the pseudo random number generator to use when shuffling the data. If int, `random_state` is the seed used by the random number generator; if a RandomState instance, `random_state` is the random number generator; if None, the random number generator is the RandomState instance used by `np.random`.

**Returns:** ndarray of shape (M, k), where M is the number of samples and k is the number of components.

**Examples:**

```
>>> import numpy as np
>>> import pandas as pd
>>> from light_famd import CA
>>> X = pd.DataFrame(data=np.random.randint(0, 100, size=(10, 4)), columns=list('ABCD'))
>>> ca = CA(n_components=2, n_iter=2)
>>> ca.fit(X)
CA(check_input=True, copy=True, engine='auto', n_components=2, n_iter=2,
  random_state=None)
>>> print(ca.explained_variance_)
[0.16892141 0.0746376 ]
>>> print(ca.explained_variance_ratio_)
[0.5650580210934917, 0.2496697790527281]
>>> print(ca.transform(X))
[[ 0.23150854 -0.39167802]
 [ 0.36006095  0.00301414]
 [-0.48192602 -0.13002647]
 [-0.06333533 -0.21475652]
 [-0.16438708 -0.10418312]
 [-0.38129126 -0.16515196]
 [ 0.2721296   0.46923757]
 [ 0.82953753  0.20638333]
 [-0.500007    0.36897935]
 [ 0.57932474 -0.1023383 ]]
>>> print(ca.fit_transform(X))
[[ 0.23150854 -0.39167802]
 [ 0.36006095  0.00301414]
 [-0.48192602 -0.13002647]
 [-0.06333533 -0.21475652]
 [-0.16438708 -0.10418312]
 [-0.38129126 -0.16515196]
 [ 0.2721296   0.46923757]
 [ 0.82953753  0.20638333]
 [-0.500007    0.36897935]
 [ 0.57932474 -0.1023383 ]]
```

### Multiple-Correspondence-Analysis: MCA

The MCA class inherits from the CA class.
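Conceptually, MCA amounts to running CA on the one-hot (indicator) matrix of the categorical table. Below is a minimal sketch of that core computation using only numpy and pandas; it is illustrative and not part of `Light_FAMD`'s API.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.choice(list('abcde'), size=(10, 4)), columns=list('ABCD'))

# One-hot encode the categorical table: MCA is CA on this indicator matrix.
Z = pd.get_dummies(X).to_numpy(dtype=float)

# CA core: standardized residuals of the correspondence matrix, then SVD.
P = Z / Z.sum()
r = P.sum(axis=1)  # row masses
c = P.sum(axis=0)  # column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sigma, Vt = np.linalg.svd(S, full_matrices=False)

# Principal row coordinates for the first two components.
row_coords = (U[:, :2] * sigma[:2]) / np.sqrt(r)[:, None]
print(row_coords.shape)  # (10, 2)
```

The squared singular values `sigma**2` play the role of the explained variances reported by `MCA.explained_variance_`.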
```
>>> import numpy as np
>>> import pandas as pd
>>> from light_famd import MCA
>>> X = pd.DataFrame(np.random.choice(list('abcde'), size=(10, 4), replace=True), columns=list('ABCD'))
>>> print(X)
   A  B  C  D
0  d  e  a  d
1  e  d  b  b
2  e  d  a  e
3  b  b  e  d
4  b  d  b  b
5  c  b  a  e
6  e  d  b  a
7  d  c  d  d
8  b  c  d  a
9  a  e  c  c
>>> mca = MCA(n_components=2)
>>> mca.fit(X)
MCA(check_input=True, copy=True, engine='auto', n_components=2, n_iter=10,
  random_state=None)
>>> print(mca.explained_variance_)
[0.90150495 0.76979456]
>>> print(mca.explained_variance_ratio_)
[0.24040131974598467, 0.20527854948955893]
>>> print(mca.transform(X))
[[ 0.55603013  0.7016272 ]
 [-0.73558629 -1.17559462]
 [-0.44972794 -0.4973024 ]
 [-0.16248444  0.95706908]
 [-0.66969377 -0.79951057]
 [-0.21267777  0.39953562]
 [-0.67921667 -0.8707747 ]
 [ 0.05058625  1.34573057]
 [-0.31952341  0.77285922]
 [ 2.62229391 -0.83363941]]
>>> print(mca.fit_transform(X))
[[ 0.55603013  0.7016272 ]
 [-0.73558629 -1.17559462]
 [-0.44972794 -0.4973024 ]
 [-0.16248444  0.95706908]
 [-0.66969377 -0.79951057]
 [-0.21267777  0.39953562]
 [-0.67921667 -0.8707747 ]
 [ 0.05058625  1.34573057]
 [-0.31952341  0.77285922]
 [ 2.62229391 -0.83363941]]
```

### Multiple-Factor-Analysis: MFA

The MFA class inherits from the PCA class. Since FAMD inherits from MFA and, compared to its superclass, only requires the `groups` parameter to be determined, we skip this chapter and go directly to FAMD.

### Factor-Analysis-of-Mixed-Data: FAMD

The FAMD class inherits from the MFA class, so you have access to all the methods and properties of MFA.
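FAMD can be thought of as a PCA run on a combined matrix in which numeric columns are standardized and categorical columns are one-hot encoded, weighted by level frequency, and centered. The sketch below illustrates that preprocessing with numpy and pandas; it is a simplified illustration of the idea, not `Light_FAMD`'s internal code.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
X_n = pd.DataFrame(rng.integers(0, 100, size=(10, 2)), columns=list('AB'))
X_c = pd.DataFrame(rng.choice(list('abcde'), size=(10, 4)), columns=list('CDEF'))

# Standardize the numeric block (mean 0, unit variance).
num = (X_n - X_n.mean()) / X_n.std(ddof=0)

# One-hot encode the categorical block; weight each dummy column by
# 1/sqrt(frequency) and center it, so rare levels are not drowned out.
dummies = pd.get_dummies(X_c).astype(float)
freq = dummies.mean()
cat = (dummies / np.sqrt(freq)) - np.sqrt(freq)

# PCA (via SVD) on the combined matrix yields FAMD-style components.
M = pd.concat([num, cat], axis=1).to_numpy()
U, sigma, Vt = np.linalg.svd(M, full_matrices=False)
coords = U[:, :2] * sigma[:2]
print(coords.shape)  # (10, 2)
```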
```
>>> import numpy as np
>>> import pandas as pd
>>> from light_famd import FAMD
>>> X_n = pd.DataFrame(data=np.random.randint(0, 100, size=(10, 2)), columns=list('AB'))
>>> X_c = pd.DataFrame(np.random.choice(list('abcde'), size=(10, 4), replace=True), columns=list('CDEF'))
>>> X = pd.concat([X_n, X_c], axis=1)
>>> print(X)
    A   B  C  D  E  F
0  96  19  b  d  b  e
1  11  46  b  d  a  e
2   0  89  a  a  a  c
3  13  63  c  a  e  d
4  37  36  d  b  e  c
5  10  99  a  b  d  c
6  76   2  c  a  d  e
7  32   5  c  a  e  d
8  49   9  c  e  e  e
9   4  22  c  c  b  d
>>> famd = FAMD(n_components=2)
>>> famd.fit(X)
MCA PROCESS
MCA PROCESS ELIMINATED 0 COLUMNS SINCE THEIR MISS_RATES >= 99%
FAMD(check_input=True, copy=False, engine='auto', n_components=2, n_iter=2,
  random_state=None)
>>> print(famd.explained_variance_)
[17.40871219  9.73440949]
>>> print(famd.explained_variance_ratio_)
[0.32596621039327284, 0.1822701494502082]
>>> print(famd.column_correlation(X))
            0         1
A         NaN       NaN
B         NaN       NaN
C_a       NaN       NaN
C_b       NaN  0.824458
C_c  0.922220       NaN
C_d       NaN       NaN
D_a       NaN       NaN
D_b       NaN       NaN
D_c       NaN       NaN
D_d       NaN  0.824458
D_e       NaN       NaN
E_a       NaN       NaN
E_b       NaN       NaN
E_d       NaN       NaN
E_e       NaN       NaN
F_c       NaN -0.714447
F_d  0.673375       NaN
F_e       NaN  0.839324
>>> print(famd.transform(X))
[[ 2.23848136  5.75809647]
 [ 2.0845175   4.78930072]
 [ 2.6682068  -2.78991262]
 [ 6.2962962  -1.57451325]
 [ 2.52140085 -3.28279729]
 [ 1.58256681 -3.73135011]
 [ 5.19476759  1.18333717]
 [ 6.35288446 -1.33186723]
 [ 5.02971134  1.6216402 ]
 [ 4.05754963  0.69620997]]
>>> print(famd.fit_transform(X))
MCA PROCESS HAVE ELIMINATE 0 COLUMNS SINCE ITS MISSING RATE >= 99%
[[ 2.23848136  5.75809647]
 [ 2.0845175   4.78930072]
 [ 2.6682068  -2.78991262]
 [ 6.2962962  -1.57451325]
 [ 2.52140085 -3.28279729]
 [ 1.58256681 -3.73135011]
 [ 5.19476759  1.18333717]
 [ 6.35288446 -1.33186723]
 [ 5.02971134  1.6216402 ]
 [ 4.05754963  0.69620997]]
```

## Going faster

By default `light_famd` uses `sklearn`'s randomized SVD implementation. One of the goals of `Light_FAMD` is to make it possible to use a different SVD backend.
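The default backend is scikit-learn's `randomized_svd`, linked earlier in this README. It can also be called directly, which makes the roles of `n_iter` and `random_state` described above easy to see; the matrix here is just illustrative data.

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 50))

# Fixing random_state makes the randomized decomposition reproducible;
# raising n_iter trades computation time for precision.
U, sigma, Vt = randomized_svd(A, n_components=2, n_iter=3, random_state=42)
print(U.shape, sigma.shape, Vt.shape)  # (1000, 2) (2,) (2, 50)
```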
For now the only other supported backend is [Facebook's randomized SVD implementation](https://research.facebook.com/blog/fast-randomized-svd/), called [fbpca](http://fbpca.readthedocs.org/en/latest/). You can use it by setting the `engine` parameter to `'fbpca'`, or see [Truncated_FAMD](https://github.com/Cauchemare/Truncated_FAMD) for an alternative that selects the SVD solver automatically depending on the structure of the input:

```python
>>> import light_famd
>>> pca = light_famd.PCA(engine='fbpca')
```

%prep
%autosetup -n light_famd-0.0.3

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-light-famd -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Thu Jun 08 2023 Python_Bot - 0.0.3-1
- Package Spec generated