%global _empty_manifest_terminate_build 0
Name: python-sealion
Version: 4.4.5
Release: 1
Summary: SeaLion is a comprehensive machine learning and data science library for beginners and ml-engineers alike.
License: Apache
URL: https://github.com/anish-lakkapragada/SeaLion
Source0: https://mirrors.nju.edu.cn/pypi/web/packages/e4/bc/c2f3c5d581eee26bf6ac0084145f5b722caaad00cd001ba5fc5db1eb4afa/sealion-4.4.5.tar.gz
BuildArch: noarch

Requires: python3-numpy
Requires: python3-joblib
Requires: python3-pandas
Requires: python3-scipy
Requires: python3-tqdm
Requires: python3-multiprocess
Requires: python3-seaborn
Requires: python3-Cython
Requires: python3-setuptools
Requires: python3-matplotlib
Requires: python3-dill
Requires: python3-pytz
Requires: python3-cycler
Requires: python3-pillow
Requires: python3-kiwisolver
Requires: python3-pyparsing
Requires: python3-six

%description

# SeaLion

![python](https://img.shields.io/pypi/pyversions/sealion?color=blueviolet&style=plastic) ![License](https://img.shields.io/pypi/l/sealion?color=blue) [![Documentation Status](https://readthedocs.org/projects/sealion/badge/?version=latest)](https://sealion.readthedocs.io/en/latest/?badge=latest) ![total lines](https://img.shields.io/tokei/lines/github/anish-lakkapragada/SeaLion?color=brightgreen) ![issues](https://img.shields.io/github/issues/anish-lakkapragada/SeaLion?color=yellow&style=plastic) ![pypi](https://img.shields.io/pypi/v/sealion?color=red&style=plastic) ![repo size](https://img.shields.io/github/repo-size/anish-lakkapragada/SeaLion?color=important) ![Deploy to PyPI](https://github.com/anish-lakkapragada/SeaLion/workflows/Deploy%20to%20PyPI/badge.svg)

SeaLion is designed to teach today's aspiring ML engineers popular machine learning concepts in a way that builds both intuition and the ability to apply them. We do this through concise algorithms that do the job with as little jargon as possible, and examples that guide you through every step of the way.

## Quick Demo


SeaLion in Action
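
Since the demo itself is not reproduced here, the following is a minimal sketch of the fit / predict / evaluate workflow that the General Usage section describes. It is not SeaLion code: `MajorityClassifier` and `run_workflow` are hypothetical stand-ins written in plain Python so the snippet runs without any dependencies, and treating `evaluate` as returning accuracy is an assumption made for illustration.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical stand-in model (not part of SeaLion).

    It always predicts the most common label seen during training.
    """

    def fit(self, X, y):
        # Remember the most frequent training label.
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # One prediction per input row.
        return [self.label_ for _ in X]

    def evaluate(self, X, y):
        # Assumption for illustration: evaluate() reports accuracy.
        predictions = self.predict(X)
        correct = sum(p == t for p, t in zip(predictions, y))
        return correct / len(y)


def run_workflow(model, X_train, y_train, X_test, y_test):
    """Train, predict, and score any object exposing fit/predict/evaluate."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    score = model.evaluate(X_test, y_test)
    return y_pred, score


# Tiny made-up dataset, only to exercise the workflow.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0, 1, 1, 1]
X_test = [[0.5], [2.5]]
y_test = [0, 1]

y_pred, score = run_workflow(MajorityClassifier(), X_train, y_train, X_test, y_test)
print(y_pred, score)
```

If SeaLion is installed, the same helper should in principle accept a SeaLion classifier such as `LogisticRegression()` in place of the stand-in, since it only relies on the three method names shown in General Usage.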

## General Usage

For most classifiers you can just do (we'll use Logistic Regression as an example here):

```python
from sealion.regression import LogisticRegression
log_reg = LogisticRegression()
```

to initialize, and then to train:

```python
log_reg.fit(X_train, y_train)
```

and for testing:

```python
y_pred = log_reg.predict(X_test)
evaluation = log_reg.evaluate(X_test, y_test)
```

For the unsupervised clustering algorithms you may do:

```python
from sealion.unsupervised_clustering import KMeans
kmeans = KMeans(k=3)
```

and then to fit and predict:

```python
predictions = kmeans.fit_predict(X)
```

Neural networks are a bit more complicated, so you may want to check an example [here.](https://github.com/anish-lakkapragada/SeaLion/blob/main/examples/deep_learning_example.ipynb)

The syntax of the APIs was designed to be easy to use and familiar to users of most other ML libraries. This is to make sure both beginners and experts in the field can comfortably use SeaLion. Of course, none of the source code uses other ML frameworks.

## Testimonials, Stats, and Reddit Posts

"Super Expansive Python ML Library" - [@Peter Washington](https://twitter.com/peter_washing/status/1356766327541616644), Stanford PhD candidate in Bio-Engineering

[Analytics Vidhya calls SeaLion's algorithms **beginner-friendly**, **efficient**, and **concise**.](https://www.analyticsvidhya.com/blog/2021/02/6-open-source-data-science-projects-that-provide-an-edge-to-your-portfolio/)

Stats:

- [**1,700+ downloads**](https://pypistats.org/packages/sealion)
- **300+ stars**
- **15,000 views on GitHub**
- **100+ forks/clones**

[r/Python Post](https://www.reddit.com/r/Python/comments/lf59bw/machine_learning_library_by_14year_old_sealion/)

[r/learnmachinelearning Post](https://www.reddit.com/r/learnmachinelearning/comments/lfv72l/a_set_of_jupyter_notebooks_to_help_you_understand/)

## Installation

The package is available on PyPI.
Install it with:

```shell
pip install sealion
```

SeaLion only supports Python 3, so please make sure you are on an up-to-date version.

## General Information

SeaLion was built by Anish Lakkapragada, a freshman in high school, starting around Thanksgiving of 2020 and continuing into early 2021. The library is meant for beginners to use when working with standard datasets like iris, breast cancer, swiss roll, the moons dataset, MNIST, etc.

## Documentation

WE HAVE DOCS NOW! Thanks [@Patrick Huang](http://github.com/phuang1024)! All available here: [docs](https://sealion.readthedocs.io/). Also, use the examples to your advantage!

### Updates for v4.1 and up!

First things first - thank you for all of the support. The two reddit posts did much better than I expected (1.6k upvotes, about 200 comments) and I got a lot of feedback and advice. Thank you to anyone who participated in r/Python or r/learnmachinelearning.

SeaLion has also taken off with the posts. We have had 3 issues so far (1 closed) and have reached 195 stars and 20 forks. I wasn't expecting this and I am grateful to everyone who has shown their appreciation for this library.

Also, some issues have popped up. Most of them can be easily solved by just deleting sealion manually (going into the folder where the source is and just deleting it - not pip uninstall) and then reinstalling the usual way, but feel free to open an issue anytime.

In versions 4.1+ we are hoping to polish the library more. Currently 4.1 comes with Bernoulli Naive Bayes, and we have also added precision, recall, and the f1 metric in the utils module. We are hoping to include Gaussian Mixture Models and Batch Normalization in the future. Code examples for these new algorithms will be created within a day or two after release. Thank you!

### Updates for v3.0.0!

SeaLion v3.0 and up has had a lot of major milestones.
The first thing is that all the code examples (in jupyter notebooks) for basically all of the modules in sealion are put into the examples directory. Most of them go over using actual datasets like iris, breast cancer, moons, blobs, MNIST, etc. These were all built using v3.0.8 - hopefully that clears up any confusion. I hope you enjoy them.

Perhaps the biggest change in v3.0 is how we have changed the Cython compilation. A quick primer on Cython if you are unfamiliar - you take your python code (in .py files), change it and add some return types and type declarations, put that in a .pyx file, and compile it to a .so file. The .so file is then imported in the python module which you use.

The main bug fixed was that the .so file is actually specific to the architecture of the user. I use macOS and compiled all my files to .so, so prior to v3.0 I would just give those .so files to anybody else. However, other architectures and OSs like Ubuntu would not be able to recognize those files. Instead, what we do now is just store the .pyx files (universal for all computers) in the source code, and the first time you import sealion all of those .pyx files will get compiled into .so files (so they will work for whatever you are using). This means the first import will take about 40 seconds, but after that it will be as quick as any other import.

## Machine Learning Algorithms

The machine learning algorithms of SeaLion are listed below. Please note that the structure of the listing isn't meant to resemble that of SeaLion's APIs and that hyperlinks are given for algorithms that are new/niche. Of course, new algorithms are being made right now.

1. **Deep Neural Networks**
    - Optimizers
        - Gradient Descent (and mini-batch gradient descent)
        - Momentum Optimization w/ Nesterov Accelerated Gradient
        - Stochastic gradient descent (w/ momentum + nesterov)
        - AdaGrad
        - RMSprop
        - Adam
        - Nadam
        - [AdaBelief](https://arxiv.org/abs/2010.07468)
    - Layers
        - Flatten (turn 2D+ data to 2D matrices)
        - Dense (fully-connected layers)
        - Batch Normalization!
    - Regularization
        - Dropout
    - Activations
        - ReLU
        - Tanh
        - Sigmoid
        - Softmax
        - Leaky ReLU
        - [PReLU](https://arxiv.org/abs/1502.01852)
        - ELU
        - SELU
        - Swish
    - Loss Functions
        - MSE (for regression)
        - CrossEntropy (for classification)
    - Transfer Learning
        - Save weights (in a pickle file)
        - reload them and then enter them into the same neural network
        - this is so you don't have to start training from scratch
2. **Regression**
    - Linear Regression (Normal Equation, closed-form)
    - Ridge Regression (L2 regularization, closed-form solution)
    - Lasso Regression (L1 regularization)
    - Elastic-Net Regression
    - Logistic Regression
    - Softmax Regression
    - Exponential Regression
    - Polynomial Regression
3. **Dimensionality Reduction**
    - Principal Component Analysis (PCA)
    - t-distributed Stochastic Neighbor Embedding (tSNE)
4. **Gaussian Mixture Models (GMMs)**
    - unsupervised clustering with "soft" predictions
    - anomaly detection
    - AIC & BIC calculation methods
5. **Unsupervised Clustering**
    - KMeans (w/ KMeans++)
    - DBSCAN
6. **Naive Bayes**
    - Multinomial Naive Bayes
    - Gaussian Naive Bayes
    - Bernoulli Naive Bayes
7. **Trees**
    - Decision Tree (with max\_branches, min\_samples regularization + CART training)
8. **Ensemble Learning**
    - Random Forests
    - Ensemble/Voting Classifier
9. **Nearest Neighbors**
    - k-nearest neighbors
10. **Utils**
    - one\_hot encoder function (one\_hot())
    - plot confusion matrix function (confusion\_matrix())
    - revert one hot encoding to 1D Array (revert\_one\_hot())
    - revert softmax predictions to 1D Array (revert\_softmax())

## Algorithms in progress

Some of the algorithms we are working on right now.

1. **Barnes Hut t-SNE** (please, please contribute for this one)

## Contributing

First, install the required libraries:

```bash
pip install -r requirements.txt
```

If you feel you can do something better than how it is right now in SeaLion, please do! Believe me, you will find great joy in simplifying my code (probably using numpy) and speeding it up. The major problem right now is speed: some algorithms like PCA can handle 10000+ data points, whereas tSNE is unscalable with `O(n^2)` time complexity. We have partly addressed this with Cython + parallel processing (thanks joblib), so algorithms (aside from neural networks) are working well with <1000 points. Getting to the next level will need some help.

Most of the modules I use are numpy, pandas, joblib, and tqdm. I prefer using fewer dependencies in the code, so please keep them to a minimum.

Other than that, thanks for contributing!

## Acknowledgements

Plenty of articles and people helped me along the way. One of the tougher questions I dealt with was Automatic Differentiation in neural networks, where this [tutorial](https://www.youtube.com/watch?v=o64FV-ez6Gw) helped me. I also got some help on the `O(n^2)` time complexity problem of the denominator of t-SNE from this [article](https://nlml.github.io/in-raw-numpy/in-raw-numpy-t-sne/) and understood the mathematical derivation for the gradients (original paper didn't go over it) from [here](http://pages.di.unipi.it/errica/assets/files/sne_tsne.pdf). I also used the PCA method from handsonml, so thanks for that too, Aurélien Géron. Lastly, special thanks to Evan M.
Kim and Peter Washington for helping make the normal equation and Cauchy distribution in tSNE make sense. Also thanks to [@Kento Nishi](http://github.com/KentoNishi) for helping me understand open-source.

## Feedback, comments, or questions

If you have any feedback or something you would like to tell me, please do not hesitate to share! Feel free to comment here on GitHub or reach out to me through !

©Anish Lakkapragada 2021

%package -n python3-sealion
Summary: SeaLion is a comprehensive machine learning and data science library for beginners and ml-engineers alike.
Provides: python-sealion
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip

%description -n python3-sealion

# SeaLion

![python](https://img.shields.io/pypi/pyversions/sealion?color=blueviolet&style=plastic) ![License](https://img.shields.io/pypi/l/sealion?color=blue) [![Documentation Status](https://readthedocs.org/projects/sealion/badge/?version=latest)](https://sealion.readthedocs.io/en/latest/?badge=latest) ![total lines](https://img.shields.io/tokei/lines/github/anish-lakkapragada/SeaLion?color=brightgreen) ![issues](https://img.shields.io/github/issues/anish-lakkapragada/SeaLion?color=yellow&style=plastic) ![pypi](https://img.shields.io/pypi/v/sealion?color=red&style=plastic) ![repo size](https://img.shields.io/github/repo-size/anish-lakkapragada/SeaLion?color=important) ![Deploy to PyPI](https://github.com/anish-lakkapragada/SeaLion/workflows/Deploy%20to%20PyPI/badge.svg)

SeaLion is designed to teach today's aspiring ML engineers popular machine learning concepts in a way that builds both intuition and the ability to apply them. We do this through concise algorithms that do the job with as little jargon as possible, and examples that guide you through every step of the way.

## Quick Demo


SeaLion in Action

## General Usage

For most classifiers you can just do (we'll use Logistic Regression as an example here):

```python
from sealion.regression import LogisticRegression
log_reg = LogisticRegression()
```

to initialize, and then to train:

```python
log_reg.fit(X_train, y_train)
```

and for testing:

```python
y_pred = log_reg.predict(X_test)
evaluation = log_reg.evaluate(X_test, y_test)
```

For the unsupervised clustering algorithms you may do:

```python
from sealion.unsupervised_clustering import KMeans
kmeans = KMeans(k=3)
```

and then to fit and predict:

```python
predictions = kmeans.fit_predict(X)
```

Neural networks are a bit more complicated, so you may want to check an example [here.](https://github.com/anish-lakkapragada/SeaLion/blob/main/examples/deep_learning_example.ipynb)

The syntax of the APIs was designed to be easy to use and familiar to users of most other ML libraries. This is to make sure both beginners and experts in the field can comfortably use SeaLion. Of course, none of the source code uses other ML frameworks.

## Testimonials, Stats, and Reddit Posts

"Super Expansive Python ML Library" - [@Peter Washington](https://twitter.com/peter_washing/status/1356766327541616644), Stanford PhD candidate in Bio-Engineering

[Analytics Vidhya calls SeaLion's algorithms **beginner-friendly**, **efficient**, and **concise**.](https://www.analyticsvidhya.com/blog/2021/02/6-open-source-data-science-projects-that-provide-an-edge-to-your-portfolio/)

Stats:

- [**1,700+ downloads**](https://pypistats.org/packages/sealion)
- **300+ stars**
- **15,000 views on GitHub**
- **100+ forks/clones**

[r/Python Post](https://www.reddit.com/r/Python/comments/lf59bw/machine_learning_library_by_14year_old_sealion/)

[r/learnmachinelearning Post](https://www.reddit.com/r/learnmachinelearning/comments/lfv72l/a_set_of_jupyter_notebooks_to_help_you_understand/)

## Installation

The package is available on PyPI.
Install it with:

```shell
pip install sealion
```

SeaLion only supports Python 3, so please make sure you are on an up-to-date version.

## General Information

SeaLion was built by Anish Lakkapragada, a freshman in high school, starting around Thanksgiving of 2020 and continuing into early 2021. The library is meant for beginners to use when working with standard datasets like iris, breast cancer, swiss roll, the moons dataset, MNIST, etc.

## Documentation

WE HAVE DOCS NOW! Thanks [@Patrick Huang](http://github.com/phuang1024)! All available here: [docs](https://sealion.readthedocs.io/). Also, use the examples to your advantage!

### Updates for v4.1 and up!

First things first - thank you for all of the support. The two reddit posts did much better than I expected (1.6k upvotes, about 200 comments) and I got a lot of feedback and advice. Thank you to anyone who participated in r/Python or r/learnmachinelearning.

SeaLion has also taken off with the posts. We have had 3 issues so far (1 closed) and have reached 195 stars and 20 forks. I wasn't expecting this and I am grateful to everyone who has shown their appreciation for this library.

Also, some issues have popped up. Most of them can be easily solved by just deleting sealion manually (going into the folder where the source is and just deleting it - not pip uninstall) and then reinstalling the usual way, but feel free to open an issue anytime.

In versions 4.1+ we are hoping to polish the library more. Currently 4.1 comes with Bernoulli Naive Bayes, and we have also added precision, recall, and the f1 metric in the utils module. We are hoping to include Gaussian Mixture Models and Batch Normalization in the future. Code examples for these new algorithms will be created within a day or two after release. Thank you!

### Updates for v3.0.0!

SeaLion v3.0 and up has had a lot of major milestones.
The first thing is that all the code examples (in jupyter notebooks) for basically all of the modules in sealion are put into the examples directory. Most of them go over using actual datasets like iris, breast cancer, moons, blobs, MNIST, etc. These were all built using v3.0.8 - hopefully that clears up any confusion. I hope you enjoy them.

Perhaps the biggest change in v3.0 is how we have changed the Cython compilation. A quick primer on Cython if you are unfamiliar - you take your python code (in .py files), change it and add some return types and type declarations, put that in a .pyx file, and compile it to a .so file. The .so file is then imported in the python module which you use.

The main bug fixed was that the .so file is actually specific to the architecture of the user. I use macOS and compiled all my files to .so, so prior to v3.0 I would just give those .so files to anybody else. However, other architectures and OSs like Ubuntu would not be able to recognize those files. Instead, what we do now is just store the .pyx files (universal for all computers) in the source code, and the first time you import sealion all of those .pyx files will get compiled into .so files (so they will work for whatever you are using). This means the first import will take about 40 seconds, but after that it will be as quick as any other import.

## Machine Learning Algorithms

The machine learning algorithms of SeaLion are listed below. Please note that the structure of the listing isn't meant to resemble that of SeaLion's APIs and that hyperlinks are given for algorithms that are new/niche. Of course, new algorithms are being made right now.

1. **Deep Neural Networks**
    - Optimizers
        - Gradient Descent (and mini-batch gradient descent)
        - Momentum Optimization w/ Nesterov Accelerated Gradient
        - Stochastic gradient descent (w/ momentum + nesterov)
        - AdaGrad
        - RMSprop
        - Adam
        - Nadam
        - [AdaBelief](https://arxiv.org/abs/2010.07468)
    - Layers
        - Flatten (turn 2D+ data to 2D matrices)
        - Dense (fully-connected layers)
        - Batch Normalization!
    - Regularization
        - Dropout
    - Activations
        - ReLU
        - Tanh
        - Sigmoid
        - Softmax
        - Leaky ReLU
        - [PReLU](https://arxiv.org/abs/1502.01852)
        - ELU
        - SELU
        - Swish
    - Loss Functions
        - MSE (for regression)
        - CrossEntropy (for classification)
    - Transfer Learning
        - Save weights (in a pickle file)
        - reload them and then enter them into the same neural network
        - this is so you don't have to start training from scratch
2. **Regression**
    - Linear Regression (Normal Equation, closed-form)
    - Ridge Regression (L2 regularization, closed-form solution)
    - Lasso Regression (L1 regularization)
    - Elastic-Net Regression
    - Logistic Regression
    - Softmax Regression
    - Exponential Regression
    - Polynomial Regression
3. **Dimensionality Reduction**
    - Principal Component Analysis (PCA)
    - t-distributed Stochastic Neighbor Embedding (tSNE)
4. **Gaussian Mixture Models (GMMs)**
    - unsupervised clustering with "soft" predictions
    - anomaly detection
    - AIC & BIC calculation methods
5. **Unsupervised Clustering**
    - KMeans (w/ KMeans++)
    - DBSCAN
6. **Naive Bayes**
    - Multinomial Naive Bayes
    - Gaussian Naive Bayes
    - Bernoulli Naive Bayes
7. **Trees**
    - Decision Tree (with max\_branches, min\_samples regularization + CART training)
8. **Ensemble Learning**
    - Random Forests
    - Ensemble/Voting Classifier
9. **Nearest Neighbors**
    - k-nearest neighbors
10. **Utils**
    - one\_hot encoder function (one\_hot())
    - plot confusion matrix function (confusion\_matrix())
    - revert one hot encoding to 1D Array (revert\_one\_hot())
    - revert softmax predictions to 1D Array (revert\_softmax())

## Algorithms in progress

Some of the algorithms we are working on right now.

1. **Barnes Hut t-SNE** (please, please contribute for this one)

## Contributing

First, install the required libraries:

```bash
pip install -r requirements.txt
```

If you feel you can do something better than how it is right now in SeaLion, please do! Believe me, you will find great joy in simplifying my code (probably using numpy) and speeding it up. The major problem right now is speed: some algorithms like PCA can handle 10000+ data points, whereas tSNE is unscalable with `O(n^2)` time complexity. We have partly addressed this with Cython + parallel processing (thanks joblib), so algorithms (aside from neural networks) are working well with <1000 points. Getting to the next level will need some help.

Most of the modules I use are numpy, pandas, joblib, and tqdm. I prefer using fewer dependencies in the code, so please keep them to a minimum.

Other than that, thanks for contributing!

## Acknowledgements

Plenty of articles and people helped me along the way. One of the tougher questions I dealt with was Automatic Differentiation in neural networks, where this [tutorial](https://www.youtube.com/watch?v=o64FV-ez6Gw) helped me. I also got some help on the `O(n^2)` time complexity problem of the denominator of t-SNE from this [article](https://nlml.github.io/in-raw-numpy/in-raw-numpy-t-sne/) and understood the mathematical derivation for the gradients (original paper didn't go over it) from [here](http://pages.di.unipi.it/errica/assets/files/sne_tsne.pdf). I also used the PCA method from handsonml, so thanks for that too, Aurélien Géron. Lastly, special thanks to Evan M.
Kim and Peter Washington for helping make the normal equation and Cauchy distribution in tSNE make sense. Also thanks to [@Kento Nishi](http://github.com/KentoNishi) for helping me understand open-source.

## Feedback, comments, or questions

If you have any feedback or something you would like to tell me, please do not hesitate to share! Feel free to comment here on GitHub or reach out to me through !

©Anish Lakkapragada 2021

%package help
Summary: Development documents and examples for sealion
Provides: python3-sealion-doc

%description help

# SeaLion

![python](https://img.shields.io/pypi/pyversions/sealion?color=blueviolet&style=plastic) ![License](https://img.shields.io/pypi/l/sealion?color=blue) [![Documentation Status](https://readthedocs.org/projects/sealion/badge/?version=latest)](https://sealion.readthedocs.io/en/latest/?badge=latest) ![total lines](https://img.shields.io/tokei/lines/github/anish-lakkapragada/SeaLion?color=brightgreen) ![issues](https://img.shields.io/github/issues/anish-lakkapragada/SeaLion?color=yellow&style=plastic) ![pypi](https://img.shields.io/pypi/v/sealion?color=red&style=plastic) ![repo size](https://img.shields.io/github/repo-size/anish-lakkapragada/SeaLion?color=important) ![Deploy to PyPI](https://github.com/anish-lakkapragada/SeaLion/workflows/Deploy%20to%20PyPI/badge.svg)

SeaLion is designed to teach today's aspiring ML engineers popular machine learning concepts in a way that builds both intuition and the ability to apply them. We do this through concise algorithms that do the job with as little jargon as possible, and examples that guide you through every step of the way.

## Quick Demo


SeaLion in Action

## General Usage

For most classifiers you can just do (we'll use Logistic Regression as an example here):

```python
from sealion.regression import LogisticRegression
log_reg = LogisticRegression()
```

to initialize, and then to train:

```python
log_reg.fit(X_train, y_train)
```

and for testing:

```python
y_pred = log_reg.predict(X_test)
evaluation = log_reg.evaluate(X_test, y_test)
```

For the unsupervised clustering algorithms you may do:

```python
from sealion.unsupervised_clustering import KMeans
kmeans = KMeans(k=3)
```

and then to fit and predict:

```python
predictions = kmeans.fit_predict(X)
```

Neural networks are a bit more complicated, so you may want to check an example [here.](https://github.com/anish-lakkapragada/SeaLion/blob/main/examples/deep_learning_example.ipynb)

The syntax of the APIs was designed to be easy to use and familiar to users of most other ML libraries. This is to make sure both beginners and experts in the field can comfortably use SeaLion. Of course, none of the source code uses other ML frameworks.

## Testimonials, Stats, and Reddit Posts

"Super Expansive Python ML Library" - [@Peter Washington](https://twitter.com/peter_washing/status/1356766327541616644), Stanford PhD candidate in Bio-Engineering

[Analytics Vidhya calls SeaLion's algorithms **beginner-friendly**, **efficient**, and **concise**.](https://www.analyticsvidhya.com/blog/2021/02/6-open-source-data-science-projects-that-provide-an-edge-to-your-portfolio/)

Stats:

- [**1,700+ downloads**](https://pypistats.org/packages/sealion)
- **300+ stars**
- **15,000 views on GitHub**
- **100+ forks/clones**

[r/Python Post](https://www.reddit.com/r/Python/comments/lf59bw/machine_learning_library_by_14year_old_sealion/)

[r/learnmachinelearning Post](https://www.reddit.com/r/learnmachinelearning/comments/lfv72l/a_set_of_jupyter_notebooks_to_help_you_understand/)

## Installation

The package is available on PyPI.
Install it with:

```shell
pip install sealion
```

SeaLion only supports Python 3, so please make sure you are on an up-to-date version.

## General Information

SeaLion was built by Anish Lakkapragada, a freshman in high school, starting around Thanksgiving of 2020 and continuing into early 2021. The library is meant for beginners to use when working with standard datasets like iris, breast cancer, swiss roll, the moons dataset, MNIST, etc.

## Documentation

WE HAVE DOCS NOW! Thanks [@Patrick Huang](http://github.com/phuang1024)! All available here: [docs](https://sealion.readthedocs.io/). Also, use the examples to your advantage!

### Updates for v4.1 and up!

First things first - thank you for all of the support. The two reddit posts did much better than I expected (1.6k upvotes, about 200 comments) and I got a lot of feedback and advice. Thank you to anyone who participated in r/Python or r/learnmachinelearning.

SeaLion has also taken off with the posts. We have had 3 issues so far (1 closed) and have reached 195 stars and 20 forks. I wasn't expecting this and I am grateful to everyone who has shown their appreciation for this library.

Also, some issues have popped up. Most of them can be easily solved by just deleting sealion manually (going into the folder where the source is and just deleting it - not pip uninstall) and then reinstalling the usual way, but feel free to open an issue anytime.

In versions 4.1+ we are hoping to polish the library more. Currently 4.1 comes with Bernoulli Naive Bayes, and we have also added precision, recall, and the f1 metric in the utils module. We are hoping to include Gaussian Mixture Models and Batch Normalization in the future. Code examples for these new algorithms will be created within a day or two after release. Thank you!

### Updates for v3.0.0!

SeaLion v3.0 and up has had a lot of major milestones.
The first thing is that all the code examples (in jupyter notebooks) for basically all of the modules in sealion are put into the examples directory. Most of them go over using actual datasets like iris, breast cancer, moons, blobs, MNIST, etc. These were all built using v3.0.8 - hopefully that clears up any confusion. I hope you enjoy them.

Perhaps the biggest change in v3.0 is how we have changed the Cython compilation. A quick primer on Cython if you are unfamiliar - you take your python code (in .py files), change it and add some return types and type declarations, put that in a .pyx file, and compile it to a .so file. The .so file is then imported in the python module which you use.

The main bug fixed was that the .so file is actually specific to the architecture of the user. I use macOS and compiled all my files to .so, so prior to v3.0 I would just give those .so files to anybody else. However, other architectures and OSs like Ubuntu would not be able to recognize those files. Instead, what we do now is just store the .pyx files (universal for all computers) in the source code, and the first time you import sealion all of those .pyx files will get compiled into .so files (so they will work for whatever you are using). This means the first import will take about 40 seconds, but after that it will be as quick as any other import.

## Machine Learning Algorithms

The machine learning algorithms of SeaLion are listed below. Please note that the structure of the listing isn't meant to resemble that of SeaLion's APIs and that hyperlinks are given for algorithms that are new/niche. Of course, new algorithms are being made right now.

1. **Deep Neural Networks**
    - Optimizers
        - Gradient Descent (and mini-batch gradient descent)
        - Momentum Optimization w/ Nesterov Accelerated Gradient
        - Stochastic gradient descent (w/ momentum + nesterov)
        - AdaGrad
        - RMSprop
        - Adam
        - Nadam
        - [AdaBelief](https://arxiv.org/abs/2010.07468)
    - Layers
        - Flatten (turn 2D+ data to 2D matrices)
        - Dense (fully-connected layers)
        - Batch Normalization!
    - Regularization
        - Dropout
    - Activations
        - ReLU
        - Tanh
        - Sigmoid
        - Softmax
        - Leaky ReLU
        - [PReLU](https://arxiv.org/abs/1502.01852)
        - ELU
        - SELU
        - Swish
    - Loss Functions
        - MSE (for regression)
        - CrossEntropy (for classification)
    - Transfer Learning
        - Save weights (in a pickle file)
        - reload them and then enter them into the same neural network
        - this is so you don't have to start training from scratch
2. **Regression**
    - Linear Regression (Normal Equation, closed-form)
    - Ridge Regression (L2 regularization, closed-form solution)
    - Lasso Regression (L1 regularization)
    - Elastic-Net Regression
    - Logistic Regression
    - Softmax Regression
    - Exponential Regression
    - Polynomial Regression
3. **Dimensionality Reduction**
    - Principal Component Analysis (PCA)
    - t-distributed Stochastic Neighbor Embedding (tSNE)
4. **Gaussian Mixture Models (GMMs)**
    - unsupervised clustering with "soft" predictions
    - anomaly detection
    - AIC & BIC calculation methods
5. **Unsupervised Clustering**
    - KMeans (w/ KMeans++)
    - DBSCAN
6. **Naive Bayes**
    - Multinomial Naive Bayes
    - Gaussian Naive Bayes
    - Bernoulli Naive Bayes
7. **Trees**
    - Decision Tree (with max\_branches, min\_samples regularization + CART training)
8. **Ensemble Learning**
    - Random Forests
    - Ensemble/Voting Classifier
9. **Nearest Neighbors**
    - k-nearest neighbors
10. **Utils**
    - one\_hot encoder function (one\_hot())
    - plot confusion matrix function (confusion\_matrix())
    - revert one hot encoding to 1D Array (revert\_one\_hot())
    - revert softmax predictions to 1D Array (revert\_softmax())

## Algorithms in progress

Some of the algorithms we are working on right now.

1. **Barnes Hut t-SNE** (please, please contribute for this one)

## Contributing

First, install the required libraries:

```bash
pip install -r requirements.txt
```

If you feel you can do something better than how it is right now in SeaLion, please do! Believe me, you will find great joy in simplifying my code (probably using numpy) and speeding it up. The major problem right now is speed: some algorithms like PCA can handle 10000+ data points, whereas tSNE is unscalable with `O(n^2)` time complexity. We have partly addressed this with Cython + parallel processing (thanks joblib), so algorithms (aside from neural networks) are working well with <1000 points. Getting to the next level will need some help.

Most of the modules I use are numpy, pandas, joblib, and tqdm. I prefer using fewer dependencies in the code, so please keep them to a minimum.

Other than that, thanks for contributing!

## Acknowledgements

Plenty of articles and people helped me along the way. One of the tougher questions I dealt with was Automatic Differentiation in neural networks, where this [tutorial](https://www.youtube.com/watch?v=o64FV-ez6Gw) helped me. I also got some help on the `O(n^2)` time complexity problem of the denominator of t-SNE from this [article](https://nlml.github.io/in-raw-numpy/in-raw-numpy-t-sne/) and understood the mathematical derivation for the gradients (original paper didn't go over it) from [here](http://pages.di.unipi.it/errica/assets/files/sne_tsne.pdf). I also used the PCA method from handsonml, so thanks for that too, Aurélien Géron. Lastly, special thanks to Evan M.
Kim and Peter Washington for helping make the normal equation and Cauchy distribution in tSNE make sense. Also thanks to [@Kento Nishi](http://github.com/KentoNishi) for helping me understand open-source.

## Feedback, comments, or questions

If you have any feedback or something you would like to tell me, please do not hesitate to share! Feel free to comment here on GitHub or reach out to me through !

©Anish Lakkapragada 2021

%prep
%autosetup -n sealion-4.4.5

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-sealion -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Mon May 15 2023 Python_Bot - 4.4.5-1
- Package Spec generated