Information on Python
Python
The main python website is:
python.
For the record, here is the basic python tutorial from python.org (might be too dense to start with):
  The Python Tutorial
People install python two different ways.
There is the standard install into your system and the anaconda/miniconda install.
For the standard install, see downloads at python.
Most people start with anaconda.
Python is supported by many packages that provide crucial utility.
Just about every time you go into python you will use stuff like:
In [1]: import numpy as np #efficient arrays (vectors, matrices ...)
In [2]: import pandas as pd #data analytic tools
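As a small illustration of what these two imports give you, here is a minimal sketch. The mileage and price numbers are made up for the example and do not come from any data set used on this page.

```python
import numpy as np   # efficient arrays (vectors, matrices ...)
import pandas as pd  # data analytic tools

# a numpy array supports vectorized arithmetic (no explicit loop needed)
mileage = np.array([10.0, 50.0, 120.0])  # hypothetical values, thousands of miles
mileage_km = mileage * 1.609344           # elementwise multiplication

# a pandas DataFrame is a table with named columns
cars = pd.DataFrame({"mileage": mileage, "price": [30.0, 20.0, 8.0]})
mean_price = cars["price"].mean()         # column-wise summary

print(mileage_km)
print(cars)
print(mean_price)
```

The array handles the numeric work and the DataFrame keeps the columns labeled, which is roughly how the two packages divide the labor in practice.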
Conda is an open source package management system and environment management system
that runs on Windows, macOS and Linux.
See conda
Anaconda is a python distribution that uses conda.
See anaconda.
Install page: Anaconda Installers
More info on anaconda installation: Installing Anaconda Distribution
Anaconda quick start guide: Getting started
There are a couple of nice things about anaconda:
(i) It bundles up a lot of tools and python packages you will need.
(ii) Using conda, you can maintain and switch between different python environments.
Each environment can be built on a different python version and include different python packages.
The downside to anaconda is that it can take up a fair amount of disk space.
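Since each environment can carry its own python version, it helps to check which one a session is actually running in. The following is a minimal sketch using only the standard library; the helper name describe_environment is made up for this example.

```python
import sys

def describe_environment():
    """Return the interpreter path, its install prefix, and the Python version."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return {
        "executable": sys.executable,  # path of the running python binary
        "prefix": sys.prefix,          # root directory of the active environment
        "version": version,            # e.g. major.minor.micro
    }

info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

Run inside two different environments, the prefix and version lines should differ, which is a quick way to confirm that a switch took effect.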
miniconda allows you to install a minimal python/conda setup,
which you can then add to as needed.
See miniconda (at conda website).
There is also miniforge: conda-forge/miniforge
miniforge from chatGPT:
Developer: Community-driven (conda-forge community).
Python Versions: Available for multiple Python versions.
Default Channels:
Uses the conda-forge channel by default.
All packages are built from open-source software, avoiding proprietary packages.
Purpose:
Designed to provide a minimal Conda installation with a focus on open-source.
Ideal for users who prioritize transparency and community-maintained packages.
Platform Availability:
In addition to standard platforms, Miniforge offers versions optimized for Apple Silicon (arm64) and other architectures.
With the standard install, people usually use pip (pip3) as the package manager.
You can also use pip inside a conda environment.
Python Tools
Python tools you may want to have are:
(i) ipython: an enhanced python shell.
(ii) jupyter notebook: a notebook where you can mix text, python code, python output, latex ...
(iii) a development environment such as spyder.
Anaconda will get you JupyterLab, which has all these tools and more.
ipython has a lot of enhancements over the basic python shell.
The jupyter notebook has become a standard way to communicate results in data science.
That being said, from Chapter 1 of Python Data Science Handbook, by VanderPlas:
"There are many options for development environments for Python,
and I'm often asked which one I use in my own work.
My answer sometimes surprises people: my preferred environment is
IPython plus a text editor."
Another thing to be aware of is google colab:
Welcome To Colaboratory
This is a remarkable free online notebook-type environment with all the key Machine Learning tools available.
Be sure to check out the official help pages for each package (e.g. numpy).
The help tab in Jupyter notebook is also great.
Books
Rob's books: books
Some python links:
A Whirlwind Tour of Python, by Jake VanderPlas
Matloff's tutorial on Python, for those with a strong programming background.
python
conda
conda-cheatsheet
Getting started with conda
conda/miniconda
ipython
jupyter notebook
Nice short python intro
This package fits many statistical models, giving the standard inferential output:
statsmodels
basic python packages:
scipy (scientific computing)
numpy (efficient arrays, e.g. matrices and vectors)
pandas (data structures for working with data, e.g. DataFrames)
Nice pandas reference
matplotlib (graphics)
scikit-learn (machine learning)
Note
The scikit-learn webpage is absolutely wonderful!!!
The two basic software platforms for deep learning are tensorflow/keras and pytorch (torch in R).
Keras provides a nice interface to tensorflow.
The full tensorflow is like the SAS of deep learning.
pip
Note that the standard package manager for python is pip (as opposed to using conda),
see for example the python.org documentation here.
A good pip reference for all the basic commands: Commands
more pip: A Beginner's Guide to pip
pip cheat sheet: pip cheat sheet
One major advantage of pip is the ease of its command-line interface,
which makes installing Python software packages as easy as issuing one command:
$ pip3 install some-package-name
Users can also easily remove the package:
$ pip3 uninstall some-package-name
where the 3 in pip3 means you want to use python3.
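After installing or removing something with pip, you may want to confirm the result from inside python. Here is a minimal sketch using the standard library's importlib.metadata; the helper name installed_version is made up for this example, and some-package-name is the same placeholder used in the commands above.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

# report what is currently installed in this environment
for name in ["numpy", "pandas", "some-package-name"]:
    print(name, installed_version(name))
```

A result of None means the package is not visible to the python you are running, which often points to pip having installed into a different environment.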
Data Science in Python
Data Science in Python Cheatsheet
Hello World, Data Science in Python
simple-for-ipython.py, a very simple little python script with some of the basics
Hello world regression in python (.html)
   Video of simple plots and simple linear regression with sklearn
   Video of Multiple Linear Regression and more on sklearn
   Video of multiple regression using statsmodels and dummies for the categorical color
Hello world regression in python, Jupyter notebook (.ipynb)
Hello world regression in python, html, short version
OOS Loop in Python
Here is a simple example of a loop in python to estimate the out-of-sample root mean square error
for linear regression and the susedcars.csv data set using just x=(mileage,year) for y=price:
do-cars-oos.py.
What is the oos loop trying to do?
Out-of-sample loss.
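The idea of an out-of-sample loop can be sketched as follows. This is not the do-cars-oos.py script: it assumes repeated random train/test splits, fits least squares with numpy alone, and uses synthetic data standing in for susedcars.csv. The function name oos_rmse, the split fraction, and the number of repetitions are all choices made for this illustration.

```python
import numpy as np

# synthetic stand-in for the cars data (NOT susedcars.csv)
rng = np.random.default_rng(0)
n = 500
mileage = rng.uniform(0, 150, size=n)                  # thousands of miles
year = rng.integers(2000, 2014, size=n).astype(float)  # years 2000..2013
price = 20.0 - 0.1 * mileage + 1.5 * (year - 2000) + rng.normal(0, 3, size=n)

X = np.column_stack([np.ones(n), mileage, year])  # intercept + x=(mileage, year)
y = price

def oos_rmse(X, y, n_reps=20, test_frac=0.25, seed=1):
    """Repeat random train/test splits; return the out-of-sample RMSE of each split."""
    split_rng = np.random.default_rng(seed)
    n_obs = len(y)
    n_test = int(round(test_frac * n_obs))
    rmses = np.empty(n_reps)
    for rep in range(n_reps):
        perm = split_rng.permutation(n_obs)
        test_idx, train_idx = perm[:n_test], perm[n_test:]
        # fit least squares on the training rows only
        beta, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        # predict the held-out rows and measure the error there
        resid = y[test_idx] - X[test_idx] @ beta
        rmses[rep] = np.sqrt(np.mean(resid ** 2))
    return rmses

rmses = oos_rmse(X, y)
print("mean out-of-sample RMSE:", rmses.mean())
```

The key point is that the model never sees the test rows when it is fit, so the error on those rows estimates how well the regression predicts new data. Averaging over several random splits reduces the dependence on any one split.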
Python and R
Note that you can call python from R:
RStudio notes on package reticulate