Information on Python
Python
The main python website is:
python.
People install python in two different ways.
There is the standard install into your system and the anaconda/miniconda install.
For the standard install, see downloads at python.
Python is supported by many packages that provide crucial utility.
Just about every time you go into python you will use stuff like:
In [1]: import numpy as np #efficient arrays (vectors, matrices ...)
In [2]: import pandas as pd #data analytic tools
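As a minimal sketch of what these two imports give you, the following builds a small array and a small data frame. The numbers and the column names (mileage, price) are made up for illustration only.

```python
import numpy as np   # efficient arrays (vectors, matrices, ...)
import pandas as pd  # data analysis tools

# A small vector and matrix; the values are made up for illustration.
x = np.array([1.0, 2.0, 3.0])
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])
Ax = A @ x  # matrix-vector product, a vector of length 2

# A tiny DataFrame with hypothetical columns.
df = pd.DataFrame({"mileage": [10.0, 50.0, 90.0],
                   "price": [30.0, 20.0, 10.0]})
mean_price = df["price"].mean()  # average of the price column

print(Ax)
print(mean_price)
```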
Conda is an open source package management system and environment management system
that runs on Windows, macOS and Linux.
See conda.
Anaconda is a python distribution that uses conda.
See anaconda.
Install page: Anaconda Installers
There are a couple of nice things about anaconda:
(i) It bundles up a lot of tools and python packages you will need.
(ii) Using conda, you can maintain and switch between different python environments.
Each environment can be built on a different python version and include different python packages.
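When you switch between environments it is easy to lose track of which one is active. A small sketch, using only the standard library, of how you might check from inside python. It assumes that conda sets the CONDA_DEFAULT_ENV environment variable when an environment is activated; outside conda the variable is usually absent, so the code falls back to a label.

```python
import os
import sys

# Version of the interpreter that is running this code.
python_version = "{}.{}.{}".format(*sys.version_info[:3])

# Directory of the active python installation or environment.
env_prefix = sys.prefix

# Name of the active conda environment, if conda has set one.
conda_env = os.environ.get("CONDA_DEFAULT_ENV", "(not a conda environment)")

print("python version:", python_version)
print("environment prefix:", env_prefix)
print("conda environment:", conda_env)
```

Running this in two different environments should show a different prefix in each, and possibly a different python version.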
The downside to anaconda is that it can take up a fair amount of disk space.
miniconda allows you to install a minimal python/conda setup,
which you can then add to as needed.
See miniconda (at conda website).
With the standard install, people usually use pip (pip3) as the package manager.
Most people start with anaconda.
Python Tools
Python tools you may want to have are:
(i) ipython: an enhanced python shell.
(ii) jupyter notebook: a notebook where you can mix text, python code, python output, latex ...
(iii) a development environment such as spyder.
Anaconda will get you JupyterLab, which has all these tools and more.
ipython has a lot of enhancements over the basic python shell.
The jupyter notebook has become a standard way to communicate results in data science.
That being said, from Chapter 1 of Python Data Science Handbook, by VanderPlas:
"There are many options for development environments for Python,
and I'm often asked which one I use in my own work.
My answer sometimes surprises people: my preferred environment is
IPython plus a text editor."
Another thing to be aware of is Google Colab:
Welcome To Colaboratory
This is a remarkable free online notebook-type environment with all the key machine learning tools available.
Be sure to check out the official help pages for each package (e.g. numpy).
The help tab in Jupyter notebook is also great.
I like books and find the following very useful:
Python Distilled (Beazley).
Introducing Python (Lubanovic).
Python Data Science Handbook (VanderPlas). (web version)
Python for Data Analysis (McKinney).
Machine Learning with Python Cookbook (Albon).
VanderPlas also has a quick python course:
A Whirlwind Tour of Python
Some python links:
A Whirlwind Tour of Python, by Jake VanderPlas
Matloff's tutorial on Python, for those with a strong programming background.
python
anaconda
anaconda cheat-sheet
conda
conda-cheatsheet
Getting started with conda
conda/miniconda
ipython
jupyter notebook
Nice short python intro
This page has instructions for installing anaconda on ubuntu, and after the simple install instructions
there is a nice simple intro to conda:
install anaconda on ubuntu, with conda intro
This package fits many statistical models, giving the standard inferential output:
statsmodels
basic python packages:
scipy (scientific computing)
numpy (efficient arrays, e.g. matrices and vectors)
pandas (data structures for working with data, e.g. DataFrames)
Nice pandas reference
matplotlib (graphics)
scikit-learn (machine learning)
pip
Note that the standard package manager for python is pip (as opposed to using conda),
see for example the python.org documentation here.
One major advantage of pip is the ease of its command-line interface,
which makes installing Python software packages as easy as issuing one command:
$ pip3 install some-package-name
Users can also easily remove the package:
$ pip3 uninstall some-package-name
where the 3 in pip3 means you want to use python3.
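After installing or removing a package, you may want to confirm from inside python what is actually installed. A minimal sketch using importlib.metadata from the standard library (Python 3.8 and later). The helper name installed_version is hypothetical, and some-package-name is the same placeholder used in the commands above.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report the version of each package, or None if it is not installed.
for name in ["numpy", "pandas", "some-package-name"]:
    print(name, installed_version(name))
```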
Data Science in Python
Data Science in Python Cheatsheet
Hello World, Data Science in Python
simple-for-ipython.py, a very simple little python script with some of the basics
Hello world regression in python (.html)
Hello world regression in python, Jupyter notebook (.ipynb)
Hello world regression in python, pdf (.pdf)
Hello world regression in python, html, short version
OOS Loop in Python
Here is a simple example of a loop in python to estimate the out-of-sample root mean squared error
for linear regression on the susedcars.csv data set, using just x=(mileage,year) to predict y=price:
do-cars-oos.py.
What is the oos loop trying to do?
Out-of-sample loss.
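To illustrate the general idea, here is a sketch of an out-of-sample loop. It is not the contents of do-cars-oos.py. It assumes synthetic data generated on the spot as a stand-in for susedcars.csv (the coefficients, units and noise level are made up), fits the regression by least squares with numpy, and uses a hypothetical helper oos_rmse. Each pass splits the data at random into train and test parts, fits on train only, and records the root mean squared error on test.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the cars data (NOT susedcars.csv); all numbers made up.
n = 500
mileage = rng.uniform(0, 150, n)                  # hypothetical: thousands of miles
year = rng.integers(1995, 2014, n).astype(float)  # hypothetical model years
price = 20.0 - 0.1 * mileage + 1.0 * (year - 1995) + rng.normal(0, 3.0, n)

X = np.column_stack([np.ones(n), mileage, year])  # intercept, mileage, year
y = price


def oos_rmse(X, y, n_reps, test_frac, rng):
    """Repeat random train/test splits and return the test RMSE of each split."""
    n = len(y)
    n_test = int(round(test_frac * n))
    rmses = np.empty(n_reps)
    for i in range(n_reps):
        perm = rng.permutation(n)                 # random reordering of the rows
        test, train = perm[:n_test], perm[n_test:]
        # Fit linear regression on the training rows only.
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        # Evaluate on the held-out rows the fit never saw.
        resid = y[test] - X[test] @ beta
        rmses[i] = np.sqrt(np.mean(resid ** 2))
    return rmses


rmses = oos_rmse(X, y, n_reps=20, test_frac=0.25, rng=rng)
print("mean out-of-sample RMSE:", rmses.mean())
```

Averaging over several random splits makes the estimate less dependent on any one lucky or unlucky split.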
Python and R
Note that you can call python from R:
RStudio notes on package reticulate