Python Cheat Sheet

This is my first time learning Python. I mostly use R in my projects, and as far as I know, whatever can be done in Python can also be done in R. But as an aspiring data scientist, I shouldn't just stick with R all the time, so I enrolled in this Microsoft course because it is data-science-centered. Of course, this is for everyone interested in Python, not just data scientists. Join me as I write down my takeaways here.

Version 3.x – https://www.python.org/downloads/

Python Scripts – text files with a .py extension

Basics
print(3 + 4) # add
print(4 - 3) # subtract
print(4 * 3) # multiply
print(4 / 2) # divide
print(4 ** 2) # exponent, 4²
print(4 % 2) # modulo

Variables
height = 1.79
weight = 68.7
bmi = weight / height ** 2

Types
type(bmi) # float
type(5) # int
type("body mass index") # str
type('this works too') # str
type(True) # bool
print(2 + 3) # 5
print('ab' + 'cd') # 'abcd'
"I said " + ("Hey " * 2) + "Hey!" # 'I said Hey Hey Hey!'
str(5) # convert 5 to the string "5"
int(True) # convert True to 1
bool("True") # convert "True" to True
float(1) # convert 1 to 1.0

Lists
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89, [a, b], [c, d]] # can contain different types, even other lists (a, b, c, d are assumed to be defined elsewhere)
type(fam) # list
fam[3] # 1.68, zero-based indexing
fam[-1] # [c,d]
fam[-3] # 1.89
fam[3:5] # [1.68, "mom"]  [start:end] is [inclusive:exclusive]
fam[:4] # 0 to 3: ["liz", 1.73, "emma", 1.68]
fam[5:] # 5 to last: [1.71, "dad", 1.89, [a, b], [c, d]]
fam[0:2] = ["lisa", 1.74] # fam = ["lisa", 1.74, "emma", 1.68, "mom", 1.71, "dad", 1.89, [a, b], [c, d]]
fam = fam + ["me", 1.79] # fam = ["lisa", 1.74, "emma", 1.68, "mom", 1.71, "dad", 1.89, [a, b], [c, d], "me", 1.79]
del fam[2] # fam = ["lisa", 1.74, 1.68, "mom", 1.71, "dad", 1.89, [a, b], [c, d], "me", 1.79]
x = ["a", "b", "c"]
y = x
y[1] = "z" # x[1] is also "z" because y = x copies the reference to the list, not the values themselves
y = list(x) # or y = x[:] to copy the values into a new list
fam.index("mom") # finds "mom" and returns its index: 3 here, since fam was modified above
fam.count(1.74) # counts the number of times 1.74 occurs in the list; returns 1
first = [11.25, 18.0, 20.0]
second = [10.75, 9.50]
full = first + second # paste together
full_sorted = sorted(full, reverse=True) # sort in descending order

Functions
max(full) # maximum value in a list (elements must be comparable, e.g. all numbers): 20.0
round(1.68, 1) # round 1.68 to 1 decimal place, 1.7
round(1.68) # round to nearest whole number
help(round) # opens documentation of round function
len(fam) # length of list

Methods
Methods are functions that belong to objects; you call them on an object with dot notation, and which methods are available depends on the object's type.
sister = 'liz'
sister.capitalize() # 'Liz'
sister.replace("z", "sa") # 'lisa'
sister.index("z") # 2
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]
fam.index("mom") # 4
fam.append("me") # fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89, "me"]; fam is updated in place, no re-assignment needed
sister.upper() # 'LIZ'
sister.count("i") # 1
fam.reverse() # fam = ["me", 1.89, "dad", 1.71, "mom", 1.68, "emma", 1.73, "liz"]; reversed in place, no re-assignment needed

Numpy
Numpy (Numeric Python) efficiently works with arrays. Once installed…
import numpy as np # np is just the conventional alias; without "as np" you have to spell out the package name, e.g. numpy.array(...)
np.array([1, 2, 3])
a = [1, 2, 3]
b = [4, 5, 6]
np_a = np.array(a)
np_b = np.array(b)
np_a / np_b ** 2 
# can perform element-wise operations
np.array([1.0, "is", True]) # everything is converted to a string because NumPy arrays contain only one type
python_list = [1, 2, 3]
python_list + python_list # [1, 2, 3, 1, 2, 3]
numpy_array = np.array([1, 2, 3])
numpy_array + numpy_array # array([2, 4, 6])
np_a[1] # 2
np_a > 1 # array([False, True, True])
np_a[np_a > 1] # array([2, 3])
np_2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8]]) # 2D array
np_2d.shape # returns the dimensions of the array: (2, 4), i.e. 2 rows and 4 columns
np_2d[0] # array([1, 2, 3, 4])
np_2d[0][1] # 2
np_2d[0, 1] # 2
np_2d[:, 1:3] # array([[2, 3], [6, 7]])
np_2d[1, :] # array([5, 6, 7, 8])
np_2d_another = np.array([[1, 1, 1, 1], [1, 1, 1, 1]])
np_2d + np_2d_another
# array([[2, 3, 4, 5], [6, 7,  8, 9]])
np.mean(np_2d) # mean
np.median(np_2d) # median
np.corrcoef(np_a, np_b) # correlation coefficient matrix of two 1D arrays
np.std(np_2d) # standard deviation
np.sum(np_a) # sum, faster than Python's built-in sum() on large arrays
np.sort(np_a) # sort, faster than Python's built-in sorted() on large arrays
height = np.round(np.random.normal(1.75, 0.20, 5000), 2) # 1.75 distribution mean, 0.20 distribution standard deviation, 5000 number of samples
weight = np.round(np.random.normal(60.32, 15, 5000), 2)
np_city = np.column_stack((height, weight)) # combine height and weight column-wise into a 2D array
gk_heights = np_heights[np_positions == 'GK'] # boolean mask built from another array (np_heights and np_positions come from another dataset)
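For context, here is a minimal sketch of how that last line works, with small made-up arrays standing in for np_heights and np_positions:
np_positions = np.array(['GK', 'M', 'A', 'GK']) # hypothetical player positions
np_heights = np.array([191, 184, 185, 180]) # hypothetical heights
np_positions == 'GK' # array([ True, False, False,  True])
gk_heights = np_heights[np_positions == 'GK'] # array([191, 180])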

Matplotlib
A package commonly used for data visualization.
import matplotlib.pyplot as plt # import matplotlib with plt as alias
year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.692, 5.263, 6.972]
plt.plot(year, pop) # (horizontal, vertical) line plot
pop = [1.0, 1.262, 1.650] + pop # include these values too
year = [1800, 1850, 1900] + year # include the 3 years
plt.fill_between(year, pop, 0, color='green') # fill the area under the line with green
plt.xlabel('Year') # x axis label
plt.ylabel('Population') # y axis label
plt.title('World Population Projections') # title
plt.yticks([0, 2, 4, 6, 8, 10]) # the ticks to display on the y axis
plt.yticks([0, 2, 4, 6, 8, 10], ['0', '2B', '4B', '6B', '8B', '10B']) # 2nd argument gives the tick labels
plt.show() # only then is the plot actually displayed
plt.scatter(year, pop) # scatter plot
plt.xscale('log') # put the x axis on a logarithmic scale
help(plt.hist) # help of function hist in module matplotlib.pyplot
plt.hist(pop, bins = 3) # histogram of pop with 3 bins
plt.clf() # clear the current figure so the next plot starts fresh
plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop) * 2, c = col, alpha = 0.8) # another scatter example (gdp_cap, life_exp, and col come from another dataset); s = marker size, c = color, alpha = transparency

Boolean Logic and Control Flow
x = 12
x > 5 and x < 15 # True
y = 5
y <= 7 or y > 13 # True
z = 4
z % 2 == 0 # True: z modulo 2 is 0, i.e. z is divisible by 2
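Control flow with if / elif / else builds directly on these comparisons; here is a minimal sketch of my own (not from the notes above):
z = 4
if z % 2 == 0:
    print("z is even") # this branch runs because 4 % 2 == 0
elif z % 3 == 0:
    print("z is divisible by 3")
else:
    print("z is neither even nor divisible by 3")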

Pandas
Unlike NumPy arrays, Pandas can handle columns of more than one type; it is the go-to package for data frames.
import pandas as pd
brics = pd.read_csv("brics.csv") # load a csv file
brics = pd.read_csv("brics.csv", index_col=0) # use the first column as the row index
brics["country"] # returns the column country
brics.country # returns the column country too
brics["on_earth"] = [True, True, True, True, True] # adding a column
brics["density"] = brics["population"] / brics["area"] * 10000000
brics.loc["BR"] # row access by index
brics.loc["CH", "capital"] # element access; row, column
brics["capital"].loc["CH"] # element access; column, row
brics.loc["CH"]["capital"] # element access; row, column
brics.loc["BR"] # returns a Series
brics.loc[["BR"]] # returns a DataFrame

Deep Learning Overview

I've watched a few videos of this course on Udacity to get an idea of what Deep Learning is. I do machine learning but I don't know much about Deep Learning yet. What I'm going to do here is discuss the important, not-so-technical points I got from the videos.

Deep Learning is a branch of Machine Learning.

It has emerged as a central tool for solving perception problems such as computer vision (recognizing what is in an image), speech recognition (understanding what people are saying), and helping robots interact with the world.

Deep Learning is also a much better tool for problems like discovering new medicines, understanding natural language, and understanding documents, for example ranking them for search.

The course is divided into four parts:

  1. Logistic Classification; Stochastic Optimization; Data and Parameter Tuning
  2. Deep Networks; Regularization
  3. Convolutional Networks
  4. Embeddings; Recurrent Models

And it uses Python! Unfortunately, as of this writing I don't know how to write code in Python (though I do know R), so I'll skip the exercises and just watch the video lectures for now.

Neural Networks became important because of Speech Recognition, Computer Vision, and Machine Translation. This further led to Deep Learning.

Classification is the central building block of machine learning; regression, ranking, reinforcement learning, and detection all build on it. An example of classification is detecting whether or not pedestrians are present in an image. A Logistic Classifier is a linear classifier: it takes an input (in our example, the pixels of an image) and applies a linear function to it to generate its predictions. A simple logistic classifier can then be turned into a Deep Network.
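To make "applies a linear function" concrete, here is a minimal NumPy sketch of my own (the shapes and numbers are made up, not from the course): the classifier computes scores = W·x + b, and softmax turns the scores into class probabilities.
import numpy as np
num_pixels, num_classes = 784, 10 # hypothetical: a flattened 28x28 image, 10 possible classes
W = np.random.randn(num_classes, num_pixels) * 0.01 # weights, normally learned from data
b = np.zeros(num_classes) # biases
x = np.random.rand(num_pixels) # one input image as a vector of pixel values
scores = W @ x + b # the linear function: one score (logit) per class
probs = np.exp(scores) / np.sum(np.exp(scores)) # softmax: scores -> probabilities that sum to 1
prediction = probs.argmax() # predicted class = the one with the highest probability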

In neural networks, simply increasing the size of the hidden layers in the middle is not efficient because such models get hard to train. Adding more layers and making the model deeper (rather than wider) leads to parameter efficiency: you typically get better performance with fewer parameters by going deeper. Another reason is that natural phenomena tend to have a hierarchical structure, which deep models naturally capture. Deep models are applicable when the data is large enough to train them.
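A rough back-of-the-envelope illustration of the parameter-efficiency point, with layer sizes I made up purely for the arithmetic:
def num_params(layer_sizes):
    # weights plus biases for each fully connected layer
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

wide = [100, 1000, 10] # 100 inputs, one wide hidden layer of 1000 units, 10 outputs
deep = [100, 100, 100, 10] # same inputs and outputs, two smaller hidden layers instead
print(num_params(wide)) # 111010 parameters
print(num_params(deep)) # 21210 parameters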

I think this will suffice for an overview. More helpful videos are present in Udacity’s Deep Learning course but I may have to review Python first so I can add more. Thanks for dropping by!