Why do we use Python for data science?¶

  1. Python is simple and easy to learn.
  2. Python is open source: it is free to use and is developed through a community-based model.
  3. Python is well supported by a large community, with extensive standard and third-party libraries and documentation.
  4. Python is one of the most popular programming languages.
  5. Python has built-in mathematical functions and libraries.

Notebooks¶

  1. Jupyter notebook

    Option(A): Install Anaconda (recommended):

     - It is a package and environment manager
     - Comes with Python and Jupyter Notebook installed
     - Includes many preinstalled packages such as NumPy, scikit-learn, SciPy, and pandas
     - Instructions are available in "Python for Data Analysis", page 34
    
    

    Option(B): Install Python and Jupyter

     - Download and install Python https://www.python.org
     - Install Jupyter https://jupyter.org/install
  2. Google Colab: similar to Jupyter Notebook, except that the notebooks are hosted by Google for free, so you don't have to install anything on your computer to get started.

Let's start coding !¶

Python for Data Science¶

  1. Python basics
  2. Control flow
  3. Data structures
  4. Functions
  5. Exception handling

1. Python basic elements:¶

  • Structure
  • Variables
  • Attributes and Functions
  • Binary operation
  • Primitive data types

Indentation, not braces¶

Python uses whitespace (tabs or spaces) to structure code instead of braces.

In [ ]:
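# Example data (values chosen for illustration)
array = [3, 8, 1, 9, 4]
pivot = 5
less, greater = [], []

for x in array:
    if x < pivot:
        less.append(x)
        x = 7  # still inside the if block: same indentation level
    else:
        greater.append(x)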

Python statements do not need to be terminated by semicolons. Semicolons can, however, be used to separate multiple statements on a single line.

In [ ]:
a = 5; b = 6; c = 7 

Everything is an object¶

  • Every number, string, data structure, function, class, module is an object
  • Each object has an associated type (e.g., string or function) and internal data
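
As a small illustration of the two points above, `type` reports the type of any object, including functions and modules. The function `double` and the sample values are made up for this sketch and are not part of the original notebook.

```python
import math

def double(x):
    return x * 2

# Numbers, strings, lists, functions, and modules all have a type
examples = [5, 'foo', [1, 2], double, math]
type_names = [type(obj).__name__ for obj in examples]
print(type_names)  # ['int', 'str', 'list', 'function', 'module']
```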

Comments¶

Any text preceded by the hash mark (pound sign) # is ignored by the Python interpreter

In [ ]:
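# Stand-in for an open file (any iterable of lines behaves the same way)
file_handle = ['foo one\n', '\n', 'foo two\n']

results = []
for line in file_handle:
    # keep the empty lines for now
#     if len(line) == 0:
#         continue
    results.append(line.replace('foo', 'bar'))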
    
In [ ]:
print("Reached this line") # Simple status report

Assigning variables¶

In [ ]:
a = [1, 2, 3]
b = a 
b.append(4)  
a

Passing variables as arguments¶

When you pass objects as arguments to a function, new local variables are created that reference the original objects; nothing is copied.

In [ ]:
def append_element(some_list, element):
    
    some_list.append(element)
    element = 10
  
data = [1, 2, 3]

append_element(data, 4)
data
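
A minimal sketch of the difference between mutating an argument and rebinding it. The function names `mutate` and `rebind` are hypothetical and only serve this illustration.

```python
def mutate(some_list):
    # Changes the object that the caller also references
    some_list.append(99)

def rebind(some_list):
    # Only points the local name at a new list; the caller is unaffected
    some_list = [0]

data = [1, 2, 3]
mutate(data)
rebind(data)
print(data)  # [1, 2, 3, 99]
```

This is why, in the cell above, `data` gains the element 4 while the assignment `element = 10` has no effect outside the function.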

Dynamic references¶

In contrast with many compiled languages, such as Java and C++, object references in Python have no type associated with them

In [ ]:
a = 5
type(a)
In [ ]:
a = 'foo'
type(a)
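
Although a name can be rebound to an object of any type, each object keeps its own type, and Python does not silently convert between unrelated types. A short sketch, with values chosen only for illustration:

```python
a = 5
print(isinstance(a, int))           # True
print(isinstance(a, (int, float)))  # True: a tuple of types is allowed

try:
    '5' + 5  # str + int is not converted implicitly
except TypeError as exc:
    error_message = str(exc)
    print('TypeError:', error_message)
```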

Attributes and methods¶

Objects in Python typically have both attributes and methods

In [ ]:
a = 'foo'
 
dir(a)
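
The output of `dir` is long. As a sketch of how to work with it, the snippet below calls a method with dot syntax, looks one up by name with `getattr`, and filters out the names that start with an underscore. The variable names `method` and `public_names` are illustrative.

```python
a = 'foo'

# Methods are accessed with dot syntax...
print(a.upper())  # FOO

# ...or looked up by name with getattr
method = getattr(a, 'split')
print(callable(method))  # True

# Names that do not start with an underscore are the public ones
public_names = [name for name in dir(a) if not name.startswith('_')]
print(len(public_names) > 0)  # True
```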

Imports¶

In Python a module is simply a file with the .py extension containing Python code.

In [ ]:
# some_module.py
PI = 3.14159
def f(x):
    return x + 2 

def g(a, b):
    return a + b
In [ ]:
import some_module
result = some_module.f(5) 
pi = some_module.PI

Binary operations¶

In [ ]:
a = [1, 2, 3]
b = a
c = list(a)
In [ ]:
a == b
a == c 
In [ ]:
a is b
a is c
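
The cells above contrast equality with identity. The sketch below repeats the same definitions and shows what each comparison evaluates to.

```python
a = [1, 2, 3]
b = a          # b refers to the same list object as a
c = list(a)    # list() builds a new list with equal contents

print(a == b, a == c)  # True True   (== compares values)
print(a is b, a is c)  # True False  (is compares object identity)
```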

Scalar Types (Single value types)¶

Strings¶

  • You can write string literals using either single quotes ' or double quotes "
In [ ]:
a = 'one way of writing a string'
b = "another way"
  • For multiline strings with line breaks, you can use triple quotes
In [ ]:
c = """
   This is a longer string that
   spans multiple lines
   """
  • Python strings are immutable; you cannot modify a string
In [ ]:
a = 'this is a string'
# a[10] = 'f'
a.replace('s', 'f')
a
  • Adding two strings together concatenates them and produces a new string
In [ ]:
a = 'this is the first half '
b = 'and this is the second half'
a + b 
  • Strings can be formatted using f-strings (or the format method)
In [ ]:
amount = 5
currency = 'Jordanian Dinar'
rate = 0.7

# Plain interpolation of values
template = f'{amount} {currency} are worth US$ {rate * amount}'
# :.2f formats a number with two digits after the decimal point
template = f'{amount:.2f} {currency} are worth US$ {rate * amount}'

template
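
For comparison, here is a sketch of the same kind of message built once with an f-string and once with the `str.format` method. The variable names `with_fstring` and `with_format` are illustrative.

```python
amount = 5
currency = 'Jordanian Dinar'
rate = 0.7

# :.2f formats a number with two digits after the decimal point
with_fstring = f'{amount} {currency} are worth US$ {rate * amount:.2f}'
with_format = '{0} {1} are worth US$ {2:.2f}'.format(amount, currency, rate * amount)

print(with_fstring)
print(with_fstring == with_format)  # True
```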

Comparison¶

Comparisons and other conditional expressions evaluate to either True or False. Boolean values are combined with the keywords "and" and "or".

In [ ]:
True and True
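
A few more combinations, as a sketch with arbitrary example values:

```python
a, b = 5, 10

print(a < b and b < 20)   # True
print(a > b or b == 10)   # True
print(not a < b)          # False

# Comparisons can be chained
print(0 < a < b)          # True

# and / or short-circuit: the right side is skipped when it is not needed
print(a > b and 1 / 0)    # False, the division is never evaluated
```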

Type casting¶

The str, bool, int, and float types are also functions that can be used to cast values to those types

In [ ]:
s = '3.14159'
fval = float(s)
print(fval)
type(fval)
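
The other casting functions behave in a similar way. A brief sketch, with variable names that are only illustrative:

```python
whole = int('42')       # string to int
truncated = int(3.99)   # float to int drops the fractional part
text = str(3.14)        # number to string
print(whole + 1, truncated, text)       # 43 3 3.14

# bool() is False for zero and for empty containers, True otherwise
print(bool(0), bool(''), bool([]))      # False False False
print(bool(-1), bool('no'), bool([0]))  # True True True
```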

None¶

None is Python's null value.

In [ ]:
a = None 
a is None
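
None is commonly used as the default value of an optional function argument. A minimal sketch, where the function name `add_and_maybe_multiply` is just an example:

```python
def add_and_maybe_multiply(a, b, c=None):
    result = a + b
    # Compare with None using "is", not "=="
    if c is not None:
        result = result * c
    return result

print(add_and_maybe_multiply(2, 3))     # 5
print(add_and_maybe_multiply(2, 3, 4))  # 20
```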

Assignment # 1¶

  • Create a new notebook
  • Define the following variables and set them with your personal values:
    • first_name
    • last_name
    • university_id
    • expected_graduation_date
    • main_reason_for_joining_the_course
  • Using the format function, define an "about" variable and assign to it a short paragraph about yourself that uses the variables defined in the previous step
  • Make sure your notebook is runnable with no errors
  • Write your name, id, and assignment number on the top of the notebook using Markdown
  • Send your notebook to "hsoboh@birzeit.edu" before the next lecture
  • Email subject: Assignment_1-{ID}

2. Control flow:¶

  • if, elif, and else
  • loops
  • continue, break, pass

if, elif, and else¶

In [ ]:
x = 3  # example value
if x < 0:
    print("It's negative")
elif x == 0:
    print('Equal to zero')
elif 0 < x < 5:
    print('Positive but smaller than 5')
else:
    print('Positive and larger than or equal to 5')

for loops¶

In [ ]:
sequence = [1, 2, None, 4, None, 5] 
total = 0
for value in sequence:
    if value is not None:
        total += value
total

while loop¶

In [ ]:
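sequence = [1, 2, None, 4, None, 5]
total = 0
i = 0
while i < len(sequence):
    value = sequence[i]
    i += 1  # advance before continue, otherwise the loop never ends
    if value is None:
        continue
    total += value
total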

range¶

The range function returns a range object, an iterable that yields a sequence of evenly spaced integers:

In [ ]:
for x in range(0, 10, 2):
    print(x) 
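
The arguments are start, stop (exclusive), and step. A quick sketch of common forms, with illustrative variable names:

```python
first_five = list(range(5))        # start defaults to 0
evens = list(range(0, 10, 2))      # step of 2
countdown = list(range(5, 0, -1))  # negative step counts down
print(first_five)  # [0, 1, 2, 3, 4]
print(evens)       # [0, 2, 4, 6, 8]
print(countdown)   # [5, 4, 3, 2, 1]

# range can be consumed directly without building a list
total = sum(range(1, 101))
print(total)       # 5050
```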
    

Ternary expressions¶

In [ ]:
x = -5
result = None
if x >= 0:
    result = "Non-negative"
else:
    result = "Negative"

# The same logic written as a single ternary expression
result = "Non-negative" if x >= 0 else "Negative"
    

3. Data Structures¶

  • Tuple
  • List
  • Dict
  • Set
  • Comprehension

Tuple¶

A tuple is a fixed-length, immutable sequence of Python objects.

In [1]:
# Create tuple
tup = (4, 5, 6)
tup
Out[1]:
(4, 5, 6)
In [2]:
# Access elements by index 
tup[0]
Out[2]:
4
In [3]:
# Can we change tuple values ?
tup[0] = 3
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[3], line 2
      1 # Can we change tuple values ?
----> 2 tup[0] = 3

TypeError: 'tuple' object does not support item assignment
In [4]:
# Can we do the following?
tup = ('Ali', [80, 85], True)
tup[1].append(90)
tup
Out[4]:
('Ali', [80, 85, 90], True)
In [7]:
# Unpack tuple
name, scores, is_student = tup
is_student
Out[7]:
True
In [8]:
# Unpack using * 
scores = (80, 87, 83, 77, 76, 71, 69, 69, 69, 60)
first_grade, second_grade, *rest = scores
rest
Out[8]:
[83, 77, 76, 71, 69, 69, 69, 60]
In [9]:
# Discard rest
first_grade, second_grade, *_ = scores
In [10]:
# Count elements
a = (1, 2, 2, 2, 3, 4, 2)
a.count(2)
Out[10]:
4

List¶

Lists are variable-length and mutable

In [11]:
# Create list
l = ['foo', 'bar', 'baz']
l
Out[11]:
['foo', 'bar', 'baz']
In [12]:
# Access elements by index
l[2]
Out[12]:
'baz'
In [13]:
# Modify elements
l[1] = "fee"
l
Out[13]:
['foo', 'fee', 'baz']
In [14]:
# Appending elements
l.append('dwarf')
l
Out[14]:
['foo', 'fee', 'baz', 'dwarf']
In [15]:
# Insert at specific index
l.insert(2, 'red')
l
Out[15]:
['foo', 'fee', 'red', 'baz', 'dwarf']
In [16]:
# remove element
l.remove("fee")
l
Out[16]:
['foo', 'red', 'baz', 'dwarf']
In [ ]:
# remove element from specific index
print(l.pop(2))
l
In [17]:
# Search for element
"fee" in l
Out[17]:
False
In [22]:
# Concatenate lists/tuples
l1 = [4, None, 'foo']
l2 = [7, 8, (2, 3)]
l1 + l2
Out[22]:
[4, None, 'foo', 7, 8, (2, 3)]
In [23]:
l1.extend(l2)
l1
Out[23]:
[4, None, 'foo', 7, 8, (2, 3)]
In [24]:
# Sorting list
a  = [7, 2, 5, 1, 3]
a.sort()
a
Out[24]:
[1, 2, 3, 5, 7]
In [25]:
# Can we sort tuples?
a = (7, 2, 5, 1, 3)
a.sort()
a
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[25], line 3
      1 # Can we sort tuples?
      2 a = (7, 2, 5, 1, 3)
----> 3 a.sort()
      4 a

AttributeError: 'tuple' object has no attribute 'sort'
In [26]:
# Slicing using start:stop
seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq[1:5]
Out[26]:
[2, 3, 7, 5]
In [27]:
seq[:5]
Out[27]:
[7, 2, 3, 7, 5]
In [28]:
seq[-2:] 
Out[28]:
[0, 1]
In [29]:
seq[-6: -2]
Out[29]:
[3, 7, 5, 6]
In [30]:
# Use steps
seq[::2]
Out[30]:
[7, 3, 5, 0]
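
Two further slicing patterns, sketched on the same list: a negative step walks the sequence backwards, and a slice can also appear on the left side of an assignment.

```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]

reversed_seq = seq[::-1]   # a step of -1 reverses the sequence
every_other = seq[1::2]    # start at index 1, take every second element
print(reversed_seq)  # [1, 0, 6, 5, 7, 3, 2, 7]
print(every_other)   # [2, 7, 6, 1]

seq[3:5] = [6, 3]          # replace the elements at index 3 and 4
print(seq)           # [7, 2, 3, 6, 3, 6, 0, 1]
```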

Iterating over sequences¶

In [ ]:
seq = ['d', 'a', 't', 'a', ' ', 's', 'c', 'i', 'e', 'n', 'c', 'e']

How can we iterate over the sequence and keep track of both the index and the item?

In [31]:
# loop 
for i in range(len(seq)):
    print(f'index = {i}, value = {seq[i]}')
index = 0, value = d
index = 1, value = a
index = 2, value = t
index = 3, value = a
index = 4, value =  
index = 5, value = s
index = 6, value = c
index = 7, value = i
index = 8, value = e
index = 9, value = n
index = 10, value = c
index = 11, value = e
In [ ]:
for index, value in enumerate(seq):
    print(f"value {value} at location {index}")
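
A short sketch of two common uses of enumerate: building a mapping from values to their positions, and starting the count at a number other than 0. The names `mapping` and `numbered` are illustrative.

```python
seq = ['d', 'a', 't', 'a']

# Map each value to the index of its last occurrence
mapping = {}
for index, value in enumerate(seq):
    mapping[value] = index
print(mapping)  # {'d': 0, 'a': 3, 't': 2}

# start= changes the first index
numbered = list(enumerate(seq, start=1))
print(numbered)  # [(1, 'd'), (2, 'a'), (3, 't'), (4, 'a')]
```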

sorted function¶

In [32]:
seq = ('d', 'a', 't', 'a', ' ', 's', 'c', 'i', 'e', 'n', 'c', 'e')
sorted(seq)
Out[32]:
[' ', 'a', 'a', 'c', 'c', 'd', 'e', 'e', 'i', 'n', 's', 't']
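
sorted always returns a new list and leaves its input unchanged. It also accepts the optional `key` and `reverse` arguments, sketched here on a made-up list of words:

```python
words = ['banana', 'kiwi', 'apple', 'fig']

alphabetical = sorted(words)
by_length = sorted(words, key=len)       # sort by the result of len()
descending = sorted(words, reverse=True)

print(alphabetical)  # ['apple', 'banana', 'fig', 'kiwi']
print(by_length)     # ['fig', 'kiwi', 'apple', 'banana']
print(descending)    # ['kiwi', 'fig', 'banana', 'apple']
print(words)         # ['banana', 'kiwi', 'apple', 'fig'] (unchanged)
```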

zip function¶

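zip “pairs” up the elements of several sequences into tuples. It stops as soon as the shortest sequence is exhausted, so in the example below the extra value in age is ignored.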

In [33]:
names = ['Ali', 'Lina', 'Moneer', 'Haia', 'Haneen', 'Saad']
is_male = [True, False, True, False, False, True]
age = [19, 18, 18, 17, 19, 20, 21]

students = zip(names, is_male, age)
students
Out[33]:
<zip at 0x7fe6027ff740>
In [34]:
list(students)
Out[34]:
[('Ali', True, 19),
 ('Lina', False, 18),
 ('Moneer', True, 18),
 ('Haia', False, 17),
 ('Haneen', False, 19),
 ('Saad', True, 20)]
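
Two related patterns, sketched with a shortened version of the data: building a dict from two sequences, and "unzipping" a list of tuples back into separate sequences. The variable names are illustrative.

```python
names = ['Ali', 'Lina', 'Moneer']
ages = [19, 18, 18, 17]   # one extra value, which zip ignores

pairs = list(zip(names, ages))
print(pairs)  # [('Ali', 19), ('Lina', 18), ('Moneer', 18)]

# dict() accepts an iterable of (key, value) pairs
age_by_name = dict(zip(names, ages))
print(age_by_name['Lina'])  # 18

# zip(*pairs) turns the pairs back into separate tuples
unzipped_names, unzipped_ages = zip(*pairs)
print(unzipped_names)  # ('Ali', 'Lina', 'Moneer')
```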

dict¶

A dict is a flexibly sized collection of key-value pairs, where keys and values are Python objects.

In [35]:
# Create dict
d1 = {'a' : 'some value', 'b' : [1, 2, 3, 4]}
d1
Out[35]:
{'a': 'some value', 'b': [1, 2, 3, 4]}
In [37]:
# access elements by key
d1['c'] 
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[37], line 2
      1 # access elements by key
----> 2 d1['c']

KeyError: 'c'
In [39]:
d1.get('c', 20)
# default value
Out[39]:
20
In [40]:
# change values
d1['a'] = "new value"
d1
Out[40]:
{'a': 'new value', 'b': [1, 2, 3, 4]}
In [41]:
# Check if element in a dict
'b' in d1
Out[41]:
True
In [42]:
# Remove from dict
d1 = {'a' : 'some value', 'b' : [1, 2, 3, 4]}
d1.pop('a')
d1
Out[42]:
{'b': [1, 2, 3, 4]}
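
A sketch of two everyday dict patterns: grouping values under a key with `setdefault`, and looping over key-value pairs with `items`. The word list and the names `by_letter` and `counts` are made up for this example.

```python
words = ['apple', 'bat', 'bar', 'atom', 'book']

# Group the words by their first letter
by_letter = {}
for word in words:
    # setdefault returns the existing list, or stores and returns a new one
    by_letter.setdefault(word[0], []).append(word)
print(by_letter)  # {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

# Iterate over keys and values together
counts = {}
for letter, group in by_letter.items():
    counts[letter] = len(group)
print(counts)  # {'a': 2, 'b': 3}
```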

set¶

A set is an unordered collection of unique elements.

In [43]:
set([2, 2, 2, 1, 3, 3])
Out[43]:
{1, 2, 3}
In [44]:
# Union
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}
a.union(b)
a | b
Out[44]:
{1, 2, 3, 4, 5, 6, 7, 8}
In [45]:
# Intersection
a.intersection(b)
a & b
Out[45]:
{3, 4, 5}
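
A few other set operations, sketched on the same two sets. The results are passed through sorted only to make the printed order predictable.

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

difference = a - b   # elements in a but not in b
symmetric = a ^ b    # elements in exactly one of the two sets
print(sorted(difference))  # [1, 2]
print(sorted(symmetric))   # [1, 2, 6, 7, 8]

print({1, 2}.issubset(a))  # True
print(3 in a)              # True
```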
In [ ]:
# Check other functions in book page 125

List comprehension¶

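A list comprehension forms a new list by filtering the elements of a collection and transforming the elements that pass the filter, all in one concise expression.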

In [48]:
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

# Convert all words to upper case
[s.upper() for s in strings]
# Create a dict of (word, length) pairs
{s: len(s) for s in strings}
Out[48]:
{'a': 1, 'as': 2, 'bat': 3, 'car': 3, 'dove': 4, 'python': 6}
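
A comprehension can also filter with an `if` clause, and the same syntax with braces produces a set. A sketch on the same list, with illustrative names for the results:

```python
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

# Keep only strings longer than two characters, then upper-case them
long_upper = [s.upper() for s in strings if len(s) > 2]
print(long_upper)  # ['BAT', 'CAR', 'DOVE', 'PYTHON']

# Set comprehension: the distinct lengths
unique_lengths = {len(s) for s in strings}
print(sorted(unique_lengths))  # [1, 2, 3, 4, 6]
```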

4. Functions¶

Define function¶

In [49]:
# Function structure
def my_function(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)
    
In [52]:
# Call function
my_function(5, 6, 0.7)
my_function(x=5, y=6, z=0.7)
my_function(y=6, x=5) 
Out[52]:
16.5

Return multiple values¶

In [55]:
def f():
    a=5
    b=6
    c=7
    return a, b, c
v1, v2, v3 = f()
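
Returning several values really returns a single tuple, which the caller can keep whole or unpack. A sketch with a hypothetical helper named `min_max`:

```python
def min_max(values):
    # The comma builds a tuple: this returns (smallest, largest)
    return min(values), max(values)

result = min_max([3, 1, 4, 1, 5])
print(result)                 # (1, 5)
print(type(result).__name__)  # tuple

# Unpack the tuple into separate names
low, high = min_max([3, 1, 4, 1, 5])
print(low, high)              # 1 5
```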

map function¶

In [57]:
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']
list(map(len, strings))
Out[57]:
[1, 2, 3, 3, 4, 6]

Lambda functions¶

Lambda functions are anonymous functions written as a single expression.

In [58]:
def short_function(x):
    return x * 2


equiv_anon = lambda x: x * 2
equiv_anon(2)
Out[58]:
4
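
Lambdas are most useful when a short function is passed as an argument. As a sketch, the list below (example data) is sorted by the number of distinct letters in each string:

```python
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']

# key receives each element and returns the value to sort by
strings.sort(key=lambda x: len(set(x)))
print(strings)  # ['aaaa', 'foo', 'abab', 'bar', 'card']
```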

5. Exception handling¶

In [ ]:
def convert_to_float(v):
    return float(v)
convert_to_float('a')
In [ ]:
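# try/except
def convert_to_float(v):
    try:
        return float(v)
    except (TypeError, ValueError):
        # float() raises ValueError for bad strings and TypeError for unsupported types
        return v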
convert_to_float('a')
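
A try statement can also carry `else` and `finally` clauses. The sketch below uses a hypothetical function `safe_divide` to show when each clause runs.

```python
def safe_divide(a, b):
    try:
        result = a / b
    except ZeroDivisionError:
        print('cannot divide by zero')
        return None
    else:
        # Runs only when the try block raised no exception
        return result
    finally:
        # Runs in every case, even when a return was reached above
        print('done')

quotient = safe_divide(6, 3)
failed = safe_divide(1, 0)
print(quotient, failed)  # 2.0 None
```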

Assignment #2¶

  • Write a function that checks the validity of a password and of its reverse based on the following rules:
    • At least 1 lowercase letter [a-z] and 1 uppercase letter [A-Z].
    • At least 1 digit [0-9].
    • At least 1 character from [$#@].
    • Minimum length 6 characters.
    • Maximum length 16 characters.
  • You should use a slice operation to generate the reverse of the password.
  • The function takes one input parameter: password
  • The function returns two boolean values that represent the validity of the password and of its reverse
  • Make sure your notebook is runnable with no errors
  • Write your name, id, and assignment number on the top of the notebook using Markdown
  • Share your notebook with "hsoboh@birzeit.edu" before the next lecture
  • Email subject: Assignment_2-{ID}