02. Virtual Environment for Python

Execute a python file in a virtual environment

First, activate the virtual environment.
The virtual environment name shows up on the left side of the command prompt.
Then, make sure the current folder in the command prompt contains the python file.
For mac, go up one level using $ cd .. (note: there must be a space between cd and ..).
For mac, go down one level using $ cd childfolder (note: childfolder is the name of the next-level folder).
Finally, execute it, like $ python hello.py.
This way, the imported packages are loaded from the virtual environment.
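A quick way to confirm which environment a script really runs in is to ask the interpreter itself. This is a minimal sketch, assuming an environment made by venv or a recent virtualenv (version 20+), where sys.prefix differs from sys.base_prefix; older virtualenv releases set sys.real_prefix instead, which the sketch also checks.

```python
import os
import sys


def in_virtual_env() -> bool:
    """Return True if this interpreter runs inside a virtual environment."""
    # venv and virtualenv >= 20 change sys.prefix but keep sys.base_prefix
    # pointing at the original Python installation.
    if getattr(sys, "base_prefix", sys.prefix) != sys.prefix:
        return True
    # Older virtualenv releases (< 20) set sys.real_prefix instead.
    return hasattr(sys, "real_prefix")


if __name__ == "__main__":
    print("inside a virtual environment:", in_virtual_env())
    print("interpreter prefix:", sys.prefix)
    # The activate script also exports VIRTUAL_ENV; None means it was not activated.
    print("VIRTUAL_ENV:", os.environ.get("VIRTUAL_ENV"))
```

Running the environment's own python directly (without activating) still reports True for the first check, but VIRTUAL_ENV is then None.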

Before Preparing a Virtual Environment

Python must be installed.
Recent Python installations include pip; virtualenv may need to be installed separately with $ pip install virtualenv.
pip is for package management.
virtualenv is for virtual environments. There are several tools for this; virtualenv works with both Python 2 and Python 3.
The virtualenv package provides the virtualenv command.
Create a folder; any name will be OK. I call it ves.

The following 5 steps go from creating a virtual environment to deactivating it.

step 1. Create

In the command window, navigate to the folder ves.
Under folder ves, a new folder will be created for the virtual environment.
ves$ virtualenv virtual4camelcase
Run it.
virtualenv is a command.
virtual4camelcase is the name of the virtual environment.
virtual4camelcase is also the folder name.
Open a Finder window, click the Go menu, go to your home folder (~), then select ves.
Select a folder view; you can see the folder structure inside folder virtual4camelcase.
Under it, there is a folder, lib. Under lib, you can see the folder python3.7.
By default, the virtualenv command uses the Python interpreter it was installed with, here Python 3.7.
If you want the environment for python2 (the default python on older macOS versions), you need an extra flag like -p python2.
You can verify it: under the virtual environment folder, you will see lib/python2.7.
Click its subfolder, bin; you can see a file, activate.
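The folder layout described in step 1 can also be checked from Python. The sketch below swaps in the standard-library venv module (not virtualenv, which this section uses) to create a throwaway environment in a temporary folder, then looks for lib/pythonX.Y/site-packages and bin/activate. It assumes macOS or Linux; on Windows the scripts folder is called Scripts and the layout differs.

```python
import os
import sys
import tempfile
import venv
from pathlib import Path


def create_and_inspect(env_name: str = "virtual4camelcase") -> dict:
    """Create a throwaway venv and report which expected paths exist."""
    with tempfile.TemporaryDirectory() as parent:
        env_dir = Path(parent) / env_name
        # with_pip=False keeps this fast and offline; symlink python on POSIX.
        venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
        version = f"python{sys.version_info.major}.{sys.version_info.minor}"
        expected = {
            "site-packages": env_dir / "lib" / version / "site-packages",
            "activate script": env_dir / "bin" / "activate",
            "python executable": env_dir / "bin" / "python",
        }
        # Check before the temporary folder is deleted.
        return {label: path.exists() for label, path in expected.items()}


if __name__ == "__main__":
    for label, found in create_and_inspect().items():
        print(f"{label}: {'found' if found else 'missing'}")
```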

step 2. Activate

....ves$ . virtual4camelcase/bin/activate
The dot (the same as the source command) runs the activate script in the current shell, so the environment's settings stay in effect.
Run it.
The command prompt changes from (base) to (virtual4camelcase) (base).

step 3. install a package into it

$ pip install camelcase
Verify by $ pip show camelcase.
In its Location line, you can see ...../ves/virtual4camelcase/lib/python3.7/site-packages
$ pip list shows 4 packages, including camelcase.
In the Finder window, click the folder virtual4camelcase; you'll see the subfolder lib.
Under it, you can see python3.7/site-packages/camelcase.
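pip show reports where a package is installed. From inside Python, importlib.util.find_spec answers a similar question for a module: it returns the file the import system would load, so you can confirm that a module comes from the virtual environment's site-packages. A small sketch; the module names in the demo are only examples.

```python
import importlib.util
from typing import Optional


def module_location(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not installed in the active environment
    return spec.origin  # e.g. .../site-packages/camelcase/__init__.py


if __name__ == "__main__":
    for name in ("json", "camelcase"):
        print(name, "->", module_location(name))
```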

step 4. Run python file under it

                $ pwd
                /Users/peterkao/desktop/code_p2/python
                In the folder, there is a file, 32_pip_1.py:
                   
                   import camelcase                   # module name
                   c = camelcase.CamelCase()          # class constructor
                   txt = "hello world"
                   print(c.hump(txt))                 # Hello World
                   
                $ python 32_pip_1.py                  # output: Hello World

step 4x. sharing a package list

When all the needed packages are installed and tested, you can export the package list.
$ pip freeze prints the list in the command window.
$ pip freeze > requirements.txt writes it to a file.
Then, send the file to your colleague, who creates his/her own virtual environment
and finally runs $ pip install -r requirements.txt in it to install the same packages.
To test this, you can simulate it yourself: create a second virtual environment on your own device, pretend it is your colleague's, and run the install there.
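pip freeze writes one name==version line per package. A minimal sketch that reads such text back into a dictionary and compares two environments, for example yours and your colleague's. It assumes the plain name==version form and skips comments, blank lines and other requirement styles (such as -e lines); the function names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def parse_freeze(text: str) -> Dict[str, str]:
    """Map package name -> pinned version from pip-freeze style text."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments and non-pinned requirement forms
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()  # pip names are case-insensitive
    return pins


def diff_pins(mine: Dict[str, str],
              theirs: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Packages whose versions differ; None means missing on that side."""
    names = sorted(set(mine) | set(theirs))
    return {n: (mine.get(n), theirs.get(n)) for n in names if mine.get(n) != theirs.get(n)}


if __name__ == "__main__":
    # Hypothetical version numbers, for illustration only.
    mine = parse_freeze("camelcase==0.2\nsix==1.16.0\n")
    theirs = parse_freeze("camelcase==0.2\n")
    print(diff_pins(mine, theirs))   # {'six': ('1.16.0', None)}
```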

step 5. Deactivate

(virtual4camelcase)(base) ~peterkao/ves$ deactivate
Run it. The command prompt changes to (base) ~peterkao/ves$
Verify by entering $ pip show camelcase. The result: the package is not found.
$ pip list shows a long list (the base environment's packages), but no camelcase.
Note that deactivate is not a file in bin: it is a shell function that the activate script defines when you activate the virtual environment.

03. My Python Language Study

Study Resource

My main study resource is the W3Schools Python Tutorial.

Grouping the study topics

There are more than 30 topics.
I group them as below:
- language construct, syntax, statements
- data types, dynamic
- collections
- functions, Python print function, Python Lambda
- Classes/Objects, inheritance
- Python built-in
- misc

Python language

        1. An interpreter runs python code. In CPython, the standard interpreter:
            - To begin with, the whole file is parsed and compiled into bytecode (not machine code).
            - Then the bytecode of the first statement is executed,
            - then the second statement is processed, and so on, in sequence.

            The following example has three statements as below:
                
                a = 12
                s = "hello"
                print("hello") 
                  
                There are three statements. All start at column 1 (no indentation).
                They are processed by the interpreter one by one in sequence. 
        
            The following example has an if statement:
                
                if 200 > 30:
                 print("under if")
                 print("greater")
                else:
                     print("under else")
                     print("less")
                print("This is another statement.")
                  
                
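                Indentation marks which statements belong to a block. Any size works (1 space under if, 5 under else here) as long as it is consistent within the block. The first line back at the outer level, the last print, ends the block and always runs.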
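In CPython the whole file is compiled before anything runs, and that is also when indentation is checked. A sketch using the built-ins compile and exec plus the standard dis module (the bytecode listing differs between Python versions, so it is not reproduced here). The first part compiles and then runs the three-statement example; the second part checks two snippets: one with different but self-consistent indentation per block, like the if example above, and one that mixes sizes inside a single block, which raises IndentationError.

```python
import dis


def compiles(source: str) -> bool:
    """Return True if the source compiles, False on an indentation error."""
    try:
        compile(source, "<demo>", "exec")
        return True
    except IndentationError:  # a subclass of SyntaxError
        return False


# Part 1: compile the whole source first, then execute it.
source = 'a = 12\ns = "hello"\nprint("hello")\n'
code = compile(source, "<demo>", "exec")   # whole source -> one code object
dis.dis(code)                              # the bytecode the interpreter will run
namespace = {}
exec(code, namespace)                      # now run it: prints hello

# Part 2: 1 space under if, 5 under else; each block is consistent, so valid.
consistent = "if 200 > 30:\n print('under if')\nelse:\n     print('under else')\n"
# 4 spaces, then 2 spaces inside the same if-block: rejected at compile time.
mixed = "if 200 > 30:\n    print('a')\n  print('b')\n"

print(compiles(consistent))  # True
print(compiles(mixed))       # False
```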
         
        2. Within a code block, like if-else, for loop, while loop, try-except,
            some statements alter the execution sequence, like break and continue; pass is a related no-op.
                break statement: It is used in for-loops and while-loops.
                  
                  When a break statement is encountered during a loop,
                  the loop ends immediately; the remaining iterations are skipped.
                   

                continue statement: It is also used in loops.
                   When a continue statement is encountered,
                   the rest of the current iteration is skipped, and the loop goes on with the next one.

                pass statement: 
                   - does nothing; the block has no contents yet
                   - a placeholder for later use
                   - avoids the syntax error an empty block would cause

                try-except (Python's form of try-catch):
                   - Example: an error occurs when reading a file.
                   - The except code block handles the situation, e.g. prints the reason.
                   - The program does not crash.
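The four constructs together in one runnable sketch. The function names are illustrative, and the file name missing.txt is a made-up name assumed not to exist, so the except branch runs.

```python
def demo_break_continue(numbers):
    """Collect odd numbers, stopping at the first number greater than 7."""
    kept = []
    for n in numbers:
        if n > 7:
            break        # leave the loop; the remaining items are never visited
        if n % 2 == 0:
            continue     # skip the rest of this iteration only
        kept.append(n)
    return kept


def not_written_yet():
    pass                 # placeholder: an empty body would be a syntax error


def read_first_line(path):
    try:
        with open(path) as f:
            return f.readline()
    except FileNotFoundError as err:
        # Handle the error instead of crashing: report the reason.
        print("could not read file:", err)
        return None


print(demo_break_continue([1, 2, 3, 4, 5, 8, 9]))  # [1, 3, 5]
print(read_first_line("missing.txt"))               # None, after the message
```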

        3. variables, types, classes/objects, collections
            3.1  variables
                 # The following code demonstrates variables, types,
                 #       and dynamic typing.
                 # comment: Unlike C, Java..., you do not declare a type for a variable.
                 x = 5
                 print("x = {}, its type = {}".format(x, type(x)))     # 5, <class 'int'>
                 x = "Peter"
                 print("x = {}, its type = {}".format(x, type(x)))     # Peter, <class 'str'>
                 x = 5.0
                 print("x = {}, its type = {}".format(x, type(x)))     # 5.0, <class 'float'>
                 x = True
                 print("x = {}, its type = {}".format(x, type(x)))     # True, <class 'bool'>

            3.2  classes and Objects
                3.2.1  Simple class example
                        
                            class MyClass:
                                x = 5
                            p1 = MyClass()
                            print(p1.x)         # 5
                            p1.y = "hello"      # the property is dynamically created.
                            print(p1.y)         # hello
                        
                        notes:
                            Real OO code usually sets properties in __init__ and uses inheritance; this form is for demo only.
                3.2.2   Practical example 
                        
                            class Person:
                                def __init__(self, name, age):    # like an OO class constructor
                                    self.name = name              # define a property
                                    self.age = age                # self is like this in other OO languages.

                                def myfunc(self):                 # define a method 
                                    print("Hello my name is " + self.name)

                            print("type of Person = {}".format(type(Person)))  # <class 'type'>: a class definition
                            p1 = Person("John", 37)    

                            print("type of p1 = {}".format(type(p1)))   # <class '__main__.Person'>: an object
                            p1.myfunc()                                 # Hello my name is John
                            x = "John"
                            print("type of x = {}".format(type(x)))     # <class 'str'>: a built-in type
                            print()
                         
                        
                3.2.3   Using classes/objects
                        
                            * in the OO world,
                                 - create a class as a template
                                 - instantiate many objects based on the class
                                 - OO supports inheritance
                            * two examples in Django, a python web framework package
                                 - example: database model
                                    
                                    from django.db import models
                                    class Post(models.Model):
                                        text = models.TextField()
                                    
                                    - django: package
                                    - db, models: subpackages of django
                                    - Model (models.Model): parent class
                                    - Post: child class
                                    - text: field name
                                  - example: Django web server views 
                                    
                                    from django.views.generic import ListView, DetailView 
                                    from .models import Post

                                    class BlogListView(ListView):
                                        ...

                                    class BlogDetailView(DetailView): 
                                        ...
                                    
                                    - ListView and DetailView are classes from django.views.generic;
                                      BlogListView and BlogDetailView extend them.
                                    - The parent classes take care of different heavy-duty web-server tasks.
                                  
                                  - notes: Django model classes declare fields as class attributes
                                    instead of assigning them in an __init__ method.
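Since the notes mention inheritance without a plain-Python example, here is a minimal sketch that extends a Person class like the one in 3.2.2 (here myfunc returns the text instead of printing it). Student and graduation_year are hypothetical names chosen for illustration.

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def myfunc(self):
        return "Hello my name is " + self.name


class Student(Person):                       # Student inherits from Person
    def __init__(self, name, age, graduation_year):
        super().__init__(name, age)          # reuse the parent constructor
        self.graduation_year = graduation_year

    def myfunc(self):                        # override the parent method
        return super().myfunc() + ", class of " + str(self.graduation_year)


s = Student("Mike", 20, 2024)
print(s.myfunc())             # Hello my name is Mike, class of 2024
print(isinstance(s, Person))  # True
```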
                        

                3.2.4   collections
                    - list
                      Code snippets are as below: 
                    
                        mylist = ["apple", "banana", "cherry"]     # create a list
                        for x in mylist:
                            print(x)
                        
                        import numpy as np                 # numpy package is needed
                        arr = np.array(mylist)             # create a numpy ndarray from a list          
                        for x in arr:
                            print(x)
                        
                        print("mylist[1] = {}".format(mylist[1]))    # banana
                        
                        print("arr[1] = {}".format(arr[1]))          # banana
                        # note: For numerical work, a NumPy array can be up to 50x faster than a list
                        #       (W3Schools' figure), because its data is stored in contiguous memory locations.
                    
                    
                        notes:
                          - A list uses [] to contain its items.
                          - The above demo is to create a list.
                          - It also demonstrates to create a numpy array with a list.
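One difference worth seeing directly: the same arithmetic operators do different things on a list and on a NumPy array. A short sketch (requires numpy):

```python
import numpy as np

nums = [1, 2, 3]
arr = np.array(nums)

print(nums * 2)    # [1, 2, 3, 1, 2, 3]  list: repetition
print(arr * 2)     # [2 4 6]             ndarray: element-wise multiply
print(nums + [4])  # [1, 2, 3, 4]        list: concatenation
print(arr + 4)     # [5 6 7]             ndarray: 4 is added to every element
print(arr.dtype)   # an integer dtype such as int64 (platform dependent)
```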
                    
                    
                    - tuple
                        In the W3Schools Python tutorial (Python MySQL, MySQL Select),
                        the demo gets all rows from a database table.
                        The first row is like below:
                              (1, 'John', 'Highway 21') 
                          
                        notes:
                           - That is a tuple.
                           - It uses () to contain its items.
                           - Its items are values; a tuple cannot be changed after it is created.
                           - This one holds one customer's data: id, name and address.
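A tuple like the database row above is usually unpacked into named variables. A sketch using that row; because tuples are immutable, assigning to an item raises TypeError, and a changed copy has to be built as a new tuple.

```python
row = (1, 'John', 'Highway 21')     # one row, as returned by fetchall()

customer_id, name, address = row    # unpack the items by position
print(name, "lives at", address)    # John lives at Highway 21

try:
    row[1] = 'Peter'                # tuples cannot be changed in place
except TypeError as err:
    print("immutable:", err)

updated = row[:1] + ('Peter',) + row[2:]   # build a new tuple instead
print(updated)                      # (1, 'Peter', 'Highway 21')
```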
                        

                    - dictionary
                        
                            thisdict = {
                                "brand": "Ford",
                                "model": "Mustang",
                                "year": 1964
                              }
                        
                        - The above code is for a python dictionary object.
                        - It has several key-value pairs.
                        - It uses {} to contain its items.
                        - python dictionary
                            - It can be nested (a value can be another dictionary), expanding vertically. 
                            - Its value can be a list, expanding horizontally.
                        - A python dictionary can come from a NoSQL database like MongoDB.
                           - MongoDB documents look like JSON.
                           - JSON is a lightweight data-interchange format.
                               - It is used in JS and python.
                               - The JSON format is text only.
                           - When python is used to interface with MongoDB (e.g. via pymongo), 
                               - for adding, just insert a python dictionary
                               - for retrieving, a python dictionary object is returned
                               - No need to stringify or parse text.
                        - note: The same literal syntax in JavaScript creates a JS object.
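The dictionary/JSON relationship can be shown with the standard json module: json.dumps turns a dictionary into JSON text, and json.loads turns it back. Drivers such as pymongo do the conversion to MongoDB's binary format for you, which is why no stringify or parsing step appears there. The colors and engine keys are added here only to show a list value and a nested dictionary.

```python
import json

thisdict = {
    "brand": "Ford",
    "model": "Mustang",
    "year": 1964,
    "colors": ["red", "white"],       # a list value (illustrative)
    "engine": {"cylinders": 8},       # a nested dictionary (illustrative)
}

text = json.dumps(thisdict)           # dict -> JSON string
print(type(text), text)

back = json.loads(text)               # JSON string -> dict
print(back == thisdict)               # True
print(back["engine"]["cylinders"])    # 8
```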
              
                    3.2.5  Iterable
                        -  Lists, tuples, dictionaries, and sets are all iterable objects.
                        -  They are iterable containers which you can get an iterator from.
                        -  An iterator is an object that can be iterated upon,
                                           meaning that you can traverse through all the values.
                        -  Technically, in Python, an iterator is an object 
                                   which implements the iterator protocol, 
                                   which consists of the methods __iter__() and __next__().
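A minimal sketch of the iterator protocol: a hypothetical Countdown class implements __iter__() and __next__(), and raises StopIteration when it runs out, which is how a for-loop knows to stop.

```python
class Countdown:
    """Iterate from start down to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self                # the object is its own iterator

    def __next__(self):
        if self.current <= 0:
            raise StopIteration    # signals the end to for-loops
        value = self.current
        self.current -= 1
        return value


for n in Countdown(3):
    print(n)                       # 3, then 2, then 1

it = iter(["apple", "banana"])     # a list is iterable: iter() gives an iterator
print(next(it), next(it))          # apple banana
```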
                           

        4. functions, lambda
            The purpose of using a function is code partition and code sharing (reuse).

            4.1 a simple function
            
                def my_function(x):
                    return 5 * x

                print(my_function(3))           # 15
            
            4.2 a lambda function
            
                x = lambda a : a + 10            
                print(x(5))                     # 15
            
                descriptions:
                4.2.1. syntax:     lambda arguments : expression
                4.2.2. A lambda function is a small anonymous function.
                        - lambda is a keyword
                        - a is an argument
                        - expression is the function content; its value is returned automatically.
                            
                            - Only one expression is allowed: statements like if-else blocks
                              and loops are not (a conditional expression, x if c else y, is OK).
                            
                4.2.3. The power of lambda is better shown when you use it 
                           as an anonymous function inside another function.
                   
                    In the W3Schools Python tutorial (Python Reference, Python Built-in Functions),
                    map() function    -------------------
                    
                    def myfunc(a):
                        return len(a)
                    
                    x = map(myfunc, ('apple', 'banana', 'cherry'))
                    
                    print(x)            # <map object at 0x...>
                    
                    # convert the map into a list, for readability:
                    print(list(x))       # [5, 6, 6]
                    
                          - map function takes two arguments
                          - the second is an iterable object, a tuple here.
                          - The first is a function that processes each item.
                          - An iterator is returned; its type is class map.
                          - Finally, convert the result to a list.
                    The following uses a lambda function instead -------
                    
                    y = map(lambda x: len(x), ['apple', 'banana', 'cherry'])
                    print(y)            # map object
                    print(list(y))      # [5, 6, 6]
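The "anonymous function inside another function" idea from 4.2.3 is easiest to see when a function returns a lambda. A sketch with illustrative names (multiplier, doubler, tripler); the last lines also use a lambda as the key of sorted, another common place for one.

```python
def multiplier(n):
    # Return a new function that remembers n (a closure).
    return lambda a: a * n


doubler = multiplier(2)
tripler = multiplier(3)
print(doubler(11))   # 22
print(tripler(11))   # 33

fruits = ['banana', 'kiwi', 'apple']
print(sorted(fruits, key=lambda s: len(s)))  # ['kiwi', 'apple', 'banana']
```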
                    

            4.3  Python Function Recursion
                
                # 4.3.1           a dictionary with nested structure.
                #       note: No line continuation (\) is needed inside {}, [] or ().
                nested_dict = {                #root dictionary item
                        "k1": {
                            "k12": {"k121": 121,  "k122": 122},
                            "k13": {"k131": 131}
                            },
                        "k2": {"k21": {"k211": 211}}
                    }
                
                # 4.3.2            Within a function, it can call itself.
                def recursive_function(d):
                    for k, v in d.items():       # method items() gives one key-value pair per loop
                                                 # for-loop: left -> right, horizontally
                        if isinstance(v, dict):  # if v is a dictionary
                            recursive_function(v)  # top -> down, vertically:
                                                   # call itself with v as the new d
                        else:                    # if not,
                            print(k, ":", v)     # do some work, the same for all leaf items
                
                # 4.3.3             to begin with, call it with the parameter for root item
                recursive_function(nested_dict)
                
                The output is as below:
                     k121 : 121
                     k122 : 122
                     k131 : 131
                     k211 : 211  
                     
                explanations:
                - The input is a nested python dictionary.
                - Recursion can be used for different scenarios.
                - The above code iterates over all the dictionary items by calling the function recursively.
                - *To begin with, call the function. The parameter is the root item.
                  *Then, use a for-loop to iterate over the items horizontally.
                        *If the value of an item is a dictionary,
                            - the function calls itself. By doing so,
                               - the interpreter pushes a new frame on the call stack;
                                 the caller's frame keeps its place in the loop and its local data.
                            - the value becomes the parameter for the new function call.
                        *If the value of an item is not a dictionary, 
                            - it is processed. The task here is to print its key and value. 
                            - The same task is done for ALL such items.
                        *When a call finishes its for-loop, it returns: its frame is popped from the stack,
                            and execution resumes in the caller, where it left off.
                - The recursive process ends when the first (root) call returns.
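The stack in the explanation is the interpreter's call stack. To make it visible, here is a sketch of the same traversal with an explicit stack of iterators instead of recursion (iterative_walk is a hypothetical name). It collects the (key, value) pairs in the same order the recursive version prints them.

```python
def iterative_walk(d):
    """Return the leaf (key, value) pairs of a nested dict, depth-first,
    using an explicit stack instead of recursive calls."""
    leaves = []
    stack = [iter(d.items())]               # like the first call: the root dictionary
    while stack:                            # the walk ends when the stack is empty
        try:
            k, v = next(stack[-1])          # next item of the innermost dictionary
        except StopIteration:
            stack.pop()                     # this dictionary is done: like a return
            continue
        if isinstance(v, dict):
            stack.append(iter(v.items()))   # go down: like the recursive call
        else:
            leaves.append((k, v))           # leaf: do the work
    return leaves


nested_dict = {
    "k1": {"k12": {"k121": 121, "k122": 122}, "k13": {"k131": 131}},
    "k2": {"k21": {"k211": 211}},
}
for k, v in iterative_walk(nested_dict):
    print(k, ":", v)   # k121 : 121, k122 : 122, k131 : 131, k211 : 211
```

One practical reason for this form is that CPython limits recursion depth (1000 frames by default, see sys.getrecursionlimit()), while an explicit stack is limited only by memory.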
                  
                

            4.4  python function, Arbitrary Arguments
                
                def my_function(*kids):        # Arbitrary Arguments are defined with * like this
                    youngest = kids[-1]        # kids is a tuple; -1 is its last item
                    print("The youngest child is " + youngest)
                  
                my_function("Wiwi", "Tairo", "Emi")  # arguments are passed in when calling the function
                
                output: The youngest child is Emi
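Inside the function, *kids arrives as a tuple. A short sketch showing that, plus the reverse operation: a * in the call unpacks an existing list into separate arguments. This version returns the text instead of printing it.

```python
def my_function(*kids):
    print(type(kids))                  # <class 'tuple'>
    return "The youngest child is " + kids[-1]


names = ["Wiwi", "Tairo", "Emi"]
print(my_function(*names))             # same as my_function("Wiwi", "Tairo", "Emi")
```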
                    




        5. modules
           
            #  mymodule.py
            def greeting(name):             # function definition
                print("hello, " + name)

            class MyClass:                  # class definition
                y = 11

            #  test_module.py
            import mymodule    
            mymodule.greeting("Peter")      # using the function: hello, Peter

            p1 = mymodule.MyClass()         # create an object
            num = p1.y                      # get its property 
            print("num is {}".format(num))  # print the value: num is 11
            
            
            Module name is the file name without .py file extension
            
            note 1: Both files are in the same folder.
            note 2: The purpose is for code partition or code share.
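To check the rule "module name = file name without .py", this sketch writes a hypothetical mymodule.py into a temporary folder, adds that folder to sys.path (the list of folders Python searches on import), and imports it. Note 1 works without such a step because, when you run python test_module.py, the script's own folder is put on sys.path automatically.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_SOURCE = '''
def greeting(name):
    return "hello, " + name

class MyClass:
    y = 11
'''

with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "mymodule.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, folder)                        # make the folder importable
    try:
        importlib.invalidate_caches()                 # the file was just created
        mymodule = importlib.import_module("mymodule")  # file name minus .py
        print(mymodule.__name__)                      # mymodule
        print(mymodule.greeting("Peter"))             # hello, Peter
        print(mymodule.MyClass().y)                   # 11
    finally:
        sys.path.remove(folder)
```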

        6. packages
             The following lab demonstrates python packages.
             6.1  on Mac, command window, python 3.7; the package is camelcase.
             6.2  Using a package, camelcase
                   create a file, test_p1.py, as below:
                       import camelcase              # module
                       c = camelcase.CamelCase()     # create an object from the class   
                       txt = "hello world"  
                       print(c.hump(txt))            # Hello World
             6.3  Remove a package
                    in the command window, enter $
                          pip uninstall camelcase
             6.4  download and install the package camelcase
                    in the command window, enter $
                          pip install camelcase
                    Verify by $ python test_p1.py
             6.5  $ pip list         shows package camelcase in the list
             6.6  $ pip show camelcase   shows its location as below:
                      /Users/peterkao/anaconda3/lib/python3.7/site-packages/camelcase
                  in the mac Finder window, click menu Go, Go to Folder..., site-packages;
                  under it, there is a folder, camelcase    
                  under folder camelcase, there is a file main.py 
                  inside main.py, there is a class, CamelCase
             6.7  
                  - A package is used for code distribution.
                  - A module is used inside your python app.
                  - Here the two use the SAME name, all lower case. This is not always so:
                    e.g. the package scikit-learn is imported as the module sklearn.

04. Pandas

04.01 Overview

Pandas is a Python library.
Pandas is used to analyze data, like Excel sheets with VBA, or database IDEs.
It requires less programming knowledge.
The name "Pandas" has a reference to both "Panel Data" (an econometrics term for multi-dimensional data sets) and "Python Data Analysis".
In practice, we often get data from databases instead of csv files or json files.
W3Schools separates the topics Pandas and machine learning.
Pandas data also works for machine learning, like predictions.

    # The following demo demonstrates the structure of a dataframe.
    
    import pandas as pd
    mydataset = {
      'cars': ["Honda", "Toyota", "Ford"],
      'ages': [6, 14, 3],
      'color': ["blue", "grey", "black"]
    }
    myvar = pd.DataFrame(mydataset)
    print(myvar)

                 cars  ages  color
            0   Honda     6   blue
            1  Toyota    14   grey
            2    Ford     3  black

The process is to create a dataframe from a python dictionary.
All the values are lists of the same length.
In the dataframe, cars, ages, color are the column names.
In the dataframe, 0, 1, 2 on the left side are the row index.
print(myvar) shows the whole table (pandas truncates very large tables).
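A few attributes describe the same structure without printing the whole table. A sketch reusing the data above:

```python
import pandas as pd

mydataset = {
    'cars': ["Honda", "Toyota", "Ford"],
    'ages': [6, 14, 3],
    'color': ["blue", "grey", "black"]
}
myvar = pd.DataFrame(mydataset)

print(myvar.shape)           # (3, 3): rows, columns
print(list(myvar.columns))   # ['cars', 'ages', 'color']
print(list(myvar.index))     # [0, 1, 2]
print(myvar['ages'].dtype)   # an integer dtype such as int64
print(myvar.head(2))         # first two rows only
```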

04.02 Named Indexes, Merge dataframes

With the index argument, you can name your own indexes
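In the following code, df1's index [0, 1, 2] is the same as the default; df2 uses [3, 4, 5] to continue it.
To merge (stack) dataframes row-wise, use pandas's concat function (pd.merge is for SQL-style joins instead).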


    import pandas as pd
    data1 = {
        "calories": [420, 380, 390],
        "duration": [50, 40, 45]
    } 
    df1 = pd.DataFrame(data1, index = [0, 1, 2])

    data2 = {
        "calories": [450, 580, 600],
        "duration": [55, 65, 75]
    } 
    df2 = pd.DataFrame(data2, index = [3, 4, 5])

    frames = pd.concat([df1, df2])  # list
    print(frames)        # 6 rows
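The example above keeps numeric indexes. Named (string) indexes work the same way, and pd.concat(..., ignore_index=True) is the option to use when you would rather have pandas renumber the rows 0..n-1. The day labels are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]},
                   index=["day1", "day2", "day3"])     # named row labels
df2 = pd.DataFrame({"calories": [450, 580, 600], "duration": [55, 65, 75]},
                   index=["day4", "day5", "day6"])

frames = pd.concat([df1, df2])
print(frames.loc["day2"])                  # one row, selected by its name

renumbered = pd.concat([df1, df2], ignore_index=True)
print(list(renumbered.index))              # [0, 1, 2, 3, 4, 5]
```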

04.03 Filtering or Subset

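DataFrame has the attribute loc for filtering.

Technically, it is not a FUNCTION: it is an indexer, used with square brackets, not parentheses.

It prepares a subset, and RETURNS it.

In this case, it returns a subset of a dataframe (a new DataFrame, or a Series when a single row or column is selected by one label).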


                # continued from the previous example (04.02)
                # frames: the DataFrame built with pd.concat([df1, df2]) there
                # demo 04.03.1           filtering by conditions
                df_long_duration = frames.loc[frames.duration >= 65]
                print(df_long_duration)           # two rows (index 4 and 5)
                                                  # just like a WHERE clause in SQL

                # demo 04.03.2           select rows
                df24 = frames.loc[2:4]
                print(df24)             # rows with index 2, 3, 4 (label slices include the end)
            
                # demo 04.03.3           select columns
                df4cols = frames.loc[:, ["duration"]]
                print(df4cols)          # ALL rows, only column duration
            
                # demo 04.03.4           select rows and columns
                dfrc = frames.loc[0:1, ["duration"]]
                print(dfrc)             # rows with index 0 and 1, only column duration
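Several conditions can be combined like AND/OR in a SQL WHERE clause, using & and | with each condition in parentheses (Python's and/or do not work element-wise on columns). A self-contained sketch that rebuilds frames from 04.02:

```python
import pandas as pd

frames = pd.concat([
    pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]}, index=[0, 1, 2]),
    pd.DataFrame({"calories": [450, 580, 600], "duration": [55, 65, 75]}, index=[3, 4, 5]),
])

# WHERE duration >= 45 AND calories < 500, keeping only the calories column
subset = frames.loc[(frames.duration >= 45) & (frames.calories < 500), ["calories"]]
print(subset)     # rows with index 0, 2, 3
```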

04.04 Series

In the following code, step 1 is to create a dataframe with one column.

Step 2 is to convert it to a Series object.

Step 3 is to analyze data from the Series object.


                import pandas as pd
                data = {                            # dictionary, its item value is a list
                    "calories": [420, 380, 390, 500, 600],
                    "duration": [52, 40, 46, 60, 70]
                }
                
                df = pd.DataFrame(data, columns = ["duration"])  # data frame with  one column
                s = df.squeeze()       # df => series    
                print(s)               # 52, 40, 46, 60, 70
                print(type(s))         # Series
                
                mean = s.mean()
                print(mean)               # 53.6   sum / 5
                
                median = s.median()
                print(median)             # 52.0   middle
                
                std = s.std()
                print(std)                # 11.78  sample standard deviation (divides by n - 1)
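describe() computes these statistics, and a few more, in one call. A sketch with the same numbers; it also shows that selecting one column with df["duration"] gives a Series directly, without squeeze().

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390, 500, 600],
                   "duration": [52, 40, 46, 60, 70]})

s = df["duration"]          # one column -> Series
stats = s.describe()        # count, mean, std, min, 25%, 50%, 75%, max
print(stats)
print(stats["50%"])         # 52.0, the median
```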

04.05 IO

Pandas provides methods to interface with files or databases.

Examples are listed below:

pd.read_csv("data.csv")   example from W3Schools

pd.read_json("data.json")   example from W3Schools

pd.read_sql_query("SELECT ...", con)   for SQLite and other SQL databases

pd.DataFrame(list(mycollection.find()))   for MongoDB, a NoSQL database

note 1: The results are DataFrame objects, which are used for DATA VISUALIZATION or DATA ANALYSIS.

note 2: for file io

The data file and the python file do not have to be in the same folder.

One example is pd.read_csv('../myfolder/data.csv').

Here the folder of the python file and myfolder are peers (siblings).

.. is to go up one level.
/myfolder is to go down one level from there, into myfolder.

In myfolder, access data.csv.
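One catch: a relative path such as '../myfolder/data.csv' is resolved against the current working directory (where you started python), not against the folder of the script. A sketch that builds the path from the script's own folder instead; folder and file names follow the note above, and read_peer_csv is a hypothetical helper name.

```python
from pathlib import Path

import pandas as pd


def read_peer_csv(script_dir, folder="myfolder", name="data.csv"):
    """Read <script_dir>/../<folder>/<name>, independent of the working directory."""
    path = Path(script_dir).resolve().parent / folder / name
    return pd.read_csv(path)

# In a script: df = read_peer_csv(Path(__file__).parent)
```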

note 3: examining the W3Schools Pandas data.json

It is a dictionary with 4 key-value pairs.

The keys are Duration, Pulse, Maxpulse, Calories.

The values are lists of numbers.

The lists all have the SAME size.
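pd.read_json accepts this column-oriented layout; with its default settings it also accepts the form where each value is itself a dictionary from row label to number. A sketch with a short, made-up sample in the same shape, passed through io.StringIO because recent pandas versions deprecate passing literal JSON text directly.

```python
import io

import pandas as pd

# Made-up sample in the same shape: 4 keys, each with an equally long list.
json_text = '''{
  "Duration": [60, 60, 45],
  "Pulse":    [110, 117, 109],
  "Maxpulse": [130, 145, 175],
  "Calories": [409.1, 479.0, 340.0]
}'''

df = pd.read_json(io.StringIO(json_text))
print(df.shape)           # (3, 4)
print(list(df.columns))   # ['Duration', 'Pulse', 'Maxpulse', 'Calories']
```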

04.06 DataFrame for Plotting

--- DEMO 1 ---


        import pandas as pd
        import matplotlib.pyplot as plt

        data = {
            'new_cases':    [200, 180, 175, 160, 120],
            'hospitalized': [100,  80,  70,  50,  40],
            'deaths':       [35,   30,  33,  25,  20],
            'positive%':    [15,   14,  16,  12,   9]
        }
        
        df = pd.DataFrame(data, index=[7,8,9,10,11])
        
        myfig = df.plot.line()
        myfig.set_xlabel("date of a month")
        myfig.set_ylabel("counts or percentage")

        plt.show()

descriptions for demo 1

This is a very common case: seeing how things change over time.

For the y-axis, there are MANY quantities to monitor, like the death count.
For the x-axis, there is only time, like the day of a month.
In pd.DataFrame, the argument index defines the time values.
Visualizing the data in a figure makes it easier to get an idea of the trends.
Only one statement, df.plot.line(), creates a figure from the dataframe directly.

In demo 1, the data is from a python dictionary object.
Module pandas is used to create a dataframe object.
The dataframe is used to draw a figure from the data.
Module matplotlib.pyplot is used to render the figure on the device.
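When there is no screen (for example on a server), the same figure can be written to a file. A sketch assuming matplotlib's non-interactive Agg backend and a hypothetical output file name, trend.png. df.plot.line() returns a matplotlib Axes, so myfig in demo 1 is really an Axes object holding one line per column.

```python
import matplotlib
matplotlib.use("Agg")            # render to files only, no window
import matplotlib.pyplot as plt
import pandas as pd

data = {
    'new_cases':    [200, 180, 175, 160, 120],
    'hospitalized': [100,  80,  70,  50,  40],
    'deaths':       [35,   30,  33,  25,  20],
    'positive%':    [15,   14,  16,  12,   9]
}
df = pd.DataFrame(data, index=[7, 8, 9, 10, 11])

ax = df.plot.line()
ax.set_xlabel("date of a month")
ax.set_ylabel("counts or percentage")
print(len(ax.get_lines()))       # 4, one line per column
plt.savefig("trend.png")         # instead of plt.show()
plt.close("all")
```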

--- DEMO 2 ---

data from a SQLite relational database
On mac, use the IDE, DB Browser for SQLite, to create a database.
The database name is test315.
The file name is test315.db.
The file is in the same folder as my later python code.
Add a table, mytable, with 4 columns as in demo 1.
Add 5 rows.
Create a python file as below.
Run it to get the same result as demo 1, with a title added.

            data in the database
                new_cases   hospitalized    deaths     testpositive
                200         100             35         15
                ....
                120         40              20         9
            

            python code    
            import sqlite3
            import pandas as pd
            import matplotlib.pyplot as plt

            conn = sqlite3.connect('test315.db')    # database file in folder peter, next to this script
            
            df = pd.read_sql_query("SELECT * FROM mytable", conn)   # DataFrame
            
            # rebuild the frame with the day-of-month index and display column names
            df2 = pd.DataFrame(df.values,
               index=[7, 8, 9, 10, 11],
               columns=['new cases', 'hospitalized', 'deaths', 'test positive %'])

            myfig = df2.plot.line(title='COV-19 update, town xyz, March 2021')
            
            myfig.set_xlabel("date of a month")
            myfig.set_ylabel("counts")
            plt.show()
            conn.close()

descriptions for demo 2

The data in demo 2 is from a relational database, SQLite.
Pandas provides a method, read_sql_query, to retrieve data from SQLite databases into a dataframe for drawing.
It is a common way to get data.
There are similar ways for other relational or NoSQL databases.
The method's first argument is an SQL statement.
So the SQL statement can also process the data in the database tier (e.g. filter with WHERE, aggregate with GROUP BY) before pandas receives it.
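To see the SQL doing work in the database tier, here is a self-contained sketch with an in-memory SQLite database, so no test315.db file is needed. The table mirrors demo 2 but the setup is hypothetical. The WHERE clause filters rows before pandas sees them, and params passes the threshold as a query parameter instead of formatting it into the SQL string.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")              # temporary database in RAM
conn.execute("CREATE TABLE mytable (day INTEGER, new_cases INTEGER, deaths INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
                 [(7, 200, 35), (8, 180, 30), (9, 175, 33), (10, 160, 25), (11, 120, 20)])

query = "SELECT day, new_cases FROM mytable WHERE new_cases >= ? ORDER BY day"
df = pd.read_sql_query(query, conn, params=(170,), index_col="day")
print(df)               # days 7, 8, 9 only
conn.close()
```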

        Another way for io:
              mycursor = mydb.cursor()  # a cursor executes SQL statements and fetches their result rows
              mycursor.execute("SELECT * FROM customers")
              myresult = mycursor.fetchall()
        If the above code is used for IO, a list of tuples is returned (mydb is a database connection, e.g. the MySQL one from the W3Schools tutorial).
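The cursor route and the pandas route meet easily: the rows from fetchall() plus the column names from cursor.description (part of the Python DB-API) are enough to build a DataFrame. A sketch using SQLite in memory as a stand-in for the MySQL mydb connection; the customers table content is made up.

```python
import sqlite3

import pandas as pd

mydb = sqlite3.connect(":memory:")   # stand-in for a MySQL connection
mydb.execute("CREATE TABLE customers (id INTEGER, name TEXT, address TEXT)")
mydb.execute("INSERT INTO customers VALUES (1, 'John', 'Highway 21')")

mycursor = mydb.cursor()
mycursor.execute("SELECT * FROM customers")
myresult = mycursor.fetchall()                  # list of tuples
print(myresult)                                 # [(1, 'John', 'Highway 21')]

columns = [d[0] for d in mycursor.description]  # column names from the cursor
df = pd.DataFrame(myresult, columns=columns)
print(df)
mydb.close()
```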

--- DEMO 3, Scatter Plot ---

The scatter example from the W3Schools tutorial (Pandas, Plotting) is used.
data
- csv file
- Two columns are involved: exercise duration and calorie consumption.
Plot a point for every row, with Duration on the x-axis and Calories on the y-axis.

          import pandas as pd
          import matplotlib.pyplot as plt
          df = pd.read_csv('data.csv')        #comment: It can be from a database.
          df.plot(kind = 'scatter', x = 'Duration', y = 'Calories')
          plt.show()

The plot helps you see the correlation between the two.

In this case, there is a relationship between the two columns.

It looks roughly linear.

The longer the duration, the more calories are consumed.
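The visual impression can be backed by a number: corr() computes the Pearson correlation coefficient (1 means a perfect increasing linear relationship, 0 means none). A sketch with a few made-up rows in place of data.csv:

```python
import pandas as pd

# Made-up sample rows standing in for data.csv.
df = pd.DataFrame({"Duration": [30, 45, 60, 60, 90],
                   "Calories": [200, 280, 340, 360, 520]})

r = df["Duration"].corr(df["Calories"])   # Pearson by default
print(round(r, 3))                        # 0.997: a strong linear relationship
print(df.corr())                          # the full correlation matrix
```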

--- DEMO 4, Line Plot ---

Continue from demo 3, with the same data.
Replace the plot line as below:
df.plot.line(x = "Duration", y = "Calories")
The default kind for df.plot is the line plot.
The method connects the points of each two adjacent rows with a line, in the row order of the dataframe,
so sort by x first, e.g. df.sort_values("Duration"), or the line will zigzag back and forth.
The focus is the relationship between them.

--- DEMO 5, histogram ---

                  
              import pandas as pd                                         
              import matplotlib.pyplot as plt       
              df = pd.read_csv('data.csv')
              df["Duration"].hist(bins=20)          
              plt.show()

When people exercise, they record the duration of each exercise.

In this demo, the data is in a csv file.

The x-axis shows the duration ranges (bins=20 splits the full range of durations into 20 equal-width bins).

The y-axis shows how many rows fall in each duration range.

From the plot, you can read the count for every range.

In this demo, the person usually spends 60-70 minutes exercising.
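matplotlib's hist, which df["Duration"].hist calls, splits the range from the smallest to the largest value into equal-width bins and counts the values in each bin. numpy.histogram does the same counting without drawing anything, so the bins are easy to inspect. This sketch uses hypothetical durations and 4 bins instead of 20:

```python
import numpy as np

# Hypothetical exercise durations in minutes.
durations = np.array([60, 60, 45, 60, 65, 70, 30, 60, 90, 62])

# 4 equal-width bins between min (30) and max (90): width = (90 - 30) / 4 = 15.
counts, edges = np.histogram(durations, bins=4)
print(edges)   # bin edges 30, 45, 60, 75, 90
print(counts)  # [1 1 7 1]: most durations fall in the 60-75 bin
```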

--- DEMO 6, bar plot ---

            import pandas as pd
            import matplotlib.pyplot as plt

            speed = [18.5, 41, 49]                 #avg values
            lifespan = [9, 71, 1.6]                #avg values
            index = ['pig', 'elephant','rabbit']
            df = pd.DataFrame({'speed': speed, 'lifespan': lifespan},
                               index=index)
                               
            df.plot.bar(rot=10)

            plt.show()

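comment: What you give, what you see. Each index label (pig, elephant, rabbit) becomes a group on the x-axis, each column (speed, lifespan) becomes one bar in every group, and rot=10 rotates the x-axis labels by 10 degrees.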

--- DEMO 7, pie plot ---

         import pandas as pd
         import matplotlib.pyplot as plt

         df = pd.DataFrame(
                  {
                      'town A':     [ 35001,   12000],      # one column -> one pie subplot
                      'town B':     [ 29010,   55003],      # one column -> one pie subplot
                      'town C':     [ 32597,   41055]       # one column -> one pie subplot
                  },
                  index = ['YES', 'NO']        # index labels -> pie slices
         )
         df.plot.pie(subplots=True,
             figsize=(10,2),
             title = "Vote Count Pie Plot for three towns")

         plt.show()

--- DEMO 8, DataFrame, Mongodb, Plotting ---

------------ Prepare data in Mongodb ---------------

        import pymongo
        myclient = pymongo.MongoClient("mongodb://localhost:27017/")
        mydb = myclient["mydatabase"]
        mycol = mydb["mycollection"]    #collection, like table in sql

        # drop the collection if it exists
        mycol.drop()

        r = {"duration": 60, "calories": 409.1}     #dictionary
        d = mycol.insert_one(r)         #document, like row in sql
        r = {"duration": 45, "calories": 282.4}    
        d = mycol.insert_one(r)
        r = {"duration": 30, "calories": 195.1}    
        d = mycol.insert_one(r)    

        # test code
        for doc in mycol.find():
            print(doc)
        # See 3 dictionaries, each has three key-value pairs.

descriptions

Make sure the environment is set up properly: MongoDB and its Python driver, pymongo.
Two command windows are needed: one running mongod, and one running your code.
Line 1 imports the driver.
Line 2 creates a client connected to the MongoDB server at the given URL.
Line 3 selects the database by name.
Line 4 selects the collection by name; a collection is like a table in an SQL database.
Then, prepare a dictionary and insert it into the collection as a document. Repeat this for all 3 documents.
Verify the result.

------- Using DataFrame to import NoSql db and plotting------

  
        import pymongo

        myclient = pymongo.MongoClient("mongodb://localhost:27017/")
        mydb = myclient["mydatabase"]
        mycol = mydb["mycollection"]

        import pandas as pd    
        df = pd.DataFrame(list(mycol.find()))   # for many documents
        # verify
        print(df)

        import matplotlib.pyplot as plt
        df.plot.line(x = "duration", y = "calories")
        plt.show()

                  verification result
                                _id  calories  duration
                            0  ....  409.1        60
                            1  ....  282.4        45
                            2  ....  195.1        30  
                            
                             
                  plotting (line plot of calories vs. duration)
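If you do not need the `_id` column, pymongo's find() accepts a projection as its second argument, for example `mycol.find({}, {"_id": 0})`. The sketch below substitutes a plain list of dictionaries for the documents the cursor returns, so it runs without a MongoDB server, and drops `_id` in pandas instead:

```python
import pandas as pd

# Stand-in for list(mycol.find()): each document is a dictionary.
docs = [
    {"_id": "id-1", "duration": 60, "calories": 409.1},
    {"_id": "id-2", "duration": 45, "calories": 282.4},
    {"_id": "id-3", "duration": 30, "calories": 195.1},
]

# Build the DataFrame, then drop the MongoDB-generated _id column.
df = pd.DataFrame(docs).drop(columns="_id")
print(df)
```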

04.07 Pandas Documents...

DataFrame is the main section of the Pandas documentation.
In it, there are many methods like groupby, sort_values, pivot, etc.
Examples are provided for each method, and you can adapt them to your needs.
You can get data from a csv file or a database, then analyze, plot, and update it.
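Here is a small sketch of three of those methods (groupby, sort_values and pivot) on a hypothetical workout table:

```python
import pandas as pd

# Hypothetical workout log.
df = pd.DataFrame({
    "person":   ["Ann", "Ann", "Bob", "Bob"],
    "day":      ["Mon", "Tue", "Mon", "Tue"],
    "calories": [300.0, 250.0, 400.0, 350.0],
})

# groupby: total calories per person.
totals = df.groupby("person")["calories"].sum()

# sort_values: rows ordered from most to fewest calories.
ranked = df.sort_values("calories", ascending=False)

# pivot: one row per person, one column per day.
table = df.pivot(index="person", columns="day", values="calories")

print(totals, ranked, table, sep="\n\n")
```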

05. Machine Learning

from W3Schools tutorial, Python, Machine Learning

s1. getting started, data types

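Numerical Data
- discrete data: integers, like the number of cars passing by.
- continuous data: any value in a range, like the price of an item.
Categorical data: values that cannot be measured against each other, like a color value.
Ordinal data: like categorical data, but the values can be measured against each other, like school grades A, B, C.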

s2,s3,s4, Mean, Median, Mode, std, percentile
In the following code, the numpy module provides methods to compute these statistics from the data: numpy.mean, numpy.median, numpy.std and numpy.percentile (the mode comes from scipy's stats.mode).

          import numpy
          speed = [32,111,138,28,59,77,97]
          x = numpy.std(speed)      # standard deviation
          print(x)
s5, s6, Data Distribution, Normal Data Distribution

          import numpy
          x = numpy.random.uniform(0.0, 5.0, 250)
          print(x)

The above code generates 250 random floats between 0.0 and 5.0.

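When the result is presented in a histogram with five bins, the bars have roughly the same height, so the shape is pretty flat: the numbers are uniformly distributed.

Numpy provides methods to generate such data.

When a die is thrown, each of the numbers 1, 2, 3, 4, 5, 6 has the same probability, 1 out of 6. This is another example of a uniform distribution.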
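You can simulate the dice example to check the "1 out of 6" claim. This sketch uses numpy's newer Generator API (numpy.random.default_rng) rather than the numpy.random functions used above, with a fixed seed so the run is repeatable:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable

# Throw a six-sided die 60000 times; integers() excludes the upper bound 7.
throws = rng.integers(1, 7, size=60000)

# Count how often each face 1..6 came up (bincount index 0 is unused).
counts = np.bincount(throws, minlength=7)[1:]
freqs = counts / throws.size
print(freqs)  # each value should be close to 1/6, about 0.167
```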

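          import numpy
          import matplotlib.pyplot as plt
          x = numpy.random.normal(5.0, 1.0, 100000)
          plt.hist(x, 100)
          plt.show()

The above code uses numpy.random.normal to generate 100000 numbers from a normal distribution, then draws them in a histogram with 100 bars.

mean: 5.0, std (standard deviation): 1.0.

The values are concentrated around 5.0, and the histogram has the familiar bell shape.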
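Here is a quick numeric check of these claims. It again uses numpy's Generator API, which is an assumption here; numpy.random.normal behaves the same way. For a normal distribution, about 68% of the values lie within one standard deviation of the mean:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
x = rng.normal(5.0, 1.0, 100000)  # mean 5.0, standard deviation 1.0

print(x.mean(), x.std())  # both should be close to 5.0 and 1.0

# Fraction of values within one standard deviation of the mean (4.0 to 6.0).
within_one_std = np.mean((x > 4.0) & (x < 6.0))
print(within_one_std)  # roughly 0.68 for a normal distribution
```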

When a factory makes a product, the width is often one of its important specifications.
The measured widths typically follow a normal distribution.

Other examples are babies' weights and students' grades.

comments:
The above two distributions, uniform and normal, are just two typical, common ones.

s7, s8, s9, Scatter Plot, Linear and Polynomial Regression

The term regression is used when you try to find the relationship between variables.

First, draw a scatter plot to see which kind of relationship there is: linear, polynomial, or none.

The following code is for linear regression.

The stats module from the scipy package computes the regression; matplotlib draws the plot.

The method stats.linregress returns several values: slope, intercept, r, p, and std_err.

The returned r (the correlation coefficient, from -1 to 1) reveals how strong the relationship is; a value near 0 means there is no linear relationship.

mymodel = list(map(myfunc, x))

Data list x has many values.

The built-in map function takes two arguments: myfunc and the list x.

Each value in x is passed into myfunc, which returns the matching y-value on the regression line.

This is done for all x values, and list() collects the y-values on the line into mymodel.

You can predict a y-value like print(myfunc(10)).

          import matplotlib.pyplot as plt
          from scipy import stats

          x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
          y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

          slope, intercept, r, p, std_err = stats.linregress(x, y)

          def myfunc(x):
              return slope * x + intercept

          mymodel = list(map(myfunc, x))

          plt.scatter(x, y)
          plt.plot(x, mymodel)
          plt.show()
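To see what stats.linregress computes for slope, intercept and r, here is a hand-written least-squares version that uses only numpy. It is a swapped-in sketch, not scipy's implementation, and it works from the sums of deviations from the means (Sxx, Syy, Sxy):

```python
import numpy as np

def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept, plus Pearson's r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    sxx = np.sum(dx * dx)
    syy = np.sum(dy * dy)
    sxy = np.sum(dx * dy)
    slope = sxy / sxx
    intercept = y.mean() - slope * x.mean()
    r = sxy / np.sqrt(sxx * syy)
    return slope, intercept, r

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

slope, intercept, r = linear_fit(x, y)
print(round(r, 2))                       # -0.76: a negative relationship
print(round(slope * 10 + intercept, 2))  # 85.59: predicted y for x = 10
```

The values should match the slope, intercept and r that stats.linregress returns for the same lists.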

Pandas Correlations

          import pandas as pd
          import seaborn as sn
          import matplotlib.pyplot as plt

          df = pd.read_csv('data.csv')
          corrMatrix = df.corr()
          sn.heatmap(corrMatrix, annot=True)
          plt.show()

For a large table of data, the DataFrame method corr helps find the relationships between its columns.

The W3Schools Pandas tutorial has a topic on correlations.

Method corr creates the correlation matrix: one value from -1 to 1 for each pair of columns.

Seaborn's heatmap draws the matrix as a color-coded image that is easy to read; annot=True writes each value in its cell.

In this example, "Duration" and "Calories" have a 0.922721 correlation, which is a very good correlation.

The heatmap makes such strong correlations easy to spot.

Once you know two columns are correlated, you can plot a line for more detail, with one column on the x-axis and the other on the y-axis.
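This small check shows what corr returns, using hypothetical columns. Pearson correlation, the default method, is 1.0 for a perfectly increasing straight line and -1.0 for a perfectly decreasing one. Each column correlates 1.0 with itself on the diagonal:

```python
import pandas as pd

df = pd.DataFrame({
    "Duration": [10, 20, 30, 40, 50],
    "Calories": [60, 110, 160, 210, 260],   # exactly 5 * Duration + 10
    "Rest":     [50, 40, 30, 20, 10],       # exactly 60 - Duration
})

corr_matrix = df.corr()  # Pearson correlation by default
print(corr_matrix)
```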

polynomial regression

The W3Schools example registers 18 cars as they pass a certain tollbooth.

The x-axis represents the hour of the day and the y-axis represents the speed.

It is apparent that in the middle of the night, cars can go fast, but not in the middle of the day.

Module numpy provides the methods to compute the y-values on the curve: numpy.polyfit(x, y, 3) fits a degree-3 polynomial, numpy.poly1d turns its coefficients into a function, and numpy.linspace(1, 22, 100) makes 100 evenly spaced x-values from 1 to 22 for drawing a smooth curve.

          import numpy
          import matplotlib.pyplot as plt

          x = [1,2,3,5,6,7,8,9,10,12,13,14,15,16,18,19,21,22]
          y = [100,90,80,60,60,55,60,65,70,70,75,76,78,79,90,99,99,100]

          mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))
          myline = numpy.linspace(1, 22, 100)

          plt.scatter(x, y)
          plt.plot(myline, mymodel(myline))
          plt.show()
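A common score for how well the curve fits is R-squared, 1 - SS_res / SS_tot. It is close to 1 for a good fit and close to 0 when there is no relationship. The helper r_squared below is written by hand for illustration; sklearn.metrics.r2_score computes the same score:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

x = [1,2,3,5,6,7,8,9,10,12,13,14,15,16,18,19,21,22]
y = [100,90,80,60,60,55,60,65,70,70,75,76,78,79,90,99,99,100]

mymodel = np.poly1d(np.polyfit(x, y, 3))
score = r_squared(y, mymodel(np.asarray(x, dtype=float)))
print(round(score, 2))   # a value near 1 means the curve fits the points well

# Predict the speed of a car passing at 17:00.
print(mymodel(17))
```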

s10, Multiple Regression

W3Schools uses the CO2 emission of cars, predicted from engine size and car weight, as the example.

A dataframe is created from a list containing the two independent variables.

Assign it to a variable X, a capital letter by convention.

The one dependent variable, y, is the value to predict.

The specialist module sklearn (scikit-learn) provides this service.

The data should have a linear relationship between engine size and CO2 emission.

The same holds for car weight and CO2 emission.

The linear model is used for prediction.

          import pandas
          from sklearn import linear_model

          df = pandas.read_csv("cars.csv")

          X = df[['Weight', 'Volume']]
          y = df['CO2']

          regr = linear_model.LinearRegression()
          regr.fit(X, y)

          # predict the CO2 emission of a car where the weight is 2300kg, and the volume is 1300cm3:
          predictedCO2 = regr.predict([[2300, 1300]])

          print(predictedCO2)    # [107.208]

After fitting, object regr has an attribute coef_, with one coefficient per independent variable.

In this example, it contains [0.00755095 0.00780526].

If the weight increases by 1 kg, the CO2 emission increases by 0.00755095 g.

If the engine size (Volume) increases by 1 cm3, the CO2 emission increases by 0.00780526 g.

Another way to predict is to compute it manually.

In this model, the two variables contribute independently, so the prediction is the intercept (regr.intercept_) plus the sum of each coefficient times its variable.
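Here is a sketch of that manual computation. It replaces sklearn with numpy's least-squares solver, numpy.linalg.lstsq, which is a swapped-in technique. It also uses hypothetical, noise-free car data so the expected numbers are easy to check. The prediction is the intercept plus each coefficient times its variable:

```python
import numpy as np

# Hypothetical cars: CO2 = 80 + 0.01 * weight + 0.005 * volume (no noise).
weight = np.array([790.0, 1160.0, 929.0, 865.0, 1140.0, 1365.0])
volume = np.array([1000.0, 1200.0, 1000.0, 900.0, 1500.0, 1600.0])
co2 = 80 + 0.01 * weight + 0.005 * volume

# Design matrix: a column of ones (intercept) plus the two independent variables.
A = np.column_stack([np.ones_like(weight), weight, volume])
(intercept, coef_weight, coef_volume), *_ = np.linalg.lstsq(A, co2, rcond=None)

# Manual prediction for a 2300 kg car with a 1300 cm3 engine:
# each variable contributes independently, and the contributions are summed.
predicted = intercept + coef_weight * 2300 + coef_volume * 1300
print(coef_weight, coef_volume, predicted)  # about 0.01, 0.005, 109.5
```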

s11, Scale

comment: Choosing proper units can be good enough.

In the previous example, engine volume is in cm3 (cc), not liters.

So, for the first row in the data, 1000 cc and 790 kilograms are on similar scales.
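When choosing units is not enough, a common scaling technique is standardization, z = (x - mean) / std. It puts every column on the same scale, with mean 0 and standard deviation 1. Here is a minimal numpy sketch with hypothetical weights and volumes:

```python
import numpy as np

def standardize(values):
    """Return z-scores: (x - mean) / standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

weight = [790, 1160, 929, 865, 1140]      # kg (hypothetical)
volume = [1000, 1200, 1000, 900, 1500]    # cm3 (hypothetical)

z_weight = standardize(weight)
z_volume = standardize(volume)
print(z_weight.round(2))
print(z_volume.round(2))
```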

comment 2: CO2 emission is a good way to evaluate air pollution.

s12, Train/Test

W3Schools uses numpy's random.normal to create the data.

In the real world, you'll use real data, not simulated data.

The training set should be a random selection of 80% of the original data.

The testing set should be the remaining 20%.

In the example, two data sets are created, x for the x-axis and y for the y-axis.

The x values are generated randomly, so they are not in ascending order.

When plotting, they only look ordered because the scatter plot places them along the x-axis.

Because the rows are already in random order, you can simply partition the data into two: the first 80% for training and the last 20% for testing.
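Here is a sketch of an explicit random 80/20 split, which works even when the data is sorted. The data generation imitates the tutorial's simulated data (numpy normal distributions, 100 points), but uses numpy's Generator API and a fixed seed; both choices are assumptions:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
x = rng.normal(3, 1, 100)
y = rng.normal(150, 40, 100) / x

# Shuffle the row positions, then take 80% for training and 20% for testing.
order = rng.permutation(len(x))
train_idx, test_idx = order[:80], order[80:]

train_x, train_y = x[train_idx], y[train_idx]
test_x, test_y = x[test_idx], y[test_idx]
print(len(train_x), len(test_x))  # 80 20
```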

s13, Decision Tree

W3Schools' example
In the example, a person tries to decide whether he/she should go to a comedy show or not.

The tabular data includes two parts.

- some information about the comedian, like age, experience, rank, nationality.

- a record of whether he/she went or not (column Go).

Based on this data set, Python can create a decision tree that can be used to decide whether any new shows are worth attending.

The following code creates the tree.

          # ----- code --------------------------------------------------
          import pandas
          from sklearn import tree
          from sklearn.tree import DecisionTreeClassifier

          df = pandas.read_csv("shows.csv")

          # Some features must be converted to numbers for DecisionTreeClassifier:
          # map each text value to a number, then replace the column.
          d = {'UK': 0, 'USA': 1, 'N': 2}
          df['Nationality'] = df['Nationality'].map(d)
          d = {'YES': 1, 'NO': 0}
          df['Go'] = df['Go'].map(d)

          '''
          - create two dataframes: the feature columns that we try to predict from,
          - and the target decision column
          '''
          features = ['Age', 'Experience', 'Rank', 'Nationality']
          X = df[features]
          y = df['Go']

          '''
          - create object dtree
          - Object dtree is fitted to the data
          - Object dtree creates the decision logic.
          - note:
            * My code exports the logic as text and uses print to render it.
            * My code does not create an image file and render it.
            * The text is good enough to see the logic.
          '''
          dtree = DecisionTreeClassifier()
          dtree = dtree.fit(X, y)
          data = tree.export_graphviz(dtree, out_file=None, feature_names=features)
          print(data)

--------- print(data) output ----------------------------
          digraph Tree {
          node [shape=box] ;
          0 [label="Rank <= 6.5\ngini = 0.497\nsamples = 13\nvalue = [6, 7]"] ;
          1 [label="gini = 0.0\nsamples = 5\nvalue = [5, 0]"] ;
          0 -> 1 [..., labelangle=45, headlabel="True"] ;
          2 [label="Rank <= 8.5\ngini = 0.219\nsamples = 8\nvalue = [1, 7]"] ;
          0 -> 2 [..., labelangle=-45, headlabel="False"] ;
          3 ......
          ...

Result Explained

level 1, the root dtree box
From the original data, there are 13 shows (samples = 13).

Of these, 6 are NO and 7 are GO (value = [6, 7]).

In this box, the generated logic checks whether column Rank is low enough: Rank <= 6.5.

If true, go to level 2, left.

If false, go to level 2, right.

level 2, left
From the original data, 5 shows have Rank <= 6.5.

In column Go, all of them are NO (value = [5, 0], gini = 0.0).

Therefore, this is a dtree leaf node.

level 2, right
From the original data, 8 shows have Rank > 6.5.

In column Go, 1 is NO and 7 are GO (value = [1, 7]).

Therefore, this is a dtree decision node.

Here, column Nationality is selected by the generated logic in the tree this explanation follows; the output above shows Rank <= 8.5 at this node instead, because without a fixed random_state the tree can differ from run to run.

People from different nationalities may have different views of what is fun.

Of the remaining samples, there is 1 NO and 7 GO.

level 3, level 4
level 3 is for column Age.

level 4 is for column Experience.

notes
At any level, a node whose samples are all NO or all GO is a leaf; otherwise a decision node is needed to split them further.

This demo clearly illustrates how a dtree works.
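You can check the gini values in the printed tree by hand. Gini impurity is 1 minus the sum of the squared class proportions, so it is 0.0 for a pure node and largest when the classes are evenly mixed. Here is a small sketch that uses the [NO, GO] counts from the printed output:

```python
def gini(counts):
    """Gini impurity of a node, given the number of samples in each class."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Node values from the printed tree: [NO, GO] counts.
print(gini([6, 7]))  # root node: about 0.497
print(gini([5, 0]))  # level 2, left (leaf): 0.0
print(gini([1, 7]))  # level 2, right: 0.21875, shown as 0.219 in the tree
```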

Python

March, 2021

Contents

01. My Impression with Python, and My Lab Setup