Python

March, 2021

Contents

01. My Impression with Python, and My Lab Setup

02. Virtual Environment for Python

                $pwd                                  # /Users/peterkao/desktop/code_p2/python
                In the folder, there is a file 32_pip_1.py:
                   
                   import camelcase                   # module name
                   c = camelcase.CamelCase()          # class constructor
                   txt = "hello world"
                   print(c.hump(txt)) 
                   
                $python 32_pip_1.py                   # Hello World    
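The notes above do not show the environment itself. In a terminal, a virtual environment is usually created with `python -m venv myenv` and activated on macOS with `source myenv/bin/activate`; the standard-library venv module can do the same from Python. A minimal sketch, where the folder name myenv is hypothetical and pip is left out to keep it fast and offline:

```python
import os
import tempfile
import venv


def make_env(env_dir):
    """Create a bare virtual environment in env_dir and return the path of its config file."""
    # with_pip=False keeps this sketch fast and offline; a real project would use with_pip=True
    venv.create(env_dir, with_pip=False)
    return os.path.join(env_dir, "pyvenv.cfg")


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = make_env(os.path.join(tmp, "myenv"))   # "myenv" is a hypothetical name
    created = os.path.exists(cfg_path)
    print("pyvenv.cfg created:", created)              # pyvenv.cfg created: True
```

After activating a real environment, `pip install camelcase` installs the package into that environment only, so other projects are not affected.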
            

03. My Python Language Study

        1. Python code is run by an interpreter (CPython is the standard one).
            - To begin with, it parses the whole file and compiles it to bytecode
              (so a syntax error anywhere stops the program before anything runs).
            - Then it executes the bytecode of the first statement,
            - then the second statement, and so on.

            The following example has three statements as below:
                
                a = 12
                s = "hello"
                print("hello") 
                  
                There are three statements. All start at column 1 (no indentation).
                They are processed by the interpreter one by one in sequence. 
        
            The following example has an if statement as below:
                
                if 200 > 30:
                 print("under if")
                 print("greater")
                else:
                     print("under else")
                     print("less")
                print("This is another statement.")
                  
                
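                Indentation marks which statements belong to a block. Any size works
                (here 1 space under if, 5 under else) as long as it is consistent within
                a block; PEP 8 recommends 4 spaces. The last print is not indented, so it
                is outside the if-else and always runs.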
         
        2. Within a code block, like if-else, for loop, while loop, or try-except,
            some statements change the normal execution sequence, like break and continue; pass does nothing.
               break statement: It is used in a for loop or a while loop.
                  
                  When a break statement is encountered while the loop is running,
                  the loop ends immediately and its remaining iterations are skipped.
                   

                continue statement: It is also used in for and while loops.
                   When a continue statement is encountered,
                   the rest of the current iteration is skipped and the next iteration starts.

                pass statement: 
                   - does nothing
                   - a placeholder for code to be written later
                   - avoids a syntax error where a statement is required (for example, an empty function body)

                try-except (called try-catch in other languages):
                   - Example: an error occurs while reading a file.
                   - The except block handles the situation, for example by printing the reason.
                   - The program does not crash.
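A minimal sketch of the four statements above. The function names and the file name no_such_file.txt are hypothetical, chosen only for the demo.

```python
def first_negative(numbers):
    """Return the first negative number, or None (demonstrates break)."""
    found = None
    for n in numbers:
        if n < 0:
            found = n
            break                   # leave the loop immediately
    return found


def sum_of_odds(numbers):
    """Add up only the odd numbers (demonstrates continue)."""
    total = 0
    for n in numbers:
        if n % 2 == 0:
            continue                # skip the rest of this iteration
        total += n
    return total


def not_done_yet():
    pass                            # placeholder body; the function does nothing yet


def read_text(path):
    """Return the file's text, or "" if it cannot be read (demonstrates try-except)."""
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError as err:
        print("could not read file:", err)   # handle the error instead of crashing
        return ""


print(first_negative([3, 7, -2, 5]))    # -2
print(sum_of_odds([1, 2, 3, 4, 5]))     # 9
print(not_done_yet())                   # None
print(read_text("no_such_file.txt"))    # prints the error message, then an empty line
```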

        3. variables, types, classes/objects, collections
            3.1  variables
                 # The following code demonstrates variables, types,
                 #       and their dynamic features.
                 # comment: Unlike C, Java..., you do not declare a type for a variable.
                 x = 5
                 print("x = {}, its type = {}".format(x, type(x)))     # x = 5, its type = <class 'int'>
                 x = "Peter"
                 print("x = {}, its type = {}".format(x, type(x)))     # x = Peter, its type = <class 'str'>
                 x = 5.0
                 print("x = {}, its type = {}".format(x, type(x)))     # x = 5.0, its type = <class 'float'>
                 x = True
                 print("x = {}, its type = {}".format(x, type(x)))     # x = True, its type = <class 'bool'>

            3.2  classes and Objects
                3.2.1  Simple class example
                        
                            class MyClass:
                                x = 5
                            p1 = MyClass()
                            print(p1.x)         # 5
                            p1.y = "hello"      # the property is created dynamically
                            print(p1.y)         # hello
                        
                        notes:
                            This minimal form is for demo only; OO code usually defines __init__ and methods, and may use inheritance.
                3.2.2   Practical example 
                        
                            class Person:
                                def __init__(self, name, age):    # like a constructor in other OO languages
                                    self.name = name              # define a property
                                    self.age = age                # self is like "this" in other OO languages

                                def myfunc(self):                 # define a method 
                                    print("Hello my name is " + self.name)

                            print("type of Person = {}".format(type(Person)))    # <class 'type'>, a class definition
                            p1 = Person("John", 37)    

                            print("type of p1 = {}".format(type(p1)))    # <class '__main__.Person'>, an object
                            x = "John"
                            print("type of x = {}".format(type(x)))      # <class 'str'>, a built-in type   
                            print()
                         
                        
                3.2.3   Using classes/objects
                        
                            * In the OO world,
                                 - create a class as a template
                                 - instantiate many objects based on the class
                                 - a class can inherit from a parent class
                            * Two examples in Django, a Python web framework package
                                 - example: database model
                                    
                                    from django.db import models
                                    class Post(models.Model):
                                        text = models.TextField()
                                    
                                    - django: package
                                    - db: subpackage of django
                                    - models: module django.db.models
                                    - models.Model: parent class
                                    - Post: child class
                                    - text: field name
                                 - example: Django web server views 
                                    
                                    from django.views.generic import ListView, DetailView 
                                    from .models import Post
                                    class BlogListView(ListView):
                                        ...
                                    class BlogDetailView(DetailView): 
                                        ...
                                    
                                    - BlogListView and BlogDetailView extend ListView and DetailView,
                                      generic class-based views from django.views.generic.
                                    - The parent classes take care of heavy-duty web-server tasks.
                                  
                                 - notes: Django models declare fields as class attributes instead of
                                   writing an __init__ method; the parent class models.Model provides __init__.
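Inheritance does not need Django. A small sketch in plain Python, with hypothetical class names Animal and Dog: the child class reuses the parent's __init__ and overrides one method, the same way Post extends models.Model above.

```python
class Animal:
    def __init__(self, name):
        self.name = name                       # property set by the parent's constructor

    def speak(self):
        return self.name + " makes a sound"


class Dog(Animal):                             # Dog inherits from Animal, including __init__
    def speak(self):                           # override the parent's method
        return self.name + " says woof"


a = Animal("generic animal")
d = Dog("Rex")
print(a.speak())                               # generic animal makes a sound
print(d.speak())                               # Rex says woof
print(isinstance(d, Animal))                   # True: a Dog is also an Animal
```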
                        

                3.2.4   collections
                    - list
                      Code snippets are as below: 
                    
                        mylist = ["apple", "banana", "cherry"]     # create a list
                        for x in mylist:
                            print(x)
                        
                        import numpy as np                 # numpy package is needed
                        arr = np.array(mylist)             # create a numpy ndarray from a list          
                        for x in arr:
                            print(x)
                        
                        print("mylist[1] = {}".format(mylist[1]))    # mylist[1] = banana
                        
                        print("arr[1] = {}".format(arr[1]))          # arr[1] = banana
                        # note: For numerical work, a NumPy array can be up to 50x faster than a list,
                        #       because its data is stored in contiguous memory locations.
                    
                    
                        notes:
                          - A list uses [] to contain its items.
                          - The above demo creates a list.
                          - It also demonstrates creating a NumPy array from a list.
                    
                    
                    - tuple
                        In the W3C Python tutorial, Python MySQL, MySQL Select,
                        the demo gets all rows from a database table.
                        The first row looks like this:
                              (1, 'John', 'Highway 21') 
                          
                        notes:
                           - That is a tuple.
                           - It uses () to contain its items.
                           - Its items are ordered and cannot be changed (a tuple is immutable).
                           - Here it holds one customer's record: id, name, and address.
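A short sketch of working with such a row: unpacking it into variables, indexing it, and seeing that it cannot be changed. The replacement name 'Peter' is hypothetical.

```python
row = (1, 'John', 'Highway 21')          # one row: (id, name, address)

customer_id, name, address = row         # tuple unpacking: one variable per item
print(name)                              # John
print(row[2])                            # Highway 21 (indexing works like a list)

changed = True
try:
    row[1] = 'Peter'                     # tuples are immutable, so this raises TypeError
except TypeError as err:
    changed = False
    print("cannot change a tuple:", err)
print(changed)                           # False
```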
                        

                    - dictionary
                        
                            thisdict = {
                                "brand": "Ford",
                                "model": "Mustang",
                                "year": 1964
                              }
                        
                        - The above code creates a Python dictionary object.
                        - It holds key-value pairs.
                        - It uses {} to contain its items.
                        - A Python dictionary
                            - can be nested (a value can be another dictionary), expanding vertically. 
                            - can have a list as a value, expanding horizontally.
                        - A Python dictionary can come from a NoSQL database like MongoDB.
                           - MongoDB stores JSON-like documents.
                           - JSON is a lightweight data-interchange format.
                               - It is used in JavaScript and Python.
                               - JSON is text only.
                           - When Python is used with MongoDB (through pymongo),
                               - to add a document, insert a Python dictionary
                               - to retrieve, a Python dictionary object is returned
                               - there is no need to stringify or parse text.
                        - note: The same literal syntax in JavaScript creates a JavaScript object.
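When the data travels as JSON text instead (for example in a file or a web API), the standard-library json module converts between text and dictionaries. A minimal sketch; the nested car record is hypothetical.

```python
import json

thisdict = {"brand": "Ford", "model": "Mustang", "year": 1964}

text = json.dumps(thisdict)      # dictionary -> JSON text (a plain str)
print(text)                      # {"brand": "Ford", "model": "Mustang", "year": 1964}

back = json.loads(text)          # JSON text -> dictionary
print(back == thisdict)          # True
print(back["model"])             # Mustang

# nested dictionaries and list values survive the round trip too (hypothetical data)
car = {"brand": "Ford", "colors": ["red", "white"], "owner": {"name": "Peter"}}
same = json.loads(json.dumps(car)) == car
print(same)                      # True
```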
              
                    3.2.5  Iterable
                        -  Lists, tuples, dictionaries, and sets are all iterable objects.
                        -  They are iterable containers which you can get an iterator from.
                        -  An iterator is an object that can be iterated upon,
                                          meaning that you can traverse through all the values.
                        -  Technically, in Python, an iterator is an object 
                                   which implements the iterator protocol, 
                                   which consists of the methods __iter__() and __next__().
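A minimal sketch of the iterator protocol: a hypothetical class CountUpTo implements __iter__() and __next__(), so a for loop can use it, and iter()/next() do the same on a list.

```python
class CountUpTo:
    """An iterator that produces 1, 2, ..., limit, one value per __next__() call."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self                    # an iterator is its own iterable

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration        # tells the for loop there are no more values
        self.current += 1
        return self.current


for n in CountUpTo(3):
    print(n)                           # 1, then 2, then 3

it = iter(["apple", "banana"])         # get an iterator from a list
print(next(it))                        # apple
print(next(it))                        # banana
```

Because __iter__() returns the object itself, one CountUpTo object can be looped over only once; create a new one to start again.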
                           

        4. functions, lambda
            Functions are used to split code into parts and to reuse (share) code.

            4.1 a simple function
            
                def my_function(x):
                    return 5 * x

                print(my_function(3))           # 15
            
            4.2 a lambda function
            
                x = lambda a : a + 10            
                print(x(5))                     # 15
            
                descriptions:
                4.2.1. syntax:     lambda arguments : expression
                4.2.2. A lambda function is a small anonymous function.
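                        - lambda is a keyword
                        - a is an argument
                        - the expression is the function body; its value is returned.
                            
                            - Statements such as if-else blocks and loops are not allowed, only one expression;
                              a conditional expression like  a if a > 0 else 0  is allowed.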
                            
                4.2.3. The power of lambda is better shown when you use them 
                           as an anonymous function inside another function.
                   
                    In the W3C Python tutorial, Python Reference, Built-in Functions, map() function:
                    
                    def myfunc(a):
                        return len(a)
                    
                    x = map(myfunc, ('apple', 'banana', 'cherry'))
                    
                    print(x)            # map object
                    
                    # convert the map into a list, for readability:
                    print(list(x))       # [5, 6, 6]
                    
                          - map() takes two arguments here.
                          - The second is an iterable object; here it is a tuple.
                          - The first is a function that processes each item.
                          - A map object (an iterator) is returned.
                          - Finally, the result is converted to a list.
                    The same with a lambda function instead of myfunc:
                    
                    y = map(lambda x: len(x), ['apple', 'banana', 'cherry'])
                    print(y)            # map object
                    print(list(y))      # [5, 6, 6]
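The idea in 4.2.3 is usually illustrated with a function that returns a lambda, in the style of the sketch below. The lambda keeps the value of n from the call that created it (a closure), so one definition produces a doubler and a tripler.

```python
def myfunc(n):
    return lambda a: a * n     # the lambda remembers n from the enclosing call


mydoubler = myfunc(2)          # a function that doubles its argument
mytripler = myfunc(3)          # a function that triples its argument

print(mydoubler(11))           # 22
print(mytripler(11))           # 33
```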
                    

            4.3  Python Function Recursion
                
                # 4.3.1  A dictionary with a nested structure.
                #        note: No line-continuation character is needed inside a collection literal.
                nested_dict = {                # root dictionary
                        "k1": {
                            "k12": {"k121": 121,  "k122": 122},
                            "k13": {"k131": 131}
                            },
                        "k2": {"k21": {"k211": 211}}
                    }
                
                # 4.3.2  A function can call itself.
                def recursive_function(d):
                    for k, v in d.items():          # items() yields key-value pairs; left -> right, horizontally
                        if isinstance(v, dict):     # if v is a dictionary,
                            recursive_function(v)   # call itself with v as the parameter; top -> down, vertically
                        else:                       # if not,
                            print(k, ":", v)        # do the same work for every leaf item
                
                # 4.3.3  To begin with, call it with the root dictionary.
                recursive_function(nested_dict)
                
                The output is as below:
                     k121 : 121
                     k122 : 122
                     k131 : 131
                     k211 : 211  
                     
                explanations:
                - The input is a nested Python dictionary.
                - Recursion can be used for different scenarios.
                - The above code visits all the dictionary items by calling the function recursively.
                - To begin with, call the function with the root dictionary as the parameter.
                  Then, a for loop iterates over the items horizontally.
                    * If the value of an item is a dictionary,
                        - the function calls itself with that value as the parameter.
                        - Before the call, Python pushes the current call's state (its local
                          variables and where to resume) onto the call stack.
                    * If the value of an item is not a dictionary,
                        - it is processed. The task here is to print its key and value.
                        - The same task is done for ALL such items.
                    * When a call finishes its for loop, its state is popped off the call stack,
                      and the caller resumes where it left off.
                - The recursion ends when the first call (for the root dictionary) returns,
                  so no recursive calls remain on the stack.
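The stack described above can also be managed by hand. A sketch of the same traversal without recursion, using a Python list as an explicit stack (the function name leaves_with_stack is hypothetical). It visits the leaves in the same order as recursive_function.

```python
nested_dict = {
    "k1": {
        "k12": {"k121": 121, "k122": 122},
        "k13": {"k131": 131},
    },
    "k2": {"k21": {"k211": 211}},
}


def leaves_with_stack(d):
    """Return the (key, value) leaves of a nested dictionary, using an explicit stack."""
    result = []
    stack = [iter(d.items())]              # one entry per dictionary we are inside
    while stack:                           # done when the stack is empty
        try:
            k, v = next(stack[-1])         # next item of the innermost dictionary
        except StopIteration:
            stack.pop()                    # this dictionary is finished; resume its parent
            continue
        if isinstance(v, dict):
            stack.append(iter(v.items()))  # "call" into the nested dictionary
        else:
            result.append((k, v))          # leaf item: do the work
    return result


for k, v in leaves_with_stack(nested_dict):
    print(k, ":", v)                       # same four lines as recursive_function
```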
                  
                

            4.4  Python function, Arbitrary Arguments (*args)
                
                def my_function(*kids):        # arbitrary arguments: put * before the parameter name
                    youngest = kids[len(kids) - 1]      # the last argument
                    print("The youngest child is " + youngest)
                  
                my_function("Wiwi", "Tairo", "Emi")  # arguments are passed in when calling the function
                
                output: The youngest child is Emi
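A variant showing what *kids actually receives: all the arguments arrive as one tuple, so kids[-1] is a shorter way to get the last one. This version returns the message instead of printing it, and also shows * unpacking a list into separate arguments.

```python
def my_function(*kids):
    print(type(kids))                              # <class 'tuple'>: all arguments arrive as one tuple
    return "The youngest child is " + kids[-1]     # kids[-1] is the last item, like kids[len(kids) - 1]


print(my_function("Wiwi", "Tairo", "Emi"))         # The youngest child is Emi

names = ["Wiwi", "Tairo", "Emi"]
print(my_function(*names))                         # * unpacks the list into separate arguments
```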
                    




        5. modules
           
           #  mymodule.py
            def greeting(name):             # function definition
                print("hello, " + name)

            class MyClass:                  # class definition
                y = 11

            #  test_module.py
            import mymodule    
            mymodule.greeting("Peter")    #using the function, hello, Peter

            p1 = mymodule.MyClass()       # create an object
            num = p1.y                    # get its property 
            print("num is {}".format(num))  # print the value: num is 11
            
            
            Module name is the file name without .py file extension
            
            note 1: Both files are in the same folder.
            note 2: The purpose is to split code into parts and to reuse (share) it.

        6. packages
             The following lab demonstrates Python packages.
             6.1  On a Mac, in a terminal window, with Python 3.7; the package is camelcase.
             6.2  Using the package camelcase
                   create a file, test_p1.py, as below:
                       import camelcase              # module
                       c = camelcase.CamelCase()     # create an object from the class   
                       txt = "hello world"  
                       print(c.hump(txt))            # Hello World
             6.3  Remove a package
                    in the terminal window, enter
                          $pip uninstall camelcase
             6.4  Download and install the package camelcase
                    in the terminal window, enter
                          $pip install camelcase
                    Verify with $python test_p1.py
             6.5  $pip list             shows package camelcase in the list
             6.6  $pip show camelcase   shows its location as below:
                      /Users/peterkao/anaconda3/lib/python3.7/site-packages/camelcase
                  In a Mac Finder window, click menu Go, Go to Folder, site-packages;
                  under it, there is a folder, camelcase;
                  under folder camelcase, there is a file main.py;
                  inside main.py, there is a class, CamelCase.
             6.7  
                  - A package is used for code distribution.
                  - A module is used inside your Python app.
                  - Here the two use the SAME name, all lower case (camelcase). This is common
                    but not guaranteed: for example, the scikit-learn package is imported as sklearn.
                  
            

04. Pandas

    # The following demo shows the structure of a DataFrame.
    
    import pandas as pd
    mydataset = {
      'cars': ["Honda", "Toyota", "Ford"],
      'ages': [6, 14, 3],
      'color': ["blue", "grey", "black"]
    }
    myvar = pd.DataFrame(mydataset)
    print(myvar)
    
            
                 cars  ages  color
            0   Honda     6   blue
            1  Toyota    14   grey
            2    Ford     3  black
            

04.02 Named Indexes, Merge dataframes


    import pandas as pd
    data1 = {
        "calories": [420, 380, 390],
        "duration": [50, 40, 45]
    } 
    df1 = pd.DataFrame(data1, index = [0, 1, 2])

    data2 = {
        "calories": [450, 580, 600],
        "duration": [55, 65, 75]
    } 
    df2 = pd.DataFrame(data2, index = [3, 4, 5])

    frames = pd.concat([df1, df2])  # list
    print(frames)        # 6 rows
            

04.03 Filtering or Subset


                # continue from the previous example
                # demo 04.03.1           filtering by conditions
                frames = ...
                df_long_duration = frames.loc[frames.duration >= 65]
                print(df_long_duration)           # two rows (duration 65 and 75)
                # just like a WHERE clause in SQL

                # demo 04.03.2           select rows
                df24 = frames.loc[2:4]
                print(df24)             # rows with index 2, 3, 4 (.loc includes the end label, unlike Python slicing)
            
                # demo 04.03.3           select columns
                df4cols = frames.loc[:, ["duration"]]
                print(df4cols)          # ALL rows, only column duration
            
                # demo 04.03.4           select rows and columns
                dfrc = frames.loc[0:1, ["duration"]]
                print(dfrc)              # rows with index 0 and 1, only column duration
            

04.04 Series


                import pandas as pd
                data = {                            # dictionary, its item value is a list
                    "calories": [420, 380, 390, 500, 600],
                    "duration": [52, 40, 46, 60, 70]
                }
                
                df = pd.DataFrame(data, columns = ["duration"])  # data frame with  one column
                s = df.squeeze()       # df => series    
                print(s)               # 52, 40, 46, 60, 70
                print(type(s))         # Series
                
                mean = s.mean()
                print(mean)               # 53.6   sum / 5
                
                median = s.median()
                print(median)             # 52.0   middle
                
                std = s.std()
                print(std)                # 11.78  sample standard deviation (pandas divides by n - 1)
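The three numbers can be checked with the standard-library statistics module. The sketch also shows the two kinds of standard deviation: pandas' Series.std() divides by n - 1 (sample), while numpy.std(), used later in the machine-learning notes, divides by n (population) by default.

```python
import statistics

durations = [52, 40, 46, 60, 70]              # the "duration" column above

print(statistics.mean(durations))             # 53.6
print(statistics.median(durations))           # 52
print(round(statistics.stdev(durations), 2))  # 11.78: sample std, divides by n - 1 (pandas default)
print(round(statistics.pstdev(durations), 2)) # 10.54: population std, divides by n (numpy default)
```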
            

04.05    IO

04.06    DataFrame for Plotting

--- DEMO 1 ---
cov19-data plotting

        import pandas as pd
        import matplotlib.pyplot as plt

        data = {
            'new_cases':    [200, 180, 175, 160, 120],
            'hospitalized': [100,  80,  70,  50,  40],
            'deaths':       [35,   30,  33,  25,  20],
            'positive%':    [15,   14,  16,  12,   9]
        }
        
        df = pd.DataFrame(data, index=[7,8,9,10,11])
        
        myfig = df.plot.line()
        myfig.set_xlabel("date of a month")
        myfig.set_ylabel("counts or percentage")

        plt.show()
            

--- DEMO 2 ---
            data in the database
                new_cases   hospitalized    deaths     testpositive
                200         100             35         15
                ....
                15          14              16         9
            
        python code

            import sqlite3
            import pandas as pd
            import matplotlib.pyplot as plt

            conn = sqlite3.connect('test315.db')     # database in folder peter
            df = pd.read_sql_query("select * FROM mytable", conn)     # DataFrame
            df2 = pd.DataFrame(df.values, index=[7,8,9,10,11],
                               columns=['new cases', 'hospitalized', 'deaths', 'test positive %'])
            myfig = df2.plot.line(title='COV-19 update, town xyz, March 2021')
            myfig.set_xlabel("date of a month")
            myfig.set_ylabel("counts")
            plt.show()
            conn.close()

        Another way to do IO (mydb is a database connection, as in W3C's MySQL examples):
              mycursor = mydb.cursor()  # a cursor executes queries and fetches their result rows
              mycursor.execute("SELECT * FROM customers")
              myresult = mycursor.fetchall()
        With this approach, fetchall() returns a list of tuples, one tuple per row.
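A self-contained sketch of both IO styles, using Python's built-in sqlite3 with an in-memory database. This is an assumption for the demo: DEMO 2 uses the file test315.db and the cursor snippet uses a MySQL connection. The rows reuse the numbers from DEMO 1; the column name positive is hypothetical.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")             # in-memory database, nothing is written to disk
conn.execute(
    "CREATE TABLE mytable (new_cases INTEGER, hospitalized INTEGER, deaths INTEGER, positive INTEGER)"
)
rows = [(200, 100, 35, 15), (180, 80, 30, 14), (175, 70, 33, 16), (160, 50, 25, 12), (120, 40, 20, 9)]
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)

# style 1: a cursor and fetchall() give a list of tuples, one per row
cur = conn.cursor()
cur.execute("SELECT * FROM mytable")
fetched = cur.fetchall()
print(fetched[0])                              # (200, 100, 35, 15)

# style 2: pandas reads the same query straight into a DataFrame
df = pd.read_sql_query("SELECT * FROM mytable", conn)
print(df.shape)                                # (5, 4)

conn.close()
```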
            
---     DEMO 3,    Scatter Plot ---
          import pandas as pd
          import matplotlib.pyplot as plt
          df = pd.read_csv('data.csv')        #comment: It can be from a database.
          df.plot(kind = 'scatter', x = 'Duration', y = 'Calories')
          plt.show()
            

---     DEMO 4,    Line Plot ---
line plotting
---     DEMO 5,    histogram ---
hist plotting
                  
              import pandas as pd                                         
              import matplotlib.pyplot as plt       
              df = pd.read_csv('data.csv')
              df["Duration"].hist(bins=20)          
              plt.show()
            
---     DEMO 6,    bar plot ---
bar plotting
            import pandas as pd
            import matplotlib.pyplot as plt

            speed = [18.5, 41, 49]                 #avg values
            lifespan = [9, 71, 1.6]                #avg values
            index = ['pig', 'elephant','rabbit']
            df = pd.DataFrame({'speed': speed, 'lifespan': lifespan},
                               index=index)
                               
            df.plot.bar(rot=10)

            plt.show()
            

comment: What you give, what you see.

---     DEMO 7,    pie plot ---
pie plotting
         import pandas as pd
         import matplotlib.pyplot as plt

         df = pd.DataFrame(      
                  {
                      'town A':     [ 35001,   12000],      # for one subplot
                      'town B':     [ 29010,   55003],      # for one subplot
                      'town C':     [ 32597,   41055]       # for one subplot
                  },
                  index = ['YES', 'NO']        # pie slices
         )
         df.plot.pie(subplots=True,
             figsize=(10,2),
             title = "Vote Count Pie Plot for three towns")

         plt.show()
  
            
---     DEMO 8,    DataFrame, Mongodb, Plotting ---
------------ Prepare data in Mongodb ---------------
        import pymongo
        myclient = pymongo.MongoClient("mongodb://localhost:27017/")
        mydb = myclient["mydatabase"]
        mycol = mydb["mycollection"]    #collection, like table in sql

        # drop the collection if it exists
        mycol.drop()

        r = {"duration": 60, "calories": 409.1}     #dictionary
        d = mycol.insert_one(r)         #document, like row in sql
        r = {"duration": 45, "calories": 282.4}    
        d = mycol.insert_one(r)
        r = {"duration": 30, "calories": 195.1}    
        d = mycol.insert_one(r)    

        # test code
        for doc in mycol.find():
            print(doc)
        # See 3 dictionaries, each has three key-value pairs.
            
------- Using a DataFrame to import from a NoSQL database, and plotting ------
  
        import pymongo

        myclient = pymongo.MongoClient("mongodb://localhost:27017/")
        mydb = myclient["mydatabase"]
        mycol = mydb["mycollection"]

        import pandas as pd    
        df = pd.DataFrame(list(mycol.find()))   # find() returns a cursor; list() turns it into a list of dictionaries
        # verify
        print(df)

        import matplotlib.pyplot as plt
        df.plot.line(x = "duration", y = "calories")
        plt.show()    
            
                  verification result
                                _id  calories  duration
                            0  ....  409.1        60
                            1  ....  282.4        45
                            2  ....  195.1        30  
                            
                             
                  plotting result
                  (figure: line plot of calories vs. duration)
            

04.07    Pandas Documents...



05. Machine Learning

from w3c tutorial, python, machine learning
    s2,s3,s4, Mean, Median, Mode, std, percentile
  • In the following code, the NumPy package provides functions to compute statistics from a data source.
        import numpy
        speed = [32,111,138,28,59,77,97]
        x = numpy.std(speed)      # population standard deviation (ddof=0), about 37.85
        print(x)
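The other statistics from s2 to s4 can be sketched the same way. The list below is the y data from the linear-regression example further down. NumPy has mean, median, and percentile but no mode function, so this sketch swaps in the standard-library statistics.mode.

```python
import statistics

import numpy

speed = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

mean = numpy.mean(speed)                 # about 89.77
median = numpy.median(speed)             # 87.0, the middle value after sorting
mode = statistics.mode(speed)            # 86, the most common value
p75 = numpy.percentile(speed, 75)        # 94.0: roughly 75% of the values are at or below it
std_pop = numpy.std(speed)               # population std (ddof=0, numpy's default)
std_sample = numpy.std(speed, ddof=1)    # sample std (ddof=1, pandas' default)

print(round(mean, 2), median, mode, p75)  # 89.77 87.0 86 94.0
print(std_pop < std_sample)               # True: dividing by n - 1 gives a larger value
```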
            
s5, s6, Data Distribution, Normal Data Distribution
        import numpy
        x = numpy.random.uniform(0.0, 5.0, 250)
        print(x)      
            
  • The above code generates 250 random floats between 0.0 and 5.0, uniformly distributed.
  • When the result is presented in a histogram with five bins, the bars have roughly equal heights (a flat shape).
  • NumPy provides methods to generate data.
  • Another uniform example: when a die is thrown, the numbers 1, 2, 3, 4, 5, 6 have an equal probability of 1 in 6.
          import numpy 
          import matplotlib.pyplot as plt
          x = numpy.random.normal(5.0, 1.0, 100000)
          plt.hist(x, 100)
          plt.show()   
            
  • The above numpy.random.normal call generates 100000 numbers
    with mean 5.0 and standard deviation 1.0.
  • The values are concentrated around 5.0.
  • For example, when a factory makes a product, the width is one of its important specifications;
    the measured widths typically follow an approximately normal distribution.
  • Other examples are babies' weights and students' grades.
  • comment: The above two distributions are just two typical, common shapes.
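Both shapes can be checked without a plot by counting. This sketch uses NumPy's newer random Generator (numpy.random.default_rng, seeded so the run repeats) in place of the numpy.random.uniform and numpy.random.normal calls above; numpy.histogram gives the bar heights.

```python
import numpy

rng = numpy.random.default_rng(0)              # seeded generator, so the run repeats

uniform = rng.uniform(0.0, 5.0, 250)
counts, edges = numpy.histogram(uniform, bins=5)
print(counts)                                  # five counts, each roughly 250 / 5 = 50: a flat histogram

normal = rng.normal(5.0, 1.0, 100000)
print(round(normal.mean(), 2), round(normal.std(), 2))   # close to 5.0 and 1.0

within_one_std = numpy.mean(numpy.abs(normal - 5.0) < 1.0)
print(within_one_std)                          # about 0.68 for a normal distribution
```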

s7, s8, s9, Scatter Plot, Linear and Polynomial Regression

  • The term regression is used when you try to find the relationship between variables.
  • First, draw a scatter plot to see which kind of regression fits: linear, polynomial, or none.

  • The following code is for linear regression.
  • The scipy package (module stats) computes the regression; matplotlib draws the plot.
  • The method stats.linregress returns several values: slope, intercept, r, p, and std_err.
  • The returned r (the correlation coefficient, from -1 to 1) shows how strong the linear relationship is.
  •   mymodel = list(map(myfunc, x))
    • Data list x has many values.
    • map() has two arguments: myfunc and the list x.
    • Each value in x is passed into myfunc, which returns the y-value on the line.
    • This is done for all x values, giving a list of y-values on the line, mymodel.
  • You can predict a y-value, like print(myfunc(10)).
        import matplotlib.pyplot as plt
        from scipy import stats

        x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
        y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

        slope, intercept, r, p, std_err = stats.linregress(x, y)

        def myfunc(x):
          return slope * x + intercept

        mymodel = list(map(myfunc, x))

        plt.scatter(x, y)
        plt.plot(x, mymodel)
        plt.show()          
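What linregress computes can be sketched by hand with the least-squares formulas, using NumPy only and cross-checked against numpy.polyfit. This is the textbook formula, not SciPy's internal code.

```python
import numpy

# same data as the scipy example above
x = numpy.array([5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6], dtype=float)
y = numpy.array([99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86], dtype=float)

# least-squares line: slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
x_mean, y_mean = x.mean(), y.mean()
slope = numpy.sum((x - x_mean) * (y - y_mean)) / numpy.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean          # the line passes through (mean_x, mean_y)
r = numpy.corrcoef(x, y)[0, 1]               # Pearson correlation coefficient

# cross-check with numpy's degree-1 polynomial fit (returns [slope, intercept])
check_slope, check_intercept = numpy.polyfit(x, y, 1)

print(round(slope, 2), round(r, 2))          # -1.75 -0.76
print(round(slope * 10 + intercept, 2))      # 85.59, the same prediction as myfunc(10)
```

A negative r means y tends to go down as x goes up.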
            

Pandas Correlations
        import pandas as pd
        import seaborn as sn
        import matplotlib.pyplot as plt

        df = pd.read_csv('data.csv')

        corrMatrix = df.corr() 

        sn.heatmap(corrMatrix, annot=True)
        plt.show()   
            
(figure: correlation heatmap)
  • For a large table of data, the DataFrame method corr() helps find the relationships between columns.
  • The W3C tutorial covers this in its Pandas section.
  • Method corr() creates the correlation matrix.
  • Seaborn's heatmap draws the matrix as a color-coded image that is easy to read.
  • In this example, "Duration" and "Calories" have a 0.922721 correlation, which is a very good correlation.
  • The heatmap also makes such strong correlations stand out visually.
  • Once you know two columns are correlated, you can plot them for detail: one column on the x-axis, one on the y-axis.

polynomial regression

  • W3C's example records 18 cars as they pass a certain tollbooth.
  • The x-axis represents the hour of the day and the y-axis represents the speed.
  • The data suggest that cars drive faster in the middle of the night than in the middle of the day.
  • numpy.polyfit(x, y, 3) fits a degree-3 polynomial, and numpy.poly1d turns the coefficients
    into a function that computes the y-value on the curve.
            import numpy
            import matplotlib.pyplot as plt
    
            x = [1,2,3,5,6,7,8,9,10,12,13,14,15,16,18,19,21,22]
            y = [100,90,80,60,60,55,60,65,70,70,75,76,78,79,90,99,99,100]
    
            mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))
    
            myline = numpy.linspace(1, 22, 100)
    
            plt.scatter(x, y)
            plt.plot(myline, mymodel(myline))
            plt.show()    
                

    s10, Multiple Regression

    • W3C's example predicts the CO2 emission of a car from its engine size and weight.
    • The two independent variables are selected from the DataFrame with a list of column names: df[['Weight', 'Volume']].
    • Assign them to a variable X, a capital letter by convention.
    • The one dependent variable, y, is what the model predicts.
      • The scikit-learn package (imported as sklearn) provides this service.
      • The model assumes a linear relationship between engine size and CO2 emission,
      • and likewise between car weight and CO2 emission.
      • The fitted linear model is then used for prediction.
            import pandas
            from sklearn import linear_model
            df = pandas.read_csv("cars.csv")
            X = df[['Weight', 'Volume']]    
            y = df['CO2']
            regr = linear_model.LinearRegression()
            regr.fit(X, y)
    
            #predict the CO2 emission of a car where the weight is 2300kg, and the volume is 1300cm3:
            predictedCO2 = regr.predict([[2300,   1300]])
    
            print(predictedCO2)         # [107.208]
                
    • The fitted object regr has an attribute coef_, holding one coefficient per feature.
    • Here it contains [0.00755095 0.00780526].
    • If the weight increases by 1 kg, the CO2 emission increases by 0.00755095 g.
    • If the engine size (Volume) increases by 1 cm3, the CO2 emission increases by 0.00780526 g.
    • Another way to predict is to compute the result manually.
    • In this linear model, each feature contributes independently: the prediction is the intercept (regr.intercept_) plus the sum of each coefficient times its feature value.
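LinearRegression fits an ordinary least-squares model, so the same kind of coefficients can also be obtained with numpy.linalg.lstsq. The sketch below swaps sklearn for numpy to show the manual prediction step; the data is hypothetical, generated exactly from CO2 = 4 + 0.5*weight + 2*volume, not taken from cars.csv.

```python
import numpy

# Hypothetical data built exactly from co2 = 4 + 0.5*weight + 2*volume
weight = numpy.array([1.0, 2.0, 3.0, 4.0, 5.0])
volume = numpy.array([2.0, 1.0, 4.0, 3.0, 5.0])
co2 = 4 + 0.5 * weight + 2 * volume

# Design matrix: a column of ones for the intercept, then one column per feature
A = numpy.column_stack([numpy.ones_like(weight), weight, volume])
solution, *_ = numpy.linalg.lstsq(A, co2, rcond=None)
intercept, coef = solution[0], solution[1:]   # play the roles of regr.intercept_ and regr.coef_

# Manual prediction: intercept + sum of (coefficient * feature value)
new_car = numpy.array([2300.0, 1300.0])
predicted = intercept + numpy.dot(coef, new_car)

print(intercept, coef)   # close to 4 and [0.5, 2]
print(predicted)         # close to 4 + 0.5*2300 + 2*1300 = 3754
```

With the fitted sklearn model, the equivalent manual computation would be regr.intercept_ + numpy.dot(regr.coef_, [2300, 1300]).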

    s11, Scale

    • Comment: choosing suitable units can be good enough to keep the values on a comparable scale.
    • In the previous example, engine volume is given in cm3 (cc), not liters.
    • For the first row of the data, that gives 1000 cc vs 790 kg, values of similar magnitude.
    • Comment 2: CO2 emission is a good way to evaluate air pollution.
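The W3Schools Scale section itself standardizes the columns with scikit-learn's StandardScaler, which applies z = (x - u) / s to each column, where u is the mean and s the standard deviation. Below is a minimal stdlib sketch of that formula for a single column; standardize is a hypothetical helper name, and it uses the population standard deviation, as StandardScaler does.

```python
import statistics


def standardize(values):
    """Return z-scores z = (x - mean) / std, using the population std."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# mean is 5 and population std is 2 for this list
print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

After standardizing, every column has mean 0 and standard deviation 1, so kg and cc (or liters) no longer matter.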

    s12, Train/Test

    • W3Schools uses numpy.random.normal to create simulated data.
    • In the real world, you would use real data, not simulated data.
    • In the example, the training set is 80% of the original data,
    • and the testing set is the remaining 20%.
    • Two arrays are created, one for the x-axis and one for the y-axis.
    • The generated x values are random, so they are not in ascending order.
    • A scatter plot places each point by its value, so the order does not matter when plotting.
    • Because the order is already random, slicing off the first 80 values for training and the last 20 for testing gives a random split.
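The 80/20 split can be sketched with the standard library alone. split_80_20 is a hypothetical helper name, and the simulated values only mimic the idea of normally distributed data; the shuffle option covers the case where real data arrives in some order (for example sorted by x), so plain slicing would not give a random split.

```python
import random


def split_80_20(xs, ys, shuffle=False, seed=None):
    """Split paired data into 80% training and 20% testing parts.

    Randomly generated data can be sliced directly; set shuffle=True
    when the data has an order (e.g. sorted by x).
    """
    pairs = list(zip(xs, ys))
    if shuffle:
        random.Random(seed).shuffle(pairs)
    cut = len(pairs) * 4 // 5          # 80% boundary
    return pairs[:cut], pairs[cut:]


rng = random.Random(2)                  # fixed seed for reproducibility
x = [rng.gauss(3, 1) for _ in range(100)]
y = [rng.gauss(150, 40) for _ in range(100)]

train, test = split_80_20(x, y)
print(len(train), len(test))            # 80 20
```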

    s13, Decision Tree

      W3Schools' example
    • In the example, a person tries to decide whether he/she should go to a comedy show or not.
    • The tabular data includes two parts:
      • information about the comedian, like Age, Experience, Rank, Nationality;
      • a record of whether he/she went or not (column Go).
    • Based on this data set, Python can create a decision tree that can be used to decide whether any new show is worth attending.
    • The following code:
          # -----  code --------------------------------------------------
          import pandas
          from sklearn import tree
          from sklearn.tree import DecisionTreeClassifier
          df = pandas.read_csv("shows.csv")
          
          # DecisionTreeClassifier needs numeric features, so text values are converted to numbers:
          # map each text value to a number, then overwrite the column with the result
          d = {'UK': 0, 'USA': 1, 'N': 2}
          df['Nationality'] = df['Nationality'].map(d)
          d = {'YES': 1, 'NO': 0}
          df['Go'] = df['Go'].map(d)
          
          '''
          
            - X: a dataframe of the feature columns we predict from
            - y: the target decision column (Go)
              
          
          '''
          features = ['Age', 'Experience', 'Rank', 'Nationality']
          X = df[features]
          y = df['Go']
          
          '''
          
            - create the object dtree
            - dtree.fit(X, y) trains it on the data
            - the trained dtree holds the generated decision logic
            - note: 
                  *  My code exports the logic as Graphviz DOT text and prints it.
                  *  My code does not create an image file to render it.
                  *  The text is good enough to see the logic.
          
          ''' 
          dtree = DecisionTreeClassifier()
          dtree = dtree.fit(X, y)
          data = tree.export_graphviz(dtree, out_file=None, feature_names=features)
    
          print(data)  
            
                ---------    print(data) output  ----------------------------
                digraph Tree {
                  node [shape=box] ;
                    0 [label="Rank <= 6.5\ngini = 0.497\nsamples = 13\nvalue = [6, 7]"] ;
                          1 [label="gini = 0.0\nsamples = 5\nvalue = [5, 0]"] ;
                              0 -> 1 [..., labelangle=45, headlabel="True"] ;
                          2 [label="Rank <= 8.5\ngini = 0.219\nsamples = 8\nvalue = [1, 7]"] ;
                              0 -> 2 [..., labelangle=-45, headlabel="False"] ;
                          3  ......
                          ...
              

    Result Explained

      level 1, root box
    • samples = 13: the original data has 13 shows.
    • value = [6, 7]: 6 of them are NO and 7 are GO.
    • The generated logic in this box checks whether Rank <= 6.5.
    • If true, go to level 2, left.
    • If false, go to level 2, right.
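The gini value in each box is the Gini impurity: 1 minus the sum of the squared class proportions, the criterion DecisionTreeClassifier uses by default (criterion='gini'). The small sketch below (gini is a hypothetical helper name) reproduces the printed values from the value = [NO, GO] counts.

```python
def gini(counts):
    """Gini impurity of a node: 1 - sum(p_k ** 2) over the class proportions p_k."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


print(round(gini([6, 7]), 3))  # 0.497  root box: 6 NO, 7 GO
print(round(gini([5, 0]), 3))  # 0.0    pure node: all NO
print(round(gini([1, 7]), 3))  # 0.219  1 NO, 7 GO
```

A pure node (all NO or all GO) always has gini 0.0; a 50/50 node of two classes has the maximum, 0.5.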

      level 2, left
    • samples = 5: five shows have Rank <= 6.5.
    • value = [5, 0]: all of them are NO.
    • Therefore, gini = 0.0 and this is a leaf node.

      level 2, right
    • samples = 8: eight shows have Rank > 6.5.
    • value = [1, 7]: 1 is NO and 7 are GO.
    • This node is not pure, so it is a decision node that splits further.
    • Here, column Nationality is selected by the generated logic. (The partial output above shows Rank <= 8.5 at this node instead; when several splits are equally good, sklearn may pick a different one on each run unless random_state is fixed.)
    • The idea: people from different countries may find different comedians funny.

      level 3, level 4
    • Level 3 splits on column Age.
    • Level 4 splits on column Experience.

      notes
    • With the default settings, a node keeps being split into decision nodes until it is pure: all NO or all GO (gini = 0.0), which makes it a leaf.
    • This demo clearly illustrates how a decision tree works.
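scikit-learn documents DecisionTreeClassifier as an optimized version of the CART algorithm: at each node it tries candidate splits and keeps the one that most lowers the weighted Gini impurity of the two children. The simplified stdlib sketch below searches a single numeric column only (the real implementation also loops over every feature); best_threshold and the sample data are hypothetical. Candidate thresholds are midpoints between consecutive distinct values, which is how labels such as "Rank <= 6.5" arise.

```python
def gini(counts):
    """Gini impurity: 1 - sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def best_threshold(values, labels):
    """Return (t, weighted_gini) for the rule 'value <= t' with the lowest
    weighted Gini impurity, or None if there is nothing to split."""
    classes = sorted(set(labels))
    distinct = sorted(set(values))
    n = len(values)
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2                     # midpoint candidate
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) / n) * gini([left.count(c) for c in classes]) + (
            len(right) / n
        ) * gini([right.count(c) for c in classes])
        if best is None or score < best[1]:
            best = (t, score)
    return best


# Hypothetical ranks and decisions: low ranks were NO, high ranks were GO
ranks = [1, 2, 3, 4]
went = ["NO", "NO", "GO", "GO"]
print(best_threshold(ranks, went))  # (2.5, 0.0): both children are pure
```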