DataCamp - exercises

https://www.datacamp.com/

DataCamp is a platform that offers (big) data resources for practicing data processing, analysis, and visualization.

It provides a wide range of Resources, including regularly scheduled Webinars, as well as always available Tutorials, White papers and Podcasts.

If you remember a concept but not the exact code or module to use, the Cheat sheets come in handy. They are provided for various software and languages such as Python, SQL, Excel, bash, and others.

There are also job-related guides, the latest industry news for employees, and discussions of employer hiring needs, useful for employers looking to fill positions such as Data Scientist.

Basic registration and services are free, with options for customization, especially for businesses or educational institutions.


See the worked examples of a Python-based study, an SQL-based study, and a Power BI solution, all built on the "Online Retail" dataset and related questions. The first two were analyzed in the built-in Notebook on the DataCamp / DataLab website, which offers Python and SQL interpretation and basic built-in visualization tools (tables or graphs).

Online Datasets - for Practice

Easy to reach online datasets 

Plenty of websites provide smaller or larger datasets, free or for a fee, for practicing data science tasks: accessing data; recognizing and understanding data content and data types; data cleaning, conversion, and manipulation (ETL); and finally presenting the extracted key information, drawing conclusions, or making predictions, clustering, segmentation, and so on, depending on the predefined requirements or the possible uses of the datasets.

It is rarely mentioned, but double-checking the extracted information is an essential step so that you do not mislead yourself or the stakeholders in a real-life project.
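
The steps above (access, type conversion, cleaning, key figure, double-check) can be sketched with nothing but the standard library. This is only a minimal illustration: the CSV content, the column names, and the "total revenue" figure are hypothetical and stand in for whatever dataset you download.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV file
raw_csv = """invoice,quantity,unit_price
A001,2,3.50
A002,,1.25
A003,5,2.00
A004,abc,4.00
"""

rows = list(csv.DictReader(io.StringIO(raw_csv)))

clean_rows = []
rejected_rows = []
for row in rows:
    try:
        # convert the text fields to the numeric types the analysis needs
        quantity = int(row["quantity"])
        unit_price = float(row["unit_price"])
    except ValueError:
        rejected_rows.append(row)  # keep rejected rows for later inspection
        continue
    clean_rows.append(
        {"invoice": row["invoice"], "quantity": quantity, "unit_price": unit_price}
    )

# extracted key figure: total revenue of the clean rows
total_revenue = sum(r["quantity"] * r["unit_price"] for r in clean_rows)

# double-check 1: no row was silently lost during cleaning
assert len(clean_rows) + len(rejected_rows) == len(rows)

# double-check 2: recompute the figure a second way and compare
per_invoice = {r["invoice"]: r["quantity"] * r["unit_price"] for r in clean_rows}
assert abs(sum(per_invoice.values()) - total_revenue) < 1e-9

print(len(clean_rows), len(rejected_rows), total_revenue)  # 2 2 17.0
```

Keeping the rejected rows instead of dropping them makes it possible to explain later why the totals differ from the raw file.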

Here is a short list of websites that give partially or completely free access to datasets:

  • DataCamp - offers notebook-based data juggling, tutorials, and education in the AI & ML domains
  • Kaggle - offers competitions, but also tutorials and education in the AI & ML domains
  • Data.gov - US federal government datasets (the non-confidential part, of course)
  • Earth data - collected by NASA
  • Global Health Observatory Data - for those interested in health-related issues and facts

Have fun with the data suiting you the best!

sn - AI & ML for Data

I have found two interesting publications about the required skills and tools for Data Analyst, Data Scientist, Data Engineer, and Machine Learning Engineer positions. The authors collected skill lists for each position by scraping multiple job advertisements for months, and used AI (ML) both to extract the data and to cluster it.

The required skill lists hold no surprises, but I found the way they did it elegant.

One important finding is that companies sometimes do not know which types of skills they need for their planned project, and applicants are frequently just as confused.

The studies compare the positions from different aspects, and also the advertising companies (type, size, place, etc.).

text1, text2

They shared the code that they used to gather the data (web scraping and more).

One of their well-organized graphs clusters the skills and positions.


GeeksForGeeks for developers

I have recently found GeeksForGeeks (GFG), an amazing site that offers problem-solving challenges alongside tutorials and short- to long-term learning projects. The original aim of the site was to help Data Scientists and IT people widen and deepen their knowledge in their own field of work and in connected domains such as math and databases, but also system design, DevOps (including the Linux and Android operating systems), software testing, and so on.

This site helped me learn about BSTs (binary search trees) and linked lists, as well as some tricks to make code run faster: for some problems, the code-verifying engine also checks the run time and accepts the submitted code only if the defined time limit is not exceeded.
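
For readers who have not met a BST yet, here is a minimal sketch of the idea. It is not taken from any GFG problem; the names Node, insert, and contains are my own illustrative choices. Smaller keys go to the left subtree and larger keys to the right, so a search follows a single path down the tree instead of visiting every element.

```python
class Node:
    """One node of a binary search tree (hypothetical minimal layout)."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key and return the root of the tree; duplicates are ignored."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                break
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = Node(key)
                break
            node = node.right
        else:
            break  # key already present
    return root


def contains(root, key):
    """Return True if key is stored in the tree."""
    node = root
    while node is not None:
        if key == node.key:
            return True
        # go left for smaller keys, right for larger ones
        node = node.left if key < node.key else node.right
    return False


root = None
for key in [8, 3, 10, 1, 6, 14]:
    root = insert(root, key)

print(contains(root, 6), contains(root, 7))  # True False
```

The loops are iterative on purpose, so a long chain of nodes cannot hit Python's recursion limit.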

Coding practice is available in Java, C++, C#, JavaScript, and Python, which I chose as my preferred language. The site also offers teaching and tutorials in R, Scala, Kotlin, Go, C, and PHP. Quite an amazing list of today's widely used languages.

I highly recommend that programmers and IT people sign up, as you may easily gain a lot of experience and encouraging 'Geekbits' by solving problems of different levels (simple to hard) whenever you have 2-30 minutes. The problem solving is free! 😎

This site also offers a wide variety of educational courses as videos and tutorials on many topics, e.g. in the domain of AI and ML. These cost some money, of course, but stay in an affordable range. From time to time they offer quite appealing courses at a very low price, and at other times a certain percentage (up to 90) of the paid amount may be regained if you finish the course at a fast but reasonable pace, within a defined time limit.

The coding problem section is updated daily. By solving the "Problem of the Day" you gain Geekbits, and if you do it on consecutive days, your "streak days" count increases, opening new options to develop yourself or your GFG profile.

I highly recommend this site! Well done Geeks4Geeks! 👍

I started in June and have kept coding since then. Here are my results so far, spending 20-40 minutes almost every day:



Python, value presence/absence check using set

Step #1 Presence verification of a value in a list

Consider the case where the task is to determine whether a searched value (s) is present in an array (or list): you could loop through the list elements and test each one for equality. Once a match is found, the loop may be stopped and positive feedback returned.

In Python 3, comments, which are not part of the executed code, start with the # symbol. The examples here use integers, but the approach works for other simple variable types as well.

#initialization of variables
a = [1, 2, 3, 4, 5, 10] #list of integers
s = 4                    #searched value

answer = False           # default: there is no match
for element in a:
    if element == s:
        answer = True
        break            # stop at the first match

This answer may be printed (on the screen) or, better, returned if we convert the above script to a function:

def check_presence(a, s):
    for element in a:
        if element == s:
            return True
    return False         # if there is no match

and call it like:

# initialize variables
a = [1, 2, 3, 4, 5, 10] #list of integers
s = 4
# call function with returned value
answer = check_presence(a, s)

Image made from code typed into (Anaconda) Spyder IDE.

The answer may be printed or used for a decision in an if statement:

if check_presence(a, s):
    #your code here
    print('Defined value is present in the list')

Note: if there is a match, the loop terminates before reaching the end of the list. Consequently, the actual run time cannot be known in advance, only the maximum run time, which occurs when every element in the list is checked. The maximum run time is proportional to the number of elements in the list (t ~ n).
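
For comparison, Python's built-in membership operator (in) performs the same front-to-back scan on a list and also stops at the first match, so the hand-written loop is mainly useful for understanding what happens. A minimal sketch with the same example values; the variable names found_with_in and found_with_any are illustrative only:

```python
a = [1, 2, 3, 4, 5, 10]  # list of integers
s = 4                    # searched value

# the membership operator scans the list and stops at the first match
found_with_in = s in a

# any() with a generator expression short-circuits in the same way
found_with_any = any(element == s for element in a)

print(found_with_in, found_with_any)  # True True
```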

Step #2 Absence verification

If the task is to verify the absence of a given value, then the code is similar, but now the True/False branches are swapped.

def check_absence(a, s):
    for element in a:
        if element == s:
            return False
    return True         # True, there is no match

Note that it is important to name the function appropriately and define True/False values accordingly. Inappropriate naming and wrong definition may lead to misunderstandings and functional but wrong code (logical/semantic error).
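
One way to guard against such a mix-up is a quick self-check: for any value, exactly one of the presence and absence checks may return True. The sketch below uses the in / not in operators as shorthand for the loops shown earlier; it is an illustration, not a replacement for proper tests.

```python
def check_presence(a, s):
    """Return True if s occurs in a."""
    return s in a


def check_absence(a, s):
    """Return True if s does not occur in a."""
    return s not in a


a = [1, 2, 3, 4, 5, 10]
for value in (4, 7):
    # exactly one of the two checks can be True for any value
    assert check_presence(a, value) != check_absence(a, value)

print(check_absence(a, 4), check_absence(a, 7))  # False True
```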

test absence check

Step #3 Absence of list elements in another list

Now, if not just one value but a list of values should be checked, then the elements of that list have to be verified one by one. The function below loops through the elements of s (list) and searches for each of them in a (list):

def findMissingElements(a,s):
    # elements of s are searched in elements of a
    notin_list = [] # predefined empty list
    
    for element in s:
        if element not in a:
            notin_list.append(element)
    return notin_list

Now we may test the code with two lists, for example:

#initialization of lists
s = [1, 2, 3, 4, 5, 10] #list elements to check for
a = [2, 3, 1, 0, 5]     #list to search in
#calling the function with return value
notin_list = findMissingElements(a,s)
print(notin_list)

This returns a list [4, 10] because 4 and 10 are the numbers present in list s but missing from list a.
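
The same loop can be written more compactly as a list comprehension. This is only a stylistic alternative with identical behavior; the snake_case name find_missing_elements is my own choice here, not part of the original function.

```python
def find_missing_elements(a, s):
    """Return the elements of s that do not occur in a, in the order of s."""
    return [element for element in s if element not in a]


s = [1, 2, 3, 4, 5, 10]  # list elements to check for
a = [2, 3, 1, 0, 5]      # list to search in
notin_list = find_missing_elements(a, s)
print(notin_list)  # [4, 10]
```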

test list element absence

We might consider ourselves done, but this is not the fastest solution: every "element not in a" test scans the list a from the front, and repeated values in a only make that scan longer. In such a case the following modification should be applied.

Step #4 Verifying unique values only

Python can create a collection of the unique values of a list, which is called a set. It is a built-in data type and the conversion is done by the built-in set() function, so the process is fast. Looking up a value in a set also does not require scanning all of its elements. The conversion takes less time than we would otherwise lose by checking repeated values.

def findMissingElements(a,s):
    # elements of s are searched in elements of a
    notin_list = [] # predefined empty list
    a_unique = set(a)
    for element in s:
        if element not in a_unique:
            notin_list.append(element)
    return notin_list
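
If the order of the result does not matter, the same question can also be phrased as a set difference. This is a sketch of an alternative, not the approach used above: it drops duplicates from s as well, and it sorts the result only to make the output stable. The function name find_missing_unordered is hypothetical.

```python
def find_missing_unordered(a, s):
    """Return the values of s that do not occur in a, as a sorted list."""
    # set difference drops duplicates and ordering, so sort for a stable result
    return sorted(set(s) - set(a))


s = [1, 2, 3, 4, 5, 10]    # list elements to check for
a = [2, 3, 1, 0, 5, 5, 2]  # list to search in, with repeated values
missing = find_missing_unordered(a, s)
print(missing)  # [4, 10]
```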

Note: instead of predefining a_unique, the conversion could have been written directly in the for loop as:

if element not in set(a):
    ...

This is valid code, but in that case the list -> set conversion is repeated at every step of the loop, so the run time would be far longer than in Step #3.
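
A rough way to compare the variants yourself is the standard timeit module. The sketch below is an assumption-laden experiment rather than a benchmark: the list sizes, the random test data, and the three function names are hypothetical, and the measured times depend on the machine, so no expected timings are shown.

```python
import random
import timeit


def missing_list_lookup(a, s):
    # Step #3: membership test against the list itself
    return [element for element in s if element not in a]


def missing_set_once(a, s):
    # Step #4: convert once, then test against the set
    a_unique = set(a)
    return [element for element in s if element not in a_unique]


def missing_set_every_step(a, s):
    # the variant warned about: the set is rebuilt for every element of s
    return [element for element in s if element not in set(a)]


random.seed(0)  # fixed seed so the hypothetical test data is reproducible
a = [random.randrange(1000) for _ in range(2000)]  # contains many repeats
s = list(range(1200))

variants = (missing_list_lookup, missing_set_once, missing_set_every_step)
results = [func(a, s) for func in variants]
same_result = all(result == results[0] for result in results)
print("all variants agree:", same_result)

for func in variants:
    seconds = timeit.timeit(lambda: func(a, s), number=5)
    print(f"{func.__name__}: {seconds:.4f} s for 5 runs")
```

Checking that all variants return the same list before timing them keeps the comparison honest: a faster function is of no use if it gives a different answer.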

test FAST list element absence

  

Snowflake universe, part #6 - Forecasting2

Forecasting with built-in ML module...