Enhancing SnowPark Notebook capabilities
SnowPark, the online notebook app of the SnowFlake system, can be extended in functionality. Not surprisingly, given its Python nature, additional modules can be imported to enhance data handling. There are several such modules, for example scikit-learn for analysis and AI functionality with built-in visualization, or 'simple' data visualization tools such as matplotlib or seaborn (see more).
Streamlit, the app generator
See the Streamlit website and a demo to start with, if you are or become interested in it.
"Streamlit turns data scripts into shareable web apps in minutes.
All in pure Python. No front‑end experience required." (Streamlit website)
Streamlit is an open-source Python library that lets data scientists and developers collaborate easily and build interactive data visualizations and web applications quickly, with minimal web (HTML, CSS, PHP/JavaScript) development skills. Streamlit stands out from the market by enabling rapid prototyping and further development through its streamlined, simple development process.
Its impact on the market is growing among similar applications. The most widely used comparable data visualization and web application development tools are:
- Dash (Plotly, Python-based)
- Panel (HoloViz library, also Python-based)
- Shiny (for R, from RStudio)
Why Streamlit?
Ease of use and fast development is "one of Streamlit's biggest advantages because it has an intuitive API, so data analysts and data scientists can quickly build interactive apps with just a few lines of Python code."
Rapid app development is also made easier because the app automatically reloads itself after changes are made to the code, which keeps the development process fast and flexible. That holds as long as you don't mess up a step of the development process.😊
The dozens of built-in visualization and interactive components (e.g. sliders, text boxes, radio buttons) make interaction more satisfying for end users. Individual Streamlit components can be easily embedded into applications, even into a SnowPark Notebook. A typical case: changing the minimum and/or maximum value of a slider causes the database query to run again with the changed parameter, so the (filtered) data extracted from the dataset is updated to match the user's expectations.
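The slider-driven refresh described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names and the sample rows are hypothetical, and the filtering happens in plain Python so that it can be read without a database at hand.

```python
def filter_by_price(rows, low, high):
    """Keep rows whose SALE_PRICE_USD lies in the closed range [low, high]."""
    return [row for row in rows if low <= row["SALE_PRICE_USD"] <= high]


def render_price_filter(rows):
    """Draw a range slider and show the rows matching the chosen range."""
    import streamlit as st  # imported here so the filter works without Streamlit

    # Passing a tuple as `value` turns st.slider into a two-handle range slider.
    low, high = st.slider(
        "Price range (USD)", min_value=1, max_value=20, value=(2, 10)
    )
    st.dataframe(filter_by_price(rows, low, high))


# Hypothetical sample rows standing in for the result of a database query.
SAMPLE_ROWS = [
    {"MENU_ITEM_NAME": "Item A", "SALE_PRICE_USD": 3.5},
    {"MENU_ITEM_NAME": "Item B", "SALE_PRICE_USD": 4.0},
    {"MENU_ITEM_NAME": "Item C", "SALE_PRICE_USD": 2.5},
    {"MENU_ITEM_NAME": "Item D", "SALE_PRICE_USD": 5.0},
]
```

Calling render_price_filter(SAMPLE_ROWS) in a notebook cell or Streamlit script should redraw the table each time a slider handle moves, because Streamlit reruns the code on every widget change.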
Sliders demo made by Streamlit (source: their slider documentation website):
If it does not work, see the image.
My own test of the slider functionality, filtering fetched data by a lower price limit (on a tutorial dataset), is shown in the following video. Read the notebook notes (markdown parts) in the video for better understanding.
After importing the Streamlit module, we declare the min_price variable, whose value comes from a slider set by the user.
import streamlit as st
st.markdown("# Move the slider to define lower price limit to filter data")
#col1 = st.columns(1)
#with col1:
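# Slider from 1 to 20 with a default of 2; the chosen value is stored in min_price.
min_price = st.slider('Define min_price', min_value=1, max_value=20, value=2)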
Once the slider is set, a single line of Python (pandas) using the previously defined min_price variable filters the result of the earlier query, which covers only that part of the whole dataset where the company is called 'Freezing point'.
df_menu_freezing_point.loc[
    df_menu_freezing_point['SALE_PRICE_USD'] > min_price,
    ['TRUCK_BRAND_NAME', 'MENU_ITEM_NAME', 'SALE_PRICE_USD'],
]
A similar query in SQL, using the previously defined min_price variable as the lower sale price limit, this time against the whole dataset including all companies:
SELECT truck_brand_name, menu_item_name, sale_price_usd
FROM tasty_bytes_sample_data.raw_pos.menu
WHERE sale_price_usd > {{min_price}}
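The same query can also be issued from a Python cell instead of a SQL cell. The sketch below is one possible way to do it, assuming a Snowpark version whose session.sql accepts a params argument for bind variables; the helper names menu_query_params and fetch_menu_above are hypothetical.

```python
MENU_QUERY = (
    "SELECT truck_brand_name, menu_item_name, sale_price_usd "
    "FROM tasty_bytes_sample_data.raw_pos.menu "
    "WHERE sale_price_usd > ?"
)


def menu_query_params(min_price):
    """Validate the slider value and return it as the bind-parameter list."""
    price = float(min_price)
    if price < 0:
        raise ValueError("min_price must not be negative")
    return [price]


def fetch_menu_above(min_price):
    """Run the query in the notebook's active session; return a pandas DataFrame."""
    # Only importable where Snowpark is installed, e.g. inside the notebook.
    from snowflake.snowpark.context import get_active_session

    session = get_active_session()
    # The value is bound to the `?` placeholder rather than pasted into the SQL.
    return session.sql(MENU_QUERY, params=menu_query_params(min_price)).to_pandas()
```

Binding the value as a parameter keeps the SQL text constant, which is a common habit once the input comes from a user-facing widget.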
Streamlit has the added advantage of integrating with popular data visualization libraries such as Matplotlib, Plotly and Altair. So, although not by itself, it offers a wide range of data visualization options through these integrations. Accordingly, its visualization capabilities are basically determined by the integrated Python module.
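As an illustration of this division of labour, here is a small sketch in which Matplotlib draws a bar chart and Streamlit only displays it. The function names are hypothetical, and the rows are assumed to be dictionaries with TRUCK_BRAND_NAME and SALE_PRICE_USD keys, as in the menu dataset used above.

```python
from collections import defaultdict


def average_price_by_brand(rows):
    """Return {brand: mean SALE_PRICE_USD} for an iterable of row dicts."""
    prices = defaultdict(list)
    for row in rows:
        prices[row["TRUCK_BRAND_NAME"]].append(row["SALE_PRICE_USD"])
    return {brand: sum(values) / len(values) for brand, values in prices.items()}


def render_price_chart(rows):
    """Plot the averages with Matplotlib and hand the figure to Streamlit."""
    import matplotlib.pyplot as plt
    import streamlit as st

    averages = average_price_by_brand(rows)
    fig, ax = plt.subplots()
    ax.bar(list(averages.keys()), list(averages.values()))
    ax.set_xlabel("Truck brand")
    ax.set_ylabel("Average sale price (USD)")
    st.pyplot(fig)  # Streamlit displays the figure; Matplotlib does the drawing
```

Swapping Matplotlib for Plotly or Altair would change only the plotting lines and the final display call (st.plotly_chart or st.altair_chart).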
Streamlit Cloud also offers cloud hosting, making it easy to share and run prototypes and applications. Find examples in the Streamlit App Gallery. A funny bit of self-reflection, but still impressive: the Streamlit development cheat sheet website is itself an app made with Streamlit.
As for the SnowFlake system, Streamlit can connect directly to the SnowFlake data warehouse, which makes it easy to create interactive data visualizations based on SnowFlake data and to take advantage of the backend solutions the SnowFlake system provides. This can be particularly useful for data analysts and business decision makers, as they can query and visualize data from SnowFlake in real time. In simpler cases, this can even replace the use of more expensive BI software (out of respect, I am not naming any software here).
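For a Streamlit app running outside the notebook, the direct connection might look like the sketch below. It assumes a recent Streamlit release that provides st.connection and a connection named "snowflake" configured in the app's secrets; the function names and the default limit are hypothetical.

```python
def top_priced_items_sql(limit=10):
    """Build the query text; `limit` is validated because it is inlined in the SQL."""
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT truck_brand_name, menu_item_name, sale_price_usd "
        "FROM tasty_bytes_sample_data.raw_pos.menu "
        f"ORDER BY sale_price_usd DESC LIMIT {limit}"
    )


def render_top_items(limit=10):
    """Query Snowflake through Streamlit's connection API and show the result."""
    import streamlit as st

    # Assumes a [connections.snowflake] entry in .streamlit/secrets.toml.
    conn = st.connection("snowflake")
    # ttl=600 caches the result for ten minutes to avoid re-querying on each rerun.
    df = conn.query(top_priced_items_sql(limit), ttl=600)
    st.dataframe(df)
```

The caching interval is a trade-off: a shorter ttl gives fresher data, a longer one reduces the load on the warehouse.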
Streamlit compared to other software on the market
Streamlit is not suitable for the development of large and complex (multimodal) web applications, as it has limited scalability and supports neither detailed user permissions nor advanced front-end customization options.
If an
application requires multiple pages, complex navigation or detailed user
identification, Streamlit is not an ideal choice as it does not support these
features well. In this respect, Dash or Shiny may offer more options.
Streamlit applications are ideal for smaller data visualization projects. But if you work with more complex or larger data sets on the input side, need to display multiple types of data on the output side, or need to serve multiple users simultaneously, performance can degrade severely, constraining both the development team and the end user.
Note that Dash and Panel offer more customisation and performance optimisation options.
Streamlit lacks built-in data manipulation tools. As with the data visualization toolset mentioned above, it relies on integration with data processing libraries such as Pandas and their own built-in data manipulation tools. Data processing must be handled by separate modules and the results returned to the Streamlit application for final visualization. This is not necessarily a real disadvantage, as Python users are used to this mentality, but in the case of Dash and Shiny, the full integration of Plotly and RStudio as data manipulation software gives a wider range of built-in data processing capabilities. I would say it makes coding simpler in the latter case, but pushing and pulling data between modules is not an unbearable situation.
Streamlit offers more limited functionality, so applications with complex operations cannot be created with it. The queries themselves (math or code) can be complex, but they cannot, for example, be built on top of each other.
I am not sure whether it is actually a drawback, but Streamlit is specifically Python-based and therefore does not appeal to statisticians who use R (as far as I know). Dash may be more popular among statisticians and data scientists because it is more versatile, serving both Python and R developers (I have not checked market data on this topic).
Stored Python procedures
(Almost) everyone has heard of stored SQL procedures, but this is about stored Python procedures, which are rarely found on the market. Read more about the topic.
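To give a rough idea of what a stored Python procedure can look like, here is a minimal sketch using Snowpark's session.sproc.register. The procedure name count_items_above is hypothetical, and the sketch assumes that session.sql accepts bind parameters in the installed Snowpark version; treat it as an illustration rather than a recommended implementation.

```python
def count_items_above(session, min_price: float) -> int:
    """Handler: count menu items priced above min_price."""
    rows = session.sql(
        "SELECT COUNT(*) FROM tasty_bytes_sample_data.raw_pos.menu "
        "WHERE sale_price_usd > ?",
        params=[min_price],
    ).collect()
    return rows[0][0]


def register_procedure(session):
    """Register the handler as a temporary stored procedure in the given session."""
    from snowflake.snowpark.types import FloatType, IntegerType

    return session.sproc.register(
        func=count_items_above,
        return_type=IntegerType(),
        input_types=[FloatType()],  # the session argument is not listed here
        name="count_items_above",  # hypothetical procedure name
        replace=True,
        packages=["snowflake-snowpark-python"],
    )
```

After registration, something like session.call("count_items_above", 2.0) should run the handler inside SnowFlake rather than in the notebook's Python process.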