To make sure the Vega charts render correctly, view this notebook on the published website rather than in the GitHub repo: https://walterra.github.io/jupyter2kibana/viz-1a-flights-histogram.html
This notebook covers how to get data from Elasticsearch via eland (https://github.com/elastic/eland) and discusses different approaches to visualizing small multiples of binned histograms with eland and Altair. It covers the basic requirements to successfully create and publish a custom visualization from Jupyter to Kibana.
import datetime
import pandas as pd
import altair as alt
import eland as ed
from elasticsearch import Elasticsearch
import elastic_transport
import json
import numpy as np
import matplotlib.pyplot as plt
import urllib3
import logging
import requests
import warnings
alt.data_transformers.disable_max_rows()
logging.getLogger("elastic_transport").setLevel(logging.ERROR)
# Suppress insecure SSL connection warnings
# In dev environments with the `verify_certs=False`
# you might want to reduce those warnings.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
urllib3.disable_warnings(elastic_transport.SecurityWarning)
# For rendering the notebook to HTML hide all warnings
warnings.filterwarnings('ignore')
# Load the connection settings (es_client, kibana_client, user, password)
with open('config.json') as config_file:
    es_config = json.load(config_file)
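The cells in this notebook read four keys from `es_config`: `es_client`, `kibana_client`, `user` and `password`. As a minimal sketch, the following shows one way to fail early when a key is missing. The helper name `parse_config` and all example values are hypothetical placeholders, not part of the original notebook.

```python
import json

# Keys this notebook reads from config.json (taken from the cells that use es_config).
REQUIRED_KEYS = ("es_client", "kibana_client", "user", "password")


def parse_config(text):
    """Parse config JSON text and fail early if a required key is missing."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config.json is missing keys: {', '.join(missing)}")
    return config


# Placeholder values for illustration only; replace them with your own.
example = json.dumps({
    "es_client": "https://localhost:9200",
    "kibana_client": "https://localhost:5601",
    "user": "elastic",
    "password": "changeme",
})
es_config_example = parse_config(example)
```

With a check like this, a typo in the config file surfaces when the file is loaded rather than later, when a request to Elasticsearch or Kibana is built.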
# First instantiate an 'Elasticsearch' instance with the supplied config
es = Elasticsearch(
    hosts=[es_config['es_client']],
    basic_auth=[
        es_config['user'],
        es_config['password']
    ],
    # Only in development environments with self-signed certificates fall back to `verify_certs=False`
    verify_certs=False
)
df = ed.DataFrame(es, 'kibana_sample_data_flights')
df.head()
df.info()
The following code is appealing because it's short and concise. However, we cannot deploy these bitmap-based charts to Kibana and make them dynamic. And as of Elastic Stack 7.9, these kinds of charts (both small multiples and binned histograms) are a bit cumbersome to create with native Kibana tools.
df_number = df.select_dtypes(include=np.number)
df_number.hist(figsize=[6,10])
plt.show()
Let's try to create the same chart type using Altair. Here's a basic example with dummy data:
source = pd.DataFrame({
    'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})
alt.Chart(source, height=160, width=120).mark_bar().encode(
    x='a',
    y='b'
)
To move forward, we convert the eland-based data frame to a native pandas one. Note that this one requires more memory because it's now kept locally.
df_altair = ed.eland_to_pandas(df_number)
df_altair.info()
df_altair.head()
The above cell shows the data frame structure with the same schema as stored in Elasticsearch. This format could be used, but it's more cumbersome to work with in Altair/Vega. For more details, see Altair's docs on "Long-form vs. Wide-form Data": https://altair-viz.github.io/user_guide/data.html#long-form-vs-wide-form-data
The next cell converts the data frame to "long form", which makes it easier to split the data by attribute.
df_melt = df_altair.melt(var_name='attribute', value_name='value')
df_melt.head()
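To make the wide-to-long conversion concrete, here is a small sketch on a toy data frame. The column names are borrowed from the flights index, but the values are made up for illustration.

```python
import pandas as pd

# Wide form: one row per flight, one column per attribute (toy values).
wide = pd.DataFrame({
    "AvgTicketPrice": [100.0, 200.0],
    "FlightDelayMin": [0, 15],
})

# Long form: one row per (attribute, value) pair.
long_form = wide.melt(var_name="attribute", value_name="value")
print(long_form)
```

Two rows with two columns become four rows with an `attribute` and a `value` column, which is the shape the chart below filters on.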
Finally, we're able to recreate the original chart with Altair/Vega:
data = df_melt
chart = alt.Chart(data).mark_bar().encode(
    alt.X('value:Q', bin=True, title=''),
    alt.Y('count()', title=''),
    tooltip=[
        alt.Tooltip('value:Q', bin=True, title='x'),
        alt.Tooltip('count()', title='y')
    ]
).properties(
    width=130,
    height=130
)
alt.ConcatChart(
    concat=[
        chart.transform_filter(alt.datum.attribute == value).properties(title=value)
        for value in sorted(data.attribute.unique())
    ],
    columns=3
).resolve_axis(
    x='independent',
    y='independent'
).resolve_scale(
    x='independent',
    y='independent'
)
To achieve the same with the data still in Elasticsearch, we need to change the example to load the data from a remote URL.
Note that this is an intermediate step towards our final chart specification, meant for demonstration purposes. It doesn't work with security enabled for Elasticsearch, so take care. You'll need to add the following settings to your elasticsearch.yml. Again, take care: this isn't recommended for production configs at all:
xpack.security.enabled: false
http.cors.allow-origin: "/.*/"
http.cors.enabled: true
The other difference to the previous code is that instead of pandas' .melt() we're using Altair's .transform_fold() (Vega-Lite's fold transform) to transpose the data from wide to long form.
The important bit is that we do all data transformations with Altair and not with raw Python or pandas code. Otherwise the transformations would not be part of the chart specification, and we couldn't publish it to Kibana later on.
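To illustrate what the calculate and fold steps do to an Elasticsearch response, here is a plain-Python sketch. It only mimics the behaviour of the two transforms on a toy response and is not how Vega implements them. The functions `calculate` and `fold` are hypothetical helpers, and the values are made up.

```python
# Toy stand-in for an Elasticsearch search response (values are made up).
response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"AvgTicketPrice": 100.0, "FlightDelayMin": 0}},
            {"_id": "2", "_source": {"AvgTicketPrice": 200.0, "FlightDelayMin": 15}},
        ]
    }
}
fields = ["AvgTicketPrice", "FlightDelayMin"]


def calculate(rows, fields):
    """Copy each field from row['_source'] to the top level of the row."""
    return [{**row, **{f: row["_source"][f] for f in fields}} for row in rows]


def fold(rows, fields, as_=("attribute", "value")):
    """Emit one output row per (input row, field), keeping the original keys."""
    key_name, value_name = as_
    return [
        {**row, key_name: f, value_name: row[f]}
        for row in rows
        for f in fields
    ]


# Corresponds to format.property = 'hits.hits' in the chart's data definition.
rows = response["hits"]["hits"]
long_rows = fold(calculate(rows, fields), fields)
for row in long_rows:
    print(row["_id"], row["attribute"], row["value"])
```

Each hit yields one row per folded field, so the chart can filter on `attribute` exactly as it did with the melted pandas data frame.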
# Note: To create the Vega spec using Altair we reference ES via URL first. This will only work
# for non-secured ES instances. If your ES instance runs using SSL and/or authentication the chart
# in this cell will render empty. You can still save the visualization in Kibana correctly in the
# next cell because there the URL gets replaced with an Elasticsearch query
# to be used via the Kibana Vega plugin.
# WARNING:
# Use the following proxy approach only for demo purposes in a development environment.
# It exposes a secured ES instance without any protection!
# To make this work for demo purposes, run the Node.js-based proxy in a separate terminal like this:
# NODE_TLS_REJECT_UNAUTHORIZED='0' node proxy
# URL as ES endpoint
# url = 'http://localhost:9220/kibana_sample_data_flights/_search?size=10000'
# URL static fallback
url = 'https://walterra.github.io/jupyter2kibana/data/kibana_sample_data_flights.json'
url_data = alt.Data(url=url, format=alt.DataFormat(property='hits.hits',type='json'))
fields = [
    'AvgTicketPrice',
    'DistanceKilometers',
    'DistanceMiles',
    'FlightDelayMin',
    'FlightTimeMin',
    'dayOfWeek'
]
rename_dict = dict((a, 'datum._source.'+a) for a in fields)
url_chart = alt.Chart(url_data).transform_calculate(**rename_dict).transform_fold(
    fields,
    as_=['attribute', 'value']
).mark_bar().encode(
    alt.X('value:Q', bin=True, title=''),
    alt.Y('count()', title=''),
    tooltip=[
        alt.Tooltip('value:Q', bin=True, title='x'),
        alt.Tooltip('count()', title='y')
    ]
).properties(
    width=150,
    height=150
)
url_charts = alt.ConcatChart(
    concat=[
        url_chart.transform_filter(alt.datum.attribute == attribute).properties(title=attribute)
        for attribute in sorted(fields)
    ],
    columns=2
).resolve_axis(
    x='independent',
    y='independent'
).resolve_scale(
    x='independent',
    y='independent'
)
url_charts
Next, we pick up the Vega spec from the chart above, apply some options, and save it as a Saved Object in Kibana.
def saveVegaVis(index, visName, altairChart):
    chart_json = json.loads(altairChart.to_json())
    # Replace the data URL with an Elasticsearch query for the Kibana Vega plugin
    chart_json['data']['url'] = {
        "%context%": True,
        "index": index,
        "body": {
            "size": 10000
        }
    }
    visState = {
        "type": "vega",
        "aggs": [],
        "params": {
            "spec": json.dumps(chart_json, sort_keys=True, indent=4, separators=(',', ': ')),
        },
        "title": visName
    }
    visSavedObject = {
        "attributes": {
            "title": visName,
            "visState": json.dumps(visState, sort_keys=True, indent=4, separators=(',', ': ')),
            "uiStateJSON": "{}",
            "description": "",
            "version": 1,
            "kibanaSavedObjectMeta": {
                "searchSourceJSON": json.dumps({
                    "query": {
                        "language": "kuery",
                        "query": ""
                    },
                    "filter": []
                }),
            }
        },
    }
    return requests.post(
        es_config['kibana_client'] + '/api/saved_objects/visualization/' + visName,
        json=visSavedObject,
        auth=(es_config['user'], es_config['password']),
        headers={"kbn-xsrf": "jupyter2kibana"},
        # Only in development environments with self-signed certificates fall back to `verify=False`
        verify=False
    )
r = saveVegaVis('kibana_sample_data_flights', 'def-vega-1', url_charts)
print(r.status_code)
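The payload built in saveVegaVis nests JSON inside JSON: the spec is a string inside visState, and visState is a string inside the saved object's attributes. The following sketch replays that on a minimal stand-in spec, without contacting Kibana, to show the data URL replacement and how to decode both layers again. The spec contents, the example URL and the title `demo` are made up for illustration.

```python
import json

# Minimal stand-in for the dict produced by json.loads(chart.to_json()).
spec = {
    "data": {
        "url": "https://example.invalid/flights.json",
        "format": {"property": "hits.hits", "type": "json"},
    },
    "mark": "bar",
}

# Same replacement as in saveVegaVis: the URL string becomes a query object.
spec["data"]["url"] = {
    "%context%": True,
    "index": "kibana_sample_data_flights",
    "body": {"size": 10000},
}

# The spec is serialized to a string inside visState, and visState is
# serialized again inside the saved object's attributes.
vis_state = {
    "type": "vega",
    "aggs": [],
    "params": {"spec": json.dumps(spec)},
    "title": "demo",
}
attributes = {"title": "demo", "visState": json.dumps(vis_state)}

# Decoding both layers recovers the spec dict, including the untouched format.
decoded_spec = json.loads(json.loads(attributes["visState"])["params"]["spec"])
print(json.dumps(decoded_spec["data"], indent=2))
```

Decoding the payload this way can help when a saved visualization renders empty, because it shows exactly which query and format the spec carries.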