To make sure the Vega charts render correctly, view the notebook not from the Github repo but the published website here: https://walterra.github.io/jupyter2kibana/viz-1a-flights-histogram.html

1a-flights-histogram.ipynb¶

This notebook covers how to get data via eland (https://github.com/elastic/eland) from Elasticsearch and discusses some different approaches how to visualize small multiples of binned histograms with eland and Altair. It covers the basic requirements to successfully create and publish a custom visualization from Jupyter to Kibana.

import datetime
import pandas as pd
import altair as alt
import eland as ed
from elasticsearch import Elasticsearch
import elastic_transport
import json
import numpy as np
import matplotlib.pyplot as plt
import urllib3
import logging
import requests
import warnings

alt.data_transformers.disable_max_rows()
logging.getLogger("elastic_transport").setLevel(logging.ERROR)

# Suppress insecure SSL connection warnings
# In dev environments with the `verify_certs=False`
# you might want to reduce those warnings.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
urllib3.disable_warnings(elastic_transport.SecurityWarning)

# For rendering the notebook to HTML hide all warnings
warnings.filterwarnings('ignore')

Connect to the 'flights' example dataset, can be installed via Kibana.¶

with open('config.json') as config_file:
  es_config = json.load(config_file)

# First instantiate an 'Elasticsearch' instance with the supplied config
es = Elasticsearch(
    hosts=[es_config['es_client']],
    basic_auth=[
        es_config['user'],
        es_config['password']
    ],
    # Only in development environments with self signed certificates fall back to use `verify_certs=False`
    verify_certs=False
)

df = ed.DataFrame(es, 'kibana_sample_data_flights')
df.head()

Output the data frame's info. Note its low memory footprint because of the data residing in Elasticsearch.¶

df.info()

<class 'eland.dataframe.DataFrame'>
Index: 13059 entries, FK7wlYUB84Z0EQSGKFH6 to Fq7wlYUB84Z0EQSGMYQh
Data columns (total 27 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   AvgTicketPrice      13059 non-null  float64       
 1   Cancelled           13059 non-null  bool          
 2   Carrier             13059 non-null  object        
 3   Dest                13059 non-null  object        
 4   DestAirportID       13059 non-null  object        
 5   DestCityName        13059 non-null  object        
 6   DestCountry         13059 non-null  object        
 7   DestLocation        13059 non-null  object        
 8   DestRegion          13059 non-null  object        
 9   DestWeather         13059 non-null  object        
 10  DistanceKilometers  13059 non-null  float64       
 11  DistanceMiles       13059 non-null  float64       
 12  FlightDelay         13059 non-null  bool          
 13  FlightDelayMin      13059 non-null  int64         
 14  FlightDelayType     13059 non-null  object        
 15  FlightNum           13059 non-null  object        
 16  FlightTimeHour      13059 non-null  object        
 17  FlightTimeMin       13059 non-null  float64       
 18  Origin              13059 non-null  object        
 19  OriginAirportID     13059 non-null  object        
 20  OriginCityName      13059 non-null  object        
 21  OriginCountry       13059 non-null  object        
 22  OriginLocation      13059 non-null  object        
 23  OriginRegion        13059 non-null  object        
 24  OriginWeather       13059 non-null  object        
 25  dayOfWeek           13059 non-null  int64         
 26  timestamp           13059 non-null  datetime64[ns]
dtypes: bool(2), datetime64[ns](1), float64(4), int64(2), object(18)
memory usage: 64.000 bytes
Elasticsearch storage usage: 5.777 MB

Next we output small multiples of binned histograms via matplotlib.¶

The following code is great because it's short and concise. However, we cannot deploy these bitmap based charts to Kibana and make them dynamic. And as of Elastic Stack 7.9, these kind of charts (both small multiples and binned histograms) are a bit cumbersome to create with native Kibana tools.

df_number = df.select_dtypes(include=np.number)
df_number.hist(figsize=[6,10])
plt.show()

Let's try to create the same chart type using Altair, here's a most basic example with dummy data:

source = pd.DataFrame({
    'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})

alt.Chart(source, height=160, width=120).mark_bar().encode(
    x='a',
    y='b'
)

To move forward, we convert the eland based data frame to a native pandas one. Note how this one requires more memory because it's kept locally now.

df_altair = ed.eland_to_pandas(df_number)
df_altair.info()

<class 'pandas.core.frame.DataFrame'>
Index: 13059 entries, FK7wlYUB84Z0EQSGKFH6 to Fq7wlYUB84Z0EQSGMYQh
Data columns (total 6 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   AvgTicketPrice      13059 non-null  float64
 1   DistanceKilometers  13059 non-null  float64
 2   DistanceMiles       13059 non-null  float64
 3   FlightDelayMin      13059 non-null  int64  
 4   FlightTimeMin       13059 non-null  float64
 5   dayOfWeek           13059 non-null  int64  
dtypes: float64(4), int64(2)
memory usage: 714.2+ KB

df_altair.head()

The above cell shows the data frame structure with the same schema as it was stored in Elasticsearch. This type of format could be used, but it's more cumbersome to work with in Altair/Vega. For more details, see Altair's docs on "Long-form vs. Wide-form Data" https://altair-viz.github.io/user_guide/data.html#long-form-vs-wide-form-data

The next cell converts the data frame to "long form" which makes it more easy to split data based on an entity/attribute.

df_melt = df_altair.melt(var_name='attribute', value_name='value')
df_melt.head()

Finally we're able to recreate the original chart with Altair/Vega:

data = df_melt

chart = alt.Chart(data).mark_bar().encode(
    alt.X('value:Q', bin=True, title=''),
    alt.Y('count()', title=''),
    tooltip=[
        alt.Tooltip('value:Q', bin=True, title='x'),
        alt.Tooltip('count()', title='y')
    ]
).properties(
    width=130,
    height=130
)

alt.ConcatChart(
    concat=[
      chart.transform_filter(alt.datum.attribute == value).properties(title=value)
      for value in sorted(data.attribute.unique())
    ],
    columns=3
).resolve_axis(
    x='independent',
    y='independent'
).resolve_scale(
    x='independent', 
    y='independent'
)

To achieve the same with the data still in Elasticsearch, we need to change the example using a remote URL.

Note this is meant as a middle step towards our final chart specification for demonstration purposes. This one doesn't work with security enabled for Elasticsearch, so take care. You'll need to add the following settings to your elasticsearch.yml, again take care, this isn't recommended for production configs at all:

xpack.security.enabled: false
http.cors.allow-origin: "/.*/"
http.cors.enabled: true

The other difference to the previous code is that instead of using panda's .melt() we're using Vega's .transform_fold() to transpose the data from wide to long form.

The important bit is that we're using Altair only to do data transformations and not raw Python or pandas code, otherwise we're not able to publish the chart specification to Kibana later on.

# Note: To create the Vega spec using Altair we reference ES via URL first. This will only work
# for non-secured ES instances. If your ES instance runs using SSL and/or authentication the chart
# in this cell will render empty. You can still save the visualization in Kibana correctly in the
# next cell because there the URL gets replaced with an Elasticsearch query
# to be used via the Kibana Vega plugin.

# WARNING:
# Do the following approach using a proxy only for demo purposes in a development environment.
# It will expose a secured ES instance unsecured!
# To make this work for demo purposes run the nodejs based proxy in a separate terminal like this:
# NODE_TLS_REJECT_UNAUTHORIZED='0' node proxy

# URL as ES endpoint
# url = 'http://localhost:9220/kibana_sample_data_flights/_search?size=10000'

# URL static fallback
url = 'https://walterra.github.io/jupyter2kibana/data/kibana_sample_data_flights.json'

url_data = alt.Data(url=url, format=alt.DataFormat(property='hits.hits',type='json'))

fields = [
    'AvgTicketPrice',
    'DistanceKilometers',
    'DistanceMiles',
    'FlightDelayMin',
    'FlightTimeMin',
    'dayOfWeek'
]

rename_dict = dict((a, 'datum._source.'+a) for a in fields)

url_chart = alt.Chart(url_data).transform_calculate(**rename_dict).transform_fold(
    fields,
    as_=['attribute', 'value']
).mark_bar().encode(
    alt.X('value:Q', bin=True, title=''),
    alt.Y('count()', title=''),
    tooltip=[
        alt.Tooltip('value:Q', bin=True, title='x'),
        alt.Tooltip('count()', title='y')
    ]
).properties(
    width=150,
    height=150
)

url_charts = alt.ConcatChart(
    concat=[
      url_chart.transform_filter(alt.datum.attribute == attribute).properties(title=attribute)
      for attribute in sorted(fields)
    ],
    columns=2
).resolve_axis(
    x='independent',
    y='independent'
).resolve_scale(
    x='independent', 
    y='independent'
)

url_charts

Next we're picking up the Vega spec from the chart above, apply some options and save it as a Saved Object in Kibana.

def saveVegaVis(index, visName, altairChart):
    chart_json = json.loads(altairChart.to_json())
    chart_json['data']['url'] = {
        "%context%": True,
        "index": index,
        "body": { 
            "size": 10000
        }
    }

    visState = {
      "type": "vega",
      "aggs": [],
      "params": {
        "spec": json.dumps(chart_json, sort_keys=True, indent=4, separators=(',', ': ')),
      },
      "title": visName
    }

    visSavedObject={
        "attributes" : {
          "title" : visName,
          "visState" : json.dumps(visState, sort_keys=True, indent=4, separators=(',', ': ')),
          "uiStateJSON" : "{}",
          "description" : "",
          "version" : 1,
          "kibanaSavedObjectMeta" : {
            "searchSourceJSON" : json.dumps({
              "query": {
                "language": "kuery",
                "query": ""
              },
              "filter": []
            }),
          }
        },
    }
    
    return requests.post(
        es_config['kibana_client'] + '/api/saved_objects/visualization/' + visName,
        json=visSavedObject,
        auth=(es_config['user'], es_config['password']),
        headers={"kbn-xsrf":"jupyter2kibana"},
        # Only in development environments with self signed certificates fall back to use `verify=False`
        verify=False
    )

r = saveVegaVis('kibana_sample_data_flights', 'def-vega-1', url_charts)
print(r.status_code)

409

	AvgTicketPrice	Cancelled	Carrier	Dest	DestAirportID	DestCityName	DestCountry	DestLocation	DestRegion	DestWeather	...	FlightTimeMin	Origin	OriginAirportID	OriginCityName	OriginCountry	OriginLocation	OriginRegion	OriginWeather	timestamp
FK7wlYUB84Z0EQSGKFH6	841.265642	False	Kibana Airlines	Sydney Kingsford Smith International Airport	SYD	Sydney	AU	{'lat': '-33.94609833', 'lon': '151.177002'}	SE-BD	Rain	...	1030.770416	Frankfurt am Main Airport	FRA	Frankfurt am Main	DE	{'lat': '50.033333', 'lon': '8.570556'}	DE-HE	Sunny	2023-01-02 00:00:00
Fa7wlYUB84Z0EQSGKFH6	882.982662	False	Logstash Airways	Venice Marco Polo Airport	VE05	Venice	IT	{'lat': '45.505299', 'lon': '12.3519'}	IT-34	Sunny	...	464.389481	Cape Town International Airport	CPT	Cape Town	ZA	{'lat': '-33.96480179', 'lon': '18.60169983'}	SE-BD	Clear	2023-01-02 18:27:00
Fq7wlYUB84Z0EQSGKFH6	190.636904	False	Logstash Airways	Venice Marco Polo Airport	VE05	Venice	IT	{'lat': '45.505299', 'lon': '12.3519'}	IT-34	Cloudy	...	0.000000	Venice Marco Polo Airport	VE05	Venice	IT	{'lat': '45.505299', 'lon': '12.3519'}	IT-34	Rain	2023-01-02 17:11:14
F67wlYUB84Z0EQSGKFH6	181.694216	True	Kibana Airlines	Treviso-Sant'Angelo Airport	TV01	Treviso	IT	{'lat': '45.648399', 'lon': '12.1944'}	IT-34	Clear	...	222.749059	Naples International Airport	NA01	Naples	IT	{'lat': '40.886002', 'lon': '14.2908'}	IT-72	Thunder & Lightning	2023-01-02 10:33:28
GK7wlYUB84Z0EQSGKFH6	730.041778	False	Kibana Airlines	Xi'an Xianyang International Airport	XIY	Xi'an	CN	{'lat': '34.447102', 'lon': '108.751999'}	SE-BD	Clear	...	785.779071	Licenciado Benito Juarez International Airport	AICM	Mexico City	MX	{'lat': '19.4363', 'lon': '-99.072098'}	MX-DIF	Damaging Wind	2023-01-02 05:13:00

	AvgTicketPrice	DistanceKilometers	DistanceMiles	FlightDelayMin	FlightTimeMin
FK7wlYUB84Z0EQSGKFH6	841.265642	16492.326654	10247.856676	0	1030.770416
Fa7wlYUB84Z0EQSGKFH6	882.982662	8823.400140	5482.606665	0	464.389481
Fq7wlYUB84Z0EQSGKFH6	190.636904	0.000000	0.000000	0	0.000000
F67wlYUB84Z0EQSGKFH6	181.694216	555.737767	345.319439	180	222.749059
GK7wlYUB84Z0EQSGKFH6	730.041778	13358.244200	8300.428125	0	785.779071