10 min read · 2 days ago
--
Real-time data processing, visualizations, and real-time dashboards are essential in logistics and aviation. In the aviation sector, especially in flight operations, real-time analytics enhances efficiency and customer focus, leading to better decision-making, flight monitoring, and cost savings. It also ensures safe operations, optimizes airspace use, and helps air traffic controllers. Overall, real-time data processing and monitoring helps in flight tracking and overall better flight operations.
In this tutorial, we set up a real-time flight tracking system using RisingWave, Redpanda, and Metabase. We leverage the Aviationstack API to get real-time flight data, and then transmit this data into a Redpanda topic. These streams are then ingested into RisingWave, enabling us to create materialized views (MVs) for thorough flight data analysis. MVs maintain the latest results and are instantly queryable. We also use Metabase to create charts, tables, and a unified dashboard for real-time flight tracking.
Redpanda is a streaming data platform that offers scalable and low-latency data ingestion from real-time data sources. Redpanda natively supports the Kafka API, allowing it to seamlessly integrate with the existing Kafka ecosystem. Running on Redpanda Cloud, a fully managed service, it offers automated upgrades, patching, data balancing, and built-in connectors. It provides customizable cluster options to meet your data sovereignty, infrastructure, and development requirements.
This guide goes through the steps to create a Redpanda cluster on Redpanda Cloud and to connect it to RisingWave for data ingestion. For more information regarding the data ingestion from Redpanda, please refer to Redpanda Documentation.
Sign up for a Redpanda Cloud account
Begin by signing up for a free Redpanda Cloud account, which provides access to Redpanda services. To create an account, visit the Redpanda Cloud Account.
Create a Redpanda cluster
Redpanda offers various options for creating clusters. Choose the Serverless cluster for the free tier available to users.
Once you are logged in, create your Redpanda cluster with the following details before creating a resource group:
- Cluster Name: Give your Redpanda cluster a unique name for identification.
- Region: Choose the region where your Redpanda cluster will be hosted.
- Resource Group: Select the resource group that suits your needs.
Set up a Redpanda topic
After creating your Redpanda cluster, set up a Redpanda topic. Redpanda provides default configurations for the number of partitions and retention policy, simplifying the setup process.
The topic is successfully created.
Connect and interact with your Kafka cluster
You are now ready to connect to your Redpanda cluster using various Kafka clients. These clients enable you to both produce and consume data from your Redpanda topic. Therefore, you can extract real-time data from the Aviationstack API and feed it into a Redpanda topic.
To interact with the Redpanda cluster in Redpanda Cloud, we need to create a user as shown below.
After creating the user, we need to grant them permission to perform various operations, such as creating topics and producing and consuming messages from a Redpanda topic, as shown below.
The data ingested into a Redpanda topic contains real-time information from an aviation API, including details such as airport name, flight status, and flight location. Visit this Google Colab notebook for retrieving data from Aviationstack API and ingesting it into a Redpanda topic. A sample message is shown below:
{
"flight_date": "24-05-16",
"flight_status": "scheduled",
"departure_airport": "Auckland International",
"departure_timezone": "Pacific/Auckland",
"departure_iata": "AKL",
"departure_icao": "NZAA",
"departure_terminal": "D",
"departure_gate": "28",
"departure_delay": null,
"departure_scheduled": "2024-05-16T06:30:00+00:00",
"departure_estimated": "2024-05-16T06:30:00+00:00",
"departure_actual": null,
"departure_estimated_runway": null,
"departure_actual_runway": null,
"arrival_airport": "Wellington International",
"arrival_timezone": "Pacific/Auckland",
"arrival_iata": "WLG",
"arrival_icao": "NZWN",
"arrival_terminal": null,
"arrival_gate": "15",
"arrival_baggage": null,
"arrival_delay": null,
"arrival_scheduled": "2024-05-16T07:40:00+00:00",
"arrival_estimated": "2024-05-16T07:40:00+00:00",
"arrival_actual": null,
"arrival_estimated_runway": null,
"arrival_actual_runway": null,
"airline_name": "Singapore Airlines",
"airline_iata": "SQ",
"airline_icao": "SIA",
"flight_number": "SQ4438",
"flight_iata": "SQ4438",
"flight_icao": "SIA4438",
"codeshared_airline_name": "air new zealand",
"codeshared_airline_iata": "nz",
"codeshared_airline_icao": "anz",
"codeshared_flight_number": "401",
"codeshared_flight_iata": "nz401",
"flight_info": "Singapore Airlines flight SQ4438 is currently in the air, flying from Auckland International (AKL) to Wellington International (WLG)"
}
For ingesting and processing streaming data, there are two options available: the open-source RisingWave and the managed service, RisingWave Cloud. In this blog, we will focus on using RisingWave Cloud, which provides a user-friendly experience and simplifies the operational aspects of managing and utilizing RisingWave for our flight-tracking solution.
Create a RisingWave cluster
To create a RisingWave cluster in RisingWave Cloud and explore the various features it offers, you can sign up for the free plan available. The free plan allows you to test the functionalities of RisingWave without any cost. For detailed instructions on how to create a RisingWave cluster and get started, you can refer to the official RisingWave documentation. It will provide you with step-by-step guidance to set up and explore the features of RisingWave. If you need additional help with setting up this integration, join our active Slack community.
RisingWave supports various options for creating clusters, select the Developer cluster type that free tier for users.
- Cluster Name: Give your RisingWave cluster a unique name for identification.
- Cloud Provider: Select either AWS or Google Cloud for the RisingWave cluster.
- Region: Choose the region where your RisingWave cluster will be hosted.
Ingest data streams into RisingWave
Now that we have set up the RisingWave cluster, go to the Workspace and connect to the data streams using the below SQL statement.
CREATE SOURCE flight_tracking_source(
flight_date VARCHAR,
flight_status VARCHAR,
departure_airport VARCHAR,
departure_timezone VARCHAR,
departure_iata VARCHAR,
departure_icao VARCHAR,
departure_terminal VARCHAR,
departure_gate VARCHAR,
departure_delay INTERVAL,
departure_scheduled TIMESTAMP WITH TIME ZONE ,
departure_estimated TIMESTAMP WITH TIME ZONE,
departure_actual TIMESTAMP WITH TIME ZONE,
departure_estimated_runway TIMESTAMP WITH TIME ZONE,
departure_actual_runway TIMESTAMP WITH TIME ZONE,
arrival_airport VARCHAR,
arrival_timezone VARCHAR,
arrival_iata VARCHAR,
arrival_icao VARCHAR,
arrival_terminal VARCHAR,
arrival_gate VARCHAR,
arrival_baggage VARCHAR,
arrival_delay INTERVAL,
arrival_scheduled TIMESTAMP WITH TIME ZONE,
arrival_estimated TIMESTAMP WITH TIME ZONE,
arrival_actual TIMESTAMP WITH TIME ZONE,
arrival_estimated_runway TIMESTAMP WITH TIME ZONE,
arrival_actual_runway TIMESTAMP WITH TIME ZONE,
airline_name VARCHAR,
airline_iata VARCHAR,
airline_icao VARCHAR,
flight_number VARCHAR,
flight_iata VARCHAR,
flight_icao VARCHAR,
codeshared_airline_name VARCHAR,
codeshared_airline_iata VARCHAR,
codeshared_airline_icao VARCHAR,
codeshared_flight_number VARCHAR,
codeshared_flight_iata VARCHAR,
flight_info VARCHAR
)
WITH(
connector='kafka',
topic ='flights_tracking',
properties.bootstrap.server ='xxxxxxxxxx:9092',
properties.sasl.mechanism = 'SCRAM-SHA-256',
properties.security.protocol = 'SASL_SSL',
properties.sasl.username = 'xxxxxx',
properties.sasl.password = 'xxxxxx',
scan.startup.mode ='earliest'
)FORMAT PLAIN ENCODE JSON;
With a CREATE SOURCE statement, RisingWave is connected to the streams but has not started to consume data yet. For data to be processed and stored incrementally, we need to define materialized views. After a materialized view is created, RisingWave will consume data from the specified offset.
We’ll create different materialized views that keep track of and extract various attributes related to flight information from flight_tracking**_source**
. These attributes include flight date, status, departure and arrival details (airport, timezone, IATA code, ICAO code, scheduled and estimated times), airline information (name, IATA code, ICAO code), flight number and identifiers (IATA and ICAO codes), and general flight information.
The reason why we use materialized views is that they always maintain the latest results. The query creates a materialized view called Airline_Flight_Counts
that counts the number of flights for each airline within hourly intervals. It uses the flight_tracking_source
and groups the data by airline name and time windows of one hour.
CREATE MATERIALIZED VIEW Airline_Flight_Counts
SELECT airline_name,
COUNT(airline_name) AS total_flights,
window_start, window_end
FROM TUMBLE (flight_tracking_source, arrival_scheduled, INTERVAL '1 hour')
GROUP BY airline_name,window_start, window_end
ORDER BY total_flights desc;
This query creates a materialized view named Airport_Summary
that counts the total flights arriving and departing at each airport within hourly intervals from the flight_tracking_source
. The results are grouped by airport and time windows of one hour, and they're ordered by the total flight count in descending order.
CREATE MATERIALIZED VIEW Airport_Summary
WITH ArrivalCounts AS (
SELECT
arrival_airport,
COUNT(arrival_airport) AS total_flights_arrival,
window_start,
window_end
FROM
TUMBLE (flight_tracking_source, arrival_scheduled, INTERVAL '1 hour')
GROUP BY
arrival_airport,
window_start,
window_end
),
DepartureCounts AS (
SELECT
departure_airport,
COUNT(departure_airport) AS total_flights_departure,
window_start,
window_end
FROM
TUMBLE (flight_tracking_source, arrival_scheduled, INTERVAL '1 hour')
GROUP BY
departure_airport,
window_start,
window_end
)
SELECT
ArrivalCounts.arrival_airport,
ArrivalCounts.total_flights_arrival,
DepartureCounts.departure_airport,
DepartureCounts.total_flights_departure,
ArrivalCounts.window_start,
ArrivalCounts.window_end
FROM
ArrivalCounts
INNER JOIN
DepartureCounts ON ArrivalCounts.window_start = DepartureCounts.window_start
AND ArrivalCounts.window_end = DepartureCounts.window_end
AND ArrivalCounts.arrival_airport = DepartureCounts.departure_airport
ORDER BY
ArrivalCounts.total_flights_arrival DESC,
DepartureCounts.total_flights_departure DESC;
This query creates a materialized view called Timezone_Summary
that calculates the total number of flights arriving and departing from each timezone within hourly intervals. It uses the flight_tracking_source
, grouping the data by timezone and one-hour time windows, then orders the results by the total flight count in descending order.
CREATE MATERIALIZED VIEW Timezone_Summary
WITH ArrivalCounts AS (
SELECT
arrival_timezone,
COUNT(arrival_timezone) AS total_flights_arrival,
window_start,
window_end
FROM
TUMBLE (flight_tracking_source, arrival_scheduled, INTERVAL '1 hour')
GROUP BY
arrival_timezone,
window_start,
window_end
),
DepartureCounts AS (
SELECT
departure_timezone,
COUNT(departure_timezone) AS total_flights_departure,
window_start,
window_end
FROM
TUMBLE (flight_tracking_source, arrival_scheduled, INTERVAL '1 hour')
GROUP BY
departure_timezone,
window_start,
window_end
)
SELECT
ArrivalCounts.arrival_timezone,
ArrivalCounts.total_flights_arrival,
DepartureCounts.departure_timezone,
DepartureCounts.total_flights_departure,
ArrivalCounts.window_start,
ArrivalCounts.window_end
FROM
ArrivalCounts
INNER JOIN
DepartureCounts ON ArrivalCounts.window_start = DepartureCounts.window_start
AND ArrivalCounts.window_end = DepartureCounts.window_end
AND ArrivalCounts.arrival_timezone = DepartureCounts.departure_timezone
ORDER BY
ArrivalCounts.total_flights_arrival DESC,
DepartureCounts.total_flights_departure DESC;
Metabase is an open-source business intelligence tool that lets you visualize and share data insights. It provides an easy way to create charts, dashboards, and metrics on top of databases.
Connect RisingWave to Metabase
Since RisingWave is compatible with PostgreSQL, you can connect Metabase to RisingWave as a data source and build analytics on streaming data.
You can use RisingWave as a data source in Metabase for creating visualizations and dashboards using the tables and materialized views in RisingWave.
First, you need to install Metabase using Docker. Ensure Docker Desktop is installed and running in your environment.
docker pull metabase/metabase:latest
Then start the Metabase container:
docker run -d -p 3000:3000 --name metabase metabase/metabase
This will launch a Metabase server on port 3000 by default.
Once startup completes, you can access your Open Source Metabase at http://localhost:3000
.
- Open the Metabase admin interface and click “Add a database”.
- For the database type, select “PostgreSQL” since RisingWave uses the PostgreSQL wire protocol.
- Fill in the connection details such as name, host, port, database name, username, and password.
- Save the connection.
Once connected, you will see the RisingWave database available in Metabase. You can now build dashboards, charts, and graphs on top of the real-time data in RisingWave.
It should be noted that Metabase’s minimum auto-refresh is 1 minute, while RisingWave typically delivers a second-level data freshness. As a workaround, you can append #refresh=5 to the URL, such as http://127.0.0.1:3000/dashboard/1-jaffle-shop#refresh=5
, thus setting the refresh interval to 5 seconds.
After successfully connecting RisingWave to Metabase, we use the materialized views in RisingWave as data sources to create tables, various charts, and a unified dashboard.
Visualizing data with Metabase: table, charts, and dashboard
We create these tables, charts, and dashboards using the materialized views and a source in RisingWave, such as flight_tracking_source
, Airline_Flight_Counts
, Airport_Summary
, and Timezone_Summary
. For this follow these steps:
- Go to New and select the SQL Query option as shown below.
- Query the source and various materialized views in RisingWave, create tables and bar charts and Save them to the dashboard as shown below.
- Query the source and various materialized views in RisingWave, create tables and bar charts and Save them to the dashboard as shown below.
This table is based on the flight_tracking_source
in RisingWave and has been added to the dashboard.
This bar chart is created based on the Airline_Flight_Counts
materialized view in RisingWave and has been added to the dashboard.
This bar chart is generated based on Airport_Summary
in RisingWave and has been added to the dashboard.
This chart is based on the Timezone_Summary
materialized view and added to the dashboard.
This unified dashboard presents a collection of charts for real-time flight tracking. It provides a holistic view of flight operations, offering insights into total flights categorized by airline, airport, and timezone. Additionally, it features detailed information on the current flight, empowering users with comprehensive monitoring capabilities and actionable insights.
CONCLUSION
In this tutorial, we developed a real-time flight tracking system using Redpanda, RisingWave, and Metabase. We used the Aviationstack API as a real-time flight data source and ingested this data into a Redpanda topic in Redpanda Cloud. We read this data into RisingWave and created various materialized views for in-depth real-time analysis. Finally, we used Metabase to create visualizations and a real-time dashboard that helps users monitor flight operations and make data-driven decisions.