Quick start
Create a new database and connect Tantor PipelineDB extension:
psql -c "CREATE DATABASE pipeline;"
psql -d pipeline -c "CREATE EXTENSION pipelinedb;"
Analysis of air traffic data
In this example, we will analyze the airlines passenger traffic data. The data contains information on amount of passenger by month over three years: 1958, 1959 и 1960. We will use Tantor PipelineDB to create an external table, load data from a CSV file and then analyze that data using a view.
Creating an external table
First, let’s create an external table air_travel_stream
, which will store data about passenger traffic data. This table will contain three columns for each year and one column for the month:
psql -d pipeline -c "
CREATE FOREIGN TABLE air_travel_stream (
month text,
year_1958 int,
year_1959 int,
year_1960 int
) SERVER pipelinedb;"
The format of the entries in the air_travel_stream
table will be as follows:
month | year_1958 | year_1959 | year_1960
where:
month
— month of the year (e.g.JAN
for January).year_1958
— number of passengers in 1958.year_1959
— number of passengers in 1959.year_1960
— number of passengers in 1960.
Creating a view for data aggregation
After creating the air_travel_stream
table, let’s create a air_travel_stats
view that will aggregate the data by month. This view will provide summary data such as the minimum, maximum, average and total number of passengers for each year.
psql -d pipeline -c "
CREATE VIEW air_travel_stats WITH (action=materialize) AS
SELECT month,
count(*) AS total_records,
sum(year_1958) AS total_1958,
min(year_1958) AS min_1958,
max(year_1958) AS max_1958,
avg(year_1958)::int AS avg_1958,
sum(year_1959) AS total_1959,
min(year_1959) AS min_1959,
max(year_1959) AS max_1959,
avg(year_1959)::int AS avg_1959,
sum(year_1960) AS total_1960,
min(year_1960) AS min_1960,
max(year_1960) AS max_1960,
avg(year_1960)::int AS avg_1960
FROM air_travel_stream
GROUP BY month;"
Loading data and executing queries
Now let’s load the data from the CSV file into the table air_travel_stream
. This file contains the number of passengers for each month in 1958, 1959, and 1960.
curl -sL https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv | \
grep -v '^$' | \
psql -d pipeline -c "COPY air_travel_stream (month, year_1958, year_1959, year_1960) FROM STDIN WITH CSV HEADER"
Once the data is loaded, you can query the air_travel_stats
view to retrieve the aggregated data. This query will sort the data by the number of passengers in 1958 in descending order.
psql -d pipeline -c "SELECT * FROM air_travel_stats ORDER BY total_1958 DESC;"
The result of the query will be as follows:
month | total_records | total_1958 | min_1958 | max_1958 | avg_1958 | total_1959 | min_1959 | max_1959 | avg_1959 | total_1960 | min_1960 | max_1960 | avg_1960
-------+---------------+------------+----------+----------+----------+------------+----------+----------+----------+------------+----------+----------+----------
AUG | 1 | 505 | 505 | 505 | 505 | 559 | 559 | 559 | 559 | 606 | 606 | 606 | 606
JUL | 1 | 491 | 491 | 491 | 491 | 548 | 548 | 548 | 548 | 622 | 622 | 622 | 622
JUN | 1 | 435 | 435 | 435 | 435 | 472 | 472 | 472 | 472 | 535 | 535 | 535 | 535
SEP | 1 | 404 | 404 | 404 | 404 | 463 | 463 | 463 | 463 | 508 | 508 | 508 | 508
MAY | 1 | 363 | 363 | 363 | 363 | 420 | 420 | 420 | 420 | 472 | 472 | 472 | 472
MAR | 1 | 362 | 362 | 362 | 362 | 406 | 406 | 406 | 406 | 419 | 419 | 419 | 419
OCT | 1 | 359 | 359 | 359 | 359 | 407 | 407 | 407 | 407 | 461 | 461 | 461 | 461
APR | 1 | 348 | 348 | 348 | 348 | 396 | 396 | 396 | 396 | 461 | 461 | 461 | 461
JAN | 1 | 340 | 340 | 340 | 340 | 360 | 360 | 360 | 360 | 417 | 417 | 417 | 417
DEC | 1 | 337 | 337 | 337 | 337 | 405 | 405 | 405 | 405 | 432 | 432 | 432 | 432
FEB | 1 | 318 | 318 | 318 | 318 | 342 | 342 | 342 | 342 | 391 | 391 | 391 | 391
NOV | 1 | 310 | 310 | 310 | 310 | 362 | 362 | 362 | 362 | 390 | 390 | 390 | 390
(12 rows)
As can be seen from the results, the data are sorted by the month with the highest number of passengers in 1958. The air_travel_stats
view provides a quick summary by month for each year.