Functions for Working with Dictionaries
Note
For dictionaries created with DDL queries, the dict_name
parameter must be fully specified, like <database>.<dict_name>
. Otherwise, the current database is used.
For information on connecting and configuring dictionaries, see Dictionaries.
dictGet, dictGetOrDefault, dictGetOrNull
Retrieves values from a dictionary.
dictGet('dict_name', attr_names, id_expr)
dictGetOrDefault('dict_name', attr_names, id_expr, default_value_expr)
dictGetOrNull('dict_name', attr_name, id_expr)
Arguments
dict_name
— Name of the dictionary. String literal.
attr_names
— Name of the column of the dictionary, String literal, or tuple of column names, Tuple(String literal.
id_expr
— Key value. Expression returning dictionary key-type value or Tuple-type value depending on the dictionary configuration.
default_value_expr
— Values returned if the dictionary does not contain a row with the id_expr
key. Expression or Tuple(Expression), returning the value (or values) in the data types configured for the attr_names
attribute.
Returned value
-
If ClickHouse parses the attribute successfully in the attribute's data type, functions return the value of the dictionary attribute that corresponds to id_expr
.
-
If there is no the key, corresponding to id_expr
, in the dictionary, then:
dictGet
returns the content of the <null_value>
element specified for the attribute in the dictionary configuration.
dictGetOrDefault
returns the value passed as the default_value_expr
parameter.
dictGetOrNull
returns NULL
in case key was not found in dictionary.
ClickHouse throws an exception if it cannot parse the value of the attribute or the value does not match the attribute data type.
Example for simple key dictionary
Create a text file ext-dict-test.csv
containing the following:
The first column is id
, the second column is c1
.
Configure the dictionary:
<clickhouse>
<dictionary>
<name>ext-dict-test</name>
<source>
<file>
<path>/path-to/ext-dict-test.csv</path>
<format>CSV</format>
</file>
</source>
<layout>
<flat />
</layout>
<structure>
<id>
<name>id</name>
</id>
<attribute>
<name>c1</name>
<type>UInt32</type>
<null_value></null_value>
</attribute>
</structure>
<lifetime>0</lifetime>
</dictionary>
</clickhouse>
Perform the query:
SELECT
dictGetOrDefault('ext-dict-test', 'c1', number + 1, toUInt32(number * 10)) AS val,
toTypeName(val) AS type
FROM system.numbers
LIMIT 3;
┌─val─┬─type───┐
│ 1 │ UInt32 │
│ 2 │ UInt32 │
│ 20 │ UInt32 │
└─────┴────────┘
Example for complex key dictionary
Create a text file ext-dict-mult.csv
containing the following:
The first column is id
, the second is c1
, the third is c2
.
Configure the dictionary:
<clickhouse>
<dictionary>
<name>ext-dict-mult</name>
<source>
<file>
<path>/path-to/ext-dict-mult.csv</path>
<format>CSV</format>
</file>
</source>
<layout>
<flat />
</layout>
<structure>
<id>
<name>id</name>
</id>
<attribute>
<name>c1</name>
<type>UInt32</type>
<null_value></null_value>
</attribute>
<attribute>
<name>c2</name>
<type>String</type>
<null_value></null_value>
</attribute>
</structure>
<lifetime>0</lifetime>
</dictionary>
</clickhouse>
Perform the query:
SELECT
dictGet('ext-dict-mult', ('c1','c2'), number + 1) AS val,
toTypeName(val) AS type
FROM system.numbers
LIMIT 3;
┌─val─────┬─type──────────────────┐
│ (1,'1') │ Tuple(UInt8, String) │
│ (2,'2') │ Tuple(UInt8, String) │
│ (3,'3') │ Tuple(UInt8, String) │
└─────────┴───────────────────────┘
Example for range key dictionary
Input table:
CREATE TABLE range_key_dictionary_source_table
(
key UInt64,
start_date Date,
end_date Date,
value String,
value_nullable Nullable(String)
)
ENGINE = TinyLog();
INSERT INTO range_key_dictionary_source_table VALUES(1, toDate('2019-05-20'), toDate('2019-05-20'), 'First', 'First');
INSERT INTO range_key_dictionary_source_table VALUES(2, toDate('2019-05-20'), toDate('2019-05-20'), 'Second', NULL);
INSERT INTO range_key_dictionary_source_table VALUES(3, toDate('2019-05-20'), toDate('2019-05-20'), 'Third', 'Third');
Create the dictionary:
CREATE DICTIONARY range_key_dictionary
(
key UInt64,
start_date Date,
end_date Date,
value String,
value_nullable Nullable(String)
)
PRIMARY KEY key
SOURCE(CLICKHOUSE(HOST 'localhost' PORT tcpPort() TABLE 'range_key_dictionary_source_table'))
LIFETIME(MIN 1 MAX 1000)
LAYOUT(RANGE_HASHED())
RANGE(MIN start_date MAX end_date);
Perform the query:
SELECT
(number, toDate('2019-05-20')),
dictHas('range_key_dictionary', number, toDate('2019-05-20')),
dictGetOrNull('range_key_dictionary', 'value', number, toDate('2019-05-20')),
dictGetOrNull('range_key_dictionary', 'value_nullable', number, toDate('2019-05-20')),
dictGetOrNull('range_key_dictionary', ('value', 'value_nullable'), number, toDate('2019-05-20'))
FROM system.numbers LIMIT 5 FORMAT TabSeparated;
Result:
(0,'2019-05-20') 0 \N \N (NULL,NULL)
(1,'2019-05-20') 1 First First ('First','First')
(2,'2019-05-20') 1 Second \N ('Second',NULL)
(3,'2019-05-20') 1 Third Third ('Third','Third')
(4,'2019-05-20') 0 \N \N (NULL,NULL)
See Also
dictHas
Checks whether a key is present in a dictionary.
dictHas('dict_name', id_expr)
Arguments
dict_name
— Name of the dictionary. String literal.
id_expr
— Key value. Expression returning dictionary key-type value or Tuple-type value depending on the dictionary configuration.
Returned value
- 0, if there is no key. UInt8.
- 1, if there is a key. UInt8.
dictGetHierarchy
Creates an array, containing all the parents of a key in the hierarchical dictionary.
Syntax
dictGetHierarchy('dict_name', key)
Arguments
Returned value
dictIsIn
Checks the ancestor of a key through the whole hierarchical chain in the dictionary.
dictIsIn('dict_name', child_id_expr, ancestor_id_expr)
Arguments
dict_name
— Name of the dictionary. String literal.
child_id_expr
— Key to be checked. Expression returning a UInt64-type value.
ancestor_id_expr
— Alleged ancestor of the child_id_expr
key. Expression returning a UInt64-type value.
Returned value
- 0, if
child_id_expr
is not a child of ancestor_id_expr
. UInt8.
- 1, if
child_id_expr
is a child of ancestor_id_expr
or if child_id_expr
is an ancestor_id_expr
. UInt8.
dictGetChildren
Returns first-level children as an array of indexes. It is the inverse transformation for dictGetHierarchy.
Syntax
dictGetChildren(dict_name, key)
Arguments
Returned values
Example
Consider the hierarchic dictionary:
┌─id─┬─parent_id─┐
│ 1 │ 0 │
│ 2 │ 1 │
│ 3 │ 1 │
│ 4 │ 2 │
└────┴───────────┘
First-level children:
SELECT dictGetChildren('hierarchy_flat_dictionary', number) FROM system.numbers LIMIT 4;
┌─dictGetChildren('hierarchy_flat_dictionary', number)─┐
│ [1] │
│ [2,3] │
│ [4] │
│ [] │
└──────────────────────────────────────────────────────┘
dictGetDescendant
Returns all descendants as if dictGetChildren function was applied level
times recursively.
Syntax
dictGetDescendants(dict_name, key, level)
Arguments
dict_name
— Name of the dictionary. String literal.
key
— Key value. Expression returning a UInt64-type value.
level
— Hierarchy level. If level = 0
returns all descendants to the end. UInt8.
Returned values
Example
Consider the hierarchic dictionary:
┌─id─┬─parent_id─┐
│ 1 │ 0 │
│ 2 │ 1 │
│ 3 │ 1 │
│ 4 │ 2 │
└────┴───────────┘
All descendants:
SELECT dictGetDescendants('hierarchy_flat_dictionary', number) FROM system.numbers LIMIT 4;
┌─dictGetDescendants('hierarchy_flat_dictionary', number)─┐
│ [1,2,3,4] │
│ [2,3,4] │
│ [4] │
│ [] │
└─────────────────────────────────────────────────────────┘
First-level descendants:
SELECT dictGetDescendants('hierarchy_flat_dictionary', number, 1) FROM system.numbers LIMIT 4;
┌─dictGetDescendants('hierarchy_flat_dictionary', number, 1)─┐
│ [1] │
│ [2,3] │
│ [4] │
│ [] │
└────────────────────────────────────────────────────────────┘
dictGetAll
Retrieves the attribute values of all nodes that matched each key in a regular expression tree dictionary.
Besides returning values of type Array(T)
instead of T
, this function behaves similarly to dictGet
.
Syntax
dictGetAll('dict_name', attr_names, id_expr[, limit])
Arguments
dict_name
— Name of the dictionary. String literal.
attr_names
— Name of the column of the dictionary, String literal, or tuple of column names, Tuple(String literal).
id_expr
— Key value. Expression returning array of dictionary key-type value or Tuple-type value depending on the dictionary configuration.
limit
- Maximum length for each value array returned. When truncating, child nodes are given precedence over parent nodes, and otherwise the defined list order for the regexp tree dictionary is respected. If unspecified, array length is unlimited.
Returned value
-
If ClickHouse parses the attribute successfully in the attribute's data type as defined in the dictionary, returns an array of dictionary attribute values that correspond to id_expr
for each attribute specified by attr_names
.
-
If there is no key corresponding to id_expr
in the dictionary, then an empty array is returned.
ClickHouse throws an exception if it cannot parse the value of the attribute or the value does not match the attribute data type.
Example
Consider the following regexp tree dictionary:
CREATE DICTIONARY regexp_dict
(
regexp String,
tag String
)
PRIMARY KEY(regexp)
SOURCE(YAMLRegExpTree(PATH '/var/lib/clickhouse/user_files/regexp_tree.yaml'))
LAYOUT(regexp_tree)
...
# /var/lib/clickhouse/user_files/regexp_tree.yaml
- regexp: 'foo'
tag: 'foo_attr'
- regexp: 'bar'
tag: 'bar_attr'
- regexp: 'baz'
tag: 'baz_attr'
Get all matching values:
SELECT dictGetAll('regexp_dict', 'tag', 'foobarbaz');
┌─dictGetAll('regexp_dict', 'tag', 'foobarbaz')─┐
│ ['foo_attr','bar_attr','baz_attr'] │
└───────────────────────────────────────────────┘
Get up to 2 matching values:
SELECT dictGetAll('regexp_dict', 'tag', 'foobarbaz', 2);
┌─dictGetAll('regexp_dict', 'tag', 'foobarbaz', 2)─┐
│ ['foo_attr','bar_attr'] │
└──────────────────────────────────────────────────┘
Other Functions
ClickHouse supports specialized functions that convert dictionary attribute values to a specific data type regardless of the dictionary configuration.
Functions:
dictGetInt8
, dictGetInt16
, dictGetInt32
, dictGetInt64
dictGetUInt8
, dictGetUInt16
, dictGetUInt32
, dictGetUInt64
dictGetFloat32
, dictGetFloat64
dictGetDate
dictGetDateTime
dictGetUUID
dictGetString
dictGetIPv4
, dictGetIPv6
All these functions have the OrDefault
modification. For example, dictGetDateOrDefault
.
Syntax:
dictGet[Type]('dict_name', 'attr_name', id_expr)
dictGet[Type]OrDefault('dict_name', 'attr_name', id_expr, default_value_expr)
Arguments
dict_name
— Name of the dictionary. String literal.
attr_name
— Name of the column of the dictionary. String literal.
id_expr
— Key value. Expression returning a UInt64 or Tuple-type value depending on the dictionary configuration.
default_value_expr
— Value returned if the dictionary does not contain a row with the id_expr
key. Expression returning the value in the data type configured for the attr_name
attribute.
Returned value
-
If ClickHouse parses the attribute successfully in the attribute's data type, functions return the value of the dictionary attribute that corresponds to id_expr
.
-
If there is no requested id_expr
in the dictionary then:
dictGet[Type]
returns the content of the <null_value>
element specified for the attribute in the dictionary configuration.
dictGet[Type]OrDefault
returns the value passed as the default_value_expr
parameter.
ClickHouse throws an exception if it cannot parse the value of the attribute or the value does not match the attribute data type.
Example dictionaries
The examples in this section make use of the following dictionaries. You can create them in ClickHouse
to run the examples for the functions described below.
Example dictionary for dictGet<T> and dictGet<T>OrDefault functions
-- Create table with all the required data types
CREATE TABLE all_types_test (
`id` UInt32,
-- String type
`String_value` String,
-- Unsigned integer types
`UInt8_value` UInt8,
`UInt16_value` UInt16,
`UInt32_value` UInt32,
`UInt64_value` UInt64,
-- Signed integer types
`Int8_value` Int8,
`Int16_value` Int16,
`Int32_value` Int32,
`Int64_value` Int64,
-- Floating point types
`Float32_value` Float32,
`Float64_value` Float64,
-- Date/time types
`Date_value` Date,
`DateTime_value` DateTime,
-- Network types
`IPv4_value` IPv4,
`IPv6_value` IPv6,
-- UUID type
`UUID_value` UUID
) ENGINE = MergeTree()
ORDER BY id;
-- Insert test data
INSERT INTO all_types_test VALUES
(
1, -- id
'ClickHouse', -- String
100, -- UInt8
5000, -- UInt16
1000000, -- UInt32
9223372036854775807, -- UInt64
-100, -- Int8
-5000, -- Int16
-1000000, -- Int32
-9223372036854775808, -- Int64
123.45, -- Float32
987654.123456, -- Float64
'2024-01-15', -- Date
'2024-01-15 10:30:00', -- DateTime
'192.168.1.1', -- IPv4
'2001:db8::1', -- IPv6
'550e8400-e29b-41d4-a716-446655440000' -- UUID
)
-- Create dictionary
CREATE DICTIONARY all_types_dict
(
id UInt32,
String_value String,
UInt8_value UInt8,
UInt16_value UInt16,
UInt32_value UInt32,
UInt64_value UInt64,
Int8_value Int8,
Int16_value Int16,
Int32_value Int32,
Int64_value Int64,
Float32_value Float32,
Float64_value Float64,
Date_value Date,
DateTime_value DateTime,
IPv4_value IPv4,
IPv6_value IPv6,
UUID_value UUID
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 USER 'default' TABLE 'all_types_test' DB 'default'))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 600);
Example dictionary for dictGetAll
Create a table to store the data for the regexp tree dictionary:
CREATE TABLE regexp_os(
id UInt64,
parent_id UInt64,
regexp String,
keys Array(String),
values Array(String)
)
ENGINE = Memory;
Insert data into the table:
INSERT INTO regexp_os
SELECT *
FROM s3(
'https://datasets-documentation.s3.eu-west-3.amazonaws.com/' ||
'user_agent_regex/regexp_os.csv'
);
Create the regexp tree dictionary:
CREATE DICTIONARY regexp_tree
(
regexp String,
os_replacement String DEFAULT 'Other',
os_v1_replacement String DEFAULT '0',
os_v2_replacement String DEFAULT '0',
os_v3_replacement String DEFAULT '0',
os_v4_replacement String DEFAULT '0'
)
PRIMARY KEY regexp
SOURCE(CLICKHOUSE(TABLE 'regexp_os'))
LIFETIME(MIN 0 MAX 0)
LAYOUT(REGEXP_TREE);
Example range key dictionary
Create the input table:
CREATE TABLE range_key_dictionary_source_table
(
key UInt64,
start_date Date,
end_date Date,
value String,
value_nullable Nullable(String)
)
ENGINE = TinyLog();
Insert the data into the input table:
INSERT INTO range_key_dictionary_source_table VALUES(1, toDate('2019-05-20'), toDate('2019-05-20'), 'First', 'First');
INSERT INTO range_key_dictionary_source_table VALUES(2, toDate('2019-05-20'), toDate('2019-05-20'), 'Second', NULL);
INSERT INTO range_key_dictionary_source_table VALUES(3, toDate('2019-05-20'), toDate('2019-05-20'), 'Third', 'Third');
Create the dictionary:
CREATE DICTIONARY range_key_dictionary
(
key UInt64,
start_date Date,
end_date Date,
value String,
value_nullable Nullable(String)
)
PRIMARY KEY key
SOURCE(CLICKHOUSE(HOST 'localhost' PORT tcpPort() TABLE 'range_key_dictionary_source_table'))
LIFETIME(MIN 1 MAX 1000)
LAYOUT(RANGE_HASHED())
RANGE(MIN start_date MAX end_date);
Example complex key dictionary
Create the source table:
CREATE TABLE dict_mult_source
(
id UInt32,
c1 UInt32,
c2 String
) ENGINE = Memory;
Insert the data into the source table:
INSERT INTO dict_mult_source VALUES
(1, 1, '1'),
(2, 2, '2'),
(3, 3, '3');
Create the dictionary:
CREATE DICTIONARY ext_dict_mult
(
id UInt32,
c1 UInt32,
c2 String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 USER 'default' TABLE 'dict_mult_source' DB 'default'))
LAYOUT(FLAT())
LIFETIME(MIN 0 MAX 0);
Example hierarchical dictionary
Create the source table:
CREATE TABLE hierarchy_source
(
id UInt64,
parent_id UInt64,
name String
) ENGINE = Memory;
Insert the data into the source table:
INSERT INTO hierarchy_source VALUES
(0, 0, 'Root'),
(1, 0, 'Level 1 - Node 1'),
(2, 1, 'Level 2 - Node 2'),
(3, 1, 'Level 2 - Node 3'),
(4, 2, 'Level 3 - Node 4'),
(5, 2, 'Level 3 - Node 5'),
(6, 3, 'Level 3 - Node 6');
-- 0 (Root)
-- └── 1 (Level 1 - Node 1)
-- ├── 2 (Level 2 - Node 2)
-- │ ├── 4 (Level 3 - Node 4)
-- │ └── 5 (Level 3 - Node 5)
-- └── 3 (Level 2 - Node 3)
-- └── 6 (Level 3 - Node 6)
Create the dictionary:
CREATE DICTIONARY hierarchical_dictionary
(
id UInt64,
parent_id UInt64 HIERARCHICAL,
name String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 USER 'default' TABLE 'hierarchy_source' DB 'default'))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 600);