COPY FROM
Imports data from a CSV file into a table.
Imports data from a comma-separated values (CSV) file or a delimited text file into an existing table. Each line in the source file is imported as a row. All rows in the dataset must contain the same number of fields and have values in the PRIMARY KEY fields.
The process verifies the PRIMARY KEY and updates existing records. If HEADER = false and no column names are specified, the fields are imported in deterministic order. When HEADER = true, the first row of the file is treated as a header row and is not imported as data.
Synopsis
COPY table_name [ ( column_list ) ] FROM 'file_name' [, 'file2_name', ...] | STDIN [ WITH option = 'value' [ AND ... ] ]
| Syntax conventions | Description |
|---|---|
| UPPERCASE | Literal keyword. |
| Lowercase | Not literal. |
| Italics | Variable value. Replace with a user-defined value. |
| [] | Optional. Square brackets ( [] ) surround optional command arguments. Do not type the square brackets. |
| ( ) | Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses. |
| \| | Or. A vertical bar ( \| ) separates alternative elements. Type any one of the elements. Do not type the vertical bar. |
| ... | Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as required. |
| 'Literal string' | Single quotation marks ( ' ) must surround literal strings in CQL statements. Use single quotation marks to preserve upper case. |
| { key : value } | Map collection. Braces ( { } ) enclose map collections or key-value pairs. A colon separates the key and the value. |
| <datatype1,datatype2> | Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma. |
| cql_statement; | End CQL statement. A semicolon ( ; ) terminates all CQL statements. |
| [--] | Separate the command line options from the command arguments with two hyphens ( -- ). This syntax is useful when arguments might be mistaken for command line options. |
| ' <schema> ... </schema> ' | Search CQL only: single quotation marks ( ' ) surround an entire XML schema declaration. |
| @xml_entity='xml_entity_type' | Search CQL only: identify the entity and literal value to overwrite the XML element in the schema and solrConfig files. |
Setting copy options
Copy options set in the COPY statement take precedence over the cqlshrc
file and the default settings. If an
option is not set on the command line, the cqlshrc file
takes precedence over the default settings.
- table_name: Table for the copy operation.
- column_list: List of columns in the table. All fields are included when no column names are specified. To omit columns, specify a column list with only the columns to include.
- file_name, file2_name: CSV file name(s).
- BOOLSTYLE: Boolean indicators for true and false. The values are case-insensitive; for example, yes,no and YES,NO are equivalent. Default: True,False
- CONFIGFILE: Directory that contains the cqlshrc configuration file. Note: Command line options always override the cqlshrc file.
- DATETIMEFORMAT: Time format for reading or writing CSV time data. The timestamp uses the strftime format. If not set, the default value is set to the datetimeformat value in the cqlshrc file. Default: %Y-%m-%d %H:%M:%S%z
- DECIMALSEP: Decimal value separator. Default: . (period)
- DELIMITER: Field separator. Default: , (comma)
- ESCAPE: Single character that escapes literal uses of the QUOTE character. Default: \ (backslash)
- HEADER: When true, the first row contains headers (column names); when false, the first row does not have headers. Default: false
- MAXATTEMPTS: Maximum number of attempts for errors. Default: 5
- NULL: Value used when no value is in the field. Default: <empty>
- NUMPROCESSES: Number of worker processes. Maximum value is 16. Default: -1
- QUOTE: Encloses field values. Default: " (double quotation mark)
- REPORTFREQUENCY: Frequency with which status is displayed, in seconds. Default: 0.25
- RATEFILE: Print output statistics to this file.
- SKIPCOLS: Name of column to skip.
- SKIPROWS: Number of rows, starting from the first row of data, to skip.
- THOUSANDSSEP: Separator for thousands digit groups. Default: None
- CHUNKSIZE: Chunk size passed to worker processes. Default: 1000
- INGESTRATE: Approximate ingest rate in rows per second. Must be greater than the chunk size. Default: 100000
- MAXBATCHSIZE: Maximum size of an import batch. Default: 20
- MAXINSERTERRORS: Maximum global number of insert errors. Use -1 for no maximum. Default: -1
- MAXPARSEERRORS: Maximum global number of parsing errors. Use -1 for no maximum. Default: -1
- MAXROWS: Maximum number of rows. Use -1 for no maximum. Default: -1
- MINBATCHSIZE: Minimum size of an import batch. Default: 2
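The field-interpretation options above (NULL, BOOLSTYLE, DELIMITER, DATETIMEFORMAT) can be illustrated with a short Python sketch. This is not the cqlsh implementation, only a simulation of how a field would be interpreted under the default option values listed above:

```python
import csv
import io
from datetime import datetime

# Defaults taken from the option list above.
DELIMITER = ","
NULLVAL = ""                          # NULL default: <empty>
BOOLSTYLE = ("True", "False")         # case-insensitive per the docs
DATETIMEFORMAT = "%Y-%m-%d %H:%M:%S%z"

def parse_field(text):
    """Interpret one CSV field the way the COPY options describe."""
    if text == NULLVAL:
        return None                   # empty field becomes null
    if text.lower() == BOOLSTYLE[0].lower():
        return True
    if text.lower() == BOOLSTYLE[1].lower():
        return False
    try:                              # try the timestamp format, else keep text
        return datetime.strptime(text, DATETIMEFORMAT)
    except ValueError:
        return text

data = "a,TRUE,\nb,false,2021-03-01 09:00:00+0000\n"
rows = list(csv.reader(io.StringIO(data), delimiter=DELIMITER))
parsed = [[parse_field(f) for f in row] for row in rows]
```

Here the empty third field of the first row becomes null, TRUE/false match BOOLSTYLE regardless of case, and the final field parses with the default DATETIMEFORMAT.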
Examples
Create the sample dataset
Set up the environment used for the COPY command examples:
- Using CQL, create a cycling keyspace:
  CREATE KEYSPACE cycling WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 1 };
- Create the cycling.cyclist_name table:
  CREATE TABLE cycling.cyclist_name ( id UUID PRIMARY KEY, lastname text, firstname text );
- Insert data into cycling.cyclist_name:
  INSERT INTO cycling.cyclist_name (id, lastname, firstname) VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 'VOS', 'Marianne');
  INSERT INTO cycling.cyclist_name (id, lastname, firstname) VALUES (e7cd5752-bc0d-4157-a80f-7523add8dbcd, 'VAN DER BREGGEN', 'Anna');
  INSERT INTO cycling.cyclist_name (id, lastname, firstname) VALUES (e7ae5cf3-d358-4d99-b900-85902fda9bb0, 'FRAME', 'Alex');
  INSERT INTO cycling.cyclist_name (id, lastname, firstname) VALUES (220844bf-4860-49d6-9a4b-6b5d3a79cbfb, 'TIRALONGO', 'Paolo');
  INSERT INTO cycling.cyclist_name (id, lastname, firstname) VALUES (6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47, 'KRUIKSWIJK', 'Steven');
  INSERT INTO cycling.cyclist_name (id, lastname, firstname) VALUES (fb372533-eb95-4bb4-8685-6ef61e994caa, 'MATTHEWS', 'Michael');
Export and import data from the cyclist_name table
- Export only the id and lastname columns from the cyclist_name table to a CSV file:
  COPY cycling.cyclist_name (id,lastname) TO '../cyclist_lastname.csv' WITH HEADER = TRUE;
  The cyclist_lastname.csv file is created in the directory above the current working directory (indicated by ../). If the CSV file already exists, it is overwritten. If you do not have permission to create the file in that directory, use a different directory; for example, to use the current working directory, omit the directory path before the file name.
  Using 7 child processes
  Starting copy of cycling.cyclist_name with columns [id, lastname].
  Processed: 6 rows; Rate: 29 rows/s; Avg. rate: 29 rows/s
  6 rows exported to 1 files in 0.223 seconds.
- Copy the id and firstname columns to a different CSV file named cyclist_firstname.csv:
  COPY cycling.cyclist_name (id,firstname) TO '../cyclist_firstname.csv' WITH HEADER = TRUE;
  The CSV file is created:
  Using 7 child processes
  Starting copy of cycling.cyclist_name with columns [id, firstname].
  Processed: 6 rows; Rate: 30 rows/s; Avg. rate: 30 rows/s
  6 rows exported to 1 files in 0.213 seconds.
- Remove all records from the cyclist_name table:
  TRUNCATE cycling.cyclist_name;
- Verify that there are no rows:
  SELECT * FROM cycling.cyclist_name;
  Query results are empty:
   id | firstname | lastname
  ----+-----------+----------
  (0 rows)
- Import the cyclist first names:
  COPY cycling.cyclist_name (id,firstname) FROM '../cyclist_firstname.csv' WITH HEADER = TRUE;
  The rows are imported:
  Using 7 child processes
  Starting copy of cycling.cyclist_name with columns [id, firstname].
  Processed: 6 rows; Rate: 10 rows/s; Avg. rate: 14 rows/s
  6 rows imported from 1 files in 0.423 seconds (0 skipped).
- Verify the new rows:
  SELECT * FROM cycling.cyclist_name;
  The rows were created with null last names because the lastname field was not in the imported data set:
   id                                   | firstname | lastname
  --------------------------------------+-----------+----------
   e7ae5cf3-d358-4d99-b900-85902fda9bb0 | Alex      | null
   fb372533-eb95-4bb4-8685-6ef61e994caa | Michael   | null
   5b6962dd-3f90-4c93-8f61-eabfa4a803e2 | Marianne  | null
   220844bf-4860-49d6-9a4b-6b5d3a79cbfb | Paolo     | null
   6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 | Steven    | null
   e7cd5752-bc0d-4157-a80f-7523add8dbcd | Anna      | null
  (6 rows)
- Import the last names:
  COPY cycling.cyclist_name (id,lastname) FROM '../cyclist_lastname.csv' WITH HEADER = TRUE;
  The records are imported, but no new records are created:
  Using 7 child processes
  Starting copy of cycling.cyclist_name with columns [id, lastname].
  Processed: 6 rows; Rate: 10 rows/s; Avg. rate: 14 rows/s
  6 rows imported from 1 files in 0.422 seconds (0 skipped).
- Verify that the records were updated:
  SELECT * FROM cycling.cyclist_name;
  The PRIMARY KEY id matched for all records, and the lastname column is populated:
   id                                   | firstname | lastname
  --------------------------------------+-----------+-----------------
   e7ae5cf3-d358-4d99-b900-85902fda9bb0 | Alex      | FRAME
   fb372533-eb95-4bb4-8685-6ef61e994caa | Michael   | MATTHEWS
   5b6962dd-3f90-4c93-8f61-eabfa4a803e2 | Marianne  | VOS
   220844bf-4860-49d6-9a4b-6b5d3a79cbfb | Paolo     | TIRALONGO
   6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 | Steven    | KRUIKSWIJK
   e7cd5752-bc0d-4157-a80f-7523add8dbcd | Anna      | VAN DER BREGGEN
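The merge behavior in the steps above (rows matched on the PRIMARY KEY are updated rather than duplicated) can be sketched as a keyed merge in Python. This models the table as a plain dict keyed by id, with shortened placeholder ids; it is an illustration of the upsert semantics, not how Cassandra stores data:

```python
# Model the table as {id: {column: value}}; COPY FROM upserts by PRIMARY KEY.
def copy_from(table, rows, columns):
    """Merge imported rows into the table, matching on the first column (the key)."""
    for row in rows:
        key = row[0]
        record = table.setdefault(key, {})   # existing key -> update in place
        for col, val in zip(columns[1:], row[1:]):
            record[col] = val

table = {}
# First import: only (id, firstname); lastname stays unset.
copy_from(table, [("5b69", "Marianne"), ("e7cd", "Anna")], ("id", "firstname"))
# Second import: (id, lastname) matches the existing keys and fills lastname
# without creating new rows.
copy_from(table, [("5b69", "VOS"), ("e7cd", "VAN DER BREGGEN")], ("id", "lastname"))
```

After both imports the table still holds two rows, each with both columns populated, mirroring the SELECT result shown above.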
Copy data from standard input to a table
- Clear the data from the cyclist_name table:
  TRUNCATE cycling.cyclist_name;
- Start the copy input operation using the FROM STDIN option:
  COPY cycling.cyclist_name FROM STDIN;
  The line prompt changes to [copy]:
  Using 7 child processes
  Starting copy of cycling.cyclist_name with columns [id, firstname, lastname].
  [Use . on a line by itself to end input]
  [copy]
- At the [copy] prompt, enter the field values in a comma-separated list; on the last line of data, enter a period:
  [copy] e7cd5752-bc0d-4157-a80f-7523add8dbcd,Anna,VAN DER BREGGEN
  [copy] .
- Press Enter after the period:
  Processed: 1 rows; Rate: 0 rows/s; Avg. rate: 0 rows/s
  1 rows imported from 1 files in 36.991 seconds (0 skipped).
- Run this query to view the contents of the cyclist_name table:
  SELECT * FROM cycling.cyclist_name;
   id                                   | firstname | lastname
  --------------------------------------+-----------+-----------------
   e7cd5752-bc0d-4157-a80f-7523add8dbcd | Anna      | VAN DER BREGGEN
  (1 rows)
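The STDIN protocol above (read comma-separated lines until a line containing only a period) can be sketched in Python. `read_copy_input` is a hypothetical helper, shown here with io.StringIO standing in for real standard input:

```python
import csv
import io

def read_copy_input(stream):
    """Collect CSV rows until a line holding only '.' ends the input."""
    rows = []
    for line in stream:
        if line.strip() == ".":      # a lone period terminates input
            break
        rows.extend(csv.reader([line]))
    return rows

stdin = io.StringIO(
    "e7cd5752-bc0d-4157-a80f-7523add8dbcd,Anna,VAN DER BREGGEN\n"
    ".\n"
)
rows = read_copy_input(stdin)
```

The single data line becomes one three-field row, matching the one-row import shown in the example.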