MySQL – LOAD DATA INFILE variable data

[adsense id=”0514458240″ width=”468″ height=”60″]

This blog post is to look at MySQL’s capabilities at managing data loads to the database. In particular with a view to performing some processing on the data prior to this being uploaded into the database. This is particularly handy when the format of the data is not consistent, for example different date formats provided.

In this paper we will look at two things:

  • Processing different date formats
  • Merging two fields into a single fields stored in the database

We have a table “upload” which contains an id, a date field and a varchar field.

CREATE TABLE upload (
  id INT auto_increment primary key,
  sdate date,
  name varchar(100)
);

If we had a file containing an id value, an sdate value and a name value this would be very simple:

"id","start date","name"
1,"2011-08-21","Tom Thumb"

With everything consistent our load data command would take on the following form:

LOAD DATA INFILE 'upload.csv'
INTO TABLE upload
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(id, sdate, name);

So far so good. If we load we get the following:

mysql> LOAD DATA INFILE 'upload.csv' 
    -> INTO TABLE upload 
    -> FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' 
    -> IGNORE 1 LINES (id, sdate, name);
Query OK, 1 row affected (0.00 sec)
Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

mysql> select * from upload;
+----+------------+-----------+
| id | sdate      | name      |
+----+------------+-----------+
|  1 | 2011-08-21 | Tom Thumb | 
+----+------------+-----------+
1 row in set (0.00 sec)

Now what happens if we actually have a file that contains the dates in two different formats. How do we handle the different formats? Well MySQL LOAD DATA INFILE contains SET which enables data transformations before they are assigned to the fields. In the next example, imagine we have for some odd reason a file that contains both the short date and the long date format MM/DD/YY and MM/DD/YYYY. How do we process this?

If we look at our file to be loaded we have:

"id","start date","name"
2,"01/23/08","Tiny Tim"
3,"05/28/2011","Donald Duck"

What we need to do is to determine the pattern. For instance, the length of the start date string will determine the format: 01/23/08 is 8 characters long, we will assume all else will be processed using the longer date format. How do we handle this? Lets look at the code.

LOAD DATA INFILE 'upload.csv'
INTO TABLE upload
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(id, @somedate, name)
SET sdate = IF(LENGTH(@somedate)=8,STR_TO_DATE(@somedate,'%m/%d/%y'),STR_TO_DATE(@somedate,'%m/%d/%Y'));

What we are doing here is loading the second field from the file into a session variable called @somedate. The SET command allows us to define how the value is to be assigned to that particular field.

mysql> LOAD DATA INFILE 'upload.csv'
    -> INTO TABLE upload
    -> FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    -> LINES TERMINATED BY '\n'
    -> IGNORE 1 LINES
    -> (id, @somedate, name)
    -> SET sdate = IF(LENGTH(@somedate)=8,STR_TO_DATE(@somedate,'%m/%d/%y'),STR_TO_DATE(@somedate,'%m/%d/%Y'));
Query OK, 2 rows affected (0.00 sec)
Records: 2  Deleted: 0  Skipped: 0  Warnings: 0

mysql> select * from upload;
+----+------------+-------------+
| id | sdate      | name        |
+----+------------+-------------+
|  1 | 2011-08-21 | Tom Thumb   | 
|  2 | 2008-01-23 | Tiny Tim    | 
|  3 | 2011-05-28 | Donald Duck | 
+----+------------+-------------+
3 rows in set (0.00 sec)

Nice!! But what happens if the file contains the name as two separate fields i.e. first name and last name but we need to merge that into a single field called name and also we do not have an identifier value for each column but will allow one to be automatically generated for us. The file to upload is now:

"start date","first name","last name"
"11/23/2015","Scooby","Doo"

Again using the same technique as before we will store the information into session variables and then use the transformation of the SET to CONCAT the session variables into the name field.

LOAD DATA INFILE 'upload.csv'
INTO TABLE upload
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(@somedate, @firstname,@lastname)
SET sdate = IF(LENGTH(@somedate)=8,STR_TO_DATE(@somedate,'%m/%d/%y'),STR_TO_DATE(@somedate,'%m/%d/%Y')),
name = CONCAT(@firstname,' ', @lastname);

Lets have a look at the results from this script:

mysql> LOAD DATA INFILE 'upload.csv'
    -> INTO TABLE upload
    -> FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    -> LINES TERMINATED BY '\n'
    -> IGNORE 1 LINES
    -> (@somedate, @firstname,@lastname)
    -> SET sdate = IF(LENGTH(@somedate)=8,STR_TO_DATE(@somedate,'%m/%d/%y'),STR_TO_DATE(@somedate,'%m/%d/%Y')),
    -> name = CONCAT(@firstname,' ', @lastname);
Query OK, 1 row affected (0.00 sec)
Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

mysql> select * from upload;
+----+------------+-------------+
| id | sdate      | name        |
+----+------------+-------------+
|  1 | 2011-08-21 | Tom Thumb   | 
|  2 | 2008-01-23 | Tiny Tim    | 
|  3 | 2011-05-28 | Donald Duck | 
|  4 | 2015-11-23 | Scooby Doo  | 
+----+------------+-------------+
4 rows in set (0.00 sec)

2 responses to “MySQL – LOAD DATA INFILE variable data”