Call us Toll-Free:
1-800-218-1525
Live ChatEmail us

 Sponsors

How to: Use MySQL fast LOAD DATA for UPDATEs

Dawn Rossi, 04-07-2009
Dealing with massive amounts of data in MySQL, one of the first lessons you are taught is -

-- Use LOAD DATA for quickly inserting lots of records into a table
-- Avoid multiple Inserts.

From MySQL documentation "The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed".

How fast?

In a simple test I conducted, inserting 1 million records into a table took 2.6 seconds with LOAD DATA and a little over 10 minutes (!) when using 1 million INSERT queries.

Yes, using a persistent MySQL connection does improve performance of multiple INSERTs a bit but it's still a far cry from the performance of LOAD DATA and persistent connections are a bad idea.

LOAD DATA Syntax:


LOAD DATA INFILE
'data.txt' INTO TABLE tbl_name (column1,column2)
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';

LOAD DATA performs a blind insert of all rows from data.txt into the target table tbl_name.

Rows that include a duplicate key are automatically ignored, but the MySQL result string doesn't offer any information about which rows were ignored.

Why use LOAD DATA for UPDATEs?

LOAD DATA was originally designed to handle rapid INSERTs of new data into a table. It does a very good job doing just that. But in the real world, after inserting 1 million records into a table, you often need to efficiently and quickly update portions of that table.

Even if you only need to update 10% of the target table, running 100,000 UPDATE queries on a table with 1 million records, is going to take a while (minutes)

As part of this post I'll describe a method we developed that allows us to reap the performance benefits of LOAD DATA for updating records in a table.

You can use this function for both UPDATEs and INSERTs.

The Approach

Step 1: Use LOAD DATA to import the entire input file into a temporary table and update primary key id's for any existing records we already have

Step 2: Use SELECT INTO OUTFILE to write all records we couldn't find an id for (new records) into a temporary text file

Step 3: Use LOAD DATA to add all new records (from step 2) into the target table

Step 4: Use a single UPDATE query, to update all records that match an existing primary key

PHP Code

Below is a production-ready function that handles the entire process for you, start to finish. It will INSERT or UPDATE a table based on an input csv file, using a combination of two LOAD DATA calls, one SELECT INTO OUTFILE and one efficient UPDATE call.

Using this function allows us to shave long minutes of processing time when dealing with huge files.

As always, your comments are most welcome


function DBLoadDataUpdate($temp_dir, $arr_inputfields, $arr_existingfields, $tablename, $tablename_recordid_keyname,       
$tablename_externalid_keyname, $inputfile, $sql_synthasize, $flag_debug, $arr_sql_secondary="")
{
 
$cnt_records = 0;   
 
 
// Create temporary MYSQL table to hold all input data 
 
mysql_query("DROP TABLE loaddata_temp");
 
$sql_createtable = "CREATE TEMPORARY TABLE loaddata_temp (id_external char(50) default '' not null,";         
  foreach (
$arr_inputfields as $field)
  {
   
$sql_createtable .= $field['name'];
    if (empty(
$field['type']))
    {
     
$sql_createtable .= " char(";
      if (!empty(
$field['length']))
      {
       
$sql_createtable .= $field['length'];
      }
      else
      {
       
$sql_createtable .= "50";
      }
     
$sql_createtable .= ") default '' not null,";
    }
    else       
    {
     
$sql_createtable .= " ".$field['type'].",";
    } 
  }
 
$sql_createtable .= "id_existing int default 0 not null,"
 
$sql_createtable .= "index (id_existing), primary key (id_external))";
 
mysql_query($sql_createtable);
   
// == Step 1:
//
// Import all new records into a temporary table and find all IDs in one query
//

  // Load all records into temporary table
 
mysql_query("LOAD DATA LOCAL INFILE '$inputfile' ignore INTO TABLE loaddata_temp FIELDS TERMINATED BY ',' ".
 
" OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\\r\\n'");

 
// Count how many records
 
$Result = @mysql_query("SELECT count(*) FROM loaddata_temp");
 
$Row = @mysql_fetch_row($Result);
 
$cnt_records = $Row[0];

 
// (Optional) Synthasize data
 
if (!empty($sql_synthasize))
  {
   
$sql = strtok($sql_synthasize, ";");
    while (!empty(
$sql))
    {
      @
mysql_query("UPDATE loaddata_temp $sql");

     
$sql = strtok(";");
    }
  }

 
// Update all ids
  // (So that we can link every record in the temporary campaigns table to a record in the
  // live ppc_campaigns table, if such a record exists)
 
mysql_query("UPDATE loaddata_temp,$tablename SET loaddata_temp.id_existing=".
 
"$tablename.$tablename_recordid_keyname where $tablename.$tablename_externalid_keyname=loaddata_temp.id_external");


 
// If we need to update a secondary id, do so now
 
if (!empty($arr_sql_secondary))
  {
    foreach (
$arr_sql_secondary as $sql_secondary)
   
mysql_query("UPDATE loaddata_temp,".$sql_secondary['tablename']." SET loaddata_temp.".
   
$sql_secondary['id_secondary_to']."=".$sql_secondary['tablename'].".".
$sql_secondary['id_secondary_from'].
   
" where ".$sql_secondary['tablename'].".".$sql_secondary['id_secondary_matchto'].
"=loaddata_temp.".
   
$sql_secondary['id_secondary_to']);
  }

 
// == Step 2:
//
// Use SELECT INTO OUTFILE to write all records we couldn't find
// an id for (new records) into a temporary text file
//

  // Create temporary file
 
$filename = tempnam($temp_dir, "loaddata");
 
unlink($filename);

 
// Write all new records (the ones for which we couldn't find an id) into temporary file
 
$sql_fields_values = "loaddata_temp.id_external,";
 
$sql_fields_names = "$tablename_externalid_keyname,";
 
$sql_fields_both = "$tablename_externalid_keyname=loaddata_temp.id_external,";
  foreach (
$arr_existingfields as $field)
  {
   
$sql_fields_names .= $field['name'].",";
   
$sql_fields_both .= "$tablename.".$field['name']."=";
    if (isset(
$field['value']))
    {
     
$sql_fields_values .= "'".$field['value']."',";
     
$sql_fields_both .= "'".$field['value']."',";
    }
    else
    {
     
$sql_fields_values .= $field['matchto'].",";
     
$sql_fields_both .= "loaddata_temp.".$field['matchto'].",";
    }
  }

 
$sql_fields_both = substr($sql_fields_both, 0, strlen($sql_fields_both)-1);
 
$sql_fields_names = substr($sql_fields_names, 0, strlen($sql_fields_names)-1);
 
$sql_fields_values = substr($sql_fields_values, 0, strlen($sql_fields_values)-1);
 
mysql_query("select $sql_fields_values INTO OUTFILE \"$filename\" ".
 
" FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\\r\\n' ".
 
" FROM loaddata_temp where id_existing=0");
 
$error = mysql_error();
  if (!empty(
$error) && $flag_debug)
  {
    echo
"Error: $error\r\n";
  }

// == Step 3:
//
// Use LOAD DATA to add all new records (from step 2) into the target table
//

  // Add all new records
 
mysql_query("LOAD DATA LOCAL INFILE '$filename' ignore INTO TABLE $tablename FIELDS TERMINATED BY ',' ".
 
" OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\\r\\n' ($sql_fields_names)");
 
$error = mysql_error();
  if (!empty(
$error) && $flag_debug) echo "Error: $error\r\n";
 
$cnt_new = mysql_affected_rows();
  if (
$cnt_new<0) $cnt_new=0;

 
// Delete temporary file
 
unlink($filename);

// == Step 4:
//
// Use a single UPDATE query, to update all records that match an existing primary key
//

  // Update existing records
 
mysql_query("UPDATE $tablename,loaddata_temp SET $sql_fields_both where $tablename.$tablename_externalid_keyname".
 
"=loaddata_temp.id_external and id_existing!=0");
 
$cnt_updated = mysql_affected_rows();

  if (
$flag_debug)
  {
    echo
" + Inserted $cnt_new new records and updated $cnt_updated records.\r\n";
  }

 
// Drop temporary table
 
mysql_query("DROP TABLE loaddata_temp");

 
// Return total number of records added & updated
 
return ($cnt_records);
}


Example

The example below demonstrates using the DBLoadDataUpdate function to quickly import & update a PPC Campaigns table.


 
// Update existing ppc_campaigns table
 
$arr_inputfields = array(
            array(
"name" => "name", "length" => "100"),
            array(
"name" => "status", "length" => "20"),
            array(
"name" => "startDate", "length" => "30"),
            array(
"name" => "endDate", "length" => "30"),
            array(
"name" => "dailyBudget", "length" => "", "type" => "decimal(8,2) default 0 not null"),
            array(
"name" => "languages", "length" => "50"),
            array(
"name" => "geoTargets", "length" => "100"),
            array(
"name" => "networkTargeting", "length" => "100"),
            array(
"name" => "adScheduling", "length" => "100"));
 
$arr_existingfields = array(
            array(
"name" => "name", "matchto" => "name"),
            array(
"name" => "storecheckedoutcarts_id", "value" => "$item_id"),
            array(
"name" => "is_active", "matchto" => "status"),
            array(
"name" => "budget", "matchto" => "dailyBudget"));
 
$cnt_campaigns = DBLoadDataUpdate ("/tmp/", $arr_inputfields, $arr_existingfields, "ppc_campaigns",
   
"rr_record_id", "external_id", $filename,
   
"", 1);

Note: First field of inputfile should be the "external_id" primary key used to match records

azdog1, 04-20-2009
Thanks Dawn,

I wasn't familiar with LOAD DATA INFILE method and was having to use INSERT IGNORE INTO on a table with over a million records and growing with over a million records that are pushed to my server each night in a flat file. INSERT IGNORE INTO method was taking about 8 minutes and the LOAD DATA INFILE takes less than 1 minute. I read the flat file in with a C routine and add additional field data separated with commas and then write it back out to another flat file that I suck in with LOAD DATA INFILE. All in less than a minute, so thanks again.

Murrah Boswell

andreas, 04-22-2009
thanks
After Ihave gone through this artitle I had the way froword of how to use mySQL. And that was a that is a good reflect to all pleople who had no flincker of how to aply mySQL into their research or their studies, believe me, I had no clue of how to about things before I visit this artitle but now I can reeally aply it.

many thank indeed.

Matt Goon, 04-23-2009
Dawn,
thank you so much.
I have another question. What if I want to update only some columns on the target table for matched records in step4? I want to ignore some columns from the input file on the record that match and preserve whatever info exists in these columns on the target, while updating the rest of the columns.

is is possible? Thanks

Sumedh, 05-04-2009
Hi Dawn...

This looks great!!! :)

LOAD DATA can't use a condition anywhere right? for example, compare the timestamp column of 2 tables, and update only if the source is later than target...such a facility will be great for syncing DBs...it should be a fairly commonly needed requirement...

Sumedh, 05-06-2009
BTW, i think you can avoid hastle of creating exact table with details of all the fields with this command -

CREATE TABLE newtable LIKE oldtable;

pippo, 05-19-2009
HI Dawn,

Great code !!

but i have a problem: the code doesn't work because after created the tempfile, Mysql show me this error:

Error: Access denied for user 'pippo'@'localhost' (using password: YES)

I looked around with google and im sure to give all privileges to 'pippo@localhost' whit this syntax:

GRANT ALL PRIVILEGES ON *.* TO 'pippo'@'localhost'

someone can help me???
How can resolve this error??

pippo

Dhaval Sheth, 05-22-2009
Hi Pippo,

try this..

On your MySQL Command Prompt, type following:

mysql> connect schema_name
mysql> grant all privileges on *.* to `pippo`@`localhost` identified by `password` with grant option;

wherein
connect command will return you with command id, and
grant command will grant all privileges to the user,

either ways you can specify ur pc id instead of local host, this becomes handy when you want other users to access your database server.

pippo, 05-26-2009
Hi sheth,

i done what u wrote but not working.. :(
i tryed but now i have another problem : the file, created by tempnam, is empty when will be full of my data !!
i try to modify the code to know what is the problem:

//----------
//after #118 rows
//----------

$descriptorspec = array(
0 => array("pipe", "r"), // stdin is a pipe that the child will read from
1 => array("pipe", "w"), // stdout is a pipe that the child will write to
2 => array("file", getcwd() .$filename, "a") // stderr is a file to write to
);

$cwd = '/tmp';
$env = array('some_option' => '');

$handle = fopen("sqlfile.sql", "w+");
fwrite($handle, $query_outfile);
$success = fclose($handle);

$process = proc_open('mysql -h localhost --user=pippo --password=123456 pippo < sqlfile.sql > '.$filename.'', $descriptorspec, $pipes, null, null);

if (is_resource($process)) {
// $pipes now looks like this:
// 0 => writeable handle connected to child stdin
// 1 => readable handle connected to child stdout
// Any error output will be appended to /tmp/error-output.txt

fwrite($pipes[0], '<?php print_r($_ENV); ?>');
fclose($pipes[0]);

echo stream_get_contents($pipes[1]);
fclose($pipes[1]);

// It is important that you close any pipes before calling
// proc_close in order to avoid a deadlock
$return_value = proc_close($process);

echo "command returned $return_value\n";
}

//----------
// end
//----------

In this case, i created 3 files :
a.the file with the data (but is empty);
b.the file "sqlfile.sql" with the query;
c.another tmp file, but in a different position, that write some problems about the query that i've done.

in that tmp file, mysql wrote this sentence:

ERROR 1086 (HY000) at line 1: File '/tmp/loaddataecvXfC' already exists

what i wrong ???
thank u very much who help me..

pippo

BlaST, 06-02-2009
Can't make it work :/ what the syntax of the dump file should be? Somebody, give me please the example. I've the table with some columns and I need to update 2 cols depending on 2 others. If I use many queries, they're like:
UPDATE table SET `col1` = 1, `time` = "2009-05-02 21:23" WHERE `col2` = a AND `col3` = b;
UPDATE table SET `col1` = 1, `time` = "2009-05-02 21:25" WHERE `col2` = b AND `col3` = c;

Thank you very much for the help :)

Dhaval, 07-02-2009
Hi Blast,

LOAD DATA INFILE syntax is required to dump data from text/csv file to DataBase.

As far as I know, you can use Update statement in LOAD DATA INFILE statement.

Lazzo, 10-08-2009
Hi!

Trying to use the following LOAD DATA INFILE syntax using both a php-script and PHPAdmin:

LOAD DATA INFILE '/home/magdemo/domains/magentodemo.xxx.xx/import/file.csv' INTO TABLE temp_table FIELDS TERMINATED BY ';' LINES TERMINATED BY 'n' IGNORE 1 LINES

The result are always
#1045 - Access denied for user 'username'@'localhost' (using password: YES)

This is a hosted environment - with db and webserver on the same machine. I've asked the administrator to run

GRANT FILE ON *.* TO 'username'@'databasename'

Without any success (still gets Access denied). Seems like mySQL cant access my CVS-file - any advice how get the LOAD DATA INFILE command to work?

Rory P, 09-23-2010
How does this compare to a bulk insert command? What about on ON KEY UPDATE style queries?

Aytac GUL, 11-06-2010
I have a trouble this issue. Our csv file almost 10MB and total 130000 records. Your Codes very intricate :(( Sorry.

We have 15 columns and 130000 rows in our csv files.
Update query check 3 columns and;

if csv records match old records -> update,
if not match -> insert.

Our service provider restricts us. Because the process takes too much time.

Lee, 01-28-2011
Thanks for this, helped me solve a problem that had me stuck for days.

Chad, 08-09-2012
This looks promising but seriously needs more documentation and commenting... at the very least an example csv file. Burned the last 3 hours trying to break it apart to no avail.
Enjoyed this post?

Subscribe Now to receive new posts via Email as soon as they come out.

 Comments
Post your comments












Note: No link spamming! If your message contains link/s, it will NOT be published on the site before manually approved by one of our moderators.



About Us  |  Contact us  |  Privacy Policy  |  Terms & Conditions