Importing data
The CiviCRM import system can be used to import contacts, activities, contributions, event participants, and memberships. You can extend the import system to allow it to parse different data sources, such as XML, JSON, Excel, or OpenDocument. The import system is especially useful when you have legacy data in another system (such as a spreadsheet), or have data you need to migrate from one system to another.
The import system has a few important limitations. It is currently not very good at importing:
- Large data sets
- Data sets which contain information for more than component (such as contact data that also contains contributions)
- Data sets which should be imported into more than one group or tag (currently, you can choose only one group or tag for the entire import rather than specifying the value in a field of the imported data set).
These limitation apply to any extensions you build on the import. To overcome these limitations, one strategy would be to write a custom script using the CiviCRM API that parses the incoming data and makes the appropriate API calls to import it.
Data Sources
The import system supports two data sources out of the box. CSV and SQL. CSV is exported by most spreadsheets, many web sites, and a lot of legacy systems. SQL is the language of relational databases.
Here's your first insider tip on the import system: Every import is an SQL import. The CSV import data source, for example, starts out by dumping the CSV into a temporary database table and then running an SQL import on those data. If you write a custom data source, it's important to remember this because it will save you a lot of mental energy. You don't need to worry about formatting or validating the import data at all. Just get it into a database table and let CiviCRM take it from there.
When would you want to write an import data source? It might be useful if, say, a client were migrating from another CRM to CiviCRM and wanted to import their data themselves. Another example would be to support a different import file format.
To make things easy on yourself, check for an existing PHP library that can read the format of the data you want to import. If you fine one, use that library to read the data into a database table.
Data Source API
The data source API is essentially a set of functions you should implement in a new PHP class. Many software developers refer to this as an "interface" or "protocol." The abstract class that defines this interface is located in CRM/Import/DataSource.php. You should also look at the CRM/Import/DataSource/CSV.php file, because this implements the interface for the CSV import feature.
Here are the functions you will implement and what they should do:
getInfo(); # should return an array with the title of your data source plugin preProcess( &$form ); # if you need to set anything up before the form is displayed, this iss the place to do it buildQuickForm( &$form ); # here you should build a form snippet that asks the user for the appropriate information to access the incoming data postProcess( &$params, &$db ); # this is where you dump the incoming data into the database table
After defining your import data source, you need a new Smarty template to create your form snippet. This is pretty specific to the type of import data source you're defining. Let's look at an example of a real custom data source.
Example Data Source
This example of a custom import data source reads a JSON file and imports its contents as new contacts. This file should be named JSON.php and be added to the CRM/Import/DataSource/ directory.
<?php /* My awesome JSON importer of dubious utility and efficiency (but still awesome!) * * JSON should look like this: * { * "contacts" : [ * { * "first_name" : "Foo", * "last_name" : "Bar", * "other_field" : "baz" * }, * { * "first_name" : "Other", * "last_name" : "Contact", * "something_else" : "yep" * } * ] * } */ require_once 'CRM/Import/DataSource.php'; class CRM_Import_DataSource_JSON extends CRM_Import_DataSource { function getInfo() { return array('title' => ts('JavaScript Object Notation (JSON)')); } function preProcess(&$form) { # nothing to do here, but we still define the function } function buildQuickForm(&$form) { ### In this function we're calling a lot of QuickForm functions. ### If you're unfamiliar with that library, it might be good ### to look up the documentation for it. # define a hidden field that tells the system which # data source class we're using $form->add( 'hidden', 'hidden_dataSource', 'CRM_Import_DataSource_JSON' ); # grab the config object so we respect some system settings $config = CRM_Core_Config::singleton(); # get the max upload file size from the config $uploadFileSize = $config->maxImportFileSize; $uploadSize = round( ( $uploadFileSize / (1024*1024) ), 2 ); # assign the max upload file size to a template variable # see the template documentation for more info on this $form->assign( 'uploadSize', $uploadSize ); # add the file selection field to the form $form->add( 'file', 'uploadFile', ts('Import Data File'), 'size=30 maxlength=60', true ); # now set the max file size so the form enforces it $form->setMaxFileSize($uploadFileSize); $form->addRule( 'uploadFile', ts('File size should be less than %1 MBytes (%2 bytes)', array(1 => $uploadSize, 2 => $uploadFileSize)), 'maxfilesize', $uploadFileSize ); # not a very smart rule, but it'll do for now $form->addRule( 'uploadFile', ts('Input file must be in JSON format'), 'utf8File' ); # make sure we end up with a file after the form posts $form->addRule( 'uploadFile', ts('A valid file must be uploaded.'), 'uploadedfile' ); } function postProcess(&$params, &$db) { # grab the name of the file that was uploaded $file = $params['uploadFile']['name']; # call a helper function we'll define below to parse the JSON $result = self::_JsonToTable( $db, $file, CRM_Utils_Array::value( 'import_table_name', $params ) ); # grab the import table name (CiviCRM determines this for us) $table = $result['import_table_name']; # create a new ImportJob object for our table require_once 'CRM/Import/ImportJob.php'; $importJob = new CRM_Import_ImportJob( $table ); # the ImportJob modifies the table name a bit, so let's update it $this->set( 'importTableName', $importJob->getTableName() ); } /** * We define this function just to keep things cleaner, the import data source * interface doesn't look for it; it's a private function. We use an * underscore at the beginning of the name to indicate this. */ private static function _JsonToTable(&$db, $file, $table ) { # read the JSON into a string variable $jsonString = file_get_contents($file); if (!$jsonString) { # oops, reading the file didn't work, generate an error CRM_Core_Error::fatal("Could not read $file"); } # grab the config object again $config = CRM_Core_Config::singleton(); # this is a bit presumptuous of us, but oh well $db->query("DROP TABLE IF EXISTS $table"); # create the table where we'll store the incoming data $create = "CREATE TABLE $table LIKE civicrm_contact"; $db->query($create); # drop the id column because the import system will add one $dropId = "ALTER TABLE $table DROP COLUMN id"; $db->query($dropId); # decode the JSON and INSERT the records one by one; # it might be more efficient to build one big multi-insert, # but we'll leave that as an exercise for the reader ### BE CAREFUL THAT YOU DON'T RUN THIS ON A MASSIVE JSON FILE!! ### It creates one big object from the entire JSON file, so you'll quickly ### eat up every bit of memory PHP can use if you try to import a large ### file. # this requires that the JSON PECL extension is installed and # enabled in PHP $importObj = json_decode( $jsonString, true ); # loop through each record in the JSON object, put it into an SQL query, # and insert it into the database foreach ($importObj['contacts'] as $newContact) { $fields = array_map( '_civicrm_mysql_real_escape_string', array_keys( $newContact ) ); $sqlFields = "(" . implode( ',', $fields ) . ")"; $values = array_map( '_civicrm_mysql_real_escape_and_quote_string', $newContact ); $sqlValues = "VALUES (" . implode( ',', $values ) . ")"; # construct the query and run it $sql = "INSERT IGNORE INTO $table $sqlFields $sqlValues"; $db->query($sql); } # get the import tmp table name and return it $result = array( ); $result['import_table_name'] = $table; return $result; } } # Another couple private helper functions we define function _civicrm_mysql_real_escape_and_quote_string( $string ) { return _civicrm_mysql_real_escape_string( $string, true ); } function _civicrm_mysql_real_escape_string( $string, $quote = false ) { static $dao = null; if ( ! $dao ) { $dao = new CRM_Core_DAO( ); } $returnString = $quote ? "\"{$dao->escape( $string )}\"" : $dao->escape( $string ); return $returnString; }
And here's the Smarty template. It should be saved in templates/CRM/Import/Form/JSON.tpl.
<fieldset><legend>{ts}Upload JSON File{/ts}</legend> <table class="form-layout"> <tr> <td class="label">{$form.uploadFile.label}</td> <td>{$form.uploadFile.html}<br /> <div class="description">{ts}File format must be JavaScript Object Notation (JSON). File must be UTF8 encoded if it contains special characters (e.g. accented letters, etc.).{/ts}</div> {ts 1=$uploadSize}Maximum Upload File Size: %1 MB{/ts} </td> </tr> </table> </fieldset>
Summary
The CiviCRM Import system has a few limitations, but it is by far the easiest way for end users to get data from other systems into CiviCRM. When a client has ongoing data import needs and wants non-technical users to initiate and manage the imports, writing a custom data source may be a good solution.
If the import system cannot handle the type or amount of data you need to import, your options are to write a separate import program that calls the CiviCRM API, write your own improvements to the import system, or sponsor other developers to write improvements. If you write or sponsor improvements to the import system, make sure you contribute them back to the core system! The CiviCRM development team would be happy to discuss potential improvements with you. You can find them in the #civicrm IRC channel on irc.freenode.net.