Data Import

Delia is useful for importing data into a database. It has ETL features such as data cleaning, type conversion, validation, and data mapping.

Let's consider a CSV file with customer data that we wish to import into our database.

CID,NAME,EMAILADDRESS,BIRTHDATE,POINTS
100,Bill Jones,bjones@comp.com,31/4/1993,150
101,Wanda Wilson,ww35@drcc.com,1/1/1972,100

We want to import this data into the Customer type. Note that we have some validation on the loyaltyPoints field.

type Customer struct {
 id int primaryKey,
 name string,
 email string unique,
 birthDate date,
 loyaltyPoints int
}
 loyaltyPoints >= 0
 loyaltyPoints <= 1000
end

First, create an input function in our Delia source code. It defines the mapping between the CSV columns and the Customer fields. For the date field we specify the date format using asDate

input function import1(Customer c) {
  CID -> c.id,
  NAME -> c.name,
  EMAILADDRESS -> c.email
  BIRTHDATE -> c.birthDate using { asDate('dd/MM/yyyy')}
  POINTS -> c.loyaltyPoints
}

Now we can load the data into the database with this Java code:

DataImportService importSvc = delia.createImportService();
CSVFileLoader fileLoader = new FileLoader("/tmp/customers.csv");
InputFunctionResult result = importSvc.importIntoDatabase("import1", fileLoader);

As each line of CSV is read, it is converting using the import1 mapping we defined.

Dry Runs

We may wish to do a dry run first, to see if there are any errors, before we actually insert the data into the database. The importDryRun method imports the data into an in-memory database (DBType.MEM) and returns a list of any conversion or validation errors.

DataImportService importSvc = delia.createImportService();
CSVFileLoader fileLoader = new FileLoader("/tmp/customers.csv");
fileLoader.useDelimiter(',');
fileLoader.beginsWithHeaderRow(true)
InputFunctionResult result = importSvc.importDryRun("import1", fileLoader);
if (!result.ok) {
	for(DeliaError err: result.errors) {
	  log.log("error: %s", err);
  }
}

Note that the email field has unique. Dry runs will validate uniqueness among the imported data.

The unique field modifier is enforced locally within a dry runs. This does not enforce uniqueness with respect to existing records in the real databse.