SQLizer Logo

Easily convert files into SQL databases

Date handling

Dates can be represented in hundreds of different ways. And SQLizer does a pretty good job of detecting them, and handling them properly.

Here are a few examples. Consider the following CSV file:

ID, Date Column
1, 01/01/2017
2, 02/01/2017
3, 12/01/2017

It’s clear to any human reading this that the second column should be a date column. But CSV data files don’t specify the types of data being used in each column, whereas databases do. So SQLizer needs to detect the columns that should be dates and sets up the right column type, which it does:

CREATE TABLE example_dates (
    `ID` INT,
    `Date_Column` DATETIME
);
INSERT INTO example_dates VALUES (1,'2017-01-01 00:00:00');
INSERT INTO example_dates VALUES (2,'2017-01-02 00:00:00');
INSERT INTO example_dates VALUES (3,'2017-01-12 00:00:00');

Notice how the dates in the original file could have been either US dates (mm/dd/yyyy) or the more internationally used dd/mm/yyyy format. But SQLizer assumed it was dd/mm/yyyy.

Now consider this CSV file:

ID, Date Column
1, 01/01/2017
2, 02/01/2017
3, 12/01/2017
4, 12/13/2017

We’ve added a fourth row, and when you look at that row, it’s clear that the dates are in US format: mm/dd/yyyy. If we feed that through to SQLizer, it works out that all the dates in that column should be US dates.

CREATE TABLE example_dates (
    `ID` INT,
    `Date_Column` DATETIME
);
INSERT INTO example_dates VALUES (1,'2017-01-01 00:00:00');
INSERT INTO example_dates VALUES (2,'2017-02-01 00:00:00');
INSERT INTO example_dates VALUES (3,'2017-12-01 00:00:00');
INSERT INTO example_dates VALUES (4,'2017-12-13 00:00:00');

A similar thing is happening for different time formats. Consider the following CSV file

ID, Date Column
1, 01/01/2017 11:30
2, 02/01/2017 10:20
3, 12/01/2017 12:12

SQLizer assumes the dates are in 24 hour format, so it’s 11 AM and 12 PM etc.

CREATE TABLE example_dates (
    `ID` INT,
    `Date_Column` DATETIME
);
INSERT INTO example_dates VALUES (1,'2017-01-01 11:30:00');
INSERT INTO example_dates VALUES (2,'2017-01-02 10:20:00');
INSERT INTO example_dates VALUES (3,'2017-01-12 12:12:00');

But if we add AM and PM specifiers to those dates...

ID, Date Column
1, 01/01/2017 11:30 PM
2, 02/01/2017 10:20 AM
3, 12/01/2017 12:12 PM

SQLizer picks those up and uses them:

CREATE TABLE example_dates (
    `ID` INT,
    `Date_Column` DATETIME
);
INSERT INTO example_dates VALUES (1,'2017-01-01 23:30:00');
INSERT INTO example_dates VALUES (2,'2017-02-01 10:20:00');
INSERT INTO example_dates VALUES (3,'2017-12-01 12:12:00');

Finally, having interpreted the second column as dates, if SQLizer spots some invalid data, like this (The 32nd of Undecember):

ID, Date Column
1, 01/01/2017 
2, 02/01/2017 
3, 12/01/2017 
4, 32/13/2017

It will simply assume that they weren’t supposed to be dates after all, and make the second column a text column:

CREATE TABLE example_dates (
    `ID` INT,
    `Date_Column` VARCHAR(11) CHARACTER SET utf8
);
INSERT INTO example_dates VALUES (1,' 01/01/2017');
INSERT INTO example_dates VALUES (2,' 02/01/2017');
INSERT INTO example_dates VALUES (3,' 12/01/2017');
INSERT INTO example_dates VALUES (4,' 32/13/2017');

Doing this on a four line CSV file isn’t too hard. But SQLizer sometimes has to detect the correct date format on files containing millions of lines of data. We never take shortcuts on this, we don’t base type decisions on a sample of the data, or the first N rows. We check the whole file before deciding what types to use for each column.