Datatypes

Datatypes #

The Oracle RDBMS does not support every SQL standard data type natively. Some are supported but converted into an equivalent internal type. For example, all native integer types, like integer being converted to NUMBER(38). Also, the VARCHAR2 type is specific to Oracle and differs from the standard on its way to handle the NULL value. Additionally, many Oracle specific data types are named differently in PostgreSQL.

type Oracletype PostgreSQL
varchar2, nchar2, nvarchar2, nclobvarchar, text
clob, longvarchar, text
blob, raw, long rawbytea
numbernumeric, integer, bigint, smallint, real, double precision
datedate, timestamp
binary floatreal
binary doubledouble precision

Boolean type #

Oracle does not handle a proper boolean type. Depending on the application developper, a CHAR or a NUMBER type is used to emulate the behavior. Ora2Pg can handle the transformation of a boolean value to PostgreSQL.

Also, note that the lack of boolean value in Oracle can result in some pains with ORM like Hibernate if the datatypes are not ported correctly. The column seen configured as a boolean on the ORM side will be stored as a NUMBER or CHAR type on the Oracle side. If no care has been taken to transform the boolean type and values accordingly, the ORM will look for a boolean value in PostgreSQL and will find something different, resulting in an error. More generally, when using ORM, it is better to start from a database schema generated by the application and convert the data accordingly.

SELECT true AND false;
-- -[ RECORD 1 ]
-- ?column? | f

SELECT true OR false;
-- -[ RECORD 1 ]
-- ?column? | t

Character types #

Oracle implements a specific datatype, VARCHAR2. This type does not handle an empty string and NULL according the SQL standard, as an empty string and a NULL are both seen as a NULL value. Thus, in Oracle, when you compare a value with '', this is equivalent to IS NULL.

On the contrary, PostgreSQL provides an SQL standard conformant varchar datatype: comparing to '' is something distinct from IS NULL.

There is another subtle difference regarding length limit, but it usually is not a problem during porting. While Oracle’s length limit is by default expressed in bytes (it can be changed with NLS_LENGTH_SEMANTICS), PostgreSQL length limit is expressed in characters. Moreover, in PostgreSQL, one can use a varchar datatype without providing a limit, which is equivalent to PostgreSQL’s text datatype. In this case, the string will be limited to 1GB.

While porting applications, another subtle differences can appear while comparing char and varchar datas. This is not clearly described in the SQL standard and each RDBMS has its own interpretation: while comparing thoses types, PostgreSQL converts each data from varchar to char. Some empty spaces (padding) will be added on the right of varchar string, in order to obtain the same length as char(x).

For example:

SELECT 'foo'::CHAR(5)='foo '::VARCHAR(5);
--  ?column? 
-- ----------
--  t

SELECT 'foo '::CHAR(5)='foo'::VARCHAR(5);
--  ?column? 
-- ----------
--  t

SELECT 'foo '::VARCHAR(5)='foo'::VARCHAR(5);
--  ?column? 
-- ----------
--  f

This feature can have undesirable effects on execution plans:

CREATE TABLE t1 (a VARCHAR(5));

INSERT INTO t1 SELECT generate_series(1,10000);

CREATE INDEX idx1 ON t1(a);

EXPLAIN SELECT * FROM t1 WHERE a='foo'::CHAR(5);
--                      QUERY PLAN                     
-- ----------------------------------------------------
--  Seq Scan on t1  (cost=0.00..170.00 rows=1 width=4)
--    Filter: ((a)::bpchar = 'foo '::character(5))

This query can’t take advantage of the index on column a: each value must be converted to char (in fact to bpchar, which is the internal type for char) before being compared to the right argument.

This problem appears frequently when the query is embedded into a function, and when the function parameters are of type char(x). For example, the following function can be problematic:

CREATE OR REPLACE FUNCTION public.demo_char(p1 CHARACTER)
 RETURNS CHARACTER VARYING
 LANGUAGE plpgsql
AS $function$
DECLARE
  v1 VARCHAR;
BEGIN
  SELECT INTO v1 a FROM t1 WHERE a=p1;
  RETURN v1;
END
$function$

In this function, the SELECT statement will always do a sequential scan of table t1 because of the implicit casting issue.

References:

Character encoding and collation #

In PostgreSQL, Server-side encoding is set at the database-level. A session variable named client_encoding can define the encoding at the client-level. Its default value is set accordingly to the database encoding, but can be redefined dynamically at session start with SET, for example: SET client_encoding = UTF8.

Regarding collation handling, some improvements were made since PostgreSQL 8.4:

  • collation global to the database cluster before release 8.4;
  • collation per database since release 8.4;
  • collation per column/index/SQL query since release 9.1.

References:

Temporal types #

Oracle supports several datatypes to manipulate temporal data:

  • DATE, encodes both date and time, with a resolution of a second;
  • TIMESTAMP, encodes date and time but with a better resolution than what PostgreSQL can handle (nanosecond);
  • TIMESTAMP WITH TIME ZONE, same as above, with information about the time zone;
  • INTERVAL, can have 2 resolutions: year/month or days/seconds.

PostgreSQL, on the other side, implements the following datatypes:

  • date, encodes a date, conforming to the SQL standard;
  • timestamp, encodes date and time, with a resolution up to 1 microsecond;
  • time, encodes time only, with a resolution up to 1 microsecond;
  • interval which offers a resolution up to 1 microsecond;
  • time and timestamp can optionally hold time zone information by adding with time zone keywords, like in Oracle.

The problem when porting data typed as DATE in Oracle resides in knowning if it holds only the date part or it also holds the time part.

SELECT ('1970-01-01'::DATE 
      + '15 YEARS 3 MONTHS 2 DAYS 1 HOUR 23 MINUTES'::INTERVAL) 
    AS calc_interval;
-- -[ RECORD 1 ]-+--------------------
-- calc_interval | 1985-04-03 01:23:00

References:

Composite types #

All of the types that can be defined by a user are supported, but may require some adaptation. One could need to define input/output functions for these types, to manage selects and inserts on this peculiar data type. In most cases, the types are composite or array types that are completely supported by PostgreSQL.

The following composite type for Oracle:

CREATE OR REPLACE TYPE phone_t AS OBJECT (
    a_code   CHAR(3),
    p_number CHAR(8)
);

and the same for PostgreSQL:

CREATE TYPE phone_t AS (
    a_code   CHAR(3),
    p_number CHAR(8)
);

The following example using an array:

CREATE OR REPLACE TYPE phonelist AS VARRAY(50) OF phone_t;

will be translated to:

CREATE TYPE phonelist AS (phonelist phone_t[50]);

Previous
Porting database objects
Next
Table migration