
Archive for the ‘Postgres’ Category

This allows multiple data stores without rewriting all of the SQL queries. The problem is the 4D SQL function DATE_TO_CHAR; luckily PostgreSQL has an equivalent formatting function, to_char.

For business reasons it's not practical to replace every instance of DATE_TO_CHAR with to_char.

Solution

Create a function in the PostgreSQL database that maps the DATE_TO_CHAR function to to_char. Luckily the formatting options I need are available. Note that you have to create two functions: one to accept dates, the other to accept times.

Now SELECT DATE_TO_CHAR(DateField1, 'YYYY-MM-DD') FROM Table1 returns the correct value regardless of the database queried. It's important to note this works great for getting integer values from dates and for casting as date objects; if your queries rely on non-ISO formatting, your mileage may vary.

-- Function: date_to_char(date, text)
CREATE OR REPLACE FUNCTION date_to_char(date, text)
  RETURNS text AS
$BODY$
  DECLARE
  BEGIN
       RETURN to_char($1,$2)::text;
  END;
  $BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;
ALTER FUNCTION date_to_char(date, text) OWNER TO postgres;
-- Function: date_to_char(time without time zone, text)
CREATE OR REPLACE FUNCTION date_to_char(time without time zone, text)
  RETURNS text AS
$BODY$
  DECLARE
  BEGIN
       RETURN to_char($1,$2)::text;
  END;
  $BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;
ALTER FUNCTION date_to_char(time without time zone, text) OWNER TO postgres;
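For a quick sanity check after both functions are created, something like this (run in PostgreSQL with literal values; real queries would use table columns as in the SELECT above) exercises both overloads:

SELECT date_to_char(DATE '2010-06-01', 'YYYY-MM-DD');  -- returns '2010-06-01'
SELECT date_to_char(TIME '14:30:00', 'HH24:MI');       -- returns '14:30'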


There are lots of great backup tools and utilities out there, but a simple script plus a cron task targeting a specific PostgreSQL database is often the fastest way to a locally based backup procedure.

The shell script below is run as the postgres user on Linux 2.6.9-42.0.3.ELsmp (Red Hat 3.4.6-3) with PostgreSQL 8.2.0 and its standard command-line tools. Not that there is anything fancy in the code that would require such specific versions.

Unique to the code is the ability to pass in a database name, or none at all to back up the entire cluster. The script only connects and writes files locally, and it assumes the executing user has been given the appropriate permissions. Maybe this works just fine for you; otherwise feel free to make your own modifications.

Personally I gzip the output, though on larger databases there is room for improvement, such as a smarter archiving utility or steps to avoid saturating the hard disk.

crontab

# CRON table for postgres user.

# run backup every night at 22:00 hours (10PM)
0 22 * * * /var/lib/pgsql/backups/backup.sh database_name

# run a full-cluster backup (no database argument) every Sunday at midnight
0 0 * * 0 /var/lib/pgsql/backups/backup.sh

backup.sh

#!/bin/bash
# This script will backup the postgresql database
# and store it in a specified directory

# PARAMETERS
# $1 database name (if none specified run pg_dumpall)

# CONSTANTS
# postgres home folder backups directory
# !! DO NOT specify trailing '/' as it is included below for readability !!
BACKUP_DIRECTORY="/var/lib/pgsql/backups"

# Date stamp (formatted YYYYMMDD)
# just used in file name
CURRENT_DATE=$(date "+%Y%m%d")

# !!! Important: pg_dump does not export global objects such as roles (users/groups);
# still need to maintain a pg_dumpall for full disaster recovery !!!

# this checks to see if the first command line argument is null
if [ -z "$1" ]
then
    # No database specified, do a full backup using pg_dumpall
    pg_dumpall | gzip > "$BACKUP_DIRECTORY/pg_dumpall_$CURRENT_DATE.sql.gz"
else
    # Database named (command line argument), use pg_dump for a targeted backup
    pg_dump "$1" | gzip > "$BACKUP_DIRECTORY/${1}_$CURRENT_DATE.sql.gz"
fi
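Running the script by hand works the same way as the cron entries above, and restoring a plain-format dump is just a matter of piping it back into psql. The database name and date stamp below are placeholders:

# one-off backup of a single database
/var/lib/pgsql/backups/backup.sh database_name

# restore a single-database dump (the target database must already exist)
gunzip -c /var/lib/pgsql/backups/database_name_20090101.sql.gz | psql database_name

# restore a full-cluster dump (connect to any database, e.g. postgres)
gunzip -c /var/lib/pgsql/backups/pg_dumpall_20090101.sql.gz | psql postgres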


This script will monitor a hot folder, take its contents, and execute the files against a Postgres server.

Why post this? This Python script, in conjunction with a parameter-setting batch file, can take SQL output and apply it to the Postgres database. Originally I was searching for how to do this via DOS when I realized I didn't have all the error-catching capability that I wanted.

Developing this came from trying to solve how to systematically apply changes from other systems to one database. The script makes no distinction between files, so if one system outputs several files that need uploading and separate files target the same record, the last file loaded is the last change applied.

I've simplified the script a little for ease of posting; it does require the environment variable PGPASSWORD to run (a sketch of a wrapper that sets it follows the script).

import os, glob, shutil

# count
fileCount = 0

# ASSUMES this file is above INCOMING/ BAD/ and ARCHIVE/
filelist = glob.glob("INCOMING/*.sql")

for file in filelist:
    # take the file and throw it against psql
    # read psql --help for details about options
    # setting ON_ERROR_STOP to nothing tells psql to pass back an error status code
    errorlevel = os.system("psql -X -U some_user -d database --variable=ON_ERROR_STOP= -1 -w -f "+file)
    
    # check for errors (thrown by psql)
    if errorlevel != 0:
        # error was thrown, lets report it and stash the file
        print errorlevel
        shutil.move(file,"BAD/")
    else:
        print file + " processed"
        shutil.move(file,"ARCHIVE/")
        fileCount += 1

print str(fileCount) + " files processed"
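The parameter-setting batch file mentioned above does little more than set PGPASSWORD (and any other connection details) before launching the script. A minimal sketch, with a placeholder password and a hypothetical script name:

@echo off
REM placeholder password -- whatever the psql connection above needs
set PGPASSWORD=secret_password
REM launch the hot-folder script (hypothetical file name)
python process_incoming.py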


This is a companion post to the 4D mirroring post. The approach to this problem is to have data from the source mirror itself out to a target and accept changes from that mirror system back. All SQL-based INSERT, UPDATE, and DELETE operations are to send their changes to the target.

It is very important to stress that this is only used for Postgres UPDATE statements sending data back to the source. It does, however, provide an example of iterating over all the fields in a table using the Python dicts populated in TD["new"] and TD["old"].

    Limitations

  • Requires the plpythonu language installed on Postgres
  • Not designed to handle DELETE
  • Not designed to handle INSERT; it could with UUIDs, but not until 4D v12
  • This is a work in progress, but I hope it is a starting point

Iterate over table fields via Python
You could even do a dictionary compare to find the difference between TD["new"] and TD["old"]; a sketch of that follows the snippet below.

for field_name, field_value in TD["new"].items():
	# now we have field_name and field_value to do work on
	if field_name == 'alpha_field':
		tmp = field_value
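As a sketch of that dictionary compare (run inside the trigger body, where plpythonu supplies TD), you could collect just the columns whose values actually changed:

# TD["old"] and TD["new"] are dicts keyed by column name
changed = {}
for field_name, new_value in TD["new"].items():
	if TD["old"][field_name] != new_value:
		changed[field_name] = new_value
# "changed" now holds only the columns modified by this UPDATE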

Full mirroring code
Warning: there is code here that makes the generated SQL statements 4D-compliant only. This is merely a starting point.

-- Function: trigger_sync()

CREATE OR REPLACE FUNCTION trigger_sync()
  RETURNS trigger AS
$BODY$ 
# now we are using the python language
# postgresql provides the object "plpy": http://www.postgresql.org/docs/8.4/interactive/plpython-database.html
# i.e. TD["event"].upper() IN (UPDATE, INSERT, DELETE, etc.)
# we only want to execute this if the row is being updated from Postgres,
# because otherwise we could accidentally set up an infinite loop
# (Target writes to postgres, trigger fires, writes back to Target, Target writes back, etc.)
# only changes coming from the target set the synced flag to true; this trigger then clears it
# if this trigger fails the row is "SKIP"ped

# not built to handle deletes yet
# there is no ["new"] only ["old"]
#if TD["event"].upper() == "DELETE":

mirrorFieldName='trigger_sync_flag'
# pass in the field to target
pkFieldName=TD["args"][0] 

# for now we need to send INSERT and DELETE to 4D and then 4D mirrors that back
# to us. v12 4D will have UUID so we can insert in either database.
if TD["new"][mirrorFieldName] or TD["event"].upper() == "INSERT":
	# this is being sent from 4D don't send it back
	# lets clear the flag and then return from the trigger
	TD["new"][mirrorFieldName] = False
	return "MODIFY"
else:
	# we need to know the data type to make the appropriate mappings; table name is constant on a per-trigger basis
	plan = plpy.prepare("""SELECT data_type FROM information_schema.columns 
				WHERE column_name = $1 
				AND table_name='"""+(TD["table_name"])+"'", ["text"]);
	# build sql
	
	# 4D handles SQL strings differently 
	# single apostrophes enclose strings, double apostrope is a single apostrophe escaped
	# names are enclosed in [] with a right ] to escape a pair
	
	sql_UPDATE = "UPDATE ["+TD["table_name"]+"] SET "
	sql_INSERT = "INSERT INTO ["+TD["table_name"]+"]"
	sql_insert_fields = ''
	sql_insert_values = ''
	sql_DELETE = "DELETE FROM ["+TD["table_name"]+"] WHERE "+pkFieldName+"=%s" % str(TD["new"][pkFieldName])
	
	# build the sql string first
	for field_name, field_value in TD["new"].items():
		rv = plpy.execute(plan, [field_name], 1);
		
		sql_insert_fields += "["+field_name+"], "
		
		# append to stmts 
		if rv[0]["data_type"] == "boolean":
			# flip the mirroring light switch
			if field_name == mirrorFieldName:
				field_value = True
			
			# boolean values can't be directly passed (True/False) so we need to change it to a 1 or 0 via python
			sql_UPDATE += "["+field_name+"]" +"=CAST(%s as BOOLEAN), " % (str(int(field_value)));
			sql_insert_values += "CAST(%s as BOOLEAN), " % (str(int(field_value)))
		elif rv[0]["data_type"] == "double precision" or rv[0]["data_type"] == "integer" or rv[0]["data_type"] == "smallint":
			sql_UPDATE += "["+field_name+"]" +"=%s, " % (field_value);
			
			if field_name == pkFieldName: #auto increment needs nulls
				sql_insert_values +="null, "
			else:
				sql_insert_values +="%s, " % (field_value)
		elif rv[0]["data_type"] == "time without time zone":
			sql_UPDATE += "["+field_name+"]" +"='%s', " % (field_value);
			sql_insert_values +="'%s', " % (field_value)
		elif rv[0]["data_type"] == "date":
			tmpDateString = 'null' if field_value is None else "'"+field_value+"'"
			sql_UPDATE += "["+field_name+"]" +"=%s, " % (tmpDateString);
			sql_insert_values +="%s, " % (tmpDateString)
		else:
			# text / character varying: make sure we escape the apostrophe; if the value is None, clear the field out in 4D
			if field_value is not None:
				field_value = "'" + field_value.replace("'", "''") + "'"
			else:
				# if I don't do this, "None" gets inserted as a string value
				field_value = 'null'

			# string is properly escaped above
			sql_UPDATE += "["+field_name+"]" +"=%s, " % (field_value);
			sql_insert_values +="%s, " % (field_value)
	
	# remove the trailing chars
	sql_UPDATE = sql_UPDATE.rstrip(', ');
	sql_UPDATE += " WHERE ["+pkFieldName+"]="+str(TD["new"][pkFieldName]);
	
	# insert
	sql_insert_fields = sql_insert_fields.rstrip(', ');
	sql_insert_values = sql_insert_values.rstrip(', ');
	sql_INSERT += "("+sql_insert_fields+") VALUES ("+sql_insert_values+")"
	
	import pyodbc
	from datetime import datetime, date, time
	
	# make sure sqlStmt is populated
	sqlStmt = "CNC"
	
	try:
		# open a connection
		cnxn = pyodbc.connect("DSN=A DSN") # we're connecting using the stored username/password in ODBC DSN
		# block out a cursor
		cursor = cnxn.cursor()
		
		# find the right statement to execute
		event = TD["event"].upper()
		if event == "UPDATE":
			sqlStmt = sql_UPDATE
		elif event == "DELETE":
			sqlStmt = sql_DELETE
		elif event == "INSERT":
			sqlStmt = sql_INSERT
		else:
			pass; # should never happen (TRUNCATE maybe)
	
		# commit and print results
		plpy.debug(sqlStmt)
		cursor.execute(sqlStmt)
		cnxn.commit() # don't forget this!
		#plpy.notice('attempt id: %s updated record: %s' % (TD["new"][pkFieldName],cursor.rowcount))
	
		# TODO: this fails if the record is locked. so we need to do checks based on that.
		cnxn.close()
		
		# trigger is OK
		return "OK"
		
	except Exception, e:
		
		# failure propagates up to the calling layer
		plpy.error(repr(e))
		
		# want to skip these changes because most likely cause for error is record locking; have user try again
		return "SKIP"
$BODY$
  LANGUAGE 'plpythonu' VOLATILE
  COST 100;
ALTER FUNCTION trigger_sync() OWNER TO postgres;
GRANT EXECUTE ON FUNCTION trigger_sync() TO public;
GRANT EXECUTE ON FUNCTION trigger_sync() TO postgres;
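To actually fire this on a table, a trigger definition along these lines is needed (table and primary key column names here are hypothetical). It must be a row-level BEFORE trigger so the MODIFY/SKIP return values take effect, and the single argument is the primary key column name the function reads from TD["args"][0]:

-- hypothetical table and primary key column
CREATE TRIGGER table1_trigger_sync
  BEFORE UPDATE ON table1
  FOR EACH ROW
  EXECUTE PROCEDURE trigger_sync('id_table1');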

