PosgreSQL encoding problems

By | 9 May, 2014

A very simple problem im sure all postgreSQL users/devs have come across one time or the other:

So you have your shiny PostgreSQL Database in some nice server somehwre and you want to manually make some changes. Say you have tons of records on some table and want to replace a word with a different one. All good, except the database in encoded in utf8 and you have text with accented characters.

Jump to learn.

You have this text:

“Era uma vez no séc. XII.”

And you want to have:

“Era uma vez no século XII.”

you can easily replace a word in a text filed like this:

update table1 set field1 = replace(field1,'oldword','newword');

but if the database has its encoding in UTF8, for example, that query will give you an invalid byte sequence error.

so at the psql prompt you can change the encoding on the fly, like so:

\encoding latin1

and you can run your update command without worrying about the accente characters it might have.

NOTE: This was speed used on a specific problem i had replacing all occurences of an accented word in all records of a table, on a particular field, so before you try this on a production server, do your homework – backup, try locally, eat healthily.

Go read:
PostgreSQL character set support