<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>tail -f /dev/dim</title>
    <link>http://blog.tapoueh.org/blog.dim.html</link>
    <description>dim's PostgreSQL blog</description>
    <language>en-us</language>
    <generator>Emacs Muse</generator>

<item>
<title> Getting out of SQL_ASCII, part 2</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20Getting%20out%20of%20SQL_ASCII%2C%20part%202</link>
<description><![CDATA[
<p><a name="20100223-17:30" id="20100223-17:30"></a>
<a name="%20Getting%20out%20of%20SQL_ASCII%2C%20part%202" id="%20Getting%20out%20of%20SQL_ASCII%2C%20part%202"></a>
So, if you followed the previous blog entry, you now have a new database
containing all the <em>static</em> tables encoded in <code>UTF-8</code> rather than
<code>SQL_ASCII</code>. And if you did not already, by now you severely distrust
this non-encoding.</p>

<p>Now is the time to have a look at properly encoding the <em>live</em> data, that is
the data stored in tables that continue to receive write traffic. The idea is to use
the <code>UPDATE</code> facilities of PostgreSQL to tweak the data, and also to fix the
applications so that they stop inserting badly encoded strings in there.</p>

<h3>Finding non UTF-8 data</h3>

<p class="first">First you want to find the badly encoded data. You can do that with this
helper function that <a href="http://blog.rhodiumtoad.org.uk/">RhodiumToad</a> gave me on IRC. I had a version from the
archives before that, but the <em>regexp</em> was hard to maintain and to quote into a
<code>PL</code> function. This version avoids the problem in two ways: the <em>regexp</em> checking
lives in a separate pure <code>SQL</code> function (so that you can index it should
you need to), and the regexp is applied to <code>hex</code> encoded
data. Here we go:</p>

<pre class="src">
<span style="color: #729fcf; font-weight: bold;">create</span> <span style="color: #729fcf; font-weight: bold;">or</span> replace <span style="color: #729fcf; font-weight: bold;">function</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">public.utf8hex_valid</span>(str text)
 <span style="color: #729fcf; font-weight: bold;">returns</span> <span style="color: #8ae234; font-weight: bold;">boolean</span>
 <span style="color: #729fcf; font-weight: bold;">language</span> <span style="color: #729fcf; font-weight: bold;">sql</span> immutable
<span style="color: #729fcf; font-weight: bold;">as</span> $f$
   <span style="color: #729fcf; font-weight: bold;">select</span> $1 ~ $r$(?x)
                  ^(?:(?:[0-7][0-9a-f])
                     |(?:(?:c[2-9a-f]|d[0-9a-f])
                        |e0[ab][0-9a-f]
                        |ed[89][0-9a-f]
                        |(?:(?:e[1-9abcef])
                           |f0[9ab][0-9a-f]
                           |f[1-3][89ab][0-9a-f]
                           |f48[0-9a-f]
                          )[89ab][0-9a-f]
                       )[89ab][0-9a-f]
                    )*$
                $r$;
$f$;
</pre>
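
<p>A quick sanity check of the function, as a minimal sketch: the hex strings
below are hand-written examples, not data from the migration. <code>c3a9</code> is the
<code>UTF-8</code> encoding of e acute (&#233;), while a lone <code>e9</code> is its <code>latin1</code> encoding and
is not valid <code>UTF-8</code>. Note that the regexp only knows about lowercase hex
digits, which is what <code>encode(..., 'hex')</code> produces.</p>

<pre class="src">
select public.utf8hex_valid('c3a9') as utf8_e_acute,   -- true
       public.utf8hex_valid('e9')   as latin1_e_acute, -- false
       public.utf8hex_valid('')     as empty_string;   -- true, nothing to reject
</pre>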

<p>Now for a little scripting around it, in order to skip a lot of boring manual
work (and see, some more catalog queries). Don't forget we will have
to work on a per-column basis here...</p>

<pre class="src">
<span style="color: #729fcf; font-weight: bold;">create</span> <span style="color: #729fcf; font-weight: bold;">or</span> replace <span style="color: #729fcf; font-weight: bold;">function</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">public.check_encoding_utf8</span>
 (
   <span style="color: #729fcf; font-weight: bold;">IN</span> schemaname text,
   <span style="color: #729fcf; font-weight: bold;">IN</span> tablename  text,
  <span style="color: #729fcf; font-weight: bold;">OUT</span> relname    text,
  <span style="color: #729fcf; font-weight: bold;">OUT</span> attname    text,
  <span style="color: #729fcf; font-weight: bold;">OUT</span> <span style="color: #729fcf;">count</span>      bigint
 )
 <span style="color: #729fcf; font-weight: bold;">returns</span> setof record
 <span style="color: #729fcf; font-weight: bold;">language</span> plpgsql
<span style="color: #729fcf; font-weight: bold;">as</span> $f$
<span style="color: #729fcf; font-weight: bold;">DECLARE</span>
  v_sql text;
<span style="color: #729fcf; font-weight: bold;">BEGIN</span>
  <span style="color: #729fcf; font-weight: bold;">FOR</span> relname, attname
   <span style="color: #729fcf; font-weight: bold;">IN</span> <span style="color: #729fcf; font-weight: bold;">SELECT</span> c.relname, a.attname
        <span style="color: #729fcf; font-weight: bold;">FROM</span> pg_attribute a
             <span style="color: #729fcf; font-weight: bold;">JOIN</span> pg_class c <span style="color: #729fcf; font-weight: bold;">on</span> a.attrelid = c.oid
             <span style="color: #729fcf; font-weight: bold;">JOIN</span> pg_namespace s <span style="color: #729fcf; font-weight: bold;">on</span> s.oid = c.relnamespace
             <span style="color: #729fcf; font-weight: bold;">JOIN</span> pg_roles r <span style="color: #729fcf; font-weight: bold;">on</span> r.oid = c.relowner
       <span style="color: #729fcf; font-weight: bold;">WHERE</span> s.nspname = schemaname
         <span style="color: #729fcf; font-weight: bold;">AND</span> atttypid <span style="color: #729fcf; font-weight: bold;">IN</span> (25, 1043) <span style="color: #888a85;">-- text, varchar
</span>         <span style="color: #729fcf; font-weight: bold;">AND</span> relkind = <span style="color: #ad7fa8; font-style: italic;">'r'</span>          <span style="color: #888a85;">-- ordinary table
</span>         <span style="color: #729fcf; font-weight: bold;">AND</span> r.rolname = <span style="color: #ad7fa8; font-style: italic;">'some_specific_role'</span>
         <span style="color: #729fcf; font-weight: bold;">AND</span> <span style="color: #729fcf; font-weight: bold;">CASE</span> <span style="color: #729fcf; font-weight: bold;">WHEN</span> tablename <span style="color: #729fcf; font-weight: bold;">IS</span> <span style="color: #729fcf; font-weight: bold;">NOT</span> <span style="color: #729fcf; font-weight: bold;">NULL</span>
                  <span style="color: #729fcf; font-weight: bold;">THEN</span> c.relname ~ tablename
                  <span style="color: #729fcf; font-weight: bold;">ELSE</span> <span style="color: #729fcf; font-weight: bold;">true</span>
              <span style="color: #729fcf; font-weight: bold;">END</span>
  LOOP
    v_sql := <span style="color: #ad7fa8; font-style: italic;">'SELECT count(*) '</span>
          || <span style="color: #ad7fa8; font-style: italic;">'  FROM ONLY '</span>|| quote_ident(schemaname) || <span style="color: #ad7fa8; font-style: italic;">'.'</span> || quote_ident(relname)
          || <span style="color: #ad7fa8; font-style: italic;">' WHERE NOT public.utf8hex_valid(encode(textsend('</span>
          || quote_ident(attname)
          || <span style="color: #ad7fa8; font-style: italic;">'), ''hex''))'</span>;

    <span style="color: #888a85;">-- RAISE NOTICE 'Checking: %.%', relname, attname;
</span>    <span style="color: #888a85;">-- RAISE NOTICE 'SQL: %', v_sql;
</span>    <span style="color: #729fcf; font-weight: bold;">EXECUTE</span> v_sql <span style="color: #729fcf; font-weight: bold;">INTO</span> <span style="color: #729fcf;">count</span>;
    <span style="color: #729fcf; font-weight: bold;">RETURN</span> <span style="color: #729fcf; font-weight: bold;">NEXT</span>;
  <span style="color: #729fcf; font-weight: bold;">END</span> LOOP;
<span style="color: #729fcf; font-weight: bold;">END</span>;
$f$;
</pre>

<p>Note that the <code>tablename</code> is compared using the <code>~</code> operator, so that's <em>regexp</em>
matching there too. Also note that I wanted only to check those tables that
are owned by a specific role, your case may vary.</p>

<p>The way I used this function was like this:</p>

<pre class="src">
<span style="color: #729fcf; font-weight: bold;">create</span> <span style="color: #729fcf; font-weight: bold;">table</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">leon.check_utf8</span> <span style="color: #729fcf; font-weight: bold;">as</span>
 <span style="color: #729fcf; font-weight: bold;">select</span> *
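   <span style="color: #729fcf; font-weight: bold;">from</span> public.check_encoding_utf8(<span style="color: #ad7fa8; font-style: italic;">'some_schema'</span>, <span style="color: #729fcf; font-weight: bold;">NULL</span>);
   <span style="color: #888a85;">-- 'some_schema' is a placeholder; a NULL tablename checks every table</span>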
</pre>

<p>Then you need to take action on the rows of the <code>leon.check_utf8</code> table that
have a <code>count &gt; 0</code>. Rinse and repeat, but you may soon realise that building the
table over and over again is costly.</p>
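
<p>Reading the results and rechecking a single table, rather than rebuilding
everything, could look like the following sketch. The names <code>some_schema</code> and
<code>some_table</code> are placeholders to replace with your own.</p>

<pre class="src">
-- columns that still contain invalid data, worst offenders first
select relname, attname, count
  from leon.check_utf8
 where count &gt; 0
 order by count desc;

-- recheck one table only; the second argument is a regexp on the table name
select *
  from public.check_encoding_utf8('some_schema', '^some_table$');
</pre>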


<h3>Cleaning up the data</h3>

<p class="first">Up for some more helper tools? Unless you really want to manually fix this
huge number of columns where some data ain't <code>UTF-8</code> compatible... here's some
more:</p>

<pre class="src">
<span style="color: #729fcf; font-weight: bold;">create</span> <span style="color: #729fcf; font-weight: bold;">or</span> replace <span style="color: #729fcf; font-weight: bold;">function</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">leon.nettoyeur</span>
 (
  <span style="color: #729fcf; font-weight: bold;">IN</span>  <span style="color: #729fcf; font-weight: bold;">action</span>      text,
  <span style="color: #729fcf; font-weight: bold;">IN</span>  encoding    text,
  <span style="color: #729fcf; font-weight: bold;">IN</span>  tablename   text,
  <span style="color: #729fcf; font-weight: bold;">IN</span>  columname   text,

  <span style="color: #729fcf; font-weight: bold;">OUT</span> orig        text,
  <span style="color: #729fcf; font-weight: bold;">OUT</span> utf8        text
 )
 <span style="color: #729fcf; font-weight: bold;">returns</span> setof record
 <span style="color: #729fcf; font-weight: bold;">language</span> plpgsql
<span style="color: #729fcf; font-weight: bold;">as</span> $f$
<span style="color: #729fcf; font-weight: bold;">DECLARE</span>
  p_convert text;
<span style="color: #729fcf; font-weight: bold;">BEGIN</span>
  IF encoding <span style="color: #729fcf; font-weight: bold;">IS</span> <span style="color: #729fcf; font-weight: bold;">NULL</span>
  <span style="color: #729fcf; font-weight: bold;">THEN</span>
    p_convert := <span style="color: #ad7fa8; font-style: italic;">'translate('</span>
              || columname || <span style="color: #ad7fa8; font-style: italic;">', '</span>
              || $$<span style="color: #ad7fa8; font-style: italic;">'\211\203\202'</span>$$
              || <span style="color: #ad7fa8; font-style: italic;">', '</span>
              || $$<span style="color: #ad7fa8; font-style: italic;">'   '</span>$$
              || <span style="color: #ad7fa8; font-style: italic;">') '</span>;
  <span style="color: #729fcf; font-weight: bold;">ELSE</span>
    <span style="color: #888a85;">-- in 8.2, write convert using, in 8.3, the other expression
</span>    <span style="color: #888a85;">-- p_convert := 'convert(' || columname || ' using ' || conversion || ') ';
</span>    p_convert := <span style="color: #ad7fa8; font-style: italic;">'convert(textsend('</span> || columname || <span style="color: #ad7fa8; font-style: italic;">'), '''</span>|| encoding ||<span style="color: #ad7fa8; font-style: italic;">''', ''utf-8'' ) '</span>;
  <span style="color: #729fcf; font-weight: bold;">END</span> IF;

  IF <span style="color: #729fcf; font-weight: bold;">action</span> = <span style="color: #ad7fa8; font-style: italic;">'select'</span>
  <span style="color: #729fcf; font-weight: bold;">THEN</span>
    <span style="color: #729fcf; font-weight: bold;">FOR</span> orig, utf8
     <span style="color: #729fcf; font-weight: bold;">IN</span> <span style="color: #729fcf; font-weight: bold;">EXECUTE</span> <span style="color: #ad7fa8; font-style: italic;">'SELECT '</span> || columname || <span style="color: #ad7fa8; font-style: italic;">', '</span>
         || p_convert
         || <span style="color: #ad7fa8; font-style: italic;">'  FROM ONLY '</span> || tablename
         || <span style="color: #ad7fa8; font-style: italic;">' WHERE not public.utf8hex_valid('</span>
         || <span style="color: #ad7fa8; font-style: italic;">'encode(textsend('</span>|| columname ||<span style="color: #ad7fa8; font-style: italic;">'), ''hex''))'</span>
    LOOP
      <span style="color: #729fcf; font-weight: bold;">RETURN</span> <span style="color: #729fcf; font-weight: bold;">NEXT</span>;
    <span style="color: #729fcf; font-weight: bold;">END</span> LOOP;

  ELSIF <span style="color: #729fcf; font-weight: bold;">action</span> = <span style="color: #ad7fa8; font-style: italic;">'update'</span>
  <span style="color: #729fcf; font-weight: bold;">THEN</span>
    <span style="color: #729fcf; font-weight: bold;">EXECUTE</span> <span style="color: #ad7fa8; font-style: italic;">'UPDATE ONLY '</span> || tablename
         || <span style="color: #ad7fa8; font-style: italic;">' SET '</span> || columname || <span style="color: #ad7fa8; font-style: italic;">' = '</span> || p_convert
         || <span style="color: #ad7fa8; font-style: italic;">' WHERE not public.utf8hex_valid('</span>
         || <span style="color: #ad7fa8; font-style: italic;">'encode(textsend('</span>|| columname ||<span style="color: #ad7fa8; font-style: italic;">'), ''hex''))'</span>;

    <span style="color: #729fcf; font-weight: bold;">FOR</span> orig, utf8
     <span style="color: #729fcf; font-weight: bold;">IN</span> <span style="color: #729fcf; font-weight: bold;">SELECT</span> *
          <span style="color: #729fcf; font-weight: bold;">FROM</span> leon.nettoyeur(<span style="color: #ad7fa8; font-style: italic;">'select'</span>, encoding, tablename, columname)
    LOOP
      <span style="color: #729fcf; font-weight: bold;">RETURN</span> <span style="color: #729fcf; font-weight: bold;">NEXT</span>;
    <span style="color: #729fcf; font-weight: bold;">END</span> LOOP;

  <span style="color: #729fcf; font-weight: bold;">ELSE</span>
    RAISE <span style="color: #729fcf; font-weight: bold;">EXCEPTION</span> <span style="color: #ad7fa8; font-style: italic;">'L&#233;on, Nettoyeur, veut de l''action.'</span>;

  <span style="color: #729fcf; font-weight: bold;">END</span> IF;
<span style="color: #729fcf; font-weight: bold;">END</span>;
$f$;
</pre>

<p>As you can see, this function lets you check the conversion from a
given supposed encoding before actually converting the data in place. This
is very useful as even when you're pretty sure the non-utf8 data is <code>latin1</code>,
sometimes you find it's <code>windows-1252</code> or such. So double check before telling
<code>leon.nettoyeur()</code> to update your precious data!</p>
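
<p>A typical session might look like the following sketch, where
<code>some_schema.some_table</code> and <code>some_column</code> are placeholders: preview the
conversion first, and only then update.</p>

<pre class="src">
-- preview: each original value next to its conversion from the supposed encoding
select orig, utf8
  from leon.nettoyeur('select', 'latin1', 'some_schema.some_table', 'some_column')
 limit 20;

-- once the preview looks right, convert in place; doing it inside a
-- transaction leaves room for a rollback
begin;
-- the 'update' action returns the rows that still fail the check afterwards
select count(*)
  from leon.nettoyeur('update', 'latin1', 'some_schema.some_table', 'some_column');
commit;
</pre>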

<p>Also, there's a facility to use <code>translate()</code> when none of the encodings matches
your expectations. This is a skeleton that just replaces invalid characters with
a <code>space</code>, tweak it at will!</p>
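
<p>As a reminder of how <code>translate()</code> behaves, here is a tiny example on made-up
plain ASCII data: each character found in the second argument is replaced by
the character at the same position in the third one.</p>

<pre class="src">
select translate('a_b-c', '_-', '  ');  -- 'a b c'
</pre>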


<h3>Conclusion</h3>

<p class="first">Enjoy your clean database now, even if it still accepts new data that will
probably not pass the checks, so we still have to be careful about that and
re-clean every day until the migration is complete. Or maybe add a <code>CHECK</code>
clause that will reject badly encoded data...</p>
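
<p>Such a constraint could look like this sketch, which reuses the
<code>utf8hex_valid()</code> function from above. The table, column and constraint names
are placeholders, and the existing rows must already pass the check,
otherwise adding the constraint fails.</p>

<pre class="src">
alter table some_schema.some_table
  add constraint some_column_is_utf8
      check (public.utf8hex_valid(encode(textsend(some_column), 'hex')));
</pre>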

<p>In fact here we're using <a href="http://wiki.postgresql.org/wiki/Londiste_Tutorial">Londiste</a> to replicate the <em>live</em> data from the old to
the new server, and that means the replication will break each time there's
new data written in non-utf8, as the new server is running <code>8.4</code>, which by
design ain't very forgiving. Our plan is to clean up as we go (remove the table
from the <em>subscriber</em>, fix it, add it again) and migrate as soon as possible!</p>

<p>Bonus points to those of you getting the convoluted reference :)</p>



]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Tue, 23 Feb 2010 17:30:00 CET</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20Getting%20out%20of%20SQL_ASCII%2C%20part%202</guid>

</item>

<item>
<title> Getting out of SQL_ASCII, part 1</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20Getting%20out%20of%20SQL_ASCII%2C%20part%201</link>
<description><![CDATA[
<p><a name="20100218-11:37" id="20100218-11:37"></a>
<a name="%20Getting%20out%20of%20SQL_ASCII%2C%20part%201" id="%20Getting%20out%20of%20SQL_ASCII%2C%20part%201"></a>
It happens that you have to manage databases <em>designed</em> by your predecessor,
and it even happens that the team did not have a <em>DBA</em> back then. Those <em>hysterical
raisins</em> can lead to having a <code>SQL_ASCII</code> database. The horror!</p>

<p>What <code>SQL_ASCII</code> means, if you're not already familiar with the consequences
of such a choice, is that all the <code>text</code> and <code>varchar</code> data that you put in the
database is accepted as-is. No checks. At all. It's pretty nice when you're
lazy enough not to deal with <em>strange</em> errors in your application, but if
you think that it's a smart move, please go read
<a href="http://www.joelonsoftware.com/articles/Unicode.html">The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)</a>
by <a href="http://www.joelonsoftware.com/">Joel Spolsky</a> now. I said now, I'm waiting for you to get back here. Yes,
I'll wait.</p>
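
<p>To see what such a check buys you, here is a small sketch to run in a
<code>UTF-8</code> database (the values are made up for the example). The byte <code>0xe9</code>,
octal <code>351</code>, is an e acute in <code>latin1</code> but is not valid <code>UTF-8</code> on its own.</p>

<pre class="src">
-- decoding the byte as latin1 works and gives back an e acute
select convert_from(E'\\351'::bytea, 'latin1');

-- decoding the same byte as UTF-8 fails with an "invalid byte sequence" error,
-- whereas a SQL_ASCII database accepts such bytes in text columns unchecked
select convert_from(E'\\351'::bytea, 'utf8');
</pre>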

<p>The problem, of course, is that you may not be able to read back the data you just stored,
and write-only data is seldom the use case anywhere you use a database solution such as
<a href="http://www.postgresql.org/">PostgreSQL</a>.</p>

<p>Now, it happens too that it's high time to get off of <code>SQL_ASCII</code>, the
infamous. In our case we're lucky enough in that the data are all in fact
<code>latin1</code> or about that, and this comes from the fact that all the applications
connecting to the database are sharing some common code and setup. Then we
have some tables that can be tagged <em>archives</em> and some others <em>live</em>. This blog
post will only deal with the former category.</p>

<p>For those tables that are not receiving changes anymore, we will migrate
them by using a simple but time hungry method: <code>COPY OUT|recode|COPY IN</code>. I've
tried to use <code>iconv</code> for recoding our data, but it failed to do so in lots of
cases, so I've switched to using the <a href="http://www.gnu.org/software/recode/recode.html">GNU recode</a> tool, which works just fine.</p>

<p>The fact that it takes so much time doing the conversion is not really a
problem here, as you can do it <em>offline</em>, while the applications are still
using the <code>SQL_ASCII</code> database. So, here's the program's help:</p>

<pre class="src">
recode.sh [-npdf0TI] [-U user ] -s schema [-m mintable] pattern
        -d    debug
        -n    dry run, only print table names and expected files
        -s    schema
        -m    mintable, to skip already processed once
        -U    connect to PostgreSQL as user
        -f    force table loading even when export files do exist
        -0    only (re)load tables with zero-sized copy files
        -T    Truncate the tables before COPYing recoded data
        -I    Temporarily drop the indexes of the table while COPYing
   pattern    ^table_name_, e.g.
</pre>

<p>The <code>-I</code> option is neat enough to create the indexes in parallel, but with no
upper limit on the number of index creations launched. In our case it worked
well, so I didn't have to bother.</p>

<p>Take a look at the <a href="static/recode.sh">recode.sh</a> script, and don't hesitate to edit it for your
purpose. It's missing some obvious options to be useful at large, such
as the <code>recode</code> <em>request</em> which is currently hardcoded to <code>l1..utf8</code>. If there's
any demand for it, I'll set up a <a href="http://github.com/dimitri">GitHub</a> project for the little script.</p>

<p>We'll get back to the subject of this entry in <em>part 2</em>, dealing with how to
recode your data in the database itself, thanks to some insane regexp based
queries and helper functions. And thanks to a great deal of IRC-based
help, too.</p>

]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Thu, 18 Feb 2010 11:37:00 CET</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20Getting%20out%20of%20SQL_ASCII%2C%20part%201</guid>

</item>

<item>
<title> Resetting sequences. All of them, please!</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20Resetting%20sequences%2E%20All%20of%20them%2C%20please%21</link>
<description><![CDATA[
<p><a name="20100216-16:23" id="20100216-16:23"></a>
<a name="%20Reseting%20sequences%2E%20All%20of%20them%2C%20please%21" id="%20Reseting%20sequences%2E%20All%20of%20them%2C%20please%21"></a>
So, after restoring a production dump with intermediate filtering, none of
our sequences were set to the right value. I could have tried to review the
process of filtering the dump here, but it's a <em>one-shot</em> action and you know
what that sometimes means. With some pressure you don't script enough of it
and you just crawl more and more.</p>

<p>Still, I think how I solved it is worthy of a blog entry. Not that it's
about a super unusual <em>clever</em> trick, quite the contrary, because questions
involving this trick are often encountered on the support <code>IRC</code>.</p>

<p>The idea is to query the catalog for all sequences, and produce from there
the <code>SQL</code> command you will have to issue for each of them. Once you have this
query, it's quite easy to arrange from the <code>psql</code> prompt as if you had dynamic
scripting capabilities. Of course in <code>9.0</code> you will have <em>inline anonymous</em> <code>DO</code>
blocks.</p>

<pre class="src">
#&gt; \o /tmp/sequences.sql
#&gt; \t
Showing only tuples.
#&gt; YOUR QUERY HERE
#&gt; \o
#&gt; \t
Tuples only is off.
</pre>

<p>Once you have the <code>/tmp/sequences.sql</code> file, you can ask <code>psql</code> to execute its
commands the way you're used to, that is using <code>\i</code> in an explicit transaction block.</p>
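
<p>At the <code>psql</code> prompt, that would be something like the following, using the
same prompt as in the session above:</p>

<pre class="src">
#&gt; begin;
#&gt; \i /tmp/sequences.sql
#&gt; commit;
</pre>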

<p>Now, the interesting part if you got here attracted by the blog entry title
is in fact the query itself. A nice way to start is to <code>\set ECHO_HIDDEN</code> then
describe some table, you now have a catalog example query to work with. Then
you tweak it somehow and get this:</p>

<pre class="src">
  <span style="color: #729fcf; font-weight: bold;">SELECT</span> <span style="color: #ad7fa8; font-style: italic;">'select '</span>
          || <span style="color: #729fcf;">trim</span>(<span style="color: #729fcf; font-weight: bold;">trailing</span> <span style="color: #ad7fa8; font-style: italic;">')'</span>
             <span style="color: #729fcf; font-weight: bold;">from</span> replace(pg_get_expr(d.adbin, d.adrelid),
                          <span style="color: #ad7fa8; font-style: italic;">'nextval'</span>, <span style="color: #ad7fa8; font-style: italic;">'setval'</span>))
          || <span style="color: #ad7fa8; font-style: italic;">', (select max( '</span> || a.attname || <span style="color: #ad7fa8; font-style: italic;">') from only '</span>
          || nspname || <span style="color: #ad7fa8; font-style: italic;">'.'</span> || relname || <span style="color: #ad7fa8; font-style: italic;">'));'</span>
    <span style="color: #729fcf; font-weight: bold;">FROM</span> pg_class c
         <span style="color: #729fcf; font-weight: bold;">JOIN</span> pg_namespace n <span style="color: #729fcf; font-weight: bold;">on</span> n.oid = c.relnamespace
         <span style="color: #729fcf; font-weight: bold;">JOIN</span> pg_attribute a <span style="color: #729fcf; font-weight: bold;">on</span> a.attrelid = c.oid
         <span style="color: #729fcf; font-weight: bold;">JOIN</span> pg_attrdef d <span style="color: #729fcf; font-weight: bold;">on</span> d.adrelid = a.attrelid
                            <span style="color: #729fcf; font-weight: bold;">and</span> d.adnum = a.attnum
                            <span style="color: #729fcf; font-weight: bold;">and</span> a.atthasdef
  <span style="color: #729fcf; font-weight: bold;">WHERE</span> relkind = <span style="color: #ad7fa8; font-style: italic;">'r'</span> <span style="color: #729fcf; font-weight: bold;">and</span> a.attnum &gt; 0
        <span style="color: #729fcf; font-weight: bold;">and</span> pg_get_expr(d.adbin, d.adrelid) ~ <span style="color: #ad7fa8; font-style: italic;">'^nextval'</span>;
</pre>
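
<p>Each row of the result is a statement shaped like
<code>select setval('some_seq'::regclass, (select max( id) from only some_schema.some_table));</code>,
where the names are made up for the example. With the <code>DO</code> blocks of <code>9.0</code>, the
same query could be run directly instead of going through a file. This is a
minimal sketch of that idea, reusing the query above as is:</p>

<pre class="src">
DO $do$
DECLARE
  v_sql text;
BEGIN
  FOR v_sql IN
    SELECT 'select '
           || trim(trailing ')'
              from replace(pg_get_expr(d.adbin, d.adrelid),
                           'nextval', 'setval'))
           || ', (select max( ' || a.attname || ') from only '
           || nspname || '.' || relname || '));'
      FROM pg_class c
           JOIN pg_namespace n on n.oid = c.relnamespace
           JOIN pg_attribute a on a.attrelid = c.oid
           JOIN pg_attrdef d on d.adrelid = a.attrelid
                              and d.adnum = a.attnum
                              and a.atthasdef
     WHERE relkind = 'r' and a.attnum &gt; 0
           and pg_get_expr(d.adbin, d.adrelid) ~ '^nextval'
  LOOP
    RAISE NOTICE '%', v_sql;  -- show each statement before running it
    EXECUTE v_sql;            -- the value returned by setval() is discarded
  END LOOP;
END;
$do$;
</pre>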

<p>Coming next, a <code>recode</code> based script in order to get from <code>SQL_ASCII</code> to <code>UTF-8</code>,
and some strange looking queries too.</p>

<pre class="src">
recode.sh [-npdf0TI] [-U user ] -s schema [-m mintable] pattern
</pre>

<p>Stay tuned!</p>

]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Tue, 16 Feb 2010 16:23:00 CET</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20Resetting%20sequences%2E%20All%20of%20them%2C%20please%21</guid>

</item>

<item>
<title> pg_staging's bird view</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20pg_staging%27s%20bird%20view</link>
<description><![CDATA[
<p><a name="20091208-12:04" id="20091208-12:04"></a>
<a name="%20pg_staging%27s%20bird%20view" id="%20pg_staging%27s%20bird%20view"></a>
The most important piece of feedback I got about the presentation of <a href="pgstaging.html">pgstaging</a>
was the lack of pictures, something like a bird's-eye view of how you operate
it. Well, thanks to <a href="http://ditaa.sourceforge.net/">ditaa</a> and Emacs <code>picture-mode</code> here it is:</p>

<center>
<p><img src="../images/pg_staging.png" alt=""></p>
</center>

<p>Hope you enjoy, it should not be necessary to comment much if I got to the
point!</p>

<p>Of course I committed the <a href="http://github.com/dimitri/pg_staging/blob/master/bird-view.txt">text source file</a> to the <code>Git</code> repository. The only
problem I ran into is that <code>ditaa</code> defaults to outputting a quite big right
margin containing only white pixels, and that didn't fit well, visually, in
this blog. So I had to resort to the <a href="http://www.imagemagick.org/script/command-line-options.php#crop">ImageMagick crop command</a> in order to avoid
any mouse usage in the production of this diagram.</p>

<pre class="src">
convert .../pg_staging/bird-view.png -crop <span style="color: #ad7fa8; font-style: italic;">'!550'</span> bird-view.png
mv bird-view-0.png pg_staging.png
</pre>

<p>Quicker than learning to properly use a mouse, at least for me :)</p>

]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Tue, 08 Dec 2009 12:04:00 CET</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20pg_staging%27s%20bird%20view</guid>

</item>

<item>
<title> PGday.eu feedback</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20PGday%2Eeu%20feedback</link>
<description><![CDATA[
<p><a name="20091201-16:45" id="20091201-16:45"></a>
<a name="%20PGday%2Eeu%20feedback" id="%20PGday%2Eeu%20feedback"></a>
At <a href="http://2009.pgday.eu/">pgday</a> there was this form you could fill to give speakers some <em>feedback</em>
about their talks. And that's a really nice way as a speaker to know what to
improve. And as <a href="http://blog.hagander.net/archives/157-Feedback-from-pgday.eu.html">Magnus</a> was searching for a nice looking chart facility in python
and I spoke about <a href="http://matplotlib.sourceforge.net/gallery.html">matplotlib</a>, I felt I had to publish something.</p>

<p>Here is my try at some nice graphics. Well I'll let you decide how nice the
result is:</p>

<center>
<p><a class="image-link" href="../images/feedback.png">
<img src="../images/feedback.png"></a></p>
</center>

<p>If you want to see the little python script I used, here it is: <a href="http://pgsql.tapoueh.org/confs/pgday_2009/feedback.py">feedback.py</a>,
with the data embedded and all...</p>

<p>Now, how to read it? Well, the darker the color the better the score. For
example I had <code>5</code> people score me <code>5</code> for <em>Topic Importance</em> on the Hi-Media talk
(in French) and only <code>3</code> people at this same score and topic for the <code>pg_staging</code>
talk. The scores are from <code>1</code> to <code>5</code>, <code>5</code> being the best.</p>

<p>The committee accepted interesting enough topics and it seems I managed to
deliver acceptable content from there. Not very good content, because
reading the comments I missed some nice bird's-eye pictures to help the
audience get into the subject. As I'm unable to draw (with or without a
mouse) I plan to fix this in later talks by using <a href="http://ditaa.sourceforge.net/">ditaa</a>, the <em>DIagrams
Through Ascii Art</em> tool. I already used it and together with <a href="news.dim.html">Emacs</a>
<code>picture-mode</code> it's very nice.</p>

<p>Oh yes, the bottom line of this post is that there will be later talks. I seem
to be liking those and the audience feedback this time is saying that it's
not too bad for them. See you soon :)</p>

]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Tue, 01 Dec 2009 16:45:00 CET</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20PGday%2Eeu%20feedback</guid>

</item>

<item>
<title> prefix 1.1.0</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20prefix%201%2E1%2E0</link>
<description><![CDATA[
<p><a name="20091130-12:10" id="20091130-12:10"></a>
<a name="%20prefix%201%2E1%2E0" id="%20prefix%201%2E1%2E0"></a>
So I had two <a href="http://archives.postgresql.org/pgsql-general/2009-11/msg01042.php">bug</a> <a href="http://lists.pgfoundry.org/pipermail/prefix-users/2009-November/000005.html">reports</a> about <a href="prefix.html">prefix</a> in less than a week. It means several
things, one of them is that my code is getting used in the wild, which is
nice. The other side of the coin is that people do find bugs in there. This
one is about the behavior of the <code>btree opclass</code> of the type <code>prefix range</code>. We
cheat a lot there by simply having written one, because a range does not
have a strict ordering: is <code>[1-3]</code> before or after <code>[2-4]</code>? But when you know
you have no overlapping intervals in your <code>prefix_range</code> column, being able to
have it part of a <em>primary key</em> is damn useful.</p>

<p>Note: in <code>8.5</code> we should have a way to express <em>exclusion constraints</em> and have
PostgreSQL forbid overlapping entries for us. Not being there yet, you
could write a <em>constraint trigger</em> and use the <em>GiST index</em> to have nice speed
there, which is exactly what this <em>exclusion constraint</em> support is about.</p>

<p>It turns out the code change required is pretty simple:</p>

<pre class="src">
-    <span style="color: #729fcf; font-weight: bold;">return</span> (a-&gt;first == b-&gt;first) ? (a-&gt;last - b-&gt;last) : (a-&gt;first - b-&gt;first);
+    <span style="color: #888a85;">/*</span><span style="color: #888a85;">
+     * we are comparing e.g. '1' and '12' (the shorter contains the
+     * smaller), so let's pretend '12' &lt; '1' as it contains less elements.
+     </span><span style="color: #888a85;">*/</span>
+    <span style="color: #729fcf; font-weight: bold;">return</span> (alen == mlen) ? 1 : -1;
</pre>

<p>This happens in the <em>compare support function</em> (see
<a href="http://www.postgresql.org/docs/8.4/interactive/xindex.html">Interfacing Extensions To Indexes</a>) so that means you now have to rebuild
your <code>prefix_range</code> btree indexes, hence the version number bump.</p>

]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Mon, 30 Nov 2009 12:10:00 CET</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20prefix%201%2E1%2E0</guid>

</item>

<item>
<title> Yet Another PostgreSQL tool hits debian</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20Yet%20Another%20PostgreSQL%20tool%20hits%20debian</link>
<description><![CDATA[
<p><a name="20091125-11:49" id="20091125-11:49"></a>
<a name="%20Yet%20Another%20PostgreSQL%20tool%20hits%20debian" id="%20Yet%20Another%20PostgreSQL%20tool%20hits%20debian"></a>
So there it is, this newer contribution of mine that I presented at <a href="http://2009.pgday.eu">PGDay</a> is
now in the <code>debian NEW</code> queue. <a href="pgstaging.html">pg_staging</a> will empower you with respect to what
you do about those nightly backups (<code>pg_dump -Fc</code> or something).</p>

<p>The tool provides a lot of commands to either <code>dump</code> or <code>restore</code> a database. It
comes with documentation covering just about everything, except for the <em>londiste</em>
support part, which will be there in time for the <code>1.0.0</code> release. The <a href="http://github.com/dimitri/pg_staging/blob/master/TODO">Todo list</a>
is getting smaller and smaller, and the version you'll soon find in <code>debian sid</code>
is already called <code>0.9</code>.</p>

<p>So, how do you go about using this software, and what service does it implement?</p>

<h3>it's all about deriving a staging environment from your backups</h3>

<p class="first">To validate backups, you want to restore them and check the database you get
from them. And your developers will sometimes want to refresh the database
they're working with. And you could have both an integration environment and
a pre-live one: on the former, you develop new code atop a stable set of
data; while on the latter you test stable enough code (ready to go live) on
a set of data as close to live data as possible.</p>

<p>And you want to be flexible about it, so that it's not a full-time job to
handle restoring databases each and every day, for project A integration or
project B pre-live testing, or project C accounting snapshot. Or you name
it.</p>

<p>And of course you want to have a single point of control for all your
databases. Let's call it the <em>controller</em>.</p>


<h3>setting up pg_staging</h3>

<p class="first">The <a href="pgstaging.html">pg_staging</a> setup consists of one <code>pg_staging.ini</code> file wherein you
describe your different target databases (those <code>dev</code> and <code>prelive</code> ones), and
of course where to get the production backups from. Currently you have to
serve the backup files in a format suitable for <code>pg_restore</code> (that means you
use either <code>pg_dump -Ft</code> or <code>pg_dump -Fc</code>) from an <code>apache</code> folder. The <code>HTML</code>
directory listing that <code>apache</code> produces will get parsed.</p>

<p>So you set up the <code>DEFAULT</code> section with common settings, then one section per
target: the databases you want to restore. Tell <code>pg_staging</code> where they are
(<code>host</code>), etc, and it'll be able to drive them.</p>

<p>In order to be able to host more than a single restored dump on a staging
server, for the same database, we use <code>pgbouncer</code>:</p>

<pre class="src">
pg_staging&gt; pgbouncer some_db.dev
              some_db      some_db_20091029 :5432
     some_db_20090717      some_db_20090717 :5432
     some_db_20091029      some_db_20091029 :5432
</pre>

<p>As explained in the <code>pg_staging(1)</code> man page, you have to allow
non-interactive <code>SSH</code> connections from the <em>controller</em> to the <em>hosts</em> where the
databases will get restored. Then you have to do a minimal pgbouncer setup
on the <em>hosts</em> with a <code>trust</code> connection. <code>pg_staging</code> will use it to add
newly restored databases and make them accessible. Then you can also
<code>switch</code> the new database to being the virtual <em>some_db</em> so that you avoid
editing any connection string in your software.</p>

<p>Also, install the <code>pgstaging-client</code> package on every host you target. The
client is a simple shell script that must run as root (<code>sudo</code> is used) in
order to replace your <code>pgbouncer</code> setup or manage your <code>londiste</code> services.</p>

<p>See <code>man 5 pg_staging</code> for available options, including <em>schemas</em> to either
filter out completely or restore without their data.</p>


<h3>pg_staging usage</h3>

<p class="first">Now that you're all set up, you can begin to enjoy using <code>pg_staging</code>. Enter the
console and see what you have in there.</p>

<pre class="src">
$ pg_staging
Welcome to pg_staging 0.9.
pg_staging&gt; databases
...
pg_staging&gt; restore some_db.dev
...
pg_staging&gt; pgbouncer some_db.dev
...
pg_staging&gt; dbsizes --all some_db.dev
...
pg_staging&gt; psql some_db.dev
some_db_20091125=#
</pre>

<p>And as you can see in <code>man pg_staging</code> there are a lot of commands
already. You can for example obtain a new <em>pg_restore catalog</em> from a dump
file, with some <em>schemas</em> commented out. It will even comment out <code>triggers</code>
that are using a <code>function</code> which is defined in a filtered out <code>schema</code>, for
example a <code>PGQ</code> trigger. And much much more.</p>

<p><a href="pgstaging.html">pg_staging</a> will even allow you to <code>dump</code> your production databases, but
consider installing a separate instance of it on the machine serving the
backups to your local network thanks to an <code>apache</code> directory listing!</p>


<h3>Roadmap to <code>1.0.0</code></h3>

<p class="first">What remains to be done is testing and getting <code>PITR</code> based restoring to work,
and adding some documentation (a tutorial, which is what this blog post is about; and
<em>londiste</em> support). At this point, unless some reader here asks for a new
feature (set), I'll consider <code>pg_staging</code> ready for <code>1.0.0</code>. After all, we're
using it about daily here :)</p>

<p>Consider commenting, you should be able to easily spot my private mail
address...</p>



]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Wed, 25 Nov 2009 11:49:00 CET</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20Yet%20Another%20PostgreSQL%20tool%20hits%20debian</guid>

</item>

<item>
<title> PGDay.eu, Paris: it was awesome!</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20PGDay%2Eeu%2C%20Paris%3A%20it%20was%20awesome%21</link>
<description><![CDATA[
<p><a name="20091109-09:50" id="20091109-09:50"></a>
<a name="%20PGDay%2Eeu%2C%20Paris%3A%20it%20was%20awesome%21" id="%20PGDay%2Eeu%2C%20Paris%3A%20it%20was%20awesome%21"></a>
<a href="http://2009.pgday.eu/">PGDay.eu</a> was held this week-end in Paris, and it really was a great
moment. Lots of <a href="http://2009.pgday.eu/_media/group_2009_1.jpg?cache=">attendees</a>, lots of quality talks (<a href="http://wiki.postgresql.org/wiki/PGDay.EU%2C_Paris_2009">slides</a> are online), good
food, great party: all the ingredients were there!</p>

<p>It also was for me the occasion to first talk about this tool I've been
working on for months, called <a href="pgstaging.html">pg_staging</a>, which aims to empower those boring
production backups to help maintaining <em>staging</em> environments (for your
developers and testers).</p>

<p>All in all such events keep reminding me what it means exactly when we say
that one of the greatest things about <a href="http://www.postgresql.org/">PostgreSQL</a> is its community. If you
don't know what I'm talking about, consider <a href="http://www.postgresql.org/community/">joining</a>!</p>

]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Mon, 09 Nov 2009 09:50:00 CET</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20PGDay%2Eeu%2C%20Paris%3A%20it%20was%20awesome%21</guid>

</item>

<item>
<title> prefix 1.0.0</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20prefix%201%2E0%2E0</link>
<description><![CDATA[
<p><a name="20091006-15:56" id="20091006-15:56"></a>
<a name="%20prefix%201%2E0%2E0" id="%20prefix%201%2E0%2E0"></a>
So there it is, at long last, the final <code>1.0.0</code> release of prefix! It's on its
way into the debian repository (targeting sid, in testing in 10 days) and
available on <a href="http://pgfoundry.org/frs/?group_id=1000352">pgfoundry</a> too.</p>

<p>In order to make it clear that I intend to maintain this version, the number
has 3 digits rather than 2... which is also what <a href="http://www.postgresql.org/support/versioning">PostgreSQL</a> users will
expect.</p>

<p>The only last minute change is that you can now use the second of the two
following forms (the <code>+</code> line) rather than the first one:</p>

<pre class="src">
-  <span style="color: #729fcf; font-weight: bold;">create</span> index idx_prefix <span style="color: #729fcf; font-weight: bold;">on</span> prefixes <span style="color: #729fcf; font-weight: bold;">using</span> gist(<span style="color: #729fcf; font-weight: bold;">prefix</span> gist_prefix_range_ops);
+  <span style="color: #729fcf; font-weight: bold;">create</span> index idx_prefix <span style="color: #729fcf; font-weight: bold;">on</span> prefixes <span style="color: #729fcf; font-weight: bold;">using</span> gist(<span style="color: #729fcf; font-weight: bold;">prefix</span>);
</pre>

<p>For your information, I'm thinking about leaving <code>pgfoundry</code> as far as the
source code management goes, because I'd like to be done with <code>CVS</code>. I'd still
use the release file hosting though at least for now. It's a burden but it's
easier for the users to find them, when they are not using plain <code>apt-get
install</code>. That move would lead to host <a href="http://pgfoundry.org/projects/prefix/">prefix</a> and <a href="http://pgfoundry.org/projects/pgloader">pgloader</a> and the <a href="http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/backports/">backports</a>
over there at <a href="http://github.com/dimitri">github</a>, where my next pet project, <code>pg_staging</code>, will be hosted
too.</p>

<p>The way I see this move away from <em>pgfoundry</em> is that if everybody does the same,
then migrating the facility to some better or more recent hosting software
will be easier. Maybe some other parts of the system are harder to migrate
than the sources, though. If that's the case I'll consider moving them out
too; maybe getting listed on the <a href="http://www.postgresql.org/download/product-categories">PostgreSQL Software Catalogue</a> will prove
enough as far as web presence goes?</p>

]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Tue, 06 Oct 2009 16:56:00 CEST</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20prefix%201%2E0%2E0</guid>

</item>

<item>
<title> hstore-new &amp; preprepare reach debian too</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20hstore%2Dnew%20%26%20preprepare%20reach%20debian%20too</link>
<description><![CDATA[
<p><a name="20090818-09:14" id="20090818-09:14"></a>
<a name="%20hstore%2Dnew%20%26%20preprepare%20reach%20debian%20too" id="%20hstore%2Dnew%20%26%20preprepare%20reach%20debian%20too"></a>
It seems like debian developers are back from annual conference and holiday,
so they have had a look at the <code>NEW</code> queue and processed the packages in
there. Two of them were mine and waiting to get in <code>unstable</code>, <a href="http://packages.debian.org/hstore-new">hstore-new</a> and
<a href="http://packages.debian.org/preprepare">preprepare</a>.</p>

<p>Time to do some bug fixing already, as the <code>hstore-new</code> packaging is using a
<em>bash'ism</em> I shouldn't rely on (or so the debian buildfarm is <a href="https://buildd.debian.org/~luk/status/package.php?p=hstore-new">telling me</a>) and
for <code>preprepare</code> I was waiting for inclusion before improving the <code>GUC</code>
management, stealing some code from <a href="http://blog.endpoint.com/search/label/postgres">Selena</a>'s <a href="http://blog.endpoint.com/2009/07/pggearman-01-release.html">pgGearman</a> :)</p>

<p>As some of you wonder about <code>prefix 1.0</code> scheduling, it should soon get there
now that it's been in testing long enough and no bug has been reported. Of course
releasing <code>1.0</code> in August isn't good timing, so maybe I should just wait a few
more weeks.</p>

]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Tue, 18 Aug 2009 10:14:00 CEST</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20hstore%2Dnew%20%26%20preprepare%20reach%20debian%20too</guid>

</item>

<item>
<title> prefix 1.0~rc2 in debian testing</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20prefix%201%2E0%7Erc2%20in%20debian%20testing</link>
<description><![CDATA[
<p><a name="20090803-14:50" id="20090803-14:50"></a>
<a name="%20prefix%201%2E0%7Erc2%20in%20debian%20testing" id="%20prefix%201%2E0%7Erc2%20in%20debian%20testing"></a>
At long last, <a href="http://packages.debian.org/search?searchon=sourcenames&amp;keywords=prefix">here it is</a>. With binary versions both for <code>postgresql-8.3</code> and
<code>postgresql-8.4</code>! Unfortunately my other packaging efforts are still waiting
on the <code>NEW</code> queue, but I hope to soon see <code>hstore-new</code> and <code>preprepare</code> enter
debian too.</p>

<p>Anyway, the plan for <code>prefix</code> is to now wait something like 2 weeks, then,
barring showstopper bugs, release the <code>1.0</code> final version. If you have a use
for it, now is a good time for testing it!</p>

<p>About upgrading a current <code>prefix</code> installation, the advice is to save data as
<code>text</code> instead of <code>prefix_range</code>, remove prefix support, install the new version,
then change the column's data type back:</p>

<pre class="src">
<span style="color: #729fcf; font-weight: bold;">BEGIN</span>;
  <span style="color: #729fcf; font-weight: bold;">ALTER</span> <span style="color: #729fcf; font-weight: bold;">TABLE</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">foo</span>
     <span style="color: #729fcf; font-weight: bold;">ALTER</span> <span style="color: #729fcf; font-weight: bold;">COLUMN</span> <span style="color: #729fcf; font-weight: bold;">prefix</span>
             <span style="color: #729fcf; font-weight: bold;">TYPE</span> text <span style="color: #729fcf; font-weight: bold;">USING</span> text(<span style="color: #729fcf; font-weight: bold;">prefix</span>);

  <span style="color: #729fcf; font-weight: bold;">DROP</span> <span style="color: #729fcf; font-weight: bold;">TYPE</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">prefix_range</span> <span style="color: #729fcf; font-weight: bold;">CASCADE</span>;
  \i prefix.sql

  <span style="color: #729fcf; font-weight: bold;">ALTER</span> <span style="color: #729fcf; font-weight: bold;">TABLE</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">foo</span>
     <span style="color: #729fcf; font-weight: bold;">ALTER</span> <span style="color: #729fcf; font-weight: bold;">COLUMN</span> <span style="color: #729fcf; font-weight: bold;">prefix</span>
             <span style="color: #729fcf; font-weight: bold;">TYPE</span> prefix_range <span style="color: #729fcf; font-weight: bold;">USING</span> prefix_range(<span style="color: #729fcf; font-weight: bold;">prefix</span>);

  <span style="color: #729fcf; font-weight: bold;">CREATE</span> INDEX idx_foo_prefix <span style="color: #729fcf; font-weight: bold;">ON</span> foo
         <span style="color: #729fcf; font-weight: bold;">USING</span> gist(<span style="color: #729fcf; font-weight: bold;">prefix</span> gist_prefix_range_ops);
<span style="color: #729fcf; font-weight: bold;">COMMIT</span>;
</pre>

<p>Note: I just added the <code>gist_prefix_range_ops</code> as default for type
<code>prefix_range</code> so it'll be optional to specify this in final <code>1.0</code>. I got so
used to typing it I didn't realize we don't have to :)</p>

]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Mon, 03 Aug 2009 15:50:00 CEST</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20prefix%201%2E0%7Erc2%20in%20debian%20testing</guid>

</item>

<item>
<title> prefix 1.0~rc2-1</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20prefix%201%2E0%7Erc2%2D1</link>
<description><![CDATA[
<p><a name="20090709-12:48" id="20090709-12:48"></a>
<a name="%20prefix%201%2E0%7Erc2%2D1" id="%20prefix%201%2E0%7Erc2%2D1"></a>
I've been having problems with building both <code>postgresql-8.3-prefix</code> and
<code>postgresql-8.4-prefix</code> debian packages from the same source package, and
fixing the packaging issue forced me into modifying the main <code>prefix</code>
<code>Makefile</code>. So while reaching <code>rc2</code>, I tried to think about missing pieces easy
to add this late in the game: and there's one, that's a function
<code>length(prefix_range)</code>, so that you no longer have to cast to text in the
following widespread query:</p>

<pre class="src">
  <span style="color: #729fcf; font-weight: bold;">SELECT</span> foo, bar
    <span style="color: #729fcf; font-weight: bold;">FROM</span> prefixes
   <span style="color: #729fcf; font-weight: bold;">WHERE</span> <span style="color: #729fcf; font-weight: bold;">prefix</span> @&gt; <span style="color: #ad7fa8; font-style: italic;">'012345678'</span>
<span style="color: #729fcf; font-weight: bold;">ORDER</span> <span style="color: #729fcf; font-weight: bold;">BY</span> <span style="color: #729fcf; font-weight: bold;">length</span>(<span style="color: #729fcf; font-weight: bold;">prefix</span>) <span style="color: #729fcf; font-weight: bold;">DESC</span>
   <span style="color: #729fcf; font-weight: bold;">LIMIT</span> 1;
</pre>

<p>And here's a simple stupid benchmark of the new function, which is available in
<a href="http://prefix.projects.postgresql.org/prefix-1.0~rc2.tar.gz">prefix-1.0~rc2.tar.gz</a>. And it'll soon reach debian, if my QA dept agrees (my
<a href="http://julien.danjou.info/blog/">sponsor</a> is a QA dept all by himself!).</p>

<p>First some preparation:</p>

<pre class="src">
dim=#   <span style="color: #729fcf; font-weight: bold;">create</span> <span style="color: #729fcf; font-weight: bold;">table</span> prefixes (
dim(#          <span style="color: #729fcf; font-weight: bold;">prefix</span>    prefix_range <span style="color: #729fcf; font-weight: bold;">primary</span> <span style="color: #729fcf; font-weight: bold;">key</span>,
dim(#          <span style="color: #729fcf; font-weight: bold;">name</span>      text <span style="color: #729fcf; font-weight: bold;">not</span> <span style="color: #729fcf; font-weight: bold;">null</span>,
dim(#          shortname text,
dim(#          status    <span style="color: #8ae234; font-weight: bold;">char</span> <span style="color: #729fcf; font-weight: bold;">default</span> <span style="color: #ad7fa8; font-style: italic;">'S'</span>,
dim(#
dim(#          <span style="color: #729fcf; font-weight: bold;">check</span>( status <span style="color: #729fcf; font-weight: bold;">in</span> (<span style="color: #ad7fa8; font-style: italic;">'S'</span>, <span style="color: #ad7fa8; font-style: italic;">'R'</span>) )
dim(#   );
NOTICE:  <span style="color: #729fcf; font-weight: bold;">CREATE</span> <span style="color: #729fcf; font-weight: bold;">TABLE</span> / <span style="color: #729fcf; font-weight: bold;">PRIMARY</span> <span style="color: #729fcf; font-weight: bold;">KEY</span> will <span style="color: #729fcf; font-weight: bold;">create</span> implicit index "prefixes_pkey" <span style="color: #729fcf; font-weight: bold;">for</span>
 <span style="color: #729fcf; font-weight: bold;">table</span> "prefixes"
<span style="color: #729fcf; font-weight: bold;">CREATE</span> <span style="color: #729fcf; font-weight: bold;">TABLE</span>
<span style="color: #8ae234; font-weight: bold;">Time</span>: 74,357 ms
dim=#   \copy prefixes <span style="color: #729fcf; font-weight: bold;">from</span> <span style="color: #ad7fa8; font-style: italic;">'prefixes.fr.csv'</span> <span style="color: #729fcf; font-weight: bold;">with</span> delimiter ; csv quote <span style="color: #ad7fa8; font-style: italic;">'"'</span>
<span style="color: #8ae234; font-weight: bold;">Time</span>: 200,982 ms
dim=# <span style="color: #729fcf; font-weight: bold;">select</span> <span style="color: #729fcf;">count</span>(*) <span style="color: #729fcf; font-weight: bold;">from</span> prefixes ;
 <span style="color: #729fcf;">count</span>
<span style="color: #888a85;">-------
</span> 11966
(1 <span style="color: #8ae234; font-weight: bold;">row</span>)
<span style="color: #8ae234; font-weight: bold;">Time</span>: 3,047 ms
</pre>

<p>And now for the micro-benchmark:</p>

<pre class="src">
dim=# \o /dev/<span style="color: #729fcf; font-weight: bold;">null</span>
dim=# <span style="color: #729fcf; font-weight: bold;">select</span> <span style="color: #729fcf; font-weight: bold;">length</span>(<span style="color: #729fcf; font-weight: bold;">prefix</span>) <span style="color: #729fcf; font-weight: bold;">from</span> prefixes;
<span style="color: #8ae234; font-weight: bold;">Time</span>: 16,040 ms
dim=# <span style="color: #729fcf; font-weight: bold;">select</span> <span style="color: #729fcf; font-weight: bold;">length</span>(<span style="color: #729fcf; font-weight: bold;">prefix</span>::text) <span style="color: #729fcf; font-weight: bold;">from</span> prefixes;
<span style="color: #8ae234; font-weight: bold;">Time</span>: 23,364 ms
dim=# \o
</pre>

<p>Hope you enjoy!</p>

]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Thu, 09 Jul 2009 13:48:00 CEST</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20prefix%201%2E0%7Erc2%2D1</guid>

</item>

<item>
<title> prefix extension reaches 1.0 (rc1)</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20prefix%20extension%20reaches%201%2E0%20</link>
<description><![CDATA[
<p><a name="20090623-10:53" id="20090623-10:53"></a>
<a name="%20prefix%20extension%20reaches%201%2E0%20" id="%20prefix%20extension%20reaches%201%2E0%20"></a>
At long last, after millions and millions of queries just here at work and
some more in other places, the <a href="prefix.html">prefix</a> project is reaching <code>1.0</code> milestone. The
release candidate is getting uploaded into debian at the moment of this
writing, and available at the following place: <a href="http://prefix.projects.postgresql.org/prefix-1.0~rc1.tar.gz">prefix-1.0~rc1.tar.gz</a>.</p>

<p>If you have any use for it (as some <em>VoIP</em> companies have already), please
consider testing it, in order for me to release a shiny <code>1.0</code> next week! :)</p>

<p>Recent changes include getting rid of those square brackets in the output when they're
not necessary, fixing btree operators, and adding support for more operators in
the <code>GiST</code> support code (now supported: <code>@&gt;</code>, <code>&lt;@</code>, <code>=</code>, <code>&amp;&amp;</code>). Enjoy!</p>

]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Tue, 23 Jun 2009 11:53:00 CEST</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20prefix%20extension%20reaches%201%2E0%20</guid>

</item>

<item>
<title> PgCon 2009</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20PgCon%202009</link>
<description><![CDATA[
<p><a name="20090527" id="20090527"></a>
<a name="%20PgCon2009" id="%20PgCon2009"></a>
I can't really compare <a href="http://www.pgcon.org/2009/">PgCon 2009</a> with previous years' editions: the last time I
enjoyed the event was in 2006, in Toronto. But still I found the
experience to be a great one, and I hope I'll be there next year too!</p>

<p>I've met a lot of well-known people in the community, some of whom I already had
the chance to run into in Toronto or <a href="http://2008.pgday.org/en/">Prato</a>, but this was the first time I
got to talk to many of them about interesting projects and ideas. That alone
was awesome already, and we also had a lot of talks to listen to: as others
have said, it was really hard to choose to go to only one place out
of three.</p>

<p>I'm now back home and seem to be recovering quite fine from jet lag, and I've
even begun to work through the todo list from the conference. It includes mainly
<code>Skytools 3</code> testing and contributions (code and documentation),
<a href="http://wiki.postgresql.org/wiki/ExtensionPackaging">Extension Packaging</a> work (Stephen Frost seems to be willing to help, which I
highly appreciate) beginning with <a href="http://archives.postgresql.org/pgsql-hackers/2009-05/msg00912.php">search_path issues</a>, and posting some
backtrace to help fix some <a href="http://archives.postgresql.org/pgsql-hackers/2009-05/msg00923.php">SPI_connect()</a> bug at <code>_PG_init()</code> time in an
extension.</p>

<p>The excellent <a href="http://wiki.postgresql.org/wiki/PgCon_2009_Lightning_talks">lightning talk</a> about <u>How not to Review a Patch</u> by Joshua
Tolley took me out of the <em>dim</em>, I'll try to be <em>bright</em> enough and participate
as a reviewer in later commit fests (well maybe not the first next ones as
some personal events on the agenda will take all my <em>&quot;free&quot;</em> time)...</p>

<p>Oh and the <a href="http://code.google.com/p/golconde/">Golconde</a> presentation gave some insights too: this queueing based
solution is to be compared to the <code>listen/notify</code> mechanisms we already have in
<a href="http://www.postgresql.org/docs/current/static/sql-listen.html">PostgreSQL</a>, in the sense that it's not transactional, and the events are
kept in memory only to achieve very high distribution rates. So it's a very
fine solution to manage a distributed caching system, for example, but not
so much for asynchronous replication (you must not replicate events tied
to rolled-back transactions).</p>

<p>So all in all, spending last week in Ottawa was a splendid way to get more
involved in the PostgreSQL community, which is a very fine place to be
spending one's free time, should you ask me. See you soon!</p>

]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Wed, 27 May 2009 15:30:00 CEST</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20PgCon%202009</guid>

</item>

<item>
<title> Prepared Statements and pgbouncer</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20Prepared%20Statements%20and%20pgbouncer</link>
<description><![CDATA[
<p><a name="20090514" id="20090514"></a>
<a name="%20Prepared%20Statements%20and%20pgbouncer" id="%20Prepared%20Statements%20and%20pgbouncer"></a>
On the performance mailing list, a recent <a href="http://archives.postgresql.org/pgsql-performance/2009-05/msg00026.php">thread</a> drew my attention. It
turned out to be about using connection pooling software and prepared statements
in order to increase the scalability of PostgreSQL when confronted with a lot of
concurrent clients all doing simple <code>select</code> queries. The advantage of the
<em>pooler</em> is to reduce the number of <em>backends</em> needed to serve the queries, thus
reducing PostgreSQL internal bookkeeping. Of course, my choice of software
here is clear: <a href="https://developer.skype.com/SkypeGarage/DbProjects/PgBouncer">PgBouncer</a> is an excellent top grade solution: it performs really
well (it won't parse queries), and it is reliable and flexible.</p>

<p>The problem is that while combining <code>pgbouncer</code> and <a href="http://www.postgresql.org/docs/current/static/sql-prepare.html">prepared statements</a> is
possible, it requires the application to check at connection time if the
statements it's interested in are already prepared. This can be done by a
simple catalog query of this kind:</p>

<pre class="src">
  <span style="color: #729fcf; font-weight: bold;">SELECT</span> <span style="color: #729fcf; font-weight: bold;">name</span>
    <span style="color: #729fcf; font-weight: bold;">FROM</span> pg_prepared_statements
   <span style="color: #729fcf; font-weight: bold;">WHERE</span> <span style="color: #729fcf; font-weight: bold;">name</span> <span style="color: #729fcf; font-weight: bold;">IN</span> (<span style="color: #ad7fa8; font-style: italic;">'my'</span>, <span style="color: #ad7fa8; font-style: italic;">'prepared'</span>, <span style="color: #ad7fa8; font-style: italic;">'statements'</span>);
</pre>

<p>Well, this is simple but requires adding some application logic. What would
be great would be to only have to <code>EXECUTE my_statement(x, y, z)</code> and never
bother if the <code>backend</code> connection is a fresh new one or an existing one, so as
to avoid having to check if the application should <code>prepare</code>.</p>
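
<p>To make the pair of commands concrete, here is a minimal sketch using plain
PostgreSQL <code>PREPARE</code> and <code>EXECUTE</code>. The parameter types and the query body are
made up for illustration; only the <code>my_statement</code> name comes from the text above.</p>

<pre class="src">
-- what the application must do on a fresh backend before EXECUTE works;
-- the parameter types and the query body are hypothetical
PREPARE my_statement(int, int, int) AS
 SELECT $1 + $2 + $3 AS total;

-- what we would like to be able to run blindly on any pooled connection
EXECUTE my_statement(1, 2, 3);
</pre>

<p>Running the <code>EXECUTE</code> on a backend where the <code>PREPARE</code> never happened raises an
error, which is why the check (or something doing the preparing for you) is
needed.</p>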

<p>The <a href="http://preprepare.projects.postgresql.org/">preprepare</a> pgfoundry project is all about this: it comes with a
<code>prepare_all()</code> function which will take all statements present in a given
table (<code>SET preprepare.relation TO 'schema.the_table';</code>) and prepare them for
you. If you now tell <code>pgbouncer</code> to please call the function at <code>backend</code>
creation time, you're done (see <code>connect_query</code>).</p>
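
<p>As a sketch of what that boils down to on each new backend, using only the
setting and function named above (the table is assumed to already hold your
statements in the layout the <code>README</code> describes, and <code>schema.the_table</code> is a
placeholder):</p>

<pre class="src">
-- run once per new backend connection, for example from the query
-- pgbouncer is configured to issue when it opens a server connection
SET preprepare.relation TO 'schema.the_table';
SELECT prepare_all();
</pre>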

<p>There's even a detailed <a href="http://preprepare.projects.postgresql.org/README.html">README</a> file, but no release yet (check out the code
in the <a href="http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/preprepare/preprepare/">CVS</a>, the <code>pgfoundry</code> project page has <a href="http://pgfoundry.org/scm/?group_id=1000442">clear instructions</a> about how to do so).</p>

]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Thu, 14 May 2009 01:00:00 CEST</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20Prepared%20Statements%20and%20pgbouncer</guid>

</item>

<item>
<title> Skytools 3.0 reaches alpha1</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20Skytools%203%2E0%20reaches%20alpha1</link>
<description><![CDATA[
<p><a name="20090414" id="20090414"></a>
<a name="%20Skytools%203%2E0%20reaches%20alpha1" id="%20Skytools%203%2E0%20reaches%20alpha1"></a>
It's time for <a href="http://wiki.postgresql.org/wiki/Skytools">Skytools</a> news again! First, we did improve documentation of
current stable branch with hosting high level presentations and <a href="http://wiki.postgresql.org/wiki/Londiste_Tutorial">tutorials</a> on
the <a href="http://wiki.postgresql.org/">PostgreSQL wiki</a>. Do check out the <a href="http://wiki.postgresql.org/wiki/Londiste_Tutorial">Londiste Tutorial</a>, it seems that's
what people hesitating to try out londiste were missing the most.</p>

<p>The other things people miss a lot in current stable Skytools (version
<code>2.1.9</code> currently) are cascading replication (which allows for <em>switchover</em> and
<em>failover</em>) and <code>DDL</code> support. The new incarnation of skytools, version <code>3.0</code>
<a href="http://lists.pgfoundry.org/pipermail/skytools-users/2009-April/001029.html">reaches alpha1</a> today. It comes with full support for <em>cascading</em> and <em>DDL</em>, so
you might want to give it a try.</p>

<p>It's a rough release, documentation is still to get written for a large part
of it, and bugs are still to get fixed. But it's all in the Skytools spirit:
simple and efficient concepts, easy to use and maintain. Think about this
release as a <em>developer preview</em> and join us :)</p>

]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Tue, 14 Apr 2009 01:00:00 CEST</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20Skytools%203%2E0%20reaches%20alpha1</guid>

</item>

<item>
<title> Prefix GiST index now in 8.1</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20Prefix%20GiST%20index%20now%20in%208%2E1</link>
<description><![CDATA[
<p><a name="20090210" id="20090210"></a>
<a name="%20Prefix%20GiST%20index%20now%20in%208%2E1" id="%20Prefix%20GiST%20index%20now%20in%208%2E1"></a>
The <a href="http://blog.tapoueh.org/prefix.html">prefix</a> project is about matching a <em>literal</em> against <em>prefixes</em> in your
table, the typical example being a telecom routing table. Thanks to the
excellent work around <em>generic</em> indexes in PostgreSQL with <a href="http://www.postgresql.org/docs/current/static/gist-intro.html">GiST</a>, indexing
prefix matches is easy to support in an external module. Which is what
the <a href="http://prefix.projects.postgresql.org/">prefix</a> extension is all about.</p>

<p>Maybe you didn't come across this project before, so here's the typical
query you want to run to benefit from the special indexing, where the <code>@&gt;</code>
operator is read <em>contains</em> or <em>is a prefix of</em>:</p>

<pre class="src">
  <span style="color: #729fcf; font-weight: bold;">SELECT</span> * <span style="color: #729fcf; font-weight: bold;">FROM</span> prefixes <span style="color: #729fcf; font-weight: bold;">WHERE</span> <span style="color: #729fcf; font-weight: bold;">prefix</span> @&gt; <span style="color: #ad7fa8; font-style: italic;">'0123456789'</span>;
</pre>
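
<p>For reference, here's a minimal sketch of the kind of table and index such
a query runs against. It assumes the module's <code>prefix_range</code> data type; the
<code>carrier</code> column and the <code>idx_prefix</code> index name are made up for the example:</p>

<pre class="src">
-- hypothetical routing table, one row per known prefix
CREATE TABLE prefixes (
    prefix  prefix_range NOT NULL,
    carrier text
);

-- the GiST index is what makes the @&gt; lookups fast
CREATE INDEX idx_prefix ON prefixes USING gist (prefix);
</pre>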

<p>Now, a user asked about an <code>8.1</code> version of the module, as it's what some
distributions ship (here, Red Hat Enterprise Linux 5.2). It turned out it
was easy to support <code>8.1</code> when you already support <code>8.2</code>, so the <code>CVS</code> now hosts
<a href="http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/prefix/prefix/">8.1 support code</a>. And here's what the user asking about the feature has to
say:</p>

<blockquote>
<p class="quoted">
It works like a charm now with 3ms queries over 200,000+ rows. The speed
also stays less than 4ms when doing complex queries designed for fallback,
priority shuffling, and having multiple carriers.</p>

</blockquote>

]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Tue, 10 Feb 2009 00:00:00 CET</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20Prefix%20GiST%20index%20now%20in%208%2E1</guid>

</item>

<item>
<title> Importing XML content from file</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20Importing%20XML%20content%20from%20file</link>
<description><![CDATA[
<p><a name="20090205" id="20090205"></a>
<a name="%20Importing%20XML%20content%20from%20file" id="%20Importing%20XML%20content%20from%20file"></a>
The problem was raised this week on <a href="http://www.postgresql.org/community/irc">IRC</a> and this time again I felt it would
be a good occasion for a blog entry: how do you load the content of an <code>XML</code> file
into a single field?</p>

<p>The usual tool used to import files is <a href="http://www.postgresql.org/docs/current/interactive/sql-copy.html">COPY</a>, but it'll want each line of the
file to host a text representation of a database tuple, so it doesn't apply
to the case at hand. <a href="http://blog.rhodiumtoad.org.uk/">RhodiumToad</a> was online and offered the following code
to solve the problem:</p>

<pre class="src">
<span style="color: #729fcf; font-weight: bold;">create</span> <span style="color: #729fcf; font-weight: bold;">or</span> replace <span style="color: #729fcf; font-weight: bold;">function</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">xml_import</span>(filename text)
  <span style="color: #729fcf; font-weight: bold;">returns</span> xml
  volatile
  <span style="color: #729fcf; font-weight: bold;">language</span> plpgsql <span style="color: #729fcf; font-weight: bold;">as</span>
$f$
    <span style="color: #729fcf; font-weight: bold;">declare</span>
        content bytea;
        loid oid;
        lfd <span style="color: #8ae234; font-weight: bold;">integer</span>;
        lsize <span style="color: #8ae234; font-weight: bold;">integer</span>;
    <span style="color: #729fcf; font-weight: bold;">begin</span>
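        <span style="color: #888a85;">-- import the server-side file as a temporary large object</span>
        loid := lo_import(filename);
        <span style="color: #888a85;">-- 262144 is INV_READ (0x40000): open the large object read-only</span>
        lfd := lo_open(loid,262144);
        <span style="color: #888a85;">-- seek to the end (whence 2) to get the size, then rewind to the start</span>
        lsize := lo_lseek(lfd,0,2);
        perform lo_lseek(lfd,0,0);
        content := loread(lfd,lsize);
        perform lo_close(lfd);
        <span style="color: #888a85;">-- the content is in memory now, drop the large object</span>
        perform lo_unlink(loid);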

        <span style="color: #729fcf; font-weight: bold;">return</span> xmlparse(document convert_from(content,<span style="color: #ad7fa8; font-style: italic;">'UTF8'</span>));
    <span style="color: #729fcf; font-weight: bold;">end</span>;
$f$;
</pre>

<p>As you can see, the trick here is to use the <a href="http://www.postgresql.org/docs/current/interactive/largeobjects.html">large objects</a> API to load the
file content into memory (<code>content</code> variable), then to parse it knowing it's
a <code>UTF8</code> encoded <code>XML</code> file and return an <a href="http://www.postgresql.org/docs/current/interactive/datatype-xml.html">XML</a> datatype object.</p>
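
<p>A minimal usage sketch, assuming a hypothetical <code>docs</code> table and a made-up
file path. As <code>lo_import</code> is called server side here, the file is read from the
database server's filesystem, and calling it normally requires superuser
privileges:</p>

<pre class="src">
-- hypothetical target table with a single xml field
CREATE TABLE docs (id serial PRIMARY KEY, content xml);

-- the path is an example and must be readable by the server process
INSERT INTO docs (content)
     VALUES (xml_import('/tmp/example.xml'));
</pre>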

]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Thu, 05 Feb 2009 00:00:00 CET</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20Importing%20XML%20content%20from%20file</guid>

</item>

<item>
<title> Asko Oja talks about Skype architecture</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20Asko%20Oja%20talks%20about%20Skype%20architecture</link>
<description><![CDATA[
<p><a name="20090204" id="20090204"></a>
<a name="%20Asko%20Oja%20talks%20about%20Skype%20architecture" id="%20Asko%20Oja%20talks%20about%20Skype%20architecture"></a>
On this <a href="http://postgresqlrussia.org/articles/view/131">Russian page</a> you'll find a nice presentation of Skype's database
architecture by Asko Oja himself. It's the talk he gave at the Russian PostgreSQL
Community meeting in Moscow in October 2008, and it's a good read.</p>

<center>
<p><a class="image-link" href="http://postgresqlrussia.org/articles/view/131">
<img src="../images/Moskva_DB_Tools.v3.png"></a></p>
</center>

<p>The presentation page is in Russian but the slides are in English, so have a
nice read!</p>

]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Wed, 04 Feb 2009 00:00:00 CET</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20Asko%20Oja%20talks%20about%20Skype%20architecture</guid>

</item>

<item>
<title> Skytools ticker daemon and londiste</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20Skytools%20ticker%20daemon%20and%20londiste</link>
<description><![CDATA[
<p><a name="20090203" id="20090203"></a>
<a name="20090203%20Skytools%20ticker%20daemon%20and%20londiste" id="20090203%20Skytools%20ticker%20daemon%20and%20londiste"></a>
One of the difficulties in getting to understand and configure <code>londiste</code>
resides in the relation between the <code>ticker</code> and the replication. This question
was raised once more on IRC yesterday, so I made a new FAQ entry about it:
<a href="http://blog.tapoueh.org/skytools.html#ticker">How does this ticker thing relate to londiste?</a></p>

]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Tue, 03 Feb 2009 00:00:00 CET</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20Skytools%20ticker%20daemon%20and%20londiste</guid>

</item>

<item>
<title> Comparing Londiste and Slony</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20Comparing%20Londiste%20and%20Slony</link>
<description><![CDATA[
<p><a name="20090131" id="20090131"></a>
<a name="%20Skytools%20ticker%20daemon%20and%20londiste" id="%20Skytools%20ticker%20daemon%20and%20londiste"></a>
On the page about <a href="skytools.html">Skytools</a> I've encouraged people to ask more questions
so that I can try to answer them. That just happened, as
usual on the <code>#postgresql</code> IRC channel, and the question is
<a href="skytools.html#slony">What does londiste lack that slony has?</a></p>

]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Sat, 31 Jan 2009 00:00:00 CET</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20Comparing%20Londiste%20and%20Slony</guid>

</item>

<item>
<title> Controlling HOT usage in 8.3</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20Controling%20HOT%20usage%20in%208%2E3</link>
<description><![CDATA[
<p><a name="20090128" id="20090128"></a>
<a name="%20Controling%20HOT%20usage%20in%208%2E3" id="%20Controling%20HOT%20usage%20in%208%2E3"></a>
As it happens, I've got some environments where I want to make sure <code>HOT</code> (<em>aka
Heap Only Tuples</em>) is in use, because we're doing so many updates per second
that I want to be sure it's not killing my database server. I not only
wrote a checking view for it, but also made a <a href="http://www.postgresql.fr/support:trucs_et_astuces:controler_l_utilisation_de_hot_a_partir_de_la_8.3">quick article</a>
about it on the <a href="http://postgresql.fr/">French PostgreSQL website</a>. Hanging around in <code>#postgresql</code>
means that I'm now bound to write about it in English too!</p>

<p>So <code>HOT</code> will get used each time you update a row without changing any of its
indexed values, and the benefit is skipping index maintenance and, as far as I
understand it, easing <code>vacuum</code>'s hard work too. To get the benefit, <code>HOT</code>
needs some free space to put the new version of the <code>UPDATEd</code> tuple in the same
disk page, which means you'll probably want to set your table <a href="http://www.postgresql.org/docs/8.3/static/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS">fillfactor</a> to
something much less than <code>100</code>.</p>
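
<p>As a minimal sketch, assuming a hypothetical table named <code>table1</code> and taking
<code>50</code> as a value to tune rather than a recommendation, lowering the <code>fillfactor</code>
of an existing table looks like this. The new setting only applies to pages
written afterwards; existing pages keep their current filling until the
table is rewritten, for instance with <code>CLUSTER</code>:</p>

<pre class="src">
-- leave half of each page free for new versions of updated tuples
ALTER TABLE table1 SET (fillfactor = 50);
</pre>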

<p>Now, here's how to check you're benefitting from <code>HOT</code>:</p>

<pre class="src">
<span style="color: #729fcf; font-weight: bold;">SELECT</span> schemaname, relname,
       n_tup_upd,n_tup_hot_upd,
       <span style="color: #729fcf; font-weight: bold;">case</span> <span style="color: #729fcf; font-weight: bold;">when</span> n_tup_upd &gt; 0
            <span style="color: #729fcf; font-weight: bold;">then</span> ((n_tup_hot_upd::<span style="color: #8ae234; font-weight: bold;">numeric</span>/n_tup_upd::<span style="color: #8ae234; font-weight: bold;">numeric</span>)*100.0)::<span style="color: #8ae234; font-weight: bold;">numeric</span>(5,2)
            <span style="color: #729fcf; font-weight: bold;">else</span> <span style="color: #729fcf; font-weight: bold;">NULL</span>
       <span style="color: #729fcf; font-weight: bold;">end</span> <span style="color: #729fcf; font-weight: bold;">AS</span> hot_ratio

 <span style="color: #729fcf; font-weight: bold;">FROM</span> pg_stat_all_tables;

 schemaname | relname | n_tup_upd | n_tup_hot_upd | hot_ratio
<span style="color: #888a85;">------------+---------+-----------+---------------+-----------
</span> <span style="color: #729fcf; font-weight: bold;">public</span>     | table1  |         6 |             6 |    100.00
 <span style="color: #729fcf; font-weight: bold;">public</span>     | table2  |   2551200 |       2549474 |     99.93
</pre>

<p>Here's an extended version of the same query, displaying the
<code>fillfactor</code> option value for the tables you're inquiring about. It comes
separately from the first example because the <code>fillfactor</code> of a
relation is stored in the <code>reloptions</code> field of the <code>pg_class</code> catalog, and to filter against a
schema qualified table name, you want to join against <code>pg_namespace</code> too.</p>

<pre class="src">
<span style="color: #729fcf; font-weight: bold;">SELECT</span> t.schemaname, t.relname, c.reloptions,
       t.n_tup_upd, t.n_tup_hot_upd,
       <span style="color: #729fcf; font-weight: bold;">case</span> <span style="color: #729fcf; font-weight: bold;">when</span> n_tup_upd &gt; 0
            <span style="color: #729fcf; font-weight: bold;">then</span> ((n_tup_hot_upd::<span style="color: #8ae234; font-weight: bold;">numeric</span>/n_tup_upd::<span style="color: #8ae234; font-weight: bold;">numeric</span>)*100.0)::<span style="color: #8ae234; font-weight: bold;">numeric</span>(5,2)
            <span style="color: #729fcf; font-weight: bold;">else</span> <span style="color: #729fcf; font-weight: bold;">NULL</span>
        <span style="color: #729fcf; font-weight: bold;">end</span> <span style="color: #729fcf; font-weight: bold;">AS</span> hot_ratio
<span style="color: #729fcf; font-weight: bold;">FROM</span> pg_stat_all_tables t
      <span style="color: #729fcf; font-weight: bold;">JOIN</span> (pg_class c <span style="color: #729fcf; font-weight: bold;">JOIN</span> pg_namespace n <span style="color: #729fcf; font-weight: bold;">ON</span> c.relnamespace = n.oid)
        <span style="color: #729fcf; font-weight: bold;">ON</span> n.nspname = t.schemaname <span style="color: #729fcf; font-weight: bold;">AND</span> c.relname = t.relname;

 schemaname | relname |   reloptions    | n_tup_upd | n_tup_hot_upd | hot_ratio
<span style="color: #888a85;">------------+---------+-----------------+-----------+---------------+-----------
</span> <span style="color: #729fcf; font-weight: bold;">public</span>     | table1  | {fillfactor=50} |   1585920 |       1585246 |     99.96
 <span style="color: #729fcf; font-weight: bold;">public</span>     | table2  | {fillfactor=50} |   2504880 |       2503154 |     99.93
</pre>
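
<p>To look at a single table only, here's a sketch of the catalog lookup on
its own, filtering on a schema qualified name. The <code>public</code> schema and the
<code>table1</code> name are placeholders to replace with your own:</p>

<pre class="src">
-- reloptions is NULL when no storage parameter has been set
SELECT n.nspname, c.relname, c.reloptions
  FROM pg_class c
       JOIN pg_namespace n ON c.relnamespace = n.oid
 WHERE n.nspname = 'public'
   AND c.relname = 'table1';
</pre>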

<p>Don't let the <code>HOT</code> question disturb your sleep any more!</p>

]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Wed, 28 Jan 2009 00:00:00 CET</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20Controling%20HOT%20usage%20in%208%2E3</guid>

</item>

<item>
<title> Londiste Trick</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20Londiste%20Trick</link>
<description><![CDATA[
<p><a name="20090121" id="20090121"></a>
<a name="%20Londiste%20Trick" id="%20Londiste%20Trick"></a>
So, you're using <code>londiste</code> and the <code>ticker</code> has not been running all night
long, due to some restart glitch in your procedures, and the <em>on call</em> admin
didn't notice the restart failure. If you blindly restart the replication
daemon, it will load into memory all the events produced during the night,
at once, because you now have only one tick to hold them all.</p>

<p>The following query allows you to count how many events that represents,
with the magic tick numbers coming from <code>pgq.subscription</code> in columns
<code>sub_last_tick</code> and <code>sub_next_tick</code>.</p>

<pre class="src">
<span style="color: #729fcf; font-weight: bold;">SELECT</span> <span style="color: #729fcf;">count</span>(*)
  <span style="color: #729fcf; font-weight: bold;">FROM</span> pgq.event_1,
      (<span style="color: #729fcf; font-weight: bold;">SELECT</span> tick_snapshot
         <span style="color: #729fcf; font-weight: bold;">FROM</span> pgq.tick
        <span style="color: #729fcf; font-weight: bold;">WHERE</span> tick_id <span style="color: #729fcf; font-weight: bold;">BETWEEN</span> 5715138 <span style="color: #729fcf; font-weight: bold;">AND</span> 5715139
      ) <span style="color: #729fcf; font-weight: bold;">as</span> t(snapshots)
<span style="color: #729fcf; font-weight: bold;">WHERE</span> txid_visible_in_snapshot(ev_txid, snapshots);
</pre>
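
<p>To find the tick numbers to plug into that query, something along these
lines should do. It only reads the two columns named above, and on a setup
with several queues or consumers you will want to restrict it to the
subscription you care about:</p>

<pre class="src">
-- one row per subscription: last processed tick and the next one
SELECT sub_last_tick, sub_next_tick
  FROM pgq.subscription;
</pre>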

<p>In our case, this was more than <em>5.4 million</em> events. With
this many events to care about, if you start londiste, it'll eat as much
memory as needed to hold them all, which might be more than what your
system is able to give it. So you want a way to tell <code>londiste</code> not to load
all events at once. Here's how: add the following knob to your <em>.ini</em>
configuration file before restarting the londiste daemon:</p>

<pre class="src">
    pgq_lazy_fetch = 500
</pre>

<p>Now, <code>londiste</code> will lazily fetch at most <code>500</code> events at a time, even if a single
<code>batch</code> (which contains all <em>events</em> between two <em>ticks</em>) contains a huge number
of events. This number seems a good choice as it's the default <code>PGQ</code> setting
for the number of events in a single <em>batch</em>. This number is only exceeded when the
ticker is not running or when you're producing more <em>events</em> than that in a
single transaction.</p>

<p>Hope you'll find the tip useful!</p>

]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Wed, 21 Jan 2009 00:00:00 CET</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20Londiste%20Trick</guid>

</item>

<item>
<title> Fake entry</title>
<link>http://blog.tapoueh.org/blog.dim.html#%20Fake%20entry</link>
<description><![CDATA[
<p><a name="20081204" id="20081204"></a>
<a name="20081204%20Fake%20entry" id="20081204%20Fake%20entry"></a>
This is a test of a fake entry to see how Muse will manage this.</p>

<p>With some <code>SQL</code> inside:</p>

<blockquote>
<pre class="src">
<span style="color: #729fcf; font-weight: bold;">SELECT</span> * <span style="color: #729fcf; font-weight: bold;">FROM</span> planet.postgresql.org <span style="color: #729fcf; font-weight: bold;">WHERE</span> author = <span style="color: #ad7fa8; font-style: italic;">'dim'</span>;
</pre>

</blockquote>

]]></description>
<author>Dimitri Fontaine</author>
<pubDate>Thu, 04 Dec 2008 00:00:00 CET</pubDate>
<guid>http://blog.tapoueh.org/blog.dim.html#%20Fake%20entry</guid>

</item>

  </channel>
</rss>
